From bsmith at petsc.dev Sat Aug 1 01:41:59 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 1 Aug 2020 01:41:59 -0500 Subject: [petsc-users] SUPERLU_DIST in single precision In-Reply-To: <20200727211054.Horde.ZgzkW110frSEh8mw1TA2Zdn@webmail.mpcdf.mpg.de> References: <20200721155807.Horde.5ZYOzrQ7dmwNTQquK_6ebeS@webmail.mpcdf.mpg.de> <20200722150403.Horde.NO015I2GI2E-524J-SXqwui@webmail.mpcdf.mpg.de> <20200727211054.Horde.ZgzkW110frSEh8mw1TA2Zdn@webmail.mpcdf.mpg.de> Message-ID: Felix, The branch is ready. Just use git checkout barry/2020-07-28/superlu_dist-single ./configure --download-superlu_dist --download-metis --download-parmetis --download-ptscotch --with-precision=single and whatever else you use Barry It will automatically get the needed branch of SuperLU_DIST that Sherry prepared. > On Jul 27, 2020, at 2:10 PM, flw at rzg.mpg.de wrote: > > Hi Shery, > Yes, ideally we would like to compile PETSc in single precision and simply run a single precision version of SUPERLU_DIST just like e.g. MUMPS. > > Best regards and thanks, > Felix > Zitat von "Xiaoye S. Li" : > >> Barry, >> >> I have a branch 'Mixed-precision' working with single precision FP32. I >> assume Felix wants to use superlu_dist from petsc. How do you want to >> incorporate it in petsc? >> >> https://github.com/xiaoyeli/superlu_dist >> >> PS1: in this version, FP32 only works on CPU. FP64 and complex-FP64 all >> work on GPU. >> >> PS2: currently there is no mixed-precision yet, but it is the branch we are >> adding mix-prec support. Will take a while before merging to master. >> >> Sherry >> >> >> On Wed, Jul 22, 2020 at 6:04 AM wrote: >> >>> Hi Barry, >>> for now I just want to run everything in single on CPUs only with >>> SUPERLU_DIST. Maybe we will also incorporate GPUs in the future, but >>> there are no immediate plans yet. So if you could provide the support, >>> that would be awesome. >>> >>> Best regards, >>> Felix >>> >>> Zitat von Barry Smith : >>> >>> > Felix, >>> > >>> > What are your needs, do you want this for CPUs or for GPUs? Do >>> > you wish to run all your code in single precision or just the >>> > SuperLU_Dist solver while the rest of your code double? >>> > >>> > If you want to run everything on CPUs using single precision >>> > then adding the support is very easy, we can provide that for you >>> > any time. The other cases will require more thought. >>> > >>> > Barry >>> > >>> > >>> >> On Jul 21, 2020, at 8:58 AM, flw at rzg.mpg.de wrote: >>> >> >>> >> Dear PETSc support team, >>> >> some time ago you told me that you are planning on releasing a >>> >> version that supports SUPERLU_DIST in single-precision soon. Can >>> >> you tell me roughly what time frame you had in mind? >>> >> >>> >> Best regards, >>> >> Felix >>> >> >>> >>> >>> >>> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat Aug 1 10:44:29 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 1 Aug 2020 10:44:29 -0500 (CDT) Subject: [petsc-users] petsc-3.13.4.tar.gz now available Message-ID: Dear PETSc users, The patch release petsc-3.13.4 is now available for download, with change list at 'PETSc-3.13 Changelog' http://www.mcs.anl.gov/petsc/download/index.html Satish From bsmith at petsc.dev Sat Aug 1 11:41:57 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 1 Aug 2020 11:41:57 -0500 Subject: [petsc-users] overlap cpu and gpu? 
In-Reply-To: References: Message-ID: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> Nicola, This is really not viable or practical at this time with PETSc. It is not impossible but requires careful coding with threads, another possibility is to use one half of the virtual GPUs for each solve, this is also not trivial. I would recommend first seeing what kind of performance you can get on the GPU for each type of solve and revisit this idea in the future. Barry > On Jul 31, 2020, at 9:23 AM, nicola varini wrote: > > Hello, I would like to know if it is possible to overlap CPU and GPU with DMDA. > I've a machine where each node has 1P100+1Haswell. > I've to resolve Poisson and Ampere equation for each time step. > I'm using 2D DMDA for each of them. Would be possible to compute poisson > and ampere equation at the same time? One on CPU and the other on GPU? > > Thanks From jed at jedbrown.org Sat Aug 1 14:24:41 2020 From: jed at jedbrown.org (Jed Brown) Date: Sat, 01 Aug 2020 13:24:41 -0600 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> References: Message-ID: <87eeoqp3t2.fsf@jedbrown.org> You can use MPI and split the communicator so n-1 ranks create a DMDA for one part of your system and the other rank drives the GPU in the other part. They can all be part of the same coupled system on the full communicator, but PETSc doesn't currently support some ranks having their Vec arrays on GPU and others on host, so you'd be paying host-device transfer costs on each iteration (and that might swamp any performance benefit you would have gotten). In any case, be sure to think about the execution time of each part. Load balancing with matching time-to-solution for each part can be really hard. -------------- next part -------------- A non-text attachment was scrubbed... Name: Kronbichler-fig4-crop.png Type: image/png Size: 111247 bytes Desc: not available URL: -------------- next part -------------- Barry Smith writes: > Nicola, > > This is really not viable or practical at this time with PETSc. It is not impossible but requires careful coding with threads, another possibility is to use one half of the virtual GPUs for each solve, this is also not trivial. I would recommend first seeing what kind of performance you can get on the GPU for each type of solve and revisit this idea in the future. > > Barry > > > > >> On Jul 31, 2020, at 9:23 AM, nicola varini wrote: >> >> Hello, I would like to know if it is possible to overlap CPU and GPU with DMDA. >> I've a machine where each node has 1P100+1Haswell. >> I've to resolve Poisson and Ampere equation for each time step. >> I'm using 2D DMDA for each of them. Would be possible to compute poisson >> and ampere equation at the same time? One on CPU and the other on GPU? >> >> Thanks From mfadams at lbl.gov Sun Aug 2 07:09:34 2020 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 2 Aug 2020 08:09:34 -0400 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: <87eeoqp3t2.fsf@jedbrown.org> References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: I suspect that the Poisson and Ampere's law solve are not coupled. You might be able to duplicate the communicator and use two threads. You would want to configure PETSc with threadsafety and threads and I think it could/should work, but this mode is never used by anyone.
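(For reference, and untested by me for this particular use case, such a build would be configured with something along the lines of

   ./configure --with-threadsafety --with-openmp --with-log=0  <your usual options>

and the driver would look roughly like the sketch below, where the creation of the two solvers is elided and every name is illustrative rather than taken from your code; it also assumes an MPI library that provides MPI_THREAD_MULTIPLE:

   MPI_Comm comm_poisson, comm_ampere;
   MPI_Comm_dup(PETSC_COMM_WORLD, &comm_poisson);
   MPI_Comm_dup(PETSC_COMM_WORLD, &comm_ampere);
   /* ... create one DMDA and one KSP on each duplicated communicator ... */
   #pragma omp parallel num_threads(2)
   {
     if (omp_get_thread_num() == 0) KSPSolve(ksp_poisson, b_poisson, x_poisson);  /* e.g. the CPU solve */
     else                           KSPSolve(ksp_ampere,  b_ampere,  x_ampere);   /* e.g. the GPU solve */
   }

Again, only a sketch of the structure, not something I have timed or even run.)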
That said, I would not recommend doing this unless you feel like playing in computer science, as opposed to doing application science. The best case scenario you get a speedup of 2x. That is a strict upper bound, but you will never come close to it. Your hardware has some balance of CPU to GPU processing rate. Your application has a balance of volume of work for your two solves. They have to be the same to get close to 2x speedup and that ratio(s) has to be 1:1. To be concrete, from what little I can guess about your applications let's assume that the cost of each of these two solves is about the same (eg, Laplacians on your domain and the best case scenario). But, GPU machines are configured to have roughly 1-10% of capacity in the GPUs, these days, that gives you an upper bound of about 10% speedup. That is noise. Upshot, unless you configure your hardware to match this problem, and the two solves have the same cost, you will not see close to 2x speedup. Your time is better spent elsewhere. Mark On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: > You can use MPI and split the communicator so n-1 ranks create a DMDA for > one part of your system and the other rank drives the GPU in the other > part. They can all be part of the same coupled system on the full > communicator, but PETSc doesn't currently support some ranks having their > Vec arrays on GPU and others on host, so you'd be paying host-device > transfer costs on each iteration (and that might swamp any performance > benefit you would have gotten). > > In any case, be sure to think about the execution time of each part. Load > balancing with matching time-to-solution for each part can be really hard. > > > Barry Smith writes: > > > Nicola, > > > > This is really viable or practical at this time with PETSc. It is > not impossible but requires careful coding with threads, another > possibility is to use one half of the virtual GPUs for each solve, this is > also not trivial. I would recommend first seeing what kind of performance > you can get on the GPU for each type of solve and revist this idea in the > future. > > > > Barry > > > > > > > > > >> On Jul 31, 2020, at 9:23 AM, nicola varini > wrote: > >> > >> Hello, I would like to know if it is possible to overlap CPU and GPU > with DMDA. > >> I've a machine where each node has 1P100+1Haswell. > >> I've to resolve Poisson and Ampere equation for each time step. > >> I'm using 2D DMDA for each of them. Would be possible to compute > poisson > >> and ampere equation at the same time? One on CPU and the other on GPU? > >> > >> Thanks > -------------- next part -------------- An HTML attachment was scrubbed... URL: From flw at rzg.mpg.de Mon Aug 3 10:45:00 2020 From: flw at rzg.mpg.de (flw at rzg.mpg.de) Date: Mon, 03 Aug 2020 17:45:00 +0200 Subject: [petsc-users] SUPERLU_DIST in single precision In-Reply-To: References: <20200721155807.Horde.5ZYOzrQ7dmwNTQquK_6ebeS@webmail.mpcdf.mpg.de> <20200722150403.Horde.NO015I2GI2E-524J-SXqwui@webmail.mpcdf.mpg.de> <20200727211054.Horde.ZgzkW110frSEh8mw1TA2Zdn@webmail.mpcdf.mpg.de> Message-ID: <20200803174500.Horde.ipdcFrT5MFbLXenL8AAo3rJ@webmail.mpcdf.mpg.de> Hi Barry, Thanks for the branch (and thanks to Sherry as well). 
I tried to use the configure example " arch-ci-linux-intel-mkl-single.py" (adding the --with-batch flag, since I am running on a cluster), but I get the following error message: TESTING: check from config.libraries(config/BuildSystem/config/libraries.py:157) ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Downloaded ptscotch could not be used. Please check install in /u/flw/petsc-hash-pkgs/073aec050 (see configure1.log) Next I tried a minimalistic version using Intel MPI compilers: ./configure --download-superlu_dist --download-metis --download-parmetis --download-ptscotch --with-precision=single --with-batch --with-fc=mpiifort --with-cc=mpiicc --with-cxx=mpiicpc There I got the following error: =============================================================================== Compiling and installing SUPERLU_DIST; this may take several minutes =============================================================================== ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Error running make on SUPERLU_DIST (see configure2.log) Next I tried ./configure --download-superlu_dist --download-metis --download-parmetis --download-ptscotch --with-precision=single --with-batch --with-mpi-dir=/mpcdf/soft/SLE_12/packages/x86_64/intel_parallel_studio/2018.4/impi/2018.4.274/intel64 Same error message (see configure3.log) It seems that there is something going on with mkl in case I want to use Intel compilers for C and C++ (compiling with gcc and g++ seems to work ) Do you know what is going on there? (I am running on the Draco cluster, if that helps (https://www.mpcdf.mpg.de/services/computing/draco)) Best regards, Felix Zitat von Barry Smith : > Felix, > > The branch is ready. Just use > > git checkout barry/2020-07-28/superlu_dist-single > ./configure --download-superlu_dist --download-metis > --download-parmetis --download-ptscotch --with-precision=single > and whatever else you use > > Barry > > It will automatically get the needed branch of SuperLU_DIST that > Sherry prepared. > > >> On Jul 27, 2020, at 2:10 PM, flw at rzg.mpg.de wrote: >> >> Hi Shery, >> Yes, ideally we would like to compile PETSc in single precision and >> simply run a single precision version of SUPERLU_DIST just like >> e.g. MUMPS. >> >> Best regards and thanks, >> Felix >> Zitat von "Xiaoye S. Li" : >> >>> Barry, >>> >>> I have a branch 'Mixed-precision' working with single precision FP32. I >>> assume Felix wants to use superlu_dist from petsc. How do you want to >>> incorporate it in petsc? >>> >>> https://github.com/xiaoyeli/superlu_dist >>> >>> PS1: in this version, FP32 only works on CPU. FP64 and complex-FP64 all >>> work on GPU. >>> >>> PS2: currently there is no mixed-precision yet, but it is the branch we are >>> adding mix-prec support. Will take a while before merging to master. >>> >>> Sherry >>> >>> >>> On Wed, Jul 22, 2020 at 6:04 AM wrote: >>> >>>> Hi Barry, >>>> for now I just want to run everything in single on CPUs only with >>>> SUPERLU_DIST. Maybe we will also incorporate GPUs in the future, but >>>> there are no immediate plans yet. So if you could provide the support, >>>> that would be awesome. 
>>>> >>>> Best regards, >>>> Felix >>>> >>>> Zitat von Barry Smith : >>>> >>>> > Felix, >>>> > >>>> > What are your needs, do you want this for CPUs or for GPUs? Do >>>> > you wish to run all your code in single precision or just the >>>> > SuperLU_Dist solver while the rest of your code double? >>>> > >>>> > If you want to run everything on CPUs using single precision >>>> > then adding the support is very easy, we can provide that for you >>>> > any time. The other cases will require more thought. >>>> > >>>> > Barry >>>> > >>>> > >>>> >> On Jul 21, 2020, at 8:58 AM, flw at rzg.mpg.de wrote: >>>> >> >>>> >> Dear PETSc support team, >>>> >> some time ago you told me that you are planning on releasing a >>>> >> version that supports SUPERLU_DIST in single-precision soon. Can >>>> >> you tell me roughly what time frame you had in mind? >>>> >> >>>> >> Best regards, >>>> >> Felix >>>> >> >>>> >>>> >>>> >>>> >> >> >> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: configure1.log URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: configure2.log URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: configure3.log URL: From bsmith at petsc.dev Mon Aug 3 20:49:16 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 3 Aug 2020 20:49:16 -0500 Subject: [petsc-users] SUPERLU_DIST in single precision In-Reply-To: <20200803174500.Horde.ipdcFrT5MFbLXenL8AAo3rJ@webmail.mpcdf.mpg.de> References: <20200721155807.Horde.5ZYOzrQ7dmwNTQquK_6ebeS@webmail.mpcdf.mpg.de> <20200722150403.Horde.NO015I2GI2E-524J-SXqwui@webmail.mpcdf.mpg.de> <20200727211054.Horde.ZgzkW110frSEh8mw1TA2Zdn@webmail.mpcdf.mpg.de> <20200803174500.Horde.ipdcFrT5MFbLXenL8AAo3rJ@webmail.mpcdf.mpg.de> Message-ID: For the first case do rm -rf arch-ci-linux-intel-mkl-single looks like there might be some outdated material in that directory causing the problem. For the second and third I see in superlu_defs.h #ifdef __INTEL_COMPILER #include "mkl.h" hence the use of Intel compiler requires MKL. But your configuration doesn't find the location of the MKL include files. I'm working on the general fix. > On Aug 3, 2020, at 10:45 AM, flw at rzg.mpg.de wrote: > > Hi Barry, > Thanks for the branch (and thanks to Sherry as well). I tried to use the configure example " arch-ci-linux-intel-mkl-single.py" (adding the --with-batch flag, since I am running on a cluster), but I get the following error message: > TESTING: check from config.libraries(config/BuildSystem/config/libraries.py:157) ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > Downloaded ptscotch could not be used. 
Please check install in /u/flw/petsc-hash-pkgs/073aec050 > > (see configure1.log) > > Next I tried a minimalistic version using Intel MPI compilers: > ./configure --download-superlu_dist --download-metis --download-parmetis --download-ptscotch --with-precision=single --with-batch --with-fc=mpiifort --with-cc=mpiicc --with-cxx=mpiicpc > > There I got the following error: > =============================================================================== Compiling and installing SUPERLU_DIST; this may take several minutes =============================================================================== ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): > ------------------------------------------------------------------------------- > Error running make on SUPERLU_DIST > > (see configure2.log) > > > Next I tried > ./configure --download-superlu_dist --download-metis --download-parmetis --download-ptscotch --with-precision=single --with-batch --with-mpi-dir=/mpcdf/soft/SLE_12/packages/x86_64/intel_parallel_studio/2018.4/impi/2018.4.274/intel64 > > Same error message (see configure3.log) > > > It seems that there is something going on with mkl in case I want to use Intel compilers for C and C++ (compiling with gcc and g++ seems to work ) > > Do you know what is going on there? (I am running on the Draco cluster, if that helps (https://www.mpcdf.mpg.de/services/computing/draco)) > > > Best regards, > Felix > > Zitat von Barry Smith : > >> Felix, >> >> The branch is ready. Just use >> >> git checkout barry/2020-07-28/superlu_dist-single >> ./configure --download-superlu_dist --download-metis --download-parmetis --download-ptscotch --with-precision=single and whatever else you use >> >> Barry >> >> It will automatically get the needed branch of SuperLU_DIST that Sherry prepared. >> >> >>> On Jul 27, 2020, at 2:10 PM, flw at rzg.mpg.de wrote: >>> >>> Hi Shery, >>> Yes, ideally we would like to compile PETSc in single precision and simply run a single precision version of SUPERLU_DIST just like e.g. MUMPS. >>> >>> Best regards and thanks, >>> Felix >>> Zitat von "Xiaoye S. Li" : >>> >>>> Barry, >>>> >>>> I have a branch 'Mixed-precision' working with single precision FP32. I >>>> assume Felix wants to use superlu_dist from petsc. How do you want to >>>> incorporate it in petsc? >>>> >>>> https://github.com/xiaoyeli/superlu_dist >>>> >>>> PS1: in this version, FP32 only works on CPU. FP64 and complex-FP64 all >>>> work on GPU. >>>> >>>> PS2: currently there is no mixed-precision yet, but it is the branch we are >>>> adding mix-prec support. Will take a while before merging to master. >>>> >>>> Sherry >>>> >>>> >>>> On Wed, Jul 22, 2020 at 6:04 AM wrote: >>>> >>>>> Hi Barry, >>>>> for now I just want to run everything in single on CPUs only with >>>>> SUPERLU_DIST. Maybe we will also incorporate GPUs in the future, but >>>>> there are no immediate plans yet. So if you could provide the support, >>>>> that would be awesome. >>>>> >>>>> Best regards, >>>>> Felix >>>>> >>>>> Zitat von Barry Smith : >>>>> >>>>> > Felix, >>>>> > >>>>> > What are your needs, do you want this for CPUs or for GPUs? Do >>>>> > you wish to run all your code in single precision or just the >>>>> > SuperLU_Dist solver while the rest of your code double? >>>>> > >>>>> > If you want to run everything on CPUs using single precision >>>>> > then adding the support is very easy, we can provide that for you >>>>> > any time. 
The other cases will require more thought. >>>>> > >>>>> > Barry >>>>> > >>>>> > >>>>> >> On Jul 21, 2020, at 8:58 AM, flw at rzg.mpg.de wrote: >>>>> >> >>>>> >> Dear PETSc support team, >>>>> >> some time ago you told me that you are planning on releasing a >>>>> >> version that supports SUPERLU_DIST in single-precision soon. Can >>>>> >> you tell me roughly what time frame you had in mind? >>>>> >> >>>>> >> Best regards, >>>>> >> Felix >>>>> >> >>>>> >>>>> >>>>> >>>>> >>> >>> >>> > > > From xsli at lbl.gov Mon Aug 3 22:21:10 2020 From: xsli at lbl.gov (Xiaoye S. Li) Date: Mon, 3 Aug 2020 20:21:10 -0700 Subject: [petsc-users] SUPERLU_DIST in single precision In-Reply-To: <20200803174500.Horde.ipdcFrT5MFbLXenL8AAo3rJ@webmail.mpcdf.mpg.de> References: <20200721155807.Horde.5ZYOzrQ7dmwNTQquK_6ebeS@webmail.mpcdf.mpg.de> <20200722150403.Horde.NO015I2GI2E-524J-SXqwui@webmail.mpcdf.mpg.de> <20200727211054.Horde.ZgzkW110frSEh8mw1TA2Zdn@webmail.mpcdf.mpg.de> <20200803174500.Horde.ipdcFrT5MFbLXenL8AAo3rJ@webmail.mpcdf.mpg.de> Message-ID: Regarding MKL, superlu only uses some BLAS routines in MKL. If you have alternative BLAS, you don't need to use MKL. But, in my experience, on an Intel system, the compiler can correctly include the path with mkl.h, that is, I don't need to do anything special. I think mpiicc should work. Sherry On Mon, Aug 3, 2020 at 8:46 AM wrote: > Hi Barry, > Thanks for the branch (and thanks to Sherry as well). I tried to use > the configure example " arch-ci-linux-intel-mkl-single.py" (adding the > --with-batch flag, since I am running on a cluster), but I get the > following error message: > TESTING: check from > config.libraries(config/BuildSystem/config/libraries.py:157) > > > > > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log > for details): > > ------------------------------------------------------------------------------- > Downloaded ptscotch could not be used. Please check install in > /u/flw/petsc-hash-pkgs/073aec050 > > (see configure1.log) > > Next I tried a minimalistic version using Intel MPI compilers: > ./configure --download-superlu_dist --download-metis > --download-parmetis --download-ptscotch --with-precision=single > --with-batch --with-fc=mpiifort --with-cc=mpiicc --with-cxx=mpiicpc > > There I got the following error: > =============================================================================== > > > Compiling and installing SUPERLU_DIST; this may take several > minutes > > > =============================================================================== > > > > > > > > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log > for details): > > ------------------------------------------------------------------------------- > Error running make on SUPERLU_DIST > > (see configure2.log) > > > Next I tried > ./configure --download-superlu_dist --download-metis > --download-parmetis --download-ptscotch --with-precision=single > --with-batch > > --with-mpi-dir=/mpcdf/soft/SLE_12/packages/x86_64/intel_parallel_studio/2018.4/impi/2018.4.274/intel64 > > Same error message (see configure3.log) > > > It seems that there is something going on with mkl in case I want to > use Intel compilers for C and C++ (compiling with gcc and g++ seems to > work ) > > Do you know what is going on there? 
(I am running on the Draco > cluster, if that helps > (https://www.mpcdf.mpg.de/services/computing/draco)) > > > Best regards, > Felix > > Zitat von Barry Smith : > > > Felix, > > > > The branch is ready. Just use > > > > git checkout barry/2020-07-28/superlu_dist-single > > ./configure --download-superlu_dist --download-metis > > --download-parmetis --download-ptscotch --with-precision=single > > and whatever else you use > > > > Barry > > > > It will automatically get the needed branch of SuperLU_DIST that > > Sherry prepared. > > > > > >> On Jul 27, 2020, at 2:10 PM, flw at rzg.mpg.de wrote: > >> > >> Hi Shery, > >> Yes, ideally we would like to compile PETSc in single precision and > >> simply run a single precision version of SUPERLU_DIST just like > >> e.g. MUMPS. > >> > >> Best regards and thanks, > >> Felix > >> Zitat von "Xiaoye S. Li" : > >> > >>> Barry, > >>> > >>> I have a branch 'Mixed-precision' working with single precision FP32. I > >>> assume Felix wants to use superlu_dist from petsc. How do you want to > >>> incorporate it in petsc? > >>> > >>> https://github.com/xiaoyeli/superlu_dist > >>> > >>> PS1: in this version, FP32 only works on CPU. FP64 and complex-FP64 > all > >>> work on GPU. > >>> > >>> PS2: currently there is no mixed-precision yet, but it is the branch > we are > >>> adding mix-prec support. Will take a while before merging to master. > >>> > >>> Sherry > >>> > >>> > >>> On Wed, Jul 22, 2020 at 6:04 AM wrote: > >>> > >>>> Hi Barry, > >>>> for now I just want to run everything in single on CPUs only with > >>>> SUPERLU_DIST. Maybe we will also incorporate GPUs in the future, but > >>>> there are no immediate plans yet. So if you could provide the support, > >>>> that would be awesome. > >>>> > >>>> Best regards, > >>>> Felix > >>>> > >>>> Zitat von Barry Smith : > >>>> > >>>> > Felix, > >>>> > > >>>> > What are your needs, do you want this for CPUs or for GPUs? Do > >>>> > you wish to run all your code in single precision or just the > >>>> > SuperLU_Dist solver while the rest of your code double? > >>>> > > >>>> > If you want to run everything on CPUs using single precision > >>>> > then adding the support is very easy, we can provide that for you > >>>> > any time. The other cases will require more thought. > >>>> > > >>>> > Barry > >>>> > > >>>> > > >>>> >> On Jul 21, 2020, at 8:58 AM, flw at rzg.mpg.de wrote: > >>>> >> > >>>> >> Dear PETSc support team, > >>>> >> some time ago you told me that you are planning on releasing a > >>>> >> version that supports SUPERLU_DIST in single-precision soon. Can > >>>> >> you tell me roughly what time frame you had in mind? > >>>> >> > >>>> >> Best regards, > >>>> >> Felix > >>>> >> > >>>> > >>>> > >>>> > >>>> > >> > >> > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Mon Aug 3 23:05:02 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 3 Aug 2020 23:05:02 -0500 Subject: [petsc-users] SUPERLU_DIST in single precision In-Reply-To: References: <20200721155807.Horde.5ZYOzrQ7dmwNTQquK_6ebeS@webmail.mpcdf.mpg.de> <20200722150403.Horde.NO015I2GI2E-524J-SXqwui@webmail.mpcdf.mpg.de> <20200727211054.Horde.ZgzkW110frSEh8mw1TA2Zdn@webmail.mpcdf.mpg.de> <20200803174500.Horde.ipdcFrT5MFbLXenL8AAo3rJ@webmail.mpcdf.mpg.de> Message-ID: <4668AA2D-04CE-457C-A409-3DFE5B194120@petsc.dev> On some systems, like my Mac, the environmental variable CPATH is set to the appropriate directory for mkl.h when MKL is initialized with source /opt/intel/compilers_and_libraries/mac/bin/compilervars.sh intel64 hence the compiler can magically find mkl.h but this doesn't seem to be universal, I think that is why Felix's system cannot find the include, I found a Linux machine at ANL that produces the same problem Felix has. Anyways I'm fixing the PETSc install will just work even if CPATH is not set. Barry > On Aug 3, 2020, at 10:21 PM, Xiaoye S. Li wrote: > > Regarding MKL, superlu only uses some BLAS routines in MKL. If you have alternative BLAS, you don't need to use MKL. > > But, in my experience, on an Intel system, the compiler can correctly include the path with mkl.h, that is, I don't need to do anything special. > I think mpiicc should work. > > Sherry > > On Mon, Aug 3, 2020 at 8:46 AM > wrote: > Hi Barry, > Thanks for the branch (and thanks to Sherry as well). I tried to use > the configure example " arch-ci-linux-intel-mkl-single.py" (adding the > --with-batch flag, since I am running on a cluster), but I get the > following error message: > TESTING: check from > config.libraries(config/BuildSystem/config/libraries.py:157) > > > > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log > for details): > ------------------------------------------------------------------------------- > Downloaded ptscotch could not be used. Please check install in > /u/flw/petsc-hash-pkgs/073aec050 > > (see configure1.log) > > Next I tried a minimalistic version using Intel MPI compilers: > ./configure --download-superlu_dist --download-metis > --download-parmetis --download-ptscotch --with-precision=single > --with-batch --with-fc=mpiifort --with-cc=mpiicc --with-cxx=mpiicpc > > There I got the following error: > =============================================================================== Compiling and installing SUPERLU_DIST; this may take several minutes =============================================================================== > ******************************************************************************* > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log > for details): > ------------------------------------------------------------------------------- > Error running make on SUPERLU_DIST > > (see configure2.log) > > > Next I tried > ./configure --download-superlu_dist --download-metis > --download-parmetis --download-ptscotch --with-precision=single > --with-batch > --with-mpi-dir=/mpcdf/soft/SLE_12/packages/x86_64/intel_parallel_studio/2018.4/impi/2018.4.274/intel64 > > Same error message (see configure3.log) > > > It seems that there is something going on with mkl in case I want to > use Intel compilers for C and C++ (compiling with gcc and g++ seems to > work ) > > Do you know what is going on there? 
(I am running on the Draco > cluster, if that helps > (https://www.mpcdf.mpg.de/services/computing/draco )) > > > Best regards, > Felix > > Zitat von Barry Smith >: > > > Felix, > > > > The branch is ready. Just use > > > > git checkout barry/2020-07-28/superlu_dist-single > > ./configure --download-superlu_dist --download-metis > > --download-parmetis --download-ptscotch --with-precision=single > > and whatever else you use > > > > Barry > > > > It will automatically get the needed branch of SuperLU_DIST that > > Sherry prepared. > > > > > >> On Jul 27, 2020, at 2:10 PM, flw at rzg.mpg.de wrote: > >> > >> Hi Shery, > >> Yes, ideally we would like to compile PETSc in single precision and > >> simply run a single precision version of SUPERLU_DIST just like > >> e.g. MUMPS. > >> > >> Best regards and thanks, > >> Felix > >> Zitat von "Xiaoye S. Li" >: > >> > >>> Barry, > >>> > >>> I have a branch 'Mixed-precision' working with single precision FP32. I > >>> assume Felix wants to use superlu_dist from petsc. How do you want to > >>> incorporate it in petsc? > >>> > >>> https://github.com/xiaoyeli/superlu_dist > >>> > >>> PS1: in this version, FP32 only works on CPU. FP64 and complex-FP64 all > >>> work on GPU. > >>> > >>> PS2: currently there is no mixed-precision yet, but it is the branch we are > >>> adding mix-prec support. Will take a while before merging to master. > >>> > >>> Sherry > >>> > >>> > >>> On Wed, Jul 22, 2020 at 6:04 AM > wrote: > >>> > >>>> Hi Barry, > >>>> for now I just want to run everything in single on CPUs only with > >>>> SUPERLU_DIST. Maybe we will also incorporate GPUs in the future, but > >>>> there are no immediate plans yet. So if you could provide the support, > >>>> that would be awesome. > >>>> > >>>> Best regards, > >>>> Felix > >>>> > >>>> Zitat von Barry Smith >: > >>>> > >>>> > Felix, > >>>> > > >>>> > What are your needs, do you want this for CPUs or for GPUs? Do > >>>> > you wish to run all your code in single precision or just the > >>>> > SuperLU_Dist solver while the rest of your code double? > >>>> > > >>>> > If you want to run everything on CPUs using single precision > >>>> > then adding the support is very easy, we can provide that for you > >>>> > any time. The other cases will require more thought. > >>>> > > >>>> > Barry > >>>> > > >>>> > > >>>> >> On Jul 21, 2020, at 8:58 AM, flw at rzg.mpg.de wrote: > >>>> >> > >>>> >> Dear PETSc support team, > >>>> >> some time ago you told me that you are planning on releasing a > >>>> >> version that supports SUPERLU_DIST in single-precision soon. Can > >>>> >> you tell me roughly what time frame you had in mind? > >>>> >> > >>>> >> Best regards, > >>>> >> Felix > >>>> >> > >>>> > >>>> > >>>> > >>>> > >> > >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicola.varini at gmail.com Tue Aug 4 05:04:39 2020 From: nicola.varini at gmail.com (nicola varini) Date: Tue, 4 Aug 2020 12:04:39 +0200 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: Dear all, thanks for your replies. The reason why I've asked if it is possible to overlap poisson and ampere is because they roughly take the same amount of time. Please find in attachment the profiling logs for only CPU and only GPU. Of course it is possible to split the MPI communicator and run each solver on different subcommunicator, however this would involve more communication. 
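(To spell out what I mean by splitting, I imagine something roughly like the following, where all the names are mine and only meant to illustrate the idea:

   PetscMPIInt rank;
   MPI_Comm    subcomm;
   MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
   PetscMPIInt color = (rank == 0) ? 0 : 1;   /* e.g. one rank drives the GPU solver, the others the CPU one */
   MPI_Comm_split(PETSC_COMM_WORLD, color, rank, &subcomm);
   /* then DMDACreate2d(subcomm, ...) and KSPCreate(subcomm, ...) for the solver owned by this group */

and the extra communication would be the scatters needed to exchange the fields between the two groups at every time step.)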
Did anyone ever tried to run 2 solvers with hyperthreading? Thanks Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams ha scritto: > I suspect that the Poisson and Ampere's law solve are not coupled. You > might be able to duplicate the communicator and use two threads. You would > want to configure PETSc with threadsafty and threads and I think it > could/should work, but this mode is never used by anyone. > > That said, I would not recommend doing this unless you feel like playing > in computer science, as opposed to doing application science. The best case > scenario you get a speedup of 2x. That is a strict upper bound, but you > will never come close to it. Your hardware has some balance of CPU to GPU > processing rate. Your application has a balance of volume of work for your > two solves. They have to be the same to get close to 2x speedup and that > ratio(s) has to be 1:1. To be concrete, from what little I can guess about > your applications let's assume that the cost of each of these two solves is > about the same (eg, Laplacians on your domain and the best case scenario). > But, GPU machines are configured to have roughly 1-10% of capacity in the > GPUs, these days, that gives you an upper bound of about 10% speedup. That > is noise. Upshot, unless you configure your hardware to match this problem, > and the two solves have the same cost, you will not see close to 2x > speedup. Your time is better spent elsewhere. > > Mark > > On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: > >> You can use MPI and split the communicator so n-1 ranks create a DMDA for >> one part of your system and the other rank drives the GPU in the other >> part. They can all be part of the same coupled system on the full >> communicator, but PETSc doesn't currently support some ranks having their >> Vec arrays on GPU and others on host, so you'd be paying host-device >> transfer costs on each iteration (and that might swamp any performance >> benefit you would have gotten). >> >> In any case, be sure to think about the execution time of each part. >> Load balancing with matching time-to-solution for each part can be really >> hard. >> >> >> Barry Smith writes: >> >> > Nicola, >> > >> > This is really viable or practical at this time with PETSc. It is >> not impossible but requires careful coding with threads, another >> possibility is to use one half of the virtual GPUs for each solve, this is >> also not trivial. I would recommend first seeing what kind of performance >> you can get on the GPU for each type of solve and revist this idea in the >> future. >> > >> > Barry >> > >> > >> > >> > >> >> On Jul 31, 2020, at 9:23 AM, nicola varini >> wrote: >> >> >> >> Hello, I would like to know if it is possible to overlap CPU and GPU >> with DMDA. >> >> I've a machine where each node has 1P100+1Haswell. >> >> I've to resolve Poisson and Ampere equation for each time step. >> >> I'm using 2D DMDA for each of them. Would be possible to compute >> poisson >> >> and ampere equation at the same time? One on CPU and the other on GPU? >> >> >> >> Thanks >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: out_gpu Type: application/octet-stream Size: 385670 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: out_nogpu Type: application/octet-stream Size: 385195 bytes Desc: not available URL: From stefano.zampini at gmail.com Tue Aug 4 05:35:56 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 4 Aug 2020 12:35:56 +0200 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: Nicola, You are actually not using the GPU properly, since you use HYPRE preconditioning, which is CPU only. One of your solvers is actually slower on ?GPU?. For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with Jacobi preconditioning. Mark can help you out with the specific command line options. When it works properly, everything related to PC application is offloaded to the GPU, and you should expect to get the well-known and branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve Doing what you want to do is one of the last optimization steps of an already optimized code before entering production. Yours is not even optimized for proper GPU usage yet. Also, any specific reason why you are using dgmres and fgmres? PETSc has not been designed with multi-threading in mind. You can achieve ?overlap? of the two solves by splitting the communicator. But then you need communications to let the two solutions talk to each other. Thanks Stefano > On Aug 4, 2020, at 12:04 PM, nicola varini wrote: > > Dear all, thanks for your replies. The reason why I've asked if it is possible to overlap poisson and ampere is because they roughly > take the same amount of time. Please find in attachment the profiling logs for only CPU and only GPU. > Of course it is possible to split the MPI communicator and run each solver on different subcommunicator, however this would involve more communication. > Did anyone ever tried to run 2 solvers with hyperthreading? > Thanks > > > Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams > ha scritto: > I suspect that the Poisson and Ampere's law solve are not coupled. You might be able to duplicate the communicator and use two threads. You would want to configure PETSc with threadsafty and threads and I think it could/should work, but this mode is never used by anyone. > > That said, I would not recommend doing this unless you feel like playing in computer science, as opposed to doing application science. The best case scenario you get a speedup of 2x. That is a strict upper bound, but you will never come close to it. Your hardware has some balance of CPU to GPU processing rate. Your application has a balance of volume of work for your two solves. They have to be the same to get close to 2x speedup and that ratio(s) has to be 1:1. To be concrete, from what little I can guess about your applications let's assume that the cost of each of these two solves is about the same (eg, Laplacians on your domain and the best case scenario). But, GPU machines are configured to have roughly 1-10% of capacity in the GPUs, these days, that gives you an upper bound of about 10% speedup. That is noise. Upshot, unless you configure your hardware to match this problem, and the two solves have the same cost, you will not see close to 2x speedup. Your time is better spent elsewhere. > > Mark > > On Sat, Aug 1, 2020 at 3:24 PM Jed Brown > wrote: > You can use MPI and split the communicator so n-1 ranks create a DMDA for one part of your system and the other rank drives the GPU in the other part. 
They can all be part of the same coupled system on the full communicator, but PETSc doesn't currently support some ranks having their Vec arrays on GPU and others on host, so you'd be paying host-device transfer costs on each iteration (and that might swamp any performance benefit you would have gotten). > > In any case, be sure to think about the execution time of each part. Load balancing with matching time-to-solution for each part can be really hard. > > > Barry Smith > writes: > > > Nicola, > > > > This is really viable or practical at this time with PETSc. It is not impossible but requires careful coding with threads, another possibility is to use one half of the virtual GPUs for each solve, this is also not trivial. I would recommend first seeing what kind of performance you can get on the GPU for each type of solve and revist this idea in the future. > > > > Barry > > > > > > > > > >> On Jul 31, 2020, at 9:23 AM, nicola varini > wrote: > >> > >> Hello, I would like to know if it is possible to overlap CPU and GPU with DMDA. > >> I've a machine where each node has 1P100+1Haswell. > >> I've to resolve Poisson and Ampere equation for each time step. > >> I'm using 2D DMDA for each of them. Would be possible to compute poisson > >> and ampere equation at the same time? One on CPU and the other on GPU? > >> > >> Thanks > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicola.varini at gmail.com Tue Aug 4 05:46:21 2020 From: nicola.varini at gmail.com (nicola varini) Date: Tue, 4 Aug 2020 12:46:21 +0200 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: Thanks for your reply Stefano. I know that HYPRE is not ported on GPU, but the Solver is running on GPU and is taking ~9s and is showing 100% of GPU utilization. Il giorno mar 4 ago 2020 alle ore 12:35 Stefano Zampini < stefano.zampini at gmail.com> ha scritto: > Nicola, > > You are actually not using the GPU properly, since you use HYPRE > preconditioning, which is CPU only. One of your solvers is actually slower > on ?GPU?. > For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with > Jacobi preconditioning. Mark can help you out with the specific command > line options. > When it works properly, everything related to PC application is offloaded > to the GPU, and you should expect to get the well-known and branded 10x > (maybe more) speedup one is expecting from GPUs during KSPSolve > > Doing what you want to do is one of the last optimization steps of an > already optimized code before entering production. Yours is not even > optimized for proper GPU usage yet. > Also, any specific reason why you are using dgmres and fgmres? > > PETSc has not been designed with multi-threading in mind. You can achieve > ?overlap? of the two solves by splitting the communicator. But then you > need communications to let the two solutions talk to each other. > > Thanks > Stefano > > > On Aug 4, 2020, at 12:04 PM, nicola varini > wrote: > > Dear all, thanks for your replies. The reason why I've asked if it is > possible to overlap poisson and ampere is because they roughly > take the same amount of time. Please find in attachment the profiling logs > for only CPU and only GPU. > Of course it is possible to split the MPI communicator and run each solver > on different subcommunicator, however this would involve more communication. > Did anyone ever tried to run 2 solvers with hyperthreading? 
> Thanks > > > Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams ha > scritto: > >> I suspect that the Poisson and Ampere's law solve are not coupled. You >> might be able to duplicate the communicator and use two threads. You would >> want to configure PETSc with threadsafty and threads and I think it >> could/should work, but this mode is never used by anyone. >> >> That said, I would not recommend doing this unless you feel like playing >> in computer science, as opposed to doing application science. The best case >> scenario you get a speedup of 2x. That is a strict upper bound, but you >> will never come close to it. Your hardware has some balance of CPU to GPU >> processing rate. Your application has a balance of volume of work for your >> two solves. They have to be the same to get close to 2x speedup and that >> ratio(s) has to be 1:1. To be concrete, from what little I can guess about >> your applications let's assume that the cost of each of these two solves is >> about the same (eg, Laplacians on your domain and the best case scenario). >> But, GPU machines are configured to have roughly 1-10% of capacity in the >> GPUs, these days, that gives you an upper bound of about 10% speedup. That >> is noise. Upshot, unless you configure your hardware to match this problem, >> and the two solves have the same cost, you will not see close to 2x >> speedup. Your time is better spent elsewhere. >> >> Mark >> >> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: >> >>> You can use MPI and split the communicator so n-1 ranks create a DMDA >>> for one part of your system and the other rank drives the GPU in the other >>> part. They can all be part of the same coupled system on the full >>> communicator, but PETSc doesn't currently support some ranks having their >>> Vec arrays on GPU and others on host, so you'd be paying host-device >>> transfer costs on each iteration (and that might swamp any performance >>> benefit you would have gotten). >>> >>> In any case, be sure to think about the execution time of each part. >>> Load balancing with matching time-to-solution for each part can be really >>> hard. >>> >>> >>> Barry Smith writes: >>> >>> > Nicola, >>> > >>> > This is really viable or practical at this time with PETSc. It is >>> not impossible but requires careful coding with threads, another >>> possibility is to use one half of the virtual GPUs for each solve, this is >>> also not trivial. I would recommend first seeing what kind of performance >>> you can get on the GPU for each type of solve and revist this idea in the >>> future. >>> > >>> > Barry >>> > >>> > >>> > >>> > >>> >> On Jul 31, 2020, at 9:23 AM, nicola varini >>> wrote: >>> >> >>> >> Hello, I would like to know if it is possible to overlap CPU and GPU >>> with DMDA. >>> >> I've a machine where each node has 1P100+1Haswell. >>> >> I've to resolve Poisson and Ampere equation for each time step. >>> >> I'm using 2D DMDA for each of them. Would be possible to compute >>> poisson >>> >> and ampere equation at the same time? One on CPU and the other on GPU? >>> >> >>> >> Thanks >>> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From flw at rzg.mpg.de Tue Aug 4 07:58:56 2020 From: flw at rzg.mpg.de (flw at rzg.mpg.de) Date: Tue, 04 Aug 2020 14:58:56 +0200 Subject: [petsc-users] SUPERLU_DIST in single precision In-Reply-To: <4668AA2D-04CE-457C-A409-3DFE5B194120@petsc.dev> References: <20200721155807.Horde.5ZYOzrQ7dmwNTQquK_6ebeS@webmail.mpcdf.mpg.de> <20200722150403.Horde.NO015I2GI2E-524J-SXqwui@webmail.mpcdf.mpg.de> <20200727211054.Horde.ZgzkW110frSEh8mw1TA2Zdn@webmail.mpcdf.mpg.de> <20200803174500.Horde.ipdcFrT5MFbLXenL8AAo3rJ@webmail.mpcdf.mpg.de> <4668AA2D-04CE-457C-A409-3DFE5B194120@petsc.dev> Message-ID: <20200804145856.Horde.D4wViBl3AGB6TjdkyxeWjB0@webmail.mpcdf.mpg.de> Yep, when I assign the mkl include directory to CPATH, it works :) Best regards, Felix Zitat von Barry Smith : > On some systems, like my Mac, the environmental variable CPATH is > set to the appropriate directory for mkl.h when MKL is initialized > with > > source /opt/intel/compilers_and_libraries/mac/bin/compilervars.sh intel64 > > hence the compiler can magically find mkl.h but this doesn't seem to > be universal, I think that is why Felix's system cannot find the > include, I found a Linux machine at ANL that produces the same > problem Felix has. Anyways I'm fixing the PETSc install will just > work even if CPATH is not set. > > Barry > > > > > >> On Aug 3, 2020, at 10:21 PM, Xiaoye S. Li wrote: >> >> Regarding MKL, superlu only uses some BLAS routines in MKL. If you >> have alternative BLAS, you don't need to use MKL. >> >> But, in my experience, on an Intel system, the compiler can >> correctly include the path with mkl.h, that is, I don't need to do >> anything special. >> I think mpiicc should work. >> >> Sherry >> >> On Mon, Aug 3, 2020 at 8:46 AM > > wrote: >> Hi Barry, >> Thanks for the branch (and thanks to Sherry as well). I tried to use >> the configure example " arch-ci-linux-intel-mkl-single.py" (adding the >> --with-batch flag, since I am running on a cluster), but I get the >> following error message: >> TESTING: check from >> config.libraries(config/BuildSystem/config/libraries.py:157) >> >> >> >> ******************************************************************************* >> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >> for details): >> ------------------------------------------------------------------------------- >> Downloaded ptscotch could not be used. 
Please check install in >> /u/flw/petsc-hash-pkgs/073aec050 >> >> (see configure1.log) >> >> Next I tried a minimalistic version using Intel MPI compilers: >> ./configure --download-superlu_dist --download-metis >> --download-parmetis --download-ptscotch --with-precision=single >> --with-batch --with-fc=mpiifort --with-cc=mpiicc --with-cxx=mpiicpc >> >> There I got the following error: >> =============================================================================== Compiling and installing SUPERLU_DIST; this may take several minutes >> =============================================================================== >> ******************************************************************************* >> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >> for details): >> ------------------------------------------------------------------------------- >> Error running make on SUPERLU_DIST >> >> (see configure2.log) >> >> >> Next I tried >> ./configure --download-superlu_dist --download-metis >> --download-parmetis --download-ptscotch --with-precision=single >> --with-batch >> --with-mpi-dir=/mpcdf/soft/SLE_12/packages/x86_64/intel_parallel_studio/2018.4/impi/2018.4.274/intel64 >> >> Same error message (see configure3.log) >> >> >> It seems that there is something going on with mkl in case I want to >> use Intel compilers for C and C++ (compiling with gcc and g++ seems to >> work ) >> >> Do you know what is going on there? (I am running on the Draco >> cluster, if that helps >> (https://www.mpcdf.mpg.de/services/computing/draco >> )) >> >> >> Best regards, >> Felix >> >> Zitat von Barry Smith >: >> >> > Felix, >> > >> > The branch is ready. Just use >> > >> > git checkout barry/2020-07-28/superlu_dist-single >> > ./configure --download-superlu_dist --download-metis >> > --download-parmetis --download-ptscotch --with-precision=single >> > and whatever else you use >> > >> > Barry >> > >> > It will automatically get the needed branch of SuperLU_DIST that >> > Sherry prepared. >> > >> > >> >> On Jul 27, 2020, at 2:10 PM, flw at rzg.mpg.de >> wrote: >> >> >> >> Hi Shery, >> >> Yes, ideally we would like to compile PETSc in single precision and >> >> simply run a single precision version of SUPERLU_DIST just like >> >> e.g. MUMPS. >> >> >> >> Best regards and thanks, >> >> Felix >> >> Zitat von "Xiaoye S. Li" >: >> >> >> >>> Barry, >> >>> >> >>> I have a branch 'Mixed-precision' working with single precision FP32. I >> >>> assume Felix wants to use superlu_dist from petsc. How do you want to >> >>> incorporate it in petsc? >> >>> >> >>> https://github.com/xiaoyeli/superlu_dist >> >> >>> >> >>> PS1: in this version, FP32 only works on CPU. FP64 and >> complex-FP64 all >> >>> work on GPU. >> >>> >> >>> PS2: currently there is no mixed-precision yet, but it is the >> branch we are >> >>> adding mix-prec support. Will take a while before merging to master. >> >>> >> >>> Sherry >> >>> >> >>> >> >>> On Wed, Jul 22, 2020 at 6:04 AM > > wrote: >> >>> >> >>>> Hi Barry, >> >>>> for now I just want to run everything in single on CPUs only with >> >>>> SUPERLU_DIST. Maybe we will also incorporate GPUs in the future, but >> >>>> there are no immediate plans yet. So if you could provide the support, >> >>>> that would be awesome. >> >>>> >> >>>> Best regards, >> >>>> Felix >> >>>> >> >>>> Zitat von Barry Smith >: >> >>>> >> >>>> > Felix, >> >>>> > >> >>>> > What are your needs, do you want this for CPUs or for GPUs? 
Do >> >>>> > you wish to run all your code in single precision or just the >> >>>> > SuperLU_Dist solver while the rest of your code double? >> >>>> > >> >>>> > If you want to run everything on CPUs using single precision >> >>>> > then adding the support is very easy, we can provide that for you >> >>>> > any time. The other cases will require more thought. >> >>>> > >> >>>> > Barry >> >>>> > >> >>>> > >> >>>> >> On Jul 21, 2020, at 8:58 AM, flw at rzg.mpg.de >> wrote: >> >>>> >> >> >>>> >> Dear PETSc support team, >> >>>> >> some time ago you told me that you are planning on releasing a >> >>>> >> version that supports SUPERLU_DIST in single-precision soon. Can >> >>>> >> you tell me roughly what time frame you had in mind? >> >>>> >> >> >>>> >> Best regards, >> >>>> >> Felix >> >>>> >> >> >>>> >> >>>> >> >>>> >> >>>> >> >> >> >> >> >> >> >> From flw at rzg.mpg.de Tue Aug 4 08:09:32 2020 From: flw at rzg.mpg.de (flw at rzg.mpg.de) Date: Tue, 04 Aug 2020 15:09:32 +0200 Subject: [petsc-users] SUPERLU_DIST in single precision In-Reply-To: <4668AA2D-04CE-457C-A409-3DFE5B194120@petsc.dev> References: <20200721155807.Horde.5ZYOzrQ7dmwNTQquK_6ebeS@webmail.mpcdf.mpg.de> <20200722150403.Horde.NO015I2GI2E-524J-SXqwui@webmail.mpcdf.mpg.de> <20200727211054.Horde.ZgzkW110frSEh8mw1TA2Zdn@webmail.mpcdf.mpg.de> <20200803174500.Horde.ipdcFrT5MFbLXenL8AAo3rJ@webmail.mpcdf.mpg.de> <4668AA2D-04CE-457C-A409-3DFE5B194120@petsc.dev> Message-ID: <20200804150932.Horde.-RoctsJnfu1WE9GVO9zcJaT@webmail.mpcdf.mpg.de> PS: I saw you committed a new version for the SUPERLU_DIST installer. There is a missing ")" at the end of line 42. Also I think one also has to add self.addToArgs(args,'-DCMAKE_CXX_FLAGS:STRING','-I'+self.blasLapack.include) as well (for me only adding the C flag raised the same error as before; with this, it works for me even if CPATH isn't set) Thank you again very much :) Best regards, Felix Zitat von Barry Smith : > On some systems, like my Mac, the environmental variable CPATH is > set to the appropriate directory for mkl.h when MKL is initialized > with > > source /opt/intel/compilers_and_libraries/mac/bin/compilervars.sh intel64 > > hence the compiler can magically find mkl.h but this doesn't seem to > be universal, I think that is why Felix's system cannot find the > include, I found a Linux machine at ANL that produces the same > problem Felix has. Anyways I'm fixing the PETSc install will just > work even if CPATH is not set. > > Barry > > > > > >> On Aug 3, 2020, at 10:21 PM, Xiaoye S. Li wrote: >> >> Regarding MKL, superlu only uses some BLAS routines in MKL. If you >> have alternative BLAS, you don't need to use MKL. >> >> But, in my experience, on an Intel system, the compiler can >> correctly include the path with mkl.h, that is, I don't need to do >> anything special. >> I think mpiicc should work. >> >> Sherry >> >> On Mon, Aug 3, 2020 at 8:46 AM > > wrote: >> Hi Barry, >> Thanks for the branch (and thanks to Sherry as well). I tried to use >> the configure example " arch-ci-linux-intel-mkl-single.py" (adding the >> --with-batch flag, since I am running on a cluster), but I get the >> following error message: >> TESTING: check from >> config.libraries(config/BuildSystem/config/libraries.py:157) >> >> >> >> ******************************************************************************* >> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >> for details): >> ------------------------------------------------------------------------------- >> Downloaded ptscotch could not be used. 
Please check install in >> /u/flw/petsc-hash-pkgs/073aec050 >> >> (see configure1.log) >> >> Next I tried a minimalistic version using Intel MPI compilers: >> ./configure --download-superlu_dist --download-metis >> --download-parmetis --download-ptscotch --with-precision=single >> --with-batch --with-fc=mpiifort --with-cc=mpiicc --with-cxx=mpiicpc >> >> There I got the following error: >> =============================================================================== Compiling and installing SUPERLU_DIST; this may take several minutes >> =============================================================================== >> ******************************************************************************* >> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >> for details): >> ------------------------------------------------------------------------------- >> Error running make on SUPERLU_DIST >> >> (see configure2.log) >> >> >> Next I tried >> ./configure --download-superlu_dist --download-metis >> --download-parmetis --download-ptscotch --with-precision=single >> --with-batch >> --with-mpi-dir=/mpcdf/soft/SLE_12/packages/x86_64/intel_parallel_studio/2018.4/impi/2018.4.274/intel64 >> >> Same error message (see configure3.log) >> >> >> It seems that there is something going on with mkl in case I want to >> use Intel compilers for C and C++ (compiling with gcc and g++ seems to >> work ) >> >> Do you know what is going on there? (I am running on the Draco >> cluster, if that helps >> (https://www.mpcdf.mpg.de/services/computing/draco >> )) >> >> >> Best regards, >> Felix >> >> Zitat von Barry Smith >: >> >> > Felix, >> > >> > The branch is ready. Just use >> > >> > git checkout barry/2020-07-28/superlu_dist-single >> > ./configure --download-superlu_dist --download-metis >> > --download-parmetis --download-ptscotch --with-precision=single >> > and whatever else you use >> > >> > Barry >> > >> > It will automatically get the needed branch of SuperLU_DIST that >> > Sherry prepared. >> > >> > >> >> On Jul 27, 2020, at 2:10 PM, flw at rzg.mpg.de >> wrote: >> >> >> >> Hi Shery, >> >> Yes, ideally we would like to compile PETSc in single precision and >> >> simply run a single precision version of SUPERLU_DIST just like >> >> e.g. MUMPS. >> >> >> >> Best regards and thanks, >> >> Felix >> >> Zitat von "Xiaoye S. Li" >: >> >> >> >>> Barry, >> >>> >> >>> I have a branch 'Mixed-precision' working with single precision FP32. I >> >>> assume Felix wants to use superlu_dist from petsc. How do you want to >> >>> incorporate it in petsc? >> >>> >> >>> https://github.com/xiaoyeli/superlu_dist >> >> >>> >> >>> PS1: in this version, FP32 only works on CPU. FP64 and >> complex-FP64 all >> >>> work on GPU. >> >>> >> >>> PS2: currently there is no mixed-precision yet, but it is the >> branch we are >> >>> adding mix-prec support. Will take a while before merging to master. >> >>> >> >>> Sherry >> >>> >> >>> >> >>> On Wed, Jul 22, 2020 at 6:04 AM > > wrote: >> >>> >> >>>> Hi Barry, >> >>>> for now I just want to run everything in single on CPUs only with >> >>>> SUPERLU_DIST. Maybe we will also incorporate GPUs in the future, but >> >>>> there are no immediate plans yet. So if you could provide the support, >> >>>> that would be awesome. >> >>>> >> >>>> Best regards, >> >>>> Felix >> >>>> >> >>>> Zitat von Barry Smith >: >> >>>> >> >>>> > Felix, >> >>>> > >> >>>> > What are your needs, do you want this for CPUs or for GPUs? 
Do >> >>>> > you wish to run all your code in single precision or just the >> >>>> > SuperLU_Dist solver while the rest of your code double? >> >>>> > >> >>>> > If you want to run everything on CPUs using single precision >> >>>> > then adding the support is very easy, we can provide that for you >> >>>> > any time. The other cases will require more thought. >> >>>> > >> >>>> > Barry >> >>>> > >> >>>> > >> >>>> >> On Jul 21, 2020, at 8:58 AM, flw at rzg.mpg.de >> wrote: >> >>>> >> >> >>>> >> Dear PETSc support team, >> >>>> >> some time ago you told me that you are planning on releasing a >> >>>> >> version that supports SUPERLU_DIST in single-precision soon. Can >> >>>> >> you tell me roughly what time frame you had in mind? >> >>>> >> >> >>>> >> Best regards, >> >>>> >> Felix >> >>>> >> >> >>>> >> >>>> >> >>>> >> >>>> >> >> >> >> >> >> >> >> From bsmith at petsc.dev Tue Aug 4 10:49:33 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 4 Aug 2020 10:49:33 -0500 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F --------------------------------------------------------------------------------------------------------------------------------------------------------------- --- Event Stage 0: Main Stage KSPSolve 48 1.0 9.2492e+00 1.1 8.22e+08 1.2 5.9e+05 3.6e+03 1.3e+03 17 99 88 97 78 17 99 88 97 79 51484 1792674 446 1.73e+02 957 3.72e+02 100 KSPGMRESOrthog 306 1.1 2.2495e-01 1.5 3.86e+08 1.2 0.0e+00 0.0e+00 3.0e+02 0 46 0 0 18 0 46 0 0 18 973865 2562528 94 3.67e+01 0 0.00e+00 100 PCApply 354 1.1 5.7478e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11 0 0 0 0 11 0 0 0 0 0 0 0 0.00e+00 675 2.62e+02 0 It is GPU %F that is percent of the flops, not of the time. Since hypre does not count flops the only flops counted are in PETSc and the hypre ones, that take place on the CPU are not counted. Note that PCApply takes 11 percent of the time, (this is hypre) and KSPSolve (which is hypre plus GMRES) takes 17 percent of the time. So 11/17 of the time is not on the GPU. Note also the huge number of copies to and from the GPU above, this is because data has to be moved to the CPU for hypre and then back to the GPU for PETSc. Barry > On Aug 4, 2020, at 5:46 AM, nicola varini wrote: > > Thanks for your reply Stefano. I know that HYPRE is not ported on GPU, but the Solver is running on GPU and is taking ~9s and is showing 100% of GPU utilization. > > Il giorno mar 4 ago 2020 alle ore 12:35 Stefano Zampini > ha scritto: > Nicola, > > You are actually not using the GPU properly, since you use HYPRE preconditioning, which is CPU only. One of your solvers is actually slower on ?GPU?. > For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with Jacobi preconditioning. Mark can help you out with the specific command line options. > When it works properly, everything related to PC application is offloaded to the GPU, and you should expect to get the well-known and branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve > > Doing what you want to do is one of the last optimization steps of an already optimized code before entering production. 
Yours is not even optimized for proper GPU usage yet. > Also, any specific reason why you are using dgmres and fgmres? > > PETSc has not been designed with multi-threading in mind. You can achieve ?overlap? of the two solves by splitting the communicator. But then you need communications to let the two solutions talk to each other. > > Thanks > Stefano > > >> On Aug 4, 2020, at 12:04 PM, nicola varini > wrote: >> >> Dear all, thanks for your replies. The reason why I've asked if it is possible to overlap poisson and ampere is because they roughly >> take the same amount of time. Please find in attachment the profiling logs for only CPU and only GPU. >> Of course it is possible to split the MPI communicator and run each solver on different subcommunicator, however this would involve more communication. >> Did anyone ever tried to run 2 solvers with hyperthreading? >> Thanks >> >> >> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams > ha scritto: >> I suspect that the Poisson and Ampere's law solve are not coupled. You might be able to duplicate the communicator and use two threads. You would want to configure PETSc with threadsafty and threads and I think it could/should work, but this mode is never used by anyone. >> >> That said, I would not recommend doing this unless you feel like playing in computer science, as opposed to doing application science. The best case scenario you get a speedup of 2x. That is a strict upper bound, but you will never come close to it. Your hardware has some balance of CPU to GPU processing rate. Your application has a balance of volume of work for your two solves. They have to be the same to get close to 2x speedup and that ratio(s) has to be 1:1. To be concrete, from what little I can guess about your applications let's assume that the cost of each of these two solves is about the same (eg, Laplacians on your domain and the best case scenario). But, GPU machines are configured to have roughly 1-10% of capacity in the GPUs, these days, that gives you an upper bound of about 10% speedup. That is noise. Upshot, unless you configure your hardware to match this problem, and the two solves have the same cost, you will not see close to 2x speedup. Your time is better spent elsewhere. >> >> Mark >> >> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown > wrote: >> You can use MPI and split the communicator so n-1 ranks create a DMDA for one part of your system and the other rank drives the GPU in the other part. They can all be part of the same coupled system on the full communicator, but PETSc doesn't currently support some ranks having their Vec arrays on GPU and others on host, so you'd be paying host-device transfer costs on each iteration (and that might swamp any performance benefit you would have gotten). >> >> In any case, be sure to think about the execution time of each part. Load balancing with matching time-to-solution for each part can be really hard. >> >> >> Barry Smith > writes: >> >> > Nicola, >> > >> > This is really viable or practical at this time with PETSc. It is not impossible but requires careful coding with threads, another possibility is to use one half of the virtual GPUs for each solve, this is also not trivial. I would recommend first seeing what kind of performance you can get on the GPU for each type of solve and revist this idea in the future. >> > >> > Barry >> > >> > >> > >> > >> >> On Jul 31, 2020, at 9:23 AM, nicola varini > wrote: >> >> >> >> Hello, I would like to know if it is possible to overlap CPU and GPU with DMDA. 
>> >> I've a machine where each node has 1P100+1Haswell. >> >> I've to resolve Poisson and Ampere equation for each time step. >> >> I'm using 2D DMDA for each of them. Would be possible to compute poisson >> >> and ampere equation at the same time? One on CPU and the other on GPU? >> >> >> >> Thanks >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Aug 4 10:55:22 2020 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 4 Aug 2020 11:55:22 -0400 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: On Tue, Aug 4, 2020 at 6:04 AM nicola varini wrote: > Dear all, thanks for your replies. The reason why I've asked if it is > possible to overlap poisson and ampere is because they roughly > take the same amount of time. Please find in attachment the profiling logs > for only CPU and only GPU. > Of course it is possible to split the MPI communicator and run each solver > on different subcommunicator, however this would involve more communication. > As Stefano said, you don't want to do this, but I meant that if you used threads you would need to duplicate the communicator to have them both run concurrently. > Did anyone ever tried to run 2 solvers with hyperthreading? > Thanks > > > Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams ha > scritto: > >> I suspect that the Poisson and Ampere's law solve are not coupled. You >> might be able to duplicate the communicator and use two threads. You would >> want to configure PETSc with threadsafty and threads and I think it >> could/should work, but this mode is never used by anyone. >> >> That said, I would not recommend doing this unless you feel like playing >> in computer science, as opposed to doing application science. The best case >> scenario you get a speedup of 2x. That is a strict upper bound, but you >> will never come close to it. Your hardware has some balance of CPU to GPU >> processing rate. Your application has a balance of volume of work for your >> two solves. They have to be the same to get close to 2x speedup and that >> ratio(s) has to be 1:1. To be concrete, from what little I can guess about >> your applications let's assume that the cost of each of these two solves is >> about the same (eg, Laplacians on your domain and the best case scenario). >> But, GPU machines are configured to have roughly 1-10% of capacity in the >> GPUs, these days, that gives you an upper bound of about 10% speedup. That >> is noise. Upshot, unless you configure your hardware to match this problem, >> and the two solves have the same cost, you will not see close to 2x >> speedup. Your time is better spent elsewhere. >> >> Mark >> >> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: >> >>> You can use MPI and split the communicator so n-1 ranks create a DMDA >>> for one part of your system and the other rank drives the GPU in the other >>> part. They can all be part of the same coupled system on the full >>> communicator, but PETSc doesn't currently support some ranks having their >>> Vec arrays on GPU and others on host, so you'd be paying host-device >>> transfer costs on each iteration (and that might swamp any performance >>> benefit you would have gotten). >>> >>> In any case, be sure to think about the execution time of each part. >>> Load balancing with matching time-to-solution for each part can be really >>> hard. 
>>> >>> >>> Barry Smith writes: >>> >>> > Nicola, >>> > >>> > This is really viable or practical at this time with PETSc. It is >>> not impossible but requires careful coding with threads, another >>> possibility is to use one half of the virtual GPUs for each solve, this is >>> also not trivial. I would recommend first seeing what kind of performance >>> you can get on the GPU for each type of solve and revist this idea in the >>> future. >>> > >>> > Barry >>> > >>> > >>> > >>> > >>> >> On Jul 31, 2020, at 9:23 AM, nicola varini >>> wrote: >>> >> >>> >> Hello, I would like to know if it is possible to overlap CPU and GPU >>> with DMDA. >>> >> I've a machine where each node has 1P100+1Haswell. >>> >> I've to resolve Poisson and Ampere equation for each time step. >>> >> I'm using 2D DMDA for each of them. Would be possible to compute >>> poisson >>> >> and ampere equation at the same time? One on CPU and the other on GPU? >>> >> >>> >> Thanks >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 4 10:55:57 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 4 Aug 2020 10:55:57 -0500 Subject: [petsc-users] SUPERLU_DIST in single precision In-Reply-To: <20200804150932.Horde.-RoctsJnfu1WE9GVO9zcJaT@webmail.mpcdf.mpg.de> References: <20200721155807.Horde.5ZYOzrQ7dmwNTQquK_6ebeS@webmail.mpcdf.mpg.de> <20200722150403.Horde.NO015I2GI2E-524J-SXqwui@webmail.mpcdf.mpg.de> <20200727211054.Horde.ZgzkW110frSEh8mw1TA2Zdn@webmail.mpcdf.mpg.de> <20200803174500.Horde.ipdcFrT5MFbLXenL8AAo3rJ@webmail.mpcdf.mpg.de> <4668AA2D-04CE-457C-A409-3DFE5B194120@petsc.dev> <20200804150932.Horde.-RoctsJnfu1WE9GVO9zcJaT@webmail.mpcdf.mpg.de> Message-ID: <6CF121C2-A4DA-432F-8AFD-83E968E69750@petsc.dev> Good. I have cleaned up the code and rebased the branch so if you use it in the future you need to do git fetch git checkout master git branch -D barry/2020-07-28/superlu_dist-single git checkout barry/2020-07-28/superlu_dist-single Do not just do a git pull in barry/2020-07-28/superlu_dist-single that will make a mess > On Aug 4, 2020, at 8:09 AM, flw at rzg.mpg.de wrote: > > PS: > I saw you committed a new version for the SUPERLU_DIST installer. > There is a missing ")" at the end of line 42. > Also I think one also has to add > self.addToArgs(args,'-DCMAKE_CXX_FLAGS:STRING','-I'+self.blasLapack.include) > > as well (for me only adding the C flag raised the same error as before; with this, it works for me even if CPATH isn't set) > Thank you again very much :) > > Best regards, > Felix > > Zitat von Barry Smith : > >> On some systems, like my Mac, the environmental variable CPATH is set to the appropriate directory for mkl.h when MKL is initialized with >> >> source /opt/intel/compilers_and_libraries/mac/bin/compilervars.sh intel64 >> >> hence the compiler can magically find mkl.h but this doesn't seem to be universal, I think that is why Felix's system cannot find the include, I found a Linux machine at ANL that produces the same problem Felix has. Anyways I'm fixing the PETSc install will just work even if CPATH is not set. >> >> Barry >> >> >> >> >> >>> On Aug 3, 2020, at 10:21 PM, Xiaoye S. Li wrote: >>> >>> Regarding MKL, superlu only uses some BLAS routines in MKL. If you have alternative BLAS, you don't need to use MKL. >>> >>> But, in my experience, on an Intel system, the compiler can correctly include the path with mkl.h, that is, I don't need to do anything special. >>> I think mpiicc should work. 
>>> >>> Sherry >>> >>> On Mon, Aug 3, 2020 at 8:46 AM > wrote: >>> Hi Barry, >>> Thanks for the branch (and thanks to Sherry as well). I tried to use >>> the configure example " arch-ci-linux-intel-mkl-single.py" (adding the >>> --with-batch flag, since I am running on a cluster), but I get the >>> following error message: >>> TESTING: check from >>> config.libraries(config/BuildSystem/config/libraries.py:157) >>> >>> >>> >>> ******************************************************************************* >>> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >>> for details): >>> ------------------------------------------------------------------------------- >>> Downloaded ptscotch could not be used. Please check install in >>> /u/flw/petsc-hash-pkgs/073aec050 >>> >>> (see configure1.log) >>> >>> Next I tried a minimalistic version using Intel MPI compilers: >>> ./configure --download-superlu_dist --download-metis >>> --download-parmetis --download-ptscotch --with-precision=single >>> --with-batch --with-fc=mpiifort --with-cc=mpiicc --with-cxx=mpiicpc >>> >>> There I got the following error: >>> =============================================================================== Compiling and installing SUPERLU_DIST; this may take several minutes =============================================================================== >>> ******************************************************************************* >>> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >>> for details): >>> ------------------------------------------------------------------------------- >>> Error running make on SUPERLU_DIST >>> >>> (see configure2.log) >>> >>> >>> Next I tried >>> ./configure --download-superlu_dist --download-metis >>> --download-parmetis --download-ptscotch --with-precision=single >>> --with-batch >>> --with-mpi-dir=/mpcdf/soft/SLE_12/packages/x86_64/intel_parallel_studio/2018.4/impi/2018.4.274/intel64 >>> >>> Same error message (see configure3.log) >>> >>> >>> It seems that there is something going on with mkl in case I want to >>> use Intel compilers for C and C++ (compiling with gcc and g++ seems to >>> work ) >>> >>> Do you know what is going on there? (I am running on the Draco >>> cluster, if that helps >>> (https://www.mpcdf.mpg.de/services/computing/draco )) >>> >>> >>> Best regards, >>> Felix >>> >>> Zitat von Barry Smith >: >>> >>> > Felix, >>> > >>> > The branch is ready. Just use >>> > >>> > git checkout barry/2020-07-28/superlu_dist-single >>> > ./configure --download-superlu_dist --download-metis >>> > --download-parmetis --download-ptscotch --with-precision=single >>> > and whatever else you use >>> > >>> > Barry >>> > >>> > It will automatically get the needed branch of SuperLU_DIST that >>> > Sherry prepared. >>> > >>> > >>> >> On Jul 27, 2020, at 2:10 PM, flw at rzg.mpg.de wrote: >>> >> >>> >> Hi Shery, >>> >> Yes, ideally we would like to compile PETSc in single precision and >>> >> simply run a single precision version of SUPERLU_DIST just like >>> >> e.g. MUMPS. >>> >> >>> >> Best regards and thanks, >>> >> Felix >>> >> Zitat von "Xiaoye S. Li" >: >>> >> >>> >>> Barry, >>> >>> >>> >>> I have a branch 'Mixed-precision' working with single precision FP32. I >>> >>> assume Felix wants to use superlu_dist from petsc. How do you want to >>> >>> incorporate it in petsc? >>> >>> >>> >>> https://github.com/xiaoyeli/superlu_dist >>> >>> >>> >>> PS1: in this version, FP32 only works on CPU. FP64 and complex-FP64 all >>> >>> work on GPU. 
>>> >>> >>> >>> PS2: currently there is no mixed-precision yet, but it is the branch we are >>> >>> adding mix-prec support. Will take a while before merging to master. >>> >>> >>> >>> Sherry >>> >>> >>> >>> >>> >>> On Wed, Jul 22, 2020 at 6:04 AM > wrote: >>> >>> >>> >>>> Hi Barry, >>> >>>> for now I just want to run everything in single on CPUs only with >>> >>>> SUPERLU_DIST. Maybe we will also incorporate GPUs in the future, but >>> >>>> there are no immediate plans yet. So if you could provide the support, >>> >>>> that would be awesome. >>> >>>> >>> >>>> Best regards, >>> >>>> Felix >>> >>>> >>> >>>> Zitat von Barry Smith >: >>> >>>> >>> >>>> > Felix, >>> >>>> > >>> >>>> > What are your needs, do you want this for CPUs or for GPUs? Do >>> >>>> > you wish to run all your code in single precision or just the >>> >>>> > SuperLU_Dist solver while the rest of your code double? >>> >>>> > >>> >>>> > If you want to run everything on CPUs using single precision >>> >>>> > then adding the support is very easy, we can provide that for you >>> >>>> > any time. The other cases will require more thought. >>> >>>> > >>> >>>> > Barry >>> >>>> > >>> >>>> > >>> >>>> >> On Jul 21, 2020, at 8:58 AM, flw at rzg.mpg.de wrote: >>> >>>> >> >>> >>>> >> Dear PETSc support team, >>> >>>> >> some time ago you told me that you are planning on releasing a >>> >>>> >> version that supports SUPERLU_DIST in single-precision soon. Can >>> >>>> >> you tell me roughly what time frame you had in mind? >>> >>>> >> >>> >>>> >> Best regards, >>> >>>> >> Felix >>> >>>> >> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >> >>> >> >>> >> >>> >>> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Aug 4 10:57:42 2020 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 4 Aug 2020 11:57:42 -0400 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini wrote: > Nicola, > > You are actually not using the GPU properly, since you use HYPRE > preconditioning, which is CPU only. One of your solvers is actually slower > on ?GPU?. > For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with > Jacobi preconditioning. Mark can help you out with the specific command > line options. > When it works properly, everything related to PC application is offloaded > to the GPU, and you should expect to get the well-known and branded 10x > (maybe more) speedup one is expecting from GPUs during KSPSolve > > The speedup depends on the machine, but on SUMMIT, using enough CPUs to saturate the memory bus vs all 6 GPUs the speedup is a function of problem subdomain size. I saw 10x at about 100K equations/process. > Doing what you want to do is one of the last optimization steps of an > already optimized code before entering production. Yours is not even > optimized for proper GPU usage yet. > Also, any specific reason why you are using dgmres and fgmres? > > PETSc has not been designed with multi-threading in mind. You can achieve > ?overlap? of the two solves by splitting the communicator. But then you > need communications to let the two solutions talk to each other. > > Thanks > Stefano > > > On Aug 4, 2020, at 12:04 PM, nicola varini > wrote: > > Dear all, thanks for your replies. The reason why I've asked if it is > possible to overlap poisson and ampere is because they roughly > take the same amount of time. 
Please find in attachment the profiling logs > for only CPU and only GPU. > Of course it is possible to split the MPI communicator and run each solver > on different subcommunicator, however this would involve more communication. > Did anyone ever tried to run 2 solvers with hyperthreading? > Thanks > > > Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams ha > scritto: > >> I suspect that the Poisson and Ampere's law solve are not coupled. You >> might be able to duplicate the communicator and use two threads. You would >> want to configure PETSc with threadsafty and threads and I think it >> could/should work, but this mode is never used by anyone. >> >> That said, I would not recommend doing this unless you feel like playing >> in computer science, as opposed to doing application science. The best case >> scenario you get a speedup of 2x. That is a strict upper bound, but you >> will never come close to it. Your hardware has some balance of CPU to GPU >> processing rate. Your application has a balance of volume of work for your >> two solves. They have to be the same to get close to 2x speedup and that >> ratio(s) has to be 1:1. To be concrete, from what little I can guess about >> your applications let's assume that the cost of each of these two solves is >> about the same (eg, Laplacians on your domain and the best case scenario). >> But, GPU machines are configured to have roughly 1-10% of capacity in the >> GPUs, these days, that gives you an upper bound of about 10% speedup. That >> is noise. Upshot, unless you configure your hardware to match this problem, >> and the two solves have the same cost, you will not see close to 2x >> speedup. Your time is better spent elsewhere. >> >> Mark >> >> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: >> >>> You can use MPI and split the communicator so n-1 ranks create a DMDA >>> for one part of your system and the other rank drives the GPU in the other >>> part. They can all be part of the same coupled system on the full >>> communicator, but PETSc doesn't currently support some ranks having their >>> Vec arrays on GPU and others on host, so you'd be paying host-device >>> transfer costs on each iteration (and that might swamp any performance >>> benefit you would have gotten). >>> >>> In any case, be sure to think about the execution time of each part. >>> Load balancing with matching time-to-solution for each part can be really >>> hard. >>> >>> >>> Barry Smith writes: >>> >>> > Nicola, >>> > >>> > This is really viable or practical at this time with PETSc. It is >>> not impossible but requires careful coding with threads, another >>> possibility is to use one half of the virtual GPUs for each solve, this is >>> also not trivial. I would recommend first seeing what kind of performance >>> you can get on the GPU for each type of solve and revist this idea in the >>> future. >>> > >>> > Barry >>> > >>> > >>> > >>> > >>> >> On Jul 31, 2020, at 9:23 AM, nicola varini >>> wrote: >>> >> >>> >> Hello, I would like to know if it is possible to overlap CPU and GPU >>> with DMDA. >>> >> I've a machine where each node has 1P100+1Haswell. >>> >> I've to resolve Poisson and Ampere equation for each time step. >>> >> I'm using 2D DMDA for each of them. Would be possible to compute >>> poisson >>> >> and ampere equation at the same time? One on CPU and the other on GPU? >>> >> >>> >> Thanks >>> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
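A concrete starting point for the all-GPU AMG setup Stefano describes above (a guess at reasonable options, not taken from this thread) would be something like

-dm_vec_type cuda -dm_mat_type aijcusparse -pc_type gamg -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi

and a minimal sketch of the communicator splitting that Jed and Mark mention, with the rank assignment invented purely for illustration, could look like the following. Each subcommunicator then builds its own DMDA and KSP, and the two solutions are exchanged afterwards with ordinary MPI communication on the full communicator.

#include <petscksp.h>

int main(int argc,char **argv)
{
  MPI_Comm       subcomm;
  PetscMPIInt    rank,size,color;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRQ(ierr);

  /* Made-up split: the last rank drives the GPU solve, the rest share the CPU solve */
  color = (rank == size-1) ? 1 : 0;
  ierr  = MPI_Comm_split(PETSC_COMM_WORLD,color,rank,&subcomm);CHKERRQ(ierr);

  if (color) {
    /* create the Ampere DMDA/KSP on subcomm, e.g. with -dm_vec_type cuda -dm_mat_type aijcusparse */
  } else {
    /* create the Poisson DMDA/KSP on subcomm (host only) */
  }

  ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
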
URL: From aph at email.arizona.edu Tue Aug 4 18:55:34 2020 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Tue, 4 Aug 2020 16:55:34 -0700 Subject: [petsc-users] reuse Mumps factorization for multiple RHS Message-ID: Hello, When using Mumps to solve a linear system of equations (see below), can I reuse the factorization to solve for multiple RHS, ie, can I use KSPSolve multiple times while only building a different RHS in between the calls to KSPSolve? Thanks, Anthony call KSPSetType(self%ksp,KSPPREONLY,self%ierr_ps) call KSPGetPC(self%ksp,pc,self%ierr_ps) call PCSetType(pc,PCLU,self%ierr_ps) call PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS,self%ierr_ps) call PCFactorSetUpMatSolverPackage(pc,self%ierr_ps) -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 4 19:27:47 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 4 Aug 2020 19:27:47 -0500 Subject: [petsc-users] reuse Mumps factorization for multiple RHS In-Reply-To: References: Message-ID: <8FB04B95-5F6A-4FB5-9880-4C370E399449@petsc.dev> Yes, it automatically uses the same factorization. > On Aug 4, 2020, at 6:55 PM, Anthony Paul Haas wrote: > > Hello, > > When using Mumps to solve a linear system of equations (see below), can I reuse the factorization to solve for multiple RHS, ie, can I use KSPSolve multiple times while only building a different RHS in between the calls to KSPSolve? > > Thanks, > > Anthony > > call KSPSetType(self%ksp,KSPPREONLY,self%ierr_ps) > call KSPGetPC(self%ksp,pc,self%ierr_ps) > > call PCSetType(pc,PCLU,self%ierr_ps) > > call PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS,self%ierr_ps) > call PCFactorSetUpMatSolverPackage(pc,self%ierr_ps) -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Aug 4 19:29:49 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Aug 2020 20:29:49 -0400 Subject: [petsc-users] reuse Mumps factorization for multiple RHS In-Reply-To: References: Message-ID: On Tue, Aug 4, 2020 at 7:57 PM Anthony Paul Haas wrote: > Hello, > > When using Mumps to solve a linear system of equations (see below), can I > reuse the factorization to solve for multiple RHS, ie, can I use KSPSolve multiple > times while only building a different RHS in between the calls to KSPSolve > ? > Yes. It should detect that you have not changed the operator for the KSP and thus not refactor the matrix. Thanks, Matt > Thanks, > > Anthony > > call KSPSetType(self%ksp,KSPPREONLY,self%ierr_ps) > > call KSPGetPC(self%ksp,pc,self%ierr_ps) > > > > > > call PCSetType(pc,PCLU,self%ierr_ps) > > > > > > call PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS,self%ierr_ps) > > call PCFactorSetUpMatSolverPackage(pc,self%ierr_ps) > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
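To make the answers above concrete, here is a minimal sketch in C (the Fortran calls quoted above map one-to-one), assuming PETSc was configured with MUMPS (e.g. --download-mumps) and using a stand-in 1-D Laplacian in place of the real operator. The numeric factorization is computed by MUMPS during the first KSPSolve() and reused for every further solve as long as the operator passed to KSPSetOperators() is not changed; PCFactorSetMatSolverType() is the newer name of PCFactorSetMatSolverPackage().

#include <petscksp.h>

int main(int argc,char **argv)
{
  Mat            A;
  Vec            x,b;
  KSP            ksp;
  PC             pc;
  PetscInt       i,rstart,rend,n = 100,r,nrhs = 4;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;

  /* Stand-in operator: 1-D Laplacian, just to have something to factor */
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
  for (i=rstart; i<rend; i++) {
    if (i>0)   {ierr = MatSetValue(A,i,i-1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
    if (i<n-1) {ierr = MatSetValue(A,i,i+1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
    ierr = MatSetValue(A,i,i,2.0,INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatCreateVecs(A,&x,&b);CHKERRQ(ierr);

  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc,MATSOLVERMUMPS);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

  for (r=0; r<nrhs; r++) {
    ierr = VecSet(b,(PetscScalar)(r+1));CHKERRQ(ierr); /* stand-in for assembling the r-th right-hand side */
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);            /* factorization done once, on the first solve, then reused */
  }

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

If the matrix values do change between solves, KSPSolve() will refactor by default; KSPSetReusePreconditioner() can be used to keep the existing factorization in that case.
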
URL: From hzhang at mcs.anl.gov Tue Aug 4 21:28:23 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Wed, 5 Aug 2020 02:28:23 +0000 Subject: [petsc-users] reuse Mumps factorization for multiple RHS In-Reply-To: References: , Message-ID: See case '"-num_rhs" in petsc/src/ksp/ksp/tests/ex30.c Hong ________________________________ From: petsc-users on behalf of Matthew Knepley Sent: Tuesday, August 4, 2020 7:29 PM To: Anthony Paul Haas Cc: petsc-users Subject: Re: [petsc-users] reuse Mumps factorization for multiple RHS On Tue, Aug 4, 2020 at 7:57 PM Anthony Paul Haas > wrote: Hello, When using Mumps to solve a linear system of equations (see below), can I reuse the factorization to solve for multiple RHS, ie, can I use KSPSolve multiple times while only building a different RHS in between the calls to KSPSolve? Yes. It should detect that you have not changed the operator for the KSP and thus not refactor the matrix. Thanks, Matt Thanks, Anthony call KSPSetType(self%ksp,KSPPREONLY,self%ierr_ps) call KSPGetPC(self%ksp,pc,self%ierr_ps) call PCSetType(pc,PCLU,self%ierr_ps) call PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS,self%ierr_ps) call PCFactorSetUpMatSolverPackage(pc,self%ierr_ps) -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From flw at rzg.mpg.de Wed Aug 5 01:33:10 2020 From: flw at rzg.mpg.de (flw at rzg.mpg.de) Date: Wed, 05 Aug 2020 08:33:10 +0200 Subject: [petsc-users] SUPERLU_DIST in single precision In-Reply-To: <6CF121C2-A4DA-432F-8AFD-83E968E69750@petsc.dev> References: <20200721155807.Horde.5ZYOzrQ7dmwNTQquK_6ebeS@webmail.mpcdf.mpg.de> <20200722150403.Horde.NO015I2GI2E-524J-SXqwui@webmail.mpcdf.mpg.de> <20200727211054.Horde.ZgzkW110frSEh8mw1TA2Zdn@webmail.mpcdf.mpg.de> <20200803174500.Horde.ipdcFrT5MFbLXenL8AAo3rJ@webmail.mpcdf.mpg.de> <4668AA2D-04CE-457C-A409-3DFE5B194120@petsc.dev> <20200804150932.Horde.-RoctsJnfu1WE9GVO9zcJaT@webmail.mpcdf.mpg.de> <6CF121C2-A4DA-432F-8AFD-83E968E69750@petsc.dev> Message-ID: <20200805083310.Horde.fgJQfz9jpDfwNpL4WsnBJNa@webmail.mpcdf.mpg.de> Thanks Barry and Sherry, it works like a charm :) Best regards, Felix Zitat von Barry Smith : > Good. I have cleaned up the code and rebased the branch so if you > use it in the future you need to do > > git fetch > git checkout master > git branch -D barry/2020-07-28/superlu_dist-single > git checkout barry/2020-07-28/superlu_dist-single > > Do not just do a git pull in barry/2020-07-28/superlu_dist-single > that will make a mess > > >> On Aug 4, 2020, at 8:09 AM, flw at rzg.mpg.de wrote: >> >> PS: >> I saw you committed a new version for the SUPERLU_DIST installer. >> There is a missing ")" at the end of line 42. 
>> Also I think one also has to add >> self.addToArgs(args,'-DCMAKE_CXX_FLAGS:STRING','-I'+self.blasLapack.include) >> >> as well (for me only adding the C flag raised the same error as >> before; with this, it works for me even if CPATH isn't set) >> Thank you again very much :) >> >> Best regards, >> Felix >> >> Zitat von Barry Smith : >> >>> On some systems, like my Mac, the environmental variable CPATH is >>> set to the appropriate directory for mkl.h when MKL is initialized >>> with >>> >>> source /opt/intel/compilers_and_libraries/mac/bin/compilervars.sh intel64 >>> >>> hence the compiler can magically find mkl.h but this doesn't seem >>> to be universal, I think that is why Felix's system cannot find >>> the include, I found a Linux machine at ANL that produces the same >>> problem Felix has. Anyways I'm fixing the PETSc install will just >>> work even if CPATH is not set. >>> >>> Barry >>> >>> >>> >>> >>> >>>> On Aug 3, 2020, at 10:21 PM, Xiaoye S. Li wrote: >>>> >>>> Regarding MKL, superlu only uses some BLAS routines in MKL. If >>>> you have alternative BLAS, you don't need to use MKL. >>>> >>>> But, in my experience, on an Intel system, the compiler can >>>> correctly include the path with mkl.h, that is, I don't need to >>>> do anything special. >>>> I think mpiicc should work. >>>> >>>> Sherry >>>> >>>> On Mon, Aug 3, 2020 at 8:46 AM >>> > wrote: >>>> Hi Barry, >>>> Thanks for the branch (and thanks to Sherry as well). I tried to use >>>> the configure example " arch-ci-linux-intel-mkl-single.py" (adding the >>>> --with-batch flag, since I am running on a cluster), but I get the >>>> following error message: >>>> TESTING: check from >>>> config.libraries(config/BuildSystem/config/libraries.py:157) >>>> >>>> >>>> >>>> ******************************************************************************* >>>> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >>>> for details): >>>> ------------------------------------------------------------------------------- >>>> Downloaded ptscotch could not be used. 
Please check install in >>>> /u/flw/petsc-hash-pkgs/073aec050 >>>> >>>> (see configure1.log) >>>> >>>> Next I tried a minimalistic version using Intel MPI compilers: >>>> ./configure --download-superlu_dist --download-metis >>>> --download-parmetis --download-ptscotch --with-precision=single >>>> --with-batch --with-fc=mpiifort --with-cc=mpiicc --with-cxx=mpiicpc >>>> >>>> There I got the following error: >>>> =============================================================================== Compiling and installing SUPERLU_DIST; this may take several minutes >>>> =============================================================================== >>>> ******************************************************************************* >>>> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log >>>> for details): >>>> ------------------------------------------------------------------------------- >>>> Error running make on SUPERLU_DIST >>>> >>>> (see configure2.log) >>>> >>>> >>>> Next I tried >>>> ./configure --download-superlu_dist --download-metis >>>> --download-parmetis --download-ptscotch --with-precision=single >>>> --with-batch >>>> --with-mpi-dir=/mpcdf/soft/SLE_12/packages/x86_64/intel_parallel_studio/2018.4/impi/2018.4.274/intel64 >>>> >>>> Same error message (see configure3.log) >>>> >>>> >>>> It seems that there is something going on with mkl in case I want to >>>> use Intel compilers for C and C++ (compiling with gcc and g++ seems to >>>> work ) >>>> >>>> Do you know what is going on there? (I am running on the Draco >>>> cluster, if that helps >>>> (https://www.mpcdf.mpg.de/services/computing/draco >>>> )) >>>> >>>> >>>> Best regards, >>>> Felix >>>> >>>> Zitat von Barry Smith >: >>>> >>>> > Felix, >>>> > >>>> > The branch is ready. Just use >>>> > >>>> > git checkout barry/2020-07-28/superlu_dist-single >>>> > ./configure --download-superlu_dist --download-metis >>>> > --download-parmetis --download-ptscotch --with-precision=single >>>> > and whatever else you use >>>> > >>>> > Barry >>>> > >>>> > It will automatically get the needed branch of SuperLU_DIST that >>>> > Sherry prepared. >>>> > >>>> > >>>> >> On Jul 27, 2020, at 2:10 PM, flw at rzg.mpg.de >>>> wrote: >>>> >> >>>> >> Hi Shery, >>>> >> Yes, ideally we would like to compile PETSc in single precision and >>>> >> simply run a single precision version of SUPERLU_DIST just like >>>> >> e.g. MUMPS. >>>> >> >>>> >> Best regards and thanks, >>>> >> Felix >>>> >> Zitat von "Xiaoye S. Li" >: >>>> >> >>>> >>> Barry, >>>> >>> >>>> >>> I have a branch 'Mixed-precision' working with single >>>> precision FP32. I >>>> >>> assume Felix wants to use superlu_dist from petsc. How do you want to >>>> >>> incorporate it in petsc? >>>> >>> >>>> >>> https://github.com/xiaoyeli/superlu_dist >>>> >>>> >>> >>>> >>> PS1: in this version, FP32 only works on CPU. FP64 and >>>> complex-FP64 all >>>> >>> work on GPU. >>>> >>> >>>> >>> PS2: currently there is no mixed-precision yet, but it is the >>>> branch we are >>>> >>> adding mix-prec support. Will take a while before merging to master. >>>> >>> >>>> >>> Sherry >>>> >>> >>>> >>> >>>> >>> On Wed, Jul 22, 2020 at 6:04 AM >>> > wrote: >>>> >>> >>>> >>>> Hi Barry, >>>> >>>> for now I just want to run everything in single on CPUs only with >>>> >>>> SUPERLU_DIST. Maybe we will also incorporate GPUs in the future, but >>>> >>>> there are no immediate plans yet. So if you could provide >>>> the support, >>>> >>>> that would be awesome. 
>>>> >>>> >>>> >>>> Best regards, >>>> >>>> Felix >>>> >>>> >>>> >>>> Zitat von Barry Smith >: >>>> >>>> >>>> >>>> > Felix, >>>> >>>> > >>>> >>>> > What are your needs, do you want this for CPUs or for GPUs? Do >>>> >>>> > you wish to run all your code in single precision or just the >>>> >>>> > SuperLU_Dist solver while the rest of your code double? >>>> >>>> > >>>> >>>> > If you want to run everything on CPUs using single precision >>>> >>>> > then adding the support is very easy, we can provide that for you >>>> >>>> > any time. The other cases will require more thought. >>>> >>>> > >>>> >>>> > Barry >>>> >>>> > >>>> >>>> > >>>> >>>> >> On Jul 21, 2020, at 8:58 AM, flw at rzg.mpg.de >>>> wrote: >>>> >>>> >> >>>> >>>> >> Dear PETSc support team, >>>> >>>> >> some time ago you told me that you are planning on releasing a >>>> >>>> >> version that supports SUPERLU_DIST in single-precision soon. Can >>>> >>>> >> you tell me roughly what time frame you had in mind? >>>> >>>> >> >>>> >>>> >> Best regards, >>>> >>>> >> Felix >>>> >>>> >> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >> >>>> >> >>>> >> >>>> >>>> >> >> >> From g.gibb at epcc.ed.ac.uk Wed Aug 5 10:24:10 2020 From: g.gibb at epcc.ed.ac.uk (GIBB Gordon) Date: Wed, 5 Aug 2020 15:24:10 +0000 Subject: [petsc-users] Code (possibly) not running on GPU with CUDA Message-ID: Hi, I?ve built PETSc with NVIDIA support for our GPU machine (https://cirrus.readthedocs.io/en/master/user-guide/gpu.html), and then compiled our executable against this PETSc (using version 3.13.3). I should add that the MPI on our system is not GPU-aware so I have to use -use_gpu_aware_mpi 0 When running this, in the .petscrc I put -dm_vec_type cuda -dm_mat_type aijcusparse as is suggested on the PETSc GPU page (https://www.mcs.anl.gov/petsc/features/gpus.html) to enable CUDA for DMs (all our PETSc data structures are with DMs). I have also ensured I'm using the jacobi preconditioner so that it definitely runs on the GPU (again, according to the PETSc GPU page). When I run this, I note that the GPU seems to have memory allocated on it from my executable, however seems to be doing no computation: Wed Aug 5 13:10:23 2020 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 Tesla V100-SXM2... On | 00000000:1A:00.0 Off | Off | | N/A 43C P0 64W / 300W | 490MiB / 16160MiB | 0% Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 33712 C .../z04/gpsgibb/TPLS/TPLS-GPU/./twophase.x 479MiB | +-----------------------------------------------------------------------------+ I then ran the same example but without the -dm_vec_type cuda, -dm_mat_type aijcusparse arguments, and I found the same behaviour (479MB allocated on the GPU, 0% GPU utilisation). In both cases the runtime of the example are near identical, suggesting that both are essentially the same run. 
As a further test I compiled PETSc without CUDA support and ran the same example again, and found the same runtime as with the GPUs, and (as expected) no GPU memory allocated. I then tried to run the example with the -dm_vec_type cuda, -dm_mat_type aijcusparse arguments and it ran without complaint. I would have expected it to throw an error or at least a warning if invalid arguments were passed to it. All this suggests to me that PETSc is ignoring my requests to use the GPUs. For the GPU-aware PETSc it seems to allocate memory on the GPUs but perform no calculations on them, regardless of whether I requested it to use the GPUs or not. On non-GPU-aware PETSc it accepts my requests to use the GPUs, but does not throw an error. What am I doing wrong? Thanks in advance, Gordon ----------------------------------------------- Dr Gordon P S Gibb EPCC, The University of Edinburgh Tel: +44 131 651 3459 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Aug 5 11:10:44 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Aug 2020 12:10:44 -0400 Subject: [petsc-users] Code (possibly) not running on GPU with CUDA In-Reply-To: References: Message-ID: On Wed, Aug 5, 2020 at 11:24 AM GIBB Gordon wrote: > Hi, > > I?ve built PETSc with NVIDIA support for our GPU machine ( > https://cirrus.readthedocs.io/en/master/user-guide/gpu.html), and then > compiled our executable against this PETSc (using version 3.13.3). I should > add that the MPI on our system is not GPU-aware so I have to use -use_gpu_aware_mpi > 0 > > When running this, in the .petscrc I put > > -dm_vec_type cuda > -dm_mat_type aijcusparse > > as is suggested on the PETSc GPU page ( > https://www.mcs.anl.gov/petsc/features/gpus.html) to enable CUDA for DMs > (all our PETSc data structures are with DMs). I have also ensured I'm using > the jacobi preconditioner so that it definitely runs on the GPU (again, > according to the PETSc GPU page). > > When I run this, I note that the GPU seems to have memory allocated on it > from my executable, however seems to be doing no computation: > > Wed Aug 5 13:10:23 2020 > > +-----------------------------------------------------------------------------+ > | NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 > | > > |-------------------------------+----------------------+----------------------+ > | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. > ECC | > | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute > M. | > > |===============================+======================+======================| > | 0 Tesla V100-SXM2... On | 00000000:1A:00.0 Off | > Off | > | N/A 43C P0 64W / 300W | 490MiB / 16160MiB | 0% > Default | > > +-------------------------------+----------------------+----------------------+ > > > > > +-----------------------------------------------------------------------------+ > | Processes: GPU > Memory | > | GPU PID Type Process name Usage > | > > |=============================================================================| > | 0 33712 C .../z04/gpsgibb/TPLS/TPLS-GPU/./twophase.x > 479MiB | > > +-----------------------------------------------------------------------------+ > > I then ran the same example but without the -dm_vec_type cuda, > -dm_mat_type aijcusparse arguments, and I found the same behaviour (479MB > allocated on the GPU, 0% GPU utilisation). 
> > In both cases the runtime of the example are near identical, suggesting > that both are essentially the same run. > > As a further test I compiled PETSc without CUDA support and ran the same > example again, and found the same runtime as with the GPUs, and (as > expected) no GPU memory allocated. I then tried to run the example with the -dm_vec_type > cuda, -dm_mat_type aijcusparse arguments and it ran without complaint. I > would have expected it to throw an error or at least a warning if invalid > arguments were passed to it. > > All this suggests to me that PETSc is ignoring my requests to use the > GPUs. For the GPU-aware PETSc it seems to allocate memory on the GPUs but > perform no calculations on them, regardless of whether I requested it to > use the GPUs or not. On non-GPU-aware PETSc it accepts my requests to use > the GPUs, but does not throw an error. > > What am I doing wrong? > Lets step back to a simpler thing so we can make sure your configuration is correct. Can you run the 2_cuda test from src/vec/vec/tests/ex28.c ? Does it execute on your GPU? Thanks, Matt > Thanks in advance, > > Gordon > ----------------------------------------------- > Dr Gordon P S Gibb > EPCC, The University of Edinburgh > Tel: +44 131 651 3459 > > The University of Edinburgh is a charitable body, registered in Scotland, > with registration number SC005336. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.gibb at epcc.ed.ac.uk Wed Aug 5 11:47:31 2020 From: g.gibb at epcc.ed.ac.uk (GIBB Gordon) Date: Wed, 5 Aug 2020 16:47:31 +0000 Subject: [petsc-users] Code (possibly) not running on GPU with CUDA In-Reply-To: References: Message-ID: Hi Matt, It runs, however it doesn?t produce any output, and I have no way of checking to see if it actually ran on the GPU. It was run with: srun -n 1 ./ex28 -vec_type cuda -use_gpu_aware_mpi 0 Cheers, Gordon ----------------------------------------------- Dr Gordon P S Gibb EPCC, The University of Edinburgh Tel: +44 131 651 3459 On 5 Aug 2020, at 17:10, Matthew Knepley > wrote: On Wed, Aug 5, 2020 at 11:24 AM GIBB Gordon > wrote: Hi, I?ve built PETSc with NVIDIA support for our GPU machine (https://cirrus.readthedocs.io/en/master/user-guide/gpu.html), and then compiled our executable against this PETSc (using version 3.13.3). I should add that the MPI on our system is not GPU-aware so I have to use -use_gpu_aware_mpi 0 When running this, in the .petscrc I put -dm_vec_type cuda -dm_mat_type aijcusparse as is suggested on the PETSc GPU page (https://www.mcs.anl.gov/petsc/features/gpus.html) to enable CUDA for DMs (all our PETSc data structures are with DMs). I have also ensured I'm using the jacobi preconditioner so that it definitely runs on the GPU (again, according to the PETSc GPU page). When I run this, I note that the GPU seems to have memory allocated on it from my executable, however seems to be doing no computation: Wed Aug 5 13:10:23 2020 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. 
| |===============================+======================+======================| | 0 Tesla V100-SXM2... On | 00000000:1A:00.0 Off | Off | | N/A 43C P0 64W / 300W | 490MiB / 16160MiB | 0% Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 33712 C .../z04/gpsgibb/TPLS/TPLS-GPU/./twophase.x 479MiB | +-----------------------------------------------------------------------------+ I then ran the same example but without the -dm_vec_type cuda, -dm_mat_type aijcusparse arguments, and I found the same behaviour (479MB allocated on the GPU, 0% GPU utilisation). In both cases the runtime of the example are near identical, suggesting that both are essentially the same run. As a further test I compiled PETSc without CUDA support and ran the same example again, and found the same runtime as with the GPUs, and (as expected) no GPU memory allocated. I then tried to run the example with the -dm_vec_type cuda, -dm_mat_type aijcusparse arguments and it ran without complaint. I would have expected it to throw an error or at least a warning if invalid arguments were passed to it. All this suggests to me that PETSc is ignoring my requests to use the GPUs. For the GPU-aware PETSc it seems to allocate memory on the GPUs but perform no calculations on them, regardless of whether I requested it to use the GPUs or not. On non-GPU-aware PETSc it accepts my requests to use the GPUs, but does not throw an error. What am I doing wrong? Lets step back to a simpler thing so we can make sure your configuration is correct. Can you run the 2_cuda test from src/vec/vec/tests/ex28.c ? Does it execute on your GPU? Thanks, Matt Thanks in advance, Gordon ----------------------------------------------- Dr Gordon P S Gibb EPCC, The University of Edinburgh Tel: +44 131 651 3459 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Aug 5 11:58:16 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Aug 2020 12:58:16 -0400 Subject: [petsc-users] Code (possibly) not running on GPU with CUDA In-Reply-To: References: Message-ID: On Wed, Aug 5, 2020 at 12:47 PM GIBB Gordon wrote: > Hi Matt, > > It runs, however it doesn?t produce any output, and I have no way of > checking to see if it actually ran on the GPU. It was run with: > > srun -n 1 ./ex28 -vec_type cuda -use_gpu_aware_mpi 0 > 1) How did you check last time? 2) You can check using -log_view Thanks, Matt > Cheers, > > Gordon > > ----------------------------------------------- > Dr Gordon P S Gibb > EPCC, The University of Edinburgh > Tel: +44 131 651 3459 > > On 5 Aug 2020, at 17:10, Matthew Knepley wrote: > > On Wed, Aug 5, 2020 at 11:24 AM GIBB Gordon wrote: > >> Hi, >> >> I?ve built PETSc with NVIDIA support for our GPU machine ( >> https://cirrus.readthedocs.io/en/master/user-guide/gpu.html), and then >> compiled our executable against this PETSc (using version 3.13.3). 
I should >> add that the MPI on our system is not GPU-aware so I have to use -use_gpu_aware_mpi >> 0 >> >> When running this, in the .petscrc I put >> >> -dm_vec_type cuda >> -dm_mat_type aijcusparse >> >> as is suggested on the PETSc GPU page ( >> https://www.mcs.anl.gov/petsc/features/gpus.html) to enable CUDA for DMs >> (all our PETSc data structures are with DMs). I have also ensured I'm using >> the jacobi preconditioner so that it definitely runs on the GPU (again, >> according to the PETSc GPU page). >> >> When I run this, I note that the GPU seems to have memory allocated on it >> from my executable, however seems to be doing no computation: >> >> Wed Aug 5 13:10:23 2020 >> >> +-----------------------------------------------------------------------------+ >> | NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 >> | >> >> |-------------------------------+----------------------+----------------------+ >> | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. >> ECC | >> | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util >> Compute M. | >> >> |===============================+======================+======================| >> | 0 Tesla V100-SXM2... On | 00000000:1A:00.0 Off | >> Off | >> | N/A 43C P0 64W / 300W | 490MiB / 16160MiB | 0% >> Default | >> >> +-------------------------------+----------------------+----------------------+ >> >> >> >> +-----------------------------------------------------------------------------+ >> | Processes: GPU >> Memory | >> | GPU PID Type Process name Usage >> | >> >> |=============================================================================| >> | 0 33712 C .../z04/gpsgibb/TPLS/TPLS-GPU/./twophase.x >> 479MiB | >> >> +-----------------------------------------------------------------------------+ >> >> I then ran the same example but without the -dm_vec_type cuda, >> -dm_mat_type aijcusparse arguments, and I found the same behaviour (479MB >> allocated on the GPU, 0% GPU utilisation). >> >> In both cases the runtime of the example are near identical, suggesting >> that both are essentially the same run. >> >> As a further test I compiled PETSc without CUDA support and ran the same >> example again, and found the same runtime as with the GPUs, and (as >> expected) no GPU memory allocated. I then tried to run the example with the -dm_vec_type >> cuda, -dm_mat_type aijcusparse arguments and it ran without complaint. I >> would have expected it to throw an error or at least a warning if invalid >> arguments were passed to it. >> >> All this suggests to me that PETSc is ignoring my requests to use the >> GPUs. For the GPU-aware PETSc it seems to allocate memory on the GPUs but >> perform no calculations on them, regardless of whether I requested it to >> use the GPUs or not. On non-GPU-aware PETSc it accepts my requests to use >> the GPUs, but does not throw an error. >> >> What am I doing wrong? >> > > Lets step back to a simpler thing so we can make sure your configuration > is correct. Can you run the 2_cuda test from > src/vec/vec/tests/ex28.c ? Does it execute on your GPU? > > Thanks, > > Matt > > >> Thanks in advance, >> >> Gordon >> ----------------------------------------------- >> Dr Gordon P S Gibb >> EPCC, The University of Edinburgh >> Tel: +44 131 651 3459 >> >> The University of Edinburgh is a charitable body, registered in Scotland, >> with registration number SC005336. 
>> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.gibb at epcc.ed.ac.uk Wed Aug 5 12:09:28 2020 From: g.gibb at epcc.ed.ac.uk (GIBB Gordon) Date: Wed, 5 Aug 2020 17:09:28 +0000 Subject: [petsc-users] Code (possibly) not running on GPU with CUDA In-Reply-To: References: Message-ID: <33A0BE80-33C1-4CED-8DEE-C48A6BCB3040@epcc.ed.ac.uk> Hi, I used nvidia-smi before, essentially a kind of ?top? for nvidia-gpus. The log output I get is: ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option. # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## ########################################################## # # # WARNING!!! # # # # This code was compiled with GPU support but you used # # an MPI that's not GPU-aware, such Petsc had to copy # # data from GPU to CPU for MPI communication. To get # # meaningfull timing results, please use a GPU-aware # # MPI instead. # ########################################################## /lustre/home/z04/gpsgibb/TPLS/petsc/share/petsc/examples/src/vec/vec/tests/./ex28 on a named r2i7n0 with 1 processor, by gpsgibb Wed Aug 5 18:05:59 2020 Using Petsc Release Version 3.13.3, Jul 01, 2020 Max Max/Min Avg Total Time (sec): 1.566e-01 1.000 1.566e-01 Objects: 4.400e+01 1.000 4.400e+01 Flop: 2.546e+03 1.000 2.546e+03 2.546e+03 Flop/sec: 1.626e+04 1.000 1.626e+04 1.626e+04 Memory: 1.438e+05 1.000 1.438e+05 1.438e+05 MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total Count %Total Avg %Total Count %Total 0: Main Stage: 1.5657e-01 100.0% 2.5460e+03 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. 
Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent AvgLen: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors) CpuToGpu Count: total number of CPU to GPU copies per processor CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor) GpuToCpu Count: total number of GPU to CPU copies per processor GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor) GPU %F: percent flops on GPU in this event ------------------------------------------------------------------------------------------------------------------------ ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option. # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F --------------------------------------------------------------------------------------------------------------------------------------------------------------- --- Event Stage 0: Main Stage VecDot 4 1.0 7.4222e-05 1.0 1.96e+02 1.0 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 3 3 0 0.00e+00 0 0.00e+00 100 VecNorm 1 1.0 5.4168e-05 1.0 7.30e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 1 1 0 0.00e+00 0 0.00e+00 100 VecSet 83 1.0 9.0480e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 VecAssemblyBegin 1 1.0 2.7206e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 VecAssemblyEnd 1 1.0 2.6403e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 VecSetRandom 1 1.0 1.5260e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 VecReduceArith 52 1.0 1.1307e-03 1.0 2.28e+03 1.0 0.0e+00 0.0e+00 0.0e+00 1 89 0 0 0 1 89 0 0 0 2 2 2 4.00e-04 0 0.00e+00 100 VecReduceComm 4 1.0 3.4969e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 VecReduceBegin 1 1.0 2.5639e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 VecReduceEnd 1 1.0 2.5495e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 VecCUDACopyTo 2 1.0 1.7550e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 2 4.00e-04 0 0.00e+00 0 VecCUDACopyFrom 42 1.0 3.7747e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 42 8.40e-03 0 --------------------------------------------------------------------------------------------------------------------------------------------------------------- 
Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 42 42 75264 0. PetscRandom 1 1 646 0. Viewer 1 0 0 0. ======================================================================================================================== Average time to get PetscTime(): 3.67989e-08 #PETSc Option Table entries: -log_view -use_gpu_aware_mpi 0 -vec_type cuda #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: CC=nvcc FC=mpif90 CXX=mpicxx --prefix=/lustre/home/z04/gpsgibb/TPLS/petsc --with-cudac=nvcc --with-cuda=1 --with-mpi-dir= --with-batch ----------------------------------------- Libraries compiled on 2020-07-31 14:46:25 on r2i7n0 Machine characteristics: Linux-4.18.0-147.8.1.el8_1.x86_64-x86_64-with-centos-8.1.1911-Core Using PETSc directory: /lustre/home/z04/gpsgibb/TPLS/petsc Using PETSc arch: ----------------------------------------- Using C compiler: nvcc -g -I/lustre/home/z04/gpsgibb/TPLS/petsc-3.13.3/include Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -I/lustre/home/z04/gpsgibb/TPLS/petsc-3.13.3/include ----------------------------------------- Using include paths: -I/lustre/home/z04/gpsgibb/TPLS/petsc/include -I/lustre/sw/nvidia/hpcsdk/Linux_x86_64/cuda/10.2/include -I/lustre/home/z04/gpsgibb/TPLS/petsc-3.13.3/include ----------------------------------------- Using C linker: nvcc Using Fortran linker: mpif90 Using libraries: -L/lustre/home/z04/gpsgibb/TPLS/petsc/lib -L/lustre/home/z04/gpsgibb/TPLS/petsc/lib -lpetsc -L/lustre/sw/intel/compilers_and_libraries_2019.0.117/linux/mkl -L/lustre/sw/nvidia/hpcsdk/Linux_x86_64/cuda/10.2/lib64 -L/lustre/home/z04/gpsgibb/TPLS/petsc-3.13.3/lib -L/opt/hpe/hpc/mpt/mpt-2.22/lib -L/lustre/sw/nvidia/hpcsdk/Linux_x86_64/20.5/math_libs/10.2/lib64 -L/lustre/sw/gcc/6.3.0/lib/gcc/x86_64-pc-linux-gnu/6.3.0 -L/lustre/sw/gcc/6.3.0/lib64 -L/lustre/sw/intel/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64 -L/lustre/sw/nvidia/hpcsdk/Linux_x86_64/cuda/10.2/bin -L/lustre/sw/gcc/6.3.0/lib -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lX11 -lcufft -lcublas -lcudart -lcusparse -lcusolver -lcuda -lmpi++ -lmpi -lstdc++ -ldl -lpthread -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl ----------------------------------------- ########################################################## # # # WARNING!!! # # # # This code was compiled with GPU support but you used # # an MPI that's not GPU-aware, such Petsc had to copy # # data from GPU to CPU for MPI communication. To get # # meaningfull timing results, please use a GPU-aware # # MPI instead. # ########################################################## ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option. # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. 
# # # ########################################################## ----------------------------------------------- Dr Gordon P S Gibb EPCC, The University of Edinburgh Tel: +44 131 651 3459 On 5 Aug 2020, at 17:58, Matthew Knepley > wrote: On Wed, Aug 5, 2020 at 12:47 PM GIBB Gordon > wrote: Hi Matt, It runs, however it doesn?t produce any output, and I have no way of checking to see if it actually ran on the GPU. It was run with: srun -n 1 ./ex28 -vec_type cuda -use_gpu_aware_mpi 0 1) How did you check last time? 2) You can check using -log_view Thanks, Matt Cheers, Gordon ----------------------------------------------- Dr Gordon P S Gibb EPCC, The University of Edinburgh Tel: +44 131 651 3459 On 5 Aug 2020, at 17:10, Matthew Knepley > wrote: On Wed, Aug 5, 2020 at 11:24 AM GIBB Gordon > wrote: Hi, I?ve built PETSc with NVIDIA support for our GPU machine (https://cirrus.readthedocs.io/en/master/user-guide/gpu.html), and then compiled our executable against this PETSc (using version 3.13.3). I should add that the MPI on our system is not GPU-aware so I have to use -use_gpu_aware_mpi 0 When running this, in the .petscrc I put -dm_vec_type cuda -dm_mat_type aijcusparse as is suggested on the PETSc GPU page (https://www.mcs.anl.gov/petsc/features/gpus.html) to enable CUDA for DMs (all our PETSc data structures are with DMs). I have also ensured I'm using the jacobi preconditioner so that it definitely runs on the GPU (again, according to the PETSc GPU page). When I run this, I note that the GPU seems to have memory allocated on it from my executable, however seems to be doing no computation: Wed Aug 5 13:10:23 2020 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 Tesla V100-SXM2... On | 00000000:1A:00.0 Off | Off | | N/A 43C P0 64W / 300W | 490MiB / 16160MiB | 0% Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 33712 C .../z04/gpsgibb/TPLS/TPLS-GPU/./twophase.x 479MiB | +-----------------------------------------------------------------------------+ I then ran the same example but without the -dm_vec_type cuda, -dm_mat_type aijcusparse arguments, and I found the same behaviour (479MB allocated on the GPU, 0% GPU utilisation). In both cases the runtime of the example are near identical, suggesting that both are essentially the same run. As a further test I compiled PETSc without CUDA support and ran the same example again, and found the same runtime as with the GPUs, and (as expected) no GPU memory allocated. I then tried to run the example with the -dm_vec_type cuda, -dm_mat_type aijcusparse arguments and it ran without complaint. I would have expected it to throw an error or at least a warning if invalid arguments were passed to it. All this suggests to me that PETSc is ignoring my requests to use the GPUs. 
For the GPU-aware PETSc it seems to allocate memory on the GPUs but perform no calculations on them, regardless of whether I requested it to use the GPUs or not. On non-GPU-aware PETSc it accepts my requests to use the GPUs, but does not throw an error. What am I doing wrong? Lets step back to a simpler thing so we can make sure your configuration is correct. Can you run the 2_cuda test from src/vec/vec/tests/ex28.c ? Does it execute on your GPU? Thanks, Matt Thanks in advance, Gordon ----------------------------------------------- Dr Gordon P S Gibb EPCC, The University of Edinburgh Tel: +44 131 651 3459 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Aug 5 12:20:53 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 5 Aug 2020 12:20:53 -0500 (CDT) Subject: [petsc-users] Code (possibly) not running on GPU with CUDA In-Reply-To: <33A0BE80-33C1-4CED-8DEE-C48A6BCB3040@epcc.ed.ac.uk> References: <33A0BE80-33C1-4CED-8DEE-C48A6BCB3040@epcc.ed.ac.uk> Message-ID: > Configure options: CC=nvcc FC=mpif90 CXX=mpicxx --prefix=/lustre/home/z04/gpsgibb/TPLS/petsc --with-cudac=nvcc --with-cuda=1 --with-mpi-dir= --with-batch This is weird. suggest using: CC=mpicc FC=mpif90 CXX=mpicxx --prefix=/lustre/home/z04/gpsgibb/TPLS/petsc --with-cudac=nvcc --with-cuda=1 > VecCUDACopyTo 2 1.0 1.7550e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 2 4.00e-04 0 0.00e+00 0 > VecCUDACopyFrom 42 1.0 3.7747e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 42 8.40e-03 0 So GPU is getting used. The other way to check: [balay at p1 tests]$ ./ex28 -vec_type cuda -vec_view ascii::ascii_info Vec Object: 1 MPI processes type: seqcuda length=25 [balay at p1 tests]$ ./ex28 -vec_view ascii::ascii_info Vec Object: 1 MPI processes type: seq length=25 Satish From knepley at gmail.com Wed Aug 5 12:30:55 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Aug 2020 13:30:55 -0400 Subject: [petsc-users] Code (possibly) not running on GPU with CUDA In-Reply-To: <33A0BE80-33C1-4CED-8DEE-C48A6BCB3040@epcc.ed.ac.uk> References: <33A0BE80-33C1-4CED-8DEE-C48A6BCB3040@epcc.ed.ac.uk> Message-ID: On Wed, Aug 5, 2020 at 1:09 PM GIBB Gordon wrote: > Hi, > > I used nvidia-smi before, essentially a kind of ?top? for nvidia-gpus. 
> > The log output I get is: > You can see that all flops are done on the GPU by looking at the last column: Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F --------------------------------------------------------------------------------------------------------------------------------------------------------------- --- Event Stage 0: Main Stage VecDot 4 1.0 7.4222e-05 1.0 1.96e+02 1.0 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 3 3 0 0.00e+00 0 0.00e+00 100 VecNorm 1 1.0 5.4168e-05 1.0 7.30e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 1 1 0 0.00e+00 0 0.00e+00 100 Thanks, Matt ************************************************************************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r > -fCourier9' to print this document *** > > ************************************************************************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option. # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with GPU support but you used # > # an MPI that's not GPU-aware, such Petsc had to copy # > # data from GPU to CPU for MPI communication. To get # > # meaningfull timing results, please use a GPU-aware # > # MPI instead. # > ########################################################## > > > /lustre/home/z04/gpsgibb/TPLS/petsc/share/petsc/examples/src/vec/vec/tests/./ex28 > on a named r2i7n0 with 1 processor, by gpsgibb Wed Aug 5 18:05:59 2020 > Using Petsc Release Version 3.13.3, Jul 01, 2020 > > Max Max/Min Avg Total > Time (sec): 1.566e-01 1.000 1.566e-01 > Objects: 4.400e+01 1.000 4.400e+01 > Flop: 2.546e+03 1.000 2.546e+03 2.546e+03 > Flop/sec: 1.626e+04 1.000 1.626e+04 1.626e+04 > Memory: 1.438e+05 1.000 1.438e+05 1.438e+05 > MPI Messages: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flop > and VecAXPY() for complex vectors of length N > --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total Count > %Total Avg %Total Count %Total > 0: Main Stage: 1.5657e-01 100.0% 2.5460e+03 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > ------------------------------------------------------------------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. 
> Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > AvgLen: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). > %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU > time over all processors) > CpuToGpu Count: total number of CPU to GPU copies per processor > CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per > processor) > GpuToCpu Count: total number of GPU to CPU copies per processor > GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per > processor) > GPU %F: percent flops on GPU in this event > > ------------------------------------------------------------------------------------------------------------------------ > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option. # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > > Event Count Time (sec) Flop > --- Global --- --- Stage ---- Total GPU - CpuToGpu - - > GpuToCpu - GPU > Max Ratio Max Ratio Max Ratio Mess AvgLen > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count > Size %F > > --------------------------------------------------------------------------------------------------------------------------------------------------------------- > > --- Event Stage 0: Main Stage > > VecDot 4 1.0 7.4222e-05 1.0 1.96e+02 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 8 0 0 0 0 8 0 0 0 3 3 0 0.00e+00 0 > 0.00e+00 100 > VecNorm 1 1.0 5.4168e-05 1.0 7.30e+01 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 1 1 0 0.00e+00 0 > 0.00e+00 100 > VecSet 83 1.0 9.0480e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > VecAssemblyBegin 1 1.0 2.7206e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > VecAssemblyEnd 1 1.0 2.6403e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > VecSetRandom 1 1.0 1.5260e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > VecReduceArith 52 1.0 1.1307e-03 1.0 2.28e+03 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 89 0 0 0 1 89 0 0 0 2 2 2 4.00e-04 0 > 0.00e+00 100 > VecReduceComm 4 1.0 3.4969e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > VecReduceBegin 1 1.0 2.5639e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > VecReduceEnd 1 1.0 2.5495e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > VecCUDACopyTo 2 1.0 1.7550e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 2 4.00e-04 0 > 0.00e+00 0 > VecCUDACopyFrom 42 1.0 3.7747e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 42 
> 8.40e-03 0 > > --------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. > Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Vector 42 42 75264 0. > PetscRandom 1 1 646 0. > Viewer 1 0 0 0. > > ======================================================================================================================== > Average time to get PetscTime(): 3.67989e-08 > #PETSc Option Table entries: > -log_view > -use_gpu_aware_mpi 0 > -vec_type cuda > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure options: CC=nvcc FC=mpif90 CXX=mpicxx > --prefix=/lustre/home/z04/gpsgibb/TPLS/petsc --with-cudac=nvcc > --with-cuda=1 --with-mpi-dir= --with-batch > ----------------------------------------- > Libraries compiled on 2020-07-31 14:46:25 on r2i7n0 > Machine characteristics: > Linux-4.18.0-147.8.1.el8_1.x86_64-x86_64-with-centos-8.1.1911-Core > Using PETSc directory: /lustre/home/z04/gpsgibb/TPLS/petsc > Using PETSc arch: > ----------------------------------------- > > Using C compiler: nvcc -g > -I/lustre/home/z04/gpsgibb/TPLS/petsc-3.13.3/include > Using Fortran compiler: mpif90 -Wall -ffree-line-length-0 > -Wno-unused-dummy-argument -g > -I/lustre/home/z04/gpsgibb/TPLS/petsc-3.13.3/include > ----------------------------------------- > > Using include paths: -I/lustre/home/z04/gpsgibb/TPLS/petsc/include > -I/lustre/sw/nvidia/hpcsdk/Linux_x86_64/cuda/10.2/include > -I/lustre/home/z04/gpsgibb/TPLS/petsc-3.13.3/include > ----------------------------------------- > > Using C linker: nvcc > Using Fortran linker: mpif90 > Using libraries: -L/lustre/home/z04/gpsgibb/TPLS/petsc/lib > -L/lustre/home/z04/gpsgibb/TPLS/petsc/lib -lpetsc > -L/lustre/sw/intel/compilers_and_libraries_2019.0.117/linux/mkl > -L/lustre/sw/nvidia/hpcsdk/Linux_x86_64/cuda/10.2/lib64 > -L/lustre/home/z04/gpsgibb/TPLS/petsc-3.13.3/lib > -L/opt/hpe/hpc/mpt/mpt-2.22/lib > -L/lustre/sw/nvidia/hpcsdk/Linux_x86_64/20.5/math_libs/10.2/lib64 > -L/lustre/sw/gcc/6.3.0/lib/gcc/x86_64-pc-linux-gnu/6.3.0 > -L/lustre/sw/gcc/6.3.0/lib64 > -L/lustre/sw/intel/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64 > -L/lustre/sw/nvidia/hpcsdk/Linux_x86_64/cuda/10.2/bin > -L/lustre/sw/gcc/6.3.0/lib -lmkl_intel_lp64 -lmkl_core -lmkl_sequential > -lpthread -lX11 -lcufft -lcublas -lcudart -lcusparse -lcusolver -lcuda > -lmpi++ -lmpi -lstdc++ -ldl -lpthread -lmpi -lgfortran -lm -lgfortran -lm > -lgcc_s -lquadmath -lstdc++ -ldl > ----------------------------------------- > > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with GPU support but you used # > # an MPI that's not GPU-aware, such Petsc had to copy # > # data from GPU to CPU for MPI communication. To get # > # meaningfull timing results, please use a GPU-aware # > # MPI instead. # > ########################################################## > > > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option. 
# > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > > ----------------------------------------------- > Dr Gordon P S Gibb > EPCC, The University of Edinburgh > Tel: +44 131 651 3459 > > On 5 Aug 2020, at 17:58, Matthew Knepley wrote: > > On Wed, Aug 5, 2020 at 12:47 PM GIBB Gordon wrote: > >> Hi Matt, >> >> It runs, however it doesn?t produce any output, and I have no way of >> checking to see if it actually ran on the GPU. It was run with: >> >> srun -n 1 ./ex28 -vec_type cuda -use_gpu_aware_mpi 0 >> > > 1) How did you check last time? > > 2) You can check using -log_view > > Thanks, > > Matt > > >> Cheers, >> >> Gordon >> >> ----------------------------------------------- >> Dr Gordon P S Gibb >> EPCC, The University of Edinburgh >> Tel: +44 131 651 3459 >> >> On 5 Aug 2020, at 17:10, Matthew Knepley wrote: >> >> On Wed, Aug 5, 2020 at 11:24 AM GIBB Gordon wrote: >> >>> Hi, >>> >>> I?ve built PETSc with NVIDIA support for our GPU machine ( >>> https://cirrus.readthedocs.io/en/master/user-guide/gpu.html), and then >>> compiled our executable against this PETSc (using version 3.13.3). I should >>> add that the MPI on our system is not GPU-aware so I have to use -use_gpu_aware_mpi >>> 0 >>> >>> When running this, in the .petscrc I put >>> >>> -dm_vec_type cuda >>> -dm_mat_type aijcusparse >>> >>> as is suggested on the PETSc GPU page ( >>> https://www.mcs.anl.gov/petsc/features/gpus.html) to enable CUDA for >>> DMs (all our PETSc data structures are with DMs). I have also ensured I'm >>> using the jacobi preconditioner so that it definitely runs on the GPU >>> (again, according to the PETSc GPU page). >>> >>> When I run this, I note that the GPU seems to have memory allocated on >>> it from my executable, however seems to be doing no computation: >>> >>> Wed Aug 5 13:10:23 2020 >>> >>> +-----------------------------------------------------------------------------+ >>> | NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: >>> 10.2 | >>> >>> |-------------------------------+----------------------+----------------------+ >>> | GPU Name Persistence-M| Bus-Id Disp.A | Volatile >>> Uncorr. ECC | >>> | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util >>> Compute M. | >>> >>> |===============================+======================+======================| >>> | 0 Tesla V100-SXM2... On | 00000000:1A:00.0 Off | >>> Off | >>> | N/A 43C P0 64W / 300W | 490MiB / 16160MiB | 0% >>> Default | >>> >>> +-------------------------------+----------------------+----------------------+ >>> >>> >>> >>> +-----------------------------------------------------------------------------+ >>> | Processes: GPU >>> Memory | >>> | GPU PID Type Process name >>> Usage | >>> >>> |=============================================================================| >>> | 0 33712 C .../z04/gpsgibb/TPLS/TPLS-GPU/./twophase.x >>> 479MiB | >>> >>> +-----------------------------------------------------------------------------+ >>> >>> I then ran the same example but without the -dm_vec_type cuda, >>> -dm_mat_type aijcusparse arguments, and I found the same behaviour (479MB >>> allocated on the GPU, 0% GPU utilisation). >>> >>> In both cases the runtime of the example are near identical, suggesting >>> that both are essentially the same run. 
>>> >>> As a further test I compiled PETSc without CUDA support and ran the same >>> example again, and found the same runtime as with the GPUs, and (as >>> expected) no GPU memory allocated. I then tried to run the example with the -dm_vec_type >>> cuda, -dm_mat_type aijcusparse arguments and it ran without complaint. >>> I would have expected it to throw an error or at least a warning if invalid >>> arguments were passed to it. >>> >>> All this suggests to me that PETSc is ignoring my requests to use the >>> GPUs. For the GPU-aware PETSc it seems to allocate memory on the GPUs but >>> perform no calculations on them, regardless of whether I requested it to >>> use the GPUs or not. On non-GPU-aware PETSc it accepts my requests to use >>> the GPUs, but does not throw an error. >>> >>> What am I doing wrong? >>> >> >> Lets step back to a simpler thing so we can make sure your configuration >> is correct. Can you run the 2_cuda test from >> src/vec/vec/tests/ex28.c ? Does it execute on your GPU? >> >> Thanks, >> >> Matt >> >> >>> Thanks in advance, >>> >>> Gordon >>> ----------------------------------------------- >>> Dr Gordon P S Gibb >>> EPCC, The University of Edinburgh >>> Tel: +44 131 651 3459 >>> >>> The University of Edinburgh is a charitable body, registered in >>> Scotland, with registration number SC005336. >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 5 12:48:04 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 5 Aug 2020 12:48:04 -0500 Subject: [petsc-users] Code (possibly) not running on GPU with CUDA In-Reply-To: References: Message-ID: <820A222F-B89E-4EF4-943C-A021857112E4@petsc.dev> Gordon, Do you have a call to DMSetFromOptions()? Barry > On Aug 5, 2020, at 10:24 AM, GIBB Gordon wrote: > > Hi, > > I?ve built PETSc with NVIDIA support for our GPU machine (https://cirrus.readthedocs.io/en/master/user-guide/gpu.html ), and then compiled our executable against this PETSc (using version 3.13.3). I should add that the MPI on our system is not GPU-aware so I have to use -use_gpu_aware_mpi 0 > > When running this, in the .petscrc I put > > -dm_vec_type cuda > -dm_mat_type aijcusparse > > as is suggested on the PETSc GPU page (https://www.mcs.anl.gov/petsc/features/gpus.html ) to enable CUDA for DMs (all our PETSc data structures are with DMs). I have also ensured I'm using the jacobi preconditioner so that it definitely runs on the GPU (again, according to the PETSc GPU page). 
> > When I run this, I note that the GPU seems to have memory allocated on it from my executable, however seems to be doing no computation: > > Wed Aug 5 13:10:23 2020 > +-----------------------------------------------------------------------------+ > | NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 | > |-------------------------------+----------------------+----------------------+ > | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | > | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | > |===============================+======================+======================| > | 0 Tesla V100-SXM2... On | 00000000:1A:00.0 Off | Off | > | N/A 43C P0 64W / 300W | 490MiB / 16160MiB | 0% Default | > +-------------------------------+----------------------+----------------------+ > > +-----------------------------------------------------------------------------+ > | Processes: GPU Memory | > | GPU PID Type Process name Usage | > |=============================================================================| > | 0 33712 C .../z04/gpsgibb/TPLS/TPLS-GPU/./twophase.x 479MiB | > +-----------------------------------------------------------------------------+ > > I then ran the same example but without the -dm_vec_type cuda, -dm_mat_type aijcusparse arguments, and I found the same behaviour (479MB allocated on the GPU, 0% GPU utilisation). > > In both cases the runtime of the example are near identical, suggesting that both are essentially the same run. > > As a further test I compiled PETSc without CUDA support and ran the same example again, and found the same runtime as with the GPUs, and (as expected) no GPU memory allocated. I then tried to run the example with the -dm_vec_type cuda, -dm_mat_type aijcusparse arguments and it ran without complaint. I would have expected it to throw an error or at least a warning if invalid arguments were passed to it. > > All this suggests to me that PETSc is ignoring my requests to use the GPUs. For the GPU-aware PETSc it seems to allocate memory on the GPUs but perform no calculations on them, regardless of whether I requested it to use the GPUs or not. On non-GPU-aware PETSc it accepts my requests to use the GPUs, but does not throw an error. > > What am I doing wrong? > > Thanks in advance, > > Gordon > ----------------------------------------------- > Dr Gordon P S Gibb > EPCC, The University of Edinburgh > Tel: +44 131 651 3459 > > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aph at email.arizona.edu Wed Aug 5 13:27:03 2020 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Wed, 5 Aug 2020 11:27:03 -0700 Subject: [petsc-users] [EXT]Re: reuse Mumps factorization for multiple RHS In-Reply-To: References: Message-ID: Awesome! thanks to all. 
Anthony On Tue, Aug 4, 2020 at 7:28 PM Zhang, Hong wrote: > *External Email* > See case '"-num_rhs" in petsc/src/ksp/ksp/tests/ex30.c > > Hong > > > > ------------------------------ > *From:* petsc-users on behalf of > Matthew Knepley > *Sent:* Tuesday, August 4, 2020 7:29 PM > *To:* Anthony Paul Haas > *Cc:* petsc-users > *Subject:* Re: [petsc-users] reuse Mumps factorization for multiple RHS > > On Tue, Aug 4, 2020 at 7:57 PM Anthony Paul Haas > wrote: > > Hello, > > When using Mumps to solve a linear system of equations (see below), can I > reuse the factorization to solve for multiple RHS, ie, can I use KSPSolve multiple > times while only building a different RHS in between the calls to KSPSolve > ? > > > Yes. It should detect that you have not changed the operator for the KSP > and thus not refactor the matrix. > > Thanks, > > Matt > > > Thanks, > > Anthony > > call KSPSetType(self%ksp,KSPPREONLY,self%ierr_ps) > > call KSPGetPC(self%ksp,pc,self%ierr_ps) > > > > > > call PCSetType(pc,PCLU,self%ierr_ps) > > > > > > call PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS,self%ierr_ps) > > call PCFactorSetUpMatSolverPackage(pc,self%ierr_ps) > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adantra at gmail.com Wed Aug 5 13:58:22 2020 From: adantra at gmail.com (Adolfo Rodriguez) Date: Wed, 5 Aug 2020 13:58:22 -0500 Subject: [petsc-users] Test convergence with non linear preconditioners Message-ID: I am trying to use non-linear preconditioners in a very similar way shown in the paper "Composing scalable nonlinear solvers" by Peter Brune and others. My question is simple, actually. I have provided a test convergence function, but it seems that the inner method does not take it into account and follows the default convergence criteria. Is there a way to specify a convergence function to the inner method? Appreciate any hints on this. Regards, Adolfo -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Aug 5 14:17:16 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Aug 2020 15:17:16 -0400 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: References: Message-ID: On Wed, Aug 5, 2020 at 2:59 PM Adolfo Rodriguez wrote: > I am trying to use non-linear preconditioners in a very similar way shown > in the paper "Composing scalable nonlinear solvers" by Peter Brune and > others. > > My question is simple, actually. I have provided a test convergence > function, but it seems that the inner method does not take it into account > and follows the default convergence criteria. Is there a way to specify a > convergence function to the inner method? > > Appreciate any hints on this. > This is a good question. Currently, we do not have an alternative to just pulling out the inner solver and setting it directly. I don't think the right thing is to have inner solvers follow outer solvers since frequently they have much different convergence requirements. Maybe we could provide a function name in an options and have it looked up dynamically. What do people think about that? 
Thanks, Matt > Regards, > > Adolfo > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Aug 5 14:18:51 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 05 Aug 2020 13:18:51 -0600 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: References: Message-ID: <878sesgauc.fsf@jedbrown.org> Adolfo Rodriguez writes: > I am trying to use non-linear preconditioners in a very similar way shown > in the paper "Composing scalable nonlinear solvers" by Peter Brune and > others. > > My question is simple, actually. I have provided a test convergence > function, but it seems that the inner method does not take it into account > and follows the default convergence criteria. Is there a way to specify a > convergence function to the inner method? Are you saying that you did SNESGetNPC(snes, &inner); SNESSetConvergenceTest(inner, YourFunc, ...); and yet the function is not being called? Can you send output of -snes_view? From adantra at gmail.com Wed Aug 5 15:07:30 2020 From: adantra at gmail.com (Adolfo Rodriguez) Date: Wed, 5 Aug 2020 15:07:30 -0500 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: <878sesgauc.fsf@jedbrown.org> References: <878sesgauc.fsf@jedbrown.org> Message-ID: Actually I can set the non-linear pc. My problem relates to the convergence test. I have not found a way to set the tolerances for the inner problem, are they hard coded? Thanks! Virus-free. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> On Wed, Aug 5, 2020 at 2:18 PM Jed Brown wrote: > Adolfo Rodriguez writes: > > > I am trying to use non-linear preconditioners in a very similar way shown > > in the paper "Composing scalable nonlinear solvers" by Peter Brune and > > others. > > > > My question is simple, actually. I have provided a test convergence > > function, but it seems that the inner method does not take it into > account > > and follows the default convergence criteria. Is there a way to specify a > > convergence function to the inner method? > > Are you saying that you did > > SNESGetNPC(snes, &inner); > SNESSetConvergenceTest(inner, YourFunc, ...); > > and yet the function is not being called? Can you send output of > -snes_view? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Aug 5 15:11:53 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 05 Aug 2020 14:11:53 -0600 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: References: <878sesgauc.fsf@jedbrown.org> Message-ID: <875z9wg8dy.fsf@jedbrown.org> Adolfo Rodriguez writes: > Actually I can set the non-linear pc. My problem relates to the convergence > test. I have not found a way to set the tolerances for the inner problem, > are they hard coded? I suggested code to set a custom convergence test. Nothing is hard-coded. What have you done? 
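For reference, a minimal sketch of the pattern suggested above -- attaching a user convergence test to the inner (nonlinear preconditioner) SNES obtained from SNESGetNPC(). The function name, the InnerCtx struct, and the tolerances here are illustrative placeholders, not from this thread; only the SNESGetNPC()/SNESSetConvergenceTest()/SNESSetTolerances() calls are the standard interface.

    #include <petscsnes.h>

    /* Illustrative test: declare the inner solve converged once the function norm
       drops below ftol, or stop it after maxit iterations. */
    typedef struct { PetscReal ftol; PetscInt maxit; } InnerCtx;

    static PetscErrorCode InnerConverged(SNES snes,PetscInt it,PetscReal xnorm,PetscReal gnorm,PetscReal fnorm,SNESConvergedReason *reason,void *ctx)
    {
      InnerCtx *c = (InnerCtx*)ctx;

      PetscFunctionBeginUser;
      *reason = SNES_CONVERGED_ITERATING;
      if (fnorm < c->ftol)     *reason = SNES_CONVERGED_FNORM_ABS;
      else if (it >= c->maxit) *reason = SNES_CONVERGED_ITS;
      PetscFunctionReturn(0);
    }

    static PetscErrorCode SetInnerConvergenceTest(SNES outer,InnerCtx *ctx)
    {
      SNES           inner;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = SNESGetNPC(outer,&inner);CHKERRQ(ierr);                        /* the nonlinear preconditioner is itself a SNES */
      ierr = SNESSetConvergenceTest(inner,InnerConverged,ctx,NULL);CHKERRQ(ierr);
      ierr = SNESSetTolerances(inner,1e-50,1e-3,1e-8,5,1000);CHKERRQ(ierr); /* abstol, rtol, stol, max iterations, max function evals */
      PetscFunctionReturn(0);
    }

The same limits can also be set from the options database using the inner solver's prefix (e.g. -npc_snes_rtol, -npc_snes_max_it), as discussed later in the thread.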
From adantra at gmail.com Wed Aug 5 15:30:27 2020 From: adantra at gmail.com (Adolfo Rodriguez) Date: Wed, 5 Aug 2020 15:30:27 -0500 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: <875z9wg8dy.fsf@jedbrown.org> References: <878sesgauc.fsf@jedbrown.org> <875z9wg8dy.fsf@jedbrown.org> Message-ID: Jed, I tred your suggestion SNESGetNPC(snes, &inner); SNESSetConvergenceTest(inner, YourFunc, ...); and it is working as expected. I had not pieced together the fact that "inner" is a "snes" object as well. Thanks! Virus-free. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> On Wed, Aug 5, 2020 at 3:11 PM Jed Brown wrote: > Adolfo Rodriguez writes: > > > Actually I can set the non-linear pc. My problem relates to the > convergence > > test. I have not found a way to set the tolerances for the inner problem, > > are they hard coded? > > I suggested code to set a custom convergence test. Nothing is > hard-coded. What have you done? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 5 15:41:51 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 5 Aug 2020 15:41:51 -0500 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: References: <878sesgauc.fsf@jedbrown.org> <875z9wg8dy.fsf@jedbrown.org> Message-ID: <939CE625-86EB-43CB-8F51-C521BDBC3BC3@petsc.dev> Adolfo, You can also just change the tolerances for the inner solve using the options data base and the prefix for the inner solve. When you run with -snes_view it will show the prefix for each of the (nested) solvers. You can also run with -help to get all the possible options for the inner solvers. In this case I think the prefix is npc so you can set tolerances with -npc_snes_rtol -npc_ksp_rtol etc. From the program you can use SNESGetNPC() and then call SNESSetXXX() to set options, for the linear solver (if there is one) call SNESGetKSP() on the npc and then set options on that KSP. Barry > On Aug 5, 2020, at 3:30 PM, Adolfo Rodriguez wrote: > > Jed, > > I tred your suggestion > > SNESGetNPC(snes, &inner); > SNESSetConvergenceTest(inner, YourFunc, ...); > > and it is working as expected. I had not pieced together the fact that "inner" is a "snes" object as well. > > Thanks! > > > > Virus-free. www.avast.com > On Wed, Aug 5, 2020 at 3:11 PM Jed Brown > wrote: > Adolfo Rodriguez > writes: > > > Actually I can set the non-linear pc. My problem relates to the convergence > > test. I have not found a way to set the tolerances for the inner problem, > > are they hard coded? > > I suggested code to set a custom convergence test. Nothing is hard-coded. What have you done? -------------- next part -------------- An HTML attachment was scrubbed... URL: From adantra at gmail.com Wed Aug 5 20:10:48 2020 From: adantra at gmail.com (Adolfo Rodriguez) Date: Wed, 5 Aug 2020 20:10:48 -0500 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: <939CE625-86EB-43CB-8F51-C521BDBC3BC3@petsc.dev> References: <878sesgauc.fsf@jedbrown.org> <875z9wg8dy.fsf@jedbrown.org> <939CE625-86EB-43CB-8F51-C521BDBC3BC3@petsc.dev> Message-ID: It looks like I cannot really change the test function or anything else for this particular SNES solver (I am using SNESFas). Basically, I am trying to use the ideas exposed in the paper on Composing scalable solvers but it seems that SNESFas does not allow to change the function for testing convergence, or anything else. Is this correct? Adolfo Virus-free. 
www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> On Wed, Aug 5, 2020 at 3:41 PM Barry Smith wrote: > > Adolfo, > > You can also just change the tolerances for the inner solve using the > options data base and the prefix for the inner solve. > > When you run with -snes_view it will show the prefix for each of the > (nested) solvers. You can also run with -help to get all the possible > options for the inner solvers. > > In this case I think the prefix is npc so you can set tolerances with > -npc_snes_rtol -npc_ksp_rtol etc. From the program you can > use SNESGetNPC() and then call SNESSetXXX() to set options, for the linear > solver (if there is one) call SNESGetKSP() on the npc and then set options > on that KSP. > > > Barry > > > On Aug 5, 2020, at 3:30 PM, Adolfo Rodriguez wrote: > > Jed, > > I tred your suggestion > > SNESGetNPC(snes, &inner); > SNESSetConvergenceTest(inner, YourFunc, ...); > > and it is working as expected. I had not pieced together the fact that > "inner" is a "snes" object as well. > > Thanks! > > > > > Virus-free. > www.avast.com > > > On Wed, Aug 5, 2020 at 3:11 PM Jed Brown wrote: > >> Adolfo Rodriguez writes: >> >> > Actually I can set the non-linear pc. My problem relates to the >> convergence >> > test. I have not found a way to set the tolerances for the inner >> problem, >> > are they hard coded? >> >> I suggested code to set a custom convergence test. Nothing is >> hard-coded. What have you done? >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 5 20:31:12 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 5 Aug 2020 20:31:12 -0500 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: References: <878sesgauc.fsf@jedbrown.org> <875z9wg8dy.fsf@jedbrown.org> <939CE625-86EB-43CB-8F51-C521BDBC3BC3@petsc.dev> Message-ID: Turtles on top of turtles on top of turtles. It is probably easiest for you to look at the actual code to see how it handles things 1) the SNESFAS uses SNES for each of the levels, for each of these level SNES you can control the convergence criteria (either from the command lineor with appropriate prefix (highly recommended) or with function calls (not recommended)); and even provide your own convergence functions run with -snes_view to see the various solvers and their prefixes). 2) at the finest level of SNESFAS it does call /* Test for convergence */ if (isFine) { ierr = (*snes->ops->converged)(snes,snes->iter,0.0,0.0,snes->norm,&snes->reason,snes->cnvP);CHKERRQ(ierr); if (snes->reason) break; } src/snes/impls/fas/fas.c line 881 so at least in theory you can provide your own convergence test function. It was certainly our intention that users can control all the convergence knobs for arbitrary imbedded nonlinear solvers including FAS but, of course, there may be bugs so let us know what doesn't work. Generally the model for FAS is to run a single (or small number of) iteration(s) on the level solves and so not directly use convergence tolerances like rtol to control the number of iterations on a level but you should be able to set any criteria you want. You should be able to run with -snes_view and change some of the criteria on the command line and see the changes presented in the -snes_view output, plus see differences in convergence behavior. Barry > On Aug 5, 2020, at 8:10 PM, Adolfo Rodriguez wrote: > > It looks like I cannot really change the test function or anything else for this particular SNES solver (I am using SNESFas). 
Basically, I am trying to use the ideas exposed in the paper on Composing scalable solvers but it seems that SNESFas does not allow to change the function for testing convergence, or anything else. Is this correct? > > Adolfo > > Virus-free. www.avast.com > On Wed, Aug 5, 2020 at 3:41 PM Barry Smith > wrote: > > Adolfo, > > You can also just change the tolerances for the inner solve using the options data base and the prefix for the inner solve. > > When you run with -snes_view it will show the prefix for each of the (nested) solvers. You can also run with -help to get all the possible options for the inner solvers. > > In this case I think the prefix is npc so you can set tolerances with -npc_snes_rtol -npc_ksp_rtol etc. From the program you can use SNESGetNPC() and then call SNESSetXXX() to set options, for the linear solver (if there is one) call SNESGetKSP() on the npc and then set options on that KSP. > > > Barry > > >> On Aug 5, 2020, at 3:30 PM, Adolfo Rodriguez > wrote: >> >> Jed, >> >> I tred your suggestion >> >> SNESGetNPC(snes, &inner); >> SNESSetConvergenceTest(inner, YourFunc, ...); >> >> and it is working as expected. I had not pieced together the fact that "inner" is a "snes" object as well. >> >> Thanks! >> >> >> >> Virus-free. www.avast.com <> >> On Wed, Aug 5, 2020 at 3:11 PM Jed Brown > wrote: >> Adolfo Rodriguez > writes: >> >> > Actually I can set the non-linear pc. My problem relates to the convergence >> > test. I have not found a way to set the tolerances for the inner problem, >> > are they hard coded? >> >> I suggested code to set a custom convergence test. Nothing is hard-coded. What have you done? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.gibb at epcc.ed.ac.uk Thu Aug 6 06:09:43 2020 From: g.gibb at epcc.ed.ac.uk (GIBB Gordon) Date: Thu, 6 Aug 2020 11:09:43 +0000 Subject: [petsc-users] Code (possibly) not running on GPU with CUDA In-Reply-To: <820A222F-B89E-4EF4-943C-A021857112E4@petsc.dev> References: <820A222F-B89E-4EF4-943C-A021857112E4@petsc.dev> Message-ID: <25F47C0C-CAE5-4DB6-BBAB-642C1C41CF21@epcc.ed.ac.uk> I do not have DMSetFromOptions in the code, so this may well be the issue ----------------------------------------------- Dr Gordon P S Gibb EPCC, The University of Edinburgh Tel: +44 131 651 3459 On 5 Aug 2020, at 18:48, Barry Smith > wrote: Gordon, Do you have a call to DMSetFromOptions()? Barry On Aug 5, 2020, at 10:24 AM, GIBB Gordon > wrote: Hi, I?ve built PETSc with NVIDIA support for our GPU machine (https://cirrus.readthedocs.io/en/master/user-guide/gpu.html), and then compiled our executable against this PETSc (using version 3.13.3). I should add that the MPI on our system is not GPU-aware so I have to use -use_gpu_aware_mpi 0 When running this, in the .petscrc I put -dm_vec_type cuda -dm_mat_type aijcusparse as is suggested on the PETSc GPU page (https://www.mcs.anl.gov/petsc/features/gpus.html) to enable CUDA for DMs (all our PETSc data structures are with DMs). I have also ensured I'm using the jacobi preconditioner so that it definitely runs on the GPU (again, according to the PETSc GPU page). 
When I run this, I note that the GPU seems to have memory allocated on it from my executable, however seems to be doing no computation: Wed Aug 5 13:10:23 2020 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 Tesla V100-SXM2... On | 00000000:1A:00.0 Off | Off | | N/A 43C P0 64W / 300W | 490MiB / 16160MiB | 0% Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 33712 C .../z04/gpsgibb/TPLS/TPLS-GPU/./twophase.x 479MiB | +-----------------------------------------------------------------------------+ I then ran the same example but without the -dm_vec_type cuda, -dm_mat_type aijcusparse arguments, and I found the same behaviour (479MB allocated on the GPU, 0% GPU utilisation). In both cases the runtime of the example are near identical, suggesting that both are essentially the same run. As a further test I compiled PETSc without CUDA support and ran the same example again, and found the same runtime as with the GPUs, and (as expected) no GPU memory allocated. I then tried to run the example with the -dm_vec_type cuda, -dm_mat_type aijcusparse arguments and it ran without complaint. I would have expected it to throw an error or at least a warning if invalid arguments were passed to it. All this suggests to me that PETSc is ignoring my requests to use the GPUs. For the GPU-aware PETSc it seems to allocate memory on the GPUs but perform no calculations on them, regardless of whether I requested it to use the GPUs or not. On non-GPU-aware PETSc it accepts my requests to use the GPUs, but does not throw an error. What am I doing wrong? Thanks in advance, Gordon ----------------------------------------------- Dr Gordon P S Gibb EPCC, The University of Edinburgh Tel: +44 131 651 3459 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Thu Aug 6 19:32:59 2020 From: nb25 at rice.edu (Nidish) Date: Thu, 6 Aug 2020 19:32:59 -0500 Subject: [petsc-users] Best practices for solving Dense Linear systems Message-ID: <7f208bce-6c5f-48ed-db38-4f3226dbf1f4@rice.edu> I'm relatively new to PETSc, and my applications involve (for the most part) dense matrix solves. I read in the documentation that this is an area PETSc does not specialize in but instead recommends external libraries such as Elemental. I'm wondering if there are any "best" practices in this regard. Some questions I'd like answered are: 1. Can I just declare my dense matrix as a sparse one and fill the whole matrix up? Do any of the others go this route? What're possible pitfalls/unfavorable outcomes for this? I understand the memory overhead probably shoots up. 2. Are there any specific guidelines on when I can expect elemental to perform better in parallel than in serial? 
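For illustration of question 1, the dense alternative in PETSc looks roughly like the sketch below: create the matrix as MATDENSE (or, with an Elemental-enabled build, let -mat_type elemental take effect through MatSetFromOptions()) rather than declaring it sparse and filling it. The size N, the placeholder entries, and the helper name BuildDenseExample are assumptions for the example only.

    #include <petscmat.h>

    /* Sketch: assemble an N x N parallel dense matrix; entries are placeholders. */
    static PetscErrorCode BuildDenseExample(PetscInt N,Mat *Aout)
    {
      Mat            A;
      PetscInt       i,j,rstart,rend;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
      ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,N,N);CHKERRQ(ierr);
      ierr = MatSetType(A,MATDENSE);CHKERRQ(ierr);      /* parallel dense storage */
      ierr = MatSetFromOptions(A);CHKERRQ(ierr);        /* allows e.g. -mat_type elemental at runtime */
      ierr = MatSetUp(A);CHKERRQ(ierr);
      ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
      for (i = rstart; i < rend; i++) {                 /* fill the locally owned rows (MATDENSE row layout) */
        for (j = 0; j < N; j++) {
          ierr = MatSetValue(A,i,j,1.0/(i+j+1),INSERT_VALUES);CHKERRQ(ierr);
        }
      }
      ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      *Aout = A;
      PetscFunctionReturn(0);
    }

The reply further down addresses question 2, i.e. when the parallel dense path can be expected to pay off relative to a serial solve.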
Of course, I'm interesting in any other details that may be important in this regard. Thank you, Nidish From adantra at gmail.com Thu Aug 6 21:07:15 2020 From: adantra at gmail.com (Adolfo Rodriguez) Date: Thu, 6 Aug 2020 21:07:15 -0500 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: References: <878sesgauc.fsf@jedbrown.org> <875z9wg8dy.fsf@jedbrown.org> <939CE625-86EB-43CB-8F51-C521BDBC3BC3@petsc.dev> Message-ID: Considering the output produced by snes_view (attachment), would be possible to change the linear solver tolerances and the preconditioning level or type? Adolfo Virus-free. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> On Wed, Aug 5, 2020 at 8:31 PM Barry Smith wrote: > > Turtles on top of turtles on top of turtles. > > It is probably easiest for you to look at the actual code to see how it > handles things > > 1) the SNESFAS uses SNES for each of the levels, for each of these level > SNES you can control the convergence criteria (either from the command > lineor with appropriate prefix (highly recommended) or with function calls > (not recommended)); and even provide your own convergence functions run > with -snes_view to see the various solvers and their prefixes). > > 2) at the finest level of SNESFAS it does call > > /* Test for convergence */ > if (isFine) { > ierr = > (*snes->ops->converged)(snes,snes->iter,0.0,0.0,snes->norm,&snes->reason,snes->cnvP);CHKERRQ(ierr); > if (snes->reason) break; > } > > src/snes/impls/fas/fas.c line 881 so at least in theory you can provide > your own convergence test function. > > It was certainly our intention that users can control all the > convergence knobs for arbitrary imbedded nonlinear solvers including FAS > but, of course, there may be bugs so let us know what doesn't work. > > Generally the model for FAS is to run a single (or small number of) > iteration(s) on the level solves and so not directly use convergence > tolerances like rtol to control the number of iterations on a level but you > should be able to set any criteria you want. > > You should be able to run with -snes_view and change some of the > criteria on the command line and see the changes presented in the > -snes_view output, plus see differences in convergence behavior. > > > > Barry > > > > On Aug 5, 2020, at 8:10 PM, Adolfo Rodriguez wrote: > > It looks like I cannot really change the test function or anything else > for this particular SNES solver (I am using SNESFas). Basically, I am > trying to use the ideas exposed in the paper on Composing scalable solvers > but it seems that SNESFas does not allow to change the function for testing > convergence, or anything else. Is this correct? > > Adolfo > > > Virus-free. > www.avast.com > > > On Wed, Aug 5, 2020 at 3:41 PM Barry Smith wrote: > >> >> Adolfo, >> >> You can also just change the tolerances for the inner solve using >> the options data base and the prefix for the inner solve. >> >> When you run with -snes_view it will show the prefix for each of the >> (nested) solvers. You can also run with -help to get all the possible >> options for the inner solvers. >> >> In this case I think the prefix is npc so you can set tolerances >> with -npc_snes_rtol -npc_ksp_rtol etc. From the program you >> can use SNESGetNPC() and then call SNESSetXXX() to set options, for the >> linear solver (if there is one) call SNESGetKSP() on the npc and then set >> options on that KSP. 
>> >> >> Barry >> >> >> On Aug 5, 2020, at 3:30 PM, Adolfo Rodriguez wrote: >> >> Jed, >> >> I tred your suggestion >> >> SNESGetNPC(snes, &inner); >> SNESSetConvergenceTest(inner, YourFunc, ...); >> >> and it is working as expected. I had not pieced together the fact that >> "inner" is a "snes" object as well. >> >> Thanks! >> >> >> >> >> Virus-free. >> www.avast.com >> >> >> On Wed, Aug 5, 2020 at 3:11 PM Jed Brown wrote: >> >>> Adolfo Rodriguez writes: >>> >>> > Actually I can set the non-linear pc. My problem relates to the >>> convergence >>> > test. I have not found a way to set the tolerances for the inner >>> problem, >>> > are they hard coded? >>> >>> I suggested code to set a custom convergence test. Nothing is >>> hard-coded. What have you done? >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- SNES Object: 1 MPI processes type: anderson Number of stored past updates: 30 Residual selection: gammaA=2e+00, gammaC=2e+00 Difference restart: epsilonB=1e-01, deltaB=9e-01 Restart on F_M residual increase: FALSE maximum iterations=10000, maximum function evaluations=30000 tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 total number of function evaluations=2 norm schedule ALWAYS SNESLineSearch Object: 1 MPI processes type: basic maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=1 SNES Object: (npc_) 1 MPI processes type: fas type is MULTIPLICATIVE, levels=1, cycles=1 Not using Galerkin computed coarse grid function evaluation Coarse grid solver -- level 0 ------------------------------- SNES Object: (npc_fas_coarse_) 1 MPI processes type: newtonls maximum iterations=50, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 total number of linear solver iterations=43 total number of function evaluations=3 norm schedule FINALONLY SNESLineSearch Object: (npc_fas_coarse_) 1 MPI processes type: bt interpolation: cubic alpha=1.000000e-04 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: (npc_fas_coarse_) 1 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (npc_fas_coarse_) 1 MPI processes type: ilu out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=786, cols=786 package used to perform factorization: petsc total: nonzeros=6060, allocated nonzeros=6060 total number of mallocs used during MatSetValues calls=0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=786, cols=786 total: nonzeros=6060, allocated nonzeros=6060 total number of mallocs used during MatSetValues calls=0 not using I-node routines maximum iterations=1, maximum function evaluations=30000 tolerances: relative=0., absolute=0., solution=0. 
total number of function evaluations=0 norm schedule FINALONLY From knepley at gmail.com Thu Aug 6 21:23:14 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 Aug 2020 22:23:14 -0400 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: References: <878sesgauc.fsf@jedbrown.org> <875z9wg8dy.fsf@jedbrown.org> <939CE625-86EB-43CB-8F51-C521BDBC3BC3@petsc.dev> Message-ID: On Thu, Aug 6, 2020 at 10:08 PM Adolfo Rodriguez wrote: > Considering the output produced by snes_view (attachment), would be > possible to change the linear solver tolerances and the preconditioning > level or type? > Yes. The options prefix is shown for each subsolver. For example, you can change the linear solver type for the coarse level of FAS using -npc_fas_coarse_ksp_type bigcg Notice that you are using 1 level of FAS, so its the same as just Newton. Thanks, Matt > Adolfo > > > Virus-free. > www.avast.com > > <#m_9065914518290907664_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > > On Wed, Aug 5, 2020 at 8:31 PM Barry Smith wrote: > >> >> Turtles on top of turtles on top of turtles. >> >> It is probably easiest for you to look at the actual code to see how >> it handles things >> >> 1) the SNESFAS uses SNES for each of the levels, for each of these >> level SNES you can control the convergence criteria (either from the >> command lineor with appropriate prefix (highly recommended) or with >> function calls (not recommended)); and even provide your own convergence >> functions run with -snes_view to see the various solvers and their >> prefixes). >> >> 2) at the finest level of SNESFAS it does call >> >> /* Test for convergence */ >> if (isFine) { >> ierr = >> (*snes->ops->converged)(snes,snes->iter,0.0,0.0,snes->norm,&snes->reason,snes->cnvP);CHKERRQ(ierr); >> if (snes->reason) break; >> } >> >> src/snes/impls/fas/fas.c line 881 so at least in theory you can provide >> your own convergence test function. >> >> It was certainly our intention that users can control all the >> convergence knobs for arbitrary imbedded nonlinear solvers including FAS >> but, of course, there may be bugs so let us know what doesn't work. >> >> Generally the model for FAS is to run a single (or small number of) >> iteration(s) on the level solves and so not directly use convergence >> tolerances like rtol to control the number of iterations on a level but you >> should be able to set any criteria you want. >> >> You should be able to run with -snes_view and change some of the >> criteria on the command line and see the changes presented in the >> -snes_view output, plus see differences in convergence behavior. >> >> >> >> Barry >> >> >> >> On Aug 5, 2020, at 8:10 PM, Adolfo Rodriguez wrote: >> >> It looks like I cannot really change the test function or anything else >> for this particular SNES solver (I am using SNESFas). Basically, I am >> trying to use the ideas exposed in the paper on Composing scalable solvers >> but it seems that SNESFas does not allow to change the function for testing >> convergence, or anything else. Is this correct? >> >> Adolfo >> >> >> Virus-free. >> www.avast.com >> >> >> On Wed, Aug 5, 2020 at 3:41 PM Barry Smith wrote: >> >>> >>> Adolfo, >>> >>> You can also just change the tolerances for the inner solve using >>> the options data base and the prefix for the inner solve. >>> >>> When you run with -snes_view it will show the prefix for each of >>> the (nested) solvers. You can also run with -help to get all the possible >>> options for the inner solvers. 
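To make the prefix composition concrete, the prefixes reported in the -snes_view output above combine with the usual option names roughly as follows (the values are only illustrations; run with -help to see the exact names for a given solver stack):

   -npc_snes_max_it 1                  options for the inner nonlinear preconditioner (the SNES with prefix npc_)
   -npc_fas_coarse_snes_rtol 1e-6      the Newton solve on the FAS coarse level
   -npc_fas_coarse_ksp_rtol 1e-4       the GMRES solve inside that Newton solve
   -npc_fas_coarse_pc_type ilu         the preconditioner for that GMRES solve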
>>> >>> In this case I think the prefix is npc so you can set tolerances >>> with -npc_snes_rtol -npc_ksp_rtol etc. From the program you >>> can use SNESGetNPC() and then call SNESSetXXX() to set options, for the >>> linear solver (if there is one) call SNESGetKSP() on the npc and then set >>> options on that KSP. >>> >>> >>> Barry >>> >>> >>> On Aug 5, 2020, at 3:30 PM, Adolfo Rodriguez wrote: >>> >>> Jed, >>> >>> I tred your suggestion >>> >>> SNESGetNPC(snes, &inner); >>> SNESSetConvergenceTest(inner, YourFunc, ...); >>> >>> and it is working as expected. I had not pieced together the fact that >>> "inner" is a "snes" object as well. >>> >>> Thanks! >>> >>> >>> >>> >>> Virus-free. >>> www.avast.com >>> >>> >>> On Wed, Aug 5, 2020 at 3:11 PM Jed Brown wrote: >>> >>>> Adolfo Rodriguez writes: >>>> >>>> > Actually I can set the non-linear pc. My problem relates to the >>>> convergence >>>> > test. I have not found a way to set the tolerances for the inner >>>> problem, >>>> > are they hard coded? >>>> >>>> I suggested code to set a custom convergence test. Nothing is >>>> hard-coded. What have you done? >>>> >>> >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 6 23:01:21 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 6 Aug 2020 23:01:21 -0500 Subject: [petsc-users] Best practices for solving Dense Linear systems In-Reply-To: <7f208bce-6c5f-48ed-db38-4f3226dbf1f4@rice.edu> References: <7f208bce-6c5f-48ed-db38-4f3226dbf1f4@rice.edu> Message-ID: <245E77E0-3166-45B2-BAB3-C100D9A0420C@petsc.dev> > On Aug 6, 2020, at 7:32 PM, Nidish wrote: > > I'm relatively new to PETSc, and my applications involve (for the most part) dense matrix solves. > > I read in the documentation that this is an area PETSc does not specialize in but instead recommends external libraries such as Elemental. I'm wondering if there are any "best" practices in this regard. Some questions I'd like answered are: > > 1. Can I just declare my dense matrix as a sparse one and fill the whole matrix up? Do any of the others go this route? What're possible pitfalls/unfavorable outcomes for this? I understand the memory overhead probably shoots up. No, this isn't practical, the performance will be terrible. > 2. Are there any specific guidelines on when I can expect elemental to perform better in parallel than in serial? Because the computation to communication ratio for dense matrices is higher than for sparse you will see better parallel performance for dense problems of a given size than sparse problems of a similar size. In other words parallelism can help for dense matrices for relatively small problems, of course the specifics of your machine hardware and software also play a role. Barry > > Of course, I'm interesting in any other details that may be important in this regard. > > Thank you, > Nidish From nb25 at rice.edu Fri Aug 7 00:30:18 2020 From: nb25 at rice.edu (Nidish) Date: Fri, 07 Aug 2020 00:30:18 -0500 Subject: [petsc-users] Best practices for solving Dense Linear systems In-Reply-To: <245E77E0-3166-45B2-BAB3-C100D9A0420C@petsc.dev> References: <7f208bce-6c5f-48ed-db38-4f3226dbf1f4@rice.edu> <245E77E0-3166-45B2-BAB3-C100D9A0420C@petsc.dev> Message-ID: <1e028a97-5dc2-4dce-ad6f-bff3238a8161@rice.edu> Thank you for the response. 
I've just been running some tests with matrices up to 2e4 dimensions (dense). When I compared the solution times for "-mat_type elemental" and "-mat_type mpiaij" running with 4 cores, I found the mpidense versions running way faster than elemental. I have not been able to make the elemental version finish up for 2e4 so far (my patience runs out faster). What's going on here? I thought elemental was supposed to be superior for dense matrices. I can share the code if that's appropriate for this forum (sorry, I'm new here). Nidish On Aug 6, 2020, 23:01, at 23:01, Barry Smith wrote: > > >> On Aug 6, 2020, at 7:32 PM, Nidish wrote: >> >> I'm relatively new to PETSc, and my applications involve (for the >most part) dense matrix solves. >> >> I read in the documentation that this is an area PETSc does not >specialize in but instead recommends external libraries such as >Elemental. I'm wondering if there are any "best" practices in this >regard. Some questions I'd like answered are: >> >> 1. Can I just declare my dense matrix as a sparse one and fill the >whole matrix up? Do any of the others go this route? What're possible >pitfalls/unfavorable outcomes for this? I understand the memory >overhead probably shoots up. > > No, this isn't practical, the performance will be terrible. > >> 2. Are there any specific guidelines on when I can expect elemental >to perform better in parallel than in serial? > >Because the computation to communication ratio for dense matrices is >higher than for sparse you will see better parallel performance for >dense problems of a given size than sparse problems of a similar size. >In other words parallelism can help for dense matrices for relatively >small problems, of course the specifics of your machine hardware and >software also play a role. > > Barry > >> >> Of course, I'm interesting in any other details that may be important >in this regard. >> >> Thank you, >> Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 7 00:50:48 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 7 Aug 2020 00:50:48 -0500 Subject: [petsc-users] Best practices for solving Dense Linear systems In-Reply-To: <1e028a97-5dc2-4dce-ad6f-bff3238a8161@rice.edu> References: <7f208bce-6c5f-48ed-db38-4f3226dbf1f4@rice.edu> <245E77E0-3166-45B2-BAB3-C100D9A0420C@petsc.dev> <1e028a97-5dc2-4dce-ad6f-bff3238a8161@rice.edu> Message-ID: <85F9F817-2754-4F55-9222-3E23003E79FD@petsc.dev> What is the output of -ksp_view for the two case? It is not only the matrix format but also the matrix solver that matters. For example if you are using an iterative solver the elemental format won't be faster, you should use the PETSc MPIDENSE format. The elemental format is really intended when you use a direct LU solver for the matrix. For tiny matrices like this an iterative solver could easily be faster than the direct solver, it depends on the conditioning (eigenstructure) of the dense matrix. Also the default PETSc solver uses block Jacobi with ILU on each process if using a sparse format, ILU applied to a dense matrix is actually LU so your solver is probably different also between the MPIAIJ and the elemental. Barry > On Aug 7, 2020, at 12:30 AM, Nidish wrote: > > Thank you for the response. > > I've just been running some tests with matrices up to 2e4 dimensions (dense). When I compared the solution times for "-mat_type elemental" and "-mat_type mpiaij" running with 4 cores, I found the mpidense versions running way faster than elemental. 
I have not been able to make the elemental version finish up for 2e4 so far (my patience runs out faster). > > What's going on here? I thought elemental was supposed to be superior for dense matrices. > > I can share the code if that's appropriate for this forum (sorry, I'm new here). > > Nidish > On Aug 6, 2020, at 23:01, Barry Smith > wrote: > > > On Aug 6, 2020, at 7:32 PM, Nidish wrote: > > I'm relatively new to PETSc, and my applications involve (for the most part) dense matrix solves. > > I read in the documentation that this is an area PETSc does not specialize in but instead recommends external libraries such as Elemental. I'm wondering if there are any "best" practices in this regard. Some questions I'd like answered are: > > 1. Can I just declare my dense matrix as a sparse one and fill the whole matrix up? Do any of the others go this route? What're possible pitfalls/unfavorable outcomes for this? I understand the memory overhead probably shoots up. > > No, this isn't practical, the performance will be terrible. > > 2. Are there any specific guidelines on when I can expect elemental to perform better in parallel than in serial? > > Because the computation to communication ratio for dense matrices is higher than for sparse you will see better parallel performance for dense problems of a given size than sparse problems of a similar size. In other words parallelism can help for dense matrices for relatively small problems, of course the specifics of your machine hardware and software also play a role. > > Barry > > > Of course, I'm interesting in any other details that may be important in this regard. > > Thank you, > Nidish > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Fri Aug 7 01:25:31 2020 From: nb25 at rice.edu (Nidish) Date: Fri, 7 Aug 2020 01:25:31 -0500 Subject: [petsc-users] Best practices for solving Dense Linear systems In-Reply-To: <85F9F817-2754-4F55-9222-3E23003E79FD@petsc.dev> References: <7f208bce-6c5f-48ed-db38-4f3226dbf1f4@rice.edu> <245E77E0-3166-45B2-BAB3-C100D9A0420C@petsc.dev> <1e028a97-5dc2-4dce-ad6f-bff3238a8161@rice.edu> <85F9F817-2754-4F55-9222-3E23003E79FD@petsc.dev> Message-ID: Indeed - I was just using the default solver (GMRES with ILU). Using just standard LU (direct solve with "-pc_type lu -ksp_type preonly"), I find elemental to be extremely slow even for a 1000x1000 matrix. For MPIaij it's throwing me an error if I tried "-pc_type lu". I'm attaching the code here, in case you'd like to have a look at what I've been trying to do. The two configurations of interest are, $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental (for the GMRES with ILU) and, $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij -pc_type lu -ksp_type preonly $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental -pc_type lu -ksp_type preonly elemental seems to perform poorly in both cases. Nidish On 8/7/20 12:50 AM, Barry Smith wrote: > > ? What is the output of -ksp_view ?for the two case? > > ? It is not only the matrix format but also the matrix solver that > matters. For example if you are using an iterative solver the > elemental format won't be faster, you should use the PETSc MPIDENSE > format. The elemental format is really intended when you use a direct > LU solver for the matrix. For tiny matrices like this an iterative > solver could easily be faster than the direct solver, it depends on > the conditioning (eigenstructure) of the dense matrix. 
Also the > default PETSc solver uses block Jacobi with ILU on each process if > using a sparse format, ILU applied to a dense matrix is actually LU so > your solver is probably different also between the MPIAIJ and the > elemental. > > ? Barry > > > > >> On Aug 7, 2020, at 12:30 AM, Nidish > > wrote: >> >> Thank you for the response. >> >> I've just been running some tests with matrices up to 2e4 dimensions >> (dense). When I compared the solution times for "-mat_type elemental" >> and "-mat_type mpiaij" running with 4 cores, I found the mpidense >> versions running way faster than elemental. I have not been able to >> make the elemental version finish up for 2e4 so far (my patience runs >> out faster). >> >> What's going on here? I thought elemental was supposed to be superior >> for dense matrices. >> >> I can share the code if that's appropriate for this forum (sorry, I'm >> new here). >> >> Nidish >> On Aug 6, 2020, at 23:01, Barry Smith > > wrote: >> >> >> On Aug 6, 2020, at 7:32 PM, Nidish > > wrote: I'm relatively new to PETSc, >> and my applications involve (for the most part) dense matrix >> solves. I read in the documentation that this is an area >> PETSc does not specialize in but instead recommends external >> libraries such as Elemental. I'm wondering if there are any >> "best" practices in this regard. Some questions I'd like >> answered are: 1. Can I just declare my dense matrix as a >> sparse one and fill the whole matrix up? Do any of the others >> go this route? What're possible pitfalls/unfavorable outcomes >> for this? I understand the memory overhead probably shoots up. >> >> >> No, this isn't practical, the performance will be terrible. >> >> 2. Are there any specific guidelines on when I can expect >> elemental to perform better in parallel than in serial? >> >> >> Because the computation to communication ratio for dense matrices is higher than for sparse you will see better parallel performance for dense problems of a given size than sparse problems of a similar size. In other words parallelism can help for dense matrices for relatively small problems, of course the specifics of your machine hardware and software also play a role. >> >> Barry >> >> Of course, I'm interesting in any other details that may be >> important in this regard. Thank you, Nidish >> >> > -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ksps.cpp Type: text/x-c++src Size: 3025 bytes Desc: not available URL: From bsmith at petsc.dev Fri Aug 7 08:52:22 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 7 Aug 2020 08:52:22 -0500 Subject: [petsc-users] Best practices for solving Dense Linear systems In-Reply-To: References: <7f208bce-6c5f-48ed-db38-4f3226dbf1f4@rice.edu> <245E77E0-3166-45B2-BAB3-C100D9A0420C@petsc.dev> <1e028a97-5dc2-4dce-ad6f-bff3238a8161@rice.edu> <85F9F817-2754-4F55-9222-3E23003E79FD@petsc.dev> Message-ID: <6A7D902E-FACE-4778-B89A-90B043ED31C0@petsc.dev> > On Aug 7, 2020, at 1:25 AM, Nidish wrote: > > Indeed - I was just using the default solver (GMRES with ILU). > > Using just standard LU (direct solve with "-pc_type lu -ksp_type preonly"), I find elemental to be extremely slow even for a 1000x1000 matrix. > What about on one process? Elemental generally won't be competitive for such tiny matrices. > For MPIaij it's throwing me an error if I tried "-pc_type lu". > Yes, there is no PETSc code for sparse parallel direct solver, this is expected. 
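A side note on the "-pc_type lu" error with MPIAIJ reported above: while PETSc has no built-in parallel sparse LU, it can hand the factorization to an external package. Assuming PETSc was configured with, for example, --download-mumps (and its dependencies), a parallel sparse direct solve can be requested with

   -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type mumps

For parallel dense direct solves the analogous route is the elemental matrix type already being tested in this thread.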
What about ? > mpirun -n 1 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi > > mpirun -n 4 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi Where will your dense matrices be coming from and how big will they be in practice? This will help determine if an iterative solver is appropriate. If they will be 100,000 for example then testing with 1000 will tell you nothing useful, you need to test with the problem size you care about. Barry > I'm attaching the code here, in case you'd like to have a look at what I've been trying to do. > > The two configurations of interest are, > > $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij > $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental > > (for the GMRES with ILU) and, > > $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij -pc_type lu -ksp_type preonly > $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental -pc_type lu -ksp_type preonly > > elemental seems to perform poorly in both cases. > > Nidish > > On 8/7/20 12:50 AM, Barry Smith wrote: >> >> What is the output of -ksp_view for the two case? >> >> It is not only the matrix format but also the matrix solver that matters. For example if you are using an iterative solver the elemental format won't be faster, you should use the PETSc MPIDENSE format. The elemental format is really intended when you use a direct LU solver for the matrix. For tiny matrices like this an iterative solver could easily be faster than the direct solver, it depends on the conditioning (eigenstructure) of the dense matrix. Also the default PETSc solver uses block Jacobi with ILU on each process if using a sparse format, ILU applied to a dense matrix is actually LU so your solver is probably different also between the MPIAIJ and the elemental. >> >> Barry >> >> >> >> >>> On Aug 7, 2020, at 12:30 AM, Nidish > wrote: >>> >>> Thank you for the response. >>> >>> I've just been running some tests with matrices up to 2e4 dimensions (dense). When I compared the solution times for "-mat_type elemental" and "-mat_type mpiaij" running with 4 cores, I found the mpidense versions running way faster than elemental. I have not been able to make the elemental version finish up for 2e4 so far (my patience runs out faster). >>> >>> What's going on here? I thought elemental was supposed to be superior for dense matrices. >>> >>> I can share the code if that's appropriate for this forum (sorry, I'm new here). >>> >>> Nidish >>> On Aug 6, 2020, at 23:01, Barry Smith > wrote: >>> >>> On Aug 6, 2020, at 7:32 PM, Nidish > wrote: >>> >>> I'm relatively new to PETSc, and my applications involve (for the most part) dense matrix solves. >>> >>> I read in the documentation that this is an area PETSc does not specialize in but instead recommends external libraries such as Elemental. I'm wondering if there are any "best" practices in this regard. Some questions I'd like answered are: >>> >>> 1. Can I just declare my dense matrix as a sparse one and fill the whole matrix up? Do any of the others go this route? What're possible pitfalls/unfavorable outcomes for this? I understand the memory overhead probably shoots up. >>> >>> No, this isn't practical, the performance will be terrible. >>> >>> 2. Are there any specific guidelines on when I can expect elemental to perform better in parallel than in serial? >>> >>> Because the computation to communication ratio for dense matrices is higher than for sparse you will see better parallel performance for dense problems of a given size than sparse problems of a similar size. 
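The ksps.cpp attachment itself is scrubbed from this archive. A rough sketch of a driver consistent with the command lines quoted in this thread, taking the problem size from -N and the matrix and solver types from the options database, might look like the following; it is an illustration only, not the author's actual file, and the matrix entries are made up to be diagonally dominant:

   #include <petscksp.h>

   int main(int argc,char **argv)
   {
     Mat            A;
     Vec            x,b;
     KSP            ksp;
     PetscInt       N = 1000,i,j,rstart,rend;
     PetscErrorCode ierr;

     ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
     ierr = PetscOptionsGetInt(NULL,NULL,"-N",&N,NULL);CHKERRQ(ierr);

     /* matrix type (mpiaij, mpidense, elemental, ...) is taken from -mat_type */
     ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
     ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,N,N);CHKERRQ(ierr);
     ierr = MatSetFromOptions(A);CHKERRQ(ierr);
     ierr = MatSetUp(A);CHKERRQ(ierr);

     /* fill the locally owned rows; for MATELEMENTAL a different fill loop
        (for example via MatGetOwnershipIS()) may be preferable */
     ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
     for (i=rstart; i<rend; i++) {
       for (j=0; j<N; j++) {
         PetscScalar v;
         if (i == j) v = (PetscScalar)N;   /* large diagonal for diagonal dominance */
         else        v = 1.0/(1.0 + (PetscReal)(i > j ? i-j : j-i));
         ierr = MatSetValue(A,i,j,v,INSERT_VALUES);CHKERRQ(ierr);
       }
     }
     ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
     ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

     ierr = MatCreateVecs(A,&x,&b);CHKERRQ(ierr);
     ierr = VecSet(b,1.0);CHKERRQ(ierr);

     /* -ksp_type, -pc_type, -ksp_view, etc. are all honored through KSPSetFromOptions() */
     ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
     ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
     ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
     ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

     ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
     ierr = VecDestroy(&x);CHKERRQ(ierr);
     ierr = VecDestroy(&b);CHKERRQ(ierr);
     ierr = MatDestroy(&A);CHKERRQ(ierr);
     ierr = PetscFinalize();
     return ierr;
   }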
In other words parallelism can help for dense matrices for relatively small problems, of course the specifics of your machine hardware and software also play a role. >>> >>> Barry >>> >>> >>> Of course, I'm interesting in any other details that may be important in this regard. >>> >>> Thank you, >>> Nidish >>> >> > -- > Nidish > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adantra at gmail.com Fri Aug 7 11:19:58 2020 From: adantra at gmail.com (Adolfo Rodriguez) Date: Fri, 7 Aug 2020 11:19:58 -0500 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: References: <878sesgauc.fsf@jedbrown.org> <875z9wg8dy.fsf@jedbrown.org> <939CE625-86EB-43CB-8F51-C521BDBC3BC3@petsc.dev> Message-ID: Great, that works. What would be the way to change the ilu level, I need to use ilu(1). I would assume that I can accomplish that by means of: -npc_fas_coarse_pc_type ilu -npc_fas_coarse_pc_ilu_levels 1 I noticed that the first line actually works, but I am not sure about the second one. Thanks, Adolfo Virus-free. www.avast.com <#m_-242385451441738719_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> On Thu, Aug 6, 2020 at 9:23 PM Matthew Knepley wrote: > On Thu, Aug 6, 2020 at 10:08 PM Adolfo Rodriguez > wrote: > >> Considering the output produced by snes_view (attachment), would be >> possible to change the linear solver tolerances and the preconditioning >> level or type? >> > > Yes. The options prefix is shown for each subsolver. For example, you can > change the linear solver type for the coarse level of FAS using > > -npc_fas_coarse_ksp_type bigcg > > Notice that you are using 1 level of FAS, so its the same as just Newton. > > Thanks, > > Matt > > >> Adolfo >> >> >> Virus-free. >> www.avast.com >> >> <#m_-242385451441738719_m_5425691679335133860_m_9065914518290907664_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> >> >> On Wed, Aug 5, 2020 at 8:31 PM Barry Smith wrote: >> >>> >>> Turtles on top of turtles on top of turtles. >>> >>> It is probably easiest for you to look at the actual code to see how >>> it handles things >>> >>> 1) the SNESFAS uses SNES for each of the levels, for each of these >>> level SNES you can control the convergence criteria (either from the >>> command lineor with appropriate prefix (highly recommended) or with >>> function calls (not recommended)); and even provide your own convergence >>> functions run with -snes_view to see the various solvers and their >>> prefixes). >>> >>> 2) at the finest level of SNESFAS it does call >>> >>> /* Test for convergence */ >>> if (isFine) { >>> ierr = >>> (*snes->ops->converged)(snes,snes->iter,0.0,0.0,snes->norm,&snes->reason,snes->cnvP);CHKERRQ(ierr); >>> if (snes->reason) break; >>> } >>> >>> src/snes/impls/fas/fas.c line 881 so at least in theory you can provide >>> your own convergence test function. >>> >>> It was certainly our intention that users can control all the >>> convergence knobs for arbitrary imbedded nonlinear solvers including FAS >>> but, of course, there may be bugs so let us know what doesn't work. >>> >>> Generally the model for FAS is to run a single (or small number of) >>> iteration(s) on the level solves and so not directly use convergence >>> tolerances like rtol to control the number of iterations on a level but you >>> should be able to set any criteria you want. 
>>> >>> You should be able to run with -snes_view and change some of the >>> criteria on the command line and see the changes presented in the >>> -snes_view output, plus see differences in convergence behavior. >>> >>> >>> >>> Barry >>> >>> >>> >>> On Aug 5, 2020, at 8:10 PM, Adolfo Rodriguez wrote: >>> >>> It looks like I cannot really change the test function or anything else >>> for this particular SNES solver (I am using SNESFas). Basically, I am >>> trying to use the ideas exposed in the paper on Composing scalable solvers >>> but it seems that SNESFas does not allow to change the function for testing >>> convergence, or anything else. Is this correct? >>> >>> Adolfo >>> >>> >>> Virus-free. >>> www.avast.com >>> >>> >>> On Wed, Aug 5, 2020 at 3:41 PM Barry Smith wrote: >>> >>>> >>>> Adolfo, >>>> >>>> You can also just change the tolerances for the inner solve using >>>> the options data base and the prefix for the inner solve. >>>> >>>> When you run with -snes_view it will show the prefix for each of >>>> the (nested) solvers. You can also run with -help to get all the possible >>>> options for the inner solvers. >>>> >>>> In this case I think the prefix is npc so you can set tolerances >>>> with -npc_snes_rtol -npc_ksp_rtol etc. From the program you >>>> can use SNESGetNPC() and then call SNESSetXXX() to set options, for the >>>> linear solver (if there is one) call SNESGetKSP() on the npc and then set >>>> options on that KSP. >>>> >>>> >>>> Barry >>>> >>>> >>>> On Aug 5, 2020, at 3:30 PM, Adolfo Rodriguez wrote: >>>> >>>> Jed, >>>> >>>> I tred your suggestion >>>> >>>> SNESGetNPC(snes, &inner); >>>> SNESSetConvergenceTest(inner, YourFunc, ...); >>>> >>>> and it is working as expected. I had not pieced together the fact that >>>> "inner" is a "snes" object as well. >>>> >>>> Thanks! >>>> >>>> >>>> >>>> >>>> Virus-free. >>>> www.avast.com >>>> >>>> >>>> On Wed, Aug 5, 2020 at 3:11 PM Jed Brown wrote: >>>> >>>>> Adolfo Rodriguez writes: >>>>> >>>>> > Actually I can set the non-linear pc. My problem relates to the >>>>> convergence >>>>> > test. I have not found a way to set the tolerances for the inner >>>>> problem, >>>>> > are they hard coded? >>>>> >>>>> I suggested code to set a custom convergence test. Nothing is >>>>> hard-coded. What have you done? >>>>> >>>> >>>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Fri Aug 7 12:26:42 2020 From: nb25 at rice.edu (Nidish) Date: Fri, 7 Aug 2020 12:26:42 -0500 Subject: [petsc-users] Best practices for solving Dense Linear systems In-Reply-To: <6A7D902E-FACE-4778-B89A-90B043ED31C0@petsc.dev> References: <7f208bce-6c5f-48ed-db38-4f3226dbf1f4@rice.edu> <245E77E0-3166-45B2-BAB3-C100D9A0420C@petsc.dev> <1e028a97-5dc2-4dce-ad6f-bff3238a8161@rice.edu> <85F9F817-2754-4F55-9222-3E23003E79FD@petsc.dev> <6A7D902E-FACE-4778-B89A-90B043ED31C0@petsc.dev> Message-ID: On 8/7/20 8:52 AM, Barry Smith wrote: > > >> On Aug 7, 2020, at 1:25 AM, Nidish > > wrote: >> >> Indeed - I was just using the default solver (GMRES with ILU). >> >> Using just standard LU (direct solve with "-pc_type lu -ksp_type >> preonly"), I find elemental to be extremely slow even for a 1000x1000 >> matrix. >> > > What about on one process? On just one process the performance is comparable. 
> > Elemental generally won't be competitive for such tiny matrices. >> >> For MPIaij it's throwing me an error if I tried "-pc_type lu". >> > > ? ?Yes, there is no PETSc code for sparse parallel direct solver, this > is expected. > > ? ?What about ? > >> mpirun -n 1 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi >> >> mpirun -n 4 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi >> Same results - the elemental version is MUCH slower (for 1000x1000). > Where will your dense matrices be coming from and how big will they be > in practice? This will help determine if an iterative solver is > appropriate. If they will be 100,000 for example then testing with > 1000 will tell you nothing useful, you need to test with the problem > size you care about. The matrices in my application arise from substructuring/Component Mode Synthesis conducted on a system that is linear "almost everywhere", for example jointed systems. The procedure we follow is: build a mesh & identify the nodes corresponding to the interfaces, reduce the model using component mode synthesis to obtain a representation of the system using just the interface degrees-of-freedom along with some (~10s) generalized "modal coordinates". We conduct the non-linear analyses (transient, steady state harmonic, etc.) using this matrices. I am interested in conducting non-linear mesh convergence for a particular system of interest wherein the interface DoFs are, approx, 4000, 8000, 12000, 16000. I'm fairly certain the dense matrices will not be larger. The However for frequency domain simulations, we use matrices that are about 10 times the size of the original matrices (whose meshes have been shown to be convergent in static test cases). Thank you, Nidish > > Barry > >> I'm attaching the code here, in case you'd like to have a look at >> what I've been trying to do. >> >> The two configurations of interest are, >> >> $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij >> $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental >> >> (for the GMRES with ILU) and, >> >> $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij -pc_type lu >> -ksp_type preonly >> $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental -pc_type lu >> -ksp_type preonly >> >> elemental seems to perform poorly in both cases. >> >> Nidish >> >> On 8/7/20 12:50 AM, Barry Smith wrote: >>> >>> ? What is the output of -ksp_view ?for the two case? >>> >>> ? It is not only the matrix format but also the matrix solver that >>> matters. For example if you are using an iterative solver the >>> elemental format won't be faster, you should use the PETSc MPIDENSE >>> format. The elemental format is really intended when you use a >>> direct LU solver for the matrix. For tiny matrices like this an >>> iterative solver could easily be faster than the direct solver, it >>> depends on the conditioning (eigenstructure) of the dense matrix. >>> Also the default PETSc solver uses block Jacobi with ILU on each >>> process if using a sparse format, ILU applied to a dense matrix is >>> actually LU so your solver is probably different also between the >>> MPIAIJ and the elemental. >>> >>> ? Barry >>> >>> >>> >>> >>>> On Aug 7, 2020, at 12:30 AM, Nidish >>> > wrote: >>>> >>>> Thank you for the response. >>>> >>>> I've just been running some tests with matrices up to 2e4 >>>> dimensions (dense). When I compared the solution times for >>>> "-mat_type elemental" and "-mat_type mpiaij" running with 4 cores, >>>> I found the mpidense versions running way faster than elemental. 
I >>>> have not been able to make the elemental version finish up for 2e4 >>>> so far (my patience runs out faster). >>>> >>>> What's going on here? I thought elemental was supposed to be >>>> superior for dense matrices. >>>> >>>> I can share the code if that's appropriate for this forum (sorry, >>>> I'm new here). >>>> >>>> Nidish >>>> On Aug 6, 2020, at 23:01, Barry Smith >>> > wrote: >>>> >>>> On Aug 6, 2020, at 7:32 PM, Nidish >>> > wrote: I'm relatively new to PETSc, >>>> and my applications involve (for the most part) dense >>>> matrix solves. I read in the documentation that this is an >>>> area PETSc does not specialize in but instead recommends >>>> external libraries such as Elemental. I'm wondering if >>>> there are any "best" practices in this regard. Some >>>> questions I'd like answered are: 1. Can I just declare my >>>> dense matrix as a sparse one and fill the whole matrix up? >>>> Do any of the others go this route? What're possible >>>> pitfalls/unfavorable outcomes for this? I understand the >>>> memory overhead probably shoots up. >>>> >>>> >>>> No, this isn't practical, the performance will be terrible. >>>> >>>> 2. Are there any specific guidelines on when I can expect >>>> elemental to perform better in parallel than in serial? >>>> >>>> >>>> Because the computation to communication ratio for dense matrices is higher than for sparse you will see better parallel performance for dense problems of a given size than sparse problems of a similar size. In other words parallelism can help for dense matrices for relatively small problems, of course the specifics of your machine hardware and software also play a role. >>>> >>>> Barry >>>> >>>> Of course, I'm interesting in any other details that may be >>>> important in this regard. Thank you, Nidish >>>> >>>> >>> >> -- >> Nidish >> > -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 7 12:55:36 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 7 Aug 2020 12:55:36 -0500 Subject: [petsc-users] Best practices for solving Dense Linear systems In-Reply-To: References: <7f208bce-6c5f-48ed-db38-4f3226dbf1f4@rice.edu> <245E77E0-3166-45B2-BAB3-C100D9A0420C@petsc.dev> <1e028a97-5dc2-4dce-ad6f-bff3238a8161@rice.edu> <85F9F817-2754-4F55-9222-3E23003E79FD@petsc.dev> <6A7D902E-FACE-4778-B89A-90B043ED31C0@petsc.dev> Message-ID: > On Aug 7, 2020, at 12:26 PM, Nidish wrote: > > > > On 8/7/20 8:52 AM, Barry Smith wrote: >> >> >>> On Aug 7, 2020, at 1:25 AM, Nidish > wrote: >>> >>> Indeed - I was just using the default solver (GMRES with ILU). >>> >>> Using just standard LU (direct solve with "-pc_type lu -ksp_type preonly"), I find elemental to be extremely slow even for a 1000x1000 matrix. >>> >> >> What about on one process? > On just one process the performance is comparable. >> >> Elemental generally won't be competitive for such tiny matrices. >>> For MPIaij it's throwing me an error if I tried "-pc_type lu". >>> >> >> Yes, there is no PETSc code for sparse parallel direct solver, this is expected. >> >> What about ? >> >>> mpirun -n 1 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi >>> >>> mpirun -n 4 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi > Same results - the elemental version is MUCH slower (for 1000x1000). >> Where will your dense matrices be coming from and how big will they be in practice? This will help determine if an iterative solver is appropriate. 
If they will be 100,000 for example then testing with 1000 will tell you nothing useful, you need to test with the problem size you care about. > The matrices in my application arise from substructuring/Component Mode Synthesis conducted on a system that is linear "almost everywhere", for example jointed systems. The procedure we follow is: build a mesh & identify the nodes corresponding to the interfaces, reduce the model using component mode synthesis to obtain a representation of the system using just the interface degrees-of-freedom along with some (~10s) generalized "modal coordinates". We conduct the non-linear analyses (transient, steady state harmonic, etc.) using this matrices. > > I am interested in conducting non-linear mesh convergence for a particular system of interest wherein the interface DoFs are, approx, 4000, 8000, 12000, 16000. I'm fairly certain the dense matrices will not be larger. The > Ok, so it is not clear how well conditioned these dense matrices will be. There are three questions that need to be answered. 1) for your problem can iterative methods be used and will they require less work than direct solvers. For direct LU the work is order N^3 to do the factorization with a relatively small constant. Because of smart organization inside dense LU the flops can be done very efficiently. For GMRES with Jacobi preconditioning the work is order N^2 (the time for a dense matrix-vector product) for each iteration. If the number of iterations small than the total work is much less than a direct solver. In the worst case the number of iterations is order N so the total work is order N^3, the same order as a direct method. But the efficiency of a dense matrix-vector product is much lower than the efficiency of a LU factorization so even if the work is the same order it can take longer. One should use mpidense as the matrix format for iterative. With iterative methods YOU get to decide how accurate you need your solution, you do this by setting how small you want the residual to be (since you can't directly control the error). By default PETSc uses a relative decrease in the residual of 1e-5. 2) for your size problems can parallelism help? I think it should but elemental since it requires a different data layout has additional overhead cost to get the data into the optimal format for parallelism. 3) can parallelism help on YOUR machine. Just because a machine has multiple cores it may not be able to utilize them efficiently for solvers if the total machine memory bandwidth is limited. So the first thing to do is on the machine you plan to use for your computations run the streams benchmark discussed in https://www.mcs.anl.gov/petsc/documentation/faq.html#computers this will give us some general idea of how much parallelism you can take advantage of. Is the machine a parallel cluster or just a single node? After this I'll give you a few specific cases to run to get a feeling for what approach would be best for your problems, Barry > However for frequency domain simulations, we use matrices that are about 10 times the size of the original matrices (whose meshes have been shown to be convergent in static test cases). > > Thank you, > Nidish > >> >> Barry >> >>> I'm attaching the code here, in case you'd like to have a look at what I've been trying to do. 
>>> >>> The two configurations of interest are, >>> >>> $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij >>> $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental >>> >>> (for the GMRES with ILU) and, >>> >>> $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij -pc_type lu -ksp_type preonly >>> $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental -pc_type lu -ksp_type preonly >>> >>> elemental seems to perform poorly in both cases. >>> >>> Nidish >>> >>> On 8/7/20 12:50 AM, Barry Smith wrote: >>>> >>>> What is the output of -ksp_view for the two case? >>>> >>>> It is not only the matrix format but also the matrix solver that matters. For example if you are using an iterative solver the elemental format won't be faster, you should use the PETSc MPIDENSE format. The elemental format is really intended when you use a direct LU solver for the matrix. For tiny matrices like this an iterative solver could easily be faster than the direct solver, it depends on the conditioning (eigenstructure) of the dense matrix. Also the default PETSc solver uses block Jacobi with ILU on each process if using a sparse format, ILU applied to a dense matrix is actually LU so your solver is probably different also between the MPIAIJ and the elemental. >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>>> On Aug 7, 2020, at 12:30 AM, Nidish > wrote: >>>>> >>>>> Thank you for the response. >>>>> >>>>> I've just been running some tests with matrices up to 2e4 dimensions (dense). When I compared the solution times for "-mat_type elemental" and "-mat_type mpiaij" running with 4 cores, I found the mpidense versions running way faster than elemental. I have not been able to make the elemental version finish up for 2e4 so far (my patience runs out faster). >>>>> >>>>> What's going on here? I thought elemental was supposed to be superior for dense matrices. >>>>> >>>>> I can share the code if that's appropriate for this forum (sorry, I'm new here). >>>>> >>>>> Nidish >>>>> On Aug 6, 2020, at 23:01, Barry Smith > wrote: >>>>> On Aug 6, 2020, at 7:32 PM, Nidish > wrote: >>>>> >>>>> I'm relatively new to PETSc, and my applications involve (for the most part) dense matrix solves. >>>>> >>>>> I read in the documentation that this is an area PETSc does not specialize in but instead recommends external libraries such as Elemental. I'm wondering if there are any "best" practices in this regard. Some questions I'd like answered are: >>>>> >>>>> 1. Can I just declare my dense matrix as a sparse one and fill the whole matrix up? Do any of the others go this route? What're possible pitfalls/unfavorable outcomes for this? I understand the memory overhead probably shoots up. >>>>> >>>>> No, this isn't practical, the performance will be terrible. >>>>> >>>>> 2. Are there any specific guidelines on when I can expect elemental to perform better in parallel than in serial? >>>>> >>>>> Because the computation to communication ratio for dense matrices is higher than for sparse you will see better parallel performance for dense problems of a given size than sparse problems of a similar size. In other words parallelism can help for dense matrices for relatively small problems, of course the specifics of your machine hardware and software also play a role. >>>>> >>>>> Barry >>>>> >>>>> >>>>> Of course, I'm interesting in any other details that may be important in this regard. >>>>> >>>>> Thank you, >>>>> Nidish >>>>> >>>> >>> -- >>> Nidish >>> >> > -- > Nidish -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Aug 7 13:00:49 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 7 Aug 2020 13:00:49 -0500 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: References: <878sesgauc.fsf@jedbrown.org> <875z9wg8dy.fsf@jedbrown.org> <939CE625-86EB-43CB-8F51-C521BDBC3BC3@petsc.dev> Message-ID: <529E5EB4-802A-4F90-9CF9-E49760DFA23C@petsc.dev> > On Aug 7, 2020, at 11:19 AM, Adolfo Rodriguez wrote: > > Great, that works. What would be the way to change the ilu level, I need to use ilu(1). I would assume that I can accomplish that by means of: > > -npc_fas_coarse_pc_type ilu > -npc_fas_coarse_pc_ilu_levels 1 > > I noticed that the first line actually works, but I am not sure about the second one. The number of levels should be noted in the -snes_view output. You can run with -help -npc_fas_coarse_pc_type ilu | grep npc_fas_coarse_pc see what the exact names of the options are if you are unsure > > Thanks, > Adolfo > > Virus-free. www.avast.com > On Thu, Aug 6, 2020 at 9:23 PM Matthew Knepley > wrote: > On Thu, Aug 6, 2020 at 10:08 PM Adolfo Rodriguez > wrote: > Considering the output produced by snes_view (attachment), would be possible to change the linear solver tolerances and the preconditioning level or type? > > Yes. The options prefix is shown for each subsolver. For example, you can change the linear solver type for the coarse level of FAS using > > -npc_fas_coarse_ksp_type bigcg > > Notice that you are using 1 level of FAS, so its the same as just Newton. > > Thanks, > > Matt > > Adolfo > > Virus-free. www.avast.com > On Wed, Aug 5, 2020 at 8:31 PM Barry Smith > wrote: > > Turtles on top of turtles on top of turtles. > > It is probably easiest for you to look at the actual code to see how it handles things > > 1) the SNESFAS uses SNES for each of the levels, for each of these level SNES you can control the convergence criteria (either from the command lineor with appropriate prefix (highly recommended) or with function calls (not recommended)); and even provide your own convergence functions run with -snes_view to see the various solvers and their prefixes). > > 2) at the finest level of SNESFAS it does call > > /* Test for convergence */ > if (isFine) { > ierr = (*snes->ops->converged)(snes,snes->iter,0.0,0.0,snes->norm,&snes->reason,snes->cnvP);CHKERRQ(ierr); > if (snes->reason) break; > } > > src/snes/impls/fas/fas.c line 881 so at least in theory you can provide your own convergence test function. > > It was certainly our intention that users can control all the convergence knobs for arbitrary imbedded nonlinear solvers including FAS but, of course, there may be bugs so let us know what doesn't work. > > Generally the model for FAS is to run a single (or small number of) iteration(s) on the level solves and so not directly use convergence tolerances like rtol to control the number of iterations on a level but you should be able to set any criteria you want. > > You should be able to run with -snes_view and change some of the criteria on the command line and see the changes presented in the -snes_view output, plus see differences in convergence behavior. > > > > Barry > > > >> On Aug 5, 2020, at 8:10 PM, Adolfo Rodriguez > wrote: >> >> It looks like I cannot really change the test function or anything else for this particular SNES solver (I am using SNESFas). 
Basically, I am trying to use the ideas exposed in the paper on Composing scalable solvers but it seems that SNESFas does not allow to change the function for testing convergence, or anything else. Is this correct? >> >> Adolfo >> >> Virus-free. www.avast.com <> >> On Wed, Aug 5, 2020 at 3:41 PM Barry Smith > wrote: >> >> Adolfo, >> >> You can also just change the tolerances for the inner solve using the options data base and the prefix for the inner solve. >> >> When you run with -snes_view it will show the prefix for each of the (nested) solvers. You can also run with -help to get all the possible options for the inner solvers. >> >> In this case I think the prefix is npc so you can set tolerances with -npc_snes_rtol -npc_ksp_rtol etc. From the program you can use SNESGetNPC() and then call SNESSetXXX() to set options, for the linear solver (if there is one) call SNESGetKSP() on the npc and then set options on that KSP. >> >> >> Barry >> >> >>> On Aug 5, 2020, at 3:30 PM, Adolfo Rodriguez > wrote: >>> >>> Jed, >>> >>> I tred your suggestion >>> >>> SNESGetNPC(snes, &inner); >>> SNESSetConvergenceTest(inner, YourFunc, ...); >>> >>> and it is working as expected. I had not pieced together the fact that "inner" is a "snes" object as well. >>> >>> Thanks! >>> >>> >>> >>> Virus-free. www.avast.com <> >>> On Wed, Aug 5, 2020 at 3:11 PM Jed Brown > wrote: >>> Adolfo Rodriguez > writes: >>> >>> > Actually I can set the non-linear pc. My problem relates to the convergence >>> > test. I have not found a way to set the tolerances for the inner problem, >>> > are they hard coded? >>> >>> I suggested code to set a custom convergence test. Nothing is hard-coded. What have you done? >> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Fri Aug 7 15:15:54 2020 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 7 Aug 2020 22:15:54 +0200 Subject: [petsc-users] Test convergence with non linear preconditioners In-Reply-To: References: <878sesgauc.fsf@jedbrown.org> <875z9wg8dy.fsf@jedbrown.org> <939CE625-86EB-43CB-8F51-C521BDBC3BC3@petsc.dev> Message-ID: On Fri 7. Aug 2020 at 18:21, Adolfo Rodriguez wrote: > Great, that works. What would be the way to change the ilu level, I need > to use ilu(1). I > You want to use the option -xxx_pc_factor_levels 1 Where xxx is the appropriate FAS prefix level. See here https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCILU.html would assume that I can accomplish that by means of: > > -npc_fas_coarse_pc_type ilu > -npc_fas_coarse_pc_ilu_levels 1 > > I noticed that the first line actually works, but I am not sure about the > second one. > If you want to see which options are used / unused, add the command line option -options_left 1 Thanks, Dave > Thanks, > Adolfo > > > Virus-free. > www.avast.com > > <#m_5466096191530741685_m_-242385451441738719_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > > On Thu, Aug 6, 2020 at 9:23 PM Matthew Knepley wrote: > >> On Thu, Aug 6, 2020 at 10:08 PM Adolfo Rodriguez >> wrote: >> >>> Considering the output produced by snes_view (attachment), would be >>> possible to change the linear solver tolerances and the preconditioning >>> level or type? >>> >> >> Yes. The options prefix is shown for each subsolver. 
For example, you can >> change the linear solver type for the coarse level of FAS using >> >> -npc_fas_coarse_ksp_type bigcg >> >> Notice that you are using 1 level of FAS, so its the same as just Newton. >> >> Thanks, >> >> Matt >> >> >>> Adolfo >>> >>> >>> Virus-free. >>> www.avast.com >>> >>> <#m_5466096191530741685_m_-242385451441738719_m_5425691679335133860_m_9065914518290907664_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> >>> >>> On Wed, Aug 5, 2020 at 8:31 PM Barry Smith wrote: >>> >>>> >>>> Turtles on top of turtles on top of turtles. >>>> >>>> It is probably easiest for you to look at the actual code to see how >>>> it handles things >>>> >>>> 1) the SNESFAS uses SNES for each of the levels, for each of these >>>> level SNES you can control the convergence criteria (either from the >>>> command lineor with appropriate prefix (highly recommended) or with >>>> function calls (not recommended)); and even provide your own convergence >>>> functions run with -snes_view to see the various solvers and their >>>> prefixes). >>>> >>>> 2) at the finest level of SNESFAS it does call >>>> >>>> /* Test for convergence */ >>>> if (isFine) { >>>> ierr = >>>> (*snes->ops->converged)(snes,snes->iter,0.0,0.0,snes->norm,&snes->reason,snes->cnvP);CHKERRQ(ierr); >>>> if (snes->reason) break; >>>> } >>>> >>>> src/snes/impls/fas/fas.c line 881 so at least in theory you can provide >>>> your own convergence test function. >>>> >>>> It was certainly our intention that users can control all the >>>> convergence knobs for arbitrary imbedded nonlinear solvers including FAS >>>> but, of course, there may be bugs so let us know what doesn't work. >>>> >>>> Generally the model for FAS is to run a single (or small number of) >>>> iteration(s) on the level solves and so not directly use convergence >>>> tolerances like rtol to control the number of iterations on a level but you >>>> should be able to set any criteria you want. >>>> >>>> You should be able to run with -snes_view and change some of the >>>> criteria on the command line and see the changes presented in the >>>> -snes_view output, plus see differences in convergence behavior. >>>> >>>> >>>> >>>> Barry >>>> >>>> >>>> >>>> On Aug 5, 2020, at 8:10 PM, Adolfo Rodriguez wrote: >>>> >>>> It looks like I cannot really change the test function or anything else >>>> for this particular SNES solver (I am using SNESFas). Basically, I am >>>> trying to use the ideas exposed in the paper on Composing scalable solvers >>>> but it seems that SNESFas does not allow to change the function for testing >>>> convergence, or anything else. Is this correct? >>>> >>>> Adolfo >>>> >>>> >>>> Virus-free. >>>> www.avast.com >>>> >>>> >>>> On Wed, Aug 5, 2020 at 3:41 PM Barry Smith wrote: >>>> >>>>> >>>>> Adolfo, >>>>> >>>>> You can also just change the tolerances for the inner solve using >>>>> the options data base and the prefix for the inner solve. >>>>> >>>>> When you run with -snes_view it will show the prefix for each of >>>>> the (nested) solvers. You can also run with -help to get all the possible >>>>> options for the inner solvers. >>>>> >>>>> In this case I think the prefix is npc so you can set tolerances >>>>> with -npc_snes_rtol -npc_ksp_rtol etc. From the program you >>>>> can use SNESGetNPC() and then call SNESSetXXX() to set options, for the >>>>> linear solver (if there is one) call SNESGetKSP() on the npc and then set >>>>> options on that KSP. 
>>>>> >>>>> >>>>> Barry >>>>> >>>>> >>>>> On Aug 5, 2020, at 3:30 PM, Adolfo Rodriguez >>>>> wrote: >>>>> >>>>> Jed, >>>>> >>>>> I tred your suggestion >>>>> >>>>> SNESGetNPC(snes, &inner); >>>>> SNESSetConvergenceTest(inner, YourFunc, ...); >>>>> >>>>> and it is working as expected. I had not pieced together the fact that >>>>> "inner" is a "snes" object as well. >>>>> >>>>> Thanks! >>>>> >>>>> >>>>> >>>>> >>>>> Virus-free. >>>>> www.avast.com >>>>> >>>>> >>>>> On Wed, Aug 5, 2020 at 3:11 PM Jed Brown wrote: >>>>> >>>>>> Adolfo Rodriguez writes: >>>>>> >>>>>> > Actually I can set the non-linear pc. My problem relates to the >>>>>> convergence >>>>>> > test. I have not found a way to set the tolerances for the inner >>>>>> problem, >>>>>> > are they hard coded? >>>>>> >>>>>> I suggested code to set a custom convergence test. Nothing is >>>>>> hard-coded. What have you done? >>>>>> >>>>> >>>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Sat Aug 8 16:55:46 2020 From: nb25 at rice.edu (Nidish) Date: Sat, 8 Aug 2020 16:55:46 -0500 Subject: [petsc-users] Best practices for solving Dense Linear systems In-Reply-To: References: <7f208bce-6c5f-48ed-db38-4f3226dbf1f4@rice.edu> <245E77E0-3166-45B2-BAB3-C100D9A0420C@petsc.dev> <1e028a97-5dc2-4dce-ad6f-bff3238a8161@rice.edu> <85F9F817-2754-4F55-9222-3E23003E79FD@petsc.dev> <6A7D902E-FACE-4778-B89A-90B043ED31C0@petsc.dev> Message-ID: On 8/7/20 12:55 PM, Barry Smith wrote: > > >> On Aug 7, 2020, at 12:26 PM, Nidish > > wrote: >> >> >> On 8/7/20 8:52 AM, Barry Smith wrote: >>> >>> >>>> On Aug 7, 2020, at 1:25 AM, Nidish >>> > wrote: >>>> >>>> Indeed - I was just using the default solver (GMRES with ILU). >>>> >>>> Using just standard LU (direct solve with "-pc_type lu -ksp_type >>>> preonly"), I find elemental to be extremely slow even for a >>>> 1000x1000 matrix. >>>> >>> >>> What about on one process? >> On just one process the performance is comparable. >>> >>> Elemental generally won't be competitive for such tiny matrices. >>>> >>>> For MPIaij it's throwing me an error if I tried "-pc_type lu". >>>> >>> >>> ? ?Yes, there is no PETSc code for sparse parallel direct solver, >>> this is expected. >>> >>> ? ?What about ? >>> >>>> mpirun -n 1 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi >>>> >>>> mpirun -n 4 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi >>>> >> Same results - the elemental version is MUCH slower (for 1000x1000). >>> Where will your dense matrices be coming from and how big will they >>> be in practice? This will help determine if an iterative solver is >>> appropriate. If they will be 100,000 for example then testing with >>> 1000 will tell you nothing useful, you need to test with the problem >>> size you care about. >> >> The matrices in my application arise from substructuring/Component >> Mode Synthesis conducted on a system that is linear "almost >> everywhere", for example jointed systems. The procedure we follow is: >> build a mesh & identify the nodes corresponding to the interfaces, >> reduce the model using component mode synthesis to obtain a >> representation of the system using just the interface >> degrees-of-freedom along with some (~10s) generalized "modal >> coordinates". 
We conduct the non-linear analyses (transient, steady >> state harmonic, etc.) using this matrices. >> >> I am interested in conducting non-linear mesh convergence for a >> particular system of interest wherein the interface DoFs are, approx, >> 4000, 8000, 12000, 16000. I'm fairly certain the dense matrices will >> not be larger. The >> > > ? ?Ok, so it is not clear how well conditioned these dense matrices > will be. > > ? ? There are three questions that need to be answered. > > 1) for your problem can iterative methods be used and will they > require less work than direct solvers. > > ? ? ? ?For direct LU the work is order N^3 to do the factorization > with a relatively small constant. Because of smart organization inside > dense LU the flops can be done very efficiently. > > ? ? ? ?For GMRES with Jacobi preconditioning the work is order N^2 > (the time for a dense matrix-vector product) for each iteration. If > the number of iterations small than the total work is much less than a > direct solver. In the worst case the number of iterations is order N > so the total work is order N^3, the same order as a direct method. > ?But the efficiency of a dense matrix-vector product is much lower > than the efficiency of a LU factorization so even if the work is the > same order it can take longer. ?One should use mpidense as the matrix > format for iterative. > > ? ? ? ?With iterative methods YOU get to decide how accurate you need > your solution, you do this by setting how small you want the residual > to be (since you can't directly control the error). By default PETSc > uses a relative decrease in the residual of 1e-5. > > 2) for your size problems can parallelism help? > > ? ? I think it should but elemental since it requires a different data > layout has additional overhead cost to get the data into the optimal > format for parallelism. > > 3) can parallelism help on YOUR machine. Just because a machine has > multiple cores it may not be able to utilize them efficiently for > solvers if the total machine memory bandwidth is limited. > > ? ?So the first thing to do is on the machine you plan to use for your > computations run the streams benchmark discussed in > https://www.mcs.anl.gov/petsc/documentation/faq.html#computers this > will give us some general idea of how much parallelism you can take > advantage of. Is the machine a parallel cluster or just a single node? > > ? ?After this I'll give you a few specific cases to run to get a > feeling for what approach would be best for your problems, > > ? ?Barry > Thank you for the responses. Here's a pointwise response to your queries: 1) I am presently working with random matrices (with a large constant value in the diagonals to ensure diagonal dominance) before I start working with the matrices from my system. At the end of the day the matrices I expect to be using can be thought of to be Schur complements of a Laplacian operator. 2) Since my application is joint dynamics, I have a non-linear function that has to be evaluated at quadrature locations on a 2D mesh and integrated to form the residue vector as well as the Jacobian matrices. There is thus potential speedup I expect for the function evaluations definitely. Since the matrices I will end up with will be dense (at least for static simulations), I wanted directions to find the best solver options for my problem. 3) I am presently on an octa-core (4 physical cores) machine with 16 Gigs of RAM. 
I plan to conduct code development and benchmarking on this machine before I start running larger models on a cluster I have access to. I was unable to run the streams benchmark on the cluster (PETSc 3.11.1 is installed there, and the benchmarks in the git directory was giving issues), but I was able to do this in my local machine - here's the output: scaling.log 1? 13697.5004?? Rate (MB/s) 2? 13021.9505?? Rate (MB/s) 0.950681 3? 12402.6925?? Rate (MB/s) 0.905471 4? 12309.1712?? Rate (MB/s) 0.898644 Could you point me to the part in the documentation that speaks about the different options available for dealing with dense matrices? I just realized that bindings for MUMPS are available in PETSc. Thank you very much, Nidish > >> However for frequency domain simulations, we use matrices that are >> about 10 times the size of the original matrices (whose meshes have >> been shown to be convergent in static test cases). >> >> Thank you, >> Nidish >> >>> >>> Barry >>> >>>> I'm attaching the code here, in case you'd like to have a look at >>>> what I've been trying to do. >>>> >>>> The two configurations of interest are, >>>> >>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij >>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental >>>> >>>> (for the GMRES with ILU) and, >>>> >>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij -pc_type lu >>>> -ksp_type preonly >>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental -pc_type lu >>>> -ksp_type preonly >>>> >>>> elemental seems to perform poorly in both cases. >>>> >>>> Nidish >>>> >>>> On 8/7/20 12:50 AM, Barry Smith wrote: >>>>> >>>>> ? What is the output of -ksp_view ?for the two case? >>>>> >>>>> ? It is not only the matrix format but also the matrix solver that >>>>> matters. For example if you are using an iterative solver the >>>>> elemental format won't be faster, you should use the PETSc >>>>> MPIDENSE format. The elemental format is really intended when you >>>>> use a direct LU solver for the matrix. For tiny matrices like this >>>>> an iterative solver could easily be faster than the direct solver, >>>>> it depends on the conditioning (eigenstructure) of the dense >>>>> matrix. Also the default PETSc solver uses block Jacobi with ILU >>>>> on each process if using a sparse format, ILU applied to a dense >>>>> matrix is actually LU so your solver is probably different also >>>>> between the MPIAIJ and the elemental. >>>>> >>>>> ? Barry >>>>> >>>>> >>>>> >>>>> >>>>>> On Aug 7, 2020, at 12:30 AM, Nidish >>>>> > wrote: >>>>>> >>>>>> Thank you for the response. >>>>>> >>>>>> I've just been running some tests with matrices up to 2e4 >>>>>> dimensions (dense). When I compared the solution times for >>>>>> "-mat_type elemental" and "-mat_type mpiaij" running with 4 >>>>>> cores, I found the mpidense versions running way faster than >>>>>> elemental. I have not been able to make the elemental version >>>>>> finish up for 2e4 so far (my patience runs out faster). >>>>>> >>>>>> What's going on here? I thought elemental was supposed to be >>>>>> superior for dense matrices. >>>>>> >>>>>> I can share the code if that's appropriate for this forum (sorry, >>>>>> I'm new here). >>>>>> >>>>>> Nidish >>>>>> On Aug 6, 2020, at 23:01, Barry Smith >>>>> > wrote: >>>>>> >>>>>> On Aug 6, 2020, at 7:32 PM, Nidish >>>>> > wrote: I'm relatively new to >>>>>> PETSc, and my applications involve (for the most part) >>>>>> dense matrix solves. 
I read in the documentation that >>>>>> this is an area PETSc does not specialize in but instead >>>>>> recommends external libraries such as Elemental. I'm >>>>>> wondering if there are any "best" practices in this >>>>>> regard. Some questions I'd like answered are: 1. Can I >>>>>> just declare my dense matrix as a sparse one and fill the >>>>>> whole matrix up? Do any of the others go this route? >>>>>> What're possible pitfalls/unfavorable outcomes for this? >>>>>> I understand the memory overhead probably shoots up. >>>>>> >>>>>> >>>>>> No, this isn't practical, the performance will be terrible. >>>>>> >>>>>> 2. Are there any specific guidelines on when I can expect >>>>>> elemental to perform better in parallel than in serial? >>>>>> >>>>>> >>>>>> Because the computation to communication ratio for dense matrices is higher than for sparse you will see better parallel performance for dense problems of a given size than sparse problems of a similar size. In other words parallelism can help for dense matrices for relatively small problems, of course the specifics of your machine hardware and software also play a role. >>>>>> >>>>>> Barry >>>>>> >>>>>> Of course, I'm interesting in any other details that may >>>>>> be important in this regard. Thank you, Nidish >>>>>> >>>>>> >>>>> >>>> -- >>>> Nidish >>>> >>> >> -- >> Nidish > -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Aug 8 17:46:50 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 8 Aug 2020 17:46:50 -0500 Subject: [petsc-users] Best practices for solving Dense Linear systems In-Reply-To: References: <7f208bce-6c5f-48ed-db38-4f3226dbf1f4@rice.edu> <245E77E0-3166-45B2-BAB3-C100D9A0420C@petsc.dev> <1e028a97-5dc2-4dce-ad6f-bff3238a8161@rice.edu> <85F9F817-2754-4F55-9222-3E23003E79FD@petsc.dev> <6A7D902E-FACE-4778-B89A-90B043ED31C0@petsc.dev> Message-ID: <2CE20974-A162-4645-BB64-BB21FA6475DA@petsc.dev> The user code is the same regardless of the solver. We provide support for dense matrices with direct solver formats elemental and scalapack (in the master branch) (Mumps is for sparse matrices). With iterative solvers one can use almost any of the preconditioners with MPIDENSE Using random matrices will tell you nothing about the behavior of direct vs iterative methods, you have to test with the actual matrices. But since switching between direct and iterative methods is just a command line option you can write your code to handle your actual matrices and then determine if direct or iterative methods are best. The condition number of the Schur complements of a Laplacian operator grows like the square root of the condition number of the Laplacian operator so not terribly fast, iterative methods with some modest preconditioner will likely easily beat direct methods for your size problems. Barry > On Aug 8, 2020, at 4:55 PM, Nidish wrote: > > > > On 8/7/20 12:55 PM, Barry Smith wrote: >> >> >>> On Aug 7, 2020, at 12:26 PM, Nidish > wrote: >>> >>> >>> >>> On 8/7/20 8:52 AM, Barry Smith wrote: >>>> >>>> >>>>> On Aug 7, 2020, at 1:25 AM, Nidish > wrote: >>>>> >>>>> Indeed - I was just using the default solver (GMRES with ILU). >>>>> >>>>> Using just standard LU (direct solve with "-pc_type lu -ksp_type preonly"), I find elemental to be extremely slow even for a 1000x1000 matrix. >>>>> >>>> >>>> What about on one process? >>> On just one process the performance is comparable. >>>> >>>> Elemental generally won't be competitive for such tiny matrices. 
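Since the choice between the two approaches really is just a command-line option, a concrete way to run the comparison Barry describes is to reuse the ./ksps test driver from earlier in this thread on one of the actual matrices; the -N 4000 size, the Jacobi preconditioner, and the 1e-8 tolerance below are illustrative assumptions, not recommendations:

  $> mpirun -n 4 ./ksps -N 4000 -mat_type mpidense -ksp_type gmres -pc_type jacobi -ksp_rtol 1e-8 -ksp_monitor -log_view
  $> mpirun -n 4 ./ksps -N 4000 -mat_type elemental -ksp_type preonly -pc_type lu -log_view

-ksp_monitor reports the iteration count, which is what decides the contest: dense LU costs roughly (2/3)N^3 flops (about 2.7e12 at N = 16000), while one GMRES iteration with Jacobi costs roughly 2N^2 flops (about 5e8 at N = 16000), so the iterative run comes out ahead as long as it converges in well under a few thousand iterations. -log_view then shows where the time actually went in each run.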
>>>>> For MPIaij it's throwing me an error if I tried "-pc_type lu". >>>>> >>>> >>>> Yes, there is no PETSc code for sparse parallel direct solver, this is expected. >>>> >>>> What about ? >>>> >>>>> mpirun -n 1 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi >>>>> >>>>> mpirun -n 4 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi >>> Same results - the elemental version is MUCH slower (for 1000x1000). >>>> Where will your dense matrices be coming from and how big will they be in practice? This will help determine if an iterative solver is appropriate. If they will be 100,000 for example then testing with 1000 will tell you nothing useful, you need to test with the problem size you care about. >>> The matrices in my application arise from substructuring/Component Mode Synthesis conducted on a system that is linear "almost everywhere", for example jointed systems. The procedure we follow is: build a mesh & identify the nodes corresponding to the interfaces, reduce the model using component mode synthesis to obtain a representation of the system using just the interface degrees-of-freedom along with some (~10s) generalized "modal coordinates". We conduct the non-linear analyses (transient, steady state harmonic, etc.) using this matrices. >>> >>> I am interested in conducting non-linear mesh convergence for a particular system of interest wherein the interface DoFs are, approx, 4000, 8000, 12000, 16000. I'm fairly certain the dense matrices will not be larger. The >>> >> >> Ok, so it is not clear how well conditioned these dense matrices will be. >> >> There are three questions that need to be answered. >> >> 1) for your problem can iterative methods be used and will they require less work than direct solvers. >> >> For direct LU the work is order N^3 to do the factorization with a relatively small constant. Because of smart organization inside dense LU the flops can be done very efficiently. >> >> For GMRES with Jacobi preconditioning the work is order N^2 (the time for a dense matrix-vector product) for each iteration. If the number of iterations small than the total work is much less than a direct solver. In the worst case the number of iterations is order N so the total work is order N^3, the same order as a direct method. But the efficiency of a dense matrix-vector product is much lower than the efficiency of a LU factorization so even if the work is the same order it can take longer. One should use mpidense as the matrix format for iterative. >> >> With iterative methods YOU get to decide how accurate you need your solution, you do this by setting how small you want the residual to be (since you can't directly control the error). By default PETSc uses a relative decrease in the residual of 1e-5. >> >> 2) for your size problems can parallelism help? >> >> I think it should but elemental since it requires a different data layout has additional overhead cost to get the data into the optimal format for parallelism. >> >> 3) can parallelism help on YOUR machine. Just because a machine has multiple cores it may not be able to utilize them efficiently for solvers if the total machine memory bandwidth is limited. >> >> So the first thing to do is on the machine you plan to use for your computations run the streams benchmark discussed in https://www.mcs.anl.gov/petsc/documentation/faq.html#computers this will give us some general idea of how much parallelism you can take advantage of. Is the machine a parallel cluster or just a single node? 
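For anyone hitting the same build problem Nidish mentions above with the standalone benchmark sources: in recent PETSc releases the STREAMS test is wired into the build system, so from a configured source tree it can usually be run with

  $> cd $PETSC_DIR
  $> make streams NPMAX=4

which measures the attainable memory bandwidth on 1 through NPMAX MPI processes and prints rate and scaling lines like the scaling.log output posted earlier in this thread.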
>> >> After this I'll give you a few specific cases to run to get a feeling for what approach would be best for your problems, >> >> Barry >> > Thank you for the responses. Here's a pointwise response to your queries: > > 1) I am presently working with random matrices (with a large constant value in the diagonals to ensure diagonal dominance) before I start working with the matrices from my system. At the end of the day the matrices I expect to be using can be thought of to be Schur complements of a Laplacian operator. > > 2) Since my application is joint dynamics, I have a non-linear function that has to be evaluated at quadrature locations on a 2D mesh and integrated to form the residue vector as well as the Jacobian matrices. There is thus potential speedup I expect for the function evaluations definitely. > > Since the matrices I will end up with will be dense (at least for static simulations), I wanted directions to find the best solver options for my problem. > > 3) I am presently on an octa-core (4 physical cores) machine with 16 Gigs of RAM. I plan to conduct code development and benchmarking on this machine before I start running larger models on a cluster I have access to. > > I was unable to run the streams benchmark on the cluster (PETSc 3.11.1 is installed there, and the benchmarks in the git directory was giving issues), but I was able to do this in my local machine - here's the output: > > scaling.log > 1 13697.5004 Rate (MB/s) > 2 13021.9505 Rate (MB/s) 0.950681 > 3 12402.6925 Rate (MB/s) 0.905471 > 4 12309.1712 Rate (MB/s) 0.898644 > Could you point me to the part in the documentation that speaks about the different options available for dealing with dense matrices? I just realized that bindings for MUMPS are available in PETSc. > > Thank you very much, > Nidish > >> >>> >>> However for frequency domain simulations, we use matrices that are about 10 times the size of the original matrices (whose meshes have been shown to be convergent in static test cases). >>> >>> Thank you, >>> Nidish >>> >>>> >>>> Barry >>>> >>>>> I'm attaching the code here, in case you'd like to have a look at what I've been trying to do. >>>>> >>>>> The two configurations of interest are, >>>>> >>>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij >>>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental >>>>> >>>>> (for the GMRES with ILU) and, >>>>> >>>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij -pc_type lu -ksp_type preonly >>>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental -pc_type lu -ksp_type preonly >>>>> >>>>> elemental seems to perform poorly in both cases. >>>>> >>>>> Nidish >>>>> >>>>> On 8/7/20 12:50 AM, Barry Smith wrote: >>>>>> >>>>>> What is the output of -ksp_view for the two case? >>>>>> >>>>>> It is not only the matrix format but also the matrix solver that matters. For example if you are using an iterative solver the elemental format won't be faster, you should use the PETSc MPIDENSE format. The elemental format is really intended when you use a direct LU solver for the matrix. For tiny matrices like this an iterative solver could easily be faster than the direct solver, it depends on the conditioning (eigenstructure) of the dense matrix. Also the default PETSc solver uses block Jacobi with ILU on each process if using a sparse format, ILU applied to a dense matrix is actually LU so your solver is probably different also between the MPIAIJ and the elemental. 
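(A quick way to confirm which solver actually ran in each case, as requested above, is to append -ksp_view, e.g.

  $> mpirun -n 4 ./ksps -N 1000 -mat_type mpidense -ksp_view

The output lists the KSP type, the PC configuration, and the matrix type in use, so a GMRES + block Jacobi/ILU run and a preonly + LU run are easy to tell apart.)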
>>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> On Aug 7, 2020, at 12:30 AM, Nidish > wrote: >>>>>>> >>>>>>> Thank you for the response. >>>>>>> >>>>>>> I've just been running some tests with matrices up to 2e4 dimensions (dense). When I compared the solution times for "-mat_type elemental" and "-mat_type mpiaij" running with 4 cores, I found the mpidense versions running way faster than elemental. I have not been able to make the elemental version finish up for 2e4 so far (my patience runs out faster). >>>>>>> >>>>>>> What's going on here? I thought elemental was supposed to be superior for dense matrices. >>>>>>> >>>>>>> I can share the code if that's appropriate for this forum (sorry, I'm new here). >>>>>>> >>>>>>> Nidish >>>>>>> On Aug 6, 2020, at 23:01, Barry Smith > wrote: >>>>>>> On Aug 6, 2020, at 7:32 PM, Nidish > wrote: >>>>>>> >>>>>>> I'm relatively new to PETSc, and my applications involve (for the most part) dense matrix solves. >>>>>>> >>>>>>> I read in the documentation that this is an area PETSc does not specialize in but instead recommends external libraries such as Elemental. I'm wondering if there are any "best" practices in this regard. Some questions I'd like answered are: >>>>>>> >>>>>>> 1. Can I just declare my dense matrix as a sparse one and fill the whole matrix up? Do any of the others go this route? What're possible pitfalls/unfavorable outcomes for this? I understand the memory overhead probably shoots up. >>>>>>> >>>>>>> No, this isn't practical, the performance will be terrible. >>>>>>> >>>>>>> 2. Are there any specific guidelines on when I can expect elemental to perform better in parallel than in serial? >>>>>>> >>>>>>> Because the computation to communication ratio for dense matrices is higher than for sparse you will see better parallel performance for dense problems of a given size than sparse problems of a similar size. In other words parallelism can help for dense matrices for relatively small problems, of course the specifics of your machine hardware and software also play a role. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> Of course, I'm interesting in any other details that may be important in this regard. >>>>>>> >>>>>>> Thank you, >>>>>>> Nidish >>>>>>> >>>>>> >>>>> -- >>>>> Nidish >>>>> >>>> >>> -- >>> Nidish >> > -- > Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From baagaard at usgs.gov Sat Aug 8 20:51:24 2020 From: baagaard at usgs.gov (Aagaard, Brad T) Date: Sun, 9 Aug 2020 01:51:24 +0000 Subject: [petsc-users] FW: Using TS IMEX for a mechanical problem with a fault Message-ID: <88284575-5F3A-465F-B0C8-77317F3A020F@usgs.gov> From: Matthew Knepley Date: Friday, August 7, 2020 at 12:28 PM To: PETSc Cc: Brad Aagaard Subject: [EXTERNAL] Using TS IMEX for a mechanical problem with a fault We are using ARKIMEX?(tried many of the formulations) for a problem with elastodynamics. Thus we have, in first order form, ? u_t - v = 0 ? v_t - E(u) + C (v^+ - v^-) + f = 0 ? C^T lambda = 0 Here u is the displacement, v is the velocity, E is the elastic operator, C are the compatibility condition across the fault (v^+ and v^- are velocity on opposite sides of the fault), and lambda is the Lagrange multiplier enforcing the fault conditions. We are running an example that breaks a bar in the middle and lets elastic waves travel outward and eventually fall off the end (absorbing boundary conditions). We are splitting this into a LHS and RHS for IMEX. The LHS operator looks like ?/ M? 0? ? 0 \ |? 0? ?M? ?C | \? 
0 C^T? 0 / where M is the mass matrix and C is the operator from above. This gives the wrong solution, with the displacement being way too small and inconsistent with the velocity. However, if I multiply the RHS by M^{-1}, then the solution "looks" right. Do the IMEX methods assume that the LHS Jacobian is the identity somewhere? ? Thanks, ? ? ?Matt This is PyLith (https://github.com/geodynamics/pylith), so if someone wants to run it, we can help them out. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.cse.buffalo.edu/~knepley/ From nb25 at rice.edu Sun Aug 9 01:14:49 2020 From: nb25 at rice.edu (Nidish) Date: Sun, 9 Aug 2020 01:14:49 -0500 Subject: [petsc-users] Parallelizing Nonlinear Algebraic systems without DM Message-ID: <43abdbd5-b4e0-1df4-934b-0cdac1f26bc9@rice.edu> Hello, My question is related to solving a generic nonlinear algebraic system in parallel. I'm using the word generic to point to the fact that the system may or may not be from a finite element/mesh type object - it's just a residual function and a Jacobian function that I can provide. I followed the serial example in $PETSC_DIR/src/snes/tutorials/ex1.c pretty much verbatim (albeit with a slightly different objective function whose size can be controlled), and have been trying to extend it to a parallel application (the given example is strictly serial). When I run my code in serial everything works well and I get the solution I expect. However, even if I use just two cores, the code starts returning garbage. I had a feeling it has something to do with index ordering, but I want to get this code working without using a DM object. I'm attaching my code with this email. TL;DR Version: What modifications must be made in "$PETSC_DIR/src/snes/tutorials/ex1.c" to make it work in Parallel? Is it possible to make minimal modifications to make it run in parallel too? Thank You, Nidish -------------- next part -------------- A non-text attachment was scrubbed... Name: sness.c Type: text/x-csrc Size: 4060 bytes Desc: not available URL: From bsmith at petsc.dev Sun Aug 9 01:45:26 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 9 Aug 2020 01:45:26 -0500 Subject: [petsc-users] Parallelizing Nonlinear Algebraic systems without DM In-Reply-To: <43abdbd5-b4e0-1df4-934b-0cdac1f26bc9@rice.edu> References: <43abdbd5-b4e0-1df4-934b-0cdac1f26bc9@rice.edu> Message-ID: <2EE2743E-8DDB-472B-B158-44D947B5FE74@petsc.dev> > On Aug 9, 2020, at 1:14 AM, Nidish wrote: > > Hello, > > My question is related to solving a generic nonlinear algebraic system in parallel. I'm using the word generic to point to the fact that the system may or may not be from a finite element/mesh type object - it's just a residual function and a Jacobian function that I can provide. > > I followed the serial example in $PETSC_DIR/src/snes/tutorials/ex1.c pretty much verbatim (albeit with a slightly different objective function whose size can be controlled), and have been trying to extend it to a parallel application (the given example is strictly serial). > > When I run my code in serial everything works well and I get the solution I expect. However, even if I use just two cores, the code starts returning garbage. I had a feeling it has something to do with index ordering, but I want to get this code working without using a DM object. I'm attaching my code with this email. 
> > TL;DR Version: What modifications must be made in "$PETSC_DIR/src/snes/tutorials/ex1.c" to make it work in Parallel? Is it possible to make minimal modifications to make it run in parallel too? This is wrong: VecGetSize(x, &N); VecGetOwnershipRange(x, &is, &ie); /* VecGetArrayRead(x, &xx); */ VecGetArray(x, &xx); for (int i=is; i > Thank You, > Nidish > From nb25 at rice.edu Sun Aug 9 13:37:14 2020 From: nb25 at rice.edu (Nidish) Date: Sun, 9 Aug 2020 13:37:14 -0500 Subject: [petsc-users] Parallelizing Nonlinear Algebraic systems without DM In-Reply-To: <2EE2743E-8DDB-472B-B158-44D947B5FE74@petsc.dev> References: <43abdbd5-b4e0-1df4-934b-0cdac1f26bc9@rice.edu> <2EE2743E-8DDB-472B-B158-44D947B5FE74@petsc.dev> Message-ID: <97573d1d-7a3d-08b0-4cbc-d888c2ab18a1@rice.edu> On 8/9/20 1:45 AM, Barry Smith wrote: > >> On Aug 9, 2020, at 1:14 AM, Nidish wrote: >> >> Hello, >> >> My question is related to solving a generic nonlinear algebraic system in parallel. I'm using the word generic to point to the fact that the system may or may not be from a finite element/mesh type object - it's just a residual function and a Jacobian function that I can provide. >> >> I followed the serial example in $PETSC_DIR/src/snes/tutorials/ex1.c pretty much verbatim (albeit with a slightly different objective function whose size can be controlled), and have been trying to extend it to a parallel application (the given example is strictly serial). >> >> When I run my code in serial everything works well and I get the solution I expect. However, even if I use just two cores, the code starts returning garbage. I had a feeling it has something to do with index ordering, but I want to get this code working without using a DM object. I'm attaching my code with this email. >> >> TL;DR Version: What modifications must be made in "$PETSC_DIR/src/snes/tutorials/ex1.c" to make it work in Parallel? Is it possible to make minimal modifications to make it run in parallel too? > This is wrong: > > VecGetSize(x, &N); > VecGetOwnershipRange(x, &is, &ie); > /* VecGetArrayRead(x, &xx); */ > VecGetArray(x, &xx); > for (int i=is; i > You appear to be trying to mimic a DM example but without a DM. > > VecGetArray() always returns an array with indices from 0 to VecGetLocalSize() > > Thus you need to make your loops over local indices, not global indices. > > In addition you are likely to need "ghost" values of the input vectors. If you are not using DM then you need to have code that determines what ghost values are needed and create a VecScatter to manage communicating those ghost values. Look for examples that use VecScatterBegin in the formfunction routine for how this may be done. > > Barry I wasn't accounting for the ghost values! My compiler wasn't giving segfaults when I was accessing the arrays returned by VecGetArray beyond the allocation size. (the loop indices was me trying to be smart, I was accessing xx by xx[i -is] and so on). I've changed my implementation with VecCreateGhost - the code's slightly longer but it works, thank you for this! I also had another related question about the storage of the arrays from VecGetArray() calls. Do these functions allocate new locations in memory (and can be expected to be slow)? Even for the ghosted vectors I found distinct addresses for the ghost values. I'm assuming the addresses are distinct to accommodate for cluster programming, but should a user be judicious in calls to VecGetArray() ? 
Thank you, Nidish > > >> Thank You, >> Nidish >> -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Aug 9 13:43:51 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 9 Aug 2020 14:43:51 -0400 Subject: [petsc-users] Parallelizing Nonlinear Algebraic systems without DM In-Reply-To: <97573d1d-7a3d-08b0-4cbc-d888c2ab18a1@rice.edu> References: <43abdbd5-b4e0-1df4-934b-0cdac1f26bc9@rice.edu> <2EE2743E-8DDB-472B-B158-44D947B5FE74@petsc.dev> <97573d1d-7a3d-08b0-4cbc-d888c2ab18a1@rice.edu> Message-ID: On Sun, Aug 9, 2020 at 2:37 PM Nidish wrote: > > On 8/9/20 1:45 AM, Barry Smith wrote: > > On Aug 9, 2020, at 1:14 AM, Nidish wrote: > > Hello, > > My question is related to solving a generic nonlinear algebraic system in parallel. I'm using the word generic to point to the fact that the system may or may not be from a finite element/mesh type object - it's just a residual function and a Jacobian function that I can provide. > > I followed the serial example in $PETSC_DIR/src/snes/tutorials/ex1.c pretty much verbatim (albeit with a slightly different objective function whose size can be controlled), and have been trying to extend it to a parallel application (the given example is strictly serial). > > When I run my code in serial everything works well and I get the solution I expect. However, even if I use just two cores, the code starts returning garbage. I had a feeling it has something to do with index ordering, but I want to get this code working without using a DM object. I'm attaching my code with this email. > > TL;DR Version: What modifications must be made in "$PETSC_DIR/src/snes/tutorials/ex1.c" to make it work in Parallel? Is it possible to make minimal modifications to make it run in parallel too? > > This is wrong: > > VecGetSize(x, &N); > VecGetOwnershipRange(x, &is, &ie); > /* VecGetArrayRead(x, &xx); */ > VecGetArray(x, &xx); > for (int i=is; i > You appear to be trying to mimic a DM example but without a DM. > > VecGetArray() always returns an array with indices from 0 to VecGetLocalSize() > > Thus you need to make your loops over local indices, not global indices. > > In addition you are likely to need "ghost" values of the input vectors. If you are not using DM then you need to have code that determines what ghost values are needed and create a VecScatter to manage communicating those ghost values. Look for examples that use VecScatterBegin in the formfunction routine for how this may be done. > > Barry > > I wasn't accounting for the ghost values! My compiler wasn't giving > segfaults when I was accessing the arrays returned by VecGetArray beyond > the allocation size. (the loop indices was me trying to be smart, I was > accessing xx by xx[i -is] and so on). > > I've changed my implementation with VecCreateGhost - the code's slightly > longer but it works, thank you for this! > > > I also had another related question about the storage of the arrays from > VecGetArray() calls. Do these functions allocate new locations in memory > (and can be expected to be slow)? Even for the ghosted vectors I found > distinct addresses for the ghost values. I'm assuming the addresses are > distinct to accommodate for cluster programming, but should a user be > judicious in calls to VecGetArray() ? > No. For the vectors you are using, this is a no-copy operation. 
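To spell that out: VecGetArray() hands back a pointer into the Vec's own storage (no allocation, no copy) and just has to be paired with VecRestoreArray(); the ghost entries of a vector made with VecCreateGhost() are local copies of remote values stored contiguously after the owned entries, which is why they show up at their own addresses. A minimal sketch of the ghosted access pattern discussed above - the function name, the ghost layout (each rank except the last registering the first entry of the next rank as its single ghost), and the residual formula are all illustrative assumptions, not part of the original code - could look like:

  /* Residual f_i = x_{i+1} - x_i - 1 for i < N-1, and f_{N-1} = x_{N-1} - 1.
     x is assumed to have been created with VecCreateGhost(), each rank
     (except the last) listing one ghost: global index istart+n.            */
  PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
  {
    Vec                xloc;
    const PetscScalar *xx;
    PetscScalar       *ff;
    PetscInt           n, N, istart;

    PetscFunctionBeginUser;
    VecGetLocalSize(x, &n);
    VecGetSize(x, &N);
    VecGetOwnershipRange(x, &istart, NULL);
    VecGhostUpdateBegin(x, INSERT_VALUES, SCATTER_FORWARD); /* refresh the ghost copies */
    VecGhostUpdateEnd(x, INSERT_VALUES, SCATTER_FORWARD);
    VecGhostGetLocalForm(x, &xloc);   /* owned entries 0..n-1, then the ghost slots */
    VecGetArrayRead(xloc, &xx);       /* no copy: pointer into the local storage */
    VecGetArray(f, &ff);
    for (PetscInt i = 0; i < n; i++) {
      if (istart + i < N - 1) {
        PetscScalar xright = (i + 1 < n) ? xx[i + 1] : xx[n]; /* xx[n] is the ghost slot */
        ff[i] = xright - xx[i] - 1.0;
      } else {
        ff[i] = xx[i] - 1.0;          /* last global equation */
      }
    }
    VecRestoreArray(f, &ff);
    VecRestoreArrayRead(xloc, &xx);
    VecGhostRestoreLocalForm(x, &xloc);
    PetscFunctionReturn(0);
  }

All loops run over local indices 0..n-1, as Barry points out above; the global offset istart is only used to decide which global equation a local row corresponds to. Error checking (ierr/CHKERRQ) is omitted to keep the sketch short.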
Thanks, Matt > Thank you, > Nidish > > Thank You, > Nidish > > > -- > Nidish > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Sun Aug 9 14:11:06 2020 From: nb25 at rice.edu (Nidish) Date: Sun, 9 Aug 2020 14:11:06 -0500 Subject: [petsc-users] Parallelizing Nonlinear Algebraic systems without DM In-Reply-To: References: <43abdbd5-b4e0-1df4-934b-0cdac1f26bc9@rice.edu> <2EE2743E-8DDB-472B-B158-44D947B5FE74@petsc.dev> <97573d1d-7a3d-08b0-4cbc-d888c2ab18a1@rice.edu> Message-ID: <38793bac-6051-4562-5518-15fd5167d828@rice.edu> Alright, thank you. Nidish On 8/9/20 1:43 PM, Matthew Knepley wrote: > On Sun, Aug 9, 2020 at 2:37 PM Nidish > wrote: > > > On 8/9/20 1:45 AM, Barry Smith wrote: >>> On Aug 9, 2020, at 1:14 AM, Nidish wrote: >>> >>> Hello, >>> >>> My question is related to solving a generic nonlinear algebraic system in parallel. I'm using the word generic to point to the fact that the system may or may not be from a finite element/mesh type object - it's just a residual function and a Jacobian function that I can provide. >>> >>> I followed the serial example in $PETSC_DIR/src/snes/tutorials/ex1.c pretty much verbatim (albeit with a slightly different objective function whose size can be controlled), and have been trying to extend it to a parallel application (the given example is strictly serial). >>> >>> When I run my code in serial everything works well and I get the solution I expect. However, even if I use just two cores, the code starts returning garbage. I had a feeling it has something to do with index ordering, but I want to get this code working without using a DM object. I'm attaching my code with this email. >>> >>> TL;DR Version: What modifications must be made in "$PETSC_DIR/src/snes/tutorials/ex1.c" to make it work in Parallel? Is it possible to make minimal modifications to make it run in parallel too? >> This is wrong: >> >> VecGetSize(x, &N); >> VecGetOwnershipRange(x, &is, &ie); >> /* VecGetArrayRead(x, &xx); */ >> VecGetArray(x, &xx); >> for (int i=is; i> >> You appear to be trying to mimic a DM example but without a DM. >> >> VecGetArray() always returns an array with indices from 0 to VecGetLocalSize() >> >> Thus you need to make your loops over local indices, not global indices. >> >> In addition you are likely to need "ghost" values of the input vectors. If you are not using DM then you need to have code that determines what ghost values are needed and create a VecScatter to manage communicating those ghost values. Look for examples that use VecScatterBegin in the formfunction routine for how this may be done. >> >> Barry > > I wasn't accounting for the ghost values! My compiler wasn't > giving segfaults when I was accessing the arrays returned by > VecGetArray beyond the allocation size. (the loop indices was me > trying to be smart, I was accessing xx by xx[i -is] and so on). > > I've changed my implementation with VecCreateGhost - the code's > slightly longer but it works, thank you for this! > > > I also had another related question about the storage of the > arrays from VecGetArray() calls. Do these functions allocate new > locations in memory (and can be expected to be slow)? Even for the > ghosted vectors I found distinct addresses for the ghost values. 
> I'm assuming the addresses are distinct to accommodate for cluster > programming, but should a user be judicious in calls to > VecGetArray() ? > > No. For the vectors you are using, this is a no-copy operation. > > ? Thanks, > > ? ? Matt > > Thank you, > Nidish > >>> Thank You, >>> Nidish >>> > -- > Nidish > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Sun Aug 9 21:13:34 2020 From: nb25 at rice.edu (Nidish) Date: Sun, 9 Aug 2020 21:13:34 -0500 Subject: [petsc-users] Using MatSetValuesBlocked to build Finite Element Matrices with variable coefficients Message-ID: <4960b5fb-d8fc-ecbf-4083-618a8a8d6642@rice.edu> I'm trying to write a simple 1D finite element test case (linear elasticity). It was given in the manual that using an appropriate DM and calling MatSetValuesBlocked(). I'm however unable to do this for cases where I want to take runtime inputs for physical constants in the matrices. One thing that puzzles me is the use of "const PetscInt*" and "const PetscScalar*" in the declaration of MatSetValuesBlocked(). I can't think of why this is done (if I am to loop through the elements and call this function, I don't see why the indices and/or values must be constant). The declaration I found was: PetscErrorCode MatSetValues (Mat mat,PetscInt m,constPetscInt idxm[],PetscInt n,constPetscInt idxn[],constPetscScalar v[],InsertMode addv) Here's the DM setup for my code: DM mshdm; Mater steel = {1.0, 2e11, 7800, 3.0, 400.0};// A structure with {Area, Modulus, Density, Length, Force} (runtime vars) KSP ksp; DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, N, 1, 1, NULL, &mshdm);// N is a runtime variable DMSetFromOptions(mshdm); DMSetUp(mshdm); DMSetApplicationContext(mshdm, &steel);? /* User Context */ KSPCreate(PETSC_COMM_WORLD, &ksp); KSPSetDM(ksp, mshdm); KSPSetComputeRHS(ksp, RHSFun, &steel); KSPSetComputeOperators(ksp, JACFun, &steel); KSPSetFromOptions(ksp); I have no problems with the ComputeRHS function that I wrote, it seems to work fine. The ComputeOperators function on the other hand is giving me an "Argument out of range error" when MatSetValues() is called the second time (for e=1, discovered by debugging). Here's the function definition: PetscErrorCode JACFun(KSP ksp, Mat J, Mat jac, void* ctx) { ? Mater* pblm = (Mater*)ctx; ? PetscInt N, ix, n; ? PetscScalar Le; ? PetscErrorCode ierr; ? MatNullSpace nullspc; ? PetscScalar idx[2], idy[2], elstiff[2][2];// idx, idy are ids for MatSetValues() ? DM mshdm; ? ierr = KSPGetDM(ksp, &mshdm); ? ierr = DMDAGetInfo(mshdm, NULL, &N, NULL, NULL, &n, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL); ? DMDAGetCorners(mshdm, &ix, NULL, NULL, &n, NULL, NULL); ? // Construct Element Stiffness ? Le = pblm->L/N; ? elstiff[0][0] = pblm->A*pblm->E/Le; ? elstiff[0][1] = -pblm->A*pblm->E/Le; ? elstiff[1][0] = -pblm->A*pblm->E/Le; ? elstiff[1][1] = pblm->A*pblm->E/Le; ? 
for (int e=0; e From nb25 at rice.edu Sun Aug 9 22:05:19 2020 From: nb25 at rice.edu (Nidish) Date: Sun, 9 Aug 2020 22:05:19 -0500 Subject: [petsc-users] Using MatSetValuesBlocked to build Finite Element Matrices with variable coefficients In-Reply-To: <4960b5fb-d8fc-ecbf-4083-618a8a8d6642@rice.edu> References: <4960b5fb-d8fc-ecbf-4083-618a8a8d6642@rice.edu> Message-ID: Update: I had made some mistakes in the assembly in the previous version. They're fixed now (see below). On 8/9/20 9:13 PM, Nidish wrote: > > I'm trying to write a simple 1D finite element test case (linear > elasticity). It was given in the manual that using an appropriate DM > and calling MatSetValuesBlocked(). I'm however unable to do this for > cases where I want to take runtime inputs for physical constants in > the matrices. > > One thing that puzzles me is the use of "const PetscInt*" and "const > PetscScalar*" in the declaration of MatSetValuesBlocked(). I can't > think of why this is done (if I am to loop through the elements and > call this function, I don't see why the indices and/or values must be > constant). The declaration I found was: > > PetscErrorCode MatSetValues (Mat mat,PetscInt m,constPetscInt idxm[],PetscInt n,constPetscInt idxn[],constPetscScalar v[],InsertMode addv) > > Here's the DM setup for my code: > > DM mshdm; > Mater steel = {1.0, 2e11, 7800, 3.0, 400.0};// A structure with > {Area, Modulus, Density, Length, Force} (runtime vars) > KSP ksp; > > DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, N, 1, 1, NULL, > &mshdm);// N is a runtime variable > DMSetFromOptions(mshdm); > DMSetUp(mshdm); > DMSetApplicationContext(mshdm, &steel);? /* User Context */ > > KSPCreate(PETSC_COMM_WORLD, &ksp); > KSPSetDM(ksp, mshdm); > KSPSetComputeRHS(ksp, RHSFun, &steel); > KSPSetComputeOperators(ksp, JACFun, &steel); > KSPSetFromOptions(ksp); > > I have no problems with the ComputeRHS function that I wrote, it seems > to work fine. The ComputeOperators function on the other hand is > giving me an "Argument out of range error" when MatSetValues() is > called the second time (for e=1, discovered by debugging). Here's the > function definition: > > PetscErrorCode JACFun(KSP ksp, Mat J, Mat jac, void* ctx) > { > ? Mater* pblm = (Mater*)ctx; > ? PetscInt N, ix, n; > ? PetscScalar Le; > ? PetscErrorCode ierr; > ? MatNullSpace nullspc; > > ? PetscScalar idx[2], idy[2], elstiff[2][2];// idx, idy are ids > for MatSetValues() > ? DM mshdm; > > ? ierr = KSPGetDM(ksp, &mshdm); > ? ierr = DMDAGetInfo(mshdm, NULL, &N, NULL, NULL, &n, NULL, NULL, > NULL, NULL, NULL, NULL, NULL, NULL); > ? DMDAGetCorners(mshdm, &ix, NULL, NULL, &n, NULL, NULL); > > ? // Construct Element Stiffness > ? Le = pblm->L/N; > ? elstiff[0][0] = pblm->A*pblm->E/Le; > ? elstiff[0][1] = -pblm->A*pblm->E/Le; > ? elstiff[1][0] = -pblm->A*pblm->E/Le; > ? elstiff[1][1] = pblm->A*pblm->E/Le; > > PetscInt is, ie; > ? MatGetOwnershipRange(jac, &is, &ie); > ? for (int e=is; e<((ie local elements > ??? idx[0] = e; idx[1] = e+1; > ??? idy[0] = e; idy[1] = e+1; > > ??? // This is the version I want to get working, but throws an > argument out of range error > ??? MatSetValuesBlocked(jac, 2, (const PetscInt*)idx, 2, (const > PetscInt*)idy, > ??? ??? ??? ??? (const PetscScalar*)elstiff, ADD_VALUES); > > ??? // This version seemed to not construct the correct matrix > (overlapped elements were added) > ??? /* MatSetValue(jac, e, e, elstiff[0][0], ADD_VALUES); */ > ??? /* MatSetValue(jac, e, e+1, elstiff[0][1], ADD_VALUES); */ > ??? 
/* MatSetValue(jac, e+1, e, elstiff[1][0], ADD_VALUES); */ > ??? /* MatSetValue(jac, e+1, e+1, elstiff[1][1], ADD_VALUES); */ > ? } > ? MatAssemblyBegin(jac, MAT_FINAL_ASSEMBLY); > ? ierr = MatAssemblyEnd(jac, MAT_FINAL_ASSEMBLY); > > ? // I'm doing a "free-free" simulation, so constraining out the > nullspace > MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullspc); > ? MatSetNullSpace(J, nullspc); > ? MatNullSpaceDestroy(&nullspc); > ? PetscFunctionReturn(0); > } > > > Thank you, > Nidish -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Sun Aug 9 22:45:06 2020 From: nb25 at rice.edu (Nidish) Date: Sun, 9 Aug 2020 22:45:06 -0500 Subject: [petsc-users] [Closed] Using MatSetValuesBlocked to build Finite Element Matrices with variable coefficients In-Reply-To: References: <4960b5fb-d8fc-ecbf-4083-618a8a8d6642@rice.edu> Message-ID: <149c1ad4-ecf3-e358-a704-82e6f72c4894@rice.edu> Update: It was a type error (idx and idy below were declared as PetscScalar* instead of PetscInt*), I'm sorry. Guess I'm being punished for not paying attention to types. Nidish On 8/9/20 10:05 PM, Nidish wrote: > > Update: I had made some mistakes in the assembly in the previous > version. They're fixed now (see below). > > On 8/9/20 9:13 PM, Nidish wrote: >> >> I'm trying to write a simple 1D finite element test case (linear >> elasticity). It was given in the manual that using an appropriate DM >> and calling MatSetValuesBlocked(). I'm however unable to do this for >> cases where I want to take runtime inputs for physical constants in >> the matrices. >> >> One thing that puzzles me is the use of "const PetscInt*" and "const >> PetscScalar*" in the declaration of MatSetValuesBlocked(). I can't >> think of why this is done (if I am to loop through the elements and >> call this function, I don't see why the indices and/or values must be >> constant). The declaration I found was: >> >> PetscErrorCode MatSetValues (Mat mat,PetscInt m,constPetscInt idxm[],PetscInt n,constPetscInt idxn[],constPetscScalar v[],InsertMode addv) >> >> Here's the DM setup for my code: >> >> DM mshdm; >> Mater steel = {1.0, 2e11, 7800, 3.0, 400.0};// A structure with >> {Area, Modulus, Density, Length, Force} (runtime vars) >> KSP ksp; >> >> DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, N, 1, 1, NULL, >> &mshdm);// N is a runtime variable >> DMSetFromOptions(mshdm); >> DMSetUp(mshdm); >> DMSetApplicationContext(mshdm, &steel);? /* User Context */ >> >> KSPCreate(PETSC_COMM_WORLD, &ksp); >> KSPSetDM(ksp, mshdm); >> KSPSetComputeRHS(ksp, RHSFun, &steel); >> KSPSetComputeOperators(ksp, JACFun, &steel); >> KSPSetFromOptions(ksp); >> >> I have no problems with the ComputeRHS function that I wrote, it >> seems to work fine. The ComputeOperators function on the other hand >> is giving me an "Argument out of range error" when MatSetValues() is >> called the second time (for e=1, discovered by debugging). Here's the >> function definition: >> >> PetscErrorCode JACFun(KSP ksp, Mat J, Mat jac, void* ctx) >> { >> ? Mater* pblm = (Mater*)ctx; >> ? PetscInt N, ix, n; >> ? PetscScalar Le; >> ? PetscErrorCode ierr; >> ? MatNullSpace nullspc; >> >> ? PetscScalar idx[2], idy[2], elstiff[2][2];// idx, idy are ids >> for MatSetValues() >> ? DM mshdm; >> >> ? ierr = KSPGetDM(ksp, &mshdm); >> ? ierr = DMDAGetInfo(mshdm, NULL, &N, NULL, NULL, &n, NULL, NULL, >> NULL, NULL, NULL, NULL, NULL, NULL); >> ? DMDAGetCorners(mshdm, &ix, NULL, NULL, &n, NULL, NULL); >> >> ? 
// Construct Element Stiffness >> ? Le = pblm->L/N; >> ? elstiff[0][0] = pblm->A*pblm->E/Le; >> ? elstiff[0][1] = -pblm->A*pblm->E/Le; >> ? elstiff[1][0] = -pblm->A*pblm->E/Le; >> ? elstiff[1][1] = pblm->A*pblm->E/Le; >> >> PetscInt is, ie; >> ? MatGetOwnershipRange(jac, &is, &ie); >> ? for (int e=is; e<((ie> over local elements >> ??? idx[0] = e; idx[1] = e+1; >> ??? idy[0] = e; idy[1] = e+1; >> >> ??? // This is the version I want to get working, but throws an >> argument out of range error >> ??? MatSetValuesBlocked(jac, 2, (const PetscInt*)idx, 2, (const >> PetscInt*)idy, >> ??? ??? ??? ??? (const PetscScalar*)elstiff, ADD_VALUES); >> >> ??? // This version seemed to not construct the correct matrix >> (overlapped elements were added) >> ??? /* MatSetValue(jac, e, e, elstiff[0][0], ADD_VALUES); */ >> ??? /* MatSetValue(jac, e, e+1, elstiff[0][1], ADD_VALUES); */ >> ??? /* MatSetValue(jac, e+1, e, elstiff[1][0], ADD_VALUES); */ >> ??? /* MatSetValue(jac, e+1, e+1, elstiff[1][1], ADD_VALUES); */ >> ? } >> ? MatAssemblyBegin(jac, MAT_FINAL_ASSEMBLY); >> ? ierr = MatAssemblyEnd(jac, MAT_FINAL_ASSEMBLY); >> >> ? // I'm doing a "free-free" simulation, so constraining out the >> nullspace >> MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nullspc); >> ? MatSetNullSpace(J, nullspc); >> ? MatNullSpaceDestroy(&nullspc); >> ? PetscFunctionReturn(0); >> } >> >> >> Thank you, >> Nidish > -- > Nidish -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Mon Aug 10 15:26:33 2020 From: nb25 at rice.edu (Nidish) Date: Mon, 10 Aug 2020 15:26:33 -0500 Subject: [petsc-users] MatSetValues vs MatSetValuesBlocked Message-ID: <061d21a5-d6ee-17eb-2b2b-c34b12b33ab5@rice.edu> Hello, I've written a basic FE code of an Euler-Bernoulli Beam (4th order spatial deriv interpolated using Hermite elements). While assembling matrices I noticed something peculiar - if I conducted assembly using MatSetValues it works, while if I did so using MatSetValuesBlocked it throws me an argument out of range error. You can observe this in the attached file in lines 94-95 where I have one version commented out and the other version active. I.O.W., using ??? MatSetValues(jac, 4, (const PetscInt*)idx, 4, (const PetscInt*)idx,(const PetscScalar*)elstiff, ADD_VALUES); works, while ??? MatSetValuesBlocked(jac, 4, (const PetscInt*)idx, 4, (const PetscInt*)idx,(const PetscScalar*)elstiff, ADD_VALUES); does NOT work. Also from the documentation I could not really understand what the difference was between these two. Thank you, Nidish -------------- next part -------------- A non-text attachment was scrubbed... Name: fe1dbeam.c Type: text/x-csrc Size: 2980 bytes Desc: not available URL: From jed at jedbrown.org Mon Aug 10 16:55:07 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 10 Aug 2020 15:55:07 -0600 Subject: [petsc-users] MatSetValues vs MatSetValuesBlocked In-Reply-To: <061d21a5-d6ee-17eb-2b2b-c34b12b33ab5@rice.edu> References: <061d21a5-d6ee-17eb-2b2b-c34b12b33ab5@rice.edu> Message-ID: <87364ub1z8.fsf@jedbrown.org> It looks like you haven't called MatSetBlockSize. Is this a 1D model with scalar displacement, thus block size of 1? Nidish writes: > Hello, > > I've written a basic FE code of an Euler-Bernoulli Beam (4th order > spatial deriv interpolated using Hermite elements). 
> > While assembling matrices I noticed something peculiar - if I conducted > assembly using MatSetValues it works, while if I did so using > MatSetValuesBlocked it throws me an argument out of range error. You can > observe this in the attached file in lines 94-95 where I have one > version commented out and the other version active. > > I.O.W., using > ??? MatSetValues(jac, 4, (const PetscInt*)idx, 4, (const > PetscInt*)idx,(const PetscScalar*)elstiff, ADD_VALUES); > works, while > ??? MatSetValuesBlocked(jac, 4, (const PetscInt*)idx, 4, (const > PetscInt*)idx,(const PetscScalar*)elstiff, ADD_VALUES); > does NOT work. > > Also from the documentation I could not really understand what the > difference was between these two. > > Thank you, > Nidish > static char help[] = "1D Euler-Bernoulli Beam.\n"; > > #include > #include > #include > #include > #include > > PetscErrorCode RHSFun(KSP, Vec, void*); > PetscErrorCode StiffMat(KSP, Mat, Mat, void*); > > typedef struct { > PetscScalar A, E, I2, rho, L, F; > }Mater; > > int main(int nargs, char *sargs[]) > { > PetscErrorCode ierr; > PetscInt N=10; > PetscMPIInt rank, size; > ierr = PetscInitialize(&nargs, &sargs, (char*)0, help); > MPI_Comm_rank(PETSC_COMM_WORLD, &rank); > MPI_Comm_size(PETSC_COMM_WORLD, &size); > > PetscOptionsGetInt(NULL, NULL, "-N", &N, NULL); > > DM mshdm; > Mater steelbm = {1.0, 2e11, 1.0/12.0, 7800, 3.0, 1e6}; > KSP ksp; > > DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, N, 2, 1, NULL, &mshdm); > DMSetFromOptions(mshdm); > DMSetUp(mshdm); > DMSetApplicationContext(mshdm, &steelbm); /* User Context */ > > KSPCreate(PETSC_COMM_WORLD, &ksp); > KSPSetDM(ksp, mshdm); > KSPSetComputeRHS(ksp, RHSFun, &steelbm); > KSPSetComputeOperators(ksp, StiffMat, &steelbm); > KSPSetFromOptions(ksp); > > KSPSolve(ksp, NULL, NULL); > > DMDestroy(&mshdm); > KSPDestroy(&ksp); > PetscFinalize(); > return 0; > } > > PetscErrorCode RHSFun(KSP ksp, Vec b, void* ctx) > { > Mater* steelbm = (Mater*)ctx; > PetscInt N; > PetscErrorCode ierr; > > VecGetSize(b, &N); > for (int i=0; i VecSetValue(b, i, 0.0, INSERT_VALUES); > VecSetValue(b, N-2, steelbm->F, INSERT_VALUES); > VecAssemblyBegin(b); > ierr = VecAssemblyEnd(b); > > PetscFunctionReturn(0); > } > > PetscErrorCode StiffMat(KSP ksp, Mat J, Mat jac, void* ctx) > { > Mater* steelbm = (Mater*)ctx; > PetscInt N, ix, n; > PetscScalar Le, EIbLec; > PetscErrorCode ierr; > DM mshdm; > > ierr = KSPGetDM(ksp, &mshdm); > ierr = DMDAGetInfo(mshdm, NULL, &N, NULL, NULL, &n, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL); > DMDAGetCorners(mshdm, &ix, NULL, NULL, &n, NULL, NULL); > > Le = steelbm->L/(N-1); > EIbLec = steelbm->E*steelbm->I2/pow(Le, 3); > PetscInt idx[4]; > PetscScalar elstiff[4][4] = {{EIbLec*12.0, EIbLec*6*Le, -EIbLec*12.0, EIbLec*6*Le}, > {EIbLec*6*Le, EIbLec*4*Le*Le, -EIbLec*6*Le, EIbLec*2*Le*Le}, > {-EIbLec*12.0, -EIbLec*6.0*Le, EIbLec*12.0, -EIbLec*6*Le}, > {EIbLec*6*Le, EIbLec*2*Le*Le, -EIbLec*6.0*Le, EIbLec*4*Le*Le}}; > > PetscInt is, ie; > is = ix; ie = ix+n; > PetscMPIInt rank; > MPI_Comm_rank(PETSC_COMM_WORLD, &rank); > > for (int e=is; e<((ie idx[0] = 2*e; idx[1] = 2*e+1; idx[2] = 2*e+2; idx[3] = 2*e+3; > > /* MatSetValuesBlocked(jac, 4, (const PetscInt*)idx, 4, (const PetscInt*)idx, */ > /* (const PetscScalar*)elstiff, ADD_VALUES); */ > MatSetValues(jac, 4, (const PetscInt*)idx, 4, (const PetscInt*)idx, > (const PetscScalar*)elstiff, ADD_VALUES); > } > MatAssemblyBegin(jac, MAT_FINAL_ASSEMBLY); > ierr = MatAssemblyEnd(jac, MAT_FINAL_ASSEMBLY); > > PetscInt rows[] = {0, 1}; > 
MatZeroRowsColumns(jac, 2, rows, EIbLec*12.0, NULL, NULL); > > PetscFunctionReturn(0); > } From nb25 at rice.edu Mon Aug 10 17:10:31 2020 From: nb25 at rice.edu (Nidish) Date: Mon, 10 Aug 2020 17:10:31 -0500 Subject: [petsc-users] MatSetValues vs MatSetValuesBlocked In-Reply-To: <87364ub1z8.fsf@jedbrown.org> References: <061d21a5-d6ee-17eb-2b2b-c34b12b33ab5@rice.edu> <87364ub1z8.fsf@jedbrown.org> Message-ID: It's a 1D model with displacements and rotations as DoFs at each node. I couldn't find much in the manual on MatSetBlockSize - could you provide some more information on its use? I thought since I've setup the system using DMDACreate1d (I've given 2dofs per node and a stencil width of 1 there), the matrix object should have the nonzero elements preallocated. Here's the call to DMDACreate1d: DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, N, 2, 1, NULL, &mshdm); Thank you, Nidish On 8/10/20 4:55 PM, Jed Brown wrote: > It looks like you haven't called MatSetBlockSize. Is this a 1D model with scalar displacement, thus block size of 1? > > Nidish writes: > >> Hello, >> >> I've written a basic FE code of an Euler-Bernoulli Beam (4th order >> spatial deriv interpolated using Hermite elements). >> >> While assembling matrices I noticed something peculiar - if I conducted >> assembly using MatSetValues it works, while if I did so using >> MatSetValuesBlocked it throws me an argument out of range error. You can >> observe this in the attached file in lines 94-95 where I have one >> version commented out and the other version active. >> >> I.O.W., using >> ??? MatSetValues(jac, 4, (const PetscInt*)idx, 4, (const >> PetscInt*)idx,(const PetscScalar*)elstiff, ADD_VALUES); >> works, while >> ??? MatSetValuesBlocked(jac, 4, (const PetscInt*)idx, 4, (const >> PetscInt*)idx,(const PetscScalar*)elstiff, ADD_VALUES); >> does NOT work. >> >> Also from the documentation I could not really understand what the >> difference was between these two. 
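To make the difference concrete with the beam code's own names (a sketch only: it assumes the 4x4 element matrix elstiff and the element loop variable e from StiffMat() above): because the DMDA was created with 2 dof per node, the Jacobian it provides has block size 2, so MatSetValuesBlocked() takes node (block) indices together with a full 2 nodes x 2 dof = 4x4 block of values, whereas MatSetValues() takes the individual scalar row/column indices:

  PetscInt nodes[2] = {e, e + 1};                       /* block indices = node numbers      */
  PetscInt dofs[4]  = {2*e, 2*e + 1, 2*e + 2, 2*e + 3}; /* scalar indices of the same rows   */

  MatSetValuesBlocked(jac, 2, nodes, 2, nodes, &elstiff[0][0], ADD_VALUES);
  /* ... inserts exactly the same entries as ... */
  MatSetValues(jac, 4, dofs, 4, dofs, &elstiff[0][0], ADD_VALUES);

With the block form, handing over the four scalar indices instead of the two node indices makes the routine expect an 8x8 value array (four blocks of size 2 in each direction), which is not what the 4x4 elstiff provides.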
>> >> Thank you, >> Nidish >> static char help[] = "1D Euler-Bernoulli Beam.\n"; >> >> #include >> #include >> #include >> #include >> #include >> >> PetscErrorCode RHSFun(KSP, Vec, void*); >> PetscErrorCode StiffMat(KSP, Mat, Mat, void*); >> >> typedef struct { >> PetscScalar A, E, I2, rho, L, F; >> }Mater; >> >> int main(int nargs, char *sargs[]) >> { >> PetscErrorCode ierr; >> PetscInt N=10; >> PetscMPIInt rank, size; >> ierr = PetscInitialize(&nargs, &sargs, (char*)0, help); >> MPI_Comm_rank(PETSC_COMM_WORLD, &rank); >> MPI_Comm_size(PETSC_COMM_WORLD, &size); >> >> PetscOptionsGetInt(NULL, NULL, "-N", &N, NULL); >> >> DM mshdm; >> Mater steelbm = {1.0, 2e11, 1.0/12.0, 7800, 3.0, 1e6}; >> KSP ksp; >> >> DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, N, 2, 1, NULL, &mshdm); >> DMSetFromOptions(mshdm); >> DMSetUp(mshdm); >> DMSetApplicationContext(mshdm, &steelbm); /* User Context */ >> >> KSPCreate(PETSC_COMM_WORLD, &ksp); >> KSPSetDM(ksp, mshdm); >> KSPSetComputeRHS(ksp, RHSFun, &steelbm); >> KSPSetComputeOperators(ksp, StiffMat, &steelbm); >> KSPSetFromOptions(ksp); >> >> KSPSolve(ksp, NULL, NULL); >> >> DMDestroy(&mshdm); >> KSPDestroy(&ksp); >> PetscFinalize(); >> return 0; >> } >> >> PetscErrorCode RHSFun(KSP ksp, Vec b, void* ctx) >> { >> Mater* steelbm = (Mater*)ctx; >> PetscInt N; >> PetscErrorCode ierr; >> >> VecGetSize(b, &N); >> for (int i=0; i> VecSetValue(b, i, 0.0, INSERT_VALUES); >> VecSetValue(b, N-2, steelbm->F, INSERT_VALUES); >> VecAssemblyBegin(b); >> ierr = VecAssemblyEnd(b); >> >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode StiffMat(KSP ksp, Mat J, Mat jac, void* ctx) >> { >> Mater* steelbm = (Mater*)ctx; >> PetscInt N, ix, n; >> PetscScalar Le, EIbLec; >> PetscErrorCode ierr; >> DM mshdm; >> >> ierr = KSPGetDM(ksp, &mshdm); >> ierr = DMDAGetInfo(mshdm, NULL, &N, NULL, NULL, &n, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL); >> DMDAGetCorners(mshdm, &ix, NULL, NULL, &n, NULL, NULL); >> >> Le = steelbm->L/(N-1); >> EIbLec = steelbm->E*steelbm->I2/pow(Le, 3); >> PetscInt idx[4]; >> PetscScalar elstiff[4][4] = {{EIbLec*12.0, EIbLec*6*Le, -EIbLec*12.0, EIbLec*6*Le}, >> {EIbLec*6*Le, EIbLec*4*Le*Le, -EIbLec*6*Le, EIbLec*2*Le*Le}, >> {-EIbLec*12.0, -EIbLec*6.0*Le, EIbLec*12.0, -EIbLec*6*Le}, >> {EIbLec*6*Le, EIbLec*2*Le*Le, -EIbLec*6.0*Le, EIbLec*4*Le*Le}}; >> >> PetscInt is, ie; >> is = ix; ie = ix+n; >> PetscMPIInt rank; >> MPI_Comm_rank(PETSC_COMM_WORLD, &rank); >> >> for (int e=is; e<((ie> idx[0] = 2*e; idx[1] = 2*e+1; idx[2] = 2*e+2; idx[3] = 2*e+3; >> >> /* MatSetValuesBlocked(jac, 4, (const PetscInt*)idx, 4, (const PetscInt*)idx, */ >> /* (const PetscScalar*)elstiff, ADD_VALUES); */ >> MatSetValues(jac, 4, (const PetscInt*)idx, 4, (const PetscInt*)idx, >> (const PetscScalar*)elstiff, ADD_VALUES); >> } >> MatAssemblyBegin(jac, MAT_FINAL_ASSEMBLY); >> ierr = MatAssemblyEnd(jac, MAT_FINAL_ASSEMBLY); >> >> PetscInt rows[] = {0, 1}; >> MatZeroRowsColumns(jac, 2, rows, EIbLec*12.0, NULL, NULL); >> >> PetscFunctionReturn(0); >> } -- Nidish From jed at jedbrown.org Mon Aug 10 17:16:13 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 10 Aug 2020 16:16:13 -0600 Subject: [petsc-users] MatSetValues vs MatSetValuesBlocked In-Reply-To: References: <061d21a5-d6ee-17eb-2b2b-c34b12b33ab5@rice.edu> <87364ub1z8.fsf@jedbrown.org> Message-ID: <87k0y69mfm.fsf@jedbrown.org> Nidish writes: > It's a 1D model with displacements and rotations as DoFs at each node. > > I couldn't find much in the manual on MatSetBlockSize - could you > provide some more information on its use? 
> > I thought since I've setup the system using DMDACreate1d (I've given > 2dofs per node and a stencil width of 1 there), the matrix object should > have the nonzero elements preallocated. Here's the call to DMDACreate1d: > > DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, N, 2, 1, NULL, &mshdm); Ah, that will set the block size, but then it'll expect elstiff to be an 8x8 matrix where you've only passed 4x4. idx[0] = 2*e; idx[1] = 2*e+1; idx[2] = 2*e+2; idx[3] = 2*e+3; MatSetValuesBlocked(jac, 4, (const PetscInt*)idx, 4, (const PetscInt*)idx, (const PetscScalar*)elstiff, ADD_VALUES); You don't need the casts in either case, BTW. You probably want something like this. idx[0] = e; idx[1] = e + 1; MatSetValuesBlocked(jac, 2, idx, 2, idx, elstiff, ADD_VALUES); Also, it might be more convenient to call MatSetValuesBlockedStencil(), especially if you move to a multi-dimensional problem at some point. From nb25 at rice.edu Mon Aug 10 17:26:01 2020 From: nb25 at rice.edu (Nidish) Date: Mon, 10 Aug 2020 17:26:01 -0500 Subject: [petsc-users] MatSetValues vs MatSetValuesBlocked In-Reply-To: <87k0y69mfm.fsf@jedbrown.org> References: <061d21a5-d6ee-17eb-2b2b-c34b12b33ab5@rice.edu> <87364ub1z8.fsf@jedbrown.org> <87k0y69mfm.fsf@jedbrown.org> Message-ID: Ah I get it now, MatSetBlocked has to be set node-wise. I tried this and it works, thank you. The other question I had was why are the arguments for MatSetValues() and MatSetValuesBlocked() set to const PetscInt* and const PetscScalar*? instead of just PetscInt* and PetscScalar* ? I have the typecast there so my flycheck doesn't keep throwing me warnings on emacs ;) Thank You, Nidish On 8/10/20 5:16 PM, Jed Brown wrote: > Nidish writes: > >> It's a 1D model with displacements and rotations as DoFs at each node. >> >> I couldn't find much in the manual on MatSetBlockSize - could you >> provide some more information on its use? >> >> I thought since I've setup the system using DMDACreate1d (I've given >> 2dofs per node and a stencil width of 1 there), the matrix object should >> have the nonzero elements preallocated. Here's the call to DMDACreate1d: >> >> DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, N, 2, 1, NULL, &mshdm); > Ah, that will set the block size, but then it'll expect elstiff to be an 8x8 matrix where you've only passed 4x4. > > idx[0] = 2*e; idx[1] = 2*e+1; idx[2] = 2*e+2; idx[3] = 2*e+3; > > MatSetValuesBlocked(jac, 4, (const PetscInt*)idx, 4, (const PetscInt*)idx, > (const PetscScalar*)elstiff, ADD_VALUES); > > You don't need the casts in either case, BTW. You probably want something like this. > > idx[0] = e; idx[1] = e + 1; > > MatSetValuesBlocked(jac, 2, idx, 2, idx, elstiff, ADD_VALUES); > > Also, it might be more convenient to call MatSetValuesBlockedStencil(), especially if you move to a multi-dimensional problem at some point. -- Nidish From jed at jedbrown.org Mon Aug 10 17:35:09 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 10 Aug 2020 16:35:09 -0600 Subject: [petsc-users] MatSetValues vs MatSetValuesBlocked In-Reply-To: References: <061d21a5-d6ee-17eb-2b2b-c34b12b33ab5@rice.edu> <87364ub1z8.fsf@jedbrown.org> <87k0y69mfm.fsf@jedbrown.org> Message-ID: <87blji9lk2.fsf@jedbrown.org> Nidish writes: > Ah I get it now, MatSetBlocked has to be set node-wise. I tried this and > it works, thank you. > > The other question I had was why are the arguments for MatSetValues() > and MatSetValuesBlocked() set to const PetscInt* and const PetscScalar*? > instead of just PetscInt* and PetscScalar* ? 
I have the typecast there > so my flycheck doesn't keep throwing me warnings on emacs ;) Your flycheck must be misconfigured. I use flycheck with clangd, but it doesn't have a problem with that (this more specific type qualifier can always be added without a cast). To pick a more mundane example, nobody casts the second argument. void *memcpy(void *dest, const void *src, size_t n); From knepley at gmail.com Mon Aug 10 17:40:20 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 10 Aug 2020 18:40:20 -0400 Subject: [petsc-users] MatSetValues vs MatSetValuesBlocked In-Reply-To: References: <061d21a5-d6ee-17eb-2b2b-c34b12b33ab5@rice.edu> <87364ub1z8.fsf@jedbrown.org> <87k0y69mfm.fsf@jedbrown.org> Message-ID: On Mon, Aug 10, 2020 at 6:26 PM Nidish wrote: > Ah I get it now, MatSetBlocked has to be set node-wise. I tried this and > it works, thank you. > > The other question I had was why are the arguments for MatSetValues() > and MatSetValuesBlocked() set to const PetscInt* and const PetscScalar* > instead of just PetscInt* and PetscScalar* ? I have the typecast there > so my flycheck doesn't keep throwing me warnings on emacs ;) > Jed is correct that this cast is implicit. The idea here is to tell the caller that we will not change the contents of the arrays that you pass in. Thanks, Matt > Thank You, > Nidish > > On 8/10/20 5:16 PM, Jed Brown wrote: > > Nidish writes: > > > >> It's a 1D model with displacements and rotations as DoFs at each node. > >> > >> I couldn't find much in the manual on MatSetBlockSize - could you > >> provide some more information on its use? > >> > >> I thought since I've setup the system using DMDACreate1d (I've given > >> 2dofs per node and a stencil width of 1 there), the matrix object should > >> have the nonzero elements preallocated. Here's the call to DMDACreate1d: > >> > >> DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, N, 2, 1, NULL, > &mshdm); > > Ah, that will set the block size, but then it'll expect elstiff to be an > 8x8 matrix where you've only passed 4x4. > > > > idx[0] = 2*e; idx[1] = 2*e+1; idx[2] = 2*e+2; idx[3] = 2*e+3; > > > > MatSetValuesBlocked(jac, 4, (const PetscInt*)idx, 4, (const > PetscInt*)idx, > > (const PetscScalar*)elstiff, ADD_VALUES); > > > > You don't need the casts in either case, BTW. You probably want > something like this. > > > > idx[0] = e; idx[1] = e + 1; > > > > MatSetValuesBlocked(jac, 2, idx, 2, idx, elstiff, ADD_VALUES); > > > > Also, it might be more convenient to call MatSetValuesBlockedStencil(), > especially if you move to a multi-dimensional problem at some point. > -- > Nidish > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Mon Aug 10 17:48:11 2020 From: nb25 at rice.edu (Nidish) Date: Mon, 10 Aug 2020 17:48:11 -0500 Subject: [petsc-users] MatSetValues vs MatSetValuesBlocked In-Reply-To: <87blji9lk2.fsf@jedbrown.org> References: <061d21a5-d6ee-17eb-2b2b-c34b12b33ab5@rice.edu> <87364ub1z8.fsf@jedbrown.org> <87k0y69mfm.fsf@jedbrown.org> <87blji9lk2.fsf@jedbrown.org> Message-ID: Urgh I must've been blind - flycheck was throwing warnings only if I didn't cast elstiff[][] into const PetscScalar* . Thanks for the responses! Nidish On 8/10/20 5:35 PM, Jed Brown wrote: > Nidish writes: > >> Ah I get it now, MatSetBlocked has to be set node-wise. 
I tried this and >> it works, thank you. >> >> The other question I had was why are the arguments for MatSetValues() >> and MatSetValuesBlocked() set to const PetscInt* and const PetscScalar* >> instead of just PetscInt* and PetscScalar* ? I have the typecast there >> so my flycheck doesn't keep throwing me warnings on emacs ;) > Your flycheck must be misconfigured. I use flycheck with clangd, but it doesn't have a problem with that (this more specific type qualifier can always be added without a cast). > > To pick a more mundane example, nobody casts the second argument. > > void *memcpy(void *dest, const void *src, size_t n); -- Nidish From nb25 at rice.edu Mon Aug 10 17:48:27 2020 From: nb25 at rice.edu (Nidish) Date: Mon, 10 Aug 2020 17:48:27 -0500 Subject: [petsc-users] MatSetValues vs MatSetValuesBlocked In-Reply-To: References: <061d21a5-d6ee-17eb-2b2b-c34b12b33ab5@rice.edu> <87364ub1z8.fsf@jedbrown.org> <87k0y69mfm.fsf@jedbrown.org> Message-ID: Ah I get it, thanks! On 8/10/20 5:40 PM, Matthew Knepley wrote: > On Mon, Aug 10, 2020 at 6:26 PM Nidish > wrote: > > Ah I get it now, MatSetBlocked has to be set node-wise. I tried > this and > it works, thank you. > > The other question I had was why are the arguments for MatSetValues() > and MatSetValuesBlocked() set to const PetscInt* and const > PetscScalar* > instead of just PetscInt* and PetscScalar* ? I have the typecast > there > so my flycheck doesn't keep throwing me warnings on emacs ;) > > > Jed is correct that this cast is implicit. The idea here is to tell > the caller that we will not change the contents of the arrays that you > pass in. > > ? Thanks, > > ? ? ?Matt > > Thank You, > Nidish > > On 8/10/20 5:16 PM, Jed Brown wrote: > > Nidish > writes: > > > >> It's a 1D model with displacements and rotations as DoFs at > each node. > >> > >> I couldn't find much in the manual on MatSetBlockSize - could you > >> provide some more information on its use? > >> > >> I thought since I've setup the system using DMDACreate1d (I've > given > >> 2dofs per node and a stencil width of 1 there), the matrix > object should > >> have the nonzero elements preallocated. Here's the call to > DMDACreate1d: > >> > >>? ? ?DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, N, 2, 1, > NULL, &mshdm); > > Ah, that will set the block size, but then it'll expect elstiff > to be an 8x8 matrix where you've only passed 4x4. > > > >? ? ? idx[0] = 2*e; idx[1] = 2*e+1; idx[2] = 2*e+2; idx[3] = 2*e+3; > > > >? ? ? MatSetValuesBlocked(jac, 4, (const PetscInt*)idx, 4, (const > PetscInt*)idx, > >? ? ? ? ? ? ? ? ? ? ? ?(const PetscScalar*)elstiff, ADD_VALUES); > > > > You don't need the casts in either case, BTW.? You probably want > something like this. > > > >? ? ? idx[0] = e; idx[1] = e + 1; > > > >? ? ? MatSetValuesBlocked(jac, 2, idx, 2, idx, elstiff, ADD_VALUES); > > > > Also, it might be more convenient to call > MatSetValuesBlockedStencil(), especially if you move to a > multi-dimensional problem at some point. > -- > Nidish > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... 
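Putting Jed's suggestions from this thread together, the element loop inside StiffMat() in the beam example might end up looking roughly like the sketch below. It reuses jac, is, ie, N and the 4x4 elstiff already defined there; this is an untested illustration, not the final code from the thread.

    /* Blocked assembly for the 2-dof-per-node beam: DMDACreate1d(..., dof=2, ...)
       has already set the matrix block size to 2, so idx[] holds node (block)
       indices and the same 4x4 elstiff is read as 2x2 blocks of size 2x2. */
    for (int e = is; e < ie && e < N-1; e++) {   /* locally owned elements, as in the original loop */
      PetscInt idx[2] = {e, e+1};                /* the two nodes of element e */
      MatSetValuesBlocked(jac, 2, idx, 2, idx, &elstiff[0][0], ADD_VALUES);
    }
    MatAssemblyBegin(jac, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(jac, MAT_FINAL_ASSEMBLY);

As Jed notes, MatSetValuesBlockedStencil() with MatStencil entries would avoid computing global node indices by hand, which becomes more convenient in higher dimensions.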
URL: From jed at jedbrown.org Mon Aug 10 18:00:57 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 10 Aug 2020 17:00:57 -0600 Subject: [petsc-users] MatSetValues vs MatSetValuesBlocked In-Reply-To: References: <061d21a5-d6ee-17eb-2b2b-c34b12b33ab5@rice.edu> <87364ub1z8.fsf@jedbrown.org> <87k0y69mfm.fsf@jedbrown.org> <87blji9lk2.fsf@jedbrown.org> Message-ID: <87y2mm85sm.fsf@jedbrown.org> Nidish writes: > Urgh I must've been blind - flycheck was throwing warnings only if I > didn't cast elstiff[][] into const PetscScalar* . You don't have to cast there either, just pass &elstiff[0][0], for example. From nb25 at rice.edu Mon Aug 10 18:12:29 2020 From: nb25 at rice.edu (Nidish) Date: Mon, 10 Aug 2020 18:12:29 -0500 Subject: [petsc-users] MatSetValues vs MatSetValuesBlocked In-Reply-To: <87y2mm85sm.fsf@jedbrown.org> References: <061d21a5-d6ee-17eb-2b2b-c34b12b33ab5@rice.edu> <87364ub1z8.fsf@jedbrown.org> <87k0y69mfm.fsf@jedbrown.org> <87blji9lk2.fsf@jedbrown.org> <87y2mm85sm.fsf@jedbrown.org> Message-ID: <37109364-cd81-eaea-73ab-4f800cb3dce5@rice.edu> Oh that makes sense - I just need to use the first element's address. Thank you! On 8/10/20 6:00 PM, Jed Brown wrote: > Nidish writes: > >> Urgh I must've been blind - flycheck was throwing warnings only if I >> didn't cast elstiff[][] into const PetscScalar* . > You don't have to cast there either, just pass &elstiff[0][0], for example. -- Nidish From rlmackie862 at gmail.com Mon Aug 10 18:29:24 2020 From: rlmackie862 at gmail.com (Randall Mackie) Date: Mon, 10 Aug 2020 16:29:24 -0700 Subject: [petsc-users] question about creating a block matrix Message-ID: <92359ABB-EC4D-40C2-A34A-C69390EA37FC@gmail.com> Dear PETSc users - I am trying to create a block matrix but it is not clear to me what is the right way to do this. First, I create 2 sparse matrices J1 and J2 using two different DMDAs. Then I compute the products J1^T J1, and J2^T J2, which are different sized matrices. Since the matrices are already constructed and built, what is the best way to place those matrices into a block matrix? Does it work to create a composite DM, call DMCreateMatrix on the composite DM, then call MatGetLocalSubMatrix on the blocks, then simply do a MatCopy? Thanks for any advice, Randy M. From mbuerkle at web.de Mon Aug 10 19:50:39 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Tue, 11 Aug 2020 02:50:39 +0200 Subject: [petsc-users] componentwise matrix multiplication Message-ID: An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Mon Aug 10 21:05:24 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Tue, 11 Aug 2020 02:05:24 +0000 Subject: [petsc-users] componentwise matrix multiplication In-Reply-To: References: Message-ID: Marius, No. You may write one yourself. Let us know what matrix format do you have. We'll make suggestion to you. Hong ________________________________ From: petsc-users on behalf of Marius Buerkle Sent: Monday, August 10, 2020 7:50 PM To: PETSc users list Subject: [petsc-users] componentwise matrix multiplication Hi, Is there a componentwise matrix multiplication, similar to VecPointwiseMult ? Best, Marius -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Mon Aug 10 23:00:19 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 10 Aug 2020 22:00:19 -0600 Subject: [petsc-users] question about creating a block matrix In-Reply-To: <92359ABB-EC4D-40C2-A34A-C69390EA37FC@gmail.com> References: <92359ABB-EC4D-40C2-A34A-C69390EA37FC@gmail.com> Message-ID: <87sgct96i4.fsf@jedbrown.org> Randall Mackie writes: > Dear PETSc users - > > I am trying to create a block matrix but it is not clear to me what is the right way to do this. > > First, I create 2 sparse matrices J1 and J2 using two different DMDAs. > Then I compute the products J1^T J1, and J2^T J2, which are different sized matrices. > > Since the matrices are already constructed and built, what is the best way to place those matrices into a block matrix? Block diagonal or are they coupled? > Does it work to create a composite DM, call DMCreateMatrix on the composite DM, then call MatGetLocalSubMatrix on the blocks, then simply do a MatCopy? If the matrices are already assembled and there is no going back on that, it'd be easiest to use MatNest and then MatConvert (if you don't want a split solver). From bastian.loehrer at tu-dresden.de Tue Aug 11 06:58:23 2020 From: bastian.loehrer at tu-dresden.de (=?UTF-8?Q?Bastian_L=c3=b6hrer?=) Date: Tue, 11 Aug 2020 13:58:23 +0200 Subject: [petsc-users] Does an optimized compilation with -march=native on multi-CPU HPC cluster make sense? Message-ID: <02b4069f-ce1a-1571-75d6-3e7894fa1b97@tu-dresden.de> Dear PETSc users, we use PETSc in our code. Therefore, we have multiple PETSc compilations lying around, each compiled differently, e.g debug build using an Intel-compiler, optimized build for Intel-CPUs using an Intel-compiler, optimized build for AMD-Rome-CPUs using an Intel-compiler, several builds using GNU-compilers ... Prior to compiling our code, we essentially set $PETSC_DIR and $PETSC_ARCH to point to a suitable PETSc build. When I compile such an optimized PETSc build aimed at Intel-CPUs and using an Intel-compiler I do so with ??? COPTFLAGS="-axCOMMON-AVX512,CORE-AVX2,AVX ..." to address all Intel processors available on our cluster. However, I recently noticed that the PETSc compilation provided by our HPC-administrators was compiled with -march=native. Does that make sense? If so this implies that my optimization flags are unnecessary, does it not? I imagine that when using a PETSc previously compiled with march=native and I compile that into my code on a CPU different from the one that has been used when compiling PETSc, I end up without optimizations of PETSc. Is that correct? Best, Bastian From rlmackie862 at gmail.com Tue Aug 11 10:49:31 2020 From: rlmackie862 at gmail.com (Randall Mackie) Date: Tue, 11 Aug 2020 08:49:31 -0700 Subject: [petsc-users] question about creating a block matrix In-Reply-To: <87sgct96i4.fsf@jedbrown.org> References: <92359ABB-EC4D-40C2-A34A-C69390EA37FC@gmail.com> <87sgct96i4.fsf@jedbrown.org> Message-ID: <6A83292D-7FD2-4B97-8BAB-B22C8B20F0B4@gmail.com> > On Aug 10, 2020, at 9:00 PM, Jed Brown wrote: > > Randall Mackie writes: > >> Dear PETSc users - >> >> I am trying to create a block matrix but it is not clear to me what is the right way to do this. >> >> First, I create 2 sparse matrices J1 and J2 using two different DMDAs. >> Then I compute the products J1^T J1, and J2^T J2, which are different sized matrices. >> >> Since the matrices are already constructed and built, what is the best way to place those matrices into a block matrix? > > Block diagonal or are they coupled? 
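For what it's worth, the MatNest-then-MatConvert route suggested above might look roughly like the following sketch for the block-diagonal case. A11 and A22 are placeholder names here for the already assembled J1^T J1 and J2^T J2 products; the sketch is untested.

    /* Wrap the two existing matrices as a 2x2 block-diagonal MatNest,
       then (optionally) flatten to a monolithic AIJ matrix. */
    Mat blocks[4] = {A11, NULL,
                     NULL, A22};
    Mat Bnest, Baij;
    MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &Bnest);
    MatConvert(Bnest, MATAIJ, MAT_INITIAL_MATRIX, &Baij);  /* skip if a split solver is used */

Coupling blocks, when present, would simply go into the off-diagonal slots of blocks[] instead of NULL.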
As you intuit the problem is more complicated and there are indeed off-diagonal blocks, but I wanted to start simply at first before adding more complexity. > >> Does it work to create a composite DM, call DMCreateMatrix on the composite DM, then call MatGetLocalSubMatrix on the blocks, then simply do a MatCopy? > > If the matrices are already assembled and there is no going back on that, it'd be easiest to use MatNest and then MatConvert (if you don't want a split solver). While the matrices are already assembled, it would be possible (and maybe not as difficult as I first thought) to go back and assemble in local space. The J2 matrix is the result of an interpolation from grid 2 to grid 1 using an interpolator matrix already assembled. However, I could do the interpolation in local space without the interpolator matrix. This would have the benefit of not assuming any particular matrix format, which I think is what you?re suggesting (and what is suggested in the manual). In the meantime, I will try to use a MatNest just to confirm that everything is working as it should. Thanks for your help, Randy M. From jed at jedbrown.org Tue Aug 11 13:00:09 2020 From: jed at jedbrown.org (Jed Brown) Date: Tue, 11 Aug 2020 12:00:09 -0600 Subject: [petsc-users] Does an optimized compilation with -march=native on multi-CPU HPC cluster make sense? In-Reply-To: <02b4069f-ce1a-1571-75d6-3e7894fa1b97@tu-dresden.de> References: <02b4069f-ce1a-1571-75d6-3e7894fa1b97@tu-dresden.de> Message-ID: <87lfil83me.fsf@jedbrown.org> -march=native is whatever architecture it was built on. That might be a login node. You might note that without -mprefer-vector-width=512 (gcc/clang) or -qopt-zmm-usage=high (icc), the compiler will rarely if ever actually use AVX-512 (because it causes huge stalls while the frequency is dropped). It likely doesn't pay off for most sparse matrix work. In that case, you can just build for AVX2/FMA (Haswell/Broadwell, Rome) and it'll presumably work across all machines, with comparable performance on your skylake-avx512 systems. I'd encourage you to double-check this with benchmarking of your entire app. You can check if any AVX-512 instructions have been generated using objdump -d --prefix-addresses -M intel libpetsc.so | grep zmm0 Bastian L?hrer writes: > Dear PETSc users, > > we use PETSc in our code. > Therefore, we have multiple PETSc compilations lying around, each > compiled differently, > e.g debug build using an Intel-compiler, optimized build for Intel-CPUs > using an Intel-compiler, optimized build for AMD-Rome-CPUs using an > Intel-compiler, several builds using GNU-compilers ... > > Prior to compiling our code, we essentially set $PETSC_DIR and > $PETSC_ARCH to point to a suitable PETSc build. > > When I compile such an optimized PETSc build aimed at Intel-CPUs and > using an Intel-compiler I do so with > > ??? COPTFLAGS="-axCOMMON-AVX512,CORE-AVX2,AVX ..." > > to address all Intel processors available on our cluster. > > However, I recently noticed that the PETSc compilation provided by our > HPC-administrators was compiled with -march=native. > Does that make sense? If so this implies that my optimization flags are > unnecessary, does it not? > > I imagine that when using a PETSc previously compiled with march=native > and I compile that into my code on a CPU different from the one that has > been used when compiling PETSc, I end up without optimizations of PETSc. > Is that correct? 
> > Best, > Bastian From mbuerkle at web.de Tue Aug 11 20:45:11 2020 From: mbuerkle at web.de (Marius Buerkle) Date: Wed, 12 Aug 2020 03:45:11 +0200 Subject: [petsc-users] componentwise matrix multiplication In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 11 21:04:49 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 11 Aug 2020 21:04:49 -0500 Subject: [petsc-users] Does an optimized compilation with -march=native on multi-CPU HPC cluster make sense? In-Reply-To: <02b4069f-ce1a-1571-75d6-3e7894fa1b97@tu-dresden.de> References: <02b4069f-ce1a-1571-75d6-3e7894fa1b97@tu-dresden.de> Message-ID: <7C1F0D4C-F7DD-415A-8653-184EA68A9963@petsc.dev> > On Aug 11, 2020, at 6:58 AM, Bastian L?hrer wrote: > > Dear PETSc users, > > we use PETSc in our code. > Therefore, we have multiple PETSc compilations lying around, each compiled differently, > e.g debug build using an Intel-compiler, optimized build for Intel-CPUs using an Intel-compiler, optimized build for AMD-Rome-CPUs using an Intel-compiler, several builds using GNU-compilers ... > > Prior to compiling our code, we essentially set $PETSC_DIR and $PETSC_ARCH to point to a suitable PETSc build. > > When I compile such an optimized PETSc build aimed at Intel-CPUs and using an Intel-compiler I do so with > > COPTFLAGS="-axCOMMON-AVX512,CORE-AVX2,AVX ..." > > to address all Intel processors available on our cluster. > > However, I recently noticed that the PETSc compilation provided by our HPC-administrators was compiled with -march=native. > Does that make sense? If so this implies that my optimization flags are unnecessary, does it not? > > I imagine that when using a PETSc previously compiled with march=native and I compile that into my code on a CPU different from the one that has been used when compiling PETSc, I end up without optimizations of PETSc. Is that correct? This is a good question that suggests maybe we should add a check in PETSc. At runtime could we compare the options used for the compile (and what was built in the .o) to the hardware it is being run on and produce a warning if they are out of wack or one could do better? Presumably one could also do this with non-Intel hardware and GPUs also? Then at least users would know when they copied excutables to other hardware that they are not the best possible optimization for what they are running on. Barry > > Best, > Bastian > From hzhang at mcs.anl.gov Tue Aug 11 22:31:57 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Wed, 12 Aug 2020 03:31:57 +0000 Subject: [petsc-users] componentwise matrix multiplication In-Reply-To: References: , Message-ID: Marius, ________________________________ Ok I see, it is not so important I was just wondering. Presently I only need it for dense matrices, fow now I am just using MatDenseGetArray on both matrices and multiply both arrays pointwise, which is ok I guess. For non-dense matrices it may be more complicated. This is a good approach for sequential and parallel MatDense matrices. Hong Von: "Zhang, Hong" An: "Marius Buerkle" , "PETSc users list" Betreff: Re: [petsc-users] componentwise matrix multiplication Marius, No. You may write one yourself. Let us know what matrix format do you have. We'll make suggestion to you. 
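Concretely, that pointwise loop might look something like the sketch below for C_ij = A_ij * B_ij with three conforming MATDENSE matrices created with the same layout; each process only touches its own local rows. This is an untested illustration, with A, B, C as placeholder names.

    /* Elementwise (Hadamard) product of two dense matrices into C.
       The local array is column-major with leading dimension lda and
       holds the locally owned rows of all global columns. */
    PetscInt     m, N, ldaA, ldaB, ldaC;
    PetscScalar *a, *b, *c;
    MatGetLocalSize(A, &m, NULL);
    MatGetSize(A, NULL, &N);
    MatDenseGetLDA(A, &ldaA);
    MatDenseGetLDA(B, &ldaB);
    MatDenseGetLDA(C, &ldaC);
    MatDenseGetArray(A, &a);
    MatDenseGetArray(B, &b);
    MatDenseGetArray(C, &c);
    for (PetscInt j = 0; j < N; j++)
      for (PetscInt i = 0; i < m; i++)
        c[i + j*ldaC] = a[i + j*ldaA] * b[i + j*ldaB];
    MatDenseRestoreArray(A, &a);
    MatDenseRestoreArray(B, &b);
    MatDenseRestoreArray(C, &c);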
Hong ________________________________ From: petsc-users on behalf of Marius Buerkle Sent: Monday, August 10, 2020 7:50 PM To: PETSc users list Subject: [petsc-users] componentwise matrix multiplication Hi, Is there a componentwise matrix multiplication, similar to VecPointwiseMult ? Best, Marius -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicola.varini at gmail.com Wed Aug 12 03:18:58 2020 From: nicola.varini at gmail.com (nicola varini) Date: Wed, 12 Aug 2020 10:18:58 +0200 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: Dear all, following the suggestions I did resubmit the simulation with the petscrc below. However I do get the following error: ======== 7362 [592]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c 7363 [339]PETSC ERROR: Petsc has generated inconsistent data 7364 [339]PETSC ERROR: xGEQRF error 7365 [339]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 7367 [339]PETSC ERROR: /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 by nvarini Wed Aug 12 10:06:15 2020 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 --with-cuda-c=nvcc --with-cxxlib-autodetect=0 --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - -with-cxx=CC --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c 7375 [339]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c 7382 
[339]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7) 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c ======== I did try other pc_gamg_type but they fails as well. #PETSc Option Table entries: -ampere_dm_mat_type aijcusparse -ampere_dm_vec_type cuda -ampere_ksp_atol 1e-15 -ampere_ksp_initial_guess_nonzero yes -ampere_ksp_reuse_preconditioner yes -ampere_ksp_rtol 1e-7 -ampere_ksp_type dgmres -ampere_mg_levels_esteig_ksp_max_it 10 -ampere_mg_levels_esteig_ksp_type cg -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -ampere_mg_levels_ksp_type chebyshev -ampere_mg_levels_pc_type jacobi -ampere_pc_gamg_agg_nsmooths 1 -ampere_pc_gamg_coarse_eq_limit 10 -ampere_pc_gamg_reuse_interpolation true -ampere_pc_gamg_square_graph 1 -ampere_pc_gamg_threshold 0.05 -ampere_pc_gamg_threshold_scale .0 -ampere_pc_gamg_type agg -ampere_pc_type gamg -dm_mat_type aijcusparse -dm_vec_type cuda -log_view -poisson_dm_mat_type aijcusparse -poisson_dm_vec_type cuda -poisson_ksp_atol 1e-15 -poisson_ksp_initial_guess_nonzero yes -poisson_ksp_reuse_preconditioner yes -poisson_ksp_rtol 1e-7 -poisson_ksp_type dgmres -poisson_log_view -poisson_mg_levels_esteig_ksp_max_it 10 -poisson_mg_levels_esteig_ksp_type cg -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 -poisson_mg_levels_ksp_max_it 1 -poisson_mg_levels_ksp_type chebyshev -poisson_mg_levels_pc_type jacobi -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_gamg_coarse_eq_limit 10 -poisson_pc_gamg_reuse_interpolation true -poisson_pc_gamg_square_graph 1 -poisson_pc_gamg_threshold 0.05 -poisson_pc_gamg_threshold_scale .0 -poisson_pc_gamg_type agg -poisson_pc_type gamg -use_mat_nearnullspace true #End of PETSc Option Table entries Regards, Nicola Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams ha scritto: > > > On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini > wrote: > >> Nicola, >> >> You are actually not using the GPU properly, since you use HYPRE >> preconditioning, which is CPU only. One of your solvers is actually slower >> on ?GPU?. >> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with >> Jacobi preconditioning. Mark can help you out with the specific command >> line options. >> When it works properly, everything related to PC application is offloaded >> to the GPU, and you should expect to get the well-known and branded 10x >> (maybe more) speedup one is expecting from GPUs during KSPSolve >> >> > The speedup depends on the machine, but on SUMMIT, using enough CPUs to > saturate the memory bus vs all 6 GPUs the speedup is a function of problem > subdomain size. I saw 10x at about 100K equations/process. > > >> Doing what you want to do is one of the last optimization steps of an >> already optimized code before entering production. Yours is not even >> optimized for proper GPU usage yet. >> Also, any specific reason why you are using dgmres and fgmres? >> >> PETSc has not been designed with multi-threading in mind. You can achieve >> ?overlap? of the two solves by splitting the communicator. But then you >> need communications to let the two solutions talk to each other. >> >> Thanks >> Stefano >> >> >> On Aug 4, 2020, at 12:04 PM, nicola varini >> wrote: >> >> Dear all, thanks for your replies. The reason why I've asked if it is >> possible to overlap poisson and ampere is because they roughly >> take the same amount of time. 
Please find in attachment the profiling >> logs for only CPU and only GPU. >> Of course it is possible to split the MPI communicator and run each >> solver on different subcommunicator, however this would involve more >> communication. >> Did anyone ever tried to run 2 solvers with hyperthreading? >> Thanks >> >> >> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams ha >> scritto: >> >>> I suspect that the Poisson and Ampere's law solve are not coupled. You >>> might be able to duplicate the communicator and use two threads. You would >>> want to configure PETSc with threadsafty and threads and I think it >>> could/should work, but this mode is never used by anyone. >>> >>> That said, I would not recommend doing this unless you feel like playing >>> in computer science, as opposed to doing application science. The best case >>> scenario you get a speedup of 2x. That is a strict upper bound, but you >>> will never come close to it. Your hardware has some balance of CPU to GPU >>> processing rate. Your application has a balance of volume of work for your >>> two solves. They have to be the same to get close to 2x speedup and that >>> ratio(s) has to be 1:1. To be concrete, from what little I can guess about >>> your applications let's assume that the cost of each of these two solves is >>> about the same (eg, Laplacians on your domain and the best case scenario). >>> But, GPU machines are configured to have roughly 1-10% of capacity in the >>> GPUs, these days, that gives you an upper bound of about 10% speedup. That >>> is noise. Upshot, unless you configure your hardware to match this problem, >>> and the two solves have the same cost, you will not see close to 2x >>> speedup. Your time is better spent elsewhere. >>> >>> Mark >>> >>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: >>> >>>> You can use MPI and split the communicator so n-1 ranks create a DMDA >>>> for one part of your system and the other rank drives the GPU in the other >>>> part. They can all be part of the same coupled system on the full >>>> communicator, but PETSc doesn't currently support some ranks having their >>>> Vec arrays on GPU and others on host, so you'd be paying host-device >>>> transfer costs on each iteration (and that might swamp any performance >>>> benefit you would have gotten). >>>> >>>> In any case, be sure to think about the execution time of each part. >>>> Load balancing with matching time-to-solution for each part can be really >>>> hard. >>>> >>>> >>>> Barry Smith writes: >>>> >>>> > Nicola, >>>> > >>>> > This is really viable or practical at this time with PETSc. It is >>>> not impossible but requires careful coding with threads, another >>>> possibility is to use one half of the virtual GPUs for each solve, this is >>>> also not trivial. I would recommend first seeing what kind of performance >>>> you can get on the GPU for each type of solve and revist this idea in the >>>> future. >>>> > >>>> > Barry >>>> > >>>> > >>>> > >>>> > >>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini >>>> wrote: >>>> >> >>>> >> Hello, I would like to know if it is possible to overlap CPU and GPU >>>> with DMDA. >>>> >> I've a machine where each node has 1P100+1Haswell. >>>> >> I've to resolve Poisson and Ampere equation for each time step. >>>> >> I'm using 2D DMDA for each of them. Would be possible to compute >>>> poisson >>>> >> and ampere equation at the same time? One on CPU and the other on >>>> GPU? 
>>>> >> >>>> >> Thanks >>>> >>> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 12 04:14:58 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 12 Aug 2020 04:14:58 -0500 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: <5CE5A427-FCE2-43FB-9453-F83132CF0C2F@petsc.dev> Interesting, we don't see crashes in GAMG. Could you run with -ksp_view_mat binary -ksp_view_rhs binary this will create a file called binaryoutput, you can email it to petsc-maint at mcs.anl.gov or if it is too large for email post it somewhere and email the link from this we could possibly recreate the crash to see what is going wrong. Barry 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7) > On Aug 12, 2020, at 3:18 AM, nicola varini wrote: > > Dear all, following the suggestions I did resubmit the simulation with the petscrc below. > However I do get the following error: > ======== > 7362 [592]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7363 [339]PETSC ERROR: Petsc has generated inconsistent data > 7364 [339]PETSC ERROR: xGEQRF error > 7365 [339]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 > 7367 [339]PETSC ERROR: /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 by nvarini Wed Aug 12 10:06:15 2020 > 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 --with-cuda-c=nvcc --with-cxxlib-autodetect=0 --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - -with-cxx=CC --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include > 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c > 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > 7375 [339]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c > 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in 
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7) > 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > ======== > > I did try other pc_gamg_type but they fails as well. > > > #PETSc Option Table entries: > -ampere_dm_mat_type aijcusparse > -ampere_dm_vec_type cuda > -ampere_ksp_atol 1e-15 > -ampere_ksp_initial_guess_nonzero yes > -ampere_ksp_reuse_preconditioner yes > -ampere_ksp_rtol 1e-7 > -ampere_ksp_type dgmres > -ampere_mg_levels_esteig_ksp_max_it 10 > -ampere_mg_levels_esteig_ksp_type cg > -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > -ampere_mg_levels_ksp_type chebyshev > -ampere_mg_levels_pc_type jacobi > -ampere_pc_gamg_agg_nsmooths 1 > -ampere_pc_gamg_coarse_eq_limit 10 > -ampere_pc_gamg_reuse_interpolation true > -ampere_pc_gamg_square_graph 1 > -ampere_pc_gamg_threshold 0.05 > -ampere_pc_gamg_threshold_scale .0 > -ampere_pc_gamg_type agg > -ampere_pc_type gamg > -dm_mat_type aijcusparse > -dm_vec_type cuda > -log_view > -poisson_dm_mat_type aijcusparse > -poisson_dm_vec_type cuda > -poisson_ksp_atol 1e-15 > -poisson_ksp_initial_guess_nonzero yes > -poisson_ksp_reuse_preconditioner yes > -poisson_ksp_rtol 1e-7 > -poisson_ksp_type dgmres > -poisson_log_view > -poisson_mg_levels_esteig_ksp_max_it 10 > -poisson_mg_levels_esteig_ksp_type cg > -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > -poisson_mg_levels_ksp_max_it 1 > -poisson_mg_levels_ksp_type chebyshev > -poisson_mg_levels_pc_type jacobi > -poisson_pc_gamg_agg_nsmooths 1 > -poisson_pc_gamg_coarse_eq_limit 10 > -poisson_pc_gamg_reuse_interpolation true > -poisson_pc_gamg_square_graph 1 > -poisson_pc_gamg_threshold 0.05 > -poisson_pc_gamg_threshold_scale .0 > -poisson_pc_gamg_type agg > -poisson_pc_type gamg > -use_mat_nearnullspace true > #End of PETSc Option Table entries > > Regards, > > Nicola > > Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams > ha scritto: > > > On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini > wrote: > Nicola, > > You are actually not using the GPU properly, since you use HYPRE preconditioning, which is CPU only. One of your solvers is actually slower on ?GPU?. > For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with Jacobi preconditioning. Mark can help you out with the specific command line options. > When it works properly, everything related to PC application is offloaded to the GPU, and you should expect to get the well-known and branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve > > > The speedup depends on the machine, but on SUMMIT, using enough CPUs to saturate the memory bus vs all 6 GPUs the speedup is a function of problem subdomain size. I saw 10x at about 100K equations/process. > > Doing what you want to do is one of the last optimization steps of an already optimized code before entering production. Yours is not even optimized for proper GPU usage yet. > Also, any specific reason why you are using dgmres and fgmres? > > PETSc has not been designed with multi-threading in mind. 
You can achieve ?overlap? of the two solves by splitting the communicator. But then you need communications to let the two solutions talk to each other. > > Thanks > Stefano > > >> On Aug 4, 2020, at 12:04 PM, nicola varini > wrote: >> >> Dear all, thanks for your replies. The reason why I've asked if it is possible to overlap poisson and ampere is because they roughly >> take the same amount of time. Please find in attachment the profiling logs for only CPU and only GPU. >> Of course it is possible to split the MPI communicator and run each solver on different subcommunicator, however this would involve more communication. >> Did anyone ever tried to run 2 solvers with hyperthreading? >> Thanks >> >> >> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams > ha scritto: >> I suspect that the Poisson and Ampere's law solve are not coupled. You might be able to duplicate the communicator and use two threads. You would want to configure PETSc with threadsafty and threads and I think it could/should work, but this mode is never used by anyone. >> >> That said, I would not recommend doing this unless you feel like playing in computer science, as opposed to doing application science. The best case scenario you get a speedup of 2x. That is a strict upper bound, but you will never come close to it. Your hardware has some balance of CPU to GPU processing rate. Your application has a balance of volume of work for your two solves. They have to be the same to get close to 2x speedup and that ratio(s) has to be 1:1. To be concrete, from what little I can guess about your applications let's assume that the cost of each of these two solves is about the same (eg, Laplacians on your domain and the best case scenario). But, GPU machines are configured to have roughly 1-10% of capacity in the GPUs, these days, that gives you an upper bound of about 10% speedup. That is noise. Upshot, unless you configure your hardware to match this problem, and the two solves have the same cost, you will not see close to 2x speedup. Your time is better spent elsewhere. >> >> Mark >> >> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown > wrote: >> You can use MPI and split the communicator so n-1 ranks create a DMDA for one part of your system and the other rank drives the GPU in the other part. They can all be part of the same coupled system on the full communicator, but PETSc doesn't currently support some ranks having their Vec arrays on GPU and others on host, so you'd be paying host-device transfer costs on each iteration (and that might swamp any performance benefit you would have gotten). >> >> In any case, be sure to think about the execution time of each part. Load balancing with matching time-to-solution for each part can be really hard. >> >> >> Barry Smith > writes: >> >> > Nicola, >> > >> > This is really viable or practical at this time with PETSc. It is not impossible but requires careful coding with threads, another possibility is to use one half of the virtual GPUs for each solve, this is also not trivial. I would recommend first seeing what kind of performance you can get on the GPU for each type of solve and revist this idea in the future. >> > >> > Barry >> > >> > >> > >> > >> >> On Jul 31, 2020, at 9:23 AM, nicola varini > wrote: >> >> >> >> Hello, I would like to know if it is possible to overlap CPU and GPU with DMDA. >> >> I've a machine where each node has 1P100+1Haswell. >> >> I've to resolve Poisson and Ampere equation for each time step. >> >> I'm using 2D DMDA for each of them. 
Would be possible to compute poisson >> >> and ampere equation at the same time? One on CPU and the other on GPU? >> >> >> >> Thanks >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Wed Aug 12 05:10:27 2020 From: hgbk2008 at gmail.com (hg) Date: Wed, 12 Aug 2020 12:10:27 +0200 Subject: [petsc-users] petscfv manual pages Message-ID: Hello I would like to report that the manual pages for PetscFV has point to some broken links (TS ex11). It would be nice to see some examples of Petsc with Finite Volume. https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/FV/index.html Best regards Giang -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Aug 12 06:52:22 2020 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 12 Aug 2020 07:52:22 -0400 Subject: [petsc-users] Bus Error Message-ID: I'm getting seemingly random failures of late: Caught signal number 7 BUS: Bus Error, possibly illegal memory access Symptoms: 1) Seems to only happen (so far) on larger cases, 400-2000 cores 2) It doesn't happen right away -- this was running happily for several hours over several hundred time steps with no indication of bad health in the numerics 3) At least the total memory consumption seems to be within bounds, though I'm not sure about individual processes. e.g. slurm here reported Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) 4) running the same setup twice it fails at different points Any suggestions on what to look for? This is a bit painful to work on as I can only reproduce it on large runs and then it's seemingly random. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Aug 12 07:46:06 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 12 Aug 2020 08:46:06 -0400 Subject: [petsc-users] Bus Error In-Reply-To: References: Message-ID: On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry wrote: > I'm getting seemingly random failures of late: > Caught signal number 7 BUS: Bus Error, possibly illegal memory access > The first thing I would do is run valgrind on as wide an array of tests as you can. This will find problems on things that run completely fine. Thanks, Matt > Symptoms: > 1) Seems to only happen (so far) on larger cases, 400-2000 cores > 2) It doesn't happen right away -- this was running happily for several > hours over several hundred time steps with no indication of bad health in > the numerics > 3) At least the total memory consumption seems to be within bounds, though > I'm not sure about individual processes. e.g. slurm here reported Memory > Efficiency: 75.23% of 1.76 TB (180.00 GB/node) > 4) running the same setup twice it fails at different points > > Any suggestions on what to look for? This is a bit painful to work on as I > can only reproduce it on large runs and then it's seemingly random. > > > Thanks, > Mark > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicola.varini at gmail.com Wed Aug 12 09:30:27 2020 From: nicola.varini at gmail.com (nicola varini) Date: Wed, 12 Aug 2020 16:30:27 +0200 Subject: [petsc-users] overlap cpu and gpu? 
In-Reply-To: <5CE5A427-FCE2-43FB-9453-F83132CF0C2F@petsc.dev> References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> <5CE5A427-FCE2-43FB-9453-F83132CF0C2F@petsc.dev> Message-ID: Dear Barry, thanks for offering to look at this. I added the options you suggested but it did create empty files. So I did save files manually. At the link https://drive.google.com/file/d/17jqLJaMyWSuAe6XSVeXzXnGsM_R2DXCy/view?usp=sharing you can find a folder with: the poisson matrix, the ampere matrix, a miniapp that read the matrices and call the solver, the petscrc file, and the slurm log that reproduces the error. I look forward to hear from you. Thanks again, Nicola Il giorno mer 12 ago 2020 alle ore 11:15 Barry Smith ha scritto: > > Interesting, we don't see crashes in GAMG. Could you run with > > -ksp_view_mat binary -ksp_view_rhs binary > > this will create a file called binaryoutput, you can email it to > petsc-maint at mcs.anl.gov or if it is too large for email post it somewhere > and email the link from this we could possibly recreate the crash to see > what is going wrong. > > Barry > > 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info > = -7) > > > On Aug 12, 2020, at 3:18 AM, nicola varini > wrote: > > Dear all, following the suggestions I did resubmit the simulation with the > petscrc below. > However I do get the following error: > ======== > 7362 [592]PETSC ERROR: #1 formProl0() line 748 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7363 [339]PETSC ERROR: Petsc has generated inconsistent data > 7364 [339]PETSC ERROR: xGEQRF error > 7365 [339]PETSC ERROR: See > https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 > 7367 [339]PETSC ERROR: > /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 > by nvarini Wed Aug 12 10:06:15 2020 > 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn > --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 > --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 > --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 > --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 > --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell > --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 > --with-cuda-c=nvcc --with-cxxlib-autodetect=0 > --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - > -with-cxx=CC > --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include > 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c > 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > 7375 [339]PETSC ERROR: #1 formProl0() line 748 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in > 
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c > 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value > (info = -7) > 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > ======== > > I did try other pc_gamg_type but they fails as well. > > > #PETSc Option Table entries: > -ampere_dm_mat_type aijcusparse > -ampere_dm_vec_type cuda > -ampere_ksp_atol 1e-15 > -ampere_ksp_initial_guess_nonzero yes > -ampere_ksp_reuse_preconditioner yes > -ampere_ksp_rtol 1e-7 > -ampere_ksp_type dgmres > -ampere_mg_levels_esteig_ksp_max_it 10 > -ampere_mg_levels_esteig_ksp_type cg > -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > -ampere_mg_levels_ksp_type chebyshev > -ampere_mg_levels_pc_type jacobi > -ampere_pc_gamg_agg_nsmooths 1 > -ampere_pc_gamg_coarse_eq_limit 10 > -ampere_pc_gamg_reuse_interpolation true > -ampere_pc_gamg_square_graph 1 > -ampere_pc_gamg_threshold 0.05 > -ampere_pc_gamg_threshold_scale .0 > -ampere_pc_gamg_type agg > -ampere_pc_type gamg > -dm_mat_type aijcusparse > -dm_vec_type cuda > -log_view > -poisson_dm_mat_type aijcusparse > -poisson_dm_vec_type cuda > -poisson_ksp_atol 1e-15 > -poisson_ksp_initial_guess_nonzero yes > -poisson_ksp_reuse_preconditioner yes > -poisson_ksp_rtol 1e-7 > -poisson_ksp_type dgmres > -poisson_log_view > -poisson_mg_levels_esteig_ksp_max_it 10 > -poisson_mg_levels_esteig_ksp_type cg > -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > -poisson_mg_levels_ksp_max_it 1 > -poisson_mg_levels_ksp_type chebyshev > -poisson_mg_levels_pc_type jacobi > -poisson_pc_gamg_agg_nsmooths 1 > -poisson_pc_gamg_coarse_eq_limit 10 > -poisson_pc_gamg_reuse_interpolation true > -poisson_pc_gamg_square_graph 1 > -poisson_pc_gamg_threshold 0.05 > -poisson_pc_gamg_threshold_scale .0 > -poisson_pc_gamg_type agg > -poisson_pc_type gamg > -use_mat_nearnullspace true > #End of PETSc Option Table entries > > Regards, > > Nicola > > Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams ha > scritto: > >> >> >> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini >> wrote: >> >>> Nicola, >>> >>> You are actually not using the GPU properly, since you use HYPRE >>> preconditioning, which is CPU only. One of your solvers is actually slower >>> on ?GPU?. >>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with >>> Jacobi preconditioning. Mark can help you out with the specific command >>> line options. 
>>> When it works properly, everything related to PC application is >>> offloaded to the GPU, and you should expect to get the well-known and >>> branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >>> >>> >> The speedup depends on the machine, but on SUMMIT, using enough CPUs to >> saturate the memory bus vs all 6 GPUs the speedup is a function of problem >> subdomain size. I saw 10x at about 100K equations/process. >> >> >>> Doing what you want to do is one of the last optimization steps of an >>> already optimized code before entering production. Yours is not even >>> optimized for proper GPU usage yet. >>> Also, any specific reason why you are using dgmres and fgmres? >>> >>> PETSc has not been designed with multi-threading in mind. You can >>> achieve ?overlap? of the two solves by splitting the communicator. But then >>> you need communications to let the two solutions talk to each other. >>> >>> Thanks >>> Stefano >>> >>> >>> On Aug 4, 2020, at 12:04 PM, nicola varini >>> wrote: >>> >>> Dear all, thanks for your replies. The reason why I've asked if it is >>> possible to overlap poisson and ampere is because they roughly >>> take the same amount of time. Please find in attachment the profiling >>> logs for only CPU and only GPU. >>> Of course it is possible to split the MPI communicator and run each >>> solver on different subcommunicator, however this would involve more >>> communication. >>> Did anyone ever tried to run 2 solvers with hyperthreading? >>> Thanks >>> >>> >>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams ha >>> scritto: >>> >>>> I suspect that the Poisson and Ampere's law solve are not coupled. You >>>> might be able to duplicate the communicator and use two threads. You would >>>> want to configure PETSc with threadsafty and threads and I think it >>>> could/should work, but this mode is never used by anyone. >>>> >>>> That said, I would not recommend doing this unless you feel like >>>> playing in computer science, as opposed to doing application science. The >>>> best case scenario you get a speedup of 2x. That is a strict upper bound, >>>> but you will never come close to it. Your hardware has some balance of CPU >>>> to GPU processing rate. Your application has a balance of volume of work >>>> for your two solves. They have to be the same to get close to 2x speedup >>>> and that ratio(s) has to be 1:1. To be concrete, from what little I can >>>> guess about your applications let's assume that the cost of each of these >>>> two solves is about the same (eg, Laplacians on your domain and the best >>>> case scenario). But, GPU machines are configured to have roughly 1-10% of >>>> capacity in the GPUs, these days, that gives you an upper bound of about >>>> 10% speedup. That is noise. Upshot, unless you configure your hardware to >>>> match this problem, and the two solves have the same cost, you will not see >>>> close to 2x speedup. Your time is better spent elsewhere. >>>> >>>> Mark >>>> >>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: >>>> >>>>> You can use MPI and split the communicator so n-1 ranks create a DMDA >>>>> for one part of your system and the other rank drives the GPU in the other >>>>> part. They can all be part of the same coupled system on the full >>>>> communicator, but PETSc doesn't currently support some ranks having their >>>>> Vec arrays on GPU and others on host, so you'd be paying host-device >>>>> transfer costs on each iteration (and that might swamp any performance >>>>> benefit you would have gotten). 
>>>>> >>>>> In any case, be sure to think about the execution time of each part. >>>>> Load balancing with matching time-to-solution for each part can be really >>>>> hard. >>>>> >>>>> >>>>> Barry Smith writes: >>>>> >>>>> > Nicola, >>>>> > >>>>> > This is really viable or practical at this time with PETSc. It >>>>> is not impossible but requires careful coding with threads, another >>>>> possibility is to use one half of the virtual GPUs for each solve, this is >>>>> also not trivial. I would recommend first seeing what kind of performance >>>>> you can get on the GPU for each type of solve and revist this idea in the >>>>> future. >>>>> > >>>>> > Barry >>>>> > >>>>> > >>>>> > >>>>> > >>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini >>>>> wrote: >>>>> >> >>>>> >> Hello, I would like to know if it is possible to overlap CPU and >>>>> GPU with DMDA. >>>>> >> I've a machine where each node has 1P100+1Haswell. >>>>> >> I've to resolve Poisson and Ampere equation for each time step. >>>>> >> I'm using 2D DMDA for each of them. Would be possible to compute >>>>> poisson >>>>> >> and ampere equation at the same time? One on CPU and the other on >>>>> GPU? >>>>> >> >>>>> >> Thanks >>>>> >>>> >>> >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Aug 12 09:58:33 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 12 Aug 2020 10:58:33 -0400 Subject: [petsc-users] petscfv manual pages In-Reply-To: References: Message-ID: On Wed, Aug 12, 2020 at 6:12 AM hg wrote: > Hello > > I would like to report that the manual pages for PetscFV has point to some > broken links (TS ex11). It would be nice to see some examples of Petsc with > Finite Volume. > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/FV/index.html > I have fixed this: https://gitlab.com/petsc/petsc/-/merge_requests/3043 Thanks, Matt > Best regards > Giang > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 12 12:38:31 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 12 Aug 2020 12:38:31 -0500 Subject: [petsc-users] componentwise matrix multiplication In-Reply-To: References: Message-ID: <277ECD16-9E98-4721-A7BE-8241BDD6BF48@petsc.dev> When the sparse matrices have the same nonzero structure then you also just get access to the two arrays as in the dense case. For two general sparse you can look at the code for MatAXPY and do the same thing but with multiply. Barry > On Aug 11, 2020, at 10:31 PM, Zhang, Hong via petsc-users wrote: > > Marius, > > Ok I see, it is not so important I was just wondering. Presently I only need it for dense matrices, fow now I am just using MatDenseGetArray on both matrices and multiply both arrays pointwise, which is ok I guess. For non-dense matrices it may be more complicated. > > This is a good approach for sequential and parallel MatDense matrices. > Hong > > > > Von: "Zhang, Hong" > > An: "Marius Buerkle" >, "PETSc users list" > > Betreff: Re: [petsc-users] componentwise matrix multiplication > Marius, > No. You may write one yourself. Let us know what matrix format do you have. We'll make suggestion to you. 
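As a rough illustration of the same-nonzero-structure case mentioned above: for SEQAIJ matrices whose patterns match exactly (for instance C obtained from MatDuplicate(A, MAT_COPY_VALUES, &C)), the stored value arrays line up entry by entry and can be multiplied directly. This is an untested sketch; for MPIAIJ the same idea applies to the diagonal and off-diagonal parts separately, and general patterns need the MatAXPY-style merging of the two structures.

    /* C_ij = A_ij * B_ij for SEQAIJ A, B, C with identical nonzero patterns. */
    PetscScalar *a, *b, *c;
    MatInfo      info;
    PetscInt     nz;
    MatGetInfo(A, MAT_LOCAL, &info);
    nz = (PetscInt)info.nz_used;        /* number of stored nonzeros */
    MatSeqAIJGetArray(A, &a);
    MatSeqAIJGetArray(B, &b);
    MatSeqAIJGetArray(C, &c);
    for (PetscInt k = 0; k < nz; k++) c[k] = a[k] * b[k];
    MatSeqAIJRestoreArray(A, &a);
    MatSeqAIJRestoreArray(B, &b);
    MatSeqAIJRestoreArray(C, &c);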
> Hong > From: petsc-users > on behalf of Marius Buerkle > > Sent: Monday, August 10, 2020 7:50 PM > To: PETSc users list > > Subject: [petsc-users] componentwise matrix multiplication > > Hi, > > Is there a componentwise matrix multiplication, similar to VecPointwiseMult ? > > Best, > Marius -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 12 12:46:14 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 12 Aug 2020 12:46:14 -0500 Subject: [petsc-users] Bus Error In-Reply-To: References: Message-ID: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> Mark. When valgrind is not feasible (like on many centrally controlled batch systems) you can run PETSc with an extra flag to do some memory error checks -malloc_debug this 1) fills all malloced memory with Nan so if the code is using uninitialized memory it may be detected and 2) checks the beginning and end of each alloced memory region for out-of-bounds writes at each malloc and free. it will slow the code down a little bit but generally not a huge amount. It is no where near as good as valgrind or other memory corruption tools but it has the advantage you can run it anywhere on any size job. Barry > On Aug 12, 2020, at 7:46 AM, Matthew Knepley wrote: > > On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry > wrote: > I'm getting seemingly random failures of late: > Caught signal number 7 BUS: Bus Error, possibly illegal memory access > > The first thing I would do is run valgrind on as wide an array of tests as you can. This will find problems > on things that run completely fine. > > Thanks, > > Matt > > Symptoms: > 1) Seems to only happen (so far) on larger cases, 400-2000 cores > 2) It doesn't happen right away -- this was running happily for several hours over several hundred time steps with no indication of bad health in the numerics > 3) At least the total memory consumption seems to be within bounds, though I'm not sure about individual processes. e.g. slurm here reported Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) > 4) running the same setup twice it fails at different points > > Any suggestions on what to look for? This is a bit painful to work on as I can only reproduce it on large runs and then it's seemingly random. > > > Thanks, > Mark > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Aug 12 13:29:37 2020 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 12 Aug 2020 14:29:37 -0400 Subject: [petsc-users] Bus Error In-Reply-To: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> Message-ID: Thanks Matt and Barry. At Matt's suggestion I ran a smaller representative case with valgrind and didn't see anything alarming (apart from a small leak in an older boost version I was using: https://github.com/boostorg/serialization/issues/104 although I don't think this was causing the issue). -malloc_debug dumps quite a lot, this is supposed to be empty right? Output pasted below. It looks like the same sequence of calls is repeated 8 times, which is how many nonlinear solves occurred in this particular run. Thoughts? 
[ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes 
ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c On Wed, Aug 12, 2020 at 1:46 PM Barry Smith wrote: > > Mark. > > When valgrind is not feasible (like on many centrally controlled batch > systems) you can run PETSc with an extra flag to do some memory error checks > -malloc_debug > > this > > 1) fills all malloced memory with Nan so if the code is using > uninitialized memory it may be detected and > 2) checks the beginning and end of each alloced memory region for > out-of-bounds writes at each malloc and free. > > it will slow the code down a little bit but generally not a huge amount. > > It is no where near as good as valgrind or other memory corruption tools > but it has the advantage you can run it anywhere on any size job. 
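In practice the check goes straight onto the launch line of the existing batch job, something like

  srun -n 1840 ./mysolver -malloc_debug <usual options>

where the launcher, core count, and executable name are placeholders for whatever the job already uses.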
> > > Barry > > > > > > On Aug 12, 2020, at 7:46 AM, Matthew Knepley wrote: > > On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry wrote: > >> I'm getting seemingly random failures of late: >> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >> > > The first thing I would do is run valgrind on as wide an array of tests as > you can. This will find problems > on things that run completely fine. > > Thanks, > > Matt > > >> Symptoms: >> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >> 2) It doesn't happen right away -- this was running happily for several >> hours over several hundred time steps with no indication of bad health in >> the numerics >> 3) At least the total memory consumption seems to be within bounds, >> though I'm not sure about individual processes. e.g. slurm here reported >> Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >> 4) running the same setup twice it fails at different points >> >> Any suggestions on what to look for? This is a bit painful to work on as >> I can only reproduce it on large runs and then it's seemingly random. >> >> >> Thanks, >> Mark >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 12 15:22:02 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 12 Aug 2020 15:22:02 -0500 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> Message-ID: Yes, there are some PETSc objects or arrays that you are not freeing so they are printed at the end of the run. For small runs this harmless but if new objects/memory is allocated at each iteration and not suitably freed it will eventually add up. Run with -malloc_view (small problem with say 2 iterations) it will print everything allocated and might be helpful. Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()? It is also possible it is a leak in PETSc, but that is unlikely since we test for them. Are you using Fortran? Barry > On Aug 12, 2020, at 1:29 PM, Mark Lohry wrote: > > Thanks Matt and Barry. At Matt's suggestion I ran a smaller representative case with valgrind and didn't see anything alarming (apart from a small leak in an older boost version I was using: https://github.com/boostorg/serialization/issues/104 although I don't think this was causing the issue). > > -malloc_debug dumps quite a lot, this is supposed to be empty right? Output pasted below. It looks like the same sequence of calls is repeated 8 times, which is how many nonlinear solves occurred in this particular run. Thoughts? 
> > > > [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c > [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c > [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c > [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 
0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes 
PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c > [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c > > > > On Wed, Aug 12, 2020 at 1:46 PM Barry Smith > wrote: > > Mark. > > When valgrind is not feasible (like on many centrally controlled batch systems) you can run PETSc with an extra flag to do some memory error checks > -malloc_debug > > this > > 1) fills all malloced memory with Nan so if the code is using uninitialized memory it may be detected and > 2) checks the beginning and end of each alloced memory region for out-of-bounds writes at each malloc and free. > > it will slow the code down a little bit but generally not a huge amount. 
> > It is no where near as good as valgrind or other memory corruption tools but it has the advantage you can run it anywhere on any size job. > > > Barry > > > > > >> On Aug 12, 2020, at 7:46 AM, Matthew Knepley > wrote: >> >> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry > wrote: >> I'm getting seemingly random failures of late: >> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >> >> The first thing I would do is run valgrind on as wide an array of tests as you can. This will find problems >> on things that run completely fine. >> >> Thanks, >> >> Matt >> >> Symptoms: >> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >> 2) It doesn't happen right away -- this was running happily for several hours over several hundred time steps with no indication of bad health in the numerics >> 3) At least the total memory consumption seems to be within bounds, though I'm not sure about individual processes. e.g. slurm here reported Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >> 4) running the same setup twice it fails at different points >> >> Any suggestions on what to look for? This is a bit painful to work on as I can only reproduce it on large runs and then it's seemingly random. >> >> >> Thanks, >> Mark >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Wed Aug 12 19:19:19 2020 From: mlohry at gmail.com (Mark Lohry) Date: Wed, 12 Aug 2020 20:19:19 -0400 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> Message-ID: > > Perhaps you are calling ISColoringGetIS() and not calling > ISColoringRestoreIS()? > I have matching ISColoringGet/Restore here, and it's only used prior to the first iteration so at least it doesn't seem to be growing. At the bottom I pasted the malloc_view and malloc_debug output from running 1 time step. I'm sort of thinking this might be a red herring -- is it possible the rank 0 process is chewing up dramatically more memory than others, like with logging or something? Like I mentioned earlier the total memory usage is well under the machine limits. I'll spring in some PetscMemoryGetMaximumUsage logging at every time step and try to get a big job going again. Are you using Fortran? 
> C++ [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes 
ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire process 719073280 [0] Memory usage sorted by function [0] 6 192 DMCoarsenHookAdd() [0] 2 9984 DMCreate() [0] 2 128 DMCreate_Shell() [0] 2 64 DMDSEnlarge_Static() [0] 1 672 DMKSPCreate() [0] 3 96 DMRefineHookAdd() [0] 3 2064 DMSNESCreate() [0] 4 128 DMSubDomainHookAdd() [0] 1 768 DMTSCreate() [0] 2 96 ISColoringCreate() [0] 8 12608 ISColoringGetIS() [0] 1 307200 ISConcatenate() [0] 29 25984 ISCreate() [0] 25 400 ISCreate_General() [0] 4 64 ISCreate_Stride() [0] 20 338016 ISGeneralSetIndices_General() [0] 3 921600 ISGetIndices_Stride() [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() [0] 1 6144 ISInvertPermutation_General() [0] 3 308576 ISLocalToGlobalMappingCreate() [0] 2 32 KSPConvergedDefaultCreate() 
[0] 2 2816 KSPCreate() [0] 1 224 KSPCreate_FGMRES() [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() [0] 2 16032 KSPSetUp_FGMRES() [0] 4 16084160 KSPSetUp_GMRES() [0] 2 36864 MatColoringApply_SL() [0] 1 656 MatColoringCreate() [0] 6 17088 MatCreate() [0] 1 16 MatCreateMFFD_WP() [0] 1 16 MatCreateSubMatrices_SeqBAIJ() [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() [0] 2 1472 MatCreate_MFFD() [0] 1 416 MatCreate_SeqAIJ() [0] 3 864 MatCreate_SeqBAIJ() [0] 2 416 MatCreate_Shell() [0] 1 784 MatFDColoringCreate() [0] 2 12288 MatFDColoringDegreeSequence_Minpack() [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() [0] 3 42512 MatGetColumnIJ_SeqAIJ() [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() [0] 1 6144 MatGetOrdering_Natural() [0] 2 36384 MatGetRowIJ_SeqAIJ() [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() [0] 1 6144 MatMarkDiagonal_SeqAIJ() [0] 1 6144 MatMarkDiagonal_SeqBAIJ() [0] 8 256 MatRegisterRootName() [0] 1 6160 MatSeqAIJCheckInode() [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() [0] 13 576 MatSolverTypeRegister() [0] 1 16 PCASMCreateSubdomains() [0] 2 1664 PCCreate() [0] 1 160 PCCreate_ASM() [0] 1 192 PCCreate_ILU() [0] 5 307264 PCSetUp_ASM() [0] 2 416 PetscBTCreate() [0] 2 3216 PetscClassPerfLogCreate() [0] 2 1616 PetscClassRegLogCreate() [0] 2 32 PetscCommBuildTwoSided_Allreduce() [0] 2 64 PetscCommDuplicate() [0] 2 1888 PetscDSCreate() [0] 2 26416 PetscEventPerfLogCreate() [0] 2 158400 PetscEventPerfLogEnsureSize() [0] 2 1616 PetscEventRegLogCreate() [0] 2 9600 PetscEventRegLogRegister() [0] 8 102400 PetscFreeSpaceGet() [0] 474 15168 PetscFunctionListAdd_Private() [0] 2 528 PetscIntStackCreate() [0] 142 11360 PetscLayoutCreate() [0] 56 896 PetscLayoutSetUp() [0] 59 9440 PetscObjectComposedDataIncreaseReal() [0] 2 576 PetscObjectListAdd() [0] 33 768 PetscOptionsGetEList() [0] 1 16 PetscOptionsHelpPrintedCreate() [0] 1 32 PetscPushSignalHandler() [0] 7 6944 PetscSFCreate() [0] 3 432 PetscSFCreate_Basic() [0] 2 1472 PetscSFLinkCreate() [0] 11 1229040 PetscSFSetUpRanks() [0] 7 614512 PetscSFSetUp_Basic() [0] 4 20096 PetscSegBufferCreate() [0] 2 1488 PetscSplitReductionCreate() [0] 2 3008 PetscStageLogCreate() [0] 1148 23872 PetscStrallocpy() [0] 6 13056 PetscStrreplace() [0] 9 3456 PetscTableCreate() [0] 1 16 PetscViewerASCIIOpen() [0] 6 96 PetscViewerAndFormatCreate() [0] 1 752 PetscViewerCreate() [0] 1 96 PetscViewerCreate_ASCII() [0] 2 1424 SNESCreate() [0] 1 16 SNESCreate_NEWTONLS() [0] 1 1008 SNESLineSearchCreate() [0] 1 16 SNESLineSearchCreate_BT() [0] 16 1824 SNESMSRegister() [0] 46 9056 TSARKIMEXRegister() [0] 1 1264 TSAdaptCreate() [0] 8 384 TSBasicSymplecticRegister() [0] 1 2160 TSCreate() [0] 1 224 TSCreate_Theta() [0] 48 5968 TSGLEERegister() [0] 41 7728 TSRKRegister() [0] 89 14736 TSRosWRegister() [0] 71 110192 VecCreate() [0] 1 307200 VecCreateGhostWithArray() [0] 123 36874080 VecCreate_MPI_Private() [0] 7 4300800 VecCreate_Seq() [0] 8 256 VecCreate_Seq_Private() [0] 6 400 VecDuplicateVecs_Default() [0] 3 2352 VecScatterCreate() [0] 7 1843296 VecScatterSetUp_SF() [0] 126 2016 VecStashCreate_Private() [0] 1 3072 mapBlockColoringToJacobian() On Wed, Aug 12, 2020 at 4:22 PM Barry Smith wrote: > > Yes, there are some PETSc objects or arrays that you are not freeing so > they are printed at the end of the run. 
For small runs this harmless but if > new objects/memory is allocated at each iteration and not suitably freed it > will eventually add up. > > Run with -malloc_view (small problem with say 2 iterations) it will > print everything allocated and might be helpful. > > Perhaps you are calling ISColoringGetIS() and not calling > ISColoringRestoreIS()? > > It is also possible it is a leak in PETSc, but that is unlikely since > we test for them. > > Are you using Fortran? > > Barry > > > On Aug 12, 2020, at 1:29 PM, Mark Lohry wrote: > > Thanks Matt and Barry. At Matt's suggestion I ran a smaller representative > case with valgrind and didn't see anything alarming (apart from a small > leak in an older boost version I was using: > https://github.com/boostorg/serialization/issues/104 although I don't > think this was causing the issue). > > -malloc_debug dumps quite a lot, this is supposed to be empty right? > Output pasted below. It looks like the same sequence of calls is repeated 8 > times, which is how many nonlinear solves occurred in this particular run. > Thoughts? > > > > [ 0]1408 bytes PetscSplitReductionCreate() line 63 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c > [ 0]80 bytes PetscSplitReductionCreate() line 57 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c > [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c > [ 0]16 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]272 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]880 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]960 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]976 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes 
PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]64 bytes ISColoringGetIS() line 266 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c > [ 0]32 bytes PetscCommDuplicate() line 129 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c > > > > On Wed, Aug 12, 2020 at 1:46 PM Barry Smith wrote: > >> >> Mark. >> >> When valgrind is not feasible (like on many centrally controlled >> batch systems) you can run PETSc with an extra flag to do some memory error >> checks >> -malloc_debug >> >> this >> >> 1) fills all malloced memory with Nan so if the code is using >> uninitialized memory it may be detected and >> 2) checks the beginning and end of each alloced memory region for >> out-of-bounds writes at each malloc and free. >> >> it will slow the code down a little bit but generally not a huge amount. >> >> It is no where near as good as valgrind or other memory corruption tools >> but it has the advantage you can run it anywhere on any size job. >> >> >> Barry >> >> >> >> >> >> On Aug 12, 2020, at 7:46 AM, Matthew Knepley wrote: >> >> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry wrote: >> >>> I'm getting seemingly random failures of late: >>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>> >> >> The first thing I would do is run valgrind on as wide an array of tests >> as you can. This will find problems >> on things that run completely fine. >> >> Thanks, >> >> Matt >> >> >>> Symptoms: >>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >>> 2) It doesn't happen right away -- this was running happily for several >>> hours over several hundred time steps with no indication of bad health in >>> the numerics >>> 3) At least the total memory consumption seems to be within bounds, >>> though I'm not sure about individual processes. e.g. slurm here reported >>> Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >>> 4) running the same setup twice it fails at different points >>> >>> Any suggestions on what to look for? This is a bit painful to work on as >>> I can only reproduce it on large runs and then it's seemingly random. >>> >>> >>> Thanks, >>> Mark >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed Aug 12 19:19:59 2020 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 12 Aug 2020 20:19:59 -0400 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: Can you reproduce this on the CPU? The QR factorization seems to be failing. That could be from bad data or a bad GPU QR. On Wed, Aug 12, 2020 at 4:19 AM nicola varini wrote: > Dear all, following the suggestions I did resubmit the simulation with the > petscrc below. 
> However I do get the following error: > ======== > 7362 [592]PETSC ERROR: #1 formProl0() line 748 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7363 [339]PETSC ERROR: Petsc has generated inconsistent data > 7364 [339]PETSC ERROR: xGEQRF error > 7365 [339]PETSC ERROR: See > https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 > 7367 [339]PETSC ERROR: > /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 > by nvarini Wed Aug 12 10:06:15 2020 > 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn > --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 > --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 > --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 > --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 > --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell > --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 > --with-cuda-c=nvcc --with-cxxlib-autodetect=0 > --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - > -with-cxx=CC > --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include > 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c > 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > 7375 [339]PETSC ERROR: #1 formProl0() line 748 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c > 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value > (info = -7) > 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in > /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > ======== > > I did try other pc_gamg_type but they fails as well. 
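A quick way to try Mark's suggestion of reproducing on the CPU is to swap the cuda/cusparse types in the option table below for the host defaults and leave every solver option unchanged; these replacement lines are illustrative and not taken from the actual run:

  -dm_mat_type aij
  -dm_vec_type standard
  -poisson_dm_mat_type aij
  -poisson_dm_vec_type standard
  -ampere_dm_mat_type aij
  -ampere_dm_vec_type standard

If the xGEQRF failure goes away with the host types, that points at the data reaching the QR rather than at the QR routine itself.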
> > > #PETSc Option Table entries: > -ampere_dm_mat_type aijcusparse > -ampere_dm_vec_type cuda > -ampere_ksp_atol 1e-15 > -ampere_ksp_initial_guess_nonzero yes > -ampere_ksp_reuse_preconditioner yes > -ampere_ksp_rtol 1e-7 > -ampere_ksp_type dgmres > -ampere_mg_levels_esteig_ksp_max_it 10 > -ampere_mg_levels_esteig_ksp_type cg > -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > -ampere_mg_levels_ksp_type chebyshev > -ampere_mg_levels_pc_type jacobi > -ampere_pc_gamg_agg_nsmooths 1 > -ampere_pc_gamg_coarse_eq_limit 10 > -ampere_pc_gamg_reuse_interpolation true > -ampere_pc_gamg_square_graph 1 > -ampere_pc_gamg_threshold 0.05 > -ampere_pc_gamg_threshold_scale .0 > -ampere_pc_gamg_type agg > -ampere_pc_type gamg > -dm_mat_type aijcusparse > -dm_vec_type cuda > -log_view > -poisson_dm_mat_type aijcusparse > -poisson_dm_vec_type cuda > -poisson_ksp_atol 1e-15 > -poisson_ksp_initial_guess_nonzero yes > -poisson_ksp_reuse_preconditioner yes > -poisson_ksp_rtol 1e-7 > -poisson_ksp_type dgmres > -poisson_log_view > -poisson_mg_levels_esteig_ksp_max_it 10 > -poisson_mg_levels_esteig_ksp_type cg > -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > -poisson_mg_levels_ksp_max_it 1 > -poisson_mg_levels_ksp_type chebyshev > -poisson_mg_levels_pc_type jacobi > -poisson_pc_gamg_agg_nsmooths 1 > -poisson_pc_gamg_coarse_eq_limit 10 > -poisson_pc_gamg_reuse_interpolation true > -poisson_pc_gamg_square_graph 1 > -poisson_pc_gamg_threshold 0.05 > -poisson_pc_gamg_threshold_scale .0 > -poisson_pc_gamg_type agg > -poisson_pc_type gamg > -use_mat_nearnullspace true > #End of PETSc Option Table entries > > Regards, > > Nicola > > Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams ha > scritto: > >> >> >> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini >> wrote: >> >>> Nicola, >>> >>> You are actually not using the GPU properly, since you use HYPRE >>> preconditioning, which is CPU only. One of your solvers is actually slower >>> on ?GPU?. >>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with >>> Jacobi preconditioning. Mark can help you out with the specific command >>> line options. >>> When it works properly, everything related to PC application is >>> offloaded to the GPU, and you should expect to get the well-known and >>> branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >>> >>> >> The speedup depends on the machine, but on SUMMIT, using enough CPUs to >> saturate the memory bus vs all 6 GPUs the speedup is a function of problem >> subdomain size. I saw 10x at about 100K equations/process. >> >> >>> Doing what you want to do is one of the last optimization steps of an >>> already optimized code before entering production. Yours is not even >>> optimized for proper GPU usage yet. >>> Also, any specific reason why you are using dgmres and fgmres? >>> >>> PETSc has not been designed with multi-threading in mind. You can >>> achieve ?overlap? of the two solves by splitting the communicator. But then >>> you need communications to let the two solutions talk to each other. >>> >>> Thanks >>> Stefano >>> >>> >>> On Aug 4, 2020, at 12:04 PM, nicola varini >>> wrote: >>> >>> Dear all, thanks for your replies. The reason why I've asked if it is >>> possible to overlap poisson and ampere is because they roughly >>> take the same amount of time. Please find in attachment the profiling >>> logs for only CPU and only GPU. 
>>> Of course it is possible to split the MPI communicator and run each >>> solver on different subcommunicator, however this would involve more >>> communication. >>> Did anyone ever tried to run 2 solvers with hyperthreading? >>> Thanks >>> >>> >>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams ha >>> scritto: >>> >>>> I suspect that the Poisson and Ampere's law solve are not coupled. You >>>> might be able to duplicate the communicator and use two threads. You would >>>> want to configure PETSc with threadsafty and threads and I think it >>>> could/should work, but this mode is never used by anyone. >>>> >>>> That said, I would not recommend doing this unless you feel like >>>> playing in computer science, as opposed to doing application science. The >>>> best case scenario you get a speedup of 2x. That is a strict upper bound, >>>> but you will never come close to it. Your hardware has some balance of CPU >>>> to GPU processing rate. Your application has a balance of volume of work >>>> for your two solves. They have to be the same to get close to 2x speedup >>>> and that ratio(s) has to be 1:1. To be concrete, from what little I can >>>> guess about your applications let's assume that the cost of each of these >>>> two solves is about the same (eg, Laplacians on your domain and the best >>>> case scenario). But, GPU machines are configured to have roughly 1-10% of >>>> capacity in the GPUs, these days, that gives you an upper bound of about >>>> 10% speedup. That is noise. Upshot, unless you configure your hardware to >>>> match this problem, and the two solves have the same cost, you will not see >>>> close to 2x speedup. Your time is better spent elsewhere. >>>> >>>> Mark >>>> >>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: >>>> >>>>> You can use MPI and split the communicator so n-1 ranks create a DMDA >>>>> for one part of your system and the other rank drives the GPU in the other >>>>> part. They can all be part of the same coupled system on the full >>>>> communicator, but PETSc doesn't currently support some ranks having their >>>>> Vec arrays on GPU and others on host, so you'd be paying host-device >>>>> transfer costs on each iteration (and that might swamp any performance >>>>> benefit you would have gotten). >>>>> >>>>> In any case, be sure to think about the execution time of each part. >>>>> Load balancing with matching time-to-solution for each part can be really >>>>> hard. >>>>> >>>>> >>>>> Barry Smith writes: >>>>> >>>>> > Nicola, >>>>> > >>>>> > This is really viable or practical at this time with PETSc. It >>>>> is not impossible but requires careful coding with threads, another >>>>> possibility is to use one half of the virtual GPUs for each solve, this is >>>>> also not trivial. I would recommend first seeing what kind of performance >>>>> you can get on the GPU for each type of solve and revist this idea in the >>>>> future. >>>>> > >>>>> > Barry >>>>> > >>>>> > >>>>> > >>>>> > >>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini >>>>> wrote: >>>>> >> >>>>> >> Hello, I would like to know if it is possible to overlap CPU and >>>>> GPU with DMDA. >>>>> >> I've a machine where each node has 1P100+1Haswell. >>>>> >> I've to resolve Poisson and Ampere equation for each time step. >>>>> >> I'm using 2D DMDA for each of them. Would be possible to compute >>>>> poisson >>>>> >> and ampere equation at the same time? One on CPU and the other on >>>>> GPU? 
>>>>> >> >>>>> >> Thanks >>>>> >>>> >>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 12 20:17:15 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 12 Aug 2020 20:17:15 -0500 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: The QR is always done on the CPU, we don't have generic calls to blas/lapack go to the GPU currently. The error message is: On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7) argument 7 is &LWORK which is defined by PetscBLASInt LWORK=N*bs; and N=nSAvec is the column block size of new P. Presumably this is a huge run with many processes so using the debugger is not practical? We need to see what these variables are N, bs, nSAvec perhaps nSAvec is zero which could easily upset LAPACK. Crudest thing would be to just put a print statement in the code before the LAPACK call of if they are called many times add an error check like that generates an error if any of these three values are 0 (or negative). Barry It is not impossible that the Cray argument checking is incorrect and the value passed in is fine. You can check this by using --download-fblaslapack and see if the same or some other error comes up. > On Aug 12, 2020, at 7:19 PM, Mark Adams wrote: > > Can you reproduce this on the CPU? > The QR factorization seems to be failing. That could be from bad data or a bad GPU QR. > > On Wed, Aug 12, 2020 at 4:19 AM nicola varini > wrote: > Dear all, following the suggestions I did resubmit the simulation with the petscrc below. > However I do get the following error: > ======== > 7362 [592]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7363 [339]PETSC ERROR: Petsc has generated inconsistent data > 7364 [339]PETSC ERROR: xGEQRF error > 7365 [339]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 > 7367 [339]PETSC ERROR: /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 by nvarini Wed Aug 12 10:06:15 2020 > 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 --with-cuda-c=nvcc --with-cxxlib-autodetect=0 --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - -with-cxx=CC --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include > 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c > 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > 7375 [339]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c > 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c > 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c > 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7) > 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c > ======== > > I did try other pc_gamg_type but they fails as well. 
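A sketch of the check Barry describes above: it would go in formProl0() (src/ksp/pc/impls/gamg/agg.c) just before the xGEQRF call, using the row count (call it M), the column count N = nSAvec, and LWORK = N*bs that the routine hands to LAPACK. The names and the exact location are meant to mirror agg.c but should be treated as illustrative only:

  /* Guard the sizes about to be passed to dgeqrf; per Barry's suggestion,
     error out if any of them is zero or negative. */
  if (M <= 0 || N <= 0 || LWORK <= 0) {
    SETERRQ3(PETSC_COMM_SELF, PETSC_ERR_PLIB,
             "Bad sizes before xGEQRF: M=%d N=%d LWORK=%d",
             (int)M, (int)N, (int)LWORK);
  }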
> > > #PETSc Option Table entries: > -ampere_dm_mat_type aijcusparse > -ampere_dm_vec_type cuda > -ampere_ksp_atol 1e-15 > -ampere_ksp_initial_guess_nonzero yes > -ampere_ksp_reuse_preconditioner yes > -ampere_ksp_rtol 1e-7 > -ampere_ksp_type dgmres > -ampere_mg_levels_esteig_ksp_max_it 10 > -ampere_mg_levels_esteig_ksp_type cg > -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > -ampere_mg_levels_ksp_type chebyshev > -ampere_mg_levels_pc_type jacobi > -ampere_pc_gamg_agg_nsmooths 1 > -ampere_pc_gamg_coarse_eq_limit 10 > -ampere_pc_gamg_reuse_interpolation true > -ampere_pc_gamg_square_graph 1 > -ampere_pc_gamg_threshold 0.05 > -ampere_pc_gamg_threshold_scale .0 > -ampere_pc_gamg_type agg > -ampere_pc_type gamg > -dm_mat_type aijcusparse > -dm_vec_type cuda > -log_view > -poisson_dm_mat_type aijcusparse > -poisson_dm_vec_type cuda > -poisson_ksp_atol 1e-15 > -poisson_ksp_initial_guess_nonzero yes > -poisson_ksp_reuse_preconditioner yes > -poisson_ksp_rtol 1e-7 > -poisson_ksp_type dgmres > -poisson_log_view > -poisson_mg_levels_esteig_ksp_max_it 10 > -poisson_mg_levels_esteig_ksp_type cg > -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 > -poisson_mg_levels_ksp_max_it 1 > -poisson_mg_levels_ksp_type chebyshev > -poisson_mg_levels_pc_type jacobi > -poisson_pc_gamg_agg_nsmooths 1 > -poisson_pc_gamg_coarse_eq_limit 10 > -poisson_pc_gamg_reuse_interpolation true > -poisson_pc_gamg_square_graph 1 > -poisson_pc_gamg_threshold 0.05 > -poisson_pc_gamg_threshold_scale .0 > -poisson_pc_gamg_type agg > -poisson_pc_type gamg > -use_mat_nearnullspace true > #End of PETSc Option Table entries > > Regards, > > Nicola > > Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams > ha scritto: > > > On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini > wrote: > Nicola, > > You are actually not using the GPU properly, since you use HYPRE preconditioning, which is CPU only. One of your solvers is actually slower on ?GPU?. > For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with Jacobi preconditioning. Mark can help you out with the specific command line options. > When it works properly, everything related to PC application is offloaded to the GPU, and you should expect to get the well-known and branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve > > > The speedup depends on the machine, but on SUMMIT, using enough CPUs to saturate the memory bus vs all 6 GPUs the speedup is a function of problem subdomain size. I saw 10x at about 100K equations/process. > > Doing what you want to do is one of the last optimization steps of an already optimized code before entering production. Yours is not even optimized for proper GPU usage yet. > Also, any specific reason why you are using dgmres and fgmres? > > PETSc has not been designed with multi-threading in mind. You can achieve ?overlap? of the two solves by splitting the communicator. But then you need communications to let the two solutions talk to each other. > > Thanks > Stefano > > >> On Aug 4, 2020, at 12:04 PM, nicola varini > wrote: >> >> Dear all, thanks for your replies. The reason why I've asked if it is possible to overlap poisson and ampere is because they roughly >> take the same amount of time. Please find in attachment the profiling logs for only CPU and only GPU. >> Of course it is possible to split the MPI communicator and run each solver on different subcommunicator, however this would involve more communication. >> Did anyone ever tried to run 2 solvers with hyperthreading? 
>> Thanks >> >> >> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams > ha scritto: >> I suspect that the Poisson and Ampere's law solve are not coupled. You might be able to duplicate the communicator and use two threads. You would want to configure PETSc with threadsafty and threads and I think it could/should work, but this mode is never used by anyone. >> >> That said, I would not recommend doing this unless you feel like playing in computer science, as opposed to doing application science. The best case scenario you get a speedup of 2x. That is a strict upper bound, but you will never come close to it. Your hardware has some balance of CPU to GPU processing rate. Your application has a balance of volume of work for your two solves. They have to be the same to get close to 2x speedup and that ratio(s) has to be 1:1. To be concrete, from what little I can guess about your applications let's assume that the cost of each of these two solves is about the same (eg, Laplacians on your domain and the best case scenario). But, GPU machines are configured to have roughly 1-10% of capacity in the GPUs, these days, that gives you an upper bound of about 10% speedup. That is noise. Upshot, unless you configure your hardware to match this problem, and the two solves have the same cost, you will not see close to 2x speedup. Your time is better spent elsewhere. >> >> Mark >> >> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown > wrote: >> You can use MPI and split the communicator so n-1 ranks create a DMDA for one part of your system and the other rank drives the GPU in the other part. They can all be part of the same coupled system on the full communicator, but PETSc doesn't currently support some ranks having their Vec arrays on GPU and others on host, so you'd be paying host-device transfer costs on each iteration (and that might swamp any performance benefit you would have gotten). >> >> In any case, be sure to think about the execution time of each part. Load balancing with matching time-to-solution for each part can be really hard. >> >> >> Barry Smith > writes: >> >> > Nicola, >> > >> > This is really viable or practical at this time with PETSc. It is not impossible but requires careful coding with threads, another possibility is to use one half of the virtual GPUs for each solve, this is also not trivial. I would recommend first seeing what kind of performance you can get on the GPU for each type of solve and revist this idea in the future. >> > >> > Barry >> > >> > >> > >> > >> >> On Jul 31, 2020, at 9:23 AM, nicola varini > wrote: >> >> >> >> Hello, I would like to know if it is possible to overlap CPU and GPU with DMDA. >> >> I've a machine where each node has 1P100+1Haswell. >> >> I've to resolve Poisson and Ampere equation for each time step. >> >> I'm using 2D DMDA for each of them. Would be possible to compute poisson >> >> and ampere equation at the same time? One on CPU and the other on GPU? >> >> >> >> Thanks >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pascal.kraft at kit.edu Wed Aug 12 18:52:39 2020 From: pascal.kraft at kit.edu (Kraft, Pascal (IANM)) Date: Wed, 12 Aug 2020 23:52:39 +0000 Subject: [petsc-users] Measuring memory consumption Message-ID: <0f3eccd8dcc7414289b580b51970f993@kit.edu> Dear PETSc Users, I use the MUMPS wrapper in PETSc (loaded from dealii). 
I know that MUMPS computes a factorization based on the multifrontal method and since I have a very memory-strapped problem, I would like to run some tests on the memory requirements for the factorizations across some sets of parameters in my underlying FEM-problem.
My question is if there is a native way to check memory (as a measure of fill-in) for the MUMPS factorization in PETSc. I would prefer not to go through the operating system since that feels somewhat inconclusive to me. Does PETSc provide such functionality in any way?
I would prefer to know the memory consumption for storing the factorization if that is possible, I would also take the total memory consumption of the direct solver if there is no other way, and, if nothing else is possible, I guess I would have to go with the memory requirement of the whole application if all else fails.
Is anyone aware of something I can do?

Kind regards,
Pascal

--------------------------------------------------------------------------
Karlsruhe Institute of Technology (KIT)
Institute for Applied and Numerical Mathematics

Kraft, Pascal
Research scientist

Englerstraße 2
Gebäude 20.30
76130, Germany

Phone: +49 721 608-42801
Mobile: +49 163 6927612
E-mail: pascal.kraft?kit.edu
Web: www.math.kit.edu/ianm2/~kraft/de
KIT - The Research University in the Helmholtz Association

Since 2010, the KIT has been certified as a family-friendly university.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at petsc.dev  Wed Aug 12 23:17:20 2020
From: bsmith at petsc.dev (Barry Smith)
Date: Wed, 12 Aug 2020 23:17:20 -0500
Subject: [petsc-users] Measuring memory consumption
In-Reply-To: <0f3eccd8dcc7414289b580b51970f993@kit.edu>
References: <0f3eccd8dcc7414289b580b51970f993@kit.edu>
Message-ID: <7A3322D0-CD44-4504-BC09-EFB25D133C07@petsc.dev>

Pascal,

Do

  KSPGetPC(ksp,&pc);
  PCFactorGetMatrix(pc,&fact);
  PetscViewerPushFormat(PETSC_VIEWER_STDOUT_(PetscObjectComm((PetscObject)fact)),PETSC_VIEWER_ASCII_INFO);
  MatView(fact,PETSC_VIEWER_STDOUT_(PetscObjectComm((PetscObject)fact)));
  PetscViewerPopFormat(PETSC_VIEWER_STDOUT_(PetscObjectComm((PetscObject)fact)));

after SNES/KSPSolve; this should print the various MUMPS variables related to memory usage. This is processed in /src/mat/impls/aij/mpi/mumps/mumps.c

If you want to process the values directly in the application code you can look at this file and see how it is extracting the memory values from the mumps variables and use the variables directly in your code.

Barry

> On Aug 12, 2020, at 6:52 PM, Kraft, Pascal (IANM) wrote:
>
> Dear PETSc Users,
>
> I use the MUMPS wrapper in PETSc (loaded from dealii). I know that MUMPS computes a factorization based on the multifrontal method and since I have a very memory-strapped problem, I would like to run some tests on the memory requirements for the factorizations across some sets of parameters in my underlying FEM-problem.
> My question is if there is a native way to check memory (as a measure of fill-in) for the MUMPS factorization in PETSc. I would prefer not to go through the operating system since that feels somewhat inconclusive to me. Does PETSc provide such functionality in any way?
> I would prefer to know the memory consumption for storing the factorization if that is possible, I would also take the total memory consumption of the direct solver if there is no other way, and, if nothing else is possible, I guess I would have to go with the memory requirement of the whole application if all else fails.
> Is anyone aware of something I can do? > > Kind regards, > Pascal > > -------------------------------------------------------------------------- > Karlsruhe Institute of Technology (KIT) > Institute for Applied and Numerical Mathematics > > Kraft, Pascal > Research scientist > > Englerstra?e 2 > Geb?ude 20.30 > 76130, Germany > > Phone: +49 721 608-42801 > Mobile: +49 163 6927612 > E-mail: pascal.kraft?kit.edu > Web: www.math.kit.edu/ianm2/~kraft/de > KIT ? The Research University in the Helmholtz Association > > Since 2010, the KIT has been certified as a family-friendly university. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aph at email.arizona.edu Thu Aug 13 02:16:32 2020 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Thu, 13 Aug 2020 00:16:32 -0700 Subject: [petsc-users] MatZeroRows Message-ID: Hello, I am using MatZeroRows for imposing a known forcing into my equations in conjunction with rhs and by setting the diagonal of the matrix to 1. I am using Fortran. I have used: ! only local processors set their own zeros call MatSetOption(self%fieldLHSMat_ps, MAT_NO_OFF_PROC_ZERO_ROWS, PETSC_TRUE, ierr) call MatSetOption(self%fieldLHSMat_ps, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE, ierr) call MatZeroRows(self%fieldLHSMat_ps, numrows, glob_row_idx, diag, PETSC_NULL_OBJECT, PETSC_NULL_OBJECT, ierr) Is numrows above the local (on each proc.) number of rows to remove, or is it the global number of rows to remove? Also on some processors, I have no rows to remove, hence the array glob_row_idx is not allocated. How can I tell Petsc? Should I pass PETSC_NULL_OBJECT instead of glob_row_idx in this case? Finally, does using the option MAT_KEEP_NONZERO_PATTERN have an influence on the time the MatZeroRows call will take? Thanks, Best regards, Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From bastian.loehrer at tu-dresden.de Thu Aug 13 07:48:49 2020 From: bastian.loehrer at tu-dresden.de (=?UTF-8?Q?Bastian_L=c3=b6hrer?=) Date: Thu, 13 Aug 2020 14:48:49 +0200 Subject: [petsc-users] DMView / print out ownership ranges Message-ID: <31f4d18a-a6d3-95ff-26d5-fa2421d89d60@tu-dresden.de> Dear PETSc people, in PETSc 3.3 ??? call DMView( dm, PETSC_VIEWER_STDOUT_WORLD, ierr) printed out the ownership ranges like so: Processor [0] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 X range of indices: 0 32, Y range of indices: 0 17, Z range of indices: 0 16 Processor [1] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 X range of indices: 0 32, Y range of indices: 17 34, Z range of indices: 0 16 Processor [2] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 X range of indices: 0 32, Y range of indices: 0 17, Z range of indices: 16 32 Processor [3] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 X range of indices: 0 32, Y range of indices: 17 34, Z range of indices: 16 32 In PETSc 3.8.4 (and later?) the same function call only prints out: DM Object: 4 MPI processes ? type: da Does the feature to print out the ownership ranges still exist? I am unable to find it. Best, Bastian -------------- next part -------------- An HTML attachment was scrubbed... 
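Independent of what the viewer prints, the per-rank ownership ranges can still be queried directly from the DMDA (the same calls exist in the Fortran interface). A minimal C sketch, assuming an existing 3D DMDA named da; it only mimics the old DMView output and is not the viewer itself:

  #include <petscdmda.h>

  /* Print each rank's owned index ranges for an existing 3D DMDA "da". */
  PetscErrorCode PrintOwnershipRanges(DM da)
  {
    PetscErrorCode ierr;
    PetscInt       xs, ys, zs, xm, ym, zm;
    PetscMPIInt    rank;

    PetscFunctionBeginUser;
    ierr = MPI_Comm_rank(PetscObjectComm((PetscObject)da), &rank);CHKERRQ(ierr);
    /* start indices and widths of this rank's owned portion of the global grid */
    ierr = DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);CHKERRQ(ierr);
    ierr = PetscSynchronizedPrintf(PetscObjectComm((PetscObject)da),
             "[%d] X range of indices: %D %D, Y range of indices: %D %D, Z range of indices: %D %D\n",
             rank, xs, xs+xm, ys, ys+ym, zs, zs+zm);CHKERRQ(ierr);
    ierr = PetscSynchronizedFlush(PetscObjectComm((PetscObject)da), PETSC_STDOUT);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }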
URL: From knepley at gmail.com Thu Aug 13 07:53:17 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Aug 2020 08:53:17 -0400 Subject: [petsc-users] DMView / print out ownership ranges In-Reply-To: <31f4d18a-a6d3-95ff-26d5-fa2421d89d60@tu-dresden.de> References: <31f4d18a-a6d3-95ff-26d5-fa2421d89d60@tu-dresden.de> Message-ID: On Thu, Aug 13, 2020 at 8:49 AM Bastian L?hrer < bastian.loehrer at tu-dresden.de> wrote: > Dear PETSc people, > > in PETSc 3.3 > > call DMView( dm, PETSC_VIEWER_STDOUT_WORLD, ierr) > > printed out the ownership ranges like so: > > Processor [0] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 > X range of indices: 0 32, Y range of indices: 0 17, Z range of indices: 0 > 16 > Processor [1] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 > X range of indices: 0 32, Y range of indices: 17 34, Z range of indices: 0 > 16 > Processor [2] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 > X range of indices: 0 32, Y range of indices: 0 17, Z range of indices: 16 > 32 > Processor [3] M 32 N 34 P 32 m 1 n 2 p 2 w 1 s 1 > X range of indices: 0 32, Y range of indices: 17 34, Z range of indices: > 16 32 > > In PETSc 3.8.4 (and later?) the same function call only prints out: > > DM Object: 4 MPI processes > type: da > > Does the feature to print out the ownership ranges still exist? > I am unable to find it. > Certainly the latest release prints what you expect: knepley/feature-plex-stokes-tutorial $:/PETSc3/petsc/petsc-dev/src/snes/tutorials$ make ex5 /PETSc3/petsc/apple/bin/mpicc -Wl,-multiply_defined,suppress -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs -Wl,-search_paths_first -Wl,-no_compact_unwind -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack-check -Qunused-arguments -fvisibility=hidden -g3 -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fno-stack-check -Qunused-arguments -fvisibility=hidden -g3 -I/PETSc3/petsc/petsc-dev/include -I/PETSc3/petsc/petsc-dev/arch-master-debug/include -I/opt/X11/include -I/PETSc3/petsc/apple/include -I/PETSc3/petsc/petsc-dev/arch-master-debug/include/eigen3 ex5.c -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-debug/lib -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-debug/lib -L/PETSc3/petsc/petsc-dev/arch-master-debug/lib -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -Wl,-rpath,/PETSc3/petsc/apple/lib -L/PETSc3/petsc/apple/lib -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin19/9.2.0 -L/usr/local/lib/gcc/x86_64-apple-darwin19/9.2.0 -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist -lml -lfftw3_mpi -lfftw3 -lp4est -lsc -llapack -lblas -legadslite -ltriangle -lX11 -lexodus -lnetcdf -lpnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco -lparmetis -lmetis -lz -lctetgen -lc++ -ldl -lmpifort -lmpi -lpmpi -lgfortran -lquadmath -lm -lc++ -ldl -o ex5 knepley/feature-plex-stokes-tutorial $:/PETSc3/petsc/petsc-dev/src/snes/tutorials$ ./ex5 -dm_view DM Object: 1 MPI processes type: da Processor [0] M 4 N 4 m 1 n 1 w 1 s 1 X range of indices: 0 4, Y range of indices: 0 4 DM Object: 1 MPI processes type: da Processor [0] M 4 N 4 m 1 n 1 w 2 s 1 X range of indices: 0 4, Y range of indices: 0 4 knepley/feature-plex-stokes-tutorial $:/PETSc3/petsc/petsc-dev/src/snes/tutorials$ $MPIEXEC -np 4 ./ex5 -dm_view DM Object: 4 MPI processes type: da Processor [0] M 4 N 4 m 2 n 2 w 1 s 
1 X range of indices: 0 2, Y range of indices: 0 2 Processor [1] M 4 N 4 m 2 n 2 w 1 s 1 X range of indices: 2 4, Y range of indices: 0 2 Processor [2] M 4 N 4 m 2 n 2 w 1 s 1 X range of indices: 0 2, Y range of indices: 2 4 Processor [3] M 4 N 4 m 2 n 2 w 1 s 1 X range of indices: 2 4, Y range of indices: 2 4 DM Object: 4 MPI processes type: da Processor [0] M 4 N 4 m 2 n 2 w 2 s 1 X range of indices: 0 2, Y range of indices: 0 2 Processor [1] M 4 N 4 m 2 n 2 w 2 s 1 X range of indices: 2 4, Y range of indices: 0 2 Processor [2] M 4 N 4 m 2 n 2 w 2 s 1 X range of indices: 0 2, Y range of indices: 2 4 Processor [3] M 4 N 4 m 2 n 2 w 2 s 1 X range of indices: 2 4, Y range of indices: 2 4 We can try and go back to debug 3.8.4, but that is a long time ago. Can you use the latest release? Thanks, Matt > Best, > Bastian > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Aug 13 08:07:20 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Aug 2020 09:07:20 -0400 Subject: [petsc-users] MatZeroRows In-Reply-To: References: Message-ID: On Thu, Aug 13, 2020 at 3:17 AM Anthony Paul Haas wrote: > Hello, > > I am using MatZeroRows for imposing a known forcing into my equations in > conjunction with rhs and by setting the diagonal of the matrix to 1. > > I am using Fortran. I have used: > > ! only local processors set their own zeros > > > > call MatSetOption(self%fieldLHSMat_ps, MAT_NO_OFF_PROC_ZERO_ROWS, > PETSC_TRUE, ierr) > > > call MatSetOption(self%fieldLHSMat_ps, MAT_KEEP_NONZERO_PATTERN, > PETSC_TRUE, ierr) > > > call MatZeroRows(self%fieldLHSMat_ps, numrows, glob_row_idx, diag, > PETSC_NULL_OBJECT, PETSC_NULL_OBJECT, ierr) > > > > Is numrows above the local (on each proc.) number of rows to remove, or is > it the global number of rows to remove? > Local. > Also on some processors, I have no rows to remove, hence the array glob_row_idx > is not allocated. How can I tell Petsc? Should I pass PETSC_NULL_OBJECT > instead of glob_row_idx in this case? > If you pass numrows = 0, it should not matter what is in the next slot, as long as it type checks. > Finally, does using the option MAT_KEEP_NONZERO_PATTERN have an influence > on the time the MatZeroRows call will take? > I don't think it makes much of a difference, but I have not measured it. Thanks, Matt > Thanks, > > > Best regards, > > > Anthony > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From smithc11 at rpi.edu Thu Aug 13 08:37:55 2020 From: smithc11 at rpi.edu (Cameron Smith) Date: Thu, 13 Aug 2020 09:37:55 -0400 Subject: [petsc-users] creation of parallel dmplex from a partitioned mesh Message-ID: <1953567c-6c7f-30fb-13e6-ad7017263a92@rpi.edu> Hello, We have a partitioned mesh that we want to create a DMPlex from that has the same distribution of elements (i.e., assignment of elements to processes) and vertices 'interior' to a process (i.e., vertices not on the inter-process boundary). 
We were trying to use DMPlexCreateFromCellListParallelPetsc() or DMPlexBuildFromCellListParallel() and found that the vertex ownership (roots in the returned Vertex SF) appears to be sequentially assigned to processes based on global vertex id. In general, this will not match our mesh distribution. As we understand, to subsequently set vertex coordinates (or other vertex data) we would have to utilize a star forest (SF) communication API to send data to the correct process. Is that correct? Alternatively, if we create a dmplex object from the elements that exist on each process using DMCreateFromCellList(), and then create a SF from mesh vertices on inter-process boundaries (using the mapping from local to global vertex ids provided by our mesh library), could we then associate the dmplex objects with the SF? Is it as simple as calling DMSetPointSF()? If manually defining the PointSF is a way forward, we would like some help understanding its definition; i.e., which entities become roots and which become leaves. In DMPlexBuildFromCellListParallel() https://gitlab.com/petsc/petsc/-/blob/753428fdb0644bc4cb7be6429ce8776c05405d40/src/dm/impls/plex/plexcreate.c#L2875-2899 the PointSF appears to contain roots for elements and vertices and leaves for owned vertices on the inter-process boundary. Is that correct? Thank-you, Cameron From knepley at gmail.com Thu Aug 13 08:54:26 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Aug 2020 09:54:26 -0400 Subject: [petsc-users] creation of parallel dmplex from a partitioned mesh In-Reply-To: <1953567c-6c7f-30fb-13e6-ad7017263a92@rpi.edu> References: <1953567c-6c7f-30fb-13e6-ad7017263a92@rpi.edu> Message-ID: On Thu, Aug 13, 2020 at 9:38 AM Cameron Smith wrote: > Hello, > > We have a partitioned mesh that we want to create a DMPlex from that has > the same distribution of elements (i.e., assignment of elements to > processes) and vertices 'interior' to a process (i.e., vertices not on > the inter-process boundary). > > We were trying to use DMPlexCreateFromCellListParallelPetsc() or > DMPlexBuildFromCellListParallel() and found that the vertex ownership > (roots in the returned Vertex SF) appears to be sequentially assigned to > processes based on global vertex id. In general, this will not match > our mesh distribution. As we understand, to subsequently set vertex > coordinates (or other vertex data) we would have to utilize a star > forest (SF) communication API to send data to the correct process. Is > that correct? > > Alternatively, if we create a dmplex object from the elements that exist > on each process using DMCreateFromCellList(), and then create a SF from > mesh vertices on inter-process boundaries (using the mapping from local > to global vertex ids provided by our mesh library), could we then > associate the dmplex objects with the SF? Is it as simple as calling > DMSetPointSF()? > Yes. If you have all the distribution information, this is the easiest thing to do. > If manually defining the PointSF is a way forward, we would like some > help understanding its definition; i.e., which entities become roots and > which become leaves. In DMPlexBuildFromCellListParallel() > Short explanation of SF: SF stands for Star-Forest. It is a star graph because you have a single root that points to multiple leaves. It is a forest because you have several of these stars. We use this construct in many places in PETSc, and where it is used determines the semantics of the indices. 
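For a Plex built locally on each rank, attaching the point SF can be as small as the sketch below (the point numbers and the owning rank are made up purely for illustration, and error checking is abbreviated); the paragraphs that follow define precisely what the roots and leaves mean:

  PetscSF        sf;
  PetscSFNode    remote[2];
  PetscInt       local[2], pStart, pEnd;
  PetscErrorCode ierr;

  /* Say two local mesh points (7 and 9) are ghosts owned by rank 3,
     where they are local points 2 and 5 -- made-up numbers. */
  local[0] = 7;  remote[0].rank = 3;  remote[0].index = 2;
  local[1] = 9;  remote[1].rank = 3;  remote[1].index = 5;

  ierr = DMPlexGetChart(dm, &pStart, &pEnd);CHKERRQ(ierr);   /* [pStart, pEnd) covers all local points */
  ierr = PetscSFCreate(PetscObjectComm((PetscObject)dm), &sf);CHKERRQ(ierr);
  ierr = PetscSFSetGraph(sf, pEnd - pStart, 2, local, PETSC_COPY_VALUES,
                         remote, PETSC_COPY_VALUES);CHKERRQ(ierr);
  ierr = DMSetPointSF(dm, sf);CHKERRQ(ierr);
  ierr = PetscSFDestroy(&sf);CHKERRQ(ierr);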
The DMPlex point SF is an SF in which root indices are "owned" mesh points and leaf indices are "ghost" mesh points. You can take any set of local Plexes and add an SF to make them a parallel Plex. The SF is constructed with one-sided data. Locally, each process specifies two things: 1) The root space: The set of indices [0, Nr) which refers to possible roots on this process. For the pointSF, this is [0, Np) where Np is the number of local mesh points. 2) The leaves: Each leaf is a pair (local mesh point lp, remote mesh point rp) which says that local mesh point lp is a "ghost" of remote point rp. The remote point is given by (rank r, local mesh point rlp) where rlp is the local mesh point number on process r. With this, the Plex will automatically create all the other structures it needs. > > https://gitlab.com/petsc/petsc/-/blob/753428fdb0644bc4cb7be6429ce8776c05405d40/src/dm/impls/plex/plexcreate.c#L2875-2899 > > the PointSF appears to contain roots for elements and vertices and > leaves for owned vertices on the inter-process boundary. Is that correct? > No, the leaves are ghost vertices. They point back to the owner. Thanks, Matt > Thank-you, > Cameron > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu Aug 13 10:10:55 2020 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Thu, 13 Aug 2020 15:10:55 +0000 Subject: [petsc-users] Measuring memory consumption In-Reply-To: <7A3322D0-CD44-4504-BC09-EFB25D133C07@petsc.dev> References: <0f3eccd8dcc7414289b580b51970f993@kit.edu>, <7A3322D0-CD44-4504-BC09-EFB25D133C07@petsc.dev> Message-ID: Pascal, You may try an easy way first, i.e., using runtime option. For example, petsc/src/ksp/ksp/tutorials/ex2.c: 1) ./ex2 -pc_type lu -pc_factor_mat_solver_type mumps -help |grep mumps -pc_factor_mat_solver_type : Specific direct solver to use (MatGetFactor) -mat_mumps_icntl_1 : ICNTL(1): output stream for error messages (None) -mat_mumps_icntl_2 : ICNTL(2): output stream for diagnostic printing, statistics, and warning (None) -mat_mumps_icntl_3 : ICNTL(3): output stream for global information, collected on the host (None) -mat_mumps_icntl_4 : ICNTL(4): level of printing (0 to 4) (None) ... it tells you that '-mat_mumps_icntl_4 <#>' prints mumps internal info 2) ./ex2 -pc_type lu -pc_factor_mat_solver_type mumps -mat_mumps_icntl_4 2 MEMORY ESTIMATIONS ... Estimations with standard Full-Rank (FR) factorization: Total space in MBytes, IC factorization (INFOG(17)): 0 Total space in MBytes, OOC factorization (INFOG(27)): 0 Elapsed time in analysis driver= 0.0006 Entering DMUMPS 5.2.1 from C interface with JOB, N, NNZ = 2 56 250 executing #MPI = 1, without OMP ****** FACTORIZATION STEP ******** GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ... Number of working processes = 1 ICNTL(22) Out-of-core option = 0 ICNTL(35) BLR activation (eff. choice) = 0 ICNTL(14) Memory relaxation = 20 INFOG(3) Real space for factors (estimated)= 556 INFOG(4) Integer space for factors (estim.)= 1135 Maximum frontal size (estimated) = 11 Number of nodes in the tree = 39 Memory allowed (MB -- 0: N/A ) = 0 Memory provided by user, sum of LWK_USER = 0 ... 
Hong ________________________________ From: petsc-users on behalf of Barry Smith Sent: Wednesday, August 12, 2020 11:17 PM To: Kraft, Pascal (IANM) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Measuring memory consumption Pascal, Do KSPGetPC(ksp,&pc); PCFactorGetMatrix(pc,fact) PetscViewerPushFormat(PETSC_VIEWER_STDOUT_(PetscObjectComm((PetscObject)fact)),PETSC_VIEWER_ASCII_INFO); MatView(fact,PETSC_VIEWER_STDOUT_(PetscObjectComm((PetscObject)fact); PetscViewerPopFormat(PETSC_VIEWER_STDOUT_(PetscObjectComm((PetscObject)fact)); after SNES/KSPSolve this should print the various MUMPS variables related to memory usage. This is processed in /src/mat/impls/aij/mpi/mumps/mumps.c If you want to process the values directly in the application code you can look at this file and see how it is extracting the memory values from the mumps variables and use the variables directly in your code. Barry On Aug 12, 2020, at 6:52 PM, Kraft, Pascal (IANM) > wrote: Dear PETSc Users, I use the MUMPS wrapper in PETSc (loaded from dealii). I know that MUMPS computes a factorization based on the mulrifrontal method and since I have a very memory-strapped problem, I would like to run some test on the memory requirements for the factorizations across some sets of parameters in my underlying FEM-problem. My question is if there is a native way to check memory (as a measure of fill-in) for the MUMPS factorization in PETSc. I would prefer not to go through the operating system since that feels somewhat inconclusive to me. Does PETSc provide such functionality in any way? I would prefer to know the memory consumption for storing the factorization if that is possible, I would also take the total memory consumption of the direct solver if there is no other way, and, if nothing else is possible, I guess I would have to go with the memory requirement of the whole application if all else fails. Is anyone aware of something I can do? Kind regards, Pascal -------------------------------------------------------------------------- Karlsruhe Institute of Technology (KIT) Institute for Applied and Numerical Mathematics Kraft, Pascal Research scientist Englerstra?e 2 Geb?ude 20.30 76130, Germany Phone: +49 721 608-42801 Mobile: +49 163 6927612 E-mail: pascal.kraft?kit.edu Web: www.math.kit.edu/ianm2/~kraft/de KIT ? The Research University in the Helmholtz Association Since 2010, the KIT has been certified as a family-friendly university. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicola.varini at gmail.com Thu Aug 13 10:28:37 2020 From: nicola.varini at gmail.com (nicola varini) Date: Thu, 13 Aug 2020 17:28:37 +0200 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: Dear Barry, you are right. The Cray argument checking is incorrect. It does work with download-fblaslapack. However it does fail to converge. Is there anything obviously wrong with my petscrc? Anything else am I missing? Thanks Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith ha scritto: > > The QR is always done on the CPU, we don't have generic calls to > blas/lapack go to the GPU currently. > > The error message is: > > On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = > -7) > > argument 7 is &LWORK which is defined by > > PetscBLASInt LWORK=N*bs; > > and > > N=nSAvec is the column block size of new P. 
> > Presumably this is a huge run with many processes so using the debugger > is not practical? > > We need to see what these variables are > > N, bs, nSAvec > > perhaps nSAvec is zero which could easily upset LAPACK. > > Crudest thing would be to just put a print statement in the code > before the LAPACK call of if they are called many times add an error check > like that > generates an error if any of these three values are 0 (or negative). > > Barry > > > It is not impossible that the Cray argument checking is incorrect and > the value passed in is fine. You can check this by using > --download-fblaslapack and see if the same or some other error comes up. > > > > > > > > > On Aug 12, 2020, at 7:19 PM, Mark Adams wrote: > > Can you reproduce this on the CPU? > The QR factorization seems to be failing. That could be from bad data or a > bad GPU QR. > > On Wed, Aug 12, 2020 at 4:19 AM nicola varini > wrote: > >> Dear all, following the suggestions I did resubmit the simulation with >> the petscrc below. >> However I do get the following error: >> ======== >> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data >> 7364 [339]PETSC ERROR: xGEQRF error >> 7365 [339]PETSC ERROR: See >> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >> shooting. >> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >> 7367 [339]PETSC ERROR: >> /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 >> by nvarini Wed Aug 12 10:06:15 2020 >> 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn >> --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 >> --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 >> --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 >> --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 >> --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell >> --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 >> --with-cuda-c=nvcc --with-cxxlib-autodetect=0 >> --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - >> -with-cxx=CC >> --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include >> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in >> 
/scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value >> (info = -7) >> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >> ======== >> >> I did try other pc_gamg_type but they fails as well. >> >> >> #PETSc Option Table entries: >> -ampere_dm_mat_type aijcusparse >> -ampere_dm_vec_type cuda >> -ampere_ksp_atol 1e-15 >> -ampere_ksp_initial_guess_nonzero yes >> -ampere_ksp_reuse_preconditioner yes >> -ampere_ksp_rtol 1e-7 >> -ampere_ksp_type dgmres >> -ampere_mg_levels_esteig_ksp_max_it 10 >> -ampere_mg_levels_esteig_ksp_type cg >> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >> -ampere_mg_levels_ksp_type chebyshev >> -ampere_mg_levels_pc_type jacobi >> -ampere_pc_gamg_agg_nsmooths 1 >> -ampere_pc_gamg_coarse_eq_limit 10 >> -ampere_pc_gamg_reuse_interpolation true >> -ampere_pc_gamg_square_graph 1 >> -ampere_pc_gamg_threshold 0.05 >> -ampere_pc_gamg_threshold_scale .0 >> -ampere_pc_gamg_type agg >> -ampere_pc_type gamg >> -dm_mat_type aijcusparse >> -dm_vec_type cuda >> -log_view >> -poisson_dm_mat_type aijcusparse >> -poisson_dm_vec_type cuda >> -poisson_ksp_atol 1e-15 >> -poisson_ksp_initial_guess_nonzero yes >> -poisson_ksp_reuse_preconditioner yes >> -poisson_ksp_rtol 1e-7 >> -poisson_ksp_type dgmres >> -poisson_log_view >> -poisson_mg_levels_esteig_ksp_max_it 10 >> -poisson_mg_levels_esteig_ksp_type cg >> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >> -poisson_mg_levels_ksp_max_it 1 >> -poisson_mg_levels_ksp_type chebyshev >> -poisson_mg_levels_pc_type jacobi >> -poisson_pc_gamg_agg_nsmooths 1 >> -poisson_pc_gamg_coarse_eq_limit 10 >> -poisson_pc_gamg_reuse_interpolation true >> -poisson_pc_gamg_square_graph 1 >> -poisson_pc_gamg_threshold 0.05 >> -poisson_pc_gamg_threshold_scale .0 >> -poisson_pc_gamg_type agg >> -poisson_pc_type gamg >> -use_mat_nearnullspace true >> #End of PETSc Option Table entries >> >> Regards, >> >> Nicola >> >> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams ha >> scritto: >> >>> >>> >>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini < >>> stefano.zampini at gmail.com> wrote: >>> >>>> Nicola, >>>> >>>> You are actually not using the GPU properly, since you use HYPRE >>>> preconditioning, which is CPU only. One of your solvers is actually slower >>>> on ?GPU?. >>>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with >>>> Jacobi preconditioning. Mark can help you out with the specific command >>>> line options. >>>> When it works properly, everything related to PC application is >>>> offloaded to the GPU, and you should expect to get the well-known and >>>> branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >>>> >>>> >>> The speedup depends on the machine, but on SUMMIT, using enough CPUs to >>> saturate the memory bus vs all 6 GPUs the speedup is a function of problem >>> subdomain size. I saw 10x at about 100K equations/process. 
>>> >>> >>>> Doing what you want to do is one of the last optimization steps of an >>>> already optimized code before entering production. Yours is not even >>>> optimized for proper GPU usage yet. >>>> Also, any specific reason why you are using dgmres and fgmres? >>>> >>>> PETSc has not been designed with multi-threading in mind. You can >>>> achieve ?overlap? of the two solves by splitting the communicator. But then >>>> you need communications to let the two solutions talk to each other. >>>> >>>> Thanks >>>> Stefano >>>> >>>> >>>> On Aug 4, 2020, at 12:04 PM, nicola varini >>>> wrote: >>>> >>>> Dear all, thanks for your replies. The reason why I've asked if it is >>>> possible to overlap poisson and ampere is because they roughly >>>> take the same amount of time. Please find in attachment the profiling >>>> logs for only CPU and only GPU. >>>> Of course it is possible to split the MPI communicator and run each >>>> solver on different subcommunicator, however this would involve more >>>> communication. >>>> Did anyone ever tried to run 2 solvers with hyperthreading? >>>> Thanks >>>> >>>> >>>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams >>>> ha scritto: >>>> >>>>> I suspect that the Poisson and Ampere's law solve are not coupled. You >>>>> might be able to duplicate the communicator and use two threads. You would >>>>> want to configure PETSc with threadsafty and threads and I think it >>>>> could/should work, but this mode is never used by anyone. >>>>> >>>>> That said, I would not recommend doing this unless you feel like >>>>> playing in computer science, as opposed to doing application science. The >>>>> best case scenario you get a speedup of 2x. That is a strict upper bound, >>>>> but you will never come close to it. Your hardware has some balance of CPU >>>>> to GPU processing rate. Your application has a balance of volume of work >>>>> for your two solves. They have to be the same to get close to 2x speedup >>>>> and that ratio(s) has to be 1:1. To be concrete, from what little I can >>>>> guess about your applications let's assume that the cost of each of these >>>>> two solves is about the same (eg, Laplacians on your domain and the best >>>>> case scenario). But, GPU machines are configured to have roughly 1-10% of >>>>> capacity in the GPUs, these days, that gives you an upper bound of about >>>>> 10% speedup. That is noise. Upshot, unless you configure your hardware to >>>>> match this problem, and the two solves have the same cost, you will not see >>>>> close to 2x speedup. Your time is better spent elsewhere. >>>>> >>>>> Mark >>>>> >>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: >>>>> >>>>>> You can use MPI and split the communicator so n-1 ranks create a DMDA >>>>>> for one part of your system and the other rank drives the GPU in the other >>>>>> part. They can all be part of the same coupled system on the full >>>>>> communicator, but PETSc doesn't currently support some ranks having their >>>>>> Vec arrays on GPU and others on host, so you'd be paying host-device >>>>>> transfer costs on each iteration (and that might swamp any performance >>>>>> benefit you would have gotten). >>>>>> >>>>>> In any case, be sure to think about the execution time of each part. >>>>>> Load balancing with matching time-to-solution for each part can be really >>>>>> hard. >>>>>> >>>>>> >>>>>> Barry Smith writes: >>>>>> >>>>>> > Nicola, >>>>>> > >>>>>> > This is really viable or practical at this time with PETSc. 
It >>>>>> is not impossible but requires careful coding with threads, another >>>>>> possibility is to use one half of the virtual GPUs for each solve, this is >>>>>> also not trivial. I would recommend first seeing what kind of performance >>>>>> you can get on the GPU for each type of solve and revist this idea in the >>>>>> future. >>>>>> > >>>>>> > Barry >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini < >>>>>> nicola.varini at gmail.com> wrote: >>>>>> >> >>>>>> >> Hello, I would like to know if it is possible to overlap CPU and >>>>>> GPU with DMDA. >>>>>> >> I've a machine where each node has 1P100+1Haswell. >>>>>> >> I've to resolve Poisson and Ampere equation for each time step. >>>>>> >> I'm using 2D DMDA for each of them. Would be possible to compute >>>>>> poisson >>>>>> >> and ampere equation at the same time? One on CPU and the other on >>>>>> GPU? >>>>>> >> >>>>>> >> Thanks >>>>>> >>>>> >>>> >>>> >>>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: out_miniapp_f_poisson Type: application/octet-stream Size: 19156 bytes Desc: not available URL: From smithc11 at rpi.edu Thu Aug 13 10:43:39 2020 From: smithc11 at rpi.edu (Cameron Smith) Date: Thu, 13 Aug 2020 11:43:39 -0400 Subject: [petsc-users] creation of parallel dmplex from a partitioned mesh In-Reply-To: References: <1953567c-6c7f-30fb-13e6-ad7017263a92@rpi.edu> Message-ID: <62654977-bdbc-9cd7-5a70-e9fb4951310a@rpi.edu> Thank you for the quick reply and the info. We'll give it a shot and respond if we hit any snags. -Cameron On 8/13/20 9:54 AM, Matthew Knepley wrote: > On Thu, Aug 13, 2020 at 9:38 AM Cameron Smith > wrote: > > Hello, > > We have a partitioned mesh that we want to create a DMPlex from that > has > the same distribution of elements (i.e., assignment of elements to > processes) and vertices 'interior' to a process (i.e., vertices not on > the inter-process boundary). > > We were trying to use DMPlexCreateFromCellListParallelPetsc() or > DMPlexBuildFromCellListParallel() and found that the vertex ownership > (roots in the returned Vertex SF) appears to be sequentially > assigned to > processes based on global vertex id.? In general, this will not match > our mesh distribution.? As we understand, to subsequently set vertex > coordinates (or other vertex data) we would have to utilize a star > forest (SF) communication API to send data to the correct process. Is > that correct? > > Alternatively, if we create a dmplex object from the elements that > exist > on each process using DMCreateFromCellList(), and then create a SF from > mesh vertices on inter-process boundaries (using the mapping from local > to global vertex ids provided by our mesh library), could we then > associate the dmplex objects with the SF?? Is it as simple as calling > DMSetPointSF()? > > > Yes. If you have all the distribution information, this is the easiest > thing to do. > > If manually defining the PointSF is a way forward, we would like some > help understanding its definition; i.e., which entities become roots > and > which become leaves.? In DMPlexBuildFromCellListParallel() > > > Short explanation of SF: > > SF stands for Star-Forest. It is a star graph because you have a single > root that points to? multiple leaves. It is > a forest because you have several of these stars. 
We use this construct > in many places in PETSc, and where it > is used determines the semantics of the indices. > > The DMPlex point SF is an SF in which root indices are "owned" mesh > points and leaf indices are "ghost" mesh > points. You can take any set of local Plexes and add an SF to make them > a parallel Plex. > > The SF is constructed with one-sided data. Locally, each process > specifies two things: > > ? 1) The root space: The set of indices [0, Nr) which refers to > possible roots on this process. For the pointSF, this is [0, Np) where > Np is the number of local mesh points. > > ? 2) The leaves: Each leaf is a pair (local mesh point lp, remote mesh > point rp) which says that local mesh point lp is a "ghost" of remote > point rp. The remote point is > ? ? ? ?given by (rank r, local mesh point rlp) where rlp is the local > mesh point number on process r. > > With this, the Plex will automatically create all the other structures > it needs. > > https://gitlab.com/petsc/petsc/-/blob/753428fdb0644bc4cb7be6429ce8776c05405d40/src/dm/impls/plex/plexcreate.c#L2875-2899 > > the PointSF appears to contain roots for elements and vertices and > leaves for owned vertices on the inter-process boundary.? Is that > correct? > > > No, the leaves are ghost vertices. They point back to the owner. > > ? Thanks, > > ? ? ?Matt > > Thank-you, > Cameron > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From chris at resfrac.com Thu Aug 13 11:15:23 2020 From: chris at resfrac.com (Chris Hewson) Date: Thu, 13 Aug 2020 10:15:23 -0600 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> Message-ID: Just as an update to this, I can confirm that using the mpich version (3.3.2) downloaded with the petsc download solved this issue on my end. *Chris Hewson* Senior Reservoir Simulation Engineer ResFrac +1.587.575.9792 On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang wrote: > On Mon, Jul 20, 2020 at 7:05 AM Barry Smith wrote: > >> >> Is there a comprehensive MPI test suite (perhaps from MPICH)? Is >> there any way to run this full test suite under the problematic MPI and see >> if it detects any problems? >> >> Is so, could someone add it to the FAQ in the debugging section? >> > MPICH does have a test suite. It is at the subdir test/mpi of downloaded > mpich . > It annoyed me since it is not user-friendly. It might be helpful in > catching bugs at very small scale. But say if I want to test allreduce on > 1024 ranks on 100 doubles, I have to hack the test suite. > Anyway, the instructions are here. > > For the purpose of petsc, under test/mpi one can configure it with > $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi > --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast > --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled > cxx but I had to set CXX! > $make -k -j8 // -k is to keep going and ignore compilation errors, e.g., > when building tests for MPICH extensions not in MPI standard, but your MPI > is OpenMPI. > $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are > sub-dirs containing tests for MPI routines Petsc does not rely on. 
> $ make testings or directly './runtests -tests=testlist' > > On a batch system, > $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say > btest, > $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use 1024 > ranks if a test does no specify the number of processes. > $ // It copies test binaries to the batch dir and generates a > script runtests.batch there. Edit the script to fit your batch system and > then submit a job and wait for its finish. > $ cd btest && ../checktests --ignorebogus > > > PS: Fande, changing an MPI fixed your problem does not necessarily mean > the old MPI has bugs. It is complicated. It could be a petsc bug. You need > to provide us a code to reproduce your error. It does not matter if the > code is big. > > >> Thanks >> >> Barry >> >> >> On Jul 20, 2020, at 12:16 AM, Fande Kong wrote: >> >> Trace could look like this: >> >> [640]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [640]PETSC ERROR: Argument out of range >> [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521 >> [640]PETSC ERROR: See >> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >> shooting. >> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by >> wangy2 Sun Jul 19 17:14:28 2020 >> [640]PETSC ERROR: Configure options --download-hypre=1 >> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >> --download-mumps=0 >> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901 >> in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line 3180 >> in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >> [640]PETSC ERROR: #13 PCSetUp() line 898 in >> 
/home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >> [640]PETSC ERROR: #16 KSPSolve() line 853 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >> >> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong wrote: >> >>> I am not entirely sure what is happening, but we encountered similar >>> issues recently. It was not reproducible. It might occur at different >>> stages, and errors could be weird other than "ctable stuff." Our code was >>> Valgrind clean since every PR in moose needs to go through rigorous >>> Valgrind checks before it reaches the devel branch. The errors happened >>> when we used mvapich. >>> >>> We changed to use HPE-MPT (a vendor stalled MPI), then everything was >>> smooth. May you try a different MPI? It is better to try a system carried >>> one. >>> >>> We did not get the bottom of this problem yet, but we at least know this >>> is kind of MPI-related. >>> >>> Thanks, >>> >>> Fande, >>> >>> >>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson wrote: >>> >>>> Hi, >>>> >>>> I am having a bug that is occurring in PETSC with the return string: >>>> >>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>> largest key allowed 5693 >>>> >>>> This is using petsc-3.13.2, compiled and running using mpich with -O3 >>>> and debugging turned off tuned to the haswell architecture and >>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>> factorization solve (I haven't been able to replicate this issue with the >>>> same set of instructions etc.). >>>> >>>> This is a terrible way to ask a question, I know, and not very helpful >>>> from your side, but this is what I have from a user's run and can't >>>> reproduce on my end (either with the optimization compilation or with >>>> debugging turned on). This happens when the code has run for quite some >>>> time and is happening somewhat rarely. >>>> >>>> More than likely I am using a static variable (code is written in c++) >>>> that I'm not updating when the matrix size is changing or something silly >>>> like that, but any help or guidance on this would be appreciated. >>>> >>>> *Chris Hewson* >>>> Senior Reservoir Simulation Engineer >>>> ResFrac >>>> +1.587.575.9792 >>>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Aug 13 12:14:45 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 13 Aug 2020 12:14:45 -0500 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> Message-ID: Thanks for the update. Let's assume it is a bug in MPI :) --Junchao Zhang On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson wrote: > Just as an update to this, I can confirm that using the mpich version > (3.3.2) downloaded with the petsc download solved this issue on my end. 
> > *Chris Hewson* > Senior Reservoir Simulation Engineer > ResFrac > +1.587.575.9792 > > > On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang > wrote: > >> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith wrote: >> >>> >>> Is there a comprehensive MPI test suite (perhaps from MPICH)? Is >>> there any way to run this full test suite under the problematic MPI and see >>> if it detects any problems? >>> >>> Is so, could someone add it to the FAQ in the debugging section? >>> >> MPICH does have a test suite. It is at the subdir test/mpi of downloaded >> mpich . >> It annoyed me since it is not user-friendly. It might be helpful in >> catching bugs at very small scale. But say if I want to test allreduce on >> 1024 ranks on 100 doubles, I have to hack the test suite. >> Anyway, the instructions are here. >> >> For the purpose of petsc, under test/mpi one can configure it with >> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >> cxx but I had to set CXX! >> $make -k -j8 // -k is to keep going and ignore compilation errors, e.g., >> when building tests for MPICH extensions not in MPI standard, but your MPI >> is OpenMPI. >> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are >> sub-dirs containing tests for MPI routines Petsc does not rely on. >> $ make testings or directly './runtests -tests=testlist' >> >> On a batch system, >> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say >> btest, >> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use 1024 >> ranks if a test does no specify the number of processes. >> $ // It copies test binaries to the batch dir and generates a >> script runtests.batch there. Edit the script to fit your batch system and >> then submit a job and wait for its finish. >> $ cd btest && ../checktests --ignorebogus >> >> >> PS: Fande, changing an MPI fixed your problem does not necessarily mean >> the old MPI has bugs. It is complicated. It could be a petsc bug. You need >> to provide us a code to reproduce your error. It does not matter if the >> code is big. >> >> >>> Thanks >>> >>> Barry >>> >>> >>> On Jul 20, 2020, at 12:16 AM, Fande Kong wrote: >>> >>> Trace could look like this: >>> >>> [640]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [640]PETSC ERROR: Argument out of range >>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521 >>> [640]PETSC ERROR: See >>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>> shooting. 
>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by >>> wangy2 Sun Jul 19 17:14:28 2020 >>> [640]PETSC ERROR: Configure options --download-hypre=1 >>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>> --download-mumps=0 >>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901 >>> in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line 3180 >>> in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>> >>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong wrote: >>> >>>> I am not entirely sure what is happening, but we encountered similar >>>> issues recently. It was not reproducible. It might occur at different >>>> stages, and errors could be weird other than "ctable stuff." Our code was >>>> Valgrind clean since every PR in moose needs to go through rigorous >>>> Valgrind checks before it reaches the devel branch. The errors happened >>>> when we used mvapich. 
>>>> >>>> We changed to use HPE-MPT (a vendor stalled MPI), then everything was >>>> smooth. May you try a different MPI? It is better to try a system carried >>>> one. >>>> >>>> We did not get the bottom of this problem yet, but we at least know >>>> this is kind of MPI-related. >>>> >>>> Thanks, >>>> >>>> Fande, >>>> >>>> >>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson wrote: >>>> >>>>> Hi, >>>>> >>>>> I am having a bug that is occurring in PETSC with the return string: >>>>> >>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>> largest key allowed 5693 >>>>> >>>>> This is using petsc-3.13.2, compiled and running using mpich with -O3 >>>>> and debugging turned off tuned to the haswell architecture and >>>>> occurring either before or during a KSPBCGS solve/setup or during a MUMPS >>>>> factorization solve (I haven't been able to replicate this issue with the >>>>> same set of instructions etc.). >>>>> >>>>> This is a terrible way to ask a question, I know, and not very helpful >>>>> from your side, but this is what I have from a user's run and can't >>>>> reproduce on my end (either with the optimization compilation or with >>>>> debugging turned on). This happens when the code has run for quite some >>>>> time and is happening somewhat rarely. >>>>> >>>>> More than likely I am using a static variable (code is written in c++) >>>>> that I'm not updating when the matrix size is changing or something silly >>>>> like that, but any help or guidance on this would be appreciated. >>>>> >>>>> *Chris Hewson* >>>>> Senior Reservoir Simulation Engineer >>>>> ResFrac >>>>> +1.587.575.9792 >>>>> >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Thu Aug 13 12:52:28 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Thu, 13 Aug 2020 12:52:28 -0500 Subject: [petsc-users] Question on matrix assembly Message-ID: Hi PETSc-developers, When assembling a matrix, what would the relative performance of the following be : [a] loop over the rows owned by the mpi-rank twice, first to compute the values and set preallocation for this mpi-rank and then again to fill in the values (as recommended in the manual) [b] loop over the rows once, preallocate and set the values for each row. I'm refactoring an application that follows approach [a] but computes the elements of the matrix twice (once to fill in the nnz arrays and once to set the values) and I want to know if computing, preallocating and setting the elements by row instead would be better (so as to not compute the matrix entries twice which involves calls to boost-geometry). I'm attaching a plot that shows (Left) the number of non-zeros per row for a typical matrix used in this application and (Right) the histogram of the number of non zeros per row, should this be useful. Note that this matrix has global dimensions [12800 rows, 65586 columns]. PS : This matrix is used for a TAO optimization problem and generating the matrix takes between ~10 and ~25% of the time (longer on a smaller number of nodes). [image: image.png] Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 247517 bytes Desc: not available URL: From knepley at gmail.com Thu Aug 13 13:16:51 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Aug 2020 14:16:51 -0400 Subject: [petsc-users] Question on matrix assembly In-Reply-To: References: Message-ID: On Thu, Aug 13, 2020 at 1:54 PM Sajid Ali wrote: > Hi PETSc-developers, > > When assembling a matrix, what would the relative performance of the > following be : > [a] loop over the rows owned by the mpi-rank twice, first to compute the > values and set preallocation for this mpi-rank and then again to fill in > the values (as recommended in the manual) > [b] loop over the rows once, preallocate and set the values for each row. > > I'm refactoring an application that follows approach [a] but computes the > elements of the matrix twice (once to fill in the nnz arrays and once to > set the values) and I want to know if computing, preallocating and setting > the elements by row instead would be better (so as to not compute the > matrix entries twice which involves calls to boost-geometry). > I am not sure what you mean by [b]. I do not believe the obvious interpretation is possible in PETSc. We allocate the matrix once, not row-by-row, so you could not preallocate just a few rows. However, why not have a flag so that on the first pass you do not compute entries, just the indices? Thanks, Matt > I'm attaching a plot that shows (Left) the number of non-zeros per row for > a typical matrix used in this application and (Right) the histogram of the > number of non zeros per row, should this be useful. Note that this matrix > has global dimensions [12800 rows, 65586 columns]. > > PS : This matrix is used for a TAO optimization problem and generating the > matrix takes between ~10 and ~25% of the time (longer on a smaller number > of nodes). > > [image: image.png] > > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 247517 bytes Desc: not available URL: From stefano.zampini at gmail.com Thu Aug 13 13:45:33 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 13 Aug 2020 20:45:33 +0200 Subject: [petsc-users] Question on matrix assembly In-Reply-To: References: Message-ID: <62502F3C-4411-4FCB-BD4F-6DCF0100180F@gmail.com> The matrix is rectangular. What do you need from it? Just its action? Or you need to know the entries to compute Matrix-matrix operations? If you just need the action, have you considered using a MATSHELL? > On Aug 13, 2020, at 8:16 PM, Matthew Knepley wrote: > > On Thu, Aug 13, 2020 at 1:54 PM Sajid Ali > wrote: > Hi PETSc-developers, > > When assembling a matrix, what would the relative performance of the following be : > [a] loop over the rows owned by the mpi-rank twice, first to compute the values and set preallocation for this mpi-rank and then again to fill in the values (as recommended in the manual) > [b] loop over the rows once, preallocate and set the values for each row. 
> > I'm refactoring an application that follows approach [a] but computes the elements of the matrix twice (once to fill in the nnz arrays and once to set the values) and I want to know if computing, preallocating and setting the elements by row instead would be better (so as to not compute the matrix entries twice which involves calls to boost-geometry). > > I am not sure what you mean by [b]. I do not believe the obvious interpretation is possible in PETSc. We allocate the > matrix once, not row-by-row, so you could not preallocate just a few rows. > > However, why not have a flag so that on the first pass you do not compute entries, just the indices? > > Thanks, > > Matt > > I'm attaching a plot that shows (Left) the number of non-zeros per row for a typical matrix used in this application and (Right) the histogram of the number of non zeros per row, should this be useful. Note that this matrix has global dimensions [12800 rows, 65586 columns]. > > PS : This matrix is used for a TAO optimization problem and generating the matrix takes between ~10 and ~25% of the time (longer on a smaller number of nodes). > > > > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 13 16:19:41 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 13 Aug 2020 16:19:41 -0500 Subject: [petsc-users] MatZeroRows In-Reply-To: References: Message-ID: <40168C06-4D21-45E9-B795-6EFE890CA6AB@petsc.dev> Anthony, Keeping the nonzero locations (with zero in them) will be a bit faster in the call to MatZeroRows() because otherwise it has to shift all the entries in the sparse matrix arrays data structures to "remove" the unneeded locations. But the real question is how it affects times in later function calls 1) If you do the MatSetValues()/MatAssemblyBegin/End() repeatedly it will be much faster if you keep the nonzero locations (with zero in them). 2) Time for MatMult() will be a bit faster if you remove the locations, but the depending on the preconditioner the preconditioner may be more or less effective and a bit slower or faster. So, rule of thumb is if you only do MatZeroRows() once you might generally remove the locations but if it is done in a loop with the same matrix (over time-steps for example, or even in SNES) you want to keep the nonzero locations. Barry Note: in both cases the memory usage is the same because PETSc does not "return" the excessive memory. > On Aug 13, 2020, at 8:07 AM, Matthew Knepley wrote: > > On Thu, Aug 13, 2020 at 3:17 AM Anthony Paul Haas > wrote: > Hello, > > I am using MatZeroRows for imposing a known forcing into my equations in conjunction with rhs and by setting the diagonal of the matrix to 1. > > I am using Fortran. I have used: > > ! only local processors set their own zeros > call MatSetOption(self%fieldLHSMat_ps, MAT_NO_OFF_PROC_ZERO_ROWS, PETSC_TRUE, ierr) > > call MatSetOption(self%fieldLHSMat_ps, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE, ierr) > > call MatZeroRows(self%fieldLHSMat_ps, numrows, glob_row_idx, diag, PETSC_NULL_OBJECT, PETSC_NULL_OBJECT, ierr) > > > Is numrows above the local (on each proc.) number of rows to remove, or is it the global number of rows to remove? 
> > Local. > > Also on some processors, I have no rows to remove, hence the array glob_row_idx is not allocated. How can I tell Petsc? Should I pass PETSC_NULL_OBJECT instead of glob_row_idx in this case? > > If you pass numrows = 0, it should not matter what is in the next slot, as long as it type checks. > > Finally, does using the option MAT_KEEP_NONZERO_PATTERN have an influence on the time the MatZeroRows call will take? > > I don't think it makes much of a difference, but I have not measured it. > > Thanks, > > Matt > > Thanks, > > Best regards, > > Anthony > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 13 16:22:35 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 13 Aug 2020 16:22:35 -0500 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: Does the same thing work (with GAMG) if you run on the same problem on the same machine same number of MPI ranks but make a new PETSC_ARCH that does NOT use the GPUs? Barry Ideally one gets almost identical convergence with CPUs or GPUs (same problem, same machine) but a bug or numerically change "might" affect this. > On Aug 13, 2020, at 10:28 AM, nicola varini wrote: > > Dear Barry, you are right. The Cray argument checking is incorrect. It does work with download-fblaslapack. > However it does fail to converge. Is there anything obviously wrong with my petscrc? > Anything else am I missing? > > Thanks > > Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith > ha scritto: > > The QR is always done on the CPU, we don't have generic calls to blas/lapack go to the GPU currently. > > The error message is: > > On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7) > > argument 7 is &LWORK which is defined by > > PetscBLASInt LWORK=N*bs; > > and > > N=nSAvec is the column block size of new P. > > Presumably this is a huge run with many processes so using the debugger is not practical? > > We need to see what these variables are > > N, bs, nSAvec > > perhaps nSAvec is zero which could easily upset LAPACK. > > Crudest thing would be to just put a print statement in the code before the LAPACK call of if they are called many times add an error check like that > generates an error if any of these three values are 0 (or negative). > > Barry > > > It is not impossible that the Cray argument checking is incorrect and the value passed in is fine. You can check this by using --download-fblaslapack and see if the same or some other error comes up. > > > > > > > > >> On Aug 12, 2020, at 7:19 PM, Mark Adams > wrote: >> >> Can you reproduce this on the CPU? >> The QR factorization seems to be failing. That could be from bad data or a bad GPU QR. >> >> On Wed, Aug 12, 2020 at 4:19 AM nicola varini > wrote: >> Dear all, following the suggestions I did resubmit the simulation with the petscrc below. 
>> However I do get the following error: >> ======== >> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data >> 7364 [339]PETSC ERROR: xGEQRF error >> 7365 [339]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >> 7367 [339]PETSC ERROR: /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 by nvarini Wed Aug 12 10:06:15 2020 >> 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 --with-cuda-c=nvcc --with-cxxlib-autodetect=0 --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - -with-cxx=CC --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include >> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7) >> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >> ======== >> >> I did try other pc_gamg_type but they fails as well. 
>> >> >> #PETSc Option Table entries: >> -ampere_dm_mat_type aijcusparse >> -ampere_dm_vec_type cuda >> -ampere_ksp_atol 1e-15 >> -ampere_ksp_initial_guess_nonzero yes >> -ampere_ksp_reuse_preconditioner yes >> -ampere_ksp_rtol 1e-7 >> -ampere_ksp_type dgmres >> -ampere_mg_levels_esteig_ksp_max_it 10 >> -ampere_mg_levels_esteig_ksp_type cg >> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >> -ampere_mg_levels_ksp_type chebyshev >> -ampere_mg_levels_pc_type jacobi >> -ampere_pc_gamg_agg_nsmooths 1 >> -ampere_pc_gamg_coarse_eq_limit 10 >> -ampere_pc_gamg_reuse_interpolation true >> -ampere_pc_gamg_square_graph 1 >> -ampere_pc_gamg_threshold 0.05 >> -ampere_pc_gamg_threshold_scale .0 >> -ampere_pc_gamg_type agg >> -ampere_pc_type gamg >> -dm_mat_type aijcusparse >> -dm_vec_type cuda >> -log_view >> -poisson_dm_mat_type aijcusparse >> -poisson_dm_vec_type cuda >> -poisson_ksp_atol 1e-15 >> -poisson_ksp_initial_guess_nonzero yes >> -poisson_ksp_reuse_preconditioner yes >> -poisson_ksp_rtol 1e-7 >> -poisson_ksp_type dgmres >> -poisson_log_view >> -poisson_mg_levels_esteig_ksp_max_it 10 >> -poisson_mg_levels_esteig_ksp_type cg >> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >> -poisson_mg_levels_ksp_max_it 1 >> -poisson_mg_levels_ksp_type chebyshev >> -poisson_mg_levels_pc_type jacobi >> -poisson_pc_gamg_agg_nsmooths 1 >> -poisson_pc_gamg_coarse_eq_limit 10 >> -poisson_pc_gamg_reuse_interpolation true >> -poisson_pc_gamg_square_graph 1 >> -poisson_pc_gamg_threshold 0.05 >> -poisson_pc_gamg_threshold_scale .0 >> -poisson_pc_gamg_type agg >> -poisson_pc_type gamg >> -use_mat_nearnullspace true >> #End of PETSc Option Table entries >> >> Regards, >> >> Nicola >> >> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams > ha scritto: >> >> >> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini > wrote: >> Nicola, >> >> You are actually not using the GPU properly, since you use HYPRE preconditioning, which is CPU only. One of your solvers is actually slower on ?GPU?. >> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with Jacobi preconditioning. Mark can help you out with the specific command line options. >> When it works properly, everything related to PC application is offloaded to the GPU, and you should expect to get the well-known and branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >> >> >> The speedup depends on the machine, but on SUMMIT, using enough CPUs to saturate the memory bus vs all 6 GPUs the speedup is a function of problem subdomain size. I saw 10x at about 100K equations/process. >> >> Doing what you want to do is one of the last optimization steps of an already optimized code before entering production. Yours is not even optimized for proper GPU usage yet. >> Also, any specific reason why you are using dgmres and fgmres? >> >> PETSc has not been designed with multi-threading in mind. You can achieve ?overlap? of the two solves by splitting the communicator. But then you need communications to let the two solutions talk to each other. >> >> Thanks >> Stefano >> >> >>> On Aug 4, 2020, at 12:04 PM, nicola varini > wrote: >>> >>> Dear all, thanks for your replies. The reason why I've asked if it is possible to overlap poisson and ampere is because they roughly >>> take the same amount of time. Please find in attachment the profiling logs for only CPU and only GPU. >>> Of course it is possible to split the MPI communicator and run each solver on different subcommunicator, however this would involve more communication. 
>>> Did anyone ever tried to run 2 solvers with hyperthreading? >>> Thanks >>> >>> >>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams > ha scritto: >>> I suspect that the Poisson and Ampere's law solve are not coupled. You might be able to duplicate the communicator and use two threads. You would want to configure PETSc with threadsafty and threads and I think it could/should work, but this mode is never used by anyone. >>> >>> That said, I would not recommend doing this unless you feel like playing in computer science, as opposed to doing application science. The best case scenario you get a speedup of 2x. That is a strict upper bound, but you will never come close to it. Your hardware has some balance of CPU to GPU processing rate. Your application has a balance of volume of work for your two solves. They have to be the same to get close to 2x speedup and that ratio(s) has to be 1:1. To be concrete, from what little I can guess about your applications let's assume that the cost of each of these two solves is about the same (eg, Laplacians on your domain and the best case scenario). But, GPU machines are configured to have roughly 1-10% of capacity in the GPUs, these days, that gives you an upper bound of about 10% speedup. That is noise. Upshot, unless you configure your hardware to match this problem, and the two solves have the same cost, you will not see close to 2x speedup. Your time is better spent elsewhere. >>> >>> Mark >>> >>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown > wrote: >>> You can use MPI and split the communicator so n-1 ranks create a DMDA for one part of your system and the other rank drives the GPU in the other part. They can all be part of the same coupled system on the full communicator, but PETSc doesn't currently support some ranks having their Vec arrays on GPU and others on host, so you'd be paying host-device transfer costs on each iteration (and that might swamp any performance benefit you would have gotten). >>> >>> In any case, be sure to think about the execution time of each part. Load balancing with matching time-to-solution for each part can be really hard. >>> >>> >>> Barry Smith > writes: >>> >>> > Nicola, >>> > >>> > This is really viable or practical at this time with PETSc. It is not impossible but requires careful coding with threads, another possibility is to use one half of the virtual GPUs for each solve, this is also not trivial. I would recommend first seeing what kind of performance you can get on the GPU for each type of solve and revist this idea in the future. >>> > >>> > Barry >>> > >>> > >>> > >>> > >>> >> On Jul 31, 2020, at 9:23 AM, nicola varini > wrote: >>> >> >>> >> Hello, I would like to know if it is possible to overlap CPU and GPU with DMDA. >>> >> I've a machine where each node has 1P100+1Haswell. >>> >> I've to resolve Poisson and Ampere equation for each time step. >>> >> I'm using 2D DMDA for each of them. Would be possible to compute poisson >>> >> and ampere equation at the same time? One on CPU and the other on GPU? >>> >> >>> >> Thanks >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Thu Aug 13 16:31:19 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 13 Aug 2020 16:31:19 -0500 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> Message-ID: Junchao, Any way in the PETSc configure to warn that MPICH version is "bad" or "untrustworthy" or even the vague "we have suspicians about this version and recommend avoiding it"? A lot of time could be saved if others don't deal with the same problem. Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and always check against that list and print a boxed warning at configure time? Better you could somehow generalize it and put it in package.py for use by all packages, then any package can included lists of "suspect" versions. (There are definitely HDF5 versions that should be avoided :-)). Barry > On Aug 13, 2020, at 12:14 PM, Junchao Zhang wrote: > > Thanks for the update. Let's assume it is a bug in MPI :) > --Junchao Zhang > > > On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson > wrote: > Just as an update to this, I can confirm that using the mpich version (3.3.2) downloaded with the petsc download solved this issue on my end. > > Chris Hewson > Senior Reservoir Simulation Engineer > ResFrac > +1.587.575.9792 > > > On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang > wrote: > On Mon, Jul 20, 2020 at 7:05 AM Barry Smith > wrote: > > Is there a comprehensive MPI test suite (perhaps from MPICH)? Is there any way to run this full test suite under the problematic MPI and see if it detects any problems? > > Is so, could someone add it to the FAQ in the debugging section? > MPICH does have a test suite. It is at the subdir test/mpi of downloaded mpich . It annoyed me since it is not user-friendly. It might be helpful in catching bugs at very small scale. But say if I want to test allreduce on 1024 ranks on 100 doubles, I have to hack the test suite. > Anyway, the instructions are here. > For the purpose of petsc, under test/mpi one can configure it with > $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled cxx but I had to set CXX! > $make -k -j8 // -k is to keep going and ignore compilation errors, e.g., when building tests for MPICH extensions not in MPI standard, but your MPI is OpenMPI. > $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are sub-dirs containing tests for MPI routines Petsc does not rely on. > $ make testings or directly './runtests -tests=testlist' > > On a batch system, > $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say btest, > $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use 1024 ranks if a test does no specify the number of processes. > $ // It copies test binaries to the batch dir and generates a script runtests.batch there. Edit the script to fit your batch system and then submit a job and wait for its finish. > $ cd btest && ../checktests --ignorebogus > > PS: Fande, changing an MPI fixed your problem does not necessarily mean the old MPI has bugs. It is complicated. It could be a petsc bug. You need to provide us a code to reproduce your error. It does not matter if the code is big. 
> > > Thanks > > Barry > > >> On Jul 20, 2020, at 12:16 AM, Fande Kong > wrote: >> >> Trace could look like this: >> >> [640]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [640]PETSC ERROR: Argument out of range >> [640]PETSC ERROR: key 45226154 is greater than largest key allowed 740521 >> [640]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by wangy2 Sun Jul 19 17:14:28 2020 >> [640]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices --download-mumps=0 >> [640]PETSC ERROR: #1 PetscTableFind() line 132 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line 3180 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >> [640]PETSC ERROR: #9 MatPtAP() line 9199 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >> [640]PETSC ERROR: #13 PCSetUp() line 898 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >> [640]PETSC ERROR: #14 KSPSetUp() line 376 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >> [640]PETSC ERROR: #16 KSPSolve() line 853 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >> [640]PETSC ERROR: #18 SNESSolve() line 4519 in /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >> >> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong > wrote: >> I am not entirely sure what is happening, but we encountered similar issues recently. 
It was not reproducible. It might occur at different stages, and errors could be weird other than "ctable stuff." Our code was Valgrind clean since every PR in moose needs to go through rigorous Valgrind checks before it reaches the devel branch. The errors happened when we used mvapich. >> >> We changed to use HPE-MPT (a vendor stalled MPI), then everything was smooth. May you try a different MPI? It is better to try a system carried one. >> >> We did not get the bottom of this problem yet, but we at least know this is kind of MPI-related. >> >> Thanks, >> >> Fande, >> >> >> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson > wrote: >> Hi, >> >> I am having a bug that is occurring in PETSC with the return string: >> >> [7]PETSC ERROR: PetscTableFind() line 132 in /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than largest key allowed 5693 >> >> This is using petsc-3.13.2, compiled and running using mpich with -O3 and debugging turned off tuned to the haswell architecture and occurring either before or during a KSPBCGS solve/setup or during a MUMPS factorization solve (I haven't been able to replicate this issue with the same set of instructions etc.). >> >> This is a terrible way to ask a question, I know, and not very helpful from your side, but this is what I have from a user's run and can't reproduce on my end (either with the optimization compilation or with debugging turned on). This happens when the code has run for quite some time and is happening somewhat rarely. >> >> More than likely I am using a static variable (code is written in c++) that I'm not updating when the matrix size is changing or something silly like that, but any help or guidance on this would be appreciated. >> >> Chris Hewson >> Senior Reservoir Simulation Engineer >> ResFrac >> +1.587.575.9792 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aph at email.arizona.edu Thu Aug 13 16:47:06 2020 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Thu, 13 Aug 2020 14:47:06 -0700 Subject: [petsc-users] [EXT]Re: MatZeroRows In-Reply-To: <40168C06-4D21-45E9-B795-6EFE890CA6AB@petsc.dev> References: <40168C06-4D21-45E9-B795-6EFE890CA6AB@petsc.dev> Message-ID: Awesome! thanks to all. Cheers On Thu, Aug 13, 2020 at 2:19 PM Barry Smith wrote: > *External Email* > > Anthony, > > Keeping the nonzero locations (with zero in them) will be a bit faster > in the call to MatZeroRows() because otherwise it has to shift all the > entries in the sparse matrix arrays data structures to "remove" the > unneeded locations. > > But the real question is how it affects times in later function calls > > 1) If you do the MatSetValues()/MatAssemblyBegin/End() repeatedly it will > be much faster if you keep the nonzero locations (with zero in them). > > 2) Time for MatMult() will be a bit faster if you remove the locations, > but the depending on the preconditioner the preconditioner may be more or > less effective and a bit slower or faster. > > So, rule of thumb is if you only do MatZeroRows() once you might > generally remove the locations but if it is done in a loop with the same > matrix (over time-steps for example, or even in SNES) you want to keep the > nonzero locations. > > Barry > > Note: in both cases the memory usage is the same because PETSc does not > "return" the excessive memory. 
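For illustration only, a minimal C sketch of the pattern described above (the application in the thread is Fortran; the function name ZeroForcedRows and the variables nrowsLocal and rows are made-up placeholders, not taken from the actual code). The row count passed to MatZeroRows() is the local count, so it can legitimately be zero on ranks that own none of the forced rows, and keeping the nonzero pattern is the choice that pays off when the call is repeated on the same matrix over time steps.

#include <petscmat.h>

/* Sketch: zero the locally owned "forced" rows, put 1.0 on the diagonal, and
   keep the nonzero pattern so later MatSetValues()/MatAssemblyBegin/End()
   calls reuse the existing structure. */
static PetscErrorCode ZeroForcedRows(Mat A, PetscInt nrowsLocal, const PetscInt rows[])
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* each rank passes only rows it owns, so no off-process communication is needed */
  ierr = MatSetOption(A, MAT_NO_OFF_PROC_ZERO_ROWS, PETSC_TRUE);CHKERRQ(ierr);
  ierr = MatSetOption(A, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE);CHKERRQ(ierr);
  /* nrowsLocal may be 0 on some ranks; the rows array is then never dereferenced */
  ierr = MatZeroRows(A, nrowsLocal, rows, 1.0, NULL, NULL);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}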
> > > On Aug 13, 2020, at 8:07 AM, Matthew Knepley wrote: > > On Thu, Aug 13, 2020 at 3:17 AM Anthony Paul Haas > wrote: > >> Hello, >> >> I am using MatZeroRows for imposing a known forcing into my equations in >> conjunction with rhs and by setting the diagonal of the matrix to 1. >> >> I am using Fortran. I have used: >> >> ! only local processors set their own zeros >> >> >> call MatSetOption(self%fieldLHSMat_ps, MAT_NO_OFF_PROC_ZERO_ROWS, >> PETSC_TRUE, ierr) >> >> call MatSetOption(self%fieldLHSMat_ps, MAT_KEEP_NONZERO_PATTERN, >> PETSC_TRUE, ierr) >> >> call MatZeroRows(self%fieldLHSMat_ps, numrows, glob_row_idx, diag, >> PETSC_NULL_OBJECT, PETSC_NULL_OBJECT, ierr) >> >> >> Is numrows above the local (on each proc.) number of rows to remove, or >> is it the global number of rows to remove? >> > > Local. > > >> Also on some processors, I have no rows to remove, hence the array glob_row_idx >> is not allocated. How can I tell Petsc? Should I pass PETSC_NULL_OBJECT >> instead of glob_row_idx in this case? >> > > If you pass numrows = 0, it should not matter what is in the next slot, as > long as it type checks. > > >> Finally, does using the option MAT_KEEP_NONZERO_PATTERN have an influence >> on the time the MatZeroRows call will take? >> > > I don't think it makes much of a difference, but I have not measured it. > > Thanks, > > Matt > > >> Thanks, >> >> Best regards, >> >> Anthony >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Aug 13 17:21:02 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 13 Aug 2020 17:21:02 -0500 Subject: [petsc-users] Tough to reproduce petsctablefind error In-Reply-To: References: <0AC37384-BC37-4A6C-857D-41CD507F84C2@petsc.dev> Message-ID: That is a great idea. I'll figure it out. --Junchao Zhang On Thu, Aug 13, 2020 at 4:31 PM Barry Smith wrote: > > Junchao, > > Any way in the PETSc configure to warn that MPICH version is "bad" or > "untrustworthy" or even the vague "we have suspicians about this version > and recommend avoiding it"? A lot of time could be saved if others don't > deal with the same problem. > > Maybe add arrays of suspect_versions for OpenMPI, MPICH, etc and > always check against that list and print a boxed warning at configure time? > Better you could somehow generalize it and put it in package.py for use by > all packages, then any package can included lists of "suspect" versions. > (There are definitely HDF5 versions that should be avoided :-)). > > Barry > > > On Aug 13, 2020, at 12:14 PM, Junchao Zhang > wrote: > > Thanks for the update. Let's assume it is a bug in MPI :) > --Junchao Zhang > > > On Thu, Aug 13, 2020 at 11:15 AM Chris Hewson wrote: > >> Just as an update to this, I can confirm that using the mpich version >> (3.3.2) downloaded with the petsc download solved this issue on my end. >> >> *Chris Hewson* >> Senior Reservoir Simulation Engineer >> ResFrac >> +1.587.575.9792 >> >> >> On Thu, Jul 23, 2020 at 5:58 PM Junchao Zhang >> wrote: >> >>> On Mon, Jul 20, 2020 at 7:05 AM Barry Smith wrote: >>> >>>> >>>> Is there a comprehensive MPI test suite (perhaps from MPICH)? Is >>>> there any way to run this full test suite under the problematic MPI and see >>>> if it detects any problems? 
>>>> >>>> Is so, could someone add it to the FAQ in the debugging section? >>>> >>> MPICH does have a test suite. It is at the subdir test/mpi of downloaded >>> mpich . >>> It annoyed me since it is not user-friendly. It might be helpful in >>> catching bugs at very small scale. But say if I want to test allreduce on >>> 1024 ranks on 100 doubles, I have to hack the test suite. >>> Anyway, the instructions are here. >>> >>> For the purpose of petsc, under test/mpi one can configure it with >>> $./configure CC=mpicc CXX=mpicxx FC=mpifort --enable-strictmpi >>> --enable-threads=funneled --enable-fortran=f77,f90 --enable-fast >>> --disable-spawn --disable-cxx --disable-ft-tests // It is weird I disabled >>> cxx but I had to set CXX! >>> $make -k -j8 // -k is to keep going and ignore compilation errors, >>> e.g., when building tests for MPICH extensions not in MPI standard, but >>> your MPI is OpenMPI. >>> $ // edit testlist, remove lines mpi_t, rma, f77, impls. Those are >>> sub-dirs containing tests for MPI routines Petsc does not rely on. >>> $ make testings or directly './runtests -tests=testlist' >>> >>> On a batch system, >>> $export MPITEST_BATCHDIR=`pwd`/btest // specify a batch dir, say >>> btest, >>> $./runtests -batch -mpiexec=mpirun -np=1024 -tests=testlist // Use >>> 1024 ranks if a test does no specify the number of processes. >>> $ // It copies test binaries to the batch dir and generates a >>> script runtests.batch there. Edit the script to fit your batch system and >>> then submit a job and wait for its finish. >>> $ cd btest && ../checktests --ignorebogus >>> >>> >>> PS: Fande, changing an MPI fixed your problem does not necessarily mean >>> the old MPI has bugs. It is complicated. It could be a petsc bug. You need >>> to provide us a code to reproduce your error. It does not matter if the >>> code is big. >>> >>> >>>> Thanks >>>> >>>> Barry >>>> >>>> >>>> On Jul 20, 2020, at 12:16 AM, Fande Kong wrote: >>>> >>>> Trace could look like this: >>>> >>>> [640]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [640]PETSC ERROR: Argument out of range >>>> [640]PETSC ERROR: key 45226154 is greater than largest key allowed >>>> 740521 >>>> [640]PETSC ERROR: See >>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>> shooting. 
>>>> [640]PETSC ERROR: Petsc Release Version 3.13.3, unknown >>>> [640]PETSC ERROR: ../../griffin-opt on a arch-moose named r6i5n18 by >>>> wangy2 Sun Jul 19 17:14:28 2020 >>>> [640]PETSC ERROR: Configure options --download-hypre=1 >>>> --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 >>>> --download-metis=1 --download-ptscotch=1 --download-parmetis=1 >>>> --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 >>>> --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 >>>> --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices >>>> --download-mumps=0 >>>> [640]PETSC ERROR: #1 PetscTableFind() line 132 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/include/petscctable.h >>>> [640]PETSC ERROR: #2 MatSetUpMultiply_MPIAIJ() line 33 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mmaij.c >>>> [640]PETSC ERROR: #3 MatAssemblyEnd_MPIAIJ() line 876 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiaij.c >>>> [640]PETSC ERROR: #4 MatAssemblyEnd() line 5347 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>> [640]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIXAIJ_allatonce() line 901 >>>> in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/aij/mpi/mpiptap.c >>>> [640]PETSC ERROR: #6 MatPtAPNumeric_MPIAIJ_MPIMAIJ_allatonce() line >>>> 3180 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/impls/maij/maij.c >>>> [640]PETSC ERROR: #7 MatProductNumeric_PtAP() line 704 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>> [640]PETSC ERROR: #8 MatProductNumeric() line 759 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matproduct.c >>>> [640]PETSC ERROR: #9 MatPtAP() line 9199 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>> [640]PETSC ERROR: #10 MatGalerkin() line 10236 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/mat/interface/matrix.c >>>> [640]PETSC ERROR: #11 PCSetUp_MG() line 745 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/mg/mg.c >>>> [640]PETSC ERROR: #12 PCSetUp_HMG() line 220 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/impls/hmg/hmg.c >>>> [640]PETSC ERROR: #13 PCSetUp() line 898 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/pc/interface/precon.c >>>> [640]PETSC ERROR: #14 KSPSetUp() line 376 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>> [640]PETSC ERROR: #15 KSPSolve_Private() line 633 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>> [640]PETSC ERROR: #16 KSPSolve() line 853 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/ksp/ksp/interface/itfunc.c >>>> [640]PETSC ERROR: #17 SNESSolve_NEWTONLS() line 225 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/impls/ls/ls.c >>>> [640]PETSC ERROR: #18 SNESSolve() line 4519 in >>>> /home/wangy2/trunk/sawtooth/griffin/moose/petsc/src/snes/interface/snes.c >>>> >>>> On Sun, Jul 19, 2020 at 11:13 PM Fande Kong >>>> wrote: >>>> >>>>> I am not entirely sure what is happening, but we encountered similar >>>>> issues recently. It was not reproducible. It might occur at different >>>>> stages, and errors could be weird other than "ctable stuff." 
Our code was >>>>> Valgrind clean since every PR in moose needs to go through rigorous >>>>> Valgrind checks before it reaches the devel branch. The errors happened >>>>> when we used mvapich. >>>>> >>>>> We changed to use HPE-MPT (a vendor-installed MPI), then everything was >>>>> smooth. Could you try a different MPI? It is better to try a system-provided >>>>> one. >>>>> >>>>> We have not gotten to the bottom of this problem yet, but we at least know >>>>> it is somehow MPI-related. >>>>> >>>>> Thanks, >>>>> >>>>> Fande, >>>>> >>>>> >>>>> On Sun, Jul 19, 2020 at 3:28 PM Chris Hewson >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I am having a bug that is occurring in PETSc with the return string: >>>>>> >>>>>> [7]PETSC ERROR: PetscTableFind() line 132 in >>>>>> /home/chewson/petsc-3.13.2/include/petscctable.h key 7556 is greater than >>>>>> largest key allowed 5693 >>>>>> >>>>>> This is using petsc-3.13.2, compiled and run using mpich with -O3, >>>>>> debugging turned off, and tuned to the haswell architecture; it occurs >>>>>> either before or during a KSPBCGS solve/setup or during a MUMPS >>>>>> factorization solve (I haven't been able to replicate this issue with the >>>>>> same set of instructions etc.). >>>>>> >>>>>> This is a terrible way to ask a question, I know, and not very >>>>>> helpful from your side, but this is what I have from a user's run and can't >>>>>> reproduce on my end (either with the optimization compilation or with >>>>>> debugging turned on). This happens when the code has run for quite some >>>>>> time and happens somewhat rarely. >>>>>> >>>>>> More than likely I am using a static variable (code is written in >>>>>> c++) that I'm not updating when the matrix size is changing or something >>>>>> silly like that, but any help or guidance on this would be appreciated. >>>>>> >>>>>> *Chris Hewson* >>>>>> Senior Reservoir Simulation Engineer >>>>>> ResFrac >>>>>> +1.587.575.9792 >>>>>> >>>>> >>>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Thu Aug 13 20:41:54 2020 From: nb25 at rice.edu (Nidish) Date: Thu, 13 Aug 2020 20:41:54 -0500 Subject: [petsc-users] Performance differences between PETSc4py and PETSc on C Message-ID: <5597332a-1304-d817-0a6e-00d1c800242b@rice.edu> Hello, I'm wondering if any performance studies have been conducted between codes written using PETSc on C versus Python implementations using PETSc4py. Other than this, I'd really appreciate it if someone can give perspectives on the drawbacks/advantages on opting for either for code development. Thank you, Nidish From knepley at gmail.com Thu Aug 13 20:43:23 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 13 Aug 2020 21:43:23 -0400 Subject: [petsc-users] Performance differences between PETSc4py and PETSc on C In-Reply-To: <5597332a-1304-d817-0a6e-00d1c800242b@rice.edu> References: <5597332a-1304-d817-0a6e-00d1c800242b@rice.edu> Message-ID: On Thu, Aug 13, 2020 at 9:42 PM Nidish wrote: > Hello, > > I'm wondering if any performance studies have been conducted between > codes written using PETSc on C versus Python implementations using > PETSc4py. Other than this, I'd really appreciate it if someone can give > perspectives on the drawbacks/advantages on opting for either for code > development. > No PETSc code is in Python, so there is no difference as long as you do not call PETSc functions millions of times from Python. It should always be possible to operate at the right granularity. 
Thanks, Matt > Thank you, > Nidish > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From okoshkarov at tae.com Thu Aug 13 21:16:39 2020 From: okoshkarov at tae.com (Alex Koshkarov) Date: Fri, 14 Aug 2020 02:16:39 +0000 Subject: [petsc-users] Help with DMDASetBlockFillsSparse Message-ID: <1D22B484-9201-44AD-B35F-667BB7A2E8E7@tae.com> Hello All, I have a problem with DMDA matrix, I hope you can help. I am solving big PDE with TS object and my main data structure is 3D DMDA which has large number of degrees of freedom. I solve this system implicitly with JFNK. Essentially, I do something like this (J is jacobian, P is preconditioner): MatCreateSNESMF(snes,&J); DMCreateMatrix(da,&P); SNESSetJacobian(snes,J,P,form_P,env); In form_P I form P, but not J, and all works. I also sometimes use 3D DMDA for 2D and 1D problems, just setting Ny=Nz=1 and periodic boundary conditions. I have many degrees of freedom in my DMDA, so I need blocks to be sparse. So I tried to use DMDASetBlockFillsSparse, before DMCreateMatrix. However, it gives me an error connected that size of each dimension should be divisible by (2*stencil_size +1) to make efficient coloring, which makes using 3D DMDA for 2D problems not optimal. Moreover, it becomes super slow? I am not sure why it needs coloring. Essentially, everything works fine without DMDASetBlockFillsSparse, but once I use it, it complains and becomes slow. Do I do something wrong here? Do I need this coloring? All I need is more sparse DMDA matrix? I am somewhat lost. Thank you very much and best regards, Alex Koshkarov. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 13 23:28:11 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 13 Aug 2020 23:28:11 -0500 Subject: [petsc-users] Help with DMDASetBlockFillsSparse In-Reply-To: <1D22B484-9201-44AD-B35F-667BB7A2E8E7@tae.com> References: <1D22B484-9201-44AD-B35F-667BB7A2E8E7@tae.com> Message-ID: <5D1375EF-4261-4D6F-806C-BF7C12CB08AD@petsc.dev> Alex, Ahh, some over-zealous error checking used for the sparse case that is not there otherwise. You can edit src/dm/impls/da/fdda.c and remove the lines if (bx == DM_BOUNDARY_PERIODIC && (m % col)) SETERRQ(PetscObjectComm((PetscObject)da),PETSC_ERR_SUP,"For coloring efficiency ensure number of grid points in X is divisible\n\ by 2*stencil_width + 1\n"); if (by == DM_BOUNDARY_PERIODIC && (n % col)) SETERRQ(PetscObjectComm((PetscObject)da),PETSC_ERR_SUP,"For coloring efficiency ensure number of grid points in Y is divisible\n\ by 2*stencil_width + 1\n"); if (bz == DM_BOUNDARY_PERIODIC && (p % col)) SETERRQ(PetscObjectComm((PetscObject)da),PETSC_ERR_SUP,"For coloring efficiency ensure number of grid points in Z is divisible\n\ by 2*stencil_width + 1\n"); then do make libs in a PETSc directory. Regarding the slowdown. What becomes slower? Creating the DM or somewhere later in the code? My guess is that what becomes slow is your matrix assembly process and this happens because you provide the "wrong" sparsity pattern. That is your matrix actually has some entries that are not represented in the sparsity pattern. It will take forever to fill the matrix if this is the case. 
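One quick way to confirm this is to make any entry that falls outside the preallocated pattern an error during assembly; a minimal sketch in the notation of your snippet (the MatSetOption call is the real API, the surrounding lines are only illustrative):

  DMCreateMatrix(da,&P);
  /* a value written at a location not covered by the preallocation now produces an
     error naming the offending row/column, instead of a silent and very slow reallocation */
  MatSetOption(P,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE);
  SNESSetJacobian(snes,J,P,form_P,env);  /* then assemble P inside form_P as before */
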
You can call MatSetOption(mat,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE) before putting values into the matrix to see if that is the case. Barry > On Aug 13, 2020, at 9:16 PM, Alex Koshkarov wrote: > > Hello All, > > I have a problem with DMDA matrix, I hope you can help. I am solving big PDE with TS object and my main data structure is 3D DMDA which has large number of degrees of freedom. I solve this system implicitly with JFNK. Essentially, I do something like this (J is jacobian, P is preconditioner): > > MatCreateSNESMF(snes,&J); > DMCreateMatrix(da,&P); > SNESSetJacobian(snes,J,P,form_P,env); > > In form_P I form P, but not J, and all works. I also sometimes use 3D DMDA for 2D and 1D problems, just setting Ny=Nz=1 and periodic boundary conditions. I have many degrees of freedom in my DMDA, so I need blocks to be sparse. So I tried to use DMDASetBlockFillsSparse, before DMCreateMatrix. However, it gives me an error connected that size of each dimension should be divisible by (2*stencil_size +1) to make efficient coloring, which makes using 3D DMDA for 2D problems not optimal. Moreover, it becomes super slow? I am not sure why it needs coloring. Essentially, everything works fine without DMDASetBlockFillsSparse, but once I use it, it complains and becomes slow. Do I do something wrong here? Do I need this coloring? All I need is more sparse DMDA matrix? I am somewhat lost. > > Thank you very much and best regards, > Alex Koshkarov. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Thu Aug 13 23:50:09 2020 From: nb25 at rice.edu (Nidish) Date: Thu, 13 Aug 2020 23:50:09 -0500 Subject: [petsc-users] Performance differences between PETSc4py and PETSc on C In-Reply-To: References: <5597332a-1304-d817-0a6e-00d1c800242b@rice.edu> Message-ID: Does that mean one can choose to develop code using the petc4py wrappers without having to sacrifice any performance? Apologies if this question is too basic, I'm just trying to understand what I would be sacrificing if I chose to completely write my application on Python. Nidish On 8/13/20 8:43 PM, Matthew Knepley wrote: > On Thu, Aug 13, 2020 at 9:42 PM Nidish > wrote: > > Hello, > > I'm wondering if any performance studies have been conducted between > codes written using PETSc on C versus Python implementations using > PETSc4py. Other than this, I'd really appreciate it if someone can > give > perspectives on the drawbacks/advantages on opting for either for > code > development. > > > No PETSc code is in Python, so there is no difference as long as you > do not call PETSc > function millions of times from Python. It should always be possible > to operate at the right > granularity. > > ? Thanks, > > ? ? ?Matt > > Thank you, > Nidish > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Thu Aug 13 23:51:54 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 13 Aug 2020 22:51:54 -0600 Subject: [petsc-users] Performance differences between PETSc4py and PETSc on C In-Reply-To: References: <5597332a-1304-d817-0a6e-00d1c800242b@rice.edu> Message-ID: <87ft8pde39.fsf@jedbrown.org> Nidish writes: > Does that mean one can choose to develop code using the petc4py wrappers > without having to sacrifice any performance? Roughly, yes. Though assembly of residuals and Jacobians is still your business and its performance will vary greatly with the class of method and your implementation. > Apologies if this question is too basic, I'm just trying to understand > what I would be sacrificing if I chose to completely write my > application on Python. > > Nidish > > On 8/13/20 8:43 PM, Matthew Knepley wrote: >> On Thu, Aug 13, 2020 at 9:42 PM Nidish > > wrote: >> >> Hello, >> >> I'm wondering if any performance studies have been conducted between >> codes written using PETSc on C versus Python implementations using >> PETSc4py. Other than this, I'd really appreciate it if someone can >> give >> perspectives on the drawbacks/advantages on opting for either for >> code >> development. >> >> >> No PETSc code is in Python, so there is no difference as long as you >> do not call PETSc >> function millions of times from Python. It should always be possible >> to operate at the right >> granularity. >> >> ? Thanks, >> >> ? ? ?Matt >> >> Thank you, >> Nidish >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> > -- > Nidish From nb25 at rice.edu Fri Aug 14 00:10:48 2020 From: nb25 at rice.edu (Nidish) Date: Fri, 14 Aug 2020 00:10:48 -0500 Subject: [petsc-users] Performance differences between PETSc4py and PETSc on C In-Reply-To: <87ft8pde39.fsf@jedbrown.org> References: <5597332a-1304-d817-0a6e-00d1c800242b@rice.edu> <87ft8pde39.fsf@jedbrown.org> Message-ID: Thank you, Nidish On 8/13/20 11:51 PM, Jed Brown wrote: > Nidish writes: > >> Does that mean one can choose to develop code using the petc4py wrappers >> without having to sacrifice any performance? > Roughly, yes. Though assembly of residuals and Jacobians is still your business and its performance will vary greatly with the class of method and your implementation. > >> Apologies if this question is too basic, I'm just trying to understand >> what I would be sacrificing if I chose to completely write my >> application on Python. >> >> Nidish >> >> On 8/13/20 8:43 PM, Matthew Knepley wrote: >>> On Thu, Aug 13, 2020 at 9:42 PM Nidish >> > wrote: >>> >>> Hello, >>> >>> I'm wondering if any performance studies have been conducted between >>> codes written using PETSc on C versus Python implementations using >>> PETSc4py. Other than this, I'd really appreciate it if someone can >>> give >>> perspectives on the drawbacks/advantages on opting for either for >>> code >>> development. >>> >>> >>> No PETSc code is in Python, so there is no difference as long as you >>> do not call PETSc >>> function millions of times from Python. It should always be possible >>> to operate at the right >>> granularity. >>> >>> ? Thanks, >>> >>> ? ? 
?Matt >>> >>> Thank you, >>> Nidish >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which >>> their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >> -- >> Nidish -- Nidish From nicola.varini at gmail.com Fri Aug 14 04:01:03 2020 From: nicola.varini at gmail.com (nicola varini) Date: Fri, 14 Aug 2020 11:01:03 +0200 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: Dear Barry, yes it gives the same problems. Il giorno gio 13 ago 2020 alle ore 23:22 Barry Smith ha scritto: > > Does the same thing work (with GAMG) if you run on the same problem on > the same machine same number of MPI ranks but make a new PETSC_ARCH that > does NOT use the GPUs? > > Barry > > Ideally one gets almost identical convergence with CPUs or GPUs (same > problem, same machine) but a bug or numerically change "might" affect this. > > On Aug 13, 2020, at 10:28 AM, nicola varini > wrote: > > Dear Barry, you are right. The Cray argument checking is incorrect. It > does work with download-fblaslapack. > However it does fail to converge. Is there anything obviously wrong with > my petscrc? > Anything else am I missing? > > Thanks > > Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith > ha scritto: > >> >> The QR is always done on the CPU, we don't have generic calls to >> blas/lapack go to the GPU currently. >> >> The error message is: >> >> On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info >> = -7) >> >> argument 7 is &LWORK which is defined by >> >> PetscBLASInt LWORK=N*bs; >> >> and >> >> N=nSAvec is the column block size of new P. >> >> Presumably this is a huge run with many processes so using the >> debugger is not practical? >> >> We need to see what these variables are >> >> N, bs, nSAvec >> >> perhaps nSAvec is zero which could easily upset LAPACK. >> >> Crudest thing would be to just put a print statement in the code >> before the LAPACK call of if they are called many times add an error check >> like that >> generates an error if any of these three values are 0 (or negative). >> >> Barry >> >> >> It is not impossible that the Cray argument checking is incorrect and >> the value passed in is fine. You can check this by using >> --download-fblaslapack and see if the same or some other error comes up. >> >> >> >> >> >> >> >> >> On Aug 12, 2020, at 7:19 PM, Mark Adams wrote: >> >> Can you reproduce this on the CPU? >> The QR factorization seems to be failing. That could be from bad data or >> a bad GPU QR. >> >> On Wed, Aug 12, 2020 at 4:19 AM nicola varini >> wrote: >> >>> Dear all, following the suggestions I did resubmit the simulation with >>> the petscrc below. >>> However I do get the following error: >>> ======== >>> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data >>> 7364 [339]PETSC ERROR: xGEQRF error >>> 7365 [339]PETSC ERROR: See >>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>> shooting. 
>>> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>> 7367 [339]PETSC ERROR: >>> /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 >>> by nvarini Wed Aug 12 10:06:15 2020 >>> 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn >>> --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 >>> --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 >>> --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 >>> --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 >>> --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell >>> --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 >>> --with-cuda-c=nvcc --with-cxxlib-autodetect=0 >>> --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - >>> -with-cxx=CC >>> --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include >>> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value >>> (info = -7) >>> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>> ======== >>> >>> I did try other pc_gamg_type but they fails as well. 
>>> >>> >>> #PETSc Option Table entries: >>> -ampere_dm_mat_type aijcusparse >>> -ampere_dm_vec_type cuda >>> -ampere_ksp_atol 1e-15 >>> -ampere_ksp_initial_guess_nonzero yes >>> -ampere_ksp_reuse_preconditioner yes >>> -ampere_ksp_rtol 1e-7 >>> -ampere_ksp_type dgmres >>> -ampere_mg_levels_esteig_ksp_max_it 10 >>> -ampere_mg_levels_esteig_ksp_type cg >>> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>> -ampere_mg_levels_ksp_type chebyshev >>> -ampere_mg_levels_pc_type jacobi >>> -ampere_pc_gamg_agg_nsmooths 1 >>> -ampere_pc_gamg_coarse_eq_limit 10 >>> -ampere_pc_gamg_reuse_interpolation true >>> -ampere_pc_gamg_square_graph 1 >>> -ampere_pc_gamg_threshold 0.05 >>> -ampere_pc_gamg_threshold_scale .0 >>> -ampere_pc_gamg_type agg >>> -ampere_pc_type gamg >>> -dm_mat_type aijcusparse >>> -dm_vec_type cuda >>> -log_view >>> -poisson_dm_mat_type aijcusparse >>> -poisson_dm_vec_type cuda >>> -poisson_ksp_atol 1e-15 >>> -poisson_ksp_initial_guess_nonzero yes >>> -poisson_ksp_reuse_preconditioner yes >>> -poisson_ksp_rtol 1e-7 >>> -poisson_ksp_type dgmres >>> -poisson_log_view >>> -poisson_mg_levels_esteig_ksp_max_it 10 >>> -poisson_mg_levels_esteig_ksp_type cg >>> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>> -poisson_mg_levels_ksp_max_it 1 >>> -poisson_mg_levels_ksp_type chebyshev >>> -poisson_mg_levels_pc_type jacobi >>> -poisson_pc_gamg_agg_nsmooths 1 >>> -poisson_pc_gamg_coarse_eq_limit 10 >>> -poisson_pc_gamg_reuse_interpolation true >>> -poisson_pc_gamg_square_graph 1 >>> -poisson_pc_gamg_threshold 0.05 >>> -poisson_pc_gamg_threshold_scale .0 >>> -poisson_pc_gamg_type agg >>> -poisson_pc_type gamg >>> -use_mat_nearnullspace true >>> #End of PETSc Option Table entries >>> >>> Regards, >>> >>> Nicola >>> >>> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams ha >>> scritto: >>> >>>> >>>> >>>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini < >>>> stefano.zampini at gmail.com> wrote: >>>> >>>>> Nicola, >>>>> >>>>> You are actually not using the GPU properly, since you use HYPRE >>>>> preconditioning, which is CPU only. One of your solvers is actually slower >>>>> on ?GPU?. >>>>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with >>>>> Jacobi preconditioning. Mark can help you out with the specific command >>>>> line options. >>>>> When it works properly, everything related to PC application is >>>>> offloaded to the GPU, and you should expect to get the well-known and >>>>> branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >>>>> >>>>> >>>> The speedup depends on the machine, but on SUMMIT, using enough CPUs to >>>> saturate the memory bus vs all 6 GPUs the speedup is a function of problem >>>> subdomain size. I saw 10x at about 100K equations/process. >>>> >>>> >>>>> Doing what you want to do is one of the last optimization steps of an >>>>> already optimized code before entering production. Yours is not even >>>>> optimized for proper GPU usage yet. >>>>> Also, any specific reason why you are using dgmres and fgmres? >>>>> >>>>> PETSc has not been designed with multi-threading in mind. You can >>>>> achieve ?overlap? of the two solves by splitting the communicator. But then >>>>> you need communications to let the two solutions talk to each other. >>>>> >>>>> Thanks >>>>> Stefano >>>>> >>>>> >>>>> On Aug 4, 2020, at 12:04 PM, nicola varini >>>>> wrote: >>>>> >>>>> Dear all, thanks for your replies. 
The reason why I've asked if it is >>>>> possible to overlap poisson and ampere is because they roughly >>>>> take the same amount of time. Please find in attachment the profiling >>>>> logs for only CPU and only GPU. >>>>> Of course it is possible to split the MPI communicator and run each >>>>> solver on different subcommunicator, however this would involve more >>>>> communication. >>>>> Did anyone ever tried to run 2 solvers with hyperthreading? >>>>> Thanks >>>>> >>>>> >>>>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams >>>>> ha scritto: >>>>> >>>>>> I suspect that the Poisson and Ampere's law solve are not coupled. >>>>>> You might be able to duplicate the communicator and use two threads. You >>>>>> would want to configure PETSc with threadsafty and threads and I think it >>>>>> could/should work, but this mode is never used by anyone. >>>>>> >>>>>> That said, I would not recommend doing this unless you feel like >>>>>> playing in computer science, as opposed to doing application science. The >>>>>> best case scenario you get a speedup of 2x. That is a strict upper bound, >>>>>> but you will never come close to it. Your hardware has some balance of CPU >>>>>> to GPU processing rate. Your application has a balance of volume of work >>>>>> for your two solves. They have to be the same to get close to 2x speedup >>>>>> and that ratio(s) has to be 1:1. To be concrete, from what little I can >>>>>> guess about your applications let's assume that the cost of each of these >>>>>> two solves is about the same (eg, Laplacians on your domain and the best >>>>>> case scenario). But, GPU machines are configured to have roughly 1-10% of >>>>>> capacity in the GPUs, these days, that gives you an upper bound of about >>>>>> 10% speedup. That is noise. Upshot, unless you configure your hardware to >>>>>> match this problem, and the two solves have the same cost, you will not see >>>>>> close to 2x speedup. Your time is better spent elsewhere. >>>>>> >>>>>> Mark >>>>>> >>>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: >>>>>> >>>>>>> You can use MPI and split the communicator so n-1 ranks create a >>>>>>> DMDA for one part of your system and the other rank drives the GPU in the >>>>>>> other part. They can all be part of the same coupled system on the full >>>>>>> communicator, but PETSc doesn't currently support some ranks having their >>>>>>> Vec arrays on GPU and others on host, so you'd be paying host-device >>>>>>> transfer costs on each iteration (and that might swamp any performance >>>>>>> benefit you would have gotten). >>>>>>> >>>>>>> In any case, be sure to think about the execution time of each >>>>>>> part. Load balancing with matching time-to-solution for each part can be >>>>>>> really hard. >>>>>>> >>>>>>> >>>>>>> Barry Smith writes: >>>>>>> >>>>>>> > Nicola, >>>>>>> > >>>>>>> > This is really viable or practical at this time with PETSc. It >>>>>>> is not impossible but requires careful coding with threads, another >>>>>>> possibility is to use one half of the virtual GPUs for each solve, this is >>>>>>> also not trivial. I would recommend first seeing what kind of performance >>>>>>> you can get on the GPU for each type of solve and revist this idea in the >>>>>>> future. >>>>>>> > >>>>>>> > Barry >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini < >>>>>>> nicola.varini at gmail.com> wrote: >>>>>>> >> >>>>>>> >> Hello, I would like to know if it is possible to overlap CPU and >>>>>>> GPU with DMDA. 
>>>>>>> >> I've a machine where each node has 1P100+1Haswell. >>>>>>> >> I've to resolve Poisson and Ampere equation for each time step. >>>>>>> >> I'm using 2D DMDA for each of them. Would be possible to compute >>>>>>> poisson >>>>>>> >> and ampere equation at the same time? One on CPU and the other on >>>>>>> GPU? >>>>>>> >> >>>>>>> >> Thanks >>>>>>> >>>>>> >>>>> >>>>> >>>>> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eda.oktay at metu.edu.tr Fri Aug 14 08:53:53 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Fri, 14 Aug 2020 16:53:53 +0300 Subject: [petsc-users] ParMETIS vs. CHACO when no partitioning is made Message-ID: Hi all, I am trying to try something. I am using the same MatPartitioning codes for both CHACO and ParMETIS: ierr = MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); ierr = MatPartitioningApply(part,&partitioning);CHKERRQ(ierr); After obtaining the IS, I apply this to my original nonsymmetric matrix and try to get an approximate edge cut. Except for 1 partitioning, my program completely works for 2,4 and 16 partitionings. However, for 1, ParMETIS gives results where CHACO I guess doesn't since I am getting errors about the index set. What is the difference between CHACO and ParMETIS that one works for 1 partitioning and one doesn't? Thanks! Eda -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 14 10:49:33 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 14 Aug 2020 10:49:33 -0500 Subject: [petsc-users] ParMETIS vs. CHACO when no partitioning is made In-Reply-To: References: Message-ID: <548DC35C-0D80-409E-B360-D7E54076111D@petsc.dev> Could be a bug in Chaco or its call from PETSc for the special case of one process. Could you send a sample code that demonstrates the problem? Barry > On Aug 14, 2020, at 8:53 AM, Eda Oktay wrote: > > Hi all, > > I am trying to try something. I am using the same MatPartitioning codes for both CHACO and ParMETIS: > > ierr = MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); > > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > ierr = MatPartitioningApply(part,&partitioning);CHKERRQ(ierr); > > After obtaining the IS, I apply this to my original nonsymmetric matrix and try to get an approximate edge cut. > > Except for 1 partitioning, my program completely works for 2,4 and 16 partitionings. However, for 1, ParMETIS gives results where CHACO I guess doesn't since I am getting errors about the index set. > > What is the difference between CHACO and ParMETIS that one works for 1 partitioning and one doesn't? > > Thanks! > > Eda From eda.oktay at metu.edu.tr Fri Aug 14 11:07:25 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Fri, 14 Aug 2020 19:07:25 +0300 Subject: [petsc-users] ParMETIS vs. CHACO when no partitioning is made In-Reply-To: <548DC35C-0D80-409E-B360-D7E54076111D@petsc.dev> References: <548DC35C-0D80-409E-B360-D7E54076111D@petsc.dev> Message-ID: Dear Barry, Thank you for answering. I am sending a sample code and a binary file. Thanks! 
Eda Barry Smith , 14 A?u 2020 Cum, 18:49 tarihinde ?unu yazd?: > > Could be a bug in Chaco or its call from PETSc for the special case of > one process. Could you send a sample code that demonstrates the problem? > > Barry > > > > On Aug 14, 2020, at 8:53 AM, Eda Oktay wrote: > > > > Hi all, > > > > I am trying to try something. I am using the same MatPartitioning codes > for both CHACO and ParMETIS: > > > > ierr = > MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); > > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > > ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); > > > > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > > ierr = MatPartitioningApply(part,&partitioning);CHKERRQ(ierr); > > > > After obtaining the IS, I apply this to my original nonsymmetric matrix > and try to get an approximate edge cut. > > > > Except for 1 partitioning, my program completely works for 2,4 and 16 > partitionings. However, for 1, ParMETIS gives results where CHACO I guess > doesn't since I am getting errors about the index set. > > > > What is the difference between CHACO and ParMETIS that one works for 1 > partitioning and one doesn't? > > > > Thanks! > > > > Eda > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sample_chaco.zip Type: application/zip Size: 2683 bytes Desc: not available URL: From sajidsyed2021 at u.northwestern.edu Fri Aug 14 13:00:12 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Fri, 14 Aug 2020 13:00:12 -0500 Subject: [petsc-users] Question on matrix assembly In-Reply-To: <62502F3C-4411-4FCB-BD4F-6DCF0100180F@gmail.com> References: <62502F3C-4411-4FCB-BD4F-6DCF0100180F@gmail.com> Message-ID: @Matthew Knepley : Thanks for the explanation on preallocation. >However, why not have a flag so that on the first pass you do not compute entries, just the indices? The matrix computes the projection of an image onto a detector so generating this involves computing all possible ray-rectangle intersections and computing the values only differs from computing the indices by a call to calculate intersection lengths. The process to set up the geometry and check for intersections is the same to generate indices and values. So, in this case the tradeoff would be to either compute everything twice and save on storage cost or compute everything once and use more memory (essentially compute the matrix rows on each rank, preallocate and then set the matrix values). @Stefano Zampini : Yes, I only need MatMult and MatMultTranspose in the TAO objective/gradient evaluation but in the current state it's cheaper to use a matrix instead of computing the intersections for each objective/gradient evaluation. About ~70% of the application time is spent in MatMult and MatMultTranspose so we're hoping that this would benefit from running on GPU's. Thanks for the pointer to MatShell, implementing a matrix free method is something we might pursue in the future. -- Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Aug 14 13:12:57 2020 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 14 Aug 2020 14:12:57 -0400 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: You can try Hypre. 
If that fails then there is a problem with your system. And you can run with -info and grep on GAMG and send the output and I can see if I see anything funny. If this is just a Lapacian with a stable discretization and not crazy material parameters then stretched grids are about the only thing that can hurt the solver. Do both of your solves fail in a similar way? On the CPU you can try this with large subdomains, preferably (in serial ideally): -ampere_mg_levels_ksp_type richardson -ampere_mg_levels_pc_type sor And check that there are no unused options with -options_left. GAMG can fail with bad eigen estimates, but these parameters look fine. On Fri, Aug 14, 2020 at 5:01 AM nicola varini wrote: > Dear Barry, yes it gives the same problems. > > Il giorno gio 13 ago 2020 alle ore 23:22 Barry Smith > ha scritto: > >> >> Does the same thing work (with GAMG) if you run on the same problem on >> the same machine same number of MPI ranks but make a new PETSC_ARCH that >> does NOT use the GPUs? >> >> Barry >> >> Ideally one gets almost identical convergence with CPUs or GPUs (same >> problem, same machine) but a bug or numerically change "might" affect this. >> >> On Aug 13, 2020, at 10:28 AM, nicola varini >> wrote: >> >> Dear Barry, you are right. The Cray argument checking is incorrect. It >> does work with download-fblaslapack. >> However it does fail to converge. Is there anything obviously wrong with >> my petscrc? >> Anything else am I missing? >> >> Thanks >> >> Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith >> ha scritto: >> >>> >>> The QR is always done on the CPU, we don't have generic calls to >>> blas/lapack go to the GPU currently. >>> >>> The error message is: >>> >>> On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info >>> = -7) >>> >>> argument 7 is &LWORK which is defined by >>> >>> PetscBLASInt LWORK=N*bs; >>> >>> and >>> >>> N=nSAvec is the column block size of new P. >>> >>> Presumably this is a huge run with many processes so using the >>> debugger is not practical? >>> >>> We need to see what these variables are >>> >>> N, bs, nSAvec >>> >>> perhaps nSAvec is zero which could easily upset LAPACK. >>> >>> Crudest thing would be to just put a print statement in the code >>> before the LAPACK call of if they are called many times add an error check >>> like that >>> generates an error if any of these three values are 0 (or negative). >>> >>> Barry >>> >>> >>> It is not impossible that the Cray argument checking is incorrect >>> and the value passed in is fine. You can check this by using >>> --download-fblaslapack and see if the same or some other error comes up. >>> >>> >>> >>> >>> >>> >>> >>> >>> On Aug 12, 2020, at 7:19 PM, Mark Adams wrote: >>> >>> Can you reproduce this on the CPU? >>> The QR factorization seems to be failing. That could be from bad data or >>> a bad GPU QR. >>> >>> On Wed, Aug 12, 2020 at 4:19 AM nicola varini >>> wrote: >>> >>>> Dear all, following the suggestions I did resubmit the simulation with >>>> the petscrc below. >>>> However I do get the following error: >>>> ======== >>>> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data >>>> 7364 [339]PETSC ERROR: xGEQRF error >>>> 7365 [339]PETSC ERROR: See >>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>> shooting. 
>>>> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>>> 7367 [339]PETSC ERROR: >>>> /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 >>>> by nvarini Wed Aug 12 10:06:15 2020 >>>> 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn >>>> --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 >>>> --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 >>>> --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 >>>> --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 >>>> --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell >>>> --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 >>>> --with-cuda-c=nvcc --with-cxxlib-autodetect=0 >>>> --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - >>>> -with-cxx=CC >>>> --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include >>>> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value >>>> (info = -7) >>>> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>> ======== >>>> >>>> I did try other pc_gamg_type but they fails as well. 
>>>> >>>> >>>> #PETSc Option Table entries: >>>> -ampere_dm_mat_type aijcusparse >>>> -ampere_dm_vec_type cuda >>>> -ampere_ksp_atol 1e-15 >>>> -ampere_ksp_initial_guess_nonzero yes >>>> -ampere_ksp_reuse_preconditioner yes >>>> -ampere_ksp_rtol 1e-7 >>>> -ampere_ksp_type dgmres >>>> -ampere_mg_levels_esteig_ksp_max_it 10 >>>> -ampere_mg_levels_esteig_ksp_type cg >>>> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>> -ampere_mg_levels_ksp_type chebyshev >>>> -ampere_mg_levels_pc_type jacobi >>>> -ampere_pc_gamg_agg_nsmooths 1 >>>> -ampere_pc_gamg_coarse_eq_limit 10 >>>> -ampere_pc_gamg_reuse_interpolation true >>>> -ampere_pc_gamg_square_graph 1 >>>> -ampere_pc_gamg_threshold 0.05 >>>> -ampere_pc_gamg_threshold_scale .0 >>>> -ampere_pc_gamg_type agg >>>> -ampere_pc_type gamg >>>> -dm_mat_type aijcusparse >>>> -dm_vec_type cuda >>>> -log_view >>>> -poisson_dm_mat_type aijcusparse >>>> -poisson_dm_vec_type cuda >>>> -poisson_ksp_atol 1e-15 >>>> -poisson_ksp_initial_guess_nonzero yes >>>> -poisson_ksp_reuse_preconditioner yes >>>> -poisson_ksp_rtol 1e-7 >>>> -poisson_ksp_type dgmres >>>> -poisson_log_view >>>> -poisson_mg_levels_esteig_ksp_max_it 10 >>>> -poisson_mg_levels_esteig_ksp_type cg >>>> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>> -poisson_mg_levels_ksp_max_it 1 >>>> -poisson_mg_levels_ksp_type chebyshev >>>> -poisson_mg_levels_pc_type jacobi >>>> -poisson_pc_gamg_agg_nsmooths 1 >>>> -poisson_pc_gamg_coarse_eq_limit 10 >>>> -poisson_pc_gamg_reuse_interpolation true >>>> -poisson_pc_gamg_square_graph 1 >>>> -poisson_pc_gamg_threshold 0.05 >>>> -poisson_pc_gamg_threshold_scale .0 >>>> -poisson_pc_gamg_type agg >>>> -poisson_pc_type gamg >>>> -use_mat_nearnullspace true >>>> #End of PETSc Option Table entries >>>> >>>> Regards, >>>> >>>> Nicola >>>> >>>> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams >>>> ha scritto: >>>> >>>>> >>>>> >>>>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini < >>>>> stefano.zampini at gmail.com> wrote: >>>>> >>>>>> Nicola, >>>>>> >>>>>> You are actually not using the GPU properly, since you use HYPRE >>>>>> preconditioning, which is CPU only. One of your solvers is actually slower >>>>>> on ?GPU?. >>>>>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with >>>>>> Jacobi preconditioning. Mark can help you out with the specific command >>>>>> line options. >>>>>> When it works properly, everything related to PC application is >>>>>> offloaded to the GPU, and you should expect to get the well-known and >>>>>> branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >>>>>> >>>>>> >>>>> The speedup depends on the machine, but on SUMMIT, using enough CPUs >>>>> to saturate the memory bus vs all 6 GPUs the speedup is a function of >>>>> problem subdomain size. I saw 10x at about 100K equations/process. >>>>> >>>>> >>>>>> Doing what you want to do is one of the last optimization steps of an >>>>>> already optimized code before entering production. Yours is not even >>>>>> optimized for proper GPU usage yet. >>>>>> Also, any specific reason why you are using dgmres and fgmres? >>>>>> >>>>>> PETSc has not been designed with multi-threading in mind. You can >>>>>> achieve ?overlap? of the two solves by splitting the communicator. But then >>>>>> you need communications to let the two solutions talk to each other. >>>>>> >>>>>> Thanks >>>>>> Stefano >>>>>> >>>>>> >>>>>> On Aug 4, 2020, at 12:04 PM, nicola varini >>>>>> wrote: >>>>>> >>>>>> Dear all, thanks for your replies. 
The reason why I've asked if it is >>>>>> possible to overlap poisson and ampere is because they roughly >>>>>> take the same amount of time. Please find in attachment the profiling >>>>>> logs for only CPU and only GPU. >>>>>> Of course it is possible to split the MPI communicator and run each >>>>>> solver on different subcommunicator, however this would involve more >>>>>> communication. >>>>>> Did anyone ever tried to run 2 solvers with hyperthreading? >>>>>> Thanks >>>>>> >>>>>> >>>>>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams >>>>>> ha scritto: >>>>>> >>>>>>> I suspect that the Poisson and Ampere's law solve are not coupled. >>>>>>> You might be able to duplicate the communicator and use two threads. You >>>>>>> would want to configure PETSc with threadsafty and threads and I think it >>>>>>> could/should work, but this mode is never used by anyone. >>>>>>> >>>>>>> That said, I would not recommend doing this unless you feel like >>>>>>> playing in computer science, as opposed to doing application science. The >>>>>>> best case scenario you get a speedup of 2x. That is a strict upper bound, >>>>>>> but you will never come close to it. Your hardware has some balance of CPU >>>>>>> to GPU processing rate. Your application has a balance of volume of work >>>>>>> for your two solves. They have to be the same to get close to 2x speedup >>>>>>> and that ratio(s) has to be 1:1. To be concrete, from what little I can >>>>>>> guess about your applications let's assume that the cost of each of these >>>>>>> two solves is about the same (eg, Laplacians on your domain and the best >>>>>>> case scenario). But, GPU machines are configured to have roughly 1-10% of >>>>>>> capacity in the GPUs, these days, that gives you an upper bound of about >>>>>>> 10% speedup. That is noise. Upshot, unless you configure your hardware to >>>>>>> match this problem, and the two solves have the same cost, you will not see >>>>>>> close to 2x speedup. Your time is better spent elsewhere. >>>>>>> >>>>>>> Mark >>>>>>> >>>>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: >>>>>>> >>>>>>>> You can use MPI and split the communicator so n-1 ranks create a >>>>>>>> DMDA for one part of your system and the other rank drives the GPU in the >>>>>>>> other part. They can all be part of the same coupled system on the full >>>>>>>> communicator, but PETSc doesn't currently support some ranks having their >>>>>>>> Vec arrays on GPU and others on host, so you'd be paying host-device >>>>>>>> transfer costs on each iteration (and that might swamp any performance >>>>>>>> benefit you would have gotten). >>>>>>>> >>>>>>>> In any case, be sure to think about the execution time of each >>>>>>>> part. Load balancing with matching time-to-solution for each part can be >>>>>>>> really hard. >>>>>>>> >>>>>>>> >>>>>>>> Barry Smith writes: >>>>>>>> >>>>>>>> > Nicola, >>>>>>>> > >>>>>>>> > This is really viable or practical at this time with PETSc. >>>>>>>> It is not impossible but requires careful coding with threads, another >>>>>>>> possibility is to use one half of the virtual GPUs for each solve, this is >>>>>>>> also not trivial. I would recommend first seeing what kind of performance >>>>>>>> you can get on the GPU for each type of solve and revist this idea in the >>>>>>>> future. 
>>>>>>>> > >>>>>>>> > Barry >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini < >>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>> >> >>>>>>>> >> Hello, I would like to know if it is possible to overlap CPU and >>>>>>>> GPU with DMDA. >>>>>>>> >> I've a machine where each node has 1P100+1Haswell. >>>>>>>> >> I've to resolve Poisson and Ampere equation for each time step. >>>>>>>> >> I'm using 2D DMDA for each of them. Would be possible to compute >>>>>>>> poisson >>>>>>>> >> and ampere equation at the same time? One on CPU and the other >>>>>>>> on GPU? >>>>>>>> >> >>>>>>>> >> Thanks >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 14 13:43:51 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 14 Aug 2020 13:43:51 -0500 Subject: [petsc-users] Question on matrix assembly In-Reply-To: References: <62502F3C-4411-4FCB-BD4F-6DCF0100180F@gmail.com> Message-ID: <51B8FA5C-0231-40A8-A579-7536A0D9D5D8@petsc.dev> Sajid, Are the rows truly compute from first to last in order and all entries for a row are available together? Barry > On Aug 14, 2020, at 1:00 PM, Sajid Ali wrote: > > > @Matthew Knepley : Thanks for the explanation on preallocation. > > >However, why not have a flag so that on the first pass you do not compute entries, just the indices? > > The matrix computes the projection of an image onto a detector so generating this involves computing all possible ray-rectangle intersections and computing the values only differs from computing the indices by a call to calculate intersection lengths. The process to set up the geometry and check for intersections is the same to generate indices and values. > > So, in this case the tradeoff would be to either compute everything twice and save on storage cost or compute everything once and use more memory (essentially compute the matrix rows on each rank, preallocate and then set the matrix values). > > @Stefano Zampini : Yes, I only need MatMult and MatMultTranspose in the TAO objective/gradient evaluation but in the current state it's cheaper to use a matrix instead of computing the intersections for each objective/gradient evaluation. About ~70% of the application time is spent in MatMult and MatMultTranspose so we're hoping that this would benefit from running on GPU's. > > Thanks for the pointer to MatShell, implementing a matrix free method is something we might pursue in the future. > > -- > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Fri Aug 14 14:59:29 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Fri, 14 Aug 2020 14:59:29 -0500 Subject: [petsc-users] Question on matrix assembly In-Reply-To: <51B8FA5C-0231-40A8-A579-7536A0D9D5D8@petsc.dev> References: <62502F3C-4411-4FCB-BD4F-6DCF0100180F@gmail.com> <51B8FA5C-0231-40A8-A579-7536A0D9D5D8@petsc.dev> Message-ID: Hi Barry, All entries for a row are available together, but there is no requirement to compute them in order. Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nb25 at rice.edu Sat Aug 15 18:09:25 2020 From: nb25 at rice.edu (Nidish) Date: Sat, 15 Aug 2020 18:09:25 -0500 Subject: [petsc-users] Singular Eigenproblem with SLEPc Message-ID: Hello, I'm presently working with a large finite element model with several RBE3 constraints with "virtual" 6DOF nodes in the model. I have about ~36000 3DOF nodes making up my model and about ~10 RBE3 virtual nodes (which have zero intrinsic mass and stiffness). I've extracted the matrices from Abaqus. The way these constraints are implemented is by introducing static linear constraints (populating the stiffness matrix) and padding the mass matrix with zero rows and columns in the rows corresponding to the virtual nodes. So this leaves me with an eigenproblem of the form, K.v = lam*M.v where M is singular but the eigenproblem is well defined. Abaqus seems to solve this perfectly well, but after exporting the matrices, I'm struggling to get slepc to solve this. The manual talks about deflation, etc., but I couldn't really understand too much. Is there any example code for such a case with a singular matrix where these procedures are carried out? Or could you provide references/guidances for approaching the problem? Thank you, Nidish From nb25 at rice.edu Sat Aug 15 20:53:52 2020 From: nb25 at rice.edu (Nidish) Date: Sat, 15 Aug 2020 20:53:52 -0500 Subject: [petsc-users] Solving singular systems with petsc Message-ID: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> The section on solving singular systems in the manual starts with assuming that the singular eigenvectors are already known. I have a large system where finding the singular eigenvectors is not trivially written down. How would you recommend I proceed with making initial estimates? In MATLAB (with MUCH smaller matrices), I conduct an eigensolve for the first 10 smallest eigenvalues and take the eigenvectors corresponding to the zero eigenvalues from this. This approach doesn't work here since I'm unable to use SLEPc for solving K.v = lam*M.v for cases where K is positive semi-definite (contains a few "rigid body modes") and M is strictly positive definite. I'd appreciate any assistance you may provide with this. Thank you, Nidish From bsmith at petsc.dev Sat Aug 15 21:38:19 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 15 Aug 2020 21:38:19 -0500 Subject: [petsc-users] Question on matrix assembly In-Reply-To: References: <62502F3C-4411-4FCB-BD4F-6DCF0100180F@gmail.com> <51B8FA5C-0231-40A8-A579-7536A0D9D5D8@petsc.dev> Message-ID: Sajid, In the branch barry/2020-08-13/matsetvalues-nopreallocation-seqaij I have provided a new function MatSeqAIJSetTotalPreallocation() that requires only a single estimate for the entire number of nonzeros in the matrix (not row by row). If you provide the rows in order with sorted column indices for each row the assembly process will be super fast and you won't need to compute anything twice. Barry > On Aug 14, 2020, at 2:59 PM, Sajid Ali wrote: > > Hi Barry, > > All entries for a row are available together, but there is no requirement to compute them in order. > > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Sat Aug 15 21:41:49 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 15 Aug 2020 21:41:49 -0500 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> Message-ID: Exactly what algorithm are you using in Matlab to get the 10 smallest eigenvalues and their corresponding eigenvectors? Barry > On Aug 15, 2020, at 8:53 PM, Nidish wrote: > > The section on solving singular systems in the manual starts with assuming that the singular eigenvectors are already known. > > I have a large system where finding the singular eigenvectors is not trivially written down. How would you recommend I proceed with making initial estimates? In MATLAB (with MUCH smaller matrices), I conduct an eigensolve for the first 10 smallest eigenvalues and take the eigenvectors corresponding to the zero eigenvalues from this. This approach doesn't work here since I'm unable to use SLEPc for solving > > K.v = lam*M.v > > for cases where K is positive semi-definite (contains a few "rigid body modes") and M is strictly positive definite. > > I'd appreciate any assistance you may provide with this. > > Thank you, > Nidish From nb25 at rice.edu Sat Aug 15 23:02:04 2020 From: nb25 at rice.edu (Nidish) Date: Sat, 15 Aug 2020 23:02:04 -0500 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> Message-ID: <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> I just use the standard eigs function (https://www.mathworks.com/help/matlab/ref/eigs.html) as a black box. I think it uses a lanczos type method under the hood. Nidish On Aug 15, 2020, 21:42, at 21:42, Barry Smith wrote: > >Exactly what algorithm are you using in Matlab to get the 10 smallest >eigenvalues and their corresponding eigenvectors? > > Barry > > >> On Aug 15, 2020, at 8:53 PM, Nidish wrote: >> >> The section on solving singular systems in the manual starts with >assuming that the singular eigenvectors are already known. >> >> I have a large system where finding the singular eigenvectors is not >trivially written down. How would you recommend I proceed with making >initial estimates? In MATLAB (with MUCH smaller matrices), I conduct an >eigensolve for the first 10 smallest eigenvalues and take the >eigenvectors corresponding to the zero eigenvalues from this. This >approach doesn't work here since I'm unable to use SLEPc for solving >> >> K.v = lam*M.v >> >> for cases where K is positive semi-definite (contains a few "rigid >body modes") and M is strictly positive definite. >> >> I'd appreciate any assistance you may provide with this. >> >> Thank you, >> Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sun Aug 16 00:17:47 2020 From: jed at jedbrown.org (Jed Brown) Date: Sat, 15 Aug 2020 23:17:47 -0600 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> Message-ID: <87bljb6uf8.fsf@jedbrown.org> It's possible to use this or a similar algorithm in SLEPc, but keep in mind that it's more expensive to compute these eigenvectors than to solve a linear system. Do you have a sequence of systems with the same null space? You referred to the null space as "rigid body modes". Why can't those be written down? 
Note that PETSc has convenience routines for computing rigid body modes from coordinates. Nidish writes: > I just use the standard eigs function (https://www.mathworks.com/help/matlab/ref/eigs.html) as a black box. I think it uses a lanczos type method under the hood. > > Nidish > > On Aug 15, 2020, 21:42, at 21:42, Barry Smith wrote: >> >>Exactly what algorithm are you using in Matlab to get the 10 smallest >>eigenvalues and their corresponding eigenvectors? >> >> Barry >> >> >>> On Aug 15, 2020, at 8:53 PM, Nidish wrote: >>> >>> The section on solving singular systems in the manual starts with >>assuming that the singular eigenvectors are already known. >>> >>> I have a large system where finding the singular eigenvectors is not >>trivially written down. How would you recommend I proceed with making >>initial estimates? In MATLAB (with MUCH smaller matrices), I conduct an >>eigensolve for the first 10 smallest eigenvalues and take the >>eigenvectors corresponding to the zero eigenvalues from this. This >>approach doesn't work here since I'm unable to use SLEPc for solving >>> >>> K.v = lam*M.v >>> >>> for cases where K is positive semi-definite (contains a few "rigid >>body modes") and M is strictly positive definite. >>> >>> I'd appreciate any assistance you may provide with this. >>> >>> Thank you, >>> Nidish From jroman at dsic.upv.es Sun Aug 16 01:50:42 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sun, 16 Aug 2020 08:50:42 +0200 Subject: [petsc-users] Singular Eigenproblem with SLEPc In-Reply-To: References: Message-ID: <5CB4E89E-9FB8-43CB-8447-6C4E25813075@dsic.upv.es> Nothing special is required for solving a GHEP with singular M, except for setting the problem type as GHEP, see https://slepc.upv.es/documentation/current/src/eps/tutorials/ex13.c.html Jose > El 16 ago 2020, a las 1:09, Nidish escribi?: > > Hello, > > I'm presently working with a large finite element model with several RBE3 constraints with "virtual" 6DOF nodes in the model. > > I have about ~36000 3DOF nodes making up my model and about ~10 RBE3 virtual nodes (which have zero intrinsic mass and stiffness). I've extracted the matrices from Abaqus. > > The way these constraints are implemented is by introducing static linear constraints (populating the stiffness matrix) and padding the mass matrix with zero rows and columns in the rows corresponding to the virtual nodes. So this leaves me with an eigenproblem of the form, > > K.v = lam*M.v > > where M is singular but the eigenproblem is well defined. Abaqus seems to solve this perfectly well, but after exporting the matrices, I'm struggling to get slepc to solve this. The manual talks about deflation, etc., but I couldn't really understand too much. > > Is there any example code for such a case with a singular matrix where these procedures are carried out? Or could you provide references/guidances for approaching the problem? > > Thank you, > Nidish From nb25 at rice.edu Sun Aug 16 11:26:17 2020 From: nb25 at rice.edu (Nidish) Date: Sun, 16 Aug 2020 11:26:17 -0500 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: <87bljb6uf8.fsf@jedbrown.org> References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> Message-ID: Well some of the zero eigenvectors are rigid body modes, but there are some more which are introduced by lagrange-multiplier based constraint enforcement, which are non trivial. 
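For the rigid-body part of that null space, the convenience routine Jed mentions can be sketched roughly as below. The interlaced 3D coordinate layout and the use of the result as both near-null space (for AMG) and null space are assumptions for illustration; the Lagrange-multiplier modes would still have to be supplied separately.

#include <petscmat.h>

/* sketch: build the rigid body modes of an elasticity operator A from nodal
   coordinates; 'coords' holds interlaced (x,y,z) values, one triple per node */
PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
{
  MatNullSpace   rbm;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* coords must have block size 3 (or 2 in 2D) so the routine knows the dimension */
  ierr = VecSetBlockSize(coords, 3);CHKERRQ(ierr);
  ierr = MatNullSpaceCreateRigidBody(coords, &rbm);CHKERRQ(ierr);
  /* near-null space: used by GAMG to build good coarse spaces */
  ierr = MatSetNearNullSpace(A, rbm);CHKERRQ(ierr);
  /* if A is singular with exactly these modes, they can also be set as the
     null space so Krylov solvers project them out of the solution */
  ierr = MatSetNullSpace(A, rbm);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&rbm);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}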
My final application is for a nonlinear simulation, so I don't mind the extra computational effort initially. Could you have me the suggested solver configurations to get this type of eigenvectors in slepc? Nidish On Aug 16, 2020, 00:17, at 00:17, Jed Brown wrote: >It's possible to use this or a similar algorithm in SLEPc, but keep in >mind that it's more expensive to compute these eigenvectors than to >solve a linear system. Do you have a sequence of systems with the same >null space? > >You referred to the null space as "rigid body modes". Why can't those >be written down? Note that PETSc has convenience routines for >computing rigid body modes from coordinates. > >Nidish writes: > >> I just use the standard eigs function >(https://www.mathworks.com/help/matlab/ref/eigs.html) as a black box. I >think it uses a lanczos type method under the hood. >> >> Nidish >> >> On Aug 15, 2020, 21:42, at 21:42, Barry Smith >wrote: >>> >>>Exactly what algorithm are you using in Matlab to get the 10 smallest >>>eigenvalues and their corresponding eigenvectors? >>> >>> Barry >>> >>> >>>> On Aug 15, 2020, at 8:53 PM, Nidish wrote: >>>> >>>> The section on solving singular systems in the manual starts with >>>assuming that the singular eigenvectors are already known. >>>> >>>> I have a large system where finding the singular eigenvectors is >not >>>trivially written down. How would you recommend I proceed with making >>>initial estimates? In MATLAB (with MUCH smaller matrices), I conduct >an >>>eigensolve for the first 10 smallest eigenvalues and take the >>>eigenvectors corresponding to the zero eigenvalues from this. This >>>approach doesn't work here since I'm unable to use SLEPc for solving >>>> >>>> K.v = lam*M.v >>>> >>>> for cases where K is positive semi-definite (contains a few "rigid >>>body modes") and M is strictly positive definite. >>>> >>>> I'd appreciate any assistance you may provide with this. >>>> >>>> Thank you, >>>> Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Aug 16 11:27:08 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 16 Aug 2020 12:27:08 -0400 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> Message-ID: On Sun, Aug 16, 2020 at 12:26 PM Nidish wrote: > Well some of the zero eigenvectors are rigid body modes, but there are > some more which are introduced by lagrange-multiplier based constraint > enforcement, which are non trivial. > > My final application is for a nonlinear simulation, so I don't mind the > extra computational effort initially. Could you have me the suggested > solver configurations to get this type of eigenvectors in slepc? > Follow the example that Jose linked to. Thanks, Matt > Nidish > On Aug 16, 2020, at 00:17, Jed Brown wrote: >> >> It's possible to use this or a similar algorithm in SLEPc, but keep in mind that it's more expensive to compute these eigenvectors than to solve a linear system. Do you have a sequence of systems with the same null space? >> >> You referred to the null space as "rigid body modes". Why can't those be written down? Note that PETSc has convenience routines for computing rigid body modes from coordinates. >> >> Nidish writes: >> >> I just use the standard eigs function (https://www.mathworks.com/help/matlab/ref/eigs.html) as a black box. I think it uses a lanczos type method under the hood. 
>>> >>> Nidish >>> >>> On Aug 15, 2020, 21:42, at 21:42, Barry Smith wrote: >>> >>>> >>>> Exactly what algorithm are you using in Matlab to get the 10 smallest >>>> eigenvalues and their corresponding eigenvectors? >>>> >>>> Barry >>>> >>>> >>>> On Aug 15, 2020, at 8:53 PM, Nidish wrote: >>>>> >>>>> The section on solving singular systems in the manual starts with >>>>> >>>> assuming that the singular eigenvectors are already known. >>>> >>>>> >>>>> I have a large system where finding the singular eigenvectors is not >>>>> >>>> trivially written down. How would you recommend I proceed with making >>>> initial estimates? In MATLAB (with MUCH smaller matrices), I conduct an >>>> eigensolve for the first 10 smallest eigenvalues and take the >>>> eigenvectors corresponding to the zero eigenvalues from this. This >>>> approach doesn't work here since I'm unable to use SLEPc for solving >>>> >>>>> >>>>> K.v = lam*M.v >>>>> >>>>> for cases where K is positive semi-definite (contains a few "rigid >>>>> >>>> body modes") and M is strictly positive definite. >>>> >>>>> >>>>> I'd appreciate any assistance you may provide with this. >>>>> >>>>> Thank you, >>>>> Nidish >>>>> >>>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sun Aug 16 13:49:25 2020 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 16 Aug 2020 14:49:25 -0400 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> Message-ID: On Sun, Aug 16, 2020 at 12:26 PM Nidish wrote: > Well some of the zero eigenvectors are rigid body modes, but there are > some more which are introduced by lagrange-multiplier based constraint > enforcement, which are non trivial. > If you want the null space for AMG solvers, they do not deal with the constraints anyway, so you need an outer solver like Uzawa that uses a solver for the PDE block of your system. And Uzawa requires that you regularize this to remove the null space. So the Rigid body modes are what you want to give to the AMG solver. (it sounds like your system might be too complex for out of the box AMG anyway). (If you don't know what your null space is I don't see how you would keep the initial state and the RHS orthogonal to them.) > > My final application is for a nonlinear simulation, so I don't mind the > extra computational effort initially. Could you have me the suggested > solver configurations to get this type of eigenvectors in slepc? > > Nidish > On Aug 16, 2020, at 00:17, Jed Brown wrote: >> >> It's possible to use this or a similar algorithm in SLEPc, but keep in mind that it's more expensive to compute these eigenvectors than to solve a linear system. Do you have a sequence of systems with the same null space? >> >> You referred to the null space as "rigid body modes". Why can't those be written down? Note that PETSc has convenience routines for computing rigid body modes from coordinates. >> >> Nidish writes: >> >> I just use the standard eigs function (https://www.mathworks.com/help/matlab/ref/eigs.html) as a black box. I think it uses a lanczos type method under the hood. 
>>> >>> Nidish >>> >>> On Aug 15, 2020, 21:42, at 21:42, Barry Smith wrote: >>> >>>> >>>> Exactly what algorithm are you using in Matlab to get the 10 smallest >>>> eigenvalues and their corresponding eigenvectors? >>>> >>>> Barry >>>> >>>> >>>> On Aug 15, 2020, at 8:53 PM, Nidish wrote: >>>>> >>>>> The section on solving singular systems in the manual starts with >>>>> >>>> assuming that the singular eigenvectors are already known. >>>> >>>>> >>>>> I have a large system where finding the singular eigenvectors is not >>>>> >>>> trivially written down. How would you recommend I proceed with making >>>> initial estimates? In MATLAB (with MUCH smaller matrices), I conduct an >>>> eigensolve for the first 10 smallest eigenvalues and take the >>>> eigenvectors corresponding to the zero eigenvalues from this. This >>>> approach doesn't work here since I'm unable to use SLEPc for solving >>>> >>>>> >>>>> K.v = lam*M.v >>>>> >>>>> for cases where K is positive semi-definite (contains a few "rigid >>>>> >>>> body modes") and M is strictly positive definite. >>>> >>>>> >>>>> I'd appreciate any assistance you may provide with this. >>>>> >>>>> Thank you, >>>>> Nidish >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Aug 16 14:50:46 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 16 Aug 2020 14:50:46 -0500 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> Message-ID: If you know part of your null space explicitly (for example the rigid body modes) I would recommend you always use that information explicitly since it is extremely expensive numerically to obtain. Thus rather than numerically computing the entire null space compute the part orthogonal to the part you already know. Presumably SLEPc has tools to help do this, naively I would just orthogonalized against the know subspace during the computational process but there are probably better ways. Barry > On Aug 16, 2020, at 11:26 AM, Nidish wrote: > > Well some of the zero eigenvectors are rigid body modes, but there are some more which are introduced by lagrange-multiplier based constraint enforcement, which are non trivial. > > My final application is for a nonlinear simulation, so I don't mind the extra computational effort initially. Could you have me the suggested solver configurations to get this type of eigenvectors in slepc? > > Nidish > On Aug 16, 2020, at 00:17, Jed Brown > wrote: > It's possible to use this or a similar algorithm in SLEPc, but keep in mind that it's more expensive to compute these eigenvectors than to solve a linear system. Do you have a sequence of systems with the same null space? > > You referred to the null space as "rigid body modes". Why can't those be written down? Note that PETSc has convenience routines for computing rigid body modes from coordinates. > > Nidish writes: > > I just use the standard eigs function (https://www.mathworks.com/help/matlab/ref/eigs.html ) as a black box. I think it uses a lanczos type method under the hood. > > Nidish > > On Aug 15, 2020, 21:42, at 21:42, Barry Smith wrote: > > Exactly what algorithm are you using in Matlab to get the 10 smallest > eigenvalues and their corresponding eigenvectors? 
> > Barry > > > On Aug 15, 2020, at 8:53 PM, Nidish wrote: > > The section on solving singular systems in the manual starts with > assuming that the singular eigenvectors are already known. > > I have a large system where finding the singular eigenvectors is not > trivially written down. How would you recommend I proceed with making > initial estimates? In MATLAB (with MUCH smaller matrices), I conduct an > eigensolve for the first 10 smallest eigenvalues and take the > eigenvectors corresponding to the zero eigenvalues from this. This > approach doesn't work here since I'm unable to use SLEPc for solving > > K.v = lam*M.v > > for cases where K is positive semi-definite (contains a few "rigid > body modes") and M is strictly positive definite. > > I'd appreciate any assistance you may provide with this. > > Thank you, > Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Sun Aug 16 19:46:18 2020 From: nb25 at rice.edu (Nidish) Date: Sun, 16 Aug 2020 19:46:18 -0500 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> Message-ID: <75b47901-a5d2-e794-4c4f-27ed4cbe26b0@rice.edu> Thank you for the suggestions. I'm getting a zero pivot error for the LU in slepc while calculating the rest of the modes. Would conducting an SVD for just the stiffness matrix and then using the singular vectors as bases for the nullspace work? I haven't tried this out just yet, but I'm wondering if you could provide me insights into whether this will. Thanks, Nidish On 8/16/20 2:50 PM, Barry Smith wrote: > > ? If you know part of your null space explicitly (for example the > rigid body modes) I would recommend you always use that information > explicitly since it is extremely expensive numerically to obtain. Thus > rather than numerically computing the entire null space compute the > part orthogonal to the part you already know. Presumably SLEPc has > tools to help do this, naively I would just orthogonalized against the > know subspace during the computational process but there are probably > better ways. > > ? ?Barry > > > > >> On Aug 16, 2020, at 11:26 AM, Nidish > > wrote: >> >> Well some of the zero eigenvectors are rigid body modes, but there >> are some more which are introduced by lagrange-multiplier based >> constraint enforcement, which are non trivial. >> >> My final application is for a nonlinear simulation, so I don't mind >> the extra computational effort initially. Could you have me the >> suggested solver configurations to get this type of eigenvectors in >> slepc? >> >> Nidish >> On Aug 16, 2020, at 00:17, Jed Brown > > wrote: >> >> It's possible to use this or a similar algorithm in SLEPc, but keep in mind that it's more expensive to compute these eigenvectors than to solve a linear system. Do you have a sequence of systems with the same null space? >> >> You referred to the null space as "rigid body modes". Why can't those be written down? Note that PETSc has convenience routines for computing rigid body modes from coordinates. >> >> Nidish > writes: >> >> I just use the standard eigs function >> (https://www.mathworks.com/help/matlab/ref/eigs.html) as a >> black box. I think it uses a lanczos type method under the >> hood. 
Nidish On Aug 15, 2020, 21:42, at 21:42, Barry Smith >> > wrote: >> >> Exactly what algorithm are you using in Matlab to get the >> 10 smallest eigenvalues and their corresponding >> eigenvectors? Barry >> >> On Aug 15, 2020, at 8:53 PM, Nidish > > wrote: The section on solving >> singular systems in the manual starts with >> >> assuming that the singular eigenvectors are already known. >> >> I have a large system where finding the singular >> eigenvectors is not >> >> trivially written down. How would you recommend I proceed >> with making initial estimates? In MATLAB (with MUCH >> smaller matrices), I conduct an eigensolve for the first >> 10 smallest eigenvalues and take the eigenvectors >> corresponding to the zero eigenvalues from this. This >> approach doesn't work here since I'm unable to use SLEPc >> for solving >> >> K.v = lam*M.v for cases where K is positive >> semi-definite (contains a few "rigid >> >> body modes") and M is strictly positive definite. >> >> I'd appreciate any assistance you may provide with >> this. Thank you, Nidish >> > -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Sun Aug 16 19:50:04 2020 From: nb25 at rice.edu (Nidish) Date: Sun, 16 Aug 2020 19:50:04 -0500 Subject: [petsc-users] Singular Eigenproblem with SLEPc In-Reply-To: <5CB4E89E-9FB8-43CB-8447-6C4E25813075@dsic.upv.es> References: <5CB4E89E-9FB8-43CB-8447-6C4E25813075@dsic.upv.es> Message-ID: <6cb13549-8cb4-e75d-e021-779744aa7002@rice.edu> Thank you for the example, it is indeed useful! In my application, an additional complication is that the stiffness matrix "K" is also singular. So when I run my code with the suggested runtime flags, I persistently get a zero pivot error for the LU calculation (for the st_pc probably). I'm not sure what factorization or solver I should use in this case. TLDR: I have K.v=lam*M.v where BOTH K and M are singular. The nullspace of M is a subset of the nullspace of K. Thank you, Nidish On 8/16/20 1:50 AM, Jose E. Roman wrote: > Nothing special is required for solving a GHEP with singular M, except for setting the problem type as GHEP, see https://slepc.upv.es/documentation/current/src/eps/tutorials/ex13.c.html > Jose > > >> El 16 ago 2020, a las 1:09, Nidish escribi?: >> >> Hello, >> >> I'm presently working with a large finite element model with several RBE3 constraints with "virtual" 6DOF nodes in the model. >> >> I have about ~36000 3DOF nodes making up my model and about ~10 RBE3 virtual nodes (which have zero intrinsic mass and stiffness). I've extracted the matrices from Abaqus. >> >> The way these constraints are implemented is by introducing static linear constraints (populating the stiffness matrix) and padding the mass matrix with zero rows and columns in the rows corresponding to the virtual nodes. So this leaves me with an eigenproblem of the form, >> >> K.v = lam*M.v >> >> where M is singular but the eigenproblem is well defined. Abaqus seems to solve this perfectly well, but after exporting the matrices, I'm struggling to get slepc to solve this. The manual talks about deflation, etc., but I couldn't really understand too much. >> >> Is there any example code for such a case with a singular matrix where these procedures are carried out? Or could you provide references/guidances for approaching the problem? 
>> >> Thank you, >> Nidish -- Nidish From bsmith at petsc.dev Sun Aug 16 20:05:53 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 16 Aug 2020 20:05:53 -0500 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: <75b47901-a5d2-e794-4c4f-27ed4cbe26b0@rice.edu> References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> <75b47901-a5d2-e794-4c4f-27ed4cbe26b0@rice.edu> Message-ID: SVD is enormously expensive, needs to be done on a full dense matrix so completely impractical. You need the best tuned iterative method, Jose is the by far the most knowledgeable about that. Barry > On Aug 16, 2020, at 7:46 PM, Nidish wrote: > > Thank you for the suggestions. > > I'm getting a zero pivot error for the LU in slepc while calculating the rest of the modes. > > Would conducting an SVD for just the stiffness matrix and then using the singular vectors as bases for the nullspace work? I haven't tried this out just yet, but I'm wondering if you could provide me insights into whether this will. > > Thanks, > Nidish > > On 8/16/20 2:50 PM, Barry Smith wrote: >> >> If you know part of your null space explicitly (for example the rigid body modes) I would recommend you always use that information explicitly since it is extremely expensive numerically to obtain. Thus rather than numerically computing the entire null space compute the part orthogonal to the part you already know. Presumably SLEPc has tools to help do this, naively I would just orthogonalized against the know subspace during the computational process but there are probably better ways. >> >> Barry >> >> >> >> >>> On Aug 16, 2020, at 11:26 AM, Nidish > wrote: >>> >>> Well some of the zero eigenvectors are rigid body modes, but there are some more which are introduced by lagrange-multiplier based constraint enforcement, which are non trivial. >>> >>> My final application is for a nonlinear simulation, so I don't mind the extra computational effort initially. Could you have me the suggested solver configurations to get this type of eigenvectors in slepc? >>> >>> Nidish >>> On Aug 16, 2020, at 00:17, Jed Brown > wrote: >>> It's possible to use this or a similar algorithm in SLEPc, but keep in mind that it's more expensive to compute these eigenvectors than to solve a linear system. Do you have a sequence of systems with the same null space? >>> >>> You referred to the null space as "rigid body modes". Why can't those be written down? Note that PETSc has convenience routines for computing rigid body modes from coordinates. >>> >>> Nidish > writes: >>> >>> I just use the standard eigs function (https://www.mathworks.com/help/matlab/ref/eigs.html ) as a black box. I think it uses a lanczos type method under the hood. >>> >>> Nidish >>> >>> On Aug 15, 2020, 21:42, at 21:42, Barry Smith > wrote: >>> >>> Exactly what algorithm are you using in Matlab to get the 10 smallest >>> eigenvalues and their corresponding eigenvectors? >>> >>> Barry >>> >>> >>> On Aug 15, 2020, at 8:53 PM, Nidish > wrote: >>> >>> The section on solving singular systems in the manual starts with >>> assuming that the singular eigenvectors are already known. >>> >>> I have a large system where finding the singular eigenvectors is not >>> trivially written down. How would you recommend I proceed with making >>> initial estimates? 
In MATLAB (with MUCH smaller matrices), I conduct an >>> eigensolve for the first 10 smallest eigenvalues and take the >>> eigenvectors corresponding to the zero eigenvalues from this. This >>> approach doesn't work here since I'm unable to use SLEPc for solving >>> >>> K.v = lam*M.v >>> >>> for cases where K is positive semi-definite (contains a few "rigid >>> body modes") and M is strictly positive definite. >>> >>> I'd appreciate any assistance you may provide with this. >>> >>> Thank you, >>> Nidish >> > -- > Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Sun Aug 16 20:10:16 2020 From: nb25 at rice.edu (Nidish) Date: Sun, 16 Aug 2020 20:10:16 -0500 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> <75b47901-a5d2-e794-4c4f-27ed4cbe26b0@rice.edu> Message-ID: Oh damn. Alright, I'll keep trying out the different options. Thank you, Nidish On 8/16/20 8:05 PM, Barry Smith wrote: > > ? SVD is enormously expensive, needs to be done on a full dense matrix > so completely impractical. You need the best tuned iterative method, > Jose is the by far the most knowledgeable about that. > > ? ?Barry > > >> On Aug 16, 2020, at 7:46 PM, Nidish > > wrote: >> >> Thank you for the suggestions. >> >> I'm getting a zero pivot error for the LU in slepc while calculating >> the rest of the modes. >> >> Would conducting an SVD for just the stiffness matrix and then using >> the singular vectors as bases for the nullspace work? I haven't tried >> this out just yet, but I'm wondering if you could provide me insights >> into whether this will. >> >> Thanks, >> Nidish >> >> On 8/16/20 2:50 PM, Barry Smith wrote: >>> >>> ? If you know part of your null space explicitly (for example the >>> rigid body modes) I would recommend you always use that information >>> explicitly since it is extremely expensive numerically to obtain. >>> Thus rather than numerically computing the entire null space compute >>> the part orthogonal to the part you already know. Presumably SLEPc >>> has tools to help do this, naively I would just orthogonalized >>> against the know subspace during the computational process but there >>> are probably better ways. >>> >>> ? ?Barry >>> >>> >>> >>> >>>> On Aug 16, 2020, at 11:26 AM, Nidish >>> > wrote: >>>> >>>> Well some of the zero eigenvectors are rigid body modes, but there >>>> are some more which are introduced by lagrange-multiplier based >>>> constraint enforcement, which are non trivial. >>>> >>>> My final application is for a nonlinear simulation, so I don't mind >>>> the extra computational effort initially. Could you have me the >>>> suggested solver configurations to get this type of eigenvectors in >>>> slepc? >>>> >>>> Nidish >>>> On Aug 16, 2020, at 00:17, Jed Brown >>> > wrote: >>>> >>>> It's possible to use this or a similar algorithm in SLEPc, but keep in mind that it's more expensive to compute these eigenvectors than to solve a linear system. Do you have a sequence of systems with the same null space? >>>> >>>> You referred to the null space as "rigid body modes". Why can't those be written down? Note that PETSc has convenience routines for computing rigid body modes from coordinates. >>>> >>>> Nidish > writes: >>>> >>>> I just use the standard eigs function >>>> (https://www.mathworks.com/help/matlab/ref/eigs.html) as a >>>> black box. 
I think it uses a lanczos type method under the >>>> hood. Nidish On Aug 15, 2020, 21:42, at 21:42, Barry Smith >>>> > wrote: >>>> >>>> Exactly what algorithm are you using in Matlab to get >>>> the 10 smallest eigenvalues and their corresponding >>>> eigenvectors? Barry >>>> >>>> On Aug 15, 2020, at 8:53 PM, Nidish >>> > wrote: The section on >>>> solving singular systems in the manual starts with >>>> >>>> assuming that the singular eigenvectors are already known. >>>> >>>> I have a large system where finding the singular >>>> eigenvectors is not >>>> >>>> trivially written down. How would you recommend I >>>> proceed with making initial estimates? In MATLAB (with >>>> MUCH smaller matrices), I conduct an eigensolve for the >>>> first 10 smallest eigenvalues and take the eigenvectors >>>> corresponding to the zero eigenvalues from this. This >>>> approach doesn't work here since I'm unable to use >>>> SLEPc for solving >>>> >>>> K.v = lam*M.v for cases where K is positive >>>> semi-definite (contains a few "rigid >>>> >>>> body modes") and M is strictly positive definite. >>>> >>>> I'd appreciate any assistance you may provide with >>>> this. Thank you, Nidish >>>> >>> >> -- >> Nidish > -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Sun Aug 16 23:21:54 2020 From: nb25 at rice.edu (Nidish) Date: Sun, 16 Aug 2020 23:21:54 -0500 Subject: [petsc-users] Possible bug in PETSc4py Mat().CreateAIJ with rectangular matrices? Message-ID: Hello, I'm presently using the following to convert scipy.sparse csr matrices to petsc. pM = PETSc.Mat().createAIJ(size=M.shape, csr =(M.indptr, M.indices, M.data)) This however doesn't seem to work when M is rectangular. For this case I have to do pM = PETSc.Mat().createAIJ(size=M.T.shape, csr =(M.indptr, M.indices, M.data)) i.e., I need to use the shape of the transpose of the matrix. Does this have something to do with the way the csr format is interpreted in the petsc4py wrappings or in PETSc itself? -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Aug 17 02:40:35 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 17 Aug 2020 09:40:35 +0200 Subject: [petsc-users] Singular Eigenproblem with SLEPc In-Reply-To: <6cb13549-8cb4-e75d-e021-779744aa7002@rice.edu> References: <5CB4E89E-9FB8-43CB-8447-6C4E25813075@dsic.upv.es> <6cb13549-8cb4-e75d-e021-779744aa7002@rice.edu> Message-ID: <4A9F7F10-4261-4399-B691-BB5082497568@dsic.upv.es> In that case, you should use a factorization that can handle singular matrices. PETSc's native LU does not, but you can install an external direct solvers that is robust for this case, such as MUMPS (see section 3.4.1 of SLEPc users manual). With this, you should get the eigenvalues correctly, but the eigenvectors will likely be corrupted by components in the nullspace. If you want to avoid this, compute a basis of the common nullspace and pass it with EPSSetDeflationSpace() before EPSSolve(). Jose > El 17 ago 2020, a las 2:50, Nidish escribi?: > > Thank you for the example, it is indeed useful! > > In my application, an additional complication is that the stiffness matrix "K" is also singular. So when I run my code with the suggested runtime flags, I persistently get a zero pivot error for the LU calculation (for the st_pc probably). I'm not sure what factorization or solver I should use in this case. > > TLDR: I have K.v=lam*M.v where BOTH K and M are singular. 
The nullspace of M is a subset of the nullspace of K. > > Thank you, > Nidish > > On 8/16/20 1:50 AM, Jose E. Roman wrote: >> Nothing special is required for solving a GHEP with singular M, except for setting the problem type as GHEP, see https://slepc.upv.es/documentation/current/src/eps/tutorials/ex13.c.html >> Jose >> >> >>> El 16 ago 2020, a las 1:09, Nidish escribi?: >>> >>> Hello, >>> >>> I'm presently working with a large finite element model with several RBE3 constraints with "virtual" 6DOF nodes in the model. >>> >>> I have about ~36000 3DOF nodes making up my model and about ~10 RBE3 virtual nodes (which have zero intrinsic mass and stiffness). I've extracted the matrices from Abaqus. >>> >>> The way these constraints are implemented is by introducing static linear constraints (populating the stiffness matrix) and padding the mass matrix with zero rows and columns in the rows corresponding to the virtual nodes. So this leaves me with an eigenproblem of the form, >>> >>> K.v = lam*M.v >>> >>> where M is singular but the eigenproblem is well defined. Abaqus seems to solve this perfectly well, but after exporting the matrices, I'm struggling to get slepc to solve this. The manual talks about deflation, etc., but I couldn't really understand too much. >>> >>> Is there any example code for such a case with a singular matrix where these procedures are carried out? Or could you provide references/guidances for approaching the problem? >>> >>> Thank you, >>> Nidish > -- > Nidish From jroman at dsic.upv.es Mon Aug 17 02:51:27 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 17 Aug 2020 09:51:27 +0200 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> <75b47901-a5d2-e794-4c4f-27ed4cbe26b0@rice.edu> Message-ID: You can use SLEPc's SVD to compute the nullspace, but it has pitfalls: make sure you use an absolute convergence test (not relative); for the particular case of zero singular vectors, accuracy may not be very good and convergence may be slow (with the corresponding high computational cost). MUMPS has functionality to get a basis of the nullspace, once you have computed the factorization. But I don't know if this is easily accessible via PETSc. Jose > El 17 ago 2020, a las 3:10, Nidish escribi?: > > Oh damn. Alright, I'll keep trying out the different options. > > Thank you, > Nidish > > On 8/16/20 8:05 PM, Barry Smith wrote: >> >> SVD is enormously expensive, needs to be done on a full dense matrix so completely impractical. You need the best tuned iterative method, Jose is the by far the most knowledgeable about that. >> >> Barry >> >> >>> On Aug 16, 2020, at 7:46 PM, Nidish wrote: >>> >>> Thank you for the suggestions. >>> >>> I'm getting a zero pivot error for the LU in slepc while calculating the rest of the modes. >>> >>> Would conducting an SVD for just the stiffness matrix and then using the singular vectors as bases for the nullspace work? I haven't tried this out just yet, but I'm wondering if you could provide me insights into whether this will. >>> >>> Thanks, >>> Nidish >>> >>> On 8/16/20 2:50 PM, Barry Smith wrote: >>>> >>>> If you know part of your null space explicitly (for example the rigid body modes) I would recommend you always use that information explicitly since it is extremely expensive numerically to obtain. 
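Collecting the suggestions from this thread, a sketch of an EPS setup for K.v = lam*M.v with both matrices singular might look like the following. The MUMPS factorization for the shift-and-invert step and the deflation vectors Z[] (a basis of the common nullspace obtained separately) are assumptions based on the advice above, not a verbatim recipe.

#include <slepceps.h>

/* sketch: K x = lambda M x with K and M both singular; Z[] holds nz vectors
   spanning the common nullspace, used as a deflation space */
PetscErrorCode SolveSingularGHEP(Mat K, Mat M, PetscInt nz, Vec Z[])
{
  EPS            eps;
  ST             st;
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = EPSCreate(PetscObjectComm((PetscObject)K), &eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, K, M);CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps, EPS_GHEP);CHKERRQ(ierr);
  /* look for the eigenvalues closest to zero via shift-and-invert */
  ierr = EPSSetWhichEigenpairs(eps, EPS_TARGET_MAGNITUDE);CHKERRQ(ierr);
  ierr = EPSSetTarget(eps, 0.0);CHKERRQ(ierr);
  ierr = EPSGetST(eps, &st);CHKERRQ(ierr);
  ierr = STSetType(st, STSINVERT);CHKERRQ(ierr);
  /* use a factorization that tolerates singular matrices, e.g. MUMPS */
  ierr = STGetKSP(st, &ksp);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);CHKERRQ(ierr);
  /* keep the computed eigenvectors free of components in the common nullspace */
  if (nz > 0) { ierr = EPSSetDeflationSpace(eps, nz, Z);CHKERRQ(ierr); }
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
  ierr = EPSSolve(eps);CHKERRQ(ierr);
  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}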
Thus rather than numerically computing the entire null space compute the part orthogonal to the part you already know. Presumably SLEPc has tools to help do this, naively I would just orthogonalized against the know subspace during the computational process but there are probably better ways. >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>>> On Aug 16, 2020, at 11:26 AM, Nidish wrote: >>>>> >>>>> Well some of the zero eigenvectors are rigid body modes, but there are some more which are introduced by lagrange-multiplier based constraint enforcement, which are non trivial. >>>>> >>>>> My final application is for a nonlinear simulation, so I don't mind the extra computational effort initially. Could you have me the suggested solver configurations to get this type of eigenvectors in slepc? >>>>> >>>>> Nidish >>>>> On Aug 16, 2020, at 00:17, Jed Brown wrote: >>>>> It's possible to use this or a similar algorithm in SLEPc, but keep in mind that it's more expensive to compute these eigenvectors than to solve a linear system. Do you have a sequence of systems with the same null space? >>>>> >>>>> You referred to the null space as "rigid body modes". Why can't those be written down? Note that PETSc has convenience routines for computing rigid body modes from coordinates. >>>>> >>>>> Nidish < >>>>> nb25 at rice.edu >>>>> > writes: >>>>> >>>>> >>>>> I just use the standard eigs function (https://www.mathworks.com/help/matlab/ref/eigs.html >>>>> ) as a black box. I think it uses a lanczos type method under the hood. >>>>> >>>>> Nidish >>>>> >>>>> On Aug 15, 2020, 21:42, at 21:42, Barry Smith < >>>>> bsmith at petsc.dev >>>>> > wrote: >>>>> >>>>> >>>>> Exactly what algorithm are you using in Matlab to get the 10 smallest >>>>> eigenvalues and their corresponding eigenvectors? >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> On Aug 15, 2020, at 8:53 PM, Nidish >>>> > wrote: >>>>> >>>>> The section on solving singular systems in the manual starts with >>>>> >>>>> assuming that the singular eigenvectors are already known. >>>>> >>>>> >>>>> I have a large system where finding the singular eigenvectors is not >>>>> >>>>> trivially written down. How would you recommend I proceed with making >>>>> initial estimates? In MATLAB (with MUCH smaller matrices), I conduct an >>>>> eigensolve for the first 10 smallest eigenvalues and take the >>>>> eigenvectors corresponding to the zero eigenvalues from this. This >>>>> approach doesn't work here since I'm unable to use SLEPc for solving >>>>> >>>>> >>>>> K.v = lam*M.v >>>>> >>>>> for cases where K is positive semi-definite (contains a few "rigid >>>>> >>>>> body modes") and M is strictly positive definite. >>>>> >>>>> >>>>> I'd appreciate any assistance you may provide with this. >>>>> >>>>> Thank you, >>>>> Nidish >>>>> >>>> >>> -- >>> Nidish >> > -- > Nidish From knepley at gmail.com Mon Aug 17 07:02:04 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 17 Aug 2020 08:02:04 -0400 Subject: [petsc-users] Possible bug in PETSc4py Mat().CreateAIJ with rectangular matrices? In-Reply-To: References: Message-ID: On Mon, Aug 17, 2020 at 12:22 AM Nidish wrote: > Hello, > > I'm presently using the following to convert scipy.sparse csr matrices to > petsc. > > pM = PETSc.Mat().createAIJ(size=M.shape, csr =(M.indptr, M.indices, > M.data)) > > This however doesn't seem to work when M is rectangular. For this case I > have to do > > pM = PETSc.Mat().createAIJ(size=M.T.shape, csr =(M.indptr, M.indices, > M.data)) > > i.e., I need to use the shape of the transpose of the matrix. 
> > Does this have something to do with the way the csr format is interpreted > in the petsc4py wrappings or in PETSc itself? > No. It sounds like something in scipy. These are just the number of rows and columns. Thanks, Matt > -- > Nidish > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 17 07:27:14 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 17 Aug 2020 07:27:14 -0500 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> <75b47901-a5d2-e794-4c4f-27ed4cbe26b0@rice.edu> Message-ID: Nidish, Your matrix is dense, correct? MUMPS is for sparse matrices. Then I guess you could use Scalapack http://netlib.org/scalapack/slug/node48.html#SECTION04323200000000000000 to do the SVD. The work is order N^3 and parallel efficiency may not be great but it might help you solve your problem. I don't know if SLEPc has an interface to Scalapack for SVD or not. Barry > On Aug 17, 2020, at 2:51 AM, Jose E. Roman wrote: > > You can use SLEPc's SVD to compute the nullspace, but it has pitfalls: make sure you use an absolute convergence test (not relative); for the particular case of zero singular vectors, accuracy may not be very good and convergence may be slow (with the corresponding high computational cost). > > MUMPS has functionality to get a basis of the nullspace, once you have computed the factorization. But I don't know if this is easily accessible via PETSc. > > Jose > > > >> El 17 ago 2020, a las 3:10, Nidish escribi?: >> >> Oh damn. Alright, I'll keep trying out the different options. >> >> Thank you, >> Nidish >> >> On 8/16/20 8:05 PM, Barry Smith wrote: >>> >>> SVD is enormously expensive, needs to be done on a full dense matrix so completely impractical. You need the best tuned iterative method, Jose is the by far the most knowledgeable about that. >>> >>> Barry >>> >>> >>>> On Aug 16, 2020, at 7:46 PM, Nidish wrote: >>>> >>>> Thank you for the suggestions. >>>> >>>> I'm getting a zero pivot error for the LU in slepc while calculating the rest of the modes. >>>> >>>> Would conducting an SVD for just the stiffness matrix and then using the singular vectors as bases for the nullspace work? I haven't tried this out just yet, but I'm wondering if you could provide me insights into whether this will. >>>> >>>> Thanks, >>>> Nidish >>>> >>>> On 8/16/20 2:50 PM, Barry Smith wrote: >>>>> >>>>> If you know part of your null space explicitly (for example the rigid body modes) I would recommend you always use that information explicitly since it is extremely expensive numerically to obtain. Thus rather than numerically computing the entire null space compute the part orthogonal to the part you already know. Presumably SLEPc has tools to help do this, naively I would just orthogonalized against the know subspace during the computational process but there are probably better ways. >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> >>>>>> On Aug 16, 2020, at 11:26 AM, Nidish wrote: >>>>>> >>>>>> Well some of the zero eigenvectors are rigid body modes, but there are some more which are introduced by lagrange-multiplier based constraint enforcement, which are non trivial. 
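If the sparse SVD route is tried anyway, a small sketch of the setup with the absolute convergence test recommended above could look like this; the number of requested triplets is a placeholder, and convergence for zero singular values may still be slow as noted.

#include <slepcsvd.h>

/* sketch: look for the smallest singular triplets of K; singular values close
   to zero indicate null vectors, returned in the right singular vectors */
PetscErrorCode SmallestSingularTriplets(Mat K, PetscInt nwanted)
{
  SVD            svd;
  PetscInt       i, nconv;
  PetscReal      sigma;
  Vec            u, v;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatCreateVecs(K, &v, &u);CHKERRQ(ierr);    /* v: right, u: left */
  ierr = SVDCreate(PetscObjectComm((PetscObject)K), &svd);CHKERRQ(ierr);
  ierr = SVDSetOperator(svd, K);CHKERRQ(ierr);
  ierr = SVDSetWhichSingularTriplets(svd, SVD_SMALLEST);CHKERRQ(ierr);
  /* absolute convergence test, as recommended when sigma may be zero */
  ierr = SVDSetConvergenceTest(svd, SVD_CONV_ABS);CHKERRQ(ierr);
  ierr = SVDSetDimensions(svd, nwanted, PETSC_DEFAULT, PETSC_DEFAULT);CHKERRQ(ierr);
  ierr = SVDSetFromOptions(svd);CHKERRQ(ierr);
  ierr = SVDSolve(svd);CHKERRQ(ierr);
  ierr = SVDGetConverged(svd, &nconv);CHKERRQ(ierr);
  for (i = 0; i < nconv; i++) {
    ierr = SVDGetSingularTriplet(svd, i, &sigma, u, v);CHKERRQ(ierr);
    /* sigma near zero: v is (approximately) a null vector of K */
  }
  ierr = VecDestroy(&u);CHKERRQ(ierr);
  ierr = VecDestroy(&v);CHKERRQ(ierr);
  ierr = SVDDestroy(&svd);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}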
>>>>>> >>>>>> My final application is for a nonlinear simulation, so I don't mind the extra computational effort initially. Could you have me the suggested solver configurations to get this type of eigenvectors in slepc? >>>>>> >>>>>> Nidish >>>>>> On Aug 16, 2020, at 00:17, Jed Brown wrote: >>>>>> It's possible to use this or a similar algorithm in SLEPc, but keep in mind that it's more expensive to compute these eigenvectors than to solve a linear system. Do you have a sequence of systems with the same null space? >>>>>> >>>>>> You referred to the null space as "rigid body modes". Why can't those be written down? Note that PETSc has convenience routines for computing rigid body modes from coordinates. >>>>>> >>>>>> Nidish < >>>>>> nb25 at rice.edu >>>>>>> writes: >>>>>> >>>>>> >>>>>> I just use the standard eigs function (https://www.mathworks.com/help/matlab/ref/eigs.html >>>>>> ) as a black box. I think it uses a lanczos type method under the hood. >>>>>> >>>>>> Nidish >>>>>> >>>>>> On Aug 15, 2020, 21:42, at 21:42, Barry Smith < >>>>>> bsmith at petsc.dev >>>>>>> wrote: >>>>>> >>>>>> >>>>>> Exactly what algorithm are you using in Matlab to get the 10 smallest >>>>>> eigenvalues and their corresponding eigenvectors? >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> On Aug 15, 2020, at 8:53 PM, Nidish >>>>>> wrote: >>>>>> >>>>>> The section on solving singular systems in the manual starts with >>>>>> >>>>>> assuming that the singular eigenvectors are already known. >>>>>> >>>>>> >>>>>> I have a large system where finding the singular eigenvectors is not >>>>>> >>>>>> trivially written down. How would you recommend I proceed with making >>>>>> initial estimates? In MATLAB (with MUCH smaller matrices), I conduct an >>>>>> eigensolve for the first 10 smallest eigenvalues and take the >>>>>> eigenvectors corresponding to the zero eigenvalues from this. This >>>>>> approach doesn't work here since I'm unable to use SLEPc for solving >>>>>> >>>>>> >>>>>> K.v = lam*M.v >>>>>> >>>>>> for cases where K is positive semi-definite (contains a few "rigid >>>>>> >>>>>> body modes") and M is strictly positive definite. >>>>>> >>>>>> >>>>>> I'd appreciate any assistance you may provide with this. >>>>>> >>>>>> Thank you, >>>>>> Nidish >>>>>> >>>>> >>>> -- >>>> Nidish >>> >> -- >> Nidish > From nicola.varini at gmail.com Mon Aug 17 08:24:08 2020 From: nicola.varini at gmail.com (nicola varini) Date: Mon, 17 Aug 2020 15:24:08 +0200 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: Hi Mark, I do confirm that hypre with boomeramg is working fine and is pretty fast. However, none of the GAMG option works. Did anyone ever succeeded in usign hypre with petsc on gpu? 
I did manage to compile hypre on gpu but I do get the following error: ======= CC gpuhypre/obj/vec/vec/impls/hypre/vhyp.o In file included from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:22, from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, from /users/nvarini/hypre/include/_hypre_utilities.h:1129, from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, from /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, from /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:83:1: error: unknown type name 'namespace' namespace thrust ^~~~~~~~~ /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:84:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token { ^ In file included from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/config.h:28, from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:23, from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, from /users/nvarini/hypre/include/_hypre_utilities.h:1129, from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, from /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, from /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/cpp_compatibility.h:21:10: fatal error: cstddef: No such file or directory #include ^~~~~~~~~ compilation terminated. ======= Nicola Il giorno ven 14 ago 2020 alle ore 20:13 Mark Adams ha scritto: > You can try Hypre. If that fails then there is a problem with your system. > > And you can run with -info and grep on GAMG and send the output and I can > see if I see anything funny. > > If this is just a Lapacian with a stable discretization and not crazy > material parameters then stretched grids are about the only thing that can > hurt the solver. > > Do both of your solves fail in a similar way? > > On the CPU you can try this with large subdomains, preferably (in serial > ideally): > -ampere_mg_levels_ksp_type richardson > -ampere_mg_levels_pc_type sor > > And check that there are no unused options with -options_left. GAMG can > fail with bad eigen estimates, but these parameters look fine. > > On Fri, Aug 14, 2020 at 5:01 AM nicola varini > wrote: > >> Dear Barry, yes it gives the same problems. >> >> Il giorno gio 13 ago 2020 alle ore 23:22 Barry Smith >> ha scritto: >> >>> >>> Does the same thing work (with GAMG) if you run on the same problem >>> on the same machine same number of MPI ranks but make a new PETSC_ARCH that >>> does NOT use the GPUs? >>> >>> Barry >>> >>> Ideally one gets almost identical convergence with CPUs or GPUs (same >>> problem, same machine) but a bug or numerically change "might" affect this. >>> >>> On Aug 13, 2020, at 10:28 AM, nicola varini >>> wrote: >>> >>> Dear Barry, you are right. The Cray argument checking is incorrect. It >>> does work with download-fblaslapack. >>> However it does fail to converge. Is there anything obviously wrong with >>> my petscrc? >>> Anything else am I missing? 
>>> >>> Thanks >>> >>> Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith >>> ha scritto: >>> >>>> >>>> The QR is always done on the CPU, we don't have generic calls to >>>> blas/lapack go to the GPU currently. >>>> >>>> The error message is: >>>> >>>> On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value >>>> (info = -7) >>>> >>>> argument 7 is &LWORK which is defined by >>>> >>>> PetscBLASInt LWORK=N*bs; >>>> >>>> and >>>> >>>> N=nSAvec is the column block size of new P. >>>> >>>> Presumably this is a huge run with many processes so using the >>>> debugger is not practical? >>>> >>>> We need to see what these variables are >>>> >>>> N, bs, nSAvec >>>> >>>> perhaps nSAvec is zero which could easily upset LAPACK. >>>> >>>> Crudest thing would be to just put a print statement in the code >>>> before the LAPACK call of if they are called many times add an error check >>>> like that >>>> generates an error if any of these three values are 0 (or negative). >>>> >>>> Barry >>>> >>>> >>>> It is not impossible that the Cray argument checking is incorrect >>>> and the value passed in is fine. You can check this by using >>>> --download-fblaslapack and see if the same or some other error comes up. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Aug 12, 2020, at 7:19 PM, Mark Adams wrote: >>>> >>>> Can you reproduce this on the CPU? >>>> The QR factorization seems to be failing. That could be from bad data >>>> or a bad GPU QR. >>>> >>>> On Wed, Aug 12, 2020 at 4:19 AM nicola varini >>>> wrote: >>>> >>>>> Dear all, following the suggestions I did resubmit the simulation with >>>>> the petscrc below. >>>>> However I do get the following error: >>>>> ======== >>>>> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data >>>>> 7364 [339]PETSC ERROR: xGEQRF error >>>>> 7365 [339]PETSC ERROR: See >>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>> shooting. 
>>>>> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>>>> 7367 [339]PETSC ERROR: >>>>> /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 >>>>> by nvarini Wed Aug 12 10:06:15 2020 >>>>> 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn >>>>> --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 >>>>> --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 >>>>> --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 >>>>> --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 >>>>> --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell >>>>> --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 >>>>> --with-cuda-c=nvcc --with-cxxlib-autodetect=0 >>>>> --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - >>>>> -with-cxx=CC >>>>> --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include >>>>> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>>> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>>> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value >>>>> (info = -7) >>>>> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>> ======== >>>>> >>>>> I did try other pc_gamg_type but they fails as well. 
>>>>> >>>>> >>>>> #PETSc Option Table entries: >>>>> -ampere_dm_mat_type aijcusparse >>>>> -ampere_dm_vec_type cuda >>>>> -ampere_ksp_atol 1e-15 >>>>> -ampere_ksp_initial_guess_nonzero yes >>>>> -ampere_ksp_reuse_preconditioner yes >>>>> -ampere_ksp_rtol 1e-7 >>>>> -ampere_ksp_type dgmres >>>>> -ampere_mg_levels_esteig_ksp_max_it 10 >>>>> -ampere_mg_levels_esteig_ksp_type cg >>>>> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>> -ampere_mg_levels_ksp_type chebyshev >>>>> -ampere_mg_levels_pc_type jacobi >>>>> -ampere_pc_gamg_agg_nsmooths 1 >>>>> -ampere_pc_gamg_coarse_eq_limit 10 >>>>> -ampere_pc_gamg_reuse_interpolation true >>>>> -ampere_pc_gamg_square_graph 1 >>>>> -ampere_pc_gamg_threshold 0.05 >>>>> -ampere_pc_gamg_threshold_scale .0 >>>>> -ampere_pc_gamg_type agg >>>>> -ampere_pc_type gamg >>>>> -dm_mat_type aijcusparse >>>>> -dm_vec_type cuda >>>>> -log_view >>>>> -poisson_dm_mat_type aijcusparse >>>>> -poisson_dm_vec_type cuda >>>>> -poisson_ksp_atol 1e-15 >>>>> -poisson_ksp_initial_guess_nonzero yes >>>>> -poisson_ksp_reuse_preconditioner yes >>>>> -poisson_ksp_rtol 1e-7 >>>>> -poisson_ksp_type dgmres >>>>> -poisson_log_view >>>>> -poisson_mg_levels_esteig_ksp_max_it 10 >>>>> -poisson_mg_levels_esteig_ksp_type cg >>>>> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>> -poisson_mg_levels_ksp_max_it 1 >>>>> -poisson_mg_levels_ksp_type chebyshev >>>>> -poisson_mg_levels_pc_type jacobi >>>>> -poisson_pc_gamg_agg_nsmooths 1 >>>>> -poisson_pc_gamg_coarse_eq_limit 10 >>>>> -poisson_pc_gamg_reuse_interpolation true >>>>> -poisson_pc_gamg_square_graph 1 >>>>> -poisson_pc_gamg_threshold 0.05 >>>>> -poisson_pc_gamg_threshold_scale .0 >>>>> -poisson_pc_gamg_type agg >>>>> -poisson_pc_type gamg >>>>> -use_mat_nearnullspace true >>>>> #End of PETSc Option Table entries >>>>> >>>>> Regards, >>>>> >>>>> Nicola >>>>> >>>>> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams >>>>> ha scritto: >>>>> >>>>>> >>>>>> >>>>>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini < >>>>>> stefano.zampini at gmail.com> wrote: >>>>>> >>>>>>> Nicola, >>>>>>> >>>>>>> You are actually not using the GPU properly, since you use HYPRE >>>>>>> preconditioning, which is CPU only. One of your solvers is actually slower >>>>>>> on ?GPU?. >>>>>>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and >>>>>>> with Jacobi preconditioning. Mark can help you out with the specific >>>>>>> command line options. >>>>>>> When it works properly, everything related to PC application is >>>>>>> offloaded to the GPU, and you should expect to get the well-known and >>>>>>> branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >>>>>>> >>>>>>> >>>>>> The speedup depends on the machine, but on SUMMIT, using enough CPUs >>>>>> to saturate the memory bus vs all 6 GPUs the speedup is a function of >>>>>> problem subdomain size. I saw 10x at about 100K equations/process. >>>>>> >>>>>> >>>>>>> Doing what you want to do is one of the last optimization steps of >>>>>>> an already optimized code before entering production. Yours is not even >>>>>>> optimized for proper GPU usage yet. >>>>>>> Also, any specific reason why you are using dgmres and fgmres? >>>>>>> >>>>>>> PETSc has not been designed with multi-threading in mind. You can >>>>>>> achieve ?overlap? of the two solves by splitting the communicator. But then >>>>>>> you need communications to let the two solutions talk to each other. 
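The communicator splitting mentioned here might be sketched roughly as follows; which ranks go to which solve, and the names, are illustrative only, and the coupling data still has to be exchanged between the two groups explicitly.

#include <petscsys.h>

/* sketch: split PETSC_COMM_WORLD into two groups, one per solve; each group
   then builds its own DMDA/KSP on 'subcomm' */
int main(int argc, char **argv)
{
  MPI_Comm       subcomm;
  PetscMPIInt    rank, size;
  int            color;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size);CHKERRQ(ierr);
  /* e.g. the last rank drives the GPU solve, the remaining ranks do the other solve */
  color = (rank == size - 1) ? 1 : 0;
  ierr = MPI_Comm_split(PETSC_COMM_WORLD, color, (int)rank, &subcomm);CHKERRQ(ierr);
  /* ... create the DMDA/KSP for the Poisson or Ampere problem on subcomm ... */
  ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}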
>>>>>>> >>>>>>> Thanks >>>>>>> Stefano >>>>>>> >>>>>>> >>>>>>> On Aug 4, 2020, at 12:04 PM, nicola varini >>>>>>> wrote: >>>>>>> >>>>>>> Dear all, thanks for your replies. The reason why I've asked if it >>>>>>> is possible to overlap poisson and ampere is because they roughly >>>>>>> take the same amount of time. Please find in attachment the >>>>>>> profiling logs for only CPU and only GPU. >>>>>>> Of course it is possible to split the MPI communicator and run each >>>>>>> solver on different subcommunicator, however this would involve more >>>>>>> communication. >>>>>>> Did anyone ever tried to run 2 solvers with hyperthreading? >>>>>>> Thanks >>>>>>> >>>>>>> >>>>>>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams >>>>>>> ha scritto: >>>>>>> >>>>>>>> I suspect that the Poisson and Ampere's law solve are not coupled. >>>>>>>> You might be able to duplicate the communicator and use two threads. You >>>>>>>> would want to configure PETSc with threadsafty and threads and I think it >>>>>>>> could/should work, but this mode is never used by anyone. >>>>>>>> >>>>>>>> That said, I would not recommend doing this unless you feel like >>>>>>>> playing in computer science, as opposed to doing application science. The >>>>>>>> best case scenario you get a speedup of 2x. That is a strict upper bound, >>>>>>>> but you will never come close to it. Your hardware has some balance of CPU >>>>>>>> to GPU processing rate. Your application has a balance of volume of work >>>>>>>> for your two solves. They have to be the same to get close to 2x speedup >>>>>>>> and that ratio(s) has to be 1:1. To be concrete, from what little I can >>>>>>>> guess about your applications let's assume that the cost of each of these >>>>>>>> two solves is about the same (eg, Laplacians on your domain and the best >>>>>>>> case scenario). But, GPU machines are configured to have roughly 1-10% of >>>>>>>> capacity in the GPUs, these days, that gives you an upper bound of about >>>>>>>> 10% speedup. That is noise. Upshot, unless you configure your hardware to >>>>>>>> match this problem, and the two solves have the same cost, you will not see >>>>>>>> close to 2x speedup. Your time is better spent elsewhere. >>>>>>>> >>>>>>>> Mark >>>>>>>> >>>>>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: >>>>>>>> >>>>>>>>> You can use MPI and split the communicator so n-1 ranks create a >>>>>>>>> DMDA for one part of your system and the other rank drives the GPU in the >>>>>>>>> other part. They can all be part of the same coupled system on the full >>>>>>>>> communicator, but PETSc doesn't currently support some ranks having their >>>>>>>>> Vec arrays on GPU and others on host, so you'd be paying host-device >>>>>>>>> transfer costs on each iteration (and that might swamp any performance >>>>>>>>> benefit you would have gotten). >>>>>>>>> >>>>>>>>> In any case, be sure to think about the execution time of each >>>>>>>>> part. Load balancing with matching time-to-solution for each part can be >>>>>>>>> really hard. >>>>>>>>> >>>>>>>>> >>>>>>>>> Barry Smith writes: >>>>>>>>> >>>>>>>>> > Nicola, >>>>>>>>> > >>>>>>>>> > This is really viable or practical at this time with PETSc. >>>>>>>>> It is not impossible but requires careful coding with threads, another >>>>>>>>> possibility is to use one half of the virtual GPUs for each solve, this is >>>>>>>>> also not trivial. I would recommend first seeing what kind of performance >>>>>>>>> you can get on the GPU for each type of solve and revist this idea in the >>>>>>>>> future. 
>>>>>>>>> > >>>>>>>>> > Barry >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini < >>>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>>> >> >>>>>>>>> >> Hello, I would like to know if it is possible to overlap CPU >>>>>>>>> and GPU with DMDA. >>>>>>>>> >> I've a machine where each node has 1P100+1Haswell. >>>>>>>>> >> I've to resolve Poisson and Ampere equation for each time step. >>>>>>>>> >> I'm using 2D DMDA for each of them. Would be possible to >>>>>>>>> compute poisson >>>>>>>>> >> and ampere equation at the same time? One on CPU and the other >>>>>>>>> on GPU? >>>>>>>>> >> >>>>>>>>> >> Thanks >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>> >>> >>> >>> From mfadams at lbl.gov Mon Aug 17 08:40:36 2020 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 17 Aug 2020 09:40:36 -0400 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: On Mon, Aug 17, 2020 at 9:24 AM nicola varini wrote: > Hi Mark, I do confirm that hypre with boomeramg is working fine and is > pretty fast. > Good, you can send me the -info (grep GAMG) output and I'll try to see what is going on. > However, none of the GAMG option works. > Did anyone ever succeeded in usign hypre with petsc on gpu? > We have gotten Hypre to run on GPUs but it has been fragile. The performance has been marginal (due to use of USM apparently), but it is being worked on by the hypre team. The CUDA tools are changing fast and I am guessing this is a different version than what we have tested, perhaps. Maybe someone else can help with this, but I know we use CUDA 10.2 and you are using CUDA tools 10.1. And you do want to use the most up-to-date PETSc.
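For example, assuming a typical launch (the launcher, rank count, and executable name below are placeholders, not taken from this thread), the GAMG lines can be captured with something like:

mpirun -n 12 ./your_app <usual solver options> -info 2>&1 | grep GAMG > gamg.log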
> I did manage to compile hypre on gpu but I do get the following error: > ======= > CC gpuhypre/obj/vec/vec/impls/hypre/vhyp.o > In file included from > /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:22, > from > /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, > from /users/nvarini/hypre/include/_hypre_utilities.h:1129, > from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, > from > /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, > from > /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: > /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:83:1: > error: unknown type name 'namespace' > namespace thrust > ^~~~~~~~~ > /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:84:1: > error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token > { > ^ > In file included from > /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/config.h:28, > from > /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:23, > from > /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, > from /users/nvarini/hypre/include/_hypre_utilities.h:1129, > from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, > from > /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, > from > /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: > /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/cpp_compatibility.h:21:10: > fatal error: cstddef: No such file or directory > #include > ^~~~~~~~~ > compilation terminated. > > ======= > Nicola > > Il giorno ven 14 ago 2020 alle ore 20:13 Mark Adams ha > scritto: > >> You can try Hypre. If that fails then there is a problem with your system. >> >> And you can run with -info and grep on GAMG and send the output and I can >> see if I see anything funny. >> >> If this is just a Lapacian with a stable discretization and not crazy >> material parameters then stretched grids are about the only thing that can >> hurt the solver. >> >> Do both of your solves fail in a similar way? >> >> On the CPU you can try this with large subdomains, preferably (in serial >> ideally): >> -ampere_mg_levels_ksp_type richardson >> -ampere_mg_levels_pc_type sor >> >> And check that there are no unused options with -options_left. GAMG can >> fail with bad eigen estimates, but these parameters look fine. >> >> On Fri, Aug 14, 2020 at 5:01 AM nicola varini >> wrote: >> >>> Dear Barry, yes it gives the same problems. >>> >>> Il giorno gio 13 ago 2020 alle ore 23:22 Barry Smith >>> ha scritto: >>> >>>> >>>> Does the same thing work (with GAMG) if you run on the same problem >>>> on the same machine same number of MPI ranks but make a new PETSC_ARCH that >>>> does NOT use the GPUs? >>>> >>>> Barry >>>> >>>> Ideally one gets almost identical convergence with CPUs or GPUs >>>> (same problem, same machine) but a bug or numerically change "might" affect >>>> this. >>>> >>>> On Aug 13, 2020, at 10:28 AM, nicola varini >>>> wrote: >>>> >>>> Dear Barry, you are right. The Cray argument checking is incorrect. It >>>> does work with download-fblaslapack. >>>> However it does fail to converge. Is there anything obviously wrong >>>> with my petscrc? >>>> Anything else am I missing? 
>>>> >>>> Thanks >>>> >>>> Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith >>>> ha scritto: >>>> >>>>> >>>>> The QR is always done on the CPU, we don't have generic calls to >>>>> blas/lapack go to the GPU currently. >>>>> >>>>> The error message is: >>>>> >>>>> On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value >>>>> (info = -7) >>>>> >>>>> argument 7 is &LWORK which is defined by >>>>> >>>>> PetscBLASInt LWORK=N*bs; >>>>> >>>>> and >>>>> >>>>> N=nSAvec is the column block size of new P. >>>>> >>>>> Presumably this is a huge run with many processes so using the >>>>> debugger is not practical? >>>>> >>>>> We need to see what these variables are >>>>> >>>>> N, bs, nSAvec >>>>> >>>>> perhaps nSAvec is zero which could easily upset LAPACK. >>>>> >>>>> Crudest thing would be to just put a print statement in the code >>>>> before the LAPACK call of if they are called many times add an error check >>>>> like that >>>>> generates an error if any of these three values are 0 (or >>>>> negative). >>>>> >>>>> Barry >>>>> >>>>> >>>>> It is not impossible that the Cray argument checking is incorrect >>>>> and the value passed in is fine. You can check this by using >>>>> --download-fblaslapack and see if the same or some other error comes up. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Aug 12, 2020, at 7:19 PM, Mark Adams wrote: >>>>> >>>>> Can you reproduce this on the CPU? >>>>> The QR factorization seems to be failing. That could be from bad data >>>>> or a bad GPU QR. >>>>> >>>>> On Wed, Aug 12, 2020 at 4:19 AM nicola varini >>>>> wrote: >>>>> >>>>>> Dear all, following the suggestions I did resubmit the simulation >>>>>> with the petscrc below. >>>>>> However I do get the following error: >>>>>> ======== >>>>>> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data >>>>>> 7364 [339]PETSC ERROR: xGEQRF error >>>>>> 7365 [339]PETSC ERROR: See >>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>> shooting. 
>>>>>> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>>>>> 7367 [339]PETSC ERROR: >>>>>> /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 >>>>>> by nvarini Wed Aug 12 10:06:15 2020 >>>>>> 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn >>>>>> --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 >>>>>> --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 >>>>>> --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 >>>>>> --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 >>>>>> --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell >>>>>> --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 >>>>>> --with-cuda-c=nvcc --with-cxxlib-autodetect=0 >>>>>> --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - >>>>>> -with-cxx=CC >>>>>> --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include >>>>>> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>>>> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>>>> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal >>>>>> value (info = -7) >>>>>> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>> ======== >>>>>> >>>>>> I did try other pc_gamg_type but they fails as well. 
>>>>>> >>>>>> >>>>>> #PETSc Option Table entries: >>>>>> -ampere_dm_mat_type aijcusparse >>>>>> -ampere_dm_vec_type cuda >>>>>> -ampere_ksp_atol 1e-15 >>>>>> -ampere_ksp_initial_guess_nonzero yes >>>>>> -ampere_ksp_reuse_preconditioner yes >>>>>> -ampere_ksp_rtol 1e-7 >>>>>> -ampere_ksp_type dgmres >>>>>> -ampere_mg_levels_esteig_ksp_max_it 10 >>>>>> -ampere_mg_levels_esteig_ksp_type cg >>>>>> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>> -ampere_mg_levels_ksp_type chebyshev >>>>>> -ampere_mg_levels_pc_type jacobi >>>>>> -ampere_pc_gamg_agg_nsmooths 1 >>>>>> -ampere_pc_gamg_coarse_eq_limit 10 >>>>>> -ampere_pc_gamg_reuse_interpolation true >>>>>> -ampere_pc_gamg_square_graph 1 >>>>>> -ampere_pc_gamg_threshold 0.05 >>>>>> -ampere_pc_gamg_threshold_scale .0 >>>>>> -ampere_pc_gamg_type agg >>>>>> -ampere_pc_type gamg >>>>>> -dm_mat_type aijcusparse >>>>>> -dm_vec_type cuda >>>>>> -log_view >>>>>> -poisson_dm_mat_type aijcusparse >>>>>> -poisson_dm_vec_type cuda >>>>>> -poisson_ksp_atol 1e-15 >>>>>> -poisson_ksp_initial_guess_nonzero yes >>>>>> -poisson_ksp_reuse_preconditioner yes >>>>>> -poisson_ksp_rtol 1e-7 >>>>>> -poisson_ksp_type dgmres >>>>>> -poisson_log_view >>>>>> -poisson_mg_levels_esteig_ksp_max_it 10 >>>>>> -poisson_mg_levels_esteig_ksp_type cg >>>>>> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>> -poisson_mg_levels_ksp_max_it 1 >>>>>> -poisson_mg_levels_ksp_type chebyshev >>>>>> -poisson_mg_levels_pc_type jacobi >>>>>> -poisson_pc_gamg_agg_nsmooths 1 >>>>>> -poisson_pc_gamg_coarse_eq_limit 10 >>>>>> -poisson_pc_gamg_reuse_interpolation true >>>>>> -poisson_pc_gamg_square_graph 1 >>>>>> -poisson_pc_gamg_threshold 0.05 >>>>>> -poisson_pc_gamg_threshold_scale .0 >>>>>> -poisson_pc_gamg_type agg >>>>>> -poisson_pc_type gamg >>>>>> -use_mat_nearnullspace true >>>>>> #End of PETSc Option Table entries >>>>>> >>>>>> Regards, >>>>>> >>>>>> Nicola >>>>>> >>>>>> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams >>>>>> ha scritto: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini < >>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>> >>>>>>>> Nicola, >>>>>>>> >>>>>>>> You are actually not using the GPU properly, since you use HYPRE >>>>>>>> preconditioning, which is CPU only. One of your solvers is actually slower >>>>>>>> on ?GPU?. >>>>>>>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and >>>>>>>> with Jacobi preconditioning. Mark can help you out with the specific >>>>>>>> command line options. >>>>>>>> When it works properly, everything related to PC application is >>>>>>>> offloaded to the GPU, and you should expect to get the well-known and >>>>>>>> branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >>>>>>>> >>>>>>>> >>>>>>> The speedup depends on the machine, but on SUMMIT, using enough CPUs >>>>>>> to saturate the memory bus vs all 6 GPUs the speedup is a function of >>>>>>> problem subdomain size. I saw 10x at about 100K equations/process. >>>>>>> >>>>>>> >>>>>>>> Doing what you want to do is one of the last optimization steps of >>>>>>>> an already optimized code before entering production. Yours is not even >>>>>>>> optimized for proper GPU usage yet. >>>>>>>> Also, any specific reason why you are using dgmres and fgmres? >>>>>>>> >>>>>>>> PETSc has not been designed with multi-threading in mind. You can >>>>>>>> achieve ?overlap? of the two solves by splitting the communicator. But then >>>>>>>> you need communications to let the two solutions talk to each other. 
>>>>>>>> >>>>>>>> Thanks >>>>>>>> Stefano >>>>>>>> >>>>>>>> >>>>>>>> On Aug 4, 2020, at 12:04 PM, nicola varini >>>>>>>> wrote: >>>>>>>> >>>>>>>> Dear all, thanks for your replies. The reason why I've asked if it >>>>>>>> is possible to overlap poisson and ampere is because they roughly >>>>>>>> take the same amount of time. Please find in attachment the >>>>>>>> profiling logs for only CPU and only GPU. >>>>>>>> Of course it is possible to split the MPI communicator and run each >>>>>>>> solver on different subcommunicator, however this would involve more >>>>>>>> communication. >>>>>>>> Did anyone ever tried to run 2 solvers with hyperthreading? >>>>>>>> Thanks >>>>>>>> >>>>>>>> >>>>>>>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams >>>>>>>> ha scritto: >>>>>>>> >>>>>>>>> I suspect that the Poisson and Ampere's law solve are not coupled. >>>>>>>>> You might be able to duplicate the communicator and use two threads. You >>>>>>>>> would want to configure PETSc with threadsafty and threads and I think it >>>>>>>>> could/should work, but this mode is never used by anyone. >>>>>>>>> >>>>>>>>> That said, I would not recommend doing this unless you feel like >>>>>>>>> playing in computer science, as opposed to doing application science. The >>>>>>>>> best case scenario you get a speedup of 2x. That is a strict upper bound, >>>>>>>>> but you will never come close to it. Your hardware has some balance of CPU >>>>>>>>> to GPU processing rate. Your application has a balance of volume of work >>>>>>>>> for your two solves. They have to be the same to get close to 2x speedup >>>>>>>>> and that ratio(s) has to be 1:1. To be concrete, from what little I can >>>>>>>>> guess about your applications let's assume that the cost of each of these >>>>>>>>> two solves is about the same (eg, Laplacians on your domain and the best >>>>>>>>> case scenario). But, GPU machines are configured to have roughly 1-10% of >>>>>>>>> capacity in the GPUs, these days, that gives you an upper bound of about >>>>>>>>> 10% speedup. That is noise. Upshot, unless you configure your hardware to >>>>>>>>> match this problem, and the two solves have the same cost, you will not see >>>>>>>>> close to 2x speedup. Your time is better spent elsewhere. >>>>>>>>> >>>>>>>>> Mark >>>>>>>>> >>>>>>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown wrote: >>>>>>>>> >>>>>>>>>> You can use MPI and split the communicator so n-1 ranks create a >>>>>>>>>> DMDA for one part of your system and the other rank drives the GPU in the >>>>>>>>>> other part. They can all be part of the same coupled system on the full >>>>>>>>>> communicator, but PETSc doesn't currently support some ranks having their >>>>>>>>>> Vec arrays on GPU and others on host, so you'd be paying host-device >>>>>>>>>> transfer costs on each iteration (and that might swamp any performance >>>>>>>>>> benefit you would have gotten). >>>>>>>>>> >>>>>>>>>> In any case, be sure to think about the execution time of each >>>>>>>>>> part. Load balancing with matching time-to-solution for each part can be >>>>>>>>>> really hard. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Barry Smith writes: >>>>>>>>>> >>>>>>>>>> > Nicola, >>>>>>>>>> > >>>>>>>>>> > This is really viable or practical at this time with PETSc. >>>>>>>>>> It is not impossible but requires careful coding with threads, another >>>>>>>>>> possibility is to use one half of the virtual GPUs for each solve, this is >>>>>>>>>> also not trivial. 
I would recommend first seeing what kind of performance >>>>>>>>>> you can get on the GPU for each type of solve and revist this idea in the >>>>>>>>>> future. >>>>>>>>>> > >>>>>>>>>> > Barry >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini < >>>>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>>>> >> >>>>>>>>>> >> Hello, I would like to know if it is possible to overlap CPU >>>>>>>>>> and GPU with DMDA. >>>>>>>>>> >> I've a machine where each node has 1P100+1Haswell. >>>>>>>>>> >> I've to resolve Poisson and Ampere equation for each time step. >>>>>>>>>> >> I'm using 2D DMDA for each of them. Would be possible to >>>>>>>>>> compute poisson >>>>>>>>>> >> and ampere equation at the same time? One on CPU and the other >>>>>>>>>> on GPU? >>>>>>>>>> >> >>>>>>>>>> >> Thanks >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>> >>>> >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Mon Aug 17 08:42:29 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 17 Aug 2020 15:42:29 +0200 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: <4D8EC2A9-3EEC-436C-8B37-70AA8BCF6684@gmail.com> > On Aug 17, 2020, at 3:24 PM, nicola varini wrote: > > Hi Mark, I do confirm that hypre with boomeramg is working fine and is pretty fast. > However, none of the GAMG option works. > Did anyone ever succeeded in usign hypre with petsc on gpu? > I did manage to compile hypre on gpu but I do get the following error: > ======= > CC gpuhypre/obj/vec/vec/impls/hypre/vhyp.o > In file included from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:22, > from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, > from /users/nvarini/hypre/include/_hypre_utilities.h:1129, > from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, > from /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, > from /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: > /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:83:1: error: unknown type name 'namespace' > namespace thrust > ^~~~~~~~~ > /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:84:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token > { > ^ > In file included from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/config.h:28, > from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:23, > from /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, > from /users/nvarini/hypre/include/_hypre_utilities.h:1129, > from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, > from /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, > from /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: > /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/cpp_compatibility.h:21:10: fatal error: cstddef: No such file or directory > #include > ^~~~~~~~~ > compilation terminated. Nicola, Interfacing PETSc with HYPRE-GPU is still work in progress. 
In the meantime, if you want to play with it, you can checkout the PETSc branch https://gitlab.com/petsc/petsc/-/tree/stefanozampini/hypre-cuda-rebased-v3 and the HYPRE branch https://github.com/hypre-space/hypre/tree/PETScFix and start experimenting. However, I do not guarantee it will work properly; for boomerAMG to work on GPU, you must use cudaUnifiedMemory (default memory allocator in the PETSc branch),and only PMIS interpolation and Jacobi smoothing are available on GPU. Ruipeng (in cc) can confirm. I will resume working on this thing in the next weeks and I hope it will be stable soon. Regards, Stefano > > ======= > Nicola > > Il giorno ven 14 ago 2020 alle ore 20:13 Mark Adams > ha scritto: > You can try Hypre. If that fails then there is a problem with your system. > > And you can run with -info and grep on GAMG and send the output and I can see if I see anything funny. > > If this is just a Lapacian with a stable discretization and not crazy material parameters then stretched grids are about the only thing that can hurt the solver. > > Do both of your solves fail in a similar way? > > On the CPU you can try this with large subdomains, preferably (in serial ideally): > -ampere_mg_levels_ksp_type richardson > -ampere_mg_levels_pc_type sor > > And check that there are no unused options with -options_left. GAMG can fail with bad eigen estimates, but these parameters look fine. > > On Fri, Aug 14, 2020 at 5:01 AM nicola varini > wrote: > Dear Barry, yes it gives the same problems. > > Il giorno gio 13 ago 2020 alle ore 23:22 Barry Smith > ha scritto: > > Does the same thing work (with GAMG) if you run on the same problem on the same machine same number of MPI ranks but make a new PETSC_ARCH that does NOT use the GPUs? > > Barry > > Ideally one gets almost identical convergence with CPUs or GPUs (same problem, same machine) but a bug or numerically change "might" affect this. > >> On Aug 13, 2020, at 10:28 AM, nicola varini > wrote: >> >> Dear Barry, you are right. The Cray argument checking is incorrect. It does work with download-fblaslapack. >> However it does fail to converge. Is there anything obviously wrong with my petscrc? >> Anything else am I missing? >> >> Thanks >> >> Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith > ha scritto: >> >> The QR is always done on the CPU, we don't have generic calls to blas/lapack go to the GPU currently. >> >> The error message is: >> >> On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7) >> >> argument 7 is &LWORK which is defined by >> >> PetscBLASInt LWORK=N*bs; >> >> and >> >> N=nSAvec is the column block size of new P. >> >> Presumably this is a huge run with many processes so using the debugger is not practical? >> >> We need to see what these variables are >> >> N, bs, nSAvec >> >> perhaps nSAvec is zero which could easily upset LAPACK. >> >> Crudest thing would be to just put a print statement in the code before the LAPACK call of if they are called many times add an error check like that >> generates an error if any of these three values are 0 (or negative). >> >> Barry >> >> >> It is not impossible that the Cray argument checking is incorrect and the value passed in is fine. You can check this by using --download-fblaslapack and see if the same or some other error comes up. >> >> >> >> >> >> >> >> >>> On Aug 12, 2020, at 7:19 PM, Mark Adams > wrote: >>> >>> Can you reproduce this on the CPU? >>> The QR factorization seems to be failing. That could be from bad data or a bad GPU QR. 
>>> >>> On Wed, Aug 12, 2020 at 4:19 AM nicola varini > wrote: >>> Dear all, following the suggestions I did resubmit the simulation with the petscrc below. >>> However I do get the following error: >>> ======== >>> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data >>> 7364 [339]PETSC ERROR: xGEQRF error >>> 7365 [339]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>> 7367 [339]PETSC ERROR: /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 by nvarini Wed Aug 12 10:06:15 2020 >>> 7368 [339]PETSC ERROR: Configure options --with-cc=cc --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 --with-cuda-c=nvcc --with-cxxlib-autodetect=0 --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - -with-cxx=CC --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include >>> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value (info = -7) >>> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>> ======== >>> >>> I did try other pc_gamg_type but they fails as well. 
>>> >>> >>> #PETSc Option Table entries: >>> -ampere_dm_mat_type aijcusparse >>> -ampere_dm_vec_type cuda >>> -ampere_ksp_atol 1e-15 >>> -ampere_ksp_initial_guess_nonzero yes >>> -ampere_ksp_reuse_preconditioner yes >>> -ampere_ksp_rtol 1e-7 >>> -ampere_ksp_type dgmres >>> -ampere_mg_levels_esteig_ksp_max_it 10 >>> -ampere_mg_levels_esteig_ksp_type cg >>> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>> -ampere_mg_levels_ksp_type chebyshev >>> -ampere_mg_levels_pc_type jacobi >>> -ampere_pc_gamg_agg_nsmooths 1 >>> -ampere_pc_gamg_coarse_eq_limit 10 >>> -ampere_pc_gamg_reuse_interpolation true >>> -ampere_pc_gamg_square_graph 1 >>> -ampere_pc_gamg_threshold 0.05 >>> -ampere_pc_gamg_threshold_scale .0 >>> -ampere_pc_gamg_type agg >>> -ampere_pc_type gamg >>> -dm_mat_type aijcusparse >>> -dm_vec_type cuda >>> -log_view >>> -poisson_dm_mat_type aijcusparse >>> -poisson_dm_vec_type cuda >>> -poisson_ksp_atol 1e-15 >>> -poisson_ksp_initial_guess_nonzero yes >>> -poisson_ksp_reuse_preconditioner yes >>> -poisson_ksp_rtol 1e-7 >>> -poisson_ksp_type dgmres >>> -poisson_log_view >>> -poisson_mg_levels_esteig_ksp_max_it 10 >>> -poisson_mg_levels_esteig_ksp_type cg >>> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>> -poisson_mg_levels_ksp_max_it 1 >>> -poisson_mg_levels_ksp_type chebyshev >>> -poisson_mg_levels_pc_type jacobi >>> -poisson_pc_gamg_agg_nsmooths 1 >>> -poisson_pc_gamg_coarse_eq_limit 10 >>> -poisson_pc_gamg_reuse_interpolation true >>> -poisson_pc_gamg_square_graph 1 >>> -poisson_pc_gamg_threshold 0.05 >>> -poisson_pc_gamg_threshold_scale .0 >>> -poisson_pc_gamg_type agg >>> -poisson_pc_type gamg >>> -use_mat_nearnullspace true >>> #End of PETSc Option Table entries >>> >>> Regards, >>> >>> Nicola >>> >>> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams > ha scritto: >>> >>> >>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini > wrote: >>> Nicola, >>> >>> You are actually not using the GPU properly, since you use HYPRE preconditioning, which is CPU only. One of your solvers is actually slower on ?GPU?. >>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and with Jacobi preconditioning. Mark can help you out with the specific command line options. >>> When it works properly, everything related to PC application is offloaded to the GPU, and you should expect to get the well-known and branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >>> >>> >>> The speedup depends on the machine, but on SUMMIT, using enough CPUs to saturate the memory bus vs all 6 GPUs the speedup is a function of problem subdomain size. I saw 10x at about 100K equations/process. >>> >>> Doing what you want to do is one of the last optimization steps of an already optimized code before entering production. Yours is not even optimized for proper GPU usage yet. >>> Also, any specific reason why you are using dgmres and fgmres? >>> >>> PETSc has not been designed with multi-threading in mind. You can achieve ?overlap? of the two solves by splitting the communicator. But then you need communications to let the two solutions talk to each other. >>> >>> Thanks >>> Stefano >>> >>> >>>> On Aug 4, 2020, at 12:04 PM, nicola varini > wrote: >>>> >>>> Dear all, thanks for your replies. The reason why I've asked if it is possible to overlap poisson and ampere is because they roughly >>>> take the same amount of time. Please find in attachment the profiling logs for only CPU and only GPU. 
>>>> Of course it is possible to split the MPI communicator and run each solver on different subcommunicator, however this would involve more communication. >>>> Did anyone ever tried to run 2 solvers with hyperthreading? >>>> Thanks >>>> >>>> >>>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams > ha scritto: >>>> I suspect that the Poisson and Ampere's law solve are not coupled. You might be able to duplicate the communicator and use two threads. You would want to configure PETSc with threadsafty and threads and I think it could/should work, but this mode is never used by anyone. >>>> >>>> That said, I would not recommend doing this unless you feel like playing in computer science, as opposed to doing application science. The best case scenario you get a speedup of 2x. That is a strict upper bound, but you will never come close to it. Your hardware has some balance of CPU to GPU processing rate. Your application has a balance of volume of work for your two solves. They have to be the same to get close to 2x speedup and that ratio(s) has to be 1:1. To be concrete, from what little I can guess about your applications let's assume that the cost of each of these two solves is about the same (eg, Laplacians on your domain and the best case scenario). But, GPU machines are configured to have roughly 1-10% of capacity in the GPUs, these days, that gives you an upper bound of about 10% speedup. That is noise. Upshot, unless you configure your hardware to match this problem, and the two solves have the same cost, you will not see close to 2x speedup. Your time is better spent elsewhere. >>>> >>>> Mark >>>> >>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown > wrote: >>>> You can use MPI and split the communicator so n-1 ranks create a DMDA for one part of your system and the other rank drives the GPU in the other part. They can all be part of the same coupled system on the full communicator, but PETSc doesn't currently support some ranks having their Vec arrays on GPU and others on host, so you'd be paying host-device transfer costs on each iteration (and that might swamp any performance benefit you would have gotten). >>>> >>>> In any case, be sure to think about the execution time of each part. Load balancing with matching time-to-solution for each part can be really hard. >>>> >>>> >>>> Barry Smith > writes: >>>> >>>> > Nicola, >>>> > >>>> > This is really viable or practical at this time with PETSc. It is not impossible but requires careful coding with threads, another possibility is to use one half of the virtual GPUs for each solve, this is also not trivial. I would recommend first seeing what kind of performance you can get on the GPU for each type of solve and revist this idea in the future. >>>> > >>>> > Barry >>>> > >>>> > >>>> > >>>> > >>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini > wrote: >>>> >> >>>> >> Hello, I would like to know if it is possible to overlap CPU and GPU with DMDA. >>>> >> I've a machine where each node has 1P100+1Haswell. >>>> >> I've to resolve Poisson and Ampere equation for each time step. >>>> >> I'm using 2D DMDA for each of them. Would be possible to compute poisson >>>> >> and ampere equation at the same time? One on CPU and the other on GPU? >>>> >> >>>> >> Thanks >>>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Aug 17 08:52:00 2020 From: jroman at dsic.upv.es (Jose E. 
Roman) Date: Mon, 17 Aug 2020 15:52:00 +0200 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> <75b47901-a5d2-e794-4c4f-27ed4cbe26b0@rice.edu> Message-ID: > El 17 ago 2020, a las 14:27, Barry Smith escribi?: > > > Nidish, > > Your matrix is dense, correct? MUMPS is for sparse matrices. > > Then I guess you could use Scalapack http://netlib.org/scalapack/slug/node48.html#SECTION04323200000000000000 to do the SVD. The work is order N^3 and parallel efficiency may not be great but it might help you solve your problem. > > I don't know if SLEPc has an interface to Scalapack for SVD or not. Yes, SLEPc (master) has interfaces for ScaLAPACK and Elemental for both SVD and (symmetric) eigenvalues. Jose > > Barry > > > > > > > >> On Aug 17, 2020, at 2:51 AM, Jose E. Roman wrote: >> >> You can use SLEPc's SVD to compute the nullspace, but it has pitfalls: make sure you use an absolute convergence test (not relative); for the particular case of zero singular vectors, accuracy may not be very good and convergence may be slow (with the corresponding high computational cost). >> >> MUMPS has functionality to get a basis of the nullspace, once you have computed the factorization. But I don't know if this is easily accessible via PETSc. >> >> Jose >> >> >> >>> El 17 ago 2020, a las 3:10, Nidish escribi?: >>> >>> Oh damn. Alright, I'll keep trying out the different options. >>> >>> Thank you, >>> Nidish >>> >>> On 8/16/20 8:05 PM, Barry Smith wrote: >>>> >>>> SVD is enormously expensive, needs to be done on a full dense matrix so completely impractical. You need the best tuned iterative method, Jose is the by far the most knowledgeable about that. >>>> >>>> Barry >>>> >>>> >>>>> On Aug 16, 2020, at 7:46 PM, Nidish wrote: >>>>> >>>>> Thank you for the suggestions. >>>>> >>>>> I'm getting a zero pivot error for the LU in slepc while calculating the rest of the modes. >>>>> >>>>> Would conducting an SVD for just the stiffness matrix and then using the singular vectors as bases for the nullspace work? I haven't tried this out just yet, but I'm wondering if you could provide me insights into whether this will. >>>>> >>>>> Thanks, >>>>> Nidish >>>>> >>>>> On 8/16/20 2:50 PM, Barry Smith wrote: >>>>>> >>>>>> If you know part of your null space explicitly (for example the rigid body modes) I would recommend you always use that information explicitly since it is extremely expensive numerically to obtain. Thus rather than numerically computing the entire null space compute the part orthogonal to the part you already know. Presumably SLEPc has tools to help do this, naively I would just orthogonalized against the know subspace during the computational process but there are probably better ways. >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> On Aug 16, 2020, at 11:26 AM, Nidish wrote: >>>>>>> >>>>>>> Well some of the zero eigenvectors are rigid body modes, but there are some more which are introduced by lagrange-multiplier based constraint enforcement, which are non trivial. >>>>>>> >>>>>>> My final application is for a nonlinear simulation, so I don't mind the extra computational effort initially. Could you have me the suggested solver configurations to get this type of eigenvectors in slepc? 
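A minimal sketch of the SLEPc SVD route described above (assumptions: a SLEPc version around 3.13 where SVDSetOperator takes a single Mat; the matrix K, the 1e-10 tolerance, and the function name are illustrative, not from this thread; error checking omitted):

=======
#include <slepcsvd.h>

/* Sketch: ask for the smallest singular triplets of K and keep the right
   singular vectors whose singular value is numerically zero; those span the
   null space.  Uses an absolute convergence test, as recommended above,
   because the target singular values are zero. */
static PetscErrorCode BuildNullBasis(Mat K)
{
  SVD       svd;
  Vec       u, v;
  PetscReal sigma;
  PetscInt  i, nconv;

  SVDCreate(PetscObjectComm((PetscObject)K), &svd);
  SVDSetOperator(svd, K);
  SVDSetWhichSingularTriplets(svd, SVD_SMALLEST);
  SVDSetDimensions(svd, 10, PETSC_DEFAULT, PETSC_DEFAULT);
  SVDSetConvergenceTest(svd, SVD_CONV_ABS);   /* absolute, not relative */
  SVDSetFromOptions(svd);
  SVDSolve(svd);
  SVDGetConverged(svd, &nconv);
  MatCreateVecs(K, &v, &u);                   /* v: right, u: left */
  for (i = 0; i < nconv; i++) {
    SVDGetSingularTriplet(svd, i, &sigma, u, v);
    if (sigma < 1e-10) {
      /* v is (approximately) a null-space vector of K; collect it here,
         e.g. to build a MatNullSpace.  The tolerance is problem dependent. */
    }
  }
  VecDestroy(&u); VecDestroy(&v);
  SVDDestroy(&svd);
  return 0;
}
=======

For the part of the null space that is already known (the rigid body modes), MatNullSpaceCreateRigidBody() builds those vectors directly from nodal coordinates, so only the remaining, non-trivial part needs to be computed this way.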
>>>>>>> >>>>>>> Nidish >>>>>>> On Aug 16, 2020, at 00:17, Jed Brown wrote: >>>>>>> It's possible to use this or a similar algorithm in SLEPc, but keep in mind that it's more expensive to compute these eigenvectors than to solve a linear system. Do you have a sequence of systems with the same null space? >>>>>>> >>>>>>> You referred to the null space as "rigid body modes". Why can't those be written down? Note that PETSc has convenience routines for computing rigid body modes from coordinates. >>>>>>> >>>>>>> Nidish < >>>>>>> nb25 at rice.edu >>>>>>>> writes: >>>>>>> >>>>>>> >>>>>>> I just use the standard eigs function (https://www.mathworks.com/help/matlab/ref/eigs.html >>>>>>> ) as a black box. I think it uses a lanczos type method under the hood. >>>>>>> >>>>>>> Nidish >>>>>>> >>>>>>> On Aug 15, 2020, 21:42, at 21:42, Barry Smith < >>>>>>> bsmith at petsc.dev >>>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Exactly what algorithm are you using in Matlab to get the 10 smallest >>>>>>> eigenvalues and their corresponding eigenvectors? >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Aug 15, 2020, at 8:53 PM, Nidish >>>>>>> wrote: >>>>>>> >>>>>>> The section on solving singular systems in the manual starts with >>>>>>> >>>>>>> assuming that the singular eigenvectors are already known. >>>>>>> >>>>>>> >>>>>>> I have a large system where finding the singular eigenvectors is not >>>>>>> >>>>>>> trivially written down. How would you recommend I proceed with making >>>>>>> initial estimates? In MATLAB (with MUCH smaller matrices), I conduct an >>>>>>> eigensolve for the first 10 smallest eigenvalues and take the >>>>>>> eigenvectors corresponding to the zero eigenvalues from this. This >>>>>>> approach doesn't work here since I'm unable to use SLEPc for solving >>>>>>> >>>>>>> >>>>>>> K.v = lam*M.v >>>>>>> >>>>>>> for cases where K is positive semi-definite (contains a few "rigid >>>>>>> >>>>>>> body modes") and M is strictly positive definite. >>>>>>> >>>>>>> >>>>>>> I'd appreciate any assistance you may provide with this. >>>>>>> >>>>>>> Thank you, >>>>>>> Nidish >>>>>>> >>>>>> >>>>> -- >>>>> Nidish >>>> >>> -- >>> Nidish >> > From nicola.varini at gmail.com Mon Aug 17 09:33:00 2020 From: nicola.varini at gmail.com (nicola varini) Date: Mon, 17 Aug 2020 16:33:00 +0200 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: Hi Mark, this is the out of grep GAMG after I used -info: ======= [0] PCSetUp_GAMG(): level 0) N=582736, n data rows=1, n data cols=1, nnz/row (ave)=9, np=12 [0] PCGAMGFilterGraph(): 97.9676% nnz after filtering, with threshold 0., 8.95768 nnz ave. (N=582736) [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square [0] PCGAMGProlongator_AGG(): New grid 38934 nodes [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.101683e+00 min=4.341777e-03 PC=jacobi [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 0, cache spectra 0.00434178 2.10168 [0] PCSetUp_GAMG(): 1) N=38934, n data cols=1, nnz/row (ave)=18, 12 active pes [0] PCGAMGFilterGraph(): 97.024% nnz after filtering, with threshold 0., 17.9774 nnz ave. 
(N=38934) [0] PCGAMGProlongator_AGG(): New grid 4459 nodes [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=4.521607e+01 min=5.854294e-01 PC=jacobi [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 1, cache spectra 0.585429 45.2161 [0] PCSetUp_GAMG(): 2) N=4459, n data cols=1, nnz/row (ave)=29, 12 active pes [0] PCGAMGFilterGraph(): 99.6422% nnz after filtering, with threshold 0., 27.5481 nnz ave. (N=4459) [0] PCGAMGProlongator_AGG(): New grid 345 nodes [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.394069e+01 min=1.086973e-01 PC=jacobi [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 2, cache spectra 0.108697 13.9407 [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 29 with simple aggregation [0] PCSetUp_GAMG(): 3) N=345, n data cols=1, nnz/row (ave)=31, 6 active pes [0] PCGAMGFilterGraph(): 99.6292% nnz after filtering, with threshold 0., 26.9667 nnz ave. (N=345) [0] PCGAMGProlongator_AGG(): New grid 26 nodes [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.463593e+02 min=1.469384e-01 PC=jacobi [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 3, cache spectra 0.146938 146.359 [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 5 with simple aggregation [0] PCSetUp_GAMG(): 4) N=26, n data cols=1, nnz/row (ave)=16, 1 active pes [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.16304 PCGAMGGraph_AGG 4 1.0 8.4114e-02 1.0 1.02e+06 1.0 3.8e+02 1.3e+03 4.0e+01 0 0 0 0 0 0 0 0 0 0 145 PCGAMGCoarse_AGG 4 1.0 3.2107e-01 1.0 9.43e+06 1.0 7.3e+02 1.1e+04 3.5e+01 0 0 0 0 0 0 0 0 0 0 351 PCGAMGProl_AGG 4 1.0 2.8825e-02 1.0 0.00e+00 0.0 3.5e+02 2.8e+03 6.4e+01 0 0 0 0 0 0 0 0 0 0 0 PCGAMGPOpt_AGG 4 1.0 1.1570e-01 1.0 2.61e+07 1.0 1.2e+03 2.6e+03 1.6e+02 0 0 0 0 1 0 0 0 0 1 2692 GAMG: createProl 4 1.0 5.5680e-01 1.0 3.64e+07 1.0 2.7e+03 4.6e+03 3.0e+02 0 0 0 0 1 0 0 0 0 1 784 GAMG: partLevel 4 1.0 1.1628e-01 1.0 5.90e+06 1.0 1.1e+03 3.0e+03 1.6e+02 0 0 0 0 1 0 0 0 0 1 604 ====== Nicola Il giorno lun 17 ago 2020 alle ore 15:40 Mark Adams ha scritto: > > > On Mon, Aug 17, 2020 at 9:24 AM nicola varini > wrote: > >> Hi Mark, I do confirm that hypre with boomeramg is working fine and is >> pretty fast. >> > > Good, you can send me the -info (grep GAMG) output and I try to see what > is going on. > > >> However, none of the GAMG option works. >> Did anyone ever succeeded in usign hypre with petsc on gpu? >> > > We have gotten Hypre to run on GPUs but it has been fragile. The > performance has been marginal (due to use of USM apparently), but it is > being worked on by the hypre team. > > The cude tools are changing fast and I am guessing this is a different > version than what we have tested, perhaps. Maybe someone else can help with > this, but I know we use cuda 10.2 and you are using cuda tools 10.1. > > And you do want to use the most up-to-date PETSc. 
> > >> I did manage to compile hypre on gpu but I do get the following error: >> ======= >> CC gpuhypre/obj/vec/vec/impls/hypre/vhyp.o >> In file included from >> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:22, >> from >> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, >> from >> /users/nvarini/hypre/include/_hypre_utilities.h:1129, >> from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, >> from >> /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, >> from >> /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: >> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:83:1: >> error: unknown type name 'namespace' >> namespace thrust >> ^~~~~~~~~ >> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:84:1: >> error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token >> { >> ^ >> In file included from >> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/config.h:28, >> from >> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:23, >> from >> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, >> from >> /users/nvarini/hypre/include/_hypre_utilities.h:1129, >> from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, >> from >> /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, >> from >> /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: >> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/cpp_compatibility.h:21:10: >> fatal error: cstddef: No such file or directory >> #include >> ^~~~~~~~~ >> compilation terminated. >> >> ======= >> Nicola >> >> Il giorno ven 14 ago 2020 alle ore 20:13 Mark Adams ha >> scritto: >> >>> You can try Hypre. If that fails then there is a problem with your >>> system. >>> >>> And you can run with -info and grep on GAMG and send the output and I >>> can see if I see anything funny. >>> >>> If this is just a Lapacian with a stable discretization and not crazy >>> material parameters then stretched grids are about the only thing that can >>> hurt the solver. >>> >>> Do both of your solves fail in a similar way? >>> >>> On the CPU you can try this with large subdomains, preferably (in serial >>> ideally): >>> -ampere_mg_levels_ksp_type richardson >>> -ampere_mg_levels_pc_type sor >>> >>> And check that there are no unused options with -options_left. GAMG can >>> fail with bad eigen estimates, but these parameters look fine. >>> >>> On Fri, Aug 14, 2020 at 5:01 AM nicola varini >>> wrote: >>> >>>> Dear Barry, yes it gives the same problems. >>>> >>>> Il giorno gio 13 ago 2020 alle ore 23:22 Barry Smith >>>> ha scritto: >>>> >>>>> >>>>> Does the same thing work (with GAMG) if you run on the same problem >>>>> on the same machine same number of MPI ranks but make a new PETSC_ARCH that >>>>> does NOT use the GPUs? >>>>> >>>>> Barry >>>>> >>>>> Ideally one gets almost identical convergence with CPUs or GPUs >>>>> (same problem, same machine) but a bug or numerically change "might" affect >>>>> this. >>>>> >>>>> On Aug 13, 2020, at 10:28 AM, nicola varini >>>>> wrote: >>>>> >>>>> Dear Barry, you are right. The Cray argument checking is incorrect. It >>>>> does work with download-fblaslapack. 
>>>>> However it does fail to converge. Is there anything obviously wrong >>>>> with my petscrc? >>>>> Anything else am I missing? >>>>> >>>>> Thanks >>>>> >>>>> Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith >>>>> ha scritto: >>>>> >>>>>> >>>>>> The QR is always done on the CPU, we don't have generic calls to >>>>>> blas/lapack go to the GPU currently. >>>>>> >>>>>> The error message is: >>>>>> >>>>>> On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value >>>>>> (info = -7) >>>>>> >>>>>> argument 7 is &LWORK which is defined by >>>>>> >>>>>> PetscBLASInt LWORK=N*bs; >>>>>> >>>>>> and >>>>>> >>>>>> N=nSAvec is the column block size of new P. >>>>>> >>>>>> Presumably this is a huge run with many processes so using the >>>>>> debugger is not practical? >>>>>> >>>>>> We need to see what these variables are >>>>>> >>>>>> N, bs, nSAvec >>>>>> >>>>>> perhaps nSAvec is zero which could easily upset LAPACK. >>>>>> >>>>>> Crudest thing would be to just put a print statement in the code >>>>>> before the LAPACK call of if they are called many times add an error check >>>>>> like that >>>>>> generates an error if any of these three values are 0 (or >>>>>> negative). >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> It is not impossible that the Cray argument checking is incorrect >>>>>> and the value passed in is fine. You can check this by using >>>>>> --download-fblaslapack and see if the same or some other error comes up. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Aug 12, 2020, at 7:19 PM, Mark Adams wrote: >>>>>> >>>>>> Can you reproduce this on the CPU? >>>>>> The QR factorization seems to be failing. That could be from bad data >>>>>> or a bad GPU QR. >>>>>> >>>>>> On Wed, Aug 12, 2020 at 4:19 AM nicola varini < >>>>>> nicola.varini at gmail.com> wrote: >>>>>> >>>>>>> Dear all, following the suggestions I did resubmit the simulation >>>>>>> with the petscrc below. >>>>>>> However I do get the following error: >>>>>>> ======== >>>>>>> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data >>>>>>> 7364 [339]PETSC ERROR: xGEQRF error >>>>>>> 7365 [339]PETSC ERROR: See >>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>> shooting. 
>>>>>>> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>>>>>> 7367 [339]PETSC ERROR: >>>>>>> /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 >>>>>>> by nvarini Wed Aug 12 10:06:15 2020 >>>>>>> 7368 [339]PETSC ERROR: Configure options --with-cc=cc >>>>>>> --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 >>>>>>> --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 >>>>>>> --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 >>>>>>> --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 >>>>>>> --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell >>>>>>> --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 >>>>>>> --with-cuda-c=nvcc --with-cxxlib-autodetect=0 >>>>>>> --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - >>>>>>> -with-cxx=CC >>>>>>> --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include >>>>>>> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>>>>> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>>>>> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal >>>>>>> value (info = -7) >>>>>>> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>> ======== >>>>>>> >>>>>>> I did try other pc_gamg_type but they fails as well. 
>>>>>>> >>>>>>> >>>>>>> #PETSc Option Table entries: >>>>>>> -ampere_dm_mat_type aijcusparse >>>>>>> -ampere_dm_vec_type cuda >>>>>>> -ampere_ksp_atol 1e-15 >>>>>>> -ampere_ksp_initial_guess_nonzero yes >>>>>>> -ampere_ksp_reuse_preconditioner yes >>>>>>> -ampere_ksp_rtol 1e-7 >>>>>>> -ampere_ksp_type dgmres >>>>>>> -ampere_mg_levels_esteig_ksp_max_it 10 >>>>>>> -ampere_mg_levels_esteig_ksp_type cg >>>>>>> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>> -ampere_mg_levels_ksp_type chebyshev >>>>>>> -ampere_mg_levels_pc_type jacobi >>>>>>> -ampere_pc_gamg_agg_nsmooths 1 >>>>>>> -ampere_pc_gamg_coarse_eq_limit 10 >>>>>>> -ampere_pc_gamg_reuse_interpolation true >>>>>>> -ampere_pc_gamg_square_graph 1 >>>>>>> -ampere_pc_gamg_threshold 0.05 >>>>>>> -ampere_pc_gamg_threshold_scale .0 >>>>>>> -ampere_pc_gamg_type agg >>>>>>> -ampere_pc_type gamg >>>>>>> -dm_mat_type aijcusparse >>>>>>> -dm_vec_type cuda >>>>>>> -log_view >>>>>>> -poisson_dm_mat_type aijcusparse >>>>>>> -poisson_dm_vec_type cuda >>>>>>> -poisson_ksp_atol 1e-15 >>>>>>> -poisson_ksp_initial_guess_nonzero yes >>>>>>> -poisson_ksp_reuse_preconditioner yes >>>>>>> -poisson_ksp_rtol 1e-7 >>>>>>> -poisson_ksp_type dgmres >>>>>>> -poisson_log_view >>>>>>> -poisson_mg_levels_esteig_ksp_max_it 10 >>>>>>> -poisson_mg_levels_esteig_ksp_type cg >>>>>>> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>> -poisson_mg_levels_ksp_max_it 1 >>>>>>> -poisson_mg_levels_ksp_type chebyshev >>>>>>> -poisson_mg_levels_pc_type jacobi >>>>>>> -poisson_pc_gamg_agg_nsmooths 1 >>>>>>> -poisson_pc_gamg_coarse_eq_limit 10 >>>>>>> -poisson_pc_gamg_reuse_interpolation true >>>>>>> -poisson_pc_gamg_square_graph 1 >>>>>>> -poisson_pc_gamg_threshold 0.05 >>>>>>> -poisson_pc_gamg_threshold_scale .0 >>>>>>> -poisson_pc_gamg_type agg >>>>>>> -poisson_pc_type gamg >>>>>>> -use_mat_nearnullspace true >>>>>>> #End of PETSc Option Table entries >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Nicola >>>>>>> >>>>>>> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams >>>>>>> ha scritto: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini < >>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>> >>>>>>>>> Nicola, >>>>>>>>> >>>>>>>>> You are actually not using the GPU properly, since you use HYPRE >>>>>>>>> preconditioning, which is CPU only. One of your solvers is actually slower >>>>>>>>> on ?GPU?. >>>>>>>>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and >>>>>>>>> with Jacobi preconditioning. Mark can help you out with the specific >>>>>>>>> command line options. >>>>>>>>> When it works properly, everything related to PC application is >>>>>>>>> offloaded to the GPU, and you should expect to get the well-known and >>>>>>>>> branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >>>>>>>>> >>>>>>>>> >>>>>>>> The speedup depends on the machine, but on SUMMIT, using enough >>>>>>>> CPUs to saturate the memory bus vs all 6 GPUs the speedup is a function of >>>>>>>> problem subdomain size. I saw 10x at about 100K equations/process. >>>>>>>> >>>>>>>> >>>>>>>>> Doing what you want to do is one of the last optimization steps of >>>>>>>>> an already optimized code before entering production. Yours is not even >>>>>>>>> optimized for proper GPU usage yet. >>>>>>>>> Also, any specific reason why you are using dgmres and fgmres? >>>>>>>>> >>>>>>>>> PETSc has not been designed with multi-threading in mind. You can >>>>>>>>> achieve ?overlap? of the two solves by splitting the communicator. 
But then >>>>>>>>> you need communications to let the two solutions talk to each other. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> Stefano >>>>>>>>> >>>>>>>>> >>>>>>>>> On Aug 4, 2020, at 12:04 PM, nicola varini < >>>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>>> >>>>>>>>> Dear all, thanks for your replies. The reason why I've asked if it >>>>>>>>> is possible to overlap poisson and ampere is because they roughly >>>>>>>>> take the same amount of time. Please find in attachment the >>>>>>>>> profiling logs for only CPU and only GPU. >>>>>>>>> Of course it is possible to split the MPI communicator and run >>>>>>>>> each solver on different subcommunicator, however this would involve more >>>>>>>>> communication. >>>>>>>>> Did anyone ever tried to run 2 solvers with hyperthreading? >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> >>>>>>>>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams < >>>>>>>>> mfadams at lbl.gov> ha scritto: >>>>>>>>> >>>>>>>>>> I suspect that the Poisson and Ampere's law solve are not >>>>>>>>>> coupled. You might be able to duplicate the communicator and use two >>>>>>>>>> threads. You would want to configure PETSc with threadsafty and threads and >>>>>>>>>> I think it could/should work, but this mode is never used by anyone. >>>>>>>>>> >>>>>>>>>> That said, I would not recommend doing this unless you feel like >>>>>>>>>> playing in computer science, as opposed to doing application science. The >>>>>>>>>> best case scenario you get a speedup of 2x. That is a strict upper bound, >>>>>>>>>> but you will never come close to it. Your hardware has some balance of CPU >>>>>>>>>> to GPU processing rate. Your application has a balance of volume of work >>>>>>>>>> for your two solves. They have to be the same to get close to 2x speedup >>>>>>>>>> and that ratio(s) has to be 1:1. To be concrete, from what little I can >>>>>>>>>> guess about your applications let's assume that the cost of each of these >>>>>>>>>> two solves is about the same (eg, Laplacians on your domain and the best >>>>>>>>>> case scenario). But, GPU machines are configured to have roughly 1-10% of >>>>>>>>>> capacity in the GPUs, these days, that gives you an upper bound of about >>>>>>>>>> 10% speedup. That is noise. Upshot, unless you configure your hardware to >>>>>>>>>> match this problem, and the two solves have the same cost, you will not see >>>>>>>>>> close to 2x speedup. Your time is better spent elsewhere. >>>>>>>>>> >>>>>>>>>> Mark >>>>>>>>>> >>>>>>>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> You can use MPI and split the communicator so n-1 ranks create a >>>>>>>>>>> DMDA for one part of your system and the other rank drives the GPU in the >>>>>>>>>>> other part. They can all be part of the same coupled system on the full >>>>>>>>>>> communicator, but PETSc doesn't currently support some ranks having their >>>>>>>>>>> Vec arrays on GPU and others on host, so you'd be paying host-device >>>>>>>>>>> transfer costs on each iteration (and that might swamp any performance >>>>>>>>>>> benefit you would have gotten). >>>>>>>>>>> >>>>>>>>>>> In any case, be sure to think about the execution time of each >>>>>>>>>>> part. Load balancing with matching time-to-solution for each part can be >>>>>>>>>>> really hard. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Barry Smith writes: >>>>>>>>>>> >>>>>>>>>>> > Nicola, >>>>>>>>>>> > >>>>>>>>>>> > This is really viable or practical at this time with >>>>>>>>>>> PETSc. 
It is not impossible but requires careful coding with threads, >>>>>>>>>>> another possibility is to use one half of the virtual GPUs for each solve, >>>>>>>>>>> this is also not trivial. I would recommend first seeing what kind of >>>>>>>>>>> performance you can get on the GPU for each type of solve and revist this >>>>>>>>>>> idea in the future. >>>>>>>>>>> > >>>>>>>>>>> > Barry >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini < >>>>>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>>>>> >> >>>>>>>>>>> >> Hello, I would like to know if it is possible to overlap CPU >>>>>>>>>>> and GPU with DMDA. >>>>>>>>>>> >> I've a machine where each node has 1P100+1Haswell. >>>>>>>>>>> >> I've to resolve Poisson and Ampere equation for each time >>>>>>>>>>> step. >>>>>>>>>>> >> I'm using 2D DMDA for each of them. Would be possible to >>>>>>>>>>> compute poisson >>>>>>>>>>> >> and ampere equation at the same time? One on CPU and the >>>>>>>>>>> other on GPU? >>>>>>>>>>> >> >>>>>>>>>>> >> Thanks >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>> >>>>> >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon Aug 17 11:10:42 2020 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 17 Aug 2020 12:10:42 -0400 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: The eigen estimates are either very bad or the coarse grids have a problem. Everything looks fine other than these bad estimates that are >> 2. * Are these matrices not symmetric? Maybe from BCs. THat is not usually a problem, just checking. * Are these stretched grids? If not you might try: -ampere_pc_gamg_square_graph *10* * GMRES is not a good estimator when you have SPD matrices, but it is robust. You might try *-*ampere_mg_levels_esteig_*ksp_monitor_singular_value* -ampere_mg_levels_esteig_ksp_max_it *50* -ampere_mg_levels_esteig_ksp_type *gmres* * And why are you using: -ampere_ksp_type dgmres ? If your problems are SPD then CG is great. Mark On Mon, Aug 17, 2020 at 10:33 AM nicola varini wrote: > Hi Mark, this is the out of grep GAMG after I used -info: > ======= > [0] PCSetUp_GAMG(): level 0) N=582736, n data rows=1, n data cols=1, > nnz/row (ave)=9, np=12 > [0] PCGAMGFilterGraph(): 97.9676% nnz after filtering, with > threshold 0., 8.95768 nnz ave. (N=582736) > [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square > [0] PCGAMGProlongator_AGG(): New grid 38934 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.101683e+00 min=4.341777e-03 > PC=jacobi > [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 0, cache spectra > 0.00434178 2.10168 > [0] PCSetUp_GAMG(): 1) N=38934, n data cols=1, nnz/row (ave)=18, 12 active > pes > [0] PCGAMGFilterGraph(): 97.024% nnz after filtering, with > threshold 0., 17.9774 nnz ave. (N=38934) > [0] PCGAMGProlongator_AGG(): New grid 4459 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=4.521607e+01 > min=5.854294e-01 PC=jacobi > [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 1, cache spectra 0.585429 > 45.2161 > [0] PCSetUp_GAMG(): 2) N=4459, n data cols=1, nnz/row (ave)=29, 12 active > pes > [0] PCGAMGFilterGraph(): 99.6422% nnz after filtering, with > threshold 0., 27.5481 nnz ave. 
(N=4459) > [0] PCGAMGProlongator_AGG(): New grid 345 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.394069e+01 min=1.086973e-01 > PC=jacobi > [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 2, cache spectra 0.108697 > 13.9407 > [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 29 with simple > aggregation > [0] PCSetUp_GAMG(): 3) N=345, n data cols=1, nnz/row (ave)=31, 6 active pes > [0] PCGAMGFilterGraph(): 99.6292% nnz after filtering, with > threshold 0., 26.9667 nnz ave. (N=345) > [0] PCGAMGProlongator_AGG(): New grid 26 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.463593e+02 > min=1.469384e-01 PC=jacobi > [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 3, cache spectra 0.146938 > 146.359 > [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 5 with simple > aggregation > [0] PCSetUp_GAMG(): 4) N=26, n data cols=1, nnz/row (ave)=16, 1 active pes > [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.16304 > PCGAMGGraph_AGG 4 1.0 8.4114e-02 1.0 1.02e+06 1.0 3.8e+02 1.3e+03 > 4.0e+01 0 0 0 0 0 0 0 0 0 0 145 > PCGAMGCoarse_AGG 4 1.0 3.2107e-01 1.0 9.43e+06 1.0 7.3e+02 1.1e+04 > 3.5e+01 0 0 0 0 0 0 0 0 0 0 351 > PCGAMGProl_AGG 4 1.0 2.8825e-02 1.0 0.00e+00 0.0 3.5e+02 2.8e+03 > 6.4e+01 0 0 0 0 0 0 0 0 0 0 0 > PCGAMGPOpt_AGG 4 1.0 1.1570e-01 1.0 2.61e+07 1.0 1.2e+03 2.6e+03 > 1.6e+02 0 0 0 0 1 0 0 0 0 1 2692 > GAMG: createProl 4 1.0 5.5680e-01 1.0 3.64e+07 1.0 2.7e+03 4.6e+03 > 3.0e+02 0 0 0 0 1 0 0 0 0 1 784 > GAMG: partLevel 4 1.0 1.1628e-01 1.0 5.90e+06 1.0 1.1e+03 3.0e+03 > 1.6e+02 0 0 0 0 1 0 0 0 0 1 604 > ====== > Nicola > > > Il giorno lun 17 ago 2020 alle ore 15:40 Mark Adams ha > scritto: > >> >> >> On Mon, Aug 17, 2020 at 9:24 AM nicola varini >> wrote: >> >>> Hi Mark, I do confirm that hypre with boomeramg is working fine and is >>> pretty fast. >>> >> >> Good, you can send me the -info (grep GAMG) output and I try to see what >> is going on. >> >> >>> However, none of the GAMG option works. >>> Did anyone ever succeeded in usign hypre with petsc on gpu? >>> >> >> We have gotten Hypre to run on GPUs but it has been fragile. The >> performance has been marginal (due to use of USM apparently), but it is >> being worked on by the hypre team. >> >> The cude tools are changing fast and I am guessing this is a different >> version than what we have tested, perhaps. Maybe someone else can help with >> this, but I know we use cuda 10.2 and you are using cuda tools 10.1. >> >> And you do want to use the most up-to-date PETSc. 
>> >> >>> I did manage to compile hypre on gpu but I do get the following error: >>> ======= >>> CC gpuhypre/obj/vec/vec/impls/hypre/vhyp.o >>> In file included from >>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:22, >>> from >>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, >>> from >>> /users/nvarini/hypre/include/_hypre_utilities.h:1129, >>> from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, >>> from >>> /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, >>> from >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: >>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:83:1: >>> error: unknown type name 'namespace' >>> namespace thrust >>> ^~~~~~~~~ >>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:84:1: >>> error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token >>> { >>> ^ >>> In file included from >>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/config.h:28, >>> from >>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:23, >>> from >>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, >>> from >>> /users/nvarini/hypre/include/_hypre_utilities.h:1129, >>> from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, >>> from >>> /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, >>> from >>> /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: >>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/cpp_compatibility.h:21:10: >>> fatal error: cstddef: No such file or directory >>> #include >>> ^~~~~~~~~ >>> compilation terminated. >>> >>> ======= >>> Nicola >>> >>> Il giorno ven 14 ago 2020 alle ore 20:13 Mark Adams >>> ha scritto: >>> >>>> You can try Hypre. If that fails then there is a problem with your >>>> system. >>>> >>>> And you can run with -info and grep on GAMG and send the output and I >>>> can see if I see anything funny. >>>> >>>> If this is just a Lapacian with a stable discretization and not crazy >>>> material parameters then stretched grids are about the only thing that can >>>> hurt the solver. >>>> >>>> Do both of your solves fail in a similar way? >>>> >>>> On the CPU you can try this with large subdomains, preferably (in >>>> serial ideally): >>>> -ampere_mg_levels_ksp_type richardson >>>> -ampere_mg_levels_pc_type sor >>>> >>>> And check that there are no unused options with -options_left. GAMG can >>>> fail with bad eigen estimates, but these parameters look fine. >>>> >>>> On Fri, Aug 14, 2020 at 5:01 AM nicola varini >>>> wrote: >>>> >>>>> Dear Barry, yes it gives the same problems. >>>>> >>>>> Il giorno gio 13 ago 2020 alle ore 23:22 Barry Smith >>>>> ha scritto: >>>>> >>>>>> >>>>>> Does the same thing work (with GAMG) if you run on the same >>>>>> problem on the same machine same number of MPI ranks but make a new >>>>>> PETSC_ARCH that does NOT use the GPUs? >>>>>> >>>>>> Barry >>>>>> >>>>>> Ideally one gets almost identical convergence with CPUs or GPUs >>>>>> (same problem, same machine) but a bug or numerically change "might" affect >>>>>> this. >>>>>> >>>>>> On Aug 13, 2020, at 10:28 AM, nicola varini >>>>>> wrote: >>>>>> >>>>>> Dear Barry, you are right. 
The Cray argument checking is incorrect. >>>>>> It does work with download-fblaslapack. >>>>>> However it does fail to converge. Is there anything obviously wrong >>>>>> with my petscrc? >>>>>> Anything else am I missing? >>>>>> >>>>>> Thanks >>>>>> >>>>>> Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith < >>>>>> bsmith at petsc.dev> ha scritto: >>>>>> >>>>>>> >>>>>>> The QR is always done on the CPU, we don't have generic calls to >>>>>>> blas/lapack go to the GPU currently. >>>>>>> >>>>>>> The error message is: >>>>>>> >>>>>>> On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value >>>>>>> (info = -7) >>>>>>> >>>>>>> argument 7 is &LWORK which is defined by >>>>>>> >>>>>>> PetscBLASInt LWORK=N*bs; >>>>>>> >>>>>>> and >>>>>>> >>>>>>> N=nSAvec is the column block size of new P. >>>>>>> >>>>>>> Presumably this is a huge run with many processes so using the >>>>>>> debugger is not practical? >>>>>>> >>>>>>> We need to see what these variables are >>>>>>> >>>>>>> N, bs, nSAvec >>>>>>> >>>>>>> perhaps nSAvec is zero which could easily upset LAPACK. >>>>>>> >>>>>>> Crudest thing would be to just put a print statement in the code >>>>>>> before the LAPACK call of if they are called many times add an error check >>>>>>> like that >>>>>>> generates an error if any of these three values are 0 (or >>>>>>> negative). >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> It is not impossible that the Cray argument checking is >>>>>>> incorrect and the value passed in is fine. You can check this by using >>>>>>> --download-fblaslapack and see if the same or some other error comes up. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Aug 12, 2020, at 7:19 PM, Mark Adams wrote: >>>>>>> >>>>>>> Can you reproduce this on the CPU? >>>>>>> The QR factorization seems to be failing. That could be from bad >>>>>>> data or a bad GPU QR. >>>>>>> >>>>>>> On Wed, Aug 12, 2020 at 4:19 AM nicola varini < >>>>>>> nicola.varini at gmail.com> wrote: >>>>>>> >>>>>>>> Dear all, following the suggestions I did resubmit the simulation >>>>>>>> with the petscrc below. >>>>>>>> However I do get the following error: >>>>>>>> ======== >>>>>>>> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>>> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data >>>>>>>> 7364 [339]PETSC ERROR: xGEQRF error >>>>>>>> 7365 [339]PETSC ERROR: See >>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>>> shooting. 
>>>>>>>> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>>>>>>> 7367 [339]PETSC ERROR: >>>>>>>> /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 >>>>>>>> by nvarini Wed Aug 12 10:06:15 2020 >>>>>>>> 7368 [339]PETSC ERROR: Configure options --with-cc=cc >>>>>>>> --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 >>>>>>>> --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 >>>>>>>> --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 >>>>>>>> --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 >>>>>>>> --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell >>>>>>>> --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 >>>>>>>> --with-cuda-c=nvcc --with-cxxlib-autodetect=0 >>>>>>>> --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - >>>>>>>> -with-cxx=CC >>>>>>>> --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include >>>>>>>> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>>> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>>> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>>>>>> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>>> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>>> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>>> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>>> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>>>>>> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal >>>>>>>> value (info = -7) >>>>>>>> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>>> ======== >>>>>>>> >>>>>>>> I did try other pc_gamg_type but they fails as well. 
>>>>>>>> >>>>>>>> >>>>>>>> #PETSc Option Table entries: >>>>>>>> -ampere_dm_mat_type aijcusparse >>>>>>>> -ampere_dm_vec_type cuda >>>>>>>> -ampere_ksp_atol 1e-15 >>>>>>>> -ampere_ksp_initial_guess_nonzero yes >>>>>>>> -ampere_ksp_reuse_preconditioner yes >>>>>>>> -ampere_ksp_rtol 1e-7 >>>>>>>> -ampere_ksp_type dgmres >>>>>>>> -ampere_mg_levels_esteig_ksp_max_it 10 >>>>>>>> -ampere_mg_levels_esteig_ksp_type cg >>>>>>>> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>> -ampere_mg_levels_ksp_type chebyshev >>>>>>>> -ampere_mg_levels_pc_type jacobi >>>>>>>> -ampere_pc_gamg_agg_nsmooths 1 >>>>>>>> -ampere_pc_gamg_coarse_eq_limit 10 >>>>>>>> -ampere_pc_gamg_reuse_interpolation true >>>>>>>> -ampere_pc_gamg_square_graph 1 >>>>>>>> -ampere_pc_gamg_threshold 0.05 >>>>>>>> -ampere_pc_gamg_threshold_scale .0 >>>>>>>> -ampere_pc_gamg_type agg >>>>>>>> -ampere_pc_type gamg >>>>>>>> -dm_mat_type aijcusparse >>>>>>>> -dm_vec_type cuda >>>>>>>> -log_view >>>>>>>> -poisson_dm_mat_type aijcusparse >>>>>>>> -poisson_dm_vec_type cuda >>>>>>>> -poisson_ksp_atol 1e-15 >>>>>>>> -poisson_ksp_initial_guess_nonzero yes >>>>>>>> -poisson_ksp_reuse_preconditioner yes >>>>>>>> -poisson_ksp_rtol 1e-7 >>>>>>>> -poisson_ksp_type dgmres >>>>>>>> -poisson_log_view >>>>>>>> -poisson_mg_levels_esteig_ksp_max_it 10 >>>>>>>> -poisson_mg_levels_esteig_ksp_type cg >>>>>>>> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>> -poisson_mg_levels_ksp_max_it 1 >>>>>>>> -poisson_mg_levels_ksp_type chebyshev >>>>>>>> -poisson_mg_levels_pc_type jacobi >>>>>>>> -poisson_pc_gamg_agg_nsmooths 1 >>>>>>>> -poisson_pc_gamg_coarse_eq_limit 10 >>>>>>>> -poisson_pc_gamg_reuse_interpolation true >>>>>>>> -poisson_pc_gamg_square_graph 1 >>>>>>>> -poisson_pc_gamg_threshold 0.05 >>>>>>>> -poisson_pc_gamg_threshold_scale .0 >>>>>>>> -poisson_pc_gamg_type agg >>>>>>>> -poisson_pc_type gamg >>>>>>>> -use_mat_nearnullspace true >>>>>>>> #End of PETSc Option Table entries >>>>>>>> >>>>>>>> Regards, >>>>>>>> >>>>>>>> Nicola >>>>>>>> >>>>>>>> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams >>>>>>>> ha scritto: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini < >>>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Nicola, >>>>>>>>>> >>>>>>>>>> You are actually not using the GPU properly, since you use HYPRE >>>>>>>>>> preconditioning, which is CPU only. One of your solvers is actually slower >>>>>>>>>> on ?GPU?. >>>>>>>>>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and >>>>>>>>>> with Jacobi preconditioning. Mark can help you out with the specific >>>>>>>>>> command line options. >>>>>>>>>> When it works properly, everything related to PC application is >>>>>>>>>> offloaded to the GPU, and you should expect to get the well-known and >>>>>>>>>> branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >>>>>>>>>> >>>>>>>>>> >>>>>>>>> The speedup depends on the machine, but on SUMMIT, using enough >>>>>>>>> CPUs to saturate the memory bus vs all 6 GPUs the speedup is a function of >>>>>>>>> problem subdomain size. I saw 10x at about 100K equations/process. >>>>>>>>> >>>>>>>>> >>>>>>>>>> Doing what you want to do is one of the last optimization steps >>>>>>>>>> of an already optimized code before entering production. Yours is not even >>>>>>>>>> optimized for proper GPU usage yet. >>>>>>>>>> Also, any specific reason why you are using dgmres and fgmres? >>>>>>>>>> >>>>>>>>>> PETSc has not been designed with multi-threading in mind. 
You can >>>>>>>>>> achieve ?overlap? of the two solves by splitting the communicator. But then >>>>>>>>>> you need communications to let the two solutions talk to each other. >>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> Stefano >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Aug 4, 2020, at 12:04 PM, nicola varini < >>>>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>> Dear all, thanks for your replies. The reason why I've asked if >>>>>>>>>> it is possible to overlap poisson and ampere is because they roughly >>>>>>>>>> take the same amount of time. Please find in attachment the >>>>>>>>>> profiling logs for only CPU and only GPU. >>>>>>>>>> Of course it is possible to split the MPI communicator and run >>>>>>>>>> each solver on different subcommunicator, however this would involve more >>>>>>>>>> communication. >>>>>>>>>> Did anyone ever tried to run 2 solvers with hyperthreading? >>>>>>>>>> Thanks >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams < >>>>>>>>>> mfadams at lbl.gov> ha scritto: >>>>>>>>>> >>>>>>>>>>> I suspect that the Poisson and Ampere's law solve are not >>>>>>>>>>> coupled. You might be able to duplicate the communicator and use two >>>>>>>>>>> threads. You would want to configure PETSc with threadsafty and threads and >>>>>>>>>>> I think it could/should work, but this mode is never used by anyone. >>>>>>>>>>> >>>>>>>>>>> That said, I would not recommend doing this unless you feel like >>>>>>>>>>> playing in computer science, as opposed to doing application science. The >>>>>>>>>>> best case scenario you get a speedup of 2x. That is a strict upper bound, >>>>>>>>>>> but you will never come close to it. Your hardware has some balance of CPU >>>>>>>>>>> to GPU processing rate. Your application has a balance of volume of work >>>>>>>>>>> for your two solves. They have to be the same to get close to 2x speedup >>>>>>>>>>> and that ratio(s) has to be 1:1. To be concrete, from what little I can >>>>>>>>>>> guess about your applications let's assume that the cost of each of these >>>>>>>>>>> two solves is about the same (eg, Laplacians on your domain and the best >>>>>>>>>>> case scenario). But, GPU machines are configured to have roughly 1-10% of >>>>>>>>>>> capacity in the GPUs, these days, that gives you an upper bound of about >>>>>>>>>>> 10% speedup. That is noise. Upshot, unless you configure your hardware to >>>>>>>>>>> match this problem, and the two solves have the same cost, you will not see >>>>>>>>>>> close to 2x speedup. Your time is better spent elsewhere. >>>>>>>>>>> >>>>>>>>>>> Mark >>>>>>>>>>> >>>>>>>>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> You can use MPI and split the communicator so n-1 ranks create >>>>>>>>>>>> a DMDA for one part of your system and the other rank drives the GPU in the >>>>>>>>>>>> other part. They can all be part of the same coupled system on the full >>>>>>>>>>>> communicator, but PETSc doesn't currently support some ranks having their >>>>>>>>>>>> Vec arrays on GPU and others on host, so you'd be paying host-device >>>>>>>>>>>> transfer costs on each iteration (and that might swamp any performance >>>>>>>>>>>> benefit you would have gotten). >>>>>>>>>>>> >>>>>>>>>>>> In any case, be sure to think about the execution time of each >>>>>>>>>>>> part. Load balancing with matching time-to-solution for each part can be >>>>>>>>>>>> really hard. 
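A minimal sketch of the communicator split described above, assuming the Poisson solve stays on the first n-1 ranks and the Ampere/GPU solve runs on the last rank; the DMDA sizes, the option prefixes, and the data exchange between the two groups are placeholders rather than code from this thread (error handling kept to CHKERRQ only):

#include <petscdmda.h>
#include <petscksp.h>

int main(int argc,char **argv)
{
  MPI_Comm       subcomm;
  PetscMPIInt    rank,size,color;
  DM             da;
  Mat            A;
  Vec            x,b;
  KSP            ksp;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRQ(ierr);
  color = (rank == size-1) ? 1 : 0;                  /* last rank drives the GPU solve */
  ierr = MPI_Comm_split(PETSC_COMM_WORLD,color,rank,&subcomm);CHKERRQ(ierr);

  /* each group builds its own DMDA, operator and KSP on its sub-communicator */
  ierr = DMDACreate2d(subcomm,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DMDA_STENCIL_STAR,
                      128,128,PETSC_DECIDE,PETSC_DECIDE,1,1,NULL,NULL,&da);CHKERRQ(ierr);
  ierr = DMSetOptionsPrefix(da,color ? "ampere_" : "poisson_");CHKERRQ(ierr);
  ierr = DMSetFromOptions(da);CHKERRQ(ierr);         /* picks up -ampere_dm_mat_type aijcusparse etc. */
  ierr = DMSetUp(da);CHKERRQ(ierr);
  ierr = DMCreateMatrix(da,&A);CHKERRQ(ierr);
  ierr = DMCreateGlobalVector(da,&x);CHKERRQ(ierr);
  ierr = VecDuplicate(x,&b);CHKERRQ(ierr);

  ierr = KSPCreate(subcomm,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(ksp,color ? "ampere_" : "poisson_");CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  /* ... fill A and b for this field, KSPSolve(ksp,b,x), then exchange the two
     solutions across PETSC_COMM_WORLD -- the part that costs the extra
     communication noted above ... */

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  ierr = MPI_Comm_free(&subcomm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Both groups can read the same options file, so the existing -poisson_*/-ampere_* entries keep working; only one group would request the cuda/aijcusparse types.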
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Barry Smith writes: >>>>>>>>>>>> >>>>>>>>>>>> > Nicola, >>>>>>>>>>>> > >>>>>>>>>>>> > This is really viable or practical at this time with >>>>>>>>>>>> PETSc. It is not impossible but requires careful coding with threads, >>>>>>>>>>>> another possibility is to use one half of the virtual GPUs for each solve, >>>>>>>>>>>> this is also not trivial. I would recommend first seeing what kind of >>>>>>>>>>>> performance you can get on the GPU for each type of solve and revist this >>>>>>>>>>>> idea in the future. >>>>>>>>>>>> > >>>>>>>>>>>> > Barry >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini < >>>>>>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>>>>>> >> >>>>>>>>>>>> >> Hello, I would like to know if it is possible to overlap CPU >>>>>>>>>>>> and GPU with DMDA. >>>>>>>>>>>> >> I've a machine where each node has 1P100+1Haswell. >>>>>>>>>>>> >> I've to resolve Poisson and Ampere equation for each time >>>>>>>>>>>> step. >>>>>>>>>>>> >> I'm using 2D DMDA for each of them. Would be possible to >>>>>>>>>>>> compute poisson >>>>>>>>>>>> >> and ampere equation at the same time? One on CPU and the >>>>>>>>>>>> other on GPU? >>>>>>>>>>>> >> >>>>>>>>>>>> >> Thanks >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Mon Aug 17 13:31:17 2020 From: nb25 at rice.edu (Nidish) Date: Mon, 17 Aug 2020 13:31:17 -0500 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> <75b47901-a5d2-e794-4c4f-27ed4cbe26b0@rice.edu> Message-ID: Thankfully for this step, the matrix is not dense. But thank you. Nidish On 8/17/20 8:52 AM, Jose E. Roman wrote: > >> El 17 ago 2020, a las 14:27, Barry Smith escribi?: >> >> >> Nidish, >> >> Your matrix is dense, correct? MUMPS is for sparse matrices. >> >> Then I guess you could use Scalapack http://netlib.org/scalapack/slug/node48.html#SECTION04323200000000000000 to do the SVD. The work is order N^3 and parallel efficiency may not be great but it might help you solve your problem. >> >> I don't know if SLEPc has an interface to Scalapack for SVD or not. > Yes, SLEPc (master) has interfaces for ScaLAPACK and Elemental for both SVD and (symmetric) eigenvalues. > > Jose > >> Barry >> >> >> >> >> >> >> >>> On Aug 17, 2020, at 2:51 AM, Jose E. Roman wrote: >>> >>> You can use SLEPc's SVD to compute the nullspace, but it has pitfalls: make sure you use an absolute convergence test (not relative); for the particular case of zero singular vectors, accuracy may not be very good and convergence may be slow (with the corresponding high computational cost). >>> >>> MUMPS has functionality to get a basis of the nullspace, once you have computed the factorization. But I don't know if this is easily accessible via PETSc. >>> >>> Jose >>> >>> >>> >>>> El 17 ago 2020, a las 3:10, Nidish escribi?: >>>> >>>> Oh damn. Alright, I'll keep trying out the different options. >>>> >>>> Thank you, >>>> Nidish >>>> >>>> On 8/16/20 8:05 PM, Barry Smith wrote: >>>>> SVD is enormously expensive, needs to be done on a full dense matrix so completely impractical. You need the best tuned iterative method, Jose is the by far the most knowledgeable about that. 
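Since the matrix in this sub-thread is sparse, Jose's SVD route remains an option; below is a minimal sketch against the 3.13-era SLEPc API, assuming K is an assembled Mat. The routine name, the number of requested triplets, and the 1e-10 zero threshold are illustrative assumptions; the absolute convergence test is the point Jose stresses above:

#include <slepcsvd.h>

PetscErrorCode ComputeNullBasis(Mat K,PetscInt nsv,Vec **nullvecs,PetscInt *nnull)
{
  SVD            svd;
  PetscInt       i,nconv;
  PetscReal      sigma;
  Vec            u,v;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = SVDCreate(PetscObjectComm((PetscObject)K),&svd);CHKERRQ(ierr);
  ierr = SVDSetOperator(svd,K);CHKERRQ(ierr);
  ierr = SVDSetWhichSingularTriplets(svd,SVD_SMALLEST);CHKERRQ(ierr);
  ierr = SVDSetDimensions(svd,nsv,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr);
  ierr = SVDSetConvergenceTest(svd,SVD_CONV_ABS);CHKERRQ(ierr); /* absolute test, as advised */
  ierr = SVDSetFromOptions(svd);CHKERRQ(ierr);
  ierr = SVDSolve(svd);CHKERRQ(ierr);
  ierr = SVDGetConverged(svd,&nconv);CHKERRQ(ierr);

  ierr = MatCreateVecs(K,&v,&u);CHKERRQ(ierr);   /* v: right singular vector, u: left */
  ierr = PetscMalloc1(nconv,nullvecs);CHKERRQ(ierr);
  *nnull = 0;
  for (i=0; i<nconv; i++) {
    ierr = SVDGetSingularTriplet(svd,i,&sigma,u,v);CHKERRQ(ierr);
    if (sigma < 1e-10) {                         /* zero threshold is problem dependent */
      ierr = VecDuplicate(v,&(*nullvecs)[*nnull]);CHKERRQ(ierr);
      ierr = VecCopy(v,(*nullvecs)[*nnull]);CHKERRQ(ierr);
      (*nnull)++;
    }
  }
  ierr = VecDestroy(&u);CHKERRQ(ierr);
  ierr = VecDestroy(&v);CHKERRQ(ierr);
  ierr = SVDDestroy(&svd);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The caveat from this thread still applies: convergence to exactly-zero singular values can be slow, so if part of the null space (the rigid body modes) is already known it is cheaper to supply it and only compute the remainder.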
>>>>> >>>>> Barry >>>>> >>>>> >>>>>> On Aug 16, 2020, at 7:46 PM, Nidish wrote: >>>>>> >>>>>> Thank you for the suggestions. >>>>>> >>>>>> I'm getting a zero pivot error for the LU in slepc while calculating the rest of the modes. >>>>>> >>>>>> Would conducting an SVD for just the stiffness matrix and then using the singular vectors as bases for the nullspace work? I haven't tried this out just yet, but I'm wondering if you could provide me insights into whether this will. >>>>>> >>>>>> Thanks, >>>>>> Nidish >>>>>> >>>>>> On 8/16/20 2:50 PM, Barry Smith wrote: >>>>>>> If you know part of your null space explicitly (for example the rigid body modes) I would recommend you always use that information explicitly since it is extremely expensive numerically to obtain. Thus rather than numerically computing the entire null space compute the part orthogonal to the part you already know. Presumably SLEPc has tools to help do this, naively I would just orthogonalized against the know subspace during the computational process but there are probably better ways. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Aug 16, 2020, at 11:26 AM, Nidish wrote: >>>>>>>> >>>>>>>> Well some of the zero eigenvectors are rigid body modes, but there are some more which are introduced by lagrange-multiplier based constraint enforcement, which are non trivial. >>>>>>>> >>>>>>>> My final application is for a nonlinear simulation, so I don't mind the extra computational effort initially. Could you have me the suggested solver configurations to get this type of eigenvectors in slepc? >>>>>>>> >>>>>>>> Nidish >>>>>>>> On Aug 16, 2020, at 00:17, Jed Brown wrote: >>>>>>>> It's possible to use this or a similar algorithm in SLEPc, but keep in mind that it's more expensive to compute these eigenvectors than to solve a linear system. Do you have a sequence of systems with the same null space? >>>>>>>> >>>>>>>> You referred to the null space as "rigid body modes". Why can't those be written down? Note that PETSc has convenience routines for computing rigid body modes from coordinates. >>>>>>>> >>>>>>>> Nidish < >>>>>>>> nb25 at rice.edu >>>>>>>>> writes: >>>>>>>> >>>>>>>> I just use the standard eigs function (https://www.mathworks.com/help/matlab/ref/eigs.html >>>>>>>> ) as a black box. I think it uses a lanczos type method under the hood. >>>>>>>> >>>>>>>> Nidish >>>>>>>> >>>>>>>> On Aug 15, 2020, 21:42, at 21:42, Barry Smith < >>>>>>>> bsmith at petsc.dev >>>>>>>>> wrote: >>>>>>>> >>>>>>>> Exactly what algorithm are you using in Matlab to get the 10 smallest >>>>>>>> eigenvalues and their corresponding eigenvectors? >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Aug 15, 2020, at 8:53 PM, Nidish >>>>>>>> wrote: >>>>>>>> The section on solving singular systems in the manual starts with >>>>>>>> >>>>>>>> assuming that the singular eigenvectors are already known. >>>>>>>> >>>>>>>> >>>>>>>> I have a large system where finding the singular eigenvectors is not >>>>>>>> >>>>>>>> trivially written down. How would you recommend I proceed with making >>>>>>>> initial estimates? In MATLAB (with MUCH smaller matrices), I conduct an >>>>>>>> eigensolve for the first 10 smallest eigenvalues and take the >>>>>>>> eigenvectors corresponding to the zero eigenvalues from this. 
This >>>>>>>> approach doesn't work here since I'm unable to use SLEPc for solving >>>>>>>> >>>>>>>> >>>>>>>> K.v = lam*M.v >>>>>>>> >>>>>>>> for cases where K is positive semi-definite (contains a few "rigid >>>>>>>> >>>>>>>> body modes") and M is strictly positive definite. >>>>>>>> >>>>>>>> >>>>>>>> I'd appreciate any assistance you may provide with this. >>>>>>>> >>>>>>>> Thank you, >>>>>>>> Nidish >>>>>>>> >>>>>> -- >>>>>> Nidish >>>> -- >>>> Nidish -- Nidish From zakaryah at gmail.com Mon Aug 17 13:33:49 2020 From: zakaryah at gmail.com (zakaryah) Date: Mon, 17 Aug 2020 14:33:49 -0400 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> <75b47901-a5d2-e794-4c4f-27ed4cbe26b0@rice.edu> Message-ID: Hi Nidish, I may not fully understand your problem, but it sounds like you could benefit from continuation methods. Have you looked into this? If it's helpful, I have some experience with this and I can discuss with you by email. Cheers, Zak On Mon, Aug 17, 2020 at 2:31 PM Nidish wrote: > Thankfully for this step, the matrix is not dense. But thank you. > > Nidish > > On 8/17/20 8:52 AM, Jose E. Roman wrote: > > > >> El 17 ago 2020, a las 14:27, Barry Smith escribi?: > >> > >> > >> Nidish, > >> > >> Your matrix is dense, correct? MUMPS is for sparse matrices. > >> > >> Then I guess you could use Scalapack > http://netlib.org/scalapack/slug/node48.html#SECTION04323200000000000000 > to do the SVD. The work is order N^3 and parallel efficiency may not be > great but it might help you solve your problem. > >> > >> I don't know if SLEPc has an interface to Scalapack for SVD or > not. > > Yes, SLEPc (master) has interfaces for ScaLAPACK and Elemental for both > SVD and (symmetric) eigenvalues. > > > > Jose > > > >> Barry > >> > >> > >> > >> > >> > >> > >> > >>> On Aug 17, 2020, at 2:51 AM, Jose E. Roman wrote: > >>> > >>> You can use SLEPc's SVD to compute the nullspace, but it has pitfalls: > make sure you use an absolute convergence test (not relative); for the > particular case of zero singular vectors, accuracy may not be very good and > convergence may be slow (with the corresponding high computational cost). > >>> > >>> MUMPS has functionality to get a basis of the nullspace, once you have > computed the factorization. But I don't know if this is easily accessible > via PETSc. > >>> > >>> Jose > >>> > >>> > >>> > >>>> El 17 ago 2020, a las 3:10, Nidish escribi?: > >>>> > >>>> Oh damn. Alright, I'll keep trying out the different options. > >>>> > >>>> Thank you, > >>>> Nidish > >>>> > >>>> On 8/16/20 8:05 PM, Barry Smith wrote: > >>>>> SVD is enormously expensive, needs to be done on a full dense matrix > so completely impractical. You need the best tuned iterative method, Jose > is the by far the most knowledgeable about that. > >>>>> > >>>>> Barry > >>>>> > >>>>> > >>>>>> On Aug 16, 2020, at 7:46 PM, Nidish wrote: > >>>>>> > >>>>>> Thank you for the suggestions. > >>>>>> > >>>>>> I'm getting a zero pivot error for the LU in slepc while > calculating the rest of the modes. > >>>>>> > >>>>>> Would conducting an SVD for just the stiffness matrix and then > using the singular vectors as bases for the nullspace work? I haven't tried > this out just yet, but I'm wondering if you could provide me insights into > whether this will. 
> >>>>>> > >>>>>> Thanks, > >>>>>> Nidish > >>>>>> > >>>>>> On 8/16/20 2:50 PM, Barry Smith wrote: > >>>>>>> If you know part of your null space explicitly (for example the > rigid body modes) I would recommend you always use that information > explicitly since it is extremely expensive numerically to obtain. Thus > rather than numerically computing the entire null space compute the part > orthogonal to the part you already know. Presumably SLEPc has tools to help > do this, naively I would just orthogonalized against the know subspace > during the computational process but there are probably better ways. > >>>>>>> > >>>>>>> Barry > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> On Aug 16, 2020, at 11:26 AM, Nidish wrote: > >>>>>>>> > >>>>>>>> Well some of the zero eigenvectors are rigid body modes, but > there are some more which are introduced by lagrange-multiplier based > constraint enforcement, which are non trivial. > >>>>>>>> > >>>>>>>> My final application is for a nonlinear simulation, so I don't > mind the extra computational effort initially. Could you have me the > suggested solver configurations to get this type of eigenvectors in slepc? > >>>>>>>> > >>>>>>>> Nidish > >>>>>>>> On Aug 16, 2020, at 00:17, Jed Brown wrote: > >>>>>>>> It's possible to use this or a similar algorithm in SLEPc, but > keep in mind that it's more expensive to compute these eigenvectors than to > solve a linear system. Do you have a sequence of systems with the same > null space? > >>>>>>>> > >>>>>>>> You referred to the null space as "rigid body modes". Why can't > those be written down? Note that PETSc has convenience routines for > computing rigid body modes from coordinates. > >>>>>>>> > >>>>>>>> Nidish < > >>>>>>>> nb25 at rice.edu > >>>>>>>>> writes: > >>>>>>>> > >>>>>>>> I just use the standard eigs function ( > https://www.mathworks.com/help/matlab/ref/eigs.html > >>>>>>>> ) as a black box. I think it uses a lanczos type method under the > hood. > >>>>>>>> > >>>>>>>> Nidish > >>>>>>>> > >>>>>>>> On Aug 15, 2020, 21:42, at 21:42, Barry Smith < > >>>>>>>> bsmith at petsc.dev > >>>>>>>>> wrote: > >>>>>>>> > >>>>>>>> Exactly what algorithm are you using in Matlab to get the 10 > smallest > >>>>>>>> eigenvalues and their corresponding eigenvectors? > >>>>>>>> > >>>>>>>> Barry > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On Aug 15, 2020, at 8:53 PM, Nidish >>>>>>>>> wrote: > >>>>>>>> The section on solving singular systems in the manual starts with > >>>>>>>> > >>>>>>>> assuming that the singular eigenvectors are already known. > >>>>>>>> > >>>>>>>> > >>>>>>>> I have a large system where finding the singular eigenvectors is > not > >>>>>>>> > >>>>>>>> trivially written down. How would you recommend I proceed with > making > >>>>>>>> initial estimates? In MATLAB (with MUCH smaller matrices), I > conduct an > >>>>>>>> eigensolve for the first 10 smallest eigenvalues and take the > >>>>>>>> eigenvectors corresponding to the zero eigenvalues from this. This > >>>>>>>> approach doesn't work here since I'm unable to use SLEPc for > solving > >>>>>>>> > >>>>>>>> > >>>>>>>> K.v = lam*M.v > >>>>>>>> > >>>>>>>> for cases where K is positive semi-definite (contains a few "rigid > >>>>>>>> > >>>>>>>> body modes") and M is strictly positive definite. > >>>>>>>> > >>>>>>>> > >>>>>>>> I'd appreciate any assistance you may provide with this. 
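For the part of the null space that is known in advance, a minimal sketch of the convenience routine Jed mentions, assuming a blocked Vec of nodal coordinates is available; the extra null vectors introduced by the Lagrange-multiplier constraints are not covered here and would have to be appended by hand (e.g. via MatNullSpaceCreate with explicit vectors):

#include <petscmat.h>

/* K: assembled stiffness matrix; coords: Vec of nodal coordinates with block size
   set to the spatial dimension (3 in 3D). Names are illustrative, not from the thread. */
PetscErrorCode AttachRigidBodyModes(Mat K,Vec coords)
{
  MatNullSpace   nullsp;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatNullSpaceCreateRigidBody(coords,&nullsp);CHKERRQ(ierr); /* 6 modes in 3D, 3 in 2D */
  ierr = MatSetNullSpace(K,nullsp);CHKERRQ(ierr);     /* KSP removes this component from the residual */
  ierr = MatSetNearNullSpace(K,nullsp);CHKERRQ(ierr); /* what -use_mat_nearnullspace / GAMG uses */
  ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}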
> >>>>>>>> > >>>>>>>> Thank you, > >>>>>>>> Nidish > >>>>>>>> > >>>>>> -- > >>>>>> Nidish > >>>> -- > >>>> Nidish > -- > Nidish > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nb25 at rice.edu Mon Aug 17 13:42:44 2020 From: nb25 at rice.edu (Nidish) Date: Mon, 17 Aug 2020 13:42:44 -0500 Subject: [petsc-users] Solving singular systems with petsc In-Reply-To: References: <7a07ee36-4a86-7868-4d56-91e7a02e4e1c@rice.edu> <4fce2a36-83d9-407a-a4fd-06a1710dd059@rice.edu> <87bljb6uf8.fsf@jedbrown.org> <75b47901-a5d2-e794-4c4f-27ed4cbe26b0@rice.edu> Message-ID: <772dc5e7-d05a-9e10-e4cf-413273fd6897@rice.edu> Thank you for the email, Zak, I have not looked into continuation methods for this problem yet, and I'd love to hear your thoughts on it! The problem I have is two-fold: 1. Obtaining the eigenvectors of the system K.v = lam*M.v where the K and M are stiffness and mass matrices coming from a finite element model without any Dirichlet boundary conditions (i.e., having 6 Rigid body modes) and with a few RBE3 constraints (introducing degrees of freedom with "zero mass" in the mass matrix). So the system has, in addition to the 6 rigid body modes, a few "spurious" null vectors coming from these RBE3 constraints (which are enforced using Lagrange multipliers). 2. Conducting a linear solve of the system: K.x = b where K is from above. What I'm trying to do with both of these is to conduct a Hurty/Craig-Bampton Component Mode Synthesis, which is like a Schur Condensation with a few "fixed interface modal" DoFs added to the reduced system. So both can be solved by obtaining the vectors that span the null space of the system. I've created two separate threads for the two problems because I felt both are slightly different questions. But I understand now that the whole thing boils down to obtaining the nullspaces. Thank you, Nidish On 8/17/20 1:33 PM, zakaryah wrote: > Hi Nidish, > > I may not fully understand your problem, but it sounds like you could > benefit from continuation methods. Have you looked into this? If it's > helpful, I have some experience with this and I can discuss with you > by email. > > Cheers, Zak > > On Mon, Aug 17, 2020 at 2:31 PM Nidish > wrote: > > Thankfully for this step, the matrix is not dense. But thank you. > > Nidish > > On 8/17/20 8:52 AM, Jose E. Roman wrote: > > > >> El 17 ago 2020, a las 14:27, Barry Smith > escribi?: > >> > >> > >>? ?Nidish, > >> > >>? ? ? Your matrix is dense, correct? MUMPS is for sparse matrices. > >> > >>? ? ? Then I guess you could use Scalapack > http://netlib.org/scalapack/slug/node48.html#SECTION04323200000000000000 > to do the SVD. The work is order N^3 and parallel efficiency may > not be great but it might help you solve your problem. > >> > >>? ? ? ?I don't know if SLEPc has an interface to Scalapack for > SVD or not. > > Yes, SLEPc (master) has interfaces for ScaLAPACK and Elemental > for both SVD and (symmetric) eigenvalues. > > > > Jose > > > >>? ? ?Barry > >> > >> > >> > >> > >> > >> > >> > >>> On Aug 17, 2020, at 2:51 AM, Jose E. Roman > wrote: > >>> > >>> You can use SLEPc's SVD to compute the nullspace, but it has > pitfalls: make sure you use an absolute convergence test (not > relative); for the particular case of zero singular vectors, > accuracy may not be very good and convergence may be slow (with > the corresponding high computational cost). > >>> > >>> MUMPS has functionality to get a basis of the nullspace, once > you have computed the factorization. 
But I don't know if this is > easily accessible via PETSc. > >>> > >>> Jose > >>> > >>> > >>> > >>>> El 17 ago 2020, a las 3:10, Nidish > escribi?: > >>>> > >>>> Oh damn. Alright, I'll keep trying out the different options. > >>>> > >>>> Thank you, > >>>> Nidish > >>>> > >>>> On 8/16/20 8:05 PM, Barry Smith wrote: > >>>>> SVD is enormously expensive, needs to be done on a full > dense matrix so completely impractical. You need the best tuned > iterative method, Jose is the by far the most knowledgeable about > that. > >>>>> > >>>>>? ?Barry > >>>>> > >>>>> > >>>>>> On Aug 16, 2020, at 7:46 PM, Nidish > wrote: > >>>>>> > >>>>>> Thank you for the suggestions. > >>>>>> > >>>>>> I'm getting a zero pivot error for the LU in slepc while > calculating the rest of the modes. > >>>>>> > >>>>>> Would conducting an SVD for just the stiffness matrix and > then using the singular vectors as bases for the nullspace work? I > haven't tried this out just yet, but I'm wondering if you could > provide me insights into whether this will. > >>>>>> > >>>>>> Thanks, > >>>>>> Nidish > >>>>>> > >>>>>> On 8/16/20 2:50 PM, Barry Smith wrote: > >>>>>>> If you know part of your null space explicitly (for > example the rigid body modes) I would recommend you always use > that information explicitly since it is extremely expensive > numerically to obtain. Thus rather than numerically computing the > entire null space compute the part orthogonal to the part you > already know. Presumably SLEPc has tools to help do this, naively > I would just orthogonalized against the know subspace during the > computational process but there are probably better ways. > >>>>>>> > >>>>>>>? ?Barry > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> On Aug 16, 2020, at 11:26 AM, Nidish > wrote: > >>>>>>>> > >>>>>>>> Well some of the zero eigenvectors are rigid body modes, > but there are some more which are introduced by > lagrange-multiplier based constraint enforcement, which are non > trivial. > >>>>>>>> > >>>>>>>> My final application is for a nonlinear simulation, so I > don't mind the extra computational effort initially. Could you > have me the suggested solver configurations to get this type of > eigenvectors in slepc? > >>>>>>>> > >>>>>>>> Nidish > >>>>>>>> On Aug 16, 2020, at 00:17, Jed Brown > wrote: > >>>>>>>> It's possible to use this or a similar algorithm in > SLEPc, but keep in mind that it's more expensive to compute these > eigenvectors than to solve a linear system.? Do you have a > sequence of systems with the same null space? > >>>>>>>> > >>>>>>>> You referred to the null space as "rigid body modes".? > Why can't those be written down?? Note that PETSc has convenience > routines for computing rigid body modes from coordinates. > >>>>>>>> > >>>>>>>> Nidish < > >>>>>>>> nb25 at rice.edu > >>>>>>>>> writes: > >>>>>>>> > >>>>>>>> I just use the standard eigs function > (https://www.mathworks.com/help/matlab/ref/eigs.html > >>>>>>>> ) as a black box. I think it uses a lanczos type method > under the hood. > >>>>>>>> > >>>>>>>> Nidish > >>>>>>>> > >>>>>>>> On Aug 15, 2020, 21:42, at 21:42, Barry Smith < > >>>>>>>> bsmith at petsc.dev > >>>>>>>>> wrote: > >>>>>>>> > >>>>>>>> Exactly what algorithm are you using in Matlab to get the > 10 smallest > >>>>>>>> eigenvalues and their corresponding eigenvectors? 
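On the zero-pivot failure for K.v = lam*M.v with a semi-definite K: one common workaround, sketched below, is to move the spectral transformation away from zero so the factored matrix is K - sigma*M rather than K. This is only a sketch: the target 0.1 is a placeholder that has to sit below the first nonzero eigenvalue of interest, the factorization package is left to the options, and it assumes M is SPD as stated in the original question (zero-mass Lagrange-multiplier rows would instead call for EPS_GHIEP):

#include <slepceps.h>

PetscErrorCode SmallestModes(Mat K,Mat M,PetscInt nev)
{
  EPS            eps;
  ST             st;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = EPSCreate(PetscObjectComm((PetscObject)K),&eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps,K,M);CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps,EPS_GHEP);CHKERRQ(ierr);      /* symmetric K, SPD M */
  ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
  ierr = STSetType(st,STSINVERT);CHKERRQ(ierr);              /* factor K - sigma*M, not K */
  ierr = EPSSetTarget(eps,0.1);CHKERRQ(ierr);                /* small nonzero shift (placeholder) */
  ierr = EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE);CHKERRQ(ierr);
  ierr = EPSSetDimensions(eps,nev,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr);
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);               /* e.g. -st_pc_factor_mat_solver_type mumps */
  ierr = EPSSolve(eps);CHKERRQ(ierr);
  /* eigenvalues returned near zero are the rigid body / constraint modes;
     retrieve them with EPSGetConverged() and EPSGetEigenpair() */
  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}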
> >>>>>>>> > >>>>>>>> Barry > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On Aug 15, 2020, at 8:53 PM, Nidish > >>>>>>>>> wrote: > >>>>>>>> The section on solving singular systems in the manual > starts with > >>>>>>>> > >>>>>>>> assuming that the singular eigenvectors are already known. > >>>>>>>> > >>>>>>>> > >>>>>>>> I have a large system where finding the singular > eigenvectors is not > >>>>>>>> > >>>>>>>> trivially written down. How would you recommend I proceed > with making > >>>>>>>> initial estimates? In MATLAB (with MUCH smaller > matrices), I conduct an > >>>>>>>> eigensolve for the first 10 smallest eigenvalues and take the > >>>>>>>> eigenvectors corresponding to the zero eigenvalues from > this. This > >>>>>>>> approach doesn't work here since I'm unable to use SLEPc > for solving > >>>>>>>> > >>>>>>>> > >>>>>>>> K.v = lam*M.v > >>>>>>>> > >>>>>>>> for cases where K is positive semi-definite (contains a > few "rigid > >>>>>>>> > >>>>>>>> body modes") and M is strictly positive definite. > >>>>>>>> > >>>>>>>> > >>>>>>>> I'd appreciate any assistance you may provide with this. > >>>>>>>> > >>>>>>>> Thank you, > >>>>>>>> Nidish > >>>>>>>> > >>>>>> -- > >>>>>> Nidish > >>>> -- > >>>> Nidish > -- > Nidish > -- Nidish -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Mon Aug 17 18:04:31 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Mon, 17 Aug 2020 17:04:31 -0600 Subject: [petsc-users] ParMETIS vs. CHACO when no partitioning is made In-Reply-To: References: <548DC35C-0D80-409E-B360-D7E54076111D@petsc.dev> Message-ID: IIRC, Chaco does not produce an arbitrary number of subdomains. The number needs to be like 2^n. ParMETIS and PTScotch are much better, and they are production-level code. If there is no particular reason, I would like to suggest staying with ParMETIS and PTScotch. Thanks, Fande, On Fri, Aug 14, 2020 at 10:07 AM Eda Oktay wrote: > Dear Barry, > > Thank you for answering. I am sending a sample code and a binary file. > > Thanks! > > Eda > > Barry Smith , 14 A?u 2020 Cum, 18:49 tarihinde ?unu > yazd?: > >> >> Could be a bug in Chaco or its call from PETSc for the special case of >> one process. Could you send a sample code that demonstrates the problem? >> >> Barry >> >> >> > On Aug 14, 2020, at 8:53 AM, Eda Oktay wrote: >> > >> > Hi all, >> > >> > I am trying to try something. I am using the same MatPartitioning codes >> for both CHACO and ParMETIS: >> > >> > ierr = >> MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); >> > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); >> > ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); >> > >> > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); >> > ierr = MatPartitioningApply(part,&partitioning);CHKERRQ(ierr); >> > >> > After obtaining the IS, I apply this to my original nonsymmetric matrix >> and try to get an approximate edge cut. >> > >> > Except for 1 partitioning, my program completely works for 2,4 and 16 >> partitionings. However, for 1, ParMETIS gives results where CHACO I guess >> doesn't since I am getting errors about the index set. >> > >> > What is the difference between CHACO and ParMETIS that one works for 1 >> partitioning and one doesn't? >> > >> > Thanks! >> > >> > Eda >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fdkong.jd at gmail.com Mon Aug 17 18:18:00 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Mon, 17 Aug 2020 17:18:00 -0600 Subject: [petsc-users] ParMETIS vs. CHACO when no partitioning is made In-Reply-To: References: <548DC35C-0D80-409E-B360-D7E54076111D@petsc.dev> Message-ID: For this particular case (one subdoanin), it may be easy to fix in petsc. We could create a partitioning index filled with zeros. Fande, On Mon, Aug 17, 2020 at 5:04 PM Fande Kong wrote: > IIRC, Chaco does not produce an arbitrary number of subdomains. The number > needs to be like 2^n. > > ParMETIS and PTScotch are much better, and they are production-level code. > If there is no particular reason, I would like to suggest staying with > ParMETIS and PTScotch. > > Thanks, > > Fande, > > > > On Fri, Aug 14, 2020 at 10:07 AM Eda Oktay wrote: > >> Dear Barry, >> >> Thank you for answering. I am sending a sample code and a binary file. >> >> Thanks! >> >> Eda >> >> Barry Smith , 14 A?u 2020 Cum, 18:49 tarihinde ?unu >> yazd?: >> >>> >>> Could be a bug in Chaco or its call from PETSc for the special case >>> of one process. Could you send a sample code that demonstrates the problem? >>> >>> Barry >>> >>> >>> > On Aug 14, 2020, at 8:53 AM, Eda Oktay wrote: >>> > >>> > Hi all, >>> > >>> > I am trying to try something. I am using the same MatPartitioning >>> codes for both CHACO and ParMETIS: >>> > >>> > ierr = >>> MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); >>> > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); >>> > ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); >>> > >>> > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); >>> > ierr = MatPartitioningApply(part,&partitioning);CHKERRQ(ierr); >>> > >>> > After obtaining the IS, I apply this to my original nonsymmetric >>> matrix and try to get an approximate edge cut. >>> > >>> > Except for 1 partitioning, my program completely works for 2,4 and 16 >>> partitionings. However, for 1, ParMETIS gives results where CHACO I guess >>> doesn't since I am getting errors about the index set. >>> > >>> > What is the difference between CHACO and ParMETIS that one works for 1 >>> partitioning and one doesn't? >>> > >>> > Thanks! >>> > >>> > Eda >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicola.varini at gmail.com Tue Aug 18 02:18:00 2020 From: nicola.varini at gmail.com (nicola varini) Date: Tue, 18 Aug 2020 09:18:00 +0200 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: Dear Mark, the matrices are not symmetric and not positive definite. I did try to add: *-*ampere_mg_levels_esteig_*ksp_monitor_singular_value* -ampere_mg_levels_esteig_ksp_max_it *50* -ampere_mg_levels_esteig_ksp_type *gmres* but it still fails to converge. For the time being it seems that hypre on CPU is the safest choice, although it is surely worth experimenting with Stefano branch. Thanks, Nicola Il giorno lun 17 ago 2020 alle ore 18:10 Mark Adams ha scritto: > The eigen estimates are either very bad or the coarse grids have a > problem. Everything looks fine other than these bad estimates that are > >> 2. > > * Are these matrices not symmetric? Maybe from BCs. THat is not usually a > problem, just checking. > > * Are these stretched grids? If not you might try: > -ampere_pc_gamg_square_graph *10* > > * GMRES is not a good estimator when you have SPD matrices, but it is > robust. 
You might try > > *-*ampere_mg_levels_esteig_*ksp_monitor_singular_value* > -ampere_mg_levels_esteig_ksp_max_it *50* > -ampere_mg_levels_esteig_ksp_type *gmres* > > * And why are you using: > > -ampere_ksp_type dgmres > > ? > If your problems are SPD then CG is great. > > Mark > > On Mon, Aug 17, 2020 at 10:33 AM nicola varini > wrote: > >> Hi Mark, this is the out of grep GAMG after I used -info: >> ======= >> [0] PCSetUp_GAMG(): level 0) N=582736, n data rows=1, n data cols=1, >> nnz/row (ave)=9, np=12 >> [0] PCGAMGFilterGraph(): 97.9676% nnz after filtering, with >> threshold 0., 8.95768 nnz ave. (N=582736) >> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square >> [0] PCGAMGProlongator_AGG(): New grid 38934 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.101683e+00 min=4.341777e-03 >> PC=jacobi >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 0, cache spectra >> 0.00434178 2.10168 >> [0] PCSetUp_GAMG(): 1) N=38934, n data cols=1, nnz/row (ave)=18, 12 >> active pes >> [0] PCGAMGFilterGraph(): 97.024% nnz after filtering, with >> threshold 0., 17.9774 nnz ave. (N=38934) >> [0] PCGAMGProlongator_AGG(): New grid 4459 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=4.521607e+01 >> min=5.854294e-01 PC=jacobi >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 1, cache spectra >> 0.585429 45.2161 >> [0] PCSetUp_GAMG(): 2) N=4459, n data cols=1, nnz/row (ave)=29, 12 active >> pes >> [0] PCGAMGFilterGraph(): 99.6422% nnz after filtering, with >> threshold 0., 27.5481 nnz ave. (N=4459) >> [0] PCGAMGProlongator_AGG(): New grid 345 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.394069e+01 min=1.086973e-01 >> PC=jacobi >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 2, cache spectra >> 0.108697 13.9407 >> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 29 with simple >> aggregation >> [0] PCSetUp_GAMG(): 3) N=345, n data cols=1, nnz/row (ave)=31, 6 active >> pes >> [0] PCGAMGFilterGraph(): 99.6292% nnz after filtering, with >> threshold 0., 26.9667 nnz ave. (N=345) >> [0] PCGAMGProlongator_AGG(): New grid 26 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.463593e+02 >> min=1.469384e-01 PC=jacobi >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 3, cache spectra >> 0.146938 146.359 >> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 5 with simple >> aggregation >> [0] PCSetUp_GAMG(): 4) N=26, n data cols=1, nnz/row (ave)=16, 1 active pes >> [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.16304 >> PCGAMGGraph_AGG 4 1.0 8.4114e-02 1.0 1.02e+06 1.0 3.8e+02 1.3e+03 >> 4.0e+01 0 0 0 0 0 0 0 0 0 0 145 >> PCGAMGCoarse_AGG 4 1.0 3.2107e-01 1.0 9.43e+06 1.0 7.3e+02 1.1e+04 >> 3.5e+01 0 0 0 0 0 0 0 0 0 0 351 >> PCGAMGProl_AGG 4 1.0 2.8825e-02 1.0 0.00e+00 0.0 3.5e+02 2.8e+03 >> 6.4e+01 0 0 0 0 0 0 0 0 0 0 0 >> PCGAMGPOpt_AGG 4 1.0 1.1570e-01 1.0 2.61e+07 1.0 1.2e+03 2.6e+03 >> 1.6e+02 0 0 0 0 1 0 0 0 0 1 2692 >> GAMG: createProl 4 1.0 5.5680e-01 1.0 3.64e+07 1.0 2.7e+03 4.6e+03 >> 3.0e+02 0 0 0 0 1 0 0 0 0 1 784 >> GAMG: partLevel 4 1.0 1.1628e-01 1.0 5.90e+06 1.0 1.1e+03 3.0e+03 >> 1.6e+02 0 0 0 0 1 0 0 0 0 1 604 >> ====== >> Nicola >> >> >> Il giorno lun 17 ago 2020 alle ore 15:40 Mark Adams ha >> scritto: >> >>> >>> >>> On Mon, Aug 17, 2020 at 9:24 AM nicola varini >>> wrote: >>> >>>> Hi Mark, I do confirm that hypre with boomeramg is working fine and is >>>> pretty fast. >>>> >>> >>> Good, you can send me the -info (grep GAMG) output and I try to see what >>> is going on. 
>>> >>> >>>> However, none of the GAMG option works. >>>> Did anyone ever succeeded in usign hypre with petsc on gpu? >>>> >>> >>> We have gotten Hypre to run on GPUs but it has been fragile. The >>> performance has been marginal (due to use of USM apparently), but it is >>> being worked on by the hypre team. >>> >>> The cude tools are changing fast and I am guessing this is a different >>> version than what we have tested, perhaps. Maybe someone else can help with >>> this, but I know we use cuda 10.2 and you are using cuda tools 10.1. >>> >>> And you do want to use the most up-to-date PETSc. >>> >>> >>>> I did manage to compile hypre on gpu but I do get the following error: >>>> ======= >>>> CC gpuhypre/obj/vec/vec/impls/hypre/vhyp.o >>>> In file included from >>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:22, >>>> from >>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, >>>> from >>>> /users/nvarini/hypre/include/_hypre_utilities.h:1129, >>>> from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, >>>> from >>>> /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, >>>> from >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: >>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:83:1: >>>> error: unknown type name 'namespace' >>>> namespace thrust >>>> ^~~~~~~~~ >>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:84:1: >>>> error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token >>>> { >>>> ^ >>>> In file included from >>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/config.h:28, >>>> from >>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:23, >>>> from >>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, >>>> from >>>> /users/nvarini/hypre/include/_hypre_utilities.h:1129, >>>> from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, >>>> from >>>> /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, >>>> from >>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: >>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/cpp_compatibility.h:21:10: >>>> fatal error: cstddef: No such file or directory >>>> #include >>>> ^~~~~~~~~ >>>> compilation terminated. >>>> >>>> ======= >>>> Nicola >>>> >>>> Il giorno ven 14 ago 2020 alle ore 20:13 Mark Adams >>>> ha scritto: >>>> >>>>> You can try Hypre. If that fails then there is a problem with your >>>>> system. >>>>> >>>>> And you can run with -info and grep on GAMG and send the output and I >>>>> can see if I see anything funny. >>>>> >>>>> If this is just a Lapacian with a stable discretization and not crazy >>>>> material parameters then stretched grids are about the only thing that can >>>>> hurt the solver. >>>>> >>>>> Do both of your solves fail in a similar way? >>>>> >>>>> On the CPU you can try this with large subdomains, preferably (in >>>>> serial ideally): >>>>> -ampere_mg_levels_ksp_type richardson >>>>> -ampere_mg_levels_pc_type sor >>>>> >>>>> And check that there are no unused options with -options_left. GAMG >>>>> can fail with bad eigen estimates, but these parameters look fine. 
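Collected in one place, the two experiments suggested in this thread for the failing GAMG solve, written in the same style as the option table quoted earlier (the -ampere_ prefix shown; the same lines apply to -poisson_). The first pair is the CPU smoother check just described; the second group keeps Chebyshev but monitors and tightens the eigenvalue estimate, with the square-graph value being the suggestion for non-stretched grids; -options_left reports any unused options.

Smoother check (serial/large subdomains, CPU):
-ampere_mg_levels_ksp_type richardson
-ampere_mg_levels_pc_type sor
-options_left

Eigenvalue-estimate check (Chebyshev kept):
-ampere_mg_levels_esteig_ksp_type gmres
-ampere_mg_levels_esteig_ksp_max_it 50
-ampere_mg_levels_esteig_ksp_monitor_singular_value
-ampere_pc_gamg_square_graph 10
-options_left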
>>>>> >>>>> On Fri, Aug 14, 2020 at 5:01 AM nicola varini >>>>> wrote: >>>>> >>>>>> Dear Barry, yes it gives the same problems. >>>>>> >>>>>> Il giorno gio 13 ago 2020 alle ore 23:22 Barry Smith < >>>>>> bsmith at petsc.dev> ha scritto: >>>>>> >>>>>>> >>>>>>> Does the same thing work (with GAMG) if you run on the same >>>>>>> problem on the same machine same number of MPI ranks but make a new >>>>>>> PETSC_ARCH that does NOT use the GPUs? >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> Ideally one gets almost identical convergence with CPUs or GPUs >>>>>>> (same problem, same machine) but a bug or numerically change "might" affect >>>>>>> this. >>>>>>> >>>>>>> On Aug 13, 2020, at 10:28 AM, nicola varini >>>>>>> wrote: >>>>>>> >>>>>>> Dear Barry, you are right. The Cray argument checking is incorrect. >>>>>>> It does work with download-fblaslapack. >>>>>>> However it does fail to converge. Is there anything obviously wrong >>>>>>> with my petscrc? >>>>>>> Anything else am I missing? >>>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith < >>>>>>> bsmith at petsc.dev> ha scritto: >>>>>>> >>>>>>>> >>>>>>>> The QR is always done on the CPU, we don't have generic calls to >>>>>>>> blas/lapack go to the GPU currently. >>>>>>>> >>>>>>>> The error message is: >>>>>>>> >>>>>>>> On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value >>>>>>>> (info = -7) >>>>>>>> >>>>>>>> argument 7 is &LWORK which is defined by >>>>>>>> >>>>>>>> PetscBLASInt LWORK=N*bs; >>>>>>>> >>>>>>>> and >>>>>>>> >>>>>>>> N=nSAvec is the column block size of new P. >>>>>>>> >>>>>>>> Presumably this is a huge run with many processes so using the >>>>>>>> debugger is not practical? >>>>>>>> >>>>>>>> We need to see what these variables are >>>>>>>> >>>>>>>> N, bs, nSAvec >>>>>>>> >>>>>>>> perhaps nSAvec is zero which could easily upset LAPACK. >>>>>>>> >>>>>>>> Crudest thing would be to just put a print statement in the >>>>>>>> code before the LAPACK call of if they are called many times add an error >>>>>>>> check like that >>>>>>>> generates an error if any of these three values are 0 (or >>>>>>>> negative). >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> It is not impossible that the Cray argument checking is >>>>>>>> incorrect and the value passed in is fine. You can check this by using >>>>>>>> --download-fblaslapack and see if the same or some other error comes up. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Aug 12, 2020, at 7:19 PM, Mark Adams wrote: >>>>>>>> >>>>>>>> Can you reproduce this on the CPU? >>>>>>>> The QR factorization seems to be failing. That could be from bad >>>>>>>> data or a bad GPU QR. >>>>>>>> >>>>>>>> On Wed, Aug 12, 2020 at 4:19 AM nicola varini < >>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>> >>>>>>>>> Dear all, following the suggestions I did resubmit the simulation >>>>>>>>> with the petscrc below. >>>>>>>>> However I do get the following error: >>>>>>>>> ======== >>>>>>>>> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>>>> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data >>>>>>>>> 7364 [339]PETSC ERROR: xGEQRF error >>>>>>>>> 7365 [339]PETSC ERROR: See >>>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>>>> shooting. 
>>>>>>>>> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>>>>>>>> 7367 [339]PETSC ERROR: >>>>>>>>> /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 >>>>>>>>> by nvarini Wed Aug 12 10:06:15 2020 >>>>>>>>> 7368 [339]PETSC ERROR: Configure options --with-cc=cc >>>>>>>>> --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 >>>>>>>>> --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 >>>>>>>>> --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 >>>>>>>>> --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 >>>>>>>>> --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell >>>>>>>>> --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 >>>>>>>>> --with-cuda-c=nvcc --with-cxxlib-autodetect=0 >>>>>>>>> --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - >>>>>>>>> -with-cxx=CC >>>>>>>>> --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include >>>>>>>>> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>>>> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>>>>>>> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>>> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>>> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>>>> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>>>> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>>>>>>> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>>> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>>> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>>> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>>> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal >>>>>>>>> value (info = -7) >>>>>>>>> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>> ======== >>>>>>>>> >>>>>>>>> I did try other pc_gamg_type but they fails as well. 
>>>>>>>>> >>>>>>>>> >>>>>>>>> #PETSc Option Table entries: >>>>>>>>> -ampere_dm_mat_type aijcusparse >>>>>>>>> -ampere_dm_vec_type cuda >>>>>>>>> -ampere_ksp_atol 1e-15 >>>>>>>>> -ampere_ksp_initial_guess_nonzero yes >>>>>>>>> -ampere_ksp_reuse_preconditioner yes >>>>>>>>> -ampere_ksp_rtol 1e-7 >>>>>>>>> -ampere_ksp_type dgmres >>>>>>>>> -ampere_mg_levels_esteig_ksp_max_it 10 >>>>>>>>> -ampere_mg_levels_esteig_ksp_type cg >>>>>>>>> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>> -ampere_mg_levels_ksp_type chebyshev >>>>>>>>> -ampere_mg_levels_pc_type jacobi >>>>>>>>> -ampere_pc_gamg_agg_nsmooths 1 >>>>>>>>> -ampere_pc_gamg_coarse_eq_limit 10 >>>>>>>>> -ampere_pc_gamg_reuse_interpolation true >>>>>>>>> -ampere_pc_gamg_square_graph 1 >>>>>>>>> -ampere_pc_gamg_threshold 0.05 >>>>>>>>> -ampere_pc_gamg_threshold_scale .0 >>>>>>>>> -ampere_pc_gamg_type agg >>>>>>>>> -ampere_pc_type gamg >>>>>>>>> -dm_mat_type aijcusparse >>>>>>>>> -dm_vec_type cuda >>>>>>>>> -log_view >>>>>>>>> -poisson_dm_mat_type aijcusparse >>>>>>>>> -poisson_dm_vec_type cuda >>>>>>>>> -poisson_ksp_atol 1e-15 >>>>>>>>> -poisson_ksp_initial_guess_nonzero yes >>>>>>>>> -poisson_ksp_reuse_preconditioner yes >>>>>>>>> -poisson_ksp_rtol 1e-7 >>>>>>>>> -poisson_ksp_type dgmres >>>>>>>>> -poisson_log_view >>>>>>>>> -poisson_mg_levels_esteig_ksp_max_it 10 >>>>>>>>> -poisson_mg_levels_esteig_ksp_type cg >>>>>>>>> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>> -poisson_mg_levels_ksp_max_it 1 >>>>>>>>> -poisson_mg_levels_ksp_type chebyshev >>>>>>>>> -poisson_mg_levels_pc_type jacobi >>>>>>>>> -poisson_pc_gamg_agg_nsmooths 1 >>>>>>>>> -poisson_pc_gamg_coarse_eq_limit 10 >>>>>>>>> -poisson_pc_gamg_reuse_interpolation true >>>>>>>>> -poisson_pc_gamg_square_graph 1 >>>>>>>>> -poisson_pc_gamg_threshold 0.05 >>>>>>>>> -poisson_pc_gamg_threshold_scale .0 >>>>>>>>> -poisson_pc_gamg_type agg >>>>>>>>> -poisson_pc_type gamg >>>>>>>>> -use_mat_nearnullspace true >>>>>>>>> #End of PETSc Option Table entries >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> >>>>>>>>> Nicola >>>>>>>>> >>>>>>>>> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams < >>>>>>>>> mfadams at lbl.gov> ha scritto: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini < >>>>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Nicola, >>>>>>>>>>> >>>>>>>>>>> You are actually not using the GPU properly, since you use HYPRE >>>>>>>>>>> preconditioning, which is CPU only. One of your solvers is actually slower >>>>>>>>>>> on ?GPU?. >>>>>>>>>>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers and >>>>>>>>>>> with Jacobi preconditioning. Mark can help you out with the specific >>>>>>>>>>> command line options. >>>>>>>>>>> When it works properly, everything related to PC application is >>>>>>>>>>> offloaded to the GPU, and you should expect to get the well-known and >>>>>>>>>>> branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> The speedup depends on the machine, but on SUMMIT, using enough >>>>>>>>>> CPUs to saturate the memory bus vs all 6 GPUs the speedup is a function of >>>>>>>>>> problem subdomain size. I saw 10x at about 100K equations/process. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Doing what you want to do is one of the last optimization steps >>>>>>>>>>> of an already optimized code before entering production. Yours is not even >>>>>>>>>>> optimized for proper GPU usage yet. 
>>>>>>>>>>> Also, any specific reason why you are using dgmres and fgmres? >>>>>>>>>>> >>>>>>>>>>> PETSc has not been designed with multi-threading in mind. You >>>>>>>>>>> can achieve ?overlap? of the two solves by splitting the communicator. But >>>>>>>>>>> then you need communications to let the two solutions talk to each other. >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> Stefano >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Aug 4, 2020, at 12:04 PM, nicola varini < >>>>>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> Dear all, thanks for your replies. The reason why I've asked if >>>>>>>>>>> it is possible to overlap poisson and ampere is because they roughly >>>>>>>>>>> take the same amount of time. Please find in attachment the >>>>>>>>>>> profiling logs for only CPU and only GPU. >>>>>>>>>>> Of course it is possible to split the MPI communicator and run >>>>>>>>>>> each solver on different subcommunicator, however this would involve more >>>>>>>>>>> communication. >>>>>>>>>>> Did anyone ever tried to run 2 solvers with hyperthreading? >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams < >>>>>>>>>>> mfadams at lbl.gov> ha scritto: >>>>>>>>>>> >>>>>>>>>>>> I suspect that the Poisson and Ampere's law solve are not >>>>>>>>>>>> coupled. You might be able to duplicate the communicator and use two >>>>>>>>>>>> threads. You would want to configure PETSc with threadsafty and threads and >>>>>>>>>>>> I think it could/should work, but this mode is never used by anyone. >>>>>>>>>>>> >>>>>>>>>>>> That said, I would not recommend doing this unless you feel >>>>>>>>>>>> like playing in computer science, as opposed to doing application science. >>>>>>>>>>>> The best case scenario you get a speedup of 2x. That is a strict upper >>>>>>>>>>>> bound, but you will never come close to it. Your hardware has some balance >>>>>>>>>>>> of CPU to GPU processing rate. Your application has a balance of volume of >>>>>>>>>>>> work for your two solves. They have to be the same to get close to 2x >>>>>>>>>>>> speedup and that ratio(s) has to be 1:1. To be concrete, from what little I >>>>>>>>>>>> can guess about your applications let's assume that the cost of each of >>>>>>>>>>>> these two solves is about the same (eg, Laplacians on your domain and the >>>>>>>>>>>> best case scenario). But, GPU machines are configured to have roughly 1-10% >>>>>>>>>>>> of capacity in the GPUs, these days, that gives you an upper bound of about >>>>>>>>>>>> 10% speedup. That is noise. Upshot, unless you configure your hardware to >>>>>>>>>>>> match this problem, and the two solves have the same cost, you will not see >>>>>>>>>>>> close to 2x speedup. Your time is better spent elsewhere. >>>>>>>>>>>> >>>>>>>>>>>> Mark >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> You can use MPI and split the communicator so n-1 ranks create >>>>>>>>>>>>> a DMDA for one part of your system and the other rank drives the GPU in the >>>>>>>>>>>>> other part. They can all be part of the same coupled system on the full >>>>>>>>>>>>> communicator, but PETSc doesn't currently support some ranks having their >>>>>>>>>>>>> Vec arrays on GPU and others on host, so you'd be paying host-device >>>>>>>>>>>>> transfer costs on each iteration (and that might swamp any performance >>>>>>>>>>>>> benefit you would have gotten). 
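A bare-bones sketch of the communicator splitting described above could look like the following. The color rule (rank 0 drives the GPU, the remaining ranks form the CPU group), the grid size, and the solver setup are all placeholders, and error checking is omitted; the two groups would still have to exchange any coupling data explicitly.

  #include <petsc.h>

  /* Sketch: split PETSC_COMM_WORLD into a GPU group and a CPU group,
     each building its own DMDA + KSP on its sub-communicator. */
  PetscMPIInt rank;
  MPI_Comm    subcomm;
  PetscInt    nx = 128, ny = 128;   /* example grid size */
  DM          da;
  KSP         ksp;

  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  /* color 0: the rank that drives the GPU; color 1: everyone else */
  MPI_Comm_split(PETSC_COMM_WORLD, (rank == 0) ? 0 : 1, rank, &subcomm);

  DMDACreate2d(subcomm, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_STAR,
               nx, ny, PETSC_DECIDE, PETSC_DECIDE, 1, 1, NULL, NULL, &da);
  DMSetUp(da);
  KSPCreate(subcomm, &ksp);
  /* ... KSPSetOperators, options prefix (-poisson_ or -ampere_), KSPSolve ... */

  KSPDestroy(&ksp);
  DMDestroy(&da);
  MPI_Comm_free(&subcomm);

As the rest of Jed's message points out, making such a split pay off hinges on the two groups having well-matched time-to-solution.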
>>>>>>>>>>>>> >>>>>>>>>>>>> In any case, be sure to think about the execution time of each >>>>>>>>>>>>> part. Load balancing with matching time-to-solution for each part can be >>>>>>>>>>>>> really hard. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Barry Smith writes: >>>>>>>>>>>>> >>>>>>>>>>>>> > Nicola, >>>>>>>>>>>>> > >>>>>>>>>>>>> > This is really viable or practical at this time with >>>>>>>>>>>>> PETSc. It is not impossible but requires careful coding with threads, >>>>>>>>>>>>> another possibility is to use one half of the virtual GPUs for each solve, >>>>>>>>>>>>> this is also not trivial. I would recommend first seeing what kind of >>>>>>>>>>>>> performance you can get on the GPU for each type of solve and revist this >>>>>>>>>>>>> idea in the future. >>>>>>>>>>>>> > >>>>>>>>>>>>> > Barry >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini < >>>>>>>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> Hello, I would like to know if it is possible to overlap >>>>>>>>>>>>> CPU and GPU with DMDA. >>>>>>>>>>>>> >> I've a machine where each node has 1P100+1Haswell. >>>>>>>>>>>>> >> I've to resolve Poisson and Ampere equation for each time >>>>>>>>>>>>> step. >>>>>>>>>>>>> >> I'm using 2D DMDA for each of them. Would be possible to >>>>>>>>>>>>> compute poisson >>>>>>>>>>>>> >> and ampere equation at the same time? One on CPU and the >>>>>>>>>>>>> other on GPU? >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> Thanks >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Aug 18 06:44:51 2020 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 18 Aug 2020 07:44:51 -0400 Subject: [petsc-users] overlap cpu and gpu? In-Reply-To: References: <6C7446CE-D6FD-4087-8B81-41494FA712E7@petsc.dev> <87eeoqp3t2.fsf@jedbrown.org> Message-ID: I thought these were the Lapalcian (Poisson and AMerphere law). Anyway, the coarse grids are very messed up, or at least the eigen estimates are very messed up. A bad QR solver, used in GAMG's coarse grid construction, could do that. I've never seen that happen before, but it would explain this. On Tue, Aug 18, 2020 at 3:18 AM nicola varini wrote: > Dear Mark, the matrices are not symmetric and not positive definite. I did > try to add: > *-*ampere_mg_levels_esteig_*ksp_monitor_singular_value* > -ampere_mg_levels_esteig_ksp_max_it *50* > -ampere_mg_levels_esteig_ksp_type > *gmres* > but it still fails to converge. > For the time being it seems that hypre on CPU is the safest choice, > although it is surely worth experimenting with Stefano branch. > > Thanks, > > Nicola > > Il giorno lun 17 ago 2020 alle ore 18:10 Mark Adams ha > scritto: > >> The eigen estimates are either very bad or the coarse grids have a >> problem. Everything looks fine other than these bad estimates that are >> >> 2. >> >> * Are these matrices not symmetric? Maybe from BCs. THat is not usually a >> problem, just checking. >> >> * Are these stretched grids? If not you might try: >> -ampere_pc_gamg_square_graph *10* >> >> * GMRES is not a good estimator when you have SPD matrices, but it is >> robust. You might try >> >> *-*ampere_mg_levels_esteig_*ksp_monitor_singular_value* >> -ampere_mg_levels_esteig_ksp_max_it *50* >> -ampere_mg_levels_esteig_ksp_type *gmres* >> >> * And why are you using: >> >> -ampere_ksp_type dgmres >> >> ? >> If your problems are SPD then CG is great. 
>> >> Mark >> >> On Mon, Aug 17, 2020 at 10:33 AM nicola varini >> wrote: >> >>> Hi Mark, this is the out of grep GAMG after I used -info: >>> ======= >>> [0] PCSetUp_GAMG(): level 0) N=582736, n data rows=1, n data cols=1, >>> nnz/row (ave)=9, np=12 >>> [0] PCGAMGFilterGraph(): 97.9676% nnz after filtering, with >>> threshold 0., 8.95768 nnz ave. (N=582736) >>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square >>> [0] PCGAMGProlongator_AGG(): New grid 38934 nodes >>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.101683e+00 min=4.341777e-03 >>> PC=jacobi >>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 0, cache spectra >>> 0.00434178 2.10168 >>> [0] PCSetUp_GAMG(): 1) N=38934, n data cols=1, nnz/row (ave)=18, 12 >>> active pes >>> [0] PCGAMGFilterGraph(): 97.024% nnz after filtering, with >>> threshold 0., 17.9774 nnz ave. (N=38934) >>> [0] PCGAMGProlongator_AGG(): New grid 4459 nodes >>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=4.521607e+01 >>> min=5.854294e-01 PC=jacobi >>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 1, cache spectra >>> 0.585429 45.2161 >>> [0] PCSetUp_GAMG(): 2) N=4459, n data cols=1, nnz/row (ave)=29, 12 >>> active pes >>> [0] PCGAMGFilterGraph(): 99.6422% nnz after filtering, with >>> threshold 0., 27.5481 nnz ave. (N=4459) >>> [0] PCGAMGProlongator_AGG(): New grid 345 nodes >>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.394069e+01 min=1.086973e-01 >>> PC=jacobi >>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 2, cache spectra >>> 0.108697 13.9407 >>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 29 with simple >>> aggregation >>> [0] PCSetUp_GAMG(): 3) N=345, n data cols=1, nnz/row (ave)=31, 6 active >>> pes >>> [0] PCGAMGFilterGraph(): 99.6292% nnz after filtering, with >>> threshold 0., 26.9667 nnz ave. (N=345) >>> [0] PCGAMGProlongator_AGG(): New grid 26 nodes >>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.463593e+02 >>> min=1.469384e-01 PC=jacobi >>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: level 3, cache spectra >>> 0.146938 146.359 >>> [0] PCGAMGCreateLevel_GAMG(): Number of equations (loc) 5 with simple >>> aggregation >>> [0] PCSetUp_GAMG(): 4) N=26, n data cols=1, nnz/row (ave)=16, 1 active >>> pes >>> [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.16304 >>> PCGAMGGraph_AGG 4 1.0 8.4114e-02 1.0 1.02e+06 1.0 3.8e+02 1.3e+03 >>> 4.0e+01 0 0 0 0 0 0 0 0 0 0 145 >>> PCGAMGCoarse_AGG 4 1.0 3.2107e-01 1.0 9.43e+06 1.0 7.3e+02 1.1e+04 >>> 3.5e+01 0 0 0 0 0 0 0 0 0 0 351 >>> PCGAMGProl_AGG 4 1.0 2.8825e-02 1.0 0.00e+00 0.0 3.5e+02 2.8e+03 >>> 6.4e+01 0 0 0 0 0 0 0 0 0 0 0 >>> PCGAMGPOpt_AGG 4 1.0 1.1570e-01 1.0 2.61e+07 1.0 1.2e+03 2.6e+03 >>> 1.6e+02 0 0 0 0 1 0 0 0 0 1 2692 >>> GAMG: createProl 4 1.0 5.5680e-01 1.0 3.64e+07 1.0 2.7e+03 4.6e+03 >>> 3.0e+02 0 0 0 0 1 0 0 0 0 1 784 >>> GAMG: partLevel 4 1.0 1.1628e-01 1.0 5.90e+06 1.0 1.1e+03 3.0e+03 >>> 1.6e+02 0 0 0 0 1 0 0 0 0 1 604 >>> ====== >>> Nicola >>> >>> >>> Il giorno lun 17 ago 2020 alle ore 15:40 Mark Adams >>> ha scritto: >>> >>>> >>>> >>>> On Mon, Aug 17, 2020 at 9:24 AM nicola varini >>>> wrote: >>>> >>>>> Hi Mark, I do confirm that hypre with boomeramg is working fine and is >>>>> pretty fast. >>>>> >>>> >>>> Good, you can send me the -info (grep GAMG) output and I try to see >>>> what is going on. >>>> >>>> >>>>> However, none of the GAMG option works. >>>>> Did anyone ever succeeded in usign hypre with petsc on gpu? >>>>> >>>> >>>> We have gotten Hypre to run on GPUs but it has been fragile. 
The >>>> performance has been marginal (due to use of USM apparently), but it is >>>> being worked on by the hypre team. >>>> >>>> The cude tools are changing fast and I am guessing this is a different >>>> version than what we have tested, perhaps. Maybe someone else can help with >>>> this, but I know we use cuda 10.2 and you are using cuda tools 10.1. >>>> >>>> And you do want to use the most up-to-date PETSc. >>>> >>>> >>>>> I did manage to compile hypre on gpu but I do get the following error: >>>>> ======= >>>>> CC gpuhypre/obj/vec/vec/impls/hypre/vhyp.o >>>>> In file included from >>>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:22, >>>>> from >>>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, >>>>> from >>>>> /users/nvarini/hypre/include/_hypre_utilities.h:1129, >>>>> from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, >>>>> from >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, >>>>> from >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: >>>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:83:1: >>>>> error: unknown type name 'namespace' >>>>> namespace thrust >>>>> ^~~~~~~~~ >>>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/version.h:84:1: >>>>> error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token >>>>> { >>>>> ^ >>>>> In file included from >>>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/config.h:28, >>>>> from >>>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config.h:23, >>>>> from >>>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/execution_policy.h:23, >>>>> from >>>>> /users/nvarini/hypre/include/_hypre_utilities.h:1129, >>>>> from /users/nvarini/hypre/include/_hypre_IJ_mv.h:14, >>>>> from >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/include/../src/vec/vec/impls/hypre/vhyp.h:6, >>>>> from >>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/vec/vec/impls/hypre/vhyp.c:7: >>>>> /opt/nvidia/cudatoolkit10/10.1.105_3.27-7.0.1.1_4.1__ga311ce7/include/thrust/detail/config/cpp_compatibility.h:21:10: >>>>> fatal error: cstddef: No such file or directory >>>>> #include >>>>> ^~~~~~~~~ >>>>> compilation terminated. >>>>> >>>>> ======= >>>>> Nicola >>>>> >>>>> Il giorno ven 14 ago 2020 alle ore 20:13 Mark Adams >>>>> ha scritto: >>>>> >>>>>> You can try Hypre. If that fails then there is a problem with your >>>>>> system. >>>>>> >>>>>> And you can run with -info and grep on GAMG and send the output and I >>>>>> can see if I see anything funny. >>>>>> >>>>>> If this is just a Lapacian with a stable discretization and not crazy >>>>>> material parameters then stretched grids are about the only thing that can >>>>>> hurt the solver. >>>>>> >>>>>> Do both of your solves fail in a similar way? >>>>>> >>>>>> On the CPU you can try this with large subdomains, preferably (in >>>>>> serial ideally): >>>>>> -ampere_mg_levels_ksp_type richardson >>>>>> -ampere_mg_levels_pc_type sor >>>>>> >>>>>> And check that there are no unused options with -options_left. GAMG >>>>>> can fail with bad eigen estimates, but these parameters look fine. >>>>>> >>>>>> On Fri, Aug 14, 2020 at 5:01 AM nicola varini < >>>>>> nicola.varini at gmail.com> wrote: >>>>>> >>>>>>> Dear Barry, yes it gives the same problems. 
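For completeness, the CPU-only check suggested earlier in the thread amounts to building a separate PETSC_ARCH configured without GPUs (for example with --with-cuda=0) and rerunning the same case with the aijcusparse/cuda matrix and vector options removed, plus something like (option names as Mark gives them; purely illustrative):

  -ampere_mg_levels_ksp_type richardson
  -ampere_mg_levels_pc_type sor
  -options_left

so that any option that is silently ignored shows up at the end of the run.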
>>>>>>> >>>>>>> Il giorno gio 13 ago 2020 alle ore 23:22 Barry Smith < >>>>>>> bsmith at petsc.dev> ha scritto: >>>>>>> >>>>>>>> >>>>>>>> Does the same thing work (with GAMG) if you run on the same >>>>>>>> problem on the same machine same number of MPI ranks but make a new >>>>>>>> PETSC_ARCH that does NOT use the GPUs? >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> Ideally one gets almost identical convergence with CPUs or GPUs >>>>>>>> (same problem, same machine) but a bug or numerically change "might" affect >>>>>>>> this. >>>>>>>> >>>>>>>> On Aug 13, 2020, at 10:28 AM, nicola varini < >>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>> >>>>>>>> Dear Barry, you are right. The Cray argument checking is incorrect. >>>>>>>> It does work with download-fblaslapack. >>>>>>>> However it does fail to converge. Is there anything obviously wrong >>>>>>>> with my petscrc? >>>>>>>> Anything else am I missing? >>>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>>> Il giorno gio 13 ago 2020 alle ore 03:17 Barry Smith < >>>>>>>> bsmith at petsc.dev> ha scritto: >>>>>>>> >>>>>>>>> >>>>>>>>> The QR is always done on the CPU, we don't have generic calls >>>>>>>>> to blas/lapack go to the GPU currently. >>>>>>>>> >>>>>>>>> The error message is: >>>>>>>>> >>>>>>>>> On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal value >>>>>>>>> (info = -7) >>>>>>>>> >>>>>>>>> argument 7 is &LWORK which is defined by >>>>>>>>> >>>>>>>>> PetscBLASInt LWORK=N*bs; >>>>>>>>> >>>>>>>>> and >>>>>>>>> >>>>>>>>> N=nSAvec is the column block size of new P. >>>>>>>>> >>>>>>>>> Presumably this is a huge run with many processes so using the >>>>>>>>> debugger is not practical? >>>>>>>>> >>>>>>>>> We need to see what these variables are >>>>>>>>> >>>>>>>>> N, bs, nSAvec >>>>>>>>> >>>>>>>>> perhaps nSAvec is zero which could easily upset LAPACK. >>>>>>>>> >>>>>>>>> Crudest thing would be to just put a print statement in the >>>>>>>>> code before the LAPACK call of if they are called many times add an error >>>>>>>>> check like that >>>>>>>>> generates an error if any of these three values are 0 (or >>>>>>>>> negative). >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> It is not impossible that the Cray argument checking is >>>>>>>>> incorrect and the value passed in is fine. You can check this by using >>>>>>>>> --download-fblaslapack and see if the same or some other error comes up. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Aug 12, 2020, at 7:19 PM, Mark Adams wrote: >>>>>>>>> >>>>>>>>> Can you reproduce this on the CPU? >>>>>>>>> The QR factorization seems to be failing. That could be from bad >>>>>>>>> data or a bad GPU QR. >>>>>>>>> >>>>>>>>> On Wed, Aug 12, 2020 at 4:19 AM nicola varini < >>>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Dear all, following the suggestions I did resubmit the simulation >>>>>>>>>> with the petscrc below. >>>>>>>>>> However I do get the following error: >>>>>>>>>> ======== >>>>>>>>>> 7362 [592]PETSC ERROR: #1 formProl0() line 748 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>>>>> 7363 [339]PETSC ERROR: Petsc has generated inconsistent data >>>>>>>>>> 7364 [339]PETSC ERROR: xGEQRF error >>>>>>>>>> 7365 [339]PETSC ERROR: See >>>>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>>>>>>> shooting. 
>>>>>>>>>> 7366 [339]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, >>>>>>>>>> 2020 >>>>>>>>>> 7367 [339]PETSC ERROR: >>>>>>>>>> /users/nvarini/gbs_test_nicola/bin/gbs_daint_gpu_gnu on a named nid05083 >>>>>>>>>> by nvarini Wed Aug 12 10:06:15 2020 >>>>>>>>>> 7368 [339]PETSC ERROR: Configure options --with-cc=cc >>>>>>>>>> --with-fc=ftn --known-mpi-shared-libraries=1 --known-mpi-c-double-complex=1 >>>>>>>>>> --known-mpi-int64_t=1 --known-mpi-long-double=1 --with-batch=1 >>>>>>>>>> --known-64-bit-blas-indices=0 --LIBS=-lstdc++ --with-cxxlib-autodetect=0 >>>>>>>>>> --with-scalapa ck=1 --with-cxx=CC --with-debugging=0 >>>>>>>>>> --with-hypre-dir=/opt/cray/pe/tpsl/19.06.1/GNU/8.2/haswell >>>>>>>>>> --prefix=/scratch/snx3000/nvarini/petsc3.13.3-gpu --with-cuda=1 >>>>>>>>>> --with-cuda-c=nvcc --with-cxxlib-autodetect=0 >>>>>>>>>> --COPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include - >>>>>>>>>> -with-cxx=CC >>>>>>>>>> --CXXOPTFLAGS=-I/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/include >>>>>>>>>> 7369 [592]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>>>>> 7370 [592]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>> 7371 [592]PETSC ERROR: #4 PCSetUp() line 898 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>>>>>>>> 7372 [592]PETSC ERROR: #5 KSPSetUp() line 376 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> 7373 [592]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> 7374 [316]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>> 7375 [339]PETSC ERROR: #1 formProl0() line 748 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>>>>> 7376 [339]PETSC ERROR: #2 PCGAMGProlongator_AGG() line 1063 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/agg.c >>>>>>>>>> 7377 [339]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>> 7378 [339]PETSC ERROR: #4 PCSetUp() line 898 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/interface/precon.c >>>>>>>>>> 7379 [339]PETSC ERROR: #5 KSPSetUp() line 376 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> 7380 [592]PETSC ERROR: #7 KSPSolve() line 853 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> 7381 [339]PETSC ERROR: #6 KSPSolve_Private() line 633 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> 7382 [339]PETSC ERROR: #7 KSPSolve() line 853 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/ksp/interface/itfunc.c >>>>>>>>>> 7383 On entry to __cray_mgm_dgeqrf, parameter 7 had an illegal >>>>>>>>>> value (info = -7) >>>>>>>>>> 7384 [160]PETSC ERROR: #3 PCSetUp_GAMG() line 548 in >>>>>>>>>> /scratch/snx3000/nvarini/petsc-3.13.3/src/ksp/pc/impls/gamg/gamg.c >>>>>>>>>> ======== >>>>>>>>>> >>>>>>>>>> I did try other pc_gamg_type but they fails as well. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> #PETSc Option Table entries: >>>>>>>>>> -ampere_dm_mat_type aijcusparse >>>>>>>>>> -ampere_dm_vec_type cuda >>>>>>>>>> -ampere_ksp_atol 1e-15 >>>>>>>>>> -ampere_ksp_initial_guess_nonzero yes >>>>>>>>>> -ampere_ksp_reuse_preconditioner yes >>>>>>>>>> -ampere_ksp_rtol 1e-7 >>>>>>>>>> -ampere_ksp_type dgmres >>>>>>>>>> -ampere_mg_levels_esteig_ksp_max_it 10 >>>>>>>>>> -ampere_mg_levels_esteig_ksp_type cg >>>>>>>>>> -ampere_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>>> -ampere_mg_levels_ksp_type chebyshev >>>>>>>>>> -ampere_mg_levels_pc_type jacobi >>>>>>>>>> -ampere_pc_gamg_agg_nsmooths 1 >>>>>>>>>> -ampere_pc_gamg_coarse_eq_limit 10 >>>>>>>>>> -ampere_pc_gamg_reuse_interpolation true >>>>>>>>>> -ampere_pc_gamg_square_graph 1 >>>>>>>>>> -ampere_pc_gamg_threshold 0.05 >>>>>>>>>> -ampere_pc_gamg_threshold_scale .0 >>>>>>>>>> -ampere_pc_gamg_type agg >>>>>>>>>> -ampere_pc_type gamg >>>>>>>>>> -dm_mat_type aijcusparse >>>>>>>>>> -dm_vec_type cuda >>>>>>>>>> -log_view >>>>>>>>>> -poisson_dm_mat_type aijcusparse >>>>>>>>>> -poisson_dm_vec_type cuda >>>>>>>>>> -poisson_ksp_atol 1e-15 >>>>>>>>>> -poisson_ksp_initial_guess_nonzero yes >>>>>>>>>> -poisson_ksp_reuse_preconditioner yes >>>>>>>>>> -poisson_ksp_rtol 1e-7 >>>>>>>>>> -poisson_ksp_type dgmres >>>>>>>>>> -poisson_log_view >>>>>>>>>> -poisson_mg_levels_esteig_ksp_max_it 10 >>>>>>>>>> -poisson_mg_levels_esteig_ksp_type cg >>>>>>>>>> -poisson_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05 >>>>>>>>>> -poisson_mg_levels_ksp_max_it 1 >>>>>>>>>> -poisson_mg_levels_ksp_type chebyshev >>>>>>>>>> -poisson_mg_levels_pc_type jacobi >>>>>>>>>> -poisson_pc_gamg_agg_nsmooths 1 >>>>>>>>>> -poisson_pc_gamg_coarse_eq_limit 10 >>>>>>>>>> -poisson_pc_gamg_reuse_interpolation true >>>>>>>>>> -poisson_pc_gamg_square_graph 1 >>>>>>>>>> -poisson_pc_gamg_threshold 0.05 >>>>>>>>>> -poisson_pc_gamg_threshold_scale .0 >>>>>>>>>> -poisson_pc_gamg_type agg >>>>>>>>>> -poisson_pc_type gamg >>>>>>>>>> -use_mat_nearnullspace true >>>>>>>>>> #End of PETSc Option Table entries >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> >>>>>>>>>> Nicola >>>>>>>>>> >>>>>>>>>> Il giorno mar 4 ago 2020 alle ore 17:57 Mark Adams < >>>>>>>>>> mfadams at lbl.gov> ha scritto: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Aug 4, 2020 at 6:35 AM Stefano Zampini < >>>>>>>>>>> stefano.zampini at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Nicola, >>>>>>>>>>>> >>>>>>>>>>>> You are actually not using the GPU properly, since you use >>>>>>>>>>>> HYPRE preconditioning, which is CPU only. One of your solvers is actually >>>>>>>>>>>> slower on ?GPU?. >>>>>>>>>>>> For a full AMG GPU, you can use PCGAMG, with cheby smoothers >>>>>>>>>>>> and with Jacobi preconditioning. Mark can help you out with the specific >>>>>>>>>>>> command line options. >>>>>>>>>>>> When it works properly, everything related to PC application is >>>>>>>>>>>> offloaded to the GPU, and you should expect to get the well-known and >>>>>>>>>>>> branded 10x (maybe more) speedup one is expecting from GPUs during KSPSolve >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> The speedup depends on the machine, but on SUMMIT, using enough >>>>>>>>>>> CPUs to saturate the memory bus vs all 6 GPUs the speedup is a function of >>>>>>>>>>> problem subdomain size. I saw 10x at about 100K equations/process. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Doing what you want to do is one of the last optimization steps >>>>>>>>>>>> of an already optimized code before entering production. 
Yours is not even >>>>>>>>>>>> optimized for proper GPU usage yet. >>>>>>>>>>>> Also, any specific reason why you are using dgmres and fgmres? >>>>>>>>>>>> >>>>>>>>>>>> PETSc has not been designed with multi-threading in mind. You >>>>>>>>>>>> can achieve ?overlap? of the two solves by splitting the communicator. But >>>>>>>>>>>> then you need communications to let the two solutions talk to each other. >>>>>>>>>>>> >>>>>>>>>>>> Thanks >>>>>>>>>>>> Stefano >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Aug 4, 2020, at 12:04 PM, nicola varini < >>>>>>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Dear all, thanks for your replies. The reason why I've asked if >>>>>>>>>>>> it is possible to overlap poisson and ampere is because they roughly >>>>>>>>>>>> take the same amount of time. Please find in attachment the >>>>>>>>>>>> profiling logs for only CPU and only GPU. >>>>>>>>>>>> Of course it is possible to split the MPI communicator and run >>>>>>>>>>>> each solver on different subcommunicator, however this would involve more >>>>>>>>>>>> communication. >>>>>>>>>>>> Did anyone ever tried to run 2 solvers with hyperthreading? >>>>>>>>>>>> Thanks >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Il giorno dom 2 ago 2020 alle ore 14:09 Mark Adams < >>>>>>>>>>>> mfadams at lbl.gov> ha scritto: >>>>>>>>>>>> >>>>>>>>>>>>> I suspect that the Poisson and Ampere's law solve are not >>>>>>>>>>>>> coupled. You might be able to duplicate the communicator and use two >>>>>>>>>>>>> threads. You would want to configure PETSc with threadsafty and threads and >>>>>>>>>>>>> I think it could/should work, but this mode is never used by anyone. >>>>>>>>>>>>> >>>>>>>>>>>>> That said, I would not recommend doing this unless you feel >>>>>>>>>>>>> like playing in computer science, as opposed to doing application science. >>>>>>>>>>>>> The best case scenario you get a speedup of 2x. That is a strict upper >>>>>>>>>>>>> bound, but you will never come close to it. Your hardware has some balance >>>>>>>>>>>>> of CPU to GPU processing rate. Your application has a balance of volume of >>>>>>>>>>>>> work for your two solves. They have to be the same to get close to 2x >>>>>>>>>>>>> speedup and that ratio(s) has to be 1:1. To be concrete, from what little I >>>>>>>>>>>>> can guess about your applications let's assume that the cost of each of >>>>>>>>>>>>> these two solves is about the same (eg, Laplacians on your domain and the >>>>>>>>>>>>> best case scenario). But, GPU machines are configured to have roughly 1-10% >>>>>>>>>>>>> of capacity in the GPUs, these days, that gives you an upper bound of about >>>>>>>>>>>>> 10% speedup. That is noise. Upshot, unless you configure your hardware to >>>>>>>>>>>>> match this problem, and the two solves have the same cost, you will not see >>>>>>>>>>>>> close to 2x speedup. Your time is better spent elsewhere. >>>>>>>>>>>>> >>>>>>>>>>>>> Mark >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Aug 1, 2020 at 3:24 PM Jed Brown >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> You can use MPI and split the communicator so n-1 ranks >>>>>>>>>>>>>> create a DMDA for one part of your system and the other rank drives the GPU >>>>>>>>>>>>>> in the other part. 
They can all be part of the same coupled system on the >>>>>>>>>>>>>> full communicator, but PETSc doesn't currently support some ranks having >>>>>>>>>>>>>> their Vec arrays on GPU and others on host, so you'd be paying host-device >>>>>>>>>>>>>> transfer costs on each iteration (and that might swamp any performance >>>>>>>>>>>>>> benefit you would have gotten). >>>>>>>>>>>>>> >>>>>>>>>>>>>> In any case, be sure to think about the execution time of >>>>>>>>>>>>>> each part. Load balancing with matching time-to-solution for each part can >>>>>>>>>>>>>> be really hard. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Barry Smith writes: >>>>>>>>>>>>>> >>>>>>>>>>>>>> > Nicola, >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > This is really viable or practical at this time with >>>>>>>>>>>>>> PETSc. It is not impossible but requires careful coding with threads, >>>>>>>>>>>>>> another possibility is to use one half of the virtual GPUs for each solve, >>>>>>>>>>>>>> this is also not trivial. I would recommend first seeing what kind of >>>>>>>>>>>>>> performance you can get on the GPU for each type of solve and revist this >>>>>>>>>>>>>> idea in the future. >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > Barry >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> >> On Jul 31, 2020, at 9:23 AM, nicola varini < >>>>>>>>>>>>>> nicola.varini at gmail.com> wrote: >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> Hello, I would like to know if it is possible to overlap >>>>>>>>>>>>>> CPU and GPU with DMDA. >>>>>>>>>>>>>> >> I've a machine where each node has 1P100+1Haswell. >>>>>>>>>>>>>> >> I've to resolve Poisson and Ampere equation for each time >>>>>>>>>>>>>> step. >>>>>>>>>>>>>> >> I'm using 2D DMDA for each of them. Would be possible to >>>>>>>>>>>>>> compute poisson >>>>>>>>>>>>>> >> and ampere equation at the same time? One on CPU and the >>>>>>>>>>>>>> other on GPU? >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> >> Thanks >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Aug 18 08:30:15 2020 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 18 Aug 2020 09:30:15 -0400 Subject: [petsc-users] ParMETIS vs. CHACO when no partitioning is made In-Reply-To: References: <548DC35C-0D80-409E-B360-D7E54076111D@petsc.dev> Message-ID: On Mon, Aug 17, 2020 at 7:05 PM Fande Kong wrote: > IIRC, Chaco does not produce an arbitrary number of subdomains. The number > needs to be like 2^n. > No, Chaco can do an arbitrary number. Thanks, Matt > ParMETIS and PTScotch are much better, and they are production-level code. > If there is no particular reason, I would like to suggest staying with > ParMETIS and PTScotch. > > Thanks, > > Fande, > > > > On Fri, Aug 14, 2020 at 10:07 AM Eda Oktay wrote: > >> Dear Barry, >> >> Thank you for answering. I am sending a sample code and a binary file. >> >> Thanks! >> >> Eda >> >> Barry Smith , 14 A?u 2020 Cum, 18:49 tarihinde ?unu >> yazd?: >> >>> >>> Could be a bug in Chaco or its call from PETSc for the special case >>> of one process. Could you send a sample code that demonstrates the problem? >>> >>> Barry >>> >>> >>> > On Aug 14, 2020, at 8:53 AM, Eda Oktay wrote: >>> > >>> > Hi all, >>> > >>> > I am trying to try something. 
I am using the same MatPartitioning >>> codes for both CHACO and ParMETIS: >>> > >>> > ierr = >>> MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); >>> > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); >>> > ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); >>> > >>> > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); >>> > ierr = MatPartitioningApply(part,&partitioning);CHKERRQ(ierr); >>> > >>> > After obtaining the IS, I apply this to my original nonsymmetric >>> matrix and try to get an approximate edge cut. >>> > >>> > Except for 1 partitioning, my program completely works for 2,4 and 16 >>> partitionings. However, for 1, ParMETIS gives results where CHACO I guess >>> doesn't since I am getting errors about the index set. >>> > >>> > What is the difference between CHACO and ParMETIS that one works for 1 >>> partitioning and one doesn't? >>> > >>> > Thanks! >>> > >>> > Eda >>> >>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From adantra at gmail.com Tue Aug 18 12:14:08 2020 From: adantra at gmail.com (Adolfo Rodriguez) Date: Tue, 18 Aug 2020 12:14:08 -0500 Subject: [petsc-users] Memory leak in snesnpc ? Message-ID: I am suspecting that there is a memory leak in the implementation of non-linear preconditioners in PETSc. When I use the following options I see the memory usage increase (using the windows process monitor) during consecutive time steps. PetscOptionsSetValue(NULL, "-snes_type", "qn"); PetscOptionsSetValue(NULL, "-snes_qn_monitor", ""); PetscOptionsSetValue(NULL, "-snes_qn_scale_type ", "jacobian"); PetscOptionsSetValue(NULL, "-npc_snes_max_it", max_it); PetscOptionsSetValue(NULL, "-npc_snes_type", "newtonls"); PetscOptionsSetValue(NULL, "-npc_pc_factor_levels", ilu_level); PetscOptionsSetValue(NULL, "-npc_ksp_rtol", s_ksp_rtol); PetscOptionsSetValue(NULL, "-npc_snes_linesearch_type", "bt"); PetscOptionsSetValue(NULL, "-npc_snes_linesearch_max_it", "5"); SNESSetFromOptions(snes); I destroy the context after each time step However, if I don't use a non-linear preconditioner, ie., PetscOptionsSetValue(NULL, "-snes_converged_reason", ""); PetscOptionsSetValue(NULL, "-snes_type", "qn"); PetscOptionsSetValue(NULL, "-snes_qn_monitor", ""); PetscOptionsSetValue(NULL, "-snes_qn_scale_type ", "jacobian"); SNESSetFromOptions(snes); Then everything works fine. Am I missing something? Thanks! Adolfo -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 18 13:01:08 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 18 Aug 2020 13:01:08 -0500 Subject: [petsc-users] Memory leak in snesnpc ? In-Reply-To: References: Message-ID: > On Aug 18, 2020, at 12:14 PM, Adolfo Rodriguez wrote: > > I am suspecting that there is a memory leak in the implementation of non-linear preconditioners in PETSc. It is possible, in the configurations we test there are no memory leaks, but perhaps your configuration does something we do not test. Run just one or two time-steps with command line option -malloc_debug (for the latest PETSc release). It should print at the end any unfreed memory. What happens if you do not destroy the context at each time-step but keep it for all time-steps? Is that possible? Still a memory leak? 
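A minimal sketch of the reuse Barry is asking about, with one SNES kept for the whole run instead of being created and destroyed every time step (the loop structure and names are illustrative, not the actual application code; error checking omitted):

  SNES     snes;
  Vec      x;              /* solution vector, assumed created elsewhere and kept across steps */
  PetscInt step, nsteps = 100;

  SNESCreate(PETSC_COMM_WORLD, &snes);
  /* set the residual/Jacobian callbacks and the -snes_type qn / npc_* options once */
  SNESSetFromOptions(snes);
  for (step = 0; step < nsteps; step++) {
    /* update time-dependent data for this step */
    SNESSolve(snes, NULL, x);
  }
  SNESDestroy(&snes);

Running one or two steps with -malloc_debug, as Barry suggests, should then report any PETSc memory that is still allocated at the end of the run.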
Barry > > When I use the following options I see the memory usage increase (using the windows process monitor) during consecutive time steps. > > PetscOptionsSetValue(NULL, "-snes_type", "qn"); > PetscOptionsSetValue(NULL, "-snes_qn_monitor", ""); > PetscOptionsSetValue(NULL, "-snes_qn_scale_type ", "jacobian"); > PetscOptionsSetValue(NULL, "-npc_snes_max_it", max_it); > PetscOptionsSetValue(NULL, "-npc_snes_type", "newtonls"); > PetscOptionsSetValue(NULL, "-npc_pc_factor_levels", ilu_level); > PetscOptionsSetValue(NULL, "-npc_ksp_rtol", s_ksp_rtol); > PetscOptionsSetValue(NULL, "-npc_snes_linesearch_type", "bt"); > PetscOptionsSetValue(NULL, "-npc_snes_linesearch_max_it", "5"); > > SNESSetFromOptions(snes); > > I destroy the context after each time step > > > However, if I don't use a non-linear preconditioner, ie., > > PetscOptionsSetValue(NULL, "-snes_converged_reason", ""); > PetscOptionsSetValue(NULL, "-snes_type", "qn"); > PetscOptionsSetValue(NULL, "-snes_qn_monitor", ""); > PetscOptionsSetValue(NULL, "-snes_qn_scale_type ", "jacobian"); > SNESSetFromOptions(snes); > > Then everything works fine. > > Am I missing something? > > Thanks! > Adolfo > From salazardetro1 at llnl.gov Tue Aug 18 13:11:15 2020 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Tue, 18 Aug 2020 18:11:15 +0000 Subject: [petsc-users] Error calling TSSolve more than once with TSSetSaveTrajectory Message-ID: Hello, If I set up my TS with TSSetSaveTrajectory() and then call TSSolve() more than once, the second time I get this error: [0] TSSolve() line 4102 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/interface/ts.c [0] TSTrajectorySet() line 73 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/trajectory/interface/traj.c [0] TSHistoryUpdate() line 82 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/interface/tshistory.c [0] Petsc has generated inconsistent data [0] History id should be unique If I call TSAdjointSolve() in between calls to TSSolve(), there is no problem. Is this the intended behavior for TSSetSaveTrajectory? Meaning that TSAdjointSolve() always has to be called right after TSSolve() if TSSetSaveTrajectory() is called. I need to call TSSolve() independently of TSAdjointSolve() whenever I want to perform a line search during the optimization. Is there a workaround like creating a new trajectory whenever TSSolve() is called? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Tue Aug 18 17:46:05 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 18 Aug 2020 22:46:05 +0000 Subject: [petsc-users] Error calling TSSolve more than once with TSSetSaveTrajectory In-Reply-To: References: Message-ID: <6688A9AB-BF39-4D04-B437-E73FB5E5B0E9@anl.gov> To get rid of this error, you can disable TSHistory with the command line option -ts_trajectory_use_history 0 or set up your TS with TSGetTrajectory(ts, &tj); TSTrajectorySetUseHistory(tj, PETSC_FALSE); It is a known issue, but not the intended behavior for TSSetSaveTrajectory. I think TSHistory should be disabled by default in TS. It is actually disabled in the setup for TSAdjoint. So if you call TSAdjointSolve once, this error should not occur in the following calls to TSSolve. Hong(Mr.) 
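Put together, the workaround looks roughly like this in C (error checking omitted; the same effect can be had from the command line with -ts_trajectory_use_history 0):

  TS           ts;   /* already created and configured */
  TSTrajectory tj;
  Vec          u;    /* initial condition / solution */

  TSSetSaveTrajectory(ts);
  TSGetTrajectory(ts, &tj);
  TSTrajectorySetUseHistory(tj, PETSC_FALSE);  /* avoids the "History id should be unique" error */

  /* TSSolve can now be called repeatedly, e.g. inside a line search,
     with TSAdjointSolve only when a gradient is actually needed */
  TSSolve(ts, u);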
On Aug 18, 2020, at 1:11 PM, Salazar De Troya, Miguel via petsc-users > wrote: Hello, If I set up my TS with TSSetSaveTrajectory() and then call TSSolve() more than once, the second time I get this error: [0] TSSolve() line 4102 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/interface/ts.c [0] TSTrajectorySet() line 73 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/trajectory/interface/traj.c [0] TSHistoryUpdate() line 82 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/interface/tshistory.c [0] Petsc has generated inconsistent data [0] History id should be unique If I call TSAdjointSolve() in between calls to TSSolve(), there is no problem. Is this the intended behavior for TSSetSaveTrajectory? Meaning that TSAdjointSolve() always has to be called right after TSSolve() if TSSetSaveTrajectory() is called. I need to call TSSolve() independently of TSAdjointSolve() whenever I want to perform a line search during the optimization. Is there a workaround like creating a new trajectory whenever TSSolve() is called? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From salazardetro1 at llnl.gov Tue Aug 18 18:07:00 2020 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Tue, 18 Aug 2020 23:07:00 +0000 Subject: [petsc-users] Error calling TSSolve more than once with TSSetSaveTrajectory In-Reply-To: <6688A9AB-BF39-4D04-B437-E73FB5E5B0E9@anl.gov> References: <6688A9AB-BF39-4D04-B437-E73FB5E5B0E9@anl.gov> Message-ID: <1BFDDBFD-4C36-40AA-9342-342742967C00@llnl.gov> Thank you, Does this compromise the ability of TSAdjoint to use checkpointing schemes to save memory? Miguel From: "Zhang, Hong" Date: Tuesday, August 18, 2020 at 3:46 PM To: "Salazar De Troya, Miguel" Cc: "Zhang, Hong via petsc-users" Subject: Re: [petsc-users] Error calling TSSolve more than once with TSSetSaveTrajectory To get rid of this error, you can disable TSHistory with the command line option -ts_trajectory_use_history 0 or set up your TS with TSGetTrajectory(ts, &tj); TSTrajectorySetUseHistory(tj, PETSC_FALSE); It is a known issue, but not the intended behavior for TSSetSaveTrajectory. I think TSHistory should be disabled by default in TS. It is actually disabled in the setup for TSAdjoint. So if you call TSAdjointSolve once, this error should not occur in the following calls to TSSolve. Hong(Mr.) On Aug 18, 2020, at 1:11 PM, Salazar De Troya, Miguel via petsc-users > wrote: Hello, If I set up my TS with TSSetSaveTrajectory() and then call TSSolve() more than once, the second time I get this error: [0] TSSolve() line 4102 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/interface/ts.c [0] TSTrajectorySet() line 73 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/trajectory/interface/traj.c [0] TSHistoryUpdate() line 82 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/interface/tshistory.c [0] Petsc has generated inconsistent data [0] History id should be unique If I call TSAdjointSolve() in between calls to TSSolve(), there is no problem. Is this the intended behavior for TSSetSaveTrajectory? Meaning that TSAdjointSolve() always has to be called right after TSSolve() if TSSetSaveTrajectory() is called. 
I need to call TSSolve() independently of TSAdjointSolve() whenever I want to perform a line search during the optimization. Is there a workaround like creating a new trajectory whenever TSSolve() is called? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... URL: From salazardetro1 at llnl.gov Tue Aug 18 18:12:46 2020 From: salazardetro1 at llnl.gov (Salazar De Troya, Miguel) Date: Tue, 18 Aug 2020 23:12:46 +0000 Subject: [petsc-users] Error calling TSSolve more than once with TSSetSaveTrajectory In-Reply-To: <6688A9AB-BF39-4D04-B437-E73FB5E5B0E9@anl.gov> References: <6688A9AB-BF39-4D04-B437-E73FB5E5B0E9@anl.gov> Message-ID: <9BC30F34-0374-4CEC-B826-169A9CE965B3@llnl.gov> Would it be possible for you to share a patch, PR or to point where in the ts.c file to add those lines? I am working from petsc4py, it would be easy for me to modify my petsc local repo than creating the petsc4py interface functions for those two calls. Thanks Miguel From: "Zhang, Hong" Date: Tuesday, August 18, 2020 at 3:46 PM To: "Salazar De Troya, Miguel" Cc: "Zhang, Hong via petsc-users" Subject: Re: [petsc-users] Error calling TSSolve more than once with TSSetSaveTrajectory To get rid of this error, you can disable TSHistory with the command line option -ts_trajectory_use_history 0 or set up your TS with TSGetTrajectory(ts, &tj); TSTrajectorySetUseHistory(tj, PETSC_FALSE); It is a known issue, but not the intended behavior for TSSetSaveTrajectory. I think TSHistory should be disabled by default in TS. It is actually disabled in the setup for TSAdjoint. So if you call TSAdjointSolve once, this error should not occur in the following calls to TSSolve. Hong(Mr.) On Aug 18, 2020, at 1:11 PM, Salazar De Troya, Miguel via petsc-users > wrote: Hello, If I set up my TS with TSSetSaveTrajectory() and then call TSSolve() more than once, the second time I get this error: [0] TSSolve() line 4102 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/interface/ts.c [0] TSTrajectorySet() line 73 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/trajectory/interface/traj.c [0] TSHistoryUpdate() line 82 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/interface/tshistory.c [0] Petsc has generated inconsistent data [0] History id should be unique If I call TSAdjointSolve() in between calls to TSSolve(), there is no problem. Is this the intended behavior for TSSetSaveTrajectory? Meaning that TSAdjointSolve() always has to be called right after TSSolve() if TSSetSaveTrajectory() is called. I need to call TSSolve() independently of TSAdjointSolve() whenever I want to perform a line search during the optimization. Is there a workaround like creating a new trajectory whenever TSSolve() is called? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Tue Aug 18 18:34:14 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 18 Aug 2020 18:34:14 -0500 Subject: [petsc-users] Error calling TSSolve more than once with TSSetSaveTrajectory In-Reply-To: <9BC30F34-0374-4CEC-B826-169A9CE965B3@llnl.gov> References: <6688A9AB-BF39-4D04-B437-E73FB5E5B0E9@anl.gov> <9BC30F34-0374-4CEC-B826-169A9CE965B3@llnl.gov> Message-ID: export PETSC_OPTIONS="-ts_trajectory_use_history 0" before starting python or os.environ['PETSC_OPTIONS'] = '-ts_trajectory_use_history 0' is the easiest. > On Aug 18, 2020, at 6:12 PM, Salazar De Troya, Miguel via petsc-users wrote: > > Would it be possible for you to share a patch, PR or to point where in the ts.c file to add those lines? I am working from petsc4py, it would be easy for me to modify my petsc local repo than creating the petsc4py interface functions for those two calls. > > Thanks > Miguel > > From: "Zhang, Hong" > > Date: Tuesday, August 18, 2020 at 3:46 PM > To: "Salazar De Troya, Miguel" > > Cc: "Zhang, Hong via petsc-users" > > Subject: Re: [petsc-users] Error calling TSSolve more than once with TSSetSaveTrajectory > > To get rid of this error, you can disable TSHistory with the command line option -ts_trajectory_use_history 0 > > or set up your TS with > > TSGetTrajectory(ts, &tj); > TSTrajectorySetUseHistory(tj, PETSC_FALSE); > > It is a known issue, but not the intended behavior for TSSetSaveTrajectory. I think TSHistory should be disabled by default in TS. It is actually disabled in the setup for TSAdjoint. So if you call TSAdjointSolve once, this error should not occur in the following calls to TSSolve. > > Hong(Mr.) > > >> On Aug 18, 2020, at 1:11 PM, Salazar De Troya, Miguel via petsc-users > wrote: >> >> Hello, >> >> If I set up my TS with TSSetSaveTrajectory() and then call TSSolve() more than once, the second time I get this error: >> >> [0] TSSolve() line 4102 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/interface/ts.c >> [0] TSTrajectorySet() line 73 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/trajectory/interface/traj.c >> [0] TSHistoryUpdate() line 82 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/interface/tshistory.c >> [0] Petsc has generated inconsistent data >> [0] History id should be unique >> >> If I call TSAdjointSolve() in between calls to TSSolve(), there is no problem. Is this the intended behavior for TSSetSaveTrajectory? Meaning that TSAdjointSolve() always has to be called right after TSSolve() if TSSetSaveTrajectory() is called. I need to call TSSolve() independently of TSAdjointSolve() whenever I want to perform a line search during the optimization. >> >> Is there a workaround like creating a new trajectory whenever TSSolve() is called? >> >> Thanks >> Miguel >> >> >> >> Miguel A. Salazar de Troya >> Postdoctoral Researcher, Lawrence Livermore National Laboratory >> B141 >> Rm: 1085-5 >> Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hongzhang at anl.gov Tue Aug 18 18:43:08 2020 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 18 Aug 2020 23:43:08 +0000 Subject: [petsc-users] Error calling TSSolve more than once with TSSetSaveTrajectory In-Reply-To: <1BFDDBFD-4C36-40AA-9342-342742967C00@llnl.gov> References: <6688A9AB-BF39-4D04-B437-E73FB5E5B0E9@anl.gov> <1BFDDBFD-4C36-40AA-9342-342742967C00@llnl.gov> Message-ID: <13E61036-6CED-4104-A80C-71366D83D05C@anl.gov> It does not compromise the checkpointing ability since TSAdjoint does not rely on TSHistory. diff --git a/src/ts/trajectory/interface/traj.c b/src/ts/trajectory/interface/traj.c index 465a52f5cf..8267fd73a3 100644 --- a/src/ts/trajectory/interface/traj.c +++ b/src/ts/trajectory/interface/traj.c @@ -450,7 +450,7 @@ PetscErrorCode TSTrajectoryCreate(MPI_Comm comm,TSTrajectory *tj) t->adjoint_solve_mode = PETSC_TRUE; t->solution_only = PETSC_FALSE; t->keepfiles = PETSC_FALSE; - t->usehistory = PETSC_TRUE; + t->usehistory = PETSC_FALSE; *tj = t; ierr = TSTrajectorySetFiletemplate(t,"TS-%06D.bin");CHKERRQ(ierr); PetscFunctionReturn(0) Hong (Mr.) On Aug 18, 2020, at 6:07 PM, Salazar De Troya, Miguel > wrote: Thank you, Does this compromise the ability of TSAdjoint to use checkpointing schemes to save memory? Miguel From: "Zhang, Hong" > Date: Tuesday, August 18, 2020 at 3:46 PM To: "Salazar De Troya, Miguel" > Cc: "Zhang, Hong via petsc-users" > Subject: Re: [petsc-users] Error calling TSSolve more than once with TSSetSaveTrajectory To get rid of this error, you can disable TSHistory with the command line option -ts_trajectory_use_history 0 or set up your TS with TSGetTrajectory(ts, &tj); TSTrajectorySetUseHistory(tj, PETSC_FALSE); It is a known issue, but not the intended behavior for TSSetSaveTrajectory. I think TSHistory should be disabled by default in TS. It is actually disabled in the setup for TSAdjoint. So if you call TSAdjointSolve once, this error should not occur in the following calls to TSSolve. Hong(Mr.) On Aug 18, 2020, at 1:11 PM, Salazar De Troya, Miguel via petsc-users > wrote: Hello, If I set up my TS with TSSetSaveTrajectory() and then call TSSolve() more than once, the second time I get this error: [0] TSSolve() line 4102 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/interface/ts.c [0] TSTrajectorySet() line 73 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/trajectory/interface/traj.c [0] TSHistoryUpdate() line 82 in /Users/salazardetro1/scicomp_libraries/firedrake-debug/firedrake/src/petsc/src/ts/interface/tshistory.c [0] Petsc has generated inconsistent data [0] History id should be unique If I call TSAdjointSolve() in between calls to TSSolve(), there is no problem. Is this the intended behavior for TSSetSaveTrajectory? Meaning that TSAdjointSolve() always has to be called right after TSSolve() if TSSetSaveTrajectory() is called. I need to call TSSolve() independently of TSAdjointSolve() whenever I want to perform a line search during the optimization. Is there a workaround like creating a new trajectory whenever TSSolve() is called? Thanks Miguel Miguel A. Salazar de Troya Postdoctoral Researcher, Lawrence Livermore National Laboratory B141 Rm: 1085-5 Ph: 1(925) 422-6411 -------------- next part -------------- An HTML attachment was scrubbed... 
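For reference, a sketch of the kind of re-solve loop this change is meant to allow, written against the C API (declarations and surrounding setup trimmed; u, u0, lambda and n_trials are placeholders, and resetting the time and step counter this way is one possible restart recipe, not the only one):

    ierr = TSSetSaveTrajectory(ts);CHKERRQ(ierr);
    for (k = 0; k < n_trials; k++) {               /* line-search trial points */
      ierr = VecCopy(u0, u);CHKERRQ(ierr);         /* reset the state */
      ierr = TSSetTime(ts, 0.0);CHKERRQ(ierr);     /* reset time ... */
      ierr = TSSetStepNumber(ts, 0);CHKERRQ(ierr); /* ... and the step counter */
      ierr = TSSolve(ts, u);CHKERRQ(ierr);         /* objective evaluation only */
    }
    /* lambda must hold d(objective)/d(final state) before the adjoint solve */
    ierr = TSSetCostGradients(ts, 1, &lambda, NULL);CHKERRQ(ierr);
    ierr = TSAdjointSolve(ts);CHKERRQ(ierr);       /* gradient at the accepted point */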
URL: From bhatiamanav at gmail.com Wed Aug 19 19:30:11 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Wed, 19 Aug 2020 19:30:11 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long Message-ID: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> Hi, I have an application that uses the KSP solver and I am looking at the performance with increasing system size. I am currently running on MacBook Pro with 32 GB memory and Petsc obtained from GitHub (commit df0e43005dbe6ff47eff22a32b336a6c37d02c3a). The application runs fine till about 2e6 DoFs using gamg without any problems. However, when I try a larger system size, in this case with 5.4e6 DoFs, the application hangs for an hour and I have to kill the MPI processes. I tried to use Xcode Instruments to profile the 8 MPI processes and I have attached a screenshot of the recorded results from each process. All processes are stuck inside MatAssemblyEnd, but at different function calls. I am not sure how to debug this issue, and would greatly appreciate any guidance. For reference, I am calling PETSc with the following options: -ksp_type gmres -pc_type gamg -mat_block_size 3 -mg_levels_ksp_max_it 4 -ksp_monitor -ksp_converged_reason Regards, Manav -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.16.08 PM.png Type: image/png Size: 211493 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.16.06 PM.png Type: image/png Size: 202337 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.16.04 PM.png Type: image/png Size: 184314 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.16.02 PM.png Type: image/png Size: 185896 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.15.59 PM.png Type: image/png Size: 804446 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.15.57 PM.png Type: image/png Size: 851646 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.15.55 PM.png Type: image/png Size: 787467 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.15.52 PM.png Type: image/png Size: 966045 bytes Desc: not available URL: From jed at jedbrown.org Wed Aug 19 19:42:27 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 19 Aug 2020 18:42:27 -0600 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> Message-ID: <87k0xui1vw.fsf@jedbrown.org> Can you share a couple example stack traces from that debugging? About how many nonzeros per row? Manav Bhatia writes: > Hi, > > I have an application that uses the KSP solver and I am looking at the performance with increasing system size. I am currently running on MacBook Pro with 32 GB memory and Petsc obtained from GitHub (commit df0e43005dbe6ff47eff22a32b336a6c37d02c3a). 
> > The application runs fine till about 2e6 DoFs using gamg without any problems. > > However, when I try a larger system size, in this case with 5.4e6 DoFs, the application hangs for an hour and I have to kill the MPI processes. > > I tried to use Xcode Instruments to profile the 8 MPI processes and I have attached a screenshot of the recorded results from each process. All processes are stuck inside MatAssemblyEnd, but at different function calls. > > I am not sure how to debug this issue, and would greatly appreciate any guidance. > > For reference, I am calling PETSc with the following options: > -ksp_type gmres -pc_type gamg -mat_block_size 3 -mg_levels_ksp_max_it 4 -ksp_monitor -ksp_converged_reason > > Regards, > Manav From bhatiamanav at gmail.com Wed Aug 19 19:50:33 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Wed, 19 Aug 2020 19:50:33 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <87k0xui1vw.fsf@jedbrown.org> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> Message-ID: Thanks for the followup, Jed. > On Aug 19, 2020, at 7:42 PM, Jed Brown wrote: > > Can you share a couple example stack traces from that debugging? Do you mean a similar screenshot at different system sizes? Or a different format? > About how many nonzeros per row? This is a 3D elasticity run with Hex8 elements. So, each row has 81 non-zero entries, although I have not verified that (I will do so now). Is there a command line argument that will print this for the matrix? Although, on second thought that will not be printed unless the Assembly routine has finished. > > Manav Bhatia writes: > >> Hi, >> >> I have an application that uses the KSP solver and I am looking at the performance with increasing system size. I am currently running on MacBook Pro with 32 GB memory and Petsc obtained from GitHub (commit df0e43005dbe6ff47eff22a32b336a6c37d02c3a). >> >> The application runs fine till about 2e6 DoFs using gamg without any problems. >> >> However, when I try a larger system size, in this case with 5.4e6 DoFs, the application hangs for an hour and I have to kill the MPI processes. >> >> I tried to use Xcode Instruments to profile the 8 MPI processes and I have attached a screenshot of the recorded results from each process. All processes are stuck inside MatAssemblyEnd, but at different function calls. >> >> I am not sure how to debug this issue, and would greatly appreciate any guidance. >> >> For reference, I am calling PETSc with the following options: >> -ksp_type gmres -pc_type gamg -mat_block_size 3 -mg_levels_ksp_max_it 4 -ksp_monitor -ksp_converged_reason >> >> Regards, >> Manav From jed at jedbrown.org Wed Aug 19 19:56:00 2020 From: jed at jedbrown.org (Jed Brown) Date: Wed, 19 Aug 2020 18:56:00 -0600 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> Message-ID: <87h7syi19b.fsf@jedbrown.org> Manav Bhatia writes: > Thanks for the followup, Jed. > >> On Aug 19, 2020, at 7:42 PM, Jed Brown wrote: >> >> Can you share a couple example stack traces from that debugging? > > Do you mean a similar screenshot at different system sizes? Or a different format? Sorry, I missed the screenshots (they were tucked away in the text/html and I was reading the text/plain version of your message). >> About how many nonzeros per row? > > This is a 3D elasticity run with Hex8 elements. 
So, each row has 81 non-zero entries, although I have not verified that (I will do so now). Is there a command line argument that will print this for the matrix? Although, on second thought that will not be printed unless the Assembly routine has finished. You could run a smaller problem size with -snes_view, which would show matrix stats. Can you try running with -matstash_legacy? What version of Open MPI is this? From pranayreddy865 at gmail.com Wed Aug 19 20:00:11 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Wed, 19 Aug 2020 18:00:11 -0700 Subject: [petsc-users] Segmentation fault when using KSPSetInitialGuessNonzero Message-ID: Hi, I am trying to build a 2D poisson solver using BiCGStab in FORTRAN 90. After I form the A matrix and create the x and b vectors, I call the routine KSPSetInitialGuessNonzero, so that I can pass a specific initialization of the x-vector to the solver. However, when I do this I run into a Segmentation Violation at this line. I am not sure why I am running into this problem. I am attaching the screenshot of the error message, and would appreciate any help you can provide. Please let me know if you need any further information. Thank you. Best Regards, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PETSc_SEGV.JPG Type: image/jpeg Size: 197718 bytes Desc: not available URL: From bhatiamanav at gmail.com Wed Aug 19 20:06:55 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Wed, 19 Aug 2020 20:06:55 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <87h7syi19b.fsf@jedbrown.org> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> Message-ID: <95AC8133-4B3E-4964-B64D-7A11405D41A2@gmail.com> > On Aug 19, 2020, at 7:56 PM, Jed Brown wrote: > > Manav Bhatia writes: > >> Thanks for the followup, Jed. >> >>> On Aug 19, 2020, at 7:42 PM, Jed Brown wrote: >>> >>> Can you share a couple example stack traces from that debugging? >> >> Do you mean a similar screenshot at different system sizes? Or a different format? > > Sorry, I missed the screenshots (they were tucked away in the text/html and I was reading the text/plain version of your message). Glad you found them. Please let me know if more information would help. > >>> About how many nonzeros per row? >> >> This is a 3D elasticity run with Hex8 elements. So, each row has 81 non-zero entries, although I have not verified that (I will do so now). Is there a command line argument that will print this for the matrix? Although, on second thought that will not be printed unless the Assembly routine has finished. > > You could run a smaller problem size with -snes_view, which would show matrix stats. Here is the information from a case with 2e6 DoFs. KSP Object: 8 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 8 MPI processes type: gamg type is MULTIPLICATIVE, levels=5 cycles=v Cycles per PCApply=1 Using externally compute Galerkin coarse grid matrices GAMG specific options Threshold for dropping small values in graph on each level = 0. 0. 0. 
Threshold scaling factor for each level not specified = 1. AGG specific options Symmetric graph false Number of levels to square graph 1 Number smoothing steps 1 Complexity: grid = 1.16005 Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 8 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 8 MPI processes type: bjacobi number of blocks = 8 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=12, cols=12, bs=6 package used to perform factorization: petsc total: nonzeros=144, allocated nonzeros=144 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 3 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=12, cols=12, bs=6 total: nonzeros=144, allocated nonzeros=144 total number of mallocs used during MatSetValues calls=0 using I-node routines: found 3 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=12, cols=12, bs=6 total: nonzeros=144, allocated nonzeros=144 total number of mallocs used during MatSetValues calls=0 using I-node (on process 0) routines: found 3 nodes, limit used is 5 Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 8 MPI processes type: chebyshev eigenvalue estimates used: min = 0.16303, max = 1.79333 eigenvalues estimate via gmres min 0.0108937, max 1.6303 eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_1_esteig_) 8 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=4, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_1_) 8 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. 
linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=240, cols=240, bs=6 total: nonzeros=51912, allocated nonzeros=51912 total number of mallocs used during MatSetValues calls=0 using I-node (on process 0) routines: found 13 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 2 ------------------------------- KSP Object: (mg_levels_2_) 8 MPI processes type: chebyshev eigenvalue estimates used: min = 0.146755, max = 1.6143 eigenvalues estimate via gmres min 0.00483441, max 1.46755 eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_2_esteig_) 8 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=4, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_2_) 8 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=6336, cols=6336, bs=6 total: nonzeros=3902760, allocated nonzeros=3902760 total number of mallocs used during MatSetValues calls=0 using nonscalable MatPtAP() implementation using I-node (on process 0) routines: found 228 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 3 ------------------------------- KSP Object: (mg_levels_3_) 8 MPI processes type: chebyshev eigenvalue estimates used: min = 0.1525, max = 1.67751 eigenvalues estimate via gmres min 0.0281517, max 1.525 eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_3_esteig_) 8 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=4, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_3_) 8 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: 8 MPI processes type: mpiaij rows=87246, cols=87246, bs=6 total: nonzeros=21279420, allocated nonzeros=21279420 total number of mallocs used during MatSetValues calls=0 using nonscalable MatPtAP() implementation using I-node (on process 0) routines: found 3552 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) Down solver (pre-smoother) on level 4 ------------------------------- KSP Object: (mg_levels_4_) 8 MPI processes type: chebyshev eigenvalue estimates used: min = 0.160784, max = 1.76862 eigenvalues estimate via gmres min 0.0293826, max 1.60784 eigenvalues estimated using gmres with translations [0. 0.1; 0. 
1.1] KSP Object: (mg_levels_4_esteig_) 8 MPI processes type: gmres restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test estimating eigenvalues using noisy right hand side maximum iterations=4, nonzero initial guess tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_levels_4_) 8 MPI processes type: sor type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=2000103, cols=2000103, bs=3 total: nonzeros=157666509, allocated nonzeros=160054056 total number of mallocs used during MatSetValues calls=0 has attached near null space using I-node (on process 0) routines: found 86672 nodes, limit used is 5 Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix = precond matrix: Mat Object: () 8 MPI processes type: mpiaij rows=2000103, cols=2000103, bs=3 total: nonzeros=157666509, allocated nonzeros=160054056 total number of mallocs used during MatSetValues calls=0 has attached near null space using I-node (on process 0) routines: found 86672 nodes, limit used is 5 > > Can you try running with -matstash_legacy? Will do and report results shortly. > > What version of Open MPI is this? This is MPI 4.0.1 installed using macports: InfiHorizon:opt manav$ mpiexec-openmpi-clang --version mpiexec-openmpi-clang (OpenRTE) 4.0.1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhatiamanav at gmail.com Wed Aug 19 20:14:41 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Wed, 19 Aug 2020 20:14:41 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <95AC8133-4B3E-4964-B64D-7A11405D41A2@gmail.com> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <95AC8133-4B3E-4964-B64D-7A11405D41A2@gmail.com> Message-ID: <3B99E2E7-80E5-415B-82E6-9107560F4EED@gmail.com> > On Aug 19, 2020, at 8:06 PM, Manav Bhatia wrote: > >> Can you try running with -matstash_legacy? > > Will do and report results shortly. > -ksp_type gmres -pc_type gamg -mat_block_size 3 -mg_levels_ksp_max_it 4 -ksp_monitor -ksp_converged_reason -ksp_view -matstash_legacy Running with this option results in hanging of the processes in MatAssemblyEnd. Attached is a screenshot. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 8.13.06 PM.png Type: image/png Size: 904834 bytes Desc: not available URL: From pranayreddy865 at gmail.com Wed Aug 19 20:44:51 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Wed, 19 Aug 2020 18:44:51 -0700 Subject: [petsc-users] Segmentation fault when using KSPSetInitialGuessNonzero In-Reply-To: References: Message-ID: Problem is solved. I was calling a KSP routine before creating the KSP context. Best Regards, Pranay. ? On Wed, Aug 19, 2020 at 6:00 PM baikadi pranay wrote: > Hi, > > I am trying to build a 2D poisson solver using BiCGStab in FORTRAN 90. 
> After I form the A matrix and create the x and b vectors, I call the > routine KSPSetInitialGuessNonzero, so that I can pass a specific > initialization of the x-vector to the solver. However, when I do this I run > into a Segmentation Violation at this line. I am not sure why I am running > into this problem. I am attaching the screenshot of the error message, and > would appreciate any help you can provide. > > Please let me know if you need any further information. > > Thank you. > > Best Regards, > Pranay. > ? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Wed Aug 19 21:05:02 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Wed, 19 Aug 2020 19:05:02 -0700 Subject: [petsc-users] On re usability of A matrix and b vector Message-ID: Hello, I am trying to solve the poisson equation iteratively using BiCGStab in FORTRAN 90. After every call to KSPSolve, I update the central coefficients of A matrix and the b vector (and then solve the new linear equation system, repeating the process until convergence is achieved). I want to know whether the A matrix and b vector that are created initially can be used in the iteration process or do I need to create a new A matrix and b vector in each iteration. Please let me know if you need any further information. Thank you. Sincerely, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Aug 19 21:39:22 2020 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 19 Aug 2020 22:39:22 -0400 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <87h7syi19b.fsf@jedbrown.org> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> Message-ID: On Wed, Aug 19, 2020 at 8:55 PM Jed Brown wrote: > Manav Bhatia writes: > > > Thanks for the followup, Jed. > > > >> On Aug 19, 2020, at 7:42 PM, Jed Brown wrote: > >> > >> Can you share a couple example stack traces from that debugging? > > > > Do you mean a similar screenshot at different system sizes? Or a > different format? > > Sorry, I missed the screenshots (they were tucked away in the text/html > and I was reading the text/plain version of your message). > > >> About how many nonzeros per row? > > > > This is a 3D elasticity run with Hex8 elements. So, each row has 81 > non-zero entries, although I have not verified that (I will do so now). Is > there a command line argument that will print this for the matrix? > Although, on second thought that will not be printed unless the Assembly > routine has finished. > > You could run a smaller problem size with -snes_view, which would show > matrix stats. > > Can you try running with -matstash_legacy? > > What version of Open MPI is this? > Jed is more knowledgeable about the communication, but I have a simple question about the FEM method. Normally, the way we divide unknowns is that the only unknowns which might have entries computed off-process are those on the partition boundary. However, it sounds like you have a huge number of communicated values. Is it possible that the division of rows in your matrix does not match the division of the cells you compute element matrices for? Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Wed Aug 19 23:05:58 2020 From: bsmith at petsc.dev (Barry Smith) Date: Wed, 19 Aug 2020 23:05:58 -0500 Subject: [petsc-users] On re usability of A matrix and b vector In-Reply-To: References: Message-ID: KSP knows if the matrix has changed and rebuilds the parts of the preconditioner it needs if the matrix has changed. When the matrix has not changed it uses the exact same preconditioner as before. Barry This "trick" is done by having an integer state variable inside the matrix object. Each time the matrix is changed by MatSetValues() etc. the state variable is incremented. KSPSolve() keeps a record of the state variable of the matrix for each call to SNESSolve(), if the state variable has increased it knows the matrix has changed so updates the preconditioner. > On Aug 19, 2020, at 9:05 PM, baikadi pranay wrote: > > Hello, > > I am trying to solve the poisson equation iteratively using BiCGStab in FORTRAN 90. After every call to KSPSolve, I update the central coefficients of A matrix and the b vector (and then solve the new linear equation system, repeating the process until convergence is achieved). I want to know whether the A matrix and b vector that are created initially can be used in the iteration process or do I need to create a new A matrix and b vector in each iteration. > > Please let me know if you need any further information. > > Thank you. > > Sincerely, > Pranay. > ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Thu Aug 20 06:44:06 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Thu, 20 Aug 2020 13:44:06 +0200 Subject: [petsc-users] PetscFV and TS implicit Message-ID: Dear all, I have a finite volume code inspired from TS example ex11.c (with a riemann solver, etc...). So far, I used only explicit time stepping through the TSSSP, and to set the RHS of my hyperbolic system I used : TSSetType(ts, TSSSP); DMTSSetRHSFunctionLocal(dm, DMPlexTSComputeRHSFunctionFVM , &ctx); after setting the right Riemann solver in the TS associated to the DM. Now, in some cases where the physics is stationary, I would like to reach the steady state faster and use an implicit timestepping operator to do so. After looking through the examples, especially TS examples ex48.c and ex53.c, I simply tried to set TSSetType(ts, TSBEULER); DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); instead of the previous calls. It compiles fine, and it runs. However, nothing happens : it behaves like there is no time evolution at all, the solution does not change from its initial state. >From the source code, it is my understanding that the DMPlexTSComputeIFunctionFEM and DMPlexTSComputeIJacobianFEM methods, in spite of their names, call respectively DMPlexComputeResidual_Internal and DMPlexComputeJacobian_Internal that can handle a FVM discretization. What am I missing ? Are there other steps to take before I can simply try to run with a finite volume discretization and an implicit time stepping algorithm ? Thank you very much for your help in advance ! Thibault Bridel-Bertomeu ? 
Eng, MSc, PhD Research Engineer CEA/CESTA 33114 LE BARP Tel.: (+33)557046924 Mob.: (+33)611025322 Mail: thibault.bridelbertomeu at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhatiamanav at gmail.com Thu Aug 20 08:10:18 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Thu, 20 Aug 2020 08:10:18 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> Message-ID: <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> > On Aug 19, 2020, at 9:39 PM, Matthew Knepley wrote: > > Jed is more knowledgeable about the communication, but I have a simple question about the FEM method. Normally, the way > we divide unknowns is that the only unknowns which might have entries computed off-process are those on the partition boundary. > However, it sounds like you have a huge number of communicated values. Is it possible that the division of rows in your matrix does > not match the division of the cells you compute element matrices for? I hope that is not the case. I am using libMesh to manage the mesh and creation of sparsity pattern, which uses Parmetis to create the partitions. libMesh ensures that off-process entries are only at the partition boundary (unless an extra set of DoFs are marked for coupling. I also printed and looked at the n_nz and n_oz values on each rank and it does not seem to raise any flags. I will try to dig in a bit further to make sure everything checks out. Looking at the screenshots I had shared yesterday, all processes are in this function: PetscErrorCode MatAssemblyEnd_MPIAIJ(Mat mat,MatAssemblyType mode) { Mat_MPIAIJ *aij = (Mat_MPIAIJ*)mat->data; Mat_SeqAIJ *a = (Mat_SeqAIJ*)aij->A->data; PetscErrorCode ierr; PetscMPIInt n; PetscInt i,j,rstart,ncols,flg; PetscInt *row,*col; PetscBool other_disassembled; PetscScalar *val; /* do not use 'b = (Mat_SeqAIJ*)aij->B->data' as B can be reset in disassembly */ PetscFunctionBegin; if (!aij->donotstash && !mat->nooffprocentries) { while (1) { ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr); if (!flg) break; for (i=0; iinsertmode);CHKERRQ(ierr); i = j; } } ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr); } ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); ierr = MatAssemblyEnd(aij->A,mode);CHKERRQ(ierr); /* determine if any processor has disassembled, if so we must also disassemble ourselfs, in order that we may reassemble. 
*/ /* if nonzero structure of submatrix B cannot change then we know that no processor disassembled thus we can skip this stuff */ if (!((Mat_SeqAIJ*)aij->B->data)->nonew) { ierr = MPIU_Allreduce(&mat->was_assembled,&other_disassembled,1,MPIU_BOOL,MPI_PROD,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr); if (mat->was_assembled && !other_disassembled) { ierr = MatDisAssemble_MPIAIJ(mat);CHKERRQ(ierr); } } if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); } ierr = MatSetOption(aij->B,MAT_USE_INODES,PETSC_FALSE);CHKERRQ(ierr); ierr = MatAssemblyBegin(aij->B,mode);CHKERRQ(ierr); ierr = MatAssemblyEnd(aij->B,mode);CHKERRQ(ierr); ierr = PetscFree2(aij->rowvalues,aij->rowindices);CHKERRQ(ierr); aij->rowvalues = 0; ierr = VecDestroy(&aij->diag);CHKERRQ(ierr); if (a->inode.size) mat->ops->multdiagonalblock = MatMultDiagonalBlock_MPIAIJ; /* if no new nonzero locations are allowed in matrix then only set the matrix state the first time through */ if ((!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) || !((Mat_SeqAIJ*)(aij->A->data))->nonew) { PetscObjectState state = aij->A->nonzerostate + aij->B->nonzerostate; ierr = MPIU_Allreduce(&state,&mat->nonzerostate,1,MPIU_INT64,MPI_SUM,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr); } PetscFunctionReturn(0); } I noticed that of the 8 MPI processes, 2 were stuck at ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr); Other two were stuck at ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr); And remaining four were under ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); Is it expected for processes to be at different stages in this function? -Manav -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Thu Aug 20 08:31:26 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 20 Aug 2020 15:31:26 +0200 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> Message-ID: <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> Manav Can you add a MPI_Barrier before ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); Also, in order to assess where the issue is, we need to see the values (per rank) of ((Mat_SeqAIJ*)aij->B->data)->nonew mat->was_assembled aij->donotstash mat->nooffprocentries Another question: is this the first matrix assembly of the code? If you change to pc_none, do you get the same issue? > On Aug 20, 2020, at 3:10 PM, Manav Bhatia wrote: > > > >> On Aug 19, 2020, at 9:39 PM, Matthew Knepley > wrote: >> >> Jed is more knowledgeable about the communication, but I have a simple question about the FEM method. Normally, the way >> we divide unknowns is that the only unknowns which might have entries computed off-process are those on the partition boundary. >> However, it sounds like you have a huge number of communicated values. Is it possible that the division of rows in your matrix does >> not match the division of the cells you compute element matrices for? > > > I hope that is not the case. I am using libMesh to manage the mesh and creation of sparsity pattern, which uses Parmetis to create the partitions. > libMesh ensures that off-process entries are only at the partition boundary (unless an extra set of DoFs are marked for coupling. 
> > I also printed and looked at the n_nz and n_oz values on each rank and it does not seem to raise any flags. > > I will try to dig in a bit further to make sure everything checks out. > > Looking at the screenshots I had shared yesterday, all processes are in this function: > > PetscErrorCode MatAssemblyEnd_MPIAIJ(Mat mat,MatAssemblyType mode) > { > Mat_MPIAIJ *aij = (Mat_MPIAIJ*)mat->data; > Mat_SeqAIJ *a = (Mat_SeqAIJ*)aij->A->data; > PetscErrorCode ierr; > PetscMPIInt n; > PetscInt i,j,rstart,ncols,flg; > PetscInt *row,*col; > PetscBool other_disassembled; > PetscScalar *val; > > /* do not use 'b = (Mat_SeqAIJ*)aij->B->data' as B can be reset in disassembly */ > > PetscFunctionBegin; > if (!aij->donotstash && !mat->nooffprocentries) { > while (1) { > ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr); > if (!flg) break; > > for (i=0; i /* Now identify the consecutive vals belonging to the same row */ > for (j=i,rstart=row[j]; j if (row[j] != rstart) break; > } > if (j < n) ncols = j-i; > else ncols = n-i; > /* Now assemble all these values with a single function call */ > ierr = MatSetValues_MPIAIJ(mat,1,row+i,ncols,col+i,val+i,mat->insertmode);CHKERRQ(ierr); > > i = j; > } > } > ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr); > } > ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); > ierr = MatAssemblyEnd(aij->A,mode);CHKERRQ(ierr); > > /* determine if any processor has disassembled, if so we must > also disassemble ourselfs, in order that we may reassemble. */ > /* > if nonzero structure of submatrix B cannot change then we know that > no processor disassembled thus we can skip this stuff > */ > if (!((Mat_SeqAIJ*)aij->B->data)->nonew) { > ierr = MPIU_Allreduce(&mat->was_assembled,&other_disassembled,1,MPIU_BOOL,MPI_PROD,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr); > if (mat->was_assembled && !other_disassembled) { > ierr = MatDisAssemble_MPIAIJ(mat);CHKERRQ(ierr); > } > } > if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { > ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); > } > ierr = MatSetOption(aij->B,MAT_USE_INODES,PETSC_FALSE);CHKERRQ(ierr); > ierr = MatAssemblyBegin(aij->B,mode);CHKERRQ(ierr); > ierr = MatAssemblyEnd(aij->B,mode);CHKERRQ(ierr); > > ierr = PetscFree2(aij->rowvalues,aij->rowindices);CHKERRQ(ierr); > > aij->rowvalues = 0; > > ierr = VecDestroy(&aij->diag);CHKERRQ(ierr); > if (a->inode.size) mat->ops->multdiagonalblock = MatMultDiagonalBlock_MPIAIJ; > > /* if no new nonzero locations are allowed in matrix then only set the matrix state the first time through */ > if ((!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) || !((Mat_SeqAIJ*)(aij->A->data))->nonew) { > PetscObjectState state = aij->A->nonzerostate + aij->B->nonzerostate; > ierr = MPIU_Allreduce(&state,&mat->nonzerostate,1,MPIU_INT64,MPI_SUM,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr); > } > PetscFunctionReturn(0); > } > > > I noticed that of the 8 MPI processes, 2 were stuck at > ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr); > > Other two were stuck at > ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr); > > And remaining four were under > ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); > > Is it expected for processes to be at different stages in this function? > > -Manav > > -------------- next part -------------- An HTML attachment was scrubbed... 
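A sketch of what printing those per-rank values might look like, dropped into MatAssemblyEnd_MPIAIJ() just before the MatAssemblyBegin(aij->A,mode) call that Stefano mentions (debugging scaffolding only, not a fix; the integer casts are just for printing):

    {
      PetscMPIInt rank;
      ierr = MPI_Comm_rank(PetscObjectComm((PetscObject)mat),&rank);CHKERRQ(ierr);
      ierr = PetscSynchronizedPrintf(PetscObjectComm((PetscObject)mat),
               "[%d] nonew %D was_assembled %d donotstash %d nooffprocentries %d\n",
               rank,((Mat_SeqAIJ*)aij->B->data)->nonew,(int)mat->was_assembled,
               (int)aij->donotstash,(int)mat->nooffprocentries);CHKERRQ(ierr);
      ierr = PetscSynchronizedFlush(PetscObjectComm((PetscObject)mat),PETSC_STDOUT);CHKERRQ(ierr);
      ierr = MPI_Barrier(PetscObjectComm((PetscObject)mat));CHKERRQ(ierr);
    }

Note that PetscSynchronizedFlush() and MPI_Barrier() are collective, so if some rank never reaches this point the print itself will hang, which is useful information by itself.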
URL: From bhatiamanav at gmail.com Thu Aug 20 08:54:37 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Thu, 20 Aug 2020 08:54:37 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> Message-ID: <6CB67FFB-77C5-42E7-84EE-54143BACE708@gmail.com> Stefano, I will report the results to these shortly. To your second question, this is the first matrix assembly of the code. -Manav > On Aug 20, 2020, at 8:31 AM, Stefano Zampini wrote: > > Manav > > Can you add a MPI_Barrier before > > ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); > > Also, in order to assess where the issue is, we need to see the values (per rank) of > > ((Mat_SeqAIJ*)aij->B->data)->nonew > mat->was_assembled > aij->donotstash > mat->nooffprocentries > > Another question: is this the first matrix assembly of the code? > If you change to pc_none, do you get the same issue? > >> On Aug 20, 2020, at 3:10 PM, Manav Bhatia > wrote: >> >> >> >>> On Aug 19, 2020, at 9:39 PM, Matthew Knepley > wrote: >>> >>> Jed is more knowledgeable about the communication, but I have a simple question about the FEM method. Normally, the way >>> we divide unknowns is that the only unknowns which might have entries computed off-process are those on the partition boundary. >>> However, it sounds like you have a huge number of communicated values. Is it possible that the division of rows in your matrix does >>> not match the division of the cells you compute element matrices for? >> >> >> I hope that is not the case. I am using libMesh to manage the mesh and creation of sparsity pattern, which uses Parmetis to create the partitions. >> libMesh ensures that off-process entries are only at the partition boundary (unless an extra set of DoFs are marked for coupling. >> >> I also printed and looked at the n_nz and n_oz values on each rank and it does not seem to raise any flags. >> >> I will try to dig in a bit further to make sure everything checks out. 
>> >> Looking at the screenshots I had shared yesterday, all processes are in this function: >> >> PetscErrorCode MatAssemblyEnd_MPIAIJ(Mat mat,MatAssemblyType mode) >> { >> Mat_MPIAIJ *aij = (Mat_MPIAIJ*)mat->data; >> Mat_SeqAIJ *a = (Mat_SeqAIJ*)aij->A->data; >> PetscErrorCode ierr; >> PetscMPIInt n; >> PetscInt i,j,rstart,ncols,flg; >> PetscInt *row,*col; >> PetscBool other_disassembled; >> PetscScalar *val; >> >> /* do not use 'b = (Mat_SeqAIJ*)aij->B->data' as B can be reset in disassembly */ >> >> PetscFunctionBegin; >> if (!aij->donotstash && !mat->nooffprocentries) { >> while (1) { >> ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr); >> if (!flg) break; >> >> for (i=0; i> /* Now identify the consecutive vals belonging to the same row */ >> for (j=i,rstart=row[j]; j> if (row[j] != rstart) break; >> } >> if (j < n) ncols = j-i; >> else ncols = n-i; >> /* Now assemble all these values with a single function call */ >> ierr = MatSetValues_MPIAIJ(mat,1,row+i,ncols,col+i,val+i,mat->insertmode);CHKERRQ(ierr); >> >> i = j; >> } >> } >> ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr); >> } >> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); >> ierr = MatAssemblyEnd(aij->A,mode);CHKERRQ(ierr); >> >> /* determine if any processor has disassembled, if so we must >> also disassemble ourselfs, in order that we may reassemble. */ >> /* >> if nonzero structure of submatrix B cannot change then we know that >> no processor disassembled thus we can skip this stuff >> */ >> if (!((Mat_SeqAIJ*)aij->B->data)->nonew) { >> ierr = MPIU_Allreduce(&mat->was_assembled,&other_disassembled,1,MPIU_BOOL,MPI_PROD,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr); >> if (mat->was_assembled && !other_disassembled) { >> ierr = MatDisAssemble_MPIAIJ(mat);CHKERRQ(ierr); >> } >> } >> if (!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) { >> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >> } >> ierr = MatSetOption(aij->B,MAT_USE_INODES,PETSC_FALSE);CHKERRQ(ierr); >> ierr = MatAssemblyBegin(aij->B,mode);CHKERRQ(ierr); >> ierr = MatAssemblyEnd(aij->B,mode);CHKERRQ(ierr); >> >> ierr = PetscFree2(aij->rowvalues,aij->rowindices);CHKERRQ(ierr); >> >> aij->rowvalues = 0; >> >> ierr = VecDestroy(&aij->diag);CHKERRQ(ierr); >> if (a->inode.size) mat->ops->multdiagonalblock = MatMultDiagonalBlock_MPIAIJ; >> >> /* if no new nonzero locations are allowed in matrix then only set the matrix state the first time through */ >> if ((!mat->was_assembled && mode == MAT_FINAL_ASSEMBLY) || !((Mat_SeqAIJ*)(aij->A->data))->nonew) { >> PetscObjectState state = aij->A->nonzerostate + aij->B->nonzerostate; >> ierr = MPIU_Allreduce(&state,&mat->nonzerostate,1,MPIU_INT64,MPI_SUM,PetscObjectComm((PetscObject)mat));CHKERRQ(ierr); >> } >> PetscFunctionReturn(0); >> } >> >> >> I noticed that of the 8 MPI processes, 2 were stuck at >> ierr = MatStashScatterGetMesg_Private(&mat->stash,&n,&row,&col,&val,&flg);CHKERRQ(ierr); >> >> Other two were stuck at >> ierr = MatStashScatterEnd_Private(&mat->stash);CHKERRQ(ierr); >> >> And remaining four were under >> ierr = MatSetUpMultiply_MPIAIJ(mat);CHKERRQ(ierr); >> >> Is it expected for processes to be at different stages in this function? >> >> -Manav >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
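Independently of the values above, attaching a debugger to each hung rank is often the quickest way to get exact stack traces for a hang like this. A rough recipe on macOS (the executable name and PID are made up; on Linux, gdb -p works the same way):

    $ ps aux | grep my_app      # find the PIDs of the stuck MPI ranks
    $ lldb -p 12345             # attach to one of them
    (lldb) bt all               # backtraces for every thread
    (lldb) detach
    (lldb) quit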
URL: From bhatiamanav at gmail.com Thu Aug 20 10:08:57 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Thu, 20 Aug 2020 10:08:57 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> Message-ID: > On Aug 20, 2020, at 8:31 AM, Stefano Zampini wrote: > > Can you add a MPI_Barrier before > > ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); > With a MPI_Barrier before this function call: ? three of the processes have already hit this barrier, ? the other 5 are inside MatStashScatterGetMesg_Private -> MatStashScatterGerMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 processes) > Also, in order to assess where the issue is, we need to see the values (per rank) of > > ((Mat_SeqAIJ*)aij->B->data)->nonew > mat->was_assembled > aij->donotstash > mat->nooffprocentries > I am working to get this information. > Another question: is this the first matrix assembly of the code? Yes, this is the first matrix assembly in the code. > If you change to pc_none, do you get the same issue? Yes, with "-pc_type none? the code is stuck at the same spot. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Thu Aug 20 10:09:14 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Thu, 20 Aug 2020 09:09:14 -0600 Subject: [petsc-users] Disable PETSC_HAVE_CLOSURE Message-ID: Hi All, We (moose team) hit an error message when compiling PETSc, recently. The error is related to "PETSC_HAVE_CLOSURE." Everything runs well if I am going to turn this flag off by making the following changes: git diff diff --git a/config/BuildSystem/config/utilities/closure.py b/config/BuildSystem/config/utilities/closure.py index 6341ddf271..930e5b3b1b 100644 --- a/config/BuildSystem/config/utilities/closure.py +++ b/config/BuildSystem/config/utilities/closure.py @@ -19,8 +19,8 @@ class Configure(config.base.Configure): includes = '#include \n' body = 'int (^closure)(int);' self.pushLanguage('C') - if self.checkLink(includes, body): - self.addDefine('HAVE_CLOSURE','1') +# if self.checkLink(includes, body): +# self.addDefine('HAVE_CLOSURE','1') def configure(self): self.executeTest(self.configureClosure) I was wondering if there exists a configuration option to disable "Closure" C syntax? I did not find one by running "configuration --help" Please let me know if you need more information. 
Thanks, Fande, In file included from /Users/milljm/projects/moose/scripts/../libmesh/src/solvers/petscdmlibmesh.C:25: /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:29: warning: 'PetscVFPrintfSetClosure' initialized and declared 'extern' 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const char*)); | ^~~~~~~~~~~~~~~~~~~~~~~ /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:53: error: expected primary-expression before 'int' 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const char*)); | ^~~ CXX src/systems/libmesh_opt_la-equation_systems_io.lo In file included from /Users/milljm/projects/moose/petsc/include/petsc/private/dmimpl.h:7, from /Users/milljm/projects/moose/scripts/../libmesh/src/solvers/petscdmlibmeshimpl.C:26: /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:29: warning: 'PetscVFPrintfSetClosure' initialized and declared 'extern' 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const char*)); | ^~~~~~~~~~~~~~~~~~~~~~~ /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:53: error: expected primary-expression before 'int' 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const char*)); -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Aug 20 10:12:00 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 20 Aug 2020 11:12:00 -0400 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> Message-ID: On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia wrote: > > > On Aug 20, 2020, at 8:31 AM, Stefano Zampini > wrote: > > Can you add a MPI_Barrier before > > ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); > > > With a MPI_Barrier before this function call: > ? three of the processes have already hit this barrier, > ? the other 5 are inside MatStashScatterGetMesg_Private -> > MatStashScatterGerMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 > processes) > Okay, you should run this with -matstash_legacy just to make sure it is not a bug in your MPI implementation. But it looks like there is inconsistency in the parallel state. This can happen because we have a bug, or it could be that you called a collective operation on a subset of the processes. Is there any way you could cut down the example (say put all 1s in the matrix, etc) so that you could give it to us to run? Thanks, Matt > Also, in order to assess where the issue is, we need to see the values > (per rank) of > > ((Mat_SeqAIJ*)aij->B->data)->nonew > mat->was_assembled > aij->donotstash > mat->nooffprocentries > > > I am working to get this information. > > Another question: is this the first matrix assembly of the code? > > > Yes, this is the first matrix assembly in the code. > > If you change to pc_none, do you get the same issue? > > > Yes, with "-pc_type none? the code is stuck at the same spot. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
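Along the lines of what Matt and Stefano suggest, a cut-down driver that only exercises the stash/assembly path might look like the sketch below. The global size, the 81-nonzeros-per-row preallocation, and the "write a band of the next rank's rows" pattern are assumptions meant to mimic the application, not taken from it:

    #include <petscmat.h>

    int main(int argc,char **argv)
    {
      Mat            A;
      PetscInt       i,Istart,Iend,N = 5400000;
      PetscScalar    v = 1.0;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
      ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
      ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,N,N);CHKERRQ(ierr);
      ierr = MatSetType(A,MATAIJ);CHKERRQ(ierr);
      ierr = MatSeqAIJSetPreallocation(A,81,NULL);CHKERRQ(ierr);
      ierr = MatMPIAIJSetPreallocation(A,81,NULL,81,NULL);CHKERRQ(ierr);
      ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);
      for (i=Istart; i<Iend; i++) {                    /* locally owned rows */
        ierr = MatSetValues(A,1,&i,1,&i,&v,ADD_VALUES);CHKERRQ(ierr);
      }
      for (i=Iend; i<PetscMin(Iend+1000,N); i++) {     /* off-process rows: these go through the stash */
        ierr = MatSetValues(A,1,&i,1,&i,&v,ADD_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

Running it on 8 ranks with and without -matstash_legacy would show whether the hang can be reproduced outside the full application.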
URL: From jed at jedbrown.org Thu Aug 20 10:19:43 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 20 Aug 2020 09:19:43 -0600 Subject: [petsc-users] Disable PETSC_HAVE_CLOSURE In-Reply-To: References: Message-ID: <87364hibu8.fsf@jedbrown.org> Barry, this is a side-effect of your Swift experiment. Does that need to be in a header (even if it's a private header)? The issue may be that you test with a C compiler and it gets included in C++ source. Fande Kong writes: > Hi All, > > We (moose team) hit an error message when compiling PETSc, recently. The > error is related to "PETSC_HAVE_CLOSURE." Everything runs well if I am > going to turn this flag off by making the following changes: > > > git diff > diff --git a/config/BuildSystem/config/utilities/closure.py > b/config/BuildSystem/config/utilities/closure.py > index 6341ddf271..930e5b3b1b 100644 > --- a/config/BuildSystem/config/utilities/closure.py > +++ b/config/BuildSystem/config/utilities/closure.py > @@ -19,8 +19,8 @@ class Configure(config.base.Configure): > includes = '#include \n' > body = 'int (^closure)(int);' > self.pushLanguage('C') > - if self.checkLink(includes, body): > - self.addDefine('HAVE_CLOSURE','1') > +# if self.checkLink(includes, body): > +# self.addDefine('HAVE_CLOSURE','1') > def configure(self): > self.executeTest(self.configureClosure) > > > I was wondering if there exists a configuration option to disable "Closure" > C syntax? I did not find one by running "configuration --help" > > Please let me know if you need more information. > > > Thanks, > > Fande, > > > In file included from > /Users/milljm/projects/moose/scripts/../libmesh/src/solvers/petscdmlibmesh.C:25: > /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:29: > warning: 'PetscVFPrintfSetClosure' initialized and declared 'extern' > 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const > char*)); > | ^~~~~~~~~~~~~~~~~~~~~~~ > /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:53: > error: expected primary-expression before 'int' > 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const > char*)); > | ^~~ > CXX src/systems/libmesh_opt_la-equation_systems_io.lo > In file included from > /Users/milljm/projects/moose/petsc/include/petsc/private/dmimpl.h:7, > from > /Users/milljm/projects/moose/scripts/../libmesh/src/solvers/petscdmlibmeshimpl.C:26: > /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:29: > warning: 'PetscVFPrintfSetClosure' initialized and declared 'extern' > 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const > char*)); > | ^~~~~~~~~~~~~~~~~~~~~~~ > /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:53: > error: expected primary-expression before 'int' > 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const > char*)); From junchao.zhang at gmail.com Thu Aug 20 10:22:36 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 20 Aug 2020 10:22:36 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> Message-ID: See if you could reproduce the problem on another machine, e.g., a Linux workstation with MPICH? --Junchao Zhang On Wed, Aug 19, 2020 at 7:32 PM Manav Bhatia wrote: > Hi, > > I have an application that uses the KSP solver and I am looking at the > performance with increasing system size. 
I am currently running on MacBook > Pro with 32 GB memory and Petsc obtained from GitHub (commit > df0e43005dbe6ff47eff22a32b336a6c37d02c3a). > > The application runs fine till about 2e6 DoFs using gamg without any > problems. > > However, when I try a larger system size, in this case with 5.4e6 DoFs, > the application hangs for an hour and I have to kill the MPI processes. > > I tried to use Xcode Instruments to profile the 8 MPI processes and I > have attached a screenshot of the recorded results from each process. All > processes are stuck inside MatAssemblyEnd, but at different function calls. > > I am not sure how to debug this issue, and would greatly appreciate any > guidance. > > For reference, I am calling PETSc with the following options: > -ksp_type gmres -pc_type gamg -mat_block_size 3 -mg_levels_ksp_max_it 4 > -ksp_monitor -ksp_converged_reason > > Regards, > Manav > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.16.08 PM.png Type: image/png Size: 211493 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.16.06 PM.png Type: image/png Size: 202337 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.16.04 PM.png Type: image/png Size: 184314 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.16.02 PM.png Type: image/png Size: 185896 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.15.59 PM.png Type: image/png Size: 804446 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.15.57 PM.png Type: image/png Size: 851646 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.15.55 PM.png Type: image/png Size: 787467 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-08-19 at 7.15.52 PM.png Type: image/png Size: 966045 bytes Desc: not available URL: From bhatiamanav at gmail.com Thu Aug 20 10:25:38 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Thu, 20 Aug 2020 10:25:38 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> Message-ID: > On Aug 20, 2020, at 10:22 AM, Junchao Zhang wrote: > > See if you could reproduce the problem on another machine, e.g., a Linux workstation with MPICH? Yes, the same behavior happened on another machine with Centos 7 with an older build of PETSc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Aug 20 10:49:40 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 20 Aug 2020 10:49:40 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> Message-ID: Then could you provide a test case and build instructions? 
--Junchao Zhang On Thu, Aug 20, 2020 at 10:25 AM Manav Bhatia wrote: > > > On Aug 20, 2020, at 10:22 AM, Junchao Zhang > wrote: > > See if you could reproduce the problem on another machine, e.g., a Linux > workstation with MPICH? > > > Yes, the same behavior happened on another machine with Centos 7 with an > older build of PETSc. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhatiamanav at gmail.com Thu Aug 20 10:59:51 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Thu, 20 Aug 2020 10:59:51 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> Message-ID: <1281B298-A1A1-44E2-9171-A3BE71B236E9@gmail.com> > On Aug 20, 2020, at 8:31 AM, Stefano Zampini wrote: > > ((Mat_SeqAIJ*)aij->B->data)->nonew > mat->was_assembled > aij->donotstash > mat->nooffprocentries > The values for the last three variables are all False on all 8 processes. Regards, Manav -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Thu Aug 20 11:21:10 2020 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 20 Aug 2020 18:21:10 +0200 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <1281B298-A1A1-44E2-9171-A3BE71B236E9@gmail.com> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <1281B298-A1A1-44E2-9171-A3BE71B236E9@gmail.com> Message-ID: <9AD6B98D-48B5-404A-BA34-B5F3712AF433@gmail.com> > On Aug 20, 2020, at 5:59 PM, Manav Bhatia wrote: > > > >> On Aug 20, 2020, at 8:31 AM, Stefano Zampini > wrote: >> >> ((Mat_SeqAIJ*)aij->B->data)->nonew >> mat->was_assembled >> aij->donotstash >> mat->nooffprocentries >> > > The values for the last three variables are all False on all 8 processes. Thanks, it seems a bug in MPI or on our side. As Matt said, can you run with -matstash_legacy? Also, make sure to run with a debug version of PETSc (configure using ?with-debugging=1). How feasible is to write a driver code to run with the same mesh, same discretization and same equations to solve but with the matrix assembly only? Does it hang in that case too? > > Regards, > Manav > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Aug 20 11:33:44 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 20 Aug 2020 12:33:44 -0400 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: Message-ID: On Thu, Aug 20, 2020 at 7:45 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Dear all, > > I have a finite volume code inspired from TS example ex11.c (with a > riemann solver, etc...). > So far, I used only explicit time stepping through the TSSSP, and to set > the RHS of my hyperbolic system I used : > > TSSetType(ts, TSSSP); > DMTSSetRHSFunctionLocal(dm, DMPlexTSComputeRHSFunctionFVM , &ctx); > > after setting the right Riemann solver in the TS associated to the DM. > Now, in some cases where the physics is stationary, I would like to reach > the steady state faster and use an implicit timestepping operator to do so. 
> After looking through the examples, especially TS examples ex48.c and > ex53.c, I simply tried to set > > TSSetType(ts, TSBEULER); > DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); > DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); > > instead of the previous calls. It compiles fine, and it runs. However, > nothing happens : it behaves like there is no time evolution at all, the > solution does not change from its initial state. > From the source code, it is my understanding that the > DMPlexTSComputeIFunctionFEM and DMPlexTSComputeIJacobianFEM methods, in > spite of their names, call respectively DMPlexComputeResidual_Internal and > DMPlexComputeJacobian_Internal that can handle a FVM discretization. > > What am I missing ? Are there other steps to take before I can simply try > to run with a finite volume discretization and an implicit time stepping > algorithm ? > > Thank you very much for your help in advance ! > I could never get the FVM stuff to make sense to me for implicit methods. Here is my problem understanding. If you have an FVM method, it decides to move "stuff" from one cell to its neighboring cells depending on the solution to the Riemann problem on each face, which computed the flux. This is fine unless the timestep is so big that material can flow through into the cells beyond the neighbor. Then I should have considered the effect of the Riemann problem for those interfaces. That would be in the Jacobian, but I don't know how to compute that Jacobian. I guess you could do everything matrix-free, but without a preconditioner it seems hard. Operationally, I always mark FVM things explicit, so they are only handled by RHSFunction/Jacobian, except for the u_t term which I stick in the IJacobian. I was never successful in running an implicit FVM and still do not understand it. I could not find any literature that I understood either. If there is a definite thing you want changed, I might be able to do it. Thanks, Matt > Thibault Bridel-Bertomeu > ? > Eng, MSc, PhD > Research Engineer > CEA/CESTA > 33114 LE BARP > Tel.: (+33)557046924 > Mob.: (+33)611025322 > Mail: thibault.bridelbertomeu at gmail.com > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Aug 20 11:43:46 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 20 Aug 2020 10:43:46 -0600 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: Message-ID: <87mu2pgtdp.fsf@jedbrown.org> Matthew Knepley writes: > I could never get the FVM stuff to make sense to me for implicit methods. > Here is my problem understanding. If you have an FVM method, it decides > to move "stuff" from one cell to its neighboring cells depending on the > solution to the Riemann problem on each face, which computed the flux. This > is > fine unless the timestep is so big that material can flow through into the > cells beyond the neighbor. Then I should have considered the effect of the > Riemann problem for those interfaces. That would be in the Jacobian, but I > don't know how to compute that Jacobian. I guess you could do everything > matrix-free, but without a preconditioner it seems hard. So long as we're using method of lines, the flux is just instantaneous flux, not integrated over some time step. 
It has the same meaning for implicit and explicit. An explicit method would be unstable if you took such a large time step (CFL) and an implicit method will not simultaneously be SSP and higher than first order, but it's still a consistent discretization of the problem. It's common (done in FUN3D and others) to precondition with a first-order method, where gradient reconstruction/limiting is skipped. That's what I'd recommend because limiting creates nasty nonlinearities and the resulting discretizations lack h-ellipticity which makes them very hard to solve. From bhatiamanav at gmail.com Thu Aug 20 11:53:08 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Thu, 20 Aug 2020 11:53:08 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <9AD6B98D-48B5-404A-BA34-B5F3712AF433@gmail.com> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <1281B298-A1A1-44E2-9171-A3BE71B236E9@gmail.com> <9AD6B98D-48B5-404A-BA34-B5F3712AF433@gmail.com> Message-ID: <7E9B822C-1B02-4BFA-9DA1-6339FEA44978@gmail.com> > On Aug 20, 2020, at 11:21 AM, Stefano Zampini wrote: > > > >> On Aug 20, 2020, at 5:59 PM, Manav Bhatia > wrote: >> >> >> >>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini > wrote: >>> >>> ((Mat_SeqAIJ*)aij->B->data)->nonew >>> mat->was_assembled >>> aij->donotstash >>> mat->nooffprocentries >>> >> >> The values for the last three variables are all False on all 8 processes. > > Thanks, it seems a bug in MPI or on our side. As Matt said, can you run with -matstash_legacy? I did that with the result that a smaller case that had been running got stuck in the MatAssemblyEnd_MPIAIJ routine with -matstash_legacy. Not sure what this implies. > Also, make sure to run with a debug version of PETSc (configure using ?with-debugging=1). Trying that now. > How feasible is to write a driver code to run with the same mesh, same discretization and same equations to solve but with the matrix assembly only? Does it hang in that case too? My code is on GitHub (https://github.com/MASTmultiphysics/MAST3), including the specific example that is producing this error (https://github.com/MASTmultiphysics/MAST3/tree/master/examples/structural/example_6), but has multiple dependencies, including libMesh/PETSc/SLEPc/Eigen. The mesh generation and sparsity pattern creation is currently done by libMesh. If needed, I may be able to store all the necessary information (mesh, nnz, noz) in a text file, remove dependencies on libMesh and directly try to initialize with information from a file. Would that help? > >> >> Regards, >> Manav -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Thu Aug 20 12:54:38 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 20 Aug 2020 11:54:38 -0600 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> Message-ID: <87imddgq3l.fsf@jedbrown.org> Matthew Knepley writes: > On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia wrote: > >> >> >> On Aug 20, 2020, at 8:31 AM, Stefano Zampini >> wrote: >> >> Can you add a MPI_Barrier before >> >> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); >> >> >> With a MPI_Barrier before this function call: >> ? three of the processes have already hit this barrier, >> ? the other 5 are inside MatStashScatterGetMesg_Private -> >> MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 >> processes) This is not itself evidence of inconsistent state. You can use -build_twosided allreduce to avoid the nonblocking sparse algorithm. > > Okay, you should run this with -matstash_legacy just to make sure it is not > a bug in your MPI implementation. But it looks like > there is inconsistency in the parallel state. This can happen because we > have a bug, or it could be that you called a collective > operation on a subset of the processes. Is there any way you could cut down > the example (say put all 1s in the matrix, etc) so > that you could give it to us to run? From bhatiamanav at gmail.com Thu Aug 20 17:23:52 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Thu, 20 Aug 2020 17:23:52 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <87imddgq3l.fsf@jedbrown.org> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <87imddgq3l.fsf@jedbrown.org> Message-ID: <2CAFDC41-75ED-4282-A6C3-9B477E2F8542@gmail.com> I have created a standalone test that demonstrates the problem at my end. I have stored the indices, etc. from my problem in a text file for each rank, which I use to initialize the matrix. Please note that the test is specifically for 8 ranks. The .tgz file is on my google drive: https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing This contains a README file with instructions on running. Please note that the work directory needs the index files. Please let me know if I can provide any further information. Thank you all for your help. Regards, Manav > On Aug 20, 2020, at 12:54 PM, Jed Brown wrote: > > Matthew Knepley > writes: > >> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia wrote: >> >>> >>> >>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini >>> wrote: >>> >>> Can you add a MPI_Barrier before >>> >>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); >>> >>> >>> With a MPI_Barrier before this function call: >>> ? three of the processes have already hit this barrier, >>> ? the other 5 are inside MatStashScatterGetMesg_Private -> >>> MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 >>> processes) > > This is not itself evidence of inconsistent state. You can use > > -build_twosided allreduce > > to avoid the nonblocking sparse algorithm. > >> >> Okay, you should run this with -matstash_legacy just to make sure it is not >> a bug in your MPI implementation. 
But it looks like >> there is inconsistency in the parallel state. This can happen because we >> have a bug, or it could be that you called a collective >> operation on a subset of the processes. Is there any way you could cut down >> the example (say put all 1s in the matrix, etc) so >> that you could give it to us to run? -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Aug 20 17:29:11 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 20 Aug 2020 17:29:11 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <2CAFDC41-75ED-4282-A6C3-9B477E2F8542@gmail.com> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <87imddgq3l.fsf@jedbrown.org> <2CAFDC41-75ED-4282-A6C3-9B477E2F8542@gmail.com> Message-ID: I will have a look and report back to you. Thanks. --Junchao Zhang On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia wrote: > I have created a standalone test that demonstrates the problem at my end. > I have stored the indices, etc. from my problem in a text file for each > rank, which I use to initialize the matrix. > Please note that the test is specifically for 8 ranks. > > The .tgz file is on my google drive: > https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing > > > This contains a README file with instructions on running. Please note that > the work directory needs the index files. > > Please let me know if I can provide any further information. > > Thank you all for your help. > > Regards, > Manav > > On Aug 20, 2020, at 12:54 PM, Jed Brown wrote: > > Matthew Knepley writes: > > On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia > wrote: > > > > On Aug 20, 2020, at 8:31 AM, Stefano Zampini > wrote: > > Can you add a MPI_Barrier before > > ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); > > > With a MPI_Barrier before this function call: > ? three of the processes have already hit this barrier, > ? the other 5 are inside MatStashScatterGetMesg_Private -> > MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 > processes) > > > This is not itself evidence of inconsistent state. You can use > > -build_twosided allreduce > > to avoid the nonblocking sparse algorithm. > > > Okay, you should run this with -matstash_legacy just to make sure it is not > a bug in your MPI implementation. But it looks like > there is inconsistency in the parallel state. This can happen because we > have a bug, or it could be that you called a collective > operation on a subset of the processes. Is there any way you could cut down > the example (say put all 1s in the matrix, etc) so > that you could give it to us to run? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Thu Aug 20 19:45:48 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Thu, 20 Aug 2020 17:45:48 -0700 Subject: [petsc-users] On re usability of A matrix and b vector In-Reply-To: References: Message-ID: Thank you for the clarification. I also want to get two more things clarified. After i^th call to KSPSolve(), I get x^i as the solution vector. I do a bunch of updates to A matrix and b vector using this x^i and copy x^i into a vector called x_old. 
1) When I call KSPSolve for the (i+1)th time using KSPSolve(ksp,b,x,ierr), the variable x already has the solution of the previous iteration in it. Does that mean I need to create a new x vector each time I call KSPSolve? 2) On the similar lines, do I need to create new x_old after each iteration? Please let me know if the questions are unclear, so that I can elaborate. ? On Wed, Aug 19, 2020 at 9:06 PM Barry Smith wrote: > > KSP knows if the matrix has changed and rebuilds the parts of the > preconditioner it needs if the matrix has changed. When the matrix has not > changed it uses the exact same preconditioner as before. > > > Barry > > This "trick" is done by having an integer state variable inside the > matrix object. Each time the matrix is changed by MatSetValues() etc. the > state variable is incremented. KSPSolve() keeps a record of the state > variable of the matrix for each call to SNESSolve(), if the state variable > has increased it knows the matrix has changed so updates the > preconditioner. > > On Aug 19, 2020, at 9:05 PM, baikadi pranay > wrote: > > Hello, > > I am trying to solve the poisson equation iteratively using BiCGStab in > FORTRAN 90. After every call to KSPSolve, I update the central coefficients > of A matrix and the b vector (and then solve the new linear equation > system, repeating the process until convergence is achieved). I want to > know whether the A matrix and b vector that are created initially can be > used in the iteration process or do I need to create a new A matrix and b > vector in each iteration. > > Please let me know if you need any further information. > > Thank you. > > Sincerely, > Pranay. > ? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 20 20:33:51 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 20 Aug 2020 20:33:51 -0500 Subject: [petsc-users] On re usability of A matrix and b vector In-Reply-To: References: Message-ID: <601F5D9A-1688-46BE-AC95-85F7CB7957B5@petsc.dev> > On Aug 20, 2020, at 7:45 PM, baikadi pranay wrote: > > Thank you for the clarification. > > I also want to get two more things clarified. After i^th call to KSPSolve(), I get x^i as the solution vector. I do a bunch of updates to A matrix and b vector using this x^i and copy x^i into a vector called x_old. > > 1) When I call KSPSolve for the (i+1)th time using KSPSolve(ksp,b,x,ierr), the variable x already has the solution of the previous iteration in it. Does that mean I need to create a new x vector each time I call KSPSolve? > 2) On the similar lines, do I need to create new x_old after each iteration? No, you do not have to do anything special with the vectors, the KSP does not track them like it tracks the matrix. Separate note: If you are using the previous solution as an initial guess for the next solve you need to call KSPSetInitialGuessNonzero() go tell KSP to use the initial guess (otherwise it always starts with a zero initial guess) Barry > > Please let me know if the questions are unclear, so that I can elaborate. > ? > > On Wed, Aug 19, 2020 at 9:06 PM Barry Smith > wrote: > > KSP knows if the matrix has changed and rebuilds the parts of the preconditioner it needs if the matrix has changed. When the matrix has not changed it uses the exact same preconditioner as before. > > > Barry > > This "trick" is done by having an integer state variable inside the matrix object. Each time the matrix is changed by MatSetValues() etc. the state variable is incremented. 
KSPSolve() keeps a record of the state variable of the matrix for each call to SNESSolve(), if the state variable has increased it knows the matrix has changed so updates the preconditioner. > >> On Aug 19, 2020, at 9:05 PM, baikadi pranay > wrote: >> >> Hello, >> >> I am trying to solve the poisson equation iteratively using BiCGStab in FORTRAN 90. After every call to KSPSolve, I update the central coefficients of A matrix and the b vector (and then solve the new linear equation system, repeating the process until convergence is achieved). I want to know whether the A matrix and b vector that are created initially can be used in the iteration process or do I need to create a new A matrix and b vector in each iteration. >> >> Please let me know if you need any further information. >> >> Thank you. >> >> Sincerely, >> Pranay. >> ? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Aug 20 22:45:11 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 20 Aug 2020 22:45:11 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <87imddgq3l.fsf@jedbrown.org> <2CAFDC41-75ED-4282-A6C3-9B477E2F8542@gmail.com> Message-ID: Manav, I downloaded your petsc_mat.tgz but could not reproduce the problem, on both Linux and Mac. I used the petsc commit id df0e4300 you mentioned. On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" --PETSC_ARCH=linux-host-dbg On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is configured --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g" PETSC_ARCH=mac-clang-dbg mpirun -n 8 ./test rank: 1 : stdout.processor.1 rank: 4 : stdout.processor.4 rank: 0 : stdout.processor.0 rank: 5 : stdout.processor.5 rank: 6 : stdout.processor.6 rank: 7 : stdout.processor.7 rank: 3 : stdout.processor.3 rank: 2 : stdout.processor.2 rank: 1 : Beginning reading nnz... rank: 4 : Beginning reading nnz... rank: 0 : Beginning reading nnz... rank: 5 : Beginning reading nnz... rank: 7 : Beginning reading nnz... rank: 2 : Beginning reading nnz... rank: 3 : Beginning reading nnz... rank: 6 : Beginning reading nnz... rank: 5 : Finished reading nnz rank: 5 : Beginning mat preallocation... rank: 3 : Finished reading nnz rank: 3 : Beginning mat preallocation... rank: 4 : Finished reading nnz rank: 4 : Beginning mat preallocation... rank: 7 : Finished reading nnz rank: 7 : Beginning mat preallocation... rank: 1 : Finished reading nnz rank: 1 : Beginning mat preallocation... rank: 0 : Finished reading nnz rank: 0 : Beginning mat preallocation... rank: 2 : Finished reading nnz rank: 2 : Beginning mat preallocation... rank: 6 : Finished reading nnz rank: 6 : Beginning mat preallocation... rank: 5 : Finished preallocation rank: 5 : Beginning reading and setting matrix values... rank: 1 : Finished preallocation rank: 1 : Beginning reading and setting matrix values... rank: 7 : Finished preallocation rank: 7 : Beginning reading and setting matrix values... rank: 2 : Finished preallocation rank: 2 : Beginning reading and setting matrix values... 
rank: 4 : Finished preallocation rank: 4 : Beginning reading and setting matrix values... rank: 0 : Finished preallocation rank: 0 : Beginning reading and setting matrix values... rank: 3 : Finished preallocation rank: 3 : Beginning reading and setting matrix values... rank: 6 : Finished preallocation rank: 6 : Beginning reading and setting matrix values... rank: 1 : Finished reading and setting matrix values rank: 1 : Beginning mat assembly... rank: 5 : Finished reading and setting matrix values rank: 5 : Beginning mat assembly... rank: 4 : Finished reading and setting matrix values rank: 4 : Beginning mat assembly... rank: 2 : Finished reading and setting matrix values rank: 2 : Beginning mat assembly... rank: 3 : Finished reading and setting matrix values rank: 3 : Beginning mat assembly... rank: 7 : Finished reading and setting matrix values rank: 7 : Beginning mat assembly... rank: 6 : Finished reading and setting matrix values rank: 6 : Beginning mat assembly... rank: 0 : Finished reading and setting matrix values rank: 0 : Beginning mat assembly... rank: 1 : Finished mat assembly rank: 3 : Finished mat assembly rank: 7 : Finished mat assembly rank: 0 : Finished mat assembly rank: 5 : Finished mat assembly rank: 2 : Finished mat assembly rank: 4 : Finished mat assembly rank: 6 : Finished mat assembly --Junchao Zhang On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang wrote: > I will have a look and report back to you. Thanks. > --Junchao Zhang > > > On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia > wrote: > >> I have created a standalone test that demonstrates the problem at my end. >> I have stored the indices, etc. from my problem in a text file for each >> rank, which I use to initialize the matrix. >> Please note that the test is specifically for 8 ranks. >> >> The .tgz file is on my google drive: >> https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing >> >> >> This contains a README file with instructions on running. Please note >> that the work directory needs the index files. >> >> Please let me know if I can provide any further information. >> >> Thank you all for your help. >> >> Regards, >> Manav >> >> On Aug 20, 2020, at 12:54 PM, Jed Brown wrote: >> >> Matthew Knepley writes: >> >> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia >> wrote: >> >> >> >> On Aug 20, 2020, at 8:31 AM, Stefano Zampini >> wrote: >> >> Can you add a MPI_Barrier before >> >> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); >> >> >> With a MPI_Barrier before this function call: >> ? three of the processes have already hit this barrier, >> ? the other 5 are inside MatStashScatterGetMesg_Private -> >> MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 >> processes) >> >> >> This is not itself evidence of inconsistent state. You can use >> >> -build_twosided allreduce >> >> to avoid the nonblocking sparse algorithm. >> >> >> Okay, you should run this with -matstash_legacy just to make sure it is >> not >> a bug in your MPI implementation. But it looks like >> there is inconsistency in the parallel state. This can happen because we >> have a bug, or it could be that you called a collective >> operation on a subset of the processes. Is there any way you could cut >> down >> the example (say put all 1s in the matrix, etc) so >> that you could give it to us to run? >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From luis.saturday at gmail.com Thu Aug 20 23:31:20 2020 From: luis.saturday at gmail.com (Alex Fleeter) Date: Thu, 20 Aug 2020 21:31:20 -0700 Subject: [petsc-users] read argument from an XML file Message-ID: Hi: Does PETSc have or plan to enable reading arguments from an XML file? Something like the Teuchos package in Trilinos. Thanks, AF -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Aug 20 23:38:29 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 20 Aug 2020 22:38:29 -0600 Subject: [petsc-users] read argument from an XML file In-Reply-To: References: Message-ID: <877dtshauy.fsf@jedbrown.org> You can read from a YAML file with -options_file_yaml file.yaml or via https://www.mcs.anl.gov/petsc/petsc-master/docs/manualpages/Sys/PetscOptionsInsertFileYAML.html Would this work for you or do you really want XML? Alex Fleeter writes: > Hi: > > Does PETSc have or plan to enable reading arguments from an XML file? > Something like the Teuchos package in Trilinos. > > Thanks, > > AF From luis.saturday at gmail.com Thu Aug 20 23:44:06 2020 From: luis.saturday at gmail.com (Alex Fleeter) Date: Thu, 20 Aug 2020 21:44:06 -0700 Subject: [petsc-users] read argument from an XML file In-Reply-To: <877dtshauy.fsf@jedbrown.org> References: <877dtshauy.fsf@jedbrown.org> Message-ID: Thanks, we will try that. I have never used YAML before. Anyway, we feel using command line arguments is a bit old fashioned. It can be quite desirable to set parameters from a human-readable file. On Thu, Aug 20, 2020 at 9:38 PM Jed Brown wrote: > You can read from a YAML file with -options_file_yaml file.yaml or via > > > https://www.mcs.anl.gov/petsc/petsc-master/docs/manualpages/Sys/PetscOptionsInsertFileYAML.html > > Would this work for you or do you really want XML? > > Alex Fleeter writes: > > > Hi: > > > > Does PETSc have or plan to enable reading arguments from an XML file? > > Something like the Teuchos package in Trilinos. > > > > Thanks, > > > > AF > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 20 23:59:44 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 20 Aug 2020 23:59:44 -0500 Subject: [petsc-users] read argument from an XML file In-Reply-To: References: <877dtshauy.fsf@jedbrown.org> Message-ID: Alex, If you are ok with plan text you can also put all your "command line arguments" in a text file (using # for comment lines) like -ksp_rtol 1.e-3 # very tight tolerance for SNES -snes_rtol 1.e-12 and provide the file to PETSc via the command line argument :-) -options_file filename a file with a standard name petscrc in the running directory or PetscOptionsInsertFile() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscOptionsInsertFile.html Barry > On Aug 20, 2020, at 11:44 PM, Alex Fleeter wrote: > > Thanks, we will try that. I have never used YAML before. > > Anyway, we feel using command line arguments is a bit old fashioned. It can be quite desirable to set parameters from a human-readable file. > > On Thu, Aug 20, 2020 at 9:38 PM Jed Brown > wrote: > You can read from a YAML file with -options_file_yaml file.yaml or via > > https://www.mcs.anl.gov/petsc/petsc-master/docs/manualpages/Sys/PetscOptionsInsertFileYAML.html > > Would this work for you or do you really want XML? > > Alex Fleeter > writes: > > > Hi: > > > > Does PETSc have or plan to enable reading arguments from an XML file? > > Something like the Teuchos package in Trilinos. 
> > > > Thanks, > > > > AF -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Aug 21 00:17:28 2020 From: jed at jedbrown.org (Jed Brown) Date: Thu, 20 Aug 2020 23:17:28 -0600 Subject: [petsc-users] read argument from an XML file In-Reply-To: References: <877dtshauy.fsf@jedbrown.org> Message-ID: <87364gh91z.fsf@jedbrown.org> Alex Fleeter writes: > Thanks, we will try that. I have never used YAML before. It's meant to be more human-readable than XML, and is widely used these days. > Anyway, we feel using command line arguments is a bit old fashioned. It can > be quite desirable to set parameters from a human-readable file. My usual workflow is to build up comprehensive options by experimenting on the command line, then put them in a file (either using the basic format that Barry mentioned or YAML). From Laura-victoria.ROLANDI at isae-supaero.fr Fri Aug 21 04:56:50 2020 From: Laura-victoria.ROLANDI at isae-supaero.fr (ROLANDI Laura victoria) Date: Fri, 21 Aug 2020 11:56:50 +0200 Subject: [petsc-users] =?utf-8?q?=5BSLEPc=5D_Krylov_Schur-_saving_krylov_?= =?utf-8?q?subspace?= Message-ID: <1670-5f3f9a80-1ab-50f4c200@1565017> Dear SLEPc developers, I'm using the Krylov Schur EPS and I have a question regarding a command. Is there a way for having access and saving the krylov subspace during the EPSSolve call? I inizialize the solver using the function EPSSetInitialSpace(eps,1, v0), where v0 is a specific vector, but after 24 hours of calculation my job has to end even if the EPSSolve hasn't finished yet. Which function should I use for saving the computed Krylov subspace and its dimention n during the process, in order to restart the calculation from it by using EPSsetInitialSpace(eps,n, Krylov-Subspace)? Thank you very much, Victoria -- --- Laura victoria ROLANDI Doctorant - Doctorat ISAE-SUPAERO Doctorat 1 laura-victoria.rolandi at isae-supaero.fr https://www.isae-supaero.fr Institut Sup?rieur de l'A?ronautique et de l'Espace 10, avenue Edouard Belin - BP 54032 31055 Toulouse Cedex 4 France?? Suivez l'ISAE-SUPAERO sur les r?seaux sociaux / Follow the ISAE-SUPAERO on the social media Facebook Twitter LinkedIn Youtube Instagram -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Aug 21 05:42:47 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 21 Aug 2020 12:42:47 +0200 Subject: [petsc-users] [SLEPc] Krylov Schur- saving krylov subspace In-Reply-To: <1670-5f3f9a80-1ab-50f4c200@1565017> References: <1670-5f3f9a80-1ab-50f4c200@1565017> Message-ID: <34111C3C-F81A-41A4-8747-5602F8E7B348@dsic.upv.es> Why is your problem taking so long? Are you running in parallel? Is your computation doing a factorization of a matrix? Are you getting slow convergence? How many eigenvalues are you computing? Note that Krylov-Schur is not intended for computing a large percentage of eigenvalues, if you do so then you might get large overheads unless you tune the EPSSetDimensions() parameters (mpd). EPSSetInitialSpace() is intended to provide an initial guess, which in Krylov-Schur is a single vector, so in this case you would not pass the Krylov subspace from a previous run. A possible scheme for restarting is to save the eigenvectors computed so far, then pass them in the next run via EPSSetDeflationSpace() to avoid recomputing them. 
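A minimal sketch of that save/restart step might look as follows (the binary file name is arbitrary, nconv must also be carried over between the two runs, and eps, ierr and the operator A passed to EPSSetOperators are the ones you already have in your code):

/* end of first run: write the eigenvectors converged so far */
PetscInt    i, nconv;
Vec         xr;
PetscViewer viewer;

ierr = EPSGetConverged(eps,&nconv);CHKERRQ(ierr);
ierr = MatCreateVecs(A,&xr,NULL);CHKERRQ(ierr);
ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"evecs.bin",FILE_MODE_WRITE,&viewer);CHKERRQ(ierr);
for (i=0; i<nconv; i++) {
  ierr = EPSGetEigenpair(eps,i,NULL,NULL,xr,NULL);CHKERRQ(ierr);
  ierr = VecView(xr,viewer);CHKERRQ(ierr);
}
ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

/* next run: read them back and use them as deflation space */
Vec *ds;
ierr = PetscMalloc1(nconv,&ds);CHKERRQ(ierr);
ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"evecs.bin",FILE_MODE_READ,&viewer);CHKERRQ(ierr);
for (i=0; i<nconv; i++) {
  ierr = MatCreateVecs(A,&ds[i],NULL);CHKERRQ(ierr);
  ierr = VecLoad(ds[i],viewer);CHKERRQ(ierr);
}
ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
ierr = EPSSetDeflationSpace(eps,nconv,ds);CHKERRQ(ierr);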
You can use a custom stopping criterion as in https://slepc.upv.es/documentation/current/src/eps/tutorials/ex29.c.html to stop before the job is killed, then save the converged eigenvectors (or EPSGetInvariantSubspace() if the problem is nonsymmetric). Jose > El 21 ago 2020, a las 11:56, ROLANDI Laura victoria escribi?: > > Dear SLEPc developers, > > I'm using the Krylov Schur EPS and I have a question regarding a command. > > Is there a way for having access and saving the krylov subspace during the EPSSolve call? > > I inizialize the solver using the function EPSSetInitialSpace(eps,1, v0), where v0 is a specific vector, but after 24 hours of calculation my job has to end even if the EPSSolve hasn't finished yet. > Which function should I use for saving the computed Krylov subspace and its dimention n during the process, in order to restart the calculation from it by using EPSsetInitialSpace(eps,n, Krylov-Subspace)? > > Thank you very much, > Victoria From Laura-victoria.ROLANDI at isae-supaero.fr Fri Aug 21 07:34:38 2020 From: Laura-victoria.ROLANDI at isae-supaero.fr (ROLANDI Laura victoria) Date: Fri, 21 Aug 2020 14:34:38 +0200 Subject: [petsc-users] =?utf-8?b?Pz09P3V0Zi04P3E/ID89PT91dGYtOD9xPyBbU0xF?= =?utf-8?q?Pc=5D__Krylov_Schur-_saving_krylov_subspace?= In-Reply-To: <34111C3C-F81A-41A4-8747-5602F8E7B348@dsic.upv.es> Message-ID: <1670-5f3fbf80-27d-50f4c200@91787799> Thank you for your quick response. Yes, i'm running in parallel, I'm just asking for 2 eigenvalues and I'm not doing any factorization. My problem is taking so long because I have implemented the time stepping-exponential transformation: my MatMult() function for computing vectors of the Krylov subspace calls the Direct Numerical Simulation code for compressible Navier-Stokes equations to which I'm linking the stability code. Therefore, each MatMult() call takes very long, and I cannot save the converged eigenvectors for the restart beacause there won't be any converged eigenvectors yet when the job is killed. That's why I thought that the only thing I could save was the computed krylov subspace. Victoria Il giorno Venerdi, Agosto 21, 2020 12:42 CEST, "Jose E. Roman" ha scritto: ?Why is your problem taking so long? Are you running in parallel? Is your computation doing a factorization of a matrix? Are you getting slow convergence? How many eigenvalues are you computing? Note that Krylov-Schur is not intended for computing a large percentage of eigenvalues, if you do so then you might get large overheads unless you tune the EPSSetDimensions() parameters (mpd). EPSSetInitialSpace() is intended to provide an initial guess, which in Krylov-Schur is a single vector, so in this case you would not pass the Krylov subspace from a previous run. A possible scheme for restarting is to save the eigenvectors computed so far, then pass them in the next run via EPSSetDeflationSpace() to avoid recomputing them. You can use a custom stopping criterion as in https://slepc.upv.es/documentation/current/src/eps/tutorials/ex29.c.html to stop before the job is killed, then save the converged eigenvectors (or EPSGetInvariantSubspace() if the problem is nonsymmetric). Jose > El 21 ago 2020, a las 11:56, ROLANDI Laura victoria escribi?: > > Dear SLEPc developers, > > I'm using the Krylov Schur EPS and I have a question regarding a command. > > Is there a way for having access and saving the krylov subspace during the EPSSolve call? 
> > I inizialize the solver using the function EPSSetInitialSpace(eps,1, v0), where v0 is a specific vector, but after 24 hours of calculation my job has to end even if the EPSSolve hasn't finished yet. > Which function should I use for saving the computed Krylov subspace and its dimention n during the process, in order to restart the calculation from it by using EPSsetInitialSpace(eps,n, Krylov-Subspace)? > > Thank you very much, > Victoria ? -- --- Laura victoria ROLANDI Doctorant - Doctorat ISAE-SUPAERO Doctorat 1 laura-victoria.rolandi at isae-supaero.fr https://www.isae-supaero.fr Institut Sup?rieur de l'A?ronautique et de l'Espace 10, avenue Edouard Belin - BP 54032 31055 Toulouse Cedex 4 France?? Suivez l'ISAE-SUPAERO sur les r?seaux sociaux / Follow the ISAE-SUPAERO on the social media Facebook Twitter LinkedIn Youtube Instagram -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Fri Aug 21 07:55:52 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Fri, 21 Aug 2020 14:55:52 +0200 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: <87mu2pgtdp.fsf@jedbrown.org> References: <87mu2pgtdp.fsf@jedbrown.org> Message-ID: Hi, Thanks Matthew and Jed for your input. I indeed envision an implicit solver in the sense Jed mentioned - Jiri Blazek's book is a nice intro to this concept. Matthew, I do not know exactly what to change right now because although I understand globally what the DMPlexComputeXXXX_Internal methods do, I cannot say for sure line by line what is happening. In a structured code, I have a an implicit FVM solver with PETSc but I do not use any of the FV structure, not even a DM - I just use C arrays that I transform to PETSc Vec and Mat and build my IJacobian and my preconditioner and gives all that to a TS and it runs. I cannot figure out how to do it with the FV and the DM and all the underlying "shortcuts" that I want to use. Here is the top method for the structured code : int total_size = context.npoints * solver->nvars ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); CHKERRQ(ierr); SNES snes; KSP ksp; PC pc; SNESType snestype; ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); flag_mat_a = 1; ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); context.jfnk_eps = 1e-7; ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps,NULL); CHKERRQ(ierr); ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void)) PetscJacobianFunction_JFNK); CHKERRQ(ierr); ierr = MatSetUp(A); CHKERRQ(ierr); context.flag_use_precon = 0; ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool*)(& context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); /* Set up preconditioner matrix */ flag_mat_b = 1; ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, PETSC_DETERMINE, (solver->ndims*2+1)*solver->nvars,NULL, 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); ierr = MatSetBlockSize(B,solver->nvars); /* Set the RHSJacobian function for TS */ ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); Thibault Bridel-Bertomeu ? Eng, MSc, PhD Research Engineer CEA/CESTA 33114 LE BARP Tel.: (+33)557046924 Mob.: (+33)611025322 Mail: thibault.bridelbertomeu at gmail.com Le jeu. 20 ao?t 2020 ? 
18:43, Jed Brown a ?crit : > Matthew Knepley writes: > > > I could never get the FVM stuff to make sense to me for implicit methods. > > Here is my problem understanding. If you have an FVM method, it decides > > to move "stuff" from one cell to its neighboring cells depending on the > > solution to the Riemann problem on each face, which computed the flux. > This > > is > > fine unless the timestep is so big that material can flow through into > the > > cells beyond the neighbor. Then I should have considered the effect of > the > > Riemann problem for those interfaces. That would be in the Jacobian, but > I > > don't know how to compute that Jacobian. I guess you could do everything > > matrix-free, but without a preconditioner it seems hard. > > So long as we're using method of lines, the flux is just instantaneous > flux, not integrated over some time step. It has the same meaning for > implicit and explicit. > > An explicit method would be unstable if you took such a large time step > (CFL) and an implicit method will not simultaneously be SSP and higher than > first order, but it's still a consistent discretization of the problem. > > It's common (done in FUN3D and others) to precondition with a first-order > method, where gradient reconstruction/limiting is skipped. That's what I'd > recommend because limiting creates nasty nonlinearities and the resulting > discretizations lack h-ellipticity which makes them very hard to solve. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 21 08:09:35 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 21 Aug 2020 09:09:35 -0400 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> Message-ID: On Fri, Aug 21, 2020 at 8:56 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Hi, > > Thanks Matthew and Jed for your input. > I indeed envision an implicit solver in the sense Jed mentioned - Jiri > Blazek's book is a nice intro to this concept. > Unfortunately, I still do not really understand. I have a step in the FV code where I update the state based on the fluxes. If you violate CFD, this step is completely wrong. > Matthew, I do not know exactly what to change right now because although I > understand globally what the DMPlexComputeXXXX_Internal methods do, I > cannot say for sure line by line what is happening. > In a structured code, I have a an implicit FVM solver with PETSc but I do > not use any of the FV structure, not even a DM - I just use C arrays that I > transform to PETSc Vec and Mat and build my IJacobian and my preconditioner > and gives all that to a TS and it runs. I cannot figure out how to do it > with the FV and the DM and all the underlying "shortcuts" that I want to > use. > I can explain what I am doing, and maybe we can either change it to what you want, or allow you to use the pieces in your code to make things simpler. 
Here is the comment from DMPlexComputeResidual_Internal(): /* FVM */ /* Get geometric data */ /* If using gradients */ /* Compute gradient data */ /* Loop over domain faces */ /* Count computational faces */ /* Reconstruct cell gradient */ /* Loop over domain cells */ /* Limit cell gradients */ /* Handle boundary values */ /* Loop over domain faces */ /* Read out field, centroid, normal, volume for each side of face */ /* Riemann solve over faces */ /* Loop over domain faces */ /* Accumulate fluxes to cells */ Obviously you might not need all the steps, and the last step is the one I cannot understand for your case. Thanks, Matt > Here is the top method for the structured code : > > int total_size = context.npoints * solver->nvars > ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); > CHKERRQ(ierr); > SNES snes; > KSP ksp; > PC pc; > SNESType snestype; > ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); > ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); > > flag_mat_a = 1; > ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE > , > PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); > context.jfnk_eps = 1e-7; > ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps, > NULL); CHKERRQ(ierr); > ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void)) > PetscJacobianFunction_JFNK); CHKERRQ(ierr); > ierr = MatSetUp(A); CHKERRQ(ierr); > > context.flag_use_precon = 0; > ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool*)(& > context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); > > /* Set up preconditioner matrix */ > flag_mat_b = 1; > ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, > PETSC_DETERMINE, > (solver->ndims*2+1)*solver->nvars,NULL, > 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); > ierr = MatSetBlockSize(B,solver->nvars); > /* Set the RHSJacobian function for TS */ > ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); > > Thibault Bridel-Bertomeu > ? > Eng, MSc, PhD > Research Engineer > CEA/CESTA > 33114 LE BARP > Tel.: (+33)557046924 > Mob.: (+33)611025322 > Mail: thibault.bridelbertomeu at gmail.com > > > Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown a ?crit : > >> Matthew Knepley writes: >> >> > I could never get the FVM stuff to make sense to me for implicit >> methods. >> > Here is my problem understanding. If you have an FVM method, it decides >> > to move "stuff" from one cell to its neighboring cells depending on the >> > solution to the Riemann problem on each face, which computed the flux. >> This >> > is >> > fine unless the timestep is so big that material can flow through into >> the >> > cells beyond the neighbor. Then I should have considered the effect of >> the >> > Riemann problem for those interfaces. That would be in the Jacobian, >> but I >> > don't know how to compute that Jacobian. I guess you could do everything >> > matrix-free, but without a preconditioner it seems hard. >> >> So long as we're using method of lines, the flux is just instantaneous >> flux, not integrated over some time step. It has the same meaning for >> implicit and explicit. >> >> An explicit method would be unstable if you took such a large time step >> (CFL) and an implicit method will not simultaneously be SSP and higher than >> first order, but it's still a consistent discretization of the problem. >> >> It's common (done in FUN3D and others) to precondition with a first-order >> method, where gradient reconstruction/limiting is skipped. 
That's what I'd >> recommend because limiting creates nasty nonlinearities and the resulting >> discretizations lack h-ellipticity which makes them very hard to solve. >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Fri Aug 21 08:10:00 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Fri, 21 Aug 2020 15:10:00 +0200 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> Message-ID: Sorry, I sent too soon, I hit the wrong key. I wanted to say that context.npoints is the local number of cells. PetscRHSFunctionImpl allows to generate the hyperbolic part of the right hand side. Then we have : PetscErrorCode PetscIJacobian( TS ts, /*!< Time stepping object (see PETSc TS)*/ PetscReal t, /*!< Current time */ Vec Y, /*!< Solution vector */ Vec Ydot, /*!< Time-derivative of solution vector */ PetscReal a, /*!< Shift */ Mat A, /*!< Jacobian matrix */ Mat B, /*!< Preconditioning matrix */ void *ctxt /*!< Application context */ ) { PETScContext *context = (PETScContext*) ctxt; HyPar *solver = context->solver; _DECLARE_IERR_; PetscFunctionBegin; solver->count_IJacobian++; context->shift = a; context->waqt = t; /* Construct preconditioning matrix */ if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y,context); CHECKERR(ierr); } PetscFunctionReturn(0); } and PetscJacobianFunction_JFNK which I bind to the matrix shell, computes the action of the jacobian on a vector : say U0 is the state of reference and Y the vector upon which to apply the JFNK method, then the PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift comes from the TS). The preconditioning matrix I compute as an approximation to the actual jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in each cell, a 4x4 matrix that is known exactly for the system of equations I am solving, i.e. Euler equations. For the structured grid, I can loop on the cells and do that 'Derivative' thing at first order by simply taking a finite-difference like approximation with the neighboring cells, Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by block (JFunction is the dF/dU) /* diagonal element */ for (v=0; vJFunction (values,(u+nvars*p),solver->physics ,dir,0); _ArrayScale1D_ (values,(dxinv*iblank),(nvars*nvars)); ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); /* left neighbor */ if (pgL >= 0) { for (v=0; vJFunction (values,(u+nvars*pL),solver->physics ,dir,1); _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); } /* right neighbor */ if (pgR >= 0) { for (v=0; vJFunction (values,(u+nvars*pR),solver->physics ,dir,-1); _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); } I do not know if I am clear here ... Anyways, I am trying to figure out how to do this shell matrix and this preconditioner using all the FV and DMPlex artillery. Le ven. 21 ao?t 2020 ? 
14:55, Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> a ?crit : > Hi, > > Thanks Matthew and Jed for your input. > I indeed envision an implicit solver in the sense Jed mentioned - Jiri > Blazek's book is a nice intro to this concept. > > Matthew, I do not know exactly what to change right now because although I > understand globally what the DMPlexComputeXXXX_Internal methods do, I > cannot say for sure line by line what is happening. > In a structured code, I have a an implicit FVM solver with PETSc but I do > not use any of the FV structure, not even a DM - I just use C arrays that I > transform to PETSc Vec and Mat and build my IJacobian and my preconditioner > and gives all that to a TS and it runs. I cannot figure out how to do it > with the FV and the DM and all the underlying "shortcuts" that I want to > use. > > Here is the top method for the structured code : > > int total_size = context.npoints * solver->nvars > ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); > CHKERRQ(ierr); > SNES snes; > KSP ksp; > PC pc; > SNESType snestype; > ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); > ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); > > flag_mat_a = 1; > ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE > , > PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); > context.jfnk_eps = 1e-7; > ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps, > NULL); CHKERRQ(ierr); > ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void)) > PetscJacobianFunction_JFNK); CHKERRQ(ierr); > ierr = MatSetUp(A); CHKERRQ(ierr); > > context.flag_use_precon = 0; > ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool*)(& > context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); > > /* Set up preconditioner matrix */ > flag_mat_b = 1; > ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, > PETSC_DETERMINE, > (solver->ndims*2+1)*solver->nvars,NULL, > 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); > ierr = MatSetBlockSize(B,solver->nvars); > /* Set the RHSJacobian function for TS */ > ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); > > Thibault Bridel-Bertomeu > ? > Eng, MSc, PhD > Research Engineer > CEA/CESTA > 33114 LE BARP > Tel.: (+33)557046924 > Mob.: (+33)611025322 > Mail: thibault.bridelbertomeu at gmail.com > > > Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown a ?crit : > >> Matthew Knepley writes: >> >> > I could never get the FVM stuff to make sense to me for implicit >> methods. >> > Here is my problem understanding. If you have an FVM method, it decides >> > to move "stuff" from one cell to its neighboring cells depending on the >> > solution to the Riemann problem on each face, which computed the flux. >> This >> > is >> > fine unless the timestep is so big that material can flow through into >> the >> > cells beyond the neighbor. Then I should have considered the effect of >> the >> > Riemann problem for those interfaces. That would be in the Jacobian, >> but I >> > don't know how to compute that Jacobian. I guess you could do everything >> > matrix-free, but without a preconditioner it seems hard. >> >> So long as we're using method of lines, the flux is just instantaneous >> flux, not integrated over some time step. It has the same meaning for >> implicit and explicit. 
>> >> An explicit method would be unstable if you took such a large time step >> (CFL) and an implicit method will not simultaneously be SSP and higher than >> first order, but it's still a consistent discretization of the problem. >> >> It's common (done in FUN3D and others) to precondition with a first-order >> method, where gradient reconstruction/limiting is skipped. That's what I'd >> recommend because limiting creates nasty nonlinearities and the resulting >> discretizations lack h-ellipticity which makes them very hard to solve. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Fri Aug 21 08:21:07 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Fri, 21 Aug 2020 15:21:07 +0200 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> Message-ID: Thanks Matthew for your very fast answer. Sorry I sent another mail at the same time my first one wasn't complete. Le ven. 21 ao?t 2020 ? 15:09, Matthew Knepley a ?crit : > On Fri, Aug 21, 2020 at 8:56 AM Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > >> Hi, >> >> Thanks Matthew and Jed for your input. >> I indeed envision an implicit solver in the sense Jed mentioned - Jiri >> Blazek's book is a nice intro to this concept. >> > > Unfortunately, I still do not really understand. I have a step in the FV > code where I update the state based on the fluxes. If you > violate CFD, this step is completely wrong. > > >> Matthew, I do not know exactly what to change right now because although >> I understand globally what the DMPlexComputeXXXX_Internal methods do, I >> cannot say for sure line by line what is happening. >> In a structured code, I have a an implicit FVM solver with PETSc but I do >> not use any of the FV structure, not even a DM - I just use C arrays that I >> transform to PETSc Vec and Mat and build my IJacobian and my preconditioner >> and gives all that to a TS and it runs. I cannot figure out how to do it >> with the FV and the DM and all the underlying "shortcuts" that I want to >> use. >> > > I can explain what I am doing, and maybe we can either change it to what > you want, or allow you to use the pieces in your code to make things > simpler. > Here is the comment from DMPlexComputeResidual_Internal(): > > /* FVM */ > /* Get geometric data */ > /* If using gradients */ > /* Compute gradient data */ > /* Loop over domain faces */ > /* Count computational faces */ > /* Reconstruct cell gradient */ > /* Loop over domain cells */ > /* Limit cell gradients */ > /* Handle boundary values */ > /* Loop over domain faces */ > /* Read out field, centroid, normal, volume for each side of face */ > /* Riemann solve over faces */ > /* Loop over domain faces */ > /* Accumulate fluxes to cells */ > Yes I saw the comments at the end of the method and they helped to understand. Indeed at the end of this, you have the RHS of your system in the dU/dt = RHS sense. To be accurate it is the RHS at time t^n, like if it were a first order Euler approximation we would write U^(n+1) - U^n = dt * RHS^n. Only with an implicit solver we want to write (U^(n+1) - U^n )/dt = RHS^(n+1) as you know. 
Since we cannot really evaluate RHS^(n+1) we like to linearize it about RHS^n and we write RHS^(n+1) = RHS^n + [dRHS/dU]^n * (U^(n+1) - U^n) All in all if we substitute we obtain something like [alpha * Identity - [dRHS/dU]^n](U^(n+1) - U^n) = RHS^n where alpha depends on the time step. This left hand side matrix is what I would like to compute before advancing the system in time, whereas the RHS^n is what DMPlexComputeResidual_Internal gives. Then there is the story of preconditioning this system to make the whole thing more stable but if I already could figure out how to generate that LHS it would be a great step in the right direction. Thanks again a lot ! Thibault > Obviously you might not need all the steps, and the last step is the one I > cannot understand for your case. > > Thanks, > > Matt > > >> Here is the top method for the structured code : >> >> int total_size = context.npoints * solver->nvars >> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); >> CHKERRQ(ierr); >> SNES snes; >> KSP ksp; >> PC pc; >> SNESType snestype; >> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >> >> flag_mat_a = 1; >> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size, >> PETSC_DETERMINE, >> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >> context.jfnk_eps = 1e-7; >> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps, >> NULL); CHKERRQ(ierr); >> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void)) >> PetscJacobianFunction_JFNK); CHKERRQ(ierr); >> ierr = MatSetUp(A); CHKERRQ(ierr); >> >> context.flag_use_precon = 0; >> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool >> *)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >> >> /* Set up preconditioner matrix */ >> flag_mat_b = 1; >> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, >> PETSC_DETERMINE, >> (solver->ndims*2+1)*solver->nvars,NULL, >> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >> ierr = MatSetBlockSize(B,solver->nvars); >> /* Set the RHSJacobian function for TS */ >> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); >> >> Thibault Bridel-Bertomeu >> ? >> Eng, MSc, PhD >> Research Engineer >> CEA/CESTA >> 33114 LE BARP >> Tel.: (+33)557046924 >> Mob.: (+33)611025322 >> Mail: thibault.bridelbertomeu at gmail.com >> >> >> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown a ?crit : >> >>> Matthew Knepley writes: >>> >>> > I could never get the FVM stuff to make sense to me for implicit >>> methods. >>> > Here is my problem understanding. If you have an FVM method, it decides >>> > to move "stuff" from one cell to its neighboring cells depending on the >>> > solution to the Riemann problem on each face, which computed the flux. >>> This >>> > is >>> > fine unless the timestep is so big that material can flow through into >>> the >>> > cells beyond the neighbor. Then I should have considered the effect of >>> the >>> > Riemann problem for those interfaces. That would be in the Jacobian, >>> but I >>> > don't know how to compute that Jacobian. I guess you could do >>> everything >>> > matrix-free, but without a preconditioner it seems hard. >>> >>> So long as we're using method of lines, the flux is just instantaneous >>> flux, not integrated over some time step. It has the same meaning for >>> implicit and explicit. 
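For reference, the linearization described earlier in this thread can be restated compactly (a LaTeX form of the relations already written out above; alpha = 1/dt holds for backward Euler, other implicit schemes give a different shift):

\frac{U^{n+1} - U^{n}}{\Delta t} = \mathrm{RHS}(U^{n+1})
  \approx \mathrm{RHS}(U^{n}) + \left.\frac{\partial\,\mathrm{RHS}}{\partial U}\right|^{n}\left(U^{n+1} - U^{n}\right),

\left[\,\alpha I - \left.\frac{\partial\,\mathrm{RHS}}{\partial U}\right|^{n}\,\right]\left(U^{n+1} - U^{n}\right) = \mathrm{RHS}(U^{n}),
  \qquad \alpha = \frac{1}{\Delta t}.

The bracketed operator on the left is what is applied matrix-free in the discussion above, and the first-order approximation of dRHS/dU mentioned in this thread is what goes into the preconditioning matrix.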
>>> >>> An explicit method would be unstable if you took such a large time step >>> (CFL) and an implicit method will not simultaneously be SSP and higher than >>> first order, but it's still a consistent discretization of the problem. >>> >>> It's common (done in FUN3D and others) to precondition with a >>> first-order method, where gradient reconstruction/limiting is skipped. >>> That's what I'd recommend because limiting creates nasty nonlinearities and >>> the resulting discretizations lack h-ellipticity which makes them very hard >>> to solve. >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 21 08:23:31 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 21 Aug 2020 09:23:31 -0400 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> Message-ID: On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Sorry, I sent too soon, I hit the wrong key. > > I wanted to say that context.npoints is the local number of cells. > > PetscRHSFunctionImpl allows to generate the hyperbolic part of the right > hand side. > Then we have : > > PetscErrorCode PetscIJacobian( > TS ts, /*!< Time stepping object (see PETSc TS)*/ > PetscReal t, /*!< Current time */ > Vec Y, /*!< Solution vector */ > Vec Ydot, /*!< Time-derivative of solution vector */ > PetscReal a, /*!< Shift */ > Mat A, /*!< Jacobian matrix */ > Mat B, /*!< Preconditioning matrix */ > void *ctxt /*!< Application context */ > ) > { > PETScContext *context = (PETScContext*) ctxt; > HyPar *solver = context->solver; > _DECLARE_IERR_; > > PetscFunctionBegin; > solver->count_IJacobian++; > context->shift = a; > context->waqt = t; > /* Construct preconditioning matrix */ > if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y,context); > CHECKERR(ierr); } > > PetscFunctionReturn(0); > } > > and PetscJacobianFunction_JFNK which I bind to the matrix shell, computes > the action of the jacobian on a vector : say U0 is the state of reference > and Y the vector upon which to apply the JFNK method, then the > PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + > epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift > comes from the TS). > The preconditioning matrix I compute as an approximation to the actual > jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in > each cell, a 4x4 matrix that is known exactly for the system of equations I > am solving, i.e. Euler equations. 
For the structured grid, I can loop on > the cells and do that 'Derivative' thing at first order by simply taking a > finite-difference like approximation with the neighboring cells, > Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by > block (JFunction is the dF/dU) > > /* diagonal element */ > for (v=0; v v; } > ierr = solver->JFunction > > (values,(u+nvars*p),solver->physics > > ,dir,0); > _ArrayScale1D_ > > (values,(dxinv*iblank),(nvars*nvars)); > ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); > CHKERRQ(ierr); > > /* left neighbor */ > if (pgL >= 0) { > for (v=0; v + v; } > ierr = solver->JFunction > > (values,(u+nvars*pL),solver->physics > > ,dir,1); > _ArrayScale1D_ > > (values,(-dxinv*iblank),(nvars*nvars)); > ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); > CHKERRQ(ierr); > } > > /* right neighbor */ > if (pgR >= 0) { > for (v=0; v + v; } > ierr = solver->JFunction > > (values,(u+nvars*pR),solver->physics > > ,dir,-1); > _ArrayScale1D_ > > (values,(-dxinv*iblank),(nvars*nvars)); > ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); > CHKERRQ(ierr); > } > > > > I do not know if I am clear here ... > Anyways, I am trying to figure out how to do this shell matrix and this > preconditioner using all the FV and DMPlex artillery. > Okay, that is very clear. We should be able to get the JFNK just with -snes_mf_operator, and put the approximate J construction in DMPlexComputeJacobian_Internal(). There is an FV section already, and we could just add this. I would need to understand those entries in the pointwise Riemann sense that the other stuff is now. Thanks, Matt > Le ven. 21 ao?t 2020 ? 14:55, Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> a ?crit : > >> Hi, >> >> Thanks Matthew and Jed for your input. >> I indeed envision an implicit solver in the sense Jed mentioned - Jiri >> Blazek's book is a nice intro to this concept. >> >> Matthew, I do not know exactly what to change right now because although >> I understand globally what the DMPlexComputeXXXX_Internal methods do, I >> cannot say for sure line by line what is happening. >> In a structured code, I have a an implicit FVM solver with PETSc but I do >> not use any of the FV structure, not even a DM - I just use C arrays that I >> transform to PETSc Vec and Mat and build my IJacobian and my preconditioner >> and gives all that to a TS and it runs. I cannot figure out how to do it >> with the FV and the DM and all the underlying "shortcuts" that I want to >> use. 
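At run time, the -snes_mf_operator route mentioned above corresponds to an option set along these lines (the preconditioner choice here is only an illustrative assumption, not something prescribed in the thread):

-ts_type beuler -snes_mf_operator -pc_type bjacobi -sub_pc_type ilu

With -snes_mf_operator the Jacobian-vector products are obtained by finite differencing the residual, while the matrix supplied as the preconditioning matrix (the first-order approximation discussed above) is the one the preconditioner is actually built from.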
>> >> Here is the top method for the structured code : >> >> int total_size = context.npoints * solver->nvars >> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); >> CHKERRQ(ierr); >> SNES snes; >> KSP ksp; >> PC pc; >> SNESType snestype; >> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >> >> flag_mat_a = 1; >> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size, >> PETSC_DETERMINE, >> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >> context.jfnk_eps = 1e-7; >> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps, >> NULL); CHKERRQ(ierr); >> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void)) >> PetscJacobianFunction_JFNK); CHKERRQ(ierr); >> ierr = MatSetUp(A); CHKERRQ(ierr); >> >> context.flag_use_precon = 0; >> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool >> *)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >> >> /* Set up preconditioner matrix */ >> flag_mat_b = 1; >> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, >> PETSC_DETERMINE, >> (solver->ndims*2+1)*solver->nvars,NULL, >> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >> ierr = MatSetBlockSize(B,solver->nvars); >> /* Set the RHSJacobian function for TS */ >> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); >> >> Thibault Bridel-Bertomeu >> ? >> Eng, MSc, PhD >> Research Engineer >> CEA/CESTA >> 33114 LE BARP >> Tel.: (+33)557046924 >> Mob.: (+33)611025322 >> Mail: thibault.bridelbertomeu at gmail.com >> >> >> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown a ?crit : >> >>> Matthew Knepley writes: >>> >>> > I could never get the FVM stuff to make sense to me for implicit >>> methods. >>> > Here is my problem understanding. If you have an FVM method, it decides >>> > to move "stuff" from one cell to its neighboring cells depending on the >>> > solution to the Riemann problem on each face, which computed the flux. >>> This >>> > is >>> > fine unless the timestep is so big that material can flow through into >>> the >>> > cells beyond the neighbor. Then I should have considered the effect of >>> the >>> > Riemann problem for those interfaces. That would be in the Jacobian, >>> but I >>> > don't know how to compute that Jacobian. I guess you could do >>> everything >>> > matrix-free, but without a preconditioner it seems hard. >>> >>> So long as we're using method of lines, the flux is just instantaneous >>> flux, not integrated over some time step. It has the same meaning for >>> implicit and explicit. >>> >>> An explicit method would be unstable if you took such a large time step >>> (CFL) and an implicit method will not simultaneously be SSP and higher than >>> first order, but it's still a consistent discretization of the problem. >>> >>> It's common (done in FUN3D and others) to precondition with a >>> first-order method, where gradient reconstruction/limiting is skipped. >>> That's what I'd recommend because limiting creates nasty nonlinearities and >>> the resulting discretizations lack h-ellipticity which makes them very hard >>> to solve. >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thibault.bridelbertomeu at gmail.com Fri Aug 21 08:35:45 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Fri, 21 Aug 2020 15:35:45 +0200 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> Message-ID: Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley a ?crit : > On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > >> Sorry, I sent too soon, I hit the wrong key. >> >> I wanted to say that context.npoints is the local number of cells. >> >> PetscRHSFunctionImpl allows to generate the hyperbolic part of the right >> hand side. >> Then we have : >> >> PetscErrorCode PetscIJacobian( >> TS ts, /*!< Time stepping object (see PETSc TS)*/ >> PetscReal t, /*!< Current time */ >> Vec Y, /*!< Solution vector */ >> Vec Ydot, /*!< Time-derivative of solution vector */ >> PetscReal a, /*!< Shift */ >> Mat A, /*!< Jacobian matrix */ >> Mat B, /*!< Preconditioning matrix */ >> void *ctxt /*!< Application context */ >> ) >> { >> PETScContext *context = (PETScContext*) ctxt; >> HyPar *solver = context->solver; >> _DECLARE_IERR_; >> >> PetscFunctionBegin; >> solver->count_IJacobian++; >> context->shift = a; >> context->waqt = t; >> /* Construct preconditioning matrix */ >> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y, >> context); CHECKERR(ierr); } >> >> PetscFunctionReturn(0); >> } >> >> and PetscJacobianFunction_JFNK which I bind to the matrix shell, >> computes the action of the jacobian on a vector : say U0 is the state of >> reference and Y the vector upon which to apply the JFNK method, then the >> PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + >> epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift >> comes from the TS). >> The preconditioning matrix I compute as an approximation to the actual >> jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in >> each cell, a 4x4 matrix that is known exactly for the system of equations I >> am solving, i.e. Euler equations. For the structured grid, I can loop on >> the cells and do that 'Derivative' thing at first order by simply taking a >> finite-difference like approximation with the neighboring cells, >> Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by >> block (JFunction is the dF/dU) >> >> /* diagonal element */ >> for (v=0; v> v; } >> ierr = solver->JFunction >> >> (values,(u+nvars*p),solver->physics >> >> ,dir,0); >> _ArrayScale1D_ >> >> (values,(dxinv*iblank),(nvars*nvars)); >> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); >> CHKERRQ(ierr); >> >> /* left neighbor */ >> if (pgL >= 0) { >> for (v=0; v> + v; } >> ierr = solver->JFunction >> >> (values,(u+nvars*pL),solver->physics >> >> ,dir,1); >> _ArrayScale1D_ >> >> (values,(-dxinv*iblank),(nvars*nvars)); >> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); >> CHKERRQ(ierr); >> } >> >> /* right neighbor */ >> if (pgR >= 0) { >> for (v=0; v> + v; } >> ierr = solver->JFunction >> >> (values,(u+nvars*pR),solver->physics >> >> ,dir,-1); >> _ArrayScale1D_ >> >> (values,(-dxinv*iblank),(nvars*nvars)); >> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); >> CHKERRQ(ierr); >> } >> >> >> >> I do not know if I am clear here ... >> Anyways, I am trying to figure out how to do this shell matrix and this >> preconditioner using all the FV and DMPlex artillery. >> > > Okay, that is very clear. 
We should be able to get the JFNK just with > -snes_mf_operator, and put the approximate J construction in > DMPlexComputeJacobian_Internal(). > There is an FV section already, and we could just add this. I would need > to understand those entries in the pointwise Riemann sense that the other > stuff is now. > Ok i had a quick look and if I understood correctly it would do the job. Setting the snes-mf-operator flag would mean however that we have to go through SNESSetJacobian to set the jacobian and the preconditioning matrix wouldn't it ? There might be calls to the Riemann solver to evaluate the dRHS / dU part yes but maybe it's possible to re-use what was computed for the RHS^n ? In the FV section the jacobian is set to identity which I missed before, but it could explain why when I used the following : TSSetType(ts, TSBEULER); DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); with my FV discretization nothing happened, right ? Thank you, Thibault Thanks, > > Matt > > >> Le ven. 21 ao?t 2020 ? 14:55, Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> a ?crit : >> >>> Hi, >>> >>> Thanks Matthew and Jed for your input. >>> I indeed envision an implicit solver in the sense Jed mentioned - Jiri >>> Blazek's book is a nice intro to this concept. >>> >>> Matthew, I do not know exactly what to change right now because although >>> I understand globally what the DMPlexComputeXXXX_Internal methods do, I >>> cannot say for sure line by line what is happening. >>> In a structured code, I have a an implicit FVM solver with PETSc but I >>> do not use any of the FV structure, not even a DM - I just use C arrays >>> that I transform to PETSc Vec and Mat and build my IJacobian and my >>> preconditioner and gives all that to a TS and it runs. I cannot figure out >>> how to do it with the FV and the DM and all the underlying "shortcuts" that >>> I want to use. >>> >>> Here is the top method for the structured code : >>> >>> int total_size = context.npoints * solver->nvars >>> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); >>> CHKERRQ(ierr); >>> SNES snes; >>> KSP ksp; >>> PC pc; >>> SNESType snestype; >>> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >>> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >>> >>> flag_mat_a = 1; >>> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size, >>> PETSC_DETERMINE, >>> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >>> context.jfnk_eps = 1e-7; >>> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps, >>> NULL); CHKERRQ(ierr); >>> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void)) >>> PetscJacobianFunction_JFNK); CHKERRQ(ierr); >>> ierr = MatSetUp(A); CHKERRQ(ierr); >>> >>> context.flag_use_precon = 0; >>> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool >>> *)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >>> >>> /* Set up preconditioner matrix */ >>> flag_mat_b = 1; >>> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE >>> ,PETSC_DETERMINE, >>> (solver->ndims*2+1)*solver->nvars,NULL, >>> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >>> ierr = MatSetBlockSize(B,solver->nvars); >>> /* Set the RHSJacobian function for TS */ >>> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); >>> >>> Thibault Bridel-Bertomeu >>> ? 
>>> Eng, MSc, PhD >>> Research Engineer >>> CEA/CESTA >>> 33114 LE BARP >>> Tel.: (+33)557046924 >>> Mob.: (+33)611025322 >>> Mail: thibault.bridelbertomeu at gmail.com >>> >>> >>> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown a ?crit : >>> >>>> Matthew Knepley writes: >>>> >>>> > I could never get the FVM stuff to make sense to me for implicit >>>> methods. >>>> > Here is my problem understanding. If you have an FVM method, it >>>> decides >>>> > to move "stuff" from one cell to its neighboring cells depending on >>>> the >>>> > solution to the Riemann problem on each face, which computed the >>>> flux. This >>>> > is >>>> > fine unless the timestep is so big that material can flow through >>>> into the >>>> > cells beyond the neighbor. Then I should have considered the effect >>>> of the >>>> > Riemann problem for those interfaces. That would be in the Jacobian, >>>> but I >>>> > don't know how to compute that Jacobian. I guess you could do >>>> everything >>>> > matrix-free, but without a preconditioner it seems hard. >>>> >>>> So long as we're using method of lines, the flux is just instantaneous >>>> flux, not integrated over some time step. It has the same meaning for >>>> implicit and explicit. >>>> >>>> An explicit method would be unstable if you took such a large time step >>>> (CFL) and an implicit method will not simultaneously be SSP and higher than >>>> first order, but it's still a consistent discretization of the problem. >>>> >>>> It's common (done in FUN3D and others) to precondition with a >>>> first-order method, where gradient reconstruction/limiting is skipped. >>>> That's what I'd recommend because limiting creates nasty nonlinearities and >>>> the resulting discretizations lack h-ellipticity which makes them very hard >>>> to solve. >>>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 21 10:22:19 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 21 Aug 2020 10:22:19 -0500 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> Message-ID: <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> > On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu wrote: > > > > Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley > a ?crit : > On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu > wrote: > Sorry, I sent too soon, I hit the wrong key. > > I wanted to say that context.npoints is the local number of cells. > > PetscRHSFunctionImpl allows to generate the hyperbolic part of the right hand side. 
> Then we have : > > PetscErrorCode PetscIJacobian( > TS ts, /*!< Time stepping object (see PETSc TS)*/ > PetscReal t, /*!< Current time */ > Vec Y, /*!< Solution vector */ > Vec Ydot, /*!< Time-derivative of solution vector */ > PetscReal a, /*!< Shift */ > Mat A, /*!< Jacobian matrix */ > Mat B, /*!< Preconditioning matrix */ > void *ctxt /*!< Application context */ > ) > { > PETScContext *context = (PETScContext*) ctxt; > HyPar *solver = context->solver; > _DECLARE_IERR_; > > PetscFunctionBegin; > solver->count_IJacobian++; > context->shift = a; > context->waqt = t; > /* Construct preconditioning matrix */ > if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y,context); CHECKERR(ierr); } > > PetscFunctionReturn(0); > } > > and PetscJacobianFunction_JFNK which I bind to the matrix shell, computes the action of the jacobian on a vector : say U0 is the state of reference and Y the vector upon which to apply the JFNK method, then the PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift comes from the TS). > The preconditioning matrix I compute as an approximation to the actual jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in each cell, a 4x4 matrix that is known exactly for the system of equations I am solving, i.e. Euler equations. For the structured grid, I can loop on the cells and do that 'Derivative' thing at first order by simply taking a finite-difference like approximation with the neighboring cells, Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by block (JFunction is the dF/dU) > > /* diagonal element */ > <> for (v=0; v <> ierr = solver->JFunction (values,(u+nvars*p),solver->physics ,dir,0); > <> _ArrayScale1D_ (values,(dxinv*iblank),(nvars*nvars)); > <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); > <> > <> /* left neighbor */ > <> if (pgL >= 0) { > <> for (v=0; v <> ierr = solver->JFunction (values,(u+nvars*pL),solver->physics ,dir,1); > <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); > <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); > <> } > <> > <> /* right neighbor */ > <> if (pgR >= 0) { > <> for (v=0; v <> ierr = solver->JFunction (values,(u+nvars*pR),solver->physics ,dir,-1); > <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); > <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); > <> } > > > > I do not know if I am clear here ... > Anyways, I am trying to figure out how to do this shell matrix and this preconditioner using all the FV and DMPlex artillery. > > Okay, that is very clear. We should be able to get the JFNK just with -snes_mf_operator, and put the approximate J construction in DMPlexComputeJacobian_Internal(). > There is an FV section already, and we could just add this. I would need to understand those entries in the pointwise Riemann sense that the other stuff is now. > > Ok i had a quick look and if I understood correctly it would do the job. Setting the snes-mf-operator flag would mean however that we have to go through SNESSetJacobian to set the jacobian and the preconditioning matrix wouldn't it ? 
Thibault, Since the TS implicit methods end up using SNES internally the option should be available to you without requiring you to be calling the SNES routines directly Once you have finalized your approach and if for the implicit case you always work in the snes mf operator mode you can hardwire TSGetSNES(ts,&snes); SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); in your code so you don't need to always provide the option -snes-mf-operator Barry > There might be calls to the Riemann solver to evaluate the dRHS / dU part yes but maybe it's possible to re-use what was computed for the RHS^n ? > In the FV section the jacobian is set to identity which I missed before, but it could explain why when I used the following : > TSSetType(ts, TSBEULER); > DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); > DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); > with my FV discretization nothing happened, right ? > > Thank you, > > Thibault > > Thanks, > > Matt > > Le ven. 21 ao?t 2020 ? 14:55, Thibault Bridel-Bertomeu > a ?crit : > Hi, > > Thanks Matthew and Jed for your input. > I indeed envision an implicit solver in the sense Jed mentioned - Jiri Blazek's book is a nice intro to this concept. > > Matthew, I do not know exactly what to change right now because although I understand globally what the DMPlexComputeXXXX_Internal methods do, I cannot say for sure line by line what is happening. > In a structured code, I have a an implicit FVM solver with PETSc but I do not use any of the FV structure, not even a DM - I just use C arrays that I transform to PETSc Vec and Mat and build my IJacobian and my preconditioner and gives all that to a TS and it runs. I cannot figure out how to do it with the FV and the DM and all the underlying "shortcuts" that I want to use. > > Here is the top method for the structured code : > > int total_size = context.npoints * solver->nvars > ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); CHKERRQ(ierr); > SNES snes; > KSP ksp; > PC pc; > SNESType snestype; > ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); > ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); > > flag_mat_a = 1; > ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, > PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); > context.jfnk_eps = 1e-7; > ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps,NULL); CHKERRQ(ierr); > ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void))PetscJacobianFunction_JFNK); CHKERRQ(ierr); > ierr = MatSetUp(A); CHKERRQ(ierr); > > context.flag_use_precon = 0; > ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool*)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); > > /* Set up preconditioner matrix */ > flag_mat_b = 1; > ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE,PETSC_DETERMINE, > (solver->ndims*2+1)*solver->nvars,NULL, > 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); > ierr = MatSetBlockSize(B,solver->nvars); > /* Set the RHSJacobian function for TS */ > ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); > > Thibault Bridel-Bertomeu > ? > Eng, MSc, PhD > Research Engineer > CEA/CESTA > 33114 LE BARP > Tel.: (+33)557046924 > Mob.: (+33)611025322 > Mail: thibault.bridelbertomeu at gmail.com > > > Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown > a ?crit : > Matthew Knepley > writes: > > > I could never get the FVM stuff to make sense to me for implicit methods. > > Here is my problem understanding. 
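A minimal sketch of how the hardwired calls suggested above can sit next to an IJacobian callback that only fills the preconditioning matrix might read as follows; FormIFunction, FormIJacobianP and user are hypothetical placeholders, not code from this thread.

Mat  P;      /* assembled first-order approximation of the Jacobian */
SNES snes;
/* ts, the DM, the solution vector and P (e.g. from DMCreateMatrix) are assumed to exist */
ierr = TSSetType(ts, TSBEULER); CHKERRQ(ierr);
ierr = TSSetIFunction(ts, NULL, FormIFunction, &user); CHKERRQ(ierr);
ierr = TSSetIJacobian(ts, P, P, FormIJacobianP, &user); CHKERRQ(ierr);    /* callback assembles only P */
ierr = TSGetSNES(ts, &snes); CHKERRQ(ierr);
ierr = SNESSetUseMatrixFree(snes, PETSC_TRUE, PETSC_FALSE); CHKERRQ(ierr); /* matrix-free operator, P kept for the PC */

In this mode the operator seen by the Krylov method is the matrix-free finite-difference approximation, while P feeds the preconditioner, which matches the -snes_mf_operator behaviour described above.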
If you have an FVM method, it decides > > to move "stuff" from one cell to its neighboring cells depending on the > > solution to the Riemann problem on each face, which computed the flux. This > > is > > fine unless the timestep is so big that material can flow through into the > > cells beyond the neighbor. Then I should have considered the effect of the > > Riemann problem for those interfaces. That would be in the Jacobian, but I > > don't know how to compute that Jacobian. I guess you could do everything > > matrix-free, but without a preconditioner it seems hard. > > So long as we're using method of lines, the flux is just instantaneous flux, not integrated over some time step. It has the same meaning for implicit and explicit. > > An explicit method would be unstable if you took such a large time step (CFL) and an implicit method will not simultaneously be SSP and higher than first order, but it's still a consistent discretization of the problem. > > It's common (done in FUN3D and others) to precondition with a first-order method, where gradient reconstruction/limiting is skipped. That's what I'd recommend because limiting creates nasty nonlinearities and the resulting discretizations lack h-ellipticity which makes them very hard to solve. > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thijs.smit at hest.ethz.ch Fri Aug 21 03:49:45 2020 From: thijs.smit at hest.ethz.ch (Smit Thijs) Date: Fri, 21 Aug 2020 08:49:45 +0000 Subject: [petsc-users] =?windows-1252?q?error=3A_invalid_types_=91PetscSc?= =?windows-1252?q?alar*_=7Baka_double*=7D=5BPetscScalar_=7Baka_double=7D?= =?windows-1252?q?=5D=92_for_array_subscript?= Message-ID: <3a9eee1ee0214244b52790cd49737d88@hest.ethz.ch> Hi All, I am having the following error when I try to do a mapping with vectors and I can?t figure out how to solve this or what is going wrong: error: invalid types ?PetscScalar* {aka double*}[PetscScalar {aka double}]? 
for array subscript xpMMA[i] = xp[indicesMap[i]]; Herewith two code snippets: // total number of elements on core PetscInt nel; VecGetLocalSize(xPhys, &nel); // create xPassive vector ierr = VecDuplicate(xPhys, &xPassive); CHKERRQ(ierr); // create mapping vector ierr = VecDuplicate(xPhys, &indicator); CHKERRQ(ierr); // index set for xPassive and indicator PetscScalar *xpPassive, *xpIndicator; ierr = VecGetArray(xPassive, &xpPassive); CHKERRQ(ierr); ierr = VecGetArray(indicator, &xpIndicator); CHKERRQ(ierr); // counters for total and active elements on this processor PetscInt tcount = 0; // total number of elements PetscInt acount = 0; // number of active elements PetscInt scount = 0; // number of solid elements PetscInt rcount = 0; // number of rigid element // loop over all elements and update xPassive from wrapper data // count number of active elements, acount // set indicator vector for (PetscInt el = 0; el < nel; el++) { if (data.xPassive_w.size() > 1) { xpPassive[el] = data.xPassive_w[el]; tcount++; if (xpPassive[el] < 0) { xpIndicator[acount] = el; acount++; } } else { xpPassive[el] = -1.0; // default, if no xPassive_w than all elements are active = -1.0 } } // printing //PetscPrintf(PETSC_COMM_WORLD, "tcount: %i\n", tcount); //PetscPrintf(PETSC_COMM_WORLD, "acount: %i\n", acount); // Allreduce, get number of active elements over all processes // tmp number of var on proces // acount total number of var sumed PetscInt tmp = acount; acount = 0.0; MPI_Allreduce(&tmp, &(acount), 1, MPIU_INT, MPI_SUM, PETSC_COMM_WORLD); //// create xMMA vector VecCreateMPI(PETSC_COMM_WORLD, tmp, acount, &xMMA); // Pointers to the vectors PetscScalar *xp, *xpMMA, *indicesMap; //PetscInt indicesMap; ierr = VecGetArray(MMAVector, &xpMMA); CHKERRQ(ierr); ierr = VecGetArray(elementVector, &xp); CHKERRQ(ierr); // Index set PetscInt nLocalVar; VecGetLocalSize(xMMA, &nLocalVar); // print number of var on pocessor PetscPrintf(PETSC_COMM_WORLD, "Local var: %i\n", nLocalVar); ierr = VecGetArray(indicator, &indicesMap); CHKERRQ(ierr); // Run through the indices for (PetscInt i = 0; i < nLocalVar; i++) { if (updateDirection > 0) { //PetscPrintf(PETSC_COMM_WORLD, "i: %i, xp[%i] = %f\n", i, indicesMap[i], xp[indicesMap[i]]); xpMMA[i] = xp[indicesMap[i]]; } else if (updateDirection < 0) { xp[indicesMap[i]] = xpMMA[i]; //PetscPrintf(PETSC_COMM_WORLD, "i: %i, xp[%i] = %f\n", i, indicesMap[i], xp[indicesMap[i]]); } } // Restore ierr = VecRestoreArray(elementVector, &xp); CHKERRQ(ierr); ierr = VecRestoreArray(MMAVector, &xpMMA); CHKERRQ(ierr); ierr = VecRestoreArray(indicator, &indicesMap); CHKERRQ(ierr); PetscPrintf(PETSC_COMM_WORLD, "FINISHED UpdateVariables \n"); The error message says that the type with which I try to index is wrong, I think. But VecGetArray only excepts scalars. Furthermore, the el variable is an int, but is seams like to turn out to be a scalar. Does anybody see how to proceed with this? Best regards, Thijs Smit -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Aug 21 10:51:14 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 21 Aug 2020 17:51:14 +0200 Subject: [petsc-users] [SLEPc] Krylov Schur- saving krylov subspace In-Reply-To: <1670-5f3fbf80-27d-50f4c200@91787799> References: <1670-5f3fbf80-27d-50f4c200@91787799> Message-ID: <6520D6CE-543F-4170-B1F7-5FA8B2575424@dsic.upv.es> I see. 
Doing the exponential with SLEPc itself might be faster, as in https://slepc.upv.es/documentation/current/src/eps/tutorials/ex36.c.html but I cannot say if it will work for your problem. This approach was used in https://doi.org/10.1017/S0022377818001022 EPSGetBV() gives you a BV object that contains the Krylov subspace. But to use EPSSetInitialSpace() with Krylov-Schur you should compute a single vector, and the information for the unconverged Ritz vector cannot be easily recovered. It is easier if you use EPSSUBSPACE, where in that case you can pass the whole subspace to EPSSetInitialSpace(). The problem is that convergence of subspace iteration will likely be much slower than Krylov-Schur. Jose > El 21 ago 2020, a las 14:34, ROLANDI Laura victoria escribi?: > > Thank you for your quick response. > > Yes, i'm running in parallel, I'm just asking for 2 eigenvalues and I'm not doing any factorization. > > My problem is taking so long because I have implemented the time stepping-exponential transformation: my MatMult() function for computing vectors of the Krylov subspace calls the Direct Numerical Simulation code for compressible Navier-Stokes equations to which I'm linking the stability code. > Therefore, each MatMult() call takes very long, and I cannot save the converged eigenvectors for the restart beacause there won't be any converged eigenvectors yet when the job is killed. > That's why I thought that the only thing I could save was the computed krylov subspace. > > Victoria > > > > Il giorno Venerdi, Agosto 21, 2020 12:42 CEST, "Jose E. Roman" ha scritto: > >> Why is your problem taking so long? Are you running in parallel? Is your computation doing a factorization of a matrix? Are you getting slow convergence? How many eigenvalues are you computing? Note that Krylov-Schur is not intended for computing a large percentage of eigenvalues, if you do so then you might get large overheads unless you tune the EPSSetDimensions() parameters (mpd). >> >> EPSSetInitialSpace() is intended to provide an initial guess, which in Krylov-Schur is a single vector, so in this case you would not pass the Krylov subspace from a previous run. >> >> A possible scheme for restarting is to save the eigenvectors computed so far, then pass them in the next run via EPSSetDeflationSpace() to avoid recomputing them. You can use a custom stopping criterion as in https://slepc.upv.es/documentation/current/src/eps/tutorials/ex29.c.html to stop before the job is killed, then save the converged eigenvectors (or EPSGetInvariantSubspace() if the problem is nonsymmetric). >> >> Jose >> >> >> > El 21 ago 2020, a las 11:56, ROLANDI Laura victoria escribi?: >> > >> > Dear SLEPc developers, >> > >> > I'm using the Krylov Schur EPS and I have a question regarding a command. >> > >> > Is there a way for having access and saving the krylov subspace during the EPSSolve call? >> > >> > I inizialize the solver using the function EPSSetInitialSpace(eps,1, v0), where v0 is a specific vector, but after 24 hours of calculation my job has to end even if the EPSSolve hasn't finished yet. >> > Which function should I use for saving the computed Krylov subspace and its dimention n during the process, in order to restart the calculation from it by using EPSsetInitialSpace(eps,n, Krylov-Subspace)? 
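One concrete way to set up the restart scheme suggested above (save whatever eigenvectors have converged before the job is killed, then deflate them in the next run) is sketched here. Writing the vectors to disk and reading them back is assumed to be done with VecView/VecLoad or similar, and v0 stands for any vector with the right parallel layout.

PetscInt nconv, i;
Vec      *Q;
ierr = EPSGetConverged(eps, &nconv); CHKERRQ(ierr);
if (nconv > 0) {
  ierr = VecDuplicateVecs(v0, nconv, &Q); CHKERRQ(ierr);
  for (i = 0; i < nconv; i++) {
    ierr = EPSGetEigenpair(eps, i, NULL, NULL, Q[i], NULL); CHKERRQ(ierr);
    /* ... save Q[i] to disk here so the next job can reload it ... */
  }
}
/* in the next run, after reloading the nconv saved vectors into Q: */
ierr = EPSSetDeflationSpace(eps, nconv, Q); CHKERRQ(ierr);
ierr = EPSSolve(eps); CHKERRQ(ierr);

The deflated vectors are projected out of the new computation, so only the remaining eigenpairs have to converge in the second job.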
>> > >> > Thank you very much, >> > Victoria >> > > > > -- > --- > Laura victoria ROLANDI > Doctorant - Doctorat ISAE-SUPAERO Doctorat 1 > laura-victoria.rolandi at isae-supaero.fr > https://www.isae-supaero.fr > Institut Sup?rieur de l'A?ronautique et de l'Espace > 10, avenue Edouard Belin - BP 54032 > 31055 Toulouse Cedex 4 > France > > > Suivez l'ISAE-SUPAERO sur les r?seaux sociaux / Follow the ISAE-SUPAERO on the social media > Facebook Twitter LinkedIn Youtube Instagram From thibault.bridelbertomeu at gmail.com Fri Aug 21 10:58:24 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Fri, 21 Aug 2020 17:58:24 +0200 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> References: <87mu2pgtdp.fsf@jedbrown.org> <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> Message-ID: Thank you Barry for the tip ! I?ll make sure to do that when everything is set. What I also meant is that there will not be any more direct way to set the preconditioner than to go through SNESSetJacobian after having assembled everything by hand ? Like, in my case, or in the more general case of fluid dynamics equations, the preconditioner is not a fun matrix to assemble, because for every cell the derivative of the physical flux jacobian has to be taken and put in the right block in the matrix - finite element style if you want. Is there a way to do that with Petsc methods, maybe short-circuiting the FEM based methods ? Thanks ! Thibault Le ven. 21 ao?t 2020 ? 17:22, Barry Smith a ?crit : > > > On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > > > > Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley a > ?crit : > >> On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >>> Sorry, I sent too soon, I hit the wrong key. >>> >>> I wanted to say that context.npoints is the local number of cells. >>> >>> PetscRHSFunctionImpl allows to generate the hyperbolic part of the right >>> hand side. >>> Then we have : >>> >>> PetscErrorCode PetscIJacobian( >>> TS ts, /*!< Time stepping object (see PETSc TS)*/ >>> PetscReal t, /*!< Current time */ >>> Vec Y, /*!< Solution vector */ >>> Vec Ydot, /*!< Time-derivative of solution vector */ >>> PetscReal a, /*!< Shift */ >>> Mat A, /*!< Jacobian matrix */ >>> Mat B, /*!< Preconditioning matrix */ >>> void *ctxt /*!< Application context */ >>> ) >>> { >>> PETScContext *context = (PETScContext*) ctxt; >>> HyPar *solver = context->solver; >>> _DECLARE_IERR_; >>> >>> PetscFunctionBegin; >>> solver->count_IJacobian++; >>> context->shift = a; >>> context->waqt = t; >>> /* Construct preconditioning matrix */ >>> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y, >>> context); CHECKERR(ierr); } >>> >>> PetscFunctionReturn(0); >>> } >>> >>> and PetscJacobianFunction_JFNK which I bind to the matrix shell, >>> computes the action of the jacobian on a vector : say U0 is the state of >>> reference and Y the vector upon which to apply the JFNK method, then the >>> PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + >>> epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift >>> comes from the TS). >>> The preconditioning matrix I compute as an approximation to the actual >>> jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in >>> each cell, a 4x4 matrix that is known exactly for the system of equations I >>> am solving, i.e. Euler equations. 
For the structured grid, I can loop on >>> the cells and do that 'Derivative' thing at first order by simply taking a >>> finite-difference like approximation with the neighboring cells, >>> Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by >>> block (JFunction is the dF/dU) >>> >>> /* diagonal element */ >>> >>> >>> for (v=0; v>> + v; } >>> >>> >>> ierr = solver->JFunction >>> >>> (values,(u+nvars*p),solver->physics >>> >>> ,dir,0); >>> >>> >>> _ArrayScale1D_ >>> >>> (values,(dxinv*iblank),(nvars*nvars)); >>> >>> >>> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); >>> CHKERRQ(ierr); >>> >>> >>> >>> >>> >>> /* left neighbor */ >>> >>> >>> if (pgL >= 0) { >>> >>> >>> for (v=0; v>> nvars*pgL + v; } >>> >>> >>> ierr = solver->JFunction >>> >>> (values,(u+nvars*pL),solver->physics >>> >>> ,dir,1); >>> >>> >>> _ArrayScale1D_ >>> >>> (values,(-dxinv*iblank),(nvars*nvars)); >>> >>> >>> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); >>> CHKERRQ(ierr); >>> >>> >>> } >>> >>> >>> >>> >>> >>> /* right neighbor */ >>> >>> >>> if (pgR >= 0) { >>> >>> >>> for (v=0; v>> nvars*pgR + v; } >>> >>> >>> ierr = solver->JFunction >>> >>> (values,(u+nvars*pR),solver->physics >>> >>> ,dir,-1); >>> >>> >>> _ArrayScale1D_ >>> >>> (values,(-dxinv*iblank),(nvars*nvars)); >>> >>> >>> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); >>> CHKERRQ(ierr); >>> >>> >>> } >>> >>> >>> >>> I do not know if I am clear here ... >>> Anyways, I am trying to figure out how to do this shell matrix and this >>> preconditioner using all the FV and DMPlex artillery. >>> >> >> Okay, that is very clear. We should be able to get the JFNK just with >> -snes_mf_operator, and put the approximate J construction in >> DMPlexComputeJacobian_Internal(). >> There is an FV section already, and we could just add this. I would need >> to understand those entries in the pointwise Riemann sense that the other >> stuff is now. >> > > Ok i had a quick look and if I understood correctly it would do the job. > Setting the snes-mf-operator flag would mean however that we have to go > through SNESSetJacobian to set the jacobian and the preconditioning matrix > wouldn't it ? > > > Thibault, > > Since the TS implicit methods end up using SNES internally the option > should be available to you without requiring you to be calling the SNES > routines directly > > Once you have finalized your approach and if for the implicit case you > always work in the snes mf operator mode you can hardwire > > TSGetSNES(ts,&snes); > SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); > > in your code so you don't need to always provide the option > -snes-mf-operator > > Barry > > > > > There might be calls to the Riemann solver to evaluate the dRHS / dU part > yes but maybe it's possible to re-use what was computed for the RHS^n ? > In the FV section the jacobian is set to identity which I missed before, > but it could explain why when I used the following : > > TSSetType(ts, TSBEULER); > DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); > DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); > > with my FV discretization nothing happened, right ? > > Thank you, > > Thibault > > Thanks, >> >> Matt >> >> >>> Le ven. 21 ao?t 2020 ? 14:55, Thibault Bridel-Bertomeu < >>> thibault.bridelbertomeu at gmail.com> a ?crit : >>> >>>> Hi, >>>> >>>> Thanks Matthew and Jed for your input. 
>>>> I indeed envision an implicit solver in the sense Jed mentioned - Jiri >>>> Blazek's book is a nice intro to this concept. >>>> >>>> Matthew, I do not know exactly what to change right now because >>>> although I understand globally what the DMPlexComputeXXXX_Internal methods >>>> do, I cannot say for sure line by line what is happening. >>>> In a structured code, I have a an implicit FVM solver with PETSc but I >>>> do not use any of the FV structure, not even a DM - I just use C arrays >>>> that I transform to PETSc Vec and Mat and build my IJacobian and my >>>> preconditioner and gives all that to a TS and it runs. I cannot figure out >>>> how to do it with the FV and the DM and all the underlying "shortcuts" that >>>> I want to use. >>>> >>>> Here is the top method for the structured code : >>>> >>>> int total_size = context.npoints * solver->nvars >>>> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); >>>> CHKERRQ(ierr); >>>> SNES snes; >>>> KSP ksp; >>>> PC pc; >>>> SNESType snestype; >>>> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >>>> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >>>> >>>> flag_mat_a = 1; >>>> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size, >>>> PETSC_DETERMINE, >>>> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >>>> context.jfnk_eps = 1e-7; >>>> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps, >>>> NULL); CHKERRQ(ierr); >>>> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void)) >>>> PetscJacobianFunction_JFNK); CHKERRQ(ierr); >>>> ierr = MatSetUp(A); CHKERRQ(ierr); >>>> >>>> context.flag_use_precon = 0; >>>> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool >>>> *)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >>>> >>>> /* Set up preconditioner matrix */ >>>> flag_mat_b = 1; >>>> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size, >>>> PETSC_DETERMINE,PETSC_DETERMINE, >>>> (solver->ndims*2+1)*solver->nvars,NULL, >>>> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >>>> ierr = MatSetBlockSize(B,solver->nvars); >>>> /* Set the RHSJacobian function for TS */ >>>> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); >>>> >>>> Thibault Bridel-Bertomeu >>>> ? >>>> Eng, MSc, PhD >>>> Research Engineer >>>> CEA/CESTA >>>> 33114 LE BARP >>>> Tel.: (+33)557046924 >>>> Mob.: (+33)611025322 >>>> Mail: thibault.bridelbertomeu at gmail.com >>>> >>>> >>>> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown a ?crit : >>>> >>>>> Matthew Knepley writes: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> > I could never get the FVM stuff to make sense to me for implicit >>>>> methods. >>>>> >>>>> >>>>> > Here is my problem understanding. If you have an FVM method, it >>>>> decides >>>>> >>>>> >>>>> > to move "stuff" from one cell to its neighboring cells depending on >>>>> the >>>>> >>>>> >>>>> > solution to the Riemann problem on each face, which computed the >>>>> flux. This >>>>> >>>>> >>>>> > is >>>>> >>>>> >>>>> > fine unless the timestep is so big that material can flow through >>>>> into the >>>>> >>>>> >>>>> > cells beyond the neighbor. Then I should have considered the effect >>>>> of the >>>>> >>>>> >>>>> > Riemann problem for those interfaces. That would be in the Jacobian, >>>>> but I >>>>> >>>>> >>>>> > don't know how to compute that Jacobian. I guess you could do >>>>> everything >>>>> >>>>> >>>>> > matrix-free, but without a preconditioner it seems hard. 
>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> So long as we're using method of lines, the flux is just instantaneous >>>>> flux, not integrated over some time step. It has the same meaning for >>>>> implicit and explicit. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> An explicit method would be unstable if you took such a large time >>>>> step (CFL) and an implicit method will not simultaneously be SSP and higher >>>>> than first order, but it's still a consistent discretization of the problem. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> It's common (done in FUN3D and others) to precondition with a >>>>> first-order method, where gradient reconstruction/limiting is skipped. >>>>> That's what I'd recommend because limiting creates nasty nonlinearities and >>>>> the resulting discretizations lack h-ellipticity which makes them very hard >>>>> to solve. >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > > -- Thibault Bridel-Bertomeu ? Eng, MSc, PhD Research Engineer CEA/CESTA 33114 LE BARP Tel.: (+33)557046924 Mob.: (+33)611025322 Mail: thibault.bridelbertomeu at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 21 11:19:34 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 21 Aug 2020 12:19:34 -0400 Subject: [petsc-users] =?utf-8?q?error=3A_invalid_types_=E2=80=98PetscSca?= =?utf-8?q?lar*_=7Baka_double*=7D=5BPetscScalar_=7Baka_double=7D=5D?= =?utf-8?q?=E2=80=99_for_array_subscript?= In-Reply-To: <3a9eee1ee0214244b52790cd49737d88@hest.ethz.ch> References: <3a9eee1ee0214244b52790cd49737d88@hest.ethz.ch> Message-ID: On Fri, Aug 21, 2020 at 11:49 AM Smit Thijs wrote: > Hi All, > > > > I am having the following error when I try to do a mapping with vectors > and I can?t figure out how to solve this or what is going wrong: > > error: invalid types ?PetscScalar* {aka double*}[PetscScalar {aka > double}]? 
for array subscript > > xpMMA[i] = xp[indicesMap[i]]; > > > > Herewith two code snippets: > > // total number of elements on core > > PetscInt nel; > > VecGetLocalSize(xPhys, &nel); > > > > // create xPassive vector > > ierr = VecDuplicate(xPhys, &xPassive); > > CHKERRQ(ierr); > > > > // create mapping vector > > ierr = VecDuplicate(xPhys, &indicator); > > CHKERRQ(ierr); > > > > // index set for xPassive and indicator > > PetscScalar *xpPassive, *xpIndicator; > > ierr = VecGetArray(xPassive, &xpPassive); > > CHKERRQ(ierr); > > ierr = VecGetArray(indicator, &xpIndicator); > > CHKERRQ(ierr); > > > > // counters for total and active elements on this processor > > PetscInt tcount = 0; // total number of elements > > PetscInt acount = 0; // number of active elements > > PetscInt scount = 0; // number of solid elements > > PetscInt rcount = 0; // number of rigid element > > > > // loop over all elements and update xPassive from wrapper data > > // count number of active elements, acount > > // set indicator vector > > for (PetscInt el = 0; el < nel; el++) { > > if (data.xPassive_w.size() > 1) { > > xpPassive[el] = data.xPassive_w[el]; > > tcount++; > > if (xpPassive[el] < 0) { > > xpIndicator[acount] = el; > > acount++; > > } > > } else { > > xpPassive[el] = -1.0; // default, if no xPassive_w than all > elements are active = -1.0 > > } > > } > > > > // printing > > //PetscPrintf(PETSC_COMM_WORLD, "tcount: %i\n", tcount); > > //PetscPrintf(PETSC_COMM_WORLD, "acount: %i\n", acount); > > > > // Allreduce, get number of active elements over all processes > > // tmp number of var on proces > > // acount total number of var sumed > > PetscInt tmp = acount; > > acount = 0.0; > > MPI_Allreduce(&tmp, &(acount), 1, MPIU_INT, MPI_SUM, PETSC_COMM_WORLD); > > > > //// create xMMA vector > > VecCreateMPI(PETSC_COMM_WORLD, tmp, acount, &xMMA); > > > > // Pointers to the vectors > > PetscScalar *xp, *xpMMA, *indicesMap; > Here you declare indicesMap as PetscScalar[]. You cannot index an array with this. I see that you want to store these indices in a Vec. You should use an IS instead. Thanks, Matt > //PetscInt indicesMap; > > ierr = VecGetArray(MMAVector, &xpMMA); > > CHKERRQ(ierr); > > ierr = VecGetArray(elementVector, &xp); > > CHKERRQ(ierr); > > // Index set > > PetscInt nLocalVar; > > VecGetLocalSize(xMMA, &nLocalVar); > > > > // print number of var on pocessor > > PetscPrintf(PETSC_COMM_WORLD, "Local var: %i\n", nLocalVar); > > > > ierr = VecGetArray(indicator, &indicesMap); > > CHKERRQ(ierr); > > > > // Run through the indices > > for (PetscInt i = 0; i < nLocalVar; i++) { > > if (updateDirection > 0) { > > //PetscPrintf(PETSC_COMM_WORLD, "i: %i, xp[%i] = %f\n", i, > indicesMap[i], xp[indicesMap[i]]); > > xpMMA[i] = xp[indicesMap[i]]; > > } else if (updateDirection < 0) { > > xp[indicesMap[i]] = xpMMA[i]; > > //PetscPrintf(PETSC_COMM_WORLD, "i: %i, xp[%i] = %f\n", i, > indicesMap[i], xp[indicesMap[i]]); > > } > > } > > // Restore > > ierr = VecRestoreArray(elementVector, &xp); > > CHKERRQ(ierr); > > ierr = VecRestoreArray(MMAVector, &xpMMA); > > CHKERRQ(ierr); > > ierr = VecRestoreArray(indicator, &indicesMap); > > CHKERRQ(ierr); > > PetscPrintf(PETSC_COMM_WORLD, "FINISHED UpdateVariables \n"); > > > > The error message says that the type with which I try to index is wrong, I > think. But VecGetArray only excepts scalars. Furthermore, the el variable > is an int, but is seams like to turn out to be a scalar. Does anybody see > how to proceed with this? 
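Following the suggestion above to keep the map in an IS rather than in a Vec of PetscScalar, the mapping loop from the quoted snippet could be reorganized roughly as follows; activeIdx and nLocalActive are hypothetical names for the local list and count of active elements (the quantity called tmp in the snippet).

#include <petscis.h>

IS             indexSet;
const PetscInt *indicesMap;  /* integer indices, so xp[indicesMap[i]] is a valid subscript */
PetscInt       *activeIdx;   /* filled with the element numbers of the active cells        */
/* ... fill activeIdx[0 .. nLocalActive-1] in the loop over local elements ... */
ierr = ISCreateGeneral(PETSC_COMM_WORLD, nLocalActive, activeIdx, PETSC_COPY_VALUES, &indexSet); CHKERRQ(ierr);
ierr = ISGetIndices(indexSet, &indicesMap); CHKERRQ(ierr);
for (PetscInt i = 0; i < nLocalVar; i++) {
  if (updateDirection > 0)      xpMMA[i] = xp[indicesMap[i]];
  else if (updateDirection < 0) xp[indicesMap[i]] = xpMMA[i];
}
ierr = ISRestoreIndices(indexSet, &indicesMap); CHKERRQ(ierr);

Because the indices are PetscInt from the start, the invalid-subscript compile error goes away and no float-to-integer conversion is needed.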
> > > > Best regards, > > > > Thijs Smit > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhatiamanav at gmail.com Fri Aug 21 11:28:23 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Fri, 21 Aug 2020 11:28:23 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <87imddgq3l.fsf@jedbrown.org> <2CAFDC41-75ED-4282-A6C3-9B477E2F8542@gmail.com> Message-ID: <404CFD83-85ED-429A-A678-1E9A3F3B0CFE@gmail.com> Thanks for looking into this, Junchao. I guess the next step if for me to build petsc with the same configuration as yours and see if that works. Regards, Manav > On Aug 20, 2020, at 10:45 PM, Junchao Zhang wrote: > > Manav, > I downloaded your petsc_mat.tgz but could not reproduce the problem, on both Linux and Mac. I used the petsc commit id df0e4300 you mentioned. > On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" --PETSC_ARCH=linux-host-dbg > On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is configured --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g" PETSC_ARCH=mac-clang-dbg > > mpirun -n 8 ./test > rank: 1 : stdout.processor.1 > rank: 4 : stdout.processor.4 > rank: 0 : stdout.processor.0 > rank: 5 : stdout.processor.5 > rank: 6 : stdout.processor.6 > rank: 7 : stdout.processor.7 > rank: 3 : stdout.processor.3 > rank: 2 : stdout.processor.2 > rank: 1 : Beginning reading nnz... > rank: 4 : Beginning reading nnz... > rank: 0 : Beginning reading nnz... > rank: 5 : Beginning reading nnz... > rank: 7 : Beginning reading nnz... > rank: 2 : Beginning reading nnz... > rank: 3 : Beginning reading nnz... > rank: 6 : Beginning reading nnz... > rank: 5 : Finished reading nnz > rank: 5 : Beginning mat preallocation... > rank: 3 : Finished reading nnz > rank: 3 : Beginning mat preallocation... > rank: 4 : Finished reading nnz > rank: 4 : Beginning mat preallocation... > rank: 7 : Finished reading nnz > rank: 7 : Beginning mat preallocation... > rank: 1 : Finished reading nnz > rank: 1 : Beginning mat preallocation... > rank: 0 : Finished reading nnz > rank: 0 : Beginning mat preallocation... > rank: 2 : Finished reading nnz > rank: 2 : Beginning mat preallocation... > rank: 6 : Finished reading nnz > rank: 6 : Beginning mat preallocation... > rank: 5 : Finished preallocation > rank: 5 : Beginning reading and setting matrix values... > rank: 1 : Finished preallocation > rank: 1 : Beginning reading and setting matrix values... > rank: 7 : Finished preallocation > rank: 7 : Beginning reading and setting matrix values... > rank: 2 : Finished preallocation > rank: 2 : Beginning reading and setting matrix values... > rank: 4 : Finished preallocation > rank: 4 : Beginning reading and setting matrix values... > rank: 0 : Finished preallocation > rank: 0 : Beginning reading and setting matrix values... 
> rank: 3 : Finished preallocation > rank: 3 : Beginning reading and setting matrix values... > rank: 6 : Finished preallocation > rank: 6 : Beginning reading and setting matrix values... > rank: 1 : Finished reading and setting matrix values > rank: 1 : Beginning mat assembly... > rank: 5 : Finished reading and setting matrix values > rank: 5 : Beginning mat assembly... > rank: 4 : Finished reading and setting matrix values > rank: 4 : Beginning mat assembly... > rank: 2 : Finished reading and setting matrix values > rank: 2 : Beginning mat assembly... > rank: 3 : Finished reading and setting matrix values > rank: 3 : Beginning mat assembly... > rank: 7 : Finished reading and setting matrix values > rank: 7 : Beginning mat assembly... > rank: 6 : Finished reading and setting matrix values > rank: 6 : Beginning mat assembly... > rank: 0 : Finished reading and setting matrix values > rank: 0 : Beginning mat assembly... > rank: 1 : Finished mat assembly > rank: 3 : Finished mat assembly > rank: 7 : Finished mat assembly > rank: 0 : Finished mat assembly > rank: 5 : Finished mat assembly > rank: 2 : Finished mat assembly > rank: 4 : Finished mat assembly > rank: 6 : Finished mat assembly > > --Junchao Zhang > > > On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang > wrote: > I will have a look and report back to you. Thanks. > --Junchao Zhang > > > On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia > wrote: > I have created a standalone test that demonstrates the problem at my end. I have stored the indices, etc. from my problem in a text file for each rank, which I use to initialize the matrix. > Please note that the test is specifically for 8 ranks. > > The .tgz file is on my google drive: https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing > > This contains a README file with instructions on running. Please note that the work directory needs the index files. > > Please let me know if I can provide any further information. > > Thank you all for your help. > > Regards, > Manav > >> On Aug 20, 2020, at 12:54 PM, Jed Brown > wrote: >> >> Matthew Knepley > writes: >> >>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia > wrote: >>> >>>> >>>> >>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini > >>>> wrote: >>>> >>>> Can you add a MPI_Barrier before >>>> >>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); >>>> >>>> >>>> With a MPI_Barrier before this function call: >>>> ? three of the processes have already hit this barrier, >>>> ? the other 5 are inside MatStashScatterGetMesg_Private -> >>>> MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 >>>> processes) >> >> This is not itself evidence of inconsistent state. You can use >> >> -build_twosided allreduce >> >> to avoid the nonblocking sparse algorithm. >> >>> >>> Okay, you should run this with -matstash_legacy just to make sure it is not >>> a bug in your MPI implementation. But it looks like >>> there is inconsistency in the parallel state. This can happen because we >>> have a bug, or it could be that you called a collective >>> operation on a subset of the processes. Is there any way you could cut down >>> the example (say put all 1s in the matrix, etc) so >>> that you could give it to us to run? > -------------- next part -------------- An HTML attachment was scrubbed... 
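[Editor's note: the two diagnostics suggested in the quoted exchange are runtime options, so the reproducer can be rerun with them without recompiling; a sketch, using the same 8-rank launch as in Junchao's log:]

  mpirun -n 8 ./test -matstash_legacy
  mpirun -n 8 ./test -build_twosided allreduce

[The first switches the MatAssembly stash communication back to the older implementation; the second keeps the new path but uses an allreduce-based rendezvous instead of the nonblocking sparse algorithm Jed mentions.]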
URL: From bhatiamanav at gmail.com Fri Aug 21 11:55:48 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Fri, 21 Aug 2020 11:55:48 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <87imddgq3l.fsf@jedbrown.org> <2CAFDC41-75ED-4282-A6C3-9B477E2F8542@gmail.com> Message-ID: <60D8E648-4316-4B3B-B17D-94887E297F50@gmail.com> I built petsc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3 and the test is finishing at my end. So, it appears that there is some issue with openmpi-4.0.1 on this machine. I will now build all my dependency toolchain with mpich and hopefully things will work for my application code. Thank you again for your help. Regards, Manav > On Aug 20, 2020, at 10:45 PM, Junchao Zhang wrote: > > Manav, > I downloaded your petsc_mat.tgz but could not reproduce the problem, on both Linux and Mac. I used the petsc commit id df0e4300 you mentioned. > On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" --PETSC_ARCH=linux-host-dbg > On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is configured --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g" PETSC_ARCH=mac-clang-dbg > > mpirun -n 8 ./test > rank: 1 : stdout.processor.1 > rank: 4 : stdout.processor.4 > rank: 0 : stdout.processor.0 > rank: 5 : stdout.processor.5 > rank: 6 : stdout.processor.6 > rank: 7 : stdout.processor.7 > rank: 3 : stdout.processor.3 > rank: 2 : stdout.processor.2 > rank: 1 : Beginning reading nnz... > rank: 4 : Beginning reading nnz... > rank: 0 : Beginning reading nnz... > rank: 5 : Beginning reading nnz... > rank: 7 : Beginning reading nnz... > rank: 2 : Beginning reading nnz... > rank: 3 : Beginning reading nnz... > rank: 6 : Beginning reading nnz... > rank: 5 : Finished reading nnz > rank: 5 : Beginning mat preallocation... > rank: 3 : Finished reading nnz > rank: 3 : Beginning mat preallocation... > rank: 4 : Finished reading nnz > rank: 4 : Beginning mat preallocation... > rank: 7 : Finished reading nnz > rank: 7 : Beginning mat preallocation... > rank: 1 : Finished reading nnz > rank: 1 : Beginning mat preallocation... > rank: 0 : Finished reading nnz > rank: 0 : Beginning mat preallocation... > rank: 2 : Finished reading nnz > rank: 2 : Beginning mat preallocation... > rank: 6 : Finished reading nnz > rank: 6 : Beginning mat preallocation... > rank: 5 : Finished preallocation > rank: 5 : Beginning reading and setting matrix values... > rank: 1 : Finished preallocation > rank: 1 : Beginning reading and setting matrix values... > rank: 7 : Finished preallocation > rank: 7 : Beginning reading and setting matrix values... > rank: 2 : Finished preallocation > rank: 2 : Beginning reading and setting matrix values... > rank: 4 : Finished preallocation > rank: 4 : Beginning reading and setting matrix values... > rank: 0 : Finished preallocation > rank: 0 : Beginning reading and setting matrix values... > rank: 3 : Finished preallocation > rank: 3 : Beginning reading and setting matrix values... > rank: 6 : Finished preallocation > rank: 6 : Beginning reading and setting matrix values... 
> rank: 1 : Finished reading and setting matrix values > rank: 1 : Beginning mat assembly... > rank: 5 : Finished reading and setting matrix values > rank: 5 : Beginning mat assembly... > rank: 4 : Finished reading and setting matrix values > rank: 4 : Beginning mat assembly... > rank: 2 : Finished reading and setting matrix values > rank: 2 : Beginning mat assembly... > rank: 3 : Finished reading and setting matrix values > rank: 3 : Beginning mat assembly... > rank: 7 : Finished reading and setting matrix values > rank: 7 : Beginning mat assembly... > rank: 6 : Finished reading and setting matrix values > rank: 6 : Beginning mat assembly... > rank: 0 : Finished reading and setting matrix values > rank: 0 : Beginning mat assembly... > rank: 1 : Finished mat assembly > rank: 3 : Finished mat assembly > rank: 7 : Finished mat assembly > rank: 0 : Finished mat assembly > rank: 5 : Finished mat assembly > rank: 2 : Finished mat assembly > rank: 4 : Finished mat assembly > rank: 6 : Finished mat assembly > > --Junchao Zhang > > > On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang > wrote: > I will have a look and report back to you. Thanks. > --Junchao Zhang > > > On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia > wrote: > I have created a standalone test that demonstrates the problem at my end. I have stored the indices, etc. from my problem in a text file for each rank, which I use to initialize the matrix. > Please note that the test is specifically for 8 ranks. > > The .tgz file is on my google drive: https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing > > This contains a README file with instructions on running. Please note that the work directory needs the index files. > > Please let me know if I can provide any further information. > > Thank you all for your help. > > Regards, > Manav > >> On Aug 20, 2020, at 12:54 PM, Jed Brown > wrote: >> >> Matthew Knepley > writes: >> >>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia > wrote: >>> >>>> >>>> >>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini > >>>> wrote: >>>> >>>> Can you add a MPI_Barrier before >>>> >>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); >>>> >>>> >>>> With a MPI_Barrier before this function call: >>>> ? three of the processes have already hit this barrier, >>>> ? the other 5 are inside MatStashScatterGetMesg_Private -> >>>> MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 >>>> processes) >> >> This is not itself evidence of inconsistent state. You can use >> >> -build_twosided allreduce >> >> to avoid the nonblocking sparse algorithm. >> >>> >>> Okay, you should run this with -matstash_legacy just to make sure it is not >>> a bug in your MPI implementation. But it looks like >>> there is inconsistency in the parallel state. This can happen because we >>> have a bug, or it could be that you called a collective >>> operation on a subset of the processes. Is there any way you could cut down >>> the example (say put all 1s in the matrix, etc) so >>> that you could give it to us to run? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 21 12:25:38 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 21 Aug 2020 12:25:38 -0500 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> Message-ID: > On Aug 21, 2020, at 10:58 AM, Thibault Bridel-Bertomeu wrote: > > Thank you Barry for the tip ! 
I'll make sure to do that when everything is set. > What I also meant is that there will not be any more direct way to set the preconditioner than to go through SNESSetJacobian after having assembled everything by hand ? Like, in my case, or in the more general case of fluid dynamics equations, the preconditioner is not a fun matrix to assemble, because for every cell the derivative of the physical flux jacobian has to be taken and put in the right block in the matrix - finite element style if you want. Is there a way to do that with Petsc methods, maybe short-circuiting the FEM based methods ? Thibault

   I am not sure what you mean but there are a couple of things that may be helpful.

   PCSHELL https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCSHELL.html allows you to build your own preconditioner (that can and often will use one or more of its own Mats, and KSP or PC inside it, or even use another PETScFV etc to build some of the sub matrices for you if it is appropriate). This approach means you never need to construct a "global" PETSc matrix from which PETSc builds the preconditioner. But you should only do this if the conventional approach is not reasonable for your problem.

   MATNEST https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATNEST.html allows you to build a global matrix by building parts of it separately and even skipping parts you decide you don't need in the preconditioner. Conceptually it is the same as just creating a global matrix and filling it up, but the process is a bit different and something suitable for "multi physics" or "multi-equation" type applications.

   Of course what you put into PCSHELL and MATNEST will affect the convergence of the nonlinear solver. As Jed noted, what you put in the "Jacobian" does not have to be directly the same mathematically as what you put into the TSSetI/RHSFunction, with the caveat that it does have to have appropriate spectral properties to result in a good preconditioner for the "true" Jacobian.

   Couple of other notes: The entire business of "Jacobian" matrix-free or not (with for example -snes_fd_operator) is tricky because, as Jed noted, your finite volume scheme may have non-differentiable terms such as if () tests. There is a concept of sub-differential for this type of thing but I know absolutely nothing about that and it is probably not worth investigating. In this situation you can avoid the "true" Jacobian completely (both for the matrix-vector product and the preconditioner) and use something else, as Jed suggested: a lower order scheme that is differentiable. This can work well for solving the nonlinear system or not, depending on how suitable it is for your original "function".

   1) In theory at least you can have the Jacobian matrix-vector product computed directly using DMPLEX/PETScFV infrastructure (it would apply the Jacobian locally matrix-free using code similar to the code that evaluates the FV "function"). I do not know if any of this code is written; it will be more efficient than -snes_mf_operator, which evaluates the FV "function" and does traditional differencing to compute the Jacobian. Again it has the problem of non-differentiability if the function is not differentiable. But it could be done for a different (lower order) scheme that is differentiable.

   2) You can have PETSc compute the Jacobian explicitly with coloring and from that build the preconditioner; this allows you to avoid the hassle of writing the code for the derivatives yourself. This uses finite differences on your function and coloring of the graph to compute many columns of the Jacobian simultaneously and can be pretty efficient. Again, if the function is not differentiable there can be issues with what the result means and whether it will work in a nonlinear solver. SNESComputeJacobianDefaultColor https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESComputeJacobianDefaultColor.html
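[Editor's note: the coloring route in 2) typically needs no hand-written Jacobian code. Assuming the TS has a DM attached that can preallocate the matrix (as a DMPlex/PETScFV setup does), it can usually be tried purely from the command line; the option names below are the ones associated with SNESComputeJacobianDefaultColor and matrix-free operators, given from memory and worth double-checking with -help:]

  ./mysolver -ts_type beuler -snes_fd_color
      (Jacobian assembled by finite differences + coloring, also used as the preconditioner matrix)

  ./mysolver -ts_type beuler -snes_mf_operator -snes_fd_color
      (matrix-free action for the Jacobian, colored finite-difference matrix only for the preconditioner)

[Here ./mysolver is a placeholder for the actual application; -ts_type beuler matches the TSBEULER used later in this thread.]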
   3) Much more outlandish is to skip Newton and Jacobians completely and use the full approximation scheme SNESFAS https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNESFAS/SNESFAS.html ; this requires a grid hierarchy and an appropriate way to interpolate your finite volume solutions up through the grid hierarchy. Probably not worth investigating unless you have lots of time on your hands and a keen interest in this kind of stuff https://arxiv.org/pdf/1607.04254.pdf

   So to summarize, and Matt and Jed can correct my mistakes.

   1) Form the full Jacobian from the original "function" using an analytic approach, use it for both the matrix-vector product and to build the preconditioner. Problem if the full Jacobian is not well defined mathematically. Tough to code, usually not practical.

   2) Do matrix free (any way) for the full Jacobian and

   a) build another "approximate" Jacobian (using any technique: analytic, or finite differences using matrix coloring on a new "lower order" "function"). Still can have trouble if this original Jacobian is not well defined

   b) "write your own preconditioner" that internally can use anything in PETSc that approximately solves the Jacobian. Same potential problems if the original Jacobian is not differentiable, plus convergence will depend on how well your own preconditioner approximates the inverse of the true Jacobian.

   3) Use a lower order Jacobian (computed any way you want) for the matrix-vector product and the preconditioner. The problem of differentiability is gone but convergence of the nonlinear solver depends on how well the lower order Jacobian is appropriate for the original "function"

   a) Form the "lower order" Jacobian analytically or with coloring and use it for both the matrix-vector product and building the preconditioner. Note that switching between this and 2a is trivial.

   b) Do the "lower order" Jacobian matrix free and provide your own PCSHELL. Note that switching between this and 2b is trivial.

   Barry

   I would first try computing the "true" Jacobian via coloring; if that works and gives satisfactory results (fast enough) then stop. Then I would do 2a/2b by writing my "function" using PETScFV and writing the "lower order function" via PETScFV and use matrix coloring to get the Jacobian from the second "lower order function". If this works well (either with 2a or 3a or both) then stop, or you can compute the "lower order" Jacobian analytically (again using PetscFV) for a more efficient evaluation of the Jacobian.

> > Thanks ! > > Thibault > > On Fri, Aug 21, 2020 at 17:22, Barry Smith > wrote: > > >> On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu > wrote: >> >> >> >> On Fri, Aug 21, 2020 at 15:23, Matthew Knepley > wrote: >> On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu > wrote: >> Sorry, I sent too soon, I hit the wrong key. >> >> I wanted to say that context.npoints is the local number of cells. >> >> PetscRHSFunctionImpl allows to generate the hyperbolic part of the right hand side.
>> Then we have : >> >> PetscErrorCode PetscIJacobian( >> TS ts, /*!< Time stepping object (see PETSc TS)*/ >> PetscReal t, /*!< Current time */ >> Vec Y, /*!< Solution vector */ >> Vec Ydot, /*!< Time-derivative of solution vector */ >> PetscReal a, /*!< Shift */ >> Mat A, /*!< Jacobian matrix */ >> Mat B, /*!< Preconditioning matrix */ >> void *ctxt /*!< Application context */ >> ) >> { >> PETScContext *context = (PETScContext*) ctxt; >> HyPar *solver = context->solver; >> _DECLARE_IERR_; >> >> PetscFunctionBegin; >> solver->count_IJacobian++; >> context->shift = a; >> context->waqt = t; >> /* Construct preconditioning matrix */ >> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y,context); CHECKERR(ierr); } >> >> PetscFunctionReturn(0); >> } >> >> and PetscJacobianFunction_JFNK which I bind to the matrix shell, computes the action of the jacobian on a vector : say U0 is the state of reference and Y the vector upon which to apply the JFNK method, then the PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift comes from the TS). >> The preconditioning matrix I compute as an approximation to the actual jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in each cell, a 4x4 matrix that is known exactly for the system of equations I am solving, i.e. Euler equations. For the structured grid, I can loop on the cells and do that 'Derivative' thing at first order by simply taking a finite-difference like approximation with the neighboring cells, Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by block (JFunction is the dF/dU) >> >> /* diagonal element */ >> >> >> <> for (v=0; v> >> >> <> ierr = solver->JFunction (values,(u+nvars*p),solver->physics ,dir,0); >> >> >> <> _ArrayScale1D_ (values,(dxinv*iblank),(nvars*nvars)); >> >> >> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >> >> >> <> >> >> >> <> /* left neighbor */ >> >> >> <> if (pgL >= 0) { >> >> >> <> for (v=0; v> >> >> <> ierr = solver->JFunction (values,(u+nvars*pL),solver->physics ,dir,1); >> >> >> <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); >> >> >> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >> >> >> <> } >> >> >> <> >> >> >> <> /* right neighbor */ >> >> >> <> if (pgR >= 0) { >> >> >> <> for (v=0; v> >> >> <> ierr = solver->JFunction (values,(u+nvars*pR),solver->physics ,dir,-1); >> >> >> <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); >> >> >> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >> >> >> <> } >> >> >> >> I do not know if I am clear here ... >> Anyways, I am trying to figure out how to do this shell matrix and this preconditioner using all the FV and DMPlex artillery. >> >> Okay, that is very clear. We should be able to get the JFNK just with -snes_mf_operator, and put the approximate J construction in DMPlexComputeJacobian_Internal(). >> There is an FV section already, and we could just add this. I would need to understand those entries in the pointwise Riemann sense that the other stuff is now. >> >> Ok i had a quick look and if I understood correctly it would do the job. Setting the snes-mf-operator flag would mean however that we have to go through SNESSetJacobian to set the jacobian and the preconditioning matrix wouldn't it ? 
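[Editor's note: for completeness, a bare-bones sketch of the PCSHELL route Barry mentions earlier in the thread; MyPCApply and MyPCCtx are placeholders, the apply body (a plain VecCopy, i.e. no preconditioning at all) is only there so the skeleton compiles, and the hookup fragment is assumed to live in the setup routine after the TS has been configured.]

  typedef struct {
    /* whatever the preconditioner needs: sub-matrices, inner KSPs, grid data, ... */
  } MyPCCtx;

  static PetscErrorCode MyPCApply(PC pc, Vec x, Vec y)
  {
    MyPCCtx       *ctx;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = PCShellGetContext(pc, (void**)&ctx); CHKERRQ(ierr);
    ierr = VecCopy(x, y); CHKERRQ(ierr);   /* replace with the actual approximate solve */
    PetscFunctionReturn(0);
  }

  /* hook it up underneath the TS */
  SNES snes; KSP ksp; PC pc; MyPCCtx ctx;
  ierr = TSGetSNES(ts, &snes); CHKERRQ(ierr);
  ierr = SNESGetKSP(snes, &ksp); CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCSHELL); CHKERRQ(ierr);
  ierr = PCShellSetContext(pc, &ctx); CHKERRQ(ierr);
  ierr = PCShellSetApply(pc, MyPCApply); CHKERRQ(ierr);

[Combined with a matrix-free Jacobian (the -snes_mf_operator / SNESSetUseMatrixFree route discussed in this thread), such a shell preconditioner is the piece that would replace the hand-assembled Pmat above.]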
> > Thibault, > > Since the TS implicit methods end up using SNES internally the option should be available to you without requiring you to be calling the SNES routines directly > > Once you have finalized your approach and if for the implicit case you always work in the snes mf operator mode you can hardwire > > TSGetSNES(ts,&snes); > SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); > > in your code so you don't need to always provide the option -snes-mf-operator > > Barry > > > >> There might be calls to the Riemann solver to evaluate the dRHS / dU part yes but maybe it's possible to re-use what was computed for the RHS^n ? >> In the FV section the jacobian is set to identity which I missed before, but it could explain why when I used the following : >> TSSetType(ts, TSBEULER); >> DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); >> DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); >> with my FV discretization nothing happened, right ? >> >> Thank you, >> >> Thibault >> >> Thanks, >> >> Matt >> >> Le ven. 21 ao?t 2020 ? 14:55, Thibault Bridel-Bertomeu > a ?crit : >> Hi, >> >> Thanks Matthew and Jed for your input. >> I indeed envision an implicit solver in the sense Jed mentioned - Jiri Blazek's book is a nice intro to this concept. >> >> Matthew, I do not know exactly what to change right now because although I understand globally what the DMPlexComputeXXXX_Internal methods do, I cannot say for sure line by line what is happening. >> In a structured code, I have a an implicit FVM solver with PETSc but I do not use any of the FV structure, not even a DM - I just use C arrays that I transform to PETSc Vec and Mat and build my IJacobian and my preconditioner and gives all that to a TS and it runs. I cannot figure out how to do it with the FV and the DM and all the underlying "shortcuts" that I want to use. >> >> Here is the top method for the structured code : >> >> int total_size = context.npoints * solver->nvars >> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); CHKERRQ(ierr); >> SNES snes; >> KSP ksp; >> PC pc; >> SNESType snestype; >> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >> >> flag_mat_a = 1; >> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, >> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >> context.jfnk_eps = 1e-7; >> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps,NULL); CHKERRQ(ierr); >> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void))PetscJacobianFunction_JFNK); CHKERRQ(ierr); >> ierr = MatSetUp(A); CHKERRQ(ierr); >> >> context.flag_use_precon = 0; >> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool*)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >> >> /* Set up preconditioner matrix */ >> flag_mat_b = 1; >> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE,PETSC_DETERMINE, >> (solver->ndims*2+1)*solver->nvars,NULL, >> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >> ierr = MatSetBlockSize(B,solver->nvars); >> /* Set the RHSJacobian function for TS */ >> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); >> >> Thibault Bridel-Bertomeu >> ? >> Eng, MSc, PhD >> Research Engineer >> CEA/CESTA >> 33114 LE BARP >> Tel.: (+33)557046924 >> Mob.: (+33)611025322 >> Mail: thibault.bridelbertomeu at gmail.com >> >> >> Le jeu. 20 ao?t 2020 ? 
18:43, Jed Brown > a ?crit : >> Matthew Knepley > writes: >> >> >> >> >> >> > I could never get the FVM stuff to make sense to me for implicit methods. >> >> >> > Here is my problem understanding. If you have an FVM method, it decides >> >> >> > to move "stuff" from one cell to its neighboring cells depending on the >> >> >> > solution to the Riemann problem on each face, which computed the flux. This >> >> >> > is >> >> >> > fine unless the timestep is so big that material can flow through into the >> >> >> > cells beyond the neighbor. Then I should have considered the effect of the >> >> >> > Riemann problem for those interfaces. That would be in the Jacobian, but I >> >> >> > don't know how to compute that Jacobian. I guess you could do everything >> >> >> > matrix-free, but without a preconditioner it seems hard. >> >> >> >> >> >> So long as we're using method of lines, the flux is just instantaneous flux, not integrated over some time step. It has the same meaning for implicit and explicit. >> >> >> >> >> >> An explicit method would be unstable if you took such a large time step (CFL) and an implicit method will not simultaneously be SSP and higher than first order, but it's still a consistent discretization of the problem. >> >> >> >> >> >> It's common (done in FUN3D and others) to precondition with a first-order method, where gradient reconstruction/limiting is skipped. That's what I'd recommend because limiting creates nasty nonlinearities and the resulting discretizations lack h-ellipticity which makes them very hard to solve. >> >> >> >> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > Thibault Bridel-Bertomeu > ? > Eng, MSc, PhD > Research Engineer > CEA/CESTA > 33114 LE BARP > Tel.: (+33)557046924 > Mob.: (+33)611025322 > Mail: thibault.bridelbertomeu at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 21 12:32:37 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 21 Aug 2020 12:32:37 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <60D8E648-4316-4B3B-B17D-94887E297F50@gmail.com> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <87imddgq3l.fsf@jedbrown.org> <2CAFDC41-75ED-4282-A6C3-9B477E2F8542@gmail.com> <60D8E648-4316-4B3B-B17D-94887E297F50@gmail.com> Message-ID: There really needs to be a usable extensive MPI test suite that can find these performance issues, we spend time helping users with these problems when it is really the MPI communities job. > On Aug 21, 2020, at 11:55 AM, Manav Bhatia wrote: > > I built petsc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3 and the test is finishing at my end. > > So, it appears that there is some issue with openmpi-4.0.1 on this machine. > > I will now build all my dependency toolchain with mpich and hopefully things will work for my application code. > > Thank you again for your help. > > Regards, > Manav > > >> On Aug 20, 2020, at 10:45 PM, Junchao Zhang > wrote: >> >> Manav, >> I downloaded your petsc_mat.tgz but could not reproduce the problem, on both Linux and Mac. I used the petsc commit id df0e4300 you mentioned. 
>> On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" --PETSC_ARCH=linux-host-dbg >> On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is configured --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g" PETSC_ARCH=mac-clang-dbg >> >> mpirun -n 8 ./test >> rank: 1 : stdout.processor.1 >> rank: 4 : stdout.processor.4 >> rank: 0 : stdout.processor.0 >> rank: 5 : stdout.processor.5 >> rank: 6 : stdout.processor.6 >> rank: 7 : stdout.processor.7 >> rank: 3 : stdout.processor.3 >> rank: 2 : stdout.processor.2 >> rank: 1 : Beginning reading nnz... >> rank: 4 : Beginning reading nnz... >> rank: 0 : Beginning reading nnz... >> rank: 5 : Beginning reading nnz... >> rank: 7 : Beginning reading nnz... >> rank: 2 : Beginning reading nnz... >> rank: 3 : Beginning reading nnz... >> rank: 6 : Beginning reading nnz... >> rank: 5 : Finished reading nnz >> rank: 5 : Beginning mat preallocation... >> rank: 3 : Finished reading nnz >> rank: 3 : Beginning mat preallocation... >> rank: 4 : Finished reading nnz >> rank: 4 : Beginning mat preallocation... >> rank: 7 : Finished reading nnz >> rank: 7 : Beginning mat preallocation... >> rank: 1 : Finished reading nnz >> rank: 1 : Beginning mat preallocation... >> rank: 0 : Finished reading nnz >> rank: 0 : Beginning mat preallocation... >> rank: 2 : Finished reading nnz >> rank: 2 : Beginning mat preallocation... >> rank: 6 : Finished reading nnz >> rank: 6 : Beginning mat preallocation... >> rank: 5 : Finished preallocation >> rank: 5 : Beginning reading and setting matrix values... >> rank: 1 : Finished preallocation >> rank: 1 : Beginning reading and setting matrix values... >> rank: 7 : Finished preallocation >> rank: 7 : Beginning reading and setting matrix values... >> rank: 2 : Finished preallocation >> rank: 2 : Beginning reading and setting matrix values... >> rank: 4 : Finished preallocation >> rank: 4 : Beginning reading and setting matrix values... >> rank: 0 : Finished preallocation >> rank: 0 : Beginning reading and setting matrix values... >> rank: 3 : Finished preallocation >> rank: 3 : Beginning reading and setting matrix values... >> rank: 6 : Finished preallocation >> rank: 6 : Beginning reading and setting matrix values... >> rank: 1 : Finished reading and setting matrix values >> rank: 1 : Beginning mat assembly... >> rank: 5 : Finished reading and setting matrix values >> rank: 5 : Beginning mat assembly... >> rank: 4 : Finished reading and setting matrix values >> rank: 4 : Beginning mat assembly... >> rank: 2 : Finished reading and setting matrix values >> rank: 2 : Beginning mat assembly... >> rank: 3 : Finished reading and setting matrix values >> rank: 3 : Beginning mat assembly... >> rank: 7 : Finished reading and setting matrix values >> rank: 7 : Beginning mat assembly... >> rank: 6 : Finished reading and setting matrix values >> rank: 6 : Beginning mat assembly... >> rank: 0 : Finished reading and setting matrix values >> rank: 0 : Beginning mat assembly... 
>> rank: 1 : Finished mat assembly >> rank: 3 : Finished mat assembly >> rank: 7 : Finished mat assembly >> rank: 0 : Finished mat assembly >> rank: 5 : Finished mat assembly >> rank: 2 : Finished mat assembly >> rank: 4 : Finished mat assembly >> rank: 6 : Finished mat assembly >> >> --Junchao Zhang >> >> >> On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang > wrote: >> I will have a look and report back to you. Thanks. >> --Junchao Zhang >> >> >> On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia > wrote: >> I have created a standalone test that demonstrates the problem at my end. I have stored the indices, etc. from my problem in a text file for each rank, which I use to initialize the matrix. >> Please note that the test is specifically for 8 ranks. >> >> The .tgz file is on my google drive: https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing >> >> This contains a README file with instructions on running. Please note that the work directory needs the index files. >> >> Please let me know if I can provide any further information. >> >> Thank you all for your help. >> >> Regards, >> Manav >> >>> On Aug 20, 2020, at 12:54 PM, Jed Brown > wrote: >>> >>> Matthew Knepley > writes: >>> >>>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia > wrote: >>>> >>>>> >>>>> >>>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini > >>>>> wrote: >>>>> >>>>> Can you add a MPI_Barrier before >>>>> >>>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); >>>>> >>>>> >>>>> With a MPI_Barrier before this function call: >>>>> ? three of the processes have already hit this barrier, >>>>> ? the other 5 are inside MatStashScatterGetMesg_Private -> >>>>> MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 >>>>> processes) >>> >>> This is not itself evidence of inconsistent state. You can use >>> >>> -build_twosided allreduce >>> >>> to avoid the nonblocking sparse algorithm. >>> >>>> >>>> Okay, you should run this with -matstash_legacy just to make sure it is not >>>> a bug in your MPI implementation. But it looks like >>>> there is inconsistency in the parallel state. This can happen because we >>>> have a bug, or it could be that you called a collective >>>> operation on a subset of the processes. Is there any way you could cut down >>>> the example (say put all 1s in the matrix, etc) so >>>> that you could give it to us to run? >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhatiamanav at gmail.com Fri Aug 21 14:05:16 2020 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Fri, 21 Aug 2020 14:05:16 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: <60D8E648-4316-4B3B-B17D-94887E297F50@gmail.com> References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <87imddgq3l.fsf@jedbrown.org> <2CAFDC41-75ED-4282-A6C3-9B477E2F8542@gmail.com> <60D8E648-4316-4B3B-B17D-94887E297F50@gmail.com> Message-ID: I can verify that my application code is working with mpich-3.3.2 . -Manav > On Aug 21, 2020, at 11:55 AM, Manav Bhatia wrote: > > I built petsc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3 and the test is finishing at my end. > > So, it appears that there is some issue with openmpi-4.0.1 on this machine. > > I will now build all my dependency toolchain with mpich and hopefully things will work for my application code. > > Thank you again for your help. 
> > Regards, > Manav > > >> On Aug 20, 2020, at 10:45 PM, Junchao Zhang > wrote: >> >> Manav, >> I downloaded your petsc_mat.tgz but could not reproduce the problem, on both Linux and Mac. I used the petsc commit id df0e4300 you mentioned. >> On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" --PETSC_ARCH=linux-host-dbg >> On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is configured --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g" PETSC_ARCH=mac-clang-dbg >> >> mpirun -n 8 ./test >> rank: 1 : stdout.processor.1 >> rank: 4 : stdout.processor.4 >> rank: 0 : stdout.processor.0 >> rank: 5 : stdout.processor.5 >> rank: 6 : stdout.processor.6 >> rank: 7 : stdout.processor.7 >> rank: 3 : stdout.processor.3 >> rank: 2 : stdout.processor.2 >> rank: 1 : Beginning reading nnz... >> rank: 4 : Beginning reading nnz... >> rank: 0 : Beginning reading nnz... >> rank: 5 : Beginning reading nnz... >> rank: 7 : Beginning reading nnz... >> rank: 2 : Beginning reading nnz... >> rank: 3 : Beginning reading nnz... >> rank: 6 : Beginning reading nnz... >> rank: 5 : Finished reading nnz >> rank: 5 : Beginning mat preallocation... >> rank: 3 : Finished reading nnz >> rank: 3 : Beginning mat preallocation... >> rank: 4 : Finished reading nnz >> rank: 4 : Beginning mat preallocation... >> rank: 7 : Finished reading nnz >> rank: 7 : Beginning mat preallocation... >> rank: 1 : Finished reading nnz >> rank: 1 : Beginning mat preallocation... >> rank: 0 : Finished reading nnz >> rank: 0 : Beginning mat preallocation... >> rank: 2 : Finished reading nnz >> rank: 2 : Beginning mat preallocation... >> rank: 6 : Finished reading nnz >> rank: 6 : Beginning mat preallocation... >> rank: 5 : Finished preallocation >> rank: 5 : Beginning reading and setting matrix values... >> rank: 1 : Finished preallocation >> rank: 1 : Beginning reading and setting matrix values... >> rank: 7 : Finished preallocation >> rank: 7 : Beginning reading and setting matrix values... >> rank: 2 : Finished preallocation >> rank: 2 : Beginning reading and setting matrix values... >> rank: 4 : Finished preallocation >> rank: 4 : Beginning reading and setting matrix values... >> rank: 0 : Finished preallocation >> rank: 0 : Beginning reading and setting matrix values... >> rank: 3 : Finished preallocation >> rank: 3 : Beginning reading and setting matrix values... >> rank: 6 : Finished preallocation >> rank: 6 : Beginning reading and setting matrix values... >> rank: 1 : Finished reading and setting matrix values >> rank: 1 : Beginning mat assembly... >> rank: 5 : Finished reading and setting matrix values >> rank: 5 : Beginning mat assembly... >> rank: 4 : Finished reading and setting matrix values >> rank: 4 : Beginning mat assembly... >> rank: 2 : Finished reading and setting matrix values >> rank: 2 : Beginning mat assembly... >> rank: 3 : Finished reading and setting matrix values >> rank: 3 : Beginning mat assembly... >> rank: 7 : Finished reading and setting matrix values >> rank: 7 : Beginning mat assembly... >> rank: 6 : Finished reading and setting matrix values >> rank: 6 : Beginning mat assembly... >> rank: 0 : Finished reading and setting matrix values >> rank: 0 : Beginning mat assembly... 
>> rank: 1 : Finished mat assembly >> rank: 3 : Finished mat assembly >> rank: 7 : Finished mat assembly >> rank: 0 : Finished mat assembly >> rank: 5 : Finished mat assembly >> rank: 2 : Finished mat assembly >> rank: 4 : Finished mat assembly >> rank: 6 : Finished mat assembly >> >> --Junchao Zhang >> >> >> On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang > wrote: >> I will have a look and report back to you. Thanks. >> --Junchao Zhang >> >> >> On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia > wrote: >> I have created a standalone test that demonstrates the problem at my end. I have stored the indices, etc. from my problem in a text file for each rank, which I use to initialize the matrix. >> Please note that the test is specifically for 8 ranks. >> >> The .tgz file is on my google drive: https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing >> >> This contains a README file with instructions on running. Please note that the work directory needs the index files. >> >> Please let me know if I can provide any further information. >> >> Thank you all for your help. >> >> Regards, >> Manav >> >>> On Aug 20, 2020, at 12:54 PM, Jed Brown > wrote: >>> >>> Matthew Knepley > writes: >>> >>>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia > wrote: >>>> >>>>> >>>>> >>>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini > >>>>> wrote: >>>>> >>>>> Can you add a MPI_Barrier before >>>>> >>>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); >>>>> >>>>> >>>>> With a MPI_Barrier before this function call: >>>>> ? three of the processes have already hit this barrier, >>>>> ? the other 5 are inside MatStashScatterGetMesg_Private -> >>>>> MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 >>>>> processes) >>> >>> This is not itself evidence of inconsistent state. You can use >>> >>> -build_twosided allreduce >>> >>> to avoid the nonblocking sparse algorithm. >>> >>>> >>>> Okay, you should run this with -matstash_legacy just to make sure it is not >>>> a bug in your MPI implementation. But it looks like >>>> there is inconsistency in the parallel state. This can happen because we >>>> have a bug, or it could be that you called a collective >>>> operation on a subset of the processes. Is there any way you could cut down >>>> the example (say put all 1s in the matrix, etc) so >>>> that you could give it to us to run? >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Aug 21 14:17:55 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 21 Aug 2020 14:17:55 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <87imddgq3l.fsf@jedbrown.org> <2CAFDC41-75ED-4282-A6C3-9B477E2F8542@gmail.com> <60D8E648-4316-4B3B-B17D-94887E297F50@gmail.com> Message-ID: Barry, I mentioned a test suite from MPICH at https://lists.mcs.anl.gov/pipermail/petsc-users/2020-July/041738.html. Since it is not easy to use, I did not put it on PETSc FAQ. I also asked in the OpenMPI mailing list. An OpenMPI developer said he could make their tests public, and is in the process of checking with all authors to have a license :). If it is done, it will be at https://github.com/open-mpi/ompi-tests-public A test suite will be helpful but I doubt it will solve the problem. 
User's particular case (number of ranks, message size, communication pattern etc) might not be covered by a test suite. --Junchao Zhang On Fri, Aug 21, 2020 at 12:33 PM Barry Smith wrote: > > There really needs to be a usable extensive MPI test suite that can find > these performance issues, we spend time helping users with these problems > when it is really the MPI communities job. > > > > On Aug 21, 2020, at 11:55 AM, Manav Bhatia wrote: > > I built petsc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3 > and the test is finishing at my end. > > So, it appears that there is some issue with openmpi-4.0.1 on this > machine. > > I will now build all my dependency toolchain with mpich and hopefully > things will work for my application code. > > Thank you again for your help. > > Regards, > Manav > > > On Aug 20, 2020, at 10:45 PM, Junchao Zhang > wrote: > > Manav, > I downloaded your petsc_mat.tgz but could not reproduce the problem, on > both Linux and Mac. I used the petsc commit id df0e4300 you mentioned. > On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured > --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort > --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" > --PETSC_ARCH=linux-host-dbg > On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is > configured --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx > --with-fc=mpifort --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g" > PETSC_ARCH=mac-clang-dbg > > mpirun -n 8 ./test > rank: 1 : stdout.processor.1 > rank: 4 : stdout.processor.4 > rank: 0 : stdout.processor.0 > rank: 5 : stdout.processor.5 > rank: 6 : stdout.processor.6 > rank: 7 : stdout.processor.7 > rank: 3 : stdout.processor.3 > rank: 2 : stdout.processor.2 > rank: 1 : Beginning reading nnz... > rank: 4 : Beginning reading nnz... > rank: 0 : Beginning reading nnz... > rank: 5 : Beginning reading nnz... > rank: 7 : Beginning reading nnz... > rank: 2 : Beginning reading nnz... > rank: 3 : Beginning reading nnz... > rank: 6 : Beginning reading nnz... > rank: 5 : Finished reading nnz > rank: 5 : Beginning mat preallocation... > rank: 3 : Finished reading nnz > rank: 3 : Beginning mat preallocation... > rank: 4 : Finished reading nnz > rank: 4 : Beginning mat preallocation... > rank: 7 : Finished reading nnz > rank: 7 : Beginning mat preallocation... > rank: 1 : Finished reading nnz > rank: 1 : Beginning mat preallocation... > rank: 0 : Finished reading nnz > rank: 0 : Beginning mat preallocation... > rank: 2 : Finished reading nnz > rank: 2 : Beginning mat preallocation... > rank: 6 : Finished reading nnz > rank: 6 : Beginning mat preallocation... > rank: 5 : Finished preallocation > rank: 5 : Beginning reading and setting matrix values... > rank: 1 : Finished preallocation > rank: 1 : Beginning reading and setting matrix values... > rank: 7 : Finished preallocation > rank: 7 : Beginning reading and setting matrix values... > rank: 2 : Finished preallocation > rank: 2 : Beginning reading and setting matrix values... > rank: 4 : Finished preallocation > rank: 4 : Beginning reading and setting matrix values... > rank: 0 : Finished preallocation > rank: 0 : Beginning reading and setting matrix values... > rank: 3 : Finished preallocation > rank: 3 : Beginning reading and setting matrix values... > rank: 6 : Finished preallocation > rank: 6 : Beginning reading and setting matrix values... > rank: 1 : Finished reading and setting matrix values > rank: 1 : Beginning mat assembly... 
> rank: 5 : Finished reading and setting matrix values > rank: 5 : Beginning mat assembly... > rank: 4 : Finished reading and setting matrix values > rank: 4 : Beginning mat assembly... > rank: 2 : Finished reading and setting matrix values > rank: 2 : Beginning mat assembly... > rank: 3 : Finished reading and setting matrix values > rank: 3 : Beginning mat assembly... > rank: 7 : Finished reading and setting matrix values > rank: 7 : Beginning mat assembly... > rank: 6 : Finished reading and setting matrix values > rank: 6 : Beginning mat assembly... > rank: 0 : Finished reading and setting matrix values > rank: 0 : Beginning mat assembly... > rank: 1 : Finished mat assembly > rank: 3 : Finished mat assembly > rank: 7 : Finished mat assembly > rank: 0 : Finished mat assembly > rank: 5 : Finished mat assembly > rank: 2 : Finished mat assembly > rank: 4 : Finished mat assembly > rank: 6 : Finished mat assembly > > --Junchao Zhang > > > On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang > wrote: > >> I will have a look and report back to you. Thanks. >> --Junchao Zhang >> >> >> On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia >> wrote: >> >>> I have created a standalone test that demonstrates the problem at my >>> end. I have stored the indices, etc. from my problem in a text file >>> for each rank, which I use to initialize the matrix. >>> Please note that the test is specifically for 8 ranks. >>> >>> The .tgz file is on my google drive: >>> https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing >>> >>> >>> This contains a README file with instructions on running. Please note >>> that the work directory needs the index files. >>> >>> Please let me know if I can provide any further information. >>> >>> Thank you all for your help. >>> >>> Regards, >>> Manav >>> >>> On Aug 20, 2020, at 12:54 PM, Jed Brown wrote: >>> >>> Matthew Knepley writes: >>> >>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia >>> wrote: >>> >>> >>> >>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini >>> wrote: >>> >>> Can you add a MPI_Barrier before >>> >>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); >>> >>> >>> With a MPI_Barrier before this function call: >>> ? three of the processes have already hit this barrier, >>> ? the other 5 are inside MatStashScatterGetMesg_Private -> >>> MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 >>> processes) >>> >>> >>> This is not itself evidence of inconsistent state. You can use >>> >>> -build_twosided allreduce >>> >>> to avoid the nonblocking sparse algorithm. >>> >>> >>> Okay, you should run this with -matstash_legacy just to make sure it is >>> not >>> a bug in your MPI implementation. But it looks like >>> there is inconsistency in the parallel state. This can happen because we >>> have a bug, or it could be that you called a collective >>> operation on a subset of the processes. Is there any way you could cut >>> down >>> the example (say put all 1s in the matrix, etc) so >>> that you could give it to us to run? >>> >>> >>> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Aug 21 14:31:25 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 21 Aug 2020 14:31:25 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <87imddgq3l.fsf@jedbrown.org> <2CAFDC41-75ED-4282-A6C3-9B477E2F8542@gmail.com> <60D8E648-4316-4B3B-B17D-94887E297F50@gmail.com> Message-ID: Yes, absolutely a test suite will not solve all problems. In the PETSc model, which is not uncommon, each bug/problem found is suppose to result in another test to detect that problem, thus the test suite can find repeats of the problem without again all the hard work from scratch. So this OpenMPI suite, if it gets off the ground, will be valuable ONLY if they accept community additions efficiently and happily. For example would the test suite detect the problem reported by the PETSc user? It should be trivial to have the user run the suite on their system (which is why it needs be very easy to run) and determine. If it does not detect the problem then working with the appropriate "test suite" community we could submit a MR to the test suite that looks for the problem and finds it. Now the test suite is better and we have one less hassle that comes up multiple times for us. In addition the OpenMPI, MPICH developers etc should do the same thing, each time they fix a bug that was not detected by testing they should donate to the universal test suite the code to reproduce the bug. The question is would our effort in helping the MPI test suite community be more than our "wasted" effort dealing with buggy MPIs? Barry It is a bit curious that after 25 years no friendly extensible universal MPI test suite community has emerged. Perhaps it is because each MPI implementation has its own test processes and suites and cannot form the wider community to have a single friendly extensible universal MPI test suite. Looking back one could say this was a mistake of the MPI forum, they should have started that in motion in 1995, would have saved a lot of duplication of effort and would be very very good now. > On Aug 21, 2020, at 2:17 PM, Junchao Zhang wrote: > > Barry, > I mentioned a test suite from MPICH at https://lists.mcs.anl.gov/pipermail/petsc-users/2020-July/041738.html . Since it is not easy to use, I did not put it on PETSc FAQ. > I also asked in the OpenMPI mailing list. An OpenMPI developer said he could make their tests public, and is in the process of checking with all authors to have a license :). If it is done, it will be at https://github.com/open-mpi/ompi-tests-public > > A test suite will be helpful but I doubt it will solve the problem. User's particular case (number of ranks, message size, communication pattern etc) might not be covered by a test suite. > --Junchao Zhang > > > On Fri, Aug 21, 2020 at 12:33 PM Barry Smith > wrote: > > There really needs to be a usable extensive MPI test suite that can find these performance issues, we spend time helping users with these problems when it is really the MPI communities job. > > > >> On Aug 21, 2020, at 11:55 AM, Manav Bhatia > wrote: >> >> I built petsc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3 and the test is finishing at my end. >> >> So, it appears that there is some issue with openmpi-4.0.1 on this machine. 
>> >> I will now build all my dependency toolchain with mpich and hopefully things will work for my application code. >> >> Thank you again for your help. >> >> Regards, >> Manav >> >> >>> On Aug 20, 2020, at 10:45 PM, Junchao Zhang > wrote: >>> >>> Manav, >>> I downloaded your petsc_mat.tgz but could not reproduce the problem, on both Linux and Mac. I used the petsc commit id df0e4300 you mentioned. >>> On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" --PETSC_ARCH=linux-host-dbg >>> On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is configured --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g" PETSC_ARCH=mac-clang-dbg >>> >>> mpirun -n 8 ./test >>> rank: 1 : stdout.processor.1 >>> rank: 4 : stdout.processor.4 >>> rank: 0 : stdout.processor.0 >>> rank: 5 : stdout.processor.5 >>> rank: 6 : stdout.processor.6 >>> rank: 7 : stdout.processor.7 >>> rank: 3 : stdout.processor.3 >>> rank: 2 : stdout.processor.2 >>> rank: 1 : Beginning reading nnz... >>> rank: 4 : Beginning reading nnz... >>> rank: 0 : Beginning reading nnz... >>> rank: 5 : Beginning reading nnz... >>> rank: 7 : Beginning reading nnz... >>> rank: 2 : Beginning reading nnz... >>> rank: 3 : Beginning reading nnz... >>> rank: 6 : Beginning reading nnz... >>> rank: 5 : Finished reading nnz >>> rank: 5 : Beginning mat preallocation... >>> rank: 3 : Finished reading nnz >>> rank: 3 : Beginning mat preallocation... >>> rank: 4 : Finished reading nnz >>> rank: 4 : Beginning mat preallocation... >>> rank: 7 : Finished reading nnz >>> rank: 7 : Beginning mat preallocation... >>> rank: 1 : Finished reading nnz >>> rank: 1 : Beginning mat preallocation... >>> rank: 0 : Finished reading nnz >>> rank: 0 : Beginning mat preallocation... >>> rank: 2 : Finished reading nnz >>> rank: 2 : Beginning mat preallocation... >>> rank: 6 : Finished reading nnz >>> rank: 6 : Beginning mat preallocation... >>> rank: 5 : Finished preallocation >>> rank: 5 : Beginning reading and setting matrix values... >>> rank: 1 : Finished preallocation >>> rank: 1 : Beginning reading and setting matrix values... >>> rank: 7 : Finished preallocation >>> rank: 7 : Beginning reading and setting matrix values... >>> rank: 2 : Finished preallocation >>> rank: 2 : Beginning reading and setting matrix values... >>> rank: 4 : Finished preallocation >>> rank: 4 : Beginning reading and setting matrix values... >>> rank: 0 : Finished preallocation >>> rank: 0 : Beginning reading and setting matrix values... >>> rank: 3 : Finished preallocation >>> rank: 3 : Beginning reading and setting matrix values... >>> rank: 6 : Finished preallocation >>> rank: 6 : Beginning reading and setting matrix values... >>> rank: 1 : Finished reading and setting matrix values >>> rank: 1 : Beginning mat assembly... >>> rank: 5 : Finished reading and setting matrix values >>> rank: 5 : Beginning mat assembly... >>> rank: 4 : Finished reading and setting matrix values >>> rank: 4 : Beginning mat assembly... >>> rank: 2 : Finished reading and setting matrix values >>> rank: 2 : Beginning mat assembly... >>> rank: 3 : Finished reading and setting matrix values >>> rank: 3 : Beginning mat assembly... >>> rank: 7 : Finished reading and setting matrix values >>> rank: 7 : Beginning mat assembly... 
>>> rank: 6 : Finished reading and setting matrix values >>> rank: 6 : Beginning mat assembly... >>> rank: 0 : Finished reading and setting matrix values >>> rank: 0 : Beginning mat assembly... >>> rank: 1 : Finished mat assembly >>> rank: 3 : Finished mat assembly >>> rank: 7 : Finished mat assembly >>> rank: 0 : Finished mat assembly >>> rank: 5 : Finished mat assembly >>> rank: 2 : Finished mat assembly >>> rank: 4 : Finished mat assembly >>> rank: 6 : Finished mat assembly >>> >>> --Junchao Zhang >>> >>> >>> On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang > wrote: >>> I will have a look and report back to you. Thanks. >>> --Junchao Zhang >>> >>> >>> On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia > wrote: >>> I have created a standalone test that demonstrates the problem at my end. I have stored the indices, etc. from my problem in a text file for each rank, which I use to initialize the matrix. >>> Please note that the test is specifically for 8 ranks. >>> >>> The .tgz file is on my google drive: https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing >>> >>> This contains a README file with instructions on running. Please note that the work directory needs the index files. >>> >>> Please let me know if I can provide any further information. >>> >>> Thank you all for your help. >>> >>> Regards, >>> Manav >>> >>>> On Aug 20, 2020, at 12:54 PM, Jed Brown > wrote: >>>> >>>> Matthew Knepley > writes: >>>> >>>>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia > wrote: >>>>> >>>>>> >>>>>> >>>>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini > >>>>>> wrote: >>>>>> >>>>>> Can you add a MPI_Barrier before >>>>>> >>>>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); >>>>>> >>>>>> >>>>>> With a MPI_Barrier before this function call: >>>>>> ? three of the processes have already hit this barrier, >>>>>> ? the other 5 are inside MatStashScatterGetMesg_Private -> >>>>>> MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 >>>>>> processes) >>>> >>>> This is not itself evidence of inconsistent state. You can use >>>> >>>> -build_twosided allreduce >>>> >>>> to avoid the nonblocking sparse algorithm. >>>> >>>>> >>>>> Okay, you should run this with -matstash_legacy just to make sure it is not >>>>> a bug in your MPI implementation. But it looks like >>>>> there is inconsistency in the parallel state. This can happen because we >>>>> have a bug, or it could be that you called a collective >>>>> operation on a subset of the processes. Is there any way you could cut down >>>>> the example (say put all 1s in the matrix, etc) so >>>>> that you could give it to us to run? >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 21 14:50:55 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 21 Aug 2020 15:50:55 -0400 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <87imddgq3l.fsf@jedbrown.org> <2CAFDC41-75ED-4282-A6C3-9B477E2F8542@gmail.com> <60D8E648-4316-4B3B-B17D-94887E297F50@gmail.com> Message-ID: On Fri, Aug 21, 2020 at 3:32 PM Barry Smith wrote: > > Yes, absolutely a test suite will not solve all problems. 
In the PETSc > model, which is not uncommon, each bug/problem found is suppose to result > in another test to detect that problem, thus the test suite can find > repeats of the problem without again all the hard work from scratch. > > So this OpenMPI suite, if it gets off the ground, will be valuable ONLY > if they accept community additions efficiently and happily. For example > would the test suite detect the problem reported by the PETSc user? It > should be trivial to have the user run the suite on their system (which is > why it needs be very easy to run) and determine. If it does not detect the > problem then working with the appropriate "test suite" community we could > submit a MR to the test suite that looks for the problem and finds it. Now > the test suite is better and we have one less hassle that comes up multiple > times for us. In addition the OpenMPI, MPICH developers etc should do the > same thing, each time they fix a bug that was not detected by testing they > should donate to the universal test suite the code to reproduce the bug. > > The question is would our effort in helping the MPI test suite community > be more than our "wasted" effort dealing with buggy MPIs? > > Barry > > It is a bit curious that after 25 years no friendly extensible universal > MPI test suite community has emerged. Perhaps it is because each MPI > implementation has its own test processes and suites and cannot form the > wider community to have a single friendly extensible universal MPI test > suite. Looking back one could say this was a mistake of the MPI forum, they > should have started that in motion in 1995, would have saved a lot of > duplication of effort and would be very very good now. > I think they do not do it because people do not hold implementors accountable, only the packages using MPI. Matt > On Aug 21, 2020, at 2:17 PM, Junchao Zhang > wrote: > > Barry, > I mentioned a test suite from MPICH at > https://lists.mcs.anl.gov/pipermail/petsc-users/2020-July/041738.html. > Since it is not easy to use, I did not put it on PETSc FAQ. > I also asked in the OpenMPI mailing list. An OpenMPI developer said he > could make their tests public, and is in the process of checking with all > authors to have a license :). If it is done, it will be at > https://github.com/open-mpi/ompi-tests-public > > A test suite will be helpful but I doubt it will solve the problem. > User's particular case (number of ranks, message size, > communication pattern etc) might not be covered by a test suite. > --Junchao Zhang > > > On Fri, Aug 21, 2020 at 12:33 PM Barry Smith wrote: > >> >> There really needs to be a usable extensive MPI test suite that can >> find these performance issues, we spend time helping users with these >> problems when it is really the MPI communities job. >> >> >> >> On Aug 21, 2020, at 11:55 AM, Manav Bhatia wrote: >> >> I built petsc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3 >> and the test is finishing at my end. >> >> So, it appears that there is some issue with openmpi-4.0.1 on this >> machine. >> >> I will now build all my dependency toolchain with mpich and hopefully >> things will work for my application code. >> >> Thank you again for your help. >> >> Regards, >> Manav >> >> >> On Aug 20, 2020, at 10:45 PM, Junchao Zhang >> wrote: >> >> Manav, >> I downloaded your petsc_mat.tgz but could not reproduce the problem, on >> both Linux and Mac. I used the petsc commit id df0e4300 you mentioned. 
>> On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured >> --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort >> --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" >> --PETSC_ARCH=linux-host-dbg >> On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is >> configured --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx >> --with-fc=mpifort --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g" >> PETSC_ARCH=mac-clang-dbg >> >> mpirun -n 8 ./test >> rank: 1 : stdout.processor.1 >> rank: 4 : stdout.processor.4 >> rank: 0 : stdout.processor.0 >> rank: 5 : stdout.processor.5 >> rank: 6 : stdout.processor.6 >> rank: 7 : stdout.processor.7 >> rank: 3 : stdout.processor.3 >> rank: 2 : stdout.processor.2 >> rank: 1 : Beginning reading nnz... >> rank: 4 : Beginning reading nnz... >> rank: 0 : Beginning reading nnz... >> rank: 5 : Beginning reading nnz... >> rank: 7 : Beginning reading nnz... >> rank: 2 : Beginning reading nnz... >> rank: 3 : Beginning reading nnz... >> rank: 6 : Beginning reading nnz... >> rank: 5 : Finished reading nnz >> rank: 5 : Beginning mat preallocation... >> rank: 3 : Finished reading nnz >> rank: 3 : Beginning mat preallocation... >> rank: 4 : Finished reading nnz >> rank: 4 : Beginning mat preallocation... >> rank: 7 : Finished reading nnz >> rank: 7 : Beginning mat preallocation... >> rank: 1 : Finished reading nnz >> rank: 1 : Beginning mat preallocation... >> rank: 0 : Finished reading nnz >> rank: 0 : Beginning mat preallocation... >> rank: 2 : Finished reading nnz >> rank: 2 : Beginning mat preallocation... >> rank: 6 : Finished reading nnz >> rank: 6 : Beginning mat preallocation... >> rank: 5 : Finished preallocation >> rank: 5 : Beginning reading and setting matrix values... >> rank: 1 : Finished preallocation >> rank: 1 : Beginning reading and setting matrix values... >> rank: 7 : Finished preallocation >> rank: 7 : Beginning reading and setting matrix values... >> rank: 2 : Finished preallocation >> rank: 2 : Beginning reading and setting matrix values... >> rank: 4 : Finished preallocation >> rank: 4 : Beginning reading and setting matrix values... >> rank: 0 : Finished preallocation >> rank: 0 : Beginning reading and setting matrix values... >> rank: 3 : Finished preallocation >> rank: 3 : Beginning reading and setting matrix values... >> rank: 6 : Finished preallocation >> rank: 6 : Beginning reading and setting matrix values... >> rank: 1 : Finished reading and setting matrix values >> rank: 1 : Beginning mat assembly... >> rank: 5 : Finished reading and setting matrix values >> rank: 5 : Beginning mat assembly... >> rank: 4 : Finished reading and setting matrix values >> rank: 4 : Beginning mat assembly... >> rank: 2 : Finished reading and setting matrix values >> rank: 2 : Beginning mat assembly... >> rank: 3 : Finished reading and setting matrix values >> rank: 3 : Beginning mat assembly... >> rank: 7 : Finished reading and setting matrix values >> rank: 7 : Beginning mat assembly... >> rank: 6 : Finished reading and setting matrix values >> rank: 6 : Beginning mat assembly... >> rank: 0 : Finished reading and setting matrix values >> rank: 0 : Beginning mat assembly... 
>> rank: 1 : Finished mat assembly >> rank: 3 : Finished mat assembly >> rank: 7 : Finished mat assembly >> rank: 0 : Finished mat assembly >> rank: 5 : Finished mat assembly >> rank: 2 : Finished mat assembly >> rank: 4 : Finished mat assembly >> rank: 6 : Finished mat assembly >> >> --Junchao Zhang >> >> >> On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang >> wrote: >> >>> I will have a look and report back to you. Thanks. >>> --Junchao Zhang >>> >>> >>> On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia >>> wrote: >>> >>>> I have created a standalone test that demonstrates the problem at my >>>> end. I have stored the indices, etc. from my problem in a text file >>>> for each rank, which I use to initialize the matrix. >>>> Please note that the test is specifically for 8 ranks. >>>> >>>> The .tgz file is on my google drive: >>>> https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing >>>> >>>> >>>> This contains a README file with instructions on running. Please note >>>> that the work directory needs the index files. >>>> >>>> Please let me know if I can provide any further information. >>>> >>>> Thank you all for your help. >>>> >>>> Regards, >>>> Manav >>>> >>>> On Aug 20, 2020, at 12:54 PM, Jed Brown wrote: >>>> >>>> Matthew Knepley writes: >>>> >>>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia >>>> wrote: >>>> >>>> >>>> >>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini >>> > >>>> wrote: >>>> >>>> Can you add a MPI_Barrier before >>>> >>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); >>>> >>>> >>>> With a MPI_Barrier before this function call: >>>> ? three of the processes have already hit this barrier, >>>> ? the other 5 are inside MatStashScatterGetMesg_Private -> >>>> MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 >>>> processes) >>>> >>>> >>>> This is not itself evidence of inconsistent state. You can use >>>> >>>> -build_twosided allreduce >>>> >>>> to avoid the nonblocking sparse algorithm. >>>> >>>> >>>> Okay, you should run this with -matstash_legacy just to make sure it is >>>> not >>>> a bug in your MPI implementation. But it looks like >>>> there is inconsistency in the parallel state. This can happen because we >>>> have a bug, or it could be that you called a collective >>>> operation on a subset of the processes. Is there any way you could cut >>>> down >>>> the example (say put all 1s in the matrix, etc) so >>>> that you could give it to us to run? >>>> >>>> >>>> >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 21 14:57:06 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 21 Aug 2020 14:57:06 -0500 Subject: [petsc-users] MatAssemblyEnd taking too long In-Reply-To: References: <96E8FF05-5719-4244-A763-A9BACDA21C30@gmail.com> <87k0xui1vw.fsf@jedbrown.org> <87h7syi19b.fsf@jedbrown.org> <295DDF3F-433D-4057-9658-31F260D18BE8@gmail.com> <4144EFF3-4F1E-4721-A72B-3321497655B2@gmail.com> <87imddgq3l.fsf@jedbrown.org> <2CAFDC41-75ED-4282-A6C3-9B477E2F8542@gmail.com> <60D8E648-4316-4B3B-B17D-94887E297F50@gmail.com> Message-ID: True, the bug reports come to us and we get the blame. 
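(An aside for readers hitting a similar hang: the two runtime switches mentioned elsewhere in this thread can be combined when checking whether a suspect MPI build is at fault. A possible invocation of the 8-rank reproducer, reusing the ./test executable from the instructions above, is

    mpirun -n 8 ./test -build_twosided allreduce -matstash_legacy

-build_twosided allreduce avoids the nonblocking sparse algorithm in the stash exchange, and -matstash_legacy falls back to the older assembly communication path; if either option makes the hang disappear, that points at the MPI library rather than the application code.)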
> On Aug 21, 2020, at 2:50 PM, Matthew Knepley wrote: > > On Fri, Aug 21, 2020 at 3:32 PM Barry Smith > wrote: > > Yes, absolutely a test suite will not solve all problems. In the PETSc model, which is not uncommon, each bug/problem found is suppose to result in another test to detect that problem, thus the test suite can find repeats of the problem without again all the hard work from scratch. > > So this OpenMPI suite, if it gets off the ground, will be valuable ONLY if they accept community additions efficiently and happily. For example would the test suite detect the problem reported by the PETSc user? It should be trivial to have the user run the suite on their system (which is why it needs be very easy to run) and determine. If it does not detect the problem then working with the appropriate "test suite" community we could submit a MR to the test suite that looks for the problem and finds it. Now the test suite is better and we have one less hassle that comes up multiple times for us. In addition the OpenMPI, MPICH developers etc should do the same thing, each time they fix a bug that was not detected by testing they should donate to the universal test suite the code to reproduce the bug. > > The question is would our effort in helping the MPI test suite community be more than our "wasted" effort dealing with buggy MPIs? > > Barry > > It is a bit curious that after 25 years no friendly extensible universal MPI test suite community has emerged. Perhaps it is because each MPI implementation has its own test processes and suites and cannot form the wider community to have a single friendly extensible universal MPI test suite. Looking back one could say this was a mistake of the MPI forum, they should have started that in motion in 1995, would have saved a lot of duplication of effort and would be very very good now. > > I think they do not do it because people do not hold implementors accountable, only the packages using MPI. > > Matt > >> On Aug 21, 2020, at 2:17 PM, Junchao Zhang > wrote: >> >> Barry, >> I mentioned a test suite from MPICH at https://lists.mcs.anl.gov/pipermail/petsc-users/2020-July/041738.html . Since it is not easy to use, I did not put it on PETSc FAQ. >> I also asked in the OpenMPI mailing list. An OpenMPI developer said he could make their tests public, and is in the process of checking with all authors to have a license :). If it is done, it will be at https://github.com/open-mpi/ompi-tests-public >> >> A test suite will be helpful but I doubt it will solve the problem. User's particular case (number of ranks, message size, communication pattern etc) might not be covered by a test suite. >> --Junchao Zhang >> >> >> On Fri, Aug 21, 2020 at 12:33 PM Barry Smith > wrote: >> >> There really needs to be a usable extensive MPI test suite that can find these performance issues, we spend time helping users with these problems when it is really the MPI communities job. >> >> >> >>> On Aug 21, 2020, at 11:55 AM, Manav Bhatia > wrote: >>> >>> I built petsc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3 and the test is finishing at my end. >>> >>> So, it appears that there is some issue with openmpi-4.0.1 on this machine. >>> >>> I will now build all my dependency toolchain with mpich and hopefully things will work for my application code. >>> >>> Thank you again for your help. 
>>> >>> Regards, >>> Manav >>> >>> >>>> On Aug 20, 2020, at 10:45 PM, Junchao Zhang > wrote: >>>> >>>> Manav, >>>> I downloaded your petsc_mat.tgz but could not reproduce the problem, on both Linux and Mac. I used the petsc commit id df0e4300 you mentioned. >>>> On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" --PETSC_ARCH=linux-host-dbg >>>> On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is configured --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g" PETSC_ARCH=mac-clang-dbg >>>> >>>> mpirun -n 8 ./test >>>> rank: 1 : stdout.processor.1 >>>> rank: 4 : stdout.processor.4 >>>> rank: 0 : stdout.processor.0 >>>> rank: 5 : stdout.processor.5 >>>> rank: 6 : stdout.processor.6 >>>> rank: 7 : stdout.processor.7 >>>> rank: 3 : stdout.processor.3 >>>> rank: 2 : stdout.processor.2 >>>> rank: 1 : Beginning reading nnz... >>>> rank: 4 : Beginning reading nnz... >>>> rank: 0 : Beginning reading nnz... >>>> rank: 5 : Beginning reading nnz... >>>> rank: 7 : Beginning reading nnz... >>>> rank: 2 : Beginning reading nnz... >>>> rank: 3 : Beginning reading nnz... >>>> rank: 6 : Beginning reading nnz... >>>> rank: 5 : Finished reading nnz >>>> rank: 5 : Beginning mat preallocation... >>>> rank: 3 : Finished reading nnz >>>> rank: 3 : Beginning mat preallocation... >>>> rank: 4 : Finished reading nnz >>>> rank: 4 : Beginning mat preallocation... >>>> rank: 7 : Finished reading nnz >>>> rank: 7 : Beginning mat preallocation... >>>> rank: 1 : Finished reading nnz >>>> rank: 1 : Beginning mat preallocation... >>>> rank: 0 : Finished reading nnz >>>> rank: 0 : Beginning mat preallocation... >>>> rank: 2 : Finished reading nnz >>>> rank: 2 : Beginning mat preallocation... >>>> rank: 6 : Finished reading nnz >>>> rank: 6 : Beginning mat preallocation... >>>> rank: 5 : Finished preallocation >>>> rank: 5 : Beginning reading and setting matrix values... >>>> rank: 1 : Finished preallocation >>>> rank: 1 : Beginning reading and setting matrix values... >>>> rank: 7 : Finished preallocation >>>> rank: 7 : Beginning reading and setting matrix values... >>>> rank: 2 : Finished preallocation >>>> rank: 2 : Beginning reading and setting matrix values... >>>> rank: 4 : Finished preallocation >>>> rank: 4 : Beginning reading and setting matrix values... >>>> rank: 0 : Finished preallocation >>>> rank: 0 : Beginning reading and setting matrix values... >>>> rank: 3 : Finished preallocation >>>> rank: 3 : Beginning reading and setting matrix values... >>>> rank: 6 : Finished preallocation >>>> rank: 6 : Beginning reading and setting matrix values... >>>> rank: 1 : Finished reading and setting matrix values >>>> rank: 1 : Beginning mat assembly... >>>> rank: 5 : Finished reading and setting matrix values >>>> rank: 5 : Beginning mat assembly... >>>> rank: 4 : Finished reading and setting matrix values >>>> rank: 4 : Beginning mat assembly... >>>> rank: 2 : Finished reading and setting matrix values >>>> rank: 2 : Beginning mat assembly... >>>> rank: 3 : Finished reading and setting matrix values >>>> rank: 3 : Beginning mat assembly... >>>> rank: 7 : Finished reading and setting matrix values >>>> rank: 7 : Beginning mat assembly... >>>> rank: 6 : Finished reading and setting matrix values >>>> rank: 6 : Beginning mat assembly... 
>>>> rank: 0 : Finished reading and setting matrix values >>>> rank: 0 : Beginning mat assembly... >>>> rank: 1 : Finished mat assembly >>>> rank: 3 : Finished mat assembly >>>> rank: 7 : Finished mat assembly >>>> rank: 0 : Finished mat assembly >>>> rank: 5 : Finished mat assembly >>>> rank: 2 : Finished mat assembly >>>> rank: 4 : Finished mat assembly >>>> rank: 6 : Finished mat assembly >>>> >>>> --Junchao Zhang >>>> >>>> >>>> On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang > wrote: >>>> I will have a look and report back to you. Thanks. >>>> --Junchao Zhang >>>> >>>> >>>> On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia > wrote: >>>> I have created a standalone test that demonstrates the problem at my end. I have stored the indices, etc. from my problem in a text file for each rank, which I use to initialize the matrix. >>>> Please note that the test is specifically for 8 ranks. >>>> >>>> The .tgz file is on my google drive: https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing >>>> >>>> This contains a README file with instructions on running. Please note that the work directory needs the index files. >>>> >>>> Please let me know if I can provide any further information. >>>> >>>> Thank you all for your help. >>>> >>>> Regards, >>>> Manav >>>> >>>>> On Aug 20, 2020, at 12:54 PM, Jed Brown > wrote: >>>>> >>>>> Matthew Knepley > writes: >>>>> >>>>>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia > wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini > >>>>>>> wrote: >>>>>>> >>>>>>> Can you add a MPI_Barrier before >>>>>>> >>>>>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr); >>>>>>> >>>>>>> >>>>>>> With a MPI_Barrier before this function call: >>>>>>> ? three of the processes have already hit this barrier, >>>>>>> ? the other 5 are inside MatStashScatterGetMesg_Private -> >>>>>>> MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3 >>>>>>> processes) >>>>> >>>>> This is not itself evidence of inconsistent state. You can use >>>>> >>>>> -build_twosided allreduce >>>>> >>>>> to avoid the nonblocking sparse algorithm. >>>>> >>>>>> >>>>>> Okay, you should run this with -matstash_legacy just to make sure it is not >>>>>> a bug in your MPI implementation. But it looks like >>>>>> there is inconsistency in the parallel state. This can happen because we >>>>>> have a bug, or it could be that you called a collective >>>>>> operation on a subset of the processes. Is there any way you could cut down >>>>>> the example (say put all 1s in the matrix, etc) so >>>>>> that you could give it to us to run? >>>> >>> >> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Sat Aug 22 00:58:42 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Fri, 21 Aug 2020 22:58:42 -0700 Subject: [petsc-users] 2-norm of solution update suddenly becomes zero after a few iterations Message-ID: Hello, I am trying to solve the Poisson equation in 2D for heterostructure devices. I have linearized the equation and discretized it using FDM. 
I am using BiCGStab to iteratively solve for the solution as follows: Step 1: Solve A^(i-1) x^(i) = b^(i-1) {i = 1 to N where convergence is reached} Step 2: Use x^{i} to update the central coefficients of A^{i-1} to get A^{i} and similarly update b^{i-1} to get b^{i} Step3: If ( ||x^{i}-x^{i-1}||_2 , the 2-norm of the solution update, is greater than a tolerance, then go back to Step 1 to solve the new system of equations using BiCGStab. Else, exit the loop. *1) I am facing the following problem with this procedure*: The 2-norm of the solution update is suddenly becoming zero after a few iterations in some cases. I print out the getconvergedreason and there are not red flags there, so I am kind of confused whey this behaviour is being observed. This behaviour is leading to "false convergences", in the sense that the solutions obtained are not physical. A similar behaviour was observed when I used SOR instead of BiCGStab. At this point I am starting to suspect if it is wrong to use linear solvers on the poisson equation which is a nonlinear equation (although linearized). If you could please comment on this, that would be very helpful. Any help with this problem is greatly appreciated. Please let me know if you need any further information. Thank you, Sincerely, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Aug 22 07:24:44 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 22 Aug 2020 08:24:44 -0400 Subject: [petsc-users] 2-norm of solution update suddenly becomes zero after a few iterations In-Reply-To: References: Message-ID: On Sat, Aug 22, 2020 at 2:07 AM baikadi pranay wrote: > Hello, > > I am trying to solve the Poisson equation in 2D for heterostructure > devices. I have linearized the equation and discretized it using FDM. I am > using BiCGStab to iteratively solve for the solution as follows: > > Step 1: Solve A^(i-1) x^(i) = b^(i-1) {i = 1 to N where convergence is > reached} > Step 2: Use x^{i} to update the central coefficients of A^{i-1} to get > A^{i} and similarly update b^{i-1} to get b^{i} > Step3: If ( ||x^{i}-x^{i-1}||_2 , the 2-norm of the solution update, is > greater than a tolerance, then go back to Step 1 to > solve the new system of equations using BiCGStab. Else, > exit the loop. > *1) I am facing the following problem with this procedure*: > The 2-norm of the solution update is suddenly becoming zero after a few > iterations in some cases. I print out the getconvergedreason and there are > not red flags there, so I am kind of confused whey this behaviour is being > observed. This behaviour is leading to "false convergences", in the sense > that the solutions obtained are not physical. > > A similar behaviour was observed when I used SOR instead of BiCGStab. At > this point I am starting to suspect if it is wrong to use linear solvers on > the poisson equation which is a nonlinear equation (although linearized). > If you could please comment on this, that would be very helpful. > > Any help with this problem is greatly appreciated. Please let me know if > you need any further information. > 1) You are coding up the Picard method by hand to solve your nonlinear equation. If the operator is not contractive, this can stagnate, as you are seeing. You could try another solver, like Newton's method. We have a variety of nonlinear solves in the SNES class. 2) It is not clear from your description whether you linear solver is converging. 
BiCGStab without a preconditioner is a terrible solver for Poisson. We usually recommend starting with Algebraic Multigrid, like Hypre which is great at 2D Poisson. You can monitor the convergence of your linear solver using -knp_monitor_true_solution -ksp_converged_reason We want to see this information with any questions about convergence. Thanks, Matt > Thank you, > > Sincerely, > Pranay. > > ? > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Aug 22 10:10:01 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 22 Aug 2020 10:10:01 -0500 Subject: [petsc-users] 2-norm of solution update suddenly becomes zero after a few iterations In-Reply-To: References: Message-ID: <0BC08A7E-591F-4716-8634-55182FC337D9@petsc.dev> Pranay Newton's method is generally the best choice for nonlinear problems as Matt notes but PETSc also provides an implementation of Picard's method with SNESSetPicard() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetPicard.html We implement the defect correction form of the Picard iteration because it converges much more generally when inexact linear solvers are used then the direct Picard iteration A(x^n) x^{n+1} = b(x^n), which is what Matt just said. Based on your email it looks like you using the direct Picard iteration algorithm. With your current code you can likely easily switch to trying SNESSetPicard() and then switch to trying Newton with SNESSetFunction(), https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetFunction.html and SNESSetJacobian() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetJacobian.html PETSc is designed to make it as simple as possible to switch between various algorithms to help determine the best for your exact problem. SNES uses KSP for is linear solvers so you get full access to all the possible preconditioners, for larger problems as Matt notes, once you have the best nonlinear convergence selected tuning the linear solver is the most important thing to do for speed. We recommend when possible first getting good nonlinear convergence using a direct linear solver and then switching to an iterative solver as an optimization, for large problems you almost always should to switch to an iterative solver when the problem size increases. Barry > On Aug 22, 2020, at 7:24 AM, Matthew Knepley wrote: > > On Sat, Aug 22, 2020 at 2:07 AM baikadi pranay > wrote: > Hello, > > I am trying to solve the Poisson equation in 2D for heterostructure devices. I have linearized the equation and discretized it using FDM. I am using BiCGStab to iteratively solve for the solution as follows: > > Step 1: Solve A^(i-1) x^(i) = b^(i-1) {i = 1 to N where convergence is reached} > Step 2: Use x^{i} to update the central coefficients of A^{i-1} to get A^{i} and similarly update b^{i-1} to get b^{i} > Step3: If ( ||x^{i}-x^{i-1}||_2 , the 2-norm of the solution update, is greater than a tolerance, then go back to Step 1 to solve the new system of equations using BiCGStab. Else, exit the loop. > 1) I am facing the following problem with this procedure: > The 2-norm of the solution update is suddenly becoming zero after a few iterations in some cases. 
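As a concrete illustration of the SNESSetPicard() suggestion above, a minimal sketch (not from the original messages) of the hook-up might look as follows; FormA() and FormB() are placeholder names for whatever routines already assemble A(x) and b(x) in the existing code, and error checking of the user routines is abbreviated:

    /* sketch only: FormB(snes,x,b,ctx) fills b(x), FormA(snes,x,A,P,ctx) fills A(x) */
    SNES snes;
    Vec  x, r;
    Mat  A;
    /* ... x (solution guess) and A (matrix storage) are created by the existing setup code ... */
    ierr = SNESCreate(PETSC_COMM_WORLD, &snes);CHKERRQ(ierr);
    ierr = VecDuplicate(x, &r);CHKERRQ(ierr);                      /* work vector for the residual */
    ierr = SNESSetPicard(snes, r, FormB, A, A, FormA, NULL);CHKERRQ(ierr);
    ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
    ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);

Run-time options such as -snes_monitor -ksp_converged_reason -ksp_monitor_true_residual (the standard spelling of the true-residual monitor) and, if PETSc was configured with hypre, -pc_type hypre, then expose both the nonlinear and the linear convergence without further code changes.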
I print out the getconvergedreason and there are not red flags there, so I am kind of confused whey this behaviour is being observed. This behaviour is leading to "false convergences", in the sense that the solutions obtained are not physical. > > A similar behaviour was observed when I used SOR instead of BiCGStab. At this point I am starting to suspect if it is wrong to use linear solvers on the poisson equation which is a nonlinear equation (although linearized). If you could please comment on this, that would be very helpful. > > Any help with this problem is greatly appreciated. Please let me know if you need any further information. > > 1) You are coding up the Picard method by hand to solve your nonlinear equation. If the operator is not contractive, this can stagnate, as you are seeing. You > could try another solver, like Newton's method. We have a variety of nonlinear solves in the SNES class. > > 2) It is not clear from your description whether you linear solver is converging. BiCGStab without a preconditioner is a terrible solver for Poisson. We usually > recommend starting with Algebraic Multigrid, like Hypre which is great at 2D Poisson. You can monitor the convergence of your linear solver using > > -knp_monitor_true_solution -ksp_converged_reason > > We want to see this information with any questions about convergence. > > Thanks, > > Matt > > Thank you, > > Sincerely, > Pranay. > > ? > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Sat Aug 22 17:14:09 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Sat, 22 Aug 2020 15:14:09 -0700 Subject: [petsc-users] 2-norm of solution update suddenly becomes zero after a few iterations In-Reply-To: <0BC08A7E-591F-4716-8634-55182FC337D9@petsc.dev> References: <0BC08A7E-591F-4716-8634-55182FC337D9@petsc.dev> Message-ID: Hi, Thank you for the suggestions. I am attaching a text file which might help you better understand the problem. 1) The first column is iteration number of the outer loop (not that of BiCGStab itself, but the loop I mentioned previously) 2) The second column is the output from KSPGetConvergedReason(). 3) The third column is the 2-norm of the solution update || xi-xi-1||2 4) The last column is the infinity norm of the solution update || xi-xi-1|| ? As can be seen from the file, both the 2-norm and the infinity norm are highly oscillating and become zero at the end. Please let me know if any more information is required. Best Regards, Pranay. ? On Sat, Aug 22, 2020 at 8:10 AM Barry Smith wrote: > > > Pranay > > Newton's method is generally the best choice for nonlinear problems as > Matt notes but PETSc also provides an implementation of Picard's method > with SNESSetPicard() > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetPicard.html > > We implement the defect correction form of the Picard iteration because > it converges much more generally when inexact linear solvers are used then > the direct Picard iteration A(x^n) x^{n+1} = b(x^n), which is what Matt > just said. > > Based on your email it looks like you using the direct Picard iteration > algorithm. 
> > With your current code you can likely easily switch to > trying SNESSetPicard() and then switch to trying Newton with > SNESSetFunction(), > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetFunction.html > and SNESSetJacobian() > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetJacobian.html > > PETSc is designed to make it as simple as possible to switch between > various algorithms to help determine the best for your exact problem. > > SNES uses KSP for is linear solvers so you get full access to all the > possible preconditioners, for larger problems as Matt notes, once you have > the best nonlinear convergence selected tuning the linear solver is the > most important thing to do for speed. We recommend when possible first > getting good nonlinear convergence using a direct linear solver and then > switching to an iterative solver as an optimization, for large problems you > almost always should to switch to an iterative solver when the problem size > increases. > > Barry > > > On Aug 22, 2020, at 7:24 AM, Matthew Knepley wrote: > > On Sat, Aug 22, 2020 at 2:07 AM baikadi pranay > wrote: > >> Hello, >> >> I am trying to solve the Poisson equation in 2D for heterostructure >> devices. I have linearized the equation and discretized it using FDM. I am >> using BiCGStab to iteratively solve for the solution as follows: >> >> Step 1: Solve A^(i-1) x^(i) = b^(i-1) {i = 1 to N where convergence >> is reached} >> Step 2: Use x^{i} to update the central coefficients of A^{i-1} to get >> A^{i} and similarly update b^{i-1} to get b^{i} >> Step3: If ( ||x^{i}-x^{i-1}||_2 , the 2-norm of the solution update, is >> greater than a tolerance, then go back to Step 1 to >> solve the new system of equations using BiCGStab. Else, >> exit the loop. >> *1) I am facing the following problem with this procedure*: >> The 2-norm of the solution update is suddenly becoming zero after a few >> iterations in some cases. I print out the getconvergedreason and there are >> not red flags there, so I am kind of confused whey this behaviour is being >> observed. This behaviour is leading to "false convergences", in the sense >> that the solutions obtained are not physical. >> >> A similar behaviour was observed when I used SOR instead of BiCGStab. At >> this point I am starting to suspect if it is wrong to use linear solvers on >> the poisson equation which is a nonlinear equation (although linearized). >> If you could please comment on this, that would be very helpful. >> >> Any help with this problem is greatly appreciated. Please let me know if >> you need any further information. >> > > 1) You are coding up the Picard method by hand to solve your nonlinear > equation. If the operator is not contractive, this can stagnate, as you are > seeing. You > could try another solver, like Newton's method. We have a variety of > nonlinear solves in the SNES class. > > 2) It is not clear from your description whether you linear solver is > converging. BiCGStab without a preconditioner is a terrible solver for > Poisson. We usually > recommend starting with Algebraic Multigrid, like Hypre which is great > at 2D Poisson. You can monitor the convergence of your linear solver using > > -knp_monitor_true_solution -ksp_converged_reason > > We want to see this information with any questions about convergence. > > Thanks, > > Matt > > >> Thank you, >> >> Sincerely, >> Pranay. >> >> ? 
>> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- 1 2 13124.30945 390.72893 2 2 7631.38526 83.77839 3 2 3628.89484 22.70774 4 2 2835.57475 16.24984 5 2 1118.29504 52.39736 6 2 201.33808 5.93052 7 2 152.59920 2.10226 8 2 135.95135 8.68023 9 2 162.77195 49.23192 10 2 146.73950 53.37312 11 2 110.92144 37.54294 12 2 62.47228 12.88331 13 2 59.92464 8.22509 14 2 118.13048 49.34283 15 2 74.24096 17.75077 16 2 113.74954 33.55566 17 2 57.55951 6.03360 18 2 57.31492 8.76227 19 2 59.64794 8.12482 20 2 124.94885 35.86473 21 2 83.40711 22.57516 22 2 54.06012 3.11328 23 2 53.95904 2.23418 24 2 54.11551 4.61540 25 2 101.26445 33.92082 26 2 55.20264 8.28690 27 2 52.16077 4.67542 28 2 52.46353 7.96665 29 2 50.40224 5.04300 30 2 64.34241 15.84481 31 2 97.33783 29.84633 32 2 49.88432 9.48937 33 2 48.35499 6.81308 34 2 56.02825 16.59957 35 2 44.94613 3.68454 36 2 45.31085 5.02417 37 2 46.45370 7.86148 38 2 48.35202 7.45577 39 2 159.02393 60.69175 40 2 140.68775 42.35607 41 2 63.85090 14.53674 42 2 44.69606 2.33555 43 2 46.08288 3.53488 44 2 50.53235 11.96550 45 2 102.94297 24.58494 46 2 154.02986 40.02237 47 2 47.04888 6.31340 48 2 52.00559 8.76507 49 2 147.79451 35.21966 50 2 136.99033 31.78769 51 2 50.85379 8.06005 52 2 100.42847 15.81859 53 2 562.54537 104.29386 54 2 50.06653 1.99792 55 2 55.31731 7.42284 56 2 444.96391 107.83397 57 2 84.10197 16.58726 58 2 394.20170 87.15730 59 2 70.09014 11.32516 60 2 179.69804 42.51805 61 2 53.12296 8.11494 62 2 57.90019 10.78737 63 2 185.69650 35.65920 64 2 304.99610 60.21665 65 2 50.43410 5.79141 66 2 131.31947 19.13589 67 2 459.10314 78.69138 68 2 44.83491 4.06550 69 2 54.81529 10.66011 70 2 228.15471 78.58078 71 2 101.79257 23.86673 72 2 480.58671 64.86570 73 2 279.27030 24.60649 74 2 1005.79888 103.74193 75 2 1254.44115 111.83908 76 2 657.22281 82.86932 77 2 56.37964 2.34575 78 2 59.73578 8.19196 79 2 65.20465 12.30647 80 2 63.69281 2.03619 81 2 68.67820 1.91895 82 2 79.25878 2.39632 83 2 103.49986 7.63864 84 2 414.29181 33.46999 85 2 767.31593 61.80309 86 2 523.70259 29.21437 87 2 1086.23508 40.38881 88 2 3366.14573 97.62876 89 2 2852.64799 127.31142 90 2 144.41587 1.59447 91 2 136.49564 3.41243 92 2 116.30267 13.21697 93 2 101.51899 8.39333 94 2 80.89365 4.03447 95 2 64.63812 10.91171 96 2 47.41314 4.94550 97 2 45.64313 6.54045 98 2 42.12346 1.50149 99 2 40.23948 2.23758 100 2 47.95494 6.17719 101 2 44.51530 5.69793 102 2 46.68990 11.25752 103 2 44.72751 2.94589 104 2 48.84677 11.22236 105 2 41.34298 4.57479 106 2 43.11769 1.61734 107 2 40.16490 3.53333 108 2 44.19556 12.87755 109 2 36.78342 2.58973 110 2 45.25303 7.97519 111 2 34.42245 5.97650 112 2 37.20395 11.57894 113 2 36.99031 8.18363 114 2 30.04802 4.24234 115 2 29.05911 1.93466 116 2 30.65247 5.52962 117 2 39.25919 12.60553 118 2 31.37479 6.23143 119 2 32.95480 8.71476 120 2 33.34683 9.14877 121 2 28.64018 2.23928 122 2 30.96161 6.73283 123 2 32.27548 6.66316 124 2 38.87176 13.60993 125 2 28.34918 2.28963 126 2 30.89377 7.04422 127 2 30.69550 6.49067 128 2 28.73492 2.87596 129 2 33.50048 9.39073 130 2 29.46175 3.39530 131 2 36.49282 11.61143 132 2 31.79454 6.72216 133 2 30.31759 4.56391 134 2 42.75370 14.83077 135 2 32.16942 8.62399 136 2 29.02617 4.69046 137 2 28.32183 2.90256 138 2 33.24142 
9.42903 139 2 34.48308 9.68168 140 2 35.92134 10.54897 141 2 38.29803 8.11770 142 2 43.13167 14.46956 143 2 28.45875 1.64274 144 2 32.15664 3.74557 145 2 47.83797 15.08145 146 2 31.07124 4.13721 147 2 34.89141 5.90824 148 2 45.91680 14.25379 149 2 29.01324 2.38663 150 2 32.35149 7.58427 151 2 31.88630 6.05307 152 2 41.04729 8.80713 153 2 45.83311 11.50142 154 2 37.92755 5.60077 155 2 57.60880 15.66695 156 2 36.78120 3.17010 157 2 40.80703 3.75769 158 2 63.20205 16.52362 159 2 43.99623 3.38244 160 2 43.41530 3.44272 161 2 48.83611 7.69024 162 2 48.02654 4.55997 163 2 46.76826 2.01315 164 2 57.05013 6.56364 165 2 84.48355 17.53791 166 2 56.78274 5.66436 167 2 84.23674 19.89337 168 2 50.49174 3.74315 169 2 60.20342 5.85562 170 2 54.14879 4.05940 171 2 189.21591 72.41638 172 2 124.19466 50.48550 173 2 81.58599 18.93813 174 2 75.36313 9.28344 175 2 115.81593 25.39800 176 2 102.71146 24.48289 177 2 63.24702 7.47907 178 2 105.46140 25.57530 179 2 42.47659 9.02126 180 2 57.00335 9.05072 181 2 67.50768 8.50419 182 2 138.45400 29.51914 183 2 92.12635 16.93663 184 2 101.38409 21.50513 185 2 66.64650 10.78565 186 2 129.47977 34.04078 187 2 38.42077 8.04871 188 2 55.73455 10.12391 189 2 57.09683 7.17040 190 2 100.29849 12.89213 191 2 212.18613 32.00684 192 2 76.18711 9.93337 193 2 140.57416 35.17933 194 2 37.95314 8.72272 195 2 66.64926 14.64538 196 2 48.09355 9.37456 197 2 68.81235 13.34293 198 2 142.85657 38.69665 199 2 108.57613 17.95671 200 2 121.33428 32.36562 201 2 159.28792 39.30868 202 2 80.03162 15.72035 203 2 85.57498 18.27559 204 2 133.66726 33.96868 205 2 153.44562 35.79877 206 2 67.59698 16.33144 207 2 170.34290 42.72268 208 2 74.93545 16.60116 209 2 149.92604 42.60370 210 2 129.23181 22.91909 211 2 169.72095 35.29440 212 2 173.62245 39.32415 213 2 110.22268 20.48081 214 2 251.86620 48.34705 215 2 566.88933 88.52235 216 2 832.93588 97.99458 217 2 578.54941 88.60411 218 2 172.77563 29.03878 219 2 1163.93830 109.84525 220 2 897.87959 116.96158 221 2 1029.20109 115.61124 222 2 2304.85722 115.83554 223 2 2492.73911 115.17508 224 2 2841.14228 124.36860 225 2 1626.63461 122.73285 226 2 264.21654 50.97592 227 2 638.83174 101.83662 228 2 776.23451 95.92944 229 2 198.46686 38.58590 230 2 67.98385 7.54171 231 2 55.55435 1.37469 232 2 56.00709 1.78829 233 2 58.59405 1.92599 234 2 63.32928 1.55966 235 2 76.63288 3.01666 236 2 119.21013 6.71854 237 2 217.36198 20.07005 238 2 98.10471 4.08328 239 2 158.50519 18.27863 240 2 132.44524 6.77678 241 2 181.56934 27.83192 242 2 234.13469 10.53000 243 2 866.27178 72.24490 244 2 536.06810 26.51613 245 2 733.03491 34.84524 246 2 545.75933 15.97669 247 2 776.70681 34.06130 248 2 2151.03393 41.24975 249 2 1573.39566 46.18526 250 2 3339.70076 56.70606 251 2 16834.87050 140.54872 252 2 18807.89700 170.68123 253 2 972.59103 56.53599 254 2 470.87485 4.91924 255 2 463.44086 40.01454 256 2 432.69679 4.34140 257 2 421.70473 2.61176 258 2 435.00925 19.69636 259 2 626.86302 114.24779 260 2 408.66518 2.00536 261 2 420.56472 1.99646 262 2 420.08806 1.98525 263 2 399.15693 1.94706 264 2 415.12787 1.95216 265 2 407.06814 1.94565 266 2 416.15120 1.96190 267 2 411.03292 1.96881 268 2 423.88868 2.01362 269 2 422.64839 2.04560 270 2 430.37453 2.10408 271 2 436.93216 2.20171 272 2 434.55991 2.31131 273 2 369.99956 2.25757 274 2 229.52903 1.39717 275 2 168.52262 1.00000 276 2 167.21900 1.00000 277 2 167.33114 1.00000 278 2 165.29601 1.00000 279 2 166.13481 1.00000 280 2 165.18580 1.00000 281 2 163.02447 1.00000 282 2 160.83555 1.00000 283 2 163.40987 1.00000 284 2 155.88426 1.00000 
285 2 156.72291 1.00000 286 2 155.05760 1.00000 287 2 152.73031 1.00001 288 2 149.54299 1.00000 289 2 151.38646 1.00000 290 2 145.49351 1.00000 291 2 144.41058 1.00001 292 2 142.85468 1.00000 293 2 139.31548 1.00002 294 2 138.50696 1.00001 295 2 135.65517 1.00001 296 2 133.26915 1.00000 297 2 131.12657 1.00000 298 2 128.02357 1.00001 299 2 125.81751 1.00001 300 2 123.18214 1.00005 301 2 119.85496 1.00002 302 2 116.95172 1.00005 303 2 113.37712 1.00002 304 2 109.87568 1.00003 305 2 105.81895 1.00004 306 2 101.29727 1.00008 307 2 96.31139 1.00017 308 2 90.21892 1.00013 309 2 82.90779 1.00014 310 2 72.32880 1.00018 311 2 58.74452 0.99881 312 2 42.49284 0.95881 313 2 23.77822 0.70753 314 2 6.72280 0.24092 315 2 0.48017 0.01811 316 2 0.00000 0.00000 From bsmith at petsc.dev Sat Aug 22 17:44:44 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 22 Aug 2020 17:44:44 -0500 Subject: [petsc-users] 2-norm of solution update suddenly becomes zero after a few iterations In-Reply-To: References: <0BC08A7E-591F-4716-8634-55182FC337D9@petsc.dev> Message-ID: <764ECB1B-CFBF-4990-858E-5550DF5D5A62@petsc.dev> Pranay This is due to you using the "full" Picard iteration Solve A(x^{i-1}) x^{i} = b(x^{i-1}) implementation. The defect correction implementation, which is what PETSc provides is given by Solve A(x^{i}) (x^{i+1} - x^{i}) = b(x^{i}) - A(x^{i})x^{i} will not have this problem. The full Picard iteration requires a very accurate linear solve (which is expensive). The defect correction implementation can work with linear solves that only give a couple of digits of accuracy. The reason is you are computing x^{i} = x^{i-1} + (x^{i+1} - x^{i}) where (x^{i+1} - x^{i}) begins to become much smaller than x^{i-1} so only a few digits in (x^{i+1} - x^{i}) matter. While with the full Picard iteration more and more of the digits in x^{i} (which is computed directly by the linear solve matter) so the linear solver that gives you those digits must be solved more and more accurately. Just don't use the full Picard iteration. Barry > On Aug 22, 2020, at 5:14 PM, baikadi pranay wrote: > > Hi, > Thank you for the suggestions. I am attaching a text file which might help you better understand the problem. > 1) The first column is iteration number of the outer loop (not that of BiCGStab itself, but the loop I mentioned previously) > 2) The second column is the output from KSPGetConvergedReason(). > 3) The third column is the 2-norm of the solution update || xi-xi-1||2 > 4) The last column is the infinity norm of the solution update || xi-xi-1||? > > As can be seen from the file, both the 2-norm and the infinity norm are highly oscillating and become zero at the end. Please let me know if any more information is required. > > Best Regards, > Pranay. > ? > > On Sat, Aug 22, 2020 at 8:10 AM Barry Smith > wrote: > > > Pranay > > Newton's method is generally the best choice for nonlinear problems as Matt notes but PETSc also provides an implementation of Picard's method with SNESSetPicard() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetPicard.html > > We implement the defect correction form of the Picard iteration because it converges much more generally when inexact linear solvers are used then the direct Picard iteration A(x^n) x^{n+1} = b(x^n), which is what Matt just said. > > Based on your email it looks like you using the direct Picard iteration algorithm. 
> > With your current code you can likely easily switch to trying SNESSetPicard() and then switch to trying Newton with SNESSetFunction(), https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetFunction.html and SNESSetJacobian() https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetJacobian.html > > PETSc is designed to make it as simple as possible to switch between various algorithms to help determine the best for your exact problem. > > SNES uses KSP for is linear solvers so you get full access to all the possible preconditioners, for larger problems as Matt notes, once you have the best nonlinear convergence selected tuning the linear solver is the most important thing to do for speed. We recommend when possible first getting good nonlinear convergence using a direct linear solver and then switching to an iterative solver as an optimization, for large problems you almost always should to switch to an iterative solver when the problem size increases. > > Barry > > >> On Aug 22, 2020, at 7:24 AM, Matthew Knepley > wrote: >> >> On Sat, Aug 22, 2020 at 2:07 AM baikadi pranay > wrote: >> Hello, >> >> I am trying to solve the Poisson equation in 2D for heterostructure devices. I have linearized the equation and discretized it using FDM. I am using BiCGStab to iteratively solve for the solution as follows: >> >> Step 1: Solve A^(i-1) x^(i) = b^(i-1) {i = 1 to N where convergence is reached} >> Step 2: Use x^{i} to update the central coefficients of A^{i-1} to get A^{i} and similarly update b^{i-1} to get b^{i} >> Step3: If ( ||x^{i}-x^{i-1}||_2 , the 2-norm of the solution update, is greater than a tolerance, then go back to Step 1 to solve the new system of equations using BiCGStab. Else, exit the loop. >> 1) I am facing the following problem with this procedure: >> The 2-norm of the solution update is suddenly becoming zero after a few iterations in some cases. I print out the getconvergedreason and there are not red flags there, so I am kind of confused whey this behaviour is being observed. This behaviour is leading to "false convergences", in the sense that the solutions obtained are not physical. >> >> A similar behaviour was observed when I used SOR instead of BiCGStab. At this point I am starting to suspect if it is wrong to use linear solvers on the poisson equation which is a nonlinear equation (although linearized). If you could please comment on this, that would be very helpful. >> >> Any help with this problem is greatly appreciated. Please let me know if you need any further information. >> >> 1) You are coding up the Picard method by hand to solve your nonlinear equation. If the operator is not contractive, this can stagnate, as you are seeing. You >> could try another solver, like Newton's method. We have a variety of nonlinear solves in the SNES class. >> >> 2) It is not clear from your description whether you linear solver is converging. BiCGStab without a preconditioner is a terrible solver for Poisson. We usually >> recommend starting with Algebraic Multigrid, like Hypre which is great at 2D Poisson. You can monitor the convergence of your linear solver using >> >> -knp_monitor_true_solution -ksp_converged_reason >> >> We want to see this information with any questions about convergence. >> >> Thanks, >> >> Matt >> >> Thank you, >> >> Sincerely, >> Pranay. >> >> ? >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Sat Aug 22 17:47:45 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Sat, 22 Aug 2020 17:47:45 -0500 Subject: [petsc-users] Question on matrix assembly In-Reply-To: References: <62502F3C-4411-4FCB-BD4F-6DCF0100180F@gmail.com> <51B8FA5C-0231-40A8-A579-7536A0D9D5D8@petsc.dev> Message-ID: Hi Barry, Thanks for creating the new function. I'm somewhat confused as to how I'd use it. Given an MPIAIJ matrix, is one supposed to extract the local SeqAIJ matrix and set the preallocation on each mpi-rank independently followed by MatSetValues (on the MPIAIJ matrix) to fill the rows? Or, does one create SeqAIJ matrices on each rank and then combine them into a parallel MPIAIJ matrix say by using MatCreateMPIMatConcatenateSeqMat? I tried the second approach but leaving the "number of local columns" for MatCreateMPIMatConcatenateSeqMat as PETSC_DECIDE causes a crash (when running with 1 mpi rank). Is this the correct approach to take and if yes what does "number of local columns" mean when combining the seqaij matrices ? Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Aug 22 18:19:51 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 22 Aug 2020 18:19:51 -0500 Subject: [petsc-users] Question on matrix assembly In-Reply-To: References: <62502F3C-4411-4FCB-BD4F-6DCF0100180F@gmail.com> <51B8FA5C-0231-40A8-A579-7536A0D9D5D8@petsc.dev> Message-ID: Oh, sorry, it is unusable for MPI AIJ matrixes. We would need a special little additional code for MPIAIJ also to get it to work. Since your matrices were so tiny I just assumed their usage was sequential. For MPI there would need to be a new MatMPIAIJSetTotalPreallocation(A, PetscInt d, PetscInt o) that would call the new MatSeqAIJSetTotalPreallocation() on each part of the MPI matrix and a new MatSetValues_MPIAIJ_xxx() that found for each row the parts you past in for diagonal part and off diagonal part, mapped the indices appropriately and call the MatSetValues() one each for the two submatrices appropriately. This would mean packing the two off diagonal parts in the input row data structure together for the call to MatSetValues() on the off-diagonal part. Kind of annoying; instead one could add support to the special MatSetValues_SeqIAIJ_xxx() function to allow it to be called multiple times for each row so long as the each new part came after the previous in column index. Barry > On Aug 22, 2020, at 5:47 PM, Sajid Ali wrote: > > Hi Barry, > > Thanks for creating the new function. I'm somewhat confused as to how I'd use it. Given an MPIAIJ matrix, is one supposed to extract the local SeqAIJ matrix and set the preallocation on each mpi-rank independently followed by MatSetValues (on the MPIAIJ matrix) to fill the rows? Or, does one create SeqAIJ matrices on each rank and then combine them into a parallel MPIAIJ matrix say by using MatCreateMPIMatConcatenateSeqMat? > > I tried the second approach but leaving the "number of local columns" for MatCreateMPIMatConcatenateSeqMat as PETSC_DECIDE causes a crash (when running with 1 mpi rank). Is this the correct approach to take and if yes what does "number of local columns" mean when combining the seqaij matrices ? 
> > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From bourdin at lsu.edu Sun Aug 23 14:20:40 2020 From: bourdin at lsu.edu (Blaise A Bourdin) Date: Sun, 23 Aug 2020 19:20:40 +0000 Subject: [petsc-users] read argument from an XML file In-Reply-To: <87364gh91z.fsf@jedbrown.org> References: <877dtshauy.fsf@jedbrown.org> <87364gh91z.fsf@jedbrown.org> Message-ID: <89737C07-BAD0-408C-876D-4EADF53620AE@lsu.edu> Note that the YAML parser simply reads the yaml file, converts it into a petsc option string which is later parsed by your code. Some of the advanced features of YAML, which do not translate easily into petsc options (sequences, tags, and anchors, for instance) are not implemented. Still, I think that an option file of the form disp: snes: type: ls linesearch: type: basic damping: 1.0 lag: preconditioner: 1 atol: 1.0e-7 rtol: 1.0e-5 ksp: type: cg atol: 1.e-7 rtol: 1.e-5 pc: type: ml is a major improvement over the petsc options equivalent -disp_snes_type ls -disp_snes_linesearch_type basic -disp_snes_linesearch_damping 1.0 -disp_snes_lag_preconditioner 1 -disp_snes_atol 1.0e-7 -disp_snes_rtol 1.0e-5 -disp_ksp_type cg -disp_ksp_atol 1.e-7 -disp_ksp_rtol 1.e-5 -disp_pc_type ml and is much easier to parse from other codes. Regards, Blaise > On Aug 21, 2020, at 12:17 AM, Jed Brown wrote: > > Alex Fleeter writes: > >> Thanks, we will try that. I have never used YAML before. > > It's meant to be more human-readable than XML, and is widely used these days. > >> Anyway, we feel using command line arguments is a bit old fashioned. It can >> be quite desirable to set parameters from a human-readable file. > > My usual workflow is to build up comprehensive options by experimenting on the command line, then put them in a file (either using the basic format that Barry mentioned or YAML). -- A.K. & Shirley Barton Professor of Mathematics Adjunct Professor of Mechanical Engineering Adjunct of the Center for Computation & Technology Louisiana State University, Lockett Hall Room 344, Baton Rouge, LA 70803, USA Tel. +1 (225) 578 1612, Fax +1 (225) 578 4276 Web http://www.math.lsu.edu/~bourdin From luis.saturday at gmail.com Sun Aug 23 22:50:25 2020 From: luis.saturday at gmail.com (Alex Fleeter) Date: Sun, 23 Aug 2020 20:50:25 -0700 Subject: [petsc-users] MatNest question Message-ID: Hi: I have been trying MatNest. I have two questions. 1. I have to call MatAssemblyBegin/End for both sub-matrices and for the MatNest object. From my experiment, if I miss any of those, I will get an error message [0]PETSC ERROR: Object is in wrong state [0]PETSC ERROR: Not for unassembled matrix I am a bit confused because in the example https://www.mcs.anl.gov/petsc/petsc-current/src/snes/tutorials/ex70.c.html The MatAssemblyBegin/End are only called for submatrices. I assume that example works correctly. On the other hand, in the actual implementation, https://www.mcs.anl.gov/petsc/petsc-current/src/mat/impls/nest/matnest.c.html#MatAssemblyBegin_Nest, I can see that the MatAssemblyBegin/End are already called for each sub-matrix. It seems that one should only call one MatAssemblyBegin/End for the big nest matrix. Unfortunately, I tried that and got an error message shown above. I must have misunderstood something. 2. From previous threads, Jed has strongly suggested using standard Vec instead of VecNest, unless there are strong reasons. 
In my case, since MatNest is already used as the underlying matrix object, the local element assembly will generate residual vectors for separate physics. So it is rather natural to put the right-hand side residual vector as a nest vector. Of course, a consequence is that I have to put every vector object in the nested format, since my experiments suggest VecNest cannot be used with standard Vec. My question is this. How mature is the VecNest implementation? Is it a good design to put every vector in the nest format? Thanks, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sun Aug 23 23:03:34 2020 From: jed at jedbrown.org (Jed Brown) Date: Sun, 23 Aug 2020 22:03:34 -0600 Subject: [petsc-users] MatNest question In-Reply-To: References: Message-ID: <87o8n0d71l.fsf@jedbrown.org> Alex Fleeter writes: > Hi: > > I have been trying MatNest. I have two questions. > > 1. I have to call MatAssemblyBegin/End for both sub-matrices and for the > MatNest object. From my experiment, if I miss any of those, I will get an > error message > [0]PETSC ERROR: Object is in wrong state > [0]PETSC ERROR: Not for unassembled matrix > > I am a bit confused because in the example > https://www.mcs.anl.gov/petsc/petsc-current/src/snes/tutorials/ex70.c.html The > MatAssemblyBegin/End are only called for submatrices. I assume that example > works correctly. The example is run in continuous integration. You'll need to provide more details (like a minimal reproducer) if you want us to explain why assembly is needed in your usage, or if it can be relaxed. > On the other hand, in the actual implementation, > https://www.mcs.anl.gov/petsc/petsc-current/src/mat/impls/nest/matnest.c.html#MatAssemblyBegin_Nest, > I can see that the MatAssemblyBegin/End are already called for each > sub-matrix. It seems that one should only call one MatAssemblyBegin/End for > the big nest matrix. Unfortunately, I tried that and got an error message > shown above. I must have misunderstood something. > > 2. From previous threads, Jed has strongly suggested using standard Vec > instead of VecNest, unless there are strong reasons. In my case, since > MatNest is already used as the underlying matrix object, the local element > assembly will generate residual vectors for separate physics. Please look at src/snes/tutorials/ex28.c. Also, VecGetSubVector() is no-copy when the subvector is contiguous, so you can assemble into "separate" vectors that are actually part of the same contiguous Vec. VecNest incurs extra overhead in all the usual (e.g., Krylov) operations. I don't think you have a good reason to use it. Yes, I'm that troll under the bridge trying to scare you off. > So it is rather natural to put the right-hand side residual vector as > a nest vector. Of course, a consequence is that I have to put every > vector object in the nested format, since my experiments suggest > VecNest cannot be used with standard Vec. My question is this. How > mature is the VecNest implementation? Is it a good design to put every > vector in the nest format? 
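A small sketch of the contiguous-Vec pattern recommended here may help (ex28.c shows the complete multi-physics version); it is not from the original messages, the two local field sizes n0 and n1 and the block ordering are assumptions about the application layout, and error checking is abbreviated:

    Vec      F, Fu, Fp;        /* F is one ordinary contiguous Vec of local size n0+n1 */
    IS       isu, isp;
    PetscInt rstart, n0, n1;
    /* ... F created earlier, e.g. VecCreateMPI(comm, n0+n1, PETSC_DETERMINE, &F) ... */
    ierr = VecGetOwnershipRange(F, &rstart, NULL);CHKERRQ(ierr);
    ierr = ISCreateStride(PETSC_COMM_WORLD, n0, rstart, 1, &isu);CHKERRQ(ierr);
    ierr = ISCreateStride(PETSC_COMM_WORLD, n1, rstart + n0, 1, &isp);CHKERRQ(ierr);
    ierr = VecGetSubVector(F, isu, &Fu);CHKERRQ(ierr);   /* no-copy view, the block is contiguous */
    ierr = VecGetSubVector(F, isp, &Fp);CHKERRQ(ierr);
    /* ... assemble the residual of each physics into Fu and Fp ... */
    ierr = VecRestoreSubVector(F, isu, &Fu);CHKERRQ(ierr);
    ierr = VecRestoreSubVector(F, isp, &Fp);CHKERRQ(ierr);

The assembled F is a plain Vec, so it can be handed to the Krylov solver together with the MatNest operator without ever creating a VecNest.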
> > Thanks, > > Alex From thibault.bridelbertomeu at gmail.com Mon Aug 24 00:56:55 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Mon, 24 Aug 2020 07:56:55 +0200 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> Message-ID: Barry, first of all, thank you very much for your detailed answer, I keep reading it to let it soak in - I might come back to you for more details if you do not mind. In the meantime, to fuel the conversation, I attach to this e-mail two pdfs containing the pieces of the code that regard what we are discussing. In the *timedisc.pdf, you'll find how I handle the initialization of the TS object, and in the *petscdefs.pdf you'll find the method that calls the TSSolve as well as the methods that are linked to the TS (the timestep adapt, the jacobian etc ...). [Sorry for the quality, I cannot do better than that sort of pdf ...] Based on what is in the structured code I sent you the other day, I rewrote the PetscJacobianFunction_JFNK. I think it should be all right, but although it compiles, execution raises a seg fault I think when I do ierr = TSSetIJacobian(ts, A, A, PetscIJacobian, user); saying that A does not have the right dimensions. It is quite new, I am still looking into where exactly the error is raised. What do you think of this implementation though, does it look correct in your expert eyes ? As for what we really discussed so far, it's that PetscComputePreconMatImpl that I do not know how to implement (with the derivative of the jacobian based on the FVM object). I understand now that what I am showing you today might not be the right way to go if one wants to really use the PetscFV, but I just wanted to add those code lines to the conversation to have your feedback. Thank you again for your help, Thibault Le ven. 21 ao?t 2020 ? 19:25, Barry Smith a ?crit : > > > On Aug 21, 2020, at 10:58 AM, Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > > Thank you Barry for the tip ! I?ll make sure to do that when everything is > set. > What I also meant is that there will not be any more direct way to set the > preconditioner than to go through SNESSetJacobian after having assembled > everything by hand ? Like, in my case, or in the more general case of fluid > dynamics equations, the preconditioner is not a fun matrix to assemble, > because for every cell the derivative of the physical flux jacobian has to > be taken and put in the right block in the matrix - finite element style if > you want. Is there a way to do that with Petsc methods, maybe > short-circuiting the FEM based methods ? > > > Thibault > > I am not sure what you mean but there are a couple of things that may > be helpful. > > PCSHELL > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCSHELL.html allows > you to build your own preconditioner (that can and often will use one or > more of its own Mats, and KSP or PC inside it, or even use another PETScFV > etc to build some of the sub matrices for you if it is appropriate), this > approach means you never need to construct a "global" PETSc matrix from > which PETSc builds the preconditioner. But you should only do this if the > conventional approach is not reasonable for your problem. 
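A bare-bones PCSHELL hookup of the kind described in the preceding paragraph might look like the sketch below. The context struct, the inner KSP, and the apply routine are placeholders chosen for illustration, not code from this thread; whatever approximately solves the Jacobian can live inside the apply routine.

#include <petscts.h>

/* user data the preconditioner needs, e.g. an inner solver built on a cheap
   approximation of the Jacobian (all of this is a placeholder) */
typedef struct {
  KSP inner;
} MyPCCtx;

/* y = M^{-1} x, implemented however is appropriate for the physics */
static PetscErrorCode MyPCApply(PC pc, Vec x, Vec y)
{
  MyPCCtx       *ctx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PCShellGetContext(pc, (void **)&ctx); CHKERRQ(ierr);
  ierr = KSPSolve(ctx->inner, x, y); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* attach it to the KSP that the TS/SNES uses internally */
static PetscErrorCode AttachShellPC(TS ts, MyPCCtx *ctx)
{
  SNES           snes;
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = TSGetSNES(ts, &snes); CHKERRQ(ierr);
  ierr = SNESGetKSP(snes, &ksp); CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
  ierr = PCSetType(pc, PCSHELL); CHKERRQ(ierr);
  ierr = PCShellSetContext(pc, ctx); CHKERRQ(ierr);
  ierr = PCShellSetApply(pc, MyPCApply); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}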
> > MATNEST > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATNEST.html allows > you to build a global matrix by building parts of it separately and even > skipping parts you decide you don't need in the preconditioner. > Conceptually it is the same as just creating a global matrix and filling up > but the process is a bit different and something suitable for "multi > physics" or "multi-equation" type applications. > > Of course what you put into PCSHELL and MATNEST will affect the > convergence of the nonlinear solver. As Jed noted what you put in the > "Jacobian" does not have to be directly the same mathematically as what you > put into the TSSetI/RHSFunction with the caveat that it does have to > appropriate spectral properties to result in a good preconditioner for the > "true" Jacobian. > > Couple of other notes: > > The entire business of "Jacobian" matrix-free or not (with for example > -snes_fd_operator) is tricky because as Jed noted if your finite volume > scheme has non-differential terms such as if () tests. There is a concept > of sub-differential for this type of thing but I know absolutely nothing > about that and probably not worth investigating. > > In this situation you can avoid the "true" Jacobian completely (both for > matrix-vector product and preconditioner) and use something else as Jed > suggested a lower order scheme that is differentiable. This can work well > for solving the nonlinear system or not depending on how suitable it is for > your original "function" > > > 1) In theory at least you can have the Jacobian matrix-vector product > computed directly using DMPLEX/PETScFV infrastructure (it would apply the > Jacobian locally matrix-free using code similar to the code that evaluates > the FV "function". I do no know if any of this code is written, it will be > more efficient than -snes_mf_operator that evaluates the FV "function" and > does traditional differencing to compute the Jacobian. Again it has the > problem of non-differentialability if the function is not differential. But > it could be done for a different (lower order scheme) that is > differentiable. > > 2) You can have PETSc compute the Jacobian explicitly coloring and from > that build the preconditioner, this allows you to avoid the hassle of > writing the code for the derivatives yourself. This uses finite differences > on your function and coloring of the graph to compute many columns of the > Jacobian simultaneously and can be pretty efficient. Again if the function > is not differential there can be issues of what the result means and will > it work in a nonlinear solver. SNESComputeJacobianDefaultColor > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESComputeJacobianDefaultColor.html > > 3) Much more outlandish is to skip Newton and Jacobians completely and use > the full approximation scheme SNESFAS > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNESFAS/SNESFAS.html this > requires a grid hierarchy and appropriate way to interpolate up through the > grid hierarchy your finite volume solutions. Probably not worth > investigating unless you have lots of time on your hands and keen interest > in this kind of stuff https://arxiv.org/pdf/1607.04254.pdf > > So to summarize, and Matt and Jed can correct my mistakes. > > 1) Form the full Jacobian from the original "function" using analytic > approach use it for both the matrix-vector product and to build the > preconditioner. Problem if full Jacobian not well defined mathematically. 
> Tough to code, usually not practical. > > 2) Do any matrix free (any way) for the full Jacobian and > > a) build another "approximate" Jacobian (using any technique analytic or > finite differences using matrix coloring on a new "lower order" "function") > Still can have trouble if this original Jacobian is no well defined > > b) "write your own preconditioner" that internally can use anything in > PETSc that approximately solves the Jacobian. Same potential problems if > original Jacobian is not differential, plus convergence will depend on how > good your own preconditioner approximates the inverse of the true Jacobian. > > 3) Use a lower Jacobian (computed anyway you want) for the matrix-vector > product and the preconditioner. The problem of differentiability is gone > but convergence of the nonlinear solver depends on how well lower order > Jacobian is appropriate for the original "function" > > a) Form the "lower order" Jacobian analytically or with coloring and > use for both matrix-vector product and building preconditioner. Note that > switching between this and 2a is trivial. > > b) Do the "lower order" Jacobian matrix free and provide your own > PCSHELL. Note that switching between this and 2b is trivial. > > Barry > > I would first try competing the "true" Jacobian via coloring, if that > works and give satisfactory results (fast enough) then stop. > > Then I would do 2a/2b by writing my "function" using PETScFV and writing > the "lower order function" via PETScFV and use matrix coloring to get the > Jacobian from the second "lower order function". If this works well (either > with 2a or 3a or both) then stop or you can compute the "lower order" > Jacobian analytically (again using PetscFV) for a more efficient evaluation > of the Jacobian. > > > > > Thanks ! > > Thibault > > Le ven. 21 ao?t 2020 ? 17:22, Barry Smith a ?crit : > >> >> >> On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >> >> >> Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley a >> ?crit : >> >>> On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu < >>> thibault.bridelbertomeu at gmail.com> wrote: >>> >>>> Sorry, I sent too soon, I hit the wrong key. >>>> >>>> I wanted to say that context.npoints is the local number of cells. >>>> >>>> PetscRHSFunctionImpl allows to generate the hyperbolic part of the >>>> right hand side. 
>>>> Then we have : >>>> >>>> PetscErrorCode PetscIJacobian( >>>> TS ts, /*!< Time stepping object (see PETSc TS)*/ >>>> PetscReal t, /*!< Current time */ >>>> Vec Y, /*!< Solution vector */ >>>> Vec Ydot, /*!< Time-derivative of solution vector */ >>>> PetscReal a, /*!< Shift */ >>>> Mat A, /*!< Jacobian matrix */ >>>> Mat B, /*!< Preconditioning matrix */ >>>> void *ctxt /*!< Application context */ >>>> ) >>>> { >>>> PETScContext *context = (PETScContext*) ctxt; >>>> HyPar *solver = context->solver; >>>> _DECLARE_IERR_; >>>> >>>> PetscFunctionBegin; >>>> solver->count_IJacobian++; >>>> context->shift = a; >>>> context->waqt = t; >>>> /* Construct preconditioning matrix */ >>>> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y, >>>> context); CHECKERR(ierr); } >>>> >>>> PetscFunctionReturn(0); >>>> } >>>> >>>> and PetscJacobianFunction_JFNK which I bind to the matrix shell, >>>> computes the action of the jacobian on a vector : say U0 is the state of >>>> reference and Y the vector upon which to apply the JFNK method, then the >>>> PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + >>>> epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift >>>> comes from the TS). >>>> The preconditioning matrix I compute as an approximation to the actual >>>> jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in >>>> each cell, a 4x4 matrix that is known exactly for the system of equations I >>>> am solving, i.e. Euler equations. For the structured grid, I can loop on >>>> the cells and do that 'Derivative' thing at first order by simply taking a >>>> finite-difference like approximation with the neighboring cells, >>>> Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by >>>> block (JFunction is the dF/dU) >>>> >>>> /* diagonal element */ >>>> >>>> >>>> for (v=0; v>>> + v; } >>>> >>>> >>>> ierr = solver->JFunction >>>> >>>> (values,(u+nvars*p),solver->physics >>>> >>>> ,dir,0); >>>> >>>> >>>> _ArrayScale1D_ >>>> >>>> (values,(dxinv*iblank),(nvars*nvars)); >>>> >>>> >>>> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); >>>> CHKERRQ(ierr); >>>> >>>> >>>> >>>> >>>> >>>> /* left neighbor */ >>>> >>>> >>>> if (pgL >= 0) { >>>> >>>> >>>> for (v=0; v>>> nvars*pgL + v; } >>>> >>>> >>>> ierr = solver->JFunction >>>> >>>> (values,(u+nvars*pL),solver->physics >>>> >>>> ,dir,1); >>>> >>>> >>>> _ArrayScale1D_ >>>> >>>> (values,(-dxinv*iblank),(nvars*nvars)); >>>> >>>> >>>> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); >>>> CHKERRQ(ierr); >>>> >>>> >>>> } >>>> >>>> >>>> >>>> >>>> >>>> /* right neighbor */ >>>> >>>> >>>> if (pgR >= 0) { >>>> >>>> >>>> for (v=0; v>>> nvars*pgR + v; } >>>> >>>> >>>> ierr = solver->JFunction >>>> >>>> (values,(u+nvars*pR),solver->physics >>>> >>>> ,dir,-1); >>>> >>>> >>>> _ArrayScale1D_ >>>> >>>> (values,(-dxinv*iblank),(nvars*nvars)); >>>> >>>> >>>> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); >>>> CHKERRQ(ierr); >>>> >>>> >>>> } >>>> >>>> >>>> >>>> I do not know if I am clear here ... >>>> Anyways, I am trying to figure out how to do this shell matrix and this >>>> preconditioner using all the FV and DMPlex artillery. >>>> >>> >>> Okay, that is very clear. We should be able to get the JFNK just with >>> -snes_mf_operator, and put the approximate J construction in >>> DMPlexComputeJacobian_Internal(). >>> There is an FV section already, and we could just add this. 
I would need >>> to understand those entries in the pointwise Riemann sense that the other >>> stuff is now. >>> >> >> Ok i had a quick look and if I understood correctly it would do the job. >> Setting the snes-mf-operator flag would mean however that we have to go >> through SNESSetJacobian to set the jacobian and the preconditioning matrix >> wouldn't it ? >> >> >> Thibault, >> >> Since the TS implicit methods end up using SNES internally the option >> should be available to you without requiring you to be calling the SNES >> routines directly >> >> Once you have finalized your approach and if for the implicit case you >> always work in the snes mf operator mode you can hardwire >> >> TSGetSNES(ts,&snes); >> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >> >> in your code so you don't need to always provide the option >> -snes-mf-operator >> >> Barry >> >> >> >> >> There might be calls to the Riemann solver to evaluate the dRHS / dU part >> yes but maybe it's possible to re-use what was computed for the RHS^n ? >> In the FV section the jacobian is set to identity which I missed before, >> but it could explain why when I used the following : >> >> TSSetType(ts, TSBEULER); >> DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); >> DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); >> >> with my FV discretization nothing happened, right ? >> >> Thank you, >> >> Thibault >> >> Thanks, >>> >>> Matt >>> >>> >>>> Le ven. 21 ao?t 2020 ? 14:55, Thibault Bridel-Bertomeu < >>>> thibault.bridelbertomeu at gmail.com> a ?crit : >>>> >>>>> Hi, >>>>> >>>>> Thanks Matthew and Jed for your input. >>>>> I indeed envision an implicit solver in the sense Jed mentioned - Jiri >>>>> Blazek's book is a nice intro to this concept. >>>>> >>>>> Matthew, I do not know exactly what to change right now because >>>>> although I understand globally what the DMPlexComputeXXXX_Internal methods >>>>> do, I cannot say for sure line by line what is happening. >>>>> In a structured code, I have a an implicit FVM solver with PETSc but I >>>>> do not use any of the FV structure, not even a DM - I just use C arrays >>>>> that I transform to PETSc Vec and Mat and build my IJacobian and my >>>>> preconditioner and gives all that to a TS and it runs. I cannot figure out >>>>> how to do it with the FV and the DM and all the underlying "shortcuts" that >>>>> I want to use. 
>>>>> >>>>> Here is the top method for the structured code : >>>>> >>>>> int total_size = context.npoints * solver->nvars >>>>> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); >>>>> CHKERRQ(ierr); >>>>> SNES snes; >>>>> KSP ksp; >>>>> PC pc; >>>>> SNESType snestype; >>>>> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >>>>> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >>>>> >>>>> flag_mat_a = 1; >>>>> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size, >>>>> PETSC_DETERMINE, >>>>> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >>>>> context.jfnk_eps = 1e-7; >>>>> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps >>>>> ,NULL); CHKERRQ(ierr); >>>>> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void)) >>>>> PetscJacobianFunction_JFNK); CHKERRQ(ierr); >>>>> ierr = MatSetUp(A); CHKERRQ(ierr); >>>>> >>>>> context.flag_use_precon = 0; >>>>> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool >>>>> *)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >>>>> >>>>> /* Set up preconditioner matrix */ >>>>> flag_mat_b = 1; >>>>> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size, >>>>> PETSC_DETERMINE,PETSC_DETERMINE, >>>>> (solver->ndims*2+1)*solver->nvars,NULL, >>>>> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >>>>> ierr = MatSetBlockSize(B,solver->nvars); >>>>> /* Set the RHSJacobian function for TS */ >>>>> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); >>>>> >>>>> Thibault Bridel-Bertomeu >>>>> ? >>>>> Eng, MSc, PhD >>>>> Research Engineer >>>>> CEA/CESTA >>>>> 33114 LE BARP >>>>> Tel.: (+33)557046924 >>>>> Mob.: (+33)611025322 >>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>> >>>>> >>>>> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown a ?crit : >>>>> >>>>>> Matthew Knepley writes: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> > I could never get the FVM stuff to make sense to me for implicit >>>>>> methods. >>>>>> >>>>>> >>>>>> > Here is my problem understanding. If you have an FVM method, it >>>>>> decides >>>>>> >>>>>> >>>>>> > to move "stuff" from one cell to its neighboring cells depending on >>>>>> the >>>>>> >>>>>> >>>>>> > solution to the Riemann problem on each face, which computed the >>>>>> flux. This >>>>>> >>>>>> >>>>>> > is >>>>>> >>>>>> >>>>>> > fine unless the timestep is so big that material can flow through >>>>>> into the >>>>>> >>>>>> >>>>>> > cells beyond the neighbor. Then I should have considered the effect >>>>>> of the >>>>>> >>>>>> >>>>>> > Riemann problem for those interfaces. That would be in the >>>>>> Jacobian, but I >>>>>> >>>>>> >>>>>> > don't know how to compute that Jacobian. I guess you could do >>>>>> everything >>>>>> >>>>>> >>>>>> > matrix-free, but without a preconditioner it seems hard. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> So long as we're using method of lines, the flux is just >>>>>> instantaneous flux, not integrated over some time step. It has the same >>>>>> meaning for implicit and explicit. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> An explicit method would be unstable if you took such a large time >>>>>> step (CFL) and an implicit method will not simultaneously be SSP and higher >>>>>> than first order, but it's still a consistent discretization of the problem. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> It's common (done in FUN3D and others) to precondition with a >>>>>> first-order method, where gradient reconstruction/limiting is skipped. 
>>>>>> That's what I'd recommend because limiting creates nasty nonlinearities and >>>>>> the resulting discretizations lack h-ellipticity which makes them very hard >>>>>> to solve. >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> >> >> -- > Thibault Bridel-Bertomeu > ? > Eng, MSc, PhD > Research Engineer > CEA/CESTA > 33114 LE BARP > Tel.: (+33)557046924 > Mob.: (+33)611025322 > Mail: thibault.bridelbertomeu at gmail.com > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc_fvm_part_timedisc_v2.pdf Type: application/pdf Size: 303982 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc_fvm_part_petscdefs_v2.pdf Type: application/pdf Size: 413581 bytes Desc: not available URL: From mlohry at gmail.com Mon Aug 24 07:54:07 2020 From: mlohry at gmail.com (Mark Lohry) Date: Mon, 24 Aug 2020 08:54:07 -0400 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> Message-ID: Reran with debug mode and got a stack trace for this bus error, looks like it's happening in BLASgemv, see pasted below. I did take care of the ISColoring leak mentioned previously, although that was a very small amount of data and I don't think is relevant here. At this point it's happily run 222 timesteps prior to this, so I'm a little mystified. Any ideas? Thanks, Mark 222 TS dt 0.03 time 6.66 0 SNES Function norm 4.124287265556e+02 0 KSP Residual norm 4.124287265556e+02 1 KSP Residual norm 4.123248052318e+02 2 KSP Residual norm 4.123173350456e+02 3 KSP Residual norm 4.118769044110e+02 4 KSP Residual norm 4.094856150740e+02 5 KSP Residual norm 4.006000788078e+02 6 KSP Residual norm 3.787922969183e+02 [clip] Linear solve converged due to CONVERGED_RTOL iterations 9 Line search: Using full step: fnorm 4.015236590684e+01 gnorm 3.173434863784e+00 2 SNES Function norm 3.173434863784e+00 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 0 SNES Function norm 5.842010710080e+02 0 KSP Residual norm 5.842010710080e+02 1 KSP Residual norm 5.840526408234e+02 2 KSP Residual norm 5.840431857354e+02 3 KSP Residual norm 5.834351392302e+02 4 KSP Residual norm 5.800901047861e+02 5 KSP Residual norm 5.675562288567e+02 6 KSP Residual norm 5.366287895681e+02 7 KSP Residual norm 4.725811521866e+02 [911]PETSC ERROR: ------------------------------------------------------------------------ [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access [911]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [911]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [911]PETSC ERROR: likely location of problem given in stack below [911]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [911]PETSC ERROR: INSTEAD the line number of the start of the function [911]PETSC ERROR: is given. 
[911]PETSC ERROR: [911] BLASgemv line 1393 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c [911]PETSC ERROR: [911] MatSolve line 3354 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c [911]PETSC ERROR: [911] PCApply_ILU line 201 /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c [911]PETSC ERROR: [911] PCApply_ASM line 441 /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c [911]PETSC ERROR: [911] SNESSolve line 4403 /home/mlohry/build/external/petsc/src/snes/interface/snes.c [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c [911]PETSC ERROR: [911] TSStep line 3682 /home/mlohry/build/external/petsc/src/ts/interface/ts.c [911]PETSC ERROR: [911] TSSolve line 4005 /home/mlohry/build/external/petsc/src/ts/interface/ts.c [911]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [911]PETSC ERROR: Signal received [911]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by mlohry Sun Aug 23 19:54:21 2020 [911]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/local/openmpi/3.1.3/gcc/x8 [911]PETSC ERROR: #1 User provided function() line 0 in unknown file -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry wrote: > Perhaps you are calling ISColoringGetIS() and not calling >> ISColoringRestoreIS()? >> > > I have matching ISColoringGet/Restore here, and it's only used prior to > the first iteration so at least it doesn't seem to be growing. At the > bottom I pasted the malloc_view and malloc_debug output from running 1 time > step. 
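A per-step memory report of the sort mentioned just above can be done with a small TS monitor; this is only a sketch. The reductions over ranks are there to catch a single rank whose usage grows faster than the others, which a job-wide total would hide.

#include <petscts.h>

/* report per-rank and max-over-ranks memory at every accepted time step */
static PetscErrorCode MonitorMemory(TS ts, PetscInt step, PetscReal time, Vec u, void *mctx)
{
  PetscLogDouble rss, mal, rssmax, malmax;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscMemoryGetCurrentUsage(&rss); CHKERRQ(ierr);   /* resident set size of this rank */
  ierr = PetscMallocGetCurrentUsage(&mal); CHKERRQ(ierr);   /* memory currently PetscMalloc'd on this rank */
  /* PetscLogDouble is a double, so MPI_DOUBLE is the matching MPI type */
  ierr = MPI_Allreduce(&rss, &rssmax, 1, MPI_DOUBLE, MPI_MAX, PETSC_COMM_WORLD); CHKERRQ(ierr);
  ierr = MPI_Allreduce(&mal, &malmax, 1, MPI_DOUBLE, MPI_MAX, PETSC_COMM_WORLD); CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "step %D t=%g: max rss %g MB, max malloc'd %g MB\n",
                     step, (double)time, rssmax/1.e6, malmax/1.e6); CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* hook it up once after TSCreate():
   ierr = TSMonitorSet(ts, MonitorMemory, NULL, NULL); CHKERRQ(ierr);  */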
> > I'm sort of thinking this might be a red herring -- is it possible the > rank 0 process is chewing up dramatically more memory than others, like > with logging or something? Like I mentioned earlier the total memory usage > is well under the machine limits. I'll spring in some > PetscMemoryGetMaximumUsage logging at every time step and try to get a big > job going again. > > > > Are you using Fortran? >> > > C++ > > > > [ 0]1408 bytes PetscSplitReductionCreate() line 63 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c > [ 0]80 bytes PetscSplitReductionCreate() line 57 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c > [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c > [ 0]16 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]272 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]880 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]960 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]976 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 
bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in > /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]64 bytes ISColoringGetIS() line 266 in > /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c > [ 0]32 bytes PetscCommDuplicate() line 129 in > 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c > [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire > process 719073280 > [0] Memory usage sorted by function > [0] 6 192 DMCoarsenHookAdd() > [0] 2 9984 DMCreate() > [0] 2 128 DMCreate_Shell() > [0] 2 64 DMDSEnlarge_Static() > [0] 1 672 DMKSPCreate() > [0] 3 96 DMRefineHookAdd() > [0] 3 2064 DMSNESCreate() > [0] 4 128 DMSubDomainHookAdd() > [0] 1 768 DMTSCreate() > [0] 2 96 ISColoringCreate() > [0] 8 12608 ISColoringGetIS() > [0] 1 307200 ISConcatenate() > [0] 29 25984 ISCreate() > [0] 25 400 ISCreate_General() > [0] 4 64 ISCreate_Stride() > [0] 20 338016 ISGeneralSetIndices_General() > [0] 3 921600 ISGetIndices_Stride() > [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() > [0] 1 6144 ISInvertPermutation_General() > [0] 3 308576 ISLocalToGlobalMappingCreate() > [0] 2 32 KSPConvergedDefaultCreate() > [0] 2 2816 KSPCreate() > [0] 1 224 KSPCreate_FGMRES() > [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() > [0] 2 16032 KSPSetUp_FGMRES() > [0] 4 16084160 KSPSetUp_GMRES() > [0] 2 36864 MatColoringApply_SL() > [0] 1 656 MatColoringCreate() > [0] 6 17088 MatCreate() > [0] 1 16 MatCreateMFFD_WP() > [0] 1 16 MatCreateSubMatrices_SeqBAIJ() > [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() > [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() > [0] 2 1472 MatCreate_MFFD() > [0] 1 416 MatCreate_SeqAIJ() > [0] 3 864 MatCreate_SeqBAIJ() > [0] 2 416 MatCreate_Shell() > [0] 1 784 MatFDColoringCreate() > [0] 2 12288 MatFDColoringDegreeSequence_Minpack() > [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() > [0] 3 42512 MatGetColumnIJ_SeqAIJ() > [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() > [0] 1 6144 MatGetOrdering_Natural() > [0] 2 36384 MatGetRowIJ_SeqAIJ() > [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() > [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() > [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() > [0] 1 6144 MatMarkDiagonal_SeqAIJ() > [0] 1 6144 MatMarkDiagonal_SeqBAIJ() > [0] 8 256 MatRegisterRootName() > [0] 1 6160 MatSeqAIJCheckInode() > [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() > [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() > [0] 13 576 MatSolverTypeRegister() > [0] 1 16 PCASMCreateSubdomains() > [0] 2 1664 PCCreate() > [0] 1 160 PCCreate_ASM() > [0] 1 192 PCCreate_ILU() > [0] 5 307264 PCSetUp_ASM() > [0] 2 416 PetscBTCreate() > [0] 2 3216 PetscClassPerfLogCreate() > [0] 2 1616 PetscClassRegLogCreate() > [0] 2 32 PetscCommBuildTwoSided_Allreduce() > [0] 2 64 PetscCommDuplicate() > [0] 2 1888 PetscDSCreate() > [0] 2 26416 PetscEventPerfLogCreate() > [0] 2 158400 PetscEventPerfLogEnsureSize() > [0] 2 1616 PetscEventRegLogCreate() > [0] 2 9600 PetscEventRegLogRegister() > [0] 8 102400 PetscFreeSpaceGet() > [0] 474 15168 PetscFunctionListAdd_Private() > [0] 2 528 PetscIntStackCreate() > [0] 142 11360 PetscLayoutCreate() > [0] 56 896 PetscLayoutSetUp() > [0] 59 9440 PetscObjectComposedDataIncreaseReal() > [0] 2 576 PetscObjectListAdd() > [0] 33 768 PetscOptionsGetEList() > [0] 1 16 PetscOptionsHelpPrintedCreate() > [0] 1 32 PetscPushSignalHandler() > [0] 7 6944 PetscSFCreate() > [0] 3 432 PetscSFCreate_Basic() > [0] 2 1472 PetscSFLinkCreate() > [0] 11 1229040 PetscSFSetUpRanks() > [0] 7 614512 PetscSFSetUp_Basic() > [0] 4 20096 PetscSegBufferCreate() > [0] 2 1488 PetscSplitReductionCreate() > [0] 2 3008 PetscStageLogCreate() > [0] 1148 23872 PetscStrallocpy() > [0] 6 13056 PetscStrreplace() > [0] 9 3456 PetscTableCreate() > [0] 1 16 PetscViewerASCIIOpen() > [0] 6 96 PetscViewerAndFormatCreate() > [0] 1 752 
PetscViewerCreate() > [0] 1 96 PetscViewerCreate_ASCII() > [0] 2 1424 SNESCreate() > [0] 1 16 SNESCreate_NEWTONLS() > [0] 1 1008 SNESLineSearchCreate() > [0] 1 16 SNESLineSearchCreate_BT() > [0] 16 1824 SNESMSRegister() > [0] 46 9056 TSARKIMEXRegister() > [0] 1 1264 TSAdaptCreate() > [0] 8 384 TSBasicSymplecticRegister() > [0] 1 2160 TSCreate() > [0] 1 224 TSCreate_Theta() > [0] 48 5968 TSGLEERegister() > [0] 41 7728 TSRKRegister() > [0] 89 14736 TSRosWRegister() > [0] 71 110192 VecCreate() > [0] 1 307200 VecCreateGhostWithArray() > [0] 123 36874080 VecCreate_MPI_Private() > [0] 7 4300800 VecCreate_Seq() > [0] 8 256 VecCreate_Seq_Private() > [0] 6 400 VecDuplicateVecs_Default() > [0] 3 2352 VecScatterCreate() > [0] 7 1843296 VecScatterSetUp_SF() > [0] 126 2016 VecStashCreate_Private() > [0] 1 3072 mapBlockColoringToJacobian() > > On Wed, Aug 12, 2020 at 4:22 PM Barry Smith wrote: > >> >> Yes, there are some PETSc objects or arrays that you are not freeing >> so they are printed at the end of the run. For small runs this harmless but >> if new objects/memory is allocated at each iteration and not suitably freed >> it will eventually add up. >> >> Run with -malloc_view (small problem with say 2 iterations) it will >> print everything allocated and might be helpful. >> >> Perhaps you are calling ISColoringGetIS() and not calling >> ISColoringRestoreIS()? >> >> It is also possible it is a leak in PETSc, but that is unlikely since >> we test for them. >> >> Are you using Fortran? >> >> Barry >> >> >> On Aug 12, 2020, at 1:29 PM, Mark Lohry wrote: >> >> Thanks Matt and Barry. At Matt's suggestion I ran a smaller >> representative case with valgrind and didn't see anything alarming (apart >> from a small leak in an older boost version I was using: >> https://github.com/boostorg/serialization/issues/104 although I don't >> think this was causing the issue). >> >> -malloc_debug dumps quite a lot, this is supposed to be empty right? >> Output pasted below. It looks like the same sequence of calls is repeated 8 >> times, which is how many nonlinear solves occurred in this particular run. >> Thoughts? 
>> >> >> >> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]64 bytes ISColoringGetIS() line 266 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >> [ 0]32 bytes PetscCommDuplicate() line 129 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >> >> >> >> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith wrote: >> >>> >>> Mark. 
>>> >>> When valgrind is not feasible (like on many centrally controlled >>> batch systems) you can run PETSc with an extra flag to do some memory error >>> checks >>> -malloc_debug >>> >>> this >>> >>> 1) fills all malloced memory with Nan so if the code is using >>> uninitialized memory it may be detected and >>> 2) checks the beginning and end of each alloced memory region for >>> out-of-bounds writes at each malloc and free. >>> >>> it will slow the code down a little bit but generally not a huge amount. >>> >>> It is no where near as good as valgrind or other memory corruption tools >>> but it has the advantage you can run it anywhere on any size job. >>> >>> >>> Barry >>> >>> >>> >>> >>> >>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley wrote: >>> >>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry wrote: >>> >>>> I'm getting seemingly random failures of late: >>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>>> >>> >>> The first thing I would do is run valgrind on as wide an array of tests >>> as you can. This will find problems >>> on things that run completely fine. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Symptoms: >>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >>>> 2) It doesn't happen right away -- this was running happily for several >>>> hours over several hundred time steps with no indication of bad health in >>>> the numerics >>>> 3) At least the total memory consumption seems to be within bounds, >>>> though I'm not sure about individual processes. e.g. slurm here reported >>>> Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >>>> 4) running the same setup twice it fails at different points >>>> >>>> Any suggestions on what to look for? This is a bit painful to work on >>>> as I can only reproduce it on large runs and then it's seemingly random. >>>> >>>> >>>> Thanks, >>>> Mark >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 24 08:45:53 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 08:45:53 -0500 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> Message-ID: I think the attached is wrong. The input to the matrix vector product for the Jacobian is always global vectors which means on each process the dimension is not the size of the DMGetLocalVector() it should be the VecGetLocalSize() of the DMGetGlobalVector() But you may be able to skip all this and have the DM create the shell matrix setting it sizes appropriately and you only need to supply the MATOP DMSetMatType(dm,MATSHELL); DMCreateMatrix(dm,&A); In fact, I also don't understand the PetscJacobianFunction_JFKN() function It seems to be doing finite differencing on the DMPlexTSComputeRHSFunctionFVM() assuming the current function value is in usr->RHS_ref. How is this different than just letting PETSc/SNES used finite differences to do the matrix-vector product. Your code seems rather complicated with the DMGlobalToLocal() which I don't understand what it is suppose to do there. 
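The DM-built shell matrix route described above would be wired roughly as in the sketch below, placed in the TS setup where ts, dm and the preconditioner matrix B already exist. MyIJacobianMult and user are hypothetical names standing in for the application's own mult routine and context.

Mat A;
ierr = DMSetMatType(dm, MATSHELL); CHKERRQ(ierr);
ierr = DMCreateMatrix(dm, &A); CHKERRQ(ierr);   /* local/global sizes taken from the DM's global vector layout */
ierr = MatShellSetContext(A, user); CHKERRQ(ierr);
ierr = MatShellSetOperation(A, MATOP_MULT, (void (*)(void))MyIJacobianMult); CHKERRQ(ierr);
ierr = TSSetIJacobian(ts, A, B, PetscIJacobian, user); CHKERRQ(ierr);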
I think you can just call TSGetSNES() SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); and it will set up an internal matrix that does the finite differencing for you. Then you never need a shell matrix. Also to create the preconditioner matrix B this should work DMSetMatType(dm,MATAIJ); DMCreateMatrix(dm,&B); no need for you to figure out the sizes. Note that both A and B need to have the same dimensions on each process as the global vectors which I don't think your current code has. Barry > On Aug 24, 2020, at 12:56 AM, Thibault Bridel-Bertomeu wrote: > > Barry, first of all, thank you very much for your detailed answer, I keep reading it to let it soak in - I might come back to you for more details if you do not mind. > > In the meantime, to fuel the conversation, I attach to this e-mail two pdfs containing the pieces of the code that regard what we are discussing. In the *timedisc.pdf, you'll find how I handle the initialization of the TS object, and in the *petscdefs.pdf you'll find the method that calls the TSSolve as well as the methods that are linked to the TS (the timestep adapt, the jacobian etc ...). [Sorry for the quality, I cannot do better than that sort of pdf ...] > > Based on what is in the structured code I sent you the other day, I rewrote the PetscJacobianFunction_JFNK. I think it should be all right, but although it compiles, execution raises a seg fault I think when I do > ierr = TSSetIJacobian(ts, A, A, PetscIJacobian, user); > saying that A does not have the right dimensions. It is quite new, I am still looking into where exactly the error is raised. What do you think of this implementation though, does it look correct in your expert eyes ? > As for what we really discussed so far, it's that PetscComputePreconMatImpl that I do not know how to implement (with the derivative of the jacobian based on the FVM object). > > I understand now that what I am showing you today might not be the right way to go if one wants to really use the PetscFV, but I just wanted to add those code lines to the conversation to have your feedback. > > Thank you again for your help, > > Thibault > > > Le ven. 21 ao?t 2020 ? 19:25, Barry Smith > a ?crit : > > >> On Aug 21, 2020, at 10:58 AM, Thibault Bridel-Bertomeu > wrote: >> >> Thank you Barry for the tip ! I?ll make sure to do that when everything is set. >> What I also meant is that there will not be any more direct way to set the preconditioner than to go through SNESSetJacobian after having assembled everything by hand ? Like, in my case, or in the more general case of fluid dynamics equations, the preconditioner is not a fun matrix to assemble, because for every cell the derivative of the physical flux jacobian has to be taken and put in the right block in the matrix - finite element style if you want. Is there a way to do that with Petsc methods, maybe short-circuiting the FEM based methods ? > > Thibault > > I am not sure what you mean but there are a couple of things that may be helpful. > > PCSHELL https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCSHELL.html <> allows you to build your own preconditioner (that can and often will use one or more of its own Mats, and KSP or PC inside it, or even use another PETScFV etc to build some of the sub matrices for you if it is appropriate), this approach means you never need to construct a "global" PETSc matrix from which PETSc builds the preconditioner. But you should only do this if the conventional approach is not reasonable for your problem. 
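The combination suggested at the top of this message (matrix-free operator plus a DM-assembled preconditioner matrix) would look roughly like the sketch below, reusing the PetscIJacobian callback discussed earlier in the thread to fill B with the first-order approximation. It assumes ts, dm and user already exist; no shell matrix is needed in this variant.

SNES snes;
Mat  B;
ierr = TSGetSNES(ts, &snes); CHKERRQ(ierr);
ierr = SNESSetUseMatrixFree(snes, PETSC_TRUE, PETSC_FALSE); CHKERRQ(ierr); /* operator applied by finite differences */
ierr = DMSetMatType(dm, MATAIJ); CHKERRQ(ierr);
ierr = DMCreateMatrix(dm, &B); CHKERRQ(ierr);      /* preconditioner matrix sized and preallocated by the DM */
ierr = TSSetIJacobian(ts, B, B, PetscIJacobian, user); CHKERRQ(ierr); /* PetscIJacobian only needs to assemble B */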
> > MATNEST https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATNEST.html allows you to build a global matrix by building parts of it separately and even skipping parts you decide you don't need in the preconditioner. Conceptually it is the same as just creating a global matrix and filling up but the process is a bit different and something suitable for "multi physics" or "multi-equation" type applications. > > Of course what you put into PCSHELL and MATNEST will affect the convergence of the nonlinear solver. As Jed noted what you put in the "Jacobian" does not have to be directly the same mathematically as what you put into the TSSetI/RHSFunction with the caveat that it does have to appropriate spectral properties to result in a good preconditioner for the "true" Jacobian. > > Couple of other notes: > > The entire business of "Jacobian" matrix-free or not (with for example -snes_fd_operator) is tricky because as Jed noted if your finite volume scheme has non-differential terms such as if () tests. There is a concept of sub-differential for this type of thing but I know absolutely nothing about that and probably not worth investigating. > > In this situation you can avoid the "true" Jacobian completely (both for matrix-vector product and preconditioner) and use something else as Jed suggested a lower order scheme that is differentiable. This can work well for solving the nonlinear system or not depending on how suitable it is for your original "function" > > > 1) In theory at least you can have the Jacobian matrix-vector product computed directly using DMPLEX/PETScFV infrastructure (it would apply the Jacobian locally matrix-free using code similar to the code that evaluates the FV "function". I do no know if any of this code is written, it will be more efficient than -snes_mf_operator that evaluates the FV "function" and does traditional differencing to compute the Jacobian. Again it has the problem of non-differentialability if the function is not differential. But it could be done for a different (lower order scheme) that is differentiable. > > 2) You can have PETSc compute the Jacobian explicitly coloring and from that build the preconditioner, this allows you to avoid the hassle of writing the code for the derivatives yourself. This uses finite differences on your function and coloring of the graph to compute many columns of the Jacobian simultaneously and can be pretty efficient. Again if the function is not differential there can be issues of what the result means and will it work in a nonlinear solver. SNESComputeJacobianDefaultColor https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESComputeJacobianDefaultColor.html > > 3) Much more outlandish is to skip Newton and Jacobians completely and use the full approximation scheme SNESFAS https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNESFAS/SNESFAS.html this requires a grid hierarchy and appropriate way to interpolate up through the grid hierarchy your finite volume solutions. Probably not worth investigating unless you have lots of time on your hands and keen interest in this kind of stuff https://arxiv.org/pdf/1607.04254.pdf > > So to summarize, and Matt and Jed can correct my mistakes. > > 1) Form the full Jacobian from the original "function" using analytic approach use it for both the matrix-vector product and to build the preconditioner. Problem if full Jacobian not well defined mathematically. Tough to code, usually not practical. 
> > 2) Do any matrix free (any way) for the full Jacobian and > > a) build another "approximate" Jacobian (using any technique analytic or finite differences using matrix coloring on a new "lower order" "function") Still can have trouble if this original Jacobian is no well defined > > b) "write your own preconditioner" that internally can use anything in PETSc that approximately solves the Jacobian. Same potential problems if original Jacobian is not differential, plus convergence will depend on how good your own preconditioner approximates the inverse of the true Jacobian. > > 3) Use a lower Jacobian (computed anyway you want) for the matrix-vector product and the preconditioner. The problem of differentiability is gone but convergence of the nonlinear solver depends on how well lower order Jacobian is appropriate for the original "function" > > a) Form the "lower order" Jacobian analytically or with coloring and use for both matrix-vector product and building preconditioner. Note that switching between this and 2a is trivial. > > b) Do the "lower order" Jacobian matrix free and provide your own PCSHELL. Note that switching between this and 2b is trivial. > > Barry > > I would first try competing the "true" Jacobian via coloring, if that works and give satisfactory results (fast enough) then stop. > > Then I would do 2a/2b by writing my "function" using PETScFV and writing the "lower order function" via PETScFV and use matrix coloring to get the Jacobian from the second "lower order function". If this works well (either with 2a or 3a or both) then stop or you can compute the "lower order" Jacobian analytically (again using PetscFV) for a more efficient evaluation of the Jacobian. > > >> >> Thanks ! >> >> Thibault >> >> Le ven. 21 ao?t 2020 ? 17:22, Barry Smith > a ?crit : >> >> >>> On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu > wrote: >>> >>> >>> >>> Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley > a ?crit : >>> On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu > wrote: >>> Sorry, I sent too soon, I hit the wrong key. >>> >>> I wanted to say that context.npoints is the local number of cells. >>> >>> PetscRHSFunctionImpl allows to generate the hyperbolic part of the right hand side. >>> Then we have : >>> >>> PetscErrorCode PetscIJacobian( >>> TS ts, /*!< Time stepping object (see PETSc TS)*/ >>> PetscReal t, /*!< Current time */ >>> Vec Y, /*!< Solution vector */ >>> Vec Ydot, /*!< Time-derivative of solution vector */ >>> PetscReal a, /*!< Shift */ >>> Mat A, /*!< Jacobian matrix */ >>> Mat B, /*!< Preconditioning matrix */ >>> void *ctxt /*!< Application context */ >>> ) >>> { >>> PETScContext *context = (PETScContext*) ctxt; >>> HyPar *solver = context->solver; >>> _DECLARE_IERR_; >>> >>> PetscFunctionBegin; >>> solver->count_IJacobian++; >>> context->shift = a; >>> context->waqt = t; >>> /* Construct preconditioning matrix */ >>> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y,context); CHECKERR(ierr); } >>> >>> PetscFunctionReturn(0); >>> } >>> >>> and PetscJacobianFunction_JFNK which I bind to the matrix shell, computes the action of the jacobian on a vector : say U0 is the state of reference and Y the vector upon which to apply the JFNK method, then the PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift comes from the TS). 
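(For concreteness, the shell-matrix mult just described could be sketched as below; AppCtx, EvalRHS and the work vectors U0/F0/Upert/Fpert are placeholders, and the reference residual F(U0) is assumed to have been stored before the solve.)

   PetscErrorCode MyJacobianMult(Mat A, Vec Y, Vec JY)
   {
     AppCtx         *user;
     PetscReal       eps = 1e-7;
     PetscErrorCode  ierr;

     PetscFunctionBeginUser;
     ierr = MatShellGetContext(A, &user);CHKERRQ(ierr);
     ierr = VecWAXPY(user->Upert, eps, Y, user->U0);CHKERRQ(ierr);   /* Upert = U0 + eps*Y             */
     ierr = EvalRHS(user, user->Upert, user->Fpert);CHKERRQ(ierr);   /* Fpert = F(U0 + eps*Y)          */
     ierr = VecCopy(user->Fpert, JY);CHKERRQ(ierr);
     ierr = VecAXPY(JY, -1.0, user->F0);CHKERRQ(ierr);               /* JY = F(U0+eps*Y) - F(U0)       */
     ierr = VecScale(JY, -1.0/eps);CHKERRQ(ierr);                    /* JY = -(1/eps)*(...)            */
     ierr = VecAXPY(JY, user->shift, Y);CHKERRQ(ierr);               /* JY = shift*Y - (1/eps)*(...)   */
     PetscFunctionReturn(0);
   }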
>>> The preconditioning matrix I compute as an approximation to the actual jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in each cell, a 4x4 matrix that is known exactly for the system of equations I am solving, i.e. Euler equations. For the structured grid, I can loop on the cells and do that 'Derivative' thing at first order by simply taking a finite-difference like approximation with the neighboring cells, Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by block (JFunction is the dF/dU) >>> >>> /* diagonal element */ >>> >>> >>> <> for (v=0; v>> >>> >>> <> ierr = solver->JFunction (values,(u+nvars*p),solver->physics ,dir,0); >>> >>> >>> <> _ArrayScale1D_ (values,(dxinv*iblank),(nvars*nvars)); >>> >>> >>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>> >>> >>> <> >>> >>> >>> <> /* left neighbor */ >>> >>> >>> <> if (pgL >= 0) { >>> >>> >>> <> for (v=0; v>> >>> >>> <> ierr = solver->JFunction (values,(u+nvars*pL),solver->physics ,dir,1); >>> >>> >>> <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); >>> >>> >>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>> >>> >>> <> } >>> >>> >>> <> >>> >>> >>> <> /* right neighbor */ >>> >>> >>> <> if (pgR >= 0) { >>> >>> >>> <> for (v=0; v>> >>> >>> <> ierr = solver->JFunction (values,(u+nvars*pR),solver->physics ,dir,-1); >>> >>> >>> <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); >>> >>> >>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>> >>> >>> <> } >>> >>> >>> >>> I do not know if I am clear here ... >>> Anyways, I am trying to figure out how to do this shell matrix and this preconditioner using all the FV and DMPlex artillery. >>> >>> Okay, that is very clear. We should be able to get the JFNK just with -snes_mf_operator, and put the approximate J construction in DMPlexComputeJacobian_Internal(). >>> There is an FV section already, and we could just add this. I would need to understand those entries in the pointwise Riemann sense that the other stuff is now. >>> >>> Ok i had a quick look and if I understood correctly it would do the job. Setting the snes-mf-operator flag would mean however that we have to go through SNESSetJacobian to set the jacobian and the preconditioning matrix wouldn't it ? >> >> Thibault, >> >> Since the TS implicit methods end up using SNES internally the option should be available to you without requiring you to be calling the SNES routines directly >> >> Once you have finalized your approach and if for the implicit case you always work in the snes mf operator mode you can hardwire >> >> TSGetSNES(ts,&snes); >> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >> >> in your code so you don't need to always provide the option -snes-mf-operator >> >> Barry >> >> >> >>> There might be calls to the Riemann solver to evaluate the dRHS / dU part yes but maybe it's possible to re-use what was computed for the RHS^n ? >>> In the FV section the jacobian is set to identity which I missed before, but it could explain why when I used the following : >>> TSSetType(ts, TSBEULER); >>> DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); >>> DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); >>> with my FV discretization nothing happened, right ? >>> >>> Thank you, >>> >>> Thibault >>> >>> Thanks, >>> >>> Matt >>> >>> Le ven. 21 ao?t 2020 ? 
14:55, Thibault Bridel-Bertomeu > a ?crit : >>> Hi, >>> >>> Thanks Matthew and Jed for your input. >>> I indeed envision an implicit solver in the sense Jed mentioned - Jiri Blazek's book is a nice intro to this concept. >>> >>> Matthew, I do not know exactly what to change right now because although I understand globally what the DMPlexComputeXXXX_Internal methods do, I cannot say for sure line by line what is happening. >>> In a structured code, I have a an implicit FVM solver with PETSc but I do not use any of the FV structure, not even a DM - I just use C arrays that I transform to PETSc Vec and Mat and build my IJacobian and my preconditioner and gives all that to a TS and it runs. I cannot figure out how to do it with the FV and the DM and all the underlying "shortcuts" that I want to use. >>> >>> Here is the top method for the structured code : >>> >>> int total_size = context.npoints * solver->nvars >>> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); CHKERRQ(ierr); >>> SNES snes; >>> KSP ksp; >>> PC pc; >>> SNESType snestype; >>> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >>> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >>> >>> flag_mat_a = 1; >>> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, >>> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >>> context.jfnk_eps = 1e-7; >>> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps,NULL); CHKERRQ(ierr); >>> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void))PetscJacobianFunction_JFNK); CHKERRQ(ierr); >>> ierr = MatSetUp(A); CHKERRQ(ierr); >>> >>> context.flag_use_precon = 0; >>> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool*)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >>> >>> /* Set up preconditioner matrix */ >>> flag_mat_b = 1; >>> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE,PETSC_DETERMINE, >>> (solver->ndims*2+1)*solver->nvars,NULL, >>> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >>> ierr = MatSetBlockSize(B,solver->nvars); >>> /* Set the RHSJacobian function for TS */ >>> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); >>> >>> Thibault Bridel-Bertomeu >>> ? >>> Eng, MSc, PhD >>> Research Engineer >>> CEA/CESTA >>> 33114 LE BARP >>> Tel.: (+33)557046924 >>> Mob.: (+33)611025322 >>> Mail: thibault.bridelbertomeu at gmail.com >>> >>> >>> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown > a ?crit : >>> Matthew Knepley > writes: >>> >>> >>> >>> >>> >>> > I could never get the FVM stuff to make sense to me for implicit methods. >>> >>> >>> > Here is my problem understanding. If you have an FVM method, it decides >>> >>> >>> > to move "stuff" from one cell to its neighboring cells depending on the >>> >>> >>> > solution to the Riemann problem on each face, which computed the flux. This >>> >>> >>> > is >>> >>> >>> > fine unless the timestep is so big that material can flow through into the >>> >>> >>> > cells beyond the neighbor. Then I should have considered the effect of the >>> >>> >>> > Riemann problem for those interfaces. That would be in the Jacobian, but I >>> >>> >>> > don't know how to compute that Jacobian. I guess you could do everything >>> >>> >>> > matrix-free, but without a preconditioner it seems hard. >>> >>> >>> >>> >>> >>> So long as we're using method of lines, the flux is just instantaneous flux, not integrated over some time step. It has the same meaning for implicit and explicit. 
>>> >>> >>> >>> >>> >>> An explicit method would be unstable if you took such a large time step (CFL) and an implicit method will not simultaneously be SSP and higher than first order, but it's still a consistent discretization of the problem. >>> >>> >>> >>> >>> >>> It's common (done in FUN3D and others) to precondition with a first-order method, where gradient reconstruction/limiting is skipped. That's what I'd recommend because limiting creates nasty nonlinearities and the resulting discretizations lack h-ellipticity which makes them very hard to solve. >>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> >> -- >> Thibault Bridel-Bertomeu >> ? >> Eng, MSc, PhD >> Research Engineer >> CEA/CESTA >> 33114 LE BARP >> Tel.: (+33)557046924 >> Mob.: (+33)611025322 >> Mail: thibault.bridelbertomeu at gmail.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 120787 bytes Desc: not available URL: From bsmith at petsc.dev Mon Aug 24 08:48:34 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 08:48:34 -0500 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> Message-ID: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> Mark, Can you run in valgrind? Exactly what BLAS are you using? Barry > On Aug 24, 2020, at 7:54 AM, Mark Lohry wrote: > > Reran with debug mode and got a stack trace for this bus error, looks like it's happening in BLASgemv, see pasted below. I did take care of the ISColoring leak mentioned previously, although that was a very small amount of data and I don't think is relevant here. > > At this point it's happily run 222 timesteps prior to this, so I'm a little mystified. Any ideas? 
> > Thanks, > Mark > > > 222 TS dt 0.03 time 6.66 > 0 SNES Function norm 4.124287265556e+02 > 0 KSP Residual norm 4.124287265556e+02 > 1 KSP Residual norm 4.123248052318e+02 > 2 KSP Residual norm 4.123173350456e+02 > 3 KSP Residual norm 4.118769044110e+02 > 4 KSP Residual norm 4.094856150740e+02 > 5 KSP Residual norm 4.006000788078e+02 > 6 KSP Residual norm 3.787922969183e+02 > [clip] > Linear solve converged due to CONVERGED_RTOL iterations 9 > Line search: Using full step: fnorm 4.015236590684e+01 gnorm 3.173434863784e+00 > 2 SNES Function norm 3.173434863784e+00 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 > 0 SNES Function norm 5.842010710080e+02 > 0 KSP Residual norm 5.842010710080e+02 > 1 KSP Residual norm 5.840526408234e+02 > 2 KSP Residual norm 5.840431857354e+02 > 3 KSP Residual norm 5.834351392302e+02 > 4 KSP Residual norm 5.800901047861e+02 > 5 KSP Residual norm 5.675562288567e+02 > 6 KSP Residual norm 5.366287895681e+02 > 7 KSP Residual norm 4.725811521866e+02 > [911]PETSC ERROR: ------------------------------------------------------------------------ > [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access > [911]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [911]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [911]PETSC ERROR: likely location of problem given in stack below > [911]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [911]PETSC ERROR: INSTEAD the line number of the start of the function > [911]PETSC ERROR: is given. 
> [911]PETSC ERROR: [911] BLASgemv line 1393 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c > [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c > [911]PETSC ERROR: [911] MatSolve line 3354 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c > [911]PETSC ERROR: [911] PCApply_ILU line 201 /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c > [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c > [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h > [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c > [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c > [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c > [911]PETSC ERROR: [911] PCApply_ASM line 441 /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c > [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c > [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h > [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c > [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c > [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c > [911]PETSC ERROR: [911] SNESSolve line 4403 /home/mlohry/build/external/petsc/src/snes/interface/snes.c > [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c > [911]PETSC ERROR: [911] TSStep line 3682 /home/mlohry/build/external/petsc/src/ts/interface/ts.c > [911]PETSC ERROR: [911] TSSolve line 4005 /home/mlohry/build/external/petsc/src/ts/interface/ts.c > [911]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [911]PETSC ERROR: Signal received > [911]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 > [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by mlohry Sun Aug 23 19:54:21 2020 > [911]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/local/openmpi/3.1.3/gcc/x8 > [911]PETSC ERROR: #1 User provided function() line 0 in unknown file > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD > > On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry > wrote: > Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()? > > I have matching ISColoringGet/Restore here, and it's only used prior to the first iteration so at least it doesn't seem to be growing. At the bottom I pasted the malloc_view and malloc_debug output from running 1 time step. 
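(As an aside, the per-time-step memory logging mentioned in the next paragraph could be sketched with a TS monitor along these lines; MonitorMemory is a made-up name, and PetscMemoryGetMaximumUsage() only reports useful numbers if PetscMemorySetGetMaximumUsage() was called right after PetscInitialize().)

   static PetscErrorCode MonitorMemory(TS ts, PetscInt step, PetscReal t, Vec u, void *mctx)
   {
     PetscLogDouble rss, maxrss;
     PetscMPIInt    rank;
     PetscErrorCode ierr;

     PetscFunctionBeginUser;
     ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
     ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr);
     ierr = PetscMemoryGetMaximumUsage(&maxrss);CHKERRQ(ierr);
     ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[%d] step %D: rss %g MB, max rss %g MB\n",
                                    rank, step, rss/1.0e6, maxrss/1.0e6);CHKERRQ(ierr);
     ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);CHKERRQ(ierr);
     PetscFunctionReturn(0);
   }
   /* hooked up once with: ierr = TSMonitorSet(ts, MonitorMemory, NULL, NULL);CHKERRQ(ierr); */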
> > I'm sort of thinking this might be a red herring -- is it possible the rank 0 process is chewing up dramatically more memory than others, like with logging or something? Like I mentioned earlier the total memory usage is well under the machine limits. I'll spring in some PetscMemoryGetMaximumUsage logging at every time step and try to get a big job going again. > > > > Are you using Fortran? > > C++ > > > > [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c > [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c > [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c > [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c > [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c > [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c > [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c > [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c > [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c > [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c > [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire process 719073280 > [0] Memory usage sorted by function > [0] 6 192 DMCoarsenHookAdd() > [0] 2 9984 DMCreate() > [0] 2 128 
DMCreate_Shell() > [0] 2 64 DMDSEnlarge_Static() > [0] 1 672 DMKSPCreate() > [0] 3 96 DMRefineHookAdd() > [0] 3 2064 DMSNESCreate() > [0] 4 128 DMSubDomainHookAdd() > [0] 1 768 DMTSCreate() > [0] 2 96 ISColoringCreate() > [0] 8 12608 ISColoringGetIS() > [0] 1 307200 ISConcatenate() > [0] 29 25984 ISCreate() > [0] 25 400 ISCreate_General() > [0] 4 64 ISCreate_Stride() > [0] 20 338016 ISGeneralSetIndices_General() > [0] 3 921600 ISGetIndices_Stride() > [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() > [0] 1 6144 ISInvertPermutation_General() > [0] 3 308576 ISLocalToGlobalMappingCreate() > [0] 2 32 KSPConvergedDefaultCreate() > [0] 2 2816 KSPCreate() > [0] 1 224 KSPCreate_FGMRES() > [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() > [0] 2 16032 KSPSetUp_FGMRES() > [0] 4 16084160 KSPSetUp_GMRES() > [0] 2 36864 MatColoringApply_SL() > [0] 1 656 MatColoringCreate() > [0] 6 17088 MatCreate() > [0] 1 16 MatCreateMFFD_WP() > [0] 1 16 MatCreateSubMatrices_SeqBAIJ() > [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() > [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() > [0] 2 1472 MatCreate_MFFD() > [0] 1 416 MatCreate_SeqAIJ() > [0] 3 864 MatCreate_SeqBAIJ() > [0] 2 416 MatCreate_Shell() > [0] 1 784 MatFDColoringCreate() > [0] 2 12288 MatFDColoringDegreeSequence_Minpack() > [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() > [0] 3 42512 MatGetColumnIJ_SeqAIJ() > [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() > [0] 1 6144 MatGetOrdering_Natural() > [0] 2 36384 MatGetRowIJ_SeqAIJ() > [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() > [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() > [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() > [0] 1 6144 MatMarkDiagonal_SeqAIJ() > [0] 1 6144 MatMarkDiagonal_SeqBAIJ() > [0] 8 256 MatRegisterRootName() > [0] 1 6160 MatSeqAIJCheckInode() > [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() > [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() > [0] 13 576 MatSolverTypeRegister() > [0] 1 16 PCASMCreateSubdomains() > [0] 2 1664 PCCreate() > [0] 1 160 PCCreate_ASM() > [0] 1 192 PCCreate_ILU() > [0] 5 307264 PCSetUp_ASM() > [0] 2 416 PetscBTCreate() > [0] 2 3216 PetscClassPerfLogCreate() > [0] 2 1616 PetscClassRegLogCreate() > [0] 2 32 PetscCommBuildTwoSided_Allreduce() > [0] 2 64 PetscCommDuplicate() > [0] 2 1888 PetscDSCreate() > [0] 2 26416 PetscEventPerfLogCreate() > [0] 2 158400 PetscEventPerfLogEnsureSize() > [0] 2 1616 PetscEventRegLogCreate() > [0] 2 9600 PetscEventRegLogRegister() > [0] 8 102400 PetscFreeSpaceGet() > [0] 474 15168 PetscFunctionListAdd_Private() > [0] 2 528 PetscIntStackCreate() > [0] 142 11360 PetscLayoutCreate() > [0] 56 896 PetscLayoutSetUp() > [0] 59 9440 PetscObjectComposedDataIncreaseReal() > [0] 2 576 PetscObjectListAdd() > [0] 33 768 PetscOptionsGetEList() > [0] 1 16 PetscOptionsHelpPrintedCreate() > [0] 1 32 PetscPushSignalHandler() > [0] 7 6944 PetscSFCreate() > [0] 3 432 PetscSFCreate_Basic() > [0] 2 1472 PetscSFLinkCreate() > [0] 11 1229040 PetscSFSetUpRanks() > [0] 7 614512 PetscSFSetUp_Basic() > [0] 4 20096 PetscSegBufferCreate() > [0] 2 1488 PetscSplitReductionCreate() > [0] 2 3008 PetscStageLogCreate() > [0] 1148 23872 PetscStrallocpy() > [0] 6 13056 PetscStrreplace() > [0] 9 3456 PetscTableCreate() > [0] 1 16 PetscViewerASCIIOpen() > [0] 6 96 PetscViewerAndFormatCreate() > [0] 1 752 PetscViewerCreate() > [0] 1 96 PetscViewerCreate_ASCII() > [0] 2 1424 SNESCreate() > [0] 1 16 SNESCreate_NEWTONLS() > [0] 1 1008 SNESLineSearchCreate() > [0] 1 16 SNESLineSearchCreate_BT() > [0] 16 1824 SNESMSRegister() > [0] 46 9056 TSARKIMEXRegister() > [0] 1 1264 
TSAdaptCreate() > [0] 8 384 TSBasicSymplecticRegister() > [0] 1 2160 TSCreate() > [0] 1 224 TSCreate_Theta() > [0] 48 5968 TSGLEERegister() > [0] 41 7728 TSRKRegister() > [0] 89 14736 TSRosWRegister() > [0] 71 110192 VecCreate() > [0] 1 307200 VecCreateGhostWithArray() > [0] 123 36874080 VecCreate_MPI_Private() > [0] 7 4300800 VecCreate_Seq() > [0] 8 256 VecCreate_Seq_Private() > [0] 6 400 VecDuplicateVecs_Default() > [0] 3 2352 VecScatterCreate() > [0] 7 1843296 VecScatterSetUp_SF() > [0] 126 2016 VecStashCreate_Private() > [0] 1 3072 mapBlockColoringToJacobian() > > On Wed, Aug 12, 2020 at 4:22 PM Barry Smith > wrote: > > Yes, there are some PETSc objects or arrays that you are not freeing so they are printed at the end of the run. For small runs this harmless but if new objects/memory is allocated at each iteration and not suitably freed it will eventually add up. > > Run with -malloc_view (small problem with say 2 iterations) it will print everything allocated and might be helpful. > > Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()? > > It is also possible it is a leak in PETSc, but that is unlikely since we test for them. > > Are you using Fortran? > > Barry > > >> On Aug 12, 2020, at 1:29 PM, Mark Lohry > wrote: >> >> Thanks Matt and Barry. At Matt's suggestion I ran a smaller representative case with valgrind and didn't see anything alarming (apart from a small leak in an older boost version I was using: https://github.com/boostorg/serialization/issues/104 although I don't think this was causing the issue). >> >> -malloc_debug dumps quite a lot, this is supposed to be empty right? Output pasted below. It looks like the same sequence of calls is repeated 8 times, which is how many nonlinear solves occurred in this particular run. Thoughts? 
>> >> >> >> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >> [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 
187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >> [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >> >> >> >> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith > wrote: >> >> Mark. >> >> When valgrind is not feasible (like on many centrally controlled batch systems) you can run PETSc with an extra flag to do some memory error checks >> -malloc_debug >> >> this >> >> 1) fills all malloced memory with Nan so if the code is using uninitialized memory it may be detected and >> 2) checks the beginning and end of each alloced memory region for out-of-bounds writes at each malloc and free. 
>> >> it will slow the code down a little bit but generally not a huge amount. >> >> It is no where near as good as valgrind or other memory corruption tools but it has the advantage you can run it anywhere on any size job. >> >> >> Barry >> >> >> >> >> >>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley > wrote: >>> >>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry > wrote: >>> I'm getting seemingly random failures of late: >>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>> >>> The first thing I would do is run valgrind on as wide an array of tests as you can. This will find problems >>> on things that run completely fine. >>> >>> Thanks, >>> >>> Matt >>> >>> Symptoms: >>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >>> 2) It doesn't happen right away -- this was running happily for several hours over several hundred time steps with no indication of bad health in the numerics >>> 3) At least the total memory consumption seems to be within bounds, though I'm not sure about individual processes. e.g. slurm here reported Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >>> 4) running the same setup twice it fails at different points >>> >>> Any suggestions on what to look for? This is a bit painful to work on as I can only reproduce it on large runs and then it's seemingly random. >>> >>> >>> Thanks, >>> Mark >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Mon Aug 24 09:15:25 2020 From: mlohry at gmail.com (Mark Lohry) Date: Mon, 24 Aug 2020 10:15:25 -0400 Subject: [petsc-users] Bus Error In-Reply-To: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> Message-ID: valgrind: I ran a much smaller case and didn't see any issues in valgrind. I'm only seeing this bus error on several hundred cores a few hours wallclock in, so it might not be feasible to run that in valgrind. blas: i'm not entirely sure -- it's the stock one in PUIAS linux (red hat derivative), libblas.so.3.4.2.. i'm going to try with intel and if that fails use the openblas downloaded via petsc and see if it alleviates itself. On Mon, Aug 24, 2020 at 9:48 AM Barry Smith wrote: > > Mark, > > Can you run in valgrind? > > Exactly what BLAS are you using? > > Barry > > > On Aug 24, 2020, at 7:54 AM, Mark Lohry wrote: > > Reran with debug mode and got a stack trace for this bus error, looks like > it's happening in BLASgemv, see pasted below. I did take care of the > ISColoring leak mentioned previously, although that was a very small amount > of data and I don't think is relevant here. > > At this point it's happily run 222 timesteps prior to this, so I'm a > little mystified. Any ideas? 
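For what it's worth, two quick ways to pin down the BLAS question (the binary name and options below are illustrative, not something from this thread): check what the executable resolves at run time, or rebuild PETSc against a BLAS it downloads itself.

   ldd ./maDG | grep -i -e blas -e lapack     # which libblas/liblapack the binary actually picks up

   ./configure ... --download-fblaslapack    # reference BLAS/LAPACK built by PETSc
   ./configure ... --download-openblas       # or OpenBLAS built by PETSc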
> > Thanks, > Mark > > > 222 TS dt 0.03 time 6.66 > 0 SNES Function norm 4.124287265556e+02 > 0 KSP Residual norm 4.124287265556e+02 > 1 KSP Residual norm 4.123248052318e+02 > 2 KSP Residual norm 4.123173350456e+02 > 3 KSP Residual norm 4.118769044110e+02 > 4 KSP Residual norm 4.094856150740e+02 > 5 KSP Residual norm 4.006000788078e+02 > 6 KSP Residual norm 3.787922969183e+02 > [clip] > Linear solve converged due to CONVERGED_RTOL iterations 9 > Line search: Using full step: fnorm 4.015236590684e+01 gnorm > 3.173434863784e+00 > 2 SNES Function norm 3.173434863784e+00 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 > 0 SNES Function norm 5.842010710080e+02 > 0 KSP Residual norm 5.842010710080e+02 > 1 KSP Residual norm 5.840526408234e+02 > 2 KSP Residual norm 5.840431857354e+02 > 3 KSP Residual norm 5.834351392302e+02 > 4 KSP Residual norm 5.800901047861e+02 > 5 KSP Residual norm 5.675562288567e+02 > 6 KSP Residual norm 5.366287895681e+02 > 7 KSP Residual norm 4.725811521866e+02 > [911]PETSC ERROR: > ------------------------------------------------------------------------ > [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal > memory access > [911]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [911]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [911]PETSC ERROR: likely location of problem given in stack below > [911]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [911]PETSC ERROR: INSTEAD the line number of the start of the > function > [911]PETSC ERROR: is given. 
> [911]PETSC ERROR: [911] BLASgemv line 1393 > /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c > [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 > /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c > [911]PETSC ERROR: [911] MatSolve line 3354 > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c > [911]PETSC ERROR: [911] PCApply_ILU line 201 > /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c > [911]PETSC ERROR: [911] PCApply line 426 > /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c > [911]PETSC ERROR: [911] KSP_PCApply line 279 > /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h > [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 > /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c > [911]PETSC ERROR: [911] KSPSolve_Private line 590 > /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c > [911]PETSC ERROR: [911] KSPSolve line 848 > /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c > [911]PETSC ERROR: [911] PCApply_ASM line 441 > /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c > [911]PETSC ERROR: [911] PCApply line 426 > /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c > [911]PETSC ERROR: [911] KSP_PCApply line 279 > /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h > [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 > /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 > /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [911]PETSC ERROR: [911] KSPSolve_Private line 590 > /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c > [911]PETSC ERROR: [911] KSPSolve line 848 > /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c > [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 > /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c > [911]PETSC ERROR: [911] SNESSolve line 4403 > /home/mlohry/build/external/petsc/src/snes/interface/snes.c > [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 > /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c > [911]PETSC ERROR: [911] TSStep line 3682 > /home/mlohry/build/external/petsc/src/ts/interface/ts.c > [911]PETSC ERROR: [911] TSSolve line 4005 > /home/mlohry/build/external/petsc/src/ts/interface/ts.c > [911]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [911]PETSC ERROR: Signal received > [911]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 > [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by > mlohry Sun Aug 23 19:54:21 2020 > [911]PETSC ERROR: Configure options > PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt > --with-cc=/usr/local/openmpi/3.1.3/gcc/x8 > [911]PETSC ERROR: #1 User provided function() line 0 in unknown file > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD > > On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry wrote: > >> Perhaps you are calling ISColoringGetIS() and not calling >>> ISColoringRestoreIS()? >>> >> >> I have matching ISColoringGet/Restore here, and it's only used prior to >> the first iteration so at least it doesn't seem to be growing. 
At the >> bottom I pasted the malloc_view and malloc_debug output from running 1 time >> step. >> >> I'm sort of thinking this might be a red herring -- is it possible the >> rank 0 process is chewing up dramatically more memory than others, like >> with logging or something? Like I mentioned earlier the total memory usage >> is well under the machine limits. I'll spring in some >> PetscMemoryGetMaximumUsage logging at every time step and try to get a big >> job going again. >> >> >> >> Are you using Fortran? >>> >> >> C++ >> >> >> >> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >> 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in >> 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]64 bytes ISColoringGetIS() line 266 in >> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >> [ 0]32 bytes PetscCommDuplicate() line 129 in >> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >> [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire >> process 719073280 >> [0] Memory usage sorted by function >> [0] 6 192 DMCoarsenHookAdd() >> [0] 2 9984 DMCreate() >> [0] 2 128 DMCreate_Shell() >> [0] 2 64 DMDSEnlarge_Static() >> [0] 1 672 DMKSPCreate() >> [0] 3 96 DMRefineHookAdd() >> [0] 3 2064 DMSNESCreate() >> [0] 4 128 DMSubDomainHookAdd() >> [0] 1 768 DMTSCreate() >> [0] 2 96 ISColoringCreate() >> [0] 8 12608 ISColoringGetIS() >> [0] 1 307200 ISConcatenate() >> [0] 29 25984 ISCreate() >> [0] 25 400 ISCreate_General() >> [0] 4 64 ISCreate_Stride() >> [0] 20 338016 ISGeneralSetIndices_General() >> [0] 3 921600 ISGetIndices_Stride() >> [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() >> [0] 1 6144 ISInvertPermutation_General() >> [0] 3 308576 ISLocalToGlobalMappingCreate() >> [0] 2 32 KSPConvergedDefaultCreate() >> [0] 2 2816 KSPCreate() >> [0] 1 224 KSPCreate_FGMRES() >> [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() >> [0] 2 16032 KSPSetUp_FGMRES() >> [0] 4 16084160 KSPSetUp_GMRES() >> [0] 2 36864 MatColoringApply_SL() >> [0] 1 656 MatColoringCreate() >> [0] 6 17088 MatCreate() >> [0] 1 16 MatCreateMFFD_WP() >> [0] 1 16 MatCreateSubMatrices_SeqBAIJ() >> [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() >> [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() >> [0] 2 1472 MatCreate_MFFD() >> [0] 1 416 MatCreate_SeqAIJ() >> [0] 3 864 MatCreate_SeqBAIJ() >> [0] 2 416 MatCreate_Shell() >> [0] 1 784 MatFDColoringCreate() >> [0] 2 12288 MatFDColoringDegreeSequence_Minpack() >> [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() >> [0] 3 42512 MatGetColumnIJ_SeqAIJ() >> [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() >> [0] 1 6144 MatGetOrdering_Natural() >> [0] 2 36384 MatGetRowIJ_SeqAIJ() >> [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() >> [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() >> [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() >> [0] 1 6144 MatMarkDiagonal_SeqAIJ() >> [0] 1 6144 MatMarkDiagonal_SeqBAIJ() >> [0] 8 256 MatRegisterRootName() >> [0] 1 6160 MatSeqAIJCheckInode() >> [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() >> [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() >> [0] 13 576 MatSolverTypeRegister() >> [0] 1 16 PCASMCreateSubdomains() >> [0] 2 1664 PCCreate() >> [0] 1 160 PCCreate_ASM() >> [0] 1 192 PCCreate_ILU() >> [0] 5 307264 PCSetUp_ASM() >> [0] 2 416 PetscBTCreate() >> [0] 2 3216 PetscClassPerfLogCreate() >> [0] 2 1616 PetscClassRegLogCreate() >> [0] 2 32 PetscCommBuildTwoSided_Allreduce() >> [0] 2 64 PetscCommDuplicate() >> [0] 2 1888 PetscDSCreate() >> [0] 2 26416 PetscEventPerfLogCreate() >> [0] 2 158400 PetscEventPerfLogEnsureSize() >> [0] 2 1616 PetscEventRegLogCreate() >> [0] 2 9600 PetscEventRegLogRegister() >> [0] 8 102400 PetscFreeSpaceGet() >> [0] 474 15168 PetscFunctionListAdd_Private() >> [0] 2 528 PetscIntStackCreate() >> [0] 142 11360 PetscLayoutCreate() >> [0] 56 896 PetscLayoutSetUp() >> [0] 59 9440 PetscObjectComposedDataIncreaseReal() >> [0] 2 576 PetscObjectListAdd() >> [0] 33 768 PetscOptionsGetEList() >> [0] 1 16 PetscOptionsHelpPrintedCreate() >> [0] 1 32 PetscPushSignalHandler() >> [0] 7 6944 PetscSFCreate() >> [0] 3 432 PetscSFCreate_Basic() >> [0] 2 1472 PetscSFLinkCreate() >> [0] 11 1229040 PetscSFSetUpRanks() >> 
[0] 7 614512 PetscSFSetUp_Basic() >> [0] 4 20096 PetscSegBufferCreate() >> [0] 2 1488 PetscSplitReductionCreate() >> [0] 2 3008 PetscStageLogCreate() >> [0] 1148 23872 PetscStrallocpy() >> [0] 6 13056 PetscStrreplace() >> [0] 9 3456 PetscTableCreate() >> [0] 1 16 PetscViewerASCIIOpen() >> [0] 6 96 PetscViewerAndFormatCreate() >> [0] 1 752 PetscViewerCreate() >> [0] 1 96 PetscViewerCreate_ASCII() >> [0] 2 1424 SNESCreate() >> [0] 1 16 SNESCreate_NEWTONLS() >> [0] 1 1008 SNESLineSearchCreate() >> [0] 1 16 SNESLineSearchCreate_BT() >> [0] 16 1824 SNESMSRegister() >> [0] 46 9056 TSARKIMEXRegister() >> [0] 1 1264 TSAdaptCreate() >> [0] 8 384 TSBasicSymplecticRegister() >> [0] 1 2160 TSCreate() >> [0] 1 224 TSCreate_Theta() >> [0] 48 5968 TSGLEERegister() >> [0] 41 7728 TSRKRegister() >> [0] 89 14736 TSRosWRegister() >> [0] 71 110192 VecCreate() >> [0] 1 307200 VecCreateGhostWithArray() >> [0] 123 36874080 VecCreate_MPI_Private() >> [0] 7 4300800 VecCreate_Seq() >> [0] 8 256 VecCreate_Seq_Private() >> [0] 6 400 VecDuplicateVecs_Default() >> [0] 3 2352 VecScatterCreate() >> [0] 7 1843296 VecScatterSetUp_SF() >> [0] 126 2016 VecStashCreate_Private() >> [0] 1 3072 mapBlockColoringToJacobian() >> >> On Wed, Aug 12, 2020 at 4:22 PM Barry Smith wrote: >> >>> >>> Yes, there are some PETSc objects or arrays that you are not freeing >>> so they are printed at the end of the run. For small runs this harmless but >>> if new objects/memory is allocated at each iteration and not suitably freed >>> it will eventually add up. >>> >>> Run with -malloc_view (small problem with say 2 iterations) it will >>> print everything allocated and might be helpful. >>> >>> Perhaps you are calling ISColoringGetIS() and not calling >>> ISColoringRestoreIS()? >>> >>> It is also possible it is a leak in PETSc, but that is unlikely since >>> we test for them. >>> >>> Are you using Fortran? >>> >>> Barry >>> >>> >>> On Aug 12, 2020, at 1:29 PM, Mark Lohry wrote: >>> >>> Thanks Matt and Barry. At Matt's suggestion I ran a smaller >>> representative case with valgrind and didn't see anything alarming (apart >>> from a small leak in an older boost version I was using: >>> https://github.com/boostorg/serialization/issues/104 although I don't >>> think this was causing the issue). >>> >>> -malloc_debug dumps quite a lot, this is supposed to be empty right? >>> Output pasted below. It looks like the same sequence of calls is repeated 8 >>> times, which is how many nonlinear solves occurred in this particular run. >>> Thoughts? 
>>> >>> >>> >>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes 
PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]64 bytes ISColoringGetIS() line 266 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>> [ 0]32 bytes PetscCommDuplicate() line 129 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>> >>> >>> >>> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith wrote: >>> >>>> >>>> Mark. 
>>>> >>>> When valgrind is not feasible (like on many centrally controlled >>>> batch systems) you can run PETSc with an extra flag to do some memory error >>>> checks >>>> -malloc_debug >>>> >>>> this >>>> >>>> 1) fills all malloced memory with Nan so if the code is using >>>> uninitialized memory it may be detected and >>>> 2) checks the beginning and end of each alloced memory region for >>>> out-of-bounds writes at each malloc and free. >>>> >>>> it will slow the code down a little bit but generally not a huge amount. >>>> >>>> It is no where near as good as valgrind or other memory corruption >>>> tools but it has the advantage you can run it anywhere on any size job. >>>> >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>> >>>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley wrote: >>>> >>>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry wrote: >>>> >>>>> I'm getting seemingly random failures of late: >>>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>>>> >>>> >>>> The first thing I would do is run valgrind on as wide an array of tests >>>> as you can. This will find problems >>>> on things that run completely fine. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Symptoms: >>>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >>>>> 2) It doesn't happen right away -- this was running happily for >>>>> several hours over several hundred time steps with no indication of bad >>>>> health in the numerics >>>>> 3) At least the total memory consumption seems to be within bounds, >>>>> though I'm not sure about individual processes. e.g. slurm here reported >>>>> Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >>>>> 4) running the same setup twice it fails at different points >>>>> >>>>> Any suggestions on what to look for? This is a bit painful to work on >>>>> as I can only reproduce it on large runs and then it's seemingly random. >>>>> >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 24 09:35:48 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 09:35:48 -0500 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> Message-ID: <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> Mark, Ok, I'd generally trust the stock BLAS over OpenBLAS when it comes to not failing. Since valgrind is not viable, have you tried -malloc_debug with the bad case? It will be a little bit slower, but not too bad, and it can find some memory corruption issues. It might be useful to get the stack trace inside the BLAS to see exactly where it crashes. If you ./configure with debugging and use --download-fblaslapack or --download-f2cblaslapack it will compile the BLAS with debugging, but just running a batch job still won't display the stack frames inside the BLAS call. We have an option -on_error_attach_debugger, which is useful for longer many-rank runs: it attaches the debugger ONLY when the error is detected, but it may not play well with batch systems.
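Just to spell out the kind of thing I mean, something along these lines; this is only a sketch, and the <...> placeholders stand for whatever configure options, executable, and MPI launcher you actually use:

# sketch only: keep your normal configure line, just add debugging and a debug BLAS/LAPACK
./configure --with-debugging=1 --download-f2cblaslapack <your usual configure options>

# sketch only: run interactively so gdb can be attached on the rank that fails
mpiexec -n <nranks> ./<your executable> <your usual options> -malloc_debug -on_error_attach_debugger gdb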
But if you can make your run on a non-batch system it might be able, along with the --download-fblaslapack or --download-f2cblaslapack to get the exact stack frames. And in the debugger look at the variables and address points to try to determine how it could have gone wrong. I know this is not necessarily a reasonable test but if you run the exact same thing twice does it crash at the same location in terms of iterations or does it seem to crash eventually "randomly" just after a long time? I understand the frustration with this kind of crash, it just shouldn't happen because the same BLAS calls have been made in the same way thousands of times and yet suddenly trouble and very hard to debug. Barry > On Aug 24, 2020, at 9:15 AM, Mark Lohry wrote: > > valgrind: I ran a much smaller case and didn't see any issues in valgrind. I'm only seeing this bus error on several hundred cores a few hours wallclock in, so it might not be feasible to run that in valgrind. > > blas: i'm not entirely sure -- it's the stock one in PUIAS linux (red hat derivative), libblas.so.3.4.2.. i'm going to try with intel and if that fails use the openblas downloaded via petsc and see if it alleviates itself. > > > > On Mon, Aug 24, 2020 at 9:48 AM Barry Smith > wrote: > > Mark, > > Can you run in valgrind? > > Exactly what BLAS are you using? > > Barry > > >> On Aug 24, 2020, at 7:54 AM, Mark Lohry > wrote: >> >> Reran with debug mode and got a stack trace for this bus error, looks like it's happening in BLASgemv, see pasted below. I did take care of the ISColoring leak mentioned previously, although that was a very small amount of data and I don't think is relevant here. >> >> At this point it's happily run 222 timesteps prior to this, so I'm a little mystified. Any ideas? >> >> Thanks, >> Mark >> >> >> 222 TS dt 0.03 time 6.66 >> 0 SNES Function norm 4.124287265556e+02 >> 0 KSP Residual norm 4.124287265556e+02 >> 1 KSP Residual norm 4.123248052318e+02 >> 2 KSP Residual norm 4.123173350456e+02 >> 3 KSP Residual norm 4.118769044110e+02 >> 4 KSP Residual norm 4.094856150740e+02 >> 5 KSP Residual norm 4.006000788078e+02 >> 6 KSP Residual norm 3.787922969183e+02 >> [clip] >> Linear solve converged due to CONVERGED_RTOL iterations 9 >> Line search: Using full step: fnorm 4.015236590684e+01 gnorm 3.173434863784e+00 >> 2 SNES Function norm 3.173434863784e+00 >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 >> 0 SNES Function norm 5.842010710080e+02 >> 0 KSP Residual norm 5.842010710080e+02 >> 1 KSP Residual norm 5.840526408234e+02 >> 2 KSP Residual norm 5.840431857354e+02 >> 3 KSP Residual norm 5.834351392302e+02 >> 4 KSP Residual norm 5.800901047861e+02 >> 5 KSP Residual norm 5.675562288567e+02 >> 6 KSP Residual norm 5.366287895681e+02 >> 7 KSP Residual norm 4.725811521866e+02 >> [911]PETSC ERROR: ------------------------------------------------------------------------ >> [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access >> [911]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [911]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >> [911]PETSC ERROR: likely location of problem given in stack below >> [911]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >> [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >> [911]PETSC ERROR: 
INSTEAD the line number of the start of the function >> [911]PETSC ERROR: is given. >> [911]PETSC ERROR: [911] BLASgemv line 1393 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >> [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >> [911]PETSC ERROR: [911] MatSolve line 3354 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c >> [911]PETSC ERROR: [911] PCApply_ILU line 201 /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >> [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >> [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >> [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c >> [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >> [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >> [911]PETSC ERROR: [911] PCApply_ASM line 441 /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c >> [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >> [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >> [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >> [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >> [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >> [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >> [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c >> [911]PETSC ERROR: [911] SNESSolve line 4403 /home/mlohry/build/external/petsc/src/snes/interface/snes.c >> [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c >> [911]PETSC ERROR: [911] TSStep line 3682 /home/mlohry/build/external/petsc/src/ts/interface/ts.c >> [911]PETSC ERROR: [911] TSSolve line 4005 /home/mlohry/build/external/petsc/src/ts/interface/ts.c >> [911]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [911]PETSC ERROR: Signal received >> [911]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >> [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by mlohry Sun Aug 23 19:54:21 2020 >> [911]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/local/openmpi/3.1.3/gcc/x8 >> [911]PETSC ERROR: #1 User provided function() line 0 in unknown file >> -------------------------------------------------------------------------- >> MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD >> >> On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry > wrote: >> Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()? 
>> >> I have matching ISColoringGet/Restore here, and it's only used prior to the first iteration so at least it doesn't seem to be growing. At the bottom I pasted the malloc_view and malloc_debug output from running 1 time step. >> >> I'm sort of thinking this might be a red herring -- is it possible the rank 0 process is chewing up dramatically more memory than others, like with logging or something? Like I mentioned earlier the total memory usage is well under the machine limits. I'll spring in some PetscMemoryGetMaximumUsage logging at every time step and try to get a big job going again. >> >> >> >> Are you using Fortran? >> >> C++ >> >> >> >> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >> [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 
0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >> [ 0]64 bytes ISColoringGetIS() line 266 in 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >> [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >> [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire process 719073280 >> [0] Memory usage sorted by function >> [0] 6 192 DMCoarsenHookAdd() >> [0] 2 9984 DMCreate() >> [0] 2 128 DMCreate_Shell() >> [0] 2 64 DMDSEnlarge_Static() >> [0] 1 672 DMKSPCreate() >> [0] 3 96 DMRefineHookAdd() >> [0] 3 2064 DMSNESCreate() >> [0] 4 128 DMSubDomainHookAdd() >> [0] 1 768 DMTSCreate() >> [0] 2 96 ISColoringCreate() >> [0] 8 12608 ISColoringGetIS() >> [0] 1 307200 ISConcatenate() >> [0] 29 25984 ISCreate() >> [0] 25 400 ISCreate_General() >> [0] 4 64 ISCreate_Stride() >> [0] 20 338016 ISGeneralSetIndices_General() >> [0] 3 921600 ISGetIndices_Stride() >> [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() >> [0] 1 6144 ISInvertPermutation_General() >> [0] 3 308576 ISLocalToGlobalMappingCreate() >> [0] 2 32 KSPConvergedDefaultCreate() >> [0] 2 2816 KSPCreate() >> [0] 1 224 KSPCreate_FGMRES() >> [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() >> [0] 2 16032 KSPSetUp_FGMRES() >> [0] 4 16084160 KSPSetUp_GMRES() >> [0] 2 36864 MatColoringApply_SL() >> [0] 1 656 MatColoringCreate() >> [0] 6 17088 MatCreate() >> [0] 1 16 MatCreateMFFD_WP() >> [0] 1 16 MatCreateSubMatrices_SeqBAIJ() >> [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() >> [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() >> [0] 2 1472 MatCreate_MFFD() >> [0] 1 416 MatCreate_SeqAIJ() >> [0] 3 864 MatCreate_SeqBAIJ() >> [0] 2 416 MatCreate_Shell() >> [0] 1 784 MatFDColoringCreate() >> [0] 2 12288 MatFDColoringDegreeSequence_Minpack() >> [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() >> [0] 3 42512 MatGetColumnIJ_SeqAIJ() >> [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() >> [0] 1 6144 MatGetOrdering_Natural() >> [0] 2 36384 MatGetRowIJ_SeqAIJ() >> [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() >> [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() >> [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() >> [0] 1 6144 MatMarkDiagonal_SeqAIJ() >> [0] 1 6144 MatMarkDiagonal_SeqBAIJ() >> [0] 8 256 MatRegisterRootName() >> [0] 1 6160 MatSeqAIJCheckInode() >> [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() >> [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() >> [0] 13 576 MatSolverTypeRegister() >> [0] 1 16 PCASMCreateSubdomains() >> [0] 2 1664 PCCreate() >> [0] 1 160 PCCreate_ASM() >> [0] 1 192 PCCreate_ILU() >> [0] 5 307264 PCSetUp_ASM() >> [0] 2 416 PetscBTCreate() >> [0] 2 3216 PetscClassPerfLogCreate() >> [0] 2 1616 PetscClassRegLogCreate() >> [0] 2 32 PetscCommBuildTwoSided_Allreduce() >> [0] 2 64 PetscCommDuplicate() >> [0] 2 1888 PetscDSCreate() >> [0] 2 26416 PetscEventPerfLogCreate() >> [0] 2 158400 PetscEventPerfLogEnsureSize() >> [0] 2 1616 PetscEventRegLogCreate() >> [0] 2 9600 PetscEventRegLogRegister() >> [0] 8 102400 PetscFreeSpaceGet() >> [0] 474 15168 PetscFunctionListAdd_Private() >> [0] 2 528 PetscIntStackCreate() >> [0] 142 11360 PetscLayoutCreate() >> [0] 56 896 PetscLayoutSetUp() >> [0] 59 9440 PetscObjectComposedDataIncreaseReal() >> [0] 2 576 PetscObjectListAdd() >> [0] 33 768 PetscOptionsGetEList() >> [0] 1 16 PetscOptionsHelpPrintedCreate() >> [0] 1 32 PetscPushSignalHandler() >> [0] 7 6944 PetscSFCreate() >> [0] 3 432 PetscSFCreate_Basic() >> [0] 2 1472 PetscSFLinkCreate() >> [0] 11 1229040 PetscSFSetUpRanks() >> [0] 7 614512 PetscSFSetUp_Basic() >> [0] 4 20096 PetscSegBufferCreate() >> [0] 2 1488 PetscSplitReductionCreate() >> [0] 2 3008 
PetscStageLogCreate() >> [0] 1148 23872 PetscStrallocpy() >> [0] 6 13056 PetscStrreplace() >> [0] 9 3456 PetscTableCreate() >> [0] 1 16 PetscViewerASCIIOpen() >> [0] 6 96 PetscViewerAndFormatCreate() >> [0] 1 752 PetscViewerCreate() >> [0] 1 96 PetscViewerCreate_ASCII() >> [0] 2 1424 SNESCreate() >> [0] 1 16 SNESCreate_NEWTONLS() >> [0] 1 1008 SNESLineSearchCreate() >> [0] 1 16 SNESLineSearchCreate_BT() >> [0] 16 1824 SNESMSRegister() >> [0] 46 9056 TSARKIMEXRegister() >> [0] 1 1264 TSAdaptCreate() >> [0] 8 384 TSBasicSymplecticRegister() >> [0] 1 2160 TSCreate() >> [0] 1 224 TSCreate_Theta() >> [0] 48 5968 TSGLEERegister() >> [0] 41 7728 TSRKRegister() >> [0] 89 14736 TSRosWRegister() >> [0] 71 110192 VecCreate() >> [0] 1 307200 VecCreateGhostWithArray() >> [0] 123 36874080 VecCreate_MPI_Private() >> [0] 7 4300800 VecCreate_Seq() >> [0] 8 256 VecCreate_Seq_Private() >> [0] 6 400 VecDuplicateVecs_Default() >> [0] 3 2352 VecScatterCreate() >> [0] 7 1843296 VecScatterSetUp_SF() >> [0] 126 2016 VecStashCreate_Private() >> [0] 1 3072 mapBlockColoringToJacobian() >> >> On Wed, Aug 12, 2020 at 4:22 PM Barry Smith > wrote: >> >> Yes, there are some PETSc objects or arrays that you are not freeing so they are printed at the end of the run. For small runs this harmless but if new objects/memory is allocated at each iteration and not suitably freed it will eventually add up. >> >> Run with -malloc_view (small problem with say 2 iterations) it will print everything allocated and might be helpful. >> >> Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()? >> >> It is also possible it is a leak in PETSc, but that is unlikely since we test for them. >> >> Are you using Fortran? >> >> Barry >> >> >>> On Aug 12, 2020, at 1:29 PM, Mark Lohry > wrote: >>> >>> Thanks Matt and Barry. At Matt's suggestion I ran a smaller representative case with valgrind and didn't see anything alarming (apart from a small leak in an older boost version I was using: https://github.com/boostorg/serialization/issues/104 although I don't think this was causing the issue). >>> >>> -malloc_debug dumps quite a lot, this is supposed to be empty right? Output pasted below. It looks like the same sequence of calls is repeated 8 times, which is how many nonlinear solves occurred in this particular run. Thoughts? 
>>> >>> >>> >>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 
0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>> [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>> >>> >>> >>> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith > wrote: >>> >>> Mark. 
>>> >>> When valgrind is not feasible (like on many centrally controlled batch systems) you can run PETSc with an extra flag to do some memory error checks >>> -malloc_debug >>> >>> this >>> >>> 1) fills all malloced memory with NaN so if the code is using uninitialized memory it may be detected and >>> 2) checks the beginning and end of each alloced memory region for out-of-bounds writes at each malloc and free. >>> >>> it will slow the code down a little bit but generally not a huge amount. >>> >>> It is nowhere near as good as valgrind or other memory corruption tools but it has the advantage you can run it anywhere on any size job. >>> >>> >>> Barry >>> >>> >>> >>> >>> >>>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley > wrote: >>>> >>>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry > wrote: >>>> I'm getting seemingly random failures of late: >>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>>> >>>> The first thing I would do is run valgrind on as wide an array of tests as you can. This will find problems >>>> on things that run completely fine. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> Symptoms: >>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >>>> 2) It doesn't happen right away -- this was running happily for several hours over several hundred time steps with no indication of bad health in the numerics >>>> 3) At least the total memory consumption seems to be within bounds, though I'm not sure about individual processes. e.g. slurm here reported Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >>>> 4) running the same setup twice it fails at different points >>>> >>>> Any suggestions on what to look for? This is a bit painful to work on as I can only reproduce it on large runs and then it's seemingly random. >>>> >>>> >>>> Thanks, >>>> Mark >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Mon Aug 24 09:54:57 2020 From: mlohry at gmail.com (Mark Lohry) Date: Mon, 24 Aug 2020 10:54:57 -0400 Subject: [petsc-users] Bus Error In-Reply-To: <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> Message-ID: Thanks Barry, I'll give -malloc_debug a shot. I know this is not necessarily a reasonable test but if you run the exact > same thing twice does it crash at the same location in terms of iterations > or does it seem to crash eventually "randomly" just after a long time? > Crashes after a different number of iterations, seemingly random. > > I understand the frustration with this kind of crash, it just shouldn't > happen because the same BLAS calls have been made in the same way thousands > of times and yet suddenly trouble and very hard to debug. > Eventually makes for a good war story. Thinking back, I have seen some disturbing memory behavior that I think traces back to my use of Eigen... e.g. in the past when running my full test suite a particular case would fail with NaNs, but if I ran that case in isolation it passed. I wonder if some object isn't getting properly aligned and at some point some kind of corruption occurs?
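To make that Eigen suspicion concrete, the pitfall I have in mind is the classic fixed-size vectorizable member problem. A minimal sketch follows -- the class and member names are made up purely for illustration (nothing from my actual code), and it assumes a pre-C++17 build where plain operator new does not hand back over-aligned storage:

#include <Eigen/Dense>
#include <vector>

// Hypothetical struct holding a fixed-size Eigen type that Eigen wants to
// access with aligned SIMD loads/stores (16-byte alignment for Matrix4d
// with SSE).
struct CellData {
  Eigen::Matrix4d flux_jacobian;   // fixed-size vectorizable member
  double wall_distance;

  // Without this macro (pre-C++17), "new CellData" can return storage with
  // only the default allocator alignment, and the aligned stores into
  // flux_jacobian may then fault or quietly corrupt neighboring memory.
  EIGEN_MAKE_ALIGNED_OPERATOR_NEW
};

int main() {
  CellData* c = new CellData();    // alignment guaranteed only via the macro
  c->flux_jacobian.setIdentity();

  // Containers need the aligned allocator for the same reason:
  std::vector<CellData, Eigen::aligned_allocator<CellData>> cells(100);

  delete c;
  return 0;
}

I'm not claiming this is what's biting me here, just that if one of these is lurking somewhere it would be consistent with corruption that only shows up occasionally on big runs.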
On Mon, Aug 24, 2020 at 10:35 AM Barry Smith wrote: > > Mark, > > Ok, I'd generally trust the stock BLAS for not failing over OpenBLAS. > > Since valgrind is not viable, have you tried -malloc_debug with the > bad case? It will be a little bit slower, but not too bad, and can find some > memory corruption issues. > > It might be useful to get the stack trace inside the BLAS to see exactly > where it crashes. If you ./configure with debugging and use > --download-fblaslapack or --download-f2cblaslapack it will compile the BLAS > with debugging, but just running a batch job still won't display the stack > frames inside the BLAS call. > > We have an option -on_error_attach_debugger which is useful for longer > many rank runs that attaches the debugger ONLY when the error is detected > but it may not play well with batch systems. But if you can make your run > on a non-batch system you might be able, along with the > --download-fblaslapack or --download-f2cblaslapack, to get the exact stack > frames. And in the debugger look at the variables and address points to try > to determine how it could have gone wrong. > > I know this is not necessarily a reasonable test but if you run the > exact same thing twice does it crash at the same location in terms of > iterations or does it seem to crash eventually "randomly" just after a long > time? > > I understand the frustration with this kind of crash, it just shouldn't > happen because the same BLAS calls have been made in the same way thousands > of times and yet suddenly trouble and very hard to debug. > > Barry > > > > > On Aug 24, 2020, at 9:15 AM, Mark Lohry wrote: > > valgrind: I ran a much smaller case and didn't see any issues in valgrind. > I'm only seeing this bus error on several hundred cores a few hours > wallclock in, so it might not be feasible to run that in valgrind. > > blas: I'm not entirely sure -- it's the stock one in PUIAS linux (red hat > derivative), libblas.so.3.4.2. I'm going to try with intel and if that > fails use the openblas downloaded via petsc and see if it alleviates itself. > > > > On Mon, Aug 24, 2020 at 9:48 AM Barry Smith wrote: > >> >> Mark, >> >> Can you run in valgrind? >> >> Exactly what BLAS are you using? >> >> Barry >> >> >> On Aug 24, 2020, at 7:54 AM, Mark Lohry wrote: >> >> Reran with debug mode and got a stack trace for this bus error, looks >> like it's happening in BLASgemv, see pasted below. I did take care of the >> ISColoring leak mentioned previously, although that was a very small amount >> of data and I don't think is relevant here. >> >> At this point it's happily run 222 timesteps prior to this, so I'm a >> little mystified. Any ideas?
>> >> Thanks, >> Mark >> >> >> 222 TS dt 0.03 time 6.66 >> 0 SNES Function norm 4.124287265556e+02 >> 0 KSP Residual norm 4.124287265556e+02 >> 1 KSP Residual norm 4.123248052318e+02 >> 2 KSP Residual norm 4.123173350456e+02 >> 3 KSP Residual norm 4.118769044110e+02 >> 4 KSP Residual norm 4.094856150740e+02 >> 5 KSP Residual norm 4.006000788078e+02 >> 6 KSP Residual norm 3.787922969183e+02 >> [clip] >> Linear solve converged due to CONVERGED_RTOL iterations 9 >> Line search: Using full step: fnorm 4.015236590684e+01 gnorm >> 3.173434863784e+00 >> 2 SNES Function norm 3.173434863784e+00 >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 >> 0 SNES Function norm 5.842010710080e+02 >> 0 KSP Residual norm 5.842010710080e+02 >> 1 KSP Residual norm 5.840526408234e+02 >> 2 KSP Residual norm 5.840431857354e+02 >> 3 KSP Residual norm 5.834351392302e+02 >> 4 KSP Residual norm 5.800901047861e+02 >> 5 KSP Residual norm 5.675562288567e+02 >> 6 KSP Residual norm 5.366287895681e+02 >> 7 KSP Residual norm 4.725811521866e+02 >> [911]PETSC ERROR: >> ------------------------------------------------------------------------ >> [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal >> memory access >> [911]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> [911]PETSC ERROR: or see >> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac >> OS X to find memory corruption errors >> [911]PETSC ERROR: likely location of problem given in stack below >> [911]PETSC ERROR: --------------------- Stack Frames >> ------------------------------------ >> [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not >> available, >> [911]PETSC ERROR: INSTEAD the line number of the start of the >> function >> [911]PETSC ERROR: is given. 
>> [911]PETSC ERROR: [911] BLASgemv line 1393 >> /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >> [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 >> /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >> [911]PETSC ERROR: [911] MatSolve line 3354 >> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c >> [911]PETSC ERROR: [911] PCApply_ILU line 201 >> /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >> [911]PETSC ERROR: [911] PCApply line 426 >> /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >> [911]PETSC ERROR: [911] KSP_PCApply line 279 >> /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >> [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 >> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c >> [911]PETSC ERROR: [911] KSPSolve_Private line 590 >> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >> [911]PETSC ERROR: [911] KSPSolve line 848 >> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >> [911]PETSC ERROR: [911] PCApply_ASM line 441 >> /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c >> [911]PETSC ERROR: [911] PCApply line 426 >> /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >> [911]PETSC ERROR: [911] KSP_PCApply line 279 >> /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >> [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 >> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >> [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 >> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >> [911]PETSC ERROR: [911] KSPSolve_Private line 590 >> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >> [911]PETSC ERROR: [911] KSPSolve line 848 >> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >> [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 >> /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c >> [911]PETSC ERROR: [911] SNESSolve line 4403 >> /home/mlohry/build/external/petsc/src/snes/interface/snes.c >> [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 >> /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c >> [911]PETSC ERROR: [911] TSStep line 3682 >> /home/mlohry/build/external/petsc/src/ts/interface/ts.c >> [911]PETSC ERROR: [911] TSSolve line 4005 >> /home/mlohry/build/external/petsc/src/ts/interface/ts.c >> [911]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [911]PETSC ERROR: Signal received >> [911]PETSC ERROR: See >> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >> shooting. >> [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >> [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by >> mlohry Sun Aug 23 19:54:21 2020 >> [911]PETSC ERROR: Configure options >> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt >> --with-cc=/usr/local/openmpi/3.1.3/gcc/x8 >> [911]PETSC ERROR: #1 User provided function() line 0 in unknown file >> -------------------------------------------------------------------------- >> MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD >> >> On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry wrote: >> >>> Perhaps you are calling ISColoringGetIS() and not calling >>>> ISColoringRestoreIS()? 
>>>> >>> >>> I have matching ISColoringGet/Restore here, and it's only used prior to >>> the first iteration so at least it doesn't seem to be growing. At the >>> bottom I pasted the malloc_view and malloc_debug output from running 1 time >>> step. >>> >>> I'm sort of thinking this might be a red herring -- is it possible the >>> rank 0 process is chewing up dramatically more memory than others, like >>> with logging or something? Like I mentioned earlier the total memory usage >>> is well under the machine limits. I'll spring in some >>> PetscMemoryGetMaximumUsage logging at every time step and try to get a big >>> job going again. >>> >>> >>> >>> Are you using Fortran? >>>> >>> >>> C++ >>> >>> >>> >>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes 
ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in >>> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]64 bytes ISColoringGetIS() line 266 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>> [ 0]32 bytes PetscCommDuplicate() line 129 in >>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>> [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire >>> process 719073280 >>> [0] Memory usage sorted by function >>> [0] 6 192 DMCoarsenHookAdd() >>> [0] 2 9984 DMCreate() >>> [0] 2 128 DMCreate_Shell() >>> [0] 2 64 DMDSEnlarge_Static() >>> [0] 1 672 DMKSPCreate() >>> [0] 3 96 DMRefineHookAdd() >>> [0] 3 2064 DMSNESCreate() >>> [0] 4 128 DMSubDomainHookAdd() >>> [0] 1 768 DMTSCreate() >>> [0] 2 96 ISColoringCreate() >>> [0] 8 12608 ISColoringGetIS() >>> [0] 1 307200 ISConcatenate() >>> [0] 29 25984 ISCreate() >>> [0] 25 400 ISCreate_General() >>> [0] 4 64 ISCreate_Stride() >>> [0] 20 338016 ISGeneralSetIndices_General() >>> [0] 3 921600 ISGetIndices_Stride() >>> [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() >>> [0] 1 6144 ISInvertPermutation_General() >>> [0] 3 308576 ISLocalToGlobalMappingCreate() >>> [0] 2 32 KSPConvergedDefaultCreate() >>> [0] 2 2816 KSPCreate() >>> [0] 1 224 KSPCreate_FGMRES() >>> [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() >>> [0] 2 16032 KSPSetUp_FGMRES() >>> [0] 4 16084160 KSPSetUp_GMRES() >>> [0] 2 36864 MatColoringApply_SL() >>> [0] 1 656 MatColoringCreate() >>> [0] 6 17088 MatCreate() >>> [0] 1 16 MatCreateMFFD_WP() >>> [0] 1 16 MatCreateSubMatrices_SeqBAIJ() >>> [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() >>> [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() >>> [0] 2 1472 MatCreate_MFFD() >>> [0] 1 416 MatCreate_SeqAIJ() >>> [0] 3 864 MatCreate_SeqBAIJ() >>> [0] 2 416 MatCreate_Shell() >>> [0] 1 784 MatFDColoringCreate() >>> [0] 2 12288 MatFDColoringDegreeSequence_Minpack() >>> [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() >>> [0] 3 42512 MatGetColumnIJ_SeqAIJ() >>> [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() >>> [0] 1 6144 MatGetOrdering_Natural() >>> [0] 2 36384 MatGetRowIJ_SeqAIJ() >>> [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() >>> [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() >>> [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() >>> [0] 1 6144 MatMarkDiagonal_SeqAIJ() >>> [0] 1 6144 MatMarkDiagonal_SeqBAIJ() >>> [0] 8 256 MatRegisterRootName() >>> [0] 1 6160 MatSeqAIJCheckInode() >>> [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() >>> [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() >>> [0] 13 576 MatSolverTypeRegister() >>> [0] 1 16 PCASMCreateSubdomains() >>> [0] 2 1664 PCCreate() >>> [0] 1 160 PCCreate_ASM() >>> [0] 1 192 PCCreate_ILU() >>> [0] 5 307264 PCSetUp_ASM() >>> [0] 2 416 PetscBTCreate() >>> [0] 2 3216 PetscClassPerfLogCreate() >>> [0] 2 1616 PetscClassRegLogCreate() >>> [0] 2 32 PetscCommBuildTwoSided_Allreduce() >>> [0] 2 64 PetscCommDuplicate() >>> [0] 2 1888 PetscDSCreate() >>> [0] 2 26416 PetscEventPerfLogCreate() >>> [0] 2 158400 PetscEventPerfLogEnsureSize() >>> [0] 2 1616 PetscEventRegLogCreate() >>> [0] 2 9600 PetscEventRegLogRegister() >>> [0] 8 102400 PetscFreeSpaceGet() >>> [0] 474 15168 PetscFunctionListAdd_Private() >>> 
[0] 2 528 PetscIntStackCreate() >>> [0] 142 11360 PetscLayoutCreate() >>> [0] 56 896 PetscLayoutSetUp() >>> [0] 59 9440 PetscObjectComposedDataIncreaseReal() >>> [0] 2 576 PetscObjectListAdd() >>> [0] 33 768 PetscOptionsGetEList() >>> [0] 1 16 PetscOptionsHelpPrintedCreate() >>> [0] 1 32 PetscPushSignalHandler() >>> [0] 7 6944 PetscSFCreate() >>> [0] 3 432 PetscSFCreate_Basic() >>> [0] 2 1472 PetscSFLinkCreate() >>> [0] 11 1229040 PetscSFSetUpRanks() >>> [0] 7 614512 PetscSFSetUp_Basic() >>> [0] 4 20096 PetscSegBufferCreate() >>> [0] 2 1488 PetscSplitReductionCreate() >>> [0] 2 3008 PetscStageLogCreate() >>> [0] 1148 23872 PetscStrallocpy() >>> [0] 6 13056 PetscStrreplace() >>> [0] 9 3456 PetscTableCreate() >>> [0] 1 16 PetscViewerASCIIOpen() >>> [0] 6 96 PetscViewerAndFormatCreate() >>> [0] 1 752 PetscViewerCreate() >>> [0] 1 96 PetscViewerCreate_ASCII() >>> [0] 2 1424 SNESCreate() >>> [0] 1 16 SNESCreate_NEWTONLS() >>> [0] 1 1008 SNESLineSearchCreate() >>> [0] 1 16 SNESLineSearchCreate_BT() >>> [0] 16 1824 SNESMSRegister() >>> [0] 46 9056 TSARKIMEXRegister() >>> [0] 1 1264 TSAdaptCreate() >>> [0] 8 384 TSBasicSymplecticRegister() >>> [0] 1 2160 TSCreate() >>> [0] 1 224 TSCreate_Theta() >>> [0] 48 5968 TSGLEERegister() >>> [0] 41 7728 TSRKRegister() >>> [0] 89 14736 TSRosWRegister() >>> [0] 71 110192 VecCreate() >>> [0] 1 307200 VecCreateGhostWithArray() >>> [0] 123 36874080 VecCreate_MPI_Private() >>> [0] 7 4300800 VecCreate_Seq() >>> [0] 8 256 VecCreate_Seq_Private() >>> [0] 6 400 VecDuplicateVecs_Default() >>> [0] 3 2352 VecScatterCreate() >>> [0] 7 1843296 VecScatterSetUp_SF() >>> [0] 126 2016 VecStashCreate_Private() >>> [0] 1 3072 mapBlockColoringToJacobian() >>> >>> On Wed, Aug 12, 2020 at 4:22 PM Barry Smith wrote: >>> >>>> >>>> Yes, there are some PETSc objects or arrays that you are not freeing >>>> so they are printed at the end of the run. For small runs this harmless but >>>> if new objects/memory is allocated at each iteration and not suitably freed >>>> it will eventually add up. >>>> >>>> Run with -malloc_view (small problem with say 2 iterations) it will >>>> print everything allocated and might be helpful. >>>> >>>> Perhaps you are calling ISColoringGetIS() and not calling >>>> ISColoringRestoreIS()? >>>> >>>> It is also possible it is a leak in PETSc, but that is unlikely >>>> since we test for them. >>>> >>>> Are you using Fortran? >>>> >>>> Barry >>>> >>>> >>>> On Aug 12, 2020, at 1:29 PM, Mark Lohry wrote: >>>> >>>> Thanks Matt and Barry. At Matt's suggestion I ran a smaller >>>> representative case with valgrind and didn't see anything alarming (apart >>>> from a small leak in an older boost version I was using: >>>> https://github.com/boostorg/serialization/issues/104 although I don't >>>> think this was causing the issue). >>>> >>>> -malloc_debug dumps quite a lot, this is supposed to be empty right? >>>> Output pasted below. It looks like the same sequence of calls is repeated 8 >>>> times, which is how many nonlinear solves occurred in this particular run. >>>> Thoughts? 
>>>> >>>> >>>> >>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c 
>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 
bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]64 bytes ISColoringGetIS() line 266 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>>> [ 0]32 bytes PetscCommDuplicate() line 129 in >>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>>> >>>> >>>> >>>> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith wrote: >>>> >>>>> >>>>> Mark. >>>>> >>>>> When valgrind is not feasible (like on many centrally controlled >>>>> batch systems) you can run PETSc with an extra flag to do some memory error >>>>> checks >>>>> -malloc_debug >>>>> >>>>> this >>>>> >>>>> 1) fills all malloced memory with Nan so if the code is using >>>>> uninitialized memory it may be detected and >>>>> 2) checks the beginning and end of each alloced memory region for >>>>> out-of-bounds writes at each malloc and free. >>>>> >>>>> it will slow the code down a little bit but generally not a huge >>>>> amount. >>>>> >>>>> It is no where near as good as valgrind or other memory corruption >>>>> tools but it has the advantage you can run it anywhere on any size job. >>>>> >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley >>>>> wrote: >>>>> >>>>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry wrote: >>>>> >>>>>> I'm getting seemingly random failures of late: >>>>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>>>>> >>>>> >>>>> The first thing I would do is run valgrind on as wide an array of >>>>> tests as you can. This will find problems >>>>> on things that run completely fine. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Symptoms: >>>>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >>>>>> 2) It doesn't happen right away -- this was running happily for >>>>>> several hours over several hundred time steps with no indication of bad >>>>>> health in the numerics >>>>>> 3) At least the total memory consumption seems to be within bounds, >>>>>> though I'm not sure about individual processes. e.g. slurm here reported >>>>>> Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >>>>>> 4) running the same setup twice it fails at different points >>>>>> >>>>>> Any suggestions on what to look for? This is a bit painful to work on >>>>>> as I can only reproduce it on large runs and then it's seemingly random. >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>>> >>>>> >>>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 24 10:10:16 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 24 Aug 2020 11:10:16 -0400 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> Message-ID: On Mon, Aug 24, 2020 at 10:56 AM Mark Lohry wrote: > Thanks Barry, I'll give -malloc_debug a shot. > > I know this is not necessarily a reasonable test but if you run the >> exact same thing twice does it crash at the same location in terms of >> iterations or does it seem to crash eventually "randomly" just after a long >> time? >> > > Crashes after a different number of iterations, seemingly random. > > >> >> I understand the frustration with this kind of crash, it just shouldn't >> happen because the same BLAS calls have been made in the same way thousands >> of times and yet suddenly trouble and very hard to debug. 
>> > > Eventually makes for a good war story. > > Thinking back, I have seen some disturbing memory behavior that I think > falls back to my use of eigen... e.g. in the past when running my full test > suite a particular case would fail with NaNs, but if I ran that case in > isolation it passes. I wonder if some object isn't getting properly aligned > and at some point some kind of corruption occurs? > Do you ever use regular malloc()? PETSc malloc aligns automatically, but the system one does not. Thanks, Matt > On Mon, Aug 24, 2020 at 10:35 AM Barry Smith wrote: > >> >> Mark, >> >> Ok, I'd generally trust the stock BLAS for not failing over OpenBLAS. >> >> Since valgrind is not viable have you tried with -malloc_debug with the >> bad case it will be a little bit slower but not to bad and can find some >> memory corruption issues. >> >> It might be useful to get the stack trace inside the BLAS to see >> exactly where it crashes. If you ./configure with debugging and use >> --download-fblaslapack or --download-f2cblaslapack it will compile the BLAS >> with debugging, but just running a batch job still won't display the stack >> frames inside the BLAS call. >> >> We have an option -on_error_attach_debugger which is useful for longer >> many rank runs that attaches the debugger ONLY when the error is detected >> but it may not play well with batch systems. But if you can make your run >> on a non-batch system it might be able, along with the >> --download-fblaslapack or --download-f2cblaslapack to get the exact stack >> frames. And in the debugger look at the variables and address points to try >> to determine how it could have gone wrong. >> >> I know this is not necessarily a reasonable test but if you run the >> exact same thing twice does it crash at the same location in terms of >> iterations or does it seem to crash eventually "randomly" just after a long >> time? >> >> I understand the frustration with this kind of crash, it just shouldn't >> happen because the same BLAS calls have been made in the same way thousands >> of times and yet suddenly trouble and very hard to debug. >> >> Barry >> >> >> >> >> On Aug 24, 2020, at 9:15 AM, Mark Lohry wrote: >> >> valgrind: I ran a much smaller case and didn't see any issues in >> valgrind. I'm only seeing this bus error on several hundred cores a few >> hours wallclock in, so it might not be feasible to run that in valgrind. >> >> blas: i'm not entirely sure -- it's the stock one in PUIAS linux (red hat >> derivative), libblas.so.3.4.2.. i'm going to try with intel and if that >> fails use the openblas downloaded via petsc and see if it alleviates itself. >> >> >> >> On Mon, Aug 24, 2020 at 9:48 AM Barry Smith wrote: >> >>> >>> Mark, >>> >>> Can you run in valgrind? >>> >>> Exactly what BLAS are you using? >>> >>> Barry >>> >>> >>> On Aug 24, 2020, at 7:54 AM, Mark Lohry wrote: >>> >>> Reran with debug mode and got a stack trace for this bus error, looks >>> like it's happening in BLASgemv, see pasted below. I did take care of the >>> ISColoring leak mentioned previously, although that was a very small amount >>> of data and I don't think is relevant here. >>> >>> At this point it's happily run 222 timesteps prior to this, so I'm a >>> little mystified. Any ideas? 
>>> >>> Thanks, >>> Mark >>> >>> >>> 222 TS dt 0.03 time 6.66 >>> 0 SNES Function norm 4.124287265556e+02 >>> 0 KSP Residual norm 4.124287265556e+02 >>> 1 KSP Residual norm 4.123248052318e+02 >>> 2 KSP Residual norm 4.123173350456e+02 >>> 3 KSP Residual norm 4.118769044110e+02 >>> 4 KSP Residual norm 4.094856150740e+02 >>> 5 KSP Residual norm 4.006000788078e+02 >>> 6 KSP Residual norm 3.787922969183e+02 >>> [clip] >>> Linear solve converged due to CONVERGED_RTOL iterations 9 >>> Line search: Using full step: fnorm 4.015236590684e+01 gnorm >>> 3.173434863784e+00 >>> 2 SNES Function norm 3.173434863784e+00 >>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 >>> 0 SNES Function norm 5.842010710080e+02 >>> 0 KSP Residual norm 5.842010710080e+02 >>> 1 KSP Residual norm 5.840526408234e+02 >>> 2 KSP Residual norm 5.840431857354e+02 >>> 3 KSP Residual norm 5.834351392302e+02 >>> 4 KSP Residual norm 5.800901047861e+02 >>> 5 KSP Residual norm 5.675562288567e+02 >>> 6 KSP Residual norm 5.366287895681e+02 >>> 7 KSP Residual norm 4.725811521866e+02 >>> [911]PETSC ERROR: >>> ------------------------------------------------------------------------ >>> [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly >>> illegal memory access >>> [911]PETSC ERROR: Try option -start_in_debugger or >>> -on_error_attach_debugger >>> [911]PETSC ERROR: or see >>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>> [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac >>> OS X to find memory corruption errors >>> [911]PETSC ERROR: likely location of problem given in stack below >>> [911]PETSC ERROR: --------------------- Stack Frames >>> ------------------------------------ >>> [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not >>> available, >>> [911]PETSC ERROR: INSTEAD the line number of the start of the >>> function >>> [911]PETSC ERROR: is given. 
>>> [911]PETSC ERROR: [911] BLASgemv line 1393 >>> /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >>> [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 >>> /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >>> [911]PETSC ERROR: [911] MatSolve line 3354 >>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c >>> [911]PETSC ERROR: [911] PCApply_ILU line 201 >>> /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >>> [911]PETSC ERROR: [911] PCApply line 426 >>> /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >>> [911]PETSC ERROR: [911] KSP_PCApply line 279 >>> /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >>> [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 >>> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c >>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 >>> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>> [911]PETSC ERROR: [911] KSPSolve line 848 >>> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>> [911]PETSC ERROR: [911] PCApply_ASM line 441 >>> /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c >>> [911]PETSC ERROR: [911] PCApply line 426 >>> /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >>> [911]PETSC ERROR: [911] KSP_PCApply line 279 >>> /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >>> [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 >>> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>> [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 >>> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 >>> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>> [911]PETSC ERROR: [911] KSPSolve line 848 >>> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>> [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 >>> /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c >>> [911]PETSC ERROR: [911] SNESSolve line 4403 >>> /home/mlohry/build/external/petsc/src/snes/interface/snes.c >>> [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 >>> /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c >>> [911]PETSC ERROR: [911] TSStep line 3682 >>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c >>> [911]PETSC ERROR: [911] TSSolve line 4005 >>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c >>> [911]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [911]PETSC ERROR: Signal received >>> [911]PETSC ERROR: See >>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>> shooting. >>> [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>> [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by >>> mlohry Sun Aug 23 19:54:21 2020 >>> [911]PETSC ERROR: Configure options >>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt >>> --with-cc=/usr/local/openmpi/3.1.3/gcc/x8 >>> [911]PETSC ERROR: #1 User provided function() line 0 in unknown file >>> >>> -------------------------------------------------------------------------- >>> MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD >>> >>> On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry wrote: >>> >>>> Perhaps you are calling ISColoringGetIS() and not calling >>>>> ISColoringRestoreIS()? 
>>>>> >>>> >>>> I have matching ISColoringGet/Restore here, and it's only used prior to >>>> the first iteration so at least it doesn't seem to be growing. At the >>>> bottom I pasted the malloc_view and malloc_debug output from running 1 time >>>> step. >>>> >>>> I'm sort of thinking this might be a red herring -- is it possible the >>>> rank 0 process is chewing up dramatically more memory than others, like >>>> with logging or something? Like I mentioned earlier the total memory usage >>>> is well under the machine limits. I'll spring in some >>>> PetscMemoryGetMaximumUsage logging at every time step and try to get a big >>>> job going again. >>>> >>>> >>>> >>>> Are you using Fortran? >>>>> >>>> >>>> C++ >>>> >>>> >>>> >>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() 
line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]64 bytes ISColoringGetIS() line 266 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>>> [ 0]32 bytes PetscCommDuplicate() line 129 in >>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>>> [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire >>>> process 719073280 >>>> [0] Memory usage sorted by function >>>> [0] 6 192 DMCoarsenHookAdd() >>>> [0] 2 9984 DMCreate() >>>> [0] 2 128 DMCreate_Shell() >>>> [0] 2 64 DMDSEnlarge_Static() >>>> [0] 1 672 DMKSPCreate() >>>> [0] 3 96 DMRefineHookAdd() >>>> [0] 3 2064 DMSNESCreate() >>>> [0] 4 128 DMSubDomainHookAdd() >>>> [0] 1 768 DMTSCreate() >>>> [0] 2 96 ISColoringCreate() >>>> [0] 8 12608 ISColoringGetIS() >>>> [0] 1 307200 ISConcatenate() >>>> [0] 29 25984 ISCreate() >>>> [0] 25 400 ISCreate_General() >>>> [0] 4 64 ISCreate_Stride() >>>> [0] 20 338016 ISGeneralSetIndices_General() >>>> [0] 3 921600 ISGetIndices_Stride() >>>> [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() >>>> [0] 1 6144 ISInvertPermutation_General() >>>> [0] 3 308576 ISLocalToGlobalMappingCreate() >>>> [0] 2 32 KSPConvergedDefaultCreate() >>>> [0] 2 2816 KSPCreate() >>>> [0] 1 224 KSPCreate_FGMRES() >>>> [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() >>>> [0] 2 16032 KSPSetUp_FGMRES() >>>> [0] 4 16084160 KSPSetUp_GMRES() >>>> [0] 2 36864 MatColoringApply_SL() >>>> [0] 1 656 MatColoringCreate() >>>> [0] 6 17088 MatCreate() >>>> [0] 1 16 MatCreateMFFD_WP() >>>> [0] 1 16 MatCreateSubMatrices_SeqBAIJ() >>>> [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() >>>> [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() >>>> [0] 2 1472 MatCreate_MFFD() >>>> [0] 1 416 MatCreate_SeqAIJ() >>>> [0] 3 864 MatCreate_SeqBAIJ() >>>> [0] 2 416 MatCreate_Shell() >>>> [0] 1 784 MatFDColoringCreate() >>>> [0] 2 12288 MatFDColoringDegreeSequence_Minpack() >>>> [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() >>>> [0] 3 42512 MatGetColumnIJ_SeqAIJ() >>>> [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() >>>> [0] 1 6144 MatGetOrdering_Natural() >>>> [0] 2 36384 MatGetRowIJ_SeqAIJ() >>>> [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() >>>> [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() >>>> [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() >>>> [0] 1 6144 MatMarkDiagonal_SeqAIJ() >>>> [0] 1 6144 MatMarkDiagonal_SeqBAIJ() >>>> [0] 8 256 MatRegisterRootName() >>>> [0] 1 6160 MatSeqAIJCheckInode() >>>> [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() >>>> [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() >>>> [0] 13 576 MatSolverTypeRegister() >>>> [0] 1 16 PCASMCreateSubdomains() >>>> [0] 2 1664 PCCreate() >>>> [0] 1 160 PCCreate_ASM() >>>> [0] 1 192 PCCreate_ILU() >>>> [0] 5 307264 PCSetUp_ASM() >>>> [0] 2 416 PetscBTCreate() >>>> [0] 2 3216 PetscClassPerfLogCreate() >>>> [0] 2 1616 PetscClassRegLogCreate() >>>> [0] 2 32 PetscCommBuildTwoSided_Allreduce() >>>> [0] 2 64 PetscCommDuplicate() >>>> [0] 2 1888 PetscDSCreate() >>>> [0] 2 26416 PetscEventPerfLogCreate() >>>> [0] 2 158400 
PetscEventPerfLogEnsureSize() >>>> [0] 2 1616 PetscEventRegLogCreate() >>>> [0] 2 9600 PetscEventRegLogRegister() >>>> [0] 8 102400 PetscFreeSpaceGet() >>>> [0] 474 15168 PetscFunctionListAdd_Private() >>>> [0] 2 528 PetscIntStackCreate() >>>> [0] 142 11360 PetscLayoutCreate() >>>> [0] 56 896 PetscLayoutSetUp() >>>> [0] 59 9440 PetscObjectComposedDataIncreaseReal() >>>> [0] 2 576 PetscObjectListAdd() >>>> [0] 33 768 PetscOptionsGetEList() >>>> [0] 1 16 PetscOptionsHelpPrintedCreate() >>>> [0] 1 32 PetscPushSignalHandler() >>>> [0] 7 6944 PetscSFCreate() >>>> [0] 3 432 PetscSFCreate_Basic() >>>> [0] 2 1472 PetscSFLinkCreate() >>>> [0] 11 1229040 PetscSFSetUpRanks() >>>> [0] 7 614512 PetscSFSetUp_Basic() >>>> [0] 4 20096 PetscSegBufferCreate() >>>> [0] 2 1488 PetscSplitReductionCreate() >>>> [0] 2 3008 PetscStageLogCreate() >>>> [0] 1148 23872 PetscStrallocpy() >>>> [0] 6 13056 PetscStrreplace() >>>> [0] 9 3456 PetscTableCreate() >>>> [0] 1 16 PetscViewerASCIIOpen() >>>> [0] 6 96 PetscViewerAndFormatCreate() >>>> [0] 1 752 PetscViewerCreate() >>>> [0] 1 96 PetscViewerCreate_ASCII() >>>> [0] 2 1424 SNESCreate() >>>> [0] 1 16 SNESCreate_NEWTONLS() >>>> [0] 1 1008 SNESLineSearchCreate() >>>> [0] 1 16 SNESLineSearchCreate_BT() >>>> [0] 16 1824 SNESMSRegister() >>>> [0] 46 9056 TSARKIMEXRegister() >>>> [0] 1 1264 TSAdaptCreate() >>>> [0] 8 384 TSBasicSymplecticRegister() >>>> [0] 1 2160 TSCreate() >>>> [0] 1 224 TSCreate_Theta() >>>> [0] 48 5968 TSGLEERegister() >>>> [0] 41 7728 TSRKRegister() >>>> [0] 89 14736 TSRosWRegister() >>>> [0] 71 110192 VecCreate() >>>> [0] 1 307200 VecCreateGhostWithArray() >>>> [0] 123 36874080 VecCreate_MPI_Private() >>>> [0] 7 4300800 VecCreate_Seq() >>>> [0] 8 256 VecCreate_Seq_Private() >>>> [0] 6 400 VecDuplicateVecs_Default() >>>> [0] 3 2352 VecScatterCreate() >>>> [0] 7 1843296 VecScatterSetUp_SF() >>>> [0] 126 2016 VecStashCreate_Private() >>>> [0] 1 3072 mapBlockColoringToJacobian() >>>> >>>> On Wed, Aug 12, 2020 at 4:22 PM Barry Smith wrote: >>>> >>>>> >>>>> Yes, there are some PETSc objects or arrays that you are not >>>>> freeing so they are printed at the end of the run. For small runs this >>>>> harmless but if new objects/memory is allocated at each iteration and not >>>>> suitably freed it will eventually add up. >>>>> >>>>> Run with -malloc_view (small problem with say 2 iterations) it >>>>> will print everything allocated and might be helpful. >>>>> >>>>> Perhaps you are calling ISColoringGetIS() and not calling >>>>> ISColoringRestoreIS()? >>>>> >>>>> It is also possible it is a leak in PETSc, but that is unlikely >>>>> since we test for them. >>>>> >>>>> Are you using Fortran? >>>>> >>>>> Barry >>>>> >>>>> >>>>> On Aug 12, 2020, at 1:29 PM, Mark Lohry wrote: >>>>> >>>>> Thanks Matt and Barry. At Matt's suggestion I ran a smaller >>>>> representative case with valgrind and didn't see anything alarming (apart >>>>> from a small leak in an older boost version I was using: >>>>> https://github.com/boostorg/serialization/issues/104 although I >>>>> don't think this was causing the issue). >>>>> >>>>> -malloc_debug dumps quite a lot, this is supposed to be empty right? >>>>> Output pasted below. It looks like the same sequence of calls is repeated 8 >>>>> times, which is how many nonlinear solves occurred in this particular run. >>>>> Thoughts? 
>>>>> >>>>> >>>>> >>>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes 
PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]64 
bytes ISColoringGetIS() line 266 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>>>> [ 0]32 bytes PetscCommDuplicate() line 129 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>>>> >>>>> >>>>> >>>>> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith wrote: >>>>> >>>>>> >>>>>> Mark. >>>>>> >>>>>> When valgrind is not feasible (like on many centrally controlled >>>>>> batch systems) you can run PETSc with an extra flag to do some memory error >>>>>> checks >>>>>> -malloc_debug >>>>>> >>>>>> this >>>>>> >>>>>> 1) fills all malloced memory with Nan so if the code is using >>>>>> uninitialized memory it may be detected and >>>>>> 2) checks the beginning and end of each alloced memory region for >>>>>> out-of-bounds writes at each malloc and free. >>>>>> >>>>>> it will slow the code down a little bit but generally not a huge >>>>>> amount. >>>>>> >>>>>> It is no where near as good as valgrind or other memory corruption >>>>>> tools but it has the advantage you can run it anywhere on any size job. >>>>>> >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley >>>>>> wrote: >>>>>> >>>>>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry wrote: >>>>>> >>>>>>> I'm getting seemingly random failures of late: >>>>>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>>>>>> >>>>>> >>>>>> The first thing I would do is run valgrind on as wide an array of >>>>>> tests as you can. This will find problems >>>>>> on things that run completely fine. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Symptoms: >>>>>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >>>>>>> 2) It doesn't happen right away -- this was running happily for >>>>>>> several hours over several hundred time steps with no indication of bad >>>>>>> health in the numerics >>>>>>> 3) At least the total memory consumption seems to be within bounds, >>>>>>> though I'm not sure about individual processes. e.g. slurm here reported >>>>>>> Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >>>>>>> 4) running the same setup twice it fails at different points >>>>>>> >>>>>>> Any suggestions on what to look for? This is a bit painful to work >>>>>>> on as I can only reproduce it on large runs and then it's seemingly random. >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Mon Aug 24 10:15:28 2020 From: mlohry at gmail.com (Mark Lohry) Date: Mon, 24 Aug 2020 11:15:28 -0400 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> Message-ID: > > Do you ever use regular malloc()? PETSc malloc aligns automatically, but > the system one does not. Indirectly via new, yes. 
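To make the alignment point above concrete, here is a toy C++ sketch (illustrative only, not code from this thread; the 32-byte figure and the Packet type are made up for the example). Plain malloc()/operator new only guarantee alignof(std::max_align_t), typically 16 bytes, while an over-aligned type, such as the Eigen fixed-size vectorizable objects mentioned below, needs C++17 aligned new or an explicit aligned allocation.

#include <cstdint>
#include <cstdio>
#include <cstdlib>

struct alignas(32) Packet { double v[4]; };   // hypothetical over-aligned member

static bool aligned_to(const void *p, std::size_t a)
{
  return reinterpret_cast<std::uintptr_t>(p) % a == 0;
}

int main()
{
  void   *raw     = std::malloc(sizeof(Packet));               // only max_align_t alignment guaranteed
  Packet *via_new = new Packet;                                // C++17: operator new honors alignas(32)
  void   *forced  = std::aligned_alloc(32, sizeof(Packet));    // explicit 32-byte allocation (C++17)

  std::printf("malloc aligned: %d  new aligned: %d  aligned_alloc aligned: %d\n",
              (int)aligned_to(raw, 32), (int)aligned_to(via_new, 32), (int)aligned_to(forced, 32));

  std::free(raw);
  delete via_new;
  std::free(forced);
  return 0;
}

Under C++17 plain new already handles over-aligned types, which is one reason such problems tend to surface only with older toolchains or with allocations that bypass the type's alignment; on the PETSc side, PetscMalloc1() aligns its arrays to PETSC_MEMALIGN.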
On Mon, Aug 24, 2020 at 11:10 AM Matthew Knepley wrote: > On Mon, Aug 24, 2020 at 10:56 AM Mark Lohry wrote: > >> Thanks Barry, I'll give -malloc_debug a shot. >> >> I know this is not necessarily a reasonable test but if you run the >>> exact same thing twice does it crash at the same location in terms of >>> iterations or does it seem to crash eventually "randomly" just after a long >>> time? >>> >> >> Crashes after a different number of iterations, seemingly random. >> >> >>> >>> I understand the frustration with this kind of crash, it just >>> shouldn't happen because the same BLAS calls have been made in the same way >>> thousands of times and yet suddenly trouble and very hard to debug. >>> >> >> Eventually makes for a good war story. >> >> Thinking back, I have seen some disturbing memory behavior that I think >> falls back to my use of eigen... e.g. in the past when running my full test >> suite a particular case would fail with NaNs, but if I ran that case in >> isolation it passes. I wonder if some object isn't getting properly aligned >> and at some point some kind of corruption occurs? >> > > Do you ever use regular malloc()? PETSc malloc aligns automatically, but > the system one does not. > > Thanks, > > Matt > > >> On Mon, Aug 24, 2020 at 10:35 AM Barry Smith wrote: >> >>> >>> Mark, >>> >>> Ok, I'd generally trust the stock BLAS for not failing over OpenBLAS. >>> >>> Since valgrind is not viable have you tried with -malloc_debug with >>> the bad case it will be a little bit slower but not to bad and can find >>> some memory corruption issues. >>> >>> It might be useful to get the stack trace inside the BLAS to see >>> exactly where it crashes. If you ./configure with debugging and use >>> --download-fblaslapack or --download-f2cblaslapack it will compile the BLAS >>> with debugging, but just running a batch job still won't display the stack >>> frames inside the BLAS call. >>> >>> We have an option -on_error_attach_debugger which is useful for longer >>> many rank runs that attaches the debugger ONLY when the error is detected >>> but it may not play well with batch systems. But if you can make your run >>> on a non-batch system it might be able, along with the >>> --download-fblaslapack or --download-f2cblaslapack to get the exact stack >>> frames. And in the debugger look at the variables and address points to try >>> to determine how it could have gone wrong. >>> >>> I know this is not necessarily a reasonable test but if you run the >>> exact same thing twice does it crash at the same location in terms of >>> iterations or does it seem to crash eventually "randomly" just after a long >>> time? >>> >>> I understand the frustration with this kind of crash, it just >>> shouldn't happen because the same BLAS calls have been made in the same way >>> thousands of times and yet suddenly trouble and very hard to debug. >>> >>> Barry >>> >>> >>> >>> >>> On Aug 24, 2020, at 9:15 AM, Mark Lohry wrote: >>> >>> valgrind: I ran a much smaller case and didn't see any issues in >>> valgrind. I'm only seeing this bus error on several hundred cores a few >>> hours wallclock in, so it might not be feasible to run that in valgrind. >>> >>> blas: i'm not entirely sure -- it's the stock one in PUIAS linux (red >>> hat derivative), libblas.so.3.4.2.. i'm going to try with intel and if that >>> fails use the openblas downloaded via petsc and see if it alleviates itself. >>> >>> >>> >>> On Mon, Aug 24, 2020 at 9:48 AM Barry Smith wrote: >>> >>>> >>>> Mark, >>>> >>>> Can you run in valgrind? 
>>>> >>>> Exactly what BLAS are you using? >>>> >>>> Barry >>>> >>>> >>>> On Aug 24, 2020, at 7:54 AM, Mark Lohry wrote: >>>> >>>> Reran with debug mode and got a stack trace for this bus error, looks >>>> like it's happening in BLASgemv, see pasted below. I did take care of the >>>> ISColoring leak mentioned previously, although that was a very small amount >>>> of data and I don't think is relevant here. >>>> >>>> At this point it's happily run 222 timesteps prior to this, so I'm a >>>> little mystified. Any ideas? >>>> >>>> Thanks, >>>> Mark >>>> >>>> >>>> 222 TS dt 0.03 time 6.66 >>>> 0 SNES Function norm 4.124287265556e+02 >>>> 0 KSP Residual norm 4.124287265556e+02 >>>> 1 KSP Residual norm 4.123248052318e+02 >>>> 2 KSP Residual norm 4.123173350456e+02 >>>> 3 KSP Residual norm 4.118769044110e+02 >>>> 4 KSP Residual norm 4.094856150740e+02 >>>> 5 KSP Residual norm 4.006000788078e+02 >>>> 6 KSP Residual norm 3.787922969183e+02 >>>> [clip] >>>> Linear solve converged due to CONVERGED_RTOL iterations 9 >>>> Line search: Using full step: fnorm 4.015236590684e+01 gnorm >>>> 3.173434863784e+00 >>>> 2 SNES Function norm 3.173434863784e+00 >>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 >>>> 0 SNES Function norm 5.842010710080e+02 >>>> 0 KSP Residual norm 5.842010710080e+02 >>>> 1 KSP Residual norm 5.840526408234e+02 >>>> 2 KSP Residual norm 5.840431857354e+02 >>>> 3 KSP Residual norm 5.834351392302e+02 >>>> 4 KSP Residual norm 5.800901047861e+02 >>>> 5 KSP Residual norm 5.675562288567e+02 >>>> 6 KSP Residual norm 5.366287895681e+02 >>>> 7 KSP Residual norm 4.725811521866e+02 >>>> [911]PETSC ERROR: >>>> ------------------------------------------------------------------------ >>>> [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly >>>> illegal memory access >>>> [911]PETSC ERROR: Try option -start_in_debugger or >>>> -on_error_attach_debugger >>>> [911]PETSC ERROR: or see >>>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>>> [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple >>>> Mac OS X to find memory corruption errors >>>> [911]PETSC ERROR: likely location of problem given in stack below >>>> [911]PETSC ERROR: --------------------- Stack Frames >>>> ------------------------------------ >>>> [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not >>>> available, >>>> [911]PETSC ERROR: INSTEAD the line number of the start of the >>>> function >>>> [911]PETSC ERROR: is given. 
>>>> [911]PETSC ERROR: [911] BLASgemv line 1393 >>>> /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >>>> [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 >>>> /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >>>> [911]PETSC ERROR: [911] MatSolve line 3354 >>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c >>>> [911]PETSC ERROR: [911] PCApply_ILU line 201 >>>> /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >>>> [911]PETSC ERROR: [911] PCApply line 426 >>>> /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >>>> [911]PETSC ERROR: [911] KSP_PCApply line 279 >>>> /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >>>> [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 >>>> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c >>>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 >>>> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>>> [911]PETSC ERROR: [911] KSPSolve line 848 >>>> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>>> [911]PETSC ERROR: [911] PCApply_ASM line 441 >>>> /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c >>>> [911]PETSC ERROR: [911] PCApply line 426 >>>> /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >>>> [911]PETSC ERROR: [911] KSP_PCApply line 279 >>>> /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >>>> [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 >>>> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>>> [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 >>>> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 >>>> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>>> [911]PETSC ERROR: [911] KSPSolve line 848 >>>> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>>> [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 >>>> /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c >>>> [911]PETSC ERROR: [911] SNESSolve line 4403 >>>> /home/mlohry/build/external/petsc/src/snes/interface/snes.c >>>> [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 >>>> /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c >>>> [911]PETSC ERROR: [911] TSStep line 3682 >>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c >>>> [911]PETSC ERROR: [911] TSSolve line 4005 >>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c >>>> [911]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [911]PETSC ERROR: Signal received >>>> [911]PETSC ERROR: See >>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>> shooting. >>>> [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>>> [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by >>>> mlohry Sun Aug 23 19:54:21 2020 >>>> [911]PETSC ERROR: Configure options >>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt >>>> --with-cc=/usr/local/openmpi/3.1.3/gcc/x8 >>>> [911]PETSC ERROR: #1 User provided function() line 0 in unknown file >>>> >>>> -------------------------------------------------------------------------- >>>> MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD >>>> >>>> On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry wrote: >>>> >>>>> Perhaps you are calling ISColoringGetIS() and not calling >>>>>> ISColoringRestoreIS()? 
>>>>>> >>>>> >>>>> I have matching ISColoringGet/Restore here, and it's only used prior >>>>> to the first iteration so at least it doesn't seem to be growing. At the >>>>> bottom I pasted the malloc_view and malloc_debug output from running 1 time >>>>> step. >>>>> >>>>> I'm sort of thinking this might be a red herring -- is it possible the >>>>> rank 0 process is chewing up dramatically more memory than others, like >>>>> with logging or something? Like I mentioned earlier the total memory usage >>>>> is well under the machine limits. I'll spring in some >>>>> PetscMemoryGetMaximumUsage logging at every time step and try to get a big >>>>> job going again. >>>>> >>>>> >>>>> >>>>> Are you using Fortran? >>>>>> >>>>> >>>>> C++ >>>>> >>>>> >>>>> >>>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes 
PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>> [ 0]896 bytes ISCreate() line 37 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>> [ 0]64 bytes ISColoringGetIS() line 266 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>>>> [ 0]32 bytes PetscCommDuplicate() line 129 in >>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>>>> [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire >>>>> process 719073280 >>>>> [0] Memory usage sorted by function >>>>> [0] 6 192 DMCoarsenHookAdd() >>>>> [0] 2 9984 DMCreate() >>>>> [0] 2 128 DMCreate_Shell() >>>>> [0] 2 64 DMDSEnlarge_Static() >>>>> [0] 1 672 DMKSPCreate() >>>>> [0] 3 96 DMRefineHookAdd() >>>>> [0] 3 2064 DMSNESCreate() >>>>> [0] 4 128 DMSubDomainHookAdd() >>>>> [0] 1 768 DMTSCreate() >>>>> [0] 2 96 ISColoringCreate() >>>>> [0] 8 12608 ISColoringGetIS() >>>>> [0] 1 307200 ISConcatenate() >>>>> [0] 29 25984 ISCreate() >>>>> [0] 25 400 ISCreate_General() >>>>> [0] 4 64 ISCreate_Stride() >>>>> [0] 20 338016 ISGeneralSetIndices_General() >>>>> [0] 3 921600 ISGetIndices_Stride() >>>>> [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() >>>>> [0] 1 6144 ISInvertPermutation_General() >>>>> [0] 3 308576 ISLocalToGlobalMappingCreate() >>>>> [0] 2 32 KSPConvergedDefaultCreate() >>>>> [0] 2 2816 KSPCreate() >>>>> [0] 1 224 KSPCreate_FGMRES() >>>>> [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() >>>>> [0] 2 16032 KSPSetUp_FGMRES() >>>>> [0] 4 16084160 KSPSetUp_GMRES() >>>>> [0] 2 36864 MatColoringApply_SL() >>>>> [0] 1 656 MatColoringCreate() >>>>> [0] 6 17088 MatCreate() >>>>> [0] 1 16 MatCreateMFFD_WP() >>>>> [0] 1 16 MatCreateSubMatrices_SeqBAIJ() >>>>> [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() >>>>> [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() >>>>> [0] 2 1472 MatCreate_MFFD() >>>>> [0] 1 416 MatCreate_SeqAIJ() >>>>> [0] 3 864 MatCreate_SeqBAIJ() >>>>> [0] 2 416 MatCreate_Shell() >>>>> [0] 1 784 MatFDColoringCreate() >>>>> [0] 2 12288 MatFDColoringDegreeSequence_Minpack() >>>>> [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() >>>>> [0] 3 42512 MatGetColumnIJ_SeqAIJ() >>>>> [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() >>>>> [0] 1 6144 MatGetOrdering_Natural() >>>>> [0] 2 36384 MatGetRowIJ_SeqAIJ() >>>>> [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() >>>>> [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() >>>>> [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() >>>>> [0] 1 6144 MatMarkDiagonal_SeqAIJ() >>>>> [0] 1 6144 MatMarkDiagonal_SeqBAIJ() >>>>> [0] 8 256 MatRegisterRootName() >>>>> [0] 1 6160 MatSeqAIJCheckInode() >>>>> [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() >>>>> [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() >>>>> [0] 13 576 MatSolverTypeRegister() >>>>> [0] 1 16 PCASMCreateSubdomains() >>>>> [0] 2 1664 PCCreate() >>>>> [0] 1 160 PCCreate_ASM() >>>>> [0] 1 192 PCCreate_ILU() >>>>> 
[0] 5 307264 PCSetUp_ASM() >>>>> [0] 2 416 PetscBTCreate() >>>>> [0] 2 3216 PetscClassPerfLogCreate() >>>>> [0] 2 1616 PetscClassRegLogCreate() >>>>> [0] 2 32 PetscCommBuildTwoSided_Allreduce() >>>>> [0] 2 64 PetscCommDuplicate() >>>>> [0] 2 1888 PetscDSCreate() >>>>> [0] 2 26416 PetscEventPerfLogCreate() >>>>> [0] 2 158400 PetscEventPerfLogEnsureSize() >>>>> [0] 2 1616 PetscEventRegLogCreate() >>>>> [0] 2 9600 PetscEventRegLogRegister() >>>>> [0] 8 102400 PetscFreeSpaceGet() >>>>> [0] 474 15168 PetscFunctionListAdd_Private() >>>>> [0] 2 528 PetscIntStackCreate() >>>>> [0] 142 11360 PetscLayoutCreate() >>>>> [0] 56 896 PetscLayoutSetUp() >>>>> [0] 59 9440 PetscObjectComposedDataIncreaseReal() >>>>> [0] 2 576 PetscObjectListAdd() >>>>> [0] 33 768 PetscOptionsGetEList() >>>>> [0] 1 16 PetscOptionsHelpPrintedCreate() >>>>> [0] 1 32 PetscPushSignalHandler() >>>>> [0] 7 6944 PetscSFCreate() >>>>> [0] 3 432 PetscSFCreate_Basic() >>>>> [0] 2 1472 PetscSFLinkCreate() >>>>> [0] 11 1229040 PetscSFSetUpRanks() >>>>> [0] 7 614512 PetscSFSetUp_Basic() >>>>> [0] 4 20096 PetscSegBufferCreate() >>>>> [0] 2 1488 PetscSplitReductionCreate() >>>>> [0] 2 3008 PetscStageLogCreate() >>>>> [0] 1148 23872 PetscStrallocpy() >>>>> [0] 6 13056 PetscStrreplace() >>>>> [0] 9 3456 PetscTableCreate() >>>>> [0] 1 16 PetscViewerASCIIOpen() >>>>> [0] 6 96 PetscViewerAndFormatCreate() >>>>> [0] 1 752 PetscViewerCreate() >>>>> [0] 1 96 PetscViewerCreate_ASCII() >>>>> [0] 2 1424 SNESCreate() >>>>> [0] 1 16 SNESCreate_NEWTONLS() >>>>> [0] 1 1008 SNESLineSearchCreate() >>>>> [0] 1 16 SNESLineSearchCreate_BT() >>>>> [0] 16 1824 SNESMSRegister() >>>>> [0] 46 9056 TSARKIMEXRegister() >>>>> [0] 1 1264 TSAdaptCreate() >>>>> [0] 8 384 TSBasicSymplecticRegister() >>>>> [0] 1 2160 TSCreate() >>>>> [0] 1 224 TSCreate_Theta() >>>>> [0] 48 5968 TSGLEERegister() >>>>> [0] 41 7728 TSRKRegister() >>>>> [0] 89 14736 TSRosWRegister() >>>>> [0] 71 110192 VecCreate() >>>>> [0] 1 307200 VecCreateGhostWithArray() >>>>> [0] 123 36874080 VecCreate_MPI_Private() >>>>> [0] 7 4300800 VecCreate_Seq() >>>>> [0] 8 256 VecCreate_Seq_Private() >>>>> [0] 6 400 VecDuplicateVecs_Default() >>>>> [0] 3 2352 VecScatterCreate() >>>>> [0] 7 1843296 VecScatterSetUp_SF() >>>>> [0] 126 2016 VecStashCreate_Private() >>>>> [0] 1 3072 mapBlockColoringToJacobian() >>>>> >>>>> On Wed, Aug 12, 2020 at 4:22 PM Barry Smith wrote: >>>>> >>>>>> >>>>>> Yes, there are some PETSc objects or arrays that you are not >>>>>> freeing so they are printed at the end of the run. For small runs this >>>>>> harmless but if new objects/memory is allocated at each iteration and not >>>>>> suitably freed it will eventually add up. >>>>>> >>>>>> Run with -malloc_view (small problem with say 2 iterations) it >>>>>> will print everything allocated and might be helpful. >>>>>> >>>>>> Perhaps you are calling ISColoringGetIS() and not calling >>>>>> ISColoringRestoreIS()? >>>>>> >>>>>> It is also possible it is a leak in PETSc, but that is unlikely >>>>>> since we test for them. >>>>>> >>>>>> Are you using Fortran? >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> On Aug 12, 2020, at 1:29 PM, Mark Lohry wrote: >>>>>> >>>>>> Thanks Matt and Barry. At Matt's suggestion I ran a smaller >>>>>> representative case with valgrind and didn't see anything alarming (apart >>>>>> from a small leak in an older boost version I was using: >>>>>> https://github.com/boostorg/serialization/issues/104 although I >>>>>> don't think this was causing the issue). 
>>>>>> >>>>>> -malloc_debug dumps quite a lot, this is supposed to be empty right? >>>>>> Output pasted below. It looks like the same sequence of calls is repeated 8 >>>>>> times, which is how many nonlinear solves occurred in this particular run. >>>>>> Thoughts? >>>>>> >>>>>> >>>>>> >>>>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c 
>>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in 
>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]64 bytes ISColoringGetIS() line 266 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>>>>> [ 0]32 bytes PetscCommDuplicate() line 129 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith wrote: >>>>>> >>>>>>> >>>>>>> Mark. >>>>>>> >>>>>>> When valgrind is not feasible (like on many centrally controlled >>>>>>> batch systems) you can run PETSc with an extra flag to do some memory error >>>>>>> checks >>>>>>> -malloc_debug >>>>>>> >>>>>>> this >>>>>>> >>>>>>> 1) fills all malloced memory with Nan so if the code is using >>>>>>> uninitialized memory it may be detected and >>>>>>> 2) checks the beginning and end of each alloced memory region for >>>>>>> out-of-bounds writes at each malloc and free. >>>>>>> >>>>>>> it will slow the code down a little bit but generally not a huge >>>>>>> amount. >>>>>>> >>>>>>> It is no where near as good as valgrind or other memory corruption >>>>>>> tools but it has the advantage you can run it anywhere on any size job. >>>>>>> >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley >>>>>>> wrote: >>>>>>> >>>>>>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry wrote: >>>>>>> >>>>>>>> I'm getting seemingly random failures of late: >>>>>>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory >>>>>>>> access >>>>>>>> >>>>>>> >>>>>>> The first thing I would do is run valgrind on as wide an array of >>>>>>> tests as you can. This will find problems >>>>>>> on things that run completely fine. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> Symptoms: >>>>>>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >>>>>>>> 2) It doesn't happen right away -- this was running happily for >>>>>>>> several hours over several hundred time steps with no indication of bad >>>>>>>> health in the numerics >>>>>>>> 3) At least the total memory consumption seems to be within bounds, >>>>>>>> though I'm not sure about individual processes. e.g. slurm here reported >>>>>>>> Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >>>>>>>> 4) running the same setup twice it fails at different points >>>>>>>> >>>>>>>> Any suggestions on what to look for? This is a bit painful to work >>>>>>>> on as I can only reproduce it on large runs and then it's seemingly random. >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mark >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 24 10:25:40 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 24 Aug 2020 11:25:40 -0400 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> Message-ID: On Mon, Aug 24, 2020 at 11:15 AM Mark Lohry wrote: > Do you ever use regular malloc()? PETSc malloc aligns automatically, but >> the system one does not. > > > Indirectly via new, yes. > I would consider replacing those. Thanks, Matt > On Mon, Aug 24, 2020 at 11:10 AM Matthew Knepley > wrote: > >> On Mon, Aug 24, 2020 at 10:56 AM Mark Lohry wrote: >> >>> Thanks Barry, I'll give -malloc_debug a shot. >>> >>> I know this is not necessarily a reasonable test but if you run the >>>> exact same thing twice does it crash at the same location in terms of >>>> iterations or does it seem to crash eventually "randomly" just after a long >>>> time? >>>> >>> >>> Crashes after a different number of iterations, seemingly random. >>> >>> >>>> >>>> I understand the frustration with this kind of crash, it just >>>> shouldn't happen because the same BLAS calls have been made in the same way >>>> thousands of times and yet suddenly trouble and very hard to debug. >>>> >>> >>> Eventually makes for a good war story. >>> >>> Thinking back, I have seen some disturbing memory behavior that I think >>> falls back to my use of eigen... e.g. in the past when running my full test >>> suite a particular case would fail with NaNs, but if I ran that case in >>> isolation it passes. I wonder if some object isn't getting properly aligned >>> and at some point some kind of corruption occurs? >>> >> >> Do you ever use regular malloc()? PETSc malloc aligns automatically, but >> the system one does not. >> >> Thanks, >> >> Matt >> >> >>> On Mon, Aug 24, 2020 at 10:35 AM Barry Smith wrote: >>> >>>> >>>> Mark, >>>> >>>> Ok, I'd generally trust the stock BLAS for not failing over OpenBLAS. >>>> >>>> Since valgrind is not viable have you tried with -malloc_debug with >>>> the bad case it will be a little bit slower but not to bad and can find >>>> some memory corruption issues. >>>> >>>> It might be useful to get the stack trace inside the BLAS to see >>>> exactly where it crashes. If you ./configure with debugging and use >>>> --download-fblaslapack or --download-f2cblaslapack it will compile the BLAS >>>> with debugging, but just running a batch job still won't display the stack >>>> frames inside the BLAS call. >>>> >>>> We have an option -on_error_attach_debugger which is useful for >>>> longer many rank runs that attaches the debugger ONLY when the error is >>>> detected but it may not play well with batch systems. But if you can make >>>> your run on a non-batch system it might be able, along with the >>>> --download-fblaslapack or --download-f2cblaslapack to get the exact stack >>>> frames. And in the debugger look at the variables and address points to try >>>> to determine how it could have gone wrong. >>>> >>>> I know this is not necessarily a reasonable test but if you run the >>>> exact same thing twice does it crash at the same location in terms of >>>> iterations or does it seem to crash eventually "randomly" just after a long >>>> time? 
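For what Matt's suggestion above would look like in practice, here is a minimal, purely hypothetical sketch (the names are illustrative, not from the actual code) of swapping a raw new[]/malloc() scratch buffer for PetscMalloc1/PetscFree, so the allocation picks up PETSc's alignment and is covered by the -malloc_debug NaN fill and guard checks:

#include <petscsys.h>

/* Hypothetical example, not from the code being debugged: a scratch buffer
   previously obtained with new[] or malloc(), now allocated through PETSc. */
static PetscErrorCode UseScratch(PetscInt n)
{
  PetscErrorCode ierr;
  PetscScalar    *work;

  PetscFunctionBegin;
  ierr = PetscMalloc1(n,&work);CHKERRQ(ierr);               /* was: work = new PetscScalar[n]; */
  ierr = PetscMemzero(work,n*sizeof(*work));CHKERRQ(ierr);  /* ... fill and use work ... */
  ierr = PetscFree(work);CHKERRQ(ierr);                     /* was: delete [] work; */
  PetscFunctionReturn(0);
}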
>>>> >>>> I understand the frustration with this kind of crash, it just >>>> shouldn't happen because the same BLAS calls have been made in the same way >>>> thousands of times and yet suddenly trouble and very hard to debug. >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>> On Aug 24, 2020, at 9:15 AM, Mark Lohry wrote: >>>> >>>> valgrind: I ran a much smaller case and didn't see any issues in >>>> valgrind. I'm only seeing this bus error on several hundred cores a few >>>> hours wallclock in, so it might not be feasible to run that in valgrind. >>>> >>>> blas: i'm not entirely sure -- it's the stock one in PUIAS linux (red >>>> hat derivative), libblas.so.3.4.2.. i'm going to try with intel and if that >>>> fails use the openblas downloaded via petsc and see if it alleviates itself. >>>> >>>> >>>> >>>> On Mon, Aug 24, 2020 at 9:48 AM Barry Smith wrote: >>>> >>>>> >>>>> Mark, >>>>> >>>>> Can you run in valgrind? >>>>> >>>>> Exactly what BLAS are you using? >>>>> >>>>> Barry >>>>> >>>>> >>>>> On Aug 24, 2020, at 7:54 AM, Mark Lohry wrote: >>>>> >>>>> Reran with debug mode and got a stack trace for this bus error, looks >>>>> like it's happening in BLASgemv, see pasted below. I did take care of the >>>>> ISColoring leak mentioned previously, although that was a very small amount >>>>> of data and I don't think is relevant here. >>>>> >>>>> At this point it's happily run 222 timesteps prior to this, so I'm a >>>>> little mystified. Any ideas? >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>> 222 TS dt 0.03 time 6.66 >>>>> 0 SNES Function norm 4.124287265556e+02 >>>>> 0 KSP Residual norm 4.124287265556e+02 >>>>> 1 KSP Residual norm 4.123248052318e+02 >>>>> 2 KSP Residual norm 4.123173350456e+02 >>>>> 3 KSP Residual norm 4.118769044110e+02 >>>>> 4 KSP Residual norm 4.094856150740e+02 >>>>> 5 KSP Residual norm 4.006000788078e+02 >>>>> 6 KSP Residual norm 3.787922969183e+02 >>>>> [clip] >>>>> Linear solve converged due to CONVERGED_RTOL iterations 9 >>>>> Line search: Using full step: fnorm 4.015236590684e+01 gnorm >>>>> 3.173434863784e+00 >>>>> 2 SNES Function norm 3.173434863784e+00 >>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations >>>>> 2 >>>>> 0 SNES Function norm 5.842010710080e+02 >>>>> 0 KSP Residual norm 5.842010710080e+02 >>>>> 1 KSP Residual norm 5.840526408234e+02 >>>>> 2 KSP Residual norm 5.840431857354e+02 >>>>> 3 KSP Residual norm 5.834351392302e+02 >>>>> 4 KSP Residual norm 5.800901047861e+02 >>>>> 5 KSP Residual norm 5.675562288567e+02 >>>>> 6 KSP Residual norm 5.366287895681e+02 >>>>> 7 KSP Residual norm 4.725811521866e+02 >>>>> [911]PETSC ERROR: >>>>> ------------------------------------------------------------------------ >>>>> [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly >>>>> illegal memory access >>>>> [911]PETSC ERROR: Try option -start_in_debugger or >>>>> -on_error_attach_debugger >>>>> [911]PETSC ERROR: or see >>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>>>> [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple >>>>> Mac OS X to find memory corruption errors >>>>> [911]PETSC ERROR: likely location of problem given in stack below >>>>> [911]PETSC ERROR: --------------------- Stack Frames >>>>> ------------------------------------ >>>>> [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not >>>>> available, >>>>> [911]PETSC ERROR: INSTEAD the line number of the start of the >>>>> function >>>>> [911]PETSC ERROR: is given. 
>>>>> [911]PETSC ERROR: [911] BLASgemv line 1393 >>>>> /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >>>>> [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 >>>>> /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >>>>> [911]PETSC ERROR: [911] MatSolve line 3354 >>>>> /home/mlohry/build/external/petsc/src/mat/interface/matrix.c >>>>> [911]PETSC ERROR: [911] PCApply_ILU line 201 >>>>> /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >>>>> [911]PETSC ERROR: [911] PCApply line 426 >>>>> /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >>>>> [911]PETSC ERROR: [911] KSP_PCApply line 279 >>>>> /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >>>>> [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 >>>>> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c >>>>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 >>>>> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [911]PETSC ERROR: [911] KSPSolve line 848 >>>>> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [911]PETSC ERROR: [911] PCApply_ASM line 441 >>>>> /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c >>>>> [911]PETSC ERROR: [911] PCApply line 426 >>>>> /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >>>>> [911]PETSC ERROR: [911] KSP_PCApply line 279 >>>>> /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >>>>> [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 >>>>> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>>>> [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 >>>>> /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>>>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 >>>>> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [911]PETSC ERROR: [911] KSPSolve line 848 >>>>> /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>>>> [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 >>>>> /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c >>>>> [911]PETSC ERROR: [911] SNESSolve line 4403 >>>>> /home/mlohry/build/external/petsc/src/snes/interface/snes.c >>>>> [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 >>>>> /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c >>>>> [911]PETSC ERROR: [911] TSStep line 3682 >>>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c >>>>> [911]PETSC ERROR: [911] TSSolve line 4005 >>>>> /home/mlohry/build/external/petsc/src/ts/interface/ts.c >>>>> [911]PETSC ERROR: --------------------- Error Message >>>>> -------------------------------------------------------------- >>>>> [911]PETSC ERROR: Signal received >>>>> [911]PETSC ERROR: See >>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble >>>>> shooting. 
>>>>> [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>>>> [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by >>>>> mlohry Sun Aug 23 19:54:21 2020 >>>>> [911]PETSC ERROR: Configure options >>>>> PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt >>>>> --with-cc=/usr/local/openmpi/3.1.3/gcc/x8 >>>>> [911]PETSC ERROR: #1 User provided function() line 0 in unknown file >>>>> >>>>> -------------------------------------------------------------------------- >>>>> MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD >>>>> >>>>> On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry wrote: >>>>> >>>>>> Perhaps you are calling ISColoringGetIS() and not calling >>>>>>> ISColoringRestoreIS()? >>>>>>> >>>>>> >>>>>> I have matching ISColoringGet/Restore here, and it's only used prior >>>>>> to the first iteration so at least it doesn't seem to be growing. At the >>>>>> bottom I pasted the malloc_view and malloc_debug output from running 1 time >>>>>> step. >>>>>> >>>>>> I'm sort of thinking this might be a red herring -- is it possible >>>>>> the rank 0 process is chewing up dramatically more memory than others, like >>>>>> with logging or something? Like I mentioned earlier the total memory usage >>>>>> is well under the machine limits. I'll spring in some >>>>>> PetscMemoryGetMaximumUsage logging at every time step and try to get a big >>>>>> job going again. >>>>>> >>>>>> >>>>>> >>>>>> Are you using Fortran? >>>>>>> >>>>>> >>>>>> C++ >>>>>> >>>>>> >>>>>> >>>>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() 
line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>> [ 0]64 bytes ISColoringGetIS() line 266 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>>>>> [ 0]32 bytes PetscCommDuplicate() line 129 in >>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>>>>> [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire >>>>>> process 719073280 >>>>>> [0] Memory usage sorted by function >>>>>> [0] 6 192 DMCoarsenHookAdd() >>>>>> [0] 2 9984 DMCreate() >>>>>> [0] 2 128 DMCreate_Shell() >>>>>> [0] 2 64 DMDSEnlarge_Static() >>>>>> [0] 1 672 DMKSPCreate() >>>>>> [0] 3 96 DMRefineHookAdd() >>>>>> [0] 3 2064 DMSNESCreate() >>>>>> [0] 4 128 DMSubDomainHookAdd() >>>>>> [0] 1 768 DMTSCreate() >>>>>> [0] 2 96 ISColoringCreate() >>>>>> [0] 8 12608 ISColoringGetIS() >>>>>> [0] 1 307200 ISConcatenate() >>>>>> [0] 29 25984 ISCreate() >>>>>> [0] 25 400 ISCreate_General() >>>>>> [0] 4 64 ISCreate_Stride() >>>>>> [0] 20 338016 ISGeneralSetIndices_General() >>>>>> [0] 3 921600 ISGetIndices_Stride() >>>>>> [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() >>>>>> [0] 1 6144 ISInvertPermutation_General() >>>>>> [0] 3 308576 ISLocalToGlobalMappingCreate() >>>>>> [0] 2 32 KSPConvergedDefaultCreate() >>>>>> [0] 2 2816 KSPCreate() >>>>>> [0] 1 224 KSPCreate_FGMRES() >>>>>> [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() >>>>>> [0] 2 16032 KSPSetUp_FGMRES() >>>>>> [0] 4 16084160 KSPSetUp_GMRES() >>>>>> [0] 2 36864 MatColoringApply_SL() >>>>>> [0] 1 656 MatColoringCreate() >>>>>> [0] 6 17088 MatCreate() >>>>>> [0] 1 16 MatCreateMFFD_WP() >>>>>> [0] 1 16 MatCreateSubMatrices_SeqBAIJ() >>>>>> [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() >>>>>> [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() >>>>>> [0] 2 1472 MatCreate_MFFD() >>>>>> [0] 1 416 MatCreate_SeqAIJ() >>>>>> [0] 3 
864 MatCreate_SeqBAIJ() >>>>>> [0] 2 416 MatCreate_Shell() >>>>>> [0] 1 784 MatFDColoringCreate() >>>>>> [0] 2 12288 MatFDColoringDegreeSequence_Minpack() >>>>>> [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() >>>>>> [0] 3 42512 MatGetColumnIJ_SeqAIJ() >>>>>> [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() >>>>>> [0] 1 6144 MatGetOrdering_Natural() >>>>>> [0] 2 36384 MatGetRowIJ_SeqAIJ() >>>>>> [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() >>>>>> [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() >>>>>> [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() >>>>>> [0] 1 6144 MatMarkDiagonal_SeqAIJ() >>>>>> [0] 1 6144 MatMarkDiagonal_SeqBAIJ() >>>>>> [0] 8 256 MatRegisterRootName() >>>>>> [0] 1 6160 MatSeqAIJCheckInode() >>>>>> [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() >>>>>> [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() >>>>>> [0] 13 576 MatSolverTypeRegister() >>>>>> [0] 1 16 PCASMCreateSubdomains() >>>>>> [0] 2 1664 PCCreate() >>>>>> [0] 1 160 PCCreate_ASM() >>>>>> [0] 1 192 PCCreate_ILU() >>>>>> [0] 5 307264 PCSetUp_ASM() >>>>>> [0] 2 416 PetscBTCreate() >>>>>> [0] 2 3216 PetscClassPerfLogCreate() >>>>>> [0] 2 1616 PetscClassRegLogCreate() >>>>>> [0] 2 32 PetscCommBuildTwoSided_Allreduce() >>>>>> [0] 2 64 PetscCommDuplicate() >>>>>> [0] 2 1888 PetscDSCreate() >>>>>> [0] 2 26416 PetscEventPerfLogCreate() >>>>>> [0] 2 158400 PetscEventPerfLogEnsureSize() >>>>>> [0] 2 1616 PetscEventRegLogCreate() >>>>>> [0] 2 9600 PetscEventRegLogRegister() >>>>>> [0] 8 102400 PetscFreeSpaceGet() >>>>>> [0] 474 15168 PetscFunctionListAdd_Private() >>>>>> [0] 2 528 PetscIntStackCreate() >>>>>> [0] 142 11360 PetscLayoutCreate() >>>>>> [0] 56 896 PetscLayoutSetUp() >>>>>> [0] 59 9440 PetscObjectComposedDataIncreaseReal() >>>>>> [0] 2 576 PetscObjectListAdd() >>>>>> [0] 33 768 PetscOptionsGetEList() >>>>>> [0] 1 16 PetscOptionsHelpPrintedCreate() >>>>>> [0] 1 32 PetscPushSignalHandler() >>>>>> [0] 7 6944 PetscSFCreate() >>>>>> [0] 3 432 PetscSFCreate_Basic() >>>>>> [0] 2 1472 PetscSFLinkCreate() >>>>>> [0] 11 1229040 PetscSFSetUpRanks() >>>>>> [0] 7 614512 PetscSFSetUp_Basic() >>>>>> [0] 4 20096 PetscSegBufferCreate() >>>>>> [0] 2 1488 PetscSplitReductionCreate() >>>>>> [0] 2 3008 PetscStageLogCreate() >>>>>> [0] 1148 23872 PetscStrallocpy() >>>>>> [0] 6 13056 PetscStrreplace() >>>>>> [0] 9 3456 PetscTableCreate() >>>>>> [0] 1 16 PetscViewerASCIIOpen() >>>>>> [0] 6 96 PetscViewerAndFormatCreate() >>>>>> [0] 1 752 PetscViewerCreate() >>>>>> [0] 1 96 PetscViewerCreate_ASCII() >>>>>> [0] 2 1424 SNESCreate() >>>>>> [0] 1 16 SNESCreate_NEWTONLS() >>>>>> [0] 1 1008 SNESLineSearchCreate() >>>>>> [0] 1 16 SNESLineSearchCreate_BT() >>>>>> [0] 16 1824 SNESMSRegister() >>>>>> [0] 46 9056 TSARKIMEXRegister() >>>>>> [0] 1 1264 TSAdaptCreate() >>>>>> [0] 8 384 TSBasicSymplecticRegister() >>>>>> [0] 1 2160 TSCreate() >>>>>> [0] 1 224 TSCreate_Theta() >>>>>> [0] 48 5968 TSGLEERegister() >>>>>> [0] 41 7728 TSRKRegister() >>>>>> [0] 89 14736 TSRosWRegister() >>>>>> [0] 71 110192 VecCreate() >>>>>> [0] 1 307200 VecCreateGhostWithArray() >>>>>> [0] 123 36874080 VecCreate_MPI_Private() >>>>>> [0] 7 4300800 VecCreate_Seq() >>>>>> [0] 8 256 VecCreate_Seq_Private() >>>>>> [0] 6 400 VecDuplicateVecs_Default() >>>>>> [0] 3 2352 VecScatterCreate() >>>>>> [0] 7 1843296 VecScatterSetUp_SF() >>>>>> [0] 126 2016 VecStashCreate_Private() >>>>>> [0] 1 3072 mapBlockColoringToJacobian() >>>>>> >>>>>> On Wed, Aug 12, 2020 at 4:22 PM Barry Smith wrote: >>>>>> >>>>>>> >>>>>>> Yes, there are some PETSc objects or arrays that you are not >>>>>>> freeing so they 
are printed at the end of the run. For small runs this >>>>>>> harmless but if new objects/memory is allocated at each iteration and not >>>>>>> suitably freed it will eventually add up. >>>>>>> >>>>>>> Run with -malloc_view (small problem with say 2 iterations) it >>>>>>> will print everything allocated and might be helpful. >>>>>>> >>>>>>> Perhaps you are calling ISColoringGetIS() and not calling >>>>>>> ISColoringRestoreIS()? >>>>>>> >>>>>>> It is also possible it is a leak in PETSc, but that is unlikely >>>>>>> since we test for them. >>>>>>> >>>>>>> Are you using Fortran? >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> On Aug 12, 2020, at 1:29 PM, Mark Lohry wrote: >>>>>>> >>>>>>> Thanks Matt and Barry. At Matt's suggestion I ran a smaller >>>>>>> representative case with valgrind and didn't see anything alarming (apart >>>>>>> from a small leak in an older boost version I was using: >>>>>>> https://github.com/boostorg/serialization/issues/104 although I >>>>>>> don't think this was causing the issue). >>>>>>> >>>>>>> -malloc_debug dumps quite a lot, this is supposed to be empty right? >>>>>>> Output pasted below. It looks like the same sequence of calls is repeated 8 >>>>>>> times, which is how many nonlinear solves occurred in this particular run. >>>>>>> Thoughts? >>>>>>> >>>>>>> >>>>>>> >>>>>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>>>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in >>>>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>>> 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>> [ 0]16 bytes ISCreate_General() line 647 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>> [ 0]896 bytes ISCreate() line 37 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>> [ 0]64 bytes ISColoringGetIS() line 266 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>>>>>> [ 0]32 bytes PetscCommDuplicate() line 129 in >>>>>>> /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> Mark. >>>>>>>> >>>>>>>> When valgrind is not feasible (like on many centrally >>>>>>>> controlled batch systems) you can run PETSc with an extra flag to do some >>>>>>>> memory error checks >>>>>>>> -malloc_debug >>>>>>>> >>>>>>>> this >>>>>>>> >>>>>>>> 1) fills all malloced memory with Nan so if the code is using >>>>>>>> uninitialized memory it may be detected and >>>>>>>> 2) checks the beginning and end of each alloced memory region for >>>>>>>> out-of-bounds writes at each malloc and free. >>>>>>>> >>>>>>>> it will slow the code down a little bit but generally not a huge >>>>>>>> amount. >>>>>>>> >>>>>>>> It is no where near as good as valgrind or other memory corruption >>>>>>>> tools but it has the advantage you can run it anywhere on any size job. >>>>>>>> >>>>>>>> >>>>>>>> Barry >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley >>>>>>>> wrote: >>>>>>>> >>>>>>>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I'm getting seemingly random failures of late: >>>>>>>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory >>>>>>>>> access >>>>>>>>> >>>>>>>> >>>>>>>> The first thing I would do is run valgrind on as wide an array of >>>>>>>> tests as you can. This will find problems >>>>>>>> on things that run completely fine. 
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> Symptoms: >>>>>>>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >>>>>>>>> 2) It doesn't happen right away -- this was running happily for >>>>>>>>> several hours over several hundred time steps with no indication of bad >>>>>>>>> health in the numerics >>>>>>>>> 3) At least the total memory consumption seems to be within >>>>>>>>> bounds, though I'm not sure about individual processes. e.g. slurm here >>>>>>>>> reported Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >>>>>>>>> 4) running the same setup twice it fails at different points >>>>>>>>> >>>>>>>>> Any suggestions on what to look for? This is a bit painful to work >>>>>>>>> on as I can only reproduce it on large runs and then it's seemingly random. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Mark >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jacob.fai at gmail.com Mon Aug 24 10:58:06 2020 From: jacob.fai at gmail.com (Jacob Faibussowitsch) Date: Mon, 24 Aug 2020 11:58:06 -0400 Subject: [petsc-users] Bus Error In-Reply-To: References: Message-ID: <2480130F-C30C-4B45-B623-BD10284B0369@gmail.com> WRT PetscMalloc: if you are using PetscMallocX to alloc I believe you must also use the corresponding PetscFreeX (so for PetscMalloc2 you must free both pointers at once using PetscFree2, not individually) to free the pointers because of the coalesced malloc. I had a similar issue in a pipeline a while back that gave me a bus error, so perhaps it is applicable? Best, Jacob Faibussowitsch Jacob.fai at gmail.com Faibuss2 at illinois.edu Cell: (312) 694-3391 > On Aug 24, 2020, at 11:27, Matthew Knepley wrote: > > ? > On Mon, Aug 24, 2020 at 11:15 AM Mark Lohry wrote: >>> Do you ever use regular malloc()? PETSc malloc aligns automatically, but the system one does not. >> >> Indirectly via new, yes. > > I would consider replacing those. > > Thanks, > > Matt > >> On Mon, Aug 24, 2020 at 11:10 AM Matthew Knepley wrote: >>> On Mon, Aug 24, 2020 at 10:56 AM Mark Lohry wrote: >>>> Thanks Barry, I'll give -malloc_debug a shot. >>>> >>>>> I know this is not necessarily a reasonable test but if you run the exact same thing twice does it crash at the same location in terms of iterations or does it seem to crash eventually "randomly" just after a long time? >>>> >>>> Crashes after a different number of iterations, seemingly random. >>>> >>>>> >>>>> I understand the frustration with this kind of crash, it just shouldn't happen because the same BLAS calls have been made in the same way thousands of times and yet suddenly trouble and very hard to debug. >>>> >>>> Eventually makes for a good war story. 
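To illustrate Jacob's point about the coalesced PetscMallocX/PetscFreeX pairing, a minimal hypothetical sketch (variable names are illustrative only): PetscMalloc2 hands back two pointers carved from one underlying allocation, so both must be released together with PetscFree2; freeing either pointer individually, or through free()/delete, can corrupt the heap.

#include <petscsys.h>

static PetscErrorCode PairedBuffers(PetscInt n, PetscInt m)
{
  PetscErrorCode ierr;
  PetscReal      *a;
  PetscInt       *idx;

  PetscFunctionBegin;
  ierr = PetscMalloc2(n,&a,m,&idx);CHKERRQ(ierr);  /* one coalesced allocation, two pointers */
  /* ... use a[0..n-1] and idx[0..m-1] ... */
  ierr = PetscFree2(a,idx);CHKERRQ(ierr);          /* must free together; not PetscFree(a); PetscFree(idx); */
  PetscFunctionReturn(0);
}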
>>>> >>>> Thinking back, I have seen some disturbing memory behavior that I think falls back to my use of eigen... e.g. in the past when running my full test suite a particular case would fail with NaNs, but if I ran that case in isolation it passes. I wonder if some object isn't getting properly aligned and at some point some kind of corruption occurs? >>> >>> Do you ever use regular malloc()? PETSc malloc aligns automatically, but the system one does not. >>> >>> Thanks, >>> >>> Matt >>> >>>> On Mon, Aug 24, 2020 at 10:35 AM Barry Smith wrote: >>>>> >>>>> Mark, >>>>> >>>>> Ok, I'd generally trust the stock BLAS for not failing over OpenBLAS. >>>>> >>>>> Since valgrind is not viable have you tried with -malloc_debug with the bad case it will be a little bit slower but not to bad and can find some memory corruption issues. >>>>> >>>>> It might be useful to get the stack trace inside the BLAS to see exactly where it crashes. If you ./configure with debugging and use --download-fblaslapack or --download-f2cblaslapack it will compile the BLAS with debugging, but just running a batch job still won't display the stack frames inside the BLAS call. >>>>> >>>>> We have an option -on_error_attach_debugger which is useful for longer many rank runs that attaches the debugger ONLY when the error is detected but it may not play well with batch systems. But if you can make your run on a non-batch system it might be able, along with the --download-fblaslapack or --download-f2cblaslapack to get the exact stack frames. And in the debugger look at the variables and address points to try to determine how it could have gone wrong. >>>>> >>>>> I know this is not necessarily a reasonable test but if you run the exact same thing twice does it crash at the same location in terms of iterations or does it seem to crash eventually "randomly" just after a long time? >>>>> >>>>> I understand the frustration with this kind of crash, it just shouldn't happen because the same BLAS calls have been made in the same way thousands of times and yet suddenly trouble and very hard to debug. >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> >>>>>> On Aug 24, 2020, at 9:15 AM, Mark Lohry wrote: >>>>>> >>>>>> valgrind: I ran a much smaller case and didn't see any issues in valgrind. I'm only seeing this bus error on several hundred cores a few hours wallclock in, so it might not be feasible to run that in valgrind. >>>>>> >>>>>> blas: i'm not entirely sure -- it's the stock one in PUIAS linux (red hat derivative), libblas.so.3.4.2.. i'm going to try with intel and if that fails use the openblas downloaded via petsc and see if it alleviates itself. >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Aug 24, 2020 at 9:48 AM Barry Smith wrote: >>>>>>> >>>>>>> Mark, >>>>>>> >>>>>>> Can you run in valgrind? >>>>>>> >>>>>>> Exactly what BLAS are you using? >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>>> On Aug 24, 2020, at 7:54 AM, Mark Lohry wrote: >>>>>>>> >>>>>>>> Reran with debug mode and got a stack trace for this bus error, looks like it's happening in BLASgemv, see pasted below. I did take care of the ISColoring leak mentioned previously, although that was a very small amount of data and I don't think is relevant here. >>>>>>>> >>>>>>>> At this point it's happily run 222 timesteps prior to this, so I'm a little mystified. Any ideas? 
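On the earlier suspicion that some Eigen object is not getting properly aligned: if that is indeed the issue, Eigen's documented pattern for classes holding fixed-size vectorizable members created with new, and for STL containers of such types, is roughly the following. This is purely an illustration (the types are made up, not taken from the actual code), and it may not be necessary with recent Eigen releases built under C++17's aligned new, but it is the standard remedy otherwise.

#include <vector>
#include <Eigen/Dense>

/* Hypothetical type with a fixed-size, vectorizable Eigen member. */
struct CellData {
  Eigen::Matrix4d jacobian;          // fixed-size member that Eigen may vectorize
  double          weight;
  EIGEN_MAKE_ALIGNED_OPERATOR_NEW    // makes 'new CellData' return suitably aligned memory
};

/* STL containers of fixed-size Eigen types need Eigen's aligned allocator. */
using CellVector =
    std::vector<Eigen::Vector4d, Eigen::aligned_allocator<Eigen::Vector4d>>;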
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mark >>>>>>>> >>>>>>>> >>>>>>>> 222 TS dt 0.03 time 6.66 >>>>>>>> 0 SNES Function norm 4.124287265556e+02 >>>>>>>> 0 KSP Residual norm 4.124287265556e+02 >>>>>>>> 1 KSP Residual norm 4.123248052318e+02 >>>>>>>> 2 KSP Residual norm 4.123173350456e+02 >>>>>>>> 3 KSP Residual norm 4.118769044110e+02 >>>>>>>> 4 KSP Residual norm 4.094856150740e+02 >>>>>>>> 5 KSP Residual norm 4.006000788078e+02 >>>>>>>> 6 KSP Residual norm 3.787922969183e+02 >>>>>>>> [clip] >>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 9 >>>>>>>> Line search: Using full step: fnorm 4.015236590684e+01 gnorm 3.173434863784e+00 >>>>>>>> 2 SNES Function norm 3.173434863784e+00 >>>>>>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 >>>>>>>> 0 SNES Function norm 5.842010710080e+02 >>>>>>>> 0 KSP Residual norm 5.842010710080e+02 >>>>>>>> 1 KSP Residual norm 5.840526408234e+02 >>>>>>>> 2 KSP Residual norm 5.840431857354e+02 >>>>>>>> 3 KSP Residual norm 5.834351392302e+02 >>>>>>>> 4 KSP Residual norm 5.800901047861e+02 >>>>>>>> 5 KSP Residual norm 5.675562288567e+02 >>>>>>>> 6 KSP Residual norm 5.366287895681e+02 >>>>>>>> 7 KSP Residual norm 4.725811521866e+02 >>>>>>>> [911]PETSC ERROR: ------------------------------------------------------------------------ >>>>>>>> [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>>>>>>> [911]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>>>>> [911]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>>>>>>> [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >>>>>>>> [911]PETSC ERROR: likely location of problem given in stack below >>>>>>>> [911]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >>>>>>>> [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >>>>>>>> [911]PETSC ERROR: INSTEAD the line number of the start of the function >>>>>>>> [911]PETSC ERROR: is given. 
>>>>>>>> [911]PETSC ERROR: [911] BLASgemv line 1393 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >>>>>>>> [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >>>>>>>> [911]PETSC ERROR: [911] MatSolve line 3354 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c >>>>>>>> [911]PETSC ERROR: [911] PCApply_ILU line 201 /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >>>>>>>> [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >>>>>>>> [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >>>>>>>> [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c >>>>>>>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [911]PETSC ERROR: [911] PCApply_ASM line 441 /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c >>>>>>>> [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >>>>>>>> [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >>>>>>>> [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>>>>>>> [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>>>>>>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>>>>>>> [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c >>>>>>>> [911]PETSC ERROR: [911] SNESSolve line 4403 /home/mlohry/build/external/petsc/src/snes/interface/snes.c >>>>>>>> [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c >>>>>>>> [911]PETSC ERROR: [911] TSStep line 3682 /home/mlohry/build/external/petsc/src/ts/interface/ts.c >>>>>>>> [911]PETSC ERROR: [911] TSSolve line 4005 /home/mlohry/build/external/petsc/src/ts/interface/ts.c >>>>>>>> [911]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>>>>> [911]PETSC ERROR: Signal received >>>>>>>> [911]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>>>>>> [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>>>>>>> [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by mlohry Sun Aug 23 19:54:21 2020 >>>>>>>> [911]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/local/openmpi/3.1.3/gcc/x8 >>>>>>>> [911]PETSC ERROR: #1 User provided function() line 0 in unknown file >>>>>>>> -------------------------------------------------------------------------- >>>>>>>> MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD >>>>>>>> >>>>>>>> On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry wrote: >>>>>>>>>> Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()? 
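For context, the pairing Barry is asking about looks roughly like this in the 3.13 series shown in the trace; iscoloring stands for whichever ISColoring the application holds, so this is a sketch rather than code from the thread.

  PetscInt nis;
  IS       *isa;
  ierr = ISColoringGetIS(iscoloring,&nis,&isa);CHKERRQ(ierr);
  /* ... use the nis index sets ... */
  ierr = ISColoringRestoreIS(iscoloring,&isa);CHKERRQ(ierr);

A Get without the matching Restore keeps those index sets alive, which is the sort of growth the -malloc_debug and -malloc_view listings further down are meant to catch.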
>>>>>>>>> >>>>>>>>> I have matching ISColoringGet/Restore here, and it's only used prior to the first iteration so at least it doesn't seem to be growing. At the bottom I pasted the malloc_view and malloc_debug output from running 1 time step. >>>>>>>>> >>>>>>>>> I'm sort of thinking this might be a red herring -- is it possible the rank 0 process is chewing up dramatically more memory than others, like with logging or something? Like I mentioned earlier the total memory usage is well under the machine limits. I'll spring in some PetscMemoryGetMaximumUsage logging at every time step and try to get a big job going again. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Are you using Fortran? >>>>>>>>> >>>>>>>>> C++ >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>>>>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>>>>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>>>>>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]896 
bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>> [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>>>>>>>> [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>>>>>>>> [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire process 719073280 >>>>>>>>> [0] Memory usage sorted by function >>>>>>>>> [0] 6 192 DMCoarsenHookAdd() >>>>>>>>> [0] 2 9984 DMCreate() >>>>>>>>> [0] 2 128 DMCreate_Shell() >>>>>>>>> [0] 2 64 DMDSEnlarge_Static() >>>>>>>>> [0] 1 672 DMKSPCreate() >>>>>>>>> [0] 3 96 DMRefineHookAdd() >>>>>>>>> [0] 3 2064 DMSNESCreate() >>>>>>>>> [0] 4 128 DMSubDomainHookAdd() >>>>>>>>> [0] 1 768 DMTSCreate() >>>>>>>>> [0] 2 96 ISColoringCreate() >>>>>>>>> [0] 8 12608 ISColoringGetIS() >>>>>>>>> [0] 1 307200 ISConcatenate() >>>>>>>>> [0] 29 25984 ISCreate() >>>>>>>>> [0] 25 400 ISCreate_General() >>>>>>>>> [0] 4 64 ISCreate_Stride() >>>>>>>>> [0] 20 338016 ISGeneralSetIndices_General() >>>>>>>>> [0] 3 921600 ISGetIndices_Stride() >>>>>>>>> [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() >>>>>>>>> [0] 1 6144 ISInvertPermutation_General() >>>>>>>>> [0] 3 308576 ISLocalToGlobalMappingCreate() >>>>>>>>> [0] 2 32 KSPConvergedDefaultCreate() >>>>>>>>> [0] 2 2816 KSPCreate() >>>>>>>>> [0] 1 224 KSPCreate_FGMRES() >>>>>>>>> [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() >>>>>>>>> [0] 2 16032 KSPSetUp_FGMRES() >>>>>>>>> [0] 4 16084160 KSPSetUp_GMRES() >>>>>>>>> [0] 2 36864 MatColoringApply_SL() >>>>>>>>> [0] 1 656 MatColoringCreate() >>>>>>>>> [0] 6 17088 MatCreate() >>>>>>>>> [0] 1 16 MatCreateMFFD_WP() >>>>>>>>> [0] 1 16 MatCreateSubMatrices_SeqBAIJ() >>>>>>>>> [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() >>>>>>>>> [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() >>>>>>>>> [0] 2 1472 MatCreate_MFFD() >>>>>>>>> [0] 1 416 MatCreate_SeqAIJ() >>>>>>>>> [0] 3 864 MatCreate_SeqBAIJ() >>>>>>>>> [0] 2 416 MatCreate_Shell() >>>>>>>>> [0] 1 784 MatFDColoringCreate() >>>>>>>>> [0] 2 12288 MatFDColoringDegreeSequence_Minpack() >>>>>>>>> [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() >>>>>>>>> [0] 3 42512 MatGetColumnIJ_SeqAIJ() >>>>>>>>> [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() >>>>>>>>> [0] 1 6144 MatGetOrdering_Natural() >>>>>>>>> [0] 2 36384 MatGetRowIJ_SeqAIJ() >>>>>>>>> [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() >>>>>>>>> [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() >>>>>>>>> [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() >>>>>>>>> [0] 1 6144 MatMarkDiagonal_SeqAIJ() >>>>>>>>> [0] 1 6144 MatMarkDiagonal_SeqBAIJ() >>>>>>>>> [0] 8 256 MatRegisterRootName() >>>>>>>>> [0] 1 6160 MatSeqAIJCheckInode() >>>>>>>>> [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() >>>>>>>>> [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() >>>>>>>>> [0] 13 576 MatSolverTypeRegister() >>>>>>>>> [0] 1 16 PCASMCreateSubdomains() >>>>>>>>> [0] 2 1664 PCCreate() >>>>>>>>> [0] 1 160 PCCreate_ASM() >>>>>>>>> [0] 1 192 PCCreate_ILU() >>>>>>>>> [0] 5 307264 
PCSetUp_ASM() >>>>>>>>> [0] 2 416 PetscBTCreate() >>>>>>>>> [0] 2 3216 PetscClassPerfLogCreate() >>>>>>>>> [0] 2 1616 PetscClassRegLogCreate() >>>>>>>>> [0] 2 32 PetscCommBuildTwoSided_Allreduce() >>>>>>>>> [0] 2 64 PetscCommDuplicate() >>>>>>>>> [0] 2 1888 PetscDSCreate() >>>>>>>>> [0] 2 26416 PetscEventPerfLogCreate() >>>>>>>>> [0] 2 158400 PetscEventPerfLogEnsureSize() >>>>>>>>> [0] 2 1616 PetscEventRegLogCreate() >>>>>>>>> [0] 2 9600 PetscEventRegLogRegister() >>>>>>>>> [0] 8 102400 PetscFreeSpaceGet() >>>>>>>>> [0] 474 15168 PetscFunctionListAdd_Private() >>>>>>>>> [0] 2 528 PetscIntStackCreate() >>>>>>>>> [0] 142 11360 PetscLayoutCreate() >>>>>>>>> [0] 56 896 PetscLayoutSetUp() >>>>>>>>> [0] 59 9440 PetscObjectComposedDataIncreaseReal() >>>>>>>>> [0] 2 576 PetscObjectListAdd() >>>>>>>>> [0] 33 768 PetscOptionsGetEList() >>>>>>>>> [0] 1 16 PetscOptionsHelpPrintedCreate() >>>>>>>>> [0] 1 32 PetscPushSignalHandler() >>>>>>>>> [0] 7 6944 PetscSFCreate() >>>>>>>>> [0] 3 432 PetscSFCreate_Basic() >>>>>>>>> [0] 2 1472 PetscSFLinkCreate() >>>>>>>>> [0] 11 1229040 PetscSFSetUpRanks() >>>>>>>>> [0] 7 614512 PetscSFSetUp_Basic() >>>>>>>>> [0] 4 20096 PetscSegBufferCreate() >>>>>>>>> [0] 2 1488 PetscSplitReductionCreate() >>>>>>>>> [0] 2 3008 PetscStageLogCreate() >>>>>>>>> [0] 1148 23872 PetscStrallocpy() >>>>>>>>> [0] 6 13056 PetscStrreplace() >>>>>>>>> [0] 9 3456 PetscTableCreate() >>>>>>>>> [0] 1 16 PetscViewerASCIIOpen() >>>>>>>>> [0] 6 96 PetscViewerAndFormatCreate() >>>>>>>>> [0] 1 752 PetscViewerCreate() >>>>>>>>> [0] 1 96 PetscViewerCreate_ASCII() >>>>>>>>> [0] 2 1424 SNESCreate() >>>>>>>>> [0] 1 16 SNESCreate_NEWTONLS() >>>>>>>>> [0] 1 1008 SNESLineSearchCreate() >>>>>>>>> [0] 1 16 SNESLineSearchCreate_BT() >>>>>>>>> [0] 16 1824 SNESMSRegister() >>>>>>>>> [0] 46 9056 TSARKIMEXRegister() >>>>>>>>> [0] 1 1264 TSAdaptCreate() >>>>>>>>> [0] 8 384 TSBasicSymplecticRegister() >>>>>>>>> [0] 1 2160 TSCreate() >>>>>>>>> [0] 1 224 TSCreate_Theta() >>>>>>>>> [0] 48 5968 TSGLEERegister() >>>>>>>>> [0] 41 7728 TSRKRegister() >>>>>>>>> [0] 89 14736 TSRosWRegister() >>>>>>>>> [0] 71 110192 VecCreate() >>>>>>>>> [0] 1 307200 VecCreateGhostWithArray() >>>>>>>>> [0] 123 36874080 VecCreate_MPI_Private() >>>>>>>>> [0] 7 4300800 VecCreate_Seq() >>>>>>>>> [0] 8 256 VecCreate_Seq_Private() >>>>>>>>> [0] 6 400 VecDuplicateVecs_Default() >>>>>>>>> [0] 3 2352 VecScatterCreate() >>>>>>>>> [0] 7 1843296 VecScatterSetUp_SF() >>>>>>>>> [0] 126 2016 VecStashCreate_Private() >>>>>>>>> [0] 1 3072 mapBlockColoringToJacobian() >>>>>>>>> >>>>>>>>> On Wed, Aug 12, 2020 at 4:22 PM Barry Smith wrote: >>>>>>>>>> >>>>>>>>>> Yes, there are some PETSc objects or arrays that you are not freeing so they are printed at the end of the run. For small runs this harmless but if new objects/memory is allocated at each iteration and not suitably freed it will eventually add up. >>>>>>>>>> >>>>>>>>>> Run with -malloc_view (small problem with say 2 iterations) it will print everything allocated and might be helpful. >>>>>>>>>> >>>>>>>>>> Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()? >>>>>>>>>> >>>>>>>>>> It is also possible it is a leak in PETSc, but that is unlikely since we test for them. >>>>>>>>>> >>>>>>>>>> Are you using Fortran? >>>>>>>>>> >>>>>>>>>> Barry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Aug 12, 2020, at 1:29 PM, Mark Lohry wrote: >>>>>>>>>>> >>>>>>>>>>> Thanks Matt and Barry. 
At Matt's suggestion I ran a smaller representative case with valgrind and didn't see anything alarming (apart from a small leak in an older boost version I was using: https://github.com/boostorg/serialization/issues/104 although I don't think this was causing the issue). >>>>>>>>>>> >>>>>>>>>>> -malloc_debug dumps quite a lot, this is supposed to be empty right? Output pasted below. It looks like the same sequence of calls is repeated 8 times, which is how many nonlinear solves occurred in this particular run. Thoughts? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>>>>>>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>>>>>>>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>>>>>>>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>>>> [ 0]880 bytes 
ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> 
[ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>>>>>>>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>>>>>>>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>>>>>>>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>>>>>>>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>>>>>>>>> [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>>>>>>>>>> [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith wrote: >>>>>>>>>>>> >>>>>>>>>>>> Mark. >>>>>>>>>>>> >>>>>>>>>>>> When valgrind is not feasible (like on many centrally controlled batch systems) you can run PETSc with an extra flag to do some memory error checks >>>>>>>>>>>> -malloc_debug >>>>>>>>>>>> >>>>>>>>>>>> this >>>>>>>>>>>> >>>>>>>>>>>> 1) fills all malloced memory with Nan so if the code is using uninitialized memory it may be detected and >>>>>>>>>>>> 2) checks the beginning and end of each alloced memory region for out-of-bounds writes at each malloc and free. >>>>>>>>>>>> >>>>>>>>>>>> it will slow the code down a little bit but generally not a huge amount. >>>>>>>>>>>> >>>>>>>>>>>> It is no where near as good as valgrind or other memory corruption tools but it has the advantage you can run it anywhere on any size job. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Barry >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry wrote: >>>>>>>>>>>>>> I'm getting seemingly random failures of late: >>>>>>>>>>>>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>>>>>>>>>>>> >>>>>>>>>>>>> The first thing I would do is run valgrind on as wide an array of tests as you can. This will find problems >>>>>>>>>>>>> on things that run completely fine. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Matt >>>>>>>>>>>>> >>>>>>>>>>>>>> Symptoms: >>>>>>>>>>>>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >>>>>>>>>>>>>> 2) It doesn't happen right away -- this was running happily for several hours over several hundred time steps with no indication of bad health in the numerics >>>>>>>>>>>>>> 3) At least the total memory consumption seems to be within bounds, though I'm not sure about individual processes. e.g. slurm here reported Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >>>>>>>>>>>>>> 4) running the same setup twice it fails at different points >>>>>>>>>>>>>> >>>>>>>>>>>>>> Any suggestions on what to look for? This is a bit painful to work on as I can only reproduce it on large runs and then it's seemingly random. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Mark >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
>>>>>>>>>>>>> -- Norbert Wiener >>>>>>>>>>>>> >>>>>>>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 24 11:22:28 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 11:22:28 -0500 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> Message-ID: <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> Some of the MKL versions of BLAS use the new Phi-like SIMD instructions that require 64 byte alignment but this probably does not apply to you. Barry > On Aug 24, 2020, at 9:54 AM, Mark Lohry wrote: > > Thanks Barry, I'll give -malloc_debug a shot. > > I know this is not necessarily a reasonable test but if you run the exact same thing twice does it crash at the same location in terms of iterations or does it seem to crash eventually "randomly" just after a long time? > > Crashes after a different number of iterations, seemingly random. > > > I understand the frustration with this kind of crash, it just shouldn't happen because the same BLAS calls have been made in the same way thousands of times and yet suddenly trouble and very hard to debug. > > Eventually makes for a good war story. > > Thinking back, I have seen some disturbing memory behavior that I think falls back to my use of eigen... e.g. in the past when running my full test suite a particular case would fail with NaNs, but if I ran that case in isolation it passes. I wonder if some object isn't getting properly aligned and at some point some kind of corruption occurs? > > On Mon, Aug 24, 2020 at 10:35 AM Barry Smith > wrote: > > Mark, > > Ok, I'd generally trust the stock BLAS for not failing over OpenBLAS. > > Since valgrind is not viable have you tried with -malloc_debug with the bad case it will be a little bit slower but not to bad and can find some memory corruption issues. > > It might be useful to get the stack trace inside the BLAS to see exactly where it crashes. If you ./configure with debugging and use --download-fblaslapack or --download-f2cblaslapack it will compile the BLAS with debugging, but just running a batch job still won't display the stack frames inside the BLAS call. > > We have an option -on_error_attach_debugger which is useful for longer many rank runs that attaches the debugger ONLY when the error is detected but it may not play well with batch systems. But if you can make your run on a non-batch system it might be able, along with the --download-fblaslapack or --download-f2cblaslapack to get the exact stack frames. And in the debugger look at the variables and address points to try to determine how it could have gone wrong. 
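If the Eigen suspicion quoted above is worth chasing, the usual pattern for heap-allocated objects holding fixed-size vectorizable Eigen members is sketched below; the struct and member are invented for illustration, and nothing in the trace confirms that alignment is actually the culprit here.

  #include <Eigen/Core>
  #include <vector>

  struct ElementData {
    Eigen::Matrix4d jac;              // fixed-size, vectorizable member
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW   // makes 'new ElementData' return suitably aligned storage
  };

  // containers of such objects want the aligned allocator as well
  std::vector<ElementData, Eigen::aligned_allocator<ElementData>> elems;

On the PETSc side, the allocator's alignment can, if the MKL point above ever becomes relevant, be raised at configure time with the --with-memalign option.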
> > I know this is not necessarily a reasonable test but if you run the exact same thing twice does it crash at the same location in terms of iterations or does it seem to crash eventually "randomly" just after a long time? > > I understand the frustration with this kind of crash, it just shouldn't happen because the same BLAS calls have been made in the same way thousands of times and yet suddenly trouble and very hard to debug. > > Barry > > > > >> On Aug 24, 2020, at 9:15 AM, Mark Lohry > wrote: >> >> valgrind: I ran a much smaller case and didn't see any issues in valgrind. I'm only seeing this bus error on several hundred cores a few hours wallclock in, so it might not be feasible to run that in valgrind. >> >> blas: i'm not entirely sure -- it's the stock one in PUIAS linux (red hat derivative), libblas.so.3.4.2.. i'm going to try with intel and if that fails use the openblas downloaded via petsc and see if it alleviates itself. >> >> >> >> On Mon, Aug 24, 2020 at 9:48 AM Barry Smith > wrote: >> >> Mark, >> >> Can you run in valgrind? >> >> Exactly what BLAS are you using? >> >> Barry >> >> >>> On Aug 24, 2020, at 7:54 AM, Mark Lohry > wrote: >>> >>> Reran with debug mode and got a stack trace for this bus error, looks like it's happening in BLASgemv, see pasted below. I did take care of the ISColoring leak mentioned previously, although that was a very small amount of data and I don't think is relevant here. >>> >>> At this point it's happily run 222 timesteps prior to this, so I'm a little mystified. Any ideas? >>> >>> Thanks, >>> Mark >>> >>> >>> 222 TS dt 0.03 time 6.66 >>> 0 SNES Function norm 4.124287265556e+02 >>> 0 KSP Residual norm 4.124287265556e+02 >>> 1 KSP Residual norm 4.123248052318e+02 >>> 2 KSP Residual norm 4.123173350456e+02 >>> 3 KSP Residual norm 4.118769044110e+02 >>> 4 KSP Residual norm 4.094856150740e+02 >>> 5 KSP Residual norm 4.006000788078e+02 >>> 6 KSP Residual norm 3.787922969183e+02 >>> [clip] >>> Linear solve converged due to CONVERGED_RTOL iterations 9 >>> Line search: Using full step: fnorm 4.015236590684e+01 gnorm 3.173434863784e+00 >>> 2 SNES Function norm 3.173434863784e+00 >>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 >>> 0 SNES Function norm 5.842010710080e+02 >>> 0 KSP Residual norm 5.842010710080e+02 >>> 1 KSP Residual norm 5.840526408234e+02 >>> 2 KSP Residual norm 5.840431857354e+02 >>> 3 KSP Residual norm 5.834351392302e+02 >>> 4 KSP Residual norm 5.800901047861e+02 >>> 5 KSP Residual norm 5.675562288567e+02 >>> 6 KSP Residual norm 5.366287895681e+02 >>> 7 KSP Residual norm 4.725811521866e+02 >>> [911]PETSC ERROR: ------------------------------------------------------------------------ >>> [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>> [911]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>> [911]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>> [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >>> [911]PETSC ERROR: likely location of problem given in stack below >>> [911]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >>> [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >>> [911]PETSC ERROR: INSTEAD the line number of the start of the function >>> [911]PETSC ERROR: is given. 
>>> [911]PETSC ERROR: [911] BLASgemv line 1393 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >>> [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >>> [911]PETSC ERROR: [911] MatSolve line 3354 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c >>> [911]PETSC ERROR: [911] PCApply_ILU line 201 /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >>> [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >>> [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >>> [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c >>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>> [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>> [911]PETSC ERROR: [911] PCApply_ASM line 441 /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c >>> [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >>> [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >>> [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>> [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>> [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>> [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c >>> [911]PETSC ERROR: [911] SNESSolve line 4403 /home/mlohry/build/external/petsc/src/snes/interface/snes.c >>> [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c >>> [911]PETSC ERROR: [911] TSStep line 3682 /home/mlohry/build/external/petsc/src/ts/interface/ts.c >>> [911]PETSC ERROR: [911] TSSolve line 4005 /home/mlohry/build/external/petsc/src/ts/interface/ts.c >>> [911]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [911]PETSC ERROR: Signal received >>> [911]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>> [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by mlohry Sun Aug 23 19:54:21 2020 >>> [911]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/local/openmpi/3.1.3/gcc/x8 >>> [911]PETSC ERROR: #1 User provided function() line 0 in unknown file >>> -------------------------------------------------------------------------- >>> MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD >>> >>> On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry > wrote: >>> Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()? >>> >>> I have matching ISColoringGet/Restore here, and it's only used prior to the first iteration so at least it doesn't seem to be growing. 
At the bottom I pasted the malloc_view and malloc_debug output from running 1 time step. >>> >>> I'm sort of thinking this might be a red herring -- is it possible the rank 0 process is chewing up dramatically more memory than others, like with logging or something? Like I mentioned earlier the total memory usage is well under the machine limits. I'll spring in some PetscMemoryGetMaximumUsage logging at every time step and try to get a big job going again. >>> >>> >>> >>> Are you using Fortran? >>> >>> C++ >>> >>> >>> >>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 
bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]64 bytes ISColoringGetIS() line 266 in 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>> [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>> [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire process 719073280 >>> [0] Memory usage sorted by function >>> [0] 6 192 DMCoarsenHookAdd() >>> [0] 2 9984 DMCreate() >>> [0] 2 128 DMCreate_Shell() >>> [0] 2 64 DMDSEnlarge_Static() >>> [0] 1 672 DMKSPCreate() >>> [0] 3 96 DMRefineHookAdd() >>> [0] 3 2064 DMSNESCreate() >>> [0] 4 128 DMSubDomainHookAdd() >>> [0] 1 768 DMTSCreate() >>> [0] 2 96 ISColoringCreate() >>> [0] 8 12608 ISColoringGetIS() >>> [0] 1 307200 ISConcatenate() >>> [0] 29 25984 ISCreate() >>> [0] 25 400 ISCreate_General() >>> [0] 4 64 ISCreate_Stride() >>> [0] 20 338016 ISGeneralSetIndices_General() >>> [0] 3 921600 ISGetIndices_Stride() >>> [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() >>> [0] 1 6144 ISInvertPermutation_General() >>> [0] 3 308576 ISLocalToGlobalMappingCreate() >>> [0] 2 32 KSPConvergedDefaultCreate() >>> [0] 2 2816 KSPCreate() >>> [0] 1 224 KSPCreate_FGMRES() >>> [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() >>> [0] 2 16032 KSPSetUp_FGMRES() >>> [0] 4 16084160 KSPSetUp_GMRES() >>> [0] 2 36864 MatColoringApply_SL() >>> [0] 1 656 MatColoringCreate() >>> [0] 6 17088 MatCreate() >>> [0] 1 16 MatCreateMFFD_WP() >>> [0] 1 16 MatCreateSubMatrices_SeqBAIJ() >>> [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() >>> [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() >>> [0] 2 1472 MatCreate_MFFD() >>> [0] 1 416 MatCreate_SeqAIJ() >>> [0] 3 864 MatCreate_SeqBAIJ() >>> [0] 2 416 MatCreate_Shell() >>> [0] 1 784 MatFDColoringCreate() >>> [0] 2 12288 MatFDColoringDegreeSequence_Minpack() >>> [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() >>> [0] 3 42512 MatGetColumnIJ_SeqAIJ() >>> [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() >>> [0] 1 6144 MatGetOrdering_Natural() >>> [0] 2 36384 MatGetRowIJ_SeqAIJ() >>> [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() >>> [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() >>> [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() >>> [0] 1 6144 MatMarkDiagonal_SeqAIJ() >>> [0] 1 6144 MatMarkDiagonal_SeqBAIJ() >>> [0] 8 256 MatRegisterRootName() >>> [0] 1 6160 MatSeqAIJCheckInode() >>> [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() >>> [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() >>> [0] 13 576 MatSolverTypeRegister() >>> [0] 1 16 PCASMCreateSubdomains() >>> [0] 2 1664 PCCreate() >>> [0] 1 160 PCCreate_ASM() >>> [0] 1 192 PCCreate_ILU() >>> [0] 5 307264 PCSetUp_ASM() >>> [0] 2 416 PetscBTCreate() >>> [0] 2 3216 PetscClassPerfLogCreate() >>> [0] 2 1616 PetscClassRegLogCreate() >>> [0] 2 32 PetscCommBuildTwoSided_Allreduce() >>> [0] 2 64 PetscCommDuplicate() >>> [0] 2 1888 PetscDSCreate() >>> [0] 2 26416 PetscEventPerfLogCreate() >>> [0] 2 158400 PetscEventPerfLogEnsureSize() >>> [0] 2 1616 PetscEventRegLogCreate() >>> [0] 2 9600 PetscEventRegLogRegister() >>> [0] 8 102400 PetscFreeSpaceGet() >>> [0] 474 15168 PetscFunctionListAdd_Private() >>> [0] 2 528 PetscIntStackCreate() >>> [0] 142 11360 PetscLayoutCreate() >>> [0] 56 896 PetscLayoutSetUp() >>> [0] 59 9440 PetscObjectComposedDataIncreaseReal() >>> [0] 2 576 PetscObjectListAdd() >>> [0] 33 768 PetscOptionsGetEList() >>> [0] 1 16 PetscOptionsHelpPrintedCreate() >>> [0] 1 32 PetscPushSignalHandler() >>> [0] 7 6944 PetscSFCreate() >>> [0] 3 432 PetscSFCreate_Basic() >>> [0] 2 1472 PetscSFLinkCreate() >>> [0] 11 1229040 PetscSFSetUpRanks() >>> [0] 7 614512 PetscSFSetUp_Basic() >>> [0] 4 
20096 PetscSegBufferCreate() >>> [0] 2 1488 PetscSplitReductionCreate() >>> [0] 2 3008 PetscStageLogCreate() >>> [0] 1148 23872 PetscStrallocpy() >>> [0] 6 13056 PetscStrreplace() >>> [0] 9 3456 PetscTableCreate() >>> [0] 1 16 PetscViewerASCIIOpen() >>> [0] 6 96 PetscViewerAndFormatCreate() >>> [0] 1 752 PetscViewerCreate() >>> [0] 1 96 PetscViewerCreate_ASCII() >>> [0] 2 1424 SNESCreate() >>> [0] 1 16 SNESCreate_NEWTONLS() >>> [0] 1 1008 SNESLineSearchCreate() >>> [0] 1 16 SNESLineSearchCreate_BT() >>> [0] 16 1824 SNESMSRegister() >>> [0] 46 9056 TSARKIMEXRegister() >>> [0] 1 1264 TSAdaptCreate() >>> [0] 8 384 TSBasicSymplecticRegister() >>> [0] 1 2160 TSCreate() >>> [0] 1 224 TSCreate_Theta() >>> [0] 48 5968 TSGLEERegister() >>> [0] 41 7728 TSRKRegister() >>> [0] 89 14736 TSRosWRegister() >>> [0] 71 110192 VecCreate() >>> [0] 1 307200 VecCreateGhostWithArray() >>> [0] 123 36874080 VecCreate_MPI_Private() >>> [0] 7 4300800 VecCreate_Seq() >>> [0] 8 256 VecCreate_Seq_Private() >>> [0] 6 400 VecDuplicateVecs_Default() >>> [0] 3 2352 VecScatterCreate() >>> [0] 7 1843296 VecScatterSetUp_SF() >>> [0] 126 2016 VecStashCreate_Private() >>> [0] 1 3072 mapBlockColoringToJacobian() >>> >>> On Wed, Aug 12, 2020 at 4:22 PM Barry Smith > wrote: >>> >>> Yes, there are some PETSc objects or arrays that you are not freeing so they are printed at the end of the run. For small runs this harmless but if new objects/memory is allocated at each iteration and not suitably freed it will eventually add up. >>> >>> Run with -malloc_view (small problem with say 2 iterations) it will print everything allocated and might be helpful. >>> >>> Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()? >>> >>> It is also possible it is a leak in PETSc, but that is unlikely since we test for them. >>> >>> Are you using Fortran? >>> >>> Barry >>> >>> >>>> On Aug 12, 2020, at 1:29 PM, Mark Lohry > wrote: >>>> >>>> Thanks Matt and Barry. At Matt's suggestion I ran a smaller representative case with valgrind and didn't see anything alarming (apart from a small leak in an older boost version I was using: https://github.com/boostorg/serialization/issues/104 although I don't think this was causing the issue). >>>> >>>> -malloc_debug dumps quite a lot, this is supposed to be empty right? Output pasted below. It looks like the same sequence of calls is repeated 8 times, which is how many nonlinear solves occurred in this particular run. Thoughts? 
>>>> >>>> >>>> >>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes 
PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>>> [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>>> >>>> >>>> >>>> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith > wrote: >>>> >>>> Mark. 
>>>> >>>> When valgrind is not feasible (like on many centrally controlled batch systems) you can run PETSc with an extra flag to do some memory error checks >>>> -malloc_debug >>>> >>>> this >>>> >>>> 1) fills all malloced memory with Nan so if the code is using uninitialized memory it may be detected and >>>> 2) checks the beginning and end of each alloced memory region for out-of-bounds writes at each malloc and free. >>>> >>>> it will slow the code down a little bit but generally not a huge amount. >>>> >>>> It is no where near as good as valgrind or other memory corruption tools but it has the advantage you can run it anywhere on any size job. >>>> >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>> >>>>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley > wrote: >>>>> >>>>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry > wrote: >>>>> I'm getting seemingly random failures of late: >>>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>>>> >>>>> The first thing I would do is run valgrind on as wide an array of tests as you can. This will find problems >>>>> on things that run completely fine. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> Symptoms: >>>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >>>>> 2) It doesn't happen right away -- this was running happily for several hours over several hundred time steps with no indication of bad health in the numerics >>>>> 3) At least the total memory consumption seems to be within bounds, though I'm not sure about individual processes. e.g. slurm here reported Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >>>>> 4) running the same setup twice it fails at different points >>>>> >>>>> Any suggestions on what to look for? This is a bit painful to work on as I can only reproduce it on large runs and then it's seemingly random. >>>>> >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Mon Aug 24 11:29:00 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Mon, 24 Aug 2020 10:29:00 -0600 Subject: [petsc-users] Disable PETSC_HAVE_CLOSURE In-Reply-To: <87364hibu8.fsf@jedbrown.org> References: <87364hibu8.fsf@jedbrown.org> Message-ID: Thanks for your reply, Jed. Barry, do you have any comment? Fande, On Thu, Aug 20, 2020 at 9:19 AM Jed Brown wrote: > Barry, this is a side-effect of your Swift experiment. Does that need to > be in a header (even if it's a private header)? > > The issue may be that you test with a C compiler and it gets included in > C++ source. > > Fande Kong writes: > > > Hi All, > > > > We (moose team) hit an error message when compiling PETSc, recently. The > > error is related to "PETSC_HAVE_CLOSURE." 
Everything runs well if I am > > going to turn this flag off by making the following changes: > > > > > > git diff > > diff --git a/config/BuildSystem/config/utilities/closure.py > > b/config/BuildSystem/config/utilities/closure.py > > index 6341ddf271..930e5b3b1b 100644 > > --- a/config/BuildSystem/config/utilities/closure.py > > +++ b/config/BuildSystem/config/utilities/closure.py > > @@ -19,8 +19,8 @@ class Configure(config.base.Configure): > > includes = '#include \n' > > body = 'int (^closure)(int);' > > self.pushLanguage('C') > > - if self.checkLink(includes, body): > > - self.addDefine('HAVE_CLOSURE','1') > > +# if self.checkLink(includes, body): > > +# self.addDefine('HAVE_CLOSURE','1') > > def configure(self): > > self.executeTest(self.configureClosure) > > > > > > I was wondering if there exists a configuration option to disable > "Closure" > > C syntax? I did not find one by running "configuration --help" > > > > Please let me know if you need more information. > > > > > > Thanks, > > > > Fande, > > > > > > In file included from > > > /Users/milljm/projects/moose/scripts/../libmesh/src/solvers/petscdmlibmesh.C:25: > > > /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:29: > > warning: 'PetscVFPrintfSetClosure' initialized and declared 'extern' > > 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const > > char*)); > > | ^~~~~~~~~~~~~~~~~~~~~~~ > > > /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:53: > > error: expected primary-expression before 'int' > > 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const > > char*)); > > | ^~~ > > CXX src/systems/libmesh_opt_la-equation_systems_io.lo > > In file included from > > /Users/milljm/projects/moose/petsc/include/petsc/private/dmimpl.h:7, > > from > > > /Users/milljm/projects/moose/scripts/../libmesh/src/solvers/petscdmlibmeshimpl.C:26: > > > /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:29: > > warning: 'PetscVFPrintfSetClosure' initialized and declared 'extern' > > 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const > > char*)); > > | ^~~~~~~~~~~~~~~~~~~~~~~ > > > /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:53: > > error: expected primary-expression before 'int' > > 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const > > char*)); > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Aug 24 11:33:51 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 24 Aug 2020 10:33:51 -0600 Subject: [petsc-users] Bus Error In-Reply-To: <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> Message-ID: <87h7ssgg0g.fsf@jedbrown.org> Barry Smith writes: > Some of the MKL versions of BLAS use the new Phi-like SIMD instructions that require 64 byte alignment but this probably does not apply to you. They shouldn't ever require that the input array be aligned. For large sizes, they'll always be packing tiles anyway, in which case MKL (or BLIS, etc) is responsible for alignment of the tiles. 
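To make the PETSC_HAVE_CLOSURE failure discussed above concrete: the configure probe quoted from closure.py only checks that the C compiler accepts the Apple "blocks" syntax (int (^closure)(int);), but the resulting define later exposes a block-typed prototype to C++ translation units, which is what the "expected primary-expression before 'int'" errors in the MOOSE build log show. Below is a small compile-only sketch of that construct together with the !defined(__cplusplus) guard Barry proposes further down in this digest; the file name, build flags and the demo block are illustrative assumptions, not PETSc source.

/* closure_demo.c -- illustrative sketch only, not PETSc code.
 * Build as C with blocks enabled, e.g.  clang -fblocks -DPETSC_HAVE_CLOSURE closure_demo.c
 * (on Linux the BlocksRuntime library may also be needed at link time; on macOS clang it links as-is).
 * A C++ compiler fed the same block-typed declaration through a shared header is what
 * produces the "expected primary-expression before 'int'" errors quoted above. */
#include <stdio.h>

#if defined(PETSC_HAVE_CLOSURE) && !defined(__cplusplus)
/* Same shape as the configure probe body: a block ("closure") taking and returning int. */
static int run_closure(void)
{
  int (^closure)(int) = ^(int x) { return x + 1; };
  return closure(41);
}
#endif

int main(void)
{
#if defined(PETSC_HAVE_CLOSURE) && !defined(__cplusplus)
  printf("closure(41) = %d\n", run_closure());
#else
  printf("blocks unavailable or compiling as C++: closure code compiled out\n");
#endif
  return 0;
}

Guarding the header prototype the same way, as Barry suggests later in this digest, would keep the block type out of C++ compilation units while leaving the C path untouched.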
From thibault.bridelbertomeu at gmail.com Mon Aug 24 11:38:02 2020
From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu)
Date: Mon, 24 Aug 2020 18:38:02 +0200
Subject: [petsc-users] PetscFV and TS implicit
In-Reply-To:
References: <87mu2pgtdp.fsf@jedbrown.org> <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev>
Message-ID:

Thank you Barry for taking the time to go through the code !

I indeed figured out this afternoon that the function related to the matrix-vector product is always handling global vectors. I corrected mine so that it compiles, but I have a feeling it won't run properly without a preconditioner.

Anyways you are right, my PetscJacobianFunction_JFNK() aims at doing some basic finite-differencing; user->RHS_ref is my F(U) if you see the system as dU/dt = F(U). As for the DMGlobalToLocal() it was there mainly because I had not realized the vectors I was manipulating were global.
I will take your advice and try with just the SNESSetUseMatrixFree.
I haven't quite fully understood what it does "under the hood" though: just calling SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE) before the TSSolve call is enough to ensure that the implicit matrix is computed ? Does it use the function we set as a RHS to build the matrix ?

To create the preconditioner I will do as you suggest too, thank you. This matrix has to be as close as possible to the inverse of the implicit matrix to ensure that the eigenvalues of the system are as close to 1 as possible. Given the implicit matrix is built "automatically" thanks to the SNES matrix free capability, can we use that matrix as a starting point to the building of the preconditioner ? You were talking about the coloring capabilities in PETSc, is that where it can be applied ?

Thank you so much again,

Thibault

Le lun. 24 août 2020 à 15:45, Barry Smith a écrit :

> I think the attached is wrong.
>
> The input to the matrix vector product for the Jacobian is always global vectors, which means on each process the dimension is not the size of the DMGetLocalVector(); it should be the VecGetLocalSize() of the DMGetGlobalVector().
>
> But you may be able to skip all this and have the DM create the shell matrix, setting its sizes appropriately, and you only need to supply the MATOP
>
> DMSetMatType(dm,MATSHELL);
> DMCreateMatrix(dm,&A);
>
> In fact, I also don't understand the PetscJacobianFunction_JFNK() function. It seems to be doing finite differencing on the DMPlexTSComputeRHSFunctionFVM() assuming the current function value is in usr->RHS_ref. How is this different than just letting PETSc/SNES use finite differences to do the matrix-vector product? Your code seems rather complicated with the DMGlobalToLocal(), which I don't understand what it is supposed to do there.
>
> I think you can just call
>
> TSGetSNES()
> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE);
>
> and it will set up an internal matrix that does the finite differencing for you. Then you never need a shell matrix.
>
> Also to create the preconditioner matrix B this should work
>
> DMSetMatType(dm,MATAIJ);
> DMCreateMatrix(dm,&B);
>
> no need for you to figure out the sizes.
>
> Note that both A and B need to have the same dimensions on each process as the global vectors, which I don't think your current code has.
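As an aside, here is a compressed sketch of the sequence Barry describes just above, in the form it might take inside a TS-based driver like Thibault's; the PETSc calls (TSGetSNES, SNESSetUseMatrixFree, DMSetMatType, DMCreateMatrix) are the ones named in this thread, while the wrapper function, its name, and the assumption that ts and dm are already configured with the FV residual are illustrative only.

#include <petscts.h>

/* Sketch only: wiring described in this thread, not an excerpt of anyone's code.
 * Assumes ts already has its RHS/IFunction set and dm is the DMPlex/FV mesh. */
static PetscErrorCode SetupImplicitJacobians(TS ts, DM dm, Mat *B)
{
  SNES           snes;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* 1) Apply the "true" Jacobian matrix-free: SNES differences the user
   *    function internally, so no MatShell is needed for A. */
  ierr = TSGetSNES(ts, &snes);CHKERRQ(ierr);
  ierr = SNESSetUseMatrixFree(snes, PETSC_TRUE, PETSC_FALSE);CHKERRQ(ierr);

  /* 2) Let the DM size and preallocate the preconditioning matrix so its
   *    per-process dimensions automatically match the global vectors. */
  ierr = DMSetMatType(dm, MATAIJ);CHKERRQ(ierr);
  ierr = DMCreateMatrix(dm, B);CHKERRQ(ierr);

  /* 3) B still has to be filled with something spectrally close to the true
   *    Jacobian, e.g. shift*I - dF/dU assembled by hand, or left to
   *    finite-difference coloring (-snes_fd_color, which ends up in
   *    SNESComputeJacobianDefaultColor) as discussed elsewhere in this thread. */
  PetscFunctionReturn(0);
}

The B created this way would then be handed to TSSetIJacobian() together with whatever routine fills it, playing the role the hand-sized MatCreateAIJ matrix plays in the structured-grid code quoted further down in this message.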
> > > > Barry > > > > On Aug 24, 2020, at 12:56 AM, Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > > Barry, first of all, thank you very much for your detailed answer, I keep > reading it to let it soak in - I might come back to you for more details if > you do not mind. > > In the meantime, to fuel the conversation, I attach to this e-mail two > pdfs containing the pieces of the code that regard what we are discussing. > In the *timedisc.pdf, you'll find how I handle the initialization of the TS > object, and in the *petscdefs.pdf you'll find the method that calls the > TSSolve as well as the methods that are linked to the TS (the timestep > adapt, the jacobian etc ...). [Sorry for the quality, I cannot do better > than that sort of pdf ...] > > Based on what is in the structured code I sent you the other day, I > rewrote the PetscJacobianFunction_JFNK. I think it should be all right, but > although it compiles, execution raises a seg fault I think when I do > ierr = TSSetIJacobian(ts, A, A, PetscIJacobian, user); > saying that A does not have the right dimensions. It is quite new, I am > still looking into where exactly the error is raised. What do you think of > this implementation though, does it look correct in your expert eyes ? > As for what we really discussed so far, it's that > PetscComputePreconMatImpl that I do not know how to implement (with the > derivative of the jacobian based on the FVM object). > > I understand now that what I am showing you today might not be the right > way to go if one wants to really use the PetscFV, but I just wanted to add > those code lines to the conversation to have your feedback. > > Thank you again for your help, > > Thibault > > > Le ven. 21 ao?t 2020 ? 19:25, Barry Smith a ?crit : > > >> >> On Aug 21, 2020, at 10:58 AM, Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >> Thank you Barry for the tip ! I?ll make sure to do that when everything >> is set. >> What I also meant is that there will not be any more direct way to set >> the preconditioner than to go through SNESSetJacobian after having >> assembled everything by hand ? Like, in my case, or in the more general >> case of fluid dynamics equations, the preconditioner is not a fun matrix to >> assemble, because for every cell the derivative of the physical flux >> jacobian has to be taken and put in the right block in the matrix - finite >> element style if you want. Is there a way to do that with Petsc methods, >> maybe short-circuiting the FEM based methods ? >> >> >> Thibault >> >> I am not sure what you mean but there are a couple of things that may >> be helpful. >> >> PCSHELL >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCSHELL.html allows >> you to build your own preconditioner (that can and often will use one or >> more of its own Mats, and KSP or PC inside it, or even use another PETScFV >> etc to build some of the sub matrices for you if it is appropriate), this >> approach means you never need to construct a "global" PETSc matrix from >> which PETSc builds the preconditioner. But you should only do this if the >> conventional approach is not reasonable for your problem. >> >> MATNEST >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATNEST.html allows >> you to build a global matrix by building parts of it separately and even >> skipping parts you decide you don't need in the preconditioner. 
>> Conceptually it is the same as just creating a global matrix and filling up >> but the process is a bit different and something suitable for "multi >> physics" or "multi-equation" type applications. >> >> Of course what you put into PCSHELL and MATNEST will affect the >> convergence of the nonlinear solver. As Jed noted what you put in the >> "Jacobian" does not have to be directly the same mathematically as what you >> put into the TSSetI/RHSFunction with the caveat that it does have to >> appropriate spectral properties to result in a good preconditioner for the >> "true" Jacobian. >> >> Couple of other notes: >> >> The entire business of "Jacobian" matrix-free or not (with for example >> -snes_fd_operator) is tricky because as Jed noted if your finite volume >> scheme has non-differential terms such as if () tests. There is a concept >> of sub-differential for this type of thing but I know absolutely nothing >> about that and probably not worth investigating. >> >> In this situation you can avoid the "true" Jacobian completely (both for >> matrix-vector product and preconditioner) and use something else as Jed >> suggested a lower order scheme that is differentiable. This can work well >> for solving the nonlinear system or not depending on how suitable it is for >> your original "function" >> >> >> 1) In theory at least you can have the Jacobian matrix-vector product >> computed directly using DMPLEX/PETScFV infrastructure (it would apply the >> Jacobian locally matrix-free using code similar to the code that evaluates >> the FV "function". I do no know if any of this code is written, it will be >> more efficient than -snes_mf_operator that evaluates the FV "function" and >> does traditional differencing to compute the Jacobian. Again it has the >> problem of non-differentialability if the function is not differential. But >> it could be done for a different (lower order scheme) that is >> differentiable. >> >> 2) You can have PETSc compute the Jacobian explicitly coloring and from >> that build the preconditioner, this allows you to avoid the hassle of >> writing the code for the derivatives yourself. This uses finite differences >> on your function and coloring of the graph to compute many columns of the >> Jacobian simultaneously and can be pretty efficient. Again if the function >> is not differential there can be issues of what the result means and will >> it work in a nonlinear solver. SNESComputeJacobianDefaultColor >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESComputeJacobianDefaultColor.html >> >> 3) Much more outlandish is to skip Newton and Jacobians completely and >> use the full approximation scheme SNESFAS >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNESFAS/SNESFAS.html this >> requires a grid hierarchy and appropriate way to interpolate up through the >> grid hierarchy your finite volume solutions. Probably not worth >> investigating unless you have lots of time on your hands and keen interest >> in this kind of stuff https://arxiv.org/pdf/1607.04254.pdf >> >> So to summarize, and Matt and Jed can correct my mistakes. >> >> 1) Form the full Jacobian from the original "function" using analytic >> approach use it for both the matrix-vector product and to build the >> preconditioner. Problem if full Jacobian not well defined mathematically. >> Tough to code, usually not practical. 
>> >> 2) Do any matrix free (any way) for the full Jacobian and >> >> a) build another "approximate" Jacobian (using any technique analytic or >> finite differences using matrix coloring on a new "lower order" "function") >> Still can have trouble if this original Jacobian is no well defined >> >> b) "write your own preconditioner" that internally can use anything in >> PETSc that approximately solves the Jacobian. Same potential problems if >> original Jacobian is not differential, plus convergence will depend on how >> good your own preconditioner approximates the inverse of the true Jacobian. >> >> 3) Use a lower Jacobian (computed anyway you want) for the matrix-vector >> product and the preconditioner. The problem of differentiability is gone >> but convergence of the nonlinear solver depends on how well lower order >> Jacobian is appropriate for the original "function" >> >> a) Form the "lower order" Jacobian analytically or with coloring and >> use for both matrix-vector product and building preconditioner. Note that >> switching between this and 2a is trivial. >> >> b) Do the "lower order" Jacobian matrix free and provide your own >> PCSHELL. Note that switching between this and 2b is trivial. >> >> Barry >> >> I would first try competing the "true" Jacobian via coloring, if that >> works and give satisfactory results (fast enough) then stop. >> >> Then I would do 2a/2b by writing my "function" using PETScFV and writing >> the "lower order function" via PETScFV and use matrix coloring to get the >> Jacobian from the second "lower order function". If this works well (either >> with 2a or 3a or both) then stop or you can compute the "lower order" >> Jacobian analytically (again using PetscFV) for a more efficient evaluation >> of the Jacobian. >> >> > >> >> >> Thanks ! >> >> Thibault >> >> Le ven. 21 ao?t 2020 ? 17:22, Barry Smith a ?crit : >> >> >>> >>> On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu < >>> thibault.bridelbertomeu at gmail.com> wrote: >>> >>> >>> >>> Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley a >>> ?crit : >>> >>>> On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu < >>>> thibault.bridelbertomeu at gmail.com> wrote: >>>> >>>>> Sorry, I sent too soon, I hit the wrong key. >>>>> >>>>> I wanted to say that context.npoints is the local number of cells. >>>>> >>>>> PetscRHSFunctionImpl allows to generate the hyperbolic part of the >>>>> right hand side. 
>>>>> Then we have : >>>>> >>>>> PetscErrorCode PetscIJacobian( >>>>> TS ts, /*!< Time stepping object (see PETSc TS)*/ >>>>> PetscReal t, /*!< Current time */ >>>>> Vec Y, /*!< Solution vector */ >>>>> Vec Ydot, /*!< Time-derivative of solution vector */ >>>>> PetscReal a, /*!< Shift */ >>>>> Mat A, /*!< Jacobian matrix */ >>>>> Mat B, /*!< Preconditioning matrix */ >>>>> void *ctxt /*!< Application context */ >>>>> ) >>>>> { >>>>> PETScContext *context = (PETScContext*) ctxt; >>>>> HyPar *solver = context->solver; >>>>> _DECLARE_IERR_; >>>>> >>>>> PetscFunctionBegin; >>>>> solver->count_IJacobian++; >>>>> context->shift = a; >>>>> context->waqt = t; >>>>> /* Construct preconditioning matrix */ >>>>> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y, >>>>> context); CHECKERR(ierr); } >>>>> >>>>> PetscFunctionReturn(0); >>>>> } >>>>> >>>>> and PetscJacobianFunction_JFNK which I bind to the matrix shell, >>>>> computes the action of the jacobian on a vector : say U0 is the state of >>>>> reference and Y the vector upon which to apply the JFNK method, then the >>>>> PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + >>>>> epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift >>>>> comes from the TS). >>>>> The preconditioning matrix I compute as an approximation to the actual >>>>> jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in >>>>> each cell, a 4x4 matrix that is known exactly for the system of equations I >>>>> am solving, i.e. Euler equations. For the structured grid, I can loop on >>>>> the cells and do that 'Derivative' thing at first order by simply taking a >>>>> finite-difference like approximation with the neighboring cells, >>>>> Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by >>>>> block (JFunction is the dF/dU) >>>>> >>>>> /* diagonal element */ >>>>> >>>>> >>>>> for (v=0; v>>>> nvars*pg + v; } >>>>> >>>>> >>>>> ierr = solver->JFunction >>>>> >>>>> (values,(u+nvars*p),solver->physics >>>>> >>>>> ,dir,0); >>>>> >>>>> >>>>> _ArrayScale1D_ >>>>> >>>>> (values,(dxinv*iblank),(nvars*nvars)); >>>>> >>>>> >>>>> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); >>>>> CHKERRQ(ierr); >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> /* left neighbor */ >>>>> >>>>> >>>>> if (pgL >= 0) { >>>>> >>>>> >>>>> for (v=0; v>>>> nvars*pgL + v; } >>>>> >>>>> >>>>> ierr = solver->JFunction >>>>> >>>>> (values,(u+nvars*pL),solver->physics >>>>> >>>>> ,dir,1); >>>>> >>>>> >>>>> _ArrayScale1D_ >>>>> >>>>> (values,(-dxinv*iblank),(nvars*nvars)); >>>>> >>>>> >>>>> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); >>>>> CHKERRQ(ierr); >>>>> >>>>> >>>>> } >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> /* right neighbor */ >>>>> >>>>> >>>>> if (pgR >= 0) { >>>>> >>>>> >>>>> for (v=0; v>>>> nvars*pgR + v; } >>>>> >>>>> >>>>> ierr = solver->JFunction >>>>> >>>>> (values,(u+nvars*pR),solver->physics >>>>> >>>>> ,dir,-1); >>>>> >>>>> >>>>> _ArrayScale1D_ >>>>> >>>>> (values,(-dxinv*iblank),(nvars*nvars)); >>>>> >>>>> >>>>> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); >>>>> CHKERRQ(ierr); >>>>> >>>>> >>>>> } >>>>> >>>>> >>>>> >>>>> I do not know if I am clear here ... >>>>> Anyways, I am trying to figure out how to do this shell matrix and >>>>> this preconditioner using all the FV and DMPlex artillery. >>>>> >>>> >>>> Okay, that is very clear. 
We should be able to get the JFNK just with >>>> -snes_mf_operator, and put the approximate J construction in >>>> DMPlexComputeJacobian_Internal(). >>>> There is an FV section already, and we could just add this. I would >>>> need to understand those entries in the pointwise Riemann sense that the >>>> other stuff is now. >>>> >>> >>> Ok i had a quick look and if I understood correctly it would do the job. >>> Setting the snes-mf-operator flag would mean however that we have to go >>> through SNESSetJacobian to set the jacobian and the preconditioning matrix >>> wouldn't it ? >>> >>> >>> Thibault, >>> >>> Since the TS implicit methods end up using SNES internally the option >>> should be available to you without requiring you to be calling the SNES >>> routines directly >>> >>> Once you have finalized your approach and if for the implicit case >>> you always work in the snes mf operator mode you can hardwire >>> >>> TSGetSNES(ts,&snes); >>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >>> >>> in your code so you don't need to always provide the option >>> -snes-mf-operator >>> >> >>> Barry >>> >>> >>> >>> >>> There might be calls to the Riemann solver to evaluate the dRHS / dU >>> part yes but maybe it's possible to re-use what was computed for the RHS^n ? >>> In the FV section the jacobian is set to identity which I missed before, >>> but it could explain why when I used the following : >>> >>> TSSetType(ts, TSBEULER); >>> DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); >>> DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); >>> >>> with my FV discretization nothing happened, right ? >>> >>> Thank you, >>> >>> Thibault >>> >>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Le ven. 21 ao?t 2020 ? 14:55, Thibault Bridel-Bertomeu < >>>>> thibault.bridelbertomeu at gmail.com> a ?crit : >>>>> >>>> Hi, >>>>>> >>>>>> Thanks Matthew and Jed for your input. >>>>>> I indeed envision an implicit solver in the sense Jed mentioned - >>>>>> Jiri Blazek's book is a nice intro to this concept. >>>>>> >>>>>> Matthew, I do not know exactly what to change right now because >>>>>> although I understand globally what the DMPlexComputeXXXX_Internal methods >>>>>> do, I cannot say for sure line by line what is happening. >>>>>> In a structured code, I have a an implicit FVM solver with PETSc but >>>>>> I do not use any of the FV structure, not even a DM - I just use C arrays >>>>>> that I transform to PETSc Vec and Mat and build my IJacobian and my >>>>>> preconditioner and gives all that to a TS and it runs. I cannot figure out >>>>>> how to do it with the FV and the DM and all the underlying "shortcuts" that >>>>>> I want to use. >>>>>> >>>>>> Here is the top method for the structured code : >>>>>> >>>>>> int total_size = context.npoints * solver->nvars >>>>>> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); >>>>>> CHKERRQ(ierr); >>>>>> SNES snes; >>>>>> KSP ksp; >>>>>> PC pc; >>>>>> SNESType snestype; >>>>>> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >>>>>> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >>>>>> >>>>>> flag_mat_a = 1; >>>>>> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size, >>>>>> PETSC_DETERMINE, >>>>>> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >>>>>> context.jfnk_eps = 1e-7; >>>>>> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context. 
>>>>>> jfnk_eps,NULL); CHKERRQ(ierr); >>>>>> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void)) >>>>>> PetscJacobianFunction_JFNK); CHKERRQ(ierr); >>>>>> ierr = MatSetUp(A); CHKERRQ(ierr); >>>>>> >>>>>> context.flag_use_precon = 0; >>>>>> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",( >>>>>> PetscBool*)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >>>>>> >>>>>> /* Set up preconditioner matrix */ >>>>>> flag_mat_b = 1; >>>>>> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size, >>>>>> PETSC_DETERMINE,PETSC_DETERMINE, >>>>>> >>>>> (solver->ndims*2+1)*solver->nvars,NULL, >>>>>> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >>>>>> ierr = MatSetBlockSize(B,solver->nvars); >>>>>> /* Set the RHSJacobian function for TS */ >>>>>> >>>>> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); Thibault Bridel-Bertomeu >>>>>> ? >>>>>> Eng, MSc, PhD >>>>>> Research Engineer >>>>>> CEA/CESTA >>>>>> 33114 LE BARP >>>>>> Tel.: (+33)557046924 >>>>>> Mob.: (+33)611025322 >>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>> >>>>>> >>>>>> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown a ?crit : >>>>>> >>>>>>> Matthew Knepley writes: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> > I could never get the FVM stuff to make sense to me for implicit >>>>>>> methods. >>>>>>> >>>>>>> >>>>>>> > Here is my problem understanding. If you have an FVM method, it >>>>>>> decides >>>>>>> >>>>>>> >>>>>>> > to move "stuff" from one cell to its neighboring cells depending >>>>>>> on the >>>>>>> >>>>>>> >>>>>>> > solution to the Riemann problem on each face, which computed the >>>>>>> flux. This >>>>>>> >>>>>>> >>>>>>> > is >>>>>>> >>>>>>> >>>>>>> > fine unless the timestep is so big that material can flow through >>>>>>> into the >>>>>>> >>>>>>> >>>>>>> > cells beyond the neighbor. Then I should have considered the >>>>>>> effect of the >>>>>>> >>>>>>> >>>>>>> > Riemann problem for those interfaces. That would be in the >>>>>>> Jacobian, but I >>>>>>> >>>>>>> >>>>>>> > don't know how to compute that Jacobian. I guess you could do >>>>>>> everything >>>>>>> >>>>>>> >>>>>>> > matrix-free, but without a preconditioner it seems hard. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> So long as we're using method of lines, the flux is just >>>>>>> instantaneous flux, not integrated over some time step. It has the same >>>>>>> meaning for implicit and explicit. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> An explicit method would be unstable if you took such a large time >>>>>>> step (CFL) and an implicit method will not simultaneously be SSP and higher >>>>>>> than first order, but it's still a consistent discretization of the problem. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> It's common (done in FUN3D and others) to precondition with a >>>>>>> first-order method, where gradient reconstruction/limiting is skipped. >>>>>>> That's what I'd recommend because limiting creates nasty nonlinearities and >>>>>>> the resulting discretizations lack h-ellipticity which makes them very hard >>>>>>> to solve. >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >>>> >>>> >>> >>> >>> -- >> Thibault Bridel-Bertomeu >> ? 
>> Eng, MSc, PhD >> Research Engineer >> CEA/CESTA >> 33114 LE BARP >> Tel.: (+33)557046924 >> Mob.: (+33)611025322 >> Mail: thibault.bridelbertomeu at gmail.com >> >> >> >> > > > > > -- Thibault Bridel-Bertomeu ? Eng, MSc, PhD Research Engineer CEA/CESTA 33114 LE BARP Tel.: (+33)557046924 Mob.: (+33)611025322 Mail: thibault.bridelbertomeu at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Untitled.png Type: image/png Size: 120787 bytes Desc: not available URL: From bsmith at petsc.dev Mon Aug 24 11:41:45 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 11:41:45 -0500 Subject: [petsc-users] Disable PETSC_HAVE_CLOSURE In-Reply-To: References: <87364hibu8.fsf@jedbrown.org> Message-ID: <40BFDB75-F16D-4A7E-8678-C64077DC3164@petsc.dev> Hmm, would a pull request that required the compiler be C during the compile work? #if defined(PETSC_HAVE_CLOSURE) && !defined(__cplusplus) Barry > On Aug 24, 2020, at 11:29 AM, Fande Kong wrote: > > Thanks for your reply, Jed. > > > Barry, do you have any comment? > > Fande, > > On Thu, Aug 20, 2020 at 9:19 AM Jed Brown > wrote: > Barry, this is a side-effect of your Swift experiment. Does that need to be in a header (even if it's a private header)? > > The issue may be that you test with a C compiler and it gets included in C++ source. > > Fande Kong > writes: > > > Hi All, > > > > We (moose team) hit an error message when compiling PETSc, recently. The > > error is related to "PETSC_HAVE_CLOSURE." Everything runs well if I am > > going to turn this flag off by making the following changes: > > > > > > git diff > > diff --git a/config/BuildSystem/config/utilities/closure.py > > b/config/BuildSystem/config/utilities/closure.py > > index 6341ddf271..930e5b3b1b 100644 > > --- a/config/BuildSystem/config/utilities/closure.py > > +++ b/config/BuildSystem/config/utilities/closure.py > > @@ -19,8 +19,8 @@ class Configure(config.base.Configure): > > includes = '#include \n' > > body = 'int (^closure)(int);' > > self.pushLanguage('C') > > - if self.checkLink(includes, body): > > - self.addDefine('HAVE_CLOSURE','1') > > +# if self.checkLink(includes, body): > > +# self.addDefine('HAVE_CLOSURE','1') > > def configure(self): > > self.executeTest(self.configureClosure) > > > > > > I was wondering if there exists a configuration option to disable "Closure" > > C syntax? I did not find one by running "configuration --help" > > > > Please let me know if you need more information. 
> > > > > > Thanks, > > > > Fande, > > > > > > In file included from > > /Users/milljm/projects/moose/scripts/../libmesh/src/solvers/petscdmlibmesh.C:25: > > /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:29: > > warning: 'PetscVFPrintfSetClosure' initialized and declared 'extern' > > 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const > > char*)); > > | ^~~~~~~~~~~~~~~~~~~~~~~ > > /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:53: > > error: expected primary-expression before 'int' > > 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const > > char*)); > > | ^~~ > > CXX src/systems/libmesh_opt_la-equation_systems_io.lo > > In file included from > > /Users/milljm/projects/moose/petsc/include/petsc/private/dmimpl.h:7, > > from > > /Users/milljm/projects/moose/scripts/../libmesh/src/solvers/petscdmlibmeshimpl.C:26: > > /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:29: > > warning: 'PetscVFPrintfSetClosure' initialized and declared 'extern' > > 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const > > char*)); > > | ^~~~~~~~~~~~~~~~~~~~~~~ > > /Users/milljm/projects/moose/petsc/include/petsc/private/petscimpl.h:15:53: > > error: expected primary-expression before 'int' > > 15 | PETSC_EXTERN PetscErrorCode PetscVFPrintfSetClosure(int (^)(const > > char*)); -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 24 11:51:33 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 11:51:33 -0500 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> Message-ID: > On Aug 24, 2020, at 11:38 AM, Thibault Bridel-Bertomeu wrote: > > Thank you Barry for taking the time to go through the code ! > > I indeed figured out this afternoon that the function related to the matrix-vector product is always handling global vectors. I corrected mine so that it compiles, but I have a feeling it won't run properly without a preconditioner. > > Anyways you are right, my PetscJacobianFunction_JFNK() aims at doing some basic finite-differencing ; user->RHS_ref is my F(U) if you see the system as dU/dt = F(U). As for the DMGlobalToLocal() it was there mainly because I had not realized the vectors I was manipulating were global. > I will take your advice and try with just the SNESSetUseMatrixFree. > I haven't quite fully understood what it does "under the hood" though: just calling SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE) before the TSSolve call is enough to ensure that the implicit matrix is computed ? Does it use the function we set as a RHS to build the matrix ? All it does is "replace" the A matrix with one automatically created for the job using MatCreateMFFD(). It does not touch the B matrix, it does not build the matrix but yes if does use the function to provide to do the differencing. > > To create the preconditioner I will do as you suggest too, thank you. This matrix has to be as close as possible to the inverse of the implicit matrix to ensure that the eigenvalues of the system are as close to 1 as possible. Given the implicit matrix is built "automatically" thanks to the SNES matrix free capability, can we use that matrix as a starting point to the building of the preconditioner ? No the MatrixFree doesn't build a matrix, it can only do matrix-vector products with differencing. 
> You were talking about the coloring capabilities in PETSc, is that where it can be applied ? Yes you can use that. See MatFDColoringCreate() but since you are using a DM in theory you can use -snes_fd_color and PETSc will manage everything for you so you don't have to write any code for Jacobians at all. Again it uses your function to do differences using coloring to be efficient to build the Jacobian for you. Barry Internally it uses SNESComputeJacobianDefaultColor() if you are interested in what it does. > > Thank you so much again, > > Thibault > > > Le lun. 24 ao?t 2020 ? 15:45, Barry Smith > a ?crit : > > I think the attached is wrong. > > > > The input to the matrix vector product for the Jacobian is always global vectors which means on each process the dimension is not the size of the DMGetLocalVector() it should be the VecGetLocalSize() of the DMGetGlobalVector() > > But you may be able to skip all this and have the DM create the shell matrix setting it sizes appropriately and you only need to supply the MATOP > > DMSetMatType(dm,MATSHELL); > DMCreateMatrix(dm,&A); > > In fact, I also don't understand the PetscJacobianFunction_JFKN() function It seems to be doing finite differencing on the DMPlexTSComputeRHSFunctionFVM() assuming the current function value is in usr->RHS_ref. How is this different than just letting PETSc/SNES used finite differences to do the matrix-vector product. Your code seems rather complicated with the DMGlobalToLocal() which I don't understand what it is suppose to do there. > > I think you can just call > > TSGetSNES() > SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); > > and it will set up an internal matrix that does the finite differencing for you. Then you never need a shell matrix. > > > Also to create the preconditioner matrix B this should work > > DMSetMatType(dm,MATAIJ); > DMCreateMatrix(dm,&B); > > no need for you to figure out the sizes. > > > Note that both A and B need to have the same dimensions on each process as the global vectors which I don't think your current code has. > > > > Barry > > > > >> On Aug 24, 2020, at 12:56 AM, Thibault Bridel-Bertomeu > wrote: >> > > >> Barry, first of all, thank you very much for your detailed answer, I keep reading it to let it soak in - I might come back to you for more details if you do not mind. >> >> In the meantime, to fuel the conversation, I attach to this e-mail two pdfs containing the pieces of the code that regard what we are discussing. In the *timedisc.pdf, you'll find how I handle the initialization of the TS object, and in the *petscdefs.pdf you'll find the method that calls the TSSolve as well as the methods that are linked to the TS (the timestep adapt, the jacobian etc ...). [Sorry for the quality, I cannot do better than that sort of pdf ...] >> >> Based on what is in the structured code I sent you the other day, I rewrote the PetscJacobianFunction_JFNK. I think it should be all right, but although it compiles, execution raises a seg fault I think when I do >> ierr = TSSetIJacobian(ts, A, A, PetscIJacobian, user); >> saying that A does not have the right dimensions. It is quite new, I am still looking into where exactly the error is raised. What do you think of this implementation though, does it look correct in your expert eyes ? >> As for what we really discussed so far, it's that PetscComputePreconMatImpl that I do not know how to implement (with the derivative of the jacobian based on the FVM object). 
>> >> I understand now that what I am showing you today might not be the right way to go if one wants to really use the PetscFV, but I just wanted to add those code lines to the conversation to have your feedback. >> >> Thank you again for your help, >> >> Thibault >> >> > > >> Le ven. 21 ao?t 2020 ? 19:25, Barry Smith > a ?crit : > > >> >> >>> On Aug 21, 2020, at 10:58 AM, Thibault Bridel-Bertomeu > wrote: >>> >>> Thank you Barry for the tip ! I?ll make sure to do that when everything is set. >>> What I also meant is that there will not be any more direct way to set the preconditioner than to go through SNESSetJacobian after having assembled everything by hand ? Like, in my case, or in the more general case of fluid dynamics equations, the preconditioner is not a fun matrix to assemble, because for every cell the derivative of the physical flux jacobian has to be taken and put in the right block in the matrix - finite element style if you want. Is there a way to do that with Petsc methods, maybe short-circuiting the FEM based methods ? >> >> Thibault >> >> I am not sure what you mean but there are a couple of things that may be helpful. >> >> PCSHELL https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCSHELL.html <> allows you to build your own preconditioner (that can and often will use one or more of its own Mats, and KSP or PC inside it, or even use another PETScFV etc to build some of the sub matrices for you if it is appropriate), this approach means you never need to construct a "global" PETSc matrix from which PETSc builds the preconditioner. But you should only do this if the conventional approach is not reasonable for your problem. >> >> MATNEST https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATNEST.html allows you to build a global matrix by building parts of it separately and even skipping parts you decide you don't need in the preconditioner. Conceptually it is the same as just creating a global matrix and filling up but the process is a bit different and something suitable for "multi physics" or "multi-equation" type applications. >> >> Of course what you put into PCSHELL and MATNEST will affect the convergence of the nonlinear solver. As Jed noted what you put in the "Jacobian" does not have to be directly the same mathematically as what you put into the TSSetI/RHSFunction with the caveat that it does have to appropriate spectral properties to result in a good preconditioner for the "true" Jacobian. >> >> Couple of other notes: >> >> The entire business of "Jacobian" matrix-free or not (with for example -snes_fd_operator) is tricky because as Jed noted if your finite volume scheme has non-differential terms such as if () tests. There is a concept of sub-differential for this type of thing but I know absolutely nothing about that and probably not worth investigating. >> >> In this situation you can avoid the "true" Jacobian completely (both for matrix-vector product and preconditioner) and use something else as Jed suggested a lower order scheme that is differentiable. This can work well for solving the nonlinear system or not depending on how suitable it is for your original "function" >> >> >> 1) In theory at least you can have the Jacobian matrix-vector product computed directly using DMPLEX/PETScFV infrastructure (it would apply the Jacobian locally matrix-free using code similar to the code that evaluates the FV "function". 
I do no know if any of this code is written, it will be more efficient than -snes_mf_operator that evaluates the FV "function" and does traditional differencing to compute the Jacobian. Again it has the problem of non-differentialability if the function is not differential. But it could be done for a different (lower order scheme) that is differentiable. >> >> 2) You can have PETSc compute the Jacobian explicitly coloring and from that build the preconditioner, this allows you to avoid the hassle of writing the code for the derivatives yourself. This uses finite differences on your function and coloring of the graph to compute many columns of the Jacobian simultaneously and can be pretty efficient. Again if the function is not differential there can be issues of what the result means and will it work in a nonlinear solver. SNESComputeJacobianDefaultColor https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESComputeJacobianDefaultColor.html >> >> 3) Much more outlandish is to skip Newton and Jacobians completely and use the full approximation scheme SNESFAS https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNESFAS/SNESFAS.html this requires a grid hierarchy and appropriate way to interpolate up through the grid hierarchy your finite volume solutions. Probably not worth investigating unless you have lots of time on your hands and keen interest in this kind of stuff https://arxiv.org/pdf/1607.04254.pdf >> >> So to summarize, and Matt and Jed can correct my mistakes. >> >> 1) Form the full Jacobian from the original "function" using analytic approach use it for both the matrix-vector product and to build the preconditioner. Problem if full Jacobian not well defined mathematically. Tough to code, usually not practical. >> >> 2) Do any matrix free (any way) for the full Jacobian and >> >> a) build another "approximate" Jacobian (using any technique analytic or finite differences using matrix coloring on a new "lower order" "function") Still can have trouble if this original Jacobian is no well defined >> >> b) "write your own preconditioner" that internally can use anything in PETSc that approximately solves the Jacobian. Same potential problems if original Jacobian is not differential, plus convergence will depend on how good your own preconditioner approximates the inverse of the true Jacobian. >> >> 3) Use a lower Jacobian (computed anyway you want) for the matrix-vector product and the preconditioner. The problem of differentiability is gone but convergence of the nonlinear solver depends on how well lower order Jacobian is appropriate for the original "function" >> >> a) Form the "lower order" Jacobian analytically or with coloring and use for both matrix-vector product and building preconditioner. Note that switching between this and 2a is trivial. >> >> b) Do the "lower order" Jacobian matrix free and provide your own PCSHELL. Note that switching between this and 2b is trivial. >> >> Barry >> >> I would first try competing the "true" Jacobian via coloring, if that works and give satisfactory results (fast enough) then stop. >> >> Then I would do 2a/2b by writing my "function" using PETScFV and writing the "lower order function" via PETScFV and use matrix coloring to get the Jacobian from the second "lower order function". If this works well (either with 2a or 3a or both) then stop or you can compute the "lower order" Jacobian analytically (again using PetscFV) for a more efficient evaluation of the Jacobian. >> > >> >>> >>> Thanks ! 
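[For the 2b/3b route above, "write your own preconditioner", the PCSHELL skeleton is roughly the following; MyPCCtx, MyPCApply and the inner KSP are illustrative names and not anything from this thread, and how the approximate solve is built is entirely up to the application:]

    #include <petscsnes.h>

    typedef struct {
      KSP inner;   /* e.g. a solver for an approximate / lower-order Jacobian */
    } MyPCCtx;

    static PetscErrorCode MyPCApply(PC pc, Vec x, Vec y)
    {
      MyPCCtx        *ctx;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = PCShellGetContext(pc, (void **)&ctx);CHKERRQ(ierr);
      /* y <- (approximate Jacobian)^{-1} x, using whatever machinery ctx holds */
      ierr = KSPSolve(ctx->inner, x, y);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

    /* hook it up once ts/snes exist and myctx has been filled in */
    KSP ksp; PC pc; MyPCCtx myctx;
    ierr = SNESGetKSP(snes, &ksp);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCSHELL);CHKERRQ(ierr);
    ierr = PCShellSetContext(pc, &myctx);CHKERRQ(ierr);
    ierr = PCShellSetApply(pc, MyPCApply);CHKERRQ(ierr);

[Note that the convergence of the outer Newton/Krylov iteration then depends entirely on how well this shell approximates the inverse of the true Jacobian, as discussed above.]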
>>> >>> Thibault >>> > >>> Le ven. 21 ao?t 2020 ? 17:22, Barry Smith > a ?crit : > > >>> >>> >>>> On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu > wrote: >>>> >>>> >>>> >>>> Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley > a ?crit : >>>> On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu > wrote: >>>> Sorry, I sent too soon, I hit the wrong key. >>>> >>>> I wanted to say that context.npoints is the local number of cells. >>>> >>>> PetscRHSFunctionImpl allows to generate the hyperbolic part of the right hand side. >>>> Then we have : >>>> >>>> PetscErrorCode PetscIJacobian( >>>> TS ts, /*!< Time stepping object (see PETSc TS)*/ >>>> PetscReal t, /*!< Current time */ >>>> Vec Y, /*!< Solution vector */ >>>> Vec Ydot, /*!< Time-derivative of solution vector */ >>>> PetscReal a, /*!< Shift */ >>>> Mat A, /*!< Jacobian matrix */ >>>> Mat B, /*!< Preconditioning matrix */ >>>> void *ctxt /*!< Application context */ >>>> ) >>>> { >>>> PETScContext *context = (PETScContext*) ctxt; >>>> HyPar *solver = context->solver; >>>> _DECLARE_IERR_; >>>> >>>> PetscFunctionBegin; >>>> solver->count_IJacobian++; >>>> context->shift = a; >>>> context->waqt = t; >>>> /* Construct preconditioning matrix */ >>>> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y,context); CHECKERR(ierr); } >>>> >>>> PetscFunctionReturn(0); >>>> } >>>> >>>> and PetscJacobianFunction_JFNK which I bind to the matrix shell, computes the action of the jacobian on a vector : say U0 is the state of reference and Y the vector upon which to apply the JFNK method, then the PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift comes from the TS). >>>> The preconditioning matrix I compute as an approximation to the actual jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in each cell, a 4x4 matrix that is known exactly for the system of equations I am solving, i.e. Euler equations. For the structured grid, I can loop on the cells and do that 'Derivative' thing at first order by simply taking a finite-difference like approximation with the neighboring cells, Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by block (JFunction is the dF/dU) >>>> >>>> /* diagonal element */ >>>> >>>> >>>> <> for (v=0; v>>> >>>> >>>> <> ierr = solver->JFunction (values,(u+nvars*p),solver->physics ,dir,0); >>>> >>>> >>>> <> _ArrayScale1D_ (values,(dxinv*iblank),(nvars*nvars)); >>>> >>>> >>>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>> >>>> >>>> <> >>>> >>>> >>>> <> /* left neighbor */ >>>> >>>> >>>> <> if (pgL >= 0) { >>>> >>>> >>>> <> for (v=0; v>>> >>>> >>>> <> ierr = solver->JFunction (values,(u+nvars*pL),solver->physics ,dir,1); >>>> >>>> >>>> <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); >>>> >>>> >>>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>> >>>> >>>> <> } >>>> >>>> >>>> <> >>>> >>>> >>>> <> /* right neighbor */ >>>> >>>> >>>> <> if (pgR >= 0) { >>>> >>>> >>>> <> for (v=0; v>>> >>>> >>>> <> ierr = solver->JFunction (values,(u+nvars*pR),solver->physics ,dir,-1); >>>> >>>> >>>> <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); >>>> >>>> >>>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>> >>>> >>>> <> } >>>> >>>> >>>> >>>> I do not know if I am clear here ... 
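[In code, the matrix-free product just described, shift*Y - (F(U0 + eps*Y) - F(U0))/eps, amounts to something like the following MATSHELL multiply. The context layout, JFNKMult and ComputeRHS() are illustrative names rather than the poster's actual routines, and in practice eps is usually rescaled using norms of U0 and Y:]

    extern PetscErrorCode ComputeRHS(void *app, Vec U, Vec F);   /* hypothetical residual evaluation */

    typedef struct {
      Vec       U0, FU0, Upert, Fpert;   /* reference state, F(U0), work vectors (global layout) */
      PetscReal shift, eps;
      void      *app;                    /* whatever ComputeRHS needs */
    } JFNKCtx;

    static PetscErrorCode JFNKMult(Mat A, Vec Y, Vec JY)
    {
      JFNKCtx        *ctx;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatShellGetContext(A, &ctx);CHKERRQ(ierr);
      ierr = VecWAXPY(ctx->Upert, ctx->eps, Y, ctx->U0);CHKERRQ(ierr);     /* Upert = U0 + eps*Y        */
      ierr = ComputeRHS(ctx->app, ctx->Upert, ctx->Fpert);CHKERRQ(ierr);   /* Fpert = F(U0 + eps*Y)     */
      ierr = VecCopy(ctx->Fpert, JY);CHKERRQ(ierr);
      ierr = VecAXPY(JY, -1.0, ctx->FU0);CHKERRQ(ierr);                    /* JY = Fpert - F(U0)        */
      ierr = VecScale(JY, -1.0/ctx->eps);CHKERRQ(ierr);                    /* JY = -(Fpert - F(U0))/eps */
      ierr = VecAXPY(JY, ctx->shift, Y);CHKERRQ(ierr);                     /* JY += shift*Y             */
      PetscFunctionReturn(0);
    }

[The input and output vectors here must be global vectors, which is exactly the sizing issue raised above.]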
>>>> Anyways, I am trying to figure out how to do this shell matrix and this preconditioner using all the FV and DMPlex artillery. >>>> >>>> Okay, that is very clear. We should be able to get the JFNK just with -snes_mf_operator, and put the approximate J construction in DMPlexComputeJacobian_Internal(). >>>> There is an FV section already, and we could just add this. I would need to understand those entries in the pointwise Riemann sense that the other stuff is now. >>>> >>>> Ok i had a quick look and if I understood correctly it would do the job. Setting the snes-mf-operator flag would mean however that we have to go through SNESSetJacobian to set the jacobian and the preconditioning matrix wouldn't it ? >>> >>> Thibault, >>> >>> Since the TS implicit methods end up using SNES internally the option should be available to you without requiring you to be calling the SNES routines directly >>> >>> Once you have finalized your approach and if for the implicit case you always work in the snes mf operator mode you can hardwire >>> >>> TSGetSNES(ts,&snes); >>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >>> >>> in your code so you don't need to always provide the option -snes-mf-operator > >>> >>> Barry >>> >>> >>> > >>>> There might be calls to the Riemann solver to evaluate the dRHS / dU part yes but maybe it's possible to re-use what was computed for the RHS^n ? >>>> In the FV section the jacobian is set to identity which I missed before, but it could explain why when I used the following : >>>> TSSetType(ts, TSBEULER); >>>> DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); >>>> DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); >>>> with my FV discretization nothing happened, right ? >>>> >>>> Thank you, >>>> >>>> Thibault >>>> > >>>> Thanks, >>>> >>>> Matt >>>> > >>>> Le ven. 21 ao?t 2020 ? 14:55, Thibault Bridel-Bertomeu > a ?crit : > > >>>> Hi, >>>> >>>> Thanks Matthew and Jed for your input. >>>> I indeed envision an implicit solver in the sense Jed mentioned - Jiri Blazek's book is a nice intro to this concept. >>>> >>>> Matthew, I do not know exactly what to change right now because although I understand globally what the DMPlexComputeXXXX_Internal methods do, I cannot say for sure line by line what is happening. >>>> In a structured code, I have a an implicit FVM solver with PETSc but I do not use any of the FV structure, not even a DM - I just use C arrays that I transform to PETSc Vec and Mat and build my IJacobian and my preconditioner and gives all that to a TS and it runs. I cannot figure out how to do it with the FV and the DM and all the underlying "shortcuts" that I want to use. 
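[For reference, with a PetscFV discretization the DM-level shortcuts are usually wired through the FVM variant rather than the FEM routines mentioned above; a minimal sketch, assuming `dm` and an application context `ctx` and that the explicit/RHS form is wanted (check availability in your PETSc version), is:]

    ierr = DMTSSetRHSFunctionLocal(dm, DMPlexTSComputeRHSFunctionFVM, &ctx);CHKERRQ(ierr);
    /* then let TS/SNES difference this residual, e.g. via -snes_mf_operator or -snes_fd_color */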
>>>> >>>> Here is the top method for the structured code : >>>> > > >>>> int total_size = context.npoints * solver->nvars >>>> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); CHKERRQ(ierr); >>>> SNES snes; >>>> KSP ksp; >>>> PC pc; >>>> SNESType snestype; >>>> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >>>> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >>>> >>>> flag_mat_a = 1; >>>> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, >>>> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >>>> context.jfnk_eps = 1e-7; >>>> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps,NULL); CHKERRQ(ierr); >>>> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void))PetscJacobianFunction_JFNK); CHKERRQ(ierr); >>>> ierr = MatSetUp(A); CHKERRQ(ierr); >>>> >>>> context.flag_use_precon = 0; >>>> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool*)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >>>> >>>> /* Set up preconditioner matrix */ >>>> flag_mat_b = 1; >>>> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE,PETSC_DETERMINE, > >>>> (solver->ndims*2+1)*solver->nvars,NULL, >>>> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >>>> ierr = MatSetBlockSize(B,solver->nvars); >>>> /* Set the RHSJacobian function for TS */ > > ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); > >>>> Thibault Bridel-Bertomeu >>>> ? >>>> Eng, MSc, PhD >>>> Research Engineer >>>> CEA/CESTA >>>> 33114 LE BARP >>>> Tel.: (+33)557046924 >>>> Mob.: (+33)611025322 >>>> Mail: thibault.bridelbertomeu at gmail.com >>>> > > >>>> >>>> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown > a ?crit : >>>> Matthew Knepley > writes: >>>> >>>> >>>> >>>> >>>> >>>> > I could never get the FVM stuff to make sense to me for implicit methods. >>>> >>>> >>>> > Here is my problem understanding. If you have an FVM method, it decides >>>> >>>> >>>> > to move "stuff" from one cell to its neighboring cells depending on the >>>> >>>> >>>> > solution to the Riemann problem on each face, which computed the flux. This >>>> >>>> >>>> > is >>>> >>>> >>>> > fine unless the timestep is so big that material can flow through into the >>>> >>>> >>>> > cells beyond the neighbor. Then I should have considered the effect of the >>>> >>>> >>>> > Riemann problem for those interfaces. That would be in the Jacobian, but I >>>> >>>> >>>> > don't know how to compute that Jacobian. I guess you could do everything >>>> >>>> >>>> > matrix-free, but without a preconditioner it seems hard. >>>> >>>> >>>> >>>> >>>> >>>> So long as we're using method of lines, the flux is just instantaneous flux, not integrated over some time step. It has the same meaning for implicit and explicit. >>>> >>>> >>>> >>>> >>>> >>>> An explicit method would be unstable if you took such a large time step (CFL) and an implicit method will not simultaneously be SSP and higher than first order, but it's still a consistent discretization of the problem. >>>> >>>> >>>> >>>> >>>> >>>> It's common (done in FUN3D and others) to precondition with a first-order method, where gradient reconstruction/limiting is skipped. That's what I'd recommend because limiting creates nasty nonlinearities and the resulting discretizations lack h-ellipticity which makes them very hard to solve. 
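[A hedged sketch of that last recommendation in PETSc terms, corresponding to the 2a/3a route in Barry's summary: keep the full (limited) residual as the TS/SNES function, and difference a separate first-order, limiter-free residual with matrix coloring to assemble the preconditioning matrix B. FirstOrderResidual() and the context are hypothetical names, and B is assumed already preallocated and assembled with the first-order stencil, e.g. via DMCreateMatrix():]

    #include <petscsnes.h>

    extern PetscErrorCode FirstOrderResidual(SNES snes, Vec X, Vec F, void *ctx);  /* hypothetical limiter-free residual */

    static PetscErrorCode BuildLowOrderColoring(Mat B, void *ctx, MatFDColoring *fdcoloring)
    {
      MatColoring    mc;
      ISColoring     iscoloring;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatColoringCreate(B, &mc);CHKERRQ(ierr);
      ierr = MatColoringSetType(mc, MATCOLORINGSL);CHKERRQ(ierr);
      ierr = MatColoringApply(mc, &iscoloring);CHKERRQ(ierr);
      ierr = MatColoringDestroy(&mc);CHKERRQ(ierr);
      ierr = MatFDColoringCreate(B, iscoloring, fdcoloring);CHKERRQ(ierr);
      ierr = MatFDColoringSetFunction(*fdcoloring, (PetscErrorCode (*)(void))FirstOrderResidual, ctx);CHKERRQ(ierr);
      ierr = MatFDColoringSetUp(B, iscoloring, *fdcoloring);CHKERRQ(ierr);
      ierr = ISColoringDestroy(&iscoloring);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

[A SNESSetJacobian() callback could then refresh B each Newton step with MatFDColoringApply(B, fdcoloring, X, snes), while the true Jacobian action stays matrix-free with -snes_mf_operator; details of the callback signatures should be checked against the PETSc version in use.]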
>>>> >>>> >>>> >>>> > >>>> >>>> > >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ > >>>> >>>> >>>> >>>> >>> > >>> -- >>> Thibault Bridel-Bertomeu >>> ? >>> Eng, MSc, PhD >>> Research Engineer >>> CEA/CESTA >>> 33114 LE BARP >>> Tel.: (+33)557046924 >>> Mob.: (+33)611025322 >>> Mail: thibault.bridelbertomeu at gmail.com >>> >>> > >> >> >> >> > > > > -- > Thibault Bridel-Bertomeu > ? > Eng, MSc, PhD > Research Engineer > CEA/CESTA > 33114 LE BARP > Tel.: (+33)557046924 > Mob.: (+33)611025322 > Mail: thibault.bridelbertomeu at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 24 11:53:01 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 11:53:01 -0500 Subject: [petsc-users] Bus Error In-Reply-To: <87h7ssgg0g.fsf@jedbrown.org> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> Message-ID: <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> Jed, Even for BLAS 1 operations? Barry > On Aug 24, 2020, at 11:33 AM, Jed Brown wrote: > > Barry Smith writes: > >> Some of the MKL versions of BLAS use the new Phi-like SIMD instructions that require 64 byte alignment but this probably does not apply to you. > > They shouldn't ever require that the input array be aligned. For large sizes, they'll always be packing tiles anyway, in which case MKL (or BLIS, etc) is responsible for alignment of the tiles. From jed at jedbrown.org Mon Aug 24 11:58:04 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 24 Aug 2020 10:58:04 -0600 Subject: [petsc-users] Bus Error In-Reply-To: <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> Message-ID: <87blj0gew3.fsf@jedbrown.org> Barry Smith writes: > Even for BLAS 1 operations? Yes, and requiring alignment is untenable given how algorithms like unblocked Householder work on a subset of the original column (and simpler cases where lda doesn't maintain alignment). From bsmith at petsc.dev Mon Aug 24 12:11:34 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 12:11:34 -0500 Subject: [petsc-users] Bus Error In-Reply-To: <87blj0gew3.fsf@jedbrown.org> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> Message-ID: So if a BLAS errors with SIGBUS then it is always an input error of just not proper double/complex alignment? Or some other very strange thing? > On Aug 24, 2020, at 11:58 AM, Jed Brown wrote: > > Barry Smith writes: > >> Even for BLAS 1 operations? > > Yes, and requiring alignment is untenable given how algorithms like unblocked Householder work on a subset of the original column (and simpler cases where lda doesn't maintain alignment). 
From jed at jedbrown.org Mon Aug 24 12:31:01 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 24 Aug 2020 11:31:01 -0600 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> Message-ID: <878se4gdd6.fsf@jedbrown.org> Barry Smith writes: > So if a BLAS errors with SIGBUS then it is always an input error of just not proper double/complex alignment? Or some other very strange thing? I would suspect memory corruption. From bsmith at petsc.dev Mon Aug 24 12:35:24 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 12:35:24 -0500 Subject: [petsc-users] Bus Error In-Reply-To: <878se4gdd6.fsf@jedbrown.org> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> Message-ID: <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> > On Aug 24, 2020, at 12:31 PM, Jed Brown wrote: > > Barry Smith writes: > >> So if a BLAS errors with SIGBUS then it is always an input error of just not proper double/complex alignment? Or some other very strange thing? > > I would suspect memory corruption. Corruption meaning what specifically? The routines crashing are dgemv which only take double precision arrays, regardless of what garbage is in those arrays i don't think there can be BUS errors resulting. They don't take integer arrays whose corruption could result in bad indexing and then BUS errors. So then it can only be corruption of the pointers passed in, correct? From jed at jedbrown.org Mon Aug 24 12:39:15 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 24 Aug 2020 11:39:15 -0600 Subject: [petsc-users] Bus Error In-Reply-To: <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> Message-ID: <87364cgczg.fsf@jedbrown.org> Barry Smith writes: >> On Aug 24, 2020, at 12:31 PM, Jed Brown wrote: >> >> Barry Smith writes: >> >>> So if a BLAS errors with SIGBUS then it is always an input error of just not proper double/complex alignment? Or some other very strange thing? >> >> I would suspect memory corruption. > > > Corruption meaning what specifically? > > The routines crashing are dgemv which only take double precision arrays, regardless of what garbage is in those arrays i don't think there can be BUS errors resulting. They don't take integer arrays whose corruption could result in bad indexing and then BUS errors. > > So then it can only be corruption of the pointers passed in, correct? Such as those pointers pointing into data on the stack with incorrect sizes. 
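[A crude illustration of the kind of per-argument check under discussion, not the actual patch referenced in the next message, just the idea: verify that each array handed to the BLAS is a non-NULL, double-aligned pointer and deliberately touch both ends so a bad pointer faults close to the offending call with a useful stack:]

    #include <stdint.h>
    #include <petscsys.h>

    /* Illustrative only: called immediately before a BLAS call, once per double* argument. */
    static PetscErrorCode CheckBLASDoubleArg(const double *a, PetscInt n, PetscInt argnum)
    {
      PetscFunctionBeginUser;
      if (!a || ((uintptr_t)a) % sizeof(double)) SETERRQ1(PETSC_COMM_SELF, PETSC_ERR_PLIB, "BLAS argument %D is not a usable double pointer", argnum);
      if (n > 0) {
        volatile double first = a[0], last = a[n-1];   /* read both ends of the array */
        (void)first; (void)last;
      }
      PetscFunctionReturn(0);
    }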
From balay at mcs.anl.gov Mon Aug 24 12:40:05 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 24 Aug 2020 12:40:05 -0500 (CDT) Subject: [petsc-users] Bus Error In-Reply-To: <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> Message-ID: On Mon, 24 Aug 2020, Barry Smith wrote: > > > > On Aug 24, 2020, at 12:31 PM, Jed Brown wrote: > > > > Barry Smith writes: > > > >> So if a BLAS errors with SIGBUS then it is always an input error of just not proper double/complex alignment? Or some other very strange thing? > > > > I would suspect memory corruption. > > > Corruption meaning what specifically? > > The routines crashing are dgemv which only take double precision arrays, regardless of what garbage is in those arrays i don't think there can be BUS errors resulting. They don't take integer arrays whose corruption could result in bad indexing and then BUS errors. > > So then it can only be corruption of the pointers passed in, correct? My wild guess here is - some hardware is misbehaving [on severe load/overheating/insufficient-coolring]. Some errors should be detected/corrected by ECC RAM - but perhaps not all failures get detected? Satish From bsmith at petsc.dev Mon Aug 24 12:40:20 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 12:40:20 -0500 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> Message-ID: <6855D8C6-99D7-43A9-BB68-60E02B6A9513@petsc.dev> Mark, I have attached a patch file you can apply to PETSc with patch -p1 < blascheck.patch then build the debug version of PETSc and run your crashing problem. Then checks all the input double precision arrays passed to BLAS that are crashing in your code for every call. If the pointer is not usable as a double precision pointer it will error and print the argument number of the BLAS call and the stack. This may give us a bit more information about the problem than before. For example if there is memory corruption that changes one of the pointers used in the BLAS we will now know which one. Barry > On Aug 24, 2020, at 10:15 AM, Mark Lohry wrote: > > Do you ever use regular malloc()? PETSc malloc aligns automatically, but the system one does not. > > Indirectly via new, yes. > > On Mon, Aug 24, 2020 at 11:10 AM Matthew Knepley > wrote: > On Mon, Aug 24, 2020 at 10:56 AM Mark Lohry > wrote: > Thanks Barry, I'll give -malloc_debug a shot. > > I know this is not necessarily a reasonable test but if you run the exact same thing twice does it crash at the same location in terms of iterations or does it seem to crash eventually "randomly" just after a long time? > > Crashes after a different number of iterations, seemingly random. > > > I understand the frustration with this kind of crash, it just shouldn't happen because the same BLAS calls have been made in the same way thousands of times and yet suddenly trouble and very hard to debug. > > Eventually makes for a good war story. 
> > Thinking back, I have seen some disturbing memory behavior that I think falls back to my use of eigen... e.g. in the past when running my full test suite a particular case would fail with NaNs, but if I ran that case in isolation it passes. I wonder if some object isn't getting properly aligned and at some point some kind of corruption occurs? > > Do you ever use regular malloc()? PETSc malloc aligns automatically, but the system one does not. > > Thanks, > > Matt > > On Mon, Aug 24, 2020 at 10:35 AM Barry Smith > wrote: > > Mark, > > Ok, I'd generally trust the stock BLAS for not failing over OpenBLAS. > > Since valgrind is not viable have you tried with -malloc_debug with the bad case it will be a little bit slower but not to bad and can find some memory corruption issues. > > It might be useful to get the stack trace inside the BLAS to see exactly where it crashes. If you ./configure with debugging and use --download-fblaslapack or --download-f2cblaslapack it will compile the BLAS with debugging, but just running a batch job still won't display the stack frames inside the BLAS call. > > We have an option -on_error_attach_debugger which is useful for longer many rank runs that attaches the debugger ONLY when the error is detected but it may not play well with batch systems. But if you can make your run on a non-batch system it might be able, along with the --download-fblaslapack or --download-f2cblaslapack to get the exact stack frames. And in the debugger look at the variables and address points to try to determine how it could have gone wrong. > > I know this is not necessarily a reasonable test but if you run the exact same thing twice does it crash at the same location in terms of iterations or does it seem to crash eventually "randomly" just after a long time? > > I understand the frustration with this kind of crash, it just shouldn't happen because the same BLAS calls have been made in the same way thousands of times and yet suddenly trouble and very hard to debug. > > Barry > > > > >> On Aug 24, 2020, at 9:15 AM, Mark Lohry > wrote: >> >> valgrind: I ran a much smaller case and didn't see any issues in valgrind. I'm only seeing this bus error on several hundred cores a few hours wallclock in, so it might not be feasible to run that in valgrind. >> >> blas: i'm not entirely sure -- it's the stock one in PUIAS linux (red hat derivative), libblas.so.3.4.2.. i'm going to try with intel and if that fails use the openblas downloaded via petsc and see if it alleviates itself. >> >> >> >> On Mon, Aug 24, 2020 at 9:48 AM Barry Smith > wrote: >> >> Mark, >> >> Can you run in valgrind? >> >> Exactly what BLAS are you using? >> >> Barry >> >> >>> On Aug 24, 2020, at 7:54 AM, Mark Lohry > wrote: >>> >>> Reran with debug mode and got a stack trace for this bus error, looks like it's happening in BLASgemv, see pasted below. I did take care of the ISColoring leak mentioned previously, although that was a very small amount of data and I don't think is relevant here. >>> >>> At this point it's happily run 222 timesteps prior to this, so I'm a little mystified. Any ideas? 
>>> >>> Thanks, >>> Mark >>> >>> >>> 222 TS dt 0.03 time 6.66 >>> 0 SNES Function norm 4.124287265556e+02 >>> 0 KSP Residual norm 4.124287265556e+02 >>> 1 KSP Residual norm 4.123248052318e+02 >>> 2 KSP Residual norm 4.123173350456e+02 >>> 3 KSP Residual norm 4.118769044110e+02 >>> 4 KSP Residual norm 4.094856150740e+02 >>> 5 KSP Residual norm 4.006000788078e+02 >>> 6 KSP Residual norm 3.787922969183e+02 >>> [clip] >>> Linear solve converged due to CONVERGED_RTOL iterations 9 >>> Line search: Using full step: fnorm 4.015236590684e+01 gnorm 3.173434863784e+00 >>> 2 SNES Function norm 3.173434863784e+00 >>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 2 >>> 0 SNES Function norm 5.842010710080e+02 >>> 0 KSP Residual norm 5.842010710080e+02 >>> 1 KSP Residual norm 5.840526408234e+02 >>> 2 KSP Residual norm 5.840431857354e+02 >>> 3 KSP Residual norm 5.834351392302e+02 >>> 4 KSP Residual norm 5.800901047861e+02 >>> 5 KSP Residual norm 5.675562288567e+02 >>> 6 KSP Residual norm 5.366287895681e+02 >>> 7 KSP Residual norm 4.725811521866e+02 >>> [911]PETSC ERROR: ------------------------------------------------------------------------ >>> [911]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>> [911]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>> [911]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>> [911]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >>> [911]PETSC ERROR: likely location of problem given in stack below >>> [911]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >>> [911]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >>> [911]PETSC ERROR: INSTEAD the line number of the start of the function >>> [911]PETSC ERROR: is given. 
>>> [911]PETSC ERROR: [911] BLASgemv line 1393 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >>> [911]PETSC ERROR: [911] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 /home/mlohry/build/external/petsc/src/mat/impls/baij/seq/baijfact.c >>> [911]PETSC ERROR: [911] MatSolve line 3354 /home/mlohry/build/external/petsc/src/mat/interface/matrix.c >>> [911]PETSC ERROR: [911] PCApply_ILU line 201 /home/mlohry/build/external/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >>> [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >>> [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >>> [911]PETSC ERROR: [911] KSPSolve_PREONLY line 16 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/preonly/preonly.c >>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>> [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>> [911]PETSC ERROR: [911] PCApply_ASM line 441 /home/mlohry/build/external/petsc/src/ksp/pc/impls/asm/asm.c >>> [911]PETSC ERROR: [911] PCApply line 426 /home/mlohry/build/external/petsc/src/ksp/pc/interface/precon.c >>> [911]PETSC ERROR: [911] KSP_PCApply line 279 /home/mlohry/build/external/petsc/include/petsc/private/kspimpl.h >>> [911]PETSC ERROR: [911] KSPFGMRESCycle line 108 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>> [911]PETSC ERROR: [911] KSPSolve_FGMRES line 274 /home/mlohry/build/external/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>> [911]PETSC ERROR: [911] KSPSolve_Private line 590 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>> [911]PETSC ERROR: [911] KSPSolve line 848 /home/mlohry/build/external/petsc/src/ksp/ksp/interface/itfunc.c >>> [911]PETSC ERROR: [911] SNESSolve_NEWTONLS line 144 /home/mlohry/build/external/petsc/src/snes/impls/ls/ls.c >>> [911]PETSC ERROR: [911] SNESSolve line 4403 /home/mlohry/build/external/petsc/src/snes/interface/snes.c >>> [911]PETSC ERROR: [911] TSStep_ARKIMEX line 728 /home/mlohry/build/external/petsc/src/ts/impls/arkimex/arkimex.c >>> [911]PETSC ERROR: [911] TSStep line 3682 /home/mlohry/build/external/petsc/src/ts/interface/ts.c >>> [911]PETSC ERROR: [911] TSSolve line 4005 /home/mlohry/build/external/petsc/src/ts/interface/ts.c >>> [911]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [911]PETSC ERROR: Signal received >>> [911]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> [911]PETSC ERROR: Petsc Release Version 3.13.3, Jul 01, 2020 >>> [911]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n20 by mlohry Sun Aug 23 19:54:21 2020 >>> [911]PETSC ERROR: Configure options PETSC_DIR=/home/mlohry/build/external/petsc PETSC_ARCH=arch-linux2-c-opt --with-cc=/usr/local/openmpi/3.1.3/gcc/x8 >>> [911]PETSC ERROR: #1 User provided function() line 0 in unknown file >>> -------------------------------------------------------------------------- >>> MPI_ABORT was invoked on rank 911 in communicator MPI_COMM_WORLD >>> >>> On Wed, Aug 12, 2020 at 8:19 PM Mark Lohry > wrote: >>> Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()? >>> >>> I have matching ISColoringGet/Restore here, and it's only used prior to the first iteration so at least it doesn't seem to be growing. 
At the bottom I pasted the malloc_view and malloc_debug output from running 1 time step. >>> >>> I'm sort of thinking this might be a red herring -- is it possible the rank 0 process is chewing up dramatically more memory than others, like with logging or something? Like I mentioned earlier the total memory usage is well under the machine limits. I'll spring in some PetscMemoryGetMaximumUsage logging at every time step and try to get a big job going again. >>> >>> >>> >>> Are you using Fortran? >>> >>> C++ >>> >>> >>> >>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 
bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>> [ 0]64 bytes ISColoringGetIS() line 266 in 
/home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>> [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>> [0] Maximum memory PetscMalloc()ed 610153776 maximum size of entire process 719073280 >>> [0] Memory usage sorted by function >>> [0] 6 192 DMCoarsenHookAdd() >>> [0] 2 9984 DMCreate() >>> [0] 2 128 DMCreate_Shell() >>> [0] 2 64 DMDSEnlarge_Static() >>> [0] 1 672 DMKSPCreate() >>> [0] 3 96 DMRefineHookAdd() >>> [0] 3 2064 DMSNESCreate() >>> [0] 4 128 DMSubDomainHookAdd() >>> [0] 1 768 DMTSCreate() >>> [0] 2 96 ISColoringCreate() >>> [0] 8 12608 ISColoringGetIS() >>> [0] 1 307200 ISConcatenate() >>> [0] 29 25984 ISCreate() >>> [0] 25 400 ISCreate_General() >>> [0] 4 64 ISCreate_Stride() >>> [0] 20 338016 ISGeneralSetIndices_General() >>> [0] 3 921600 ISGetIndices_Stride() >>> [0] 2 307232 ISGlobalToLocalMappingSetUp_Basic() >>> [0] 1 6144 ISInvertPermutation_General() >>> [0] 3 308576 ISLocalToGlobalMappingCreate() >>> [0] 2 32 KSPConvergedDefaultCreate() >>> [0] 2 2816 KSPCreate() >>> [0] 1 224 KSPCreate_FGMRES() >>> [0] 1 8016 KSPGMRESClassicalGramSchmidtOrthogonalization() >>> [0] 2 16032 KSPSetUp_FGMRES() >>> [0] 4 16084160 KSPSetUp_GMRES() >>> [0] 2 36864 MatColoringApply_SL() >>> [0] 1 656 MatColoringCreate() >>> [0] 6 17088 MatCreate() >>> [0] 1 16 MatCreateMFFD_WP() >>> [0] 1 16 MatCreateSubMatrices_SeqBAIJ() >>> [0] 1 12288 MatCreateSubMatrix_SeqBAIJ() >>> [0] 3 32320 MatCreateSubMatrix_SeqBAIJ_Private() >>> [0] 2 1472 MatCreate_MFFD() >>> [0] 1 416 MatCreate_SeqAIJ() >>> [0] 3 864 MatCreate_SeqBAIJ() >>> [0] 2 416 MatCreate_Shell() >>> [0] 1 784 MatFDColoringCreate() >>> [0] 2 12288 MatFDColoringDegreeSequence_Minpack() >>> [0] 6 30859392 MatFDColoringSetUp_SeqXAIJ() >>> [0] 3 42512 MatGetColumnIJ_SeqAIJ() >>> [0] 4 72720 MatGetColumnIJ_SeqBAIJ_Color() >>> [0] 1 6144 MatGetOrdering_Natural() >>> [0] 2 36384 MatGetRowIJ_SeqAIJ() >>> [0] 7 210626000 MatILUFactorSymbolic_SeqBAIJ() >>> [0] 2 313376 MatIncreaseOverlap_SeqBAIJ() >>> [0] 2 30740608 MatLUFactorNumeric_SeqBAIJ_N() >>> [0] 1 6144 MatMarkDiagonal_SeqAIJ() >>> [0] 1 6144 MatMarkDiagonal_SeqBAIJ() >>> [0] 8 256 MatRegisterRootName() >>> [0] 1 6160 MatSeqAIJCheckInode() >>> [0] 4 115216 MatSeqAIJSetPreallocation_SeqAIJ() >>> [0] 4 302779424 MatSeqBAIJSetPreallocation_SeqBAIJ() >>> [0] 13 576 MatSolverTypeRegister() >>> [0] 1 16 PCASMCreateSubdomains() >>> [0] 2 1664 PCCreate() >>> [0] 1 160 PCCreate_ASM() >>> [0] 1 192 PCCreate_ILU() >>> [0] 5 307264 PCSetUp_ASM() >>> [0] 2 416 PetscBTCreate() >>> [0] 2 3216 PetscClassPerfLogCreate() >>> [0] 2 1616 PetscClassRegLogCreate() >>> [0] 2 32 PetscCommBuildTwoSided_Allreduce() >>> [0] 2 64 PetscCommDuplicate() >>> [0] 2 1888 PetscDSCreate() >>> [0] 2 26416 PetscEventPerfLogCreate() >>> [0] 2 158400 PetscEventPerfLogEnsureSize() >>> [0] 2 1616 PetscEventRegLogCreate() >>> [0] 2 9600 PetscEventRegLogRegister() >>> [0] 8 102400 PetscFreeSpaceGet() >>> [0] 474 15168 PetscFunctionListAdd_Private() >>> [0] 2 528 PetscIntStackCreate() >>> [0] 142 11360 PetscLayoutCreate() >>> [0] 56 896 PetscLayoutSetUp() >>> [0] 59 9440 PetscObjectComposedDataIncreaseReal() >>> [0] 2 576 PetscObjectListAdd() >>> [0] 33 768 PetscOptionsGetEList() >>> [0] 1 16 PetscOptionsHelpPrintedCreate() >>> [0] 1 32 PetscPushSignalHandler() >>> [0] 7 6944 PetscSFCreate() >>> [0] 3 432 PetscSFCreate_Basic() >>> [0] 2 1472 PetscSFLinkCreate() >>> [0] 11 1229040 PetscSFSetUpRanks() >>> [0] 7 614512 PetscSFSetUp_Basic() >>> [0] 4 
20096 PetscSegBufferCreate() >>> [0] 2 1488 PetscSplitReductionCreate() >>> [0] 2 3008 PetscStageLogCreate() >>> [0] 1148 23872 PetscStrallocpy() >>> [0] 6 13056 PetscStrreplace() >>> [0] 9 3456 PetscTableCreate() >>> [0] 1 16 PetscViewerASCIIOpen() >>> [0] 6 96 PetscViewerAndFormatCreate() >>> [0] 1 752 PetscViewerCreate() >>> [0] 1 96 PetscViewerCreate_ASCII() >>> [0] 2 1424 SNESCreate() >>> [0] 1 16 SNESCreate_NEWTONLS() >>> [0] 1 1008 SNESLineSearchCreate() >>> [0] 1 16 SNESLineSearchCreate_BT() >>> [0] 16 1824 SNESMSRegister() >>> [0] 46 9056 TSARKIMEXRegister() >>> [0] 1 1264 TSAdaptCreate() >>> [0] 8 384 TSBasicSymplecticRegister() >>> [0] 1 2160 TSCreate() >>> [0] 1 224 TSCreate_Theta() >>> [0] 48 5968 TSGLEERegister() >>> [0] 41 7728 TSRKRegister() >>> [0] 89 14736 TSRosWRegister() >>> [0] 71 110192 VecCreate() >>> [0] 1 307200 VecCreateGhostWithArray() >>> [0] 123 36874080 VecCreate_MPI_Private() >>> [0] 7 4300800 VecCreate_Seq() >>> [0] 8 256 VecCreate_Seq_Private() >>> [0] 6 400 VecDuplicateVecs_Default() >>> [0] 3 2352 VecScatterCreate() >>> [0] 7 1843296 VecScatterSetUp_SF() >>> [0] 126 2016 VecStashCreate_Private() >>> [0] 1 3072 mapBlockColoringToJacobian() >>> >>> On Wed, Aug 12, 2020 at 4:22 PM Barry Smith > wrote: >>> >>> Yes, there are some PETSc objects or arrays that you are not freeing so they are printed at the end of the run. For small runs this harmless but if new objects/memory is allocated at each iteration and not suitably freed it will eventually add up. >>> >>> Run with -malloc_view (small problem with say 2 iterations) it will print everything allocated and might be helpful. >>> >>> Perhaps you are calling ISColoringGetIS() and not calling ISColoringRestoreIS()? >>> >>> It is also possible it is a leak in PETSc, but that is unlikely since we test for them. >>> >>> Are you using Fortran? >>> >>> Barry >>> >>> >>>> On Aug 12, 2020, at 1:29 PM, Mark Lohry > wrote: >>>> >>>> Thanks Matt and Barry. At Matt's suggestion I ran a smaller representative case with valgrind and didn't see anything alarming (apart from a small leak in an older boost version I was using: https://github.com/boostorg/serialization/issues/104 although I don't think this was causing the issue). >>>> >>>> -malloc_debug dumps quite a lot, this is supposed to be empty right? Output pasted below. It looks like the same sequence of calls is repeated 8 times, which is how many nonlinear solves occurred in this particular run. Thoughts? 
>>>> >>>> >>>> >>>> [ 0]1408 bytes PetscSplitReductionCreate() line 63 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>> [ 0]80 bytes PetscSplitReductionCreate() line 57 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/vec/utils/comb.c >>>> [ 0]16 bytes PetscCommBuildTwoSided_Allreduce() line 169 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/mpits.c >>>> [ 0]16 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]272 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]880 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes 
PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]960 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]976 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in 
/home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]1024 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]1040 bytes ISGeneralSetIndices_General() line 578 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]16 bytes PetscLayoutSetUp() line 269 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]80 bytes PetscLayoutCreate() line 55 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/utils/pmap.c >>>> [ 0]16 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 255 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]32 bytes PetscStrallocpy() line 187 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/utils/str.c >>>> [ 0]32 bytes PetscFunctionListAdd_Private() line 222 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/dll/reg.c >>>> [ 0]16 bytes ISCreate_General() line 647 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/impls/general/general.c >>>> [ 0]896 bytes ISCreate() line 37 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/interface/isreg.c >>>> [ 0]64 bytes ISColoringGetIS() line 266 in /home/mlohry/dev/cmake-build/external/petsc/src/vec/is/is/utils/iscoloring.c >>>> [ 0]32 bytes PetscCommDuplicate() line 129 in /home/mlohry/dev/cmake-build/external/petsc/src/sys/objects/tagm.c >>>> >>>> >>>> >>>> On Wed, Aug 12, 2020 at 1:46 PM Barry Smith > wrote: >>>> >>>> Mark. 
>>>> >>>> When valgrind is not feasible (like on many centrally controlled batch systems) you can run PETSc with an extra flag to do some memory error checks >>>> -malloc_debug >>>> >>>> this >>>> >>>> 1) fills all malloced memory with Nan so if the code is using uninitialized memory it may be detected and >>>> 2) checks the beginning and end of each alloced memory region for out-of-bounds writes at each malloc and free. >>>> >>>> it will slow the code down a little bit but generally not a huge amount. >>>> >>>> It is no where near as good as valgrind or other memory corruption tools but it has the advantage you can run it anywhere on any size job. >>>> >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>> >>>>> On Aug 12, 2020, at 7:46 AM, Matthew Knepley > wrote: >>>>> >>>>> On Wed, Aug 12, 2020 at 7:53 AM Mark Lohry > wrote: >>>>> I'm getting seemingly random failures of late: >>>>> Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>>>> >>>>> The first thing I would do is run valgrind on as wide an array of tests as you can. This will find problems >>>>> on things that run completely fine. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> Symptoms: >>>>> 1) Seems to only happen (so far) on larger cases, 400-2000 cores >>>>> 2) It doesn't happen right away -- this was running happily for several hours over several hundred time steps with no indication of bad health in the numerics >>>>> 3) At least the total memory consumption seems to be within bounds, though I'm not sure about individual processes. e.g. slurm here reported Memory Efficiency: 75.23% of 1.76 TB (180.00 GB/node) >>>>> 4) running the same setup twice it fails at different points >>>>> >>>>> Any suggestions on what to look for? This is a bit painful to work on as I can only reproduce it on large runs and then it's seemingly random. >>>>> >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>> >> > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 24 12:45:09 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 12:45:09 -0500 Subject: [petsc-users] Bus Error In-Reply-To: <87364cgczg.fsf@jedbrown.org> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> Message-ID: > On Aug 24, 2020, at 12:39 PM, Jed Brown wrote: > > Barry Smith writes: > >>> On Aug 24, 2020, at 12:31 PM, Jed Brown wrote: >>> >>> Barry Smith writes: >>> >>>> So if a BLAS errors with SIGBUS then it is always an input error of just not proper double/complex alignment? Or some other very strange thing? >>> >>> I would suspect memory corruption. >> >> >> Corruption meaning what specifically? 
>> >> The routines crashing are dgemv which only take double precision arrays, regardless of what garbage is in those arrays i don't think there can be BUS errors resulting. They don't take integer arrays whose corruption could result in bad indexing and then BUS errors. >> >> So then it can only be corruption of the pointers passed in, correct? > > Such as those pointers pointing into data on the stack with incorrect sizes. But won't incorrect sizes "usually" lead to SEGV not SEGBUS? From knepley at gmail.com Mon Aug 24 13:06:11 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 24 Aug 2020 14:06:11 -0400 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> Message-ID: On Mon, Aug 24, 2020 at 1:46 PM Barry Smith wrote: > > > > On Aug 24, 2020, at 12:39 PM, Jed Brown wrote: > > > > Barry Smith writes: > > > >>> On Aug 24, 2020, at 12:31 PM, Jed Brown wrote: > >>> > >>> Barry Smith writes: > >>> > >>>> So if a BLAS errors with SIGBUS then it is always an input error of > just not proper double/complex alignment? Or some other very strange thing? > >>> > >>> I would suspect memory corruption. > >> > >> > >> Corruption meaning what specifically? > >> > >> The routines crashing are dgemv which only take double precision > arrays, regardless of what garbage is in those arrays i don't think there > can be BUS errors resulting. They don't take integer arrays whose > corruption could result in bad indexing and then BUS errors. > >> > >> So then it can only be corruption of the pointers passed in, correct? > > > > Such as those pointers pointing into data on the stack with incorrect > sizes. > > But won't incorrect sizes "usually" lead to SEGV not SEGBUS? > My understanding was that roughly memory errors in the heap are SEGV and memory errors on the stack are SIGBUS. Is that not true? Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rtmills at anl.gov Mon Aug 24 13:17:16 2020 From: rtmills at anl.gov (Mills, Richard Tran) Date: Mon, 24 Aug 2020 18:17:16 +0000 Subject: [petsc-users] Postdoctoral position at Argonne: Numerical Solvers for Next Generation High Performance Computing Architectures Message-ID: Dear PETSc Users and Developers, The PETSc/TAO team at Argonne National Laboratory has an opening for a postdoctoral researcher to work on development of robust and efficient algebraic solvers and related technologies targeting exascale-class supercomputers -- such as the Aurora machine slated to be the first exascale computer in the United States and fielded at Argonne -- and other novel high-performance computing (HPC) architectures. For those interested, please see the job posting at https://bit.ly/3kPtY8L. Best regards, Richard -------------- next part -------------- An HTML attachment was scrubbed... 
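A minimal way to exercise the -malloc_debug checks Barry describes above, with an illustrative command line and an illustrative in-code check; the executable name, process count, and placement of the check are assumptions for the sketch, not taken from Mark's actual setup.

    mpiexec -n 512 ./mysolver -malloc_debug

    /* Inside any routine suspected of stepping on memory (e.g. once per time step):
       CHKMEMQ asks PETSc to validate the regions around every block it has allocated
       so far, so the first corrupted block is reported close to where the corruption
       happened. It only does real work when -malloc_debug is in effect. */
    CHKMEMQ;

Scattering a few of these through the time-step loop is a cheap way to narrow down where a corrupted region first appears on runs that are too large for valgrind.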
URL: From bsmith at petsc.dev Mon Aug 24 13:59:28 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 13:59:28 -0500 Subject: [petsc-users] Bus Error In-Reply-To: References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> Message-ID: <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> https://en.wikipedia.org/wiki/Bus_error But perhaps not true for Intel? > On Aug 24, 2020, at 1:06 PM, Matthew Knepley wrote: > > On Mon, Aug 24, 2020 at 1:46 PM Barry Smith > wrote: > > > > On Aug 24, 2020, at 12:39 PM, Jed Brown > wrote: > > > > Barry Smith > writes: > > > >>> On Aug 24, 2020, at 12:31 PM, Jed Brown > wrote: > >>> > >>> Barry Smith > writes: > >>> > >>>> So if a BLAS errors with SIGBUS then it is always an input error of just not proper double/complex alignment? Or some other very strange thing? > >>> > >>> I would suspect memory corruption. > >> > >> > >> Corruption meaning what specifically? > >> > >> The routines crashing are dgemv which only take double precision arrays, regardless of what garbage is in those arrays i don't think there can be BUS errors resulting. They don't take integer arrays whose corruption could result in bad indexing and then BUS errors. > >> > >> So then it can only be corruption of the pointers passed in, correct? > > > > Such as those pointers pointing into data on the stack with incorrect sizes. > > But won't incorrect sizes "usually" lead to SEGV not SEGBUS? > > My understanding was that roughly memory errors in the heap are SEGV and memory errors on the stack are SIGBUS. Is that not true? > > Matt > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Mon Aug 24 14:20:49 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Mon, 24 Aug 2020 21:20:49 +0200 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> Message-ID: Good evening everyone, Thanks Barry for your answer. Le lun. 24 ao?t 2020 ? 18:51, Barry Smith a ?crit : > > > On Aug 24, 2020, at 11:38 AM, Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > > Thank you Barry for taking the time to go through the code ! > > I indeed figured out this afternoon that the function related to the > matrix-vector product is always handling global vectors. I corrected mine > so that it compiles, but I have a feeling it won't run properly without a > preconditioner. > > Anyways you are right, my PetscJacobianFunction_JFNK() aims at doing some > basic finite-differencing ; user->RHS_ref is my F(U) if you see the system > as dU/dt = F(U). As for the DMGlobalToLocal() it was there mainly because I > had not realized the vectors I was manipulating were global. > I will take your advice and try with just the SNESSetUseMatrixFree. 
> I haven't quite fully understood what it does "under the hood" though: > just calling SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE) before the > TSSolve call is enough to ensure that the implicit matrix is computed ? > Does it use the function we set as a RHS to build the matrix ? > > > All it does is "replace" the A matrix with one automatically created > for the job using MatCreateMFFD(). It does not touch the B matrix, it does > not build the matrix but yes if does use the function to provide to do the > differencing. > OK, thank you. This MFFD Matrix is then called by the TS to construct the linear system that will be solved to advance the system of equations, right ? > > To create the preconditioner I will do as you suggest too, thank you. This > matrix has to be as close as possible to the inverse of the implicit matrix > to ensure that the eigenvalues of the system are as close to 1 as possible. > Given the implicit matrix is built "automatically" thanks to the SNES > matrix free capability, can we use that matrix as a starting point to the > building of the preconditioner ? > > > No the MatrixFree doesn't build a matrix, it can only do matrix-vector > products with differencing. > My bad, wrong word. Yes of course it's all matrix-free hence it's just a functional, however maybe the inner mechanisms can be accessed and used for the preconditioner ? > You were talking about the coloring capabilities in PETSc, is that where > it can be applied ? > > > Yes you can use that. See MatFDColoringCreate() but since you are using > a DM in theory you can use -snes_fd_color and PETSc will manage everything > for you so you don't have to write any code for Jacobians at all. Again it > uses your function to do differences using coloring to be efficient to > build the Jacobian for you. > I read a bit about the coloring you are mentioning. As I understand it, it is another option to have a matrix-free Jacobian behavior during the Newton-Krylov iterations, right ? Either we use the SNESSetUseMatrixFree() alone, then it works using "basic" finite-differencing, or we use the SNESSetUseMatrixFree + MatFDColoringCreate & SNESComputeJacobianDefaultColor as an option to SNESSetJacobian to access the finite-differencing based on coloring. Is that right ? Then if i come back to my preconditioner problem ... once you have set-up the implicit matrix with one or the other aforementioned matrix-free ways, how would you go around setting up the preconditioner ? In a matrix-free way too, or rather as a real matrix that we assemble ourselves this time, as you seemed to mean with the previous MatAij DMCreateMatrix ? Sorry if it seems like I am nagging, but I would really like to understand how to manipulate the matrix-free methods and structures in PETSc to run a time-implicit finite volume computation, it's so promising ! Thanks again, Thibault > Barry > > Internally it uses SNESComputeJacobianDefaultColor() if you are interested > in what it does. > > > > > > Thank you so much again, > > Thibault > > > Le lun. 24 ao?t 2020 ? 15:45, Barry Smith a ?crit : > >> >> I think the attached is wrong. 
>> The input to the matrix-vector product for the Jacobian is always global
>> vectors, which means on each process the dimension is not the size of the
>> DMGetLocalVector(); it should be the VecGetLocalSize() of the
>> DMGetGlobalVector().
>>
>> But you may be able to skip all this and have the DM create the shell
>> matrix, setting its sizes appropriately, and you only need to supply the MATOP
>>
>> DMSetMatType(dm,MATSHELL);
>> DMCreateMatrix(dm,&A);
>>
>> In fact, I also don't understand the PetscJacobianFunction_JFNK()
>> function. It seems to be doing finite differencing on the
>> DMPlexTSComputeRHSFunctionFVM() assuming the current function value is in
>> usr->RHS_ref. How is this different than just letting PETSc/SNES use
>> finite differences to do the matrix-vector product? Your code seems rather
>> complicated with the DMGlobalToLocal(), which I don't understand what it is
>> supposed to do there.
>>
>> I think you can just call
>>
>> TSGetSNES()
>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE);
>>
>> and it will set up an internal matrix that does the finite differencing
>> for you. Then you never need a shell matrix.
>>
>> Also to create the preconditioner matrix B this should work
>>
>> DMSetMatType(dm,MATAIJ);
>> DMCreateMatrix(dm,&B);
>>
>> no need for you to figure out the sizes.
>>
>> Note that both A and B need to have the same dimensions on each process
>> as the global vectors, which I don't think your current code has.
>>
>> Barry
>>
>> On Aug 24, 2020, at 12:56 AM, Thibault Bridel-Bertomeu <
>> thibault.bridelbertomeu at gmail.com> wrote:
>>
>> Barry, first of all, thank you very much for your detailed answer, I keep
>> reading it to let it soak in - I might come back to you for more details if
>> you do not mind.
>>
>> In the meantime, to fuel the conversation, I attach to this e-mail two
>> pdfs containing the pieces of the code that relate to what we are discussing.
>> In the *timedisc.pdf, you'll find how I handle the initialization of the TS
>> object, and in the *petscdefs.pdf you'll find the method that calls the
>> TSSolve as well as the methods that are linked to the TS (the timestep
>> adapt, the jacobian etc ...). [Sorry for the quality, I cannot do better
>> than that sort of pdf ...]
>>
>> Based on what is in the structured code I sent you the other day, I
>> rewrote the PetscJacobianFunction_JFNK. I think it should be all right, but
>> although it compiles, execution raises a seg fault, I think, when I do
>> ierr = TSSetIJacobian(ts, A, A, PetscIJacobian, user);
>> saying that A does not have the right dimensions. It is quite new, I am
>> still looking into where exactly the error is raised. What do you think of
>> this implementation though, does it look correct in your expert eyes ?
>> As for what we really discussed so far, it's that
>> PetscComputePreconMatImpl that I do not know how to implement (with the
>> derivative of the jacobian based on the FVM object).
>>
>> I understand now that what I am showing you today might not be the right
>> way to go if one wants to really use the PetscFV, but I just wanted to add
>> those code lines to the conversation to have your feedback.
>>
>> Thank you again for your help,
>>
>> Thibault
>>
>> Le ven. 21 août 2020 à 19:25, Barry Smith a écrit :
>>
>>> On Aug 21, 2020, at 10:58 AM, Thibault Bridel-Bertomeu <
>>> thibault.bridelbertomeu at gmail.com> wrote:
>>>
>>> Thank you Barry for the tip ! I'll make sure to do that when everything
>>> is set.
>>> What I also meant is that there will not be any more direct way to set >>> the preconditioner than to go through SNESSetJacobian after having >>> assembled everything by hand ? Like, in my case, or in the more general >>> case of fluid dynamics equations, the preconditioner is not a fun matrix to >>> assemble, because for every cell the derivative of the physical flux >>> jacobian has to be taken and put in the right block in the matrix - finite >>> element style if you want. Is there a way to do that with Petsc methods, >>> maybe short-circuiting the FEM based methods ? >>> >>> >>> Thibault >>> >>> I am not sure what you mean but there are a couple of things that may >>> be helpful. >>> >>> PCSHELL >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCSHELL.html allows >>> you to build your own preconditioner (that can and often will use one or >>> more of its own Mats, and KSP or PC inside it, or even use another PETScFV >>> etc to build some of the sub matrices for you if it is appropriate), this >>> approach means you never need to construct a "global" PETSc matrix from >>> which PETSc builds the preconditioner. But you should only do this if the >>> conventional approach is not reasonable for your problem. >>> >>> MATNEST >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATNEST.html allows >>> you to build a global matrix by building parts of it separately and even >>> skipping parts you decide you don't need in the preconditioner. >>> Conceptually it is the same as just creating a global matrix and filling up >>> but the process is a bit different and something suitable for "multi >>> physics" or "multi-equation" type applications. >>> >>> Of course what you put into PCSHELL and MATNEST will affect the >>> convergence of the nonlinear solver. As Jed noted what you put in the >>> "Jacobian" does not have to be directly the same mathematically as what you >>> put into the TSSetI/RHSFunction with the caveat that it does have to >>> appropriate spectral properties to result in a good preconditioner for the >>> "true" Jacobian. >>> >>> Couple of other notes: >>> >>> The entire business of "Jacobian" matrix-free or not (with for example >>> -snes_fd_operator) is tricky because as Jed noted if your finite volume >>> scheme has non-differential terms such as if () tests. There is a concept >>> of sub-differential for this type of thing but I know absolutely nothing >>> about that and probably not worth investigating. >>> >>> In this situation you can avoid the "true" Jacobian completely (both for >>> matrix-vector product and preconditioner) and use something else as Jed >>> suggested a lower order scheme that is differentiable. This can work well >>> for solving the nonlinear system or not depending on how suitable it is for >>> your original "function" >>> >>> >>> 1) In theory at least you can have the Jacobian matrix-vector product >>> computed directly using DMPLEX/PETScFV infrastructure (it would apply the >>> Jacobian locally matrix-free using code similar to the code that evaluates >>> the FV "function". I do no know if any of this code is written, it will be >>> more efficient than -snes_mf_operator that evaluates the FV "function" and >>> does traditional differencing to compute the Jacobian. Again it has the >>> problem of non-differentialability if the function is not differential. But >>> it could be done for a different (lower order scheme) that is >>> differentiable. 
>>> >>> 2) You can have PETSc compute the Jacobian explicitly coloring and from >>> that build the preconditioner, this allows you to avoid the hassle of >>> writing the code for the derivatives yourself. This uses finite differences >>> on your function and coloring of the graph to compute many columns of the >>> Jacobian simultaneously and can be pretty efficient. Again if the function >>> is not differential there can be issues of what the result means and will >>> it work in a nonlinear solver. SNESComputeJacobianDefaultColor >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESComputeJacobianDefaultColor.html >>> >>> 3) Much more outlandish is to skip Newton and Jacobians completely and >>> use the full approximation scheme SNESFAS >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNESFAS/SNESFAS.html this >>> requires a grid hierarchy and appropriate way to interpolate up through the >>> grid hierarchy your finite volume solutions. Probably not worth >>> investigating unless you have lots of time on your hands and keen interest >>> in this kind of stuff https://arxiv.org/pdf/1607.04254.pdf >>> >>> So to summarize, and Matt and Jed can correct my mistakes. >>> >>> 1) Form the full Jacobian from the original "function" using analytic >>> approach use it for both the matrix-vector product and to build the >>> preconditioner. Problem if full Jacobian not well defined mathematically. >>> Tough to code, usually not practical. >>> >>> 2) Do any matrix free (any way) for the full Jacobian and >>> >>> a) build another "approximate" Jacobian (using any technique analytic >>> or finite differences using matrix coloring on a new "lower order" >>> "function") Still can have trouble if this original Jacobian is no well >>> defined >>> >>> b) "write your own preconditioner" that internally can use anything >>> in PETSc that approximately solves the Jacobian. Same potential problems if >>> original Jacobian is not differential, plus convergence will depend on how >>> good your own preconditioner approximates the inverse of the true Jacobian. >>> >>> 3) Use a lower Jacobian (computed anyway you want) for the matrix-vector >>> product and the preconditioner. The problem of differentiability is gone >>> but convergence of the nonlinear solver depends on how well lower order >>> Jacobian is appropriate for the original "function" >>> >>> a) Form the "lower order" Jacobian analytically or with coloring and >>> use for both matrix-vector product and building preconditioner. Note that >>> switching between this and 2a is trivial. >>> >>> b) Do the "lower order" Jacobian matrix free and provide your own >>> PCSHELL. Note that switching between this and 2b is trivial. >>> >>> Barry >>> >>> I would first try competing the "true" Jacobian via coloring, if that >>> works and give satisfactory results (fast enough) then stop. >>> >>> Then I would do 2a/2b by writing my "function" using PETScFV and >>> writing the "lower order function" via PETScFV and use matrix coloring to >>> get the Jacobian from the second "lower order function". If this works well >>> (either with 2a or 3a or both) then stop or you can compute the "lower >>> order" Jacobian analytically (again using PetscFV) for a more efficient >>> evaluation of the Jacobian. >>> >>> >> >>> >>> >>> Thanks ! >>> >>> Thibault >>> >>> Le ven. 21 ao?t 2020 ? 
17:22, Barry Smith a ?crit : >>> >>> >>>> >>>> On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu < >>>> thibault.bridelbertomeu at gmail.com> wrote: >>>> >>>> >>>> >>>> Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley a >>>> ?crit : >>>> >>>>> On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu < >>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>> >>>>>> Sorry, I sent too soon, I hit the wrong key. >>>>>> >>>>>> I wanted to say that context.npoints is the local number of cells. >>>>>> >>>>>> PetscRHSFunctionImpl allows to generate the hyperbolic part of the >>>>>> right hand side. >>>>>> Then we have : >>>>>> >>>>>> PetscErrorCode PetscIJacobian( >>>>>> TS ts, /*!< Time stepping object (see PETSc TS)*/ >>>>>> PetscReal t, /*!< Current time */ >>>>>> Vec Y, /*!< Solution vector */ >>>>>> Vec Ydot, /*!< Time-derivative of solution vector */ >>>>>> PetscReal a, /*!< Shift */ >>>>>> Mat A, /*!< Jacobian matrix */ >>>>>> Mat B, /*!< Preconditioning matrix */ >>>>>> void *ctxt /*!< Application context */ >>>>>> ) >>>>>> { >>>>>> PETScContext *context = (PETScContext*) ctxt; >>>>>> HyPar *solver = context->solver; >>>>>> _DECLARE_IERR_; >>>>>> >>>>>> PetscFunctionBegin; >>>>>> solver->count_IJacobian++; >>>>>> context->shift = a; >>>>>> context->waqt = t; >>>>>> /* Construct preconditioning matrix */ >>>>>> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y, >>>>>> context); CHECKERR(ierr); } >>>>>> >>>>>> PetscFunctionReturn(0); >>>>>> } >>>>>> >>>>>> and PetscJacobianFunction_JFNK which I bind to the matrix shell, >>>>>> computes the action of the jacobian on a vector : say U0 is the state of >>>>>> reference and Y the vector upon which to apply the JFNK method, then the >>>>>> PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + >>>>>> epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift >>>>>> comes from the TS). >>>>>> The preconditioning matrix I compute as an approximation to the >>>>>> actual jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU >>>>>> is, in each cell, a 4x4 matrix that is known exactly for the system of >>>>>> equations I am solving, i.e. Euler equations. 
For the structured grid, I >>>>>> can loop on the cells and do that 'Derivative' thing at first order by >>>>>> simply taking a finite-difference like approximation with the neighboring >>>>>> cells, Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix >>>>>> block by block (JFunction is the dF/dU) >>>>>> >>>>>> /* diagonal element */ >>>>>> >>>>>> >>>>>> for (v=0; v>>>>> nvars*pg + v; } >>>>>> >>>>>> >>>>>> ierr = solver->JFunction >>>>>> >>>>>> (values,(u+nvars*p),solver->physics >>>>>> >>>>>> ,dir,0); >>>>>> >>>>>> >>>>>> _ArrayScale1D_ >>>>>> >>>>>> (values,(dxinv*iblank),(nvars*nvars)); >>>>>> >>>>>> >>>>>> ierr = >>>>>> MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> /* left neighbor */ >>>>>> >>>>>> >>>>>> if (pgL >= 0) { >>>>>> >>>>>> >>>>>> for (v=0; v>>>>> nvars*pgL + v; } >>>>>> >>>>>> >>>>>> ierr = solver->JFunction >>>>>> >>>>>> (values,(u+nvars*pL),solver->physics >>>>>> >>>>>> ,dir,1); >>>>>> >>>>>> >>>>>> _ArrayScale1D_ >>>>>> >>>>>> (values,(-dxinv*iblank),(nvars*nvars)); >>>>>> >>>>>> >>>>>> ierr = >>>>>> MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>> >>>>>> >>>>>> } >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> /* right neighbor */ >>>>>> >>>>>> >>>>>> if (pgR >= 0) { >>>>>> >>>>>> >>>>>> for (v=0; v>>>>> nvars*pgR + v; } >>>>>> >>>>>> >>>>>> ierr = solver->JFunction >>>>>> >>>>>> (values,(u+nvars*pR),solver->physics >>>>>> >>>>>> ,dir,-1); >>>>>> >>>>>> >>>>>> _ArrayScale1D_ >>>>>> >>>>>> (values,(-dxinv*iblank),(nvars*nvars)); >>>>>> >>>>>> >>>>>> ierr = >>>>>> MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>> >>>>>> >>>>>> } >>>>>> >>>>>> >>>>>> >>>>>> I do not know if I am clear here ... >>>>>> Anyways, I am trying to figure out how to do this shell matrix and >>>>>> this preconditioner using all the FV and DMPlex artillery. >>>>>> >>>>> >>>>> Okay, that is very clear. We should be able to get the JFNK just with >>>>> -snes_mf_operator, and put the approximate J construction in >>>>> DMPlexComputeJacobian_Internal(). >>>>> There is an FV section already, and we could just add this. I would >>>>> need to understand those entries in the pointwise Riemann sense that the >>>>> other stuff is now. >>>>> >>>> >>>> Ok i had a quick look and if I understood correctly it would do the >>>> job. Setting the snes-mf-operator flag would mean however that we have to >>>> go through SNESSetJacobian to set the jacobian and the preconditioning >>>> matrix wouldn't it ? >>>> >>>> >>>> Thibault, >>>> >>>> Since the TS implicit methods end up using SNES internally the >>>> option should be available to you without requiring you to be calling the >>>> SNES routines directly >>>> >>>> Once you have finalized your approach and if for the implicit case >>>> you always work in the snes mf operator mode you can hardwire >>>> >>>> TSGetSNES(ts,&snes); >>>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >>>> >>>> in your code so you don't need to always provide the option >>>> -snes-mf-operator >>>> >>> >>>> Barry >>>> >>>> >>>> >>>> >>>> There might be calls to the Riemann solver to evaluate the dRHS / dU >>>> part yes but maybe it's possible to re-use what was computed for the RHS^n ? 
>>>> In the FV section the jacobian is set to identity which I missed >>>> before, but it could explain why when I used the following : >>>> >>>> TSSetType(ts, TSBEULER); >>>> DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); >>>> DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); >>>> >>>> with my FV discretization nothing happened, right ? >>>> >>>> Thank you, >>>> >>>> Thibault >>>> >>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Le ven. 21 ao?t 2020 ? 14:55, Thibault Bridel-Bertomeu < >>>>>> thibault.bridelbertomeu at gmail.com> a ?crit : >>>>>> >>>>> Hi, >>>>>>> >>>>>>> Thanks Matthew and Jed for your input. >>>>>>> I indeed envision an implicit solver in the sense Jed mentioned - >>>>>>> Jiri Blazek's book is a nice intro to this concept. >>>>>>> >>>>>>> Matthew, I do not know exactly what to change right now because >>>>>>> although I understand globally what the DMPlexComputeXXXX_Internal methods >>>>>>> do, I cannot say for sure line by line what is happening. >>>>>>> In a structured code, I have a an implicit FVM solver with PETSc but >>>>>>> I do not use any of the FV structure, not even a DM - I just use C arrays >>>>>>> that I transform to PETSc Vec and Mat and build my IJacobian and my >>>>>>> preconditioner and gives all that to a TS and it runs. I cannot figure out >>>>>>> how to do it with the FV and the DM and all the underlying "shortcuts" that >>>>>>> I want to use. >>>>>>> >>>>>>> Here is the top method for the structured code : >>>>>>> >>>>>>> int total_size = context.npoints * solver->nvars >>>>>>> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); >>>>>>> CHKERRQ(ierr); >>>>>>> SNES snes; >>>>>>> KSP ksp; >>>>>>> PC pc; >>>>>>> SNESType snestype; >>>>>>> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >>>>>>> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >>>>>>> >>>>>>> flag_mat_a = 1; >>>>>>> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size, >>>>>>> PETSC_DETERMINE, >>>>>>> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >>>>>>> context.jfnk_eps = 1e-7; >>>>>>> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context. >>>>>>> jfnk_eps,NULL); CHKERRQ(ierr); >>>>>>> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void)) >>>>>>> PetscJacobianFunction_JFNK); CHKERRQ(ierr); >>>>>>> ierr = MatSetUp(A); CHKERRQ(ierr); >>>>>>> >>>>>>> context.flag_use_precon = 0; >>>>>>> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",( >>>>>>> PetscBool*)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >>>>>>> >>>>>>> /* Set up preconditioner matrix */ >>>>>>> flag_mat_b = 1; >>>>>>> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size, >>>>>>> PETSC_DETERMINE,PETSC_DETERMINE, >>>>>>> >>>>>> (solver->ndims*2+1)*solver->nvars,NULL, >>>>>>> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >>>>>>> ierr = MatSetBlockSize(B,solver->nvars); >>>>>>> /* Set the RHSJacobian function for TS */ >>>>>>> >>>>>> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); > > Thibault Bridel-Bertomeu >>>>>>> ? >>>>>>> Eng, MSc, PhD >>>>>>> Research Engineer >>>>>>> CEA/CESTA >>>>>>> 33114 LE BARP >>>>>>> Tel.: (+33)557046924 >>>>>>> Mob.: (+33)611025322 >>>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>>> >>>>>>> >>>>>>> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown a ?crit : >>>>>>> >>>>>>>> Matthew Knepley writes: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> > I could never get the FVM stuff to make sense to me for implicit >>>>>>>> methods. 
>>>>>>>> >>>>>>>> >>>>>>>> > Here is my problem understanding. If you have an FVM method, it >>>>>>>> decides >>>>>>>> >>>>>>>> >>>>>>>> > to move "stuff" from one cell to its neighboring cells depending >>>>>>>> on the >>>>>>>> >>>>>>>> >>>>>>>> > solution to the Riemann problem on each face, which computed the >>>>>>>> flux. This >>>>>>>> >>>>>>>> >>>>>>>> > is >>>>>>>> >>>>>>>> >>>>>>>> > fine unless the timestep is so big that material can flow through >>>>>>>> into the >>>>>>>> >>>>>>>> >>>>>>>> > cells beyond the neighbor. Then I should have considered the >>>>>>>> effect of the >>>>>>>> >>>>>>>> >>>>>>>> > Riemann problem for those interfaces. That would be in the >>>>>>>> Jacobian, but I >>>>>>>> >>>>>>>> >>>>>>>> > don't know how to compute that Jacobian. I guess you could do >>>>>>>> everything >>>>>>>> >>>>>>>> >>>>>>>> > matrix-free, but without a preconditioner it seems hard. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> So long as we're using method of lines, the flux is just >>>>>>>> instantaneous flux, not integrated over some time step. It has the same >>>>>>>> meaning for implicit and explicit. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> An explicit method would be unstable if you took such a large time >>>>>>>> step (CFL) and an implicit method will not simultaneously be SSP and higher >>>>>>>> than first order, but it's still a consistent discretization of the problem. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> It's common (done in FUN3D and others) to precondition with a >>>>>>>> first-order method, where gradient reconstruction/limiting is skipped. >>>>>>>> That's what I'd recommend because limiting creates nasty nonlinearities and >>>>>>>> the resulting discretizations lack h-ellipticity which makes them very hard >>>>>>>> to solve. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>>>> >>>>> >>>> >>>> >>>> -- >>> Thibault Bridel-Bertomeu >>> ? >>> Eng, MSc, PhD >>> Research Engineer >>> CEA/CESTA >>> 33114 LE BARP >>> Tel.: (+33)557046924 >>> Mob.: (+33)611025322 >>> Mail: thibault.bridelbertomeu at gmail.com >>> >>> >>> >>> >> >> >> >> >> > > -- > Thibault Bridel-Bertomeu > ? > Eng, MSc, PhD > Research Engineer > CEA/CESTA > 33114 LE BARP > Tel.: (+33)557046924 > Mob.: (+33)611025322 > Mail: thibault.bridelbertomeu at gmail.com > > > -------------- next part -------------- An HTML attachment was scrubbed... 
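To set Barry's suggestion side by side with the structured-grid listing quoted just above, here is a minimal sketch of the corresponding DMPlex/PetscFV setup: the Jacobian action is obtained matrix-free from the SNES inside the TS, and only the preconditioning matrix B is assembled, sized by the DM. The variable names, the error-checking style, and where B is later handed to the TS are illustrative assumptions, not code from Thibault's solver.

    PetscErrorCode ierr;
    DM             dm;     /* the DMPlex carrying the PetscFV discretization, from the surrounding setup */
    TS             ts;     /* assumed already created with TSSetDM(ts,dm) and the FV right-hand side registered */
    SNES           snes;
    Mat            B;      /* preconditioning matrix only; no MatCreateShell() is needed */

    ierr = TSGetSNES(ts,&snes);CHKERRQ(ierr);
    ierr = SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE);CHKERRQ(ierr); /* the operator A becomes an automatically created MFFD matrix */

    ierr = DMSetMatType(dm,MATAIJ);CHKERRQ(ierr);
    ierr = DMCreateMatrix(dm,&B);CHKERRQ(ierr);                             /* sized and preallocated to match the global vectors */

B would then be passed wherever the structured code above passes its preconditioning matrix, for example through TSSetIJacobian(). If instead the run drops the matrix-free flag and uses -snes_fd_color (with -pc_type lu while validating answers), PETSc builds the colored finite-difference Jacobian itself and uses it for both the operator and the preconditioner, the no-extra-code path that also comes up elsewhere in the thread.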
URL: From jed at jedbrown.org Mon Aug 24 14:34:54 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 24 Aug 2020 13:34:54 -0600 Subject: [petsc-users] Bus Error In-Reply-To: <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> Message-ID: <87y2m3g7mp.fsf@jedbrown.org> I'm thinking of something such as writing floating point data into the return address, which would be unaligned/garbage. Reproducing under Valgrind would help a lot. Perhaps it's possible to checkpoint such that the breakage can be reproduced more quickly? Barry Smith writes: > https://en.wikipedia.org/wiki/Bus_error > > But perhaps not true for Intel? > > > >> On Aug 24, 2020, at 1:06 PM, Matthew Knepley wrote: >> >> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith > wrote: >> >> >> > On Aug 24, 2020, at 12:39 PM, Jed Brown > wrote: >> > >> > Barry Smith > writes: >> > >> >>> On Aug 24, 2020, at 12:31 PM, Jed Brown > wrote: >> >>> >> >>> Barry Smith > writes: >> >>> >> >>>> So if a BLAS errors with SIGBUS then it is always an input error of just not proper double/complex alignment? Or some other very strange thing? >> >>> >> >>> I would suspect memory corruption. >> >> >> >> >> >> Corruption meaning what specifically? >> >> >> >> The routines crashing are dgemv which only take double precision arrays, regardless of what garbage is in those arrays i don't think there can be BUS errors resulting. They don't take integer arrays whose corruption could result in bad indexing and then BUS errors. >> >> >> >> So then it can only be corruption of the pointers passed in, correct? >> > >> > Such as those pointers pointing into data on the stack with incorrect sizes. >> >> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? >> >> My understanding was that roughly memory errors in the heap are SEGV and memory errors on the stack are SIGBUS. Is that not true? >> >> Matt >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ From bsmith at petsc.dev Mon Aug 24 15:16:10 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 15:16:10 -0500 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> Message-ID: <2BF36064-AEC6-4795-BEE7-DAAF69119D2E@petsc.dev> > On Aug 24, 2020, at 2:20 PM, Thibault Bridel-Bertomeu wrote: > > Good evening everyone, > > Thanks Barry for your answer. > > Le lun. 24 ao?t 2020 ? 18:51, Barry Smith > a ?crit : > > >> On Aug 24, 2020, at 11:38 AM, Thibault Bridel-Bertomeu > wrote: >> >> Thank you Barry for taking the time to go through the code ! >> >> I indeed figured out this afternoon that the function related to the matrix-vector product is always handling global vectors. I corrected mine so that it compiles, but I have a feeling it won't run properly without a preconditioner. 
>> >> Anyways you are right, my PetscJacobianFunction_JFNK() aims at doing some basic finite-differencing ; user->RHS_ref is my F(U) if you see the system as dU/dt = F(U). As for the DMGlobalToLocal() it was there mainly because I had not realized the vectors I was manipulating were global. >> I will take your advice and try with just the SNESSetUseMatrixFree. >> I haven't quite fully understood what it does "under the hood" though: just calling SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE) before the TSSolve call is enough to ensure that the implicit matrix is computed ? Does it use the function we set as a RHS to build the matrix ? > > All it does is "replace" the A matrix with one automatically created for the job using MatCreateMFFD(). It does not touch the B matrix, it does not build the matrix but yes if does use the function to provide to do the differencing. > > OK, thank you. This MFFD Matrix is then called by the TS to construct the linear system that will be solved to advance the system of equations, right ? >> >> To create the preconditioner I will do as you suggest too, thank you. This matrix has to be as close as possible to the inverse of the implicit matrix to ensure that the eigenvalues of the system are as close to 1 as possible. Given the implicit matrix is built "automatically" thanks to the SNES matrix free capability, can we use that matrix as a starting point to the building of the preconditioner ? > > No the MatrixFree doesn't build a matrix, it can only do matrix-vector products with differencing. > > My bad, wrong word. Yes of course it's all matrix-free hence it's just a functional, however maybe the inner mechanisms can be accessed and used for the preconditioner ? Probably not, it really only can do matrix-vector products. >> You were talking about the coloring capabilities in PETSc, is that where it can be applied ? > > Yes you can use that. See MatFDColoringCreate() but since you are using a DM in theory you can use -snes_fd_color and PETSc will manage everything for you so you don't have to write any code for Jacobians at all. Again it uses your function to do differences using coloring to be efficient to build the Jacobian for you. > > I read a bit about the coloring you are mentioning. As I understand it, it is another option to have a matrix-free Jacobian behavior during the Newton-Krylov iterations, right ? Either we use the SNESSetUseMatrixFree() alone, then it works using "basic" finite-differencing, or we use the SNESSetUseMatrixFree + MatFDColoringCreate & SNESComputeJacobianDefaultColor as an option to SNESSetJacobian to access the finite-differencing based on coloring. Is that right ? > Then if i come back to my preconditioner problem ... once you have set-up the implicit matrix with one or the other aforementioned matrix-free ways, how would you go around setting up the preconditioner ? In a matrix-free way too, or rather as a real matrix that we assemble ourselves this time, as you seemed to mean with the previous MatAij DMCreateMatrix ? > > Sorry if it seems like I am nagging, but I would really like to understand how to manipulate the matrix-free methods and structures in PETSc to run a time-implicit finite volume computation, it's so promising ! There are many many possibilities as we discussed in previous email, most with various limitations. 
When you use -snes_fd_color (or put code into the source like MatFDColoringCreate which is unnecessary a since you are doing the same thing as -snes_fd_color you get back the true Jacobian (approximated so in less digits than analytic) so you can use any preconditioner that you can use as if you built the true Jacobian yourself. I always recommend starting with -pc_type lu and making sure you are getting the correct answers to your problem and then worrying about the preconditioner. Faster preconditioner are JUST optimizations, nothing more, they should not change the quality of the solution to your PDE/ODE and you absolutely need to make sure your are getting correct quality answers before fiddling with the preconditioner. Once you have the solution correct and figured out a good preconditioner (assuming using the true Jacobian works for your discretization) then you can think about optimizing the computation of the Jacobian by doing it analytically finite volume by finite volume. But you shouldn't do any of that until you are sure that your implicit TS integrator for FV produces good numerical answers. Barry > > Thanks again, > Thibault > > Barry > > Internally it uses SNESComputeJacobianDefaultColor() if you are interested in what it does. > > > > >> >> Thank you so much again, >> >> Thibault >> >> >> Le lun. 24 ao?t 2020 ? 15:45, Barry Smith > a ?crit : >> >> I think the attached is wrong. >> >> >> >> The input to the matrix vector product for the Jacobian is always global vectors which means on each process the dimension is not the size of the DMGetLocalVector() it should be the VecGetLocalSize() of the DMGetGlobalVector() >> >> But you may be able to skip all this and have the DM create the shell matrix setting it sizes appropriately and you only need to supply the MATOP >> >> DMSetMatType(dm,MATSHELL); >> DMCreateMatrix(dm,&A); >> >> In fact, I also don't understand the PetscJacobianFunction_JFKN() function It seems to be doing finite differencing on the DMPlexTSComputeRHSFunctionFVM() assuming the current function value is in usr->RHS_ref. How is this different than just letting PETSc/SNES used finite differences to do the matrix-vector product. Your code seems rather complicated with the DMGlobalToLocal() which I don't understand what it is suppose to do there. >> >> I think you can just call >> >> TSGetSNES() >> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >> >> and it will set up an internal matrix that does the finite differencing for you. Then you never need a shell matrix. >> >> >> Also to create the preconditioner matrix B this should work >> >> DMSetMatType(dm,MATAIJ); >> DMCreateMatrix(dm,&B); >> >> no need for you to figure out the sizes. >> >> >> Note that both A and B need to have the same dimensions on each process as the global vectors which I don't think your current code has. >> >> >> >> Barry >> >> >> >> >>> On Aug 24, 2020, at 12:56 AM, Thibault Bridel-Bertomeu > wrote: >>> >> >> >>> Barry, first of all, thank you very much for your detailed answer, I keep reading it to let it soak in - I might come back to you for more details if you do not mind. >>> >>> In the meantime, to fuel the conversation, I attach to this e-mail two pdfs containing the pieces of the code that regard what we are discussing. In the *timedisc.pdf, you'll find how I handle the initialization of the TS object, and in the *petscdefs.pdf you'll find the method that calls the TSSolve as well as the methods that are linked to the TS (the timestep adapt, the jacobian etc ...). 
[Sorry for the quality, I cannot do better than that sort of pdf ...] >>> >>> Based on what is in the structured code I sent you the other day, I rewrote the PetscJacobianFunction_JFNK. I think it should be all right, but although it compiles, execution raises a seg fault I think when I do >>> ierr = TSSetIJacobian(ts, A, A, PetscIJacobian, user); >>> saying that A does not have the right dimensions. It is quite new, I am still looking into where exactly the error is raised. What do you think of this implementation though, does it look correct in your expert eyes ? >>> As for what we really discussed so far, it's that PetscComputePreconMatImpl that I do not know how to implement (with the derivative of the jacobian based on the FVM object). >>> >>> I understand now that what I am showing you today might not be the right way to go if one wants to really use the PetscFV, but I just wanted to add those code lines to the conversation to have your feedback. >>> >>> Thank you again for your help, >>> >>> Thibault >>> >>> >> >> >>> Le ven. 21 ao?t 2020 ? 19:25, Barry Smith > a ?crit : >> >> >>> >>> >>>> On Aug 21, 2020, at 10:58 AM, Thibault Bridel-Bertomeu > wrote: >>>> >>>> Thank you Barry for the tip ! I?ll make sure to do that when everything is set. >>>> What I also meant is that there will not be any more direct way to set the preconditioner than to go through SNESSetJacobian after having assembled everything by hand ? Like, in my case, or in the more general case of fluid dynamics equations, the preconditioner is not a fun matrix to assemble, because for every cell the derivative of the physical flux jacobian has to be taken and put in the right block in the matrix - finite element style if you want. Is there a way to do that with Petsc methods, maybe short-circuiting the FEM based methods ? >>> >>> Thibault >>> >>> I am not sure what you mean but there are a couple of things that may be helpful. >>> >>> PCSHELL https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCSHELL.html <> allows you to build your own preconditioner (that can and often will use one or more of its own Mats, and KSP or PC inside it, or even use another PETScFV etc to build some of the sub matrices for you if it is appropriate), this approach means you never need to construct a "global" PETSc matrix from which PETSc builds the preconditioner. But you should only do this if the conventional approach is not reasonable for your problem. >>> >>> MATNEST https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATNEST.html allows you to build a global matrix by building parts of it separately and even skipping parts you decide you don't need in the preconditioner. Conceptually it is the same as just creating a global matrix and filling up but the process is a bit different and something suitable for "multi physics" or "multi-equation" type applications. >>> >>> Of course what you put into PCSHELL and MATNEST will affect the convergence of the nonlinear solver. As Jed noted what you put in the "Jacobian" does not have to be directly the same mathematically as what you put into the TSSetI/RHSFunction with the caveat that it does have to appropriate spectral properties to result in a good preconditioner for the "true" Jacobian. >>> >>> Couple of other notes: >>> >>> The entire business of "Jacobian" matrix-free or not (with for example -snes_fd_operator) is tricky because as Jed noted if your finite volume scheme has non-differential terms such as if () tests. 
There is a concept of sub-differential for this type of thing but I know absolutely nothing about that and probably not worth investigating. >>> >>> In this situation you can avoid the "true" Jacobian completely (both for matrix-vector product and preconditioner) and use something else as Jed suggested a lower order scheme that is differentiable. This can work well for solving the nonlinear system or not depending on how suitable it is for your original "function" >>> >>> >>> 1) In theory at least you can have the Jacobian matrix-vector product computed directly using DMPLEX/PETScFV infrastructure (it would apply the Jacobian locally matrix-free using code similar to the code that evaluates the FV "function". I do no know if any of this code is written, it will be more efficient than -snes_mf_operator that evaluates the FV "function" and does traditional differencing to compute the Jacobian. Again it has the problem of non-differentialability if the function is not differential. But it could be done for a different (lower order scheme) that is differentiable. >>> >>> 2) You can have PETSc compute the Jacobian explicitly coloring and from that build the preconditioner, this allows you to avoid the hassle of writing the code for the derivatives yourself. This uses finite differences on your function and coloring of the graph to compute many columns of the Jacobian simultaneously and can be pretty efficient. Again if the function is not differential there can be issues of what the result means and will it work in a nonlinear solver. SNESComputeJacobianDefaultColor https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESComputeJacobianDefaultColor.html >>> >>> 3) Much more outlandish is to skip Newton and Jacobians completely and use the full approximation scheme SNESFAS https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNESFAS/SNESFAS.html this requires a grid hierarchy and appropriate way to interpolate up through the grid hierarchy your finite volume solutions. Probably not worth investigating unless you have lots of time on your hands and keen interest in this kind of stuff https://arxiv.org/pdf/1607.04254.pdf >>> >>> So to summarize, and Matt and Jed can correct my mistakes. >>> >>> 1) Form the full Jacobian from the original "function" using analytic approach use it for both the matrix-vector product and to build the preconditioner. Problem if full Jacobian not well defined mathematically. Tough to code, usually not practical. >>> >>> 2) Do any matrix free (any way) for the full Jacobian and >>> >>> a) build another "approximate" Jacobian (using any technique analytic or finite differences using matrix coloring on a new "lower order" "function") Still can have trouble if this original Jacobian is no well defined >>> >>> b) "write your own preconditioner" that internally can use anything in PETSc that approximately solves the Jacobian. Same potential problems if original Jacobian is not differential, plus convergence will depend on how good your own preconditioner approximates the inverse of the true Jacobian. >>> >>> 3) Use a lower Jacobian (computed anyway you want) for the matrix-vector product and the preconditioner. The problem of differentiability is gone but convergence of the nonlinear solver depends on how well lower order Jacobian is appropriate for the original "function" >>> >>> a) Form the "lower order" Jacobian analytically or with coloring and use for both matrix-vector product and building preconditioner. 
Note that switching between this and 2a is trivial. >>> >>> b) Do the "lower order" Jacobian matrix free and provide your own PCSHELL. Note that switching between this and 2b is trivial. >>> >>> Barry >>> >>> I would first try competing the "true" Jacobian via coloring, if that works and give satisfactory results (fast enough) then stop. >>> >>> Then I would do 2a/2b by writing my "function" using PETScFV and writing the "lower order function" via PETScFV and use matrix coloring to get the Jacobian from the second "lower order function". If this works well (either with 2a or 3a or both) then stop or you can compute the "lower order" Jacobian analytically (again using PetscFV) for a more efficient evaluation of the Jacobian. >>> >> >>> >>>> >>>> Thanks ! >>>> >>>> Thibault >>>> >> >>>> Le ven. 21 ao?t 2020 ? 17:22, Barry Smith > a ?crit : >> >> >>>> >>>> >>>>> On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu > wrote: >>>>> >>>>> >>>>> >>>>> Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley > a ?crit : >>>>> On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu > wrote: >>>>> Sorry, I sent too soon, I hit the wrong key. >>>>> >>>>> I wanted to say that context.npoints is the local number of cells. >>>>> >>>>> PetscRHSFunctionImpl allows to generate the hyperbolic part of the right hand side. >>>>> Then we have : >>>>> >>>>> PetscErrorCode PetscIJacobian( >>>>> TS ts, /*!< Time stepping object (see PETSc TS)*/ >>>>> PetscReal t, /*!< Current time */ >>>>> Vec Y, /*!< Solution vector */ >>>>> Vec Ydot, /*!< Time-derivative of solution vector */ >>>>> PetscReal a, /*!< Shift */ >>>>> Mat A, /*!< Jacobian matrix */ >>>>> Mat B, /*!< Preconditioning matrix */ >>>>> void *ctxt /*!< Application context */ >>>>> ) >>>>> { >>>>> PETScContext *context = (PETScContext*) ctxt; >>>>> HyPar *solver = context->solver; >>>>> _DECLARE_IERR_; >>>>> >>>>> PetscFunctionBegin; >>>>> solver->count_IJacobian++; >>>>> context->shift = a; >>>>> context->waqt = t; >>>>> /* Construct preconditioning matrix */ >>>>> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y,context); CHECKERR(ierr); } >>>>> >>>>> PetscFunctionReturn(0); >>>>> } >>>>> >>>>> and PetscJacobianFunction_JFNK which I bind to the matrix shell, computes the action of the jacobian on a vector : say U0 is the state of reference and Y the vector upon which to apply the JFNK method, then the PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift comes from the TS). >>>>> The preconditioning matrix I compute as an approximation to the actual jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in each cell, a 4x4 matrix that is known exactly for the system of equations I am solving, i.e. Euler equations. 
For the structured grid, I can loop on the cells and do that 'Derivative' thing at first order by simply taking a finite-difference like approximation with the neighboring cells, Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by block (JFunction is the dF/dU) >>>>> >>>>> /* diagonal element */ >>>>> >>>>> >>>>> <> for (v=0; v>>>> >>>>> >>>>> <> ierr = solver->JFunction (values,(u+nvars*p),solver->physics ,dir,0); >>>>> >>>>> >>>>> <> _ArrayScale1D_ (values,(dxinv*iblank),(nvars*nvars)); >>>>> >>>>> >>>>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>> >>>>> >>>>> <> >>>>> >>>>> >>>>> <> /* left neighbor */ >>>>> >>>>> >>>>> <> if (pgL >= 0) { >>>>> >>>>> >>>>> <> for (v=0; v>>>> >>>>> >>>>> <> ierr = solver->JFunction (values,(u+nvars*pL),solver->physics ,dir,1); >>>>> >>>>> >>>>> <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); >>>>> >>>>> >>>>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>> >>>>> >>>>> <> } >>>>> >>>>> >>>>> <> >>>>> >>>>> >>>>> <> /* right neighbor */ >>>>> >>>>> >>>>> <> if (pgR >= 0) { >>>>> >>>>> >>>>> <> for (v=0; v>>>> >>>>> >>>>> <> ierr = solver->JFunction (values,(u+nvars*pR),solver->physics ,dir,-1); >>>>> >>>>> >>>>> <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); >>>>> >>>>> >>>>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>> >>>>> >>>>> <> } >>>>> >>>>> >>>>> >>>>> I do not know if I am clear here ... >>>>> Anyways, I am trying to figure out how to do this shell matrix and this preconditioner using all the FV and DMPlex artillery. >>>>> >>>>> Okay, that is very clear. We should be able to get the JFNK just with -snes_mf_operator, and put the approximate J construction in DMPlexComputeJacobian_Internal(). >>>>> There is an FV section already, and we could just add this. I would need to understand those entries in the pointwise Riemann sense that the other stuff is now. >>>>> >>>>> Ok i had a quick look and if I understood correctly it would do the job. Setting the snes-mf-operator flag would mean however that we have to go through SNESSetJacobian to set the jacobian and the preconditioning matrix wouldn't it ? >>>> >>>> Thibault, >>>> >>>> Since the TS implicit methods end up using SNES internally the option should be available to you without requiring you to be calling the SNES routines directly >>>> >>>> Once you have finalized your approach and if for the implicit case you always work in the snes mf operator mode you can hardwire >>>> >>>> TSGetSNES(ts,&snes); >>>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >>>> >>>> in your code so you don't need to always provide the option -snes-mf-operator >> >>>> >>>> Barry >>>> >>>> >>>> >> >>>>> There might be calls to the Riemann solver to evaluate the dRHS / dU part yes but maybe it's possible to re-use what was computed for the RHS^n ? >>>>> In the FV section the jacobian is set to identity which I missed before, but it could explain why when I used the following : >>>>> TSSetType(ts, TSBEULER); >>>>> DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); >>>>> DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); >>>>> with my FV discretization nothing happened, right ? >>>>> >>>>> Thank you, >>>>> >>>>> Thibault >>>>> >> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >> >>>>> Le ven. 21 ao?t 2020 ? 14:55, Thibault Bridel-Bertomeu > a ?crit : >> >> >>>>> Hi, >>>>> >>>>> Thanks Matthew and Jed for your input. 
>>>>> I indeed envision an implicit solver in the sense Jed mentioned - Jiri Blazek's book is a nice intro to this concept. >>>>> >>>>> Matthew, I do not know exactly what to change right now because although I understand globally what the DMPlexComputeXXXX_Internal methods do, I cannot say for sure line by line what is happening. >>>>> In a structured code, I have a an implicit FVM solver with PETSc but I do not use any of the FV structure, not even a DM - I just use C arrays that I transform to PETSc Vec and Mat and build my IJacobian and my preconditioner and gives all that to a TS and it runs. I cannot figure out how to do it with the FV and the DM and all the underlying "shortcuts" that I want to use. >>>>> >>>>> Here is the top method for the structured code : >>>>> >> >> >>>>> int total_size = context.npoints * solver->nvars >>>>> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); CHKERRQ(ierr); >>>>> SNES snes; >>>>> KSP ksp; >>>>> PC pc; >>>>> SNESType snestype; >>>>> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >>>>> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >>>>> >>>>> flag_mat_a = 1; >>>>> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, >>>>> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >>>>> context.jfnk_eps = 1e-7; >>>>> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps,NULL); CHKERRQ(ierr); >>>>> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void))PetscJacobianFunction_JFNK); CHKERRQ(ierr); >>>>> ierr = MatSetUp(A); CHKERRQ(ierr); >>>>> >>>>> context.flag_use_precon = 0; >>>>> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool*)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >>>>> >>>>> /* Set up preconditioner matrix */ >>>>> flag_mat_b = 1; >>>>> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE,PETSC_DETERMINE, >> >>>>> (solver->ndims*2+1)*solver->nvars,NULL, >>>>> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >>>>> ierr = MatSetBlockSize(B,solver->nvars); >>>>> /* Set the RHSJacobian function for TS */ >> >> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); >> >>>>> Thibault Bridel-Bertomeu >>>>> ? >>>>> Eng, MSc, PhD >>>>> Research Engineer >>>>> CEA/CESTA >>>>> 33114 LE BARP >>>>> Tel.: (+33)557046924 >>>>> Mob.: (+33)611025322 >>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>> >> >> >>>>> >>>>> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown > a ?crit : >>>>> Matthew Knepley > writes: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> > I could never get the FVM stuff to make sense to me for implicit methods. >>>>> >>>>> >>>>> > Here is my problem understanding. If you have an FVM method, it decides >>>>> >>>>> >>>>> > to move "stuff" from one cell to its neighboring cells depending on the >>>>> >>>>> >>>>> > solution to the Riemann problem on each face, which computed the flux. This >>>>> >>>>> >>>>> > is >>>>> >>>>> >>>>> > fine unless the timestep is so big that material can flow through into the >>>>> >>>>> >>>>> > cells beyond the neighbor. Then I should have considered the effect of the >>>>> >>>>> >>>>> > Riemann problem for those interfaces. That would be in the Jacobian, but I >>>>> >>>>> >>>>> > don't know how to compute that Jacobian. I guess you could do everything >>>>> >>>>> >>>>> > matrix-free, but without a preconditioner it seems hard. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> So long as we're using method of lines, the flux is just instantaneous flux, not integrated over some time step. 
It has the same meaning for implicit and explicit. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> An explicit method would be unstable if you took such a large time step (CFL) and an implicit method will not simultaneously be SSP and higher than first order, but it's still a consistent discretization of the problem. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> It's common (done in FUN3D and others) to precondition with a first-order method, where gradient reconstruction/limiting is skipped. That's what I'd recommend because limiting creates nasty nonlinearities and the resulting discretizations lack h-ellipticity which makes them very hard to solve. >>>>> >>>>> >>>>> >>>>> >> >>>>> >>>>> >> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >> >>>>> >>>>> >>>>> >>>>> >>>> >> >>>> -- >>>> Thibault Bridel-Bertomeu >>>> ? >>>> Eng, MSc, PhD >>>> Research Engineer >>>> CEA/CESTA >>>> 33114 LE BARP >>>> Tel.: (+33)557046924 >>>> Mob.: (+33)611025322 >>>> Mail: thibault.bridelbertomeu at gmail.com >>>> >>>> >> >>> >>> >>> >>> >> >> >> >> -- >> Thibault Bridel-Bertomeu >> ? >> Eng, MSc, PhD >> Research Engineer >> CEA/CESTA >> 33114 LE BARP >> Tel.: (+33)557046924 >> Mob.: (+33)611025322 >> Mail: thibault.bridelbertomeu at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From smithc11 at rpi.edu Mon Aug 24 15:18:33 2020 From: smithc11 at rpi.edu (Cameron Smith) Date: Mon, 24 Aug 2020 16:18:33 -0400 Subject: [petsc-users] creation of parallel dmplex from a partitioned mesh In-Reply-To: <62654977-bdbc-9cd7-5a70-e9fb4951310a@rpi.edu> References: <1953567c-6c7f-30fb-13e6-ad7017263a92@rpi.edu> <62654977-bdbc-9cd7-5a70-e9fb4951310a@rpi.edu> Message-ID: <3fcf90b7-3abd-1345-bd90-d7d7272816d9@rpi.edu> We made some progress with star forest creation but still have work to do. We revisited DMPlexCreateFromCellListParallelPetsc(...) and got it working by sequentially partitioning the vertex coordinates across processes to satisfy the 'vertexCoords' argument. Specifically, rank 0 has the coordinates for vertices with global id 0:N/P-1, rank 1 has N/P:2*(N/P)-1, and so on (N is the total number of global vertices and P is the number of processes). The consequences of the sequential partition of vertex coordinates in subsequent solver operations is not clear. Does it make process i responsible for computations and communications associated with global vertices i*(N/P):(i+1)*(N/P)-1 ? We assumed it does and wanted to confirm. Thank-you, Cameron On 8/13/20 11:43 AM, Cameron Smith wrote: > Thank you for the quick reply and the info.? We'll give it a shot and > respond if we hit any snags. > > -Cameron > > On 8/13/20 9:54 AM, Matthew Knepley wrote: >> On Thu, Aug 13, 2020 at 9:38 AM Cameron Smith > > wrote: >> >> ??? Hello, >> >> ??? We have a partitioned mesh that we want to create a DMPlex from that >> ??? has >> ??? the same distribution of elements (i.e., assignment of elements to >> ??? processes) and vertices 'interior' to a process (i.e., vertices >> not on >> ??? the inter-process boundary). >> >> ??? We were trying to use DMPlexCreateFromCellListParallelPetsc() or >> ??? DMPlexBuildFromCellListParallel() and found that the vertex ownership >> ??? (roots in the returned Vertex SF) appears to be sequentially >> ??? assigned to >> ??? processes based on global vertex id.? 
In general, this will not match >> ??? our mesh distribution.? As we understand, to subsequently set vertex >> ??? coordinates (or other vertex data) we would have to utilize a star >> ??? forest (SF) communication API to send data to the correct process. Is >> ??? that correct? >> >> ??? Alternatively, if we create a dmplex object from the elements that >> ??? exist >> ??? on each process using DMCreateFromCellList(), and then create a SF >> from >> ??? mesh vertices on inter-process boundaries (using the mapping from >> local >> ??? to global vertex ids provided by our mesh library), could we then >> ??? associate the dmplex objects with the SF?? Is it as simple as calling >> ??? DMSetPointSF()? >> >> >> Yes. If you have all the distribution information, this is the easiest >> thing to do. >> >> ??? If manually defining the PointSF is a way forward, we would like some >> ??? help understanding its definition; i.e., which entities become roots >> ??? and >> ??? which become leaves.? In DMPlexBuildFromCellListParallel() >> >> >> Short explanation of SF: >> >> SF stands for Star-Forest. It is a star graph because you have a >> single root that points to? multiple leaves. It is >> a forest because you have several of these stars. We use this >> construct in many places in PETSc, and where it >> is used determines the semantics of the indices. >> >> The DMPlex point SF is an SF in which root indices are "owned" mesh >> points and leaf indices are "ghost" mesh >> points. You can take any set of local Plexes and add an SF to make >> them a parallel Plex. >> >> The SF is constructed with one-sided data. Locally, each process >> specifies two things: >> >> ?? 1) The root space: The set of indices [0, Nr) which refers to >> possible roots on this process. For the pointSF, this is [0, Np) where >> Np is the number of local mesh points. >> >> ?? 2) The leaves: Each leaf is a pair (local mesh point lp, remote >> mesh point rp) which says that local mesh point lp is a "ghost" of >> remote point rp. The remote point is >> ?? ? ? ?given by (rank r, local mesh point rlp) where rlp is the local >> mesh point number on process r. >> >> With this, the Plex will automatically create all the other structures >> it needs. >> >> >> https://gitlab.com/petsc/petsc/-/blob/753428fdb0644bc4cb7be6429ce8776c05405d40/src/dm/impls/plex/plexcreate.c#L2875-2899 >> >> >> ??? the PointSF appears to contain roots for elements and vertices and >> ??? leaves for owned vertices on the inter-process boundary.? Is that >> ??? correct? >> >> >> No, the leaves are ghost vertices. They point back to the owner. >> >> ?? Thanks, >> >> ?? ? ?Matt >> >> ??? Thank-you, >> ??? Cameron >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which >> their experiments lead. 
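A minimal sketch of wiring that up, following the root/leaf description above: each rank creates its local Plex, builds the SF from its own ownership information, and attaches it with DMSetPointSF(). The arrays are illustrative; Nl, ilocal[] and iremote[] would come from the mesh library's local-to-global vertex mapping.

PetscSF      sf;
PetscInt     Np, Nl, *ilocal;   /* Np local mesh points, Nl of them are ghosts */
PetscSFNode *iremote;           /* for each ghost: owning rank + point number on the owner */

ierr = DMPlexGetChart(dm, NULL, &Np); CHKERRQ(ierr);   /* root space is [0, Np) */
/* ... fill Nl, ilocal[] and iremote[].rank / iremote[].index from the mesh library ... */
ierr = PetscSFCreate(PetscObjectComm((PetscObject)dm), &sf); CHKERRQ(ierr);
ierr = PetscSFSetGraph(sf, Np, Nl, ilocal, PETSC_COPY_VALUES, iremote, PETSC_COPY_VALUES); CHKERRQ(ierr);
ierr = DMSetPointSF(dm, sf); CHKERRQ(ierr);
ierr = PetscSFDestroy(&sf); CHKERRQ(ierr);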
>> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> From bsmith at petsc.dev Mon Aug 24 15:21:17 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 15:21:17 -0500 Subject: [petsc-users] Bus Error In-Reply-To: <87y2m3g7mp.fsf@jedbrown.org> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> Message-ID: <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> > On Aug 24, 2020, at 2:34 PM, Jed Brown wrote: > > I'm thinking of something such as writing floating point data into the return address, which would be unaligned/garbage. Ok, my patch will detect this. This is what I was talking about, messing up the BLAS arguments which are the addresses of arrays. Valgrind is by far the preferred approach. Barry Another feature we could add to the malloc checking is when a SEGV or BUS error is encountered and we catch it we should run the PetscMallocVerify() and check our memory for corruption reporting any we find. > > Reproducing under Valgrind would help a lot. Perhaps it's possible to checkpoint such that the breakage can be reproduced more quickly? > > Barry Smith writes: > >> https://en.wikipedia.org/wiki/Bus_error >> >> But perhaps not true for Intel? >> >> >> >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley wrote: >>> >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith > wrote: >>> >>> >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown > wrote: >>>> >>>> Barry Smith > writes: >>>> >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown > wrote: >>>>>> >>>>>> Barry Smith > writes: >>>>>> >>>>>>> So if a BLAS errors with SIGBUS then it is always an input error of just not proper double/complex alignment? Or some other very strange thing? >>>>>> >>>>>> I would suspect memory corruption. >>>>> >>>>> >>>>> Corruption meaning what specifically? >>>>> >>>>> The routines crashing are dgemv which only take double precision arrays, regardless of what garbage is in those arrays i don't think there can be BUS errors resulting. They don't take integer arrays whose corruption could result in bad indexing and then BUS errors. >>>>> >>>>> So then it can only be corruption of the pointers passed in, correct? >>>> >>>> Such as those pointers pointing into data on the stack with incorrect sizes. >>> >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? >>> >>> My understanding was that roughly memory errors in the heap are SEGV and memory errors on the stack are SIGBUS. Is that not true? >>> >>> Matt >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
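For what it's worth, the alignment part of this is easy to poke at in isolation. A deliberately misaligned double store like the one below raises SIGBUS on alignment-strict architectures, while on x86 it normally just works, which is why a SIGBUS coming out of dgemv on Intel hardware smells more like corrupted pointers or a smashed stack than merely misaligned data (toy example, and strictly speaking undefined behaviour):

#include <stdio.h>

int main(void)
{
  char    buf[32];
  double *p = (double *)(buf + 1);  /* deliberately misaligned address */

  *p = 3.14;                        /* SIGBUS on e.g. SPARC; usually fine on x86 */
  printf("%f\n", *p);
  return 0;
}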
>>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ From jed at jedbrown.org Mon Aug 24 15:27:21 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 24 Aug 2020 14:27:21 -0600 Subject: [petsc-users] creation of parallel dmplex from a partitioned mesh In-Reply-To: <3fcf90b7-3abd-1345-bd90-d7d7272816d9@rpi.edu> References: <1953567c-6c7f-30fb-13e6-ad7017263a92@rpi.edu> <62654977-bdbc-9cd7-5a70-e9fb4951310a@rpi.edu> <3fcf90b7-3abd-1345-bd90-d7d7272816d9@rpi.edu> Message-ID: <87mu2jg57a.fsf@jedbrown.org> Cameron Smith writes: > We made some progress with star forest creation but still have work to do. > > We revisited DMPlexCreateFromCellListParallelPetsc(...) and got it > working by sequentially partitioning the vertex coordinates across > processes to satisfy the 'vertexCoords' argument. Specifically, rank 0 > has the coordinates for vertices with global id 0:N/P-1, rank 1 has > N/P:2*(N/P)-1, and so on (N is the total number of global vertices and P > is the number of processes). > > The consequences of the sequential partition of vertex coordinates in > subsequent solver operations is not clear. Does it make process i > responsible for computations and communications associated with global > vertices i*(N/P):(i+1)*(N/P)-1 ? We assumed it does and wanted to confirm. Yeah, in the sense that the corners would be owned by the rank you place them on. But many methods, especially high-order, perform assembly via non-overlapping partition of elements, in which case the "computations" happen where the elements are (with any required vertex data for the closure of those elements being sent to the rank handling the element). Note that a typical pattern would be to create a parallel DMPlex with a naive distribution, then repartition/distribute it. From jed at jedbrown.org Mon Aug 24 15:29:11 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 24 Aug 2020 14:29:11 -0600 Subject: [petsc-users] Bus Error In-Reply-To: <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> References: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> Message-ID: <87k0xng548.fsf@jedbrown.org> Barry Smith writes: >> On Aug 24, 2020, at 2:34 PM, Jed Brown wrote: >> >> I'm thinking of something such as writing floating point data into the return address, which would be unaligned/garbage. > > Ok, my patch will detect this. This is what I was talking about, messing up the BLAS arguments which are the addresses of arrays. > > Valgrind is by far the preferred approach. FWIW, Valgrind is very limited at detecting stack corruption, and PetscMallocVerify known nothing about the stack. That said, it's unusual to place large arrays on the stack. > Barry > > Another feature we could add to the malloc checking is when a SEGV or BUS error is encountered and we catch it we should run the PetscMallocVerify() and check our memory for corruption reporting any we find. 
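A rough illustration of that last idea, using calls that already exist (PetscPushSignalHandler, PetscSignalHandlerDefault, CHKMEMQ together with -malloc_debug); the handler below is only a sketch of the proposal, not an actual patch, and the usual signal-handler safety caveats apply:

/* run with -malloc_debug so allocations carry guard regions that can be checked */
static PetscErrorCode CorruptionCheckingHandler(int sig, void *ctx)
{
  /* CHKMEMQ expands to PetscMallocValidate(__LINE__,PETSC_FUNCTION_NAME,__FILE__)
     and reports any tracked allocation whose guards have been overwritten */
  CHKMEMQ;
  return PetscSignalHandlerDefault(sig, ctx);   /* then the usual traceback/abort */
}

/* after PetscInitialize(): */
ierr = PetscPushSignalHandler(CorruptionCheckingHandler, NULL); CHKERRQ(ierr);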
From mlohry at gmail.com Mon Aug 24 15:36:36 2020 From: mlohry at gmail.com (Mark Lohry) Date: Mon, 24 Aug 2020 16:36:36 -0400 Subject: [petsc-users] Bus Error In-Reply-To: <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> References: <8D172ADD-FC1A-4E71-B151-CA648951A61C@petsc.dev> <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> Message-ID: I queued up some jobs with Barry's patch, so we'll see. Re Jed's suggestion at checkpointing, I don't *think* this is something coming from the state of the solution -- running from the same point I'm seeing it crash anywhere between 1 hour and 20 hours in. I'll increase my file save frequency in case I'm wrong there though. My intel build with different blas just made it through a 6 hour time slot without crash, whereas yesterday the same thing crashed after 3 hours. But given the randomness so far I'd bet that's just dumb luck. On Mon, Aug 24, 2020 at 4:22 PM Barry Smith wrote: > > > > On Aug 24, 2020, at 2:34 PM, Jed Brown wrote: > > > > I'm thinking of something such as writing floating point data into the > return address, which would be unaligned/garbage. > > Ok, my patch will detect this. This is what I was talking about, messing > up the BLAS arguments which are the addresses of arrays. > > Valgrind is by far the preferred approach. > > Barry > > Another feature we could add to the malloc checking is when a SEGV or > BUS error is encountered and we catch it we should run the > PetscMallocVerify() and check our memory for corruption reporting any we > find. > > > > > > > Reproducing under Valgrind would help a lot. Perhaps it's possible to > checkpoint such that the breakage can be reproduced more quickly? > > > > Barry Smith writes: > > > >> https://en.wikipedia.org/wiki/Bus_error < > https://en.wikipedia.org/wiki/Bus_error> > >> > >> But perhaps not true for Intel? > >> > >> > >> > >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley > wrote: > >>> > >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith bsmith at petsc.dev>> wrote: > >>> > >>> > >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown jed at jedbrown.org>> wrote: > >>>> > >>>> Barry Smith > writes: > >>>> > >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown jed at jedbrown.org>> wrote: > >>>>>> > >>>>>> Barry Smith > writes: > >>>>>> > >>>>>>> So if a BLAS errors with SIGBUS then it is always an input error > of just not proper double/complex alignment? Or some other very strange > thing? > >>>>>> > >>>>>> I would suspect memory corruption. > >>>>> > >>>>> > >>>>> Corruption meaning what specifically? > >>>>> > >>>>> The routines crashing are dgemv which only take double precision > arrays, regardless of what garbage is in those arrays i don't think there > can be BUS errors resulting. They don't take integer arrays whose > corruption could result in bad indexing and then BUS errors. > >>>>> > >>>>> So then it can only be corruption of the pointers passed in, correct? > >>>> > >>>> Such as those pointers pointing into data on the stack with incorrect > sizes. > >>> > >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? 
> >>> > >>> My understanding was that roughly memory errors in the heap are SEGV > and memory errors on the stack are SIGBUS. Is that not true? > >>> > >>> Matt > >>> > >>> -- > >>> What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > >>> -- Norbert Wiener > >>> > >>> https://www.cse.buffalo.edu/~knepley/ < > http://www.cse.buffalo.edu/~knepley/> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 24 15:57:49 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 24 Aug 2020 16:57:49 -0400 Subject: [petsc-users] creation of parallel dmplex from a partitioned mesh In-Reply-To: <87mu2jg57a.fsf@jedbrown.org> References: <1953567c-6c7f-30fb-13e6-ad7017263a92@rpi.edu> <62654977-bdbc-9cd7-5a70-e9fb4951310a@rpi.edu> <3fcf90b7-3abd-1345-bd90-d7d7272816d9@rpi.edu> <87mu2jg57a.fsf@jedbrown.org> Message-ID: On Mon, Aug 24, 2020 at 4:27 PM Jed Brown wrote: > Cameron Smith writes: > > > We made some progress with star forest creation but still have work to > do. > > > > We revisited DMPlexCreateFromCellListParallelPetsc(...) and got it > > working by sequentially partitioning the vertex coordinates across > > processes to satisfy the 'vertexCoords' argument. Specifically, rank 0 > > has the coordinates for vertices with global id 0:N/P-1, rank 1 has > > N/P:2*(N/P)-1, and so on (N is the total number of global vertices and P > > is the number of processes). > > > > The consequences of the sequential partition of vertex coordinates in > > subsequent solver operations is not clear. Does it make process i > > responsible for computations and communications associated with global > > vertices i*(N/P):(i+1)*(N/P)-1 ? We assumed it does and wanted to > confirm. > > Yeah, in the sense that the corners would be owned by the rank you place > them on. > > But many methods, especially high-order, perform assembly via > non-overlapping partition of elements, in which case the "computations" > happen where the elements are (with any required vertex data for the > closure of those elements being sent to the rank handling the element). > > Note that a typical pattern would be to create a parallel DMPlex with a > naive distribution, then repartition/distribute it. > As Jed says, CreateParallel() just makes the most naive partition of vertices because we have no other information. Once the mesh is made, you call DMPlexDistribute() again to reduce the edge cut. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Mon Aug 24 16:00:26 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 24 Aug 2020 15:00:26 -0600 Subject: [petsc-users] Bus Error In-Reply-To: References: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> Message-ID: <87a6yjg3o5.fsf@jedbrown.org> Do you potentially have a memory or other resource leak? SIGBUS would be an odd result, but the symptom of crashing after running for a long time sometimes fits with a resource leak. Mark Lohry writes: > I queued up some jobs with Barry's patch, so we'll see. > > Re Jed's suggestion at checkpointing, I don't *think* this is something > coming from the state of the solution -- running from the same point I'm > seeing it crash anywhere between 1 hour and 20 hours in. I'll increase my > file save frequency in case I'm wrong there though. > > My intel build with different blas just made it through a 6 hour time slot > without crash, whereas yesterday the same thing crashed after 3 hours. But > given the randomness so far I'd bet that's just dumb luck. > > On Mon, Aug 24, 2020 at 4:22 PM Barry Smith wrote: > >> >> >> > On Aug 24, 2020, at 2:34 PM, Jed Brown wrote: >> > >> > I'm thinking of something such as writing floating point data into the >> return address, which would be unaligned/garbage. >> >> Ok, my patch will detect this. This is what I was talking about, messing >> up the BLAS arguments which are the addresses of arrays. >> >> Valgrind is by far the preferred approach. >> >> Barry >> >> Another feature we could add to the malloc checking is when a SEGV or >> BUS error is encountered and we catch it we should run the >> PetscMallocVerify() and check our memory for corruption reporting any we >> find. >> >> >> >> > >> > Reproducing under Valgrind would help a lot. Perhaps it's possible to >> checkpoint such that the breakage can be reproduced more quickly? >> > >> > Barry Smith writes: >> > >> >> https://en.wikipedia.org/wiki/Bus_error < >> https://en.wikipedia.org/wiki/Bus_error> >> >> >> >> But perhaps not true for Intel? >> >> >> >> >> >> >> >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley >> wrote: >> >>> >> >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith > bsmith at petsc.dev>> wrote: >> >>> >> >>> >> >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown > jed at jedbrown.org>> wrote: >> >>>> >> >>>> Barry Smith > writes: >> >>>> >> >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown > jed at jedbrown.org>> wrote: >> >>>>>> >> >>>>>> Barry Smith > writes: >> >>>>>> >> >>>>>>> So if a BLAS errors with SIGBUS then it is always an input error >> of just not proper double/complex alignment? Or some other very strange >> thing? >> >>>>>> >> >>>>>> I would suspect memory corruption. >> >>>>> >> >>>>> >> >>>>> Corruption meaning what specifically? >> >>>>> >> >>>>> The routines crashing are dgemv which only take double precision >> arrays, regardless of what garbage is in those arrays i don't think there >> can be BUS errors resulting. They don't take integer arrays whose >> corruption could result in bad indexing and then BUS errors. 
>> >>>>> >> >>>>> So then it can only be corruption of the pointers passed in, correct? >> >>>> >> >>>> Such as those pointers pointing into data on the stack with incorrect >> sizes. >> >>> >> >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? >> >>> >> >>> My understanding was that roughly memory errors in the heap are SEGV >> and memory errors on the stack are SIGBUS. Is that not true? >> >>> >> >>> Matt >> >>> >> >>> -- >> >>> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> >>> -- Norbert Wiener >> >>> >> >>> https://www.cse.buffalo.edu/~knepley/ < >> http://www.cse.buffalo.edu/~knepley/> >> >> From ajaramillopalma at gmail.com Mon Aug 24 16:05:33 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Mon, 24 Aug 2020 18:05:33 -0300 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid Message-ID: Dear PETSc developers, I'm trying to solve a linear problem with GMRES preconditioned with pilut from HYPRE. For this I'm using the options: -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor If I use a single core, GMRES (+ pilut or euclid) converges. However, when using multiple cores the next error appears after some number of iterations: [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 relative to the function VecMAXPY. I attached a screenshot with more detailed output. The same happens when using euclid. Can you please give me some insight on this? best regards Alfredo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2020-08-24 17-57-52.png Type: image/png Size: 129347 bytes Desc: not available URL: From ajaramillopalma at gmail.com Mon Aug 24 16:44:33 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Mon, 24 Aug 2020 18:44:33 -0300 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid In-Reply-To: References: Message-ID: complementing the previous message, the same error appears when using boomeramg On Mon, Aug 24, 2020 at 6:05 PM Alfredo Jaramillo wrote: > Dear PETSc developers, > > I'm trying to solve a linear problem with GMRES preconditioned with pilut > from HYPRE. For this I'm using the options: > > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor > > If I use a single core, GMRES (+ pilut or euclid) converges. However, when > using multiple cores the next error appears after some number of iterations: > > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 > > relative to the function VecMAXPY. I attached a screenshot with more > detailed output. The same happens when using euclid. Can you please give me > some insight on this? > > best regards > Alfredo > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 24 17:26:03 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 17:26:03 -0500 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid In-Reply-To: References: Message-ID: <4A6C7C21-E4AB-45AE-ABAA-D9028622B66C@petsc.dev> Alfredo, This should never happen. The input to the VecMAXPY in gmres is computed via VMDot which produces the same result on all processes. If you run with -pc_type bjacobi does it also happen? 
Is this your custom code or does it happen in PETSc examples also? Like src/snes/tutorials/ex19 -da_refine 5 Could be memory corruption, can you run under valgrind? Barry > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo wrote: > > Dear PETSc developers, > > I'm trying to solve a linear problem with GMRES preconditioned with pilut from HYPRE. For this I'm using the options: > > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor > > If I use a single core, GMRES (+ pilut or euclid) converges. However, when using multiple cores the next error appears after some number of iterations: > > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 > > relative to the function VecMAXPY. I attached a screenshot with more detailed output. The same happens when using euclid. Can you please give me some insight on this? > > best regards > Alfredo > From mlohry at gmail.com Mon Aug 24 17:47:04 2020 From: mlohry at gmail.com (Mark Lohry) Date: Mon, 24 Aug 2020 18:47:04 -0400 Subject: [petsc-users] Bus Error In-Reply-To: <87a6yjg3o5.fsf@jedbrown.org> References: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> <87a6yjg3o5.fsf@jedbrown.org> Message-ID: I don't think I do. Running a much smaller case with the same models I get the attached report from valgrind --show-leak-kinds=all --leak-check=full --track-origins=yes. I only see some HDF5 stuff and OpenMPI that I think are false positives. ==1286950== Memcheck, a memory error detector ==1286950== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==1286950== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun with -h for copyright info ==1286950== Command: ./verification_testing --gtest_filter=DrivenCavity3D.Re100_BackwardEulerILU1_16x16N2_Quadrature1 --petsc_time_integrator=arkimex --petsc_arkimex_type=l2 ==1286950== Parent PID: 1286932 ==1286950== --1286950-- --1286950-- Valgrind options: --1286950-- --show-leak-kinds=all --1286950-- --leak-check=full --1286950-- --track-origins=yes --1286950-- --log-file=valgrind-out.txt --1286950-- -v --1286950-- Contents of /proc/version: --1286950-- Linux version 5.4.0-29-generic (buildd at lgw01-amd64-035) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 --1286950-- --1286950-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-rdtscp-sse3-ssse3-avx --1286950-- Page sizes: currently 4096, max supported 4096 --1286950-- Valgrind library directory: /usr/lib/x86_64-linux-gnu/valgrind --1286950-- Reading syms from /home/mlohry/dev/cmake-build/verification_testing --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ld-2.31.so --1286950-- Considering /usr/lib/x86_64-linux-gnu/ld-2.31.so .. --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) --1286950-- Considering /lib/x86_64-linux-gnu/ld-2.31.so .. --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.31.so .. --1286950-- .. 
CRC is valid --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux --1286950-- object doesn't have a symbol table --1286950-- object doesn't have a dynamic symbol table --1286950-- Scheduler: using generic scheduler lock implementation. --1286950-- Reading suppressions file: /usr/lib/x86_64-linux-gnu/valgrind/default.supp ==1286950== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-1286950-by-mlohry-on-??? ==1286950== embedded gdbserver: writing to /tmp/vgdb-pipe-to-vgdb-from-1286950-by-mlohry-on-??? ==1286950== embedded gdbserver: shared mem /tmp/vgdb-pipe-shared-mem-vgdb-1286950-by-mlohry-on-??? ==1286950== ==1286950== TO CONTROL THIS PROCESS USING vgdb (which you probably ==1286950== don't want to do, unless you know exactly what you're doing, ==1286950== or are doing some strange experiment): ==1286950== /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 ...command... ==1286950== ==1286950== TO DEBUG THIS PROCESS USING GDB: start GDB like this ==1286950== /path/to/gdb ./verification_testing ==1286950== and then give GDB the following command ==1286950== target remote | /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 ==1286950== --pid is optional if only one valgrind process is running ==1286950== --1286950-- REDIR: 0x4022d80 (ld-linux-x86-64.so.2:strlen) redirected to 0x580c9ce2 (???) --1286950-- REDIR: 0x4022b50 (ld-linux-x86-64.so.2:index) redirected to 0x580c9cfc (???) --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so --1286950-- object doesn't have a symbol table ==1286950== WARNING: new redirection conflicts with existing -- ignoring it --1286950-- old: 0x04022d80 (strlen ) R-> (0000.0) 0x580c9ce2 ??? 
--1286950-- new: 0x04022d80 (strlen ) R-> (2007.0) 0x0483f060 strlen --1286950-- REDIR: 0x401f560 (ld-linux-x86-64.so.2:strcmp) redirected to 0x483ffd0 (strcmp) --1286950-- REDIR: 0x40232e0 (ld-linux-x86-64.so.2:mempcpy) redirected to 0x4843a20 (mempcpy) --1286950-- Reading syms from /home/mlohry/dev/cmake-build/initialization/libinitialization.so --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libgoverning_equations.so --1286950-- Reading syms from /home/mlohry/dev/cmake-build/time_stepping/libtime_stepping.so --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libboundary_conditions.so --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libsolution_monitors.so --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libfluxtypes.so --1286950-- Reading syms from /home/mlohry/dev/cmake-build/algebraic_solvers/libalgebraic_solvers.so --1286950-- Reading syms from /home/mlohry/dev/cmake-build/program_options/libprogram_options.so --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_filesystem.so.1.73.0 --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0 --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so.40.20.1 --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3 --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libpthread-2.31.so --1286950-- Considering /usr/lib/debug/.build-id/77/5cbbfff814456660786780b0b3b40096b4c05e.debug .. --1286950-- .. build-id is valid --1286948-- Reading syms from /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libpetsc.so.3.13.3 --1286937-- Reading syms from /home/mlohry/dev/cmake-build/parallel/libparallel.so --1286937-- Reading syms from /home/mlohry/dev/cmake-build/logger/liblogger.so --1286937-- Reading syms from /home/mlohry/dev/cmake-build/spatial_discretization/libdiscretization.so --1286945-- Reading syms from /home/mlohry/dev/cmake-build/utils/libutils.so --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 --1286938-- object doesn't have a symbol table --1286949-- Reading syms from /usr/lib/x86_64-linux-gnu/libm-2.31.so --1286949-- Considering /usr/lib/x86_64-linux-gnu/libm-2.31.so .. --1286947-- .. CRC mismatch (computed 327d785f wanted 751f5509) --1286947-- Considering /lib/x86_64-linux-gnu/libm-2.31.so .. --1286938-- .. CRC mismatch (computed 327d785f wanted 751f5509) --1286937-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libm-2.31.so .. --1286950-- .. CRC is valid --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libc-2.31.so --1286950-- Considering /usr/lib/x86_64-linux-gnu/libc-2.31.so .. --1286951-- .. CRC mismatch (computed a6f43087 wanted 6555436e) --1286951-- Considering /lib/x86_64-linux-gnu/libc-2.31.so .. --1286947-- .. CRC mismatch (computed a6f43087 wanted 6555436e) --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so .. --1286950-- .. 
CRC is valid --1286940-- Reading syms from /home/mlohry/dev/cmake-build/file_io/libfileio.so --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_program_options.so.1.73.0 --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_serialization.so.1.73.0 --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3 --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3 --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libhwloc.so.15.1.0 --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libsuperlu_dist.so.6.3.0 --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 --1286950-- object doesn't have a symbol table --1286937-- Reading syms from /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 --1286937-- object doesn't have a symbol table --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0 --1286939-- object doesn't have a symbol table --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libdl-2.31.so --1286947-- Considering /usr/lib/x86_64-linux-gnu/libdl-2.31.so .. --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) --1286947-- Considering /lib/x86_64-linux-gnu/libdl-2.31.so .. --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libdl-2.31.so .. --1286947-- .. CRC is valid --1286937-- Reading syms from /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libmetis.so --1286937-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0 --1286942-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log_setup.so.1.73.0 --1286942-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0 --1286942-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_regex.so.1.73.0 --1286949-- Reading syms from /home/mlohry/dev/cmake-build/basis_functions/libbasis_functions.so --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 --1286944-- object doesn't have a symbol table --1286951-- Reading syms from /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so --1286951-- object doesn't have a symbol table --1286943-- Reading syms from /home/mlohry/dev/cmake-build/external_install/lib/libhdf5.so.103.1.0 --1286951-- Reading syms from /home/mlohry/dev/cmake-build/external/tinyxml2-build/libtinyxml2.so.6.1.0 --1286944-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_iostreams.so.1.73.0 --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libz.so.1.2.11 --1286944-- object doesn't have a symbol table --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0 --1286951-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libutil-2.31.so --1286946-- Considering /usr/lib/x86_64-linux-gnu/libutil-2.31.so .. --1286946-- .. CRC mismatch (computed 4639aba5 wanted ceb246b4) --1286946-- Considering /lib/x86_64-linux-gnu/libutil-2.31.so .. --1286946-- .. CRC mismatch (computed 4639aba5 wanted ceb246b4) --1286948-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ libutil-2.31.so .. --1286939-- .. 
CRC is valid --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libudev.so.1.6.17 --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.1 --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0 --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0 --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/librt-2.31.so --1286950-- Considering /usr/lib/x86_64-linux-gnu/librt-2.31.so .. --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) --1286950-- Considering /lib/x86_64-linux-gnu/librt-2.31.so .. --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/librt-2.31.so .. --1286950-- .. CRC is valid --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libquadmath.so.0.0.0 --1286950-- object doesn't have a symbol table --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 --1286945-- Considering /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 .. --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) --1286945-- Considering /lib/x86_64-linux-gnu/libXau.so.6.0.0 .. --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) --1286945-- object doesn't have a symbol table --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0 --1286942-- object doesn't have a symbol table --1286942-- Reading syms from /usr/lib/x86_64-linux-gnu/libbsd.so.0.10.0 --1286942-- object doesn't have a symbol table --1286950-- REDIR: 0x6516600 (libc.so.6:memmove) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6515900 (libc.so.6:strncpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6516930 (libc.so.6:strcasecmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6515220 (libc.so.6:strcat) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6515960 (libc.so.6:rindex) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6517dd0 (libc.so.6:rawmemchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6532e60 (libc.so.6:wmemchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65329a0 (libc.so.6:wcscmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6516760 (libc.so.6:mempcpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6516590 (libc.so.6:bcmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6515890 (libc.so.6:strncmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65152d0 (libc.so.6:strcmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65166c0 (libc.so.6:memset) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6532960 (libc.so.6:wcschr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65157f0 (libc.so.6:strnlen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65153b0 (libc.so.6:strcspn) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6516980 (libc.so.6:strncasecmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6515350 (libc.so.6:strcpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6516ad0 
(libc.so.6:memcpy@@GLIBC_2.14) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65340d0 (libc.so.6:wcsnlen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65329e0 (libc.so.6:wcscpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65159a0 (libc.so.6:strpbrk) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6515280 (libc.so.6:index) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65157b0 (libc.so.6:strlen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x651ed20 (libc.so.6:memrchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65169d0 (libc.so.6:strcasecmp_l) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6516550 (libc.so.6:memchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6532ab0 (libc.so.6:wcslen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6515c60 (libc.so.6:strspn) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65168d0 (libc.so.6:stpncpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6516870 (libc.so.6:stpcpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6517e10 (libc.so.6:strchrnul) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6516a20 (libc.so.6:strncasecmp_l) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x6516470 (libc.so.6:strstr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65a3750 (libc.so.6:__memcpy_chk) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286938-- REDIR: 0x6527a30 (libc.so.6:__strrchr_sse2) redirected to 0x483ea70 (__strrchr_sse2) --1286938-- REDIR: 0x6511c90 (libc.so.6:calloc) redirected to 0x483dce0 (calloc) --1286938-- REDIR: 0x6510260 (libc.so.6:malloc) redirected to 0x483b780 (malloc) --1286938-- REDIR: 0x6531c40 (libc.so.6:memcpy at GLIBC_2.2.5) redirected to 0x4840100 (memcpy at GLIBC_2.2.5) --1286938-- REDIR: 0x6527d30 (libc.so.6:__strlen_sse2) redirected to 0x483efa0 (__strlen_sse2) --1286938-- REDIR: 0x65f4ac0 (libc.so.6:__strncmp_sse42) redirected to 0x483f7c0 (__strncmp_sse42) --1286938-- REDIR: 0x6510850 (libc.so.6:free) redirected to 0x483c9d0 (free) --1286938-- REDIR: 0x6532070 (libc.so.6:__memset_sse2_unaligned) redirected to 0x48428e0 (memset) --1286938-- REDIR: 0x6603350 (libc.so.6:__memcmp_sse4_1) redirected to 0x4842150 (__memcmp_sse4_1) --1286938-- REDIR: 0x6520520 (libc.so.6:__strcmp_sse2_unaligned) redirected to 0x483fed0 (strcmp) --1286938-- REDIR: 0x61d0c10 (libstdc++.so.6:operator new(unsigned long)) redirected to 0x483bdf0 (operator new(unsigned long)) --1286938-- REDIR: 0x61cee60 (libstdc++.so.6:operator delete(void*)) redirected to 0x483cf50 (operator delete(void*)) --1286938-- REDIR: 0x61d0c70 (libstdc++.so.6:operator new[](unsigned long)) redirected to 0x483c510 (operator new[](unsigned long)) --1286938-- REDIR: 0x61cee90 (libstdc++.so.6:operator delete[](void*)) redirected to 0x483d6e0 (operator delete[](void*)) --1286938-- REDIR: 0x65275f0 (libc.so.6:__strchr_sse2) redirected to 0x483eb90 (__strchr_sse2) --1286950-- REDIR: 0x6511000 (libc.so.6:realloc) redirected to 0x483df30 (realloc) --1286950-- REDIR: 0x6527820 (libc.so.6:__strchrnul_sse2) redirected to 0x4843540 (strchrnul) --1286950-- REDIR: 0x6531560 (libc.so.6:__strstr_sse2_unaligned) redirected to 0x4843c20 (strstr) --1286950-- REDIR: 0x6531c20 (libc.so.6:__mempcpy_sse2_unaligned) redirected to 0x4843660 (mempcpy) --1286950-- REDIR: 0x652d2a0 
(libc.so.6:__strncpy_sse2_unaligned) redirected to 0x483f560 (__strncpy_sse2_unaligned) --1286950-- REDIR: 0x6515830 (libc.so.6:strncat) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) --1286950-- REDIR: 0x65305b0 (libc.so.6:__strncat_sse2_unaligned) redirected to 0x483ede0 (strncat) --1286950-- REDIR: 0x6516120 (libc.so.6:__GI_strstr) redirected to 0x4843ca0 (__strstr_sse2) --1286950-- REDIR: 0x6522360 (libc.so.6:__rawmemchr_sse2) redirected to 0x4843580 (rawmemchr) --1286950-- REDIR: 0x65faea0 (libc.so.6:__strcasecmp_avx) redirected to 0x483f830 (strcasecmp) --1286950-- REDIR: 0x65fc520 (libc.so.6:__strncasecmp_avx) redirected to 0x483f910 (strncasecmp) --1286950-- REDIR: 0x65f98a0 (libc.so.6:__strspn_sse42) redirected to 0x4843ef0 (strspn) --1286950-- REDIR: 0x65f9620 (libc.so.6:__strcspn_sse42) redirected to 0x4843e10 (strcspn) --1286948-- REDIR: 0x6522030 (libc.so.6:__memchr_sse2) redirected to 0x4840050 (memchr) --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so --1286948-- object doesn't have a symbol table --1286948-- Discarding syms at 0x4a96240-0x4a96d47 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so (have_dinfo 1) --1286948-- Discarding syms at 0x4a9b1c0-0x4a9b937 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so (have_dinfo 1) --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 --1286948-- object doesn't have a symbol table --1286948-- Discarding syms at 0x4a96120-0x4a966b0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so (have_dinfo 1) --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so --1286948-- object doesn't have a symbol table --1286948-- REDIR: 0x64bc670 (libc.so.6:setenv) redirected to 0x4844480 (setenv) --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 --1286948-- object doesn't have a symbol table --1286948-- Discarding syms at 0x8d053e0-0x8d07391 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so (have_dinfo 1) --1286948-- Reading syms 
from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so --1286948-- object doesn't have a symbol table --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so --1286948-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so --1286950-- object doesn't have a symbol table --1286950-- Discarding syms at 0x8d04180-0x8d045b0 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so (have_dinfo 1) --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so --1286950-- object doesn't have a symbol table --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so --1286950-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so --1286946-- object doesn't have a symbol table --1286946-- Discarding syms at 0x9ebf0a0-0x9ebf490 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so (have_dinfo 1) --1286946-- Discarding syms at 0x9eca300-0x9ecbee8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so (have_dinfo 1) --1286946-- Discarding syms at 0x9ed1220-0x9ed24e7 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so (have_dinfo 1) --1286946-- Discarding syms at 
0x9ed8240-0x9ed8c88 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so (have_dinfo 1) --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so --1286946-- object doesn't have a symbol table --1286946-- Discarding syms at 0x9ebf0e0-0x9ebf417 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so (have_dinfo 1) --1286946-- Discarding syms at 0x9ecf320-0x9ed1239 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so (have_dinfo 1) --1286946-- Discarding syms at 0x9ed73a0-0x9ed9ccc in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so (have_dinfo 1) --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so --1286936-- object doesn't have a symbol table --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so --1286936-- object doesn't have a symbol table --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so --1286936-- object doesn't have a symbol table --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so --1286936-- object doesn't have a symbol table --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so --1286936-- object doesn't have a symbol table --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so --1286936-- object doesn't have a symbol table --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so --1286936-- object doesn't have a symbol table --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so --1286936-- object doesn't have a symbol table --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so --1286936-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so --1286946-- object doesn't have a symbol table --1286946-- REDIR: 0x652cc70 (libc.so.6:__strcpy_sse2_unaligned) redirected to 0x483f090 (strcpy) --1286946-- REDIR: 0x65a3810 (libc.so.6:__memmove_chk) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) ==1286946== WARNING: new redirection conflicts with existing -- ignoring it --1286946-- old: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2030.0) 0x04843b10 __memcpy_chk --1286946-- new: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2024.0) 0x048434d0 __memmove_chk --1286946-- REDIR: 0x6531c30 (libc.so.6:__memcpy_chk_sse2_unaligned) redirected to 0x4843b10 (__memcpy_chk) --1286946-- REDIR: 0x65129b0 
(libc.so.6:posix_memalign) redirected to 0x483e1e0 (posix_memalign) --1286946-- Discarding syms at 0x9f15280-0x9f32932 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so (have_dinfo 1) --1286946-- Discarding syms at 0x9f7c4c0-0x9f7ded8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 (have_dinfo 1) --1286946-- Discarding syms at 0x9f620c0-0x9f71483 in /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) --1286946-- Discarding syms at 0x9f9ba10-0x9fd22ee in /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_monitoring.so.50.10.0 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so --1286946-- object doesn't have a symbol table --1286946-- Discarding syms at 0x9f4d400-0x9f50c19 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so (have_dinfo 1) --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/libpsm1/libpsm_infinipath.so.1.16 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 --1286946-- object doesn't have a symbol table --1286946-- REDIR: 0x6517140 (libc.so.6:strcasestr) redirected to 0x4843f80 (strcasestr) --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so --1286946-- object doesn't have a symbol table --1286946-- Discarding syms at 0x9f4d5c0-0x9f4f5a1 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so (have_dinfo 1) --1286946-- Discarding syms at 0x9fee680-0x9ff096c in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so (have_dinfo 1) --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_monitoring.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so --1286946-- object doesn't have a symbol table --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so --1286946-- object doesn't have a symbol table --1286946-- Discarding syms at 0x9f724a0-0x9f787b5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so (have_dinfo 1) --1286946-- Discarding syms at 0xa827f80-0xa8e14c4 in /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 (have_dinfo 1) --1286946-- Discarding syms at 0x9f94830-0x9fbafce in /usr/lib/libpsm1/libpsm_infinipath.so.1.16 (have_dinfo 1) --1286946-- Discarding syms at 0x9fe5580-0x9fe8f71 in /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 (have_dinfo 1) --1286946-- Discarding syms at 0x9f56420-0x9f5cec0 in /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 (have_dinfo 1) --1286946-- Discarding syms at 0xa929f10-0xa93d5fc in /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 (have_dinfo 1) --1286946-- Discarding syms at 0xa94b0c0-0xa95a483 in /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) --1286946-- Discarding syms at 0xa968860-0xa9adf12 in /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 (have_dinfo 1) --1286946-- Discarding syms at 0xa9e7a10-0xaa1e2ee in /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) --1286946-- Discarding syms at 0x9f80410-0x9f84e27 in /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 (have_dinfo 1) --1286946-- Discarding syms at 0x9f103e0-0x9f15fd5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so (have_dinfo 1) --1286946-- Discarding syms at 0x9f471e0-0x9f47ce0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so (have_dinfo 1) ==1286946== Thread 3: ==1286946== Syscall param writev(vector[...]) points to uninitialised byte(s) ==1286946== at 0x658A48D: __writev (writev.c:26) ==1286946== by 0x658A48D: writev (writev.c:24) ==1286946== by 0x8DF9B4C: pmix_ptl_base_send_handler (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) ==1286946== by 0x7CC413E: ??? 
(in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286946== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286946== by 0x8DBDD55: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) ==1286946== by 0x6595102: clone (clone.S:95) ==1286946== Address 0xa28fdcf is 127 bytes inside a block of size 5,120 alloc'd ==1286946== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286946== by 0x8DE155A: pmix_bfrop_buffer_extend (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) ==1286946== by 0x8DE3F4A: pmix_bfrops_base_pack_byte (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) ==1286946== by 0x8DE4900: pmix_bfrops_base_pack_buf (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) ==1286946== by 0x8DE4175: pmix_bfrops_base_pack (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) ==1286946== by 0x8D7CF91: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) ==1286946== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286946== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286946== by 0x8DBDD55: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) ==1286946== by 0x6595102: clone (clone.S:95) ==1286946== Uninitialised value was created by a stack allocation ==1286946== at 0x9F048D6: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so) ==1286946== --1286944-- Discarding syms at 0xaa4d220-0xaa5796a in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at 0xaa4d220---1286948-- Discarding syms at 0xaae1100-0xaae7d70 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at 0xaae1100-0xaae7d70 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so (have_dinfo 1) --1286945-- Discarding syms at 0x9f69420-0x9f--1286938-- REDIR: 0x61cee70 (libstdc++.so.6:operator delete(void*, unsigned long)) redirected to --1286937-- REDIR: 0x61cee70 (libstdc++.so.6:opera--1286946-- REDIR: 0x652e970 (libc.so.6:__stpncpy_sse2_unaligned) redirected to 0x48427e0 (stpncpy) --1286942-- REDIR: 0x6527ed0 (libc.so.6:__strnlen_sse2) redirected to 0x483eee0 (strnlen) --1286944-- REDIR: 0x652fcc0 (libc.so.6:__strcat_sse2_unaligned) redirected to 0x483ec20 (strcat) --1286951-- REDIR: 0x65113d0 (libc.so.6:memalign) redirected to 0x483e2a0 (memalign) --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so --1286951-- object doesn't have a symbol table --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so --1286951-- object doesn't have a symbol table --1286941-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 --1286941-- object doesn't have a symbol table --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so --1286951-- object doesn't have a symbol table --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so --1286939-- object doesn't have a symbol table --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so --1286939-- object doesn't have a symbol table --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so --1286939-- object doesn't have a symbol table 
--1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so --1286939-- object doesn't have a symbol table --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so --1286939-- object doesn't have a symbol table --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so --1286939-- object doesn't have a symbol table --1286943-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so --1286943-- object doesn't have a symbol table --1286943-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so --1286943-- object doesn't have a symbol table --1286943-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so --1286943-- object doesn't have a symbol table --1286938-- REDIR: 0x65a3b00 (libc.so.6:__strcpy_chk) redirected to 0x48435c0 (__strcpy_chk) --1286939-- Discarding syms at 0x9f1d660-0x9f371d6 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so (have_dinfo 1) --1286939-- Discarding syms at 0x9f5afa0-0x9f8f8b6 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so (have_dinfo 1) --1286939-- Discarding syms at 0x9fa0640-0x9fa42d9 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so (have_dinfo 1) --1286939-- Discarding syms at 0x9f4c160-0x9f4dc58 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so (have_dinfo 1) --1286939-- Discarding syms at 0xa7fc270-0xa804f00 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so (have_dinfo 1) --1286939-- Discarding syms at 0x9fee3a0-0x9ff134e in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so (have_dinfo 1) --1286939-- Discarding syms at 0xa80a240-0xa80aa8d in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 (have_dinfo 1) --1286939-- Discarding syms at 0xa80f0e0-0xa80f8bb in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so (have_dinfo 1) --1286939-- Discarding syms at 0xaa460c0-0xaa47947 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so (have_dinfo 1) --1286939-- Discarding syms at 0xaa613e0-0xaa7730f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so (have_dinfo 1) --1286939-- Discarding syms at 0xaa849c0-0xaa8a845 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so (have_dinfo 1) --1286939-- Discarding syms at 0x9ee1320-0x9ee3567 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so (have_dinfo 1) --1286939-- Discarding syms at 0x9eebc40-0x9ef4ad7 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so (have_dinfo 1) --1286939-- Discarding syms at 0x9f02600-0x9f08cd8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so (have_dinfo 1) --1286939-- Discarding syms at 0x9f40200-0x9f4126e in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so (have_dinfo 1) --1286939-- Discarding syms at 0x9eda4e0-0x9edb4c5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so (have_dinfo 1) --1286939-- Discarding syms at 0x9ed32c0-0x9ed4afe in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so (have_dinfo 1) --1286939-- Discarding syms at 0x9ebf160-0x9ebfe95 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so (have_dinfo 1) --1286939-- Discarding syms at 0x9ece140-0x9ecebed in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so (have_dinfo 1) --1286939-- Discarding syms at 
0x9ec92a0-0x9ec9aa2 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so (have_dinfo 1) --1286939-- Discarding syms at 0x8eae0e0-0x8eae4a7 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so (have_dinfo 1) --1286939-- Discarding syms at 0x8eb3220-0x8eb3c27 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so (have_dinfo 1) --1286939-- Discarding syms at 0x8eb80e0-0x8eb90b7 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so (have_dinfo 1) --1286939-- Discarding syms at 0x8ea6380-0x8ea97b3 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so (have_dinfo 1) --1286939-- Discarding syms at 0x8e5a740-0x8e5f859 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so (have_dinfo 1) --1286939-- Discarding syms at 0x8e67be0-0x8e743f0 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so (have_dinfo 1) --1286939-- Discarding syms at 0x84da200-0x84daa5d in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so (have_dinfo 1) --1286939-- Discarding syms at 0x8d322b0-0x8d34bfc in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so (have_dinfo 1) --1286939-- Discarding syms at 0x8e29480-0x8e3b70a in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so (have_dinfo 1) --1286939-- Discarding syms at 0x8d3c2b0-0x8d3ed5c in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so (have_dinfo 1) --1286939-- Discarding syms at 0x8e45340-0x8e502da in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so (have_dinfo 1) --1286939-- Discarding syms at 0x8e901a0-0x8e908a7 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so (have_dinfo 1) --1286939-- Discarding syms at 0x8d05520-0x8d06783 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so (have_dinfo 1) --1286939-- Discarding syms at 0x8e7b460-0x8e8aaa4 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so (have_dinfo 1) --1286939-- Discarding syms at 0x8d44520-0x8d4556a in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so (have_dinfo 1) --1286939-- Discarding syms at 0x8e97600-0x8ea0fa1 in /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 (have_dinfo 1) --1286939-- Discarding syms at 0x8d109c0-0x8d27dcf in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so (have_dinfo 1) --1286939-- Discarding syms at 0x8d5b280-0x8dfdffb in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 (have_dinfo 1) --1286939-- Discarding syms at 0x9ec40a0-0x9ec4490 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so (have_dinfo 1) --1286939-- Discarding syms at 0x84d2580-0x84d518f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so (have_dinfo 1) --1286939-- Discarding syms at 0x4a96120-0x4a9644f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so (have_dinfo 1) --1286939-- Discarding syms at 0x4aa0100-0x4aa03e7 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so (have_dinfo 1) --1286939-- Discarding syms at 0x84c74a0-0x84c901f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so (have_dinfo 1) --1286939-- Discarding syms at 0x4aa5260-0x4aa58e9 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so (have_dinfo 1) --1286939-- Discarding syms at 0x4a9b420-0x4a9bcdf in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so (have_dinfo 1) --1286939-- Discarding syms at 0x84e7460-0x84f52ca in /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 (have_dinfo 1) --1286939-- Discarding syms at 0x4a90360-0x4a91107 in 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so (have_dinfo 1) --1286939-- Discarding syms at 0x9f46220-0x9f474cc in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so (have_dinfo 1) --1286939-- Discarding syms at 0x9f0f180-0x9f0f78d in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so (have_dinfo 1) --1286939-- Discarding syms at 0xaa94540-0xaa96a4a in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so (have_dinfo 1) --1286939-- Discarding syms at 0xaa9f6c0-0xaab44d0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so (have_dinfo 1) --1286939-- Discarding syms at 0xaabe820-0xaad8ee0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so (have_dinfo 1) --1286939-- Discarding syms at 0x9efc080-0x9efc1e1 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so (have_dinfo 1) --1286939-- Discarding syms at 0x9fab2a0-0x9fb1341 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so (have_dinfo 1) --1286939-- Discarding syms at 0x9f140c0-0x9f14299 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so (have_dinfo 1) --1286939-- Discarding syms at 0x9fb72a0-0x9fbb791 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so (have_dinfo 1) --1286939-- Discarding syms at 0x9fd52a0-0x9fda794 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so (have_dinfo 1) --1286939-- Discarding syms at 0x9fe02e0-0x9fe59a5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so (have_dinfo 1) --1286939-- Discarding syms at 0xa815460-0xa8177ab in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so (have_dinfo 1) --1286939-- Discarding syms at 0xa81e260-0xa82033d in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so (have_dinfo 1) --1286939-- Discarding syms at 0xa8273e0-0xa8297d8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so (have_dinfo 1) --1286939-- Discarding syms at 0x9fc85e0-0x9fce8ef in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 (have_dinfo 1) ==1286939== ==1286939== HEAP SUMMARY: ==1286939== in use at exit: 74,054 bytes in 223 blocks ==1286939== total heap usage: 22,405,782 allocs, 22,405,559 frees, 34,062,479,959 bytes allocated ==1286939== ==1286939== Searching for pointers to 223 not-freed blocks ==1286939== Checked 3,415,912 bytes ==1286939== ==1286939== Thread 1: ==1286939== 1 bytes in 1 blocks are definitely lost in loss record 1 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x651550E: strdup (strdup.c:42) ==1286939== by 0x9F6A4B6: ??? ==1286939== by 0x9F47373: ??? 
==1286939== by 0x68E3B9B: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x4BA1734: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== ==1286939== 8 bytes in 1 blocks are still reachable in loss record 2 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x764724C: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) ==1286939== by 0x7657B9A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) ==1286939== by 0x7645679: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) ==1286939== by 0x4011C90: call_init (dl-init.c:30) ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) ==1286939== by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so) ==1286939== by 0x3: ??? ==1286939== by 0x1FFEFFF926: ??? ==1286939== by 0x1FFEFFF93D: ??? ==1286939== by 0x1FFEFFF987: ??? ==1286939== by 0x1FFEFFF9A7: ??? ==1286939== ==1286939== 8 bytes in 1 blocks are definitely lost in loss record 3 of 44 ==1286939== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x9F69B6F: ??? ==1286939== by 0x9F1CDED: ??? ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x9EE3527: ??? ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== by 0x15710D: main (testing_main.cpp:8) ==1286939== ==1286939== 13 bytes in 2 blocks are still reachable in loss record 4 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x651550E: strdup (strdup.c:42) ==1286939== by 0x7CC3657: event_config_avoid_method (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x68FEB5A: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68FE8CA: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== ==1286939== 15 bytes in 1 blocks are indirectly lost in loss record 5 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x651550E: strdup (strdup.c:42) ==1286939== by 0x9EDB189: ??? ==1286939== by 0x68D98FC: mca_base_framework_components_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x6907C25: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x4BA16D5: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== by 0x15710D: main (testing_main.cpp:8) ==1286939== ==1286939== 15 bytes in 1 blocks are definitely lost in loss record 6 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x651550E: strdup (strdup.c:42) ==1286939== by 0x9F5655C: ??? ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) ==1286939== by 0x4011C90: call_init (dl-init.c:30) ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) ==1286939== ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 7 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x9F1CBEB: ??? ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x9EE3527: ??? 
==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== by 0x15710D: main (testing_main.cpp:8) ==1286939== ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 8 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x9F1CC66: ??? ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x9EE3527: ??? ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== by 0x15710D: main (testing_main.cpp:8) ==1286939== ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 9 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x9F1CCDA: ??? ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x9EE3527: ??? ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== by 0x15710D: main (testing_main.cpp:8) ==1286939== ==1286939== 25 bytes in 1 blocks are still reachable in loss record 10 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x651550E: strdup (strdup.c:42) ==1286939== by 0x68F27BD: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x4B956B6: ompi_pml_v_output_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B95259: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x68D98FC: mca_base_framework_components_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x4B93FAE: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x4BA1734: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== ==1286939== 30 bytes in 1 blocks are definitely lost in loss record 11 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0xA9A859B: ??? ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) ==1286939== by 0x4011C90: call_init (dl-init.c:30) ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) ==1286939== by 0x72DEB58: _dlerror_run (dlerror.c:170) ==1286939== ==1286939== 32 bytes in 1 blocks are still reachable in loss record 12 of 44 ==1286939== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x7CC353E: event_get_supported_methods (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x68FEA98: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 13 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E9D3EB: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A5EB: ??? ==1286939== by 0x84D2B0A: ??? 
==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 14 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E9D3EB: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A5EB: ??? ==1286939== by 0x84D2BCE: ??? ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 15 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E9D3EB: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A5EB: ??? ==1286939== by 0x84D2CB2: ??? ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 16 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E9D3EB: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A5EB: ??? ==1286939== by 0x84D2D91: ??? ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 17 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E81BD8: ??? ==1286939== by 0x8E89F4B: ??? ==1286939== by 0x8D84A0D: ??? ==1286939== by 0x8DF79C1: ??? ==1286939== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x8DBDD55: ??? ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) ==1286939== by 0x6595102: clone (clone.S:95) ==1286939== ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 18 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E9D3EB: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A767: ??? 
==1286939== by 0x84D330E: ??? ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== ==1286939== 36 (32 direct, 4 indirect) bytes in 1 blocks are definitely lost in loss record 19 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E9D3EB: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A5EB: ??? ==1286939== by 0x4B94C09: mca_pml_base_pml_check_selected (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x9F1E1E1: ??? ==1286939== by 0x4BA1A09: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== ==1286939== 40 bytes in 1 blocks are still reachable in loss record 20 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x7CFF4B6: ??? (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) ==1286939== by 0x7CC5E26: event_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x7CFF68F: evthread_use_pthreads (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) ==1286939== by 0x68FE8E4: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== ==1286939== 40 bytes in 1 blocks are still reachable in loss record 21 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x7CFF4B6: ??? (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) ==1286939== by 0x7CCF377: evsig_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x7CC5E39: event_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x7CFF68F: evthread_use_pthreads (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) ==1286939== by 0x68FE8E4: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== ==1286939== 40 bytes in 1 blocks are still reachable in loss record 22 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x7CFF4B6: ??? (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) ==1286939== by 0x7CCB997: evutil_secure_rng_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x7CC5E4F: event_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x7CFF68F: evthread_use_pthreads (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) ==1286939== by 0x68FE8E4: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== ==1286939== 48 bytes in 1 blocks are still reachable in loss record 23 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x4B8560C: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) ==1286939== ==1286939== 48 bytes in 1 blocks are still reachable in loss record 24 of 44 ==1286939== at 0x483B7F3: malloc (in 
/usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x4B85638: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) ==1286939== ==1286939== 48 bytes in 2 blocks are still reachable in loss record 25 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x7CC3647: event_config_avoid_method (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x68FEB5A: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== ==1286939== 55 (32 direct, 23 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E9D3EB: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A767: ??? 
==1286939== by 0x4AF6CD6: ompi_comm_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4BA194D: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== ==1286939== 56 bytes in 1 blocks are still reachable in loss record 27 of 44 ==1286939== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x7CC1C86: event_config_new (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x68FEAC0: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 28 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x9F6E008: ??? ==1286939== by 0x9F7C654: ??? ==1286939== by 0x9F1CD3E: ??? ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x9EE3527: ??? ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 29 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0xA957008: ??? ==1286939== by 0xA86B017: ??? ==1286939== by 0xA862FD8: ??? ==1286939== by 0xA828E15: ??? ==1286939== by 0xA829624: ??? ==1286939== by 0x9F77910: ??? ==1286939== by 0x4B85C53: ompi_mtl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x9F13E4D: ??? 
==1286939== by 0x4B94673: mca_pml_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4BA1789: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== ==1286939== 76 (32 direct, 44 indirect) bytes in 1 blocks are definitely lost in loss record 30 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E9D3EB: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A767: ??? ==1286939== by 0x84D387F: ??? ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== ==1286939== 79 (64 direct, 15 indirect) bytes in 1 blocks are definitely lost in loss record 31 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x9EDB12E: ??? ==1286939== by 0x68D98FC: mca_base_framework_components_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x6907C25: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x4BA16D5: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== by 0x15710D: main (testing_main.cpp:8) ==1286939== ==1286939== 144 bytes in 3 blocks are still reachable in loss record 32 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x4B8564E: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) ==1286939== ==1286939== 231 bytes in 12 blocks are definitely lost in loss record 33 of 44 
==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x651550E: strdup (strdup.c:42) ==1286939== by 0x9F2B4B3: ??? ==1286939== by 0x9F2B85C: ??? ==1286939== by 0x9F2BBD7: ??? ==1286939== by 0x9F1CAAC: ??? ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x9EE3527: ??? ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== ==1286939== 240 bytes in 5 blocks are still reachable in loss record 34 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x4B85622: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) ==1286939== ==1286939== 272 bytes in 44 blocks are definitely lost in loss record 35 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x9FCAEDB: ??? ==1286939== by 0x9FE42B2: ??? ==1286939== by 0x9FE47BB: ??? ==1286939== by 0x9FCDDBF: ??? ==1286939== by 0x9FA324A: ??? ==1286939== by 0x4B3DD7F: PMPI_File_write_at_all (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) ==1286939== by 0x78DF11D: H5FD_write (H5FDint.c:257) ==1286939== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) ==1286939== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) ==1286939== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) ==1286939== ==1286939== 585 (480 direct, 105 indirect) bytes in 15 blocks are definitely lost in loss record 36 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E9D3EB: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A767: ??? 
==1286939== by 0x4B14036: ompi_proc_complete_init_single (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B146C3: ompi_proc_complete_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4BA19A9: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== ==1286939== 776 bytes in 32 blocks are indirectly lost in loss record 37 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8DE9816: ??? ==1286939== by 0x8DEB1D2: ??? ==1286939== by 0x8DEB49A: ??? ==1286939== by 0x8DE8B12: ??? ==1286939== by 0x8E9D492: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A767: ??? ==1286939== ==1286939== 840 (480 direct, 360 indirect) bytes in 15 blocks are definitely lost in loss record 38 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E9D3EB: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A5EB: ??? ==1286939== by 0x9EF2F00: ??? ==1286939== by 0x9EEBF17: ??? ==1286939== by 0x9EE2F54: ??? ==1286939== by 0x9F1E1FB: ??? ==1286939== ==1286939== 1,084 (480 direct, 604 indirect) bytes in 15 blocks are definitely lost in loss record 39 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E9D3EB: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A767: ??? ==1286939== by 0x84D4800: ??? ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== ==1286939== 1,344 bytes in 1 blocks are definitely lost in loss record 40 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x9F1CD2D: ??? ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x9EE3527: ??? 
==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== by 0x15710D: main (testing_main.cpp:8) ==1286939== ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 41 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x9F1CC50: ??? ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x9EE3527: ??? ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== by 0x15710D: main (testing_main.cpp:8) ==1286939== ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 42 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x9F1CCC4: ??? ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286939== by 0x9EE3527: ??? ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286939== by 0x15710D: main (testing_main.cpp:8) ==1286939== ==1286939== 62,644 bytes in 31 blocks are indirectly lost in loss record 43 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8DE9FA8: ??? ==1286939== by 0x8DEB032: ??? ==1286939== by 0x8DEB49A: ??? ==1286939== by 0x8DE8B12: ??? ==1286939== by 0x8E9D492: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A5EB: ??? 
==1286939== ==1286939== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are definitely lost in loss record 44 of 44 ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8E9D3EB: ??? ==1286939== by 0x8E9F1C1: ??? ==1286939== by 0x8D0578C: ??? ==1286939== by 0x8D8605A: ??? ==1286939== by 0x8D87FE8: ??? ==1286939== by 0x8D88E4D: ??? ==1286939== by 0x8D1A5EB: ??? ==1286939== by 0x9F0398A: ??? ==1286939== by 0x9EE2F54: ??? ==1286939== by 0x9F1E1FB: ??? ==1286939== by 0x4BA1A09: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286939== ==1286939== LEAK SUMMARY: ==1286939== definitely lost: 9,837 bytes in 138 blocks ==1286939== indirectly lost: 63,435 bytes in 64 blocks ==1286939== possibly lost: 0 bytes in 0 blocks ==1286939== still reachable: 782 bytes in 21 blocks ==1286939== suppressed: 0 bytes in 0 blocks ==1286939== ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from 0) ==1286939== ==1286939== 1 errors in context 1 of 29: ==1286939== Thread 3: ==1286939== Syscall param writev(vector[...]) points to uninitialised byte(s) ==1286939== at 0x658A48D: __writev (writev.c:26) ==1286939== by 0x658A48D: writev (writev.c:24) ==1286939== by 0x8DF9B4C: ??? ==1286939== by 0x7CC413E: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x8DBDD55: ??? ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) ==1286939== by 0x6595102: clone (clone.S:95) ==1286939== Address 0xa28ee1f is 127 bytes inside a block of size 5,120 alloc'd ==1286939== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286939== by 0x8DE155A: ??? ==1286939== by 0x8DE3F4A: ??? ==1286939== by 0x8DE4900: ??? ==1286939== by 0x8DE4175: ??? ==1286939== by 0x8D7CF91: ??? ==1286939== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286939== by 0x8DBDD55: ??? ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) ==1286939== by 0x6595102: clone (clone.S:95) ==1286939== Uninitialised value was created by a stack allocation ==1286939== at 0x9F048D6: ??? ==1286939== ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from 0) mpi/lib/libopen-pal.so.40.20.3) ==1286936== by 0x4B85622: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) ==1286936== by 0x78D4B23: H5FD_open (H5FD.c:733) ==1286936== by 0x78B953B: H5F_open (H5Fint.c:1493) ==1286936== ==1286936== 272 bytes in 44 blocks are definitely lost in loss record 39 of 49 ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x9FCAEDB: ??? ==1286936== by 0x9FE42B2: ??? ==1286936== by 0x9FE47BB: ??? ==1286936== by 0x9FCDDBF: ??? ==1286936== by 0x9FA324A: ??? 
==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) ==1286936== ==1286936== 312 bytes in 1 blocks are still reachable in loss record 40 of 49 ==1286936== at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x74E78EB: boost::detail::make_external_thread_data() (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) ==1286936== by 0x74E7C74: boost::detail::add_thread_exit_function(boost::detail::thread_exit_function_base*) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) ==1286936== by 0x73AFCEA: boost::log::v2_mt_posix::sources::aux::get_severity_level() (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0) ==1286936== by 0x5F71A6C: set_value (severity_feature.hpp:135) ==1286936== by 0x5F71A6C: open_record_unlocked > > (severity_feature.hpp:252) ==1286936== by 0x5F71A6C: open_record > > (basic_logger.hpp:459) ==1286936== by 0x5F71A6C: Logger::TraceMessage(std::__cxx11::basic_string, std::allocator >) (logger.cpp:328) ==1286936== by 0x5F729C7: Logger::Message(std::__cxx11::basic_string, std::allocator > const&, LogLevel) (logger.cpp:280) ==1286936== by 0x5F73CF1: Logger::Timer::Timer(std::__cxx11::basic_string, std::allocator > const&, LogLevel) (logger.cpp:426) ==1286936== by 0x15718A: timer (logger.hpp:98) ==1286936== by 0x15718A: main (testing_main.cpp:9) ==1286936== ==1286936== 585 (480 direct, 105 indirect) bytes in 15 blocks are definitely lost in loss record 41 of 49 ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x8E9D3EB: ??? ==1286936== by 0x8E9F1C1: ??? ==1286936== by 0x8D0578C: ??? ==1286936== by 0x8D8605A: ??? ==1286936== by 0x8D87FE8: ??? ==1286936== by 0x8D88E4D: ??? ==1286936== by 0x8D1A767: ??? ==1286936== by 0x4B14036: ompi_proc_complete_init_single (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4B146C3: ompi_proc_complete_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4BA19A9: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== ==1286936== 776 bytes in 32 blocks are indirectly lost in loss record 42 of 49 ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x8DE9816: ??? ==1286936== by 0x8DEB1D2: ??? ==1286936== by 0x8DEB49A: ??? ==1286936== by 0x8DE8B12: ??? ==1286936== by 0x8E9D492: ??? ==1286936== by 0x8E9F1C1: ??? ==1286936== by 0x8D0578C: ??? ==1286936== by 0x8D8605A: ??? ==1286936== by 0x8D87FE8: ??? ==1286936== by 0x8D88E4D: ??? ==1286936== by 0x8D1A767: ??? ==1286936== ==1286936== 840 (480 direct, 360 indirect) bytes in 15 blocks are definitely lost in loss record 43 of 49 ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x8E9D3EB: ??? ==1286936== by 0x8E9F1C1: ??? ==1286936== by 0x8D0578C: ??? ==1286936== by 0x8D8605A: ??? ==1286936== by 0x8D87FE8: ??? 
==1286936== by 0x8D88E4D: ??? ==1286936== by 0x8D1A5EB: ??? ==1286936== by 0x9EF2F00: ??? ==1286936== by 0x9EEBF17: ??? ==1286936== by 0x9EE2F54: ??? ==1286936== by 0x9F1E1FB: ??? ==1286936== ==1286936== 1,091 (480 direct, 611 indirect) bytes in 15 blocks are definitely lost in loss record 44 of 49 ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x8E9D3EB: ??? ==1286936== by 0x8E9F1C1: ??? ==1286936== by 0x8D0578C: ??? ==1286936== by 0x8D8605A: ??? ==1286936== by 0x8D87FE8: ??? ==1286936== by 0x8D88E4D: ??? ==1286936== by 0x8D1A767: ??? ==1286936== by 0x84D4800: ??? ==1286936== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) ==1286936== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== ==1286936== 1,344 bytes in 1 blocks are definitely lost in loss record 45 of 49 ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286936== by 0x9F1CD2D: ??? ==1286936== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286936== by 0x9EE3527: ??? ==1286936== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286936== by 0x15710D: main (testing_main.cpp:8) ==1286936== ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 46 of 49 ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286936== by 0x9F1CC50: ??? ==1286936== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286936== by 0x9EE3527: ??? ==1286936== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286936== by 0x15710D: main (testing_main.cpp:8) ==1286936== ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 47 of 49 ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286936== by 0x9F1CCC4: ??? 
==1286936== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) ==1286936== by 0x9EE3527: ??? ==1286936== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) ==1286936== by 0x15710D: main (testing_main.cpp:8) ==1286936== ==1286936== 62,640 bytes in 30 blocks are indirectly lost in loss record 48 of 49 ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x8DE9FA8: ??? ==1286936== by 0x8DEB032: ??? ==1286936== by 0x8DEB49A: ??? ==1286936== by 0x8DE8B12: ??? ==1286936== by 0x8E9D492: ??? ==1286936== by 0x8E9F1C1: ??? ==1286936== by 0x8D0578C: ??? ==1286936== by 0x8D8605A: ??? ==1286936== by 0x8D87FE8: ??? ==1286936== by 0x8D88E4D: ??? ==1286936== by 0x8D1A5EB: ??? ==1286936== ==1286936== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are definitely lost in loss record 49 of 49 ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x8E9D3EB: ??? ==1286936== by 0x8E9F1C1: ??? ==1286936== by 0x8D0578C: ??? ==1286936== by 0x8D8605A: ??? ==1286936== by 0x8D87FE8: ??? ==1286936== by 0x8D88E4D: ??? ==1286936== by 0x8D1A5EB: ??? ==1286936== by 0x9F0398A: ??? ==1286936== by 0x9EE2F54: ??? ==1286936== by 0x9F1E1FB: ??? ==1286936== by 0x4BA1A09: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== ==1286936== LEAK SUMMARY: ==1286936== definitely lost: 9,805 bytes in 137 blocks ==1286936== indirectly lost: 63,431 bytes in 63 blocks ==1286936== possibly lost: 0 bytes in 0 blocks ==1286936== still reachable: 1,174 bytes in 27 blocks ==1286936== suppressed: 0 bytes in 0 blocks ==1286936== ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from 0) ==1286936== ==1286936== 1 errors in context 1 of 29: ==1286936== Thread 3: ==1286936== Syscall param writev(vector[...]) points to uninitialised byte(s) ==1286936== at 0x658A48D: __writev (writev.c:26) ==1286936== by 0x658A48D: writev (writev.c:24) ==1286936== by 0x8DF9B4C: ??? ==1286936== by 0x7CC413E: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286936== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286936== by 0x8DBDD55: ??? ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) ==1286936== by 0x6595102: clone (clone.S:95) ==1286936== Address 0xa290cbf is 127 bytes inside a block of size 5,120 alloc'd ==1286936== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x8DE155A: ??? ==1286936== by 0x8DE3F4A: ??? ==1286936== by 0x8DE4900: ??? ==1286936== by 0x8DE4175: ??? ==1286936== by 0x8D7CF91: ??? ==1286936== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286936== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) ==1286936== by 0x8DBDD55: ??? 
==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) ==1286936== by 0x6595102: clone (clone.S:95) ==1286936== Uninitialised value was created by a stack allocation ==1286936== at 0x9F048D6: ??? ==1286936== ==1286936== ==1286936== 6 errors in context 2 of 29: ==1286936== Thread 1: ==1286936== Syscall param pwritev(vector[...]) points to uninitialised byte(s) ==1286936== at 0x658A608: pwritev64 (pwritev64.c:30) ==1286936== by 0x658A608: pwritev (pwritev64.c:28) ==1286936== by 0x9F46E25: ??? ==1286936== by 0x9FCE33B: ??? ==1286936== by 0x9FCDDBF: ??? ==1286936== by 0x9FA324A: ??? ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) ==1286936== by 0x7B5ED15: H5C__collective_write (H5Cmpio.c:1020) ==1286936== by 0x7B5ED15: H5C_apply_candidate_list (H5Cmpio.c:394) ==1286936== Address 0xedf91b0 is 96 bytes inside a block of size 216 alloc'd ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:292) ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:267) ==1286936== by 0x77FC8FF: H5C__flush_single_entry (H5C.c:6045) ==1286936== by 0x7B5DC7E: H5C__flush_candidates_in_ring (H5Cmpio.c:1371) ==1286936== by 0x7B5DC7E: H5C__flush_candidate_entries (H5Cmpio.c:1192) ==1286936== by 0x7B5DC7E: H5C_apply_candidate_list (H5Cmpio.c:385) ==1286936== by 0x7B5BA18: H5AC__rsp__dist_md_write__flush (H5ACmpio.c:1709) ==1286936== by 0x7B5BA18: H5AC__run_sync_point (H5ACmpio.c:2164) ==1286936== by 0x7B5C9D2: H5AC__flush_entries (H5ACmpio.c:2307) ==1286936== by 0x77C95E4: H5AC_flush (H5AC.c:681) ==1286936== by 0x78B306A: H5F__flush_phase2 (H5Fint.c:1831) ==1286936== by 0x78B5D7A: H5F__dest (H5Fint.c:1152) ==1286936== by 0x78B6603: H5F_try_close (H5Fint.c:2180) ==1286936== by 0x78B69F5: H5F__close_cb (H5Fint.c:2009) ==1286936== by 0x7965797: H5I_dec_ref (H5I.c:1254) ==1286936== Uninitialised value was created by a stack allocation ==1286936== at 0x7695AF0: ??? (in /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so) ==1286936== ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from 0) On Mon, Aug 24, 2020 at 5:00 PM Jed Brown wrote: > Do you potentially have a memory or other resource leak? SIGBUS would be > an odd result, but the symptom of crashing after running for a long time > sometimes fits with a resource leak. > > Mark Lohry writes: > > > I queued up some jobs with Barry's patch, so we'll see. > > > > Re Jed's suggestion at checkpointing, I don't *think* this is something > > coming from the state of the solution -- running from the same point I'm > > seeing it crash anywhere between 1 hour and 20 hours in. I'll increase my > > file save frequency in case I'm wrong there though. > > > > My intel build with different blas just made it through a 6 hour time > slot > > without crash, whereas yesterday the same thing crashed after 3 hours. > But > > given the randomness so far I'd bet that's just dumb luck. 
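A minimal sketch of the checkpoint-and-replay idea mentioned just above (the helper name, file pattern, and save interval are illustrative and not taken from the thread): write the current solution with PETSc's binary viewer every few steps, so a crash that only shows up after many hours can be restarted from a nearby state and re-run under Valgrind.

/* Illustrative sketch only: periodically checkpoint the solution so a late
   failure can be replayed from a nearby state, e.g. under Valgrind. */
#include <petscvec.h>
#include <petscviewer.h>

PetscErrorCode SaveCheckpoint(Vec u, PetscInt step)
{
  PetscErrorCode ierr;
  char           fname[PETSC_MAX_PATH_LEN];
  PetscViewer    viewer;

  PetscFunctionBeginUser;
  ierr = PetscSNPrintf(fname, sizeof(fname), "checkpoint_%D.bin", step);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(PetscObjectComm((PetscObject)u), fname, FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = VecView(u, viewer);CHKERRQ(ierr);   /* read back later with VecLoad() to restart near the crash */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Called every N time steps, this leaves only a short window to re-run under Valgrind instead of the full multi-hour job.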
> > > > On Mon, Aug 24, 2020 at 4:22 PM Barry Smith wrote: > > > >> > >> > >> > On Aug 24, 2020, at 2:34 PM, Jed Brown wrote: > >> > > >> > I'm thinking of something such as writing floating point data into the > >> return address, which would be unaligned/garbage. > >> > >> Ok, my patch will detect this. This is what I was talking about, > messing > >> up the BLAS arguments which are the addresses of arrays. > >> > >> Valgrind is by far the preferred approach. > >> > >> Barry > >> > >> Another feature we could add to the malloc checking is when a SEGV or > >> BUS error is encountered and we catch it we should run the > >> PetscMallocVerify() and check our memory for corruption reporting any we > >> find. > >> > >> > >> > >> > > >> > Reproducing under Valgrind would help a lot. Perhaps it's possible to > >> checkpoint such that the breakage can be reproduced more quickly? > >> > > >> > Barry Smith writes: > >> > > >> >> https://en.wikipedia.org/wiki/Bus_error < > >> https://en.wikipedia.org/wiki/Bus_error> > >> >> > >> >> But perhaps not true for Intel? > >> >> > >> >> > >> >> > >> >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley > >> wrote: > >> >>> > >> >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith >> bsmith at petsc.dev>> wrote: > >> >>> > >> >>> > >> >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown >> jed at jedbrown.org>> wrote: > >> >>>> > >> >>>> Barry Smith > writes: > >> >>>> > >> >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown >> jed at jedbrown.org>> wrote: > >> >>>>>> > >> >>>>>> Barry Smith > writes: > >> >>>>>> > >> >>>>>>> So if a BLAS errors with SIGBUS then it is always an input error > >> of just not proper double/complex alignment? Or some other very strange > >> thing? > >> >>>>>> > >> >>>>>> I would suspect memory corruption. > >> >>>>> > >> >>>>> > >> >>>>> Corruption meaning what specifically? > >> >>>>> > >> >>>>> The routines crashing are dgemv which only take double precision > >> arrays, regardless of what garbage is in those arrays i don't think > there > >> can be BUS errors resulting. They don't take integer arrays whose > >> corruption could result in bad indexing and then BUS errors. > >> >>>>> > >> >>>>> So then it can only be corruption of the pointers passed in, > correct? > >> >>>> > >> >>>> Such as those pointers pointing into data on the stack with > incorrect > >> sizes. > >> >>> > >> >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? > >> >>> > >> >>> My understanding was that roughly memory errors in the heap are SEGV > >> and memory errors on the stack are SIGBUS. Is that not true? > >> >>> > >> >>> Matt > >> >>> > >> >>> -- > >> >>> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which > their > >> experiments lead. > >> >>> -- Norbert Wiener > >> >>> > >> >>> https://www.cse.buffalo.edu/~knepley/ < > >> http://www.cse.buffalo.edu/~knepley/> > >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 24 18:38:47 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 24 Aug 2020 19:38:47 -0400 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid In-Reply-To: <4A6C7C21-E4AB-45AE-ABAA-D9028622B66C@petsc.dev> References: <4A6C7C21-E4AB-45AE-ABAA-D9028622B66C@petsc.dev> Message-ID: On Mon, Aug 24, 2020 at 6:27 PM Barry Smith wrote: > > Alfredo, > > This should never happen. 
The input to the VecMAXPY in gmres is > computed via VMDot which produces the same result on all processes. > > If you run with -pc_type bjacobi does it also happen? > > Is this your custom code or does it happen in PETSc examples also? > Like src/snes/tutorials/ex19 -da_refine 5 > > Could be memory corruption, can you run under valgrind? > Couldn't it happen if something generates a NaN? That also should not happen, but I was allowing that pilut might do it. Thanks, Matt > Barry > > > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo < > ajaramillopalma at gmail.com> wrote: > > > > Dear PETSc developers, > > > > I'm trying to solve a linear problem with GMRES preconditioned with > pilut from HYPRE. For this I'm using the options: > > > > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor > > > > If I use a single core, GMRES (+ pilut or euclid) converges. However, > when using multiple cores the next error appears after some number of > iterations: > > > > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 > > > > relative to the function VecMAXPY. I attached a screenshot with more > detailed output. The same happens when using euclid. Can you please give me > some insight on this? > > > > best regards > > Alfredo > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From bsmith at petsc.dev Mon Aug 24 19:00:25 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 19:00:25 -0500 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid In-Reply-To: References: <4A6C7C21-E4AB-45AE-ABAA-D9028622B66C@petsc.dev> Message-ID: <04AF3F3C-47D5-49C0-8367-C43B7A1811D0@petsc.dev>
Oh yes, it could happen with a NaN. KSPGMRESClassicalGramSchmidtOrthogonalization() calls KSPCheckDot(ksp,lhh[j]); so it should detect any NaN that appears and set ksp->convergedreason, but the call to MAXPY() is still made before returning and hence produces the error message. We should short-circuit the orthogonalization as soon as it sees a NaN/Inf and return immediately so that GMRES can clean up and produce a very useful error message.
Alfredo, It is also possible that the hypre preconditioners are producing a NaN because your matrix is too difficult for them to handle, but it would be odd for it to happen after many iterations. As I suggested before, run with -pc_type bjacobi to see if you get the same problem.
Barry
> On Aug 24, 2020, at 6:38 PM, Matthew Knepley wrote: > > On Mon, Aug 24, 2020 at 6:27 PM Barry Smith > wrote: > > Alfredo, > > This should never happen. The input to the VecMAXPY in gmres is computed via VMDot which produces the same result on all processes. > > If you run with -pc_type bjacobi does it also happen? > > Is this your custom code or does it happen in PETSc examples also? Like src/snes/tutorials/ex19 -da_refine 5 > > Could be memory corruption, can you run under valgrind? > > Couldn't it happen if something generates a NaN? That also should not happen, but I was allowing that pilut might do it. > > Thanks, > > Matt > > Barry > > > > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo > wrote: > > > > Dear PETSc developers, > > > > I'm trying to solve a linear problem with GMRES preconditioned with pilut from HYPRE.
For this I'm using the options: > > > > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor > > > > If I use a single core, GMRES (+ pilut or euclid) converges. However, > when using multiple cores the next error appears after some number of > iterations: > > > > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 > > > > relative to the function VecMAXPY. I attached a screenshot with more > detailed output. The same happens when using euclid. Can you please give me > some insight on this? > > > > best regards > > Alfredo > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > >
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From ajaramillopalma at gmail.com Mon Aug 24 20:35:42 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Mon, 24 Aug 2020 22:35:42 -0300 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid In-Reply-To: <04AF3F3C-47D5-49C0-8367-C43B7A1811D0@petsc.dev> References: <4A6C7C21-E4AB-45AE-ABAA-D9028622B66C@petsc.dev> <04AF3F3C-47D5-49C0-8367-C43B7A1811D0@petsc.dev> Message-ID:
Hello Barry, Matthew, thanks for the replies! Yes, it is our custom code, and it also happens when setting -pc_type bjacobi. Before testing an iterative solver, we were using MUMPS (-ksp_type preonly -ksp_pc_type lu -pc_factor_mat_solver_type mumps) without issues. Running ex19 (as "mpirun -n 4 ex19 -da_refine 5") did not produce any problem.
To try to reproduce the situation on my computer, I set up a small case with -pc_type bjacobi. For that particular case, when running on the cluster the error appears at the very last iteration:
===== 27 KSP Residual norm 8.230378644666e-06 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 ====
whereas running on my computer the error is not raised and convergence is reached instead:
==== Linear interp_ solve converged due to CONVERGED_RTOL iterations 27 ====
I will run valgrind to look for possible memory corruption.
thank you Alfredo
On Mon, Aug 24, 2020 at 9:00 PM Barry Smith wrote: > > Oh yes, it could happen with a NaN. > > KSPGMRESClassicalGramSchmidtOrthogonalization() > calls KSPCheckDot(ksp,lhh[j]); so it should detect any NaN that appears and > set ksp->convergedreason, but the call to MAXPY() is still made before > returning and hence produces the error message. > > We should short-circuit the orthogonalization as soon as it sees a NaN/Inf > and return immediately so that GMRES can clean up and produce a very useful error > message. > > Alfredo, > > It is also possible that the hypre preconditioners are producing a NaN > because your matrix is too difficult for them to handle, but it would be > odd for it to happen after many iterations. > > As I suggested before, run with -pc_type bjacobi to see if you get the > same problem. > > Barry > > > On Aug 24, 2020, at 6:38 PM, Matthew Knepley wrote: > > On Mon, Aug 24, 2020 at 6:27 PM Barry Smith wrote: > >> >> Alfredo, >> >> This should never happen. The input to the VecMAXPY in gmres is >> computed via VMDot which produces the same result on all processes. >> >> If you run with -pc_type bjacobi does it also happen?
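As a rough sketch of the short-circuit Barry describes above (illustrative only, not the actual PETSc source; the helper and its name are invented, and it uses the private KSP header the way library-internal code would): scan the Gram-Schmidt coefficients and stop GMRES cleanly with KSP_DIVERGED_NANORINF before VecMAXPY() ever sees a NaN/Inf, instead of tripping the "Scalar value must be same on all processes" argument check.

/* Illustrative sketch, not the actual PETSc implementation. */
#include <petsc/private/kspimpl.h>

static PetscErrorCode GuardOrthogonalizationCoefficients(KSP ksp, PetscInt n, const PetscScalar lhh[], PetscBool *bail)
{
  PetscInt j;

  PetscFunctionBegin;
  *bail = PETSC_FALSE;
  for (j = 0; j < n; j++) {
    if (PetscIsInfOrNanScalar(lhh[j])) {
      ksp->reason = KSP_DIVERGED_NANORINF; /* caller returns immediately; VecMAXPY() is never reached */
      *bail       = PETSC_TRUE;
      break;
    }
  }
  PetscFunctionReturn(0);
}

If *bail comes back true, the orthogonalization routine would return right away and GMRES would report KSP_DIVERGED_NANORINF, which is a much more useful message than the VecMAXPY argument error.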
>> >> Is this your custom code or does it happen in PETSc examples also? >> Like src/snes/tutorials/ex19 -da_refine 5 >> >> Could be memory corruption, can you run under valgrind? >> > > Couldn't it happen if something generates a NaN? That also should not > happen, but I was allowing that pilut might do it. > > Thanks, > > Matt > > >> Barry >> >> >> > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo < >> ajaramillopalma at gmail.com> wrote: >> > >> > Dear PETSc developers, >> > >> > I'm trying to solve a linear problem with GMRES preconditioned with >> pilut from HYPRE. For this I'm using the options: >> > >> > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor >> > >> > If I use a single core, GMRES (+ pilut or euclid) converges. However, >> when using multiple cores the next error appears after some number of >> iterations: >> > >> > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 >> > >> > relative to the function VecMAXPY. I attached a screenshot with more >> detailed output. The same happens when using euclid. Can you please give me >> some insight on this? >> > >> > best regards >> > Alfredo >> > >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Mon Aug 24 22:15:56 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 24 Aug 2020 22:15:56 -0500 Subject: [petsc-users] Bus Error In-Reply-To: References: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> <87a6yjg3o5.fsf@jedbrown.org> Message-ID: Why not install a valgrind-clean MPICH and run with valgrind to see what happens? --Junchao Zhang On Mon, Aug 24, 2020 at 5:48 PM Mark Lohry wrote: > I don't think I do. Running a much smaller case with the same models I get > the attached report from valgrind --show-leak-kinds=all --leak-check=full > --track-origins=yes. I only see some HDF5 stuff and OpenMPI that I think > are false positives. > > ==1286950== Memcheck, a memory error detector > ==1286950== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. 
> ==1286950== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun > with -h for copyright info > ==1286950== Command: ./verification_testing > --gtest_filter=DrivenCavity3D.Re100_BackwardEulerILU1_16x16N2_Quadrature1 > --petsc_time_integrator=arkimex --petsc_arkimex_type=l2 > ==1286950== Parent PID: 1286932 > ==1286950== > --1286950-- > --1286950-- Valgrind options: > --1286950-- --show-leak-kinds=all > --1286950-- --leak-check=full > --1286950-- --track-origins=yes > --1286950-- --log-file=valgrind-out.txt > --1286950-- -v > --1286950-- Contents of /proc/version: > --1286950-- Linux version 5.4.0-29-generic (buildd at lgw01-amd64-035) > (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #33-Ubuntu SMP Wed Apr 29 > 14:32:27 UTC 2020 > --1286950-- > --1286950-- Arch and hwcaps: AMD64, LittleEndian, > amd64-cx16-rdtscp-sse3-ssse3-avx > --1286950-- Page sizes: currently 4096, max supported 4096 > --1286950-- Valgrind library directory: /usr/lib/x86_64-linux-gnu/valgrind > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/verification_testing > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ld-2.31.so > --1286950-- Considering /usr/lib/x86_64-linux-gnu/ld-2.31.so .. > --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) > --1286950-- Considering /lib/x86_64-linux-gnu/ld-2.31.so .. > --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) > --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.31.so > .. > --1286950-- .. CRC is valid > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux > --1286950-- object doesn't have a symbol table > --1286950-- object doesn't have a dynamic symbol table > --1286950-- Scheduler: using generic scheduler lock implementation. > --1286950-- Reading suppressions file: > /usr/lib/x86_64-linux-gnu/valgrind/default.supp > ==1286950== embedded gdbserver: reading from > /tmp/vgdb-pipe-from-vgdb-to-1286950-by-mlohry-on-??? > ==1286950== embedded gdbserver: writing to > /tmp/vgdb-pipe-to-vgdb-from-1286950-by-mlohry-on-??? > ==1286950== embedded gdbserver: shared mem > /tmp/vgdb-pipe-shared-mem-vgdb-1286950-by-mlohry-on-??? > ==1286950== > ==1286950== TO CONTROL THIS PROCESS USING vgdb (which you probably > ==1286950== don't want to do, unless you know exactly what you're doing, > ==1286950== or are doing some strange experiment): > ==1286950== /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb > --pid=1286950 ...command... > ==1286950== > ==1286950== TO DEBUG THIS PROCESS USING GDB: start GDB like this > ==1286950== /path/to/gdb ./verification_testing > ==1286950== and then give GDB the following command > ==1286950== target remote | > /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 > ==1286950== --pid is optional if only one valgrind process is running > ==1286950== > --1286950-- REDIR: 0x4022d80 (ld-linux-x86-64.so.2:strlen) redirected to > 0x580c9ce2 (???) > --1286950-- REDIR: 0x4022b50 (ld-linux-x86-64.so.2:index) redirected to > 0x580c9cfc (???) > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so > --1286950-- object doesn't have a symbol table > ==1286950== WARNING: new redirection conflicts with existing -- ignoring it > --1286950-- old: 0x04022d80 (strlen ) R-> (0000.0) > 0x580c9ce2 ??? 
> --1286950-- new: 0x04022d80 (strlen ) R-> (2007.0) > 0x0483f060 strlen > --1286950-- REDIR: 0x401f560 (ld-linux-x86-64.so.2:strcmp) redirected to > 0x483ffd0 (strcmp) > --1286950-- REDIR: 0x40232e0 (ld-linux-x86-64.so.2:mempcpy) redirected to > 0x4843a20 (mempcpy) > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/initialization/libinitialization.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/governing_equations/libgoverning_equations.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/time_stepping/libtime_stepping.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/governing_equations/libboundary_conditions.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/governing_equations/libsolution_monitors.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/governing_equations/libfluxtypes.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/algebraic_solvers/libalgebraic_solvers.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/program_options/libprogram_options.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_filesystem.so.1.73.0 > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0 > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so.40.20.1 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libpthread-2.31.so > --1286950-- Considering > /usr/lib/debug/.build-id/77/5cbbfff814456660786780b0b3b40096b4c05e.debug .. > --1286950-- .. build-id is valid > --1286948-- Reading syms from > /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libpetsc.so.3.13.3 > --1286937-- Reading syms from > /home/mlohry/dev/cmake-build/parallel/libparallel.so > --1286937-- Reading syms from > /home/mlohry/dev/cmake-build/logger/liblogger.so > --1286937-- Reading syms from > /home/mlohry/dev/cmake-build/spatial_discretization/libdiscretization.so > --1286945-- Reading syms from > /home/mlohry/dev/cmake-build/utils/libutils.so > --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 > --1286938-- object doesn't have a symbol table > --1286949-- Reading syms from /usr/lib/x86_64-linux-gnu/libm-2.31.so > --1286949-- Considering /usr/lib/x86_64-linux-gnu/libm-2.31.so .. > --1286947-- .. CRC mismatch (computed 327d785f wanted 751f5509) > --1286947-- Considering /lib/x86_64-linux-gnu/libm-2.31.so .. > --1286938-- .. CRC mismatch (computed 327d785f wanted 751f5509) > --1286937-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libm-2.31.so > .. > --1286950-- .. CRC is valid > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libc-2.31.so > --1286950-- Considering /usr/lib/x86_64-linux-gnu/libc-2.31.so .. > --1286951-- .. CRC mismatch (computed a6f43087 wanted 6555436e) > --1286951-- Considering /lib/x86_64-linux-gnu/libc-2.31.so .. > --1286947-- .. CRC mismatch (computed a6f43087 wanted 6555436e) > --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so > .. > --1286950-- .. 
CRC is valid > --1286940-- Reading syms from > /home/mlohry/dev/cmake-build/file_io/libfileio.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_program_options.so.1.73.0 > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_serialization.so.1.73.0 > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libhwloc.so.15.1.0 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libsuperlu_dist.so.6.3.0 > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 > --1286950-- object doesn't have a symbol table > --1286937-- Reading syms from > /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 > --1286937-- object doesn't have a symbol table > --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0 > --1286939-- object doesn't have a symbol table > --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libdl-2.31.so > --1286947-- Considering /usr/lib/x86_64-linux-gnu/libdl-2.31.so .. > --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) > --1286947-- Considering /lib/x86_64-linux-gnu/libdl-2.31.so .. > --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) > --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ > libdl-2.31.so .. > --1286947-- .. CRC is valid > --1286937-- Reading syms from > /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libmetis.so > --1286937-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0 > --1286942-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log_setup.so.1.73.0 > --1286942-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0 > --1286942-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_regex.so.1.73.0 > --1286949-- Reading syms from > /home/mlohry/dev/cmake-build/basis_functions/libbasis_functions.so > --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 > --1286944-- object doesn't have a symbol table > --1286951-- Reading syms from > /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so > --1286951-- object doesn't have a symbol table > --1286943-- Reading syms from > /home/mlohry/dev/cmake-build/external_install/lib/libhdf5.so.103.1.0 > --1286951-- Reading syms from > /home/mlohry/dev/cmake-build/external/tinyxml2-build/libtinyxml2.so.6.1.0 > --1286944-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_iostreams.so.1.73.0 > --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libz.so.1.2.11 > --1286944-- object doesn't have a symbol table > --1286951-- Reading syms from > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0 > --1286951-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libutil-2.31.so > --1286946-- Considering /usr/lib/x86_64-linux-gnu/libutil-2.31.so .. > --1286946-- .. CRC mismatch (computed 4639aba5 wanted ceb246b4) > --1286946-- Considering /lib/x86_64-linux-gnu/libutil-2.31.so .. > --1286946-- .. 
CRC mismatch (computed 4639aba5 wanted ceb246b4) > --1286948-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ > libutil-2.31.so .. > --1286939-- .. CRC is valid > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libudev.so.1.6.17 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.1 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/librt-2.31.so > --1286950-- Considering /usr/lib/x86_64-linux-gnu/librt-2.31.so .. > --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) > --1286950-- Considering /lib/x86_64-linux-gnu/librt-2.31.so .. > --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) > --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ > librt-2.31.so .. > --1286950-- .. CRC is valid > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/libquadmath.so.0.0.0 > --1286950-- object doesn't have a symbol table > --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 > --1286945-- Considering /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 .. > --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) > --1286945-- Considering /lib/x86_64-linux-gnu/libXau.so.6.0.0 .. > --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) > --1286945-- object doesn't have a symbol table > --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0 > --1286942-- object doesn't have a symbol table > --1286942-- Reading syms from /usr/lib/x86_64-linux-gnu/libbsd.so.0.10.0 > --1286942-- object doesn't have a symbol table > --1286950-- REDIR: 0x6516600 (libc.so.6:memmove) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515900 (libc.so.6:strncpy) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516930 (libc.so.6:strcasecmp) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515220 (libc.so.6:strcat) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515960 (libc.so.6:rindex) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6517dd0 (libc.so.6:rawmemchr) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6532e60 (libc.so.6:wmemchr) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65329a0 (libc.so.6:wcscmp) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516760 (libc.so.6:mempcpy) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516590 (libc.so.6:bcmp) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515890 (libc.so.6:strncmp) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65152d0 (libc.so.6:strcmp) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65166c0 (libc.so.6:memset) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6532960 (libc.so.6:wcschr) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65157f0 (libc.so.6:strnlen) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65153b0 
(libc.so.6:strcspn) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516980 (libc.so.6:strncasecmp) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515350 (libc.so.6:strcpy) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516ad0 (libc.so.6:memcpy@@GLIBC_2.14) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65340d0 (libc.so.6:wcsnlen) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65329e0 (libc.so.6:wcscpy) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65159a0 (libc.so.6:strpbrk) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515280 (libc.so.6:index) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65157b0 (libc.so.6:strlen) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x651ed20 (libc.so.6:memrchr) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65169d0 (libc.so.6:strcasecmp_l) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516550 (libc.so.6:memchr) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6532ab0 (libc.so.6:wcslen) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515c60 (libc.so.6:strspn) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65168d0 (libc.so.6:stpncpy) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516870 (libc.so.6:stpcpy) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6517e10 (libc.so.6:strchrnul) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516a20 (libc.so.6:strncasecmp_l) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516470 (libc.so.6:strstr) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65a3750 (libc.so.6:__memcpy_chk) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > --1286938-- REDIR: 0x6527a30 (libc.so.6:__strrchr_sse2) redirected to > 0x483ea70 (__strrchr_sse2) > --1286938-- REDIR: 0x6511c90 (libc.so.6:calloc) redirected to 0x483dce0 > (calloc) > --1286938-- REDIR: 0x6510260 (libc.so.6:malloc) redirected to 0x483b780 > (malloc) > --1286938-- REDIR: 0x6531c40 (libc.so.6:memcpy at GLIBC_2.2.5) redirected to > 0x4840100 (memcpy at GLIBC_2.2.5) > --1286938-- REDIR: 0x6527d30 (libc.so.6:__strlen_sse2) redirected to > 0x483efa0 (__strlen_sse2) > --1286938-- REDIR: 0x65f4ac0 (libc.so.6:__strncmp_sse42) redirected to > 0x483f7c0 (__strncmp_sse42) > --1286938-- REDIR: 0x6510850 (libc.so.6:free) redirected to 0x483c9d0 > (free) > --1286938-- REDIR: 0x6532070 (libc.so.6:__memset_sse2_unaligned) > redirected to 0x48428e0 (memset) > --1286938-- REDIR: 0x6603350 (libc.so.6:__memcmp_sse4_1) redirected to > 0x4842150 (__memcmp_sse4_1) > --1286938-- REDIR: 0x6520520 (libc.so.6:__strcmp_sse2_unaligned) > redirected to 0x483fed0 (strcmp) > --1286938-- REDIR: 0x61d0c10 (libstdc++.so.6:operator new(unsigned long)) > redirected to 0x483bdf0 (operator new(unsigned long)) > --1286938-- REDIR: 0x61cee60 (libstdc++.so.6:operator delete(void*)) > redirected to 0x483cf50 (operator delete(void*)) > --1286938-- REDIR: 0x61d0c70 (libstdc++.so.6:operator new[](unsigned > long)) redirected to 0x483c510 (operator new[](unsigned long)) > --1286938-- REDIR: 0x61cee90 (libstdc++.so.6:operator delete[](void*)) > redirected to 0x483d6e0 (operator delete[](void*)) > --1286938-- REDIR: 0x65275f0 (libc.so.6:__strchr_sse2) redirected to > 0x483eb90 
(__strchr_sse2) > --1286950-- REDIR: 0x6511000 (libc.so.6:realloc) redirected to 0x483df30 > (realloc) > --1286950-- REDIR: 0x6527820 (libc.so.6:__strchrnul_sse2) redirected to > 0x4843540 (strchrnul) > --1286950-- REDIR: 0x6531560 (libc.so.6:__strstr_sse2_unaligned) > redirected to 0x4843c20 (strstr) > --1286950-- REDIR: 0x6531c20 (libc.so.6:__mempcpy_sse2_unaligned) > redirected to 0x4843660 (mempcpy) > --1286950-- REDIR: 0x652d2a0 (libc.so.6:__strncpy_sse2_unaligned) > redirected to 0x483f560 (__strncpy_sse2_unaligned) > --1286950-- REDIR: 0x6515830 (libc.so.6:strncat) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65305b0 (libc.so.6:__strncat_sse2_unaligned) > redirected to 0x483ede0 (strncat) > --1286950-- REDIR: 0x6516120 (libc.so.6:__GI_strstr) redirected to > 0x4843ca0 (__strstr_sse2) > --1286950-- REDIR: 0x6522360 (libc.so.6:__rawmemchr_sse2) redirected to > 0x4843580 (rawmemchr) > --1286950-- REDIR: 0x65faea0 (libc.so.6:__strcasecmp_avx) redirected to > 0x483f830 (strcasecmp) > --1286950-- REDIR: 0x65fc520 (libc.so.6:__strncasecmp_avx) redirected to > 0x483f910 (strncasecmp) > --1286950-- REDIR: 0x65f98a0 (libc.so.6:__strspn_sse42) redirected to > 0x4843ef0 (strspn) > --1286950-- REDIR: 0x65f9620 (libc.so.6:__strcspn_sse42) redirected to > 0x4843e10 (strcspn) > --1286948-- REDIR: 0x6522030 (libc.so.6:__memchr_sse2) redirected to > 0x4840050 (memchr) > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so > --1286948-- object doesn't have a symbol table > --1286948-- Discarding syms at 0x4a96240-0x4a96d47 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so > (have_dinfo 1) > --1286948-- Discarding syms at 0x4a9b1c0-0x4a9b937 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so > (have_dinfo 1) > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 > --1286948-- object doesn't have a symbol table > --1286948-- Discarding syms at 0x4a96120-0x4a966b0 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so > (have_dinfo 1) > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so > --1286948-- object doesn't have a symbol table > --1286948-- REDIR: 0x64bc670 (libc.so.6:setenv) redirected to 0x4844480 > (setenv) > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so > --1286948-- object doesn't have a symbol table > 
--1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 > --1286948-- object doesn't have a symbol table > --1286948-- Discarding syms at 0x8d053e0-0x8d07391 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so (have_dinfo > 1) > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so > --1286948-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so > --1286950-- object doesn't have a symbol table > --1286950-- Discarding syms at 0x8d04180-0x8d045b0 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so (have_dinfo 1) > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so > --1286950-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so > --1286946-- object doesn't have a symbol table > --1286946-- 
Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9ebf0a0-0x9ebf490 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so > (have_dinfo 1) > --1286946-- Discarding syms at 0x9eca300-0x9ecbee8 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so > (have_dinfo 1) > --1286946-- Discarding syms at 0x9ed1220-0x9ed24e7 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so (have_dinfo > 1) > --1286946-- Discarding syms at 0x9ed8240-0x9ed8c88 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so > (have_dinfo 1) > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9ebf0e0-0x9ebf417 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so > (have_dinfo 1) > --1286946-- Discarding syms at 0x9ecf320-0x9ed1239 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so > (have_dinfo 1) > --1286946-- Discarding syms at 0x9ed73a0-0x9ed9ccc in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so > (have_dinfo 1) > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so > --1286936-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 > --1286946-- object doesn't have a symbol table > --1286946-- 
Reading syms from > /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so > --1286946-- object doesn't have a symbol table > --1286946-- REDIR: 0x652cc70 (libc.so.6:__strcpy_sse2_unaligned) > redirected to 0x483f090 (strcpy) > --1286946-- REDIR: 0x65a3810 (libc.so.6:__memmove_chk) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > ==1286946== WARNING: new redirection conflicts with existing -- ignoring it > --1286946-- old: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2030.0) > 0x04843b10 __memcpy_chk > --1286946-- new: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2024.0) > 0x048434d0 __memmove_chk > --1286946-- REDIR: 0x6531c30 (libc.so.6:__memcpy_chk_sse2_unaligned) > redirected to 0x4843b10 (__memcpy_chk) > --1286946-- REDIR: 0x65129b0 (libc.so.6:posix_memalign) redirected to > 0x483e1e0 (posix_memalign) > --1286946-- Discarding syms at 0x9f15280-0x9f32932 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so > (have_dinfo 1) > --1286946-- Discarding syms at 0x9f7c4c0-0x9f7ded8 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 > (have_dinfo 1) > --1286946-- Discarding syms at 0x9f620c0-0x9f71483 in > /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f9ba10-0x9fd22ee in > /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_monitoring.so.50.10.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9f4d400-0x9f50c19 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so > (have_dinfo 1) > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/libpsm1/libpsm_infinipath.so.1.16 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 > --1286946-- object doesn't have a 
symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 > --1286946-- object doesn't have a symbol table > --1286946-- REDIR: 0x6517140 (libc.so.6:strcasestr) redirected to > 0x4843f80 (strcasestr) > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9f4d5c0-0x9f4f5a1 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so (have_dinfo 1) > --1286946-- Discarding syms at 0x9fee680-0x9ff096c in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so (have_dinfo > 1) > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_monitoring.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9f724a0-0x9f787b5 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so (have_dinfo 1) > --1286946-- Discarding syms at 0xa827f80-0xa8e14c4 in > /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f94830-0x9fbafce in > /usr/lib/libpsm1/libpsm_infinipath.so.1.16 (have_dinfo 1) > --1286946-- Discarding syms at 0x9fe5580-0x9fe8f71 in > /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f56420-0x9f5cec0 in > /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 (have_dinfo 1) > --1286946-- Discarding syms at 0xa929f10-0xa93d5fc in > /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 (have_dinfo 1) > --1286946-- Discarding syms at 0xa94b0c0-0xa95a483 in > /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 
1) > --1286946-- Discarding syms at 0xa968860-0xa9adf12 in > /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 (have_dinfo 1) > --1286946-- Discarding syms at 0xa9e7a10-0xaa1e2ee in > /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f80410-0x9f84e27 in > /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f103e0-0x9f15fd5 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so (have_dinfo 1) > --1286946-- Discarding syms at 0x9f471e0-0x9f47ce0 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so > (have_dinfo 1) > ==1286946== Thread 3: > ==1286946== Syscall param writev(vector[...]) points to uninitialised > byte(s) > ==1286946== at 0x658A48D: __writev (writev.c:26) > ==1286946== by 0x658A48D: writev (writev.c:24) > ==1286946== by 0x8DF9B4C: pmix_ptl_base_send_handler (in > /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x7CC413E: ??? (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286946== by 0x7CC487E: event_base_loop (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286946== by 0x8DBDD55: ??? (in > /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286946== by 0x6595102: clone (clone.S:95) > ==1286946== Address 0xa28fdcf is 127 bytes inside a block of size 5,120 > alloc'd > ==1286946== at 0x483DFAF: realloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286946== by 0x8DE155A: pmix_bfrop_buffer_extend (in > /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x8DE3F4A: pmix_bfrops_base_pack_byte (in > /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x8DE4900: pmix_bfrops_base_pack_buf (in > /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x8DE4175: pmix_bfrops_base_pack (in > /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x8D7CF91: ??? (in > /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x7CC3FDD: ??? (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286946== by 0x7CC487E: event_base_loop (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286946== by 0x8DBDD55: ??? (in > /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286946== by 0x6595102: clone (clone.S:95) > ==1286946== Uninitialised value was created by a stack allocation > ==1286946== at 0x9F048D6: ??? 
(in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so) > ==1286946== > --1286944-- Discarding syms at 0xaa4d220-0xaa5796a in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at > 0xaa4d220---1286948-- Discarding syms at 0xaae1100-0xaae7d70 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at > 0xaae1100-0xaae7d70 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so > (have_dinfo 1) > --1286945-- Discarding syms at 0x9f69420-0x9f--1286938-- REDIR: 0x61cee70 > (libstdc++.so.6:operator delete(void*, unsigned long)) redirected to > --1286937-- REDIR: 0x61cee70 (libstdc++.so.6:opera--1286946-- REDIR: > 0x652e970 (libc.so.6:__stpncpy_sse2_unaligned) redirected to 0x48427e0 > (stpncpy) > --1286942-- REDIR: 0x6527ed0 (libc.so.6:__strnlen_sse2) redirected to > 0x483eee0 (strnlen) > --1286944-- REDIR: 0x652fcc0 (libc.so.6:__strcat_sse2_unaligned) > redirected to 0x483ec20 (strcat) > --1286951-- REDIR: 0x65113d0 (libc.so.6:memalign) redirected to 0x483e2a0 > (memalign) > --1286951-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so > --1286951-- object doesn't have a symbol table > --1286951-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so > --1286951-- object doesn't have a symbol table > --1286941-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 > --1286941-- object doesn't have a symbol table > --1286951-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so > --1286951-- object doesn't have a symbol table > --1286939-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so > --1286939-- object doesn't have a symbol table > --1286943-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so > --1286943-- object doesn't have a symbol table > --1286943-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so > --1286943-- object doesn't have a symbol table > --1286943-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so > --1286943-- object doesn't have a symbol table > --1286938-- REDIR: 0x65a3b00 (libc.so.6:__strcpy_chk) redirected to > 0x48435c0 (__strcpy_chk) > --1286939-- Discarding syms at 0x9f1d660-0x9f371d6 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9f5afa0-0x9f8f8b6 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9fa0640-0x9fa42d9 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so (have_dinfo > 1) > --1286939-- 
Discarding syms at 0x9f4c160-0x9f4dc58 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so > (have_dinfo 1) > --1286939-- Discarding syms at 0xa7fc270-0xa804f00 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9fee3a0-0x9ff134e in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so (have_dinfo 1) > --1286939-- Discarding syms at 0xa80a240-0xa80aa8d in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 > (have_dinfo 1) > --1286939-- Discarding syms at 0xa80f0e0-0xa80f8bb in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so (have_dinfo > 1) > --1286939-- Discarding syms at 0xaa460c0-0xaa47947 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so (have_dinfo > 1) > --1286939-- Discarding syms at 0xaa613e0-0xaa7730f in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so > (have_dinfo 1) > --1286939-- Discarding syms at 0xaa849c0-0xaa8a845 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9ee1320-0x9ee3567 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9eebc40-0x9ef4ad7 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9f02600-0x9f08cd8 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so (have_dinfo > 1) > --1286939-- Discarding syms at 0x9f40200-0x9f4126e in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so (have_dinfo > 1) > --1286939-- Discarding syms at 0x9eda4e0-0x9edb4c5 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9ed32c0-0x9ed4afe in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9ebf160-0x9ebfe95 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9ece140-0x9ecebed in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9ec92a0-0x9ec9aa2 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x8eae0e0-0x8eae4a7 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8eb3220-0x8eb3c27 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8eb80e0-0x8eb90b7 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8ea6380-0x8ea97b3 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e5a740-0x8e5f859 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e67be0-0x8e743f0 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so (have_dinfo 1) > --1286939-- Discarding syms at 0x84da200-0x84daa5d in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8d322b0-0x8d34bfc in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e29480-0x8e3b70a in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so (have_dinfo 1) > 
--1286939-- Discarding syms at 0x8d3c2b0-0x8d3ed5c in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e45340-0x8e502da in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e901a0-0x8e908a7 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8d05520-0x8d06783 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e7b460-0x8e8aaa4 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8d44520-0x8d4556a in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e97600-0x8ea0fa1 in > /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 > (have_dinfo 1) > --1286939-- Discarding syms at 0x8d109c0-0x8d27dcf in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x8d5b280-0x8dfdffb in > /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 (have_dinfo 1) > --1286939-- Discarding syms at 0x9ec40a0-0x9ec4490 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so (have_dinfo > 1) > --1286939-- Discarding syms at 0x84d2580-0x84d518f in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so (have_dinfo 1) > --1286939-- Discarding syms at 0x4a96120-0x4a9644f in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x4aa0100-0x4aa03e7 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x84c74a0-0x84c901f in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x4aa5260-0x4aa58e9 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x4a9b420-0x4a9bcdf in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x84e7460-0x84f52ca in > /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 (have_dinfo 1) > --1286939-- Discarding syms at 0x4a90360-0x4a91107 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9f46220-0x9f474cc in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9f0f180-0x9f0f78d in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so (have_dinfo 1) > --1286939-- Discarding syms at 0xaa94540-0xaa96a4a in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so (have_dinfo 1) > --1286939-- Discarding syms at 0xaa9f6c0-0xaab44d0 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so (have_dinfo > 1) > --1286939-- Discarding syms at 0xaabe820-0xaad8ee0 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so (have_dinfo > 1) > --1286939-- Discarding syms at 0x9efc080-0x9efc1e1 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9fab2a0-0x9fb1341 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9f140c0-0x9f14299 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so > (have_dinfo 1) > --1286939-- 
Discarding syms at 0x9fb72a0-0x9fbb791 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9fd52a0-0x9fda794 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9fe02e0-0x9fe59a5 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so > (have_dinfo 1) > --1286939-- Discarding syms at 0xa815460-0xa8177ab in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so > (have_dinfo 1) > --1286939-- Discarding syms at 0xa81e260-0xa82033d in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so > (have_dinfo 1) > --1286939-- Discarding syms at 0xa8273e0-0xa8297d8 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9fc85e0-0x9fce8ef in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 > (have_dinfo 1) > ==1286939== > ==1286939== HEAP SUMMARY: > ==1286939== in use at exit: 74,054 bytes in 223 blocks > ==1286939== total heap usage: 22,405,782 allocs, 22,405,559 frees, > 34,062,479,959 bytes allocated > ==1286939== > ==1286939== Searching for pointers to 223 not-freed blocks > ==1286939== Checked 3,415,912 bytes > ==1286939== > ==1286939== Thread 1: > ==1286939== 1 bytes in 1 blocks are definitely lost in loss record 1 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x9F6A4B6: ??? > ==1286939== by 0x9F47373: ??? > ==1286939== by 0x68E3B9B: mca_base_framework_components_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4BA1734: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 8 bytes in 1 blocks are still reachable in loss record 2 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x764724C: ??? (in > /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) > ==1286939== by 0x7657B9A: ??? (in > /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) > ==1286939== by 0x7645679: ??? (in > /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) > ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) > ==1286939== by 0x4011C90: call_init (dl-init.c:30) > ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) > ==1286939== by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so) > ==1286939== by 0x3: ??? > ==1286939== by 0x1FFEFFF926: ??? > ==1286939== by 0x1FFEFFF93D: ??? > ==1286939== by 0x1FFEFFF987: ??? > ==1286939== by 0x1FFEFFF9A7: ??? 
> ==1286939== > ==1286939== 8 bytes in 1 blocks are definitely lost in loss record 3 of 44 > ==1286939== at 0x483DD99: calloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F69B6F: ??? > ==1286939== by 0x9F1CDED: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 13 bytes in 2 blocks are still reachable in loss record 4 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x7CC3657: event_config_avoid_method (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x68FEB5A: opal_event_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68FE8CA: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== > ==1286939== 15 bytes in 1 blocks are indirectly lost in loss record 5 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x9EDB189: ??? > ==1286939== by 0x68D98FC: mca_base_framework_components_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6907C25: ??? 
(in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4BA16D5: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 15 bytes in 1 blocks are definitely lost in loss record 6 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x9F5655C: ??? > ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) > ==1286939== by 0x4011C90: call_init (dl-init.c:30) > ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) > ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) > ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) > ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) > ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) > ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) > ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) > ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) > ==1286939== > ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 7 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F1CBEB: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 8 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F1CC66: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? 
> ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 9 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F1CCDA: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 25 bytes in 1 blocks are still reachable in loss record 10 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x68F27BD: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B956B6: ompi_pml_v_output_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B95259: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x68D98FC: mca_base_framework_components_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B93FAE: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4BA1734: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== > ==1286939== 30 bytes in 1 blocks are definitely lost in loss record 11 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0xA9A859B: ??? 
> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) > ==1286939== by 0x4011C90: call_init (dl-init.c:30) > ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) > ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) > ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) > ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) > ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) > ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) > ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) > ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) > ==1286939== by 0x72DEB58: _dlerror_run (dlerror.c:170) > ==1286939== > ==1286939== 32 bytes in 1 blocks are still reachable in loss record 12 of > 44 > ==1286939== at 0x483DD99: calloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CC353E: event_get_supported_methods (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x68FEA98: opal_event_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68FE8CA: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 13 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x84D2B0A: ??? > ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 14 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x84D2BCE: ??? 
> ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 15 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x84D2CB2: ??? > ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 16 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x84D2D91: ??? > ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 17 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E81BD8: ??? > ==1286939== by 0x8E89F4B: ??? > ==1286939== by 0x8D84A0D: ??? > ==1286939== by 0x8DF79C1: ??? > ==1286939== by 0x7CC3FDD: ??? (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC487E: event_base_loop (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x8DBDD55: ??? > ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286939== by 0x6595102: clone (clone.S:95) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 18 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== by 0x84D330E: ??? 
> ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 36 (32 direct, 4 indirect) bytes in 1 blocks are definitely > lost in loss record 19 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x4B94C09: mca_pml_base_pml_check_selected (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x9F1E1E1: ??? > ==1286939== by 0x4BA1A09: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 40 bytes in 1 blocks are still reachable in loss record 20 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CFF4B6: ??? (in > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x7CC5E26: event_global_setup_locks_ (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CFF68F: evthread_use_pthreads (in > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x68FE8E4: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== > ==1286939== 40 bytes in 1 blocks are still reachable in loss record 21 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CFF4B6: ??? (in > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x7CCF377: evsig_global_setup_locks_ (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC5E39: event_global_setup_locks_ (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CFF68F: evthread_use_pthreads (in > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x68FE8E4: ??? 
(in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== > ==1286939== 40 bytes in 1 blocks are still reachable in loss record 22 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CFF4B6: ??? (in > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x7CCB997: evutil_secure_rng_global_setup_locks_ (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC5E4F: event_global_setup_locks_ (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CFF68F: evthread_use_pthreads (in > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x68FE8E4: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== > ==1286939== 48 bytes in 1 blocks are still reachable in loss record 23 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68D9043: mca_base_component_repository_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68D7F7A: mca_base_component_find (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B8560C: mca_io_base_file_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B0E68A: ompi_file_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B3ADB8: PMPI_File_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286939== > ==1286939== 48 bytes in 1 
blocks are still reachable in loss record 24 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68D9043: mca_base_component_repository_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68D7F7A: mca_base_component_find (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B85638: mca_io_base_file_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B0E68A: ompi_file_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B3ADB8: PMPI_File_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286939== > ==1286939== 48 bytes in 2 blocks are still reachable in loss record 25 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CC3647: event_config_avoid_method (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x68FEB5A: opal_event_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68FE8CA: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 55 (32 direct, 23 indirect) bytes in 1 blocks are definitely > lost in loss record 26 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? 
> ==1286939== by 0x4AF6CD6: ompi_comm_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA194D: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== > ==1286939== 56 bytes in 1 blocks are still reachable in loss record 27 of > 44 > ==1286939== at 0x483DD99: calloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CC1C86: event_config_new (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x68FEAC0: opal_event_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68FE8CA: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 28 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F6E008: ??? > ==1286939== by 0x9F7C654: ??? > ==1286939== by 0x9F1CD3E: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 29 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0xA957008: ??? > ==1286939== by 0xA86B017: ??? > ==1286939== by 0xA862FD8: ??? > ==1286939== by 0xA828E15: ??? > ==1286939== by 0xA829624: ??? > ==1286939== by 0x9F77910: ??? > ==1286939== by 0x4B85C53: ompi_mtl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x9F13E4D: ??? 
> ==1286939== by 0x4B94673: mca_pml_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1789: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 76 (32 direct, 44 indirect) bytes in 1 blocks are definitely > lost in loss record 30 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== by 0x84D387F: ??? > ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 79 (64 direct, 15 indirect) bytes in 1 blocks are definitely > lost in loss record 31 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9EDB12E: ??? > ==1286939== by 0x68D98FC: mca_base_framework_components_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6907C25: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4BA16D5: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 144 bytes in 3 blocks are still reachable in loss record 32 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68D9043: mca_base_component_repository_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68D7F7A: mca_base_component_find (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B8564E: mca_io_base_file_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B0E68A: ompi_file_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B3ADB8: PMPI_File_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286939== 
by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286939== > ==1286939== 231 bytes in 12 blocks are definitely lost in loss record 33 > of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x9F2B4B3: ??? > ==1286939== by 0x9F2B85C: ??? > ==1286939== by 0x9F2BBD7: ??? > ==1286939== by 0x9F1CAAC: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== > ==1286939== 240 bytes in 5 blocks are still reachable in loss record 34 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68D9043: mca_base_component_repository_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68D7F7A: mca_base_component_find (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B85622: mca_io_base_file_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B0E68A: ompi_file_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B3ADB8: PMPI_File_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286939== > ==1286939== 272 bytes in 44 blocks are definitely lost in loss record 35 > of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9FCAEDB: ??? > ==1286939== by 0x9FE42B2: ??? > ==1286939== by 0x9FE47BB: ??? > ==1286939== by 0x9FCDDBF: ??? > ==1286939== by 0x9FA324A: ??? > ==1286939== by 0x4B3DD7F: PMPI_File_write_at_all (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) > ==1286939== by 0x78DF11D: H5FD_write (H5FDint.c:257) > ==1286939== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) > ==1286939== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) > ==1286939== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) > ==1286939== > ==1286939== 585 (480 direct, 105 indirect) bytes in 15 blocks are > definitely lost in loss record 36 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? 
> ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== by 0x4B14036: ompi_proc_complete_init_single (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B146C3: ompi_proc_complete_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA19A9: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 776 bytes in 32 blocks are indirectly lost in loss record 37 > of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8DE9816: ??? > ==1286939== by 0x8DEB1D2: ??? > ==1286939== by 0x8DEB49A: ??? > ==1286939== by 0x8DE8B12: ??? > ==1286939== by 0x8E9D492: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== > ==1286939== 840 (480 direct, 360 indirect) bytes in 15 blocks are > definitely lost in loss record 38 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x9EF2F00: ??? > ==1286939== by 0x9EEBF17: ??? > ==1286939== by 0x9EE2F54: ??? > ==1286939== by 0x9F1E1FB: ??? > ==1286939== > ==1286939== 1,084 (480 direct, 604 indirect) bytes in 15 blocks are > definitely lost in loss record 39 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== by 0x84D4800: ??? > ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 1,344 bytes in 1 blocks are definitely lost in loss record 40 > of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68AE702: opal_free_list_grow_st (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9F1CD2D: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? 
> ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 41 > of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68AE702: opal_free_list_grow_st (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9F1CC50: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 42 > of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68AE702: opal_free_list_grow_st (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9F1CCC4: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 62,644 bytes in 31 blocks are indirectly lost in loss record > 43 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8DE9FA8: ??? > ==1286939== by 0x8DEB032: ??? > ==1286939== by 0x8DEB49A: ??? > ==1286939== by 0x8DE8B12: ??? > ==1286939== by 0x8E9D492: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? 
> ==1286939== > ==1286939== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are > definitely lost in loss record 44 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x9F0398A: ??? > ==1286939== by 0x9EE2F54: ??? > ==1286939== by 0x9F1E1FB: ??? > ==1286939== by 0x4BA1A09: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== LEAK SUMMARY: > ==1286939== definitely lost: 9,837 bytes in 138 blocks > ==1286939== indirectly lost: 63,435 bytes in 64 blocks > ==1286939== possibly lost: 0 bytes in 0 blocks > ==1286939== still reachable: 782 bytes in 21 blocks > ==1286939== suppressed: 0 bytes in 0 blocks > ==1286939== > ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from > 0) > ==1286939== > ==1286939== 1 errors in context 1 of 29: > ==1286939== Thread 3: > ==1286939== Syscall param writev(vector[...]) points to uninitialised > byte(s) > ==1286939== at 0x658A48D: __writev (writev.c:26) > ==1286939== by 0x658A48D: writev (writev.c:24) > ==1286939== by 0x8DF9B4C: ??? > ==1286939== by 0x7CC413E: ??? (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC487E: event_base_loop (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x8DBDD55: ??? > ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286939== by 0x6595102: clone (clone.S:95) > ==1286939== Address 0xa28ee1f is 127 bytes inside a block of size 5,120 > alloc'd > ==1286939== at 0x483DFAF: realloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8DE155A: ??? > ==1286939== by 0x8DE3F4A: ??? > ==1286939== by 0x8DE4900: ??? > ==1286939== by 0x8DE4175: ??? > ==1286939== by 0x8D7CF91: ??? > ==1286939== by 0x7CC3FDD: ??? (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC487E: event_base_loop (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x8DBDD55: ??? > ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286939== by 0x6595102: clone (clone.S:95) > ==1286939== Uninitialised value was created by a stack allocation > ==1286939== at 0x9F048D6: ??? > ==1286939== > ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from > 0) > mpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x4B85622: mca_io_base_file_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B0E68A: ompi_file_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B3ADB8: PMPI_File_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286936== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286936== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286936== > ==1286936== 272 bytes in 44 blocks are definitely lost in loss record 39 > of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x9FCAEDB: ??? > ==1286936== by 0x9FE42B2: ??? > ==1286936== by 0x9FE47BB: ??? > ==1286936== by 0x9FCDDBF: ??? > ==1286936== by 0x9FA324A: ??? 
> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) > ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) > ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) > ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) > ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) > ==1286936== > ==1286936== 312 bytes in 1 blocks are still reachable in loss record 40 of > 49 > ==1286936== at 0x483BE63: operator new(unsigned long) (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x74E78EB: boost::detail::make_external_thread_data() > (in > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) > ==1286936== by 0x74E7C74: > boost::detail::add_thread_exit_function(boost::detail::thread_exit_function_base*) > (in > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) > ==1286936== by 0x73AFCEA: > boost::log::v2_mt_posix::sources::aux::get_severity_level() (in > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0) > ==1286936== by 0x5F71A6C: set_value (severity_feature.hpp:135) > ==1286936== by 0x5F71A6C: > open_record_unlocked const boost::log::v2_mt_posix::trivial::severity_level> > > > (severity_feature.hpp:252) > ==1286936== by 0x5F71A6C: > open_record const boost::log::v2_mt_posix::trivial::severity_level> > > > (basic_logger.hpp:459) > ==1286936== by 0x5F71A6C: > Logger::TraceMessage(std::__cxx11::basic_string std::char_traits, std::allocator >) (logger.cpp:328) > ==1286936== by 0x5F729C7: > Logger::Message(std::__cxx11::basic_string, > std::allocator > const&, LogLevel) (logger.cpp:280) > ==1286936== by 0x5F73CF1: > Logger::Timer::Timer(std::__cxx11::basic_string std::char_traits, std::allocator > const&, LogLevel) > (logger.cpp:426) > ==1286936== by 0x15718A: timer (logger.hpp:98) > ==1286936== by 0x15718A: main (testing_main.cpp:9) > ==1286936== > ==1286936== 585 (480 direct, 105 indirect) bytes in 15 blocks are > definitely lost in loss record 41 of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8E9D3EB: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A767: ??? > ==1286936== by 0x4B14036: ompi_proc_complete_init_single (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B146C3: ompi_proc_complete_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4BA19A9: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== > ==1286936== 776 bytes in 32 blocks are indirectly lost in loss record 42 > of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8DE9816: ??? > ==1286936== by 0x8DEB1D2: ??? > ==1286936== by 0x8DEB49A: ??? > ==1286936== by 0x8DE8B12: ??? > ==1286936== by 0x8E9D492: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A767: ??? 
> ==1286936== > ==1286936== 840 (480 direct, 360 indirect) bytes in 15 blocks are > definitely lost in loss record 43 of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8E9D3EB: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A5EB: ??? > ==1286936== by 0x9EF2F00: ??? > ==1286936== by 0x9EEBF17: ??? > ==1286936== by 0x9EE2F54: ??? > ==1286936== by 0x9F1E1FB: ??? > ==1286936== > ==1286936== 1,091 (480 direct, 611 indirect) bytes in 15 blocks are > definitely lost in loss record 44 of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8E9D3EB: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A767: ??? > ==1286936== by 0x84D4800: ??? > ==1286936== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286936== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== > ==1286936== 1,344 bytes in 1 blocks are definitely lost in loss record 45 > of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x68AE702: opal_free_list_grow_st (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9F1CD2D: ??? > ==1286936== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9EE3527: ??? > ==1286936== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286936== by 0x15710D: main (testing_main.cpp:8) > ==1286936== > ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 46 > of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x68AE702: opal_free_list_grow_st (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9F1CC50: ??? > ==1286936== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9EE3527: ??? 
> ==1286936== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286936== by 0x15710D: main (testing_main.cpp:8) > ==1286936== > ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 47 > of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x68AE702: opal_free_list_grow_st (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9F1CCC4: ??? > ==1286936== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9EE3527: ??? > ==1286936== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286936== by 0x15710D: main (testing_main.cpp:8) > ==1286936== > ==1286936== 62,640 bytes in 30 blocks are indirectly lost in loss record > 48 of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8DE9FA8: ??? > ==1286936== by 0x8DEB032: ??? > ==1286936== by 0x8DEB49A: ??? > ==1286936== by 0x8DE8B12: ??? > ==1286936== by 0x8E9D492: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A5EB: ??? > ==1286936== > ==1286936== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are > definitely lost in loss record 49 of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8E9D3EB: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A5EB: ??? > ==1286936== by 0x9F0398A: ??? > ==1286936== by 0x9EE2F54: ??? > ==1286936== by 0x9F1E1FB: ??? 
> ==1286936== by 0x4BA1A09: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== > ==1286936== LEAK SUMMARY: > ==1286936== definitely lost: 9,805 bytes in 137 blocks > ==1286936== indirectly lost: 63,431 bytes in 63 blocks > ==1286936== possibly lost: 0 bytes in 0 blocks > ==1286936== still reachable: 1,174 bytes in 27 blocks > ==1286936== suppressed: 0 bytes in 0 blocks > ==1286936== > ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from > 0) > ==1286936== > ==1286936== 1 errors in context 1 of 29: > ==1286936== Thread 3: > ==1286936== Syscall param writev(vector[...]) points to uninitialised > byte(s) > ==1286936== at 0x658A48D: __writev (writev.c:26) > ==1286936== by 0x658A48D: writev (writev.c:24) > ==1286936== by 0x8DF9B4C: ??? > ==1286936== by 0x7CC413E: ??? (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286936== by 0x7CC487E: event_base_loop (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286936== by 0x8DBDD55: ??? > ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286936== by 0x6595102: clone (clone.S:95) > ==1286936== Address 0xa290cbf is 127 bytes inside a block of size 5,120 > alloc'd > ==1286936== at 0x483DFAF: realloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8DE155A: ??? > ==1286936== by 0x8DE3F4A: ??? > ==1286936== by 0x8DE4900: ??? > ==1286936== by 0x8DE4175: ??? > ==1286936== by 0x8D7CF91: ??? > ==1286936== by 0x7CC3FDD: ??? (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286936== by 0x7CC487E: event_base_loop (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286936== by 0x8DBDD55: ??? > ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286936== by 0x6595102: clone (clone.S:95) > ==1286936== Uninitialised value was created by a stack allocation > ==1286936== at 0x9F048D6: ??? > ==1286936== > ==1286936== > ==1286936== 6 errors in context 2 of 29: > ==1286936== Thread 1: > ==1286936== Syscall param pwritev(vector[...]) points to uninitialised > byte(s) > ==1286936== at 0x658A608: pwritev64 (pwritev64.c:30) > ==1286936== by 0x658A608: pwritev (pwritev64.c:28) > ==1286936== by 0x9F46E25: ??? > ==1286936== by 0x9FCE33B: ??? > ==1286936== by 0x9FCDDBF: ??? > ==1286936== by 0x9FA324A: ??? 
> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) > ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) > ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) > ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) > ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) > ==1286936== by 0x7B5ED15: H5C__collective_write (H5Cmpio.c:1020) > ==1286936== by 0x7B5ED15: H5C_apply_candidate_list (H5Cmpio.c:394) > ==1286936== Address 0xedf91b0 is 96 bytes inside a block of size 216 > alloc'd > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:292) > ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:267) > ==1286936== by 0x77FC8FF: H5C__flush_single_entry (H5C.c:6045) > ==1286936== by 0x7B5DC7E: H5C__flush_candidates_in_ring (H5Cmpio.c:1371) > ==1286936== by 0x7B5DC7E: H5C__flush_candidate_entries (H5Cmpio.c:1192) > ==1286936== by 0x7B5DC7E: H5C_apply_candidate_list (H5Cmpio.c:385) > ==1286936== by 0x7B5BA18: H5AC__rsp__dist_md_write__flush > (H5ACmpio.c:1709) > ==1286936== by 0x7B5BA18: H5AC__run_sync_point (H5ACmpio.c:2164) > ==1286936== by 0x7B5C9D2: H5AC__flush_entries (H5ACmpio.c:2307) > ==1286936== by 0x77C95E4: H5AC_flush (H5AC.c:681) > ==1286936== by 0x78B306A: H5F__flush_phase2 (H5Fint.c:1831) > ==1286936== by 0x78B5D7A: H5F__dest (H5Fint.c:1152) > ==1286936== by 0x78B6603: H5F_try_close (H5Fint.c:2180) > ==1286936== by 0x78B69F5: H5F__close_cb (H5Fint.c:2009) > ==1286936== by 0x7965797: H5I_dec_ref (H5I.c:1254) > ==1286936== Uninitialised value was created by a stack allocation > ==1286936== at 0x7695AF0: ??? (in > /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so) > ==1286936== > ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from > 0) > > On Mon, Aug 24, 2020 at 5:00 PM Jed Brown wrote: > >> Do you potentially have a memory or other resource leak? SIGBUS would be >> an odd result, but the symptom of crashing after running for a long time >> sometimes fits with a resource leak. >> >> Mark Lohry writes: >> >> > I queued up some jobs with Barry's patch, so we'll see. >> > >> > Re Jed's suggestion at checkpointing, I don't *think* this is something >> > coming from the state of the solution -- running from the same point I'm >> > seeing it crash anywhere between 1 hour and 20 hours in. I'll increase >> my >> > file save frequency in case I'm wrong there though. >> > >> > My intel build with different blas just made it through a 6 hour time >> slot >> > without crash, whereas yesterday the same thing crashed after 3 hours. >> But >> > given the randomness so far I'd bet that's just dumb luck. >> > >> > On Mon, Aug 24, 2020 at 4:22 PM Barry Smith wrote: >> > >> >> >> >> >> >> > On Aug 24, 2020, at 2:34 PM, Jed Brown wrote: >> >> > >> >> > I'm thinking of something such as writing floating point data into >> the >> >> return address, which would be unaligned/garbage. >> >> >> >> Ok, my patch will detect this. This is what I was talking about, >> messing >> >> up the BLAS arguments which are the addresses of arrays. >> >> >> >> Valgrind is by far the preferred approach. >> >> >> >> Barry >> >> >> >> Another feature we could add to the malloc checking is when a SEGV or >> >> BUS error is encountered and we catch it we should run the >> >> PetscMallocVerify() and check our memory for corruption reporting any >> we >> >> find. 
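For reference, a typical way to reproduce such a failure under Valgrind with MPI looks like the sketch below; the executable name and its options are placeholders, not the actual test driver, and one log file is written per rank through the %p (PID) substitution:

   mpiexec -n 2 valgrind --tool=memcheck -q --num-callers=20 \
           --track-origins=yes --log-file=valgrind-%p.log ./mytest <test options>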
>> >> >> >> >> >> >> >> > >> >> > Reproducing under Valgrind would help a lot. Perhaps it's possible >> to >> >> checkpoint such that the breakage can be reproduced more quickly? >> >> > >> >> > Barry Smith writes: >> >> > >> >> >> https://en.wikipedia.org/wiki/Bus_error < >> >> https://en.wikipedia.org/wiki/Bus_error> >> >> >> >> >> >> But perhaps not true for Intel? >> >> >> >> >> >> >> >> >> >> >> >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley >> >> wrote: >> >> >>> >> >> >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith > > >> bsmith at petsc.dev>> wrote: >> >> >>> >> >> >>> >> >> >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown > > >> jed at jedbrown.org>> wrote: >> >> >>>> >> >> >>>> Barry Smith > writes: >> >> >>>> >> >> >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown > > >> jed at jedbrown.org>> wrote: >> >> >>>>>> >> >> >>>>>> Barry Smith > >> writes: >> >> >>>>>> >> >> >>>>>>> So if a BLAS errors with SIGBUS then it is always an input >> error >> >> of just not proper double/complex alignment? Or some other very strange >> >> thing? >> >> >>>>>> >> >> >>>>>> I would suspect memory corruption. >> >> >>>>> >> >> >>>>> >> >> >>>>> Corruption meaning what specifically? >> >> >>>>> >> >> >>>>> The routines crashing are dgemv which only take double precision >> >> arrays, regardless of what garbage is in those arrays i don't think >> there >> >> can be BUS errors resulting. They don't take integer arrays whose >> >> corruption could result in bad indexing and then BUS errors. >> >> >>>>> >> >> >>>>> So then it can only be corruption of the pointers passed in, >> correct? >> >> >>>> >> >> >>>> Such as those pointers pointing into data on the stack with >> incorrect >> >> sizes. >> >> >>> >> >> >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? >> >> >>> >> >> >>> My understanding was that roughly memory errors in the heap are >> SEGV >> >> and memory errors on the stack are SIGBUS. Is that not true? >> >> >>> >> >> >>> Matt >> >> >>> >> >> >>> -- >> >> >>> What most experimenters take for granted before they begin their >> >> experiments is infinitely more interesting than any results to which >> their >> >> experiments lead. >> >> >>> -- Norbert Wiener >> >> >>> >> >> >>> https://www.cse.buffalo.edu/~knepley/ < >> >> http://www.cse.buffalo.edu/~knepley/> >> >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 24 23:02:03 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 24 Aug 2020 23:02:03 -0500 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid In-Reply-To: References: <4A6C7C21-E4AB-45AE-ABAA-D9028622B66C@petsc.dev> <04AF3F3C-47D5-49C0-8367-C43B7A1811D0@petsc.dev> Message-ID: <19B4C575-D633-4088-830F-12AC84C84EAE@petsc.dev> On one system you get this error, on another system with the identical code and test case you do not get the error? You get it with three iterative methods but not with MUMPS? Barry > On Aug 24, 2020, at 8:35 PM, Alfredo Jaramillo wrote: > > Hello Barry, Matthew, thanks for the replies ! > > Yes, it is our custom code, and it also happens when setting -pc_type bjacobi. Before testing an iterative solver, we were using MUMPS (-ksp_type preonly -ksp_pc_type lu -pc_factor_mat_solver_type mumps) without issues. > > Running the ex19 (as "mpirun -n 4 ex19 -da_refine 5") did not produce any problem. > > To reproduce the situation on my computer, I was able to reproduce the error for a small case and -pc_type bjacobi. 
For that particular case, when running in the cluster the error appears at the very last iteration: > > ===== > 27 KSP Residual norm 8.230378644666e-06 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 > ==== > > whereas running on my computer the error is not launched and convergence is reached instead: > > ==== > Linear interp_ solve converged due to CONVERGED_RTOL iterations 27 > ==== > > I will run valgrind to seek for possible memory corruptions. > > thank you > Alfredo > > On Mon, Aug 24, 2020 at 9:00 PM Barry Smith > wrote: > > Oh yes, it could happen with Nan. > > KSPGMRESClassicalGramSchmidtOrthogonalization() calls KSPCheckDot(ksp,lhh[j]); so should detect any NAN that appear and set ksp->convergedreason but the call to MAXPY() is still made before returning and hence producing the error message. > > We should circuit the orthogonalization as soon as it sees a Nan/Inf and return immediately for GMRES to cleanup and produce a very useful error message. > > Alfredo, > > It is also possible that the hypre preconditioners are producing a Nan because your matrix is too difficult for them to handle, but it would be odd to happen after many iterations. > > As I suggested before run with -pc_type bjacobi to see if you get the same problem. > > Barry > > >> On Aug 24, 2020, at 6:38 PM, Matthew Knepley > wrote: >> >> On Mon, Aug 24, 2020 at 6:27 PM Barry Smith > wrote: >> >> Alfredo, >> >> This should never happen. The input to the VecMAXPY in gmres is computed via VMDot which produces the same result on all processes. >> >> If you run with -pc_type bjacobi does it also happen? >> >> Is this your custom code or does it happen in PETSc examples also? Like src/snes/tutorials/ex19 -da_refine 5 >> >> Could be memory corruption, can you run under valgrind? >> >> Couldn't it happen if something generates a NaN? That also should not happen, but I was allowing that pilut might do it. >> >> Thanks, >> >> Matt >> >> Barry >> >> >> > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo > wrote: >> > >> > Dear PETSc developers, >> > >> > I'm trying to solve a linear problem with GMRES preconditioned with pilut from HYPRE. For this I'm using the options: >> > >> > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor >> > >> > If I use a single core, GMRES (+ pilut or euclid) converges. However, when using multiple cores the next error appears after some number of iterations: >> > >> > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 >> > >> > relative to the function VecMAXPY. I attached a screenshot with more detailed output. The same happens when using euclid. Can you please give me some insight on this? >> > >> > best regards >> > Alfredo >> > >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From thibault.bridelbertomeu at gmail.com Tue Aug 25 02:06:51 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Tue, 25 Aug 2020 09:06:51 +0200 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: <2BF36064-AEC6-4795-BEE7-DAAF69119D2E@petsc.dev> References: <87mu2pgtdp.fsf@jedbrown.org> <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> <2BF36064-AEC6-4795-BEE7-DAAF69119D2E@petsc.dev> Message-ID: Hello everyone, Barry, I followed your recommendations and came up with the pieces of code that are in the attached PDF - mostly pages 1 & 3 are important, page 2 is almost entirely commented. I tried to use DMCreateColoring as the doc says it may produce a more accurate coloring, however it is not implemented for a Plex yet hence the call to Matcoloringcreate that you will see. I left the test DMHascoloring in case in a later release PETSc allows for the generation of the coloring from a Plex. Also, you'll see in the input file that contrary to what you suggested I am using the jacobi PC. It is simply because it appears that the way I compiled my PETSc does not support a PC LU or PC CHOLESKY (per the seg fault print in the console). Do I need scalapack or mumps or something else ? Altogether this implementation works and produces results that are correct physically speaking. Now I have to try and increase the CFL number a lot to see how robust this approach is. All in all, what do you think of this implementation, is it what you had in mind ? Thank you for your help, Thibault Le lun. 24 ao?t 2020 ? 22:16, Barry Smith a ?crit : > > > On Aug 24, 2020, at 2:20 PM, Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > > Good evening everyone, > > Thanks Barry for your answer. > > Le lun. 24 ao?t 2020 ? 18:51, Barry Smith a ?crit : > >> >> >> On Aug 24, 2020, at 11:38 AM, Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >> Thank you Barry for taking the time to go through the code ! >> >> I indeed figured out this afternoon that the function related to the >> matrix-vector product is always handling global vectors. I corrected mine >> so that it compiles, but I have a feeling it won't run properly without a >> preconditioner. >> >> Anyways you are right, my PetscJacobianFunction_JFNK() aims at doing some >> basic finite-differencing ; user->RHS_ref is my F(U) if you see the system >> as dU/dt = F(U). As for the DMGlobalToLocal() it was there mainly because I >> had not realized the vectors I was manipulating were global. >> I will take your advice and try with just the SNESSetUseMatrixFree. >> I haven't quite fully understood what it does "under the hood" though: >> just calling SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE) before the >> TSSolve call is enough to ensure that the implicit matrix is computed ? >> Does it use the function we set as a RHS to build the matrix ? >> >> >> All it does is "replace" the A matrix with one automatically created >> for the job using MatCreateMFFD(). It does not touch the B matrix, it does >> not build the matrix but yes if does use the function to provide to do the >> differencing. >> > > OK, thank you. This MFFD Matrix is then called by the TS to construct the > linear system that will be solved to advance the system of equations, right > ? > >> >> To create the preconditioner I will do as you suggest too, thank you. 
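For reference, the operator that SNESSetUseMatrixFree() installs in place of A is essentially what one would obtain by hand with the sketch below (the sizes m/M and the routine name myFunction are assumptions, not the code attached above):

   Mat Jmf;
   ierr = MatCreateMFFD(PETSC_COMM_WORLD, m, m, M, M, &Jmf); CHKERRQ(ierr);  /* local size m, global size M */
   ierr = MatMFFDSetFunction(Jmf, myFunction, &user); CHKERRQ(ierr);         /* the nonlinear residual F(U) */
   /* MatMult(Jmf, y, z) now approximates J(U0) y by differencing F around the current state */

It can only apply matrix-vector products; nothing is assembled from which a preconditioner could be built.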
>> This matrix has to be as close as possible to the inverse of the implicit >> matrix to ensure that the eigenvalues of the system are as close to 1 as >> possible. Given the implicit matrix is built "automatically" thanks to the >> SNES matrix free capability, can we use that matrix as a starting point to >> the building of the preconditioner ? >> >> >> No the MatrixFree doesn't build a matrix, it can only do matrix-vector >> products with differencing. >> > > My bad, wrong word. Yes of course it's all matrix-free hence it's just a > functional, however maybe the inner mechanisms can be accessed and used for > the preconditioner ? > > > Probably not, it really only can do matrix-vector products. > > You were talking about the coloring capabilities in PETSc, is that where >> it can be applied ? >> >> >> Yes you can use that. See MatFDColoringCreate() but since you are >> using a DM in theory you can use -snes_fd_color and PETSc will manage >> everything for you so you don't have to write any code for Jacobians at >> all. Again it uses your function to do differences using coloring to be >> efficient to build the Jacobian for you. >> > > I read a bit about the coloring you are mentioning. As I understand it, it > is another option to have a matrix-free Jacobian behavior during the > Newton-Krylov iterations, right ? Either we use the SNESSetUseMatrixFree() > alone, then it works using "basic" finite-differencing, or we use the > SNESSetUseMatrixFree + MatFDColoringCreate > & SNESComputeJacobianDefaultColor as an option to SNESSetJacobian to access > the finite-differencing based on coloring. Is that right ? > Then if i come back to my preconditioner problem ... once you have set-up > the implicit matrix with one or the other aforementioned matrix-free ways, > how would you go around setting up the preconditioner ? In a matrix-free > way too, or rather as a real matrix that we assemble ourselves this time, > as you seemed to mean with the previous MatAij DMCreateMatrix ? > > Sorry if it seems like I am nagging, but I would really like to understand > how to manipulate the matrix-free methods and structures in PETSc to run a > time-implicit finite volume computation, it's so promising ! > > > There are many many possibilities as we discussed in previous email, > most with various limitations. > > When you use -snes_fd_color (or put code into the source like > MatFDColoringCreate which is unnecessary a since you are doing the same > thing as -snes_fd_color you get back the true Jacobian (approximated so in > less digits than analytic) so you can use any preconditioner that you can > use as if you built the true Jacobian yourself. > > I always recommend starting with -pc_type lu and making sure you are > getting the correct answers to your problem and then worrying about the > preconditioner. Faster preconditioner are JUST optimizations, nothing more, > they should not change the quality of the solution to your PDE/ODE and you > absolutely need to make sure your are getting correct quality answers > before fiddling with the preconditioner. > > Once you have the solution correct and figured out a good preconditioner > (assuming using the true Jacobian works for your discretization) then you > can think about optimizing the computation of the Jacobian by doing it > analytically finite volume by finite volume. But you shouldn't do any of > that until you are sure that your implicit TS integrator for FV produces > good numerical answers. 
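Put together, the two routes discussed here can be summarized in a minimal sketch (ts is the TS set up earlier; the option names are the standard PETSc ones):

   SNES snes;
   ierr = TSGetSNES(ts, &snes); CHKERRQ(ierr);
   /* route 1: Jacobian action by finite differencing of the residual,
      preconditioning matrix B still supplied/assembled separately */
   ierr = SNESSetUseMatrixFree(snes, PETSC_TRUE, PETSC_FALSE); CHKERRQ(ierr);

   /* route 2: no Jacobian code at all; let PETSc build the finite-difference
      Jacobian with coloring and factor it, i.e. run with
          -snes_fd_color -pc_type lu
      and only after the answers are verified move to a cheaper preconditioner */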
> > Barry > > > > > > Thanks again, > > > Thibault > > >> Barry >> >> Internally it uses SNESComputeJacobianDefaultColor() if you are >> interested in what it does. >> >> >> >> >> >> Thank you so much again, >> >> Thibault >> >> >> Le lun. 24 ao?t 2020 ? 15:45, Barry Smith a ?crit : >> >>> >>> I think the attached is wrong. >>> >>> >>> >>> The input to the matrix vector product for the Jacobian is always global >>> vectors which means on each process the dimension is not the size of the >>> DMGetLocalVector() it should be the VecGetLocalSize() of the >>> DMGetGlobalVector() >>> >>> But you may be able to skip all this and have the DM create the shell >>> matrix setting it sizes appropriately and you only need to supply the MATOP >>> >>> DMSetMatType(dm,MATSHELL); >>> DMCreateMatrix(dm,&A); >>> >>> In fact, I also don't understand the PetscJacobianFunction_JFKN() >>> function It seems to be doing finite differencing on the >>> DMPlexTSComputeRHSFunctionFVM() assuming the current function value is in >>> usr->RHS_ref. How is this different than just letting PETSc/SNES used >>> finite differences to do the matrix-vector product. Your code seems rather >>> complicated with the DMGlobalToLocal() which I don't understand what it is >>> suppose to do there. >>> >>> I think you can just call >>> >>> TSGetSNES() >>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >>> >>> and it will set up an internal matrix that does the finite differencing >>> for you. Then you never need a shell matrix. >>> >>> >>> Also to create the preconditioner matrix B this should work >>> >>> DMSetMatType(dm,MATAIJ); >>> DMCreateMatrix(dm,&B); >>> >>> no need for you to figure out the sizes. >>> >>> >>> Note that both A and B need to have the same dimensions on each process >>> as the global vectors which I don't think your current code has. >>> >>> >>> >>> Barry >>> >>> >>> >>> On Aug 24, 2020, at 12:56 AM, Thibault Bridel-Bertomeu < >>> thibault.bridelbertomeu at gmail.com> wrote: >>> >>> Barry, first of all, thank you very much for your detailed answer, I >>> keep reading it to let it soak in - I might come back to you for more >>> details if you do not mind. >>> >>> In the meantime, to fuel the conversation, I attach to this e-mail two >>> pdfs containing the pieces of the code that regard what we are discussing. >>> In the *timedisc.pdf, you'll find how I handle the initialization of the TS >>> object, and in the *petscdefs.pdf you'll find the method that calls the >>> TSSolve as well as the methods that are linked to the TS (the timestep >>> adapt, the jacobian etc ...). [Sorry for the quality, I cannot do better >>> than that sort of pdf ...] >>> >>> Based on what is in the structured code I sent you the other day, I >>> rewrote the PetscJacobianFunction_JFNK. I think it should be all right, but >>> although it compiles, execution raises a seg fault I think when I do >>> ierr = TSSetIJacobian(ts, A, A, PetscIJacobian, user); >>> saying that A does not have the right dimensions. It is quite new, I am >>> still looking into where exactly the error is raised. What do you think of >>> this implementation though, does it look correct in your expert eyes ? >>> As for what we really discussed so far, it's that >>> PetscComputePreconMatImpl that I do not know how to implement (with the >>> derivative of the jacobian based on the FVM object). 
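A hedged consolidation of the DM-based setup suggested above (PetscJacobianFunction_JFNK, PetscIJacobian and user are the routines and context named in this thread, not code that has been verified here):

   Mat A, B;
   ierr = DMSetMatType(dm, MATSHELL); CHKERRQ(ierr);
   ierr = DMCreateMatrix(dm, &A); CHKERRQ(ierr);     /* shell operator, correct parallel layout */
   ierr = MatShellSetOperation(A, MATOP_MULT,
                               (void (*)(void)) PetscJacobianFunction_JFNK); CHKERRQ(ierr);
   ierr = DMSetMatType(dm, MATAIJ); CHKERRQ(ierr);
   ierr = DMCreateMatrix(dm, &B); CHKERRQ(ierr);     /* assembled preconditioning matrix, same layout */
   ierr = TSSetIJacobian(ts, A, B, PetscIJacobian, user); CHKERRQ(ierr);

This way both A and B automatically get the dimensions of the global vectors on each process, which is the sizing issue raised above.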
>>> >>> I understand now that what I am showing you today might not be the >>> right way to go if one wants to really use the PetscFV, but I just wanted >>> to add those code lines to the conversation to have your feedback. >>> >>> Thank you again for your help, >>> >>> Thibault >>> >>> >>> Le ven. 21 ao?t 2020 ? 19:25, Barry Smith a ?crit : >>> >>> >>>> >>>> On Aug 21, 2020, at 10:58 AM, Thibault Bridel-Bertomeu < >>>> thibault.bridelbertomeu at gmail.com> wrote: >>>> >>>> Thank you Barry for the tip ! I?ll make sure to do that when everything >>>> is set. >>>> What I also meant is that there will not be any more direct way to set >>>> the preconditioner than to go through SNESSetJacobian after having >>>> assembled everything by hand ? Like, in my case, or in the more general >>>> case of fluid dynamics equations, the preconditioner is not a fun matrix to >>>> assemble, because for every cell the derivative of the physical flux >>>> jacobian has to be taken and put in the right block in the matrix - finite >>>> element style if you want. Is there a way to do that with Petsc methods, >>>> maybe short-circuiting the FEM based methods ? >>>> >>>> >>>> Thibault >>>> >>>> I am not sure what you mean but there are a couple of things that >>>> may be helpful. >>>> >>>> PCSHELL >>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCSHELL.html allows >>>> you to build your own preconditioner (that can and often will use one or >>>> more of its own Mats, and KSP or PC inside it, or even use another PETScFV >>>> etc to build some of the sub matrices for you if it is appropriate), this >>>> approach means you never need to construct a "global" PETSc matrix from >>>> which PETSc builds the preconditioner. But you should only do this if the >>>> conventional approach is not reasonable for your problem. >>>> >>>> MATNEST >>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATNEST.html allows >>>> you to build a global matrix by building parts of it separately and even >>>> skipping parts you decide you don't need in the preconditioner. >>>> Conceptually it is the same as just creating a global matrix and filling up >>>> but the process is a bit different and something suitable for "multi >>>> physics" or "multi-equation" type applications. >>>> >>>> Of course what you put into PCSHELL and MATNEST will affect the >>>> convergence of the nonlinear solver. As Jed noted what you put in the >>>> "Jacobian" does not have to be directly the same mathematically as what you >>>> put into the TSSetI/RHSFunction with the caveat that it does have to >>>> appropriate spectral properties to result in a good preconditioner for the >>>> "true" Jacobian. >>>> >>>> Couple of other notes: >>>> >>>> The entire business of "Jacobian" matrix-free or not (with for example >>>> -snes_fd_operator) is tricky because as Jed noted if your finite volume >>>> scheme has non-differential terms such as if () tests. There is a concept >>>> of sub-differential for this type of thing but I know absolutely nothing >>>> about that and probably not worth investigating. >>>> >>>> In this situation you can avoid the "true" Jacobian completely (both >>>> for matrix-vector product and preconditioner) and use something else as Jed >>>> suggested a lower order scheme that is differentiable. 
This can work well >>>> for solving the nonlinear system or not depending on how suitable it is for >>>> your original "function" >>>> >>>> >>>> 1) In theory at least you can have the Jacobian matrix-vector product >>>> computed directly using DMPLEX/PETScFV infrastructure (it would apply the >>>> Jacobian locally matrix-free using code similar to the code that evaluates >>>> the FV "function". I do no know if any of this code is written, it will be >>>> more efficient than -snes_mf_operator that evaluates the FV "function" and >>>> does traditional differencing to compute the Jacobian. Again it has the >>>> problem of non-differentialability if the function is not differential. But >>>> it could be done for a different (lower order scheme) that is >>>> differentiable. >>>> >>>> 2) You can have PETSc compute the Jacobian explicitly coloring and >>>> from that build the preconditioner, this allows you to avoid the hassle of >>>> writing the code for the derivatives yourself. This uses finite differences >>>> on your function and coloring of the graph to compute many columns of the >>>> Jacobian simultaneously and can be pretty efficient. Again if the function >>>> is not differential there can be issues of what the result means and will >>>> it work in a nonlinear solver. SNESComputeJacobianDefaultColor >>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESComputeJacobianDefaultColor.html >>>> >>>> 3) Much more outlandish is to skip Newton and Jacobians completely and >>>> use the full approximation scheme SNESFAS >>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNESFAS/SNESFAS.html this >>>> requires a grid hierarchy and appropriate way to interpolate up through the >>>> grid hierarchy your finite volume solutions. Probably not worth >>>> investigating unless you have lots of time on your hands and keen interest >>>> in this kind of stuff https://arxiv.org/pdf/1607.04254.pdf >>>> >>>> So to summarize, and Matt and Jed can correct my mistakes. >>>> >>>> 1) Form the full Jacobian from the original "function" using analytic >>>> approach use it for both the matrix-vector product and to build the >>>> preconditioner. Problem if full Jacobian not well defined mathematically. >>>> Tough to code, usually not practical. >>>> >>>> 2) Do any matrix free (any way) for the full Jacobian and >>>> >>>> a) build another "approximate" Jacobian (using any technique analytic >>>> or finite differences using matrix coloring on a new "lower order" >>>> "function") Still can have trouble if this original Jacobian is no well >>>> defined >>>> >>>> b) "write your own preconditioner" that internally can use anything >>>> in PETSc that approximately solves the Jacobian. Same potential problems if >>>> original Jacobian is not differential, plus convergence will depend on how >>>> good your own preconditioner approximates the inverse of the true Jacobian. >>>> >>>> 3) Use a lower Jacobian (computed anyway you want) for the >>>> matrix-vector product and the preconditioner. The problem of >>>> differentiability is gone but convergence of the nonlinear solver depends >>>> on how well lower order Jacobian is appropriate for the original "function" >>>> >>>> a) Form the "lower order" Jacobian analytically or with coloring >>>> and use for both matrix-vector product and building preconditioner. Note >>>> that switching between this and 2a is trivial. >>>> >>>> b) Do the "lower order" Jacobian matrix free and provide your own >>>> PCSHELL. 
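A bare-bones sketch of what "provide your own PCSHELL" amounts to in code (the routine and context names MyPCSetUp, MyPCApply and myctx are placeholders):

   SNES snes;  KSP ksp;  PC pc;
   ierr = TSGetSNES(ts, &snes); CHKERRQ(ierr);
   ierr = SNESGetKSP(snes, &ksp); CHKERRQ(ierr);
   ierr = KSPGetPC(ksp, &pc); CHKERRQ(ierr);
   ierr = PCSetType(pc, PCSHELL); CHKERRQ(ierr);
   ierr = PCShellSetContext(pc, &myctx); CHKERRQ(ierr);
   ierr = PCShellSetSetUp(pc, MyPCSetUp); CHKERRQ(ierr);  /* build any internal Mats/KSPs needed */
   ierr = PCShellSetApply(pc, MyPCApply); CHKERRQ(ierr);  /* PetscErrorCode MyPCApply(PC pc, Vec x, Vec y) */

Inside MyPCApply anything that approximately inverts the (lower order) Jacobian may be used.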
Note that switching between this and 2b is trivial. >>>> >>>> Barry >>>> >>>> I would first try competing the "true" Jacobian via coloring, if that >>>> works and give satisfactory results (fast enough) then stop. >>>> >>>> Then I would do 2a/2b by writing my "function" using PETScFV and >>>> writing the "lower order function" via PETScFV and use matrix coloring to >>>> get the Jacobian from the second "lower order function". If this works well >>>> (either with 2a or 3a or both) then stop or you can compute the "lower >>>> order" Jacobian analytically (again using PetscFV) for a more efficient >>>> evaluation of the Jacobian. >>>> >>>> >>> >>>> >>>> >>>> Thanks ! >>>> >>>> Thibault >>>> >>>> Le ven. 21 ao?t 2020 ? 17:22, Barry Smith a ?crit : >>>> >>>> >>>>> >>>>> On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu < >>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>> >>>>> >>>>> >>>>> Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley a >>>>> ?crit : >>>>> >>>>>> On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu < >>>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>>> >>>>>>> Sorry, I sent too soon, I hit the wrong key. >>>>>>> >>>>>>> I wanted to say that context.npoints is the local number of cells. >>>>>>> >>>>>>> PetscRHSFunctionImpl allows to generate the hyperbolic part of the >>>>>>> right hand side. >>>>>>> Then we have : >>>>>>> >>>>>>> PetscErrorCode PetscIJacobian( >>>>>>> TS ts, /*!< Time stepping object (see PETSc TS)*/ >>>>>>> PetscReal t, /*!< Current time */ >>>>>>> Vec Y, /*!< Solution vector */ >>>>>>> Vec Ydot, /*!< Time-derivative of solution vector */ >>>>>>> PetscReal a, /*!< Shift */ >>>>>>> Mat A, /*!< Jacobian matrix */ >>>>>>> Mat B, /*!< Preconditioning matrix */ >>>>>>> void *ctxt /*!< Application context */ >>>>>>> ) >>>>>>> { >>>>>>> PETScContext *context = (PETScContext*) ctxt; >>>>>>> HyPar *solver = context->solver; >>>>>>> _DECLARE_IERR_; >>>>>>> >>>>>>> PetscFunctionBegin; >>>>>>> solver->count_IJacobian++; >>>>>>> context->shift = a; >>>>>>> context->waqt = t; >>>>>>> /* Construct preconditioning matrix */ >>>>>>> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y, >>>>>>> context); CHECKERR(ierr); } >>>>>>> >>>>>>> PetscFunctionReturn(0); >>>>>>> } >>>>>>> >>>>>>> and PetscJacobianFunction_JFNK which I bind to the matrix shell, >>>>>>> computes the action of the jacobian on a vector : say U0 is the state of >>>>>>> reference and Y the vector upon which to apply the JFNK method, then the >>>>>>> PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + >>>>>>> epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift >>>>>>> comes from the TS). >>>>>>> The preconditioning matrix I compute as an approximation to the >>>>>>> actual jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU >>>>>>> is, in each cell, a 4x4 matrix that is known exactly for the system of >>>>>>> equations I am solving, i.e. Euler equations. 
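Written out, the two objects described just above are (with a the shift passed in by the TS, U0 the reference state and eps the differencing parameter):

   J(U0) Y  =  a*Y - [ F(U0 + eps*Y) - F(U0) ] / eps      (action of the matrix-free shell operator)
   P       ~=  a*I - dF/dU evaluated at U0                (assembled preconditioning matrix)

where dF/dU is built cell by cell from the exact 4x4 flux Jacobians.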
For the structured grid, I >>>>>>> can loop on the cells and do that 'Derivative' thing at first order by >>>>>>> simply taking a finite-difference like approximation with the neighboring >>>>>>> cells, Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix >>>>>>> block by block (JFunction is the dF/dU) >>>>>>> >>>>>>> /* diagonal element */ >>>>>>> >>>>>>> >>>>>>> for (v=0; v>>>>>> nvars*pg + v; } >>>>>>> >>>>>>> >>>>>>> ierr = solver->JFunction >>>>>>> >>>>>>> (values,(u+nvars*p),solver->physics >>>>>>> >>>>>>> ,dir,0); >>>>>>> >>>>>>> >>>>>>> _ArrayScale1D_ >>>>>>> >>>>>>> (values,(dxinv*iblank),(nvars*nvars)); >>>>>>> >>>>>>> >>>>>>> ierr = >>>>>>> MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> /* left neighbor */ >>>>>>> >>>>>>> >>>>>>> if (pgL >= 0) { >>>>>>> >>>>>>> >>>>>>> for (v=0; v>>>>>> nvars*pgL + v; } >>>>>>> >>>>>>> >>>>>>> ierr = solver->JFunction >>>>>>> >>>>>>> (values,(u+nvars*pL),solver->physics >>>>>>> >>>>>>> ,dir,1); >>>>>>> >>>>>>> >>>>>>> _ArrayScale1D_ >>>>>>> >>>>>>> (values,(-dxinv*iblank),(nvars*nvars)); >>>>>>> >>>>>>> >>>>>>> ierr = >>>>>>> MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>>> >>>>>>> >>>>>>> } >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> /* right neighbor */ >>>>>>> >>>>>>> >>>>>>> if (pgR >= 0) { >>>>>>> >>>>>>> >>>>>>> for (v=0; v>>>>>> nvars*pgR + v; } >>>>>>> >>>>>>> >>>>>>> ierr = solver->JFunction >>>>>>> >>>>>>> (values,(u+nvars*pR),solver->physics >>>>>>> >>>>>>> ,dir,-1); >>>>>>> >>>>>>> >>>>>>> _ArrayScale1D_ >>>>>>> >>>>>>> (values,(-dxinv*iblank),(nvars*nvars)); >>>>>>> >>>>>>> >>>>>>> ierr = >>>>>>> MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>>> >>>>>>> >>>>>>> } >>>>>>> >>>>>>> >>>>>>> >>>>>>> I do not know if I am clear here ... >>>>>>> Anyways, I am trying to figure out how to do this shell matrix and >>>>>>> this preconditioner using all the FV and DMPlex artillery. >>>>>>> >>>>>> >>>>>> Okay, that is very clear. We should be able to get the JFNK just with >>>>>> -snes_mf_operator, and put the approximate J construction in >>>>>> DMPlexComputeJacobian_Internal(). >>>>>> There is an FV section already, and we could just add this. I would >>>>>> need to understand those entries in the pointwise Riemann sense that the >>>>>> other stuff is now. >>>>>> >>>>> >>>>> Ok i had a quick look and if I understood correctly it would do the >>>>> job. Setting the snes-mf-operator flag would mean however that we have to >>>>> go through SNESSetJacobian to set the jacobian and the preconditioning >>>>> matrix wouldn't it ? >>>>> >>>>> >>>>> Thibault, >>>>> >>>>> Since the TS implicit methods end up using SNES internally the >>>>> option should be available to you without requiring you to be calling the >>>>> SNES routines directly >>>>> >>>>> Once you have finalized your approach and if for the implicit case >>>>> you always work in the snes mf operator mode you can hardwire >>>>> >>>>> TSGetSNES(ts,&snes); >>>>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >>>>> >>>>> in your code so you don't need to always provide the option >>>>> -snes-mf-operator >>>>> >>>> >>>>> Barry >>>>> >>>>> >>>>> >>>>> >>>>> There might be calls to the Riemann solver to evaluate the dRHS / dU >>>>> part yes but maybe it's possible to re-use what was computed for the RHS^n ? 
>>>>> In the FV section the jacobian is set to identity which I missed >>>>> before, but it could explain why when I used the following : >>>>> >>>>> TSSetType(ts, TSBEULER); >>>>> DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); >>>>> DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); >>>>> >>>>> with my FV discretization nothing happened, right ? >>>>> >>>>> Thank you, >>>>> >>>>> Thibault >>>>> >>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Le ven. 21 ao?t 2020 ? 14:55, Thibault Bridel-Bertomeu < >>>>>>> thibault.bridelbertomeu at gmail.com> a ?crit : >>>>>>> >>>>>> Hi, >>>>>>>> >>>>>>>> Thanks Matthew and Jed for your input. >>>>>>>> I indeed envision an implicit solver in the sense Jed mentioned - >>>>>>>> Jiri Blazek's book is a nice intro to this concept. >>>>>>>> >>>>>>>> Matthew, I do not know exactly what to change right now because >>>>>>>> although I understand globally what the DMPlexComputeXXXX_Internal methods >>>>>>>> do, I cannot say for sure line by line what is happening. >>>>>>>> In a structured code, I have a an implicit FVM solver with PETSc >>>>>>>> but I do not use any of the FV structure, not even a DM - I just use C >>>>>>>> arrays that I transform to PETSc Vec and Mat and build my IJacobian and my >>>>>>>> preconditioner and gives all that to a TS and it runs. I cannot figure out >>>>>>>> how to do it with the FV and the DM and all the underlying "shortcuts" that >>>>>>>> I want to use. >>>>>>>> >>>>>>>> Here is the top method for the structured code : >>>>>>>> >>>>>>>> int total_size = context.npoints * solver->nvars >>>>>>>> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); >>>>>>>> CHKERRQ(ierr); >>>>>>>> SNES snes; >>>>>>>> KSP ksp; >>>>>>>> PC pc; >>>>>>>> SNESType snestype; >>>>>>>> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >>>>>>>> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >>>>>>>> >>>>>>>> flag_mat_a = 1; >>>>>>>> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size, >>>>>>>> PETSC_DETERMINE, >>>>>>>> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >>>>>>>> context.jfnk_eps = 1e-7; >>>>>>>> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context. >>>>>>>> jfnk_eps,NULL); CHKERRQ(ierr); >>>>>>>> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void)) >>>>>>>> PetscJacobianFunction_JFNK); CHKERRQ(ierr); >>>>>>>> ierr = MatSetUp(A); CHKERRQ(ierr); >>>>>>>> >>>>>>>> context.flag_use_precon = 0; >>>>>>>> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",( >>>>>>>> PetscBool*)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >>>>>>>> >>>>>>>> /* Set up preconditioner matrix */ >>>>>>>> flag_mat_b = 1; >>>>>>>> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size, >>>>>>>> PETSC_DETERMINE,PETSC_DETERMINE, >>>>>>>> >>>>>>> (solver->ndims*2+1)*solver->nvars,NULL, >>>>>>>> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >>>>>>>> ierr = MatSetBlockSize(B,solver->nvars); >>>>>>>> /* Set the RHSJacobian function for TS */ >>>>>>>> >>>>>>> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr >> ); >> >> Thibault Bridel-Bertomeu >>>>>>>> ? >>>>>>>> Eng, MSc, PhD >>>>>>>> Research Engineer >>>>>>>> CEA/CESTA >>>>>>>> 33114 LE BARP >>>>>>>> Tel.: (+33)557046924 >>>>>>>> Mob.: (+33)611025322 >>>>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>>>> >>>>>>>> >>>>>>>> Le jeu. 20 ao?t 2020 ? 
18:43, Jed Brown a >>>>>>>> ?crit : >>>>>>>> >>>>>>>>> Matthew Knepley writes: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> > I could never get the FVM stuff to make sense to me for implicit >>>>>>>>> methods. >>>>>>>>> >>>>>>>>> >>>>>>>>> > Here is my problem understanding. If you have an FVM method, it >>>>>>>>> decides >>>>>>>>> >>>>>>>>> >>>>>>>>> > to move "stuff" from one cell to its neighboring cells depending >>>>>>>>> on the >>>>>>>>> >>>>>>>>> >>>>>>>>> > solution to the Riemann problem on each face, which computed the >>>>>>>>> flux. This >>>>>>>>> >>>>>>>>> >>>>>>>>> > is >>>>>>>>> >>>>>>>>> >>>>>>>>> > fine unless the timestep is so big that material can flow >>>>>>>>> through into the >>>>>>>>> >>>>>>>>> >>>>>>>>> > cells beyond the neighbor. Then I should have considered the >>>>>>>>> effect of the >>>>>>>>> >>>>>>>>> >>>>>>>>> > Riemann problem for those interfaces. That would be in the >>>>>>>>> Jacobian, but I >>>>>>>>> >>>>>>>>> >>>>>>>>> > don't know how to compute that Jacobian. I guess you could do >>>>>>>>> everything >>>>>>>>> >>>>>>>>> >>>>>>>>> > matrix-free, but without a preconditioner it seems hard. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> So long as we're using method of lines, the flux is just >>>>>>>>> instantaneous flux, not integrated over some time step. It has the same >>>>>>>>> meaning for implicit and explicit. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> An explicit method would be unstable if you took such a large time >>>>>>>>> step (CFL) and an implicit method will not simultaneously be SSP and higher >>>>>>>>> than first order, but it's still a consistent discretization of the problem. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> It's common (done in FUN3D and others) to precondition with a >>>>>>>>> first-order method, where gradient reconstruction/limiting is skipped. >>>>>>>>> That's what I'd recommend because limiting creates nasty nonlinearities and >>>>>>>>> the resulting discretizations lack h-ellipticity which makes them very hard >>>>>>>>> to solve. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>> Thibault Bridel-Bertomeu >>>> ? >>>> Eng, MSc, PhD >>>> Research Engineer >>>> CEA/CESTA >>>> 33114 LE BARP >>>> Tel.: (+33)557046924 >>>> Mob.: (+33)611025322 >>>> Mail: thibault.bridelbertomeu at gmail.com >>>> >>>> >>>> >>>> >>> >>> >>> >>> >>> >> >> -- >> Thibault Bridel-Bertomeu >> ? >> Eng, MSc, PhD >> Research Engineer >> CEA/CESTA >> 33114 LE BARP >> Tel.: (+33)557046924 >> Mob.: (+33)611025322 >> Mail: thibault.bridelbertomeu at gmail.com >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: implicit_with_coloring.pdf Type: application/pdf Size: 207853 bytes Desc: not available URL: From smithc11 at rpi.edu Tue Aug 25 07:34:45 2020 From: smithc11 at rpi.edu (Cameron Smith) Date: Tue, 25 Aug 2020 08:34:45 -0400 Subject: [petsc-users] creation of parallel dmplex from a partitioned mesh In-Reply-To: References: <1953567c-6c7f-30fb-13e6-ad7017263a92@rpi.edu> <62654977-bdbc-9cd7-5a70-e9fb4951310a@rpi.edu> <3fcf90b7-3abd-1345-bd90-d7d7272816d9@rpi.edu> <87mu2jg57a.fsf@jedbrown.org> Message-ID: On 8/24/20 4:57 PM, Matthew Knepley wrote: > On Mon, Aug 24, 2020 at 4:27 PM Jed Brown > wrote: > > Cameron Smith > writes: > > > We made some progress with star forest creation but still have > work to do. > > > > We revisited DMPlexCreateFromCellListParallelPetsc(...) and got it > > working by sequentially partitioning the vertex coordinates across > > processes to satisfy the 'vertexCoords' argument. Specifically, > rank 0 > > has the coordinates for vertices with global id 0:N/P-1, rank 1 has > > N/P:2*(N/P)-1, and so on (N is the total number of global > vertices and P > > is the number of processes). > > > > The consequences of the sequential partition of vertex > coordinates in > > subsequent solver operations is not clear.? Does it make process i > > responsible for computations and communications associated with > global > > vertices i*(N/P):(i+1)*(N/P)-1 ?? We assumed it does and wanted > to confirm. > > Yeah, in the sense that the corners would be owned by the rank you > place them on. > > But many methods, especially high-order, perform assembly via > non-overlapping partition of elements, in which case the > "computations" happen where the elements are (with any required > vertex data for the closure of those elements being sent to the rank > handling the element). > > Note that a typical pattern would be to create a parallel DMPlex > with a naive distribution, then repartition/distribute it. > > > As Jed says, CreateParallel() just makes the most naive partition of > vertices because we have no other information. Once > the mesh is made, you call DMPlexDistribute() again to reduce the edge cut. > > ? Thanks, > > ? ? ?Matt > Thank you. This is being used for PIC code with low order 2d elements whose mesh is partitioned to minimize communications during particle operations. This partition will not be ideal for the field solve using petsc so we're exploring alternatives that will require minimal data movement between the two partitions. Towards that, we'll keep pursuing the SF creation. -Cameron From ajaramillopalma at gmail.com Tue Aug 25 08:03:59 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Tue, 25 Aug 2020 10:03:59 -0300 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid In-Reply-To: <19B4C575-D633-4088-830F-12AC84C84EAE@petsc.dev> References: <4A6C7C21-E4AB-45AE-ABAA-D9028622B66C@petsc.dev> <04AF3F3C-47D5-49C0-8367-C43B7A1811D0@petsc.dev> <19B4C575-D633-4088-830F-12AC84C84EAE@petsc.dev> Message-ID: Yes, Barry, that is correct. On Tue, Aug 25, 2020 at 1:02 AM Barry Smith wrote: > > On one system you get this error, on another system with the identical > code and test case you do not get the error? > > You get it with three iterative methods but not with MUMPS? > > Barry > > > On Aug 24, 2020, at 8:35 PM, Alfredo Jaramillo > wrote: > > Hello Barry, Matthew, thanks for the replies ! > > Yes, it is our custom code, and it also happens when setting -pc_type > bjacobi. 
Before testing an iterative solver, we were using MUMPS (-ksp_type > preonly -ksp_pc_type lu -pc_factor_mat_solver_type mumps) without issues. > > Running the ex19 (as "mpirun -n 4 ex19 -da_refine 5") did not produce any > problem. > > To reproduce the situation on my computer, I was able to reproduce the > error for a small case and -pc_type bjacobi. For that particular case, when > running in the cluster the error appears at the very last iteration: > > ===== > 27 KSP Residual norm 8.230378644666e-06 > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Invalid argument > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 > ==== > > whereas running on my computer the error is not launched and convergence > is reached instead: > > ==== > Linear interp_ solve converged due to CONVERGED_RTOL iterations 27 > ==== > > I will run valgrind to seek for possible memory corruptions. > > thank you > Alfredo > > On Mon, Aug 24, 2020 at 9:00 PM Barry Smith wrote: > >> >> Oh yes, it could happen with Nan. >> >> KSPGMRESClassicalGramSchmidtOrthogonalization() >> calls KSPCheckDot(ksp,lhh[j]); so should detect any NAN that appear and >> set ksp->convergedreason but the call to MAXPY() is still made before >> returning and hence producing the error message. >> >> We should circuit the orthogonalization as soon as it sees a Nan/Inf >> and return immediately for GMRES to cleanup and produce a very useful error >> message. >> >> Alfredo, >> >> It is also possible that the hypre preconditioners are producing a >> Nan because your matrix is too difficult for them to handle, but it would >> be odd to happen after many iterations. >> >> As I suggested before run with -pc_type bjacobi to see if you get the >> same problem. >> >> Barry >> >> >> On Aug 24, 2020, at 6:38 PM, Matthew Knepley wrote: >> >> On Mon, Aug 24, 2020 at 6:27 PM Barry Smith wrote: >> >>> >>> Alfredo, >>> >>> This should never happen. The input to the VecMAXPY in gmres is >>> computed via VMDot which produces the same result on all processes. >>> >>> If you run with -pc_type bjacobi does it also happen? >>> >>> Is this your custom code or does it happen in PETSc examples >>> also? Like src/snes/tutorials/ex19 -da_refine 5 >>> >>> Could be memory corruption, can you run under valgrind? >>> >> >> Couldn't it happen if something generates a NaN? That also should not >> happen, but I was allowing that pilut might do it. >> >> Thanks, >> >> Matt >> >> >>> Barry >>> >>> >>> > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo < >>> ajaramillopalma at gmail.com> wrote: >>> > >>> > Dear PETSc developers, >>> > >>> > I'm trying to solve a linear problem with GMRES preconditioned with >>> pilut from HYPRE. For this I'm using the options: >>> > >>> > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor >>> > >>> > If I use a single core, GMRES (+ pilut or euclid) converges. However, >>> when using multiple cores the next error appears after some number of >>> iterations: >>> > >>> > [0]PETSC ERROR: Scalar value must be same on all processes, argument # >>> 3 >>> > >>> > relative to the function VecMAXPY. I attached a screenshot with more >>> detailed output. The same happens when using euclid. Can you please give me >>> some insight on this? 
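For reference, the programmatic equivalent of those command-line options is roughly (a sketch; ksp here stands for the KSP object used in this solve):

  PC pc;
  KSPSetType(ksp, KSPGMRES);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCHYPRE);
  PCHYPRESetType(pc, "pilut");
  KSPSetFromOptions(ksp);   /* keeps -ksp_monitor and the other command-line options active */
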
>>> > >>> > best regards >>> > Alfredo >>> > >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 25 08:19:38 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 25 Aug 2020 08:19:38 -0500 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> <2BF36064-AEC6-4795-BEE7-DAAF69119D2E@petsc.dev> Message-ID: Yes, your use of the coloring is what I was thinking of. I don't think you need any of calls to the coloring code as it is managed in SNESComputeJacobianDefaultColor() if you don't provide it initially. Did that not work, just using -snes_fd_color? Regarding direct solvers. Add the arguments --download-superlu_dist --download-metis --download-parmetis --download-mumps --download-scalapack --download-ptscotch to ./configure Then when you run the code you can use -pc_type lu -pc_factor_mat_solver_type superlu_dist or mumps Barry > On Aug 25, 2020, at 2:06 AM, Thibault Bridel-Bertomeu wrote: > > Hello everyone, > > Barry, I followed your recommendations and came up with the pieces of code that are in the attached PDF - mostly pages 1 & 3 are important, page 2 is almost entirely commented. > > I tried to use DMCreateColoring as the doc says it may produce a more accurate coloring, however it is not implemented for a Plex yet hence the call to Matcoloringcreate that you will see. I left the test DMHascoloring in case in a later release PETSc allows for the generation of the coloring from a Plex. > > Also, you'll see in the input file that contrary to what you suggested I am using the jacobi PC. It is simply because it appears that the way I compiled my PETSc does not support a PC LU or PC CHOLESKY (per the seg fault print in the console). Do I need scalapack or mumps or something else ? > > Altogether this implementation works and produces results that are correct physically speaking. Now I have to try and increase the CFL number a lot to see how robust this approach is. > > All in all, what do you think of this implementation, is it what you had in mind ? > > Thank you for your help, > > Thibault > > Le lun. 24 ao?t 2020 ? 22:16, Barry Smith > a ?crit : > > >> On Aug 24, 2020, at 2:20 PM, Thibault Bridel-Bertomeu > wrote: >> >> Good evening everyone, >> >> Thanks Barry for your answer. >> >> Le lun. 24 ao?t 2020 ? 18:51, Barry Smith > a ?crit : >> >> >>> On Aug 24, 2020, at 11:38 AM, Thibault Bridel-Bertomeu > wrote: >>> >>> Thank you Barry for taking the time to go through the code ! >>> >>> I indeed figured out this afternoon that the function related to the matrix-vector product is always handling global vectors. I corrected mine so that it compiles, but I have a feeling it won't run properly without a preconditioner. >>> >>> Anyways you are right, my PetscJacobianFunction_JFNK() aims at doing some basic finite-differencing ; user->RHS_ref is my F(U) if you see the system as dU/dt = F(U). As for the DMGlobalToLocal() it was there mainly because I had not realized the vectors I was manipulating were global. >>> I will take your advice and try with just the SNESSetUseMatrixFree. 
>>> I haven't quite fully understood what it does "under the hood" though: just calling SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE) before the TSSolve call is enough to ensure that the implicit matrix is computed ? Does it use the function we set as a RHS to build the matrix ? >> >> All it does is "replace" the A matrix with one automatically created for the job using MatCreateMFFD(). It does not touch the B matrix, it does not build the matrix but yes if does use the function to provide to do the differencing. >> >> OK, thank you. This MFFD Matrix is then called by the TS to construct the linear system that will be solved to advance the system of equations, right ? >>> >>> To create the preconditioner I will do as you suggest too, thank you. This matrix has to be as close as possible to the inverse of the implicit matrix to ensure that the eigenvalues of the system are as close to 1 as possible. Given the implicit matrix is built "automatically" thanks to the SNES matrix free capability, can we use that matrix as a starting point to the building of the preconditioner ? >> >> No the MatrixFree doesn't build a matrix, it can only do matrix-vector products with differencing. >> >> My bad, wrong word. Yes of course it's all matrix-free hence it's just a functional, however maybe the inner mechanisms can be accessed and used for the preconditioner ? > > Probably not, it really only can do matrix-vector products. > >>> You were talking about the coloring capabilities in PETSc, is that where it can be applied ? >> >> Yes you can use that. See MatFDColoringCreate() but since you are using a DM in theory you can use -snes_fd_color and PETSc will manage everything for you so you don't have to write any code for Jacobians at all. Again it uses your function to do differences using coloring to be efficient to build the Jacobian for you. >> >> I read a bit about the coloring you are mentioning. As I understand it, it is another option to have a matrix-free Jacobian behavior during the Newton-Krylov iterations, right ? Either we use the SNESSetUseMatrixFree() alone, then it works using "basic" finite-differencing, or we use the SNESSetUseMatrixFree + MatFDColoringCreate & SNESComputeJacobianDefaultColor as an option to SNESSetJacobian to access the finite-differencing based on coloring. Is that right ? >> Then if i come back to my preconditioner problem ... once you have set-up the implicit matrix with one or the other aforementioned matrix-free ways, how would you go around setting up the preconditioner ? In a matrix-free way too, or rather as a real matrix that we assemble ourselves this time, as you seemed to mean with the previous MatAij DMCreateMatrix ? >> >> Sorry if it seems like I am nagging, but I would really like to understand how to manipulate the matrix-free methods and structures in PETSc to run a time-implicit finite volume computation, it's so promising ! > > There are many many possibilities as we discussed in previous email, most with various limitations. > > When you use -snes_fd_color (or put code into the source like MatFDColoringCreate which is unnecessary a since you are doing the same thing as -snes_fd_color you get back the true Jacobian (approximated so in less digits than analytic) so you can use any preconditioner that you can use as if you built the true Jacobian yourself. > > I always recommend starting with -pc_type lu and making sure you are getting the correct answers to your problem and then worrying about the preconditioner. 
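Put together, a first robustness check could be run along these lines (a sketch; the executable name, process count and its own arguments are placeholders for your case):

  mpiexec -n 4 ./my_fv_solver <your usual arguments> -snes_fd_color -pc_type lu -pc_factor_mat_solver_type superlu_dist -snes_converged_reason -ksp_converged_reason
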
Faster preconditioner are JUST optimizations, nothing more, they should not change the quality of the solution to your PDE/ODE and you absolutely need to make sure your are getting correct quality answers before fiddling with the preconditioner. > > Once you have the solution correct and figured out a good preconditioner (assuming using the true Jacobian works for your discretization) then you can think about optimizing the computation of the Jacobian by doing it analytically finite volume by finite volume. But you shouldn't do any of that until you are sure that your implicit TS integrator for FV produces good numerical answers. > > Barry > > > > >> >> Thanks again, > >> Thibault >> >> Barry >> >> Internally it uses SNESComputeJacobianDefaultColor() if you are interested in what it does. >> >> >> >> >>> >>> Thank you so much again, >>> >>> Thibault >>> >>> >>> Le lun. 24 ao?t 2020 ? 15:45, Barry Smith > a ?crit : >>> >>> I think the attached is wrong. >>> >>> >>> >>> The input to the matrix vector product for the Jacobian is always global vectors which means on each process the dimension is not the size of the DMGetLocalVector() it should be the VecGetLocalSize() of the DMGetGlobalVector() >>> >>> But you may be able to skip all this and have the DM create the shell matrix setting it sizes appropriately and you only need to supply the MATOP >>> >>> DMSetMatType(dm,MATSHELL); >>> DMCreateMatrix(dm,&A); >>> >>> In fact, I also don't understand the PetscJacobianFunction_JFKN() function It seems to be doing finite differencing on the DMPlexTSComputeRHSFunctionFVM() assuming the current function value is in usr->RHS_ref. How is this different than just letting PETSc/SNES used finite differences to do the matrix-vector product. Your code seems rather complicated with the DMGlobalToLocal() which I don't understand what it is suppose to do there. >>> >>> I think you can just call >>> >>> TSGetSNES() >>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >>> >>> and it will set up an internal matrix that does the finite differencing for you. Then you never need a shell matrix. >>> >>> >>> Also to create the preconditioner matrix B this should work >>> >>> DMSetMatType(dm,MATAIJ); >>> DMCreateMatrix(dm,&B); >>> >>> no need for you to figure out the sizes. >>> >>> >>> Note that both A and B need to have the same dimensions on each process as the global vectors which I don't think your current code has. >>> >>> >>> >>> Barry >>> >>> >>> >>> >>>> On Aug 24, 2020, at 12:56 AM, Thibault Bridel-Bertomeu > wrote: >>>> >>> >>> >>>> Barry, first of all, thank you very much for your detailed answer, I keep reading it to let it soak in - I might come back to you for more details if you do not mind. >>>> >>>> In the meantime, to fuel the conversation, I attach to this e-mail two pdfs containing the pieces of the code that regard what we are discussing. In the *timedisc.pdf, you'll find how I handle the initialization of the TS object, and in the *petscdefs.pdf you'll find the method that calls the TSSolve as well as the methods that are linked to the TS (the timestep adapt, the jacobian etc ...). [Sorry for the quality, I cannot do better than that sort of pdf ...] >>>> >>>> Based on what is in the structured code I sent you the other day, I rewrote the PetscJacobianFunction_JFNK. I think it should be all right, but although it compiles, execution raises a seg fault I think when I do >>>> ierr = TSSetIJacobian(ts, A, A, PetscIJacobian, user); >>>> saying that A does not have the right dimensions. 
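Putting the two suggestions above together, a dimension-safe setup would look roughly like this (a sketch only; PetscJacobianFunction_JFNK and PetscIJacobian are the routines already discussed, user is the application context, and how the mult routine reaches the application data is left out of the sketch):

  Mat A, B;
  DMSetMatType(dm, MATSHELL);
  DMCreateMatrix(dm, &A);   /* A gets the same parallel layout as the global vectors */
  MatShellSetOperation(A, MATOP_MULT, (void (*)(void)) PetscJacobianFunction_JFNK);
  DMSetMatType(dm, MATAIJ);
  DMCreateMatrix(dm, &B);   /* assembled preconditioning matrix with a matching layout */
  TSSetIJacobian(ts, A, B, PetscIJacobian, user);
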
It is quite new, I am still looking into where exactly the error is raised. What do you think of this implementation though, does it look correct in your expert eyes ? >>>> As for what we really discussed so far, it's that PetscComputePreconMatImpl that I do not know how to implement (with the derivative of the jacobian based on the FVM object). >>>> >>>> I understand now that what I am showing you today might not be the right way to go if one wants to really use the PetscFV, but I just wanted to add those code lines to the conversation to have your feedback. >>>> >>>> Thank you again for your help, >>>> >>>> Thibault >>>> >>>> >>> >>> >>>> Le ven. 21 ao?t 2020 ? 19:25, Barry Smith > a ?crit : >>> >>> >>>> >>>> >>>>> On Aug 21, 2020, at 10:58 AM, Thibault Bridel-Bertomeu > wrote: >>>>> >>>>> Thank you Barry for the tip ! I?ll make sure to do that when everything is set. >>>>> What I also meant is that there will not be any more direct way to set the preconditioner than to go through SNESSetJacobian after having assembled everything by hand ? Like, in my case, or in the more general case of fluid dynamics equations, the preconditioner is not a fun matrix to assemble, because for every cell the derivative of the physical flux jacobian has to be taken and put in the right block in the matrix - finite element style if you want. Is there a way to do that with Petsc methods, maybe short-circuiting the FEM based methods ? >>>> >>>> Thibault >>>> >>>> I am not sure what you mean but there are a couple of things that may be helpful. >>>> >>>> PCSHELL https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCSHELL.html <> allows you to build your own preconditioner (that can and often will use one or more of its own Mats, and KSP or PC inside it, or even use another PETScFV etc to build some of the sub matrices for you if it is appropriate), this approach means you never need to construct a "global" PETSc matrix from which PETSc builds the preconditioner. But you should only do this if the conventional approach is not reasonable for your problem. >>>> >>>> MATNEST https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATNEST.html allows you to build a global matrix by building parts of it separately and even skipping parts you decide you don't need in the preconditioner. Conceptually it is the same as just creating a global matrix and filling up but the process is a bit different and something suitable for "multi physics" or "multi-equation" type applications. >>>> >>>> Of course what you put into PCSHELL and MATNEST will affect the convergence of the nonlinear solver. As Jed noted what you put in the "Jacobian" does not have to be directly the same mathematically as what you put into the TSSetI/RHSFunction with the caveat that it does have to appropriate spectral properties to result in a good preconditioner for the "true" Jacobian. >>>> >>>> Couple of other notes: >>>> >>>> The entire business of "Jacobian" matrix-free or not (with for example -snes_fd_operator) is tricky because as Jed noted if your finite volume scheme has non-differential terms such as if () tests. There is a concept of sub-differential for this type of thing but I know absolutely nothing about that and probably not worth investigating. >>>> >>>> In this situation you can avoid the "true" Jacobian completely (both for matrix-vector product and preconditioner) and use something else as Jed suggested a lower order scheme that is differentiable. 
This can work well for solving the nonlinear system or not depending on how suitable it is for your original "function" >>>> >>>> >>>> 1) In theory at least you can have the Jacobian matrix-vector product computed directly using DMPLEX/PETScFV infrastructure (it would apply the Jacobian locally matrix-free using code similar to the code that evaluates the FV "function". I do no know if any of this code is written, it will be more efficient than -snes_mf_operator that evaluates the FV "function" and does traditional differencing to compute the Jacobian. Again it has the problem of non-differentialability if the function is not differential. But it could be done for a different (lower order scheme) that is differentiable. >>>> >>>> 2) You can have PETSc compute the Jacobian explicitly coloring and from that build the preconditioner, this allows you to avoid the hassle of writing the code for the derivatives yourself. This uses finite differences on your function and coloring of the graph to compute many columns of the Jacobian simultaneously and can be pretty efficient. Again if the function is not differential there can be issues of what the result means and will it work in a nonlinear solver. SNESComputeJacobianDefaultColor https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESComputeJacobianDefaultColor.html >>>> >>>> 3) Much more outlandish is to skip Newton and Jacobians completely and use the full approximation scheme SNESFAS https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNESFAS/SNESFAS.html this requires a grid hierarchy and appropriate way to interpolate up through the grid hierarchy your finite volume solutions. Probably not worth investigating unless you have lots of time on your hands and keen interest in this kind of stuff https://arxiv.org/pdf/1607.04254.pdf >>>> >>>> So to summarize, and Matt and Jed can correct my mistakes. >>>> >>>> 1) Form the full Jacobian from the original "function" using analytic approach use it for both the matrix-vector product and to build the preconditioner. Problem if full Jacobian not well defined mathematically. Tough to code, usually not practical. >>>> >>>> 2) Do any matrix free (any way) for the full Jacobian and >>>> >>>> a) build another "approximate" Jacobian (using any technique analytic or finite differences using matrix coloring on a new "lower order" "function") Still can have trouble if this original Jacobian is no well defined >>>> >>>> b) "write your own preconditioner" that internally can use anything in PETSc that approximately solves the Jacobian. Same potential problems if original Jacobian is not differential, plus convergence will depend on how good your own preconditioner approximates the inverse of the true Jacobian. >>>> >>>> 3) Use a lower Jacobian (computed anyway you want) for the matrix-vector product and the preconditioner. The problem of differentiability is gone but convergence of the nonlinear solver depends on how well lower order Jacobian is appropriate for the original "function" >>>> >>>> a) Form the "lower order" Jacobian analytically or with coloring and use for both matrix-vector product and building preconditioner. Note that switching between this and 2a is trivial. >>>> >>>> b) Do the "lower order" Jacobian matrix free and provide your own PCSHELL. Note that switching between this and 2b is trivial. >>>> >>>> Barry >>>> >>>> I would first try competing the "true" Jacobian via coloring, if that works and give satisfactory results (fast enough) then stop. 
>>>> >>>> Then I would do 2a/2b by writing my "function" using PETScFV and writing the "lower order function" via PETScFV and use matrix coloring to get the Jacobian from the second "lower order function". If this works well (either with 2a or 3a or both) then stop or you can compute the "lower order" Jacobian analytically (again using PetscFV) for a more efficient evaluation of the Jacobian. >>>> >>> >>>> >>>>> >>>>> Thanks ! >>>>> >>>>> Thibault >>>>> >>> >>>>> Le ven. 21 ao?t 2020 ? 17:22, Barry Smith > a ?crit : >>> >>> >>>>> >>>>> >>>>>> On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu > wrote: >>>>>> >>>>>> >>>>>> >>>>>> Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley > a ?crit : >>>>>> On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu > wrote: >>>>>> Sorry, I sent too soon, I hit the wrong key. >>>>>> >>>>>> I wanted to say that context.npoints is the local number of cells. >>>>>> >>>>>> PetscRHSFunctionImpl allows to generate the hyperbolic part of the right hand side. >>>>>> Then we have : >>>>>> >>>>>> PetscErrorCode PetscIJacobian( >>>>>> TS ts, /*!< Time stepping object (see PETSc TS)*/ >>>>>> PetscReal t, /*!< Current time */ >>>>>> Vec Y, /*!< Solution vector */ >>>>>> Vec Ydot, /*!< Time-derivative of solution vector */ >>>>>> PetscReal a, /*!< Shift */ >>>>>> Mat A, /*!< Jacobian matrix */ >>>>>> Mat B, /*!< Preconditioning matrix */ >>>>>> void *ctxt /*!< Application context */ >>>>>> ) >>>>>> { >>>>>> PETScContext *context = (PETScContext*) ctxt; >>>>>> HyPar *solver = context->solver; >>>>>> _DECLARE_IERR_; >>>>>> >>>>>> PetscFunctionBegin; >>>>>> solver->count_IJacobian++; >>>>>> context->shift = a; >>>>>> context->waqt = t; >>>>>> /* Construct preconditioning matrix */ >>>>>> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y,context); CHECKERR(ierr); } >>>>>> >>>>>> PetscFunctionReturn(0); >>>>>> } >>>>>> >>>>>> and PetscJacobianFunction_JFNK which I bind to the matrix shell, computes the action of the jacobian on a vector : say U0 is the state of reference and Y the vector upon which to apply the JFNK method, then the PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift comes from the TS). >>>>>> The preconditioning matrix I compute as an approximation to the actual jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in each cell, a 4x4 matrix that is known exactly for the system of equations I am solving, i.e. Euler equations. 
For the structured grid, I can loop on the cells and do that 'Derivative' thing at first order by simply taking a finite-difference like approximation with the neighboring cells, Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by block (JFunction is the dF/dU) >>>>>> >>>>>> /* diagonal element */ >>>>>> >>>>>> >>>>>> <> for (v=0; v>>>>> >>>>>> >>>>>> <> ierr = solver->JFunction (values,(u+nvars*p),solver->physics ,dir,0); >>>>>> >>>>>> >>>>>> <> _ArrayScale1D_ (values,(dxinv*iblank),(nvars*nvars)); >>>>>> >>>>>> >>>>>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>> >>>>>> >>>>>> <> >>>>>> >>>>>> >>>>>> <> /* left neighbor */ >>>>>> >>>>>> >>>>>> <> if (pgL >= 0) { >>>>>> >>>>>> >>>>>> <> for (v=0; v>>>>> >>>>>> >>>>>> <> ierr = solver->JFunction (values,(u+nvars*pL),solver->physics ,dir,1); >>>>>> >>>>>> >>>>>> <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); >>>>>> >>>>>> >>>>>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>> >>>>>> >>>>>> <> } >>>>>> >>>>>> >>>>>> <> >>>>>> >>>>>> >>>>>> <> /* right neighbor */ >>>>>> >>>>>> >>>>>> <> if (pgR >= 0) { >>>>>> >>>>>> >>>>>> <> for (v=0; v>>>>> >>>>>> >>>>>> <> ierr = solver->JFunction (values,(u+nvars*pR),solver->physics ,dir,-1); >>>>>> >>>>>> >>>>>> <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); >>>>>> >>>>>> >>>>>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>> >>>>>> >>>>>> <> } >>>>>> >>>>>> >>>>>> >>>>>> I do not know if I am clear here ... >>>>>> Anyways, I am trying to figure out how to do this shell matrix and this preconditioner using all the FV and DMPlex artillery. >>>>>> >>>>>> Okay, that is very clear. We should be able to get the JFNK just with -snes_mf_operator, and put the approximate J construction in DMPlexComputeJacobian_Internal(). >>>>>> There is an FV section already, and we could just add this. I would need to understand those entries in the pointwise Riemann sense that the other stuff is now. >>>>>> >>>>>> Ok i had a quick look and if I understood correctly it would do the job. Setting the snes-mf-operator flag would mean however that we have to go through SNESSetJacobian to set the jacobian and the preconditioning matrix wouldn't it ? >>>>> >>>>> Thibault, >>>>> >>>>> Since the TS implicit methods end up using SNES internally the option should be available to you without requiring you to be calling the SNES routines directly >>>>> >>>>> Once you have finalized your approach and if for the implicit case you always work in the snes mf operator mode you can hardwire >>>>> >>>>> TSGetSNES(ts,&snes); >>>>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >>>>> >>>>> in your code so you don't need to always provide the option -snes-mf-operator >>> >>>>> >>>>> Barry >>>>> >>>>> >>>>> >>> >>>>>> There might be calls to the Riemann solver to evaluate the dRHS / dU part yes but maybe it's possible to re-use what was computed for the RHS^n ? >>>>>> In the FV section the jacobian is set to identity which I missed before, but it could explain why when I used the following : >>>>>> TSSetType(ts, TSBEULER); >>>>>> DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); >>>>>> DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); >>>>>> with my FV discretization nothing happened, right ? >>>>>> >>>>>> Thank you, >>>>>> >>>>>> Thibault >>>>>> >>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>> >>>>>> Le ven. 21 ao?t 2020 ? 
14:55, Thibault Bridel-Bertomeu > a ?crit : >>> >>> >>>>>> Hi, >>>>>> >>>>>> Thanks Matthew and Jed for your input. >>>>>> I indeed envision an implicit solver in the sense Jed mentioned - Jiri Blazek's book is a nice intro to this concept. >>>>>> >>>>>> Matthew, I do not know exactly what to change right now because although I understand globally what the DMPlexComputeXXXX_Internal methods do, I cannot say for sure line by line what is happening. >>>>>> In a structured code, I have a an implicit FVM solver with PETSc but I do not use any of the FV structure, not even a DM - I just use C arrays that I transform to PETSc Vec and Mat and build my IJacobian and my preconditioner and gives all that to a TS and it runs. I cannot figure out how to do it with the FV and the DM and all the underlying "shortcuts" that I want to use. >>>>>> >>>>>> Here is the top method for the structured code : >>>>>> >>> >>> >>>>>> int total_size = context.npoints * solver->nvars >>>>>> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); CHKERRQ(ierr); >>>>>> SNES snes; >>>>>> KSP ksp; >>>>>> PC pc; >>>>>> SNESType snestype; >>>>>> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >>>>>> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >>>>>> >>>>>> flag_mat_a = 1; >>>>>> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, >>>>>> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >>>>>> context.jfnk_eps = 1e-7; >>>>>> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps,NULL); CHKERRQ(ierr); >>>>>> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void))PetscJacobianFunction_JFNK); CHKERRQ(ierr); >>>>>> ierr = MatSetUp(A); CHKERRQ(ierr); >>>>>> >>>>>> context.flag_use_precon = 0; >>>>>> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool*)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >>>>>> >>>>>> /* Set up preconditioner matrix */ >>>>>> flag_mat_b = 1; >>>>>> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE,PETSC_DETERMINE, >>> >>>>>> (solver->ndims*2+1)*solver->nvars,NULL, >>>>>> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >>>>>> ierr = MatSetBlockSize(B,solver->nvars); >>>>>> /* Set the RHSJacobian function for TS */ >>> >>> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); >>> >>>>>> Thibault Bridel-Bertomeu >>>>>> ? >>>>>> Eng, MSc, PhD >>>>>> Research Engineer >>>>>> CEA/CESTA >>>>>> 33114 LE BARP >>>>>> Tel.: (+33)557046924 >>>>>> Mob.: (+33)611025322 >>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>> >>> >>> >>>>>> >>>>>> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown > a ?crit : >>>>>> Matthew Knepley > writes: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> > I could never get the FVM stuff to make sense to me for implicit methods. >>>>>> >>>>>> >>>>>> > Here is my problem understanding. If you have an FVM method, it decides >>>>>> >>>>>> >>>>>> > to move "stuff" from one cell to its neighboring cells depending on the >>>>>> >>>>>> >>>>>> > solution to the Riemann problem on each face, which computed the flux. This >>>>>> >>>>>> >>>>>> > is >>>>>> >>>>>> >>>>>> > fine unless the timestep is so big that material can flow through into the >>>>>> >>>>>> >>>>>> > cells beyond the neighbor. Then I should have considered the effect of the >>>>>> >>>>>> >>>>>> > Riemann problem for those interfaces. That would be in the Jacobian, but I >>>>>> >>>>>> >>>>>> > don't know how to compute that Jacobian. 
I guess you could do everything >>>>>> >>>>>> >>>>>> > matrix-free, but without a preconditioner it seems hard. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> So long as we're using method of lines, the flux is just instantaneous flux, not integrated over some time step. It has the same meaning for implicit and explicit. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> An explicit method would be unstable if you took such a large time step (CFL) and an implicit method will not simultaneously be SSP and higher than first order, but it's still a consistent discretization of the problem. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> It's common (done in FUN3D and others) to precondition with a first-order method, where gradient reconstruction/limiting is skipped. That's what I'd recommend because limiting creates nasty nonlinearities and the resulting discretizations lack h-ellipticity which makes them very hard to solve. >>>>>> >>>>>> >>>>>> >>>>>> >>> >>>>>> >>>>>> >>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>> >>>>> -- >>>>> Thibault Bridel-Bertomeu >>>>> ? >>>>> Eng, MSc, PhD >>>>> Research Engineer >>>>> CEA/CESTA >>>>> 33114 LE BARP >>>>> Tel.: (+33)557046924 >>>>> Mob.: (+33)611025322 >>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>> >>>>> >>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> -- >>> Thibault Bridel-Bertomeu >>> ? >>> Eng, MSc, PhD >>> Research Engineer >>> CEA/CESTA >>> 33114 LE BARP >>> Tel.: (+33)557046924 >>> Mob.: (+33)611025322 >>> Mail: thibault.bridelbertomeu at gmail.com >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 25 08:23:27 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 25 Aug 2020 08:23:27 -0500 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid In-Reply-To: References: <4A6C7C21-E4AB-45AE-ABAA-D9028622B66C@petsc.dev> <04AF3F3C-47D5-49C0-8367-C43B7A1811D0@petsc.dev> <19B4C575-D633-4088-830F-12AC84C84EAE@petsc.dev> Message-ID: Sounds like it might be a compiler problem generating bad code. On the machine where it fails you can run with -fp_trap to have it error out as soon as a Nan or Inf appears. If you can use the debugger on that machine you can tell the debugger to catch floating point exceptions and see the exact line an values of variables where a Nan or Inf appear. As Matt conjectured it is likely there is a divide by zero before PETSc detects and it may be helpful to find out exactly where that happens. Barry > On Aug 25, 2020, at 8:03 AM, Alfredo Jaramillo wrote: > > Yes, Barry, that is correct. > > > > On Tue, Aug 25, 2020 at 1:02 AM Barry Smith > wrote: > > On one system you get this error, on another system with the identical code and test case you do not get the error? > > You get it with three iterative methods but not with MUMPS? > > Barry > > >> On Aug 24, 2020, at 8:35 PM, Alfredo Jaramillo > wrote: >> >> Hello Barry, Matthew, thanks for the replies ! >> >> Yes, it is our custom code, and it also happens when setting -pc_type bjacobi. Before testing an iterative solver, we were using MUMPS (-ksp_type preonly -ksp_pc_type lu -pc_factor_mat_solver_type mumps) without issues. >> >> Running the ex19 (as "mpirun -n 4 ex19 -da_refine 5") did not produce any problem. 
>> >> To reproduce the situation on my computer, I was able to reproduce the error for a small case and -pc_type bjacobi. For that particular case, when running in the cluster the error appears at the very last iteration: >> >> ===== >> 27 KSP Residual norm 8.230378644666e-06 >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Invalid argument >> [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 >> ==== >> >> whereas running on my computer the error is not launched and convergence is reached instead: >> >> ==== >> Linear interp_ solve converged due to CONVERGED_RTOL iterations 27 >> ==== >> >> I will run valgrind to seek for possible memory corruptions. >> >> thank you >> Alfredo >> >> On Mon, Aug 24, 2020 at 9:00 PM Barry Smith > wrote: >> >> Oh yes, it could happen with Nan. >> >> KSPGMRESClassicalGramSchmidtOrthogonalization() calls KSPCheckDot(ksp,lhh[j]); so should detect any NAN that appear and set ksp->convergedreason but the call to MAXPY() is still made before returning and hence producing the error message. >> >> We should circuit the orthogonalization as soon as it sees a Nan/Inf and return immediately for GMRES to cleanup and produce a very useful error message. >> >> Alfredo, >> >> It is also possible that the hypre preconditioners are producing a Nan because your matrix is too difficult for them to handle, but it would be odd to happen after many iterations. >> >> As I suggested before run with -pc_type bjacobi to see if you get the same problem. >> >> Barry >> >> >>> On Aug 24, 2020, at 6:38 PM, Matthew Knepley > wrote: >>> >>> On Mon, Aug 24, 2020 at 6:27 PM Barry Smith > wrote: >>> >>> Alfredo, >>> >>> This should never happen. The input to the VecMAXPY in gmres is computed via VMDot which produces the same result on all processes. >>> >>> If you run with -pc_type bjacobi does it also happen? >>> >>> Is this your custom code or does it happen in PETSc examples also? Like src/snes/tutorials/ex19 -da_refine 5 >>> >>> Could be memory corruption, can you run under valgrind? >>> >>> Couldn't it happen if something generates a NaN? That also should not happen, but I was allowing that pilut might do it. >>> >>> Thanks, >>> >>> Matt >>> >>> Barry >>> >>> >>> > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo > wrote: >>> > >>> > Dear PETSc developers, >>> > >>> > I'm trying to solve a linear problem with GMRES preconditioned with pilut from HYPRE. For this I'm using the options: >>> > >>> > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor >>> > >>> > If I use a single core, GMRES (+ pilut or euclid) converges. However, when using multiple cores the next error appears after some number of iterations: >>> > >>> > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 >>> > >>> > relative to the function VecMAXPY. I attached a screenshot with more detailed output. The same happens when using euclid. Can you please give me some insight on this? >>> > >>> > best regards >>> > Alfredo >>> > >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ajaramillopalma at gmail.com Tue Aug 25 08:55:01 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Tue, 25 Aug 2020 10:55:01 -0300 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid In-Reply-To: References: <4A6C7C21-E4AB-45AE-ABAA-D9028622B66C@petsc.dev> <04AF3F3C-47D5-49C0-8367-C43B7A1811D0@petsc.dev> <19B4C575-D633-4088-830F-12AC84C84EAE@petsc.dev> Message-ID: In fact, on my machine the code is compiled with gnu, and on the cluster it is compiled with intel (2015) compilers. I just run the program with "-fp_trap" and got: =============================================================== |> Assembling interface problem. Unk # 56 |> Solving interface problem Residual norms for interp_ solve. 0 KSP Residual norm 3.642615470862e+03 [0]PETSC ERROR: *** unknown floating point error occurred *** [0]PETSC ERROR: The specific exception can be determined by running in a debugger. When the [0]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3f) [0]PETSC ERROR: where the result is a bitwise OR of the following flags: [0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20 [0]PETSC ERROR: Try option -start_in_debugger [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [1]PETSC ERROR: [2]PETSC ERROR: *** unknown floating point error occurred *** [3]PETSC ERROR: *** unknown floating point error occurred *** [3]PETSC ERROR: The specific exception can be determined by running in a debugger. When the [4]PETSC ERROR: *** unknown floating point error occurred *** [4]PETSC ERROR: The specific exception can be determined by running in a debugger. When the [4]PETSC ERROR: [5]PETSC ERROR: *** unknown floating point error occurred *** [5]PETSC ERROR: The specific exception can be determined by running in a debugger. When the [5]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3f) [5]PETSC ERROR: where the result is a bitwise OR of the following flags: [6]PETSC ERROR: *** unknown floating point error occurred *** [6]PETSC ERROR: The specific exception can be determined by running in a debugger. When the [6]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3f) [6]PETSC ERROR: where the result is a bitwise OR of the following flags: [6]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20 [7]PETSC ERROR: *** unknown floating point error occurred *** [7]PETSC ERROR: The specific exception can be determined by running in a debugger. When the [7]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3f) [7]PETSC ERROR: where the result is a bitwise OR of the following flags: [7]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20 [7]PETSC ERROR: Try option -start_in_debugger [7]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given. 
[0]PETSC ERROR: [0] PetscDefaultFPTrap line 355 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/sys/error/fp.c [0]PETSC ERROR: [0] VecMDot line 1154 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/vec/vec/interface/rvector.c [0]PETSC ERROR: [0] KSPGMRESClassicalGramSchmidtOrthogonalization line 44 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/impls/gmres/borthog2.c [0]PETSC ERROR: [0] KSPGMRESCycle line 122 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: [0] KSPSolve_GMRES line 225 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: [0] KSPSolve_Private line 590 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: *** unknown floating point error occurred *** =============================================================== So it seems that in fact a division by 0 is taking place. I will try to run this in debug mode. thanks Alfredo On Tue, Aug 25, 2020 at 10:23 AM Barry Smith wrote: > > Sounds like it might be a compiler problem generating bad code. > > On the machine where it fails you can run with -fp_trap to have it error > out as soon as a Nan or Inf appears. If you can use the debugger on that > machine you can tell the debugger to catch floating point exceptions and > see the exact line an values of variables where a Nan or Inf appear. > > As Matt conjectured it is likely there is a divide by zero before PETSc > detects and it may be helpful to find out exactly where that happens. > > Barry > > > On Aug 25, 2020, at 8:03 AM, Alfredo Jaramillo > wrote: > > Yes, Barry, that is correct. > > > > On Tue, Aug 25, 2020 at 1:02 AM Barry Smith wrote: > >> >> On one system you get this error, on another system with the identical >> code and test case you do not get the error? >> >> You get it with three iterative methods but not with MUMPS? >> >> Barry >> >> >> On Aug 24, 2020, at 8:35 PM, Alfredo Jaramillo >> wrote: >> >> Hello Barry, Matthew, thanks for the replies ! >> >> Yes, it is our custom code, and it also happens when setting -pc_type >> bjacobi. Before testing an iterative solver, we were using MUMPS (-ksp_type >> preonly -ksp_pc_type lu -pc_factor_mat_solver_type mumps) without issues. >> >> Running the ex19 (as "mpirun -n 4 ex19 -da_refine 5") did not produce any >> problem. >> >> To reproduce the situation on my computer, I was able to reproduce the >> error for a small case and -pc_type bjacobi. For that particular case, when >> running in the cluster the error appears at the very last iteration: >> >> ===== >> 27 KSP Residual norm 8.230378644666e-06 >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Invalid argument >> [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 >> ==== >> >> whereas running on my computer the error is not launched and convergence >> is reached instead: >> >> ==== >> Linear interp_ solve converged due to CONVERGED_RTOL iterations 27 >> ==== >> >> I will run valgrind to seek for possible memory corruptions. >> >> thank you >> Alfredo >> >> On Mon, Aug 24, 2020 at 9:00 PM Barry Smith wrote: >> >>> >>> Oh yes, it could happen with Nan. >>> >>> KSPGMRESClassicalGramSchmidtOrthogonalization() >>> calls KSPCheckDot(ksp,lhh[j]); so should detect any NAN that appear and >>> set ksp->convergedreason but the call to MAXPY() is still made before >>> returning and hence producing the error message. 
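Since KSPCheckDot() sets ksp->convergedreason when it sees the NaN, the calling code can query that reason after the solve instead of hitting the argument check, once the early error is avoided (a sketch; ksp, b and x are the objects of the interface solve):

  KSPConvergedReason reason;
  KSPSolve(ksp, b, x);
  KSPGetConvergedReason(ksp, &reason);
  if (reason == KSP_DIVERGED_NANORINF) {
    /* a Nan/Inf appeared during the iteration, e.g. produced by the preconditioner:
       stop cleanly or fall back to another preconditioner here */
  }
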
>>> >>> We should circuit the orthogonalization as soon as it sees a Nan/Inf >>> and return immediately for GMRES to cleanup and produce a very useful error >>> message. >>> >>> Alfredo, >>> >>> It is also possible that the hypre preconditioners are producing a >>> Nan because your matrix is too difficult for them to handle, but it would >>> be odd to happen after many iterations. >>> >>> As I suggested before run with -pc_type bjacobi to see if you get the >>> same problem. >>> >>> Barry >>> >>> >>> On Aug 24, 2020, at 6:38 PM, Matthew Knepley wrote: >>> >>> On Mon, Aug 24, 2020 at 6:27 PM Barry Smith wrote: >>> >>>> >>>> Alfredo, >>>> >>>> This should never happen. The input to the VecMAXPY in gmres is >>>> computed via VMDot which produces the same result on all processes. >>>> >>>> If you run with -pc_type bjacobi does it also happen? >>>> >>>> Is this your custom code or does it happen in PETSc examples >>>> also? Like src/snes/tutorials/ex19 -da_refine 5 >>>> >>>> Could be memory corruption, can you run under valgrind? >>>> >>> >>> Couldn't it happen if something generates a NaN? That also should not >>> happen, but I was allowing that pilut might do it. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Barry >>>> >>>> >>>> > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo < >>>> ajaramillopalma at gmail.com> wrote: >>>> > >>>> > Dear PETSc developers, >>>> > >>>> > I'm trying to solve a linear problem with GMRES preconditioned with >>>> pilut from HYPRE. For this I'm using the options: >>>> > >>>> > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor >>>> > >>>> > If I use a single core, GMRES (+ pilut or euclid) converges. However, >>>> when using multiple cores the next error appears after some number of >>>> iterations: >>>> > >>>> > [0]PETSC ERROR: Scalar value must be same on all processes, argument >>>> # 3 >>>> > >>>> > relative to the function VecMAXPY. I attached a screenshot with more >>>> detailed output. The same happens when using euclid. Can you please give me >>>> some insight on this? >>>> > >>>> > best regards >>>> > Alfredo >>>> > >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 25 16:46:37 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 25 Aug 2020 16:46:37 -0500 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid In-Reply-To: References: <4A6C7C21-E4AB-45AE-ABAA-D9028622B66C@petsc.dev> <04AF3F3C-47D5-49C0-8367-C43B7A1811D0@petsc.dev> <19B4C575-D633-4088-830F-12AC84C84EAE@petsc.dev> Message-ID: <9078A3A8-C7A6-4B30-9D16-475C86520491@petsc.dev> I have submitted a merge request https://gitlab.com/petsc/petsc/-/merge_requests/3096 that will make the error handling and message clearer in the future. Barry > On Aug 25, 2020, at 8:55 AM, Alfredo Jaramillo wrote: > > In fact, on my machine the code is compiled with gnu, and on the cluster it is compiled with intel (2015) compilers. I just run the program with "-fp_trap" and got: > > =============================================================== > |> Assembling interface problem. Unk # 56 > |> Solving interface problem > Residual norms for interp_ solve. 
> 0 KSP Residual norm 3.642615470862e+03 > [0]PETSC ERROR: *** unknown floating point error occurred *** > [0]PETSC ERROR: The specific exception can be determined by running in a debugger. When the > [0]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3f) > [0]PETSC ERROR: where the result is a bitwise OR of the following flags: > [0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20 > [0]PETSC ERROR: Try option -start_in_debugger > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [1]PETSC ERROR: [2]PETSC ERROR: *** unknown floating point error occurred *** > [3]PETSC ERROR: *** unknown floating point error occurred *** > [3]PETSC ERROR: The specific exception can be determined by running in a debugger. When the > [4]PETSC ERROR: *** unknown floating point error occurred *** > [4]PETSC ERROR: The specific exception can be determined by running in a debugger. When the > [4]PETSC ERROR: [5]PETSC ERROR: *** unknown floating point error occurred *** > [5]PETSC ERROR: The specific exception can be determined by running in a debugger. When the > [5]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3f) > [5]PETSC ERROR: where the result is a bitwise OR of the following flags: > [6]PETSC ERROR: *** unknown floating point error occurred *** > [6]PETSC ERROR: The specific exception can be determined by running in a debugger. When the > [6]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3f) > [6]PETSC ERROR: where the result is a bitwise OR of the following flags: > [6]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20 > [7]PETSC ERROR: *** unknown floating point error occurred *** > [7]PETSC ERROR: The specific exception can be determined by running in a debugger. When the > [7]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3f) > [7]PETSC ERROR: where the result is a bitwise OR of the following flags: > [7]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20 > [7]PETSC ERROR: Try option -start_in_debugger > [7]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] PetscDefaultFPTrap line 355 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/sys/error/fp.c > [0]PETSC ERROR: [0] VecMDot line 1154 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/vec/vec/interface/rvector.c > [0]PETSC ERROR: [0] KSPGMRESClassicalGramSchmidtOrthogonalization line 44 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/impls/gmres/borthog2.c > [0]PETSC ERROR: [0] KSPGMRESCycle line 122 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve_GMRES line 225 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve_Private line 590 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: *** unknown floating point error occurred *** > =============================================================== > > So it seems that in fact a division by 0 is taking place. I will try to run this in debug mode. 
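For reference, a minimal sketch of the two diagnostics discussed in this thread: turning on floating-point trapping from the code (the same effect as running with -fp_trap) and checking after the solve whether the Krylov method stopped because a NaN/Inf appeared. The tiny diagonal system below is only a placeholder so the example compiles and runs on its own; the real application would assemble its own matrix and right-hand side.

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    KSP                ksp;
    Mat                A;
    Vec                x, b;
    KSPConvergedReason reason;
    PetscInt           i, n = 10;
    PetscErrorCode     ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
    ierr = PetscSetFPTrap(PETSC_FP_TRAP_ON);CHKERRQ(ierr);   /* same effect as -fp_trap: abort at the first FPE */

    /* placeholder system; the application assembles its own A and b */
    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);
    ierr = MatSetUp(A);CHKERRQ(ierr);
    for (i = 0; i < n; i++) {ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);}
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
    ierr = VecSet(b, 1.0);CHKERRQ(ierr);

    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);             /* e.g. -ksp_type gmres -pc_type bjacobi */
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

    ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
    if (reason == KSP_DIVERGED_NANORINF) {
      ierr = PetscPrintf(PETSC_COMM_WORLD, "the solve produced a NaN or Inf\n");CHKERRQ(ierr);
    }

    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&b);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

With trapping enabled the run aborts with a stack like the one shown above at the first invalid operation; without it, the converged-reason check still reports that a NaN/Inf was detected during the iteration.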
> > thanks > Alfredo > > On Tue, Aug 25, 2020 at 10:23 AM Barry Smith > wrote: > > Sounds like it might be a compiler problem generating bad code. > > On the machine where it fails you can run with -fp_trap to have it error out as soon as a Nan or Inf appears. If you can use the debugger on that machine you can tell the debugger to catch floating point exceptions and see the exact line an values of variables where a Nan or Inf appear. > > As Matt conjectured it is likely there is a divide by zero before PETSc detects and it may be helpful to find out exactly where that happens. > > Barry > > >> On Aug 25, 2020, at 8:03 AM, Alfredo Jaramillo > wrote: >> >> Yes, Barry, that is correct. >> >> >> >> On Tue, Aug 25, 2020 at 1:02 AM Barry Smith > wrote: >> >> On one system you get this error, on another system with the identical code and test case you do not get the error? >> >> You get it with three iterative methods but not with MUMPS? >> >> Barry >> >> >>> On Aug 24, 2020, at 8:35 PM, Alfredo Jaramillo > wrote: >>> >>> Hello Barry, Matthew, thanks for the replies ! >>> >>> Yes, it is our custom code, and it also happens when setting -pc_type bjacobi. Before testing an iterative solver, we were using MUMPS (-ksp_type preonly -ksp_pc_type lu -pc_factor_mat_solver_type mumps) without issues. >>> >>> Running the ex19 (as "mpirun -n 4 ex19 -da_refine 5") did not produce any problem. >>> >>> To reproduce the situation on my computer, I was able to reproduce the error for a small case and -pc_type bjacobi. For that particular case, when running in the cluster the error appears at the very last iteration: >>> >>> ===== >>> 27 KSP Residual norm 8.230378644666e-06 >>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [0]PETSC ERROR: Invalid argument >>> [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 >>> ==== >>> >>> whereas running on my computer the error is not launched and convergence is reached instead: >>> >>> ==== >>> Linear interp_ solve converged due to CONVERGED_RTOL iterations 27 >>> ==== >>> >>> I will run valgrind to seek for possible memory corruptions. >>> >>> thank you >>> Alfredo >>> >>> On Mon, Aug 24, 2020 at 9:00 PM Barry Smith > wrote: >>> >>> Oh yes, it could happen with Nan. >>> >>> KSPGMRESClassicalGramSchmidtOrthogonalization() calls KSPCheckDot(ksp,lhh[j]); so should detect any NAN that appear and set ksp->convergedreason but the call to MAXPY() is still made before returning and hence producing the error message. >>> >>> We should circuit the orthogonalization as soon as it sees a Nan/Inf and return immediately for GMRES to cleanup and produce a very useful error message. >>> >>> Alfredo, >>> >>> It is also possible that the hypre preconditioners are producing a Nan because your matrix is too difficult for them to handle, but it would be odd to happen after many iterations. >>> >>> As I suggested before run with -pc_type bjacobi to see if you get the same problem. >>> >>> Barry >>> >>> >>>> On Aug 24, 2020, at 6:38 PM, Matthew Knepley > wrote: >>>> >>>> On Mon, Aug 24, 2020 at 6:27 PM Barry Smith > wrote: >>>> >>>> Alfredo, >>>> >>>> This should never happen. The input to the VecMAXPY in gmres is computed via VMDot which produces the same result on all processes. >>>> >>>> If you run with -pc_type bjacobi does it also happen? >>>> >>>> Is this your custom code or does it happen in PETSc examples also? 
Like src/snes/tutorials/ex19 -da_refine 5 >>>> >>>> Could be memory corruption, can you run under valgrind? >>>> >>>> Couldn't it happen if something generates a NaN? That also should not happen, but I was allowing that pilut might do it. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> Barry >>>> >>>> >>>> > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo > wrote: >>>> > >>>> > Dear PETSc developers, >>>> > >>>> > I'm trying to solve a linear problem with GMRES preconditioned with pilut from HYPRE. For this I'm using the options: >>>> > >>>> > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor >>>> > >>>> > If I use a single core, GMRES (+ pilut or euclid) converges. However, when using multiple cores the next error appears after some number of iterations: >>>> > >>>> > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 >>>> > >>>> > relative to the function VecMAXPY. I attached a screenshot with more detailed output. The same happens when using euclid. Can you please give me some insight on this? >>>> > >>>> > best regards >>>> > Alfredo >>>> > >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajaramillopalma at gmail.com Tue Aug 25 16:54:37 2020 From: ajaramillopalma at gmail.com (Alfredo Jaramillo) Date: Tue, 25 Aug 2020 18:54:37 -0300 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid In-Reply-To: <9078A3A8-C7A6-4B30-9D16-475C86520491@petsc.dev> References: <4A6C7C21-E4AB-45AE-ABAA-D9028622B66C@petsc.dev> <04AF3F3C-47D5-49C0-8367-C43B7A1811D0@petsc.dev> <19B4C575-D633-4088-830F-12AC84C84EAE@petsc.dev> <9078A3A8-C7A6-4B30-9D16-475C86520491@petsc.dev> Message-ID: thank you, Barry, I wasn't able to reproduce the error on my computer, neither on a second cluster. On the first cluster, I requested to activate X11 at some node for attaching a debugger, and that activation (if possible) should take some time. I will inform you of any news on that. kind regards Alfredo On Tue, Aug 25, 2020 at 6:46 PM Barry Smith wrote: > > I have submitted a merge request > https://gitlab.com/petsc/petsc/-/merge_requests/3096 that will make the > error handling and message clearer in the future. > > Barry > > > On Aug 25, 2020, at 8:55 AM, Alfredo Jaramillo > wrote: > > In fact, on my machine the code is compiled with gnu, and on the cluster > it is compiled with intel (2015) compilers. I just run the program with > "-fp_trap" and got: > > =============================================================== > |> Assembling interface problem. Unk # 56 > |> Solving interface problem > Residual norms for interp_ solve. > 0 KSP Residual norm 3.642615470862e+03 > [0]PETSC ERROR: *** unknown floating point error occurred *** > [0]PETSC ERROR: The specific exception can be determined by running in a > debugger. 
When the > [0]PETSC ERROR: debugger traps the signal, the exception can be found with > fetestexcept(0x3f) > [0]PETSC ERROR: where the result is a bitwise OR of the following flags: > [0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 > FE_UNDERFLOW=0x10 FE_INEXACT=0x20 > [0]PETSC ERROR: Try option -start_in_debugger > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [1]PETSC ERROR: [2]PETSC ERROR: *** unknown floating point error occurred > *** > [3]PETSC ERROR: *** unknown floating point error occurred *** > [3]PETSC ERROR: The specific exception can be determined by running in a > debugger. When the > [4]PETSC ERROR: *** unknown floating point error occurred *** > [4]PETSC ERROR: The specific exception can be determined by running in a > debugger. When the > [4]PETSC ERROR: [5]PETSC ERROR: *** unknown floating point error occurred > *** > [5]PETSC ERROR: The specific exception can be determined by running in a > debugger. When the > [5]PETSC ERROR: debugger traps the signal, the exception can be found with > fetestexcept(0x3f) > [5]PETSC ERROR: where the result is a bitwise OR of the following flags: > [6]PETSC ERROR: *** unknown floating point error occurred *** > [6]PETSC ERROR: The specific exception can be determined by running in a > debugger. When the > [6]PETSC ERROR: debugger traps the signal, the exception can be found with > fetestexcept(0x3f) > [6]PETSC ERROR: where the result is a bitwise OR of the following flags: > [6]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 > FE_UNDERFLOW=0x10 FE_INEXACT=0x20 > [7]PETSC ERROR: *** unknown floating point error occurred *** > [7]PETSC ERROR: The specific exception can be determined by running in a > debugger. When the > [7]PETSC ERROR: debugger traps the signal, the exception can be found with > fetestexcept(0x3f) > [7]PETSC ERROR: where the result is a bitwise OR of the following flags: > [7]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 > FE_UNDERFLOW=0x10 FE_INEXACT=0x20 > [7]PETSC ERROR: Try option -start_in_debugger > [7]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] PetscDefaultFPTrap line 355 > /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/sys/error/fp.c > [0]PETSC ERROR: [0] VecMDot line 1154 > /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/vec/vec/interface/rvector.c > [0]PETSC ERROR: [0] KSPGMRESClassicalGramSchmidtOrthogonalization line 44 > /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/impls/gmres/borthog2.c > [0]PETSC ERROR: [0] KSPGMRESCycle line 122 > /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve_GMRES line 225 > /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPSolve_Private line 590 > /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: *** unknown floating point error occurred *** > =============================================================== > > So it seems that in fact a division by 0 is taking place. I will try to > run this in debug mode. > > thanks > Alfredo > > On Tue, Aug 25, 2020 at 10:23 AM Barry Smith wrote: > >> >> Sounds like it might be a compiler problem generating bad code. 
>> >> On the machine where it fails you can run with -fp_trap to have it >> error out as soon as a Nan or Inf appears. If you can use the debugger on >> that machine you can tell the debugger to catch floating point exceptions >> and see the exact line an values of variables where a Nan or Inf appear. >> >> As Matt conjectured it is likely there is a divide by zero before >> PETSc detects and it may be helpful to find out exactly where that happens. >> >> Barry >> >> >> On Aug 25, 2020, at 8:03 AM, Alfredo Jaramillo >> wrote: >> >> Yes, Barry, that is correct. >> >> >> >> On Tue, Aug 25, 2020 at 1:02 AM Barry Smith wrote: >> >>> >>> On one system you get this error, on another system with the identical >>> code and test case you do not get the error? >>> >>> You get it with three iterative methods but not with MUMPS? >>> >>> Barry >>> >>> >>> On Aug 24, 2020, at 8:35 PM, Alfredo Jaramillo < >>> ajaramillopalma at gmail.com> wrote: >>> >>> Hello Barry, Matthew, thanks for the replies ! >>> >>> Yes, it is our custom code, and it also happens when setting -pc_type >>> bjacobi. Before testing an iterative solver, we were using MUMPS (-ksp_type >>> preonly -ksp_pc_type lu -pc_factor_mat_solver_type mumps) without issues. >>> >>> Running the ex19 (as "mpirun -n 4 ex19 -da_refine 5") did not produce >>> any problem. >>> >>> To reproduce the situation on my computer, I was able to reproduce the >>> error for a small case and -pc_type bjacobi. For that particular case, when >>> running in the cluster the error appears at the very last iteration: >>> >>> ===== >>> 27 KSP Residual norm 8.230378644666e-06 >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: Invalid argument >>> [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 >>> ==== >>> >>> whereas running on my computer the error is not launched and convergence >>> is reached instead: >>> >>> ==== >>> Linear interp_ solve converged due to CONVERGED_RTOL iterations 27 >>> ==== >>> >>> I will run valgrind to seek for possible memory corruptions. >>> >>> thank you >>> Alfredo >>> >>> On Mon, Aug 24, 2020 at 9:00 PM Barry Smith wrote: >>> >>>> >>>> Oh yes, it could happen with Nan. >>>> >>>> KSPGMRESClassicalGramSchmidtOrthogonalization() >>>> calls KSPCheckDot(ksp,lhh[j]); so should detect any NAN that appear and >>>> set ksp->convergedreason but the call to MAXPY() is still made before >>>> returning and hence producing the error message. >>>> >>>> We should circuit the orthogonalization as soon as it sees a Nan/Inf >>>> and return immediately for GMRES to cleanup and produce a very useful error >>>> message. >>>> >>>> Alfredo, >>>> >>>> It is also possible that the hypre preconditioners are producing a >>>> Nan because your matrix is too difficult for them to handle, but it would >>>> be odd to happen after many iterations. >>>> >>>> As I suggested before run with -pc_type bjacobi to see if you get >>>> the same problem. >>>> >>>> Barry >>>> >>>> >>>> On Aug 24, 2020, at 6:38 PM, Matthew Knepley wrote: >>>> >>>> On Mon, Aug 24, 2020 at 6:27 PM Barry Smith wrote: >>>> >>>>> >>>>> Alfredo, >>>>> >>>>> This should never happen. The input to the VecMAXPY in gmres is >>>>> computed via VMDot which produces the same result on all processes. >>>>> >>>>> If you run with -pc_type bjacobi does it also happen? >>>>> >>>>> Is this your custom code or does it happen in PETSc examples >>>>> also? 
Like src/snes/tutorials/ex19 -da_refine 5 >>>>> >>>>> Could be memory corruption, can you run under valgrind? >>>>> >>>> >>>> Couldn't it happen if something generates a NaN? That also should not >>>> happen, but I was allowing that pilut might do it. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Barry >>>>> >>>>> >>>>> > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo < >>>>> ajaramillopalma at gmail.com> wrote: >>>>> > >>>>> > Dear PETSc developers, >>>>> > >>>>> > I'm trying to solve a linear problem with GMRES preconditioned with >>>>> pilut from HYPRE. For this I'm using the options: >>>>> > >>>>> > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor >>>>> > >>>>> > If I use a single core, GMRES (+ pilut or euclid) converges. >>>>> However, when using multiple cores the next error appears after some number >>>>> of iterations: >>>>> > >>>>> > [0]PETSC ERROR: Scalar value must be same on all processes, argument >>>>> # 3 >>>>> > >>>>> > relative to the function VecMAXPY. I attached a screenshot with more >>>>> detailed output. The same happens when using euclid. Can you please give me >>>>> some insight on this? >>>>> > >>>>> > best regards >>>>> > Alfredo >>>>> > >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 25 18:14:44 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 25 Aug 2020 18:14:44 -0500 Subject: [petsc-users] error when solving a linear system with gmres + pilut/euclid In-Reply-To: References: <4A6C7C21-E4AB-45AE-ABAA-D9028622B66C@petsc.dev> <04AF3F3C-47D5-49C0-8367-C43B7A1811D0@petsc.dev> <19B4C575-D633-4088-830F-12AC84C84EAE@petsc.dev> <9078A3A8-C7A6-4B30-9D16-475C86520491@petsc.dev> Message-ID: Irony, the more one pays for a machine the more difficult it is to debug on. > On Aug 25, 2020, at 4:54 PM, Alfredo Jaramillo wrote: > > thank you, Barry, > > I wasn't able to reproduce the error on my computer, neither on a second cluster. On the first cluster, I requested to activate X11 at some node for attaching a debugger, and that activation (if possible) should take some time. > I will inform you of any news on that. > > kind regards > Alfredo > > > > On Tue, Aug 25, 2020 at 6:46 PM Barry Smith > wrote: > > I have submitted a merge request https://gitlab.com/petsc/petsc/-/merge_requests/3096 that will make the error handling and message clearer in the future. > > Barry > > >> On Aug 25, 2020, at 8:55 AM, Alfredo Jaramillo > wrote: >> >> In fact, on my machine the code is compiled with gnu, and on the cluster it is compiled with intel (2015) compilers. I just run the program with "-fp_trap" and got: >> >> =============================================================== >> |> Assembling interface problem. Unk # 56 >> |> Solving interface problem >> Residual norms for interp_ solve. >> 0 KSP Residual norm 3.642615470862e+03 >> [0]PETSC ERROR: *** unknown floating point error occurred *** >> [0]PETSC ERROR: The specific exception can be determined by running in a debugger. 
When the >> [0]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3f) >> [0]PETSC ERROR: where the result is a bitwise OR of the following flags: >> [0]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20 >> [0]PETSC ERROR: Try option -start_in_debugger >> [0]PETSC ERROR: likely location of problem given in stack below >> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >> [1]PETSC ERROR: [2]PETSC ERROR: *** unknown floating point error occurred *** >> [3]PETSC ERROR: *** unknown floating point error occurred *** >> [3]PETSC ERROR: The specific exception can be determined by running in a debugger. When the >> [4]PETSC ERROR: *** unknown floating point error occurred *** >> [4]PETSC ERROR: The specific exception can be determined by running in a debugger. When the >> [4]PETSC ERROR: [5]PETSC ERROR: *** unknown floating point error occurred *** >> [5]PETSC ERROR: The specific exception can be determined by running in a debugger. When the >> [5]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3f) >> [5]PETSC ERROR: where the result is a bitwise OR of the following flags: >> [6]PETSC ERROR: *** unknown floating point error occurred *** >> [6]PETSC ERROR: The specific exception can be determined by running in a debugger. When the >> [6]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3f) >> [6]PETSC ERROR: where the result is a bitwise OR of the following flags: >> [6]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20 >> [7]PETSC ERROR: *** unknown floating point error occurred *** >> [7]PETSC ERROR: The specific exception can be determined by running in a debugger. When the >> [7]PETSC ERROR: debugger traps the signal, the exception can be found with fetestexcept(0x3f) >> [7]PETSC ERROR: where the result is a bitwise OR of the following flags: >> [7]PETSC ERROR: FE_INVALID=0x1 FE_DIVBYZERO=0x4 FE_OVERFLOW=0x8 FE_UNDERFLOW=0x10 FE_INEXACT=0x20 >> [7]PETSC ERROR: Try option -start_in_debugger >> [7]PETSC ERROR: likely location of problem given in stack below >> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >> [0]PETSC ERROR: INSTEAD the line number of the start of the function >> [0]PETSC ERROR: is given. >> [0]PETSC ERROR: [0] PetscDefaultFPTrap line 355 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/sys/error/fp.c >> [0]PETSC ERROR: [0] VecMDot line 1154 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/vec/vec/interface/rvector.c >> [0]PETSC ERROR: [0] KSPGMRESClassicalGramSchmidtOrthogonalization line 44 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/impls/gmres/borthog2.c >> [0]PETSC ERROR: [0] KSPGMRESCycle line 122 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/impls/gmres/gmres.c >> [0]PETSC ERROR: [0] KSPSolve_GMRES line 225 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/impls/gmres/gmres.c >> [0]PETSC ERROR: [0] KSPSolve_Private line 590 /mnt/lustre/home/ajaramillo/petsc-3.13.0/src/ksp/ksp/interface/itfunc.c >> [0]PETSC ERROR: *** unknown floating point error occurred *** >> =============================================================== >> >> So it seems that in fact a division by 0 is taking place. I will try to run this in debug mode. >> >> thanks >> Alfredo >> >> On Tue, Aug 25, 2020 at 10:23 AM Barry Smith > wrote: >> >> Sounds like it might be a compiler problem generating bad code. 
>> >> On the machine where it fails you can run with -fp_trap to have it error out as soon as a Nan or Inf appears. If you can use the debugger on that machine you can tell the debugger to catch floating point exceptions and see the exact line an values of variables where a Nan or Inf appear. >> >> As Matt conjectured it is likely there is a divide by zero before PETSc detects and it may be helpful to find out exactly where that happens. >> >> Barry >> >> >>> On Aug 25, 2020, at 8:03 AM, Alfredo Jaramillo > wrote: >>> >>> Yes, Barry, that is correct. >>> >>> >>> >>> On Tue, Aug 25, 2020 at 1:02 AM Barry Smith > wrote: >>> >>> On one system you get this error, on another system with the identical code and test case you do not get the error? >>> >>> You get it with three iterative methods but not with MUMPS? >>> >>> Barry >>> >>> >>>> On Aug 24, 2020, at 8:35 PM, Alfredo Jaramillo > wrote: >>>> >>>> Hello Barry, Matthew, thanks for the replies ! >>>> >>>> Yes, it is our custom code, and it also happens when setting -pc_type bjacobi. Before testing an iterative solver, we were using MUMPS (-ksp_type preonly -ksp_pc_type lu -pc_factor_mat_solver_type mumps) without issues. >>>> >>>> Running the ex19 (as "mpirun -n 4 ex19 -da_refine 5") did not produce any problem. >>>> >>>> To reproduce the situation on my computer, I was able to reproduce the error for a small case and -pc_type bjacobi. For that particular case, when running in the cluster the error appears at the very last iteration: >>>> >>>> ===== >>>> 27 KSP Residual norm 8.230378644666e-06 >>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>> [0]PETSC ERROR: Invalid argument >>>> [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 >>>> ==== >>>> >>>> whereas running on my computer the error is not launched and convergence is reached instead: >>>> >>>> ==== >>>> Linear interp_ solve converged due to CONVERGED_RTOL iterations 27 >>>> ==== >>>> >>>> I will run valgrind to seek for possible memory corruptions. >>>> >>>> thank you >>>> Alfredo >>>> >>>> On Mon, Aug 24, 2020 at 9:00 PM Barry Smith > wrote: >>>> >>>> Oh yes, it could happen with Nan. >>>> >>>> KSPGMRESClassicalGramSchmidtOrthogonalization() calls KSPCheckDot(ksp,lhh[j]); so should detect any NAN that appear and set ksp->convergedreason but the call to MAXPY() is still made before returning and hence producing the error message. >>>> >>>> We should circuit the orthogonalization as soon as it sees a Nan/Inf and return immediately for GMRES to cleanup and produce a very useful error message. >>>> >>>> Alfredo, >>>> >>>> It is also possible that the hypre preconditioners are producing a Nan because your matrix is too difficult for them to handle, but it would be odd to happen after many iterations. >>>> >>>> As I suggested before run with -pc_type bjacobi to see if you get the same problem. >>>> >>>> Barry >>>> >>>> >>>>> On Aug 24, 2020, at 6:38 PM, Matthew Knepley > wrote: >>>>> >>>>> On Mon, Aug 24, 2020 at 6:27 PM Barry Smith > wrote: >>>>> >>>>> Alfredo, >>>>> >>>>> This should never happen. The input to the VecMAXPY in gmres is computed via VMDot which produces the same result on all processes. >>>>> >>>>> If you run with -pc_type bjacobi does it also happen? >>>>> >>>>> Is this your custom code or does it happen in PETSc examples also? Like src/snes/tutorials/ex19 -da_refine 5 >>>>> >>>>> Could be memory corruption, can you run under valgrind? 
>>>>> >>>>> Couldn't it happen if something generates a NaN? That also should not happen, but I was allowing that pilut might do it. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> Barry >>>>> >>>>> >>>>> > On Aug 24, 2020, at 4:05 PM, Alfredo Jaramillo > wrote: >>>>> > >>>>> > Dear PETSc developers, >>>>> > >>>>> > I'm trying to solve a linear problem with GMRES preconditioned with pilut from HYPRE. For this I'm using the options: >>>>> > >>>>> > -ksp_type gmres -pc_type hypre -pc_hypre_type pilut -ksp_monitor >>>>> > >>>>> > If I use a single core, GMRES (+ pilut or euclid) converges. However, when using multiple cores the next error appears after some number of iterations: >>>>> > >>>>> > [0]PETSC ERROR: Scalar value must be same on all processes, argument # 3 >>>>> > >>>>> > relative to the function VecMAXPY. I attached a screenshot with more detailed output. The same happens when using euclid. Can you please give me some insight on this? >>>>> > >>>>> > best regards >>>>> > Alfredo >>>>> > >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Tue Aug 25 18:46:31 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Tue, 25 Aug 2020 18:46:31 -0500 Subject: [petsc-users] Question on usage of PetscMalloc(Re)SetCUDAHost Message-ID: Hi PETSc-developers, Is it valid to allocate matrix values on host for use on a GPU later by embedding all allocation logic (i.e the code block that calls PetscMalloc1 for values and indices and sets them using MatSetValues) within a section marked by PetscMalloc(Re)SetCUDAHost ? My understanding was that PetscMallocSetCUDAHost would set mallocs to be on the host but I?m getting an error as shown below (for some strange reason it happens to be the 5th column on the 0th row (if that helps) both when setting one value at a time and when setting the whole 0th row together): [sajid at xrmlite cuda]$ mpirun -np 1 ~/packages/pirt/src/pirt -inputfile shepplogan.h5 PIRT -- Parallel Iterative Reconstruction Tomography Reading in real data from shepplogan.h5 After loading data, nTau:100, nTheta:50 After detector geometry context initialization Initialized PIRT [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in external library [0]PETSC ERROR: cuda error 1 (cudaErrorInvalidValue) : invalid argument [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.13.2-947-gc2372adeb2 GIT Date: 2020-08-25 21:07:25 +0000 [0]PETSC ERROR: /home/sajid/packages/pirt/src/pirt on a arch-linux-c-debug named xrmlite by sajid Tue Aug 25 18:30:55 2020 [0]PETSC ERROR: Configure options --with-hdf5=1 --with-cuda=1 [0]PETSC ERROR: #1 PetscCUDAHostFree() line 14 in /home/sajid/packages/petsc/src/sys/memory/cuda/mcudahost.cu [0]PETSC ERROR: #2 PetscFreeA() line 475 in /home/sajid/packages/petsc/src/sys/memory/mal.c [0]PETSC ERROR: #3 MatSeqXAIJFreeAIJ() line 135 in /home/sajid/packages/petsc/include/../src/mat/impls/aij/seq/aij.h [0]PETSC ERROR: #4 MatSetValues_SeqAIJ() line 498 in /home/sajid/packages/petsc/src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: #5 MatSetValues() line 1392 in /home/sajid/packages/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #6 setMatrixElements() line 248 in /home/sajid/packages/pirt/src/geom.cxx [0]PETSC ERROR: #7 construct_matrix() line 91 in /home/sajid/packages/pirt/src/matrix.cu [0]PETSC ERROR: #8 main() line 20 in /home/sajid/packages/pirt/src/pirt.cxx [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -inputfile shepplogan.h5 [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- -------------------------------------------------------------------------- MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF with errorcode 20076. NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them. -------------------------------------------------------------------------- [sajid at xrmlite cuda]$ PetscCUDAHostFree is called within the PetscMalloc(Re)SetCUDAHost block as described earlier which should?ve created valid memory on the host. Could someone explain if this is the correct approach to take and what the above error means ? (PS : I?ve run ksp tutorial-ex2 with -vec_type cuda -mat_type aijcusparse to test the installation and everything works as expected.) Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 25 18:59:35 2020 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 25 Aug 2020 18:59:35 -0500 Subject: [petsc-users] Question on usage of PetscMalloc(Re)SetCUDAHost In-Reply-To: References: Message-ID: <03DB6A68-9984-4C26-A669-BD676B8B8AAA@petsc.dev> PetscMallocSetCUDAHost() switches from using the regular malloc on the CPU to using cudaMallocHost() it also switches the free.These means between the PetscMallocSetCUDAHost() and the PetscMallocResetCUDAHost() all mallocs are done with cudaHost version and so are all frees. If any memory that was allocated before the call to PetscMallocSetCUDAHost() so it was allocated with a regular malloc is freed inside the block it will be freed with the incorrect cudaHostFree and will crash. This makes these routines very fragile I don't understand the purpose of PetscMallocSetCUDAHost(), possibly it is intended to be used with Nvidia unified memory so the same addresses can be used on the GPU. PETSc does not use or need unified memory in its programming model for GPUs. As far as I am aware you don't have any reason to use these routines. 
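To make the pairing rule described above concrete, here is a minimal sketch (the array names and the size nnz are placeholders, not taken from the user's code): everything allocated while the CUDA-host allocator is active must also be freed while it is active, and routines that may free or reallocate memory they obtained earlier with the regular malloc, which is what MatSetValues() does when it grows the matrix and what the stack trace above shows, must not be called inside the window.

  PetscScalar    *vals;
  PetscInt       *cols, nnz = 100;               /* placeholder size */
  PetscErrorCode  ierr;

  ierr = PetscMallocSetCUDAHost();CHKERRQ(ierr);    /* PetscMalloc/PetscFree now use the CUDA host allocator */
  ierr = PetscMalloc1(nnz, &vals);CHKERRQ(ierr);
  ierr = PetscMalloc1(nnz, &cols);CHKERRQ(ierr);
  /* ... fill vals/cols here, but do not call MatSetValues() on a matrix whose
     internal arrays were allocated before the switch ... */
  ierr = PetscFree(vals);CHKERRQ(ierr);             /* freed with the matching CUDA-host free */
  ierr = PetscFree(cols);CHKERRQ(ierr);
  ierr = PetscMallocResetCUDAHost();CHKERRQ(ierr);  /* back to the regular malloc/free */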
Barry > On Aug 25, 2020, at 6:46 PM, Sajid Ali wrote: > > Hi PETSc-developers, > > Is it valid to allocate matrix values on host for use on a GPU later by embedding all allocation logic (i.e the code block that calls PetscMalloc1 for values and indices and sets them using MatSetValues) within a section marked by PetscMalloc(Re)SetCUDAHost ? > > My understanding was that PetscMallocSetCUDAHost would set mallocs to be on the host but I?m getting an error as shown below (for some strange reason it happens to be the 5th column on the 0th row (if that helps) both when setting one value at a time and when setting the whole 0th row together): > > [sajid at xrmlite cuda]$ mpirun -np 1 ~/packages/pirt/src/pirt -inputfile shepplogan.h5 > PIRT -- Parallel Iterative Reconstruction Tomography > Reading in real data from shepplogan.h5 > After loading data, nTau:100, nTheta:50 > After detector geometry context initialization > Initialized PIRT > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: cuda error 1 (cudaErrorInvalidValue) : invalid argument > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.13.2-947-gc2372adeb2 GIT Date: 2020-08-25 21:07:25 +0000 > [0]PETSC ERROR: /home/sajid/packages/pirt/src/pirt on a arch-linux-c-debug named xrmlite by sajid Tue Aug 25 18:30:55 2020 > [0]PETSC ERROR: Configure options --with-hdf5=1 --with-cuda=1 > [0]PETSC ERROR: #1 PetscCUDAHostFree() line 14 in /home/sajid/packages/petsc/src/sys/memory/cuda/mcudahost.cu > [0]PETSC ERROR: #2 PetscFreeA() line 475 in /home/sajid/packages/petsc/src/sys/memory/mal.c > [0]PETSC ERROR: #3 MatSeqXAIJFreeAIJ() line 135 in /home/sajid/packages/petsc/include/../src/mat/impls/aij/seq/aij.h > [0]PETSC ERROR: #4 MatSetValues_SeqAIJ() line 498 in /home/sajid/packages/petsc/src/mat/impls/aij/seq/aij.c > [0]PETSC ERROR: #5 MatSetValues() line 1392 in /home/sajid/packages/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: #6 setMatrixElements() line 248 in /home/sajid/packages/pirt/src/geom.cxx > [0]PETSC ERROR: #7 construct_matrix() line 91 in /home/sajid/packages/pirt/src/matrix.cu > [0]PETSC ERROR: #8 main() line 20 in /home/sajid/packages/pirt/src/pirt.cxx > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -inputfile shepplogan.h5 > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF > with errorcode 20076. > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > You may or may not see output from other processes, depending on > exactly when Open MPI kills them. > -------------------------------------------------------------------------- > [sajid at xrmlite cuda]$ > PetscCUDAHostFree is called within the PetscMalloc(Re)SetCUDAHost block as described earlier which should?ve created valid memory on the host. > > Could someone explain if this is the correct approach to take and what the above error means ? > > (PS : I?ve run ksp tutorial-ex2 with -vec_type cuda -mat_type aijcusparse to test the installation and everything works as expected.) 
> > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Tue Aug 25 20:59:37 2020 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Tue, 25 Aug 2020 20:59:37 -0500 Subject: [petsc-users] Question on usage of PetscMalloc(Re)SetCUDAHost In-Reply-To: <03DB6A68-9984-4C26-A669-BD676B8B8AAA@petsc.dev> References: <03DB6A68-9984-4C26-A669-BD676B8B8AAA@petsc.dev> Message-ID: Hi Barry, Thanks for the explanation! Removing the calls to PetscMalloc(Re)SetCUDAHost solved that issue. Just to clarify, all PetscMalloc(s) happen on the host and there is no special PetscMalloc for device memory allocation ? (Say for an operation sequence PetscMalloc1(N, &ptr), VecCUDAGetArray(cudavec, &ptr) ) Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Tue Aug 25 21:56:48 2020 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Tue, 25 Aug 2020 21:56:48 -0500 Subject: [petsc-users] Question on usage of PetscMalloc(Re)SetCUDAHost In-Reply-To: References: <03DB6A68-9984-4C26-A669-BD676B8B8AAA@petsc.dev> Message-ID: On Tue, Aug 25, 2020 at 9:01 PM Sajid Ali wrote: > Hi Barry, > > Thanks for the explanation! Removing the calls to > PetscMalloc(Re)SetCUDAHost solved that issue. > > Just to clarify, all PetscMalloc(s) happen on the host and there is no > special PetscMalloc for device memory allocation ? (Say for an operation > sequence PetscMalloc1(N, &ptr), VecCUDAGetArray(cudavec, &ptr) ) > > PetscMallocSetCUDAHost() is to instruct petsc to use cudaHostAlloc() thereafter to allocate non-pagable host memory. Yes, there is no variant of PetscMalloc doing cudaMalloc. > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Laura-victoria.ROLANDI at isae-supaero.fr Wed Aug 26 04:07:51 2020 From: Laura-victoria.ROLANDI at isae-supaero.fr (ROLANDI Laura victoria) Date: Wed, 26 Aug 2020 11:07:51 +0200 Subject: [petsc-users] =?utf-8?b?Pz09P3V0Zi04P3E/ID89PT91dGYtOD9xPyBbU0xF?= =?utf-8?q?Pc=5D__Krylov_Schur-_saving_krylov_subspace?= In-Reply-To: <6520D6CE-543F-4170-B1F7-5FA8B2575424@dsic.upv.es> Message-ID: <391-5f462680-15-78c54e80@265196094> Hi Jose, Thank you for your advices!? Maybe I didn't understand the EPSSetInitialSpace() function correctly, why?should I compute a single vector for using it with Krylov-Schur? >From what I understood it needs 3 inputs EPSSetInitialSpace(EPS eps,PetscInt n,Vec vv[]) where vv[] is the set of basis vectors of the initial space and n is the number of vectors of the basis. >From what you said regarding the EPSGetBV(), couldn't I use this function to get the whole krylov-subspace and then pass it as vv[] to EPSSetInitialSpace()? even if, as you said, I loose informations regarding the unconverged Ritz pairs... Victoria Il giorno Venerdi, Agosto 21, 2020 17:51 CEST, "Jose E. Roman" ha scritto: ?I see. Doing the exponential with SLEPc itself might be faster, as in https://slepc.upv.es/documentation/current/src/eps/tutorials/ex36.c.html but I cannot say if it will work for your problem. 
This approach was used in https://doi.org/10.1017/S0022377818001022 EPSGetBV() gives you a BV object that contains the Krylov subspace. But to use EPSSetInitialSpace() with Krylov-Schur you should compute a single vector, and the information for the unconverged Ritz vector cannot be easily recovered. It is easier if you use EPSSUBSPACE, where in that case you can pass the whole subspace to EPSSetInitialSpace(). The problem is that convergence of subspace iteration will likely be much slower than Krylov-Schur. Jose > El 21 ago 2020, a las 14:34, ROLANDI Laura victoria escribi?: > > Thank you for your quick response. > > Yes, i'm running in parallel, I'm just asking for 2 eigenvalues and I'm not doing any factorization. > > My problem is taking so long because I have implemented the time stepping-exponential transformation: my MatMult() function for computing vectors of the Krylov subspace calls the Direct Numerical Simulation code for compressible Navier-Stokes equations to which I'm linking the stability code. > Therefore, each MatMult() call takes very long, and I cannot save the converged eigenvectors for the restart beacause there won't be any converged eigenvectors yet when the job is killed. > That's why I thought that the only thing I could save was the computed krylov subspace. > > Victoria > > > > Il giorno Venerdi, Agosto 21, 2020 12:42 CEST, "Jose E. Roman" ha scritto: > >> Why is your problem taking so long? Are you running in parallel? Is your computation doing a factorization of a matrix? Are you getting slow convergence? How many eigenvalues are you computing? Note that Krylov-Schur is not intended for computing a large percentage of eigenvalues, if you do so then you might get large overheads unless you tune the EPSSetDimensions() parameters (mpd). >> >> EPSSetInitialSpace() is intended to provide an initial guess, which in Krylov-Schur is a single vector, so in this case you would not pass the Krylov subspace from a previous run. >> >> A possible scheme for restarting is to save the eigenvectors computed so far, then pass them in the next run via EPSSetDeflationSpace() to avoid recomputing them. You can use a custom stopping criterion as in https://slepc.upv.es/documentation/current/src/eps/tutorials/ex29.c.html to stop before the job is killed, then save the converged eigenvectors (or EPSGetInvariantSubspace() if the problem is nonsymmetric). >> >> Jose >> >> >> > El 21 ago 2020, a las 11:56, ROLANDI Laura victoria escribi?: >> > >> > Dear SLEPc developers, >> > >> > I'm using the Krylov Schur EPS and I have a question regarding a command. >> > >> > Is there a way for having access and saving the krylov subspace during the EPSSolve call? >> > >> > I inizialize the solver using the function EPSSetInitialSpace(eps,1, v0), where v0 is a specific vector, but after 24 hours of calculation my job has to end even if the EPSSolve hasn't finished yet. >> > Which function should I use for saving the computed Krylov subspace and its dimention n during the process, in order to restart the calculation from it by using EPSsetInitialSpace(eps,n, Krylov-Subspace)? 
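A minimal sketch of the save/deflate restart cycle outlined above (the operator A, the eps object, the file name and the reload of evecs[] in the second run are placeholders; the custom stopping test of ex29 that halts the first run before the queue limit is omitted):

  /* end of run 1: save whatever eigenvectors have converged so far */
  PetscInt    nconv, i;
  Vec         xr;
  PetscViewer viewer;

  ierr = EPSGetConverged(eps, &nconv);CHKERRQ(ierr);
  ierr = MatCreateVecs(A, NULL, &xr);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "evecs.bin", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  for (i = 0; i < nconv; i++) {
    ierr = EPSGetEigenvector(eps, i, xr, NULL);CHKERRQ(ierr);
    ierr = VecView(xr, viewer);CHKERRQ(ierr);
  }
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  /* run 2: reload them into evecs[0..nconv-1] with VecLoad(), deflate, restart */
  ierr = EPSSetDeflationSpace(eps, nconv, evecs);CHKERRQ(ierr);
  ierr = EPSSetInitialSpace(eps, 1, &v0);CHKERRQ(ierr);   /* Krylov-Schur keeps only this one vector */
  ierr = EPSSolve(eps);CHKERRQ(ierr);

Note that this only helps if at least some pairs have converged when the first run stops, which is why the custom stopping criterion is suggested above.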
>> > >> > Thank you very much, >> > Victoria >> > > > > -- > --- > Laura victoria ROLANDI > Doctorant - Doctorat ISAE-SUPAERO Doctorat 1 > laura-victoria.rolandi at isae-supaero.fr > https://www.isae-supaero.fr > Institut Sup?rieur de l'A?ronautique et de l'Espace > 10, avenue Edouard Belin - BP 54032 > 31055 Toulouse Cedex 4 > France > > > Suivez l'ISAE-SUPAERO sur les r?seaux sociaux / Follow the ISAE-SUPAERO on the social media > Facebook Twitter LinkedIn Youtube Instagram ? -- --- Laura victoria ROLANDI Doctorant - Doctorat ISAE-SUPAERO Doctorat 1 laura-victoria.rolandi at isae-supaero.fr https://www.isae-supaero.fr Institut Sup?rieur de l'A?ronautique et de l'Espace 10, avenue Edouard Belin - BP 54032 31055 Toulouse Cedex 4 France?? Suivez l'ISAE-SUPAERO sur les r?seaux sociaux / Follow the ISAE-SUPAERO on the social media Facebook Twitter LinkedIn Youtube Instagram -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Wed Aug 26 04:17:34 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 26 Aug 2020 11:17:34 +0200 Subject: [petsc-users] [SLEPc] Krylov Schur- saving krylov subspace In-Reply-To: <391-5f462680-15-78c54e80@265196094> References: <391-5f462680-15-78c54e80@265196094> Message-ID: <511DAAAF-EA64-4A1A-B24D-D7ECFAF8756A@dsic.upv.es> Every solver picks different information from the initial space. Currently, Krylov-Schur only picks the first vector (and discards the rest), because it starts to iterate with just one vector. Other solvers (EPSSUBSPACE) will use all vectors because the iteration starts already with a subspace, not a vector. Jose > El 26 ago 2020, a las 11:07, ROLANDI Laura victoria escribi?: > > Hi Jose, > > Thank you for your advices! > Maybe I didn't understand the EPSSetInitialSpace() function correctly, why should I compute a single vector for using it with Krylov-Schur? > From what I understood it needs 3 inputs > > EPSSetInitialSpace(EPS eps,PetscInt n,Vec vv[]) > > where vv[] is the set of basis vectors of the initial space and n is the number of vectors of the basis. > From what you said regarding the EPSGetBV(), couldn't I use this function to get the whole krylov-subspace and then pass it as vv[] to EPSSetInitialSpace()? even if, as you said, I loose informations regarding the unconverged Ritz pairs... > > Victoria > > > > > Il giorno Venerdi, Agosto 21, 2020 17:51 CEST, "Jose E. Roman" ha scritto: > >> I see. Doing the exponential with SLEPc itself might be faster, as in https://slepc.upv.es/documentation/current/src/eps/tutorials/ex36.c.html >> but I cannot say if it will work for your problem. This approach was used in https://doi.org/10.1017/S0022377818001022 >> >> EPSGetBV() gives you a BV object that contains the Krylov subspace. But to use EPSSetInitialSpace() with Krylov-Schur you should compute a single vector, and the information for the unconverged Ritz vector cannot be easily recovered. It is easier if you use EPSSUBSPACE, where in that case you can pass the whole subspace to EPSSetInitialSpace(). The problem is that convergence of subspace iteration will likely be much slower than Krylov-Schur. >> >> Jose >> >> > El 21 ago 2020, a las 14:34, ROLANDI Laura victoria escribi?: >> > >> > Thank you for your quick response. >> > >> > Yes, i'm running in parallel, I'm just asking for 2 eigenvalues and I'm not doing any factorization. 
>> > >> > My problem is taking so long because I have implemented the time stepping-exponential transformation: my MatMult() function for computing vectors of the Krylov subspace calls the Direct Numerical Simulation code for compressible Navier-Stokes equations to which I'm linking the stability code. >> > Therefore, each MatMult() call takes very long, and I cannot save the converged eigenvectors for the restart beacause there won't be any converged eigenvectors yet when the job is killed. >> > That's why I thought that the only thing I could save was the computed krylov subspace. >> > >> > Victoria >> > >> > >> > >> > Il giorno Venerdi, Agosto 21, 2020 12:42 CEST, "Jose E. Roman" ha scritto: >> > >> >> Why is your problem taking so long? Are you running in parallel? Is your computation doing a factorization of a matrix? Are you getting slow convergence? How many eigenvalues are you computing? Note that Krylov-Schur is not intended for computing a large percentage of eigenvalues, if you do so then you might get large overheads unless you tune the EPSSetDimensions() parameters (mpd). >> >> >> >> EPSSetInitialSpace() is intended to provide an initial guess, which in Krylov-Schur is a single vector, so in this case you would not pass the Krylov subspace from a previous run. >> >> >> >> A possible scheme for restarting is to save the eigenvectors computed so far, then pass them in the next run via EPSSetDeflationSpace() to avoid recomputing them. You can use a custom stopping criterion as in https://slepc.upv.es/documentation/current/src/eps/tutorials/ex29.c.html to stop before the job is killed, then save the converged eigenvectors (or EPSGetInvariantSubspace() if the problem is nonsymmetric). >> >> >> >> Jose >> >> >> >> >> >> > El 21 ago 2020, a las 11:56, ROLANDI Laura victoria escribi?: >> >> > >> >> > Dear SLEPc developers, >> >> > >> >> > I'm using the Krylov Schur EPS and I have a question regarding a command. >> >> > >> >> > Is there a way for having access and saving the krylov subspace during the EPSSolve call? >> >> > >> >> > I inizialize the solver using the function EPSSetInitialSpace(eps,1, v0), where v0 is a specific vector, but after 24 hours of calculation my job has to end even if the EPSSolve hasn't finished yet. >> >> > Which function should I use for saving the computed Krylov subspace and its dimention n during the process, in order to restart the calculation from it by using EPSsetInitialSpace(eps,n, Krylov-Subspace)? 
>> >> > >> >> > Thank you very much, >> >> > Victoria >> >> >> > >> > >> > >> > -- >> > --- >> > Laura victoria ROLANDI >> > Doctorant - Doctorat ISAE-SUPAERO Doctorat 1 >> > laura-victoria.rolandi at isae-supaero.fr >> > https://www.isae-supaero.fr >> > Institut Sup?rieur de l'A?ronautique et de l'Espace >> > 10, avenue Edouard Belin - BP 54032 >> > 31055 Toulouse Cedex 4 >> > France >> > >> > >> > Suivez l'ISAE-SUPAERO sur les r?seaux sociaux / Follow the ISAE-SUPAERO on the social media >> > Facebook Twitter LinkedIn Youtube Instagram >> > > > > -- > --- > Laura victoria ROLANDI > Doctorant - Doctorat ISAE-SUPAERO Doctorat 1 > laura-victoria.rolandi at isae-supaero.fr > https://www.isae-supaero.fr > Institut Sup?rieur de l'A?ronautique et de l'Espace > 10, avenue Edouard Belin - BP 54032 > 31055 Toulouse Cedex 4 > France > > > Suivez l'ISAE-SUPAERO sur les r?seaux sociaux / Follow the ISAE-SUPAERO on the social media > Facebook Twitter LinkedIn Youtube Instagram From thibault.bridelbertomeu at gmail.com Thu Aug 27 01:15:07 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Thu, 27 Aug 2020 08:15:07 +0200 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> <2BF36064-AEC6-4795-BEE7-DAAF69119D2E@petsc.dev> Message-ID: Sorry Barry for the late reply. Le mar. 25 ao?t 2020 ? 15:19, Barry Smith a ?crit : > > Yes, your use of the coloring is what I was thinking of. I don't think > you need any of calls to the coloring code as it is managed > in SNESComputeJacobianDefaultColor() if you don't provide it initially. Did > that not work, just using -snes_fd_color? > Yes it works with the command line flag too, I just wanted to write down the lines somewhere in case I needed them and I left them there in the end. > Regarding direct solvers. Add the arguments > > --download-superlu_dist --download-metis --download-parmetis > --download-mumps --download-scalapack --download-ptscotch > > to ./configure > > Then when you run the code you can use -pc_type lu > -pc_factor_mat_solver_type superlu_dist or mumps > Ah, thanks ! I haven't experimented much with the direct solvers in PETSc, I mostly use iterative resolution so far, but I'll have a look. With this first implementation using the automatic differentiation by coloring, I was able to solve with implicit time stepping problems involving the Euler equations or the Navier-Stokes equations (which contain a parabolic term) in both 2D and axisymmetric form. It is definitely a win for me. I played around the different SNES, KSP and PC I could use, and it turns out using respectively newtonls, gmres and sor with 5 iterations is probably the most robust combination, with which I was able to achieve start-up CFL numbers around 50 (based solely on the non-parabolic part of the systems of equations). Now, coming back to why I first sent a message in this mailing list, I am still trying to create the Jacobian and Preconditioner myself. As you can see in the attached PDF, I am still using my PetscJacobianFunction_JFNK and my PetscIJacobian routines, but they evolved since last time. Following your advice Barry I rewrote carefully the JFNK method to account for global vectors as input and I feel it should be okay now even though I don't really have a way of testing it : with the systems of equations I am trying to solve, no preconditioner yields a SNES divergence. 
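For reference, the combination described above (colored finite-difference Jacobian with a Newton line search, GMRES and SOR) can be selected entirely from the command line; the executable name below is a placeholder, and whether "sor with 5 iterations" maps to -pc_sor_its or to -pc_sor_lits is an assumption:

  mpirun -n 4 ./my_fv_solver -snes_fd_color -snes_type newtonls \
      -ksp_type gmres -pc_type sor -pc_sor_its 5 \
      -snes_converged_reason -ksp_converged_reason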
So I wrote the PetscComputePreconMatImpl to get the preconditioner, the idea being I would like to get something like alpha * Id - grad (dF/dU) where grad represents the spatial differentiation operator and dF/dU is the exact jacobian in each cell this time, alpha being a shift parameter introduced by the implicit time stepping method. The exact dF/dU is computed from the state in a cell using the static NavierStokes2DJFunction. To compute the gradient of this jacobian I took inspiration from the way the gradient is computed in the DMPlex_reconstruct_gradient_internal but I am not quite sure of what I am doing there. If you don't mind, could you please tell me what you think of the way I approach this preconditioner computation ? Thank you so much for your help once again, Best regards, Thibault > Barry > > > > > On Aug 25, 2020, at 2:06 AM, Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > > Hello everyone, > > Barry, I followed your recommendations and came up with the pieces of code > that are in the attached PDF - mostly pages 1 & 3 are important, page 2 is > almost entirely commented. > > I tried to use DMCreateColoring as the doc says it may produce a more > accurate coloring, however it is not implemented for a Plex yet hence the > call to Matcoloringcreate that you will see. I left the test DMHascoloring > in case in a later release PETSc allows for the generation of the coloring > from a Plex. > > Also, you'll see in the input file that contrary to what you suggested I > am using the jacobi PC. It is simply because it appears that the way I > compiled my PETSc does not support a PC LU or PC CHOLESKY (per the seg > fault print in the console). Do I need scalapack or mumps or something else > ? > > Altogether this implementation works and produces results that are correct > physically speaking. Now I have to try and increase the CFL number a lot to > see how robust this approach is. > > All in all, what do you think of this implementation, is it what you had > in mind ? > > Thank you for your help, > > Thibault > > Le lun. 24 ao?t 2020 ? 22:16, Barry Smith a ?crit : > >> >> >> On Aug 24, 2020, at 2:20 PM, Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >> Good evening everyone, >> >> Thanks Barry for your answer. >> >> Le lun. 24 ao?t 2020 ? 18:51, Barry Smith a ?crit : >> >>> >>> >>> On Aug 24, 2020, at 11:38 AM, Thibault Bridel-Bertomeu < >>> thibault.bridelbertomeu at gmail.com> wrote: >>> >>> Thank you Barry for taking the time to go through the code ! >>> >>> I indeed figured out this afternoon that the function related to the >>> matrix-vector product is always handling global vectors. I corrected mine >>> so that it compiles, but I have a feeling it won't run properly without a >>> preconditioner. >>> >>> Anyways you are right, my PetscJacobianFunction_JFNK() aims at doing >>> some basic finite-differencing ; user->RHS_ref is my F(U) if you see the >>> system as dU/dt = F(U). As for the DMGlobalToLocal() it was there mainly >>> because I had not realized the vectors I was manipulating were global. >>> I will take your advice and try with just the SNESSetUseMatrixFree. >>> I haven't quite fully understood what it does "under the hood" though: >>> just calling SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE) before the >>> TSSolve call is enough to ensure that the implicit matrix is computed ? >>> Does it use the function we set as a RHS to build the matrix ? 
>>> >>> >>> All it does is "replace" the A matrix with one automatically created >>> for the job using MatCreateMFFD(). It does not touch the B matrix, it does >>> not build the matrix but yes if does use the function to provide to do the >>> differencing. >>> >> >> OK, thank you. This MFFD Matrix is then called by the TS to construct the >> linear system that will be solved to advance the system of equations, right >> ? >> >>> >>> To create the preconditioner I will do as you suggest too, thank you. >>> This matrix has to be as close as possible to the inverse of the implicit >>> matrix to ensure that the eigenvalues of the system are as close to 1 as >>> possible. Given the implicit matrix is built "automatically" thanks to the >>> SNES matrix free capability, can we use that matrix as a starting point to >>> the building of the preconditioner ? >>> >>> >>> No the MatrixFree doesn't build a matrix, it can only do >>> matrix-vector products with differencing. >>> >> >> My bad, wrong word. Yes of course it's all matrix-free hence it's just a >> functional, however maybe the inner mechanisms can be accessed and used for >> the preconditioner ? >> >> >> Probably not, it really only can do matrix-vector products. >> >> You were talking about the coloring capabilities in PETSc, is that where >>> it can be applied ? >>> >>> >>> Yes you can use that. See MatFDColoringCreate() but since you are >>> using a DM in theory you can use -snes_fd_color and PETSc will manage >>> everything for you so you don't have to write any code for Jacobians at >>> all. Again it uses your function to do differences using coloring to be >>> efficient to build the Jacobian for you. >>> >> >> I read a bit about the coloring you are mentioning. As I understand it, >> it is another option to have a matrix-free Jacobian behavior during the >> Newton-Krylov iterations, right ? Either we use the SNESSetUseMatrixFree() >> alone, then it works using "basic" finite-differencing, or we use the >> SNESSetUseMatrixFree + MatFDColoringCreate >> & SNESComputeJacobianDefaultColor as an option to SNESSetJacobian to access >> the finite-differencing based on coloring. Is that right ? >> Then if i come back to my preconditioner problem ... once you have set-up >> the implicit matrix with one or the other aforementioned matrix-free ways, >> how would you go around setting up the preconditioner ? In a matrix-free >> way too, or rather as a real matrix that we assemble ourselves this time, >> as you seemed to mean with the previous MatAij DMCreateMatrix ? >> >> Sorry if it seems like I am nagging, but I would really like to >> understand how to manipulate the matrix-free methods and structures in >> PETSc to run a time-implicit finite volume computation, it's so promising ! >> >> >> There are many many possibilities as we discussed in previous email, >> most with various limitations. >> >> When you use -snes_fd_color (or put code into the source like >> MatFDColoringCreate which is unnecessary a since you are doing the same >> thing as -snes_fd_color you get back the true Jacobian (approximated so in >> less digits than analytic) so you can use any preconditioner that you can >> use as if you built the true Jacobian yourself. >> >> I always recommend starting with -pc_type lu and making sure you are >> getting the correct answers to your problem and then worrying about the >> preconditioner. 
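As a minimal sketch of the in-code equivalent of -snes_fd_color described in this exchange (ts and dm are assumed to come from the application; nothing here is taken from the attached PDFs): let the DM build an AIJ matrix with the right parallel layout and sparsity, and let SNES fill it by colored finite differences of the residual, so that -pc_type lu (or superlu_dist/mumps) can then be used to validate the answers before any hand-written preconditioner is attempted.

  SNES snes;
  Mat  J;

  ierr = TSGetSNES(ts, &snes);CHKERRQ(ierr);
  ierr = DMSetMatType(dm, MATAIJ);CHKERRQ(ierr);
  ierr = DMCreateMatrix(dm, &J);CHKERRQ(ierr);    /* correct sizes and sparsity come from the DM */
  ierr = SNESSetJacobian(snes, J, J, SNESComputeJacobianDefaultColor, NULL);CHKERRQ(ierr);
  /* with a NULL context the coloring is obtained from the DM (or computed from the matrix) */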
Faster preconditioner are JUST optimizations, nothing more, >> they should not change the quality of the solution to your PDE/ODE and you >> absolutely need to make sure your are getting correct quality answers >> before fiddling with the preconditioner. >> >> Once you have the solution correct and figured out a good >> preconditioner (assuming using the true Jacobian works for your >> discretization) then you can think about optimizing the computation of the >> Jacobian by doing it analytically finite volume by finite volume. But you >> shouldn't do any of that until you are sure that your implicit TS >> integrator for FV produces good numerical answers. >> >> Barry >> >> >> >> >> >> Thanks again, >> >> >> Thibault >> >> >>> Barry >>> >>> Internally it uses SNESComputeJacobianDefaultColor() if you are >>> interested in what it does. >>> >>> >>> >>> >>> >>> Thank you so much again, >>> >>> Thibault >>> >>> >>> Le lun. 24 ao?t 2020 ? 15:45, Barry Smith a ?crit : >>> >>>> >>>> I think the attached is wrong. >>>> >>>> >>>> >>>> The input to the matrix vector product for the Jacobian is always >>>> global vectors which means on each process the dimension is not the size of >>>> the DMGetLocalVector() it should be the VecGetLocalSize() of the >>>> DMGetGlobalVector() >>>> >>>> But you may be able to skip all this and have the DM create the shell >>>> matrix setting it sizes appropriately and you only need to supply the MATOP >>>> >>>> DMSetMatType(dm,MATSHELL); >>>> DMCreateMatrix(dm,&A); >>>> >>>> In fact, I also don't understand the PetscJacobianFunction_JFKN() >>>> function It seems to be doing finite differencing on the >>>> DMPlexTSComputeRHSFunctionFVM() assuming the current function value is in >>>> usr->RHS_ref. How is this different than just letting PETSc/SNES used >>>> finite differences to do the matrix-vector product. Your code seems rather >>>> complicated with the DMGlobalToLocal() which I don't understand what it is >>>> suppose to do there. >>>> >>>> I think you can just call >>>> >>>> TSGetSNES() >>>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >>>> >>>> and it will set up an internal matrix that does the finite differencing >>>> for you. Then you never need a shell matrix. >>>> >>>> >>>> Also to create the preconditioner matrix B this should work >>>> >>>> DMSetMatType(dm,MATAIJ); >>>> DMCreateMatrix(dm,&B); >>>> >>>> no need for you to figure out the sizes. >>>> >>>> >>>> Note that both A and B need to have the same dimensions on each process >>>> as the global vectors which I don't think your current code has. >>>> >>>> >>>> >>>> Barry >>>> >>>> >>>> >>>> On Aug 24, 2020, at 12:56 AM, Thibault Bridel-Bertomeu < >>>> thibault.bridelbertomeu at gmail.com> wrote: >>>> >>>> Barry, first of all, thank you very much for your detailed answer, I >>>> keep reading it to let it soak in - I might come back to you for more >>>> details if you do not mind. >>>> >>>> In the meantime, to fuel the conversation, I attach to this e-mail two >>>> pdfs containing the pieces of the code that regard what we are discussing. >>>> In the *timedisc.pdf, you'll find how I handle the initialization of the TS >>>> object, and in the *petscdefs.pdf you'll find the method that calls the >>>> TSSolve as well as the methods that are linked to the TS (the timestep >>>> adapt, the jacobian etc ...). [Sorry for the quality, I cannot do better >>>> than that sort of pdf ...] >>>> >>>> Based on what is in the structured code I sent you the other day, I >>>> rewrote the PetscJacobianFunction_JFNK. 
I think it should be all right, but >>>> although it compiles, execution raises a seg fault I think when I do >>>> ierr = TSSetIJacobian(ts, A, A, PetscIJacobian, user); >>>> saying that A does not have the right dimensions. It is quite new, I am >>>> still looking into where exactly the error is raised. What do you think of >>>> this implementation though, does it look correct in your expert eyes ? >>>> As for what we really discussed so far, it's that >>>> PetscComputePreconMatImpl that I do not know how to implement (with the >>>> derivative of the jacobian based on the FVM object). >>>> >>>> I understand now that what I am showing you today might not be the >>>> right way to go if one wants to really use the PetscFV, but I just wanted >>>> to add those code lines to the conversation to have your feedback. >>>> >>>> Thank you again for your help, >>>> >>>> Thibault >>>> >>>> >>>> Le ven. 21 ao?t 2020 ? 19:25, Barry Smith a ?crit : >>>> >>>> >>>>> >>>>> On Aug 21, 2020, at 10:58 AM, Thibault Bridel-Bertomeu < >>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>> >>>>> Thank you Barry for the tip ! I?ll make sure to do that when >>>>> everything is set. >>>>> What I also meant is that there will not be any more direct way to set >>>>> the preconditioner than to go through SNESSetJacobian after having >>>>> assembled everything by hand ? Like, in my case, or in the more general >>>>> case of fluid dynamics equations, the preconditioner is not a fun matrix to >>>>> assemble, because for every cell the derivative of the physical flux >>>>> jacobian has to be taken and put in the right block in the matrix - finite >>>>> element style if you want. Is there a way to do that with Petsc methods, >>>>> maybe short-circuiting the FEM based methods ? >>>>> >>>>> >>>>> Thibault >>>>> >>>>> I am not sure what you mean but there are a couple of things that >>>>> may be helpful. >>>>> >>>>> PCSHELL >>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCSHELL.html allows >>>>> you to build your own preconditioner (that can and often will use one or >>>>> more of its own Mats, and KSP or PC inside it, or even use another PETScFV >>>>> etc to build some of the sub matrices for you if it is appropriate), this >>>>> approach means you never need to construct a "global" PETSc matrix from >>>>> which PETSc builds the preconditioner. But you should only do this if the >>>>> conventional approach is not reasonable for your problem. >>>>> >>>>> MATNEST >>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATNEST.html allows >>>>> you to build a global matrix by building parts of it separately and even >>>>> skipping parts you decide you don't need in the preconditioner. >>>>> Conceptually it is the same as just creating a global matrix and filling up >>>>> but the process is a bit different and something suitable for "multi >>>>> physics" or "multi-equation" type applications. >>>>> >>>>> Of course what you put into PCSHELL and MATNEST will affect the >>>>> convergence of the nonlinear solver. As Jed noted what you put in the >>>>> "Jacobian" does not have to be directly the same mathematically as what you >>>>> put into the TSSetI/RHSFunction with the caveat that it does have to >>>>> appropriate spectral properties to result in a good preconditioner for the >>>>> "true" Jacobian. 
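As a bare-bones illustration of the PCSHELL route described above (the names ShellCtx, MyPCApply and the inner KSP are placeholders for whatever approximate solve one chooses, not anything from the code discussed in this thread):

  #include <petscksp.h>

  typedef struct {
    KSP inner;   /* e.g. a solver built on a user-assembled approximate Jacobian */
  } ShellCtx;

  /* y <- M^{-1} x, where M is whatever approximation the user decides on */
  static PetscErrorCode MyPCApply(PC pc, Vec x, Vec y)
  {
    ShellCtx      *ctx;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = PCShellGetContext(pc, (void**)&ctx); CHKERRQ(ierr);
    ierr = KSPSolve(ctx->inner, x, y); CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  /* hook-up, e.g. after TSGetSNES() / SNESGetKSP() / KSPGetPC(): */
  ierr = PCSetType(pc, PCSHELL); CHKERRQ(ierr);
  ierr = PCShellSetContext(pc, &shellctx); CHKERRQ(ierr);
  ierr = PCShellSetApply(pc, MyPCApply); CHKERRQ(ierr);

The point is that PETSc only ever asks the shell to apply the approximate inverse to a vector, so how (and from which sub-matrices or sub-solves) that approximation is built is entirely up to the user.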
>>>>> >>>>> Couple of other notes: >>>>> >>>>> The entire business of "Jacobian" matrix-free or not (with for example >>>>> -snes_fd_operator) is tricky because as Jed noted if your finite volume >>>>> scheme has non-differential terms such as if () tests. There is a concept >>>>> of sub-differential for this type of thing but I know absolutely nothing >>>>> about that and probably not worth investigating. >>>>> >>>>> In this situation you can avoid the "true" Jacobian completely (both >>>>> for matrix-vector product and preconditioner) and use something else as Jed >>>>> suggested a lower order scheme that is differentiable. This can work well >>>>> for solving the nonlinear system or not depending on how suitable it is for >>>>> your original "function" >>>>> >>>>> >>>>> 1) In theory at least you can have the Jacobian matrix-vector product >>>>> computed directly using DMPLEX/PETScFV infrastructure (it would apply the >>>>> Jacobian locally matrix-free using code similar to the code that evaluates >>>>> the FV "function". I do no know if any of this code is written, it will be >>>>> more efficient than -snes_mf_operator that evaluates the FV "function" and >>>>> does traditional differencing to compute the Jacobian. Again it has the >>>>> problem of non-differentialability if the function is not differential. But >>>>> it could be done for a different (lower order scheme) that is >>>>> differentiable. >>>>> >>>>> 2) You can have PETSc compute the Jacobian explicitly coloring and >>>>> from that build the preconditioner, this allows you to avoid the hassle of >>>>> writing the code for the derivatives yourself. This uses finite differences >>>>> on your function and coloring of the graph to compute many columns of the >>>>> Jacobian simultaneously and can be pretty efficient. Again if the function >>>>> is not differential there can be issues of what the result means and will >>>>> it work in a nonlinear solver. SNESComputeJacobianDefaultColor >>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESComputeJacobianDefaultColor.html >>>>> >>>>> 3) Much more outlandish is to skip Newton and Jacobians completely and >>>>> use the full approximation scheme SNESFAS >>>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNESFAS/SNESFAS.html this >>>>> requires a grid hierarchy and appropriate way to interpolate up through the >>>>> grid hierarchy your finite volume solutions. Probably not worth >>>>> investigating unless you have lots of time on your hands and keen interest >>>>> in this kind of stuff https://arxiv.org/pdf/1607.04254.pdf >>>>> >>>>> So to summarize, and Matt and Jed can correct my mistakes. >>>>> >>>>> 1) Form the full Jacobian from the original "function" using >>>>> analytic approach use it for both the matrix-vector product and to build >>>>> the preconditioner. Problem if full Jacobian not well defined >>>>> mathematically. Tough to code, usually not practical. >>>>> >>>>> 2) Do any matrix free (any way) for the full Jacobian and >>>>> >>>>> a) build another "approximate" Jacobian (using any technique analytic >>>>> or finite differences using matrix coloring on a new "lower order" >>>>> "function") Still can have trouble if this original Jacobian is no well >>>>> defined >>>>> >>>>> b) "write your own preconditioner" that internally can use anything >>>>> in PETSc that approximately solves the Jacobian. 
Same potential problems if >>>>> original Jacobian is not differential, plus convergence will depend on how >>>>> good your own preconditioner approximates the inverse of the true Jacobian. >>>>> >>>>> 3) Use a lower Jacobian (computed anyway you want) for the >>>>> matrix-vector product and the preconditioner. The problem of >>>>> differentiability is gone but convergence of the nonlinear solver depends >>>>> on how well lower order Jacobian is appropriate for the original "function" >>>>> >>>>> a) Form the "lower order" Jacobian analytically or with coloring >>>>> and use for both matrix-vector product and building preconditioner. Note >>>>> that switching between this and 2a is trivial. >>>>> >>>>> b) Do the "lower order" Jacobian matrix free and provide your own >>>>> PCSHELL. Note that switching between this and 2b is trivial. >>>>> >>>>> Barry >>>>> >>>>> I would first try competing the "true" Jacobian via coloring, if that >>>>> works and give satisfactory results (fast enough) then stop. >>>>> >>>>> Then I would do 2a/2b by writing my "function" using PETScFV and >>>>> writing the "lower order function" via PETScFV and use matrix coloring to >>>>> get the Jacobian from the second "lower order function". If this works well >>>>> (either with 2a or 3a or both) then stop or you can compute the "lower >>>>> order" Jacobian analytically (again using PetscFV) for a more efficient >>>>> evaluation of the Jacobian. >>>>> >>>>> >>>> >>>>> >>>>> >>>>> Thanks ! >>>>> >>>>> Thibault >>>>> >>>>> Le ven. 21 ao?t 2020 ? 17:22, Barry Smith a ?crit : >>>>> >>>>> >>>>>> >>>>>> On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu < >>>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley a >>>>>> ?crit : >>>>>> >>>>>>> On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu < >>>>>>> thibault.bridelbertomeu at gmail.com> wrote: >>>>>>> >>>>>>>> Sorry, I sent too soon, I hit the wrong key. >>>>>>>> >>>>>>>> I wanted to say that context.npoints is the local number of cells. >>>>>>>> >>>>>>>> PetscRHSFunctionImpl allows to generate the hyperbolic part of the >>>>>>>> right hand side. >>>>>>>> Then we have : >>>>>>>> >>>>>>>> PetscErrorCode PetscIJacobian( >>>>>>>> TS ts, /*!< Time stepping object (see PETSc TS)*/ >>>>>>>> PetscReal t, /*!< Current time */ >>>>>>>> Vec Y, /*!< Solution vector */ >>>>>>>> Vec Ydot, /*!< Time-derivative of solution vector */ >>>>>>>> PetscReal a, /*!< Shift */ >>>>>>>> Mat A, /*!< Jacobian matrix */ >>>>>>>> Mat B, /*!< Preconditioning matrix */ >>>>>>>> void *ctxt /*!< Application context */ >>>>>>>> ) >>>>>>>> { >>>>>>>> PETScContext *context = (PETScContext*) ctxt; >>>>>>>> HyPar *solver = context->solver; >>>>>>>> _DECLARE_IERR_; >>>>>>>> >>>>>>>> PetscFunctionBegin; >>>>>>>> solver->count_IJacobian++; >>>>>>>> context->shift = a; >>>>>>>> context->waqt = t; >>>>>>>> /* Construct preconditioning matrix */ >>>>>>>> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y, >>>>>>>> context); CHECKERR(ierr); } >>>>>>>> >>>>>>>> PetscFunctionReturn(0); >>>>>>>> } >>>>>>>> >>>>>>>> and PetscJacobianFunction_JFNK which I bind to the matrix shell, >>>>>>>> computes the action of the jacobian on a vector : say U0 is the state of >>>>>>>> reference and Y the vector upon which to apply the JFNK method, then the >>>>>>>> PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + >>>>>>>> epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift >>>>>>>> comes from the TS). 
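For concreteness, a rough sketch of that shell-matrix product. JFNKCtx, U0, F0, eps and EvaluateRHS are stand-ins for the reference state, the stored F(U0) (the RHS_ref above), the differencing parameter and the flux evaluation routine of the actual code:

  #include <petscmat.h>

  typedef struct {
    Vec       U0, F0;   /* reference state and F(U0) */
    PetscReal shift;    /* shift provided by the TS  */
    PetscReal eps;      /* differencing parameter    */
  } JFNKCtx;

  extern PetscErrorCode EvaluateRHS(JFNKCtx*, Vec, Vec); /* user's F(U), hypothetical */

  /* JY = shift*Y - ( F(U0 + eps*Y) - F(U0) ) / eps */
  static PetscErrorCode JacobianApply_JFNK(Mat A, Vec Y, Vec JY)
  {
    JFNKCtx       *ctx;
    Vec            Upert, Fpert;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = MatShellGetContext(A, &ctx); CHKERRQ(ierr);
    ierr = VecDuplicate(ctx->U0, &Upert); CHKERRQ(ierr);
    ierr = VecDuplicate(ctx->U0, &Fpert); CHKERRQ(ierr);
    ierr = VecWAXPY(Upert, ctx->eps, Y, ctx->U0); CHKERRQ(ierr);  /* U0 + eps*Y          */
    ierr = EvaluateRHS(ctx, Upert, Fpert); CHKERRQ(ierr);         /* F(U0 + eps*Y)       */
    ierr = VecWAXPY(JY, -1.0, ctx->F0, Fpert); CHKERRQ(ierr);     /* F(U0+eps*Y) - F(U0) */
    ierr = VecScale(JY, -1.0/ctx->eps); CHKERRQ(ierr);
    ierr = VecAXPY(JY, ctx->shift, Y); CHKERRQ(ierr);             /* + shift*Y           */
    ierr = VecDestroy(&Upert); CHKERRQ(ierr);
    ierr = VecDestroy(&Fpert); CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  /* bound to the shell matrix exactly as in the quoted code:
     MatShellSetOperation(A, MATOP_MULT, (void (*)(void)) JacobianApply_JFNK); */

(In practice eps would also be scaled with the norms of U0 and Y, which is essentially what the built-in MATMFFD matrices do internally.)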
>>>>>>>> The preconditioning matrix I compute as an approximation to the >>>>>>>> actual jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU >>>>>>>> is, in each cell, a 4x4 matrix that is known exactly for the system of >>>>>>>> equations I am solving, i.e. Euler equations. For the structured grid, I >>>>>>>> can loop on the cells and do that 'Derivative' thing at first order by >>>>>>>> simply taking a finite-difference like approximation with the neighboring >>>>>>>> cells, Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix >>>>>>>> block by block (JFunction is the dF/dU) >>>>>>>> >>>>>>>> /* diagonal element */ >>>>>>>> >>>>>>>> >>>>>>>> for (v=0; v>>>>>>> nvars*pg + v; } >>>>>>>> >>>>>>>> >>>>>>>> ierr = solver->JFunction >>>>>>>> >>>>>>>> (values,(u+nvars*p),solver->physics >>>>>>>> >>>>>>>> ,dir,0); >>>>>>>> >>>>>>>> >>>>>>>> _ArrayScale1D_ >>>>>>>> >>>>>>>> (values,(dxinv*iblank),(nvars*nvars)); >>>>>>>> >>>>>>>> >>>>>>>> ierr = >>>>>>>> MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> /* left neighbor */ >>>>>>>> >>>>>>>> >>>>>>>> if (pgL >= 0) { >>>>>>>> >>>>>>>> >>>>>>>> for (v=0; v>>>>>>> nvars*pgL + v; } >>>>>>>> >>>>>>>> >>>>>>>> ierr = solver->JFunction >>>>>>>> >>>>>>>> (values,(u+nvars*pL),solver->physics >>>>>>>> >>>>>>>> ,dir,1); >>>>>>>> >>>>>>>> >>>>>>>> _ArrayScale1D_ >>>>>>>> >>>>>>>> (values,(-dxinv*iblank),(nvars*nvars)); >>>>>>>> >>>>>>>> >>>>>>>> ierr = >>>>>>>> MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>>>> >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> /* right neighbor */ >>>>>>>> >>>>>>>> >>>>>>>> if (pgR >= 0) { >>>>>>>> >>>>>>>> >>>>>>>> for (v=0; v>>>>>>> nvars*pgR + v; } >>>>>>>> >>>>>>>> >>>>>>>> ierr = solver->JFunction >>>>>>>> >>>>>>>> (values,(u+nvars*pR),solver->physics >>>>>>>> >>>>>>>> ,dir,-1); >>>>>>>> >>>>>>>> >>>>>>>> _ArrayScale1D_ >>>>>>>> >>>>>>>> (values,(-dxinv*iblank),(nvars*nvars)); >>>>>>>> >>>>>>>> >>>>>>>> ierr = >>>>>>>> MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>>>> >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I do not know if I am clear here ... >>>>>>>> Anyways, I am trying to figure out how to do this shell matrix and >>>>>>>> this preconditioner using all the FV and DMPlex artillery. >>>>>>>> >>>>>>> >>>>>>> Okay, that is very clear. We should be able to get the JFNK just >>>>>>> with -snes_mf_operator, and put the approximate J construction in >>>>>>> DMPlexComputeJacobian_Internal(). >>>>>>> There is an FV section already, and we could just add this. I would >>>>>>> need to understand those entries in the pointwise Riemann sense that the >>>>>>> other stuff is now. >>>>>>> >>>>>> >>>>>> Ok i had a quick look and if I understood correctly it would do the >>>>>> job. Setting the snes-mf-operator flag would mean however that we have to >>>>>> go through SNESSetJacobian to set the jacobian and the preconditioning >>>>>> matrix wouldn't it ? 
>>>>>> >>>>>> >>>>>> Thibault, >>>>>> >>>>>> Since the TS implicit methods end up using SNES internally the >>>>>> option should be available to you without requiring you to be calling the >>>>>> SNES routines directly >>>>>> >>>>>> Once you have finalized your approach and if for the implicit case >>>>>> you always work in the snes mf operator mode you can hardwire >>>>>> >>>>>> TSGetSNES(ts,&snes); >>>>>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >>>>>> >>>>>> in your code so you don't need to always provide the option >>>>>> -snes-mf-operator >>>>>> >>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> There might be calls to the Riemann solver to evaluate the dRHS / dU >>>>>> part yes but maybe it's possible to re-use what was computed for the RHS^n ? >>>>>> In the FV section the jacobian is set to identity which I missed >>>>>> before, but it could explain why when I used the following : >>>>>> >>>>>> TSSetType(ts, TSBEULER); >>>>>> DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx); >>>>>> DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx); >>>>>> >>>>>> with my FV discretization nothing happened, right ? >>>>>> >>>>>> Thank you, >>>>>> >>>>>> Thibault >>>>>> >>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> Le ven. 21 ao?t 2020 ? 14:55, Thibault Bridel-Bertomeu < >>>>>>>> thibault.bridelbertomeu at gmail.com> a ?crit : >>>>>>>> >>>>>>> Hi, >>>>>>>>> >>>>>>>>> Thanks Matthew and Jed for your input. >>>>>>>>> I indeed envision an implicit solver in the sense Jed mentioned - >>>>>>>>> Jiri Blazek's book is a nice intro to this concept. >>>>>>>>> >>>>>>>>> Matthew, I do not know exactly what to change right now because >>>>>>>>> although I understand globally what the DMPlexComputeXXXX_Internal methods >>>>>>>>> do, I cannot say for sure line by line what is happening. >>>>>>>>> In a structured code, I have a an implicit FVM solver with PETSc >>>>>>>>> but I do not use any of the FV structure, not even a DM - I just use C >>>>>>>>> arrays that I transform to PETSc Vec and Mat and build my IJacobian and my >>>>>>>>> preconditioner and gives all that to a TS and it runs. I cannot figure out >>>>>>>>> how to do it with the FV and the DM and all the underlying "shortcuts" that >>>>>>>>> I want to use. >>>>>>>>> >>>>>>>>> Here is the top method for the structured code : >>>>>>>>> >>>>>>>>> int total_size = context.npoints * solver->nvars >>>>>>>>> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,& >>>>>>>>> context); CHKERRQ(ierr); >>>>>>>>> SNES snes; >>>>>>>>> KSP ksp; >>>>>>>>> PC pc; >>>>>>>>> SNESType snestype; >>>>>>>>> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >>>>>>>>> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >>>>>>>>> >>>>>>>>> flag_mat_a = 1; >>>>>>>>> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size, >>>>>>>>> PETSC_DETERMINE, >>>>>>>>> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >>>>>>>>> context.jfnk_eps = 1e-7; >>>>>>>>> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context. 
>>>>>>>>> jfnk_eps,NULL); CHKERRQ(ierr); >>>>>>>>> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void)) >>>>>>>>> PetscJacobianFunction_JFNK); CHKERRQ(ierr); >>>>>>>>> ierr = MatSetUp(A); CHKERRQ(ierr); >>>>>>>>> >>>>>>>>> context.flag_use_precon = 0; >>>>>>>>> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",( >>>>>>>>> PetscBool*)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >>>>>>>>> >>>>>>>>> /* Set up preconditioner matrix */ >>>>>>>>> flag_mat_b = 1; >>>>>>>>> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size, >>>>>>>>> PETSC_DETERMINE,PETSC_DETERMINE, >>>>>>>>> >>>>>>>> (solver->ndims*2+1)*solver->nvars,NULL, >>>>>>>>> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >>>>>>>>> ierr = MatSetBlockSize(B,solver->nvars); >>>>>>>>> /* Set the RHSJacobian function for TS */ >>>>>>>>> >>>>>>>> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr >>> ); >>> >>> Thibault Bridel-Bertomeu >>>>>>>>> ? >>>>>>>>> Eng, MSc, PhD >>>>>>>>> Research Engineer >>>>>>>>> CEA/CESTA >>>>>>>>> 33114 LE BARP >>>>>>>>> Tel.: (+33)557046924 >>>>>>>>> Mob.: (+33)611025322 >>>>>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>>>>> >>>>>>>>> >>>>>>>>> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown a >>>>>>>>> ?crit : >>>>>>>>> >>>>>>>>>> Matthew Knepley writes: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > I could never get the FVM stuff to make sense to me for >>>>>>>>>> implicit methods. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > Here is my problem understanding. If you have an FVM method, it >>>>>>>>>> decides >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > to move "stuff" from one cell to its neighboring cells >>>>>>>>>> depending on the >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > solution to the Riemann problem on each face, which computed >>>>>>>>>> the flux. This >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > is >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > fine unless the timestep is so big that material can flow >>>>>>>>>> through into the >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > cells beyond the neighbor. Then I should have considered the >>>>>>>>>> effect of the >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > Riemann problem for those interfaces. That would be in the >>>>>>>>>> Jacobian, but I >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > don't know how to compute that Jacobian. I guess you could do >>>>>>>>>> everything >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> > matrix-free, but without a preconditioner it seems hard. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> So long as we're using method of lines, the flux is just >>>>>>>>>> instantaneous flux, not integrated over some time step. It has the same >>>>>>>>>> meaning for implicit and explicit. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> An explicit method would be unstable if you took such a large >>>>>>>>>> time step (CFL) and an implicit method will not simultaneously be SSP and >>>>>>>>>> higher than first order, but it's still a consistent discretization of the >>>>>>>>>> problem. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> It's common (done in FUN3D and others) to precondition with a >>>>>>>>>> first-order method, where gradient reconstruction/limiting is skipped. >>>>>>>>>> That's what I'd recommend because limiting creates nasty nonlinearities and >>>>>>>>>> the resulting discretizations lack h-ellipticity which makes them very hard >>>>>>>>>> to solve. 
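Putting the pieces of this discussion together, one possible arrangement is sketched below: the operator stays matrix-free on the high-order residual, and only a first-order Jacobian (no reconstruction or limiting) is assembled as the preconditioning matrix. FormFirstOrderIJacobian, user, dm and snes are placeholders, and the assembly loop itself is only indicated by comments:

  #include <petscts.h>

  /* shift*I - dF/dU at first order: only cell-diagonal blocks and
     nearest-neighbour coupling blocks, built from unreconstructed states */
  static PetscErrorCode FormFirstOrderIJacobian(TS ts, PetscReal t, Vec U, Vec Udot,
                                                PetscReal shift, Mat A, Mat P, void *ctx)
  {
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = MatZeroEntries(P); CHKERRQ(ierr);
    /* ... loop over cells: add the shift*I block on the diagonal;
       loop over interior faces: add the +/- dF/dU coupling blocks for the two
       neighbouring cells, e.g. with MatSetValuesBlocked(P, ...) ... */
    ierr = MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    ierr = MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  /* setup: operator applied matrix-free, preconditioner built from P */
  ierr = TSGetSNES(ts, &snes); CHKERRQ(ierr);
  ierr = SNESSetUseMatrixFree(snes, PETSC_TRUE, PETSC_FALSE); CHKERRQ(ierr);
  ierr = DMSetMatType(dm, MATAIJ); CHKERRQ(ierr);
  ierr = DMCreateMatrix(dm, &P); CHKERRQ(ierr);
  ierr = TSSetIJacobian(ts, P, P, FormFirstOrderIJacobian, &user); CHKERRQ(ierr);

With this split, the nonlinear residual keeps its full accuracy while the preconditioner only has to capture the stiff, low-order part of the coupling, which is the trade-off described just above.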
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>> Thibault Bridel-Bertomeu >>>>> ? >>>>> Eng, MSc, PhD >>>>> Research Engineer >>>>> CEA/CESTA >>>>> 33114 LE BARP >>>>> Tel.: (+33)557046924 >>>>> Mob.: (+33)611025322 >>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> >>>> >>> >>> -- >>> Thibault Bridel-Bertomeu >>> ? >>> Eng, MSc, PhD >>> Research Engineer >>> CEA/CESTA >>> 33114 LE BARP >>> Tel.: (+33)557046924 >>> Mob.: (+33)611025322 >>> Mail: thibault.bridelbertomeu at gmail.com >>> >>> >>> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: equationOfState_modded_h.pdf Type: application/pdf Size: 299857 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: timeDiscretization_PETSc_modded.pdf Type: application/pdf Size: 603726 bytes Desc: not available URL: From mlohry at gmail.com Thu Aug 27 08:10:46 2020 From: mlohry at gmail.com (Mark Lohry) Date: Thu, 27 Aug 2020 09:10:46 -0400 Subject: [petsc-users] Bus Error In-Reply-To: References: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> <87a6yjg3o5.fsf@jedbrown.org> Message-ID: Barry, no output from that patch i'm afraid: 54 KSP Residual norm 3.215013886664e+03 55 KSP Residual norm 3.049105434513e+03 56 KSP Residual norm 2.859123916860e+03 [929]PETSC ERROR: ------------------------------------------------------------------------ [929]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access [929]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [929]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [929]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [929]PETSC ERROR: likely location of problem given in stack below [929]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [929]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [929]PETSC ERROR: INSTEAD the line number of the start of the function [929]PETSC ERROR: is given. 
[929]PETSC ERROR: [929] BLASgemv line 1406 /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c [929]PETSC ERROR: [929] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c [929]PETSC ERROR: [929] MatSolve line 3354 /home/mlohry/petsc/src/mat/interface/matrix.c [929]PETSC ERROR: [929] PCApply_ILU line 201 /home/mlohry/petsc/src/ksp/pc/impls/factor/ilu/ilu.c [929]PETSC ERROR: [929] PCApply line 426 /home/mlohry/petsc/src/ksp/pc/interface/precon.c [929]PETSC ERROR: [929] KSP_PCApply line 279 /home/mlohry/petsc/include/petsc/private/kspimpl.h [929]PETSC ERROR: [929] KSPSolve_PREONLY line 16 /home/mlohry/petsc/src/ksp/ksp/impls/preonly/preonly.c [929]PETSC ERROR: [929] KSPSolve_Private line 590 /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c [929]PETSC ERROR: [929] KSPSolve line 848 /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c [929]PETSC ERROR: [929] PCApply_ASM line 441 /home/mlohry/petsc/src/ksp/pc/impls/asm/asm.c [929]PETSC ERROR: [929] PCApply line 426 /home/mlohry/petsc/src/ksp/pc/interface/precon.c [929]PETSC ERROR: [929] KSP_PCApply line 279 /home/mlohry/petsc/include/petsc/private/kspimpl.h srun: Job step aborted: Waiting up to 47 seconds for job step to finish. [929]PETSC ERROR: [929] KSPFGMRESCycle line 108 /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c [929]PETSC ERROR: [929] KSPSolve_FGMRES line 274 /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c [929]PETSC ERROR: [929] KSPSolve_Private line 590 /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c On Mon, Aug 24, 2020 at 6:47 PM Mark Lohry wrote: > I don't think I do. Running a much smaller case with the same models I get > the attached report from valgrind --show-leak-kinds=all --leak-check=full > --track-origins=yes. I only see some HDF5 stuff and OpenMPI that I think > are false positives. > > ==1286950== Memcheck, a memory error detector > ==1286950== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. > ==1286950== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun > with -h for copyright info > ==1286950== Command: ./verification_testing > --gtest_filter=DrivenCavity3D.Re100_BackwardEulerILU1_16x16N2_Quadrature1 > --petsc_time_integrator=arkimex --petsc_arkimex_type=l2 > ==1286950== Parent PID: 1286932 > ==1286950== > --1286950-- > --1286950-- Valgrind options: > --1286950-- --show-leak-kinds=all > --1286950-- --leak-check=full > --1286950-- --track-origins=yes > --1286950-- --log-file=valgrind-out.txt > --1286950-- -v > --1286950-- Contents of /proc/version: > --1286950-- Linux version 5.4.0-29-generic (buildd at lgw01-amd64-035) > (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #33-Ubuntu SMP Wed Apr 29 > 14:32:27 UTC 2020 > --1286950-- > --1286950-- Arch and hwcaps: AMD64, LittleEndian, > amd64-cx16-rdtscp-sse3-ssse3-avx > --1286950-- Page sizes: currently 4096, max supported 4096 > --1286950-- Valgrind library directory: /usr/lib/x86_64-linux-gnu/valgrind > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/verification_testing > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ld-2.31.so > --1286950-- Considering /usr/lib/x86_64-linux-gnu/ld-2.31.so .. > --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) > --1286950-- Considering /lib/x86_64-linux-gnu/ld-2.31.so .. > --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) > --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.31.so > .. > --1286950-- .. 
CRC is valid > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux > --1286950-- object doesn't have a symbol table > --1286950-- object doesn't have a dynamic symbol table > --1286950-- Scheduler: using generic scheduler lock implementation. > --1286950-- Reading suppressions file: > /usr/lib/x86_64-linux-gnu/valgrind/default.supp > ==1286950== embedded gdbserver: reading from > /tmp/vgdb-pipe-from-vgdb-to-1286950-by-mlohry-on-??? > ==1286950== embedded gdbserver: writing to > /tmp/vgdb-pipe-to-vgdb-from-1286950-by-mlohry-on-??? > ==1286950== embedded gdbserver: shared mem > /tmp/vgdb-pipe-shared-mem-vgdb-1286950-by-mlohry-on-??? > ==1286950== > ==1286950== TO CONTROL THIS PROCESS USING vgdb (which you probably > ==1286950== don't want to do, unless you know exactly what you're doing, > ==1286950== or are doing some strange experiment): > ==1286950== /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb > --pid=1286950 ...command... > ==1286950== > ==1286950== TO DEBUG THIS PROCESS USING GDB: start GDB like this > ==1286950== /path/to/gdb ./verification_testing > ==1286950== and then give GDB the following command > ==1286950== target remote | > /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 > ==1286950== --pid is optional if only one valgrind process is running > ==1286950== > --1286950-- REDIR: 0x4022d80 (ld-linux-x86-64.so.2:strlen) redirected to > 0x580c9ce2 (???) > --1286950-- REDIR: 0x4022b50 (ld-linux-x86-64.so.2:index) redirected to > 0x580c9cfc (???) > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so > --1286950-- object doesn't have a symbol table > ==1286950== WARNING: new redirection conflicts with existing -- ignoring it > --1286950-- old: 0x04022d80 (strlen ) R-> (0000.0) > 0x580c9ce2 ??? 
> --1286950-- new: 0x04022d80 (strlen ) R-> (2007.0) > 0x0483f060 strlen > --1286950-- REDIR: 0x401f560 (ld-linux-x86-64.so.2:strcmp) redirected to > 0x483ffd0 (strcmp) > --1286950-- REDIR: 0x40232e0 (ld-linux-x86-64.so.2:mempcpy) redirected to > 0x4843a20 (mempcpy) > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/initialization/libinitialization.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/governing_equations/libgoverning_equations.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/time_stepping/libtime_stepping.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/governing_equations/libboundary_conditions.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/governing_equations/libsolution_monitors.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/governing_equations/libfluxtypes.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/algebraic_solvers/libalgebraic_solvers.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/program_options/libprogram_options.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_filesystem.so.1.73.0 > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0 > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so.40.20.1 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libpthread-2.31.so > --1286950-- Considering > /usr/lib/debug/.build-id/77/5cbbfff814456660786780b0b3b40096b4c05e.debug .. > --1286950-- .. build-id is valid > --1286948-- Reading syms from > /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libpetsc.so.3.13.3 > --1286937-- Reading syms from > /home/mlohry/dev/cmake-build/parallel/libparallel.so > --1286937-- Reading syms from > /home/mlohry/dev/cmake-build/logger/liblogger.so > --1286937-- Reading syms from > /home/mlohry/dev/cmake-build/spatial_discretization/libdiscretization.so > --1286945-- Reading syms from > /home/mlohry/dev/cmake-build/utils/libutils.so > --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 > --1286938-- object doesn't have a symbol table > --1286949-- Reading syms from /usr/lib/x86_64-linux-gnu/libm-2.31.so > --1286949-- Considering /usr/lib/x86_64-linux-gnu/libm-2.31.so .. > --1286947-- .. CRC mismatch (computed 327d785f wanted 751f5509) > --1286947-- Considering /lib/x86_64-linux-gnu/libm-2.31.so .. > --1286938-- .. CRC mismatch (computed 327d785f wanted 751f5509) > --1286937-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libm-2.31.so > .. > --1286950-- .. CRC is valid > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libc-2.31.so > --1286950-- Considering /usr/lib/x86_64-linux-gnu/libc-2.31.so .. > --1286951-- .. CRC mismatch (computed a6f43087 wanted 6555436e) > --1286951-- Considering /lib/x86_64-linux-gnu/libc-2.31.so .. > --1286947-- .. CRC mismatch (computed a6f43087 wanted 6555436e) > --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so > .. > --1286950-- .. 
CRC is valid > --1286940-- Reading syms from > /home/mlohry/dev/cmake-build/file_io/libfileio.so > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_program_options.so.1.73.0 > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_serialization.so.1.73.0 > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libhwloc.so.15.1.0 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libsuperlu_dist.so.6.3.0 > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 > --1286950-- object doesn't have a symbol table > --1286937-- Reading syms from > /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 > --1286937-- object doesn't have a symbol table > --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0 > --1286939-- object doesn't have a symbol table > --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libdl-2.31.so > --1286947-- Considering /usr/lib/x86_64-linux-gnu/libdl-2.31.so .. > --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) > --1286947-- Considering /lib/x86_64-linux-gnu/libdl-2.31.so .. > --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) > --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ > libdl-2.31.so .. > --1286947-- .. CRC is valid > --1286937-- Reading syms from > /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libmetis.so > --1286937-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0 > --1286942-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log_setup.so.1.73.0 > --1286942-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0 > --1286942-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_regex.so.1.73.0 > --1286949-- Reading syms from > /home/mlohry/dev/cmake-build/basis_functions/libbasis_functions.so > --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 > --1286944-- object doesn't have a symbol table > --1286951-- Reading syms from > /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so > --1286951-- object doesn't have a symbol table > --1286943-- Reading syms from > /home/mlohry/dev/cmake-build/external_install/lib/libhdf5.so.103.1.0 > --1286951-- Reading syms from > /home/mlohry/dev/cmake-build/external/tinyxml2-build/libtinyxml2.so.6.1.0 > --1286944-- Reading syms from > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_iostreams.so.1.73.0 > --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libz.so.1.2.11 > --1286944-- object doesn't have a symbol table > --1286951-- Reading syms from > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0 > --1286951-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libutil-2.31.so > --1286946-- Considering /usr/lib/x86_64-linux-gnu/libutil-2.31.so .. > --1286946-- .. CRC mismatch (computed 4639aba5 wanted ceb246b4) > --1286946-- Considering /lib/x86_64-linux-gnu/libutil-2.31.so .. > --1286946-- .. 
CRC mismatch (computed 4639aba5 wanted ceb246b4) > --1286948-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ > libutil-2.31.so .. > --1286939-- .. CRC is valid > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libudev.so.1.6.17 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.1 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/librt-2.31.so > --1286950-- Considering /usr/lib/x86_64-linux-gnu/librt-2.31.so .. > --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) > --1286950-- Considering /lib/x86_64-linux-gnu/librt-2.31.so .. > --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) > --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ > librt-2.31.so .. > --1286950-- .. CRC is valid > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/libquadmath.so.0.0.0 > --1286950-- object doesn't have a symbol table > --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 > --1286945-- Considering /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 .. > --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) > --1286945-- Considering /lib/x86_64-linux-gnu/libXau.so.6.0.0 .. > --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) > --1286945-- object doesn't have a symbol table > --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0 > --1286942-- object doesn't have a symbol table > --1286942-- Reading syms from /usr/lib/x86_64-linux-gnu/libbsd.so.0.10.0 > --1286942-- object doesn't have a symbol table > --1286950-- REDIR: 0x6516600 (libc.so.6:memmove) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515900 (libc.so.6:strncpy) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516930 (libc.so.6:strcasecmp) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515220 (libc.so.6:strcat) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515960 (libc.so.6:rindex) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6517dd0 (libc.so.6:rawmemchr) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6532e60 (libc.so.6:wmemchr) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65329a0 (libc.so.6:wcscmp) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516760 (libc.so.6:mempcpy) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516590 (libc.so.6:bcmp) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515890 (libc.so.6:strncmp) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65152d0 (libc.so.6:strcmp) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65166c0 (libc.so.6:memset) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6532960 (libc.so.6:wcschr) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65157f0 (libc.so.6:strnlen) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65153b0 
(libc.so.6:strcspn) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516980 (libc.so.6:strncasecmp) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515350 (libc.so.6:strcpy) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516ad0 (libc.so.6:memcpy@@GLIBC_2.14) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65340d0 (libc.so.6:wcsnlen) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65329e0 (libc.so.6:wcscpy) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65159a0 (libc.so.6:strpbrk) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515280 (libc.so.6:index) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65157b0 (libc.so.6:strlen) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x651ed20 (libc.so.6:memrchr) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65169d0 (libc.so.6:strcasecmp_l) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516550 (libc.so.6:memchr) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6532ab0 (libc.so.6:wcslen) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515c60 (libc.so.6:strspn) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65168d0 (libc.so.6:stpncpy) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516870 (libc.so.6:stpcpy) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6517e10 (libc.so.6:strchrnul) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516a20 (libc.so.6:strncasecmp_l) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516470 (libc.so.6:strstr) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65a3750 (libc.so.6:__memcpy_chk) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > --1286938-- REDIR: 0x6527a30 (libc.so.6:__strrchr_sse2) redirected to > 0x483ea70 (__strrchr_sse2) > --1286938-- REDIR: 0x6511c90 (libc.so.6:calloc) redirected to 0x483dce0 > (calloc) > --1286938-- REDIR: 0x6510260 (libc.so.6:malloc) redirected to 0x483b780 > (malloc) > --1286938-- REDIR: 0x6531c40 (libc.so.6:memcpy at GLIBC_2.2.5) redirected to > 0x4840100 (memcpy at GLIBC_2.2.5) > --1286938-- REDIR: 0x6527d30 (libc.so.6:__strlen_sse2) redirected to > 0x483efa0 (__strlen_sse2) > --1286938-- REDIR: 0x65f4ac0 (libc.so.6:__strncmp_sse42) redirected to > 0x483f7c0 (__strncmp_sse42) > --1286938-- REDIR: 0x6510850 (libc.so.6:free) redirected to 0x483c9d0 > (free) > --1286938-- REDIR: 0x6532070 (libc.so.6:__memset_sse2_unaligned) > redirected to 0x48428e0 (memset) > --1286938-- REDIR: 0x6603350 (libc.so.6:__memcmp_sse4_1) redirected to > 0x4842150 (__memcmp_sse4_1) > --1286938-- REDIR: 0x6520520 (libc.so.6:__strcmp_sse2_unaligned) > redirected to 0x483fed0 (strcmp) > --1286938-- REDIR: 0x61d0c10 (libstdc++.so.6:operator new(unsigned long)) > redirected to 0x483bdf0 (operator new(unsigned long)) > --1286938-- REDIR: 0x61cee60 (libstdc++.so.6:operator delete(void*)) > redirected to 0x483cf50 (operator delete(void*)) > --1286938-- REDIR: 0x61d0c70 (libstdc++.so.6:operator new[](unsigned > long)) redirected to 0x483c510 (operator new[](unsigned long)) > --1286938-- REDIR: 0x61cee90 (libstdc++.so.6:operator delete[](void*)) > redirected to 0x483d6e0 (operator delete[](void*)) > --1286938-- REDIR: 0x65275f0 (libc.so.6:__strchr_sse2) redirected to > 0x483eb90 
(__strchr_sse2) > --1286950-- REDIR: 0x6511000 (libc.so.6:realloc) redirected to 0x483df30 > (realloc) > --1286950-- REDIR: 0x6527820 (libc.so.6:__strchrnul_sse2) redirected to > 0x4843540 (strchrnul) > --1286950-- REDIR: 0x6531560 (libc.so.6:__strstr_sse2_unaligned) > redirected to 0x4843c20 (strstr) > --1286950-- REDIR: 0x6531c20 (libc.so.6:__mempcpy_sse2_unaligned) > redirected to 0x4843660 (mempcpy) > --1286950-- REDIR: 0x652d2a0 (libc.so.6:__strncpy_sse2_unaligned) > redirected to 0x483f560 (__strncpy_sse2_unaligned) > --1286950-- REDIR: 0x6515830 (libc.so.6:strncat) redirected to 0x48331d0 > (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65305b0 (libc.so.6:__strncat_sse2_unaligned) > redirected to 0x483ede0 (strncat) > --1286950-- REDIR: 0x6516120 (libc.so.6:__GI_strstr) redirected to > 0x4843ca0 (__strstr_sse2) > --1286950-- REDIR: 0x6522360 (libc.so.6:__rawmemchr_sse2) redirected to > 0x4843580 (rawmemchr) > --1286950-- REDIR: 0x65faea0 (libc.so.6:__strcasecmp_avx) redirected to > 0x483f830 (strcasecmp) > --1286950-- REDIR: 0x65fc520 (libc.so.6:__strncasecmp_avx) redirected to > 0x483f910 (strncasecmp) > --1286950-- REDIR: 0x65f98a0 (libc.so.6:__strspn_sse42) redirected to > 0x4843ef0 (strspn) > --1286950-- REDIR: 0x65f9620 (libc.so.6:__strcspn_sse42) redirected to > 0x4843e10 (strcspn) > --1286948-- REDIR: 0x6522030 (libc.so.6:__memchr_sse2) redirected to > 0x4840050 (memchr) > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so > --1286948-- object doesn't have a symbol table > --1286948-- Discarding syms at 0x4a96240-0x4a96d47 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so > (have_dinfo 1) > --1286948-- Discarding syms at 0x4a9b1c0-0x4a9b937 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so > (have_dinfo 1) > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 > --1286948-- object doesn't have a symbol table > --1286948-- Discarding syms at 0x4a96120-0x4a966b0 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so > (have_dinfo 1) > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so > --1286948-- object doesn't have a symbol table > --1286948-- REDIR: 0x64bc670 (libc.so.6:setenv) redirected to 0x4844480 > (setenv) > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so > --1286948-- object doesn't have a symbol table > 
--1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 > --1286948-- object doesn't have a symbol table > --1286948-- Discarding syms at 0x8d053e0-0x8d07391 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so (have_dinfo > 1) > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so > --1286948-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so > --1286950-- object doesn't have a symbol table > --1286950-- Discarding syms at 0x8d04180-0x8d045b0 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so (have_dinfo 1) > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so > --1286950-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so > --1286946-- object doesn't have a symbol table > --1286946-- 
Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9ebf0a0-0x9ebf490 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so > (have_dinfo 1) > --1286946-- Discarding syms at 0x9eca300-0x9ecbee8 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so > (have_dinfo 1) > --1286946-- Discarding syms at 0x9ed1220-0x9ed24e7 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so (have_dinfo > 1) > --1286946-- Discarding syms at 0x9ed8240-0x9ed8c88 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so > (have_dinfo 1) > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9ebf0e0-0x9ebf417 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so > (have_dinfo 1) > --1286946-- Discarding syms at 0x9ecf320-0x9ed1239 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so > (have_dinfo 1) > --1286946-- Discarding syms at 0x9ed73a0-0x9ed9ccc in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so > (have_dinfo 1) > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so > --1286936-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 > --1286946-- object doesn't have a symbol table > --1286946-- 
Reading syms from > /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so > --1286946-- object doesn't have a symbol table > --1286946-- REDIR: 0x652cc70 (libc.so.6:__strcpy_sse2_unaligned) > redirected to 0x483f090 (strcpy) > --1286946-- REDIR: 0x65a3810 (libc.so.6:__memmove_chk) redirected to > 0x48331d0 (_vgnU_ifunc_wrapper) > ==1286946== WARNING: new redirection conflicts with existing -- ignoring it > --1286946-- old: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2030.0) > 0x04843b10 __memcpy_chk > --1286946-- new: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2024.0) > 0x048434d0 __memmove_chk > --1286946-- REDIR: 0x6531c30 (libc.so.6:__memcpy_chk_sse2_unaligned) > redirected to 0x4843b10 (__memcpy_chk) > --1286946-- REDIR: 0x65129b0 (libc.so.6:posix_memalign) redirected to > 0x483e1e0 (posix_memalign) > --1286946-- Discarding syms at 0x9f15280-0x9f32932 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so > (have_dinfo 1) > --1286946-- Discarding syms at 0x9f7c4c0-0x9f7ded8 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 > (have_dinfo 1) > --1286946-- Discarding syms at 0x9f620c0-0x9f71483 in > /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f9ba10-0x9fd22ee in > /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_monitoring.so.50.10.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9f4d400-0x9f50c19 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so > (have_dinfo 1) > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/libpsm1/libpsm_infinipath.so.1.16 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 > --1286946-- object doesn't have a 
symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 > --1286946-- object doesn't have a symbol table > --1286946-- REDIR: 0x6517140 (libc.so.6:strcasestr) redirected to > 0x4843f80 (strcasestr) > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9f4d5c0-0x9f4f5a1 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so (have_dinfo 1) > --1286946-- Discarding syms at 0x9fee680-0x9ff096c in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so (have_dinfo > 1) > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_monitoring.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9f724a0-0x9f787b5 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so (have_dinfo 1) > --1286946-- Discarding syms at 0xa827f80-0xa8e14c4 in > /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f94830-0x9fbafce in > /usr/lib/libpsm1/libpsm_infinipath.so.1.16 (have_dinfo 1) > --1286946-- Discarding syms at 0x9fe5580-0x9fe8f71 in > /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f56420-0x9f5cec0 in > /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 (have_dinfo 1) > --1286946-- Discarding syms at 0xa929f10-0xa93d5fc in > /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 (have_dinfo 1) > --1286946-- Discarding syms at 0xa94b0c0-0xa95a483 in > /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 
1)
> --1286946-- Discarding syms at 0xa968860-0xa9adf12 in /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 (have_dinfo 1)
> --1286946-- Discarding syms at 0xa9e7a10-0xaa1e2ee in /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1)
> --1286946-- Discarding syms at 0x9f80410-0x9f84e27 in /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 (have_dinfo 1)
> --1286946-- Discarding syms at 0x9f103e0-0x9f15fd5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so (have_dinfo 1)
> --1286946-- Discarding syms at 0x9f471e0-0x9f47ce0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so (have_dinfo 1)
> ==1286946== Thread 3:
> ==1286946== Syscall param writev(vector[...]) points to uninitialised byte(s)
> ==1286946== at 0x658A48D: __writev (writev.c:26)
> ==1286946== by 0x658A48D: writev (writev.c:24)
> ==1286946== by 0x8DF9B4C: pmix_ptl_base_send_handler (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25)
> ==1286946== by 0x7CC413E: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0)
> ==1286946== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0)
> ==1286946== by 0x8DBDD55: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25)
> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477)
> ==1286946== by 0x6595102: clone (clone.S:95)
> ==1286946== Address 0xa28fdcf is 127 bytes inside a block of size 5,120 alloc'd
> ==1286946== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==1286946== by 0x8DE155A: pmix_bfrop_buffer_extend (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25)
> ==1286946== by 0x8DE3F4A: pmix_bfrops_base_pack_byte (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25)
> ==1286946== by 0x8DE4900: pmix_bfrops_base_pack_buf (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25)
> ==1286946== by 0x8DE4175: pmix_bfrops_base_pack (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25)
> ==1286946== by 0x8D7CF91: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25)
> ==1286946== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0)
> ==1286946== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0)
> ==1286946== by 0x8DBDD55: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25)
> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477)
> ==1286946== by 0x6595102: clone (clone.S:95)
> ==1286946== Uninitialised value was created by a stack allocation
> ==1286946== at 0x9F048D6: ???
(in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so) > ==1286946== > --1286944-- Discarding syms at 0xaa4d220-0xaa5796a in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at > 0xaa4d220---1286948-- Discarding syms at 0xaae1100-0xaae7d70 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at > 0xaae1100-0xaae7d70 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so > (have_dinfo 1) > --1286945-- Discarding syms at 0x9f69420-0x9f--1286938-- REDIR: 0x61cee70 > (libstdc++.so.6:operator delete(void*, unsigned long)) redirected to > --1286937-- REDIR: 0x61cee70 (libstdc++.so.6:opera--1286946-- REDIR: > 0x652e970 (libc.so.6:__stpncpy_sse2_unaligned) redirected to 0x48427e0 > (stpncpy) > --1286942-- REDIR: 0x6527ed0 (libc.so.6:__strnlen_sse2) redirected to > 0x483eee0 (strnlen) > --1286944-- REDIR: 0x652fcc0 (libc.so.6:__strcat_sse2_unaligned) > redirected to 0x483ec20 (strcat) > --1286951-- REDIR: 0x65113d0 (libc.so.6:memalign) redirected to 0x483e2a0 > (memalign) > --1286951-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so > --1286951-- object doesn't have a symbol table > --1286951-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so > --1286951-- object doesn't have a symbol table > --1286941-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 > --1286941-- object doesn't have a symbol table > --1286951-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so > --1286951-- object doesn't have a symbol table > --1286939-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so > --1286939-- object doesn't have a symbol table > --1286943-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so > --1286943-- object doesn't have a symbol table > --1286943-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so > --1286943-- object doesn't have a symbol table > --1286943-- Reading syms from > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so > --1286943-- object doesn't have a symbol table > --1286938-- REDIR: 0x65a3b00 (libc.so.6:__strcpy_chk) redirected to > 0x48435c0 (__strcpy_chk) > --1286939-- Discarding syms at 0x9f1d660-0x9f371d6 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9f5afa0-0x9f8f8b6 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9fa0640-0x9fa42d9 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so (have_dinfo > 1) > --1286939-- 
Discarding syms at 0x9f4c160-0x9f4dc58 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so > (have_dinfo 1) > --1286939-- Discarding syms at 0xa7fc270-0xa804f00 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9fee3a0-0x9ff134e in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so (have_dinfo 1) > --1286939-- Discarding syms at 0xa80a240-0xa80aa8d in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 > (have_dinfo 1) > --1286939-- Discarding syms at 0xa80f0e0-0xa80f8bb in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so (have_dinfo > 1) > --1286939-- Discarding syms at 0xaa460c0-0xaa47947 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so (have_dinfo > 1) > --1286939-- Discarding syms at 0xaa613e0-0xaa7730f in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so > (have_dinfo 1) > --1286939-- Discarding syms at 0xaa849c0-0xaa8a845 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9ee1320-0x9ee3567 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9eebc40-0x9ef4ad7 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9f02600-0x9f08cd8 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so (have_dinfo > 1) > --1286939-- Discarding syms at 0x9f40200-0x9f4126e in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so (have_dinfo > 1) > --1286939-- Discarding syms at 0x9eda4e0-0x9edb4c5 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9ed32c0-0x9ed4afe in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9ebf160-0x9ebfe95 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9ece140-0x9ecebed in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9ec92a0-0x9ec9aa2 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x8eae0e0-0x8eae4a7 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8eb3220-0x8eb3c27 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8eb80e0-0x8eb90b7 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8ea6380-0x8ea97b3 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e5a740-0x8e5f859 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e67be0-0x8e743f0 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so (have_dinfo 1) > --1286939-- Discarding syms at 0x84da200-0x84daa5d in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8d322b0-0x8d34bfc in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e29480-0x8e3b70a in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so (have_dinfo 1) > 
--1286939-- Discarding syms at 0x8d3c2b0-0x8d3ed5c in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e45340-0x8e502da in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e901a0-0x8e908a7 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8d05520-0x8d06783 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e7b460-0x8e8aaa4 in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8d44520-0x8d4556a in > /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e97600-0x8ea0fa1 in > /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 > (have_dinfo 1) > --1286939-- Discarding syms at 0x8d109c0-0x8d27dcf in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x8d5b280-0x8dfdffb in > /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 (have_dinfo 1) > --1286939-- Discarding syms at 0x9ec40a0-0x9ec4490 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so (have_dinfo > 1) > --1286939-- Discarding syms at 0x84d2580-0x84d518f in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so (have_dinfo 1) > --1286939-- Discarding syms at 0x4a96120-0x4a9644f in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x4aa0100-0x4aa03e7 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x84c74a0-0x84c901f in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x4aa5260-0x4aa58e9 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x4a9b420-0x4a9bcdf in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x84e7460-0x84f52ca in > /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 (have_dinfo 1) > --1286939-- Discarding syms at 0x4a90360-0x4a91107 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9f46220-0x9f474cc in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9f0f180-0x9f0f78d in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so (have_dinfo 1) > --1286939-- Discarding syms at 0xaa94540-0xaa96a4a in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so (have_dinfo 1) > --1286939-- Discarding syms at 0xaa9f6c0-0xaab44d0 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so (have_dinfo > 1) > --1286939-- Discarding syms at 0xaabe820-0xaad8ee0 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so (have_dinfo > 1) > --1286939-- Discarding syms at 0x9efc080-0x9efc1e1 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9fab2a0-0x9fb1341 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9f140c0-0x9f14299 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so > (have_dinfo 1) > --1286939-- 
Discarding syms at 0x9fb72a0-0x9fbb791 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9fd52a0-0x9fda794 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9fe02e0-0x9fe59a5 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so > (have_dinfo 1) > --1286939-- Discarding syms at 0xa815460-0xa8177ab in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so > (have_dinfo 1) > --1286939-- Discarding syms at 0xa81e260-0xa82033d in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so > (have_dinfo 1) > --1286939-- Discarding syms at 0xa8273e0-0xa8297d8 in > /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so > (have_dinfo 1) > --1286939-- Discarding syms at 0x9fc85e0-0x9fce8ef in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 > (have_dinfo 1) > ==1286939== > ==1286939== HEAP SUMMARY: > ==1286939== in use at exit: 74,054 bytes in 223 blocks > ==1286939== total heap usage: 22,405,782 allocs, 22,405,559 frees, > 34,062,479,959 bytes allocated > ==1286939== > ==1286939== Searching for pointers to 223 not-freed blocks > ==1286939== Checked 3,415,912 bytes > ==1286939== > ==1286939== Thread 1: > ==1286939== 1 bytes in 1 blocks are definitely lost in loss record 1 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x9F6A4B6: ??? > ==1286939== by 0x9F47373: ??? > ==1286939== by 0x68E3B9B: mca_base_framework_components_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4BA1734: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 8 bytes in 1 blocks are still reachable in loss record 2 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x764724C: ??? (in > /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) > ==1286939== by 0x7657B9A: ??? (in > /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) > ==1286939== by 0x7645679: ??? (in > /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) > ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) > ==1286939== by 0x4011C90: call_init (dl-init.c:30) > ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) > ==1286939== by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so) > ==1286939== by 0x3: ??? > ==1286939== by 0x1FFEFFF926: ??? > ==1286939== by 0x1FFEFFF93D: ??? > ==1286939== by 0x1FFEFFF987: ??? > ==1286939== by 0x1FFEFFF9A7: ??? 
> ==1286939== > ==1286939== 8 bytes in 1 blocks are definitely lost in loss record 3 of 44 > ==1286939== at 0x483DD99: calloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F69B6F: ??? > ==1286939== by 0x9F1CDED: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 13 bytes in 2 blocks are still reachable in loss record 4 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x7CC3657: event_config_avoid_method (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x68FEB5A: opal_event_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68FE8CA: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== > ==1286939== 15 bytes in 1 blocks are indirectly lost in loss record 5 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x9EDB189: ??? > ==1286939== by 0x68D98FC: mca_base_framework_components_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6907C25: ??? 
(in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4BA16D5: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 15 bytes in 1 blocks are definitely lost in loss record 6 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x9F5655C: ??? > ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) > ==1286939== by 0x4011C90: call_init (dl-init.c:30) > ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) > ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) > ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) > ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) > ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) > ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) > ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) > ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) > ==1286939== > ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 7 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F1CBEB: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 8 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F1CC66: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? 
> ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 9 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F1CCDA: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 25 bytes in 1 blocks are still reachable in loss record 10 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x68F27BD: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B956B6: ompi_pml_v_output_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B95259: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x68D98FC: mca_base_framework_components_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B93FAE: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4BA1734: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== > ==1286939== 30 bytes in 1 blocks are definitely lost in loss record 11 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0xA9A859B: ??? 
> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) > ==1286939== by 0x4011C90: call_init (dl-init.c:30) > ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) > ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) > ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) > ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) > ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) > ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) > ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) > ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) > ==1286939== by 0x72DEB58: _dlerror_run (dlerror.c:170) > ==1286939== > ==1286939== 32 bytes in 1 blocks are still reachable in loss record 12 of > 44 > ==1286939== at 0x483DD99: calloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CC353E: event_get_supported_methods (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x68FEA98: opal_event_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68FE8CA: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 13 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x84D2B0A: ??? > ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 14 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x84D2BCE: ??? 
> ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 15 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x84D2CB2: ??? > ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 16 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x84D2D91: ??? > ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 17 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E81BD8: ??? > ==1286939== by 0x8E89F4B: ??? > ==1286939== by 0x8D84A0D: ??? > ==1286939== by 0x8DF79C1: ??? > ==1286939== by 0x7CC3FDD: ??? (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC487E: event_base_loop (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x8DBDD55: ??? > ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286939== by 0x6595102: clone (clone.S:95) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 18 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== by 0x84D330E: ??? 
> ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 36 (32 direct, 4 indirect) bytes in 1 blocks are definitely > lost in loss record 19 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x4B94C09: mca_pml_base_pml_check_selected (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x9F1E1E1: ??? > ==1286939== by 0x4BA1A09: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 40 bytes in 1 blocks are still reachable in loss record 20 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CFF4B6: ??? (in > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x7CC5E26: event_global_setup_locks_ (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CFF68F: evthread_use_pthreads (in > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x68FE8E4: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== > ==1286939== 40 bytes in 1 blocks are still reachable in loss record 21 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CFF4B6: ??? (in > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x7CCF377: evsig_global_setup_locks_ (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC5E39: event_global_setup_locks_ (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CFF68F: evthread_use_pthreads (in > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x68FE8E4: ??? 
(in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== > ==1286939== 40 bytes in 1 blocks are still reachable in loss record 22 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CFF4B6: ??? (in > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x7CCB997: evutil_secure_rng_global_setup_locks_ (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC5E4F: event_global_setup_locks_ (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CFF68F: evthread_use_pthreads (in > /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x68FE8E4: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== > ==1286939== 48 bytes in 1 blocks are still reachable in loss record 23 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68D9043: mca_base_component_repository_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68D7F7A: mca_base_component_find (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B8560C: mca_io_base_file_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B0E68A: ompi_file_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B3ADB8: PMPI_File_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286939== > ==1286939== 48 bytes in 1 
blocks are still reachable in loss record 24 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68D9043: mca_base_component_repository_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68D7F7A: mca_base_component_find (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B85638: mca_io_base_file_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B0E68A: ompi_file_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B3ADB8: PMPI_File_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286939== > ==1286939== 48 bytes in 2 blocks are still reachable in loss record 25 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CC3647: event_config_avoid_method (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x68FEB5A: opal_event_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68FE8CA: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 55 (32 direct, 23 indirect) bytes in 1 blocks are definitely > lost in loss record 26 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? 
> ==1286939== by 0x4AF6CD6: ompi_comm_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA194D: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== > ==1286939== 56 bytes in 1 blocks are still reachable in loss record 27 of > 44 > ==1286939== at 0x483DD99: calloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CC1C86: event_config_new (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x68FEAC0: opal_event_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68FE8CA: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 28 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F6E008: ??? > ==1286939== by 0x9F7C654: ??? > ==1286939== by 0x9F1CD3E: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 29 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0xA957008: ??? > ==1286939== by 0xA86B017: ??? > ==1286939== by 0xA862FD8: ??? > ==1286939== by 0xA828E15: ??? > ==1286939== by 0xA829624: ??? > ==1286939== by 0x9F77910: ??? > ==1286939== by 0x4B85C53: ompi_mtl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x9F13E4D: ??? 
> ==1286939== by 0x4B94673: mca_pml_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1789: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 76 (32 direct, 44 indirect) bytes in 1 blocks are definitely > lost in loss record 30 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== by 0x84D387F: ??? > ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 79 (64 direct, 15 indirect) bytes in 1 blocks are definitely > lost in loss record 31 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9EDB12E: ??? > ==1286939== by 0x68D98FC: mca_base_framework_components_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6907C25: ??? (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4BA16D5: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 144 bytes in 3 blocks are still reachable in loss record 32 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68D9043: mca_base_component_repository_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68D7F7A: mca_base_component_find (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B8564E: mca_io_base_file_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B0E68A: ompi_file_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B3ADB8: PMPI_File_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286939== 
by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286939== > ==1286939== 231 bytes in 12 blocks are definitely lost in loss record 33 > of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x9F2B4B3: ??? > ==1286939== by 0x9F2B85C: ??? > ==1286939== by 0x9F2BBD7: ??? > ==1286939== by 0x9F1CAAC: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== > ==1286939== 240 bytes in 5 blocks are still reachable in loss record 34 of > 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68D9043: mca_base_component_repository_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68D7F7A: mca_base_component_find (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B85622: mca_io_base_file_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B0E68A: ompi_file_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B3ADB8: PMPI_File_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286939== > ==1286939== 272 bytes in 44 blocks are definitely lost in loss record 35 > of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9FCAEDB: ??? > ==1286939== by 0x9FE42B2: ??? > ==1286939== by 0x9FE47BB: ??? > ==1286939== by 0x9FCDDBF: ??? > ==1286939== by 0x9FA324A: ??? > ==1286939== by 0x4B3DD7F: PMPI_File_write_at_all (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) > ==1286939== by 0x78DF11D: H5FD_write (H5FDint.c:257) > ==1286939== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) > ==1286939== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) > ==1286939== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) > ==1286939== > ==1286939== 585 (480 direct, 105 indirect) bytes in 15 blocks are > definitely lost in loss record 36 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? 
> ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== by 0x4B14036: ompi_proc_complete_init_single (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B146C3: ompi_proc_complete_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA19A9: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 776 bytes in 32 blocks are indirectly lost in loss record 37 > of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8DE9816: ??? > ==1286939== by 0x8DEB1D2: ??? > ==1286939== by 0x8DEB49A: ??? > ==1286939== by 0x8DE8B12: ??? > ==1286939== by 0x8E9D492: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== > ==1286939== 840 (480 direct, 360 indirect) bytes in 15 blocks are > definitely lost in loss record 38 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x9EF2F00: ??? > ==1286939== by 0x9EEBF17: ??? > ==1286939== by 0x9EE2F54: ??? > ==1286939== by 0x9F1E1FB: ??? > ==1286939== > ==1286939== 1,084 (480 direct, 604 indirect) bytes in 15 blocks are > definitely lost in loss record 39 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== by 0x84D4800: ??? > ==1286939== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 1,344 bytes in 1 blocks are definitely lost in loss record 40 > of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68AE702: opal_free_list_grow_st (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9F1CD2D: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? 
> ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 41 > of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68AE702: opal_free_list_grow_st (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9F1CC50: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 42 > of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68AE702: opal_free_list_grow_st (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9F1CCC4: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 62,644 bytes in 31 blocks are indirectly lost in loss record > 43 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8DE9FA8: ??? > ==1286939== by 0x8DEB032: ??? > ==1286939== by 0x8DEB49A: ??? > ==1286939== by 0x8DE8B12: ??? > ==1286939== by 0x8E9D492: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? 
> ==1286939== > ==1286939== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are > definitely lost in loss record 44 of 44 > ==1286939== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x9F0398A: ??? > ==1286939== by 0x9EE2F54: ??? > ==1286939== by 0x9F1E1FB: ??? > ==1286939== by 0x4BA1A09: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== LEAK SUMMARY: > ==1286939== definitely lost: 9,837 bytes in 138 blocks > ==1286939== indirectly lost: 63,435 bytes in 64 blocks > ==1286939== possibly lost: 0 bytes in 0 blocks > ==1286939== still reachable: 782 bytes in 21 blocks > ==1286939== suppressed: 0 bytes in 0 blocks > ==1286939== > ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from > 0) > ==1286939== > ==1286939== 1 errors in context 1 of 29: > ==1286939== Thread 3: > ==1286939== Syscall param writev(vector[...]) points to uninitialised > byte(s) > ==1286939== at 0x658A48D: __writev (writev.c:26) > ==1286939== by 0x658A48D: writev (writev.c:24) > ==1286939== by 0x8DF9B4C: ??? > ==1286939== by 0x7CC413E: ??? (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC487E: event_base_loop (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x8DBDD55: ??? > ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286939== by 0x6595102: clone (clone.S:95) > ==1286939== Address 0xa28ee1f is 127 bytes inside a block of size 5,120 > alloc'd > ==1286939== at 0x483DFAF: realloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8DE155A: ??? > ==1286939== by 0x8DE3F4A: ??? > ==1286939== by 0x8DE4900: ??? > ==1286939== by 0x8DE4175: ??? > ==1286939== by 0x8D7CF91: ??? > ==1286939== by 0x7CC3FDD: ??? (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC487E: event_base_loop (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x8DBDD55: ??? > ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286939== by 0x6595102: clone (clone.S:95) > ==1286939== Uninitialised value was created by a stack allocation > ==1286939== at 0x9F048D6: ??? > ==1286939== > ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from > 0) > mpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x4B85622: mca_io_base_file_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B0E68A: ompi_file_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B3ADB8: PMPI_File_open (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286936== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286936== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286936== > ==1286936== 272 bytes in 44 blocks are definitely lost in loss record 39 > of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x9FCAEDB: ??? > ==1286936== by 0x9FE42B2: ??? > ==1286936== by 0x9FE47BB: ??? > ==1286936== by 0x9FCDDBF: ??? > ==1286936== by 0x9FA324A: ??? 
> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) > ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) > ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) > ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) > ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) > ==1286936== > ==1286936== 312 bytes in 1 blocks are still reachable in loss record 40 of > 49 > ==1286936== at 0x483BE63: operator new(unsigned long) (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x74E78EB: boost::detail::make_external_thread_data() > (in > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) > ==1286936== by 0x74E7C74: > boost::detail::add_thread_exit_function(boost::detail::thread_exit_function_base*) > (in > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) > ==1286936== by 0x73AFCEA: > boost::log::v2_mt_posix::sources::aux::get_severity_level() (in > /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0) > ==1286936== by 0x5F71A6C: set_value (severity_feature.hpp:135) > ==1286936== by 0x5F71A6C: > open_record_unlocked const boost::log::v2_mt_posix::trivial::severity_level> > > > (severity_feature.hpp:252) > ==1286936== by 0x5F71A6C: > open_record const boost::log::v2_mt_posix::trivial::severity_level> > > > (basic_logger.hpp:459) > ==1286936== by 0x5F71A6C: > Logger::TraceMessage(std::__cxx11::basic_string std::char_traits, std::allocator >) (logger.cpp:328) > ==1286936== by 0x5F729C7: > Logger::Message(std::__cxx11::basic_string, > std::allocator > const&, LogLevel) (logger.cpp:280) > ==1286936== by 0x5F73CF1: > Logger::Timer::Timer(std::__cxx11::basic_string std::char_traits, std::allocator > const&, LogLevel) > (logger.cpp:426) > ==1286936== by 0x15718A: timer (logger.hpp:98) > ==1286936== by 0x15718A: main (testing_main.cpp:9) > ==1286936== > ==1286936== 585 (480 direct, 105 indirect) bytes in 15 blocks are > definitely lost in loss record 41 of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8E9D3EB: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A767: ??? > ==1286936== by 0x4B14036: ompi_proc_complete_init_single (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B146C3: ompi_proc_complete_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4BA19A9: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== > ==1286936== 776 bytes in 32 blocks are indirectly lost in loss record 42 > of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8DE9816: ??? > ==1286936== by 0x8DEB1D2: ??? > ==1286936== by 0x8DEB49A: ??? > ==1286936== by 0x8DE8B12: ??? > ==1286936== by 0x8E9D492: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A767: ??? 
> ==1286936== > ==1286936== 840 (480 direct, 360 indirect) bytes in 15 blocks are > definitely lost in loss record 43 of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8E9D3EB: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A5EB: ??? > ==1286936== by 0x9EF2F00: ??? > ==1286936== by 0x9EEBF17: ??? > ==1286936== by 0x9EE2F54: ??? > ==1286936== by 0x9F1E1FB: ??? > ==1286936== > ==1286936== 1,091 (480 direct, 611 indirect) bytes in 15 blocks are > definitely lost in loss record 44 of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8E9D3EB: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A767: ??? > ==1286936== by 0x84D4800: ??? > ==1286936== by 0x68602FB: orte_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286936== by 0x4BA1322: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== > ==1286936== 1,344 bytes in 1 blocks are definitely lost in loss record 45 > of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x68AE702: opal_free_list_grow_st (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9F1CD2D: ??? > ==1286936== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9EE3527: ??? > ==1286936== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286936== by 0x15710D: main (testing_main.cpp:8) > ==1286936== > ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 46 > of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x68AE702: opal_free_list_grow_st (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9F1CC50: ??? > ==1286936== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9EE3527: ??? 
> ==1286936== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286936== by 0x15710D: main (testing_main.cpp:8) > ==1286936== > ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 47 > of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x68AE702: opal_free_list_grow_st (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9F1CCC4: ??? > ==1286936== by 0x68FC9C8: mca_btl_base_select (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9EE3527: ??? > ==1286936== by 0x4B6170A: mca_bml_base_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4BA1714: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) > (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286936== by 0x15710D: main (testing_main.cpp:8) > ==1286936== > ==1286936== 62,640 bytes in 30 blocks are indirectly lost in loss record > 48 of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8DE9FA8: ??? > ==1286936== by 0x8DEB032: ??? > ==1286936== by 0x8DEB49A: ??? > ==1286936== by 0x8DE8B12: ??? > ==1286936== by 0x8E9D492: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A5EB: ??? > ==1286936== > ==1286936== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are > definitely lost in loss record 49 of 49 > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8E9D3EB: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A5EB: ??? > ==1286936== by 0x9F0398A: ??? > ==1286936== by 0x9EE2F54: ??? > ==1286936== by 0x9F1E1FB: ??? 
> ==1286936== by 0x4BA1A09: ompi_mpi_init (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== > ==1286936== LEAK SUMMARY: > ==1286936== definitely lost: 9,805 bytes in 137 blocks > ==1286936== indirectly lost: 63,431 bytes in 63 blocks > ==1286936== possibly lost: 0 bytes in 0 blocks > ==1286936== still reachable: 1,174 bytes in 27 blocks > ==1286936== suppressed: 0 bytes in 0 blocks > ==1286936== > ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from > 0) > ==1286936== > ==1286936== 1 errors in context 1 of 29: > ==1286936== Thread 3: > ==1286936== Syscall param writev(vector[...]) points to uninitialised > byte(s) > ==1286936== at 0x658A48D: __writev (writev.c:26) > ==1286936== by 0x658A48D: writev (writev.c:24) > ==1286936== by 0x8DF9B4C: ??? > ==1286936== by 0x7CC413E: ??? (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286936== by 0x7CC487E: event_base_loop (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286936== by 0x8DBDD55: ??? > ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286936== by 0x6595102: clone (clone.S:95) > ==1286936== Address 0xa290cbf is 127 bytes inside a block of size 5,120 > alloc'd > ==1286936== at 0x483DFAF: realloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8DE155A: ??? > ==1286936== by 0x8DE3F4A: ??? > ==1286936== by 0x8DE4900: ??? > ==1286936== by 0x8DE4175: ??? > ==1286936== by 0x8D7CF91: ??? > ==1286936== by 0x7CC3FDD: ??? (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286936== by 0x7CC487E: event_base_loop (in > /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286936== by 0x8DBDD55: ??? > ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286936== by 0x6595102: clone (clone.S:95) > ==1286936== Uninitialised value was created by a stack allocation > ==1286936== at 0x9F048D6: ??? > ==1286936== > ==1286936== > ==1286936== 6 errors in context 2 of 29: > ==1286936== Thread 1: > ==1286936== Syscall param pwritev(vector[...]) points to uninitialised > byte(s) > ==1286936== at 0x658A608: pwritev64 (pwritev64.c:30) > ==1286936== by 0x658A608: pwritev (pwritev64.c:28) > ==1286936== by 0x9F46E25: ??? > ==1286936== by 0x9FCE33B: ??? > ==1286936== by 0x9FCDDBF: ??? > ==1286936== by 0x9FA324A: ??? 
> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in > /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) > ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) > ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) > ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) > ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) > ==1286936== by 0x7B5ED15: H5C__collective_write (H5Cmpio.c:1020) > ==1286936== by 0x7B5ED15: H5C_apply_candidate_list (H5Cmpio.c:394) > ==1286936== Address 0xedf91b0 is 96 bytes inside a block of size 216 > alloc'd > ==1286936== at 0x483B7F3: malloc (in > /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:292) > ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:267) > ==1286936== by 0x77FC8FF: H5C__flush_single_entry (H5C.c:6045) > ==1286936== by 0x7B5DC7E: H5C__flush_candidates_in_ring (H5Cmpio.c:1371) > ==1286936== by 0x7B5DC7E: H5C__flush_candidate_entries (H5Cmpio.c:1192) > ==1286936== by 0x7B5DC7E: H5C_apply_candidate_list (H5Cmpio.c:385) > ==1286936== by 0x7B5BA18: H5AC__rsp__dist_md_write__flush > (H5ACmpio.c:1709) > ==1286936== by 0x7B5BA18: H5AC__run_sync_point (H5ACmpio.c:2164) > ==1286936== by 0x7B5C9D2: H5AC__flush_entries (H5ACmpio.c:2307) > ==1286936== by 0x77C95E4: H5AC_flush (H5AC.c:681) > ==1286936== by 0x78B306A: H5F__flush_phase2 (H5Fint.c:1831) > ==1286936== by 0x78B5D7A: H5F__dest (H5Fint.c:1152) > ==1286936== by 0x78B6603: H5F_try_close (H5Fint.c:2180) > ==1286936== by 0x78B69F5: H5F__close_cb (H5Fint.c:2009) > ==1286936== by 0x7965797: H5I_dec_ref (H5I.c:1254) > ==1286936== Uninitialised value was created by a stack allocation > ==1286936== at 0x7695AF0: ??? (in > /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so) > ==1286936== > ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from > 0) > > On Mon, Aug 24, 2020 at 5:00 PM Jed Brown wrote: > >> Do you potentially have a memory or other resource leak? SIGBUS would be >> an odd result, but the symptom of crashing after running for a long time >> sometimes fits with a resource leak. >> >> Mark Lohry writes: >> >> > I queued up some jobs with Barry's patch, so we'll see. >> > >> > Re Jed's suggestion at checkpointing, I don't *think* this is something >> > coming from the state of the solution -- running from the same point I'm >> > seeing it crash anywhere between 1 hour and 20 hours in. I'll increase >> my >> > file save frequency in case I'm wrong there though. >> > >> > My intel build with different blas just made it through a 6 hour time >> slot >> > without crash, whereas yesterday the same thing crashed after 3 hours. >> But >> > given the randomness so far I'd bet that's just dumb luck. >> > >> > On Mon, Aug 24, 2020 at 4:22 PM Barry Smith wrote: >> > >> >> >> >> >> >> > On Aug 24, 2020, at 2:34 PM, Jed Brown wrote: >> >> > >> >> > I'm thinking of something such as writing floating point data into >> the >> >> return address, which would be unaligned/garbage. >> >> >> >> Ok, my patch will detect this. This is what I was talking about, >> messing >> >> up the BLAS arguments which are the addresses of arrays. >> >> >> >> Valgrind is by far the preferred approach. >> >> >> >> Barry >> >> >> >> Another feature we could add to the malloc checking is when a SEGV or >> >> BUS error is encountered and we catch it we should run the >> >> PetscMallocVerify() and check our memory for corruption reporting any >> we >> >> find. 
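In case it helps while that feature is being considered: assuming the machinery meant here is the existing PetscMallocValidate()/CHKMEMQ checks, they can already be triggered by hand from user code to narrow down where an overwrite first appears. A rough sketch (run with -malloc_debug so the tracing allocator is active; note it only validates memory obtained through PetscMalloc, not arbitrary buffers):

#include <petscksp.h>

/* Sketch only: bracket the suspect solve with explicit heap validation.
   With -malloc_debug, CHKMEMQ expands to a PetscMallocValidate() call that
   walks PETSc's tracked allocations and errors out at this line if any of
   the guard words have been overwritten, localizing heap corruption well
   before the eventual SIGBUS/SIGSEGV deep inside a BLAS kernel. */
static PetscErrorCode SolveWithHeapChecks(KSP ksp, Vec b, Vec x)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  CHKMEMQ;                            /* heap intact before the solve? */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  CHKMEMQ;                            /* heap intact after the solve?  */
  PetscFunctionReturn(0);
}

Moving the pair of checks progressively closer to the crash site is a crude bisection, but it can be far cheaper than running the full case under valgrind.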
>> >> >> >> >> >> >> >> > >> >> > Reproducing under Valgrind would help a lot. Perhaps it's possible >> to >> >> checkpoint such that the breakage can be reproduced more quickly? >> >> > >> >> > Barry Smith writes: >> >> > >> >> >> https://en.wikipedia.org/wiki/Bus_error < >> >> https://en.wikipedia.org/wiki/Bus_error> >> >> >> >> >> >> But perhaps not true for Intel? >> >> >> >> >> >> >> >> >> >> >> >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley >> >> wrote: >> >> >>> >> >> >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith > > >> bsmith at petsc.dev>> wrote: >> >> >>> >> >> >>> >> >> >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown > > >> jed at jedbrown.org>> wrote: >> >> >>>> >> >> >>>> Barry Smith > writes: >> >> >>>> >> >> >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown > > >> jed at jedbrown.org>> wrote: >> >> >>>>>> >> >> >>>>>> Barry Smith > >> writes: >> >> >>>>>> >> >> >>>>>>> So if a BLAS errors with SIGBUS then it is always an input >> error >> >> of just not proper double/complex alignment? Or some other very strange >> >> thing? >> >> >>>>>> >> >> >>>>>> I would suspect memory corruption. >> >> >>>>> >> >> >>>>> >> >> >>>>> Corruption meaning what specifically? >> >> >>>>> >> >> >>>>> The routines crashing are dgemv which only take double precision >> >> arrays, regardless of what garbage is in those arrays i don't think >> there >> >> can be BUS errors resulting. They don't take integer arrays whose >> >> corruption could result in bad indexing and then BUS errors. >> >> >>>>> >> >> >>>>> So then it can only be corruption of the pointers passed in, >> correct? >> >> >>>> >> >> >>>> Such as those pointers pointing into data on the stack with >> incorrect >> >> sizes. >> >> >>> >> >> >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? >> >> >>> >> >> >>> My understanding was that roughly memory errors in the heap are >> SEGV >> >> and memory errors on the stack are SIGBUS. Is that not true? >> >> >>> >> >> >>> Matt >> >> >>> >> >> >>> -- >> >> >>> What most experimenters take for granted before they begin their >> >> experiments is infinitely more interesting than any results to which >> their >> >> experiments lead. >> >> >>> -- Norbert Wiener >> >> >>> >> >> >>> https://www.cse.buffalo.edu/~knepley/ < >> >> http://www.cse.buffalo.edu/~knepley/> >> >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 27 08:44:18 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 27 Aug 2020 08:44:18 -0500 Subject: [petsc-users] Bus Error In-Reply-To: References: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> <87a6yjg3o5.fsf@jedbrown.org> Message-ID: <9EEB2628-D6ED-4466-A629-33EAC73BCE4C@petsc.dev> Mark, Did i tell you that this has to be built with the configure option --with-debugging=1 and won't be turned off with --with-debugging=0 ? 
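(For reference, one way to keep the instrumented library alongside the optimized one is a separate PETSC_ARCH with debugging left on; the --download flags below are only placeholders for whatever the existing build already uses, and the arch name is arbitrary:

  cd $PETSC_DIR
  ./configure PETSC_ARCH=arch-debug-malloc --with-debugging=1 \
      --download-superlu_dist --download-metis
  make PETSC_DIR=$PWD PETSC_ARCH=arch-debug-malloc all
  # relink the application against arch-debug-malloc for the reproduction
  # runs only, keeping the optimized arch for production jobs

)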
Barry > On Aug 27, 2020, at 8:10 AM, Mark Lohry wrote: > > Barry, no output from that patch i'm afraid: > > 54 KSP Residual norm 3.215013886664e+03 > 55 KSP Residual norm 3.049105434513e+03 > 56 KSP Residual norm 2.859123916860e+03 > [929]PETSC ERROR: ------------------------------------------------------------------------ > [929]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access > [929]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [929]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [929]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [929]PETSC ERROR: likely location of problem given in stack below > [929]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [929]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [929]PETSC ERROR: INSTEAD the line number of the start of the function > [929]PETSC ERROR: is given. > [929]PETSC ERROR: [929] BLASgemv line 1406 /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c > [929]PETSC ERROR: [929] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c > [929]PETSC ERROR: [929] MatSolve line 3354 /home/mlohry/petsc/src/mat/interface/matrix.c > [929]PETSC ERROR: [929] PCApply_ILU line 201 /home/mlohry/petsc/src/ksp/pc/impls/factor/ilu/ilu.c > [929]PETSC ERROR: [929] PCApply line 426 /home/mlohry/petsc/src/ksp/pc/interface/precon.c > [929]PETSC ERROR: [929] KSP_PCApply line 279 /home/mlohry/petsc/include/petsc/private/kspimpl.h > [929]PETSC ERROR: [929] KSPSolve_PREONLY line 16 /home/mlohry/petsc/src/ksp/ksp/impls/preonly/preonly.c > [929]PETSC ERROR: [929] KSPSolve_Private line 590 /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c > [929]PETSC ERROR: [929] KSPSolve line 848 /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c > [929]PETSC ERROR: [929] PCApply_ASM line 441 /home/mlohry/petsc/src/ksp/pc/impls/asm/asm.c > [929]PETSC ERROR: [929] PCApply line 426 /home/mlohry/petsc/src/ksp/pc/interface/precon.c > [929]PETSC ERROR: [929] KSP_PCApply line 279 /home/mlohry/petsc/include/petsc/private/kspimpl.h > srun: Job step aborted: Waiting up to 47 seconds for job step to finish. > [929]PETSC ERROR: [929] KSPFGMRESCycle line 108 /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [929]PETSC ERROR: [929] KSPSolve_FGMRES line 274 /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [929]PETSC ERROR: [929] KSPSolve_Private line 590 /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c > > On Mon, Aug 24, 2020 at 6:47 PM Mark Lohry > wrote: > I don't think I do. Running a much smaller case with the same models I get the attached report from valgrind --show-leak-kinds=all --leak-check=full --track-origins=yes. I only see some HDF5 stuff and OpenMPI that I think are false positives. > > ==1286950== Memcheck, a memory error detector > ==1286950== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. 
> ==1286950== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun with -h for copyright info > ==1286950== Command: ./verification_testing --gtest_filter=DrivenCavity3D.Re100_BackwardEulerILU1_16x16N2_Quadrature1 --petsc_time_integrator=arkimex --petsc_arkimex_type=l2 > ==1286950== Parent PID: 1286932 > ==1286950== > --1286950-- > --1286950-- Valgrind options: > --1286950-- --show-leak-kinds=all > --1286950-- --leak-check=full > --1286950-- --track-origins=yes > --1286950-- --log-file=valgrind-out.txt > --1286950-- -v > --1286950-- Contents of /proc/version: > --1286950-- Linux version 5.4.0-29-generic (buildd at lgw01-amd64-035) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 > --1286950-- > --1286950-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-rdtscp-sse3-ssse3-avx > --1286950-- Page sizes: currently 4096, max supported 4096 > --1286950-- Valgrind library directory: /usr/lib/x86_64-linux-gnu/valgrind > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/verification_testing > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ld-2.31.so > --1286950-- Considering /usr/lib/x86_64-linux-gnu/ld-2.31.so .. > --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) > --1286950-- Considering /lib/x86_64-linux-gnu/ld-2.31.so .. > --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) > --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.31.so .. > --1286950-- .. CRC is valid > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux > --1286950-- object doesn't have a symbol table > --1286950-- object doesn't have a dynamic symbol table > --1286950-- Scheduler: using generic scheduler lock implementation. > --1286950-- Reading suppressions file: /usr/lib/x86_64-linux-gnu/valgrind/default.supp > ==1286950== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-1286950-by-mlohry-on-??? > ==1286950== embedded gdbserver: writing to /tmp/vgdb-pipe-to-vgdb-from-1286950-by-mlohry-on-??? > ==1286950== embedded gdbserver: shared mem /tmp/vgdb-pipe-shared-mem-vgdb-1286950-by-mlohry-on-??? > ==1286950== > ==1286950== TO CONTROL THIS PROCESS USING vgdb (which you probably > ==1286950== don't want to do, unless you know exactly what you're doing, > ==1286950== or are doing some strange experiment): > ==1286950== /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 ...command... > ==1286950== > ==1286950== TO DEBUG THIS PROCESS USING GDB: start GDB like this > ==1286950== /path/to/gdb ./verification_testing > ==1286950== and then give GDB the following command > ==1286950== target remote | /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 > ==1286950== --pid is optional if only one valgrind process is running > ==1286950== > --1286950-- REDIR: 0x4022d80 (ld-linux-x86-64.so.2:strlen) redirected to 0x580c9ce2 (???) > --1286950-- REDIR: 0x4022b50 (ld-linux-x86-64.so.2:index) redirected to 0x580c9cfc (???) > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so > --1286950-- object doesn't have a symbol table > ==1286950== WARNING: new redirection conflicts with existing -- ignoring it > --1286950-- old: 0x04022d80 (strlen ) R-> (0000.0) 0x580c9ce2 ??? 
> --1286950-- new: 0x04022d80 (strlen ) R-> (2007.0) 0x0483f060 strlen > --1286950-- REDIR: 0x401f560 (ld-linux-x86-64.so.2:strcmp) redirected to 0x483ffd0 (strcmp) > --1286950-- REDIR: 0x40232e0 (ld-linux-x86-64.so.2:mempcpy) redirected to 0x4843a20 (mempcpy) > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/initialization/libinitialization.so > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libgoverning_equations.so > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/time_stepping/libtime_stepping.so > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libboundary_conditions.so > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libsolution_monitors.so > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libfluxtypes.so > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/algebraic_solvers/libalgebraic_solvers.so > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/program_options/libprogram_options.so > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_filesystem.so.1.73.0 > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0 > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so.40.20.1 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libpthread-2.31.so > --1286950-- Considering /usr/lib/debug/.build-id/77/5cbbfff814456660786780b0b3b40096b4c05e.debug .. > --1286950-- .. build-id is valid > --1286948-- Reading syms from /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libpetsc.so.3.13.3 > --1286937-- Reading syms from /home/mlohry/dev/cmake-build/parallel/libparallel.so > --1286937-- Reading syms from /home/mlohry/dev/cmake-build/logger/liblogger.so > --1286937-- Reading syms from /home/mlohry/dev/cmake-build/spatial_discretization/libdiscretization.so > --1286945-- Reading syms from /home/mlohry/dev/cmake-build/utils/libutils.so > --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 > --1286938-- object doesn't have a symbol table > --1286949-- Reading syms from /usr/lib/x86_64-linux-gnu/libm-2.31.so > --1286949-- Considering /usr/lib/x86_64-linux-gnu/libm-2.31.so .. > --1286947-- .. CRC mismatch (computed 327d785f wanted 751f5509) > --1286947-- Considering /lib/x86_64-linux-gnu/libm-2.31.so .. > --1286938-- .. CRC mismatch (computed 327d785f wanted 751f5509) > --1286937-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libm-2.31.so .. > --1286950-- .. CRC is valid > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libc-2.31.so > --1286950-- Considering /usr/lib/x86_64-linux-gnu/libc-2.31.so .. > --1286951-- .. CRC mismatch (computed a6f43087 wanted 6555436e) > --1286951-- Considering /lib/x86_64-linux-gnu/libc-2.31.so .. > --1286947-- .. CRC mismatch (computed a6f43087 wanted 6555436e) > --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so .. > --1286950-- .. 
CRC is valid > --1286940-- Reading syms from /home/mlohry/dev/cmake-build/file_io/libfileio.so > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_program_options.so.1.73.0 > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_serialization.so.1.73.0 > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libhwloc.so.15.1.0 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libsuperlu_dist.so.6.3.0 > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 > --1286950-- object doesn't have a symbol table > --1286937-- Reading syms from /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 > --1286937-- object doesn't have a symbol table > --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0 > --1286939-- object doesn't have a symbol table > --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libdl-2.31.so > --1286947-- Considering /usr/lib/x86_64-linux-gnu/libdl-2.31.so .. > --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) > --1286947-- Considering /lib/x86_64-linux-gnu/libdl-2.31.so .. > --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) > --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libdl-2.31.so .. > --1286947-- .. CRC is valid > --1286937-- Reading syms from /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libmetis.so > --1286937-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0 > --1286942-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log_setup.so.1.73.0 > --1286942-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0 > --1286942-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_regex.so.1.73.0 > --1286949-- Reading syms from /home/mlohry/dev/cmake-build/basis_functions/libbasis_functions.so > --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 > --1286944-- object doesn't have a symbol table > --1286951-- Reading syms from /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so > --1286951-- object doesn't have a symbol table > --1286943-- Reading syms from /home/mlohry/dev/cmake-build/external_install/lib/libhdf5.so.103.1.0 > --1286951-- Reading syms from /home/mlohry/dev/cmake-build/external/tinyxml2-build/libtinyxml2.so.6.1.0 > --1286944-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_iostreams.so.1.73.0 > --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libz.so.1.2.11 > --1286944-- object doesn't have a symbol table > --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0 > --1286951-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libutil-2.31.so > --1286946-- Considering /usr/lib/x86_64-linux-gnu/libutil-2.31.so .. > --1286946-- .. CRC mismatch (computed 4639aba5 wanted ceb246b4) > --1286946-- Considering /lib/x86_64-linux-gnu/libutil-2.31.so .. > --1286946-- .. 
CRC mismatch (computed 4639aba5 wanted ceb246b4) > --1286948-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libutil-2.31.so .. > --1286939-- .. CRC is valid > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libudev.so.1.6.17 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.1 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/librt-2.31.so > --1286950-- Considering /usr/lib/x86_64-linux-gnu/librt-2.31.so .. > --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) > --1286950-- Considering /lib/x86_64-linux-gnu/librt-2.31.so .. > --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) > --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/librt-2.31.so .. > --1286950-- .. CRC is valid > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libquadmath.so.0.0.0 > --1286950-- object doesn't have a symbol table > --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 > --1286945-- Considering /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 .. > --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) > --1286945-- Considering /lib/x86_64-linux-gnu/libXau.so.6.0.0 .. > --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) > --1286945-- object doesn't have a symbol table > --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0 > --1286942-- object doesn't have a symbol table > --1286942-- Reading syms from /usr/lib/x86_64-linux-gnu/libbsd.so.0.10.0 > --1286942-- object doesn't have a symbol table > --1286950-- REDIR: 0x6516600 (libc.so.6:memmove) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515900 (libc.so.6:strncpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516930 (libc.so.6:strcasecmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515220 (libc.so.6:strcat) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515960 (libc.so.6:rindex) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6517dd0 (libc.so.6:rawmemchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6532e60 (libc.so.6:wmemchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65329a0 (libc.so.6:wcscmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516760 (libc.so.6:mempcpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516590 (libc.so.6:bcmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515890 (libc.so.6:strncmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65152d0 (libc.so.6:strcmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65166c0 (libc.so.6:memset) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6532960 (libc.so.6:wcschr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65157f0 (libc.so.6:strnlen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65153b0 (libc.so.6:strcspn) redirected to 0x48331d0 
(_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516980 (libc.so.6:strncasecmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515350 (libc.so.6:strcpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516ad0 (libc.so.6:memcpy@@GLIBC_2.14) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65340d0 (libc.so.6:wcsnlen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65329e0 (libc.so.6:wcscpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65159a0 (libc.so.6:strpbrk) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515280 (libc.so.6:index) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65157b0 (libc.so.6:strlen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x651ed20 (libc.so.6:memrchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65169d0 (libc.so.6:strcasecmp_l) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516550 (libc.so.6:memchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6532ab0 (libc.so.6:wcslen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6515c60 (libc.so.6:strspn) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65168d0 (libc.so.6:stpncpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516870 (libc.so.6:stpcpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6517e10 (libc.so.6:strchrnul) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516a20 (libc.so.6:strncasecmp_l) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x6516470 (libc.so.6:strstr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65a3750 (libc.so.6:__memcpy_chk) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286938-- REDIR: 0x6527a30 (libc.so.6:__strrchr_sse2) redirected to 0x483ea70 (__strrchr_sse2) > --1286938-- REDIR: 0x6511c90 (libc.so.6:calloc) redirected to 0x483dce0 (calloc) > --1286938-- REDIR: 0x6510260 (libc.so.6:malloc) redirected to 0x483b780 (malloc) > --1286938-- REDIR: 0x6531c40 (libc.so.6:memcpy at GLIBC_2.2.5) redirected to 0x4840100 (memcpy at GLIBC_2.2.5) > --1286938-- REDIR: 0x6527d30 (libc.so.6:__strlen_sse2) redirected to 0x483efa0 (__strlen_sse2) > --1286938-- REDIR: 0x65f4ac0 (libc.so.6:__strncmp_sse42) redirected to 0x483f7c0 (__strncmp_sse42) > --1286938-- REDIR: 0x6510850 (libc.so.6:free) redirected to 0x483c9d0 (free) > --1286938-- REDIR: 0x6532070 (libc.so.6:__memset_sse2_unaligned) redirected to 0x48428e0 (memset) > --1286938-- REDIR: 0x6603350 (libc.so.6:__memcmp_sse4_1) redirected to 0x4842150 (__memcmp_sse4_1) > --1286938-- REDIR: 0x6520520 (libc.so.6:__strcmp_sse2_unaligned) redirected to 0x483fed0 (strcmp) > --1286938-- REDIR: 0x61d0c10 (libstdc++.so.6:operator new(unsigned long)) redirected to 0x483bdf0 (operator new(unsigned long)) > --1286938-- REDIR: 0x61cee60 (libstdc++.so.6:operator delete(void*)) redirected to 0x483cf50 (operator delete(void*)) > --1286938-- REDIR: 0x61d0c70 (libstdc++.so.6:operator new[](unsigned long)) redirected to 0x483c510 (operator new[](unsigned long)) > --1286938-- REDIR: 0x61cee90 (libstdc++.so.6:operator delete[](void*)) redirected to 0x483d6e0 (operator delete[](void*)) > --1286938-- REDIR: 0x65275f0 (libc.so.6:__strchr_sse2) redirected to 0x483eb90 (__strchr_sse2) > --1286950-- REDIR: 0x6511000 (libc.so.6:realloc) redirected to 0x483df30 (realloc) > --1286950-- REDIR: 
0x6527820 (libc.so.6:__strchrnul_sse2) redirected to 0x4843540 (strchrnul) > --1286950-- REDIR: 0x6531560 (libc.so.6:__strstr_sse2_unaligned) redirected to 0x4843c20 (strstr) > --1286950-- REDIR: 0x6531c20 (libc.so.6:__mempcpy_sse2_unaligned) redirected to 0x4843660 (mempcpy) > --1286950-- REDIR: 0x652d2a0 (libc.so.6:__strncpy_sse2_unaligned) redirected to 0x483f560 (__strncpy_sse2_unaligned) > --1286950-- REDIR: 0x6515830 (libc.so.6:strncat) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > --1286950-- REDIR: 0x65305b0 (libc.so.6:__strncat_sse2_unaligned) redirected to 0x483ede0 (strncat) > --1286950-- REDIR: 0x6516120 (libc.so.6:__GI_strstr) redirected to 0x4843ca0 (__strstr_sse2) > --1286950-- REDIR: 0x6522360 (libc.so.6:__rawmemchr_sse2) redirected to 0x4843580 (rawmemchr) > --1286950-- REDIR: 0x65faea0 (libc.so.6:__strcasecmp_avx) redirected to 0x483f830 (strcasecmp) > --1286950-- REDIR: 0x65fc520 (libc.so.6:__strncasecmp_avx) redirected to 0x483f910 (strncasecmp) > --1286950-- REDIR: 0x65f98a0 (libc.so.6:__strspn_sse42) redirected to 0x4843ef0 (strspn) > --1286950-- REDIR: 0x65f9620 (libc.so.6:__strcspn_sse42) redirected to 0x4843e10 (strcspn) > --1286948-- REDIR: 0x6522030 (libc.so.6:__memchr_sse2) redirected to 0x4840050 (memchr) > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so > --1286948-- object doesn't have a symbol table > --1286948-- Discarding syms at 0x4a96240-0x4a96d47 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so (have_dinfo 1) > --1286948-- Discarding syms at 0x4a9b1c0-0x4a9b937 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so (have_dinfo 1) > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 > --1286948-- object doesn't have a symbol table > --1286948-- Discarding syms at 0x4a96120-0x4a966b0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so (have_dinfo 1) > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so > --1286948-- object doesn't have a symbol table > --1286948-- REDIR: 0x64bc670 (libc.so.6:setenv) redirected to 0x4844480 (setenv) > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 > --1286948-- object doesn't have a symbol table > --1286948-- Discarding syms at 0x8d053e0-0x8d07391 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so (have_dinfo 1) > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so > --1286948-- object doesn't have a symbol table > --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so > --1286948-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so > --1286950-- object doesn't have a symbol table > --1286950-- Discarding syms at 0x8d04180-0x8d045b0 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so (have_dinfo 1) > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so > --1286950-- object doesn't have a symbol table > --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so > --1286950-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so 
> --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9ebf0a0-0x9ebf490 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so (have_dinfo 1) > --1286946-- Discarding syms at 0x9eca300-0x9ecbee8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so (have_dinfo 1) > --1286946-- Discarding syms at 0x9ed1220-0x9ed24e7 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so (have_dinfo 1) > --1286946-- Discarding syms at 0x9ed8240-0x9ed8c88 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so (have_dinfo 1) > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9ebf0e0-0x9ebf417 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so (have_dinfo 1) > --1286946-- Discarding syms at 0x9ecf320-0x9ed1239 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so (have_dinfo 1) > --1286946-- Discarding syms at 0x9ed73a0-0x9ed9ccc in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so (have_dinfo 1) > --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so > --1286936-- object doesn't have a symbol table > --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so > --1286936-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so > --1286946-- object doesn't have a symbol table > --1286946-- REDIR: 0x652cc70 
(libc.so.6:__strcpy_sse2_unaligned) redirected to 0x483f090 (strcpy) > --1286946-- REDIR: 0x65a3810 (libc.so.6:__memmove_chk) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) > ==1286946== WARNING: new redirection conflicts with existing -- ignoring it > --1286946-- old: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2030.0) 0x04843b10 __memcpy_chk > --1286946-- new: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2024.0) 0x048434d0 __memmove_chk > --1286946-- REDIR: 0x6531c30 (libc.so.6:__memcpy_chk_sse2_unaligned) redirected to 0x4843b10 (__memcpy_chk) > --1286946-- REDIR: 0x65129b0 (libc.so.6:posix_memalign) redirected to 0x483e1e0 (posix_memalign) > --1286946-- Discarding syms at 0x9f15280-0x9f32932 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so (have_dinfo 1) > --1286946-- Discarding syms at 0x9f7c4c0-0x9f7ded8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f620c0-0x9f71483 in /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f9ba10-0x9fd22ee in /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_monitoring.so.50.10.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9f4d400-0x9f50c19 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so (have_dinfo 1) > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/libpsm1/libpsm_infinipath.so.1.16 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 > --1286946-- object doesn't have a symbol table > --1286946-- REDIR: 0x6517140 (libc.so.6:strcasestr) redirected to 0x4843f80 
(strcasestr) > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9f4d5c0-0x9f4f5a1 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so (have_dinfo 1) > --1286946-- Discarding syms at 0x9fee680-0x9ff096c in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so (have_dinfo 1) > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_monitoring.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so > --1286946-- object doesn't have a symbol table > --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so > --1286946-- object doesn't have a symbol table > --1286946-- Discarding syms at 0x9f724a0-0x9f787b5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so (have_dinfo 1) > --1286946-- Discarding syms at 0xa827f80-0xa8e14c4 in /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f94830-0x9fbafce in /usr/lib/libpsm1/libpsm_infinipath.so.1.16 (have_dinfo 1) > --1286946-- Discarding syms at 0x9fe5580-0x9fe8f71 in /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f56420-0x9f5cec0 in /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 (have_dinfo 1) > --1286946-- Discarding syms at 0xa929f10-0xa93d5fc in /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 (have_dinfo 1) > --1286946-- Discarding syms at 0xa94b0c0-0xa95a483 in /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) > --1286946-- Discarding syms at 0xa968860-0xa9adf12 in /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 (have_dinfo 1) > --1286946-- Discarding syms at 0xa9e7a10-0xaa1e2ee in /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) > --1286946-- Discarding syms at 0x9f80410-0x9f84e27 in /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 (have_dinfo 1) > --1286946-- Discarding syms at 
0x9f103e0-0x9f15fd5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so (have_dinfo 1) > --1286946-- Discarding syms at 0x9f471e0-0x9f47ce0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so (have_dinfo 1) > ==1286946== Thread 3: > ==1286946== Syscall param writev(vector[...]) points to uninitialised byte(s) > ==1286946== at 0x658A48D: __writev (writev.c:26) > ==1286946== by 0x658A48D: writev (writev.c:24) > ==1286946== by 0x8DF9B4C: pmix_ptl_base_send_handler (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x7CC413E: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286946== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286946== by 0x8DBDD55: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286946== by 0x6595102: clone (clone.S:95) > ==1286946== Address 0xa28fdcf is 127 bytes inside a block of size 5,120 alloc'd > ==1286946== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286946== by 0x8DE155A: pmix_bfrop_buffer_extend (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x8DE3F4A: pmix_bfrops_base_pack_byte (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x8DE4900: pmix_bfrops_base_pack_buf (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x8DE4175: pmix_bfrops_base_pack (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x8D7CF91: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286946== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286946== by 0x8DBDD55: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) > ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286946== by 0x6595102: clone (clone.S:95) > ==1286946== Uninitialised value was created by a stack allocation > ==1286946== at 0x9F048D6: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so) > ==1286946== > --1286944-- Discarding syms at 0xaa4d220-0xaa5796a in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at 0xaa4d220---1286948-- Discarding syms at 0xaae1100-0xaae7d70 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at 0xaae1100-0xaae7d70 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so (have_dinfo 1) > --1286945-- Discarding syms at 0x9f69420-0x9f--1286938-- REDIR: 0x61cee70 (libstdc++.so.6:operator delete(void*, unsigned long)) redirected to --1286937-- REDIR: 0x61cee70 (libstdc++.so.6:opera--1286946-- REDIR: 0x652e970 (libc.so.6:__stpncpy_sse2_unaligned) redirected to 0x48427e0 (stpncpy) > --1286942-- REDIR: 0x6527ed0 (libc.so.6:__strnlen_sse2) redirected to 0x483eee0 (strnlen) > --1286944-- REDIR: 0x652fcc0 (libc.so.6:__strcat_sse2_unaligned) redirected to 0x483ec20 (strcat) > --1286951-- REDIR: 0x65113d0 (libc.so.6:memalign) redirected to 0x483e2a0 (memalign) > --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so > --1286951-- object doesn't have a symbol table > --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so > --1286951-- object doesn't have a symbol table > --1286941-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 > --1286941-- object doesn't have a symbol table > --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so > --1286951-- object doesn't have a symbol table > --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so > --1286939-- object doesn't have a symbol table > --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so > --1286939-- object doesn't have a symbol table > --1286943-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so > --1286943-- object doesn't have a symbol table > --1286943-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so > --1286943-- object doesn't have a symbol table > --1286943-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so > --1286943-- object doesn't have a symbol table > --1286938-- REDIR: 0x65a3b00 (libc.so.6:__strcpy_chk) redirected to 0x48435c0 (__strcpy_chk) > --1286939-- Discarding syms at 0x9f1d660-0x9f371d6 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9f5afa0-0x9f8f8b6 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9fa0640-0x9fa42d9 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9f4c160-0x9f4dc58 in 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so (have_dinfo 1) > --1286939-- Discarding syms at 0xa7fc270-0xa804f00 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9fee3a0-0x9ff134e in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so (have_dinfo 1) > --1286939-- Discarding syms at 0xa80a240-0xa80aa8d in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 (have_dinfo 1) > --1286939-- Discarding syms at 0xa80f0e0-0xa80f8bb in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so (have_dinfo 1) > --1286939-- Discarding syms at 0xaa460c0-0xaa47947 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so (have_dinfo 1) > --1286939-- Discarding syms at 0xaa613e0-0xaa7730f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so (have_dinfo 1) > --1286939-- Discarding syms at 0xaa849c0-0xaa8a845 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9ee1320-0x9ee3567 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9eebc40-0x9ef4ad7 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9f02600-0x9f08cd8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9f40200-0x9f4126e in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9eda4e0-0x9edb4c5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9ed32c0-0x9ed4afe in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9ebf160-0x9ebfe95 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9ece140-0x9ecebed in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9ec92a0-0x9ec9aa2 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8eae0e0-0x8eae4a7 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8eb3220-0x8eb3c27 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8eb80e0-0x8eb90b7 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8ea6380-0x8ea97b3 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e5a740-0x8e5f859 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e67be0-0x8e743f0 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so (have_dinfo 1) > --1286939-- Discarding syms at 0x84da200-0x84daa5d in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8d322b0-0x8d34bfc in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e29480-0x8e3b70a in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8d3c2b0-0x8d3ed5c in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so (have_dinfo 
1) > --1286939-- Discarding syms at 0x8e45340-0x8e502da in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e901a0-0x8e908a7 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8d05520-0x8d06783 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e7b460-0x8e8aaa4 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8d44520-0x8d4556a in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8e97600-0x8ea0fa1 in /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 (have_dinfo 1) > --1286939-- Discarding syms at 0x8d109c0-0x8d27dcf in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so (have_dinfo 1) > --1286939-- Discarding syms at 0x8d5b280-0x8dfdffb in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 (have_dinfo 1) > --1286939-- Discarding syms at 0x9ec40a0-0x9ec4490 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so (have_dinfo 1) > --1286939-- Discarding syms at 0x84d2580-0x84d518f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so (have_dinfo 1) > --1286939-- Discarding syms at 0x4a96120-0x4a9644f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so (have_dinfo 1) > --1286939-- Discarding syms at 0x4aa0100-0x4aa03e7 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so (have_dinfo 1) > --1286939-- Discarding syms at 0x84c74a0-0x84c901f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so (have_dinfo 1) > --1286939-- Discarding syms at 0x4aa5260-0x4aa58e9 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so (have_dinfo 1) > --1286939-- Discarding syms at 0x4a9b420-0x4a9bcdf in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so (have_dinfo 1) > --1286939-- Discarding syms at 0x84e7460-0x84f52ca in /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 (have_dinfo 1) > --1286939-- Discarding syms at 0x4a90360-0x4a91107 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9f46220-0x9f474cc in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9f0f180-0x9f0f78d in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so (have_dinfo 1) > --1286939-- Discarding syms at 0xaa94540-0xaa96a4a in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so (have_dinfo 1) > --1286939-- Discarding syms at 0xaa9f6c0-0xaab44d0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so (have_dinfo 1) > --1286939-- Discarding syms at 0xaabe820-0xaad8ee0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9efc080-0x9efc1e1 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9fab2a0-0x9fb1341 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9f140c0-0x9f14299 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9fb72a0-0x9fbb791 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9fd52a0-0x9fda794 in 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9fe02e0-0x9fe59a5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so (have_dinfo 1) > --1286939-- Discarding syms at 0xa815460-0xa8177ab in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so (have_dinfo 1) > --1286939-- Discarding syms at 0xa81e260-0xa82033d in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so (have_dinfo 1) > --1286939-- Discarding syms at 0xa8273e0-0xa8297d8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so (have_dinfo 1) > --1286939-- Discarding syms at 0x9fc85e0-0x9fce8ef in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 (have_dinfo 1) > ==1286939== > ==1286939== HEAP SUMMARY: > ==1286939== in use at exit: 74,054 bytes in 223 blocks > ==1286939== total heap usage: 22,405,782 allocs, 22,405,559 frees, 34,062,479,959 bytes allocated > ==1286939== > ==1286939== Searching for pointers to 223 not-freed blocks > ==1286939== Checked 3,415,912 bytes > ==1286939== > ==1286939== Thread 1: > ==1286939== 1 bytes in 1 blocks are definitely lost in loss record 1 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x9F6A4B6: ??? > ==1286939== by 0x9F47373: ??? > ==1286939== by 0x68E3B9B: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4BA1734: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 8 bytes in 1 blocks are still reachable in loss record 2 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x764724C: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) > ==1286939== by 0x7657B9A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) > ==1286939== by 0x7645679: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) > ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) > ==1286939== by 0x4011C90: call_init (dl-init.c:30) > ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) > ==1286939== by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so ) > ==1286939== by 0x3: ??? > ==1286939== by 0x1FFEFFF926: ??? > ==1286939== by 0x1FFEFFF93D: ??? > ==1286939== by 0x1FFEFFF987: ??? > ==1286939== by 0x1FFEFFF9A7: ??? > ==1286939== > ==1286939== 8 bytes in 1 blocks are definitely lost in loss record 3 of 44 > ==1286939== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F69B6F: ??? > ==1286939== by 0x9F1CDED: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? 
> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 13 bytes in 2 blocks are still reachable in loss record 4 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x7CC3657: event_config_avoid_method (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x68FEB5A: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== > ==1286939== 15 bytes in 1 blocks are indirectly lost in loss record 5 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x9EDB189: ??? > ==1286939== by 0x68D98FC: mca_base_framework_components_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6907C25: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4BA16D5: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 15 bytes in 1 blocks are definitely lost in loss record 6 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x9F5655C: ??? 
> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) > ==1286939== by 0x4011C90: call_init (dl-init.c:30) > ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) > ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) > ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) > ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) > ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) > ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) > ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) > ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) > ==1286939== > ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 7 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F1CBEB: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 8 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F1CC66: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 9 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F1CCDA: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? 
> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 25 bytes in 1 blocks are still reachable in loss record 10 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x68F27BD: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B956B6: ompi_pml_v_output_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B95259: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x68D98FC: mca_base_framework_components_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B93FAE: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4BA1734: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== > ==1286939== 30 bytes in 1 blocks are definitely lost in loss record 11 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0xA9A859B: ??? > ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) > ==1286939== by 0x4011C90: call_init (dl-init.c:30) > ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) > ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) > ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) > ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) > ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) > ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) > ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) > ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) > ==1286939== by 0x72DEB58: _dlerror_run (dlerror.c:170) > ==1286939== > ==1286939== 32 bytes in 1 blocks are still reachable in loss record 12 of 44 > ==1286939== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CC353E: event_get_supported_methods (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x68FEA98: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68FE8CA: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 13 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x84D2B0A: ??? > ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 14 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x84D2BCE: ??? > ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 15 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x84D2CB2: ??? > ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 16 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? 
> ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x84D2D91: ??? > ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 17 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E81BD8: ??? > ==1286939== by 0x8E89F4B: ??? > ==1286939== by 0x8D84A0D: ??? > ==1286939== by 0x8DF79C1: ??? > ==1286939== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x8DBDD55: ??? > ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286939== by 0x6595102: clone (clone.S:95) > ==1286939== > ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 18 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== by 0x84D330E: ??? > ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 36 (32 direct, 4 indirect) bytes in 1 blocks are definitely lost in loss record 19 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x4B94C09: mca_pml_base_pml_check_selected (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x9F1E1E1: ??? > ==1286939== by 0x4BA1A09: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 40 bytes in 1 blocks are still reachable in loss record 20 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CFF4B6: ??? (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x7CC5E26: event_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CFF68F: evthread_use_pthreads (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x68FE8E4: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== > ==1286939== 40 bytes in 1 blocks are still reachable in loss record 21 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CFF4B6: ??? (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x7CCF377: evsig_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC5E39: event_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CFF68F: evthread_use_pthreads (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x68FE8E4: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== > ==1286939== 40 bytes in 1 blocks are still reachable in loss record 22 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CFF4B6: ??? (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x7CCB997: evutil_secure_rng_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC5E4F: event_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CFF68F: evthread_use_pthreads (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) > ==1286939== by 0x68FE8E4: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== > ==1286939== 48 bytes in 1 blocks are still reachable in loss record 23 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B8560C: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286939== > ==1286939== 48 bytes in 1 blocks are still reachable in loss record 24 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B85638: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286939== > ==1286939== 48 bytes in 2 blocks are still reachable in loss record 25 of 44 > ==1286939== at 0x483B7F3: malloc (in 
/usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CC3647: event_config_avoid_method (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x68FEB5A: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 55 (32 direct, 23 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== by 0x4AF6CD6: ompi_comm_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA194D: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== > ==1286939== 56 bytes in 1 blocks are still reachable in loss record 27 of 44 > ==1286939== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x7CC1C86: event_config_new (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x68FEAC0: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68FE8CA: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 28 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9F6E008: ??? > ==1286939== by 0x9F7C654: ??? > ==1286939== by 0x9F1CD3E: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== > ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 29 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0xA957008: ??? > ==1286939== by 0xA86B017: ??? > ==1286939== by 0xA862FD8: ??? > ==1286939== by 0xA828E15: ??? > ==1286939== by 0xA829624: ??? > ==1286939== by 0x9F77910: ??? > ==1286939== by 0x4B85C53: ompi_mtl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x9F13E4D: ??? > ==1286939== by 0x4B94673: mca_pml_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1789: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 76 (32 direct, 44 indirect) bytes in 1 blocks are definitely lost in loss record 30 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== by 0x84D387F: ??? 
> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 79 (64 direct, 15 indirect) bytes in 1 blocks are definitely lost in loss record 31 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9EDB12E: ??? > ==1286939== by 0x68D98FC: mca_base_framework_components_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x6907C25: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4BA16D5: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 144 bytes in 3 blocks are still reachable in loss record 32 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B8564E: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286939== > ==1286939== 231 bytes in 12 blocks are definitely lost in loss record 33 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x651550E: strdup (strdup.c:42) > ==1286939== by 0x9F2B4B3: ??? > ==1286939== by 0x9F2B85C: ??? > ==1286939== by 0x9F2BBD7: ??? > ==1286939== by 0x9F1CAAC: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? 
> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== > ==1286939== 240 bytes in 5 blocks are still reachable in loss record 34 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x4B85622: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286939== > ==1286939== 272 bytes in 44 blocks are definitely lost in loss record 35 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x9FCAEDB: ??? > ==1286939== by 0x9FE42B2: ??? > ==1286939== by 0x9FE47BB: ??? > ==1286939== by 0x9FCDDBF: ??? > ==1286939== by 0x9FA324A: ??? > ==1286939== by 0x4B3DD7F: PMPI_File_write_at_all (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) > ==1286939== by 0x78DF11D: H5FD_write (H5FDint.c:257) > ==1286939== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) > ==1286939== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) > ==1286939== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) > ==1286939== > ==1286939== 585 (480 direct, 105 indirect) bytes in 15 blocks are definitely lost in loss record 36 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? 
> ==1286939== by 0x4B14036: ompi_proc_complete_init_single (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B146C3: ompi_proc_complete_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA19A9: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 776 bytes in 32 blocks are indirectly lost in loss record 37 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8DE9816: ??? > ==1286939== by 0x8DEB1D2: ??? > ==1286939== by 0x8DEB49A: ??? > ==1286939== by 0x8DE8B12: ??? > ==1286939== by 0x8E9D492: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== > ==1286939== 840 (480 direct, 360 indirect) bytes in 15 blocks are definitely lost in loss record 38 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x9EF2F00: ??? > ==1286939== by 0x9EEBF17: ??? > ==1286939== by 0x9EE2F54: ??? > ==1286939== by 0x9F1E1FB: ??? > ==1286939== > ==1286939== 1,084 (480 direct, 604 indirect) bytes in 15 blocks are definitely lost in loss record 39 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A767: ??? > ==1286939== by 0x84D4800: ??? > ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== 1,344 bytes in 1 blocks are definitely lost in loss record 40 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9F1CD2D: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? 
> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 41 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9F1CC50: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 42 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9F1CCC4: ??? > ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286939== by 0x9EE3527: ??? > ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286939== by 0x15710D: main (testing_main.cpp:8) > ==1286939== > ==1286939== 62,644 bytes in 31 blocks are indirectly lost in loss record 43 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8DE9FA8: ??? > ==1286939== by 0x8DEB032: ??? > ==1286939== by 0x8DEB49A: ??? > ==1286939== by 0x8DE8B12: ??? > ==1286939== by 0x8E9D492: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? 
> ==1286939== > ==1286939== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are definitely lost in loss record 44 of 44 > ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8E9D3EB: ??? > ==1286939== by 0x8E9F1C1: ??? > ==1286939== by 0x8D0578C: ??? > ==1286939== by 0x8D8605A: ??? > ==1286939== by 0x8D87FE8: ??? > ==1286939== by 0x8D88E4D: ??? > ==1286939== by 0x8D1A5EB: ??? > ==1286939== by 0x9F0398A: ??? > ==1286939== by 0x9EE2F54: ??? > ==1286939== by 0x9F1E1FB: ??? > ==1286939== by 0x4BA1A09: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286939== > ==1286939== LEAK SUMMARY: > ==1286939== definitely lost: 9,837 bytes in 138 blocks > ==1286939== indirectly lost: 63,435 bytes in 64 blocks > ==1286939== possibly lost: 0 bytes in 0 blocks > ==1286939== still reachable: 782 bytes in 21 blocks > ==1286939== suppressed: 0 bytes in 0 blocks > ==1286939== > ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from 0) > ==1286939== > ==1286939== 1 errors in context 1 of 29: > ==1286939== Thread 3: > ==1286939== Syscall param writev(vector[...]) points to uninitialised byte(s) > ==1286939== at 0x658A48D: __writev (writev.c:26) > ==1286939== by 0x658A48D: writev (writev.c:24) > ==1286939== by 0x8DF9B4C: ??? > ==1286939== by 0x7CC413E: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x8DBDD55: ??? > ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286939== by 0x6595102: clone (clone.S:95) > ==1286939== Address 0xa28ee1f is 127 bytes inside a block of size 5,120 alloc'd > ==1286939== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286939== by 0x8DE155A: ??? > ==1286939== by 0x8DE3F4A: ??? > ==1286939== by 0x8DE4900: ??? > ==1286939== by 0x8DE4175: ??? > ==1286939== by 0x8D7CF91: ??? > ==1286939== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286939== by 0x8DBDD55: ??? > ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286939== by 0x6595102: clone (clone.S:95) > ==1286939== Uninitialised value was created by a stack allocation > ==1286939== at 0x9F048D6: ??? > ==1286939== > ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from 0) > mpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x4B85622: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) > ==1286936== by 0x78D4B23: H5FD_open (H5FD.c:733) > ==1286936== by 0x78B953B: H5F_open (H5Fint.c:1493) > ==1286936== > ==1286936== 272 bytes in 44 blocks are definitely lost in loss record 39 of 49 > ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x9FCAEDB: ??? > ==1286936== by 0x9FE42B2: ??? > ==1286936== by 0x9FE47BB: ??? > ==1286936== by 0x9FCDDBF: ??? > ==1286936== by 0x9FA324A: ??? 
> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) > ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) > ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) > ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) > ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) > ==1286936== > ==1286936== 312 bytes in 1 blocks are still reachable in loss record 40 of 49 > ==1286936== at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x74E78EB: boost::detail::make_external_thread_data() (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) > ==1286936== by 0x74E7C74: boost::detail::add_thread_exit_function(boost::detail::thread_exit_function_base*) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) > ==1286936== by 0x73AFCEA: boost::log::v2_mt_posix::sources::aux::get_severity_level() (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0) > ==1286936== by 0x5F71A6C: set_value (severity_feature.hpp:135) > ==1286936== by 0x5F71A6C: open_record_unlocked > > (severity_feature.hpp:252) > ==1286936== by 0x5F71A6C: open_record > > (basic_logger.hpp:459) > ==1286936== by 0x5F71A6C: Logger::TraceMessage(std::__cxx11::basic_string, std::allocator >) (logger.cpp:328) > ==1286936== by 0x5F729C7: Logger::Message(std::__cxx11::basic_string, std::allocator > const&, LogLevel) (logger.cpp:280) > ==1286936== by 0x5F73CF1: Logger::Timer::Timer(std::__cxx11::basic_string, std::allocator > const&, LogLevel) (logger.cpp:426) > ==1286936== by 0x15718A: timer (logger.hpp:98) > ==1286936== by 0x15718A: main (testing_main.cpp:9) > ==1286936== > ==1286936== 585 (480 direct, 105 indirect) bytes in 15 blocks are definitely lost in loss record 41 of 49 > ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8E9D3EB: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A767: ??? > ==1286936== by 0x4B14036: ompi_proc_complete_init_single (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B146C3: ompi_proc_complete_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4BA19A9: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== > ==1286936== 776 bytes in 32 blocks are indirectly lost in loss record 42 of 49 > ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8DE9816: ??? > ==1286936== by 0x8DEB1D2: ??? > ==1286936== by 0x8DEB49A: ??? > ==1286936== by 0x8DE8B12: ??? > ==1286936== by 0x8E9D492: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A767: ??? > ==1286936== > ==1286936== 840 (480 direct, 360 indirect) bytes in 15 blocks are definitely lost in loss record 43 of 49 > ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8E9D3EB: ??? > ==1286936== by 0x8E9F1C1: ??? 
> ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A5EB: ??? > ==1286936== by 0x9EF2F00: ??? > ==1286936== by 0x9EEBF17: ??? > ==1286936== by 0x9EE2F54: ??? > ==1286936== by 0x9F1E1FB: ??? > ==1286936== > ==1286936== 1,091 (480 direct, 611 indirect) bytes in 15 blocks are definitely lost in loss record 44 of 49 > ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8E9D3EB: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A767: ??? > ==1286936== by 0x84D4800: ??? > ==1286936== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) > ==1286936== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== > ==1286936== 1,344 bytes in 1 blocks are definitely lost in loss record 45 of 49 > ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9F1CD2D: ??? > ==1286936== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9EE3527: ??? > ==1286936== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286936== by 0x15710D: main (testing_main.cpp:8) > ==1286936== > ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 46 of 49 > ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9F1CC50: ??? > ==1286936== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9EE3527: ??? 
> ==1286936== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286936== by 0x15710D: main (testing_main.cpp:8) > ==1286936== > ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 47 of 49 > ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9F1CCC4: ??? > ==1286936== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) > ==1286936== by 0x9EE3527: ??? > ==1286936== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) > ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) > ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) > ==1286936== by 0x15710D: main (testing_main.cpp:8) > ==1286936== > ==1286936== 62,640 bytes in 30 blocks are indirectly lost in loss record 48 of 49 > ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8DE9FA8: ??? > ==1286936== by 0x8DEB032: ??? > ==1286936== by 0x8DEB49A: ??? > ==1286936== by 0x8DE8B12: ??? > ==1286936== by 0x8E9D492: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A5EB: ??? > ==1286936== > ==1286936== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are definitely lost in loss record 49 of 49 > ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8E9D3EB: ??? > ==1286936== by 0x8E9F1C1: ??? > ==1286936== by 0x8D0578C: ??? > ==1286936== by 0x8D8605A: ??? > ==1286936== by 0x8D87FE8: ??? > ==1286936== by 0x8D88E4D: ??? > ==1286936== by 0x8D1A5EB: ??? > ==1286936== by 0x9F0398A: ??? > ==1286936== by 0x9EE2F54: ??? > ==1286936== by 0x9F1E1FB: ??? 
> ==1286936== by 0x4BA1A09: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== > ==1286936== LEAK SUMMARY: > ==1286936== definitely lost: 9,805 bytes in 137 blocks > ==1286936== indirectly lost: 63,431 bytes in 63 blocks > ==1286936== possibly lost: 0 bytes in 0 blocks > ==1286936== still reachable: 1,174 bytes in 27 blocks > ==1286936== suppressed: 0 bytes in 0 blocks > ==1286936== > ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from 0) > ==1286936== > ==1286936== 1 errors in context 1 of 29: > ==1286936== Thread 3: > ==1286936== Syscall param writev(vector[...]) points to uninitialised byte(s) > ==1286936== at 0x658A48D: __writev (writev.c:26) > ==1286936== by 0x658A48D: writev (writev.c:24) > ==1286936== by 0x8DF9B4C: ??? > ==1286936== by 0x7CC413E: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286936== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286936== by 0x8DBDD55: ??? > ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286936== by 0x6595102: clone (clone.S:95) > ==1286936== Address 0xa290cbf is 127 bytes inside a block of size 5,120 alloc'd > ==1286936== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x8DE155A: ??? > ==1286936== by 0x8DE3F4A: ??? > ==1286936== by 0x8DE4900: ??? > ==1286936== by 0x8DE4175: ??? > ==1286936== by 0x8D7CF91: ??? > ==1286936== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286936== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) > ==1286936== by 0x8DBDD55: ??? > ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) > ==1286936== by 0x6595102: clone (clone.S:95) > ==1286936== Uninitialised value was created by a stack allocation > ==1286936== at 0x9F048D6: ??? > ==1286936== > ==1286936== > ==1286936== 6 errors in context 2 of 29: > ==1286936== Thread 1: > ==1286936== Syscall param pwritev(vector[...]) points to uninitialised byte(s) > ==1286936== at 0x658A608: pwritev64 (pwritev64.c:30) > ==1286936== by 0x658A608: pwritev (pwritev64.c:28) > ==1286936== by 0x9F46E25: ??? > ==1286936== by 0x9FCE33B: ??? > ==1286936== by 0x9FCDDBF: ??? > ==1286936== by 0x9FA324A: ??? 
> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) > ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) > ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) > ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) > ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) > ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) > ==1286936== by 0x7B5ED15: H5C__collective_write (H5Cmpio.c:1020) > ==1286936== by 0x7B5ED15: H5C_apply_candidate_list (H5Cmpio.c:394) > ==1286936== Address 0xedf91b0 is 96 bytes inside a block of size 216 alloc'd > ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) > ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:292) > ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:267) > ==1286936== by 0x77FC8FF: H5C__flush_single_entry (H5C.c:6045) > ==1286936== by 0x7B5DC7E: H5C__flush_candidates_in_ring (H5Cmpio.c:1371) > ==1286936== by 0x7B5DC7E: H5C__flush_candidate_entries (H5Cmpio.c:1192) > ==1286936== by 0x7B5DC7E: H5C_apply_candidate_list (H5Cmpio.c:385) > ==1286936== by 0x7B5BA18: H5AC__rsp__dist_md_write__flush (H5ACmpio.c:1709) > ==1286936== by 0x7B5BA18: H5AC__run_sync_point (H5ACmpio.c:2164) > ==1286936== by 0x7B5C9D2: H5AC__flush_entries (H5ACmpio.c:2307) > ==1286936== by 0x77C95E4: H5AC_flush (H5AC.c:681) > ==1286936== by 0x78B306A: H5F__flush_phase2 (H5Fint.c:1831) > ==1286936== by 0x78B5D7A: H5F__dest (H5Fint.c:1152) > ==1286936== by 0x78B6603: H5F_try_close (H5Fint.c:2180) > ==1286936== by 0x78B69F5: H5F__close_cb (H5Fint.c:2009) > ==1286936== by 0x7965797: H5I_dec_ref (H5I.c:1254) > ==1286936== Uninitialised value was created by a stack allocation > ==1286936== at 0x7695AF0: ??? (in /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so) > ==1286936== > ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from 0) > > On Mon, Aug 24, 2020 at 5:00 PM Jed Brown > wrote: > Do you potentially have a memory or other resource leak? SIGBUS would be an odd result, but the symptom of crashing after running for a long time sometimes fits with a resource leak. > > Mark Lohry > writes: > > > I queued up some jobs with Barry's patch, so we'll see. > > > > Re Jed's suggestion at checkpointing, I don't *think* this is something > > coming from the state of the solution -- running from the same point I'm > > seeing it crash anywhere between 1 hour and 20 hours in. I'll increase my > > file save frequency in case I'm wrong there though. > > > > My intel build with different blas just made it through a 6 hour time slot > > without crash, whereas yesterday the same thing crashed after 3 hours. But > > given the randomness so far I'd bet that's just dumb luck. > > > > On Mon, Aug 24, 2020 at 4:22 PM Barry Smith > wrote: > > > >> > >> > >> > On Aug 24, 2020, at 2:34 PM, Jed Brown > wrote: > >> > > >> > I'm thinking of something such as writing floating point data into the > >> return address, which would be unaligned/garbage. > >> > >> Ok, my patch will detect this. This is what I was talking about, messing > >> up the BLAS arguments which are the addresses of arrays. > >> > >> Valgrind is by far the preferred approach. > >> > >> Barry > >> > >> Another feature we could add to the malloc checking is when a SEGV or > >> BUS error is encountered and we catch it we should run the > >> PetscMallocVerify() and check our memory for corruption reporting any we > >> find. 
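A minimal sketch of the mechanism suggested just above -- install a handler for SIGBUS/SIGSEGV, run a heap-consistency pass, then re-raise the signal so the process still aborts normally -- might look like the plain POSIX C below. Here check_heap_for_corruption() is only a hypothetical placeholder for whatever allocator validation the library provides (it is not an actual PETSc routine), and the handler restricts itself to async-signal-safe calls.

#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for an allocator validation pass, e.g. checking
   guard words around each allocation and reporting any that were clobbered. */
static void check_heap_for_corruption(void)
{
  static const char msg[] = "validating heap after fatal signal\n";
  write(STDERR_FILENO, msg, sizeof msg - 1);  /* write() is async-signal-safe */
  /* ... walk allocator bookkeeping here ... */
}

static void fatal_handler(int sig)
{
  check_heap_for_corruption();
  signal(sig, SIG_DFL);  /* restore the default action ...            */
  raise(sig);            /* ... and re-raise so the usual core dump /
                            error trace is still produced afterwards  */
}

int main(void)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = fatal_handler;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGBUS,  &sa, NULL);
  sigaction(SIGSEGV, &sa, NULL);

  /* ... application / solver code that might fault ... */
  return 0;
}

The important detail is restoring SIG_DFL before re-raising: the extra diagnostics run first, but the process still dies (and dumps core) exactly as it would have without the handler.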
> >> > >> > >> > >> > > >> > Reproducing under Valgrind would help a lot. Perhaps it's possible to > >> checkpoint such that the breakage can be reproduced more quickly? > >> > > >> > Barry Smith > writes: > >> > > >> >> https://en.wikipedia.org/wiki/Bus_error < > >> https://en.wikipedia.org/wiki/Bus_error > > >> >> > >> >> But perhaps not true for Intel? > >> >> > >> >> > >> >> > >> >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley > > >> wrote: > >> >>> > >> >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith >> bsmith at petsc.dev >> wrote: > >> >>> > >> >>> > >> >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown >> jed at jedbrown.org >> wrote: > >> >>>> > >> >>>> Barry Smith >> writes: > >> >>>> > >> >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown >> jed at jedbrown.org >> wrote: > >> >>>>>> > >> >>>>>> Barry Smith >> writes: > >> >>>>>> > >> >>>>>>> So if a BLAS errors with SIGBUS then it is always an input error > >> of just not proper double/complex alignment? Or some other very strange > >> thing? > >> >>>>>> > >> >>>>>> I would suspect memory corruption. > >> >>>>> > >> >>>>> > >> >>>>> Corruption meaning what specifically? > >> >>>>> > >> >>>>> The routines crashing are dgemv which only take double precision > >> arrays, regardless of what garbage is in those arrays i don't think there > >> can be BUS errors resulting. They don't take integer arrays whose > >> corruption could result in bad indexing and then BUS errors. > >> >>>>> > >> >>>>> So then it can only be corruption of the pointers passed in, correct? > >> >>>> > >> >>>> Such as those pointers pointing into data on the stack with incorrect > >> sizes. > >> >>> > >> >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? > >> >>> > >> >>> My understanding was that roughly memory errors in the heap are SEGV > >> and memory errors on the stack are SIGBUS. Is that not true? > >> >>> > >> >>> Matt > >> >>> > >> >>> -- > >> >>> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which their > >> experiments lead. > >> >>> -- Norbert Wiener > >> >>> > >> >>> https://www.cse.buffalo.edu/~knepley/ < > >> http://www.cse.buffalo.edu/~knepley/ > > >> > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Thu Aug 27 08:53:31 2020 From: mlohry at gmail.com (Mark Lohry) Date: Thu, 27 Aug 2020 09:53:31 -0400 Subject: [petsc-users] Bus Error In-Reply-To: <9EEB2628-D6ED-4466-A629-33EAC73BCE4C@petsc.dev> References: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> <87a6yjg3o5.fsf@jedbrown.org> <9EEB2628-D6ED-4466-A629-33EAC73BCE4C@petsc.dev> Message-ID: It was built with --with-debugging=1 On Thu, Aug 27, 2020 at 9:44 AM Barry Smith wrote: > > Mark, > > Did i tell you that this has to be built with the configure option > --with-debugging=1 and won't be turned off with --with-debugging=0 ? 
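To make the alignment question quoted earlier in this thread concrete (whether a misaligned double load inside a routine like dgemv can really raise a bus error, and why Intel hardware usually tolerates it), here is a small stand-alone C experiment -- not taken from any of the messages above -- that reads a double through a deliberately misaligned pointer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
  char *buf = malloc(sizeof(double) + 1);
  if (!buf) return 1;
  double x = 3.14;
  memcpy(buf + 1, &x, sizeof x);        /* store the bytes at an odd, misaligned address */
  double *p = (double *)(buf + 1);      /* strictly undefined behavior in C */
  printf("misaligned read: %f\n", *p);  /* may raise SIGBUS on alignment-strict hardware */
  free(buf);
  return 0;
}

On alignment-strict architectures (classic SPARC, some ARM configurations) the dereference can abort with SIGBUS, while on x86-64 the hardware typically fixes up the unaligned load and the program just prints the value -- consistent with the "perhaps not true for Intel?" remark above. That asymmetry is why a SIGBUS coming out of dgemv on x86-64 points more toward corrupted pointers or a smashed stack than toward simple operand misalignment.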
> > Barry > > > On Aug 27, 2020, at 8:10 AM, Mark Lohry wrote: > > Barry, no output from that patch i'm afraid: > > 54 KSP Residual norm 3.215013886664e+03 > 55 KSP Residual norm 3.049105434513e+03 > 56 KSP Residual norm 2.859123916860e+03 > [929]PETSC ERROR: > ------------------------------------------------------------------------ > [929]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal > memory access > [929]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [929]PETSC ERROR: or see > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [929]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [929]PETSC ERROR: likely location of problem given in stack below > [929]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [929]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [929]PETSC ERROR: INSTEAD the line number of the start of the > function > [929]PETSC ERROR: is given. > [929]PETSC ERROR: [929] BLASgemv line 1406 > /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c > [929]PETSC ERROR: [929] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 > /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c > [929]PETSC ERROR: [929] MatSolve line 3354 > /home/mlohry/petsc/src/mat/interface/matrix.c > [929]PETSC ERROR: [929] PCApply_ILU line 201 > /home/mlohry/petsc/src/ksp/pc/impls/factor/ilu/ilu.c > [929]PETSC ERROR: [929] PCApply line 426 > /home/mlohry/petsc/src/ksp/pc/interface/precon.c > [929]PETSC ERROR: [929] KSP_PCApply line 279 > /home/mlohry/petsc/include/petsc/private/kspimpl.h > [929]PETSC ERROR: [929] KSPSolve_PREONLY line 16 > /home/mlohry/petsc/src/ksp/ksp/impls/preonly/preonly.c > [929]PETSC ERROR: [929] KSPSolve_Private line 590 > /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c > [929]PETSC ERROR: [929] KSPSolve line 848 > /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c > [929]PETSC ERROR: [929] PCApply_ASM line 441 > /home/mlohry/petsc/src/ksp/pc/impls/asm/asm.c > [929]PETSC ERROR: [929] PCApply line 426 > /home/mlohry/petsc/src/ksp/pc/interface/precon.c > [929]PETSC ERROR: [929] KSP_PCApply line 279 > /home/mlohry/petsc/include/petsc/private/kspimpl.h > srun: Job step aborted: Waiting up to 47 seconds for job step to finish. > [929]PETSC ERROR: [929] KSPFGMRESCycle line 108 > /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [929]PETSC ERROR: [929] KSPSolve_FGMRES line 274 > /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c > [929]PETSC ERROR: [929] KSPSolve_Private line 590 > /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c > > On Mon, Aug 24, 2020 at 6:47 PM Mark Lohry wrote: > >> I don't think I do. Running a much smaller case with the same models I >> get the attached report from valgrind --show-leak-kinds=all >> --leak-check=full --track-origins=yes. I only see some HDF5 stuff and >> OpenMPI that I think are false positives. >> >> ==1286950== Memcheck, a memory error detector >> ==1286950== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et >> al. 
>> ==1286950== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun >> with -h for copyright info >> ==1286950== Command: ./verification_testing >> --gtest_filter=DrivenCavity3D.Re100_BackwardEulerILU1_16x16N2_Quadrature1 >> --petsc_time_integrator=arkimex --petsc_arkimex_type=l2 >> ==1286950== Parent PID: 1286932 >> ==1286950== >> --1286950-- >> --1286950-- Valgrind options: >> --1286950-- --show-leak-kinds=all >> --1286950-- --leak-check=full >> --1286950-- --track-origins=yes >> --1286950-- --log-file=valgrind-out.txt >> --1286950-- -v >> --1286950-- Contents of /proc/version: >> --1286950-- Linux version 5.4.0-29-generic (buildd at lgw01-amd64-035) >> (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #33-Ubuntu SMP Wed Apr 29 >> 14:32:27 UTC 2020 >> --1286950-- >> --1286950-- Arch and hwcaps: AMD64, LittleEndian, >> amd64-cx16-rdtscp-sse3-ssse3-avx >> --1286950-- Page sizes: currently 4096, max supported 4096 >> --1286950-- Valgrind library directory: /usr/lib/x86_64-linux-gnu/valgrind >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/verification_testing >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ld-2.31.so >> --1286950-- Considering /usr/lib/x86_64-linux-gnu/ld-2.31.so .. >> --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) >> --1286950-- Considering /lib/x86_64-linux-gnu/ld-2.31.so .. >> --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) >> --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.31.so >> .. >> --1286950-- .. CRC is valid >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux >> --1286950-- object doesn't have a symbol table >> --1286950-- object doesn't have a dynamic symbol table >> --1286950-- Scheduler: using generic scheduler lock implementation. >> --1286950-- Reading suppressions file: >> /usr/lib/x86_64-linux-gnu/valgrind/default.supp >> ==1286950== embedded gdbserver: reading from >> /tmp/vgdb-pipe-from-vgdb-to-1286950-by-mlohry-on-??? >> ==1286950== embedded gdbserver: writing to >> /tmp/vgdb-pipe-to-vgdb-from-1286950-by-mlohry-on-??? >> ==1286950== embedded gdbserver: shared mem >> /tmp/vgdb-pipe-shared-mem-vgdb-1286950-by-mlohry-on-??? >> ==1286950== >> ==1286950== TO CONTROL THIS PROCESS USING vgdb (which you probably >> ==1286950== don't want to do, unless you know exactly what you're doing, >> ==1286950== or are doing some strange experiment): >> ==1286950== /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb >> --pid=1286950 ...command... >> ==1286950== >> ==1286950== TO DEBUG THIS PROCESS USING GDB: start GDB like this >> ==1286950== /path/to/gdb ./verification_testing >> ==1286950== and then give GDB the following command >> ==1286950== target remote | >> /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 >> ==1286950== --pid is optional if only one valgrind process is running >> ==1286950== >> --1286950-- REDIR: 0x4022d80 (ld-linux-x86-64.so.2:strlen) redirected to >> 0x580c9ce2 (???) >> --1286950-- REDIR: 0x4022b50 (ld-linux-x86-64.so.2:index) redirected to >> 0x580c9cfc (???) >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so >> --1286950-- object doesn't have a symbol table >> ==1286950== WARNING: new redirection conflicts with existing -- ignoring >> it >> --1286950-- old: 0x04022d80 (strlen ) R-> (0000.0) >> 0x580c9ce2 ??? 
>> --1286950-- new: 0x04022d80 (strlen ) R-> (2007.0) >> 0x0483f060 strlen >> --1286950-- REDIR: 0x401f560 (ld-linux-x86-64.so.2:strcmp) redirected to >> 0x483ffd0 (strcmp) >> --1286950-- REDIR: 0x40232e0 (ld-linux-x86-64.so.2:mempcpy) redirected to >> 0x4843a20 (mempcpy) >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/initialization/libinitialization.so >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/governing_equations/libgoverning_equations.so >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/time_stepping/libtime_stepping.so >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/governing_equations/libboundary_conditions.so >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/governing_equations/libsolution_monitors.so >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/governing_equations/libfluxtypes.so >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/algebraic_solvers/libalgebraic_solvers.so >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/program_options/libprogram_options.so >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_filesystem.so.1.73.0 >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0 >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so.40.20.1 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ >> libpthread-2.31.so >> --1286950-- Considering >> /usr/lib/debug/.build-id/77/5cbbfff814456660786780b0b3b40096b4c05e.debug .. >> --1286950-- .. build-id is valid >> --1286948-- Reading syms from >> /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libpetsc.so.3.13.3 >> --1286937-- Reading syms from >> /home/mlohry/dev/cmake-build/parallel/libparallel.so >> --1286937-- Reading syms from >> /home/mlohry/dev/cmake-build/logger/liblogger.so >> --1286937-- Reading syms from >> /home/mlohry/dev/cmake-build/spatial_discretization/libdiscretization.so >> --1286945-- Reading syms from >> /home/mlohry/dev/cmake-build/utils/libutils.so >> --1286944-- Reading syms from >> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 >> --1286938-- object doesn't have a symbol table >> --1286949-- Reading syms from /usr/lib/x86_64-linux-gnu/libm-2.31.so >> --1286949-- Considering /usr/lib/x86_64-linux-gnu/libm-2.31.so .. >> --1286947-- .. CRC mismatch (computed 327d785f wanted 751f5509) >> --1286947-- Considering /lib/x86_64-linux-gnu/libm-2.31.so .. >> --1286938-- .. CRC mismatch (computed 327d785f wanted 751f5509) >> --1286937-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >> libm-2.31.so .. >> --1286950-- .. CRC is valid >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libc-2.31.so >> --1286950-- Considering /usr/lib/x86_64-linux-gnu/libc-2.31.so .. >> --1286951-- .. CRC mismatch (computed a6f43087 wanted 6555436e) >> --1286951-- Considering /lib/x86_64-linux-gnu/libc-2.31.so .. >> --1286947-- .. CRC mismatch (computed a6f43087 wanted 6555436e) >> --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >> libc-2.31.so .. >> --1286950-- .. 
CRC is valid >> --1286940-- Reading syms from >> /home/mlohry/dev/cmake-build/file_io/libfileio.so >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_program_options.so.1.73.0 >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_serialization.so.1.73.0 >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libhwloc.so.15.1.0 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libsuperlu_dist.so.6.3.0 >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 >> --1286950-- object doesn't have a symbol table >> --1286937-- Reading syms from >> /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 >> --1286937-- object doesn't have a symbol table >> --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0 >> --1286939-- object doesn't have a symbol table >> --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libdl-2.31.so >> --1286947-- Considering /usr/lib/x86_64-linux-gnu/libdl-2.31.so .. >> --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) >> --1286947-- Considering /lib/x86_64-linux-gnu/libdl-2.31.so .. >> --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) >> --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >> libdl-2.31.so .. >> --1286947-- .. CRC is valid >> --1286937-- Reading syms from >> /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libmetis.so >> --1286937-- Reading syms from >> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0 >> --1286942-- Reading syms from >> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log_setup.so.1.73.0 >> --1286942-- Reading syms from >> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0 >> --1286942-- Reading syms from >> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_regex.so.1.73.0 >> --1286949-- Reading syms from >> /home/mlohry/dev/cmake-build/basis_functions/libbasis_functions.so >> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 >> --1286944-- object doesn't have a symbol table >> --1286951-- Reading syms from >> /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so >> --1286951-- object doesn't have a symbol table >> --1286943-- Reading syms from >> /home/mlohry/dev/cmake-build/external_install/lib/libhdf5.so.103.1.0 >> --1286951-- Reading syms from >> /home/mlohry/dev/cmake-build/external/tinyxml2-build/libtinyxml2.so.6.1.0 >> --1286944-- Reading syms from >> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_iostreams.so.1.73.0 >> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libz.so.1.2.11 >> --1286944-- object doesn't have a symbol table >> --1286951-- Reading syms from >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0 >> --1286951-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libutil-2.31.so >> --1286946-- Considering /usr/lib/x86_64-linux-gnu/libutil-2.31.so .. >> --1286946-- .. CRC mismatch (computed 4639aba5 wanted ceb246b4) >> --1286946-- Considering /lib/x86_64-linux-gnu/libutil-2.31.so .. >> --1286946-- .. 
CRC mismatch (computed 4639aba5 wanted ceb246b4) >> --1286948-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >> libutil-2.31.so .. >> --1286939-- .. CRC is valid >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libudev.so.1.6.17 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.1 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/librt-2.31.so >> --1286950-- Considering /usr/lib/x86_64-linux-gnu/librt-2.31.so .. >> --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) >> --1286950-- Considering /lib/x86_64-linux-gnu/librt-2.31.so .. >> --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) >> --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >> librt-2.31.so .. >> --1286950-- .. CRC is valid >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/libquadmath.so.0.0.0 >> --1286950-- object doesn't have a symbol table >> --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 >> --1286945-- Considering /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 .. >> --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) >> --1286945-- Considering /lib/x86_64-linux-gnu/libXau.so.6.0.0 .. >> --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) >> --1286945-- object doesn't have a symbol table >> --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0 >> --1286942-- object doesn't have a symbol table >> --1286942-- Reading syms from /usr/lib/x86_64-linux-gnu/libbsd.so.0.10.0 >> --1286942-- object doesn't have a symbol table >> --1286950-- REDIR: 0x6516600 (libc.so.6:memmove) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515900 (libc.so.6:strncpy) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516930 (libc.so.6:strcasecmp) redirected to >> 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515220 (libc.so.6:strcat) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515960 (libc.so.6:rindex) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6517dd0 (libc.so.6:rawmemchr) redirected to >> 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6532e60 (libc.so.6:wmemchr) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65329a0 (libc.so.6:wcscmp) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516760 (libc.so.6:mempcpy) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516590 (libc.so.6:bcmp) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515890 (libc.so.6:strncmp) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65152d0 (libc.so.6:strcmp) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65166c0 (libc.so.6:memset) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6532960 (libc.so.6:wcschr) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65157f0 (libc.so.6:strnlen) redirected to 0x48331d0 >> 
(_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65153b0 (libc.so.6:strcspn) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516980 (libc.so.6:strncasecmp) redirected to >> 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515350 (libc.so.6:strcpy) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516ad0 (libc.so.6:memcpy@@GLIBC_2.14) redirected >> to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65340d0 (libc.so.6:wcsnlen) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65329e0 (libc.so.6:wcscpy) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65159a0 (libc.so.6:strpbrk) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515280 (libc.so.6:index) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65157b0 (libc.so.6:strlen) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x651ed20 (libc.so.6:memrchr) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65169d0 (libc.so.6:strcasecmp_l) redirected to >> 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516550 (libc.so.6:memchr) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6532ab0 (libc.so.6:wcslen) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515c60 (libc.so.6:strspn) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65168d0 (libc.so.6:stpncpy) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516870 (libc.so.6:stpcpy) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6517e10 (libc.so.6:strchrnul) redirected to >> 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516a20 (libc.so.6:strncasecmp_l) redirected to >> 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516470 (libc.so.6:strstr) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65a3750 (libc.so.6:__memcpy_chk) redirected to >> 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286938-- REDIR: 0x6527a30 (libc.so.6:__strrchr_sse2) redirected to >> 0x483ea70 (__strrchr_sse2) >> --1286938-- REDIR: 0x6511c90 (libc.so.6:calloc) redirected to 0x483dce0 >> (calloc) >> --1286938-- REDIR: 0x6510260 (libc.so.6:malloc) redirected to 0x483b780 >> (malloc) >> --1286938-- REDIR: 0x6531c40 (libc.so.6:memcpy at GLIBC_2.2.5) redirected >> to 0x4840100 (memcpy at GLIBC_2.2.5) >> --1286938-- REDIR: 0x6527d30 (libc.so.6:__strlen_sse2) redirected to >> 0x483efa0 (__strlen_sse2) >> --1286938-- REDIR: 0x65f4ac0 (libc.so.6:__strncmp_sse42) redirected to >> 0x483f7c0 (__strncmp_sse42) >> --1286938-- REDIR: 0x6510850 (libc.so.6:free) redirected to 0x483c9d0 >> (free) >> --1286938-- REDIR: 0x6532070 (libc.so.6:__memset_sse2_unaligned) >> redirected to 0x48428e0 (memset) >> --1286938-- REDIR: 0x6603350 (libc.so.6:__memcmp_sse4_1) redirected to >> 0x4842150 (__memcmp_sse4_1) >> --1286938-- REDIR: 0x6520520 (libc.so.6:__strcmp_sse2_unaligned) >> redirected to 0x483fed0 (strcmp) >> --1286938-- REDIR: 0x61d0c10 (libstdc++.so.6:operator new(unsigned long)) >> redirected to 0x483bdf0 (operator new(unsigned long)) >> --1286938-- REDIR: 0x61cee60 (libstdc++.so.6:operator delete(void*)) >> redirected to 0x483cf50 (operator delete(void*)) >> --1286938-- REDIR: 0x61d0c70 (libstdc++.so.6:operator new[](unsigned >> long)) redirected to 0x483c510 (operator new[](unsigned long)) >> --1286938-- REDIR: 0x61cee90 (libstdc++.so.6:operator delete[](void*)) >> redirected to 
0x483d6e0 (operator delete[](void*)) >> --1286938-- REDIR: 0x65275f0 (libc.so.6:__strchr_sse2) redirected to >> 0x483eb90 (__strchr_sse2) >> --1286950-- REDIR: 0x6511000 (libc.so.6:realloc) redirected to 0x483df30 >> (realloc) >> --1286950-- REDIR: 0x6527820 (libc.so.6:__strchrnul_sse2) redirected to >> 0x4843540 (strchrnul) >> --1286950-- REDIR: 0x6531560 (libc.so.6:__strstr_sse2_unaligned) >> redirected to 0x4843c20 (strstr) >> --1286950-- REDIR: 0x6531c20 (libc.so.6:__mempcpy_sse2_unaligned) >> redirected to 0x4843660 (mempcpy) >> --1286950-- REDIR: 0x652d2a0 (libc.so.6:__strncpy_sse2_unaligned) >> redirected to 0x483f560 (__strncpy_sse2_unaligned) >> --1286950-- REDIR: 0x6515830 (libc.so.6:strncat) redirected to 0x48331d0 >> (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65305b0 (libc.so.6:__strncat_sse2_unaligned) >> redirected to 0x483ede0 (strncat) >> --1286950-- REDIR: 0x6516120 (libc.so.6:__GI_strstr) redirected to >> 0x4843ca0 (__strstr_sse2) >> --1286950-- REDIR: 0x6522360 (libc.so.6:__rawmemchr_sse2) redirected to >> 0x4843580 (rawmemchr) >> --1286950-- REDIR: 0x65faea0 (libc.so.6:__strcasecmp_avx) redirected to >> 0x483f830 (strcasecmp) >> --1286950-- REDIR: 0x65fc520 (libc.so.6:__strncasecmp_avx) redirected to >> 0x483f910 (strncasecmp) >> --1286950-- REDIR: 0x65f98a0 (libc.so.6:__strspn_sse42) redirected to >> 0x4843ef0 (strspn) >> --1286950-- REDIR: 0x65f9620 (libc.so.6:__strcspn_sse42) redirected to >> 0x4843e10 (strcspn) >> --1286948-- REDIR: 0x6522030 (libc.so.6:__memchr_sse2) redirected to >> 0x4840050 (memchr) >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Discarding syms at 0x4a96240-0x4a96d47 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so >> (have_dinfo 1) >> --1286948-- Discarding syms at 0x4a9b1c0-0x4a9b937 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so >> (have_dinfo 1) >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 >> --1286948-- object doesn't have a symbol table >> --1286948-- Discarding syms at 0x4a96120-0x4a966b0 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so >> (have_dinfo 1) >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so >> --1286948-- object doesn't have a symbol table >> --1286948-- REDIR: 0x64bc670 
(libc.so.6:setenv) redirected to 0x4844480 >> (setenv) >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 >> --1286948-- object doesn't have a symbol table >> --1286948-- Discarding syms at 0x8d053e0-0x8d07391 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so (have_dinfo >> 1) >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so >> --1286948-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Discarding syms at 0x8d04180-0x8d045b0 in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so (have_dinfo 1) >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so >> --1286950-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Discarding syms at 0x9ebf0a0-0x9ebf490 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so >> (have_dinfo 1) >> --1286946-- Discarding syms at 0x9eca300-0x9ecbee8 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so >> (have_dinfo 1) >> --1286946-- Discarding syms at 0x9ed1220-0x9ed24e7 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so (have_dinfo >> 1) >> --1286946-- Discarding syms at 0x9ed8240-0x9ed8c88 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so >> (have_dinfo 1) >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Discarding syms at 0x9ebf0e0-0x9ebf417 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so >> (have_dinfo 1) >> --1286946-- Discarding syms at 0x9ecf320-0x9ed1239 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so >> (have_dinfo 1) >> --1286946-- Discarding syms at 0x9ed73a0-0x9ed9ccc in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so >> (have_dinfo 1) >> --1286936-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so >> 
--1286936-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so >> --1286946-- object doesn't have a symbol table >> --1286946-- REDIR: 0x652cc70 (libc.so.6:__strcpy_sse2_unaligned) >> redirected to 0x483f090 (strcpy) >> --1286946-- REDIR: 0x65a3810 (libc.so.6:__memmove_chk) redirected to >> 0x48331d0 (_vgnU_ifunc_wrapper) >> ==1286946== WARNING: new redirection conflicts with existing -- ignoring >> it >> --1286946-- old: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2030.0) >> 0x04843b10 __memcpy_chk >> --1286946-- new: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2024.0) >> 0x048434d0 __memmove_chk >> --1286946-- REDIR: 0x6531c30 (libc.so.6:__memcpy_chk_sse2_unaligned) >> redirected to 0x4843b10 (__memcpy_chk) >> --1286946-- REDIR: 0x65129b0 (libc.so.6:posix_memalign) redirected to >> 0x483e1e0 (posix_memalign) >> --1286946-- Discarding syms at 0x9f15280-0x9f32932 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so >> (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f7c4c0-0x9f7ded8 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 >> (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f620c0-0x9f71483 in >> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f9ba10-0x9fd22ee in >> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_monitoring.so.50.10.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Discarding syms at 0x9f4d400-0x9f50c19 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so >> (have_dinfo 1) >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/libpsm1/libpsm_infinipath.so.1.16 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 >> 
--1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- REDIR: 0x6517140 (libc.so.6:strcasestr) redirected to >> 0x4843f80 (strcasestr) >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Discarding syms at 0x9f4d5c0-0x9f4f5a1 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so (have_dinfo 1) >> --1286946-- Discarding syms at 0x9fee680-0x9ff096c in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so (have_dinfo >> 1) >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_monitoring.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Discarding syms at 0x9f724a0-0x9f787b5 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so (have_dinfo 1) >> --1286946-- Discarding syms at 0xa827f80-0xa8e14c4 in >> /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f94830-0x9fbafce in >> /usr/lib/libpsm1/libpsm_infinipath.so.1.16 
(have_dinfo 1) >> --1286946-- Discarding syms at 0x9fe5580-0x9fe8f71 in >> /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f56420-0x9f5cec0 in >> /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0xa929f10-0xa93d5fc in >> /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0xa94b0c0-0xa95a483 in >> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0xa968860-0xa9adf12 in >> /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 (have_dinfo 1) >> --1286946-- Discarding syms at 0xa9e7a10-0xaa1e2ee in >> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f80410-0x9f84e27 in >> /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f103e0-0x9f15fd5 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f471e0-0x9f47ce0 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so >> (have_dinfo 1) >> ==1286946== Thread 3: >> ==1286946== Syscall param writev(vector[...]) points to uninitialised >> byte(s) >> ==1286946== at 0x658A48D: __writev (writev.c:26) >> ==1286946== by 0x658A48D: writev (writev.c:24) >> ==1286946== by 0x8DF9B4C: pmix_ptl_base_send_handler (in >> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x7CC413E: ??? (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286946== by 0x7CC487E: event_base_loop (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286946== by 0x8DBDD55: ??? (in >> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286946== by 0x6595102: clone (clone.S:95) >> ==1286946== Address 0xa28fdcf is 127 bytes inside a block of size 5,120 >> alloc'd >> ==1286946== at 0x483DFAF: realloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286946== by 0x8DE155A: pmix_bfrop_buffer_extend (in >> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x8DE3F4A: pmix_bfrops_base_pack_byte (in >> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x8DE4900: pmix_bfrops_base_pack_buf (in >> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x8DE4175: pmix_bfrops_base_pack (in >> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x8D7CF91: ??? (in >> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x7CC3FDD: ??? (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286946== by 0x7CC487E: event_base_loop (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286946== by 0x8DBDD55: ??? (in >> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286946== by 0x6595102: clone (clone.S:95) >> ==1286946== Uninitialised value was created by a stack allocation >> ==1286946== at 0x9F048D6: ??? 
(in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so) >> ==1286946== >> --1286944-- Discarding syms at 0xaa4d220-0xaa5796a in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at >> 0xaa4d220---1286948-- Discarding syms at 0xaae1100-0xaae7d70 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at >> 0xaae1100-0xaae7d70 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so >> (have_dinfo 1) >> --1286945-- Discarding syms at 0x9f69420-0x9f--1286938-- REDIR: 0x61cee70 >> (libstdc++.so.6:operator delete(void*, unsigned long)) redirected to >> --1286937-- REDIR: 0x61cee70 (libstdc++.so.6:opera--1286946-- REDIR: >> 0x652e970 (libc.so.6:__stpncpy_sse2_unaligned) redirected to 0x48427e0 >> (stpncpy) >> --1286942-- REDIR: 0x6527ed0 (libc.so.6:__strnlen_sse2) redirected to >> 0x483eee0 (strnlen) >> --1286944-- REDIR: 0x652fcc0 (libc.so.6:__strcat_sse2_unaligned) >> redirected to 0x483ec20 (strcat) >> --1286951-- REDIR: 0x65113d0 (libc.so.6:memalign) redirected to 0x483e2a0 >> (memalign) >> --1286951-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so >> --1286951-- object doesn't have a symbol table >> --1286951-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so >> --1286951-- object doesn't have a symbol table >> --1286941-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 >> --1286941-- object doesn't have a symbol table >> --1286951-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so >> --1286951-- object doesn't have a symbol table >> --1286939-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so >> --1286939-- object doesn't have a symbol table >> --1286939-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so >> --1286939-- object doesn't have a symbol table >> --1286939-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so >> --1286939-- object doesn't have a symbol table >> --1286939-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so >> --1286939-- object doesn't have a symbol table >> --1286939-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so >> --1286939-- object doesn't have a symbol table >> --1286939-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so >> --1286939-- object doesn't have a symbol table >> --1286943-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so >> --1286943-- object doesn't have a symbol table >> --1286943-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so >> --1286943-- object doesn't have a symbol table >> --1286943-- Reading syms from >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so >> --1286943-- object doesn't have a symbol table >> --1286938-- REDIR: 0x65a3b00 (libc.so.6:__strcpy_chk) redirected to >> 0x48435c0 (__strcpy_chk) >> --1286939-- Discarding syms at 0x9f1d660-0x9f371d6 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9f5afa0-0x9f8f8b6 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9fa0640-0x9fa42d9 in >> 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so (have_dinfo >> 1) >> --1286939-- Discarding syms at 0x9f4c160-0x9f4dc58 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0xa7fc270-0xa804f00 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9fee3a0-0x9ff134e in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xa80a240-0xa80aa8d in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 >> (have_dinfo 1) >> --1286939-- Discarding syms at 0xa80f0e0-0xa80f8bb in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so (have_dinfo >> 1) >> --1286939-- Discarding syms at 0xaa460c0-0xaa47947 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so (have_dinfo >> 1) >> --1286939-- Discarding syms at 0xaa613e0-0xaa7730f in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0xaa849c0-0xaa8a845 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9ee1320-0x9ee3567 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9eebc40-0x9ef4ad7 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9f02600-0x9f08cd8 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so (have_dinfo >> 1) >> --1286939-- Discarding syms at 0x9f40200-0x9f4126e in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so (have_dinfo >> 1) >> --1286939-- Discarding syms at 0x9eda4e0-0x9edb4c5 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9ed32c0-0x9ed4afe in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9ebf160-0x9ebfe95 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9ece140-0x9ecebed in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9ec92a0-0x9ec9aa2 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x8eae0e0-0x8eae4a7 in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8eb3220-0x8eb3c27 in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8eb80e0-0x8eb90b7 in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8ea6380-0x8ea97b3 in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e5a740-0x8e5f859 in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e67be0-0x8e743f0 in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x84da200-0x84daa5d in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8d322b0-0x8d34bfc in >> 
/usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e29480-0x8e3b70a in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8d3c2b0-0x8d3ed5c in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e45340-0x8e502da in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e901a0-0x8e908a7 in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8d05520-0x8d06783 in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e7b460-0x8e8aaa4 in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8d44520-0x8d4556a in >> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e97600-0x8ea0fa1 in >> /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x8d109c0-0x8d27dcf in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x8d5b280-0x8dfdffb in >> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 (have_dinfo 1) >> --1286939-- Discarding syms at 0x9ec40a0-0x9ec4490 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so (have_dinfo >> 1) >> --1286939-- Discarding syms at 0x84d2580-0x84d518f in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x4a96120-0x4a9644f in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x4aa0100-0x4aa03e7 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x84c74a0-0x84c901f in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x4aa5260-0x4aa58e9 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x4a9b420-0x4a9bcdf in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x84e7460-0x84f52ca in >> /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 (have_dinfo 1) >> --1286939-- Discarding syms at 0x4a90360-0x4a91107 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9f46220-0x9f474cc in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9f0f180-0x9f0f78d in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xaa94540-0xaa96a4a in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xaa9f6c0-0xaab44d0 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so (have_dinfo >> 1) >> --1286939-- Discarding syms at 0xaabe820-0xaad8ee0 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so (have_dinfo >> 1) >> --1286939-- Discarding syms at 0x9efc080-0x9efc1e1 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so (have_dinfo 1) >> --1286939-- Discarding syms at 
0x9fab2a0-0x9fb1341 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9f140c0-0x9f14299 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9fb72a0-0x9fbb791 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9fd52a0-0x9fda794 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9fe02e0-0x9fe59a5 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0xa815460-0xa8177ab in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0xa81e260-0xa82033d in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0xa8273e0-0xa8297d8 in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so >> (have_dinfo 1) >> --1286939-- Discarding syms at 0x9fc85e0-0x9fce8ef in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 >> (have_dinfo 1) >> ==1286939== >> ==1286939== HEAP SUMMARY: >> ==1286939== in use at exit: 74,054 bytes in 223 blocks >> ==1286939== total heap usage: 22,405,782 allocs, 22,405,559 frees, >> 34,062,479,959 bytes allocated >> ==1286939== >> ==1286939== Searching for pointers to 223 not-freed blocks >> ==1286939== Checked 3,415,912 bytes >> ==1286939== >> ==1286939== Thread 1: >> ==1286939== 1 bytes in 1 blocks are definitely lost in loss record 1 of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x651550E: strdup (strdup.c:42) >> ==1286939== by 0x9F6A4B6: ??? >> ==1286939== by 0x9F47373: ??? >> ==1286939== by 0x68E3B9B: mca_base_framework_components_register (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F35: mca_base_framework_register (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F93: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4BA1734: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== >> ==1286939== 8 bytes in 1 blocks are still reachable in loss record 2 of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x764724C: ??? (in >> /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >> ==1286939== by 0x7657B9A: ??? (in >> /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >> ==1286939== by 0x7645679: ??? (in >> /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >> ==1286939== by 0x4001139: ??? 
(in /usr/lib/x86_64-linux-gnu/ld-2.31.so >> ) >> ==1286939== by 0x3: ??? >> ==1286939== by 0x1FFEFFF926: ??? >> ==1286939== by 0x1FFEFFF93D: ??? >> ==1286939== by 0x1FFEFFF987: ??? >> ==1286939== by 0x1FFEFFF9A7: ??? >> ==1286939== >> ==1286939== 8 bytes in 1 blocks are definitely lost in loss record 3 of 44 >> ==1286939== at 0x483DD99: calloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9F69B6F: ??? >> ==1286939== by 0x9F1CDED: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 13 bytes in 2 blocks are still reachable in loss record 4 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x651550E: strdup (strdup.c:42) >> ==1286939== by 0x7CC3657: event_config_avoid_method (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x68FEB5A: opal_event_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68FE8CA: ??? (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== >> ==1286939== 15 bytes in 1 blocks are indirectly lost in loss record 5 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x651550E: strdup (strdup.c:42) >> ==1286939== by 0x9EDB189: ??? >> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6907C25: ??? 
(in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4BA16D5: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 15 bytes in 1 blocks are definitely lost in loss record 6 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x651550E: strdup (strdup.c:42) >> ==1286939== by 0x9F5655C: ??? >> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >> ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) >> ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) >> ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) >> ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) >> ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) >> ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) >> ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) >> ==1286939== >> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 7 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9F1CBEB: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 8 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9F1CC66: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? 
>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 9 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9F1CCDA: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 25 bytes in 1 blocks are still reachable in loss record 10 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x651550E: strdup (strdup.c:42) >> ==1286939== by 0x68F27BD: ??? (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4B956B6: ompi_pml_v_output_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B95259: ??? (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4B93FAE: ??? (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4BA1734: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== >> ==1286939== 30 bytes in 1 blocks are definitely lost in loss record 11 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0xA9A859B: ??? 
>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >> ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) >> ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) >> ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) >> ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) >> ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) >> ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) >> ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) >> ==1286939== by 0x72DEB58: _dlerror_run (dlerror.c:170) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are still reachable in loss record 12 of >> 44 >> ==1286939== at 0x483DD99: calloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x7CC353E: event_get_supported_methods (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x68FEA98: opal_event_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68FE8CA: ??? (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 13 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x84D2B0A: ??? >> ==1286939== by 0x68602FB: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 14 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x84D2BCE: ??? 
>> ==1286939== by 0x68602FB: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 15 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x84D2CB2: ??? >> ==1286939== by 0x68602FB: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 16 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x84D2D91: ??? >> ==1286939== by 0x68602FB: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 17 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E81BD8: ??? >> ==1286939== by 0x8E89F4B: ??? >> ==1286939== by 0x8D84A0D: ??? >> ==1286939== by 0x8DF79C1: ??? >> ==1286939== by 0x7CC3FDD: ??? (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CC487E: event_base_loop (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x8DBDD55: ??? >> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286939== by 0x6595102: clone (clone.S:95) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 18 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A767: ??? >> ==1286939== by 0x84D330E: ??? 
>> ==1286939== by 0x68602FB: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 36 (32 direct, 4 indirect) bytes in 1 blocks are definitely >> lost in loss record 19 of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x4B94C09: mca_pml_base_pml_check_selected (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x9F1E1E1: ??? >> ==1286939== by 0x4BA1A09: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 20 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x7CFF4B6: ??? (in >> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >> ==1286939== by 0x7CC5E26: event_global_setup_locks_ (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in >> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >> ==1286939== by 0x68FE8E4: ??? (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== >> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 21 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x7CFF4B6: ??? (in >> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >> ==1286939== by 0x7CCF377: evsig_global_setup_locks_ (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CC5E39: event_global_setup_locks_ (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in >> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >> ==1286939== by 0x68FE8E4: ??? 
(in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== >> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 22 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x7CFF4B6: ??? (in >> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >> ==1286939== by 0x7CCB997: evutil_secure_rng_global_setup_locks_ (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CC5E4F: event_global_setup_locks_ (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in >> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >> ==1286939== by 0x68FE8E4: ??? (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== >> ==1286939== 48 bytes in 1 blocks are still reachable in loss record 23 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68D7F7A: mca_base_component_find (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F35: mca_base_framework_register (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F93: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4B8560C: mca_io_base_file_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B0E68A: ompi_file_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >> ==1286939== by 0x78B953B: 
H5F_open (H5Fint.c:1493) >> ==1286939== >> ==1286939== 48 bytes in 1 blocks are still reachable in loss record 24 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68D7F7A: mca_base_component_find (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F35: mca_base_framework_register (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F93: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4B85638: mca_io_base_file_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B0E68A: ompi_file_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >> ==1286939== >> ==1286939== 48 bytes in 2 blocks are still reachable in loss record 25 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x7CC3647: event_config_avoid_method (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x68FEB5A: opal_event_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68FE8CA: ??? (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== >> ==1286939== 55 (32 direct, 23 indirect) bytes in 1 blocks are definitely >> lost in loss record 26 of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A767: ??? 
>> ==1286939== by 0x4AF6CD6: ompi_comm_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA194D: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== >> ==1286939== 56 bytes in 1 blocks are still reachable in loss record 27 of >> 44 >> ==1286939== at 0x483DD99: calloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x7CC1C86: event_config_new (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x68FEAC0: opal_event_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68FE8CA: ??? (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== >> ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 28 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9F6E008: ??? >> ==1286939== by 0x9F7C654: ??? >> ==1286939== by 0x9F1CD3E: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== >> ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 29 of >> 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0xA957008: ??? >> ==1286939== by 0xA86B017: ??? >> ==1286939== by 0xA862FD8: ??? >> ==1286939== by 0xA828E15: ??? >> ==1286939== by 0xA829624: ??? >> ==1286939== by 0x9F77910: ??? >> ==1286939== by 0x4B85C53: ompi_mtl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x9F13E4D: ??? 
>> ==1286939== by 0x4B94673: mca_pml_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1789: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 76 (32 direct, 44 indirect) bytes in 1 blocks are definitely >> lost in loss record 30 of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A767: ??? >> ==1286939== by 0x84D387F: ??? >> ==1286939== by 0x68602FB: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 79 (64 direct, 15 indirect) bytes in 1 blocks are definitely >> lost in loss record 31 of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9EDB12E: ??? >> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6907C25: ??? (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4BA16D5: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 144 bytes in 3 blocks are still reachable in loss record 32 >> of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68D7F7A: mca_base_component_find (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F35: mca_base_framework_register (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F93: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4B8564E: mca_io_base_file_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B0E68A: ompi_file_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x7B6F1AC: H5FD_mpio_open 
(H5FDmpio.c:997) >> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >> ==1286939== >> ==1286939== 231 bytes in 12 blocks are definitely lost in loss record 33 >> of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x651550E: strdup (strdup.c:42) >> ==1286939== by 0x9F2B4B3: ??? >> ==1286939== by 0x9F2B85C: ??? >> ==1286939== by 0x9F2BBD7: ??? >> ==1286939== by 0x9F1CAAC: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== >> ==1286939== 240 bytes in 5 blocks are still reachable in loss record 34 >> of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68D7F7A: mca_base_component_find (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F35: mca_base_framework_register (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F93: mca_base_framework_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4B85622: mca_io_base_file_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B0E68A: ompi_file_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >> ==1286939== >> ==1286939== 272 bytes in 44 blocks are definitely lost in loss record 35 >> of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9FCAEDB: ??? >> ==1286939== by 0x9FE42B2: ??? >> ==1286939== by 0x9FE47BB: ??? >> ==1286939== by 0x9FCDDBF: ??? >> ==1286939== by 0x9FA324A: ??? >> ==1286939== by 0x4B3DD7F: PMPI_File_write_at_all (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >> ==1286939== by 0x78DF11D: H5FD_write (H5FDint.c:257) >> ==1286939== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >> ==1286939== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >> ==1286939== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >> ==1286939== >> ==1286939== 585 (480 direct, 105 indirect) bytes in 15 blocks are >> definitely lost in loss record 36 of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? 
>> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A767: ??? >> ==1286939== by 0x4B14036: ompi_proc_complete_init_single (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B146C3: ompi_proc_complete_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA19A9: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 776 bytes in 32 blocks are indirectly lost in loss record 37 >> of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8DE9816: ??? >> ==1286939== by 0x8DEB1D2: ??? >> ==1286939== by 0x8DEB49A: ??? >> ==1286939== by 0x8DE8B12: ??? >> ==1286939== by 0x8E9D492: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A767: ??? >> ==1286939== >> ==1286939== 840 (480 direct, 360 indirect) bytes in 15 blocks are >> definitely lost in loss record 38 of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x9EF2F00: ??? >> ==1286939== by 0x9EEBF17: ??? >> ==1286939== by 0x9EE2F54: ??? >> ==1286939== by 0x9F1E1FB: ??? >> ==1286939== >> ==1286939== 1,084 (480 direct, 604 indirect) bytes in 15 blocks are >> definitely lost in loss record 39 of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A767: ??? >> ==1286939== by 0x84D4800: ??? >> ==1286939== by 0x68602FB: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 1,344 bytes in 1 blocks are definitely lost in loss record 40 >> of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68AE702: opal_free_list_grow_st (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9F1CD2D: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? 
>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 41 >> of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68AE702: opal_free_list_grow_st (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9F1CC50: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 42 >> of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68AE702: opal_free_list_grow_st (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9F1CCC4: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 62,644 bytes in 31 blocks are indirectly lost in loss record >> 43 of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8DE9FA8: ??? >> ==1286939== by 0x8DEB032: ??? >> ==1286939== by 0x8DEB49A: ??? >> ==1286939== by 0x8DE8B12: ??? >> ==1286939== by 0x8E9D492: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? 
>> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== >> ==1286939== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are >> definitely lost in loss record 44 of 44 >> ==1286939== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x9F0398A: ??? >> ==1286939== by 0x9EE2F54: ??? >> ==1286939== by 0x9F1E1FB: ??? >> ==1286939== by 0x4BA1A09: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== LEAK SUMMARY: >> ==1286939== definitely lost: 9,837 bytes in 138 blocks >> ==1286939== indirectly lost: 63,435 bytes in 64 blocks >> ==1286939== possibly lost: 0 bytes in 0 blocks >> ==1286939== still reachable: 782 bytes in 21 blocks >> ==1286939== suppressed: 0 bytes in 0 blocks >> ==1286939== >> ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from >> 0) >> ==1286939== >> ==1286939== 1 errors in context 1 of 29: >> ==1286939== Thread 3: >> ==1286939== Syscall param writev(vector[...]) points to uninitialised >> byte(s) >> ==1286939== at 0x658A48D: __writev (writev.c:26) >> ==1286939== by 0x658A48D: writev (writev.c:24) >> ==1286939== by 0x8DF9B4C: ??? >> ==1286939== by 0x7CC413E: ??? (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CC487E: event_base_loop (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x8DBDD55: ??? >> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286939== by 0x6595102: clone (clone.S:95) >> ==1286939== Address 0xa28ee1f is 127 bytes inside a block of size 5,120 >> alloc'd >> ==1286939== at 0x483DFAF: realloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8DE155A: ??? >> ==1286939== by 0x8DE3F4A: ??? >> ==1286939== by 0x8DE4900: ??? >> ==1286939== by 0x8DE4175: ??? >> ==1286939== by 0x8D7CF91: ??? >> ==1286939== by 0x7CC3FDD: ??? (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CC487E: event_base_loop (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x8DBDD55: ??? >> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286939== by 0x6595102: clone (clone.S:95) >> ==1286939== Uninitialised value was created by a stack allocation >> ==1286939== at 0x9F048D6: ??? >> ==1286939== >> ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from >> 0) >> mpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x4B85622: mca_io_base_file_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B0E68A: ompi_file_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B3ADB8: PMPI_File_open (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >> ==1286936== by 0x78D4B23: H5FD_open (H5FD.c:733) >> ==1286936== by 0x78B953B: H5F_open (H5Fint.c:1493) >> ==1286936== >> ==1286936== 272 bytes in 44 blocks are definitely lost in loss record 39 >> of 49 >> ==1286936== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x9FCAEDB: ??? >> ==1286936== by 0x9FE42B2: ??? >> ==1286936== by 0x9FE47BB: ??? >> ==1286936== by 0x9FCDDBF: ??? 
>> ==1286936== by 0x9FA324A: ??? >> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >> ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) >> ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >> ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >> ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >> ==1286936== >> ==1286936== 312 bytes in 1 blocks are still reachable in loss record 40 >> of 49 >> ==1286936== at 0x483BE63: operator new(unsigned long) (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x74E78EB: boost::detail::make_external_thread_data() >> (in >> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) >> ==1286936== by 0x74E7C74: >> boost::detail::add_thread_exit_function(boost::detail::thread_exit_function_base*) >> (in >> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) >> ==1286936== by 0x73AFCEA: >> boost::log::v2_mt_posix::sources::aux::get_severity_level() (in >> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0) >> ==1286936== by 0x5F71A6C: set_value (severity_feature.hpp:135) >> ==1286936== by 0x5F71A6C: >> open_record_unlocked> const boost::log::v2_mt_posix::trivial::severity_level> > > >> (severity_feature.hpp:252) >> ==1286936== by 0x5F71A6C: >> open_record> const boost::log::v2_mt_posix::trivial::severity_level> > > >> (basic_logger.hpp:459) >> ==1286936== by 0x5F71A6C: >> Logger::TraceMessage(std::__cxx11::basic_string> std::char_traits, std::allocator >) (logger.cpp:328) >> ==1286936== by 0x5F729C7: >> Logger::Message(std::__cxx11::basic_string, >> std::allocator > const&, LogLevel) (logger.cpp:280) >> ==1286936== by 0x5F73CF1: >> Logger::Timer::Timer(std::__cxx11::basic_string> std::char_traits, std::allocator > const&, LogLevel) >> (logger.cpp:426) >> ==1286936== by 0x15718A: timer (logger.hpp:98) >> ==1286936== by 0x15718A: main (testing_main.cpp:9) >> ==1286936== >> ==1286936== 585 (480 direct, 105 indirect) bytes in 15 blocks are >> definitely lost in loss record 41 of 49 >> ==1286936== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8E9D3EB: ??? >> ==1286936== by 0x8E9F1C1: ??? >> ==1286936== by 0x8D0578C: ??? >> ==1286936== by 0x8D8605A: ??? >> ==1286936== by 0x8D87FE8: ??? >> ==1286936== by 0x8D88E4D: ??? >> ==1286936== by 0x8D1A767: ??? >> ==1286936== by 0x4B14036: ompi_proc_complete_init_single (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B146C3: ompi_proc_complete_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4BA19A9: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== >> ==1286936== 776 bytes in 32 blocks are indirectly lost in loss record 42 >> of 49 >> ==1286936== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8DE9816: ??? >> ==1286936== by 0x8DEB1D2: ??? >> ==1286936== by 0x8DEB49A: ??? >> ==1286936== by 0x8DE8B12: ??? >> ==1286936== by 0x8E9D492: ??? >> ==1286936== by 0x8E9F1C1: ??? >> ==1286936== by 0x8D0578C: ??? >> ==1286936== by 0x8D8605A: ??? >> ==1286936== by 0x8D87FE8: ??? >> ==1286936== by 0x8D88E4D: ??? 
>> ==1286936== by 0x8D1A767: ??? >> ==1286936== >> ==1286936== 840 (480 direct, 360 indirect) bytes in 15 blocks are >> definitely lost in loss record 43 of 49 >> ==1286936== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8E9D3EB: ??? >> ==1286936== by 0x8E9F1C1: ??? >> ==1286936== by 0x8D0578C: ??? >> ==1286936== by 0x8D8605A: ??? >> ==1286936== by 0x8D87FE8: ??? >> ==1286936== by 0x8D88E4D: ??? >> ==1286936== by 0x8D1A5EB: ??? >> ==1286936== by 0x9EF2F00: ??? >> ==1286936== by 0x9EEBF17: ??? >> ==1286936== by 0x9EE2F54: ??? >> ==1286936== by 0x9F1E1FB: ??? >> ==1286936== >> ==1286936== 1,091 (480 direct, 611 indirect) bytes in 15 blocks are >> definitely lost in loss record 44 of 49 >> ==1286936== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8E9D3EB: ??? >> ==1286936== by 0x8E9F1C1: ??? >> ==1286936== by 0x8D0578C: ??? >> ==1286936== by 0x8D8605A: ??? >> ==1286936== by 0x8D87FE8: ??? >> ==1286936== by 0x8D88E4D: ??? >> ==1286936== by 0x8D1A767: ??? >> ==1286936== by 0x84D4800: ??? >> ==1286936== by 0x68602FB: orte_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286936== by 0x4BA1322: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== >> ==1286936== 1,344 bytes in 1 blocks are definitely lost in loss record 45 >> of 49 >> ==1286936== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x68AE702: opal_free_list_grow_st (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x9F1CD2D: ??? >> ==1286936== by 0x68FC9C8: mca_btl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x9EE3527: ??? >> ==1286936== by 0x4B6170A: mca_bml_base_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4BA1714: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286936== by 0x15710D: main (testing_main.cpp:8) >> ==1286936== >> ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 46 >> of 49 >> ==1286936== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x68AE702: opal_free_list_grow_st (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x9F1CC50: ??? >> ==1286936== by 0x68FC9C8: mca_btl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x9EE3527: ??? 
>> ==1286936== by 0x4B6170A: mca_bml_base_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4BA1714: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286936== by 0x15710D: main (testing_main.cpp:8) >> ==1286936== >> ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 47 >> of 49 >> ==1286936== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x68AE702: opal_free_list_grow_st (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x9F1CCC4: ??? >> ==1286936== by 0x68FC9C8: mca_btl_base_select (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x9EE3527: ??? >> ==1286936== by 0x4B6170A: mca_bml_base_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4BA1714: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B450B0: PMPI_Init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) >> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286936== by 0x15710D: main (testing_main.cpp:8) >> ==1286936== >> ==1286936== 62,640 bytes in 30 blocks are indirectly lost in loss record >> 48 of 49 >> ==1286936== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8DE9FA8: ??? >> ==1286936== by 0x8DEB032: ??? >> ==1286936== by 0x8DEB49A: ??? >> ==1286936== by 0x8DE8B12: ??? >> ==1286936== by 0x8E9D492: ??? >> ==1286936== by 0x8E9F1C1: ??? >> ==1286936== by 0x8D0578C: ??? >> ==1286936== by 0x8D8605A: ??? >> ==1286936== by 0x8D87FE8: ??? >> ==1286936== by 0x8D88E4D: ??? >> ==1286936== by 0x8D1A5EB: ??? >> ==1286936== >> ==1286936== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are >> definitely lost in loss record 49 of 49 >> ==1286936== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8E9D3EB: ??? >> ==1286936== by 0x8E9F1C1: ??? >> ==1286936== by 0x8D0578C: ??? >> ==1286936== by 0x8D8605A: ??? >> ==1286936== by 0x8D87FE8: ??? >> ==1286936== by 0x8D88E4D: ??? >> ==1286936== by 0x8D1A5EB: ??? >> ==1286936== by 0x9F0398A: ??? >> ==1286936== by 0x9EE2F54: ??? >> ==1286936== by 0x9F1E1FB: ??? 
>> ==1286936== by 0x4BA1A09: ompi_mpi_init (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== >> ==1286936== LEAK SUMMARY: >> ==1286936== definitely lost: 9,805 bytes in 137 blocks >> ==1286936== indirectly lost: 63,431 bytes in 63 blocks >> ==1286936== possibly lost: 0 bytes in 0 blocks >> ==1286936== still reachable: 1,174 bytes in 27 blocks >> ==1286936== suppressed: 0 bytes in 0 blocks >> ==1286936== >> ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from >> 0) >> ==1286936== >> ==1286936== 1 errors in context 1 of 29: >> ==1286936== Thread 3: >> ==1286936== Syscall param writev(vector[...]) points to uninitialised >> byte(s) >> ==1286936== at 0x658A48D: __writev (writev.c:26) >> ==1286936== by 0x658A48D: writev (writev.c:24) >> ==1286936== by 0x8DF9B4C: ??? >> ==1286936== by 0x7CC413E: ??? (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286936== by 0x7CC487E: event_base_loop (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286936== by 0x8DBDD55: ??? >> ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286936== by 0x6595102: clone (clone.S:95) >> ==1286936== Address 0xa290cbf is 127 bytes inside a block of size 5,120 >> alloc'd >> ==1286936== at 0x483DFAF: realloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8DE155A: ??? >> ==1286936== by 0x8DE3F4A: ??? >> ==1286936== by 0x8DE4900: ??? >> ==1286936== by 0x8DE4175: ??? >> ==1286936== by 0x8D7CF91: ??? >> ==1286936== by 0x7CC3FDD: ??? (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286936== by 0x7CC487E: event_base_loop (in >> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286936== by 0x8DBDD55: ??? >> ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286936== by 0x6595102: clone (clone.S:95) >> ==1286936== Uninitialised value was created by a stack allocation >> ==1286936== at 0x9F048D6: ??? >> ==1286936== >> ==1286936== >> ==1286936== 6 errors in context 2 of 29: >> ==1286936== Thread 1: >> ==1286936== Syscall param pwritev(vector[...]) points to uninitialised >> byte(s) >> ==1286936== at 0x658A608: pwritev64 (pwritev64.c:30) >> ==1286936== by 0x658A608: pwritev (pwritev64.c:28) >> ==1286936== by 0x9F46E25: ??? >> ==1286936== by 0x9FCE33B: ??? >> ==1286936== by 0x9FCDDBF: ??? >> ==1286936== by 0x9FA324A: ??? 
>> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in >> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >> ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) >> ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >> ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >> ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >> ==1286936== by 0x7B5ED15: H5C__collective_write (H5Cmpio.c:1020) >> ==1286936== by 0x7B5ED15: H5C_apply_candidate_list (H5Cmpio.c:394) >> ==1286936== Address 0xedf91b0 is 96 bytes inside a block of size 216 >> alloc'd >> ==1286936== at 0x483B7F3: malloc (in >> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:292) >> ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:267) >> ==1286936== by 0x77FC8FF: H5C__flush_single_entry (H5C.c:6045) >> ==1286936== by 0x7B5DC7E: H5C__flush_candidates_in_ring >> (H5Cmpio.c:1371) >> ==1286936== by 0x7B5DC7E: H5C__flush_candidate_entries (H5Cmpio.c:1192) >> ==1286936== by 0x7B5DC7E: H5C_apply_candidate_list (H5Cmpio.c:385) >> ==1286936== by 0x7B5BA18: H5AC__rsp__dist_md_write__flush >> (H5ACmpio.c:1709) >> ==1286936== by 0x7B5BA18: H5AC__run_sync_point (H5ACmpio.c:2164) >> ==1286936== by 0x7B5C9D2: H5AC__flush_entries (H5ACmpio.c:2307) >> ==1286936== by 0x77C95E4: H5AC_flush (H5AC.c:681) >> ==1286936== by 0x78B306A: H5F__flush_phase2 (H5Fint.c:1831) >> ==1286936== by 0x78B5D7A: H5F__dest (H5Fint.c:1152) >> ==1286936== by 0x78B6603: H5F_try_close (H5Fint.c:2180) >> ==1286936== by 0x78B69F5: H5F__close_cb (H5Fint.c:2009) >> ==1286936== by 0x7965797: H5I_dec_ref (H5I.c:1254) >> ==1286936== Uninitialised value was created by a stack allocation >> ==1286936== at 0x7695AF0: ??? (in >> /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so) >> ==1286936== >> ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from >> 0) >> >> On Mon, Aug 24, 2020 at 5:00 PM Jed Brown wrote: >> >>> Do you potentially have a memory or other resource leak? SIGBUS would >>> be an odd result, but the symptom of crashing after running for a long time >>> sometimes fits with a resource leak. >>> >>> Mark Lohry writes: >>> >>> > I queued up some jobs with Barry's patch, so we'll see. >>> > >>> > Re Jed's suggestion at checkpointing, I don't *think* this is something >>> > coming from the state of the solution -- running from the same point >>> I'm >>> > seeing it crash anywhere between 1 hour and 20 hours in. I'll increase >>> my >>> > file save frequency in case I'm wrong there though. >>> > >>> > My intel build with different blas just made it through a 6 hour time >>> slot >>> > without crash, whereas yesterday the same thing crashed after 3 hours. >>> But >>> > given the randomness so far I'd bet that's just dumb luck. >>> > >>> > On Mon, Aug 24, 2020 at 4:22 PM Barry Smith wrote: >>> > >>> >> >>> >> >>> >> > On Aug 24, 2020, at 2:34 PM, Jed Brown wrote: >>> >> > >>> >> > I'm thinking of something such as writing floating point data into >>> the >>> >> return address, which would be unaligned/garbage. >>> >> >>> >> Ok, my patch will detect this. This is what I was talking about, >>> messing >>> >> up the BLAS arguments which are the addresses of arrays. >>> >> >>> >> Valgrind is by far the preferred approach. 
>>> >> >>> >> Barry >>> >> >>> >> Another feature we could add to the malloc checking is when a SEGV >>> or >>> >> BUS error is encountered and we catch it we should run the >>> >> PetscMallocVerify() and check our memory for corruption reporting any >>> we >>> >> find. >>> >> >>> >> >>> >> >>> >> > >>> >> > Reproducing under Valgrind would help a lot. Perhaps it's possible >>> to >>> >> checkpoint such that the breakage can be reproduced more quickly? >>> >> > >>> >> > Barry Smith writes: >>> >> > >>> >> >> https://en.wikipedia.org/wiki/Bus_error < >>> >> https://en.wikipedia.org/wiki/Bus_error> >>> >> >> >>> >> >> But perhaps not true for Intel? >>> >> >> >>> >> >> >>> >> >> >>> >> >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley >>> >> wrote: >>> >> >>> >>> >> >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith >> >> >> bsmith at petsc.dev>> wrote: >>> >> >>> >>> >> >>> >>> >> >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown >> >> >> jed at jedbrown.org>> wrote: >>> >> >>>> >>> >> >>>> Barry Smith > writes: >>> >> >>>> >>> >> >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown >> >> >> jed at jedbrown.org>> wrote: >>> >> >>>>>> >>> >> >>>>>> Barry Smith > >>> writes: >>> >> >>>>>> >>> >> >>>>>>> So if a BLAS errors with SIGBUS then it is always an input >>> error >>> >> of just not proper double/complex alignment? Or some other very >>> strange >>> >> thing? >>> >> >>>>>> >>> >> >>>>>> I would suspect memory corruption. >>> >> >>>>> >>> >> >>>>> >>> >> >>>>> Corruption meaning what specifically? >>> >> >>>>> >>> >> >>>>> The routines crashing are dgemv which only take double precision >>> >> arrays, regardless of what garbage is in those arrays i don't think >>> there >>> >> can be BUS errors resulting. They don't take integer arrays whose >>> >> corruption could result in bad indexing and then BUS errors. >>> >> >>>>> >>> >> >>>>> So then it can only be corruption of the pointers passed in, >>> correct? >>> >> >>>> >>> >> >>>> Such as those pointers pointing into data on the stack with >>> incorrect >>> >> sizes. >>> >> >>> >>> >> >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? >>> >> >>> >>> >> >>> My understanding was that roughly memory errors in the heap are >>> SEGV >>> >> and memory errors on the stack are SIGBUS. Is that not true? >>> >> >>> >>> >> >>> Matt >>> >> >>> >>> >> >>> -- >>> >> >>> What most experimenters take for granted before they begin their >>> >> experiments is infinitely more interesting than any results to which >>> their >>> >> experiments lead. >>> >> >>> -- Norbert Wiener >>> >> >>> >>> >> >>> https://www.cse.buffalo.edu/~knepley/ < >>> >> http://www.cse.buffalo.edu/~knepley/> >>> >> >>> >> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
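As a rough illustration of the idea discussed above -- trapping SIGBUS/SIGSEGV and running a heap consistency check before giving up -- a minimal sketch in plain C might look like the following. This is a generic POSIX pattern, not PETSc's actual error-handler code: validate_heap() is a placeholder standing in for whatever allocator check is available (the malloc-tracking verification Barry mentions above, in the PETSc case), and only sigaction(), write(), and abort() below are standard calls. Walking the heap from inside a signal handler is not async-signal-safe, so this can only ever be a best-effort diagnostic of last resort before aborting so that a core file, debugger, or valgrind run can take over.

    #include <signal.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Placeholder: stands in for an allocator consistency check such as the
     * malloc-tracking verification discussed above.  Here it does nothing and
     * reports "no corruption found". */
    static int validate_heap(void)
    {
        return 0; /* 0 = no corruption detected */
    }

    /* Handler for SIGBUS/SIGSEGV: report which signal fired using only
     * async-signal-safe write(), attempt a best-effort heap check, then
     * abort() so a core dump or attached debugger can take over. */
    static void crash_handler(int sig)
    {
        static const char msg_bus[]  = "Caught SIGBUS: running heap check before aborting\n";
        static const char msg_segv[] = "Caught SIGSEGV: running heap check before aborting\n";
        static const char msg_bad[]  = "Heap check reported corruption\n";

        if (sig == SIGBUS) (void)write(STDERR_FILENO, msg_bus,  sizeof(msg_bus)  - 1);
        else               (void)write(STDERR_FILENO, msg_segv, sizeof(msg_segv) - 1);

        if (validate_heap() != 0)
            (void)write(STDERR_FILENO, msg_bad, sizeof(msg_bad) - 1);

        abort();
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = crash_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGBUS,  &sa, NULL);
        sigaction(SIGSEGV, &sa, NULL);

        /* ... application / solver code would run here ... */
        return 0;
    }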
URL: From mlohry at gmail.com Thu Aug 27 08:57:44 2020 From: mlohry at gmail.com (Mark Lohry) Date: Thu, 27 Aug 2020 09:57:44 -0400 Subject: [petsc-users] Bus Error In-Reply-To: References: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> <87a6yjg3o5.fsf@jedbrown.org> <9EEB2628-D6ED-4466-A629-33EAC73BCE4C@petsc.dev> Message-ID: nevermind, i'm incompetent and only copied the .so (symbolic link) and not the actual new library i compiled with that... On Thu, Aug 27, 2020 at 9:53 AM Mark Lohry wrote: > It was built with --with-debugging=1 > > On Thu, Aug 27, 2020 at 9:44 AM Barry Smith wrote: > >> >> Mark, >> >> Did i tell you that this has to be built with the configure option >> --with-debugging=1 and won't be turned off with --with-debugging=0 ? >> >> Barry >> >> >> On Aug 27, 2020, at 8:10 AM, Mark Lohry wrote: >> >> Barry, no output from that patch i'm afraid: >> >> 54 KSP Residual norm 3.215013886664e+03 >> 55 KSP Residual norm 3.049105434513e+03 >> 56 KSP Residual norm 2.859123916860e+03 >> [929]PETSC ERROR: >> ------------------------------------------------------------------------ >> [929]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal >> memory access >> [929]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> [929]PETSC ERROR: or see >> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> [929]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac >> OS X to find memory corruption errors >> [929]PETSC ERROR: likely location of problem given in stack below >> [929]PETSC ERROR: --------------------- Stack Frames >> ------------------------------------ >> [929]PETSC ERROR: Note: The EXACT line numbers in the stack are not >> available, >> [929]PETSC ERROR: INSTEAD the line number of the start of the >> function >> [929]PETSC ERROR: is given. 
>> [929]PETSC ERROR: [929] BLASgemv line 1406 >> /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c >> [929]PETSC ERROR: [929] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 >> /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c >> [929]PETSC ERROR: [929] MatSolve line 3354 >> /home/mlohry/petsc/src/mat/interface/matrix.c >> [929]PETSC ERROR: [929] PCApply_ILU line 201 >> /home/mlohry/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >> [929]PETSC ERROR: [929] PCApply line 426 >> /home/mlohry/petsc/src/ksp/pc/interface/precon.c >> [929]PETSC ERROR: [929] KSP_PCApply line 279 >> /home/mlohry/petsc/include/petsc/private/kspimpl.h >> [929]PETSC ERROR: [929] KSPSolve_PREONLY line 16 >> /home/mlohry/petsc/src/ksp/ksp/impls/preonly/preonly.c >> [929]PETSC ERROR: [929] KSPSolve_Private line 590 >> /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >> [929]PETSC ERROR: [929] KSPSolve line 848 >> /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >> [929]PETSC ERROR: [929] PCApply_ASM line 441 >> /home/mlohry/petsc/src/ksp/pc/impls/asm/asm.c >> [929]PETSC ERROR: [929] PCApply line 426 >> /home/mlohry/petsc/src/ksp/pc/interface/precon.c >> [929]PETSC ERROR: [929] KSP_PCApply line 279 >> /home/mlohry/petsc/include/petsc/private/kspimpl.h >> srun: Job step aborted: Waiting up to 47 seconds for job step to finish. >> [929]PETSC ERROR: [929] KSPFGMRESCycle line 108 >> /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >> [929]PETSC ERROR: [929] KSPSolve_FGMRES line 274 >> /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >> [929]PETSC ERROR: [929] KSPSolve_Private line 590 >> /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >> >> On Mon, Aug 24, 2020 at 6:47 PM Mark Lohry wrote: >> >>> I don't think I do. Running a much smaller case with the same models I >>> get the attached report from valgrind --show-leak-kinds=all >>> --leak-check=full --track-origins=yes. I only see some HDF5 stuff and >>> OpenMPI that I think are false positives. >>> >>> ==1286950== Memcheck, a memory error detector >>> ==1286950== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et >>> al. >>> ==1286950== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun >>> with -h for copyright info >>> ==1286950== Command: ./verification_testing >>> --gtest_filter=DrivenCavity3D.Re100_BackwardEulerILU1_16x16N2_Quadrature1 >>> --petsc_time_integrator=arkimex --petsc_arkimex_type=l2 >>> ==1286950== Parent PID: 1286932 >>> ==1286950== >>> --1286950-- >>> --1286950-- Valgrind options: >>> --1286950-- --show-leak-kinds=all >>> --1286950-- --leak-check=full >>> --1286950-- --track-origins=yes >>> --1286950-- --log-file=valgrind-out.txt >>> --1286950-- -v >>> --1286950-- Contents of /proc/version: >>> --1286950-- Linux version 5.4.0-29-generic (buildd at lgw01-amd64-035) >>> (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #33-Ubuntu SMP Wed Apr 29 >>> 14:32:27 UTC 2020 >>> --1286950-- >>> --1286950-- Arch and hwcaps: AMD64, LittleEndian, >>> amd64-cx16-rdtscp-sse3-ssse3-avx >>> --1286950-- Page sizes: currently 4096, max supported 4096 >>> --1286950-- Valgrind library directory: >>> /usr/lib/x86_64-linux-gnu/valgrind >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/verification_testing >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ld-2.31.so >>> --1286950-- Considering /usr/lib/x86_64-linux-gnu/ld-2.31.so .. >>> --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) >>> --1286950-- Considering /lib/x86_64-linux-gnu/ld-2.31.so .. >>> --1286950-- .. 
CRC mismatch (computed 387b17ea wanted d28cf5ef) >>> --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.31.so >>> .. >>> --1286950-- .. CRC is valid >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux >>> --1286950-- object doesn't have a symbol table >>> --1286950-- object doesn't have a dynamic symbol table >>> --1286950-- Scheduler: using generic scheduler lock implementation. >>> --1286950-- Reading suppressions file: >>> /usr/lib/x86_64-linux-gnu/valgrind/default.supp >>> ==1286950== embedded gdbserver: reading from >>> /tmp/vgdb-pipe-from-vgdb-to-1286950-by-mlohry-on-??? >>> ==1286950== embedded gdbserver: writing to >>> /tmp/vgdb-pipe-to-vgdb-from-1286950-by-mlohry-on-??? >>> ==1286950== embedded gdbserver: shared mem >>> /tmp/vgdb-pipe-shared-mem-vgdb-1286950-by-mlohry-on-??? >>> ==1286950== >>> ==1286950== TO CONTROL THIS PROCESS USING vgdb (which you probably >>> ==1286950== don't want to do, unless you know exactly what you're doing, >>> ==1286950== or are doing some strange experiment): >>> ==1286950== /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb >>> --pid=1286950 ...command... >>> ==1286950== >>> ==1286950== TO DEBUG THIS PROCESS USING GDB: start GDB like this >>> ==1286950== /path/to/gdb ./verification_testing >>> ==1286950== and then give GDB the following command >>> ==1286950== target remote | >>> /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 >>> ==1286950== --pid is optional if only one valgrind process is running >>> ==1286950== >>> --1286950-- REDIR: 0x4022d80 (ld-linux-x86-64.so.2:strlen) redirected to >>> 0x580c9ce2 (???) >>> --1286950-- REDIR: 0x4022b50 (ld-linux-x86-64.so.2:index) redirected to >>> 0x580c9cfc (???) >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so >>> --1286950-- object doesn't have a symbol table >>> ==1286950== WARNING: new redirection conflicts with existing -- ignoring >>> it >>> --1286950-- old: 0x04022d80 (strlen ) R-> (0000.0) >>> 0x580c9ce2 ??? 
>>> --1286950-- new: 0x04022d80 (strlen ) R-> (2007.0) >>> 0x0483f060 strlen >>> --1286950-- REDIR: 0x401f560 (ld-linux-x86-64.so.2:strcmp) redirected to >>> 0x483ffd0 (strcmp) >>> --1286950-- REDIR: 0x40232e0 (ld-linux-x86-64.so.2:mempcpy) redirected >>> to 0x4843a20 (mempcpy) >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/initialization/libinitialization.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/governing_equations/libgoverning_equations.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/time_stepping/libtime_stepping.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/governing_equations/libboundary_conditions.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/governing_equations/libsolution_monitors.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/governing_equations/libfluxtypes.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/algebraic_solvers/libalgebraic_solvers.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/program_options/libprogram_options.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_filesystem.so.1.73.0 >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0 >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so.40.20.1 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ >>> libpthread-2.31.so >>> --1286950-- Considering >>> /usr/lib/debug/.build-id/77/5cbbfff814456660786780b0b3b40096b4c05e.debug .. >>> --1286950-- .. build-id is valid >>> --1286948-- Reading syms from >>> /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libpetsc.so.3.13.3 >>> --1286937-- Reading syms from >>> /home/mlohry/dev/cmake-build/parallel/libparallel.so >>> --1286937-- Reading syms from >>> /home/mlohry/dev/cmake-build/logger/liblogger.so >>> --1286937-- Reading syms from >>> /home/mlohry/dev/cmake-build/spatial_discretization/libdiscretization.so >>> --1286945-- Reading syms from >>> /home/mlohry/dev/cmake-build/utils/libutils.so >>> --1286944-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 >>> --1286938-- object doesn't have a symbol table >>> --1286949-- Reading syms from /usr/lib/x86_64-linux-gnu/libm-2.31.so >>> --1286949-- Considering /usr/lib/x86_64-linux-gnu/libm-2.31.so .. >>> --1286947-- .. CRC mismatch (computed 327d785f wanted 751f5509) >>> --1286947-- Considering /lib/x86_64-linux-gnu/libm-2.31.so .. >>> --1286938-- .. CRC mismatch (computed 327d785f wanted 751f5509) >>> --1286937-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>> libm-2.31.so .. >>> --1286950-- .. CRC is valid >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libc-2.31.so >>> --1286950-- Considering /usr/lib/x86_64-linux-gnu/libc-2.31.so .. >>> --1286951-- .. CRC mismatch (computed a6f43087 wanted 6555436e) >>> --1286951-- Considering /lib/x86_64-linux-gnu/libc-2.31.so .. >>> --1286947-- .. CRC mismatch (computed a6f43087 wanted 6555436e) >>> --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>> libc-2.31.so .. >>> --1286950-- .. 
CRC is valid >>> --1286940-- Reading syms from >>> /home/mlohry/dev/cmake-build/file_io/libfileio.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_program_options.so.1.73.0 >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_serialization.so.1.73.0 >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libhwloc.so.15.1.0 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libsuperlu_dist.so.6.3.0 >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 >>> --1286950-- object doesn't have a symbol table >>> --1286937-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 >>> --1286937-- object doesn't have a symbol table >>> --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0 >>> --1286939-- object doesn't have a symbol table >>> --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libdl-2.31.so >>> --1286947-- Considering /usr/lib/x86_64-linux-gnu/libdl-2.31.so .. >>> --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) >>> --1286947-- Considering /lib/x86_64-linux-gnu/libdl-2.31.so .. >>> --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) >>> --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>> libdl-2.31.so .. >>> --1286947-- .. CRC is valid >>> --1286937-- Reading syms from >>> /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libmetis.so >>> --1286937-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0 >>> --1286942-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log_setup.so.1.73.0 >>> --1286942-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0 >>> --1286942-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_regex.so.1.73.0 >>> --1286949-- Reading syms from >>> /home/mlohry/dev/cmake-build/basis_functions/libbasis_functions.so >>> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 >>> --1286944-- object doesn't have a symbol table >>> --1286951-- Reading syms from >>> /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so >>> --1286951-- object doesn't have a symbol table >>> --1286943-- Reading syms from >>> /home/mlohry/dev/cmake-build/external_install/lib/libhdf5.so.103.1.0 >>> --1286951-- Reading syms from >>> /home/mlohry/dev/cmake-build/external/tinyxml2-build/libtinyxml2.so.6.1.0 >>> --1286944-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_iostreams.so.1.73.0 >>> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libz.so.1.2.11 >>> --1286944-- object doesn't have a symbol table >>> --1286951-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0 >>> --1286951-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libutil-2.31.so >>> --1286946-- Considering /usr/lib/x86_64-linux-gnu/libutil-2.31.so .. >>> --1286946-- .. 
CRC mismatch (computed 4639aba5 wanted ceb246b4) >>> --1286946-- Considering /lib/x86_64-linux-gnu/libutil-2.31.so .. >>> --1286946-- .. CRC mismatch (computed 4639aba5 wanted ceb246b4) >>> --1286948-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>> libutil-2.31.so .. >>> --1286939-- .. CRC is valid >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libudev.so.1.6.17 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.1 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/librt-2.31.so >>> --1286950-- Considering /usr/lib/x86_64-linux-gnu/librt-2.31.so .. >>> --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) >>> --1286950-- Considering /lib/x86_64-linux-gnu/librt-2.31.so .. >>> --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) >>> --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>> librt-2.31.so .. >>> --1286950-- .. CRC is valid >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libquadmath.so.0.0.0 >>> --1286950-- object doesn't have a symbol table >>> --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 >>> --1286945-- Considering /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 .. >>> --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) >>> --1286945-- Considering /lib/x86_64-linux-gnu/libXau.so.6.0.0 .. >>> --1286945-- .. 
CRC mismatch (computed 7de9b6ad wanted e8a17129) >>> --1286945-- object doesn't have a symbol table >>> --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0 >>> --1286942-- object doesn't have a symbol table >>> --1286942-- Reading syms from /usr/lib/x86_64-linux-gnu/libbsd.so.0.10.0 >>> --1286942-- object doesn't have a symbol table >>> --1286950-- REDIR: 0x6516600 (libc.so.6:memmove) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515900 (libc.so.6:strncpy) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516930 (libc.so.6:strcasecmp) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515220 (libc.so.6:strcat) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515960 (libc.so.6:rindex) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6517dd0 (libc.so.6:rawmemchr) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6532e60 (libc.so.6:wmemchr) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65329a0 (libc.so.6:wcscmp) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516760 (libc.so.6:mempcpy) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516590 (libc.so.6:bcmp) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515890 (libc.so.6:strncmp) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65152d0 (libc.so.6:strcmp) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65166c0 (libc.so.6:memset) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6532960 (libc.so.6:wcschr) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65157f0 (libc.so.6:strnlen) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65153b0 (libc.so.6:strcspn) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516980 (libc.so.6:strncasecmp) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515350 (libc.so.6:strcpy) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516ad0 (libc.so.6:memcpy@@GLIBC_2.14) redirected >>> to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65340d0 (libc.so.6:wcsnlen) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65329e0 (libc.so.6:wcscpy) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65159a0 (libc.so.6:strpbrk) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515280 (libc.so.6:index) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65157b0 (libc.so.6:strlen) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x651ed20 (libc.so.6:memrchr) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65169d0 (libc.so.6:strcasecmp_l) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516550 (libc.so.6:memchr) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6532ab0 (libc.so.6:wcslen) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515c60 (libc.so.6:strspn) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65168d0 (libc.so.6:stpncpy) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516870 (libc.so.6:stpcpy) redirected to 0x48331d0 >>> 
(_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6517e10 (libc.so.6:strchrnul) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516a20 (libc.so.6:strncasecmp_l) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516470 (libc.so.6:strstr) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65a3750 (libc.so.6:__memcpy_chk) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286938-- REDIR: 0x6527a30 (libc.so.6:__strrchr_sse2) redirected to >>> 0x483ea70 (__strrchr_sse2) >>> --1286938-- REDIR: 0x6511c90 (libc.so.6:calloc) redirected to 0x483dce0 >>> (calloc) >>> --1286938-- REDIR: 0x6510260 (libc.so.6:malloc) redirected to 0x483b780 >>> (malloc) >>> --1286938-- REDIR: 0x6531c40 (libc.so.6:memcpy at GLIBC_2.2.5) redirected >>> to 0x4840100 (memcpy at GLIBC_2.2.5) >>> --1286938-- REDIR: 0x6527d30 (libc.so.6:__strlen_sse2) redirected to >>> 0x483efa0 (__strlen_sse2) >>> --1286938-- REDIR: 0x65f4ac0 (libc.so.6:__strncmp_sse42) redirected to >>> 0x483f7c0 (__strncmp_sse42) >>> --1286938-- REDIR: 0x6510850 (libc.so.6:free) redirected to 0x483c9d0 >>> (free) >>> --1286938-- REDIR: 0x6532070 (libc.so.6:__memset_sse2_unaligned) >>> redirected to 0x48428e0 (memset) >>> --1286938-- REDIR: 0x6603350 (libc.so.6:__memcmp_sse4_1) redirected to >>> 0x4842150 (__memcmp_sse4_1) >>> --1286938-- REDIR: 0x6520520 (libc.so.6:__strcmp_sse2_unaligned) >>> redirected to 0x483fed0 (strcmp) >>> --1286938-- REDIR: 0x61d0c10 (libstdc++.so.6:operator new(unsigned >>> long)) redirected to 0x483bdf0 (operator new(unsigned long)) >>> --1286938-- REDIR: 0x61cee60 (libstdc++.so.6:operator delete(void*)) >>> redirected to 0x483cf50 (operator delete(void*)) >>> --1286938-- REDIR: 0x61d0c70 (libstdc++.so.6:operator new[](unsigned >>> long)) redirected to 0x483c510 (operator new[](unsigned long)) >>> --1286938-- REDIR: 0x61cee90 (libstdc++.so.6:operator delete[](void*)) >>> redirected to 0x483d6e0 (operator delete[](void*)) >>> --1286938-- REDIR: 0x65275f0 (libc.so.6:__strchr_sse2) redirected to >>> 0x483eb90 (__strchr_sse2) >>> --1286950-- REDIR: 0x6511000 (libc.so.6:realloc) redirected to 0x483df30 >>> (realloc) >>> --1286950-- REDIR: 0x6527820 (libc.so.6:__strchrnul_sse2) redirected to >>> 0x4843540 (strchrnul) >>> --1286950-- REDIR: 0x6531560 (libc.so.6:__strstr_sse2_unaligned) >>> redirected to 0x4843c20 (strstr) >>> --1286950-- REDIR: 0x6531c20 (libc.so.6:__mempcpy_sse2_unaligned) >>> redirected to 0x4843660 (mempcpy) >>> --1286950-- REDIR: 0x652d2a0 (libc.so.6:__strncpy_sse2_unaligned) >>> redirected to 0x483f560 (__strncpy_sse2_unaligned) >>> --1286950-- REDIR: 0x6515830 (libc.so.6:strncat) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65305b0 (libc.so.6:__strncat_sse2_unaligned) >>> redirected to 0x483ede0 (strncat) >>> --1286950-- REDIR: 0x6516120 (libc.so.6:__GI_strstr) redirected to >>> 0x4843ca0 (__strstr_sse2) >>> --1286950-- REDIR: 0x6522360 (libc.so.6:__rawmemchr_sse2) redirected to >>> 0x4843580 (rawmemchr) >>> --1286950-- REDIR: 0x65faea0 (libc.so.6:__strcasecmp_avx) redirected to >>> 0x483f830 (strcasecmp) >>> --1286950-- REDIR: 0x65fc520 (libc.so.6:__strncasecmp_avx) redirected to >>> 0x483f910 (strncasecmp) >>> --1286950-- REDIR: 0x65f98a0 (libc.so.6:__strspn_sse42) redirected to >>> 0x4843ef0 (strspn) >>> --1286950-- REDIR: 0x65f9620 (libc.so.6:__strcspn_sse42) redirected to >>> 0x4843e10 (strcspn) >>> --1286948-- REDIR: 0x6522030 (libc.so.6:__memchr_sse2) redirected to >>> 0x4840050 (memchr) >>> 
--1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Discarding syms at 0x4a96240-0x4a96d47 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so >>> (have_dinfo 1) >>> --1286948-- Discarding syms at 0x4a9b1c0-0x4a9b937 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so >>> (have_dinfo 1) >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Discarding syms at 0x4a96120-0x4a966b0 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so >>> (have_dinfo 1) >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- REDIR: 0x64bc670 (libc.so.6:setenv) redirected to 0x4844480 >>> (setenv) >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Discarding syms at 0x8d053e0-0x8d07391 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so (have_dinfo >>> 1) >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so >>> --1286948-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so >>> --1286950-- object doesn't have a symbol table 
>>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Discarding syms at 0x8d04180-0x8d045b0 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so (have_dinfo 1) >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so >>> --1286950-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9ebf0a0-0x9ebf490 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so >>> (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9eca300-0x9ecbee8 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so >>> (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9ed1220-0x9ed24e7 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so (have_dinfo >>> 1) >>> --1286946-- Discarding syms at 0x9ed8240-0x9ed8c88 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so >>> (have_dinfo 1) >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so >>> --1286946-- object 
doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9ebf0e0-0x9ebf417 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so >>> (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9ecf320-0x9ed1239 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so >>> (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9ed73a0-0x9ed9ccc in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so >>> (have_dinfo 1) >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so >>> --1286936-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- REDIR: 0x652cc70 (libc.so.6:__strcpy_sse2_unaligned) >>> redirected to 0x483f090 (strcpy) >>> --1286946-- REDIR: 0x65a3810 (libc.so.6:__memmove_chk) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> ==1286946== WARNING: new redirection conflicts with existing -- ignoring >>> it >>> --1286946-- old: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2030.0) >>> 0x04843b10 __memcpy_chk >>> --1286946-- new: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2024.0) >>> 0x048434d0 __memmove_chk >>> --1286946-- REDIR: 0x6531c30 (libc.so.6:__memcpy_chk_sse2_unaligned) >>> redirected to 0x4843b10 (__memcpy_chk) >>> --1286946-- REDIR: 0x65129b0 (libc.so.6:posix_memalign) redirected to >>> 0x483e1e0 (posix_memalign) >>> --1286946-- Discarding syms at 
0x9f15280-0x9f32932 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so >>> (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f7c4c0-0x9f7ded8 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 >>> (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f620c0-0x9f71483 in >>> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f9ba10-0x9fd22ee in >>> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_monitoring.so.50.10.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9f4d400-0x9f50c19 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so >>> (have_dinfo 1) >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/libpsm1/libpsm_infinipath.so.1.16 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- REDIR: 0x6517140 (libc.so.6:strcasestr) redirected to >>> 0x4843f80 (strcasestr) >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9f4d5c0-0x9f4f5a1 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9fee680-0x9ff096c in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so (have_dinfo >>> 1) >>> --1286946-- Reading syms from >>> 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_monitoring.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9f724a0-0x9f787b5 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa827f80-0xa8e14c4 in >>> /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f94830-0x9fbafce in >>> /usr/lib/libpsm1/libpsm_infinipath.so.1.16 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9fe5580-0x9fe8f71 in >>> /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f56420-0x9f5cec0 in >>> /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa929f10-0xa93d5fc in >>> /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa94b0c0-0xa95a483 in >>> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa968860-0xa9adf12 in >>> /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa9e7a10-0xaa1e2ee in >>> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f80410-0x9f84e27 in >>> /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f103e0-0x9f15fd5 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f471e0-0x9f47ce0 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so >>> (have_dinfo 1) >>> ==1286946== Thread 3: >>> ==1286946== Syscall param 
writev(vector[...]) points to uninitialised >>> byte(s) >>> ==1286946== at 0x658A48D: __writev (writev.c:26) >>> ==1286946== by 0x658A48D: writev (writev.c:24) >>> ==1286946== by 0x8DF9B4C: pmix_ptl_base_send_handler (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x7CC413E: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286946== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286946== by 0x8DBDD55: ??? (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286946== by 0x6595102: clone (clone.S:95) >>> ==1286946== Address 0xa28fdcf is 127 bytes inside a block of size 5,120 >>> alloc'd >>> ==1286946== at 0x483DFAF: realloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286946== by 0x8DE155A: pmix_bfrop_buffer_extend (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x8DE3F4A: pmix_bfrops_base_pack_byte (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x8DE4900: pmix_bfrops_base_pack_buf (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x8DE4175: pmix_bfrops_base_pack (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x8D7CF91: ??? (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x7CC3FDD: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286946== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286946== by 0x8DBDD55: ??? (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286946== by 0x6595102: clone (clone.S:95) >>> ==1286946== Uninitialised value was created by a stack allocation >>> ==1286946== at 0x9F048D6: ??? 
(in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so) >>> ==1286946== >>> --1286944-- Discarding syms at 0xaa4d220-0xaa5796a in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at >>> 0xaa4d220---1286948-- Discarding syms at 0xaae1100-0xaae7d70 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at >>> 0xaae1100-0xaae7d70 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so >>> (have_dinfo 1) >>> --1286945-- Discarding syms at 0x9f69420-0x9f--1286938-- REDIR: >>> 0x61cee70 (libstdc++.so.6:operator delete(void*, unsigned long)) redirected >>> to --1286937-- REDIR: 0x61cee70 (libstdc++.so.6:opera--1286946-- REDIR: >>> 0x652e970 (libc.so.6:__stpncpy_sse2_unaligned) redirected to 0x48427e0 >>> (stpncpy) >>> --1286942-- REDIR: 0x6527ed0 (libc.so.6:__strnlen_sse2) redirected to >>> 0x483eee0 (strnlen) >>> --1286944-- REDIR: 0x652fcc0 (libc.so.6:__strcat_sse2_unaligned) >>> redirected to 0x483ec20 (strcat) >>> --1286951-- REDIR: 0x65113d0 (libc.so.6:memalign) redirected to >>> 0x483e2a0 (memalign) >>> --1286951-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so >>> --1286951-- object doesn't have a symbol table >>> --1286951-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so >>> --1286951-- object doesn't have a symbol table >>> --1286941-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 >>> --1286941-- object doesn't have a symbol table >>> --1286951-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so >>> --1286951-- object doesn't have a symbol table >>> --1286939-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so >>> --1286939-- object doesn't have a symbol table >>> --1286943-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so >>> --1286943-- object doesn't have a symbol table >>> --1286943-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so >>> --1286943-- object doesn't have a symbol table >>> --1286943-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so >>> --1286943-- object doesn't have a symbol table >>> --1286938-- REDIR: 0x65a3b00 (libc.so.6:__strcpy_chk) redirected to >>> 0x48435c0 (__strcpy_chk) >>> --1286939-- Discarding syms at 0x9f1d660-0x9f371d6 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f5afa0-0x9f8f8b6 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so >>> (have_dinfo 1) >>> --1286939-- 
Discarding syms at 0x9fa0640-0x9fa42d9 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0x9f4c160-0x9f4dc58 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa7fc270-0xa804f00 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fee3a0-0x9ff134e in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa80a240-0xa80aa8d in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa80f0e0-0xa80f8bb in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0xaa460c0-0xaa47947 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0xaa613e0-0xaa7730f in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0xaa849c0-0xaa8a845 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ee1320-0x9ee3567 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9eebc40-0x9ef4ad7 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f02600-0x9f08cd8 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0x9f40200-0x9f4126e in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0x9eda4e0-0x9edb4c5 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ed32c0-0x9ed4afe in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ebf160-0x9ebfe95 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ece140-0x9ecebed in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ec92a0-0x9ec9aa2 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8eae0e0-0x8eae4a7 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8eb3220-0x8eb3c27 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8eb80e0-0x8eb90b7 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8ea6380-0x8ea97b3 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e5a740-0x8e5f859 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e67be0-0x8e743f0 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x84da200-0x84daa5d in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so (have_dinfo 
1) >>> --1286939-- Discarding syms at 0x8d322b0-0x8d34bfc in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e29480-0x8e3b70a in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d3c2b0-0x8d3ed5c in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e45340-0x8e502da in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e901a0-0x8e908a7 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d05520-0x8d06783 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e7b460-0x8e8aaa4 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d44520-0x8d4556a in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e97600-0x8ea0fa1 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d109c0-0x8d27dcf in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d5b280-0x8dfdffb in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ec40a0-0x9ec4490 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0x84d2580-0x84d518f in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4a96120-0x4a9644f in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4aa0100-0x4aa03e7 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x84c74a0-0x84c901f in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4aa5260-0x4aa58e9 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4a9b420-0x4a9bcdf in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x84e7460-0x84f52ca in >>> /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4a90360-0x4a91107 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f46220-0x9f474cc in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f0f180-0x9f0f78d in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xaa94540-0xaa96a4a in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xaa9f6c0-0xaab44d0 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0xaabe820-0xaad8ee0 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 
0x9efc080-0x9efc1e1 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fab2a0-0x9fb1341 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f140c0-0x9f14299 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fb72a0-0x9fbb791 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fd52a0-0x9fda794 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fe02e0-0x9fe59a5 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa815460-0xa8177ab in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa81e260-0xa82033d in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa8273e0-0xa8297d8 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fc85e0-0x9fce8ef in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 >>> (have_dinfo 1) >>> ==1286939== >>> ==1286939== HEAP SUMMARY: >>> ==1286939== in use at exit: 74,054 bytes in 223 blocks >>> ==1286939== total heap usage: 22,405,782 allocs, 22,405,559 frees, >>> 34,062,479,959 bytes allocated >>> ==1286939== >>> ==1286939== Searching for pointers to 223 not-freed blocks >>> ==1286939== Checked 3,415,912 bytes >>> ==1286939== >>> ==1286939== Thread 1: >>> ==1286939== 1 bytes in 1 blocks are definitely lost in loss record 1 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x9F6A4B6: ??? >>> ==1286939== by 0x9F47373: ??? >>> ==1286939== by 0x68E3B9B: mca_base_framework_components_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4BA1734: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 8 bytes in 1 blocks are still reachable in loss record 2 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x764724C: ??? (in >>> /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >>> ==1286939== by 0x7657B9A: ??? (in >>> /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >>> ==1286939== by 0x7645679: ??? 
(in >>> /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >>> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >>> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >>> ==1286939== by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ >>> ld-2.31.so) >>> ==1286939== by 0x3: ??? >>> ==1286939== by 0x1FFEFFF926: ??? >>> ==1286939== by 0x1FFEFFF93D: ??? >>> ==1286939== by 0x1FFEFFF987: ??? >>> ==1286939== by 0x1FFEFFF9A7: ??? >>> ==1286939== >>> ==1286939== 8 bytes in 1 blocks are definitely lost in loss record 3 of >>> 44 >>> ==1286939== at 0x483DD99: calloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F69B6F: ??? >>> ==1286939== by 0x9F1CDED: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 13 bytes in 2 blocks are still reachable in loss record 4 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x7CC3657: event_config_avoid_method (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x68FEB5A: opal_event_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68FE8CA: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== >>> ==1286939== 15 bytes in 1 blocks are indirectly lost in loss record 5 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x9EDB189: ??? >>> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6907C25: ??? 
(in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4BA16D5: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 15 bytes in 1 blocks are definitely lost in loss record 6 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x9F5655C: ??? >>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >>> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >>> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >>> ==1286939== by 0x65D6784: _dl_catch_exception >>> (dl-error-skeleton.c:182) >>> ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) >>> ==1286939== by 0x65D6727: _dl_catch_exception >>> (dl-error-skeleton.c:208) >>> ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) >>> ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) >>> ==1286939== by 0x65D6727: _dl_catch_exception >>> (dl-error-skeleton.c:208) >>> ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) >>> ==1286939== >>> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 7 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F1CBEB: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 8 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F1CC66: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? 
>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 9 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F1CCDA: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 25 bytes in 1 blocks are still reachable in loss record 10 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x68F27BD: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B956B6: ompi_pml_v_output_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B95259: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B93FAE: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4BA1734: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== >>> ==1286939== 30 bytes in 1 blocks are definitely lost in loss record 11 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0xA9A859B: ??? 
>>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >>> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >>> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >>> ==1286939== by 0x65D6784: _dl_catch_exception >>> (dl-error-skeleton.c:182) >>> ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) >>> ==1286939== by 0x65D6727: _dl_catch_exception >>> (dl-error-skeleton.c:208) >>> ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) >>> ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) >>> ==1286939== by 0x65D6727: _dl_catch_exception >>> (dl-error-skeleton.c:208) >>> ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) >>> ==1286939== by 0x72DEB58: _dlerror_run (dlerror.c:170) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are still reachable in loss record 12 >>> of 44 >>> ==1286939== at 0x483DD99: calloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CC353E: event_get_supported_methods (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x68FEA98: opal_event_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68FE8CA: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 13 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x84D2B0A: ??? >>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 14 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x84D2BCE: ??? 
>>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 15 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x84D2CB2: ??? >>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 16 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x84D2D91: ??? >>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 17 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E81BD8: ??? >>> ==1286939== by 0x8E89F4B: ??? >>> ==1286939== by 0x8D84A0D: ??? >>> ==1286939== by 0x8DF79C1: ??? >>> ==1286939== by 0x7CC3FDD: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x8DBDD55: ??? >>> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286939== by 0x6595102: clone (clone.S:95) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 18 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== by 0x84D330E: ??? 
>>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 36 (32 direct, 4 indirect) bytes in 1 blocks are definitely >>> lost in loss record 19 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x4B94C09: mca_pml_base_pml_check_selected (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x9F1E1E1: ??? >>> ==1286939== by 0x4BA1A09: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 20 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CFF4B6: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x7CC5E26: event_global_setup_locks_ (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x68FE8E4: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== >>> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 21 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CFF4B6: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x7CCF377: evsig_global_setup_locks_ (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC5E39: event_global_setup_locks_ (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x68FE8E4: ??? 
(in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== >>> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 22 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CFF4B6: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x7CCB997: evutil_secure_rng_global_setup_locks_ (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC5E4F: event_global_setup_locks_ (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x68FE8E4: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== >>> ==1286939== 48 bytes in 1 blocks are still reachable in loss record 23 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68D7F7A: mca_base_component_find (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B8560C: mca_io_base_file_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B0E68A: ompi_file_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286939== by 
0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286939== >>> ==1286939== 48 bytes in 1 blocks are still reachable in loss record 24 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68D7F7A: mca_base_component_find (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B85638: mca_io_base_file_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B0E68A: ompi_file_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286939== >>> ==1286939== 48 bytes in 2 blocks are still reachable in loss record 25 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CC3647: event_config_avoid_method (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x68FEB5A: opal_event_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68FE8CA: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 55 (32 direct, 23 indirect) bytes in 1 blocks are definitely >>> lost in loss record 26 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? 
>>> ==1286939== by 0x4AF6CD6: ompi_comm_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA194D: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== >>> ==1286939== 56 bytes in 1 blocks are still reachable in loss record 27 >>> of 44 >>> ==1286939== at 0x483DD99: calloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CC1C86: event_config_new (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x68FEAC0: opal_event_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68FE8CA: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 28 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F6E008: ??? >>> ==1286939== by 0x9F7C654: ??? >>> ==1286939== by 0x9F1CD3E: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 29 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0xA957008: ??? >>> ==1286939== by 0xA86B017: ??? >>> ==1286939== by 0xA862FD8: ??? >>> ==1286939== by 0xA828E15: ??? >>> ==1286939== by 0xA829624: ??? >>> ==1286939== by 0x9F77910: ??? >>> ==1286939== by 0x4B85C53: ompi_mtl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x9F13E4D: ??? 
>>> ==1286939== by 0x4B94673: mca_pml_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1789: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 76 (32 direct, 44 indirect) bytes in 1 blocks are definitely >>> lost in loss record 30 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== by 0x84D387F: ??? >>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 79 (64 direct, 15 indirect) bytes in 1 blocks are definitely >>> lost in loss record 31 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9EDB12E: ??? >>> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6907C25: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4BA16D5: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 144 bytes in 3 blocks are still reachable in loss record 32 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68D7F7A: mca_base_component_find (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B8564E: mca_io_base_file_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B0E68A: ompi_file_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >>> 
/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286939== >>> ==1286939== 231 bytes in 12 blocks are definitely lost in loss record 33 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x9F2B4B3: ??? >>> ==1286939== by 0x9F2B85C: ??? >>> ==1286939== by 0x9F2BBD7: ??? >>> ==1286939== by 0x9F1CAAC: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== >>> ==1286939== 240 bytes in 5 blocks are still reachable in loss record 34 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68D7F7A: mca_base_component_find (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B85622: mca_io_base_file_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B0E68A: ompi_file_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286939== >>> ==1286939== 272 bytes in 44 blocks are definitely lost in loss record 35 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9FCAEDB: ??? >>> ==1286939== by 0x9FE42B2: ??? >>> ==1286939== by 0x9FE47BB: ??? >>> ==1286939== by 0x9FCDDBF: ??? >>> ==1286939== by 0x9FA324A: ??? 
>>> ==1286939== by 0x4B3DD7F: PMPI_File_write_at_all (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >>> ==1286939== by 0x78DF11D: H5FD_write (H5FDint.c:257) >>> ==1286939== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >>> ==1286939== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >>> ==1286939== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >>> ==1286939== >>> ==1286939== 585 (480 direct, 105 indirect) bytes in 15 blocks are >>> definitely lost in loss record 36 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== by 0x4B14036: ompi_proc_complete_init_single (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B146C3: ompi_proc_complete_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA19A9: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 776 bytes in 32 blocks are indirectly lost in loss record 37 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8DE9816: ??? >>> ==1286939== by 0x8DEB1D2: ??? >>> ==1286939== by 0x8DEB49A: ??? >>> ==1286939== by 0x8DE8B12: ??? >>> ==1286939== by 0x8E9D492: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== >>> ==1286939== 840 (480 direct, 360 indirect) bytes in 15 blocks are >>> definitely lost in loss record 38 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x9EF2F00: ??? >>> ==1286939== by 0x9EEBF17: ??? >>> ==1286939== by 0x9EE2F54: ??? >>> ==1286939== by 0x9F1E1FB: ??? >>> ==1286939== >>> ==1286939== 1,084 (480 direct, 604 indirect) bytes in 15 blocks are >>> definitely lost in loss record 39 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== by 0x84D4800: ??? 
>>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 1,344 bytes in 1 blocks are definitely lost in loss record >>> 40 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68AE702: opal_free_list_grow_st (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9F1CD2D: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record >>> 41 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68AE702: opal_free_list_grow_st (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9F1CC50: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record >>> 42 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68AE702: opal_free_list_grow_st (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9F1CCC4: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? 
>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 62,644 bytes in 31 blocks are indirectly lost in loss record >>> 43 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8DE9FA8: ??? >>> ==1286939== by 0x8DEB032: ??? >>> ==1286939== by 0x8DEB49A: ??? >>> ==1286939== by 0x8DE8B12: ??? >>> ==1286939== by 0x8E9D492: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== >>> ==1286939== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are >>> definitely lost in loss record 44 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x9F0398A: ??? >>> ==1286939== by 0x9EE2F54: ??? >>> ==1286939== by 0x9F1E1FB: ??? >>> ==1286939== by 0x4BA1A09: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== LEAK SUMMARY: >>> ==1286939== definitely lost: 9,837 bytes in 138 blocks >>> ==1286939== indirectly lost: 63,435 bytes in 64 blocks >>> ==1286939== possibly lost: 0 bytes in 0 blocks >>> ==1286939== still reachable: 782 bytes in 21 blocks >>> ==1286939== suppressed: 0 bytes in 0 blocks >>> ==1286939== >>> ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 >>> from 0) >>> ==1286939== >>> ==1286939== 1 errors in context 1 of 29: >>> ==1286939== Thread 3: >>> ==1286939== Syscall param writev(vector[...]) points to uninitialised >>> byte(s) >>> ==1286939== at 0x658A48D: __writev (writev.c:26) >>> ==1286939== by 0x658A48D: writev (writev.c:24) >>> ==1286939== by 0x8DF9B4C: ??? >>> ==1286939== by 0x7CC413E: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x8DBDD55: ??? >>> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286939== by 0x6595102: clone (clone.S:95) >>> ==1286939== Address 0xa28ee1f is 127 bytes inside a block of size 5,120 >>> alloc'd >>> ==1286939== at 0x483DFAF: realloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8DE155A: ??? >>> ==1286939== by 0x8DE3F4A: ??? >>> ==1286939== by 0x8DE4900: ??? >>> ==1286939== by 0x8DE4175: ??? >>> ==1286939== by 0x8D7CF91: ??? >>> ==1286939== by 0x7CC3FDD: ??? 
(in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x8DBDD55: ??? >>> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286939== by 0x6595102: clone (clone.S:95) >>> ==1286939== Uninitialised value was created by a stack allocation >>> ==1286939== at 0x9F048D6: ??? >>> ==1286939== >>> ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 >>> from 0) >>> mpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x4B85622: mca_io_base_file_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B0E68A: ompi_file_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B3ADB8: PMPI_File_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286936== by 0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286936== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286936== >>> ==1286936== 272 bytes in 44 blocks are definitely lost in loss record 39 >>> of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x9FCAEDB: ??? >>> ==1286936== by 0x9FE42B2: ??? >>> ==1286936== by 0x9FE47BB: ??? >>> ==1286936== by 0x9FCDDBF: ??? >>> ==1286936== by 0x9FA324A: ??? >>> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >>> ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) >>> ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >>> ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >>> ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >>> ==1286936== >>> ==1286936== 312 bytes in 1 blocks are still reachable in loss record 40 >>> of 49 >>> ==1286936== at 0x483BE63: operator new(unsigned long) (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x74E78EB: boost::detail::make_external_thread_data() >>> (in >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) >>> ==1286936== by 0x74E7C74: >>> boost::detail::add_thread_exit_function(boost::detail::thread_exit_function_base*) >>> (in >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) >>> ==1286936== by 0x73AFCEA: >>> boost::log::v2_mt_posix::sources::aux::get_severity_level() (in >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0) >>> ==1286936== by 0x5F71A6C: set_value (severity_feature.hpp:135) >>> ==1286936== by 0x5F71A6C: >>> open_record_unlocked>> const boost::log::v2_mt_posix::trivial::severity_level> > > >>> (severity_feature.hpp:252) >>> ==1286936== by 0x5F71A6C: >>> open_record>> const boost::log::v2_mt_posix::trivial::severity_level> > > >>> (basic_logger.hpp:459) >>> ==1286936== by 0x5F71A6C: >>> Logger::TraceMessage(std::__cxx11::basic_string>> std::char_traits, std::allocator >) (logger.cpp:328) >>> ==1286936== by 0x5F729C7: >>> Logger::Message(std::__cxx11::basic_string, >>> std::allocator > const&, LogLevel) (logger.cpp:280) >>> ==1286936== by 0x5F73CF1: >>> Logger::Timer::Timer(std::__cxx11::basic_string>> std::char_traits, std::allocator > const&, LogLevel) >>> (logger.cpp:426) >>> ==1286936== by 0x15718A: timer (logger.hpp:98) >>> ==1286936== by 0x15718A: main (testing_main.cpp:9) >>> ==1286936== >>> ==1286936== 
585 (480 direct, 105 indirect) bytes in 15 blocks are >>> definitely lost in loss record 41 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8E9D3EB: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A767: ??? >>> ==1286936== by 0x4B14036: ompi_proc_complete_init_single (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B146C3: ompi_proc_complete_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4BA19A9: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== >>> ==1286936== 776 bytes in 32 blocks are indirectly lost in loss record 42 >>> of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8DE9816: ??? >>> ==1286936== by 0x8DEB1D2: ??? >>> ==1286936== by 0x8DEB49A: ??? >>> ==1286936== by 0x8DE8B12: ??? >>> ==1286936== by 0x8E9D492: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A767: ??? >>> ==1286936== >>> ==1286936== 840 (480 direct, 360 indirect) bytes in 15 blocks are >>> definitely lost in loss record 43 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8E9D3EB: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A5EB: ??? >>> ==1286936== by 0x9EF2F00: ??? >>> ==1286936== by 0x9EEBF17: ??? >>> ==1286936== by 0x9EE2F54: ??? >>> ==1286936== by 0x9F1E1FB: ??? >>> ==1286936== >>> ==1286936== 1,091 (480 direct, 611 indirect) bytes in 15 blocks are >>> definitely lost in loss record 44 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8E9D3EB: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A767: ??? >>> ==1286936== by 0x84D4800: ??? >>> ==1286936== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286936== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== >>> ==1286936== 1,344 bytes in 1 blocks are definitely lost in loss record >>> 45 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x68AE702: opal_free_list_grow_st (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9F1CD2D: ??? >>> ==1286936== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9EE3527: ??? 
>>> ==1286936== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286936== by 0x15710D: main (testing_main.cpp:8) >>> ==1286936== >>> ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record >>> 46 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x68AE702: opal_free_list_grow_st (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9F1CC50: ??? >>> ==1286936== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9EE3527: ??? >>> ==1286936== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286936== by 0x15710D: main (testing_main.cpp:8) >>> ==1286936== >>> ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record >>> 47 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x68AE702: opal_free_list_grow_st (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9F1CCC4: ??? >>> ==1286936== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9EE3527: ??? >>> ==1286936== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286936== by 0x15710D: main (testing_main.cpp:8) >>> ==1286936== >>> ==1286936== 62,640 bytes in 30 blocks are indirectly lost in loss record >>> 48 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8DE9FA8: ??? >>> ==1286936== by 0x8DEB032: ??? >>> ==1286936== by 0x8DEB49A: ??? >>> ==1286936== by 0x8DE8B12: ??? >>> ==1286936== by 0x8E9D492: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? 
>>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A5EB: ??? >>> ==1286936== >>> ==1286936== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are >>> definitely lost in loss record 49 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8E9D3EB: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A5EB: ??? >>> ==1286936== by 0x9F0398A: ??? >>> ==1286936== by 0x9EE2F54: ??? >>> ==1286936== by 0x9F1E1FB: ??? >>> ==1286936== by 0x4BA1A09: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== >>> ==1286936== LEAK SUMMARY: >>> ==1286936== definitely lost: 9,805 bytes in 137 blocks >>> ==1286936== indirectly lost: 63,431 bytes in 63 blocks >>> ==1286936== possibly lost: 0 bytes in 0 blocks >>> ==1286936== still reachable: 1,174 bytes in 27 blocks >>> ==1286936== suppressed: 0 bytes in 0 blocks >>> ==1286936== >>> ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 >>> from 0) >>> ==1286936== >>> ==1286936== 1 errors in context 1 of 29: >>> ==1286936== Thread 3: >>> ==1286936== Syscall param writev(vector[...]) points to uninitialised >>> byte(s) >>> ==1286936== at 0x658A48D: __writev (writev.c:26) >>> ==1286936== by 0x658A48D: writev (writev.c:24) >>> ==1286936== by 0x8DF9B4C: ??? >>> ==1286936== by 0x7CC413E: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286936== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286936== by 0x8DBDD55: ??? >>> ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286936== by 0x6595102: clone (clone.S:95) >>> ==1286936== Address 0xa290cbf is 127 bytes inside a block of size 5,120 >>> alloc'd >>> ==1286936== at 0x483DFAF: realloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8DE155A: ??? >>> ==1286936== by 0x8DE3F4A: ??? >>> ==1286936== by 0x8DE4900: ??? >>> ==1286936== by 0x8DE4175: ??? >>> ==1286936== by 0x8D7CF91: ??? >>> ==1286936== by 0x7CC3FDD: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286936== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286936== by 0x8DBDD55: ??? >>> ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286936== by 0x6595102: clone (clone.S:95) >>> ==1286936== Uninitialised value was created by a stack allocation >>> ==1286936== at 0x9F048D6: ??? >>> ==1286936== >>> ==1286936== >>> ==1286936== 6 errors in context 2 of 29: >>> ==1286936== Thread 1: >>> ==1286936== Syscall param pwritev(vector[...]) points to uninitialised >>> byte(s) >>> ==1286936== at 0x658A608: pwritev64 (pwritev64.c:30) >>> ==1286936== by 0x658A608: pwritev (pwritev64.c:28) >>> ==1286936== by 0x9F46E25: ??? >>> ==1286936== by 0x9FCE33B: ??? >>> ==1286936== by 0x9FCDDBF: ??? >>> ==1286936== by 0x9FA324A: ??? 
>>> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >>> ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) >>> ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >>> ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >>> ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >>> ==1286936== by 0x7B5ED15: H5C__collective_write (H5Cmpio.c:1020) >>> ==1286936== by 0x7B5ED15: H5C_apply_candidate_list (H5Cmpio.c:394) >>> ==1286936== Address 0xedf91b0 is 96 bytes inside a block of size 216 >>> alloc'd >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:292) >>> ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:267) >>> ==1286936== by 0x77FC8FF: H5C__flush_single_entry (H5C.c:6045) >>> ==1286936== by 0x7B5DC7E: H5C__flush_candidates_in_ring >>> (H5Cmpio.c:1371) >>> ==1286936== by 0x7B5DC7E: H5C__flush_candidate_entries >>> (H5Cmpio.c:1192) >>> ==1286936== by 0x7B5DC7E: H5C_apply_candidate_list (H5Cmpio.c:385) >>> ==1286936== by 0x7B5BA18: H5AC__rsp__dist_md_write__flush >>> (H5ACmpio.c:1709) >>> ==1286936== by 0x7B5BA18: H5AC__run_sync_point (H5ACmpio.c:2164) >>> ==1286936== by 0x7B5C9D2: H5AC__flush_entries (H5ACmpio.c:2307) >>> ==1286936== by 0x77C95E4: H5AC_flush (H5AC.c:681) >>> ==1286936== by 0x78B306A: H5F__flush_phase2 (H5Fint.c:1831) >>> ==1286936== by 0x78B5D7A: H5F__dest (H5Fint.c:1152) >>> ==1286936== by 0x78B6603: H5F_try_close (H5Fint.c:2180) >>> ==1286936== by 0x78B69F5: H5F__close_cb (H5Fint.c:2009) >>> ==1286936== by 0x7965797: H5I_dec_ref (H5I.c:1254) >>> ==1286936== Uninitialised value was created by a stack allocation >>> ==1286936== at 0x7695AF0: ??? (in >>> /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so) >>> ==1286936== >>> ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 >>> from 0) >>> >>> On Mon, Aug 24, 2020 at 5:00 PM Jed Brown wrote: >>> >>>> Do you potentially have a memory or other resource leak? SIGBUS would >>>> be an odd result, but the symptom of crashing after running for a long time >>>> sometimes fits with a resource leak. >>>> >>>> Mark Lohry writes: >>>> >>>> > I queued up some jobs with Barry's patch, so we'll see. >>>> > >>>> > Re Jed's suggestion at checkpointing, I don't *think* this is >>>> something >>>> > coming from the state of the solution -- running from the same point >>>> I'm >>>> > seeing it crash anywhere between 1 hour and 20 hours in. I'll >>>> increase my >>>> > file save frequency in case I'm wrong there though. >>>> > >>>> > My intel build with different blas just made it through a 6 hour time >>>> slot >>>> > without crash, whereas yesterday the same thing crashed after 3 >>>> hours. But >>>> > given the randomness so far I'd bet that's just dumb luck. >>>> > >>>> > On Mon, Aug 24, 2020 at 4:22 PM Barry Smith wrote: >>>> > >>>> >> >>>> >> >>>> >> > On Aug 24, 2020, at 2:34 PM, Jed Brown wrote: >>>> >> > >>>> >> > I'm thinking of something such as writing floating point data into >>>> the >>>> >> return address, which would be unaligned/garbage. >>>> >> >>>> >> Ok, my patch will detect this. This is what I was talking about, >>>> messing >>>> >> up the BLAS arguments which are the addresses of arrays. >>>> >> >>>> >> Valgrind is by far the preferred approach. 
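A brief note on the valgrind suggestion above: the full MPI job can be run with every rank under valgrind by putting valgrind between the launcher and the executable, for example

    mpiexec -n 8 valgrind -q --tool=memcheck --track-origins=yes ./verification_testing <usual options>

where the rank count and trailing options are placeholders for the real run. This is far slower than a native run, but it exercises the full-size case in which the corruption actually appears, rather than a reduced case that may never trip it.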
>>>> >> >>>> >> Barry >>>> >> >>>> >> Another feature we could add to the malloc checking is when a SEGV >>>> or >>>> >> BUS error is encountered and we catch it we should run the >>>> >> PetscMallocVerify() and check our memory for corruption reporting >>>> any we >>>> >> find. >>>> >> >>>> >> >>>> >> >>>> >> > >>>> >> > Reproducing under Valgrind would help a lot. Perhaps it's >>>> possible to >>>> >> checkpoint such that the breakage can be reproduced more quickly? >>>> >> > >>>> >> > Barry Smith writes: >>>> >> > >>>> >> >> https://en.wikipedia.org/wiki/Bus_error < >>>> >> https://en.wikipedia.org/wiki/Bus_error> >>>> >> >> >>>> >> >> But perhaps not true for Intel? >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley >>>> >> wrote: >>>> >> >>> >>>> >> >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith >>> >>> >> bsmith at petsc.dev>> wrote: >>>> >> >>> >>>> >> >>> >>>> >> >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown >>> >>> >> jed at jedbrown.org>> wrote: >>>> >> >>>> >>>> >> >>>> Barry Smith > >>>> writes: >>>> >> >>>> >>>> >> >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown >>> >>> >> jed at jedbrown.org>> wrote: >>>> >> >>>>>> >>>> >> >>>>>> Barry Smith > >>>> writes: >>>> >> >>>>>> >>>> >> >>>>>>> So if a BLAS errors with SIGBUS then it is always an input >>>> error >>>> >> of just not proper double/complex alignment? Or some other very >>>> strange >>>> >> thing? >>>> >> >>>>>> >>>> >> >>>>>> I would suspect memory corruption. >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> Corruption meaning what specifically? >>>> >> >>>>> >>>> >> >>>>> The routines crashing are dgemv which only take double >>>> precision >>>> >> arrays, regardless of what garbage is in those arrays i don't think >>>> there >>>> >> can be BUS errors resulting. They don't take integer arrays whose >>>> >> corruption could result in bad indexing and then BUS errors. >>>> >> >>>>> >>>> >> >>>>> So then it can only be corruption of the pointers passed in, >>>> correct? >>>> >> >>>> >>>> >> >>>> Such as those pointers pointing into data on the stack with >>>> incorrect >>>> >> sizes. >>>> >> >>> >>>> >> >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? >>>> >> >>> >>>> >> >>> My understanding was that roughly memory errors in the heap are >>>> SEGV >>>> >> and memory errors on the stack are SIGBUS. Is that not true? >>>> >> >>> >>>> >> >>> Matt >>>> >> >>> >>>> >> >>> -- >>>> >> >>> What most experimenters take for granted before they begin their >>>> >> experiments is infinitely more interesting than any results to which >>>> their >>>> >> experiments lead. >>>> >> >>> -- Norbert Wiener >>>> >> >>> >>>> >> >>> https://www.cse.buffalo.edu/~knepley/ < >>>> >> http://www.cse.buffalo.edu/~knepley/> >>>> >> >>>> >> >>>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
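The suggestion above of running a memory-corruption check when a SEGV or bus error is caught can be sketched in a few lines of C. The names validate_heap, crash_handler and install_crash_handlers below are illustrative stand-ins, not PETSc's actual routines, and a real handler would have to treat async-signal-safety more carefully than this does.

    #define _POSIX_C_SOURCE 200809L
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical stand-in for a heap-consistency pass (e.g. walking the
       guard words around every allocation); returns nonzero on corruption. */
    static int validate_heap(void) { return 0; }

    static void crash_handler(int sig)
    {
      /* fprintf/abort are not strictly async-signal-safe, but the process is
         about to die anyway, so the extra diagnostic is worth the risk. */
      fprintf(stderr, "caught signal %d, checking heap before aborting\n", sig);
      if (validate_heap()) fprintf(stderr, "heap corruption detected\n");
      abort();
    }

    static void install_crash_handlers(void)
    {
      struct sigaction sa;
      sa.sa_handler = crash_handler;
      sigemptyset(&sa.sa_mask);
      sa.sa_flags   = 0;
      sigaction(SIGSEGV, &sa, NULL);
      sigaction(SIGBUS,  &sa, NULL);
    }

    int main(void)
    {
      install_crash_handlers();
      /* ... run the application; any SEGV or SIGBUS now triggers the check ... */
      return 0;
    }

Installed once at startup, this turns an otherwise opaque bus error into at least a yes/no answer on whether the heap was already damaged at the moment of the crash.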
URL: From bsmith at petsc.dev Thu Aug 27 09:52:00 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 27 Aug 2020 09:52:00 -0500 Subject: [petsc-users] Bus Error In-Reply-To: References: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> <87a6yjg3o5.fsf@jedbrown.org> <9EEB2628-D6ED-4466-A629-33EAC73BCE4C@petsc.dev> Message-ID: <386068BC-7972-455E-A9E8-C09F9DCF58BD@petsc.dev> Thanks, So this means that all the double precision array pointers that PETSc is passing into these BLAS calls are addressable. Which means nothing has corrupted any of these pointers before the calls. What my patch did. Before each BLAS call, for each double array argument it set a special exception handler and then accessed the first entry in the array. Since the exception handler was never called this means that the first entry of each array was accessible and would not produce a SEGV or SIGBUS. What else could be corrupted. 1) the size arguments passed to the BLAS calls, if they were too large they could result in accessing incorrect memory but IMHO that would usually produce a SEGV not a SIGBUS. It is hard to put a check in the code because these sizes are problem dependent and there is no way to know if they are wrong. 2) corruption of the stack? 3) hardware issue due to overheating or bad memory etc. I assume the MPI rank that crashes changes for each crashing run. I am adding code to our patch branch to print the node name that hopefully is constant for all runs, then one can see if the problem is always on the same node. Patch attached Can you try with a very different BLAS implementation? What are you using now? For example you could configure PETSc with --download-f2cblaslapack or if you are using MKL switch to non-MKL, or if you are using the system BLAS switch to MKL. Barry We can also replace the BLAS calls with direct C and see what happens but let's only do that after you try a different BLAS. > On Aug 27, 2020, at 8:53 AM, Mark Lohry wrote: > > It was built with --with-debugging=1 > > On Thu, Aug 27, 2020 at 9:44 AM Barry Smith > wrote: > > Mark, > > Did i tell you that this has to be built with the configure option --with-debugging=1 and won't be turned off with --with-debugging=0 ? 
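Barry's last suggestion above, replacing the BLAS call with direct C, is easy to prototype for the call that is crashing (dgemv inside MatSolve_SeqBAIJ_N_NaturalOrdering). The sketch below covers only the non-transposed, unit-stride case of y := alpha*A*x + beta*y with a column-major A; naive_dgemv_n is a made-up name for a debugging stand-in, not how PETSc actually wraps BLAS.

    #include <stdio.h>

    /* Naive column-major y := alpha*A*x + beta*y (the dgemv "N" case with
       incx = incy = 1).  No blocking, no transpose, no error checking. */
    static void naive_dgemv_n(int m, int n, double alpha, const double *a,
                              int lda, const double *x, double beta, double *y)
    {
      int i, j;
      for (i = 0; i < m; i++) y[i] = (beta == 0.0) ? 0.0 : beta * y[i];
      for (j = 0; j < n; j++) {
        const double axj = alpha * x[j];
        for (i = 0; i < m; i++) y[i] += axj * a[i + (size_t)j * (size_t)lda];
      }
    }

    int main(void)
    {
      /* 2x2 check: A = [1 3; 2 4] stored column-major, x = (1,1) => y = (4,6). */
      const double a[4] = {1.0, 2.0, 3.0, 4.0};
      const double x[2] = {1.0, 1.0};
      double y[2]       = {0.0, 0.0};
      naive_dgemv_n(2, 2, 1.0, a, 2, x, 0.0, y);
      printf("y = (%g, %g)\n", y[0], y[1]);
      return 0;
    }

If the bus error disappears with something like this substituted for the BLAS call, the BLAS library or the way it is linked becomes the prime suspect; if it persists, the corruption is upstream of the call, which would be consistent with the reasoning above. Trying a genuinely different BLAS (--download-f2cblaslapack, or switching to or from MKL) as suggested is the cheaper first step.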
> > Barry > > >> On Aug 27, 2020, at 8:10 AM, Mark Lohry > wrote: >> >> Barry, no output from that patch i'm afraid: >> >> 54 KSP Residual norm 3.215013886664e+03 >> 55 KSP Residual norm 3.049105434513e+03 >> 56 KSP Residual norm 2.859123916860e+03 >> [929]PETSC ERROR: ------------------------------------------------------------------------ >> [929]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access >> [929]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> [929]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> [929]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >> [929]PETSC ERROR: likely location of problem given in stack below >> [929]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >> [929]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >> [929]PETSC ERROR: INSTEAD the line number of the start of the function >> [929]PETSC ERROR: is given. >> [929]PETSC ERROR: [929] BLASgemv line 1406 /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c >> [929]PETSC ERROR: [929] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c >> [929]PETSC ERROR: [929] MatSolve line 3354 /home/mlohry/petsc/src/mat/interface/matrix.c >> [929]PETSC ERROR: [929] PCApply_ILU line 201 /home/mlohry/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >> [929]PETSC ERROR: [929] PCApply line 426 /home/mlohry/petsc/src/ksp/pc/interface/precon.c >> [929]PETSC ERROR: [929] KSP_PCApply line 279 /home/mlohry/petsc/include/petsc/private/kspimpl.h >> [929]PETSC ERROR: [929] KSPSolve_PREONLY line 16 /home/mlohry/petsc/src/ksp/ksp/impls/preonly/preonly.c >> [929]PETSC ERROR: [929] KSPSolve_Private line 590 /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >> [929]PETSC ERROR: [929] KSPSolve line 848 /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >> [929]PETSC ERROR: [929] PCApply_ASM line 441 /home/mlohry/petsc/src/ksp/pc/impls/asm/asm.c >> [929]PETSC ERROR: [929] PCApply line 426 /home/mlohry/petsc/src/ksp/pc/interface/precon.c >> [929]PETSC ERROR: [929] KSP_PCApply line 279 /home/mlohry/petsc/include/petsc/private/kspimpl.h >> srun: Job step aborted: Waiting up to 47 seconds for job step to finish. >> [929]PETSC ERROR: [929] KSPFGMRESCycle line 108 /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >> [929]PETSC ERROR: [929] KSPSolve_FGMRES line 274 /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >> [929]PETSC ERROR: [929] KSPSolve_Private line 590 /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >> >> On Mon, Aug 24, 2020 at 6:47 PM Mark Lohry > wrote: >> I don't think I do. Running a much smaller case with the same models I get the attached report from valgrind --show-leak-kinds=all --leak-check=full --track-origins=yes. I only see some HDF5 stuff and OpenMPI that I think are false positives. >> >> ==1286950== Memcheck, a memory error detector >> ==1286950== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. 
>> ==1286950== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun with -h for copyright info >> ==1286950== Command: ./verification_testing --gtest_filter=DrivenCavity3D.Re100_BackwardEulerILU1_16x16N2_Quadrature1 --petsc_time_integrator=arkimex --petsc_arkimex_type=l2 >> ==1286950== Parent PID: 1286932 >> ==1286950== >> --1286950-- >> --1286950-- Valgrind options: >> --1286950-- --show-leak-kinds=all >> --1286950-- --leak-check=full >> --1286950-- --track-origins=yes >> --1286950-- --log-file=valgrind-out.txt >> --1286950-- -v >> --1286950-- Contents of /proc/version: >> --1286950-- Linux version 5.4.0-29-generic (buildd at lgw01-amd64-035) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 >> --1286950-- >> --1286950-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-rdtscp-sse3-ssse3-avx >> --1286950-- Page sizes: currently 4096, max supported 4096 >> --1286950-- Valgrind library directory: /usr/lib/x86_64-linux-gnu/valgrind >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/verification_testing >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ld-2.31.so >> --1286950-- Considering /usr/lib/x86_64-linux-gnu/ld-2.31.so .. >> --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) >> --1286950-- Considering /lib/x86_64-linux-gnu/ld-2.31.so .. >> --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) >> --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.31.so .. >> --1286950-- .. CRC is valid >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux >> --1286950-- object doesn't have a symbol table >> --1286950-- object doesn't have a dynamic symbol table >> --1286950-- Scheduler: using generic scheduler lock implementation. >> --1286950-- Reading suppressions file: /usr/lib/x86_64-linux-gnu/valgrind/default.supp >> ==1286950== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-1286950-by-mlohry-on-??? >> ==1286950== embedded gdbserver: writing to /tmp/vgdb-pipe-to-vgdb-from-1286950-by-mlohry-on-??? >> ==1286950== embedded gdbserver: shared mem /tmp/vgdb-pipe-shared-mem-vgdb-1286950-by-mlohry-on-??? >> ==1286950== >> ==1286950== TO CONTROL THIS PROCESS USING vgdb (which you probably >> ==1286950== don't want to do, unless you know exactly what you're doing, >> ==1286950== or are doing some strange experiment): >> ==1286950== /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 ...command... >> ==1286950== >> ==1286950== TO DEBUG THIS PROCESS USING GDB: start GDB like this >> ==1286950== /path/to/gdb ./verification_testing >> ==1286950== and then give GDB the following command >> ==1286950== target remote | /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 >> ==1286950== --pid is optional if only one valgrind process is running >> ==1286950== >> --1286950-- REDIR: 0x4022d80 (ld-linux-x86-64.so.2:strlen) redirected to 0x580c9ce2 (???) >> --1286950-- REDIR: 0x4022b50 (ld-linux-x86-64.so.2:index) redirected to 0x580c9cfc (???) >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so >> --1286950-- object doesn't have a symbol table >> ==1286950== WARNING: new redirection conflicts with existing -- ignoring it >> --1286950-- old: 0x04022d80 (strlen ) R-> (0000.0) 0x580c9ce2 ??? 
>> --1286950-- new: 0x04022d80 (strlen ) R-> (2007.0) 0x0483f060 strlen >> --1286950-- REDIR: 0x401f560 (ld-linux-x86-64.so.2:strcmp) redirected to 0x483ffd0 (strcmp) >> --1286950-- REDIR: 0x40232e0 (ld-linux-x86-64.so.2:mempcpy) redirected to 0x4843a20 (mempcpy) >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/initialization/libinitialization.so >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libgoverning_equations.so >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/time_stepping/libtime_stepping.so >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libboundary_conditions.so >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libsolution_monitors.so >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libfluxtypes.so >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/algebraic_solvers/libalgebraic_solvers.so >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/program_options/libprogram_options.so >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_filesystem.so.1.73.0 >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0 >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so.40.20.1 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libpthread-2.31.so >> --1286950-- Considering /usr/lib/debug/.build-id/77/5cbbfff814456660786780b0b3b40096b4c05e.debug .. >> --1286950-- .. build-id is valid >> --1286948-- Reading syms from /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libpetsc.so.3.13.3 >> --1286937-- Reading syms from /home/mlohry/dev/cmake-build/parallel/libparallel.so >> --1286937-- Reading syms from /home/mlohry/dev/cmake-build/logger/liblogger.so >> --1286937-- Reading syms from /home/mlohry/dev/cmake-build/spatial_discretization/libdiscretization.so >> --1286945-- Reading syms from /home/mlohry/dev/cmake-build/utils/libutils.so >> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 >> --1286938-- object doesn't have a symbol table >> --1286949-- Reading syms from /usr/lib/x86_64-linux-gnu/libm-2.31.so >> --1286949-- Considering /usr/lib/x86_64-linux-gnu/libm-2.31.so .. >> --1286947-- .. CRC mismatch (computed 327d785f wanted 751f5509) >> --1286947-- Considering /lib/x86_64-linux-gnu/libm-2.31.so .. >> --1286938-- .. CRC mismatch (computed 327d785f wanted 751f5509) >> --1286937-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libm-2.31.so .. >> --1286950-- .. CRC is valid >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libc-2.31.so >> --1286950-- Considering /usr/lib/x86_64-linux-gnu/libc-2.31.so .. >> --1286951-- .. CRC mismatch (computed a6f43087 wanted 6555436e) >> --1286951-- Considering /lib/x86_64-linux-gnu/libc-2.31.so .. >> --1286947-- .. CRC mismatch (computed a6f43087 wanted 6555436e) >> --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so .. >> --1286950-- .. 
CRC is valid >> --1286940-- Reading syms from /home/mlohry/dev/cmake-build/file_io/libfileio.so >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_program_options.so.1.73.0 >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_serialization.so.1.73.0 >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libhwloc.so.15.1.0 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libsuperlu_dist.so.6.3.0 >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 >> --1286950-- object doesn't have a symbol table >> --1286937-- Reading syms from /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 >> --1286937-- object doesn't have a symbol table >> --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0 >> --1286939-- object doesn't have a symbol table >> --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libdl-2.31.so >> --1286947-- Considering /usr/lib/x86_64-linux-gnu/libdl-2.31.so .. >> --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) >> --1286947-- Considering /lib/x86_64-linux-gnu/libdl-2.31.so .. >> --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) >> --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libdl-2.31.so .. >> --1286947-- .. CRC is valid >> --1286937-- Reading syms from /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libmetis.so >> --1286937-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0 >> --1286942-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log_setup.so.1.73.0 >> --1286942-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0 >> --1286942-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_regex.so.1.73.0 >> --1286949-- Reading syms from /home/mlohry/dev/cmake-build/basis_functions/libbasis_functions.so >> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 >> --1286944-- object doesn't have a symbol table >> --1286951-- Reading syms from /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so >> --1286951-- object doesn't have a symbol table >> --1286943-- Reading syms from /home/mlohry/dev/cmake-build/external_install/lib/libhdf5.so.103.1.0 >> --1286951-- Reading syms from /home/mlohry/dev/cmake-build/external/tinyxml2-build/libtinyxml2.so.6.1.0 >> --1286944-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_iostreams.so.1.73.0 >> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libz.so.1.2.11 >> --1286944-- object doesn't have a symbol table >> --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0 >> --1286951-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libutil-2.31.so >> --1286946-- Considering /usr/lib/x86_64-linux-gnu/libutil-2.31.so .. >> --1286946-- .. CRC mismatch (computed 4639aba5 wanted ceb246b4) >> --1286946-- Considering /lib/x86_64-linux-gnu/libutil-2.31.so .. >> --1286946-- .. 
CRC mismatch (computed 4639aba5 wanted ceb246b4) >> --1286948-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libutil-2.31.so .. >> --1286939-- .. CRC is valid >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libudev.so.1.6.17 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.1 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/librt-2.31.so >> --1286950-- Considering /usr/lib/x86_64-linux-gnu/librt-2.31.so .. >> --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) >> --1286950-- Considering /lib/x86_64-linux-gnu/librt-2.31.so .. >> --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) >> --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/librt-2.31.so .. >> --1286950-- .. CRC is valid >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libquadmath.so.0.0.0 >> --1286950-- object doesn't have a symbol table >> --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 >> --1286945-- Considering /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 .. >> --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) >> --1286945-- Considering /lib/x86_64-linux-gnu/libXau.so.6.0.0 .. >> --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) >> --1286945-- object doesn't have a symbol table >> --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0 >> --1286942-- object doesn't have a symbol table >> --1286942-- Reading syms from /usr/lib/x86_64-linux-gnu/libbsd.so.0.10.0 >> --1286942-- object doesn't have a symbol table >> --1286950-- REDIR: 0x6516600 (libc.so.6:memmove) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515900 (libc.so.6:strncpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516930 (libc.so.6:strcasecmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515220 (libc.so.6:strcat) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515960 (libc.so.6:rindex) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6517dd0 (libc.so.6:rawmemchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6532e60 (libc.so.6:wmemchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65329a0 (libc.so.6:wcscmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516760 (libc.so.6:mempcpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516590 (libc.so.6:bcmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515890 (libc.so.6:strncmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65152d0 (libc.so.6:strcmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65166c0 (libc.so.6:memset) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6532960 (libc.so.6:wcschr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65157f0 (libc.so.6:strnlen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65153b0 
(libc.so.6:strcspn) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516980 (libc.so.6:strncasecmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515350 (libc.so.6:strcpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516ad0 (libc.so.6:memcpy@@GLIBC_2.14) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65340d0 (libc.so.6:wcsnlen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65329e0 (libc.so.6:wcscpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65159a0 (libc.so.6:strpbrk) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515280 (libc.so.6:index) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65157b0 (libc.so.6:strlen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x651ed20 (libc.so.6:memrchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65169d0 (libc.so.6:strcasecmp_l) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516550 (libc.so.6:memchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6532ab0 (libc.so.6:wcslen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6515c60 (libc.so.6:strspn) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65168d0 (libc.so.6:stpncpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516870 (libc.so.6:stpcpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6517e10 (libc.so.6:strchrnul) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516a20 (libc.so.6:strncasecmp_l) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x6516470 (libc.so.6:strstr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65a3750 (libc.so.6:__memcpy_chk) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286938-- REDIR: 0x6527a30 (libc.so.6:__strrchr_sse2) redirected to 0x483ea70 (__strrchr_sse2) >> --1286938-- REDIR: 0x6511c90 (libc.so.6:calloc) redirected to 0x483dce0 (calloc) >> --1286938-- REDIR: 0x6510260 (libc.so.6:malloc) redirected to 0x483b780 (malloc) >> --1286938-- REDIR: 0x6531c40 (libc.so.6:memcpy at GLIBC_2.2.5) redirected to 0x4840100 (memcpy at GLIBC_2.2.5) >> --1286938-- REDIR: 0x6527d30 (libc.so.6:__strlen_sse2) redirected to 0x483efa0 (__strlen_sse2) >> --1286938-- REDIR: 0x65f4ac0 (libc.so.6:__strncmp_sse42) redirected to 0x483f7c0 (__strncmp_sse42) >> --1286938-- REDIR: 0x6510850 (libc.so.6:free) redirected to 0x483c9d0 (free) >> --1286938-- REDIR: 0x6532070 (libc.so.6:__memset_sse2_unaligned) redirected to 0x48428e0 (memset) >> --1286938-- REDIR: 0x6603350 (libc.so.6:__memcmp_sse4_1) redirected to 0x4842150 (__memcmp_sse4_1) >> --1286938-- REDIR: 0x6520520 (libc.so.6:__strcmp_sse2_unaligned) redirected to 0x483fed0 (strcmp) >> --1286938-- REDIR: 0x61d0c10 (libstdc++.so.6:operator new(unsigned long)) redirected to 0x483bdf0 (operator new(unsigned long)) >> --1286938-- REDIR: 0x61cee60 (libstdc++.so.6:operator delete(void*)) redirected to 0x483cf50 (operator delete(void*)) >> --1286938-- REDIR: 0x61d0c70 (libstdc++.so.6:operator new[](unsigned long)) redirected to 0x483c510 (operator new[](unsigned long)) >> --1286938-- REDIR: 0x61cee90 (libstdc++.so.6:operator delete[](void*)) redirected to 0x483d6e0 (operator delete[](void*)) >> --1286938-- REDIR: 0x65275f0 (libc.so.6:__strchr_sse2) redirected to 0x483eb90 (__strchr_sse2) >> --1286950-- REDIR: 
0x6511000 (libc.so.6:realloc) redirected to 0x483df30 (realloc) >> --1286950-- REDIR: 0x6527820 (libc.so.6:__strchrnul_sse2) redirected to 0x4843540 (strchrnul) >> --1286950-- REDIR: 0x6531560 (libc.so.6:__strstr_sse2_unaligned) redirected to 0x4843c20 (strstr) >> --1286950-- REDIR: 0x6531c20 (libc.so.6:__mempcpy_sse2_unaligned) redirected to 0x4843660 (mempcpy) >> --1286950-- REDIR: 0x652d2a0 (libc.so.6:__strncpy_sse2_unaligned) redirected to 0x483f560 (__strncpy_sse2_unaligned) >> --1286950-- REDIR: 0x6515830 (libc.so.6:strncat) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> --1286950-- REDIR: 0x65305b0 (libc.so.6:__strncat_sse2_unaligned) redirected to 0x483ede0 (strncat) >> --1286950-- REDIR: 0x6516120 (libc.so.6:__GI_strstr) redirected to 0x4843ca0 (__strstr_sse2) >> --1286950-- REDIR: 0x6522360 (libc.so.6:__rawmemchr_sse2) redirected to 0x4843580 (rawmemchr) >> --1286950-- REDIR: 0x65faea0 (libc.so.6:__strcasecmp_avx) redirected to 0x483f830 (strcasecmp) >> --1286950-- REDIR: 0x65fc520 (libc.so.6:__strncasecmp_avx) redirected to 0x483f910 (strncasecmp) >> --1286950-- REDIR: 0x65f98a0 (libc.so.6:__strspn_sse42) redirected to 0x4843ef0 (strspn) >> --1286950-- REDIR: 0x65f9620 (libc.so.6:__strcspn_sse42) redirected to 0x4843e10 (strcspn) >> --1286948-- REDIR: 0x6522030 (libc.so.6:__memchr_sse2) redirected to 0x4840050 (memchr) >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Discarding syms at 0x4a96240-0x4a96d47 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so (have_dinfo 1) >> --1286948-- Discarding syms at 0x4a9b1c0-0x4a9b937 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so (have_dinfo 1) >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 >> --1286948-- object doesn't have a symbol table >> --1286948-- Discarding syms at 0x4a96120-0x4a966b0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so (have_dinfo 1) >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so >> --1286948-- object doesn't have a symbol table >> --1286948-- REDIR: 0x64bc670 (libc.so.6:setenv) redirected to 0x4844480 (setenv) >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 >> --1286948-- object doesn't have a symbol table >> --1286948-- Discarding syms at 0x8d053e0-0x8d07391 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so (have_dinfo 1) >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so >> --1286948-- object doesn't have a symbol table >> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so >> --1286948-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Discarding syms at 0x8d04180-0x8d045b0 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so (have_dinfo 1) >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so >> --1286950-- object doesn't have a symbol table >> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so >> --1286950-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Discarding syms at 0x9ebf0a0-0x9ebf490 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so (have_dinfo 1) >> --1286946-- Discarding syms at 0x9eca300-0x9ecbee8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so (have_dinfo 1) >> --1286946-- Discarding syms at 0x9ed1220-0x9ed24e7 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so (have_dinfo 1) >> --1286946-- Discarding syms at 0x9ed8240-0x9ed8c88 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so (have_dinfo 1) >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Discarding syms at 0x9ebf0e0-0x9ebf417 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so (have_dinfo 1) >> --1286946-- Discarding syms at 0x9ecf320-0x9ed1239 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so (have_dinfo 1) >> --1286946-- Discarding syms at 0x9ed73a0-0x9ed9ccc in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so (have_dinfo 1) >> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so >> --1286936-- object doesn't have a symbol table >> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so >> --1286936-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from 
/usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so >> --1286946-- object doesn't have a symbol table >> --1286946-- REDIR: 0x652cc70 (libc.so.6:__strcpy_sse2_unaligned) redirected to 0x483f090 (strcpy) >> --1286946-- REDIR: 0x65a3810 (libc.so.6:__memmove_chk) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >> ==1286946== WARNING: new redirection conflicts with existing -- ignoring it >> --1286946-- old: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2030.0) 0x04843b10 __memcpy_chk >> --1286946-- new: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2024.0) 0x048434d0 __memmove_chk >> --1286946-- REDIR: 0x6531c30 (libc.so.6:__memcpy_chk_sse2_unaligned) redirected to 0x4843b10 (__memcpy_chk) >> --1286946-- REDIR: 0x65129b0 (libc.so.6:posix_memalign) redirected to 0x483e1e0 (posix_memalign) >> --1286946-- Discarding syms at 0x9f15280-0x9f32932 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f7c4c0-0x9f7ded8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f620c0-0x9f71483 in /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f9ba10-0x9fd22ee in /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_monitoring.so.50.10.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Discarding syms at 0x9f4d400-0x9f50c19 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so (have_dinfo 1) >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/libpsm1/libpsm_infinipath.so.1.16 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 >> --1286946-- object doesn't have a symbol table >> --1286946-- 
Reading syms from /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- REDIR: 0x6517140 (libc.so.6:strcasestr) redirected to 0x4843f80 (strcasestr) >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Discarding syms at 0x9f4d5c0-0x9f4f5a1 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so (have_dinfo 1) >> --1286946-- Discarding syms at 0x9fee680-0x9ff096c in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so (have_dinfo 1) >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_monitoring.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so >> --1286946-- object doesn't have a symbol table >> --1286946-- Discarding syms at 0x9f724a0-0x9f787b5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so (have_dinfo 1) >> --1286946-- Discarding syms at 0xa827f80-0xa8e14c4 in /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f94830-0x9fbafce in /usr/lib/libpsm1/libpsm_infinipath.so.1.16 (have_dinfo 1) >> --1286946-- Discarding syms at 0x9fe5580-0x9fe8f71 in /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f56420-0x9f5cec0 in /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0xa929f10-0xa93d5fc in /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0xa94b0c0-0xa95a483 in /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) >> --1286946-- Discarding syms at 
0xa968860-0xa9adf12 in /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 (have_dinfo 1) >> --1286946-- Discarding syms at 0xa9e7a10-0xaa1e2ee in /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f80410-0x9f84e27 in /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f103e0-0x9f15fd5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so (have_dinfo 1) >> --1286946-- Discarding syms at 0x9f471e0-0x9f47ce0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so (have_dinfo 1) >> ==1286946== Thread 3: >> ==1286946== Syscall param writev(vector[...]) points to uninitialised byte(s) >> ==1286946== at 0x658A48D: __writev (writev.c:26) >> ==1286946== by 0x658A48D: writev (writev.c:24) >> ==1286946== by 0x8DF9B4C: pmix_ptl_base_send_handler (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x7CC413E: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286946== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286946== by 0x8DBDD55: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286946== by 0x6595102: clone (clone.S:95) >> ==1286946== Address 0xa28fdcf is 127 bytes inside a block of size 5,120 alloc'd >> ==1286946== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286946== by 0x8DE155A: pmix_bfrop_buffer_extend (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x8DE3F4A: pmix_bfrops_base_pack_byte (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x8DE4900: pmix_bfrops_base_pack_buf (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x8DE4175: pmix_bfrops_base_pack (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x8D7CF91: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286946== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286946== by 0x8DBDD55: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286946== by 0x6595102: clone (clone.S:95) >> ==1286946== Uninitialised value was created by a stack allocation >> ==1286946== at 0x9F048D6: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so) >> ==1286946== >> --1286944-- Discarding syms at 0xaa4d220-0xaa5796a in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp >> --1286945-- Discarding syms at 0xaa4d220- >> --1286948-- Discarding syms at 0xaae1100-0xaae7d70 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp >> --1286945-- Discarding syms at 0xaae1100-0xaae7d70 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so (have_dinfo 1) >> --1286945-- Discarding syms at 0x9f69420-0x9f >> --1286938-- REDIR: 0x61cee70 (libstdc++.so.6:operator delete(void*, unsigned long)) redirected to >> --1286937-- REDIR: 0x61cee70 (libstdc++.so.6:opera >> --1286946-- REDIR: 0x652e970 (libc.so.6:__stpncpy_sse2_unaligned) redirected to 0x48427e0 (stpncpy) >> --1286942-- REDIR: 0x6527ed0 (libc.so.6:__strnlen_sse2) redirected to 0x483eee0 (strnlen) >> --1286944-- REDIR: 0x652fcc0 (libc.so.6:__strcat_sse2_unaligned) redirected to 0x483ec20 (strcat) >> --1286951-- REDIR: 0x65113d0 (libc.so.6:memalign) redirected to 0x483e2a0 (memalign) >> --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so >> --1286951-- object doesn't have a symbol table >> --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so >> --1286951-- object doesn't have a symbol table >> --1286941-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 >> --1286941-- object doesn't have a symbol table >> --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so >> --1286951-- object doesn't have a symbol table >> --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so >> --1286939-- object doesn't have a symbol table >> --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so >> --1286939-- object doesn't have a symbol table >> --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so >> --1286939-- object doesn't have a symbol table >> --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so >> --1286939-- object doesn't have a symbol table >> --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so >> --1286939-- object doesn't have a symbol table >> --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so >> --1286939-- object doesn't have a symbol table >> --1286943-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so >> --1286943-- object doesn't have a symbol table >> --1286943-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so >> --1286943-- object doesn't have a symbol table >> --1286943-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so >> --1286943-- object doesn't have a symbol table >> --1286938-- REDIR: 0x65a3b00 (libc.so.6:__strcpy_chk) redirected to 0x48435c0 (__strcpy_chk) >> --1286939-- Discarding syms at 0x9f1d660-0x9f371d6 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9f5afa0-0x9f8f8b6 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9fa0640-0x9fa42d9 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so (have_dinfo 1) >> --1286939-- Discarding syms at
0x9f4c160-0x9f4dc58 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xa7fc270-0xa804f00 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9fee3a0-0x9ff134e in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xa80a240-0xa80aa8d in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 (have_dinfo 1) >> --1286939-- Discarding syms at 0xa80f0e0-0xa80f8bb in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xaa460c0-0xaa47947 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xaa613e0-0xaa7730f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xaa849c0-0xaa8a845 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9ee1320-0x9ee3567 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9eebc40-0x9ef4ad7 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9f02600-0x9f08cd8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9f40200-0x9f4126e in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9eda4e0-0x9edb4c5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9ed32c0-0x9ed4afe in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9ebf160-0x9ebfe95 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9ece140-0x9ecebed in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9ec92a0-0x9ec9aa2 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8eae0e0-0x8eae4a7 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8eb3220-0x8eb3c27 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8eb80e0-0x8eb90b7 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8ea6380-0x8ea97b3 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e5a740-0x8e5f859 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e67be0-0x8e743f0 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x84da200-0x84daa5d in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8d322b0-0x8d34bfc in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e29480-0x8e3b70a in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8d3c2b0-0x8d3ed5c in 
/usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e45340-0x8e502da in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e901a0-0x8e908a7 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8d05520-0x8d06783 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e7b460-0x8e8aaa4 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8d44520-0x8d4556a in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8e97600-0x8ea0fa1 in /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 (have_dinfo 1) >> --1286939-- Discarding syms at 0x8d109c0-0x8d27dcf in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x8d5b280-0x8dfdffb in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 (have_dinfo 1) >> --1286939-- Discarding syms at 0x9ec40a0-0x9ec4490 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x84d2580-0x84d518f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x4a96120-0x4a9644f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x4aa0100-0x4aa03e7 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x84c74a0-0x84c901f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x4aa5260-0x4aa58e9 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x4a9b420-0x4a9bcdf in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x84e7460-0x84f52ca in /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 (have_dinfo 1) >> --1286939-- Discarding syms at 0x4a90360-0x4a91107 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9f46220-0x9f474cc in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9f0f180-0x9f0f78d in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xaa94540-0xaa96a4a in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xaa9f6c0-0xaab44d0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xaabe820-0xaad8ee0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9efc080-0x9efc1e1 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9fab2a0-0x9fb1341 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9f140c0-0x9f14299 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9fb72a0-0x9fbb791 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so 
(have_dinfo 1) >> --1286939-- Discarding syms at 0x9fd52a0-0x9fda794 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9fe02e0-0x9fe59a5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xa815460-0xa8177ab in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xa81e260-0xa82033d in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so (have_dinfo 1) >> --1286939-- Discarding syms at 0xa8273e0-0xa8297d8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so (have_dinfo 1) >> --1286939-- Discarding syms at 0x9fc85e0-0x9fce8ef in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 (have_dinfo 1) >> ==1286939== >> ==1286939== HEAP SUMMARY: >> ==1286939== in use at exit: 74,054 bytes in 223 blocks >> ==1286939== total heap usage: 22,405,782 allocs, 22,405,559 frees, 34,062,479,959 bytes allocated >> ==1286939== >> ==1286939== Searching for pointers to 223 not-freed blocks >> ==1286939== Checked 3,415,912 bytes >> ==1286939== >> ==1286939== Thread 1: >> ==1286939== 1 bytes in 1 blocks are definitely lost in loss record 1 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x651550E: strdup (strdup.c:42) >> ==1286939== by 0x9F6A4B6: ??? >> ==1286939== by 0x9F47373: ??? >> ==1286939== by 0x68E3B9B: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4BA1734: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== >> ==1286939== 8 bytes in 1 blocks are still reachable in loss record 2 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x764724C: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >> ==1286939== by 0x7657B9A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >> ==1286939== by 0x7645679: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >> ==1286939== by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so ) >> ==1286939== by 0x3: ??? >> ==1286939== by 0x1FFEFFF926: ??? >> ==1286939== by 0x1FFEFFF93D: ??? >> ==1286939== by 0x1FFEFFF987: ??? >> ==1286939== by 0x1FFEFFF9A7: ??? >> ==1286939== >> ==1286939== 8 bytes in 1 blocks are definitely lost in loss record 3 of 44 >> ==1286939== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9F69B6F: ??? >> ==1286939== by 0x9F1CDED: ??? 
>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 13 bytes in 2 blocks are still reachable in loss record 4 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x651550E: strdup (strdup.c:42) >> ==1286939== by 0x7CC3657: event_config_avoid_method (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x68FEB5A: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== >> ==1286939== 15 bytes in 1 blocks are indirectly lost in loss record 5 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x651550E: strdup (strdup.c:42) >> ==1286939== by 0x9EDB189: ??? >> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6907C25: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4BA16D5: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 15 bytes in 1 blocks are definitely lost in loss record 6 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x651550E: strdup (strdup.c:42) >> ==1286939== by 0x9F5655C: ??? >> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >> ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) >> ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) >> ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) >> ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) >> ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) >> ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) >> ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) >> ==1286939== >> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 7 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9F1CBEB: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 8 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9F1CC66: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? 
>> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 9 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9F1CCDA: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 25 bytes in 1 blocks are still reachable in loss record 10 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x651550E: strdup (strdup.c:42) >> ==1286939== by 0x68F27BD: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4B956B6: ompi_pml_v_output_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B95259: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4B93FAE: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4BA1734: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== >> ==1286939== 30 bytes in 1 blocks are definitely lost in loss record 11 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0xA9A859B: ??? 
>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >> ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) >> ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) >> ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) >> ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) >> ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) >> ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) >> ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) >> ==1286939== by 0x72DEB58: _dlerror_run (dlerror.c:170) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are still reachable in loss record 12 of 44 >> ==1286939== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x7CC353E: event_get_supported_methods (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x68FEA98: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 13 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x84D2B0A: ??? >> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 14 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x84D2BCE: ??? 
>> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 15 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x84D2CB2: ??? >> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 16 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x84D2D91: ??? >> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 17 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E81BD8: ??? >> ==1286939== by 0x8E89F4B: ??? >> ==1286939== by 0x8D84A0D: ??? >> ==1286939== by 0x8DF79C1: ??? >> ==1286939== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x8DBDD55: ??? >> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286939== by 0x6595102: clone (clone.S:95) >> ==1286939== >> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 18 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A767: ??? >> ==1286939== by 0x84D330E: ??? 
>> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 36 (32 direct, 4 indirect) bytes in 1 blocks are definitely lost in loss record 19 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x4B94C09: mca_pml_base_pml_check_selected (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x9F1E1E1: ??? >> ==1286939== by 0x4BA1A09: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 20 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x7CFF4B6: ??? (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >> ==1286939== by 0x7CC5E26: event_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >> ==1286939== by 0x68FE8E4: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== >> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 21 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x7CFF4B6: ??? (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >> ==1286939== by 0x7CCF377: evsig_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CC5E39: event_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >> ==1286939== by 0x68FE8E4: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== >> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 22 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x7CFF4B6: ??? (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >> ==1286939== by 0x7CCB997: evutil_secure_rng_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CC5E4F: event_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >> ==1286939== by 0x68FE8E4: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== >> ==1286939== 48 bytes in 1 blocks are still reachable in loss record 23 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4B8560C: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >> ==1286939== >> ==1286939== 48 bytes in 1 blocks are still 
reachable in loss record 24 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4B85638: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >> ==1286939== >> ==1286939== 48 bytes in 2 blocks are still reachable in loss record 25 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x7CC3647: event_config_avoid_method (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x68FEB5A: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== >> ==1286939== 55 (32 direct, 23 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A767: ??? 
>> ==1286939== by 0x4AF6CD6: ompi_comm_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA194D: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== >> ==1286939== 56 bytes in 1 blocks are still reachable in loss record 27 of 44 >> ==1286939== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x7CC1C86: event_config_new (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x68FEAC0: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== >> ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 28 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9F6E008: ??? >> ==1286939== by 0x9F7C654: ??? >> ==1286939== by 0x9F1CD3E: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== >> ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 29 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0xA957008: ??? >> ==1286939== by 0xA86B017: ??? >> ==1286939== by 0xA862FD8: ??? >> ==1286939== by 0xA828E15: ??? >> ==1286939== by 0xA829624: ??? >> ==1286939== by 0x9F77910: ??? >> ==1286939== by 0x4B85C53: ompi_mtl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x9F13E4D: ??? 
>> ==1286939== by 0x4B94673: mca_pml_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1789: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 76 (32 direct, 44 indirect) bytes in 1 blocks are definitely lost in loss record 30 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A767: ??? >> ==1286939== by 0x84D387F: ??? >> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 79 (64 direct, 15 indirect) bytes in 1 blocks are definitely lost in loss record 31 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9EDB12E: ??? >> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x6907C25: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4BA16D5: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 144 bytes in 3 blocks are still reachable in loss record 32 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4B8564E: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >> ==1286939== by 
0x78B953B: H5F_open (H5Fint.c:1493) >> ==1286939== >> ==1286939== 231 bytes in 12 blocks are definitely lost in loss record 33 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x651550E: strdup (strdup.c:42) >> ==1286939== by 0x9F2B4B3: ??? >> ==1286939== by 0x9F2B85C: ??? >> ==1286939== by 0x9F2BBD7: ??? >> ==1286939== by 0x9F1CAAC: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== >> ==1286939== 240 bytes in 5 blocks are still reachable in loss record 34 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x4B85622: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >> ==1286939== >> ==1286939== 272 bytes in 44 blocks are definitely lost in loss record 35 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x9FCAEDB: ??? >> ==1286939== by 0x9FE42B2: ??? >> ==1286939== by 0x9FE47BB: ??? >> ==1286939== by 0x9FCDDBF: ??? >> ==1286939== by 0x9FA324A: ??? >> ==1286939== by 0x4B3DD7F: PMPI_File_write_at_all (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >> ==1286939== by 0x78DF11D: H5FD_write (H5FDint.c:257) >> ==1286939== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >> ==1286939== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >> ==1286939== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >> ==1286939== >> ==1286939== 585 (480 direct, 105 indirect) bytes in 15 blocks are definitely lost in loss record 36 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? 
>> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A767: ??? >> ==1286939== by 0x4B14036: ompi_proc_complete_init_single (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B146C3: ompi_proc_complete_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA19A9: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 776 bytes in 32 blocks are indirectly lost in loss record 37 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8DE9816: ??? >> ==1286939== by 0x8DEB1D2: ??? >> ==1286939== by 0x8DEB49A: ??? >> ==1286939== by 0x8DE8B12: ??? >> ==1286939== by 0x8E9D492: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A767: ??? >> ==1286939== >> ==1286939== 840 (480 direct, 360 indirect) bytes in 15 blocks are definitely lost in loss record 38 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x9EF2F00: ??? >> ==1286939== by 0x9EEBF17: ??? >> ==1286939== by 0x9EE2F54: ??? >> ==1286939== by 0x9F1E1FB: ??? >> ==1286939== >> ==1286939== 1,084 (480 direct, 604 indirect) bytes in 15 blocks are definitely lost in loss record 39 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A767: ??? >> ==1286939== by 0x84D4800: ??? >> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== 1,344 bytes in 1 blocks are definitely lost in loss record 40 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9F1CD2D: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? 
>> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 41 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9F1CC50: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 42 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9F1CCC4: ??? >> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286939== by 0x9EE3527: ??? >> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286939== by 0x15710D: main (testing_main.cpp:8) >> ==1286939== >> ==1286939== 62,644 bytes in 31 blocks are indirectly lost in loss record 43 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8DE9FA8: ??? >> ==1286939== by 0x8DEB032: ??? >> ==1286939== by 0x8DEB49A: ??? >> ==1286939== by 0x8DE8B12: ??? >> ==1286939== by 0x8E9D492: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? 
>> ==1286939== >> ==1286939== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are definitely lost in loss record 44 of 44 >> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8E9D3EB: ??? >> ==1286939== by 0x8E9F1C1: ??? >> ==1286939== by 0x8D0578C: ??? >> ==1286939== by 0x8D8605A: ??? >> ==1286939== by 0x8D87FE8: ??? >> ==1286939== by 0x8D88E4D: ??? >> ==1286939== by 0x8D1A5EB: ??? >> ==1286939== by 0x9F0398A: ??? >> ==1286939== by 0x9EE2F54: ??? >> ==1286939== by 0x9F1E1FB: ??? >> ==1286939== by 0x4BA1A09: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286939== >> ==1286939== LEAK SUMMARY: >> ==1286939== definitely lost: 9,837 bytes in 138 blocks >> ==1286939== indirectly lost: 63,435 bytes in 64 blocks >> ==1286939== possibly lost: 0 bytes in 0 blocks >> ==1286939== still reachable: 782 bytes in 21 blocks >> ==1286939== suppressed: 0 bytes in 0 blocks >> ==1286939== >> ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from 0) >> ==1286939== >> ==1286939== 1 errors in context 1 of 29: >> ==1286939== Thread 3: >> ==1286939== Syscall param writev(vector[...]) points to uninitialised byte(s) >> ==1286939== at 0x658A48D: __writev (writev.c:26) >> ==1286939== by 0x658A48D: writev (writev.c:24) >> ==1286939== by 0x8DF9B4C: ??? >> ==1286939== by 0x7CC413E: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x8DBDD55: ??? >> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286939== by 0x6595102: clone (clone.S:95) >> ==1286939== Address 0xa28ee1f is 127 bytes inside a block of size 5,120 alloc'd >> ==1286939== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286939== by 0x8DE155A: ??? >> ==1286939== by 0x8DE3F4A: ??? >> ==1286939== by 0x8DE4900: ??? >> ==1286939== by 0x8DE4175: ??? >> ==1286939== by 0x8D7CF91: ??? >> ==1286939== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286939== by 0x8DBDD55: ??? >> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286939== by 0x6595102: clone (clone.S:95) >> ==1286939== Uninitialised value was created by a stack allocation >> ==1286939== at 0x9F048D6: ??? >> ==1286939== >> ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from 0) >> mpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x4B85622: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >> ==1286936== by 0x78D4B23: H5FD_open (H5FD.c:733) >> ==1286936== by 0x78B953B: H5F_open (H5Fint.c:1493) >> ==1286936== >> ==1286936== 272 bytes in 44 blocks are definitely lost in loss record 39 of 49 >> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x9FCAEDB: ??? >> ==1286936== by 0x9FE42B2: ??? >> ==1286936== by 0x9FE47BB: ??? >> ==1286936== by 0x9FCDDBF: ??? >> ==1286936== by 0x9FA324A: ??? 
>> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >> ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) >> ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >> ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >> ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >> ==1286936== >> ==1286936== 312 bytes in 1 blocks are still reachable in loss record 40 of 49 >> ==1286936== at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x74E78EB: boost::detail::make_external_thread_data() (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) >> ==1286936== by 0x74E7C74: boost::detail::add_thread_exit_function(boost::detail::thread_exit_function_base*) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) >> ==1286936== by 0x73AFCEA: boost::log::v2_mt_posix::sources::aux::get_severity_level() (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0) >> ==1286936== by 0x5F71A6C: set_value (severity_feature.hpp:135) >> ==1286936== by 0x5F71A6C: open_record_unlocked > > (severity_feature.hpp:252) >> ==1286936== by 0x5F71A6C: open_record > > (basic_logger.hpp:459) >> ==1286936== by 0x5F71A6C: Logger::TraceMessage(std::__cxx11::basic_string, std::allocator >) (logger.cpp:328) >> ==1286936== by 0x5F729C7: Logger::Message(std::__cxx11::basic_string, std::allocator > const&, LogLevel) (logger.cpp:280) >> ==1286936== by 0x5F73CF1: Logger::Timer::Timer(std::__cxx11::basic_string, std::allocator > const&, LogLevel) (logger.cpp:426) >> ==1286936== by 0x15718A: timer (logger.hpp:98) >> ==1286936== by 0x15718A: main (testing_main.cpp:9) >> ==1286936== >> ==1286936== 585 (480 direct, 105 indirect) bytes in 15 blocks are definitely lost in loss record 41 of 49 >> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8E9D3EB: ??? >> ==1286936== by 0x8E9F1C1: ??? >> ==1286936== by 0x8D0578C: ??? >> ==1286936== by 0x8D8605A: ??? >> ==1286936== by 0x8D87FE8: ??? >> ==1286936== by 0x8D88E4D: ??? >> ==1286936== by 0x8D1A767: ??? >> ==1286936== by 0x4B14036: ompi_proc_complete_init_single (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B146C3: ompi_proc_complete_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4BA19A9: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== >> ==1286936== 776 bytes in 32 blocks are indirectly lost in loss record 42 of 49 >> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8DE9816: ??? >> ==1286936== by 0x8DEB1D2: ??? >> ==1286936== by 0x8DEB49A: ??? >> ==1286936== by 0x8DE8B12: ??? >> ==1286936== by 0x8E9D492: ??? >> ==1286936== by 0x8E9F1C1: ??? >> ==1286936== by 0x8D0578C: ??? >> ==1286936== by 0x8D8605A: ??? >> ==1286936== by 0x8D87FE8: ??? >> ==1286936== by 0x8D88E4D: ??? >> ==1286936== by 0x8D1A767: ??? 
>> ==1286936== >> ==1286936== 840 (480 direct, 360 indirect) bytes in 15 blocks are definitely lost in loss record 43 of 49 >> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8E9D3EB: ??? >> ==1286936== by 0x8E9F1C1: ??? >> ==1286936== by 0x8D0578C: ??? >> ==1286936== by 0x8D8605A: ??? >> ==1286936== by 0x8D87FE8: ??? >> ==1286936== by 0x8D88E4D: ??? >> ==1286936== by 0x8D1A5EB: ??? >> ==1286936== by 0x9EF2F00: ??? >> ==1286936== by 0x9EEBF17: ??? >> ==1286936== by 0x9EE2F54: ??? >> ==1286936== by 0x9F1E1FB: ??? >> ==1286936== >> ==1286936== 1,091 (480 direct, 611 indirect) bytes in 15 blocks are definitely lost in loss record 44 of 49 >> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8E9D3EB: ??? >> ==1286936== by 0x8E9F1C1: ??? >> ==1286936== by 0x8D0578C: ??? >> ==1286936== by 0x8D8605A: ??? >> ==1286936== by 0x8D87FE8: ??? >> ==1286936== by 0x8D88E4D: ??? >> ==1286936== by 0x8D1A767: ??? >> ==1286936== by 0x84D4800: ??? >> ==1286936== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >> ==1286936== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== >> ==1286936== 1,344 bytes in 1 blocks are definitely lost in loss record 45 of 49 >> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x9F1CD2D: ??? >> ==1286936== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x9EE3527: ??? >> ==1286936== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286936== by 0x15710D: main (testing_main.cpp:8) >> ==1286936== >> ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 46 of 49 >> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x9F1CC50: ??? >> ==1286936== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x9EE3527: ??? 
>> ==1286936== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286936== by 0x15710D: main (testing_main.cpp:8) >> ==1286936== >> ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 47 of 49 >> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x9F1CCC4: ??? >> ==1286936== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >> ==1286936== by 0x9EE3527: ??? >> ==1286936== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >> ==1286936== by 0x15710D: main (testing_main.cpp:8) >> ==1286936== >> ==1286936== 62,640 bytes in 30 blocks are indirectly lost in loss record 48 of 49 >> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8DE9FA8: ??? >> ==1286936== by 0x8DEB032: ??? >> ==1286936== by 0x8DEB49A: ??? >> ==1286936== by 0x8DE8B12: ??? >> ==1286936== by 0x8E9D492: ??? >> ==1286936== by 0x8E9F1C1: ??? >> ==1286936== by 0x8D0578C: ??? >> ==1286936== by 0x8D8605A: ??? >> ==1286936== by 0x8D87FE8: ??? >> ==1286936== by 0x8D88E4D: ??? >> ==1286936== by 0x8D1A5EB: ??? >> ==1286936== >> ==1286936== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are definitely lost in loss record 49 of 49 >> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8E9D3EB: ??? >> ==1286936== by 0x8E9F1C1: ??? >> ==1286936== by 0x8D0578C: ??? >> ==1286936== by 0x8D8605A: ??? >> ==1286936== by 0x8D87FE8: ??? >> ==1286936== by 0x8D88E4D: ??? >> ==1286936== by 0x8D1A5EB: ??? >> ==1286936== by 0x9F0398A: ??? >> ==1286936== by 0x9EE2F54: ??? >> ==1286936== by 0x9F1E1FB: ??? 
>> ==1286936== by 0x4BA1A09: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== >> ==1286936== LEAK SUMMARY: >> ==1286936== definitely lost: 9,805 bytes in 137 blocks >> ==1286936== indirectly lost: 63,431 bytes in 63 blocks >> ==1286936== possibly lost: 0 bytes in 0 blocks >> ==1286936== still reachable: 1,174 bytes in 27 blocks >> ==1286936== suppressed: 0 bytes in 0 blocks >> ==1286936== >> ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from 0) >> ==1286936== >> ==1286936== 1 errors in context 1 of 29: >> ==1286936== Thread 3: >> ==1286936== Syscall param writev(vector[...]) points to uninitialised byte(s) >> ==1286936== at 0x658A48D: __writev (writev.c:26) >> ==1286936== by 0x658A48D: writev (writev.c:24) >> ==1286936== by 0x8DF9B4C: ??? >> ==1286936== by 0x7CC413E: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286936== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286936== by 0x8DBDD55: ??? >> ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286936== by 0x6595102: clone (clone.S:95) >> ==1286936== Address 0xa290cbf is 127 bytes inside a block of size 5,120 alloc'd >> ==1286936== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x8DE155A: ??? >> ==1286936== by 0x8DE3F4A: ??? >> ==1286936== by 0x8DE4900: ??? >> ==1286936== by 0x8DE4175: ??? >> ==1286936== by 0x8D7CF91: ??? >> ==1286936== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286936== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >> ==1286936== by 0x8DBDD55: ??? >> ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) >> ==1286936== by 0x6595102: clone (clone.S:95) >> ==1286936== Uninitialised value was created by a stack allocation >> ==1286936== at 0x9F048D6: ??? >> ==1286936== >> ==1286936== >> ==1286936== 6 errors in context 2 of 29: >> ==1286936== Thread 1: >> ==1286936== Syscall param pwritev(vector[...]) points to uninitialised byte(s) >> ==1286936== at 0x658A608: pwritev64 (pwritev64.c:30) >> ==1286936== by 0x658A608: pwritev (pwritev64.c:28) >> ==1286936== by 0x9F46E25: ??? >> ==1286936== by 0x9FCE33B: ??? >> ==1286936== by 0x9FCDDBF: ??? >> ==1286936== by 0x9FA324A: ??? 
>> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >> ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >> ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) >> ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >> ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >> ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >> ==1286936== by 0x7B5ED15: H5C__collective_write (H5Cmpio.c:1020) >> ==1286936== by 0x7B5ED15: H5C_apply_candidate_list (H5Cmpio.c:394) >> ==1286936== Address 0xedf91b0 is 96 bytes inside a block of size 216 alloc'd >> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >> ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:292) >> ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:267) >> ==1286936== by 0x77FC8FF: H5C__flush_single_entry (H5C.c:6045) >> ==1286936== by 0x7B5DC7E: H5C__flush_candidates_in_ring (H5Cmpio.c:1371) >> ==1286936== by 0x7B5DC7E: H5C__flush_candidate_entries (H5Cmpio.c:1192) >> ==1286936== by 0x7B5DC7E: H5C_apply_candidate_list (H5Cmpio.c:385) >> ==1286936== by 0x7B5BA18: H5AC__rsp__dist_md_write__flush (H5ACmpio.c:1709) >> ==1286936== by 0x7B5BA18: H5AC__run_sync_point (H5ACmpio.c:2164) >> ==1286936== by 0x7B5C9D2: H5AC__flush_entries (H5ACmpio.c:2307) >> ==1286936== by 0x77C95E4: H5AC_flush (H5AC.c:681) >> ==1286936== by 0x78B306A: H5F__flush_phase2 (H5Fint.c:1831) >> ==1286936== by 0x78B5D7A: H5F__dest (H5Fint.c:1152) >> ==1286936== by 0x78B6603: H5F_try_close (H5Fint.c:2180) >> ==1286936== by 0x78B69F5: H5F__close_cb (H5Fint.c:2009) >> ==1286936== by 0x7965797: H5I_dec_ref (H5I.c:1254) >> ==1286936== Uninitialised value was created by a stack allocation >> ==1286936== at 0x7695AF0: ??? (in /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so) >> ==1286936== >> ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from 0) >> >> On Mon, Aug 24, 2020 at 5:00 PM Jed Brown > wrote: >> Do you potentially have a memory or other resource leak? SIGBUS would be an odd result, but the symptom of crashing after running for a long time sometimes fits with a resource leak. >> >> Mark Lohry > writes: >> >> > I queued up some jobs with Barry's patch, so we'll see. >> > >> > Re Jed's suggestion at checkpointing, I don't *think* this is something >> > coming from the state of the solution -- running from the same point I'm >> > seeing it crash anywhere between 1 hour and 20 hours in. I'll increase my >> > file save frequency in case I'm wrong there though. >> > >> > My intel build with different blas just made it through a 6 hour time slot >> > without crash, whereas yesterday the same thing crashed after 3 hours. But >> > given the randomness so far I'd bet that's just dumb luck. >> > >> > On Mon, Aug 24, 2020 at 4:22 PM Barry Smith > wrote: >> > >> >> >> >> >> >> > On Aug 24, 2020, at 2:34 PM, Jed Brown > wrote: >> >> > >> >> > I'm thinking of something such as writing floating point data into the >> >> return address, which would be unaligned/garbage. >> >> >> >> Ok, my patch will detect this. This is what I was talking about, messing >> >> up the BLAS arguments which are the addresses of arrays. >> >> >> >> Valgrind is by far the preferred approach. >> >> >> >> Barry >> >> >> >> Another feature we could add to the malloc checking is when a SEGV or >> >> BUS error is encountered and we catch it we should run the >> >> PetscMallocVerify() and check our memory for corruption reporting any we >> >> find. 
>> >> >> >> >> >> >> >> > >> >> > Reproducing under Valgrind would help a lot. Perhaps it's possible to >> >> checkpoint such that the breakage can be reproduced more quickly? >> >> > >> >> > Barry Smith > writes: >> >> > >> >> >> https://en.wikipedia.org/wiki/Bus_error < >> >> https://en.wikipedia.org/wiki/Bus_error > >> >> >> >> >> >> But perhaps not true for Intel? >> >> >> >> >> >> >> >> >> >> >> >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley > >> >> wrote: >> >> >>> >> >> >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith > >> bsmith at petsc.dev >> wrote: >> >> >>> >> >> >>> >> >> >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown > >> jed at jedbrown.org >> wrote: >> >> >>>> >> >> >>>> Barry Smith >> writes: >> >> >>>> >> >> >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown > >> jed at jedbrown.org >> wrote: >> >> >>>>>> >> >> >>>>>> Barry Smith >> writes: >> >> >>>>>> >> >> >>>>>>> So if a BLAS errors with SIGBUS then it is always an input error >> >> of just not proper double/complex alignment? Or some other very strange >> >> thing? >> >> >>>>>> >> >> >>>>>> I would suspect memory corruption. >> >> >>>>> >> >> >>>>> >> >> >>>>> Corruption meaning what specifically? >> >> >>>>> >> >> >>>>> The routines crashing are dgemv which only take double precision >> >> arrays, regardless of what garbage is in those arrays i don't think there >> >> can be BUS errors resulting. They don't take integer arrays whose >> >> corruption could result in bad indexing and then BUS errors. >> >> >>>>> >> >> >>>>> So then it can only be corruption of the pointers passed in, correct? >> >> >>>> >> >> >>>> Such as those pointers pointing into data on the stack with incorrect >> >> sizes. >> >> >>> >> >> >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? >> >> >>> >> >> >>> My understanding was that roughly memory errors in the heap are SEGV >> >> and memory errors on the stack are SIGBUS. Is that not true? >> >> >>> >> >> >>> Matt >> >> >>> >> >> >>> -- >> >> >>> What most experimenters take for granted before they begin their >> >> experiments is infinitely more interesting than any results to which their >> >> experiments lead. >> >> >>> -- Norbert Wiener >> >> >>> >> >> >>> https://www.cse.buffalo.edu/~knepley/ < >> >> http://www.cse.buffalo.edu/~knepley/ > >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hostname.patch Type: application/octet-stream Size: 1198 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.appel17 at imperial.ac.uk Thu Aug 27 10:25:40 2020 From: t.appel17 at imperial.ac.uk (Thibaut Appel) Date: Thu, 27 Aug 2020 16:25:40 +0100 Subject: [petsc-users] About KSPConvergedReasonView Message-ID: <9bffed18-91a0-9ba0-54c4-c77b48d976a9@imperial.ac.uk> Dear PETSc users, I found out that (at least in the master branch) that KSPReasonView has been recently deprecated in favor of KSPConvergedReasonView. After changing my application code, I thought I was using the function correctly: ??? CALL KSPGetConvergedReason(ksp,ksp_reason,ierr) ??? CHKERRA(ierr) ??? IF (ksp_reason < 0) THEN ????? CALL KSPConvergedReasonView(ksp,PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_DEFAULT,ierr) ????? CHKERRA(ierr) ??? END IF but I still get the following backtrace Program received signal SIGSEGV, Segmentation fault. 0x00007ffff51dea59 in PetscObjectTypeCompare (obj=0x8, ??? 
type_name=0x7ffff75b5b88 "ascii", same=0x7fffffffd7b8) ??? at /home/Packages/petsc/src/sys/objects/destroy.c:160 160??? ? else if (!type_name || !obj->type_name) *same = PETSC_FALSE; (gdb) bt #0? 0x00007ffff51dea59 in PetscObjectTypeCompare (obj=0x8, ??? type_name=0x7ffff75b5b88 "ascii", same=0x7fffffffd7b8) ??? at /home/Packages/petsc/src/sys/objects/destroy.c:160 #1? 0x00007ffff6ba1d83 in KSPConvergedReasonView (ksp=0x555555bb2510, viewer=0x8, ??? format=PETSC_VIEWER_DEFAULT) ??? at /home/Packages/petsc/src/ksp/ksp/interface/itfunc.c:452 #2? 0x00007ffff6beb37a in kspconvergedreasonview_ ( ??? ksp=0x55555593f3c0 <__solver_MOD_ksp>, viewer=0x7fffffffda50, ??? format=0x5555558fae48, __ierr=0x7fffffffda6c) ??? at /home/Packages/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:295 #3? 0x00005555555e040d in solver::solve_linear_problem (vec_rhs=..., vec_sol=...) ??? at mod_solver.F90:1872 #4? 0x0000555555614453 in solver::solve () at mod_solver.F90:164 #5? 0x00005555555ba3c6 in main () at main.F90:67 #6? 0x00005555555ba437 in main (argc=1, argv=0x7fffffffe17e) at main.F90:3 #7? 0x00007ffff46fa1e3 in __libc_start_main () ?? from /usr/lib/x86_64-linux-gnu/libc.so.6 #8? 0x000055555555cd7e in _start () as if there was a type mismatch. Could anyone pinpoint what's wrong? Thank you, Thibaut From bsmith at petsc.dev Thu Aug 27 10:59:33 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 27 Aug 2020 10:59:33 -0500 Subject: [petsc-users] About KSPConvergedReasonView In-Reply-To: <9bffed18-91a0-9ba0-54c4-c77b48d976a9@imperial.ac.uk> References: <9bffed18-91a0-9ba0-54c4-c77b48d976a9@imperial.ac.uk> Message-ID: <0A6FAFE4-41DE-494A-A8AA-2BF3E84F60D7@petsc.dev> This is probably due to the final argument being a character string which PETSc has difficulty managing by default for Fortran. I just removed the format argument from the call so this problem will gone soon. You could just comment out the call for now and put a print statement in instead. The problem will be fixed as soon as my merge request gets into master. Sorry about this. Barry > On Aug 27, 2020, at 10:25 AM, Thibaut Appel wrote: > > Dear PETSc users, > > I found out that (at least in the master branch) that KSPReasonView has been recently deprecated in favor of KSPConvergedReasonView. > > After changing my application code, I thought I was using the function correctly: > > CALL KSPGetConvergedReason(ksp,ksp_reason,ierr) > CHKERRA(ierr) > > IF (ksp_reason < 0) THEN > > CALL KSPConvergedReasonView(ksp,PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_DEFAULT,ierr) > CHKERRA(ierr) > > END IF > > but I still get the following backtrace > > Program received signal SIGSEGV, Segmentation fault. 
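A minimal sketch of the interim workaround suggested above, assuming the same ksp and ksp_reason variables as in the original snippet (the print format itself is only illustrative, not part of PETSc):

    CALL KSPGetConvergedReason(ksp,ksp_reason,ierr)
    CHKERRA(ierr)
    IF (ksp_reason < 0) THEN
      ! report the failure ourselves instead of calling KSPConvergedReasonView
      PRINT *, 'KSP solve did not converge, KSPConvergedReason = ', ksp_reason
    END IF

Once the fixed interface reaches master, the KSPConvergedReasonView call can simply be restored.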
> 0x00007ffff51dea59 in PetscObjectTypeCompare (obj=0x8, > type_name=0x7ffff75b5b88 "ascii", same=0x7fffffffd7b8) > at /home/Packages/petsc/src/sys/objects/destroy.c:160 > 160 else if (!type_name || !obj->type_name) *same = PETSC_FALSE; > (gdb) bt > #0 0x00007ffff51dea59 in PetscObjectTypeCompare (obj=0x8, > type_name=0x7ffff75b5b88 "ascii", same=0x7fffffffd7b8) > at /home/Packages/petsc/src/sys/objects/destroy.c:160 > #1 0x00007ffff6ba1d83 in KSPConvergedReasonView (ksp=0x555555bb2510, viewer=0x8, > format=PETSC_VIEWER_DEFAULT) > at /home/Packages/petsc/src/ksp/ksp/interface/itfunc.c:452 > #2 0x00007ffff6beb37a in kspconvergedreasonview_ ( > ksp=0x55555593f3c0 <__solver_MOD_ksp>, viewer=0x7fffffffda50, > format=0x5555558fae48, __ierr=0x7fffffffda6c) > at /home/Packages/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:295 > #3 0x00005555555e040d in solver::solve_linear_problem (vec_rhs=..., vec_sol=...) > at mod_solver.F90:1872 > #4 0x0000555555614453 in solver::solve () at mod_solver.F90:164 > #5 0x00005555555ba3c6 in main () at main.F90:67 > #6 0x00005555555ba437 in main (argc=1, argv=0x7fffffffe17e) at main.F90:3 > #7 0x00007ffff46fa1e3 in __libc_start_main () > from /usr/lib/x86_64-linux-gnu/libc.so.6 > #8 0x000055555555cd7e in _start () > > as if there was a type mismatch. Could anyone pinpoint what's wrong? > > Thank you, > > Thibaut > From bourdin at lsu.edu Thu Aug 27 13:50:42 2020 From: bourdin at lsu.edu (Blaise A Bourdin) Date: Thu, 27 Aug 2020 18:50:42 +0000 Subject: [petsc-users] Error building petsc4py using intel compilers and mpi Message-ID: <805D51D3-8F53-4F3E-90A5-478FFFB27A8B@lsu.edu> Hi, I am trying to build firedrake using intel compilers, mpi, and python. I compiled petsc using the following options, and petsc4py is installed by firedrake-install (see the compilation options below) ./configure --with-mpi-dir=$MPI_HOME --CFLAGS='-std=c11 -D_GNU_SOURCE' --CXXFLAGS='' --FFLAGS='' --COPTFLAGS='-O3 -xCASCADELAKE -g' --FOPTFLAGS='-O3 -xCASCADELAKE -g' --CXXOPTFLAGS='-O3 -xCASCADELAKE -g' --with-c++-support --with-c-support --with-fortran --with-fortran-bindings=0 --with-cxx-dialect=C++11 --download-ptscotch --download-hdf5=https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.4/src/hdf5-1.10.4.tar.bz2 --download-hypre --download-superlu_dist --download-metis --download-parmetis --download-scalapack --download-mumps --download-chaco --download-ml --download-eigen=https://github.com/eigenteam/eigen-git-mirror/archive/3.3.3.tar.gz --download-sowing=0 Here is what I get when trying to import petsc4py: (firedrake) mef90:firedrake$ python3 Python 3.7.7 (default, Jun 26 2020, 05:10:03) [GCC 7.3.0] :: Intel(R) Corporation on linux Type "help", "copyright", "credits" or "license" for more information. Intel(R) Distribution for Python is brought to you by Intel Corporation. 
Please check out: https://software.intel.com/en-us/python-distribution >>> import petsc4py >>> petsc4py.init() Traceback (most recent call last): File "", line 1, in File "/home/blaise/Development/firedrake/lib/python3.7/site-packages/petsc4py/__init__.py", line 42, in init PETSc = petsc4py.lib.ImportPETSc(arch) File "/home/blaise/Development/firedrake/lib/python3.7/site-packages/petsc4py/lib/__init__.py", line 29, in ImportPETSc return Import('petsc4py', 'PETSc', path, arch) File "/home/blaise/Development/firedrake/lib/python3.7/site-packages/petsc4py/lib/__init__.py", line 73, in Import module = import_module(pkg, name, path, arch) File "/home/blaise/Development/firedrake/lib/python3.7/site-packages/petsc4py/lib/__init__.py", line 58, in import_module with f: return imp.load_module(fullname, f, fn, info) File "/share/apps/intel-2020.2/intelpython3/lib/python3.7/imp.py", line 242, in load_module return load_dynamic(name, filename, file) File "/share/apps/intel-2020.2/intelpython3/lib/python3.7/imp.py", line 342, in load_dynamic return _load(spec) File "", line 696, in _load File "", line 670, in _load_unlocked File "", line 583, in module_from_spec File "", line 1043, in create_module File "", line 219, in _call_with_frames_removed ImportError: /home/blaise/Development/firedrake/lib/python3.7/site-packages/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake/PETSc.cpython-37m-x86_64-linux-gnu.so: undefined symbol: PetscPartitionerInitializePackage The relevant part of the firedrake-install.log are below. 2020-08-27 12:48:23,861 INFO Installing petsc4py/ 2020-08-27 12:48:23,882 DEBUG Running command '/home/blaise/Development/firedrake/bin/python -m pip install --no-binary mpi4py,randomgen,islpy --no-deps -vvv --ignore-installed petsc4py/' 2020-08-27 12:50:32,017 DEBUG Using pip 20.2.2 from /home/blaise/Development/firedrake/lib/python3.7/site-packages/pip (python 3.7) Non-user install because user site-packages disabled Created temporary directory: /tmp/pip-ephem-wheel-cache-5540399x Created temporary directory: /tmp/pip-req-tracker-jo7ho6yz Initialized build tracking at /tmp/pip-req-tracker-jo7ho6yz Created build tracker: /tmp/pip-req-tracker-jo7ho6yz Entered build tracker: /tmp/pip-req-tracker-jo7ho6yz Created temporary directory: /tmp/pip-install-e3u5z76o Processing ./petsc4py Created temporary directory: /tmp/pip-req-build-r3agmgty Added file:///home/blaise/Development/firedrake/src/petsc4py to build tracker '/tmp/pip-req-tracker-jo7ho6yz' Running setup.py (path:/tmp/pip-req-build-r3agmgty/setup.py) egg_info for package from file:///home/blaise/Development/firedrake/src/petsc4py Created temporary directory: /tmp/pip-pip-egg-info-dctmtwx0 Running command python setup.py egg_info running egg_info creating /tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info writing /tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/PKG-INFO writing dependency_links to /tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/dependency_links.txt writing requirements to /tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/requires.txt writing top-level names to /tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/top_level.txt writing manifest file '/tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/SOURCES.txt' reading manifest file '/tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' writing manifest file '/tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/SOURCES.txt' Source in /tmp/pip-req-build-r3agmgty has version 3.13.0, which satisfies requirement petsc4py==3.13.0 from 
file:///home/blaise/Development/firedrake/src/petsc4py Removed petsc4py==3.13.0 from file:///home/blaise/Development/firedrake/src/petsc4py from build tracker '/tmp/pip-req-tracker-jo7ho6yz' Building wheels for collected packages: petsc4py Created temporary directory: /tmp/pip-wheel-0o0xldo2 Building wheel for petsc4py (setup.py): started Destination directory: /tmp/pip-wheel-0o0xldo2 Running command /home/blaise/Development/firedrake/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-r3agmgty/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-r3agmgty/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-0o0xldo2 running bdist_wheel running build running build_src cythonizing 'petsc4py.PETSc.pyx' -> 'petsc4py.PETSc.c' /home/blaise/Development/firedrake/lib/python3.7/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: include/petsc4py/PETSc.pxd tree = Parsing.p_module(s, pxd, full_module_name) cythonizing 'libpetsc4py/libpetsc4py.pyx' -> 'libpetsc4py/libpetsc4py.c' /home/blaise/Development/firedrake/lib/python3.7/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-req-build-r3agmgty/src/libpetsc4py/libpetsc4py.pyx tree = Parsing.p_module(s, pxd, full_module_name) running build_py creating build creating build/lib.linux-x86_64-3.7 creating build/lib.linux-x86_64-3.7/petsc4py copying src/PETSc.py -> build/lib.linux-x86_64-3.7/petsc4py copying src/__init__.py -> build/lib.linux-x86_64-3.7/petsc4py copying src/__main__.py -> build/lib.linux-x86_64-3.7/petsc4py creating build/lib.linux-x86_64-3.7/petsc4py/lib copying src/lib/__init__.py -> build/lib.linux-x86_64-3.7/petsc4py/lib creating build/lib.linux-x86_64-3.7/petsc4py/include creating build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py copying src/include/petsc4py/numpy.h -> build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py copying src/include/petsc4py/petsc4py.h -> build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py copying src/include/petsc4py/petsc4py.PETSc.h -> build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py copying src/include/petsc4py/petsc4py.PETSc_api.h -> build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py copying src/include/petsc4py/petsc4py.i -> build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py copying src/include/petsc4py/PETSc.pxd -> build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py copying src/include/petsc4py/__init__.pxd -> build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py copying src/include/petsc4py/__init__.pyx -> build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py copying src/PETSc.pxd -> build/lib.linux-x86_64-3.7/petsc4py copying src/lib/petsc.cfg -> build/lib.linux-x86_64-3.7/petsc4py/lib running build_ext PETSC_DIR: /opt/HPC/petsc-maint PETSC_ARCH: RHEL7-intel2020.2-impi-firedrake version: 3.13.4 release integer-size: 32-bit scalar-type: real precision: double language: CONLY compiler: /share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/bin/mpiicc linker: /share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/bin/mpiicc building 'PETSc' extension creating build/temp.linux-x86_64-3.7 creating 
build/temp.linux-x86_64-3.7/RHEL7-intel2020.2-impi-firedrake creating build/temp.linux-x86_64-3.7/RHEL7-intel2020.2-impi-firedrake/src /share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/bin/mpiicc -pthread -B /share/apps/intel-2020.2/intelpython3/compiler_compat -Wl,--sysroot=/ -std=c11 -D_GNU_SOURCE -fPIC -O3 -xCASCADELAKE -g -fPIC -L/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/lib -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include -I/share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/include -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include/eigen3 -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include -I/opt/HPC/petsc-maint/include -Isrc/include -I/home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include -I/home/blaise/Development/firedrake/include -I/share/apps/intel-2020.2/intelpython3/include/python3.7m -c src/PETSc.c -o build/temp.linux-x86_64-3.7/RHEL7-intel2020.2-impi-firedrake/src/PETSc.o In file included from /home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h(12), from /home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h(4), from src/include/petsc4py/numpy.h(11), from src/petsc4py.PETSc.c(612), from src/PETSc.c(4): /home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h(84): warning #2650: attributes ignored here NPY_CHAR NPY_ATTR_DEPRECATE("Use NPY_STRING"), ^ In file included from src/petsc4py.PETSc.c(619), from src/PETSc.c(4): src/include/initpkg.h(23): warning #266: function "PetscPartitionerInitializePackage" declared implicitly ierr = PetscPartitionerInitializePackage();CHKERRQ(ierr); ^ In file included from src/PETSc.c(4): src/petsc4py.PETSc.c(294008): warning #266: function "PetscPartitionerReset" declared implicitly __pyx_t_1 = __pyx_f_8petsc4py_5PETSc_CHKERR(PetscPartitionerReset(__pyx_v_self->part)); if (unlikely(__pyx_t_1 == ((int)-1))) __PYX_ERR(54, 55, __pyx_L1_error) ^ /share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/bin/mpiicc -pthread -B /share/apps/intel-2020.2/intelpython3/compiler_compat -Wl,--sysroot=/ -std=c11 -D_GNU_SOURCE -fPIC -O3 -xCASCADELAKE -g -fPIC -L/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/lib -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include -I/share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/include -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include/eigen3 -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include -I/opt/HPC/petsc-maint/include -Isrc/include -I/home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include -I/home/blaise/Development/firedrake/include -I/share/apps/intel-2020.2/intelpython3/include/python3.7m -c src/libpetsc4py.c -o build/temp.linux-x86_64-3.7/RHEL7-intel2020.2-impi-firedrake/src/libpetsc4py.o In file included from /home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h(12), from /home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h(4), from src/include/petsc4py/numpy.h(11), from src/libpetsc4py/libpetsc4py.c(612), from src/libpetsc4py.c(6): /home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h(84): warning #2650: attributes ignored here NPY_CHAR NPY_ATTR_DEPRECATE("Use NPY_STRING"), ^ creating 
build/lib.linux-x86_64-3.7/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake /share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/bin/mpiicc -pthread -B /share/apps/intel-2020.2/intelpython3/compiler_compat -Wl,--sysroot=/ -std=c11 -D_GNU_SOURCE -fPIC -O3 -xCASCADELAKE -g -shared -L/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/lib -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include -L/share/apps/intel-2020.2/intelpython3/lib -Wl,-rpath=/share/apps/intel-2020.2/intelpython3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/RHEL7-intel2020.2-impi-firedrake/src/PETSc.o build/temp.linux-x86_64-3.7/RHEL7-intel2020.2-impi-firedrake/src/libpetsc4py.o -L/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/lib -Wl,-rpath,/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/lib -lpetsc -o build/lib.linux-x86_64-3.7/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake/PETSc.cpython-37m-x86_64-linux-gnu.so writing build/lib.linux-x86_64-3.7/petsc4py/lib/petsc.cfg installing to build/bdist.linux-x86_64/wheel running install running install_lib creating build/bdist.linux-x86_64 creating build/bdist.linux-x86_64/wheel creating build/bdist.linux-x86_64/wheel/petsc4py copying build/lib.linux-x86_64-3.7/petsc4py/PETSc.py -> build/bdist.linux-x86_64/wheel/petsc4py copying build/lib.linux-x86_64-3.7/petsc4py/__init__.py -> build/bdist.linux-x86_64/wheel/petsc4py copying build/lib.linux-x86_64-3.7/petsc4py/__main__.py -> build/bdist.linux-x86_64/wheel/petsc4py creating build/bdist.linux-x86_64/wheel/petsc4py/lib copying build/lib.linux-x86_64-3.7/petsc4py/lib/__init__.py -> build/bdist.linux-x86_64/wheel/petsc4py/lib copying build/lib.linux-x86_64-3.7/petsc4py/lib/petsc.cfg -> build/bdist.linux-x86_64/wheel/petsc4py/lib creating build/bdist.linux-x86_64/wheel/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake copying build/lib.linux-x86_64-3.7/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake/PETSc.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake creating build/bdist.linux-x86_64/wheel/petsc4py/include creating build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py copying build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/numpy.h -> build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py copying build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/petsc4py.h -> build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py copying build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/petsc4py.PETSc.h -> build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py copying build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/petsc4py.PETSc_api.h -> build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py copying build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/petsc4py.i -> build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py copying build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/PETSc.pxd -> build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py copying build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/__init__.pxd -> build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py copying build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/__init__.pyx -> build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py copying build/lib.linux-x86_64-3.7/petsc4py/PETSc.pxd -> build/bdist.linux-x86_64/wheel/petsc4py running install_egg_info running egg_info creating petsc4py.egg-info writing petsc4py.egg-info/PKG-INFO writing dependency_links to petsc4py.egg-info/dependency_links.txt writing requirements to 
petsc4py.egg-info/requires.txt writing top-level names to petsc4py.egg-info/top_level.txt writing manifest file 'petsc4py.egg-info/SOURCES.txt' reading manifest file 'petsc4py.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' writing manifest file 'petsc4py.egg-info/SOURCES.txt' Copying petsc4py.egg-info to build/bdist.linux-x86_64/wheel/petsc4py-3.13.0-py3.7.egg-info running install_scripts adding license file "LICENSE.rst" (matched pattern "LICEN[CS]E*") creating build/bdist.linux-x86_64/wheel/petsc4py-3.13.0.dist-info/WHEEL creating '/tmp/pip-wheel-0o0xldo2/petsc4py-3.13.0-cp37-cp37m-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it adding 'petsc4py/PETSc.pxd' adding 'petsc4py/PETSc.py' adding 'petsc4py/__init__.py' adding 'petsc4py/__main__.py' adding 'petsc4py/include/petsc4py/PETSc.pxd' adding 'petsc4py/include/petsc4py/__init__.pxd' adding 'petsc4py/include/petsc4py/__init__.pyx' adding 'petsc4py/include/petsc4py/numpy.h' adding 'petsc4py/include/petsc4py/petsc4py.PETSc.h' adding 'petsc4py/include/petsc4py/petsc4py.PETSc_api.h' adding 'petsc4py/include/petsc4py/petsc4py.h' adding 'petsc4py/include/petsc4py/petsc4py.i' adding 'petsc4py/lib/__init__.py' adding 'petsc4py/lib/petsc.cfg' adding 'petsc4py/lib/RHEL7-intel2020.2-impi-firedrake/PETSc.cpython-37m-x86_64-linux-gnu.so' adding 'petsc4py-3.13.0.dist-info/LICENSE.rst' adding 'petsc4py-3.13.0.dist-info/METADATA' adding 'petsc4py-3.13.0.dist-info/WHEEL' adding 'petsc4py-3.13.0.dist-info/top_level.txt' adding 'petsc4py-3.13.0.dist-info/RECORD' removing build/bdist.linux-x86_64/wheel Building wheel for petsc4py (setup.py): finished with status 'done' Created wheel for petsc4py: filename=petsc4py-3.13.0-cp37-cp37m-linux_x86_64.whl size=4287046 sha256=95604a089946004038f4268d38948c35a5372a2cedaa9df21e354d68231cb99a Stored in directory: /tmp/pip-ephem-wheel-cache-5540399x/wheels/88/a6/a9/78aff17157d1fedc3a047afc3f1c84462e68ea89775758cd2e Successfully built petsc4py Installing collected packages: petsc4py Successfully installed petsc4py-3.13.0 Removed build tracker: '/tmp/pip-req-tracker-jo7ho6yz' Any idea? Blaise -- A.K. & Shirley Barton Professor of Mathematics Adjunct Professor of Mechanical Engineering Adjunct of the Center for Computation & Technology Louisiana State University, Lockett Hall Room 344, Baton Rouge, LA 70803, USA Tel. +1 (225) 578 1612, Fax +1 (225) 578 4276 Web http://www.math.lsu.edu/~bourdin -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Aug 27 13:55:36 2020 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 27 Aug 2020 14:55:36 -0400 Subject: [petsc-users] Error building petsc4py using intel compilers and mpi In-Reply-To: <805D51D3-8F53-4F3E-90A5-478FFFB27A8B@lsu.edu> References: <805D51D3-8F53-4F3E-90A5-478FFFB27A8B@lsu.edu> Message-ID: On Thu, Aug 27, 2020 at 2:50 PM Blaise A Bourdin wrote: > Hi, > > I am trying to build firedrake using intel compilers, mpi, and python. 
I > compiled petsc using the following options, and petsc4py is installed by > firedrake-install (see the compilation options below) > > ./configure --with-mpi-dir=$MPI_HOME --CFLAGS='-std=c11 > -D_GNU_SOURCE' --CXXFLAGS='' --FFLAGS='' --COPTFLAGS='-O3 > -xCASCADELAKE -g' --FOPTFLAGS='-O3 -xCASCADELAKE -g' > --CXXOPTFLAGS='-O3 -xCASCADELAKE -g' --with-c++-support > --with-c-support --with-fortran --with-fortran-bindings=0 > --with-cxx-dialect=C++11 --download-ptscotch --download-hdf5= > https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.4/src/hdf5-1.10.4.tar.bz2 > --download-hypre --download-superlu_dist --download-metis > --download-parmetis --download-scalapack --download-mumps > --download-chaco --download-ml --download-eigen= > https://github.com/eigenteam/eigen-git-mirror/archive/3.3.3.tar.gz > --download-sowing=0 > > > > Here is what I get when trying to import petsc4py: > > (firedrake) mef90:firedrake$ python3 > Python 3.7.7 (default, Jun 26 2020, 05:10:03) > [GCC 7.3.0] :: Intel(R) Corporation on linux > Type "help", "copyright", "credits" or "license" for more information. > Intel(R) Distribution for Python is brought to you by Intel Corporation. > Please check out: https://software.intel.com/en-us/python-distribution > >>> import petsc4py > >>> petsc4py.init() > Traceback (most recent call last): > File "", line 1, in > File > "/home/blaise/Development/firedrake/lib/python3.7/site-packages/petsc4py/__init__.py", > line 42, in init > PETSc = petsc4py.lib.ImportPETSc(arch) > File > "/home/blaise/Development/firedrake/lib/python3.7/site-packages/petsc4py/lib/__init__.py", > line 29, in ImportPETSc > return Import('petsc4py', 'PETSc', path, arch) > File > "/home/blaise/Development/firedrake/lib/python3.7/site-packages/petsc4py/lib/__init__.py", > line 73, in Import > module = import_module(pkg, name, path, arch) > File > "/home/blaise/Development/firedrake/lib/python3.7/site-packages/petsc4py/lib/__init__.py", > line 58, in import_module > with f: return imp.load_module(fullname, f, fn, info) > File "/share/apps/intel-2020.2/intelpython3/lib/python3.7/imp.py", line > 242, in load_module > return load_dynamic(name, filename, file) > File "/share/apps/intel-2020.2/intelpython3/lib/python3.7/imp.py", line > 342, in load_dynamic > return _load(spec) > File "", line 696, in _load > File "", line 670, in _load_unlocked > File "", line 583, in module_from_spec > File "", line 1043, in > create_module > File "", line 219, in > _call_with_frames_removed > ImportError: > /home/blaise/Development/firedrake/lib/python3.7/site-packages/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake/ > PETSc.cpython-37m-x86_64-linux-gnu.so: undefined symbol: > PetscPartitionerInitializePackage > > I have this symbol: master $:/PETSc3/petsc/petsc-dev$ nm -o arch-master-debug/lib/libpetsc.dylib | grep PetscPartitionerInitializePackage arch-master-debug/lib/libpetsc.dylib: 0000000000e07f90 T _PetscPartitionerInitializePackage Can you run nm -o /home/blaise/Development/firedrake/lib/python3.7/site-packages/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake/ PETSc.cpython-37m-x86_64-linux-gnu.so | grep PetscPartitionerInitializePackage Thanks, Matt > The relevant part of the firedrake-install.log are below. 
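Independently of the log that follows, another quick sanity check is to dlopen() the PETSc shared library directly and look the symbol up at run time. This is only a generic sketch (the library path is a placeholder, nothing specific to this firedrake build):

#include <dlfcn.h>
#include <stdio.h>

/* usage: ./check_symbol /path/to/libpetsc.so PetscPartitionerInitializePackage */
int main(int argc, char **argv)
{
  void *handle, *sym;

  if (argc != 3) { fprintf(stderr, "usage: %s library.so symbol\n", argv[0]); return 2; }
  handle = dlopen(argv[1], RTLD_NOW | RTLD_GLOBAL);   /* resolve all symbols now */
  if (!handle) { fprintf(stderr, "dlopen failed: %s\n", dlerror()); return 1; }
  sym = dlsym(handle, argv[2]);                       /* NULL if the symbol is absent */
  printf("%s %s in %s\n", argv[2], sym ? "found" : "NOT found", argv[1]);
  dlclose(handle);
  return sym ? 0 : 1;
}

Compile with something like "cc check_symbol.c -o check_symbol -ldl". If the symbol shows up in the libpetsc you just built but the import still fails, that would suggest the extension module is being resolved against a different (older) libpetsc at load time.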
> > > 2020-08-27 12:48:23,861 INFO Installing petsc4py/ > 2020-08-27 12:48:23,882 DEBUG Running command > '/home/blaise/Development/firedrake/bin/python -m pip install > --no-binary mpi4py,randomgen,islpy --no-deps -vvv --ignore-installed > petsc4py/' > 2020-08-27 12:50:32,017 DEBUG Using pip 20.2.2 from > /home/blaise/Development/firedrake/lib/python3.7/site-packages/pip (python > 3.7) > Non-user install because user site-packages disabled > Created temporary directory: /tmp/pip-ephem-wheel-cache-5540399x > Created temporary directory: /tmp/pip-req-tracker-jo7ho6yz > Initialized build tracking at /tmp/pip-req-tracker-jo7ho6yz > Created build tracker: /tmp/pip-req-tracker-jo7ho6yz > Entered build tracker: /tmp/pip-req-tracker-jo7ho6yz > Created temporary directory: /tmp/pip-install-e3u5z76o > Processing ./petsc4py > Created temporary directory: /tmp/pip-req-build-r3agmgty > Added file:///home/blaise/Development/firedrake/src/petsc4py to build > tracker '/tmp/pip-req-tracker-jo7ho6yz' > Running setup.py (path:/tmp/pip-req-build-r3agmgty/setup.py) egg_info > for package from file:///home/blaise/Development/firedrake/src/petsc4py > Created temporary directory: /tmp/pip-pip-egg-info-dctmtwx0 > Running command python setup.py egg_info > running egg_info > creating /tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info > writing /tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/PKG-INFO > writing dependency_links to > /tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/dependency_links.txt > writing requirements to > /tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/requires.txt > writing top-level names to > /tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/top_level.txt > writing manifest file > '/tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/SOURCES.txt' > reading manifest file > '/tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/SOURCES.txt' > reading manifest template 'MANIFEST.in' > writing manifest file > '/tmp/pip-pip-egg-info-dctmtwx0/petsc4py.egg-info/SOURCES.txt' > Source in /tmp/pip-req-build-r3agmgty has version 3.13.0, which > satisfies requirement petsc4py==3.13.0 from > file:///home/blaise/Development/firedrake/src/petsc4py > Removed petsc4py==3.13.0 from > file:///home/blaise/Development/firedrake/src/petsc4py from build tracker > '/tmp/pip-req-tracker-jo7ho6yz' > Building wheels for collected packages: petsc4py > Created temporary directory: /tmp/pip-wheel-0o0xldo2 > Building wheel for petsc4py (setup.py): started > Destination directory: /tmp/pip-wheel-0o0xldo2 > Running command /home/blaise/Development/firedrake/bin/python -u -c > 'import sys, setuptools, tokenize; sys.argv[0] > = '"'"'/tmp/pip-req-build-r3agmgty/setup.py'"'"'; > __file__='"'"'/tmp/pip-req-build-r3agmgty/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', > open)(__file__);code=f.read().replace('"'"'\r\n'"'"', > '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' > bdist_wheel -d /tmp/pip-wheel-0o0xldo2 > running bdist_wheel > running build > running build_src > cythonizing 'petsc4py.PETSc.pyx' -> 'petsc4py.PETSc.c' > > /home/blaise/Development/firedrake/lib/python3.7/site-packages/Cython/Compiler/Main.py:369: > FutureWarning: Cython directive 'language_level' not set, using 2 for now > (Py2). This will change in a later release! 
File: include/petsc4py/PETSc.pxd > tree = Parsing.p_module(s, pxd, full_module_name) > cythonizing 'libpetsc4py/libpetsc4py.pyx' -> 'libpetsc4py/libpetsc4py.c' > > /home/blaise/Development/firedrake/lib/python3.7/site-packages/Cython/Compiler/Main.py:369: > FutureWarning: Cython directive 'language_level' not set, using 2 for now > (Py2). This will change in a later release! File: > /tmp/pip-req-build-r3agmgty/src/libpetsc4py/libpetsc4py.pyx > tree = Parsing.p_module(s, pxd, full_module_name) > running build_py > creating build > creating build/lib.linux-x86_64-3.7 > creating build/lib.linux-x86_64-3.7/petsc4py > copying src/PETSc.py -> build/lib.linux-x86_64-3.7/petsc4py > copying src/__init__.py -> build/lib.linux-x86_64-3.7/petsc4py > copying src/__main__.py -> build/lib.linux-x86_64-3.7/petsc4py > creating build/lib.linux-x86_64-3.7/petsc4py/lib > copying src/lib/__init__.py -> build/lib.linux-x86_64-3.7/petsc4py/lib > creating build/lib.linux-x86_64-3.7/petsc4py/include > creating build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py > copying src/include/petsc4py/numpy.h -> > build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py > copying src/include/petsc4py/petsc4py.h -> > build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py > copying src/include/petsc4py/petsc4py.PETSc.h -> > build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py > copying src/include/petsc4py/petsc4py.PETSc_api.h -> > build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py > copying src/include/petsc4py/petsc4py.i -> > build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py > copying src/include/petsc4py/PETSc.pxd -> > build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py > copying src/include/petsc4py/__init__.pxd -> > build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py > copying src/include/petsc4py/__init__.pyx -> > build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py > copying src/PETSc.pxd -> build/lib.linux-x86_64-3.7/petsc4py > copying src/lib/petsc.cfg -> build/lib.linux-x86_64-3.7/petsc4py/lib > running build_ext > PETSC_DIR: /opt/HPC/petsc-maint > PETSC_ARCH: RHEL7-intel2020.2-impi-firedrake > version: 3.13.4 release > integer-size: 32-bit > scalar-type: real > precision: double > language: CONLY > compiler: > /share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/bin/mpiicc > linker: > /share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/bin/mpiicc > building 'PETSc' extension > creating build/temp.linux-x86_64-3.7 > creating build/temp.linux-x86_64-3.7/RHEL7-intel2020.2-impi-firedrake > creating build/temp.linux-x86_64-3.7/RHEL7-intel2020.2-impi-firedrake/src > > /share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/bin/mpiicc > -pthread -B /share/apps/intel-2020.2/intelpython3/compiler_compat > -Wl,--sysroot=/ -std=c11 -D_GNU_SOURCE -fPIC -O3 -xCASCADELAKE -g -fPIC > -L/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/lib > -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include > -I/share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/include > -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include/eigen3 > -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include > -I/opt/HPC/petsc-maint/include -Isrc/include > -I/home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include > -I/home/blaise/Development/firedrake/include > -I/share/apps/intel-2020.2/intelpython3/include/python3.7m -c src/PETSc.c > -o build/temp.linux-x86_64-3.7/RHEL7-intel2020.2-impi-firedrake/src/PETSc.o > In file included from > 
/home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h(12), > from > /home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h(4), > from src/include/petsc4py/numpy.h(11), > from src/petsc4py.PETSc.c(612), > from src/PETSc.c(4): > > /home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h(84): > warning #2650: attributes ignored here > NPY_CHAR NPY_ATTR_DEPRECATE("Use NPY_STRING"), > ^ > > In file included from src/petsc4py.PETSc.c(619), > from src/PETSc.c(4): > src/include/initpkg.h(23): warning #266: function > "PetscPartitionerInitializePackage" declared implicitly > ierr = PetscPartitionerInitializePackage();CHKERRQ(ierr); > ^ > > In file included from src/PETSc.c(4): > src/petsc4py.PETSc.c(294008): warning #266: function > "PetscPartitionerReset" declared implicitly > __pyx_t_1 = > __pyx_f_8petsc4py_5PETSc_CHKERR(PetscPartitionerReset(__pyx_v_self->part)); > if (unlikely(__pyx_t_1 == ((int)-1))) __PYX_ERR(54, 55, __pyx_L1_error) > ^ > > > /share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/bin/mpiicc > -pthread -B /share/apps/intel-2020.2/intelpython3/compiler_compat > -Wl,--sysroot=/ -std=c11 -D_GNU_SOURCE -fPIC -O3 -xCASCADELAKE -g -fPIC > -L/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/lib > -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include > -I/share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/include > -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include/eigen3 > -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include > -I/opt/HPC/petsc-maint/include -Isrc/include > -I/home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include > -I/home/blaise/Development/firedrake/include > -I/share/apps/intel-2020.2/intelpython3/include/python3.7m -c > src/libpetsc4py.c > -o build/temp.linux-x86_64-3.7/RHEL7-intel2020.2-impi-firedrake/src/libpetsc4py.o > In file included from > /home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h(12), > from > /home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h(4), > from src/include/petsc4py/numpy.h(11), > from src/libpetsc4py/libpetsc4py.c(612), > from src/libpetsc4py.c(6): > > /home/blaise/Development/firedrake/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h(84): > warning #2650: attributes ignored here > NPY_CHAR NPY_ATTR_DEPRECATE("Use NPY_STRING"), > ^ > > creating > build/lib.linux-x86_64-3.7/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake > > /share/apps/intel-2020.2/compilers_and_libraries/linux/mpi/intel64/bin/mpiicc > -pthread -B /share/apps/intel-2020.2/intelpython3/compiler_compat > -Wl,--sysroot=/ -std=c11 -D_GNU_SOURCE -fPIC -O3 -xCASCADELAKE -g -shared > -L/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/lib > -I/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/include > -L/share/apps/intel-2020.2/intelpython3/lib > -Wl,-rpath=/share/apps/intel-2020.2/intelpython3/lib -Wl,--no-as-needed > -Wl,--sysroot=/ > build/temp.linux-x86_64-3.7/RHEL7-intel2020.2-impi-firedrake/src/PETSc.o > build/temp.linux-x86_64-3.7/RHEL7-intel2020.2-impi-firedrake/src/libpetsc4py.o > -L/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/lib > -Wl,-rpath,/opt/HPC/petsc-maint/RHEL7-intel2020.2-impi-firedrake/lib > -lpetsc -o > build/lib.linux-x86_64-3.7/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake/ > 
PETSc.cpython-37m-x86_64-linux-gnu.so > writing build/lib.linux-x86_64-3.7/petsc4py/lib/petsc.cfg > installing to build/bdist.linux-x86_64/wheel > running install > running install_lib > creating build/bdist.linux-x86_64 > creating build/bdist.linux-x86_64/wheel > creating build/bdist.linux-x86_64/wheel/petsc4py > copying build/lib.linux-x86_64-3.7/petsc4py/PETSc.py -> > build/bdist.linux-x86_64/wheel/petsc4py > copying build/lib.linux-x86_64-3.7/petsc4py/__init__.py -> > build/bdist.linux-x86_64/wheel/petsc4py > copying build/lib.linux-x86_64-3.7/petsc4py/__main__.py -> > build/bdist.linux-x86_64/wheel/petsc4py > creating build/bdist.linux-x86_64/wheel/petsc4py/lib > copying build/lib.linux-x86_64-3.7/petsc4py/lib/__init__.py -> > build/bdist.linux-x86_64/wheel/petsc4py/lib > copying build/lib.linux-x86_64-3.7/petsc4py/lib/petsc.cfg -> > build/bdist.linux-x86_64/wheel/petsc4py/lib > creating > build/bdist.linux-x86_64/wheel/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake > copying > build/lib.linux-x86_64-3.7/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake/ > PETSc.cpython-37m-x86_64-linux-gnu.so > -> build/bdist.linux-x86_64/wheel/petsc4py/lib/RHEL7-intel2020.2-impi-firedrake > creating build/bdist.linux-x86_64/wheel/petsc4py/include > creating build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py > copying build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/numpy.h -> > build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py > copying build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/petsc4py.h > -> build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py > copying > build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/petsc4py.PETSc.h -> > build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py > copying > build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/petsc4py.PETSc_api.h > -> build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py > copying build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/petsc4py.i > -> build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py > copying build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/PETSc.pxd > -> build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py > copying > build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/__init__.pxd -> > build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py > copying > build/lib.linux-x86_64-3.7/petsc4py/include/petsc4py/__init__.pyx -> > build/bdist.linux-x86_64/wheel/petsc4py/include/petsc4py > copying build/lib.linux-x86_64-3.7/petsc4py/PETSc.pxd -> > build/bdist.linux-x86_64/wheel/petsc4py > running install_egg_info > running egg_info > creating petsc4py.egg-info > writing petsc4py.egg-info/PKG-INFO > writing dependency_links to petsc4py.egg-info/dependency_links.txt > writing requirements to petsc4py.egg-info/requires.txt > writing top-level names to petsc4py.egg-info/top_level.txt > writing manifest file 'petsc4py.egg-info/SOURCES.txt' > reading manifest file 'petsc4py.egg-info/SOURCES.txt' > reading manifest template 'MANIFEST.in' > writing manifest file 'petsc4py.egg-info/SOURCES.txt' > Copying petsc4py.egg-info to > build/bdist.linux-x86_64/wheel/petsc4py-3.13.0-py3.7.egg-info > running install_scripts > adding license file "LICENSE.rst" (matched pattern "LICEN[CS]E*") > creating build/bdist.linux-x86_64/wheel/petsc4py-3.13.0.dist-info/WHEEL > creating > '/tmp/pip-wheel-0o0xldo2/petsc4py-3.13.0-cp37-cp37m-linux_x86_64.whl' and > adding 'build/bdist.linux-x86_64/wheel' to it > adding 'petsc4py/PETSc.pxd' > adding 'petsc4py/PETSc.py' > adding 'petsc4py/__init__.py' > adding 
'petsc4py/__main__.py' > adding 'petsc4py/include/petsc4py/PETSc.pxd' > adding 'petsc4py/include/petsc4py/__init__.pxd' > adding 'petsc4py/include/petsc4py/__init__.pyx' > adding 'petsc4py/include/petsc4py/numpy.h' > adding 'petsc4py/include/petsc4py/petsc4py.PETSc.h' > adding 'petsc4py/include/petsc4py/petsc4py.PETSc_api.h' > adding 'petsc4py/include/petsc4py/petsc4py.h' > adding 'petsc4py/include/petsc4py/petsc4py.i' > adding 'petsc4py/lib/__init__.py' > adding 'petsc4py/lib/petsc.cfg' > adding 'petsc4py/lib/RHEL7-intel2020.2-impi-firedrake/ > PETSc.cpython-37m-x86_64-linux-gnu.so' > adding 'petsc4py-3.13.0.dist-info/LICENSE.rst' > adding 'petsc4py-3.13.0.dist-info/METADATA' > adding 'petsc4py-3.13.0.dist-info/WHEEL' > adding 'petsc4py-3.13.0.dist-info/top_level.txt' > adding 'petsc4py-3.13.0.dist-info/RECORD' > removing build/bdist.linux-x86_64/wheel > Building wheel for petsc4py (setup.py): finished with status 'done' > Created wheel for petsc4py: > filename=petsc4py-3.13.0-cp37-cp37m-linux_x86_64.whl > size=4287046 sha256=95604a089946004038f4268d38948c35a5372a2cedaa9df21e354d68231cb99a > Stored in directory: > /tmp/pip-ephem-wheel-cache-5540399x/wheels/88/a6/a9/78aff17157d1fedc3a047afc3f1c84462e68ea89775758cd2e > Successfully built petsc4py > Installing collected packages: petsc4py > > Successfully installed petsc4py-3.13.0 > Removed build tracker: '/tmp/pip-req-tracker-jo7ho6yz' > > > > Any idea? > > Blaise > > > -- > A.K. & Shirley Barton Professor of Mathematics > Adjunct Professor of Mechanical Engineering > Adjunct of the Center for Computation & Technology > Louisiana State University, Lockett Hall Room 344, Baton Rouge, LA 70803, > USA > Tel. +1 (225) 578 1612, Fax +1 (225) 578 4276 Web > http://www.math.lsu.edu/~bourdin > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlohry at gmail.com Thu Aug 27 15:26:10 2020 From: mlohry at gmail.com (Mark Lohry) Date: Thu, 27 Aug 2020 16:26:10 -0400 Subject: [petsc-users] Bus Error In-Reply-To: <386068BC-7972-455E-A9E8-C09F9DCF58BD@petsc.dev> References: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> <87a6yjg3o5.fsf@jedbrown.org> <9EEB2628-D6ED-4466-A629-33EAC73BCE4C@petsc.dev> <386068BC-7972-455E-A9E8-C09F9DCF58BD@petsc.dev> Message-ID: Alright, this time it crashed with a bus error before petsc had even been initialized or anything in blas had ever been called. I'm told there was also a known network failure on this cluster a few days ago that took out one rack, so now I'm reasonably convinced there are legitimate hardware faults elsewhere. Looking like a wild goose chase on the software side, but all the help is hugely appreciated. On Thu, Aug 27, 2020 at 10:52 AM Barry Smith wrote: > > Thanks, > > So this means that all the double precision array pointers that PETSc is > passing into these BLAS calls are addressable. 
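(As an aside, the kind of addressability check described just below, reading the first entry of each array under a temporary SIGBUS/SIGSEGV handler right before the BLAS call, can be sketched roughly as follows. This is only a generic illustration of the mechanism, not the actual patch:

#include <setjmp.h>
#include <signal.h>

static sigjmp_buf probe_env;

static void probe_handler(int sig)
{
  siglongjmp(probe_env, sig);              /* bail out of the faulting read */
}

/* return 0 if *p can be read, otherwise the signal number the read raised */
static int probe_double(const double *p)
{
  struct sigaction sa, old_bus, old_segv;
  volatile double dummy = 0.0;
  int sig;

  sa.sa_handler = probe_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  sigaction(SIGBUS, &sa, &old_bus);        /* install temporary handlers */
  sigaction(SIGSEGV, &sa, &old_segv);

  sig = sigsetjmp(probe_env, 1);
  if (!sig) dummy = p[0];                  /* the actual probe */
  (void)dummy;

  sigaction(SIGBUS, &old_bus, NULL);       /* restore whatever was there before */
  sigaction(SIGSEGV, &old_segv, NULL);
  return sig;
}

If one of the pointers handed to the BLAS routine were not addressable, a probe like this would trip immediately before the call instead of crashing somewhere inside gemv, which is why a silent run says the first entries are all readable.)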
Which means nothing has > corrupted any of these pointers before the calls. > > What my patch did. Before each BLAS call, for each double array argument > it set a special exception handler and then accessed the first entry in the > array. Since the exception handler was never called this means that the > first entry of each array was accessible and would not produce a SEGV or > SIGBUS. > > What else could be corrupted. > > 1) the size arguments passed to the BLAS calls, if they were too large > they could result in accessing incorrect memory but IMHO that would usually > produce a SEGV not a SIGBUS. It is hard to put a check in the code because > these sizes are problem dependent and there is no way to know if they are > wrong. > > 2) corruption of the stack? > > 3) hardware issue due to overheating or bad memory etc. I assume the MPI > rank that crashes changes for each crashing run. I am adding code to our > patch branch to print the node name that hopefully is constant for all > runs, then one can see if the problem is always on the same node. Patch > attached > > > Can you try with a very different BLAS implementation? What are you > using now? > > For example you could configure PETSc with --download-f2cblaslapack or > if you are using MKL switch to non-MKL, or if you are using the system BLAS > switch to MKL. > > Barry > > We can also replace the BLAS calls with direct C and see what happens but > let's only do that after you try a different BLAS. > > > > > > On Aug 27, 2020, at 8:53 AM, Mark Lohry wrote: > > It was built with --with-debugging=1 > > On Thu, Aug 27, 2020 at 9:44 AM Barry Smith wrote: > >> >> Mark, >> >> Did i tell you that this has to be built with the configure option >> --with-debugging=1 and won't be turned off with --with-debugging=0 ? >> >> Barry >> >> >> On Aug 27, 2020, at 8:10 AM, Mark Lohry wrote: >> >> Barry, no output from that patch i'm afraid: >> >> 54 KSP Residual norm 3.215013886664e+03 >> 55 KSP Residual norm 3.049105434513e+03 >> 56 KSP Residual norm 2.859123916860e+03 >> [929]PETSC ERROR: >> ------------------------------------------------------------------------ >> [929]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal >> memory access >> [929]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> [929]PETSC ERROR: or see >> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> [929]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac >> OS X to find memory corruption errors >> [929]PETSC ERROR: likely location of problem given in stack below >> [929]PETSC ERROR: --------------------- Stack Frames >> ------------------------------------ >> [929]PETSC ERROR: Note: The EXACT line numbers in the stack are not >> available, >> [929]PETSC ERROR: INSTEAD the line number of the start of the >> function >> [929]PETSC ERROR: is given. 
>> [929]PETSC ERROR: [929] BLASgemv line 1406 >> /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c >> [929]PETSC ERROR: [929] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 >> /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c >> [929]PETSC ERROR: [929] MatSolve line 3354 >> /home/mlohry/petsc/src/mat/interface/matrix.c >> [929]PETSC ERROR: [929] PCApply_ILU line 201 >> /home/mlohry/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >> [929]PETSC ERROR: [929] PCApply line 426 >> /home/mlohry/petsc/src/ksp/pc/interface/precon.c >> [929]PETSC ERROR: [929] KSP_PCApply line 279 >> /home/mlohry/petsc/include/petsc/private/kspimpl.h >> [929]PETSC ERROR: [929] KSPSolve_PREONLY line 16 >> /home/mlohry/petsc/src/ksp/ksp/impls/preonly/preonly.c >> [929]PETSC ERROR: [929] KSPSolve_Private line 590 >> /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >> [929]PETSC ERROR: [929] KSPSolve line 848 >> /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >> [929]PETSC ERROR: [929] PCApply_ASM line 441 >> /home/mlohry/petsc/src/ksp/pc/impls/asm/asm.c >> [929]PETSC ERROR: [929] PCApply line 426 >> /home/mlohry/petsc/src/ksp/pc/interface/precon.c >> [929]PETSC ERROR: [929] KSP_PCApply line 279 >> /home/mlohry/petsc/include/petsc/private/kspimpl.h >> srun: Job step aborted: Waiting up to 47 seconds for job step to finish. >> [929]PETSC ERROR: [929] KSPFGMRESCycle line 108 >> /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >> [929]PETSC ERROR: [929] KSPSolve_FGMRES line 274 >> /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >> [929]PETSC ERROR: [929] KSPSolve_Private line 590 >> /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >> >> On Mon, Aug 24, 2020 at 6:47 PM Mark Lohry wrote: >> >>> I don't think I do. Running a much smaller case with the same models I >>> get the attached report from valgrind --show-leak-kinds=all >>> --leak-check=full --track-origins=yes. I only see some HDF5 stuff and >>> OpenMPI that I think are false positives. >>> >>> ==1286950== Memcheck, a memory error detector >>> ==1286950== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et >>> al. >>> ==1286950== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun >>> with -h for copyright info >>> ==1286950== Command: ./verification_testing >>> --gtest_filter=DrivenCavity3D.Re100_BackwardEulerILU1_16x16N2_Quadrature1 >>> --petsc_time_integrator=arkimex --petsc_arkimex_type=l2 >>> ==1286950== Parent PID: 1286932 >>> ==1286950== >>> --1286950-- >>> --1286950-- Valgrind options: >>> --1286950-- --show-leak-kinds=all >>> --1286950-- --leak-check=full >>> --1286950-- --track-origins=yes >>> --1286950-- --log-file=valgrind-out.txt >>> --1286950-- -v >>> --1286950-- Contents of /proc/version: >>> --1286950-- Linux version 5.4.0-29-generic (buildd at lgw01-amd64-035) >>> (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #33-Ubuntu SMP Wed Apr 29 >>> 14:32:27 UTC 2020 >>> --1286950-- >>> --1286950-- Arch and hwcaps: AMD64, LittleEndian, >>> amd64-cx16-rdtscp-sse3-ssse3-avx >>> --1286950-- Page sizes: currently 4096, max supported 4096 >>> --1286950-- Valgrind library directory: >>> /usr/lib/x86_64-linux-gnu/valgrind >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/verification_testing >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ld-2.31.so >>> --1286950-- Considering /usr/lib/x86_64-linux-gnu/ld-2.31.so .. >>> --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) >>> --1286950-- Considering /lib/x86_64-linux-gnu/ld-2.31.so .. >>> --1286950-- .. 
CRC mismatch (computed 387b17ea wanted d28cf5ef) >>> --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.31.so >>> .. >>> --1286950-- .. CRC is valid >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux >>> --1286950-- object doesn't have a symbol table >>> --1286950-- object doesn't have a dynamic symbol table >>> --1286950-- Scheduler: using generic scheduler lock implementation. >>> --1286950-- Reading suppressions file: >>> /usr/lib/x86_64-linux-gnu/valgrind/default.supp >>> ==1286950== embedded gdbserver: reading from >>> /tmp/vgdb-pipe-from-vgdb-to-1286950-by-mlohry-on-??? >>> ==1286950== embedded gdbserver: writing to >>> /tmp/vgdb-pipe-to-vgdb-from-1286950-by-mlohry-on-??? >>> ==1286950== embedded gdbserver: shared mem >>> /tmp/vgdb-pipe-shared-mem-vgdb-1286950-by-mlohry-on-??? >>> ==1286950== >>> ==1286950== TO CONTROL THIS PROCESS USING vgdb (which you probably >>> ==1286950== don't want to do, unless you know exactly what you're doing, >>> ==1286950== or are doing some strange experiment): >>> ==1286950== /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb >>> --pid=1286950 ...command... >>> ==1286950== >>> ==1286950== TO DEBUG THIS PROCESS USING GDB: start GDB like this >>> ==1286950== /path/to/gdb ./verification_testing >>> ==1286950== and then give GDB the following command >>> ==1286950== target remote | >>> /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 >>> ==1286950== --pid is optional if only one valgrind process is running >>> ==1286950== >>> --1286950-- REDIR: 0x4022d80 (ld-linux-x86-64.so.2:strlen) redirected to >>> 0x580c9ce2 (???) >>> --1286950-- REDIR: 0x4022b50 (ld-linux-x86-64.so.2:index) redirected to >>> 0x580c9cfc (???) >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so >>> --1286950-- object doesn't have a symbol table >>> ==1286950== WARNING: new redirection conflicts with existing -- ignoring >>> it >>> --1286950-- old: 0x04022d80 (strlen ) R-> (0000.0) >>> 0x580c9ce2 ??? 
>>> --1286950-- new: 0x04022d80 (strlen ) R-> (2007.0) >>> 0x0483f060 strlen >>> --1286950-- REDIR: 0x401f560 (ld-linux-x86-64.so.2:strcmp) redirected to >>> 0x483ffd0 (strcmp) >>> --1286950-- REDIR: 0x40232e0 (ld-linux-x86-64.so.2:mempcpy) redirected >>> to 0x4843a20 (mempcpy) >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/initialization/libinitialization.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/governing_equations/libgoverning_equations.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/time_stepping/libtime_stepping.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/governing_equations/libboundary_conditions.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/governing_equations/libsolution_monitors.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/governing_equations/libfluxtypes.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/algebraic_solvers/libalgebraic_solvers.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/program_options/libprogram_options.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_filesystem.so.1.73.0 >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0 >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so.40.20.1 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ >>> libpthread-2.31.so >>> --1286950-- Considering >>> /usr/lib/debug/.build-id/77/5cbbfff814456660786780b0b3b40096b4c05e.debug .. >>> --1286950-- .. build-id is valid >>> --1286948-- Reading syms from >>> /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libpetsc.so.3.13.3 >>> --1286937-- Reading syms from >>> /home/mlohry/dev/cmake-build/parallel/libparallel.so >>> --1286937-- Reading syms from >>> /home/mlohry/dev/cmake-build/logger/liblogger.so >>> --1286937-- Reading syms from >>> /home/mlohry/dev/cmake-build/spatial_discretization/libdiscretization.so >>> --1286945-- Reading syms from >>> /home/mlohry/dev/cmake-build/utils/libutils.so >>> --1286944-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 >>> --1286938-- object doesn't have a symbol table >>> --1286949-- Reading syms from /usr/lib/x86_64-linux-gnu/libm-2.31.so >>> --1286949-- Considering /usr/lib/x86_64-linux-gnu/libm-2.31.so .. >>> --1286947-- .. CRC mismatch (computed 327d785f wanted 751f5509) >>> --1286947-- Considering /lib/x86_64-linux-gnu/libm-2.31.so .. >>> --1286938-- .. CRC mismatch (computed 327d785f wanted 751f5509) >>> --1286937-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>> libm-2.31.so .. >>> --1286950-- .. CRC is valid >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libc-2.31.so >>> --1286950-- Considering /usr/lib/x86_64-linux-gnu/libc-2.31.so .. >>> --1286951-- .. CRC mismatch (computed a6f43087 wanted 6555436e) >>> --1286951-- Considering /lib/x86_64-linux-gnu/libc-2.31.so .. >>> --1286947-- .. CRC mismatch (computed a6f43087 wanted 6555436e) >>> --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>> libc-2.31.so .. >>> --1286950-- .. 
CRC is valid >>> --1286940-- Reading syms from >>> /home/mlohry/dev/cmake-build/file_io/libfileio.so >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_program_options.so.1.73.0 >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_serialization.so.1.73.0 >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libhwloc.so.15.1.0 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libsuperlu_dist.so.6.3.0 >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 >>> --1286950-- object doesn't have a symbol table >>> --1286937-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 >>> --1286937-- object doesn't have a symbol table >>> --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0 >>> --1286939-- object doesn't have a symbol table >>> --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libdl-2.31.so >>> --1286947-- Considering /usr/lib/x86_64-linux-gnu/libdl-2.31.so .. >>> --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) >>> --1286947-- Considering /lib/x86_64-linux-gnu/libdl-2.31.so .. >>> --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) >>> --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>> libdl-2.31.so .. >>> --1286947-- .. CRC is valid >>> --1286937-- Reading syms from >>> /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libmetis.so >>> --1286937-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0 >>> --1286942-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log_setup.so.1.73.0 >>> --1286942-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0 >>> --1286942-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_regex.so.1.73.0 >>> --1286949-- Reading syms from >>> /home/mlohry/dev/cmake-build/basis_functions/libbasis_functions.so >>> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 >>> --1286944-- object doesn't have a symbol table >>> --1286951-- Reading syms from >>> /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so >>> --1286951-- object doesn't have a symbol table >>> --1286943-- Reading syms from >>> /home/mlohry/dev/cmake-build/external_install/lib/libhdf5.so.103.1.0 >>> --1286951-- Reading syms from >>> /home/mlohry/dev/cmake-build/external/tinyxml2-build/libtinyxml2.so.6.1.0 >>> --1286944-- Reading syms from >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_iostreams.so.1.73.0 >>> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libz.so.1.2.11 >>> --1286944-- object doesn't have a symbol table >>> --1286951-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0 >>> --1286951-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libutil-2.31.so >>> --1286946-- Considering /usr/lib/x86_64-linux-gnu/libutil-2.31.so .. >>> --1286946-- .. 
CRC mismatch (computed 4639aba5 wanted ceb246b4) >>> --1286946-- Considering /lib/x86_64-linux-gnu/libutil-2.31.so .. >>> --1286946-- .. CRC mismatch (computed 4639aba5 wanted ceb246b4) >>> --1286948-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>> libutil-2.31.so .. >>> --1286939-- .. CRC is valid >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libudev.so.1.6.17 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.1 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/librt-2.31.so >>> --1286950-- Considering /usr/lib/x86_64-linux-gnu/librt-2.31.so .. >>> --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) >>> --1286950-- Considering /lib/x86_64-linux-gnu/librt-2.31.so .. >>> --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) >>> --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ >>> librt-2.31.so .. >>> --1286950-- .. CRC is valid >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libquadmath.so.0.0.0 >>> --1286950-- object doesn't have a symbol table >>> --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 >>> --1286945-- Considering /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 .. >>> --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) >>> --1286945-- Considering /lib/x86_64-linux-gnu/libXau.so.6.0.0 .. >>> --1286945-- .. 
CRC mismatch (computed 7de9b6ad wanted e8a17129) >>> --1286945-- object doesn't have a symbol table >>> --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0 >>> --1286942-- object doesn't have a symbol table >>> --1286942-- Reading syms from /usr/lib/x86_64-linux-gnu/libbsd.so.0.10.0 >>> --1286942-- object doesn't have a symbol table >>> --1286950-- REDIR: 0x6516600 (libc.so.6:memmove) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515900 (libc.so.6:strncpy) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516930 (libc.so.6:strcasecmp) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515220 (libc.so.6:strcat) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515960 (libc.so.6:rindex) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6517dd0 (libc.so.6:rawmemchr) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6532e60 (libc.so.6:wmemchr) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65329a0 (libc.so.6:wcscmp) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516760 (libc.so.6:mempcpy) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516590 (libc.so.6:bcmp) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515890 (libc.so.6:strncmp) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65152d0 (libc.so.6:strcmp) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65166c0 (libc.so.6:memset) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6532960 (libc.so.6:wcschr) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65157f0 (libc.so.6:strnlen) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65153b0 (libc.so.6:strcspn) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516980 (libc.so.6:strncasecmp) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515350 (libc.so.6:strcpy) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516ad0 (libc.so.6:memcpy@@GLIBC_2.14) redirected >>> to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65340d0 (libc.so.6:wcsnlen) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65329e0 (libc.so.6:wcscpy) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65159a0 (libc.so.6:strpbrk) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515280 (libc.so.6:index) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65157b0 (libc.so.6:strlen) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x651ed20 (libc.so.6:memrchr) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65169d0 (libc.so.6:strcasecmp_l) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516550 (libc.so.6:memchr) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6532ab0 (libc.so.6:wcslen) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515c60 (libc.so.6:strspn) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65168d0 (libc.so.6:stpncpy) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516870 (libc.so.6:stpcpy) redirected to 0x48331d0 >>> 
(_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6517e10 (libc.so.6:strchrnul) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516a20 (libc.so.6:strncasecmp_l) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516470 (libc.so.6:strstr) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65a3750 (libc.so.6:__memcpy_chk) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286938-- REDIR: 0x6527a30 (libc.so.6:__strrchr_sse2) redirected to >>> 0x483ea70 (__strrchr_sse2) >>> --1286938-- REDIR: 0x6511c90 (libc.so.6:calloc) redirected to 0x483dce0 >>> (calloc) >>> --1286938-- REDIR: 0x6510260 (libc.so.6:malloc) redirected to 0x483b780 >>> (malloc) >>> --1286938-- REDIR: 0x6531c40 (libc.so.6:memcpy at GLIBC_2.2.5) redirected >>> to 0x4840100 (memcpy at GLIBC_2.2.5) >>> --1286938-- REDIR: 0x6527d30 (libc.so.6:__strlen_sse2) redirected to >>> 0x483efa0 (__strlen_sse2) >>> --1286938-- REDIR: 0x65f4ac0 (libc.so.6:__strncmp_sse42) redirected to >>> 0x483f7c0 (__strncmp_sse42) >>> --1286938-- REDIR: 0x6510850 (libc.so.6:free) redirected to 0x483c9d0 >>> (free) >>> --1286938-- REDIR: 0x6532070 (libc.so.6:__memset_sse2_unaligned) >>> redirected to 0x48428e0 (memset) >>> --1286938-- REDIR: 0x6603350 (libc.so.6:__memcmp_sse4_1) redirected to >>> 0x4842150 (__memcmp_sse4_1) >>> --1286938-- REDIR: 0x6520520 (libc.so.6:__strcmp_sse2_unaligned) >>> redirected to 0x483fed0 (strcmp) >>> --1286938-- REDIR: 0x61d0c10 (libstdc++.so.6:operator new(unsigned >>> long)) redirected to 0x483bdf0 (operator new(unsigned long)) >>> --1286938-- REDIR: 0x61cee60 (libstdc++.so.6:operator delete(void*)) >>> redirected to 0x483cf50 (operator delete(void*)) >>> --1286938-- REDIR: 0x61d0c70 (libstdc++.so.6:operator new[](unsigned >>> long)) redirected to 0x483c510 (operator new[](unsigned long)) >>> --1286938-- REDIR: 0x61cee90 (libstdc++.so.6:operator delete[](void*)) >>> redirected to 0x483d6e0 (operator delete[](void*)) >>> --1286938-- REDIR: 0x65275f0 (libc.so.6:__strchr_sse2) redirected to >>> 0x483eb90 (__strchr_sse2) >>> --1286950-- REDIR: 0x6511000 (libc.so.6:realloc) redirected to 0x483df30 >>> (realloc) >>> --1286950-- REDIR: 0x6527820 (libc.so.6:__strchrnul_sse2) redirected to >>> 0x4843540 (strchrnul) >>> --1286950-- REDIR: 0x6531560 (libc.so.6:__strstr_sse2_unaligned) >>> redirected to 0x4843c20 (strstr) >>> --1286950-- REDIR: 0x6531c20 (libc.so.6:__mempcpy_sse2_unaligned) >>> redirected to 0x4843660 (mempcpy) >>> --1286950-- REDIR: 0x652d2a0 (libc.so.6:__strncpy_sse2_unaligned) >>> redirected to 0x483f560 (__strncpy_sse2_unaligned) >>> --1286950-- REDIR: 0x6515830 (libc.so.6:strncat) redirected to 0x48331d0 >>> (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65305b0 (libc.so.6:__strncat_sse2_unaligned) >>> redirected to 0x483ede0 (strncat) >>> --1286950-- REDIR: 0x6516120 (libc.so.6:__GI_strstr) redirected to >>> 0x4843ca0 (__strstr_sse2) >>> --1286950-- REDIR: 0x6522360 (libc.so.6:__rawmemchr_sse2) redirected to >>> 0x4843580 (rawmemchr) >>> --1286950-- REDIR: 0x65faea0 (libc.so.6:__strcasecmp_avx) redirected to >>> 0x483f830 (strcasecmp) >>> --1286950-- REDIR: 0x65fc520 (libc.so.6:__strncasecmp_avx) redirected to >>> 0x483f910 (strncasecmp) >>> --1286950-- REDIR: 0x65f98a0 (libc.so.6:__strspn_sse42) redirected to >>> 0x4843ef0 (strspn) >>> --1286950-- REDIR: 0x65f9620 (libc.so.6:__strcspn_sse42) redirected to >>> 0x4843e10 (strcspn) >>> --1286948-- REDIR: 0x6522030 (libc.so.6:__memchr_sse2) redirected to >>> 0x4840050 (memchr) >>> 
--1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Discarding syms at 0x4a96240-0x4a96d47 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so >>> (have_dinfo 1) >>> --1286948-- Discarding syms at 0x4a9b1c0-0x4a9b937 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so >>> (have_dinfo 1) >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Discarding syms at 0x4a96120-0x4a966b0 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so >>> (have_dinfo 1) >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- REDIR: 0x64bc670 (libc.so.6:setenv) redirected to 0x4844480 >>> (setenv) >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Discarding syms at 0x8d053e0-0x8d07391 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so (have_dinfo >>> 1) >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so >>> --1286948-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so >>> --1286950-- object doesn't have a symbol table 
>>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Discarding syms at 0x8d04180-0x8d045b0 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so (have_dinfo 1) >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so >>> --1286950-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9ebf0a0-0x9ebf490 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so >>> (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9eca300-0x9ecbee8 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so >>> (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9ed1220-0x9ed24e7 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so (have_dinfo >>> 1) >>> --1286946-- Discarding syms at 0x9ed8240-0x9ed8c88 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so >>> (have_dinfo 1) >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so >>> --1286946-- object 
doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9ebf0e0-0x9ebf417 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so >>> (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9ecf320-0x9ed1239 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so >>> (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9ed73a0-0x9ed9ccc in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so >>> (have_dinfo 1) >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so >>> --1286936-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- REDIR: 0x652cc70 (libc.so.6:__strcpy_sse2_unaligned) >>> redirected to 0x483f090 (strcpy) >>> --1286946-- REDIR: 0x65a3810 (libc.so.6:__memmove_chk) redirected to >>> 0x48331d0 (_vgnU_ifunc_wrapper) >>> ==1286946== WARNING: new redirection conflicts with existing -- ignoring >>> it >>> --1286946-- old: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2030.0) >>> 0x04843b10 __memcpy_chk >>> --1286946-- new: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2024.0) >>> 0x048434d0 __memmove_chk >>> --1286946-- REDIR: 0x6531c30 (libc.so.6:__memcpy_chk_sse2_unaligned) >>> redirected to 0x4843b10 (__memcpy_chk) >>> --1286946-- REDIR: 0x65129b0 (libc.so.6:posix_memalign) redirected to >>> 0x483e1e0 (posix_memalign) >>> --1286946-- Discarding syms at 
0x9f15280-0x9f32932 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so >>> (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f7c4c0-0x9f7ded8 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 >>> (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f620c0-0x9f71483 in >>> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f9ba10-0x9fd22ee in >>> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_monitoring.so.50.10.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9f4d400-0x9f50c19 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so >>> (have_dinfo 1) >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/libpsm1/libpsm_infinipath.so.1.16 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- REDIR: 0x6517140 (libc.so.6:strcasestr) redirected to >>> 0x4843f80 (strcasestr) >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9f4d5c0-0x9f4f5a1 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9fee680-0x9ff096c in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so (have_dinfo >>> 1) >>> --1286946-- Reading syms from >>> 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_monitoring.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9f724a0-0x9f787b5 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa827f80-0xa8e14c4 in >>> /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f94830-0x9fbafce in >>> /usr/lib/libpsm1/libpsm_infinipath.so.1.16 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9fe5580-0x9fe8f71 in >>> /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f56420-0x9f5cec0 in >>> /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa929f10-0xa93d5fc in >>> /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa94b0c0-0xa95a483 in >>> /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa968860-0xa9adf12 in >>> /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa9e7a10-0xaa1e2ee in >>> /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f80410-0x9f84e27 in >>> /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f103e0-0x9f15fd5 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f471e0-0x9f47ce0 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so >>> (have_dinfo 1) >>> ==1286946== Thread 3: >>> ==1286946== Syscall param 
writev(vector[...]) points to uninitialised >>> byte(s) >>> ==1286946== at 0x658A48D: __writev (writev.c:26) >>> ==1286946== by 0x658A48D: writev (writev.c:24) >>> ==1286946== by 0x8DF9B4C: pmix_ptl_base_send_handler (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x7CC413E: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286946== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286946== by 0x8DBDD55: ??? (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286946== by 0x6595102: clone (clone.S:95) >>> ==1286946== Address 0xa28fdcf is 127 bytes inside a block of size 5,120 >>> alloc'd >>> ==1286946== at 0x483DFAF: realloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286946== by 0x8DE155A: pmix_bfrop_buffer_extend (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x8DE3F4A: pmix_bfrops_base_pack_byte (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x8DE4900: pmix_bfrops_base_pack_buf (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x8DE4175: pmix_bfrops_base_pack (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x8D7CF91: ??? (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x7CC3FDD: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286946== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286946== by 0x8DBDD55: ??? (in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286946== by 0x6595102: clone (clone.S:95) >>> ==1286946== Uninitialised value was created by a stack allocation >>> ==1286946== at 0x9F048D6: ??? 
(in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so) >>> ==1286946== >>> --1286944-- Discarding syms at 0xaa4d220-0xaa5796a in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at >>> 0xaa4d220---1286948-- Discarding syms at 0xaae1100-0xaae7d70 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at >>> 0xaae1100-0xaae7d70 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so >>> (have_dinfo 1) >>> --1286945-- Discarding syms at 0x9f69420-0x9f--1286938-- REDIR: >>> 0x61cee70 (libstdc++.so.6:operator delete(void*, unsigned long)) redirected >>> to --1286937-- REDIR: 0x61cee70 (libstdc++.so.6:opera--1286946-- REDIR: >>> 0x652e970 (libc.so.6:__stpncpy_sse2_unaligned) redirected to 0x48427e0 >>> (stpncpy) >>> --1286942-- REDIR: 0x6527ed0 (libc.so.6:__strnlen_sse2) redirected to >>> 0x483eee0 (strnlen) >>> --1286944-- REDIR: 0x652fcc0 (libc.so.6:__strcat_sse2_unaligned) >>> redirected to 0x483ec20 (strcat) >>> --1286951-- REDIR: 0x65113d0 (libc.so.6:memalign) redirected to >>> 0x483e2a0 (memalign) >>> --1286951-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so >>> --1286951-- object doesn't have a symbol table >>> --1286951-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so >>> --1286951-- object doesn't have a symbol table >>> --1286941-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 >>> --1286941-- object doesn't have a symbol table >>> --1286951-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so >>> --1286951-- object doesn't have a symbol table >>> --1286939-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so >>> --1286939-- object doesn't have a symbol table >>> --1286943-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so >>> --1286943-- object doesn't have a symbol table >>> --1286943-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so >>> --1286943-- object doesn't have a symbol table >>> --1286943-- Reading syms from >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so >>> --1286943-- object doesn't have a symbol table >>> --1286938-- REDIR: 0x65a3b00 (libc.so.6:__strcpy_chk) redirected to >>> 0x48435c0 (__strcpy_chk) >>> --1286939-- Discarding syms at 0x9f1d660-0x9f371d6 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f5afa0-0x9f8f8b6 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so >>> (have_dinfo 1) >>> --1286939-- 
Discarding syms at 0x9fa0640-0x9fa42d9 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0x9f4c160-0x9f4dc58 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa7fc270-0xa804f00 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fee3a0-0x9ff134e in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa80a240-0xa80aa8d in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa80f0e0-0xa80f8bb in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0xaa460c0-0xaa47947 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0xaa613e0-0xaa7730f in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0xaa849c0-0xaa8a845 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ee1320-0x9ee3567 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9eebc40-0x9ef4ad7 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f02600-0x9f08cd8 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0x9f40200-0x9f4126e in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0x9eda4e0-0x9edb4c5 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ed32c0-0x9ed4afe in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ebf160-0x9ebfe95 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ece140-0x9ecebed in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ec92a0-0x9ec9aa2 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8eae0e0-0x8eae4a7 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8eb3220-0x8eb3c27 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8eb80e0-0x8eb90b7 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8ea6380-0x8ea97b3 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e5a740-0x8e5f859 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e67be0-0x8e743f0 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x84da200-0x84daa5d in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so (have_dinfo 
1) >>> --1286939-- Discarding syms at 0x8d322b0-0x8d34bfc in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e29480-0x8e3b70a in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d3c2b0-0x8d3ed5c in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e45340-0x8e502da in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e901a0-0x8e908a7 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d05520-0x8d06783 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e7b460-0x8e8aaa4 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d44520-0x8d4556a in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e97600-0x8ea0fa1 in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d109c0-0x8d27dcf in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d5b280-0x8dfdffb in >>> /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ec40a0-0x9ec4490 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0x84d2580-0x84d518f in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4a96120-0x4a9644f in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4aa0100-0x4aa03e7 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x84c74a0-0x84c901f in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4aa5260-0x4aa58e9 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4a9b420-0x4a9bcdf in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x84e7460-0x84f52ca in >>> /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4a90360-0x4a91107 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f46220-0x9f474cc in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f0f180-0x9f0f78d in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xaa94540-0xaa96a4a in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xaa9f6c0-0xaab44d0 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 0xaabe820-0xaad8ee0 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so (have_dinfo >>> 1) >>> --1286939-- Discarding syms at 
0x9efc080-0x9efc1e1 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fab2a0-0x9fb1341 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f140c0-0x9f14299 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fb72a0-0x9fbb791 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fd52a0-0x9fda794 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fe02e0-0x9fe59a5 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa815460-0xa8177ab in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa81e260-0xa82033d in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa8273e0-0xa8297d8 in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so >>> (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fc85e0-0x9fce8ef in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 >>> (have_dinfo 1) >>> ==1286939== >>> ==1286939== HEAP SUMMARY: >>> ==1286939== in use at exit: 74,054 bytes in 223 blocks >>> ==1286939== total heap usage: 22,405,782 allocs, 22,405,559 frees, >>> 34,062,479,959 bytes allocated >>> ==1286939== >>> ==1286939== Searching for pointers to 223 not-freed blocks >>> ==1286939== Checked 3,415,912 bytes >>> ==1286939== >>> ==1286939== Thread 1: >>> ==1286939== 1 bytes in 1 blocks are definitely lost in loss record 1 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x9F6A4B6: ??? >>> ==1286939== by 0x9F47373: ??? >>> ==1286939== by 0x68E3B9B: mca_base_framework_components_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4BA1734: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 8 bytes in 1 blocks are still reachable in loss record 2 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x764724C: ??? (in >>> /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >>> ==1286939== by 0x7657B9A: ??? (in >>> /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >>> ==1286939== by 0x7645679: ??? 
(in >>> /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >>> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >>> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >>> ==1286939== by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ >>> ld-2.31.so) >>> ==1286939== by 0x3: ??? >>> ==1286939== by 0x1FFEFFF926: ??? >>> ==1286939== by 0x1FFEFFF93D: ??? >>> ==1286939== by 0x1FFEFFF987: ??? >>> ==1286939== by 0x1FFEFFF9A7: ??? >>> ==1286939== >>> ==1286939== 8 bytes in 1 blocks are definitely lost in loss record 3 of >>> 44 >>> ==1286939== at 0x483DD99: calloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F69B6F: ??? >>> ==1286939== by 0x9F1CDED: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 13 bytes in 2 blocks are still reachable in loss record 4 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x7CC3657: event_config_avoid_method (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x68FEB5A: opal_event_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68FE8CA: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== >>> ==1286939== 15 bytes in 1 blocks are indirectly lost in loss record 5 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x9EDB189: ??? >>> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6907C25: ??? 
(in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4BA16D5: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 15 bytes in 1 blocks are definitely lost in loss record 6 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x9F5655C: ??? >>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >>> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >>> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >>> ==1286939== by 0x65D6784: _dl_catch_exception >>> (dl-error-skeleton.c:182) >>> ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) >>> ==1286939== by 0x65D6727: _dl_catch_exception >>> (dl-error-skeleton.c:208) >>> ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) >>> ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) >>> ==1286939== by 0x65D6727: _dl_catch_exception >>> (dl-error-skeleton.c:208) >>> ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) >>> ==1286939== >>> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 7 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F1CBEB: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 8 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F1CC66: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? 
>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 9 of >>> 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F1CCDA: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 25 bytes in 1 blocks are still reachable in loss record 10 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x68F27BD: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B956B6: ompi_pml_v_output_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B95259: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B93FAE: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4BA1734: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== >>> ==1286939== 30 bytes in 1 blocks are definitely lost in loss record 11 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0xA9A859B: ??? 
>>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >>> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >>> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >>> ==1286939== by 0x65D6784: _dl_catch_exception >>> (dl-error-skeleton.c:182) >>> ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) >>> ==1286939== by 0x65D6727: _dl_catch_exception >>> (dl-error-skeleton.c:208) >>> ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) >>> ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) >>> ==1286939== by 0x65D6727: _dl_catch_exception >>> (dl-error-skeleton.c:208) >>> ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) >>> ==1286939== by 0x72DEB58: _dlerror_run (dlerror.c:170) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are still reachable in loss record 12 >>> of 44 >>> ==1286939== at 0x483DD99: calloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CC353E: event_get_supported_methods (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x68FEA98: opal_event_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68FE8CA: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 13 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x84D2B0A: ??? >>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 14 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x84D2BCE: ??? 
>>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 15 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x84D2CB2: ??? >>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 16 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x84D2D91: ??? >>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 17 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E81BD8: ??? >>> ==1286939== by 0x8E89F4B: ??? >>> ==1286939== by 0x8D84A0D: ??? >>> ==1286939== by 0x8DF79C1: ??? >>> ==1286939== by 0x7CC3FDD: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x8DBDD55: ??? >>> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286939== by 0x6595102: clone (clone.S:95) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 18 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== by 0x84D330E: ??? 
>>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 36 (32 direct, 4 indirect) bytes in 1 blocks are definitely >>> lost in loss record 19 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x4B94C09: mca_pml_base_pml_check_selected (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x9F1E1E1: ??? >>> ==1286939== by 0x4BA1A09: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 20 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CFF4B6: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x7CC5E26: event_global_setup_locks_ (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x68FE8E4: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== >>> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 21 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CFF4B6: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x7CCF377: evsig_global_setup_locks_ (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC5E39: event_global_setup_locks_ (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x68FE8E4: ??? 
(in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== >>> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 22 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CFF4B6: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x7CCB997: evutil_secure_rng_global_setup_locks_ (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC5E4F: event_global_setup_locks_ (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in >>> /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x68FE8E4: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== >>> ==1286939== 48 bytes in 1 blocks are still reachable in loss record 23 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68D7F7A: mca_base_component_find (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B8560C: mca_io_base_file_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B0E68A: ompi_file_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286939== by 
0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286939== >>> ==1286939== 48 bytes in 1 blocks are still reachable in loss record 24 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68D7F7A: mca_base_component_find (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B85638: mca_io_base_file_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B0E68A: ompi_file_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286939== >>> ==1286939== 48 bytes in 2 blocks are still reachable in loss record 25 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CC3647: event_config_avoid_method (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x68FEB5A: opal_event_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68FE8CA: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 55 (32 direct, 23 indirect) bytes in 1 blocks are definitely >>> lost in loss record 26 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? 
>>> ==1286939== by 0x4AF6CD6: ompi_comm_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA194D: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== >>> ==1286939== 56 bytes in 1 blocks are still reachable in loss record 27 >>> of 44 >>> ==1286939== at 0x483DD99: calloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CC1C86: event_config_new (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x68FEAC0: opal_event_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68FE8CA: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 28 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F6E008: ??? >>> ==1286939== by 0x9F7C654: ??? >>> ==1286939== by 0x9F1CD3E: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 29 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0xA957008: ??? >>> ==1286939== by 0xA86B017: ??? >>> ==1286939== by 0xA862FD8: ??? >>> ==1286939== by 0xA828E15: ??? >>> ==1286939== by 0xA829624: ??? >>> ==1286939== by 0x9F77910: ??? >>> ==1286939== by 0x4B85C53: ompi_mtl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x9F13E4D: ??? 
>>> ==1286939== by 0x4B94673: mca_pml_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1789: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 76 (32 direct, 44 indirect) bytes in 1 blocks are definitely >>> lost in loss record 30 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== by 0x84D387F: ??? >>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 79 (64 direct, 15 indirect) bytes in 1 blocks are definitely >>> lost in loss record 31 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9EDB12E: ??? >>> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6907C25: ??? (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4BA16D5: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 144 bytes in 3 blocks are still reachable in loss record 32 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68D7F7A: mca_base_component_find (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B8564E: mca_io_base_file_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B0E68A: ompi_file_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >>> 
/usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286939== >>> ==1286939== 231 bytes in 12 blocks are definitely lost in loss record 33 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x9F2B4B3: ??? >>> ==1286939== by 0x9F2B85C: ??? >>> ==1286939== by 0x9F2BBD7: ??? >>> ==1286939== by 0x9F1CAAC: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== >>> ==1286939== 240 bytes in 5 blocks are still reachable in loss record 34 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68D7F7A: mca_base_component_find (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B85622: mca_io_base_file_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B0E68A: ompi_file_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286939== >>> ==1286939== 272 bytes in 44 blocks are definitely lost in loss record 35 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9FCAEDB: ??? >>> ==1286939== by 0x9FE42B2: ??? >>> ==1286939== by 0x9FE47BB: ??? >>> ==1286939== by 0x9FCDDBF: ??? >>> ==1286939== by 0x9FA324A: ??? 
>>> ==1286939== by 0x4B3DD7F: PMPI_File_write_at_all (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >>> ==1286939== by 0x78DF11D: H5FD_write (H5FDint.c:257) >>> ==1286939== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >>> ==1286939== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >>> ==1286939== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >>> ==1286939== >>> ==1286939== 585 (480 direct, 105 indirect) bytes in 15 blocks are >>> definitely lost in loss record 36 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== by 0x4B14036: ompi_proc_complete_init_single (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B146C3: ompi_proc_complete_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA19A9: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 776 bytes in 32 blocks are indirectly lost in loss record 37 >>> of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8DE9816: ??? >>> ==1286939== by 0x8DEB1D2: ??? >>> ==1286939== by 0x8DEB49A: ??? >>> ==1286939== by 0x8DE8B12: ??? >>> ==1286939== by 0x8E9D492: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== >>> ==1286939== 840 (480 direct, 360 indirect) bytes in 15 blocks are >>> definitely lost in loss record 38 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x9EF2F00: ??? >>> ==1286939== by 0x9EEBF17: ??? >>> ==1286939== by 0x9EE2F54: ??? >>> ==1286939== by 0x9F1E1FB: ??? >>> ==1286939== >>> ==1286939== 1,084 (480 direct, 604 indirect) bytes in 15 blocks are >>> definitely lost in loss record 39 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== by 0x84D4800: ??? 
>>> ==1286939== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 1,344 bytes in 1 blocks are definitely lost in loss record >>> 40 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68AE702: opal_free_list_grow_st (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9F1CD2D: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record >>> 41 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68AE702: opal_free_list_grow_st (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9F1CC50: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record >>> 42 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68AE702: opal_free_list_grow_st (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9F1CCC4: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? 
>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 62,644 bytes in 31 blocks are indirectly lost in loss record >>> 43 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8DE9FA8: ??? >>> ==1286939== by 0x8DEB032: ??? >>> ==1286939== by 0x8DEB49A: ??? >>> ==1286939== by 0x8DE8B12: ??? >>> ==1286939== by 0x8E9D492: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== >>> ==1286939== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are >>> definitely lost in loss record 44 of 44 >>> ==1286939== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x9F0398A: ??? >>> ==1286939== by 0x9EE2F54: ??? >>> ==1286939== by 0x9F1E1FB: ??? >>> ==1286939== by 0x4BA1A09: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== LEAK SUMMARY: >>> ==1286939== definitely lost: 9,837 bytes in 138 blocks >>> ==1286939== indirectly lost: 63,435 bytes in 64 blocks >>> ==1286939== possibly lost: 0 bytes in 0 blocks >>> ==1286939== still reachable: 782 bytes in 21 blocks >>> ==1286939== suppressed: 0 bytes in 0 blocks >>> ==1286939== >>> ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 >>> from 0) >>> ==1286939== >>> ==1286939== 1 errors in context 1 of 29: >>> ==1286939== Thread 3: >>> ==1286939== Syscall param writev(vector[...]) points to uninitialised >>> byte(s) >>> ==1286939== at 0x658A48D: __writev (writev.c:26) >>> ==1286939== by 0x658A48D: writev (writev.c:24) >>> ==1286939== by 0x8DF9B4C: ??? >>> ==1286939== by 0x7CC413E: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x8DBDD55: ??? >>> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286939== by 0x6595102: clone (clone.S:95) >>> ==1286939== Address 0xa28ee1f is 127 bytes inside a block of size 5,120 >>> alloc'd >>> ==1286939== at 0x483DFAF: realloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8DE155A: ??? >>> ==1286939== by 0x8DE3F4A: ??? >>> ==1286939== by 0x8DE4900: ??? >>> ==1286939== by 0x8DE4175: ??? >>> ==1286939== by 0x8D7CF91: ??? >>> ==1286939== by 0x7CC3FDD: ??? 
(in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x8DBDD55: ??? >>> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286939== by 0x6595102: clone (clone.S:95) >>> ==1286939== Uninitialised value was created by a stack allocation >>> ==1286939== at 0x9F048D6: ??? >>> ==1286939== >>> ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 >>> from 0) >>> mpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x4B85622: mca_io_base_file_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B0E68A: ompi_file_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B3ADB8: PMPI_File_open (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286936== by 0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286936== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286936== >>> ==1286936== 272 bytes in 44 blocks are definitely lost in loss record 39 >>> of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x9FCAEDB: ??? >>> ==1286936== by 0x9FE42B2: ??? >>> ==1286936== by 0x9FE47BB: ??? >>> ==1286936== by 0x9FCDDBF: ??? >>> ==1286936== by 0x9FA324A: ??? >>> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >>> ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) >>> ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >>> ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >>> ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >>> ==1286936== >>> ==1286936== 312 bytes in 1 blocks are still reachable in loss record 40 >>> of 49 >>> ==1286936== at 0x483BE63: operator new(unsigned long) (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x74E78EB: boost::detail::make_external_thread_data() >>> (in >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) >>> ==1286936== by 0x74E7C74: >>> boost::detail::add_thread_exit_function(boost::detail::thread_exit_function_base*) >>> (in >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) >>> ==1286936== by 0x73AFCEA: >>> boost::log::v2_mt_posix::sources::aux::get_severity_level() (in >>> /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0) >>> ==1286936== by 0x5F71A6C: set_value (severity_feature.hpp:135) >>> ==1286936== by 0x5F71A6C: >>> open_record_unlocked>> const boost::log::v2_mt_posix::trivial::severity_level> > > >>> (severity_feature.hpp:252) >>> ==1286936== by 0x5F71A6C: >>> open_record>> const boost::log::v2_mt_posix::trivial::severity_level> > > >>> (basic_logger.hpp:459) >>> ==1286936== by 0x5F71A6C: >>> Logger::TraceMessage(std::__cxx11::basic_string>> std::char_traits, std::allocator >) (logger.cpp:328) >>> ==1286936== by 0x5F729C7: >>> Logger::Message(std::__cxx11::basic_string, >>> std::allocator > const&, LogLevel) (logger.cpp:280) >>> ==1286936== by 0x5F73CF1: >>> Logger::Timer::Timer(std::__cxx11::basic_string>> std::char_traits, std::allocator > const&, LogLevel) >>> (logger.cpp:426) >>> ==1286936== by 0x15718A: timer (logger.hpp:98) >>> ==1286936== by 0x15718A: main (testing_main.cpp:9) >>> ==1286936== >>> ==1286936== 
585 (480 direct, 105 indirect) bytes in 15 blocks are >>> definitely lost in loss record 41 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8E9D3EB: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A767: ??? >>> ==1286936== by 0x4B14036: ompi_proc_complete_init_single (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B146C3: ompi_proc_complete_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4BA19A9: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== >>> ==1286936== 776 bytes in 32 blocks are indirectly lost in loss record 42 >>> of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8DE9816: ??? >>> ==1286936== by 0x8DEB1D2: ??? >>> ==1286936== by 0x8DEB49A: ??? >>> ==1286936== by 0x8DE8B12: ??? >>> ==1286936== by 0x8E9D492: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A767: ??? >>> ==1286936== >>> ==1286936== 840 (480 direct, 360 indirect) bytes in 15 blocks are >>> definitely lost in loss record 43 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8E9D3EB: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A5EB: ??? >>> ==1286936== by 0x9EF2F00: ??? >>> ==1286936== by 0x9EEBF17: ??? >>> ==1286936== by 0x9EE2F54: ??? >>> ==1286936== by 0x9F1E1FB: ??? >>> ==1286936== >>> ==1286936== 1,091 (480 direct, 611 indirect) bytes in 15 blocks are >>> definitely lost in loss record 44 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8E9D3EB: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A767: ??? >>> ==1286936== by 0x84D4800: ??? >>> ==1286936== by 0x68602FB: orte_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286936== by 0x4BA1322: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== >>> ==1286936== 1,344 bytes in 1 blocks are definitely lost in loss record >>> 45 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x68AE702: opal_free_list_grow_st (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9F1CD2D: ??? >>> ==1286936== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9EE3527: ??? 
>>> ==1286936== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286936== by 0x15710D: main (testing_main.cpp:8) >>> ==1286936== >>> ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record >>> 46 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x68AE702: opal_free_list_grow_st (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9F1CC50: ??? >>> ==1286936== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9EE3527: ??? >>> ==1286936== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286936== by 0x15710D: main (testing_main.cpp:8) >>> ==1286936== >>> ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record >>> 47 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x68AE702: opal_free_list_grow_st (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9F1CCC4: ??? >>> ==1286936== by 0x68FC9C8: mca_btl_base_select (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9EE3527: ??? >>> ==1286936== by 0x4B6170A: mca_bml_base_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4BA1714: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) >>> (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286936== by 0x15710D: main (testing_main.cpp:8) >>> ==1286936== >>> ==1286936== 62,640 bytes in 30 blocks are indirectly lost in loss record >>> 48 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8DE9FA8: ??? >>> ==1286936== by 0x8DEB032: ??? >>> ==1286936== by 0x8DEB49A: ??? >>> ==1286936== by 0x8DE8B12: ??? >>> ==1286936== by 0x8E9D492: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? 
>>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A5EB: ??? >>> ==1286936== >>> ==1286936== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are >>> definitely lost in loss record 49 of 49 >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8E9D3EB: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A5EB: ??? >>> ==1286936== by 0x9F0398A: ??? >>> ==1286936== by 0x9EE2F54: ??? >>> ==1286936== by 0x9F1E1FB: ??? >>> ==1286936== by 0x4BA1A09: ompi_mpi_init (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== >>> ==1286936== LEAK SUMMARY: >>> ==1286936== definitely lost: 9,805 bytes in 137 blocks >>> ==1286936== indirectly lost: 63,431 bytes in 63 blocks >>> ==1286936== possibly lost: 0 bytes in 0 blocks >>> ==1286936== still reachable: 1,174 bytes in 27 blocks >>> ==1286936== suppressed: 0 bytes in 0 blocks >>> ==1286936== >>> ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 >>> from 0) >>> ==1286936== >>> ==1286936== 1 errors in context 1 of 29: >>> ==1286936== Thread 3: >>> ==1286936== Syscall param writev(vector[...]) points to uninitialised >>> byte(s) >>> ==1286936== at 0x658A48D: __writev (writev.c:26) >>> ==1286936== by 0x658A48D: writev (writev.c:24) >>> ==1286936== by 0x8DF9B4C: ??? >>> ==1286936== by 0x7CC413E: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286936== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286936== by 0x8DBDD55: ??? >>> ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286936== by 0x6595102: clone (clone.S:95) >>> ==1286936== Address 0xa290cbf is 127 bytes inside a block of size 5,120 >>> alloc'd >>> ==1286936== at 0x483DFAF: realloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8DE155A: ??? >>> ==1286936== by 0x8DE3F4A: ??? >>> ==1286936== by 0x8DE4900: ??? >>> ==1286936== by 0x8DE4175: ??? >>> ==1286936== by 0x8D7CF91: ??? >>> ==1286936== by 0x7CC3FDD: ??? (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286936== by 0x7CC487E: event_base_loop (in >>> /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286936== by 0x8DBDD55: ??? >>> ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286936== by 0x6595102: clone (clone.S:95) >>> ==1286936== Uninitialised value was created by a stack allocation >>> ==1286936== at 0x9F048D6: ??? >>> ==1286936== >>> ==1286936== >>> ==1286936== 6 errors in context 2 of 29: >>> ==1286936== Thread 1: >>> ==1286936== Syscall param pwritev(vector[...]) points to uninitialised >>> byte(s) >>> ==1286936== at 0x658A608: pwritev64 (pwritev64.c:30) >>> ==1286936== by 0x658A608: pwritev (pwritev64.c:28) >>> ==1286936== by 0x9F46E25: ??? >>> ==1286936== by 0x9FCE33B: ??? >>> ==1286936== by 0x9FCDDBF: ??? >>> ==1286936== by 0x9FA324A: ??? 
>>> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in >>> /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >>> ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) >>> ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >>> ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >>> ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >>> ==1286936== by 0x7B5ED15: H5C__collective_write (H5Cmpio.c:1020) >>> ==1286936== by 0x7B5ED15: H5C_apply_candidate_list (H5Cmpio.c:394) >>> ==1286936== Address 0xedf91b0 is 96 bytes inside a block of size 216 >>> alloc'd >>> ==1286936== at 0x483B7F3: malloc (in >>> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:292) >>> ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:267) >>> ==1286936== by 0x77FC8FF: H5C__flush_single_entry (H5C.c:6045) >>> ==1286936== by 0x7B5DC7E: H5C__flush_candidates_in_ring >>> (H5Cmpio.c:1371) >>> ==1286936== by 0x7B5DC7E: H5C__flush_candidate_entries >>> (H5Cmpio.c:1192) >>> ==1286936== by 0x7B5DC7E: H5C_apply_candidate_list (H5Cmpio.c:385) >>> ==1286936== by 0x7B5BA18: H5AC__rsp__dist_md_write__flush >>> (H5ACmpio.c:1709) >>> ==1286936== by 0x7B5BA18: H5AC__run_sync_point (H5ACmpio.c:2164) >>> ==1286936== by 0x7B5C9D2: H5AC__flush_entries (H5ACmpio.c:2307) >>> ==1286936== by 0x77C95E4: H5AC_flush (H5AC.c:681) >>> ==1286936== by 0x78B306A: H5F__flush_phase2 (H5Fint.c:1831) >>> ==1286936== by 0x78B5D7A: H5F__dest (H5Fint.c:1152) >>> ==1286936== by 0x78B6603: H5F_try_close (H5Fint.c:2180) >>> ==1286936== by 0x78B69F5: H5F__close_cb (H5Fint.c:2009) >>> ==1286936== by 0x7965797: H5I_dec_ref (H5I.c:1254) >>> ==1286936== Uninitialised value was created by a stack allocation >>> ==1286936== at 0x7695AF0: ??? (in >>> /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so) >>> ==1286936== >>> ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 >>> from 0) >>> >>> On Mon, Aug 24, 2020 at 5:00 PM Jed Brown wrote: >>> >>>> Do you potentially have a memory or other resource leak? SIGBUS would >>>> be an odd result, but the symptom of crashing after running for a long time >>>> sometimes fits with a resource leak. >>>> >>>> Mark Lohry writes: >>>> >>>> > I queued up some jobs with Barry's patch, so we'll see. >>>> > >>>> > Re Jed's suggestion at checkpointing, I don't *think* this is >>>> something >>>> > coming from the state of the solution -- running from the same point >>>> I'm >>>> > seeing it crash anywhere between 1 hour and 20 hours in. I'll >>>> increase my >>>> > file save frequency in case I'm wrong there though. >>>> > >>>> > My intel build with different blas just made it through a 6 hour time >>>> slot >>>> > without crash, whereas yesterday the same thing crashed after 3 >>>> hours. But >>>> > given the randomness so far I'd bet that's just dumb luck. >>>> > >>>> > On Mon, Aug 24, 2020 at 4:22 PM Barry Smith wrote: >>>> > >>>> >> >>>> >> >>>> >> > On Aug 24, 2020, at 2:34 PM, Jed Brown wrote: >>>> >> > >>>> >> > I'm thinking of something such as writing floating point data into >>>> the >>>> >> return address, which would be unaligned/garbage. >>>> >> >>>> >> Ok, my patch will detect this. This is what I was talking about, >>>> messing >>>> >> up the BLAS arguments which are the addresses of arrays. >>>> >> >>>> >> Valgrind is by far the preferred approach. 
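Valgrind is the heavy tool; a much lighter first check on Jed's resource-leak question above is to log the resident set size every so many time steps and see whether it creeps upward over the hours before the SIGBUS. The sketch below is not from the thread: PetscMemoryGetCurrentUsage() and PetscPrintf() are real PETSc calls, but the ten-step loop and the megabyte formatting are only placeholders for wherever the application's own time-step loop lives.

/* leak_watch.c: minimal sketch -- log resident memory once per "time step"
 * so a slow leak shows up as steady growth in the run's log file. */
#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscLogDouble rss;
  PetscInt       step;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  for (step = 0; step < 10; step++) {
    /* ... one time step of the real application would go here ... */
    ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr);  /* bytes of resident memory */
    ierr = PetscPrintf(PETSC_COMM_WORLD, "step %D: resident set %.1f MB\n",
                       step, rss/(1024.0*1024.0));CHKERRQ(ierr);
  }
  ierr = PetscFinalize();
  return ierr;
}

If the resident set stays flat right up to the crash, an application-side leak becomes much less likely and the corruption/hardware explanations gain weight.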
>>>> >> >>>> >> Barry >>>> >> >>>> >> Another feature we could add to the malloc checking is when a SEGV >>>> or >>>> >> BUS error is encountered and we catch it we should run the >>>> >> PetscMallocVerify() and check our memory for corruption reporting >>>> any we >>>> >> find. >>>> >> >>>> >> >>>> >> >>>> >> > >>>> >> > Reproducing under Valgrind would help a lot. Perhaps it's >>>> possible to >>>> >> checkpoint such that the breakage can be reproduced more quickly? >>>> >> > >>>> >> > Barry Smith writes: >>>> >> > >>>> >> >> https://en.wikipedia.org/wiki/Bus_error < >>>> >> https://en.wikipedia.org/wiki/Bus_error> >>>> >> >> >>>> >> >> But perhaps not true for Intel? >>>> >> >> >>>> >> >> >>>> >> >> >>>> >> >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley >>>> >> wrote: >>>> >> >>> >>>> >> >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith >>> >>> >> bsmith at petsc.dev>> wrote: >>>> >> >>> >>>> >> >>> >>>> >> >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown >>> >>> >> jed at jedbrown.org>> wrote: >>>> >> >>>> >>>> >> >>>> Barry Smith > >>>> writes: >>>> >> >>>> >>>> >> >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown >>> >>> >> jed at jedbrown.org>> wrote: >>>> >> >>>>>> >>>> >> >>>>>> Barry Smith > >>>> writes: >>>> >> >>>>>> >>>> >> >>>>>>> So if a BLAS errors with SIGBUS then it is always an input >>>> error >>>> >> of just not proper double/complex alignment? Or some other very >>>> strange >>>> >> thing? >>>> >> >>>>>> >>>> >> >>>>>> I would suspect memory corruption. >>>> >> >>>>> >>>> >> >>>>> >>>> >> >>>>> Corruption meaning what specifically? >>>> >> >>>>> >>>> >> >>>>> The routines crashing are dgemv which only take double >>>> precision >>>> >> arrays, regardless of what garbage is in those arrays i don't think >>>> there >>>> >> can be BUS errors resulting. They don't take integer arrays whose >>>> >> corruption could result in bad indexing and then BUS errors. >>>> >> >>>>> >>>> >> >>>>> So then it can only be corruption of the pointers passed in, >>>> correct? >>>> >> >>>> >>>> >> >>>> Such as those pointers pointing into data on the stack with >>>> incorrect >>>> >> sizes. >>>> >> >>> >>>> >> >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? >>>> >> >>> >>>> >> >>> My understanding was that roughly memory errors in the heap are >>>> SEGV >>>> >> and memory errors on the stack are SIGBUS. Is that not true? >>>> >> >>> >>>> >> >>> Matt >>>> >> >>> >>>> >> >>> -- >>>> >> >>> What most experimenters take for granted before they begin their >>>> >> experiments is infinitely more interesting than any results to which >>>> their >>>> >> experiments lead. >>>> >> >>> -- Norbert Wiener >>>> >> >>> >>>> >> >>> https://www.cse.buffalo.edu/~knepley/ < >>>> >> http://www.cse.buffalo.edu/~knepley/> >>>> >> >>>> >> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Thu Aug 27 16:34:26 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 27 Aug 2020 16:34:26 -0500 Subject: [petsc-users] Bus Error In-Reply-To: References: <917B3F31-A98C-48A5-9E66-6B93A8C0462D@petsc.dev> <02176361-CBC4-490B-A064-057C9FAC271A@petsc.dev> <2C14E111-0ABC-4322-AD1C-FC83E4BD4798@petsc.dev> <87h7ssgg0g.fsf@jedbrown.org> <80DC4DC1-8D9B-43BF-9947-F47694FE68E9@petsc.dev> <87blj0gew3.fsf@jedbrown.org> <878se4gdd6.fsf@jedbrown.org> <3B4D8471-F863-49C8-AAD7-4A4ACA3BE78A@petsc.dev> <87364cgczg.fsf@jedbrown.org> <79E082F4-0261-4F32-9781-861B2B650511@petsc.dev> <87y2m3g7mp.fsf@jedbrown.org> <1BA78983-882E-404D-983D-B432D17E6421@petsc.dev> <87a6yjg3o5.fsf@jedbrown.org> <9EEB2628-D6ED-4466-A629-33EAC73BCE4C@petsc.dev> <386068BC-7972-455E-A9E8-C09F9DCF58BD@petsc.dev> Message-ID: <4463F108-D33B-46C2-80BC-EDEBB3BBE140@petsc.dev> Mark, No problem, we'll have a few more automatic checks in PETSc due to this to help everyone in the future debug these difficult situations a little easier. Barry > On Aug 27, 2020, at 3:26 PM, Mark Lohry wrote: > > Alright, this time it crashed with a bus error before petsc had even been initialized or anything in blas had ever been called. I'm told there was also a known network failure on this cluster a few days ago that took out one rack, so now I'm reasonably convinced there are legitimate hardware faults elsewhere. > > Looking like a wild goose chase on the software side, but all the help is hugely appreciated. > > On Thu, Aug 27, 2020 at 10:52 AM Barry Smith > wrote: > > Thanks, > > So this means that all the double precision array pointers that PETSc is passing into these BLAS calls are addressable. Which means nothing has corrupted any of these pointers before the calls. > > What my patch did. Before each BLAS call, for each double array argument it set a special exception handler and then accessed the first entry in the array. Since the exception handler was never called this means that the first entry of each array was accessible and would not produce a SEGV or SIGBUS. > > What else could be corrupted. > > 1) the size arguments passed to the BLAS calls, if they were too large they could result in accessing incorrect memory but IMHO that would usually produce a SEGV not a SIGBUS. It is hard to put a check in the code because these sizes are problem dependent and there is no way to know if they are wrong. > > 2) corruption of the stack? > > 3) hardware issue due to overheating or bad memory etc. I assume the MPI rank that crashes changes for each crashing run. I am adding code to our patch branch to print the node name that hopefully is constant for all runs, then one can see if the problem is always on the same node. Patch attached > > > Can you try with a very different BLAS implementation? What are you using now? > > For example you could configure PETSc with --download-f2cblaslapack or if you are using MKL switch to non-MKL, or if you are using the system BLAS switch to MKL. > > Barry > > We can also replace the BLAS calls with direct C and see what happens but let's only do that after you try a different BLAS. > > > > > >> On Aug 27, 2020, at 8:53 AM, Mark Lohry > wrote: >> >> It was built with --with-debugging=1 >> >> On Thu, Aug 27, 2020 at 9:44 AM Barry Smith > wrote: >> >> Mark, >> >> Did i tell you that this has to be built with the configure option --with-debugging=1 and won't be turned off with --with-debugging=0 ? 
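The check Barry describes above -- install an exception handler, touch the first entry of each double array, then make the BLAS call -- can be imitated in plain C with sigaction() and sigsetjmp()/siglongjmp(). The sketch below is only an illustration of that idea, not the actual PETSc patch; probe_double() and its helpers are invented names. Note that on x86 an unmapped address normally raises SIGSEGV while a merely misaligned one is tolerated by the hardware, which is part of why a SIGBUS out of dgemv is so surprising there; several other architectures do raise SIGBUS for misaligned loads.

/* probe.c: sketch of probing whether the first entry of an array is readable
 * before handing it to BLAS, so a corrupted pointer is reported instead of
 * crashing inside dgemv.  Illustration only; not the PETSc patch. */
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

static sigjmp_buf probe_env;

static void probe_handler(int sig)
{
  siglongjmp(probe_env, sig);            /* jump back out of the faulting access */
}

/* Returns 0 if ptr[0] is readable, otherwise the signal number that fired. */
static int probe_double(const volatile double *ptr)
{
  struct sigaction sa, old_segv, old_bus;
  int sig;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = probe_handler;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, &old_segv);
  sigaction(SIGBUS,  &sa, &old_bus);

  if ((sig = sigsetjmp(probe_env, 1)) == 0) {
    volatile double tmp = ptr[0];        /* the actual probe */
    (void)tmp;
  }
  sigaction(SIGSEGV, &old_segv, NULL);   /* restore the previous handlers */
  sigaction(SIGBUS,  &old_bus,  NULL);
  return sig;
}

int main(void)
{
  double good[4] = {1.0, 2.0, 3.0, 4.0};
  printf("good pointer: signal %d\n", probe_double(good));
  printf("bad pointer:  signal %d\n", probe_double((const double *)16));
  return 0;
}

A probe like this only proves the first entry is addressable; a wrong size argument can still walk past the end of a valid allocation, which is Barry's point 1) above, and it says nothing about transient hardware faults, which is his point 3).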
>> >> Barry >> >> >>> On Aug 27, 2020, at 8:10 AM, Mark Lohry > wrote: >>> >>> Barry, no output from that patch i'm afraid: >>> >>> 54 KSP Residual norm 3.215013886664e+03 >>> 55 KSP Residual norm 3.049105434513e+03 >>> 56 KSP Residual norm 2.859123916860e+03 >>> [929]PETSC ERROR: ------------------------------------------------------------------------ >>> [929]PETSC ERROR: Caught signal number 7 BUS: Bus Error, possibly illegal memory access >>> [929]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>> [929]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>> [929]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >>> [929]PETSC ERROR: likely location of problem given in stack below >>> [929]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >>> [929]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >>> [929]PETSC ERROR: INSTEAD the line number of the start of the function >>> [929]PETSC ERROR: is given. >>> [929]PETSC ERROR: [929] BLASgemv line 1406 /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c >>> [929]PETSC ERROR: [929] MatSolve_SeqBAIJ_N_NaturalOrdering line 1378 /home/mlohry/petsc/src/mat/impls/baij/seq/baijfact.c >>> [929]PETSC ERROR: [929] MatSolve line 3354 /home/mlohry/petsc/src/mat/interface/matrix.c >>> [929]PETSC ERROR: [929] PCApply_ILU line 201 /home/mlohry/petsc/src/ksp/pc/impls/factor/ilu/ilu.c >>> [929]PETSC ERROR: [929] PCApply line 426 /home/mlohry/petsc/src/ksp/pc/interface/precon.c >>> [929]PETSC ERROR: [929] KSP_PCApply line 279 /home/mlohry/petsc/include/petsc/private/kspimpl.h >>> [929]PETSC ERROR: [929] KSPSolve_PREONLY line 16 /home/mlohry/petsc/src/ksp/ksp/impls/preonly/preonly.c >>> [929]PETSC ERROR: [929] KSPSolve_Private line 590 /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >>> [929]PETSC ERROR: [929] KSPSolve line 848 /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >>> [929]PETSC ERROR: [929] PCApply_ASM line 441 /home/mlohry/petsc/src/ksp/pc/impls/asm/asm.c >>> [929]PETSC ERROR: [929] PCApply line 426 /home/mlohry/petsc/src/ksp/pc/interface/precon.c >>> [929]PETSC ERROR: [929] KSP_PCApply line 279 /home/mlohry/petsc/include/petsc/private/kspimpl.h >>> srun: Job step aborted: Waiting up to 47 seconds for job step to finish. >>> [929]PETSC ERROR: [929] KSPFGMRESCycle line 108 /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>> [929]PETSC ERROR: [929] KSPSolve_FGMRES line 274 /home/mlohry/petsc/src/ksp/ksp/impls/gmres/fgmres/fgmres.c >>> [929]PETSC ERROR: [929] KSPSolve_Private line 590 /home/mlohry/petsc/src/ksp/ksp/interface/itfunc.c >>> >>> On Mon, Aug 24, 2020 at 6:47 PM Mark Lohry > wrote: >>> I don't think I do. Running a much smaller case with the same models I get the attached report from valgrind --show-leak-kinds=all --leak-check=full --track-origins=yes. I only see some HDF5 stuff and OpenMPI that I think are false positives. >>> >>> ==1286950== Memcheck, a memory error detector >>> ==1286950== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. 
>>> ==1286950== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun with -h for copyright info >>> ==1286950== Command: ./verification_testing --gtest_filter=DrivenCavity3D.Re100_BackwardEulerILU1_16x16N2_Quadrature1 --petsc_time_integrator=arkimex --petsc_arkimex_type=l2 >>> ==1286950== Parent PID: 1286932 >>> ==1286950== >>> --1286950-- >>> --1286950-- Valgrind options: >>> --1286950-- --show-leak-kinds=all >>> --1286950-- --leak-check=full >>> --1286950-- --track-origins=yes >>> --1286950-- --log-file=valgrind-out.txt >>> --1286950-- -v >>> --1286950-- Contents of /proc/version: >>> --1286950-- Linux version 5.4.0-29-generic (buildd at lgw01-amd64-035) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020 >>> --1286950-- >>> --1286950-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-rdtscp-sse3-ssse3-avx >>> --1286950-- Page sizes: currently 4096, max supported 4096 >>> --1286950-- Valgrind library directory: /usr/lib/x86_64-linux-gnu/valgrind >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/verification_testing >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/ld-2.31.so >>> --1286950-- Considering /usr/lib/x86_64-linux-gnu/ld-2.31.so .. >>> --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) >>> --1286950-- Considering /lib/x86_64-linux-gnu/ld-2.31.so .. >>> --1286950-- .. CRC mismatch (computed 387b17ea wanted d28cf5ef) >>> --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.31.so .. >>> --1286950-- .. CRC is valid >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux >>> --1286950-- object doesn't have a symbol table >>> --1286950-- object doesn't have a dynamic symbol table >>> --1286950-- Scheduler: using generic scheduler lock implementation. >>> --1286950-- Reading suppressions file: /usr/lib/x86_64-linux-gnu/valgrind/default.supp >>> ==1286950== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-1286950-by-mlohry-on-??? >>> ==1286950== embedded gdbserver: writing to /tmp/vgdb-pipe-to-vgdb-from-1286950-by-mlohry-on-??? >>> ==1286950== embedded gdbserver: shared mem /tmp/vgdb-pipe-shared-mem-vgdb-1286950-by-mlohry-on-??? >>> ==1286950== >>> ==1286950== TO CONTROL THIS PROCESS USING vgdb (which you probably >>> ==1286950== don't want to do, unless you know exactly what you're doing, >>> ==1286950== or are doing some strange experiment): >>> ==1286950== /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 ...command... >>> ==1286950== >>> ==1286950== TO DEBUG THIS PROCESS USING GDB: start GDB like this >>> ==1286950== /path/to/gdb ./verification_testing >>> ==1286950== and then give GDB the following command >>> ==1286950== target remote | /usr/lib/x86_64-linux-gnu/valgrind/../../bin/vgdb --pid=1286950 >>> ==1286950== --pid is optional if only one valgrind process is running >>> ==1286950== >>> --1286950-- REDIR: 0x4022d80 (ld-linux-x86-64.so.2:strlen) redirected to 0x580c9ce2 (???) >>> --1286950-- REDIR: 0x4022b50 (ld-linux-x86-64.so.2:index) redirected to 0x580c9cfc (???) >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_core-amd64-linux.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so >>> --1286950-- object doesn't have a symbol table >>> ==1286950== WARNING: new redirection conflicts with existing -- ignoring it >>> --1286950-- old: 0x04022d80 (strlen ) R-> (0000.0) 0x580c9ce2 ??? 
>>> --1286950-- new: 0x04022d80 (strlen ) R-> (2007.0) 0x0483f060 strlen >>> --1286950-- REDIR: 0x401f560 (ld-linux-x86-64.so.2:strcmp) redirected to 0x483ffd0 (strcmp) >>> --1286950-- REDIR: 0x40232e0 (ld-linux-x86-64.so.2:mempcpy) redirected to 0x4843a20 (mempcpy) >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/initialization/libinitialization.so >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libgoverning_equations.so >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/time_stepping/libtime_stepping.so >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libboundary_conditions.so >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libsolution_monitors.so >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/governing_equations/libfluxtypes.so >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/algebraic_solvers/libalgebraic_solvers.so >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/program_options/libprogram_options.so >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_filesystem.so.1.73.0 >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0 >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so.40.20.1 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libpthread-2.31.so >>> --1286950-- Considering /usr/lib/debug/.build-id/77/5cbbfff814456660786780b0b3b40096b4c05e.debug .. >>> --1286950-- .. build-id is valid >>> --1286948-- Reading syms from /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libpetsc.so.3.13.3 >>> --1286937-- Reading syms from /home/mlohry/dev/cmake-build/parallel/libparallel.so >>> --1286937-- Reading syms from /home/mlohry/dev/cmake-build/logger/liblogger.so >>> --1286937-- Reading syms from /home/mlohry/dev/cmake-build/spatial_discretization/libdiscretization.so >>> --1286945-- Reading syms from /home/mlohry/dev/cmake-build/utils/libutils.so >>> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28 >>> --1286938-- object doesn't have a symbol table >>> --1286949-- Reading syms from /usr/lib/x86_64-linux-gnu/libm-2.31.so >>> --1286949-- Considering /usr/lib/x86_64-linux-gnu/libm-2.31.so .. >>> --1286947-- .. CRC mismatch (computed 327d785f wanted 751f5509) >>> --1286947-- Considering /lib/x86_64-linux-gnu/libm-2.31.so .. >>> --1286938-- .. CRC mismatch (computed 327d785f wanted 751f5509) >>> --1286937-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libm-2.31.so .. >>> --1286950-- .. CRC is valid >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libc-2.31.so >>> --1286950-- Considering /usr/lib/x86_64-linux-gnu/libc-2.31.so .. >>> --1286951-- .. CRC mismatch (computed a6f43087 wanted 6555436e) >>> --1286951-- Considering /lib/x86_64-linux-gnu/libc-2.31.so .. >>> --1286947-- .. CRC mismatch (computed a6f43087 wanted 6555436e) >>> --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.31.so .. >>> --1286950-- .. 
CRC is valid >>> --1286940-- Reading syms from /home/mlohry/dev/cmake-build/file_io/libfileio.so >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_program_options.so.1.73.0 >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_serialization.so.1.73.0 >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libhwloc.so.15.1.0 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libsuperlu_dist.so.6.3.0 >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 >>> --1286950-- object doesn't have a symbol table >>> --1286937-- Reading syms from /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 >>> --1286937-- object doesn't have a symbol table >>> --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0 >>> --1286939-- object doesn't have a symbol table >>> --1286947-- Reading syms from /usr/lib/x86_64-linux-gnu/libdl-2.31.so >>> --1286947-- Considering /usr/lib/x86_64-linux-gnu/libdl-2.31.so .. >>> --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) >>> --1286947-- Considering /lib/x86_64-linux-gnu/libdl-2.31.so .. >>> --1286947-- .. CRC mismatch (computed 4fd191ca wanted df8dd39a) >>> --1286947-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libdl-2.31.so .. >>> --1286947-- .. CRC is valid >>> --1286937-- Reading syms from /home/mlohry/dev/cmake-build/external/petsc/arch-linux2-c-opt/lib/libmetis.so >>> --1286937-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0 >>> --1286942-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log_setup.so.1.73.0 >>> --1286942-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0 >>> --1286942-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_regex.so.1.73.0 >>> --1286949-- Reading syms from /home/mlohry/dev/cmake-build/basis_functions/libbasis_functions.so >>> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 >>> --1286944-- object doesn't have a symbol table >>> --1286951-- Reading syms from /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so >>> --1286951-- object doesn't have a symbol table >>> --1286943-- Reading syms from /home/mlohry/dev/cmake-build/external_install/lib/libhdf5.so.103.1.0 >>> --1286951-- Reading syms from /home/mlohry/dev/cmake-build/external/tinyxml2-build/libtinyxml2.so.6.1.0 >>> --1286944-- Reading syms from /home/mlohry/dev/cmake-build/boost_install/lib/libboost_iostreams.so.1.73.0 >>> --1286944-- Reading syms from /usr/lib/x86_64-linux-gnu/libz.so.1.2.11 >>> --1286944-- object doesn't have a symbol table >>> --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0 >>> --1286951-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libutil-2.31.so >>> --1286946-- Considering /usr/lib/x86_64-linux-gnu/libutil-2.31.so .. >>> --1286946-- .. CRC mismatch (computed 4639aba5 wanted ceb246b4) >>> --1286946-- Considering /lib/x86_64-linux-gnu/libutil-2.31.so .. >>> --1286946-- .. 
CRC mismatch (computed 4639aba5 wanted ceb246b4) >>> --1286948-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/libutil-2.31.so .. >>> --1286939-- .. CRC is valid >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libudev.so.1.6.17 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.1 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libxcb.so.1.1.0 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/librt-2.31.so >>> --1286950-- Considering /usr/lib/x86_64-linux-gnu/librt-2.31.so .. >>> --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) >>> --1286950-- Considering /lib/x86_64-linux-gnu/librt-2.31.so .. >>> --1286950-- .. CRC mismatch (computed a9acc0ce wanted cb4905a3) >>> --1286950-- Considering /usr/lib/debug/lib/x86_64-linux-gnu/librt-2.31.so .. >>> --1286950-- .. CRC is valid >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/libquadmath.so.0.0.0 >>> --1286950-- object doesn't have a symbol table >>> --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 >>> --1286945-- Considering /usr/lib/x86_64-linux-gnu/libXau.so.6.0.0 .. >>> --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) >>> --1286945-- Considering /lib/x86_64-linux-gnu/libXau.so.6.0.0 .. >>> --1286945-- .. CRC mismatch (computed 7de9b6ad wanted e8a17129) >>> --1286945-- object doesn't have a symbol table >>> --1286945-- Reading syms from /usr/lib/x86_64-linux-gnu/libXdmcp.so.6.0.0 >>> --1286942-- object doesn't have a symbol table >>> --1286942-- Reading syms from /usr/lib/x86_64-linux-gnu/libbsd.so.0.10.0 >>> --1286942-- object doesn't have a symbol table >>> --1286950-- REDIR: 0x6516600 (libc.so.6:memmove) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515900 (libc.so.6:strncpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516930 (libc.so.6:strcasecmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515220 (libc.so.6:strcat) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515960 (libc.so.6:rindex) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6517dd0 (libc.so.6:rawmemchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6532e60 (libc.so.6:wmemchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65329a0 (libc.so.6:wcscmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516760 (libc.so.6:mempcpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516590 (libc.so.6:bcmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515890 (libc.so.6:strncmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65152d0 (libc.so.6:strcmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65166c0 (libc.so.6:memset) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6532960 (libc.so.6:wcschr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65157f0 (libc.so.6:strnlen) redirected to 0x48331d0 
(_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65153b0 (libc.so.6:strcspn) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516980 (libc.so.6:strncasecmp) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515350 (libc.so.6:strcpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516ad0 (libc.so.6:memcpy@@GLIBC_2.14) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65340d0 (libc.so.6:wcsnlen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65329e0 (libc.so.6:wcscpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65159a0 (libc.so.6:strpbrk) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515280 (libc.so.6:index) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65157b0 (libc.so.6:strlen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x651ed20 (libc.so.6:memrchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65169d0 (libc.so.6:strcasecmp_l) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516550 (libc.so.6:memchr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6532ab0 (libc.so.6:wcslen) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6515c60 (libc.so.6:strspn) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65168d0 (libc.so.6:stpncpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516870 (libc.so.6:stpcpy) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6517e10 (libc.so.6:strchrnul) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516a20 (libc.so.6:strncasecmp_l) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x6516470 (libc.so.6:strstr) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65a3750 (libc.so.6:__memcpy_chk) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286938-- REDIR: 0x6527a30 (libc.so.6:__strrchr_sse2) redirected to 0x483ea70 (__strrchr_sse2) >>> --1286938-- REDIR: 0x6511c90 (libc.so.6:calloc) redirected to 0x483dce0 (calloc) >>> --1286938-- REDIR: 0x6510260 (libc.so.6:malloc) redirected to 0x483b780 (malloc) >>> --1286938-- REDIR: 0x6531c40 (libc.so.6:memcpy at GLIBC_2.2.5) redirected to 0x4840100 (memcpy at GLIBC_2.2.5) >>> --1286938-- REDIR: 0x6527d30 (libc.so.6:__strlen_sse2) redirected to 0x483efa0 (__strlen_sse2) >>> --1286938-- REDIR: 0x65f4ac0 (libc.so.6:__strncmp_sse42) redirected to 0x483f7c0 (__strncmp_sse42) >>> --1286938-- REDIR: 0x6510850 (libc.so.6:free) redirected to 0x483c9d0 (free) >>> --1286938-- REDIR: 0x6532070 (libc.so.6:__memset_sse2_unaligned) redirected to 0x48428e0 (memset) >>> --1286938-- REDIR: 0x6603350 (libc.so.6:__memcmp_sse4_1) redirected to 0x4842150 (__memcmp_sse4_1) >>> --1286938-- REDIR: 0x6520520 (libc.so.6:__strcmp_sse2_unaligned) redirected to 0x483fed0 (strcmp) >>> --1286938-- REDIR: 0x61d0c10 (libstdc++.so.6:operator new(unsigned long)) redirected to 0x483bdf0 (operator new(unsigned long)) >>> --1286938-- REDIR: 0x61cee60 (libstdc++.so.6:operator delete(void*)) redirected to 0x483cf50 (operator delete(void*)) >>> --1286938-- REDIR: 0x61d0c70 (libstdc++.so.6:operator new[](unsigned long)) redirected to 0x483c510 (operator new[](unsigned long)) >>> --1286938-- REDIR: 0x61cee90 (libstdc++.so.6:operator delete[](void*)) redirected to 0x483d6e0 (operator delete[](void*)) >>> --1286938-- REDIR: 0x65275f0 
(libc.so.6:__strchr_sse2) redirected to 0x483eb90 (__strchr_sse2) >>> --1286950-- REDIR: 0x6511000 (libc.so.6:realloc) redirected to 0x483df30 (realloc) >>> --1286950-- REDIR: 0x6527820 (libc.so.6:__strchrnul_sse2) redirected to 0x4843540 (strchrnul) >>> --1286950-- REDIR: 0x6531560 (libc.so.6:__strstr_sse2_unaligned) redirected to 0x4843c20 (strstr) >>> --1286950-- REDIR: 0x6531c20 (libc.so.6:__mempcpy_sse2_unaligned) redirected to 0x4843660 (mempcpy) >>> --1286950-- REDIR: 0x652d2a0 (libc.so.6:__strncpy_sse2_unaligned) redirected to 0x483f560 (__strncpy_sse2_unaligned) >>> --1286950-- REDIR: 0x6515830 (libc.so.6:strncat) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> --1286950-- REDIR: 0x65305b0 (libc.so.6:__strncat_sse2_unaligned) redirected to 0x483ede0 (strncat) >>> --1286950-- REDIR: 0x6516120 (libc.so.6:__GI_strstr) redirected to 0x4843ca0 (__strstr_sse2) >>> --1286950-- REDIR: 0x6522360 (libc.so.6:__rawmemchr_sse2) redirected to 0x4843580 (rawmemchr) >>> --1286950-- REDIR: 0x65faea0 (libc.so.6:__strcasecmp_avx) redirected to 0x483f830 (strcasecmp) >>> --1286950-- REDIR: 0x65fc520 (libc.so.6:__strncasecmp_avx) redirected to 0x483f910 (strncasecmp) >>> --1286950-- REDIR: 0x65f98a0 (libc.so.6:__strspn_sse42) redirected to 0x4843ef0 (strspn) >>> --1286950-- REDIR: 0x65f9620 (libc.so.6:__strcspn_sse42) redirected to 0x4843e10 (strcspn) >>> --1286948-- REDIR: 0x6522030 (libc.so.6:__memchr_sse2) redirected to 0x4840050 (memchr) >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Discarding syms at 0x4a96240-0x4a96d47 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_posix.so (have_dinfo 1) >>> --1286948-- Discarding syms at 0x4a9b1c0-0x4a9b937 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_sysv.so (have_dinfo 1) >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Discarding syms at 0x4a96120-0x4a966b0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_weighted.so (have_dinfo 1) >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- REDIR: 0x64bc670 (libc.so.6:setenv) redirected to 0x4844480 (setenv) >>> --1286948-- Reading syms from 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Discarding syms at 0x8d053e0-0x8d07391 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_flux.so (have_dinfo 1) >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so >>> --1286948-- object doesn't have a symbol table >>> --1286948-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so >>> --1286948-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Discarding syms at 0x8d04180-0x8d045b0 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_none.so (have_dinfo 1) >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so >>> --1286950-- object doesn't have a symbol table >>> --1286950-- Reading syms from /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so >>> --1286950-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so >>> --1286946-- object doesn't have a symbol table >>> 
--1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9ebf0a0-0x9ebf490 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_tool.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9eca300-0x9ecbee8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_orted.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9ed1220-0x9ed24e7 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_hnp.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9ed8240-0x9ed8c88 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_novm.so (have_dinfo 1) >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9ebf0e0-0x9ebf417 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_tool.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9ecf320-0x9ed1239 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_hnp.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9ed73a0-0x9ed9ccc in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_orted.so (have_dinfo 1) >>> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so >>> --1286936-- object doesn't have a symbol table >>> --1286936-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so >>> --1286936-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 >>> --1286946-- object doesn't have a 
symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- REDIR: 0x652cc70 (libc.so.6:__strcpy_sse2_unaligned) redirected to 0x483f090 (strcpy) >>> --1286946-- REDIR: 0x65a3810 (libc.so.6:__memmove_chk) redirected to 0x48331d0 (_vgnU_ifunc_wrapper) >>> ==1286946== WARNING: new redirection conflicts with existing -- ignoring it >>> --1286946-- old: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2030.0) 0x04843b10 __memcpy_chk >>> --1286946-- new: 0x06531c30 (__memcpy_chk_sse2_un) R-> (2024.0) 0x048434d0 __memmove_chk >>> --1286946-- REDIR: 0x6531c30 (libc.so.6:__memcpy_chk_sse2_unaligned) redirected to 0x4843b10 (__memcpy_chk) >>> --1286946-- REDIR: 0x65129b0 (libc.so.6:posix_memalign) redirected to 0x483e1e0 (posix_memalign) >>> --1286946-- Discarding syms at 0x9f15280-0x9f32932 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_openib.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f7c4c0-0x9f7ded8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_verbs.so.40.20.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f620c0-0x9f71483 in /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f9ba10-0x9fd22ee in /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_monitoring.so.50.10.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9f4d400-0x9f50c19 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_vprotocol_pessimist.so (have_dinfo 1) >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/libpsm1/libpsm_infinipath.so.1.16 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms 
from /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- REDIR: 0x6517140 (libc.so.6:strcasestr) redirected to 0x4843f80 (strcasestr) >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9f4d5c0-0x9f4f5a1 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9fee680-0x9ff096c in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_psm2.so (have_dinfo 1) >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_monitoring.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so >>> --1286946-- object doesn't have a symbol table >>> --1286946-- Discarding syms at 0x9f724a0-0x9f787b5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mtl_ofi.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa827f80-0xa8e14c4 in /usr/lib/x86_64-linux-gnu/libfabric.so.1.9.15 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f94830-0x9fbafce in /usr/lib/libpsm1/libpsm_infinipath.so.1.16 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9fe5580-0x9fe8f71 in /usr/lib/x86_64-linux-gnu/libuuid.so.1.3.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f56420-0x9f5cec0 in /usr/lib/x86_64-linux-gnu/libinfinipath.so.4.0 
(have_dinfo 1) >>> --1286946-- Discarding syms at 0xa929f10-0xa93d5fc in /usr/lib/x86_64-linux-gnu/librdmacm.so.1.2.28.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa94b0c0-0xa95a483 in /usr/lib/x86_64-linux-gnu/libibverbs.so.1.8.28.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa968860-0xa9adf12 in /usr/lib/x86_64-linux-gnu/libpsm2.so.2.1 (have_dinfo 1) >>> --1286946-- Discarding syms at 0xa9e7a10-0xaa1e2ee in /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200.26.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f80410-0x9f84e27 in /usr/lib/x86_64-linux-gnu/libnuma.so.1.0.0 (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f103e0-0x9f15fd5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_cm.so (have_dinfo 1) >>> --1286946-- Discarding syms at 0x9f471e0-0x9f47ce0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_monitoring.so (have_dinfo 1) >>> ==1286946== Thread 3: >>> ==1286946== Syscall param writev(vector[...]) points to uninitialised byte(s) >>> ==1286946== at 0x658A48D: __writev (writev.c:26) >>> ==1286946== by 0x658A48D: writev (writev.c:24) >>> ==1286946== by 0x8DF9B4C: pmix_ptl_base_send_handler (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x7CC413E: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286946== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286946== by 0x8DBDD55: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286946== by 0x6595102: clone (clone.S:95) >>> ==1286946== Address 0xa28fdcf is 127 bytes inside a block of size 5,120 alloc'd >>> ==1286946== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286946== by 0x8DE155A: pmix_bfrop_buffer_extend (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x8DE3F4A: pmix_bfrops_base_pack_byte (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x8DE4900: pmix_bfrops_base_pack_buf (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x8DE4175: pmix_bfrops_base_pack (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x8D7CF91: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286946== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286946== by 0x8DBDD55: ??? (in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25) >>> ==1286946== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286946== by 0x6595102: clone (clone.S:95) >>> ==1286946== Uninitialised value was created by a stack allocation >>> ==1286946== at 0x9F048D6: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so) >>> ==1286946== >>> --1286944-- Discarding syms at 0xaa4d220-0xaa5796a in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at 0xaa4d220---1286948-- Discarding syms at 0xaae1100-0xaae7d70 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmp--1286945-- Discarding syms at 0xaae1100-0xaae7d70 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_monitoring.so (have_dinfo 1) >>> --1286945-- Discarding syms at 0x9f69420-0x9f--1286938-- REDIR: 0x61cee70 (libstdc++.so.6:operator delete(void*, unsigned long)) redirected to --1286937-- REDIR: 0x61cee70 (libstdc++.so.6:opera--1286946-- REDIR: 0x652e970 (libc.so.6:__stpncpy_sse2_unaligned) redirected to 0x48427e0 (stpncpy) >>> --1286942-- REDIR: 0x6527ed0 (libc.so.6:__strnlen_sse2) redirected to 0x483eee0 (strnlen) >>> --1286944-- REDIR: 0x652fcc0 (libc.so.6:__strcat_sse2_unaligned) redirected to 0x483ec20 (strcat) >>> --1286951-- REDIR: 0x65113d0 (libc.so.6:memalign) redirected to 0x483e2a0 (memalign) >>> --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so >>> --1286951-- object doesn't have a symbol table >>> --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so >>> --1286951-- object doesn't have a symbol table >>> --1286941-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 >>> --1286941-- object doesn't have a symbol table >>> --1286951-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so >>> --1286951-- object doesn't have a symbol table >>> --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so >>> --1286939-- object doesn't have a symbol table >>> --1286939-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so >>> --1286939-- object doesn't have a symbol table >>> --1286943-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so >>> --1286943-- object doesn't have a symbol table >>> --1286943-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so >>> --1286943-- object doesn't have a symbol table >>> --1286943-- Reading syms from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so >>> --1286943-- object doesn't have a symbol table >>> --1286938-- REDIR: 0x65a3b00 (libc.so.6:__strcpy_chk) redirected to 0x48435c0 (__strcpy_chk) >>> --1286939-- Discarding syms at 0x9f1d660-0x9f371d6 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f5afa0-0x9f8f8b6 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_romio321.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fa0640-0x9fa42d9 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_io_ompio.so (have_dinfo 1) >>> 
--1286939-- Discarding syms at 0x9f4c160-0x9f4dc58 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_inter.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa7fc270-0xa804f00 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_basic.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fee3a0-0x9ff134e in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sm.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa80a240-0xa80aa8d in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_sm.so.40.20.0 (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa80f0e0-0xa80f8bb in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_self.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xaa460c0-0xaa47947 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_sync.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xaa613e0-0xaa7730f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_libnbc.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xaa849c0-0xaa8a845 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_coll_tuned.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ee1320-0x9ee3567 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_bml_r2.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9eebc40-0x9ef4ad7 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_tcp.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f02600-0x9f08cd8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f40200-0x9f4126e in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_self.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9eda4e0-0x9edb4c5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_mpool_hugepage.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ed32c0-0x9ed4afe in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_rcache_grdma.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ebf160-0x9ebfe95 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_basic.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ece140-0x9ecebed in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_allocator_bucket.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ec92a0-0x9ec9aa2 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_errmgr_default_app.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8eae0e0-0x8eae4a7 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_stdfd.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8eb3220-0x8eb3c27 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_syslog.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8eb80e0-0x8eb90b7 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_plog_default.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8ea6380-0x8ea97b3 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_preg_native.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e5a740-0x8e5f859 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_usock.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e67be0-0x8e743f0 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_ptl_tcp.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x84da200-0x84daa5d in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_psec_native.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d322b0-0x8d34bfc in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v21.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e29480-0x8e3b70a in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v20.so (have_dinfo 1) >>> --1286939-- 
Discarding syms at 0x8d3c2b0-0x8d3ed5c in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v3.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e45340-0x8e502da in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_bfrops_v12.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e901a0-0x8e908a7 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_pshmem_mmap.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d05520-0x8d06783 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e7b460-0x8e8aaa4 in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_hash.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d44520-0x8d4556a in /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds12.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8e97600-0x8ea0fa1 in /usr/lib/x86_64-linux-gnu/pmix/lib/libmca_common_dstore.so.1.0.2 (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d109c0-0x8d27dcf in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext3x.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x8d5b280-0x8dfdffb in /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.25 (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9ec40a0-0x9ec4490 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_state_app.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x84d2580-0x84d518f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4a96120-0x4a9644f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_flux.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4aa0100-0x4aa03e7 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_orte.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x84c74a0-0x84c901f in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_ompi.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4aa5260-0x4aa58e9 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_schizo_slurm.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4a9b420-0x4a9bcdf in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_reachable_netlink.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x84e7460-0x84f52ca in /usr/lib/x86_64-linux-gnu/libnl-3.so.200.26.0 (have_dinfo 1) >>> --1286939-- Discarding syms at 0x4a90360-0x4a91107 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_shmem_mmap.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f46220-0x9f474cc in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fbtl_posix.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f0f180-0x9f0f78d in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fs_ufs.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xaa94540-0xaa96a4a in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_sm.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xaa9f6c0-0xaab44d0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_pt2pt.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xaabe820-0xaad8ee0 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_osc_rdma.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9efc080-0x9efc1e1 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_sm.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fab2a0-0x9fb1341 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_two_phase.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9f140c0-0x9f14299 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_individual.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fb72a0-0x9fbb791 in 
/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fd52a0-0x9fda794 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_dynamic_gen2.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fe02e0-0x9fe59a5 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_fcoll_vulcan.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa815460-0xa8177ab in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_sm.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa81e260-0xa82033d in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_individual.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0xa8273e0-0xa8297d8 in /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_sharedfp_lockedfile.so (have_dinfo 1) >>> --1286939-- Discarding syms at 0x9fc85e0-0x9fce8ef in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmca_common_ompio.so.41.19.3 (have_dinfo 1) >>> ==1286939== >>> ==1286939== HEAP SUMMARY: >>> ==1286939== in use at exit: 74,054 bytes in 223 blocks >>> ==1286939== total heap usage: 22,405,782 allocs, 22,405,559 frees, 34,062,479,959 bytes allocated >>> ==1286939== >>> ==1286939== Searching for pointers to 223 not-freed blocks >>> ==1286939== Checked 3,415,912 bytes >>> ==1286939== >>> ==1286939== Thread 1: >>> ==1286939== 1 bytes in 1 blocks are definitely lost in loss record 1 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x9F6A4B6: ??? >>> ==1286939== by 0x9F47373: ??? >>> ==1286939== by 0x68E3B9B: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4BA1734: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 8 bytes in 1 blocks are still reachable in loss record 2 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x764724C: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >>> ==1286939== by 0x7657B9A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >>> ==1286939== by 0x7645679: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) >>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >>> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >>> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >>> ==1286939== by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so ) >>> ==1286939== by 0x3: ??? >>> ==1286939== by 0x1FFEFFF926: ??? >>> ==1286939== by 0x1FFEFFF93D: ??? >>> ==1286939== by 0x1FFEFFF987: ??? >>> ==1286939== by 0x1FFEFFF9A7: ??? 
>>> ==1286939== >>> ==1286939== 8 bytes in 1 blocks are definitely lost in loss record 3 of 44 >>> ==1286939== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F69B6F: ??? >>> ==1286939== by 0x9F1CDED: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 13 bytes in 2 blocks are still reachable in loss record 4 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x7CC3657: event_config_avoid_method (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x68FEB5A: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== >>> ==1286939== 15 bytes in 1 blocks are indirectly lost in loss record 5 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x9EDB189: ??? >>> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6907C25: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4BA16D5: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 15 bytes in 1 blocks are definitely lost in loss record 6 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x9F5655C: ??? >>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >>> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >>> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >>> ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) >>> ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) >>> ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) >>> ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) >>> ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) >>> ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) >>> ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) >>> ==1286939== >>> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 7 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F1CBEB: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 8 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F1CC66: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? 
>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 16 bytes in 1 blocks are definitely lost in loss record 9 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F1CCDA: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 25 bytes in 1 blocks are still reachable in loss record 10 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x68F27BD: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B956B6: ompi_pml_v_output_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B95259: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B93FAE: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4BA1734: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== >>> ==1286939== 30 bytes in 1 blocks are definitely lost in loss record 11 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0xA9A859B: ??? 
>>> ==1286939== by 0x4011B89: call_init.part.0 (dl-init.c:72) >>> ==1286939== by 0x4011C90: call_init (dl-init.c:30) >>> ==1286939== by 0x4011C90: _dl_init (dl-init.c:119) >>> ==1286939== by 0x65D6784: _dl_catch_exception (dl-error-skeleton.c:182) >>> ==1286939== by 0x401642C: dl_open_worker (dl-open.c:758) >>> ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) >>> ==1286939== by 0x40155F9: _dl_open (dl-open.c:837) >>> ==1286939== by 0x72DE34B: dlopen_doit (dlopen.c:66) >>> ==1286939== by 0x65D6727: _dl_catch_exception (dl-error-skeleton.c:208) >>> ==1286939== by 0x65D67F2: _dl_catch_error (dl-error-skeleton.c:227) >>> ==1286939== by 0x72DEB58: _dlerror_run (dlerror.c:170) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are still reachable in loss record 12 of 44 >>> ==1286939== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CC353E: event_get_supported_methods (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x68FEA98: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 13 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x84D2B0A: ??? >>> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 14 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x84D2BCE: ??? 
>>> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 15 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x84D2CB2: ??? >>> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 16 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x84D2D91: ??? >>> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 17 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E81BD8: ??? >>> ==1286939== by 0x8E89F4B: ??? >>> ==1286939== by 0x8D84A0D: ??? >>> ==1286939== by 0x8DF79C1: ??? >>> ==1286939== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x8DBDD55: ??? >>> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286939== by 0x6595102: clone (clone.S:95) >>> ==1286939== >>> ==1286939== 32 bytes in 1 blocks are definitely lost in loss record 18 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== by 0x84D330E: ??? 
>>> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 36 (32 direct, 4 indirect) bytes in 1 blocks are definitely lost in loss record 19 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x4B94C09: mca_pml_base_pml_check_selected (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x9F1E1E1: ??? >>> ==1286939== by 0x4BA1A09: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 20 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CFF4B6: ??? (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x7CC5E26: event_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x68FE8E4: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== >>> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 21 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CFF4B6: ??? (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x7CCF377: evsig_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC5E39: event_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x68FE8E4: ??? 
(in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== >>> ==1286939== 40 bytes in 1 blocks are still reachable in loss record 22 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CFF4B6: ??? (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x7CCB997: evutil_secure_rng_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC5E4F: event_global_setup_locks_ (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CFF68F: evthread_use_pthreads (in /usr/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7.0.0) >>> ==1286939== by 0x68FE8E4: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== >>> ==1286939== 48 bytes in 1 blocks are still reachable in loss record 23 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B8560C: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286939== >>> ==1286939== 
48 bytes in 1 blocks are still reachable in loss record 24 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B85638: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286939== >>> ==1286939== 48 bytes in 2 blocks are still reachable in loss record 25 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CC3647: event_config_avoid_method (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x68FEB5A: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 55 (32 direct, 23 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? 
>>> ==1286939== by 0x4AF6CD6: ompi_comm_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA194D: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== >>> ==1286939== 56 bytes in 1 blocks are still reachable in loss record 27 of 44 >>> ==1286939== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x7CC1C86: event_config_new (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x68FEAC0: opal_event_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68FE8CA: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68B8BCF: opal_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6860120: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 28 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9F6E008: ??? >>> ==1286939== by 0x9F7C654: ??? >>> ==1286939== by 0x9F1CD3E: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== >>> ==1286939== 56 bytes in 1 blocks are definitely lost in loss record 29 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0xA957008: ??? >>> ==1286939== by 0xA86B017: ??? >>> ==1286939== by 0xA862FD8: ??? >>> ==1286939== by 0xA828E15: ??? >>> ==1286939== by 0xA829624: ??? >>> ==1286939== by 0x9F77910: ??? >>> ==1286939== by 0x4B85C53: ompi_mtl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x9F13E4D: ??? 
>>> ==1286939== by 0x4B94673: mca_pml_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1789: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 76 (32 direct, 44 indirect) bytes in 1 blocks are definitely lost in loss record 30 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== by 0x84D387F: ??? >>> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 79 (64 direct, 15 indirect) bytes in 1 blocks are definitely lost in loss record 31 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9EDB12E: ??? >>> ==1286939== by 0x68D98FC: mca_base_framework_components_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x6907C25: ??? (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E4008: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4BA16D5: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 144 bytes in 3 blocks are still reachable in loss record 32 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B8564E: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286939== by 0x78D4B23: 
H5FD_open (H5FD.c:733) >>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286939== >>> ==1286939== 231 bytes in 12 blocks are definitely lost in loss record 33 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x651550E: strdup (strdup.c:42) >>> ==1286939== by 0x9F2B4B3: ??? >>> ==1286939== by 0x9F2B85C: ??? >>> ==1286939== by 0x9F2BBD7: ??? >>> ==1286939== by 0x9F1CAAC: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== >>> ==1286939== 240 bytes in 5 blocks are still reachable in loss record 34 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68D9043: mca_base_component_repository_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68D7F7A: mca_base_component_find (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3A4D: mca_base_framework_components_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F35: mca_base_framework_register (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x68E3F93: mca_base_framework_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x4B85622: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286939== by 0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286939== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286939== >>> ==1286939== 272 bytes in 44 blocks are definitely lost in loss record 35 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x9FCAEDB: ??? >>> ==1286939== by 0x9FE42B2: ??? >>> ==1286939== by 0x9FE47BB: ??? >>> ==1286939== by 0x9FCDDBF: ??? >>> ==1286939== by 0x9FA324A: ??? >>> ==1286939== by 0x4B3DD7F: PMPI_File_write_at_all (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >>> ==1286939== by 0x78DF11D: H5FD_write (H5FDint.c:257) >>> ==1286939== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >>> ==1286939== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >>> ==1286939== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >>> ==1286939== >>> ==1286939== 585 (480 direct, 105 indirect) bytes in 15 blocks are definitely lost in loss record 36 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? 
>>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== by 0x4B14036: ompi_proc_complete_init_single (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B146C3: ompi_proc_complete_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA19A9: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 776 bytes in 32 blocks are indirectly lost in loss record 37 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8DE9816: ??? >>> ==1286939== by 0x8DEB1D2: ??? >>> ==1286939== by 0x8DEB49A: ??? >>> ==1286939== by 0x8DE8B12: ??? >>> ==1286939== by 0x8E9D492: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== >>> ==1286939== 840 (480 direct, 360 indirect) bytes in 15 blocks are definitely lost in loss record 38 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x9EF2F00: ??? >>> ==1286939== by 0x9EEBF17: ??? >>> ==1286939== by 0x9EE2F54: ??? >>> ==1286939== by 0x9F1E1FB: ??? >>> ==1286939== >>> ==1286939== 1,084 (480 direct, 604 indirect) bytes in 15 blocks are definitely lost in loss record 39 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A767: ??? >>> ==1286939== by 0x84D4800: ??? >>> ==1286939== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286939== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== 1,344 bytes in 1 blocks are definitely lost in loss record 40 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9F1CD2D: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? 
>>> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 41 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9F1CC50: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 2,752 bytes in 1 blocks are definitely lost in loss record 42 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9F1CCC4: ??? >>> ==1286939== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286939== by 0x9EE3527: ??? >>> ==1286939== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286939== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286939== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286939== by 0x15710D: main (testing_main.cpp:8) >>> ==1286939== >>> ==1286939== 62,644 bytes in 31 blocks are indirectly lost in loss record 43 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8DE9FA8: ??? >>> ==1286939== by 0x8DEB032: ??? >>> ==1286939== by 0x8DEB49A: ??? >>> ==1286939== by 0x8DE8B12: ??? >>> ==1286939== by 0x8E9D492: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? 
>>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== >>> ==1286939== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are definitely lost in loss record 44 of 44 >>> ==1286939== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8E9D3EB: ??? >>> ==1286939== by 0x8E9F1C1: ??? >>> ==1286939== by 0x8D0578C: ??? >>> ==1286939== by 0x8D8605A: ??? >>> ==1286939== by 0x8D87FE8: ??? >>> ==1286939== by 0x8D88E4D: ??? >>> ==1286939== by 0x8D1A5EB: ??? >>> ==1286939== by 0x9F0398A: ??? >>> ==1286939== by 0x9EE2F54: ??? >>> ==1286939== by 0x9F1E1FB: ??? >>> ==1286939== by 0x4BA1A09: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286939== >>> ==1286939== LEAK SUMMARY: >>> ==1286939== definitely lost: 9,837 bytes in 138 blocks >>> ==1286939== indirectly lost: 63,435 bytes in 64 blocks >>> ==1286939== possibly lost: 0 bytes in 0 blocks >>> ==1286939== still reachable: 782 bytes in 21 blocks >>> ==1286939== suppressed: 0 bytes in 0 blocks >>> ==1286939== >>> ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from 0) >>> ==1286939== >>> ==1286939== 1 errors in context 1 of 29: >>> ==1286939== Thread 3: >>> ==1286939== Syscall param writev(vector[...]) points to uninitialised byte(s) >>> ==1286939== at 0x658A48D: __writev (writev.c:26) >>> ==1286939== by 0x658A48D: writev (writev.c:24) >>> ==1286939== by 0x8DF9B4C: ??? >>> ==1286939== by 0x7CC413E: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x8DBDD55: ??? >>> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286939== by 0x6595102: clone (clone.S:95) >>> ==1286939== Address 0xa28ee1f is 127 bytes inside a block of size 5,120 alloc'd >>> ==1286939== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286939== by 0x8DE155A: ??? >>> ==1286939== by 0x8DE3F4A: ??? >>> ==1286939== by 0x8DE4900: ??? >>> ==1286939== by 0x8DE4175: ??? >>> ==1286939== by 0x8D7CF91: ??? >>> ==1286939== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286939== by 0x8DBDD55: ??? >>> ==1286939== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286939== by 0x6595102: clone (clone.S:95) >>> ==1286939== Uninitialised value was created by a stack allocation >>> ==1286939== at 0x9F048D6: ??? >>> ==1286939== >>> ==1286939== ERROR SUMMARY: 29 errors from 29 contexts (suppressed: 0 from 0) >>> mpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x4B85622: mca_io_base_file_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B0E68A: ompi_file_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B3ADB8: PMPI_File_open (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x7B6F1AC: H5FD_mpio_open (H5FDmpio.c:997) >>> ==1286936== by 0x78D4B23: H5FD_open (H5FD.c:733) >>> ==1286936== by 0x78B953B: H5F_open (H5Fint.c:1493) >>> ==1286936== >>> ==1286936== 272 bytes in 44 blocks are definitely lost in loss record 39 of 49 >>> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x9FCAEDB: ??? >>> ==1286936== by 0x9FE42B2: ??? >>> ==1286936== by 0x9FE47BB: ??? 
>>> ==1286936== by 0x9FCDDBF: ??? >>> ==1286936== by 0x9FA324A: ??? >>> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >>> ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) >>> ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >>> ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >>> ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >>> ==1286936== >>> ==1286936== 312 bytes in 1 blocks are still reachable in loss record 40 of 49 >>> ==1286936== at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x74E78EB: boost::detail::make_external_thread_data() (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) >>> ==1286936== by 0x74E7C74: boost::detail::add_thread_exit_function(boost::detail::thread_exit_function_base*) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_thread.so.1.73.0) >>> ==1286936== by 0x73AFCEA: boost::log::v2_mt_posix::sources::aux::get_severity_level() (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_log.so.1.73.0) >>> ==1286936== by 0x5F71A6C: set_value (severity_feature.hpp:135) >>> ==1286936== by 0x5F71A6C: open_record_unlocked > > (severity_feature.hpp:252) >>> ==1286936== by 0x5F71A6C: open_record > > (basic_logger.hpp:459) >>> ==1286936== by 0x5F71A6C: Logger::TraceMessage(std::__cxx11::basic_string, std::allocator >) (logger.cpp:328) >>> ==1286936== by 0x5F729C7: Logger::Message(std::__cxx11::basic_string, std::allocator > const&, LogLevel) (logger.cpp:280) >>> ==1286936== by 0x5F73CF1: Logger::Timer::Timer(std::__cxx11::basic_string, std::allocator > const&, LogLevel) (logger.cpp:426) >>> ==1286936== by 0x15718A: timer (logger.hpp:98) >>> ==1286936== by 0x15718A: main (testing_main.cpp:9) >>> ==1286936== >>> ==1286936== 585 (480 direct, 105 indirect) bytes in 15 blocks are definitely lost in loss record 41 of 49 >>> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8E9D3EB: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A767: ??? >>> ==1286936== by 0x4B14036: ompi_proc_complete_init_single (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B146C3: ompi_proc_complete_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4BA19A9: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== >>> ==1286936== 776 bytes in 32 blocks are indirectly lost in loss record 42 of 49 >>> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8DE9816: ??? >>> ==1286936== by 0x8DEB1D2: ??? >>> ==1286936== by 0x8DEB49A: ??? >>> ==1286936== by 0x8DE8B12: ??? >>> ==1286936== by 0x8E9D492: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A767: ??? 
>>> ==1286936== >>> ==1286936== 840 (480 direct, 360 indirect) bytes in 15 blocks are definitely lost in loss record 43 of 49 >>> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8E9D3EB: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A5EB: ??? >>> ==1286936== by 0x9EF2F00: ??? >>> ==1286936== by 0x9EEBF17: ??? >>> ==1286936== by 0x9EE2F54: ??? >>> ==1286936== by 0x9F1E1FB: ??? >>> ==1286936== >>> ==1286936== 1,091 (480 direct, 611 indirect) bytes in 15 blocks are definitely lost in loss record 44 of 49 >>> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8E9D3EB: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A767: ??? >>> ==1286936== by 0x84D4800: ??? >>> ==1286936== by 0x68602FB: orte_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-rte.so.40.20.3) >>> ==1286936== by 0x4BA1322: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== >>> ==1286936== 1,344 bytes in 1 blocks are definitely lost in loss record 45 of 49 >>> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9F1CD2D: ??? >>> ==1286936== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9EE3527: ??? >>> ==1286936== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286936== by 0x15710D: main (testing_main.cpp:8) >>> ==1286936== >>> ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 46 of 49 >>> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9F1CC50: ??? >>> ==1286936== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9EE3527: ??? 
>>> ==1286936== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286936== by 0x15710D: main (testing_main.cpp:8) >>> ==1286936== >>> ==1286936== 2,752 bytes in 1 blocks are definitely lost in loss record 47 of 49 >>> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x68AE702: opal_free_list_grow_st (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9F1CCC4: ??? >>> ==1286936== by 0x68FC9C8: mca_btl_base_select (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libopen-pal.so.40.20.3) >>> ==1286936== by 0x9EE3527: ??? >>> ==1286936== by 0x4B6170A: mca_bml_base_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4BA1714: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4B450B0: PMPI_Init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x4A7BA77: boost::mpi::environment::environment(bool) (in /home/mlohry/dev/cmake-build/boost_install/lib/libboost_mpi.so.1.73.0) >>> ==1286936== by 0x5F1F232: Parallel::Parallel() (parallel.cpp:19) >>> ==1286936== by 0x16CDDB: Parallel::Get() (parallel.hpp:40) >>> ==1286936== by 0x15710D: main (testing_main.cpp:8) >>> ==1286936== >>> ==1286936== 62,640 bytes in 30 blocks are indirectly lost in loss record 48 of 49 >>> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8DE9FA8: ??? >>> ==1286936== by 0x8DEB032: ??? >>> ==1286936== by 0x8DEB49A: ??? >>> ==1286936== by 0x8DE8B12: ??? >>> ==1286936== by 0x8E9D492: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A5EB: ??? >>> ==1286936== >>> ==1286936== 62,760 (480 direct, 62,280 indirect) bytes in 15 blocks are definitely lost in loss record 49 of 49 >>> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8E9D3EB: ??? >>> ==1286936== by 0x8E9F1C1: ??? >>> ==1286936== by 0x8D0578C: ??? >>> ==1286936== by 0x8D8605A: ??? >>> ==1286936== by 0x8D87FE8: ??? >>> ==1286936== by 0x8D88E4D: ??? >>> ==1286936== by 0x8D1A5EB: ??? >>> ==1286936== by 0x9F0398A: ??? >>> ==1286936== by 0x9EE2F54: ??? >>> ==1286936== by 0x9F1E1FB: ??? 
>>> ==1286936== by 0x4BA1A09: ompi_mpi_init (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== >>> ==1286936== LEAK SUMMARY: >>> ==1286936== definitely lost: 9,805 bytes in 137 blocks >>> ==1286936== indirectly lost: 63,431 bytes in 63 blocks >>> ==1286936== possibly lost: 0 bytes in 0 blocks >>> ==1286936== still reachable: 1,174 bytes in 27 blocks >>> ==1286936== suppressed: 0 bytes in 0 blocks >>> ==1286936== >>> ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from 0) >>> ==1286936== >>> ==1286936== 1 errors in context 1 of 29: >>> ==1286936== Thread 3: >>> ==1286936== Syscall param writev(vector[...]) points to uninitialised byte(s) >>> ==1286936== at 0x658A48D: __writev (writev.c:26) >>> ==1286936== by 0x658A48D: writev (writev.c:24) >>> ==1286936== by 0x8DF9B4C: ??? >>> ==1286936== by 0x7CC413E: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286936== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286936== by 0x8DBDD55: ??? >>> ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286936== by 0x6595102: clone (clone.S:95) >>> ==1286936== Address 0xa290cbf is 127 bytes inside a block of size 5,120 alloc'd >>> ==1286936== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x8DE155A: ??? >>> ==1286936== by 0x8DE3F4A: ??? >>> ==1286936== by 0x8DE4900: ??? >>> ==1286936== by 0x8DE4175: ??? >>> ==1286936== by 0x8D7CF91: ??? >>> ==1286936== by 0x7CC3FDD: ??? (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286936== by 0x7CC487E: event_base_loop (in /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7.0.0) >>> ==1286936== by 0x8DBDD55: ??? >>> ==1286936== by 0x4BF7608: start_thread (pthread_create.c:477) >>> ==1286936== by 0x6595102: clone (clone.S:95) >>> ==1286936== Uninitialised value was created by a stack allocation >>> ==1286936== at 0x9F048D6: ??? >>> ==1286936== >>> ==1286936== >>> ==1286936== 6 errors in context 2 of 29: >>> ==1286936== Thread 1: >>> ==1286936== Syscall param pwritev(vector[...]) points to uninitialised byte(s) >>> ==1286936== at 0x658A608: pwritev64 (pwritev64.c:30) >>> ==1286936== by 0x658A608: pwritev (pwritev64.c:28) >>> ==1286936== by 0x9F46E25: ??? >>> ==1286936== by 0x9FCE33B: ??? >>> ==1286936== by 0x9FCDDBF: ??? >>> ==1286936== by 0x9FA324A: ??? 
>>> ==1286936== by 0x4B3DD7F: PMPI_File_write_at_all (in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi.so.40.20.3) >>> ==1286936== by 0x7B6DB32: H5FD_mpio_write (H5FDmpio.c:1833) >>> ==1286936== by 0x78DF11D: H5FD_write (H5FDint.c:257) >>> ==1286936== by 0x78AE86B: H5F__accum_write (H5Faccum.c:825) >>> ==1286936== by 0x7A1FBE9: H5PB_write (H5PB.c:1027) >>> ==1286936== by 0x78BBC7A: H5F_block_write (H5Fio.c:164) >>> ==1286936== by 0x7B5ED15: H5C__collective_write (H5Cmpio.c:1020) >>> ==1286936== by 0x7B5ED15: H5C_apply_candidate_list (H5Cmpio.c:394) >>> ==1286936== Address 0xedf91b0 is 96 bytes inside a block of size 216 alloc'd >>> ==1286936== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) >>> ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:292) >>> ==1286936== by 0x7980C45: H5MM_malloc (H5MM.c:267) >>> ==1286936== by 0x77FC8FF: H5C__flush_single_entry (H5C.c:6045) >>> ==1286936== by 0x7B5DC7E: H5C__flush_candidates_in_ring (H5Cmpio.c:1371) >>> ==1286936== by 0x7B5DC7E: H5C__flush_candidate_entries (H5Cmpio.c:1192) >>> ==1286936== by 0x7B5DC7E: H5C_apply_candidate_list (H5Cmpio.c:385) >>> ==1286936== by 0x7B5BA18: H5AC__rsp__dist_md_write__flush (H5ACmpio.c:1709) >>> ==1286936== by 0x7B5BA18: H5AC__run_sync_point (H5ACmpio.c:2164) >>> ==1286936== by 0x7B5C9D2: H5AC__flush_entries (H5ACmpio.c:2307) >>> ==1286936== by 0x77C95E4: H5AC_flush (H5AC.c:681) >>> ==1286936== by 0x78B306A: H5F__flush_phase2 (H5Fint.c:1831) >>> ==1286936== by 0x78B5D7A: H5F__dest (H5Fint.c:1152) >>> ==1286936== by 0x78B6603: H5F_try_close (H5Fint.c:2180) >>> ==1286936== by 0x78B69F5: H5F__close_cb (H5Fint.c:2009) >>> ==1286936== by 0x7965797: H5I_dec_ref (H5I.c:1254) >>> ==1286936== Uninitialised value was created by a stack allocation >>> ==1286936== at 0x7695AF0: ??? (in /home/mlohry/dev/cmake-build/external_install/lib/libcgns.so) >>> ==1286936== >>> ==1286936== ERROR SUMMARY: 34 errors from 29 contexts (suppressed: 0 from 0) >>> >>> On Mon, Aug 24, 2020 at 5:00 PM Jed Brown > wrote: >>> Do you potentially have a memory or other resource leak? SIGBUS would be an odd result, but the symptom of crashing after running for a long time sometimes fits with a resource leak. >>> >>> Mark Lohry > writes: >>> >>> > I queued up some jobs with Barry's patch, so we'll see. >>> > >>> > Re Jed's suggestion at checkpointing, I don't *think* this is something >>> > coming from the state of the solution -- running from the same point I'm >>> > seeing it crash anywhere between 1 hour and 20 hours in. I'll increase my >>> > file save frequency in case I'm wrong there though. >>> > >>> > My intel build with different blas just made it through a 6 hour time slot >>> > without crash, whereas yesterday the same thing crashed after 3 hours. But >>> > given the randomness so far I'd bet that's just dumb luck. >>> > >>> > On Mon, Aug 24, 2020 at 4:22 PM Barry Smith > wrote: >>> > >>> >> >>> >> >>> >> > On Aug 24, 2020, at 2:34 PM, Jed Brown > wrote: >>> >> > >>> >> > I'm thinking of something such as writing floating point data into the >>> >> return address, which would be unaligned/garbage. >>> >> >>> >> Ok, my patch will detect this. This is what I was talking about, messing >>> >> up the BLAS arguments which are the addresses of arrays. >>> >> >>> >> Valgrind is by far the preferred approach. 
>>> >> >>> >> Barry >>> >> >>> >> Another feature we could add to the malloc checking is when a SEGV or >>> >> BUS error is encountered and we catch it we should run the >>> >> PetscMallocVerify() and check our memory for corruption reporting any we >>> >> find. >>> >> >>> >> >>> >> >>> >> > >>> >> > Reproducing under Valgrind would help a lot. Perhaps it's possible to >>> >> checkpoint such that the breakage can be reproduced more quickly? >>> >> > >>> >> > Barry Smith > writes: >>> >> > >>> >> >> https://en.wikipedia.org/wiki/Bus_error < >>> >> https://en.wikipedia.org/wiki/Bus_error > >>> >> >> >>> >> >> But perhaps not true for Intel? >>> >> >> >>> >> >> >>> >> >> >>> >> >>> On Aug 24, 2020, at 1:06 PM, Matthew Knepley > >>> >> wrote: >>> >> >>> >>> >> >>> On Mon, Aug 24, 2020 at 1:46 PM Barry Smith >> >> bsmith at petsc.dev >> wrote: >>> >> >>> >>> >> >>> >>> >> >>>> On Aug 24, 2020, at 12:39 PM, Jed Brown >> >> jed at jedbrown.org >> wrote: >>> >> >>>> >>> >> >>>> Barry Smith >> writes: >>> >> >>>> >>> >> >>>>>> On Aug 24, 2020, at 12:31 PM, Jed Brown >> >> jed at jedbrown.org >> wrote: >>> >> >>>>>> >>> >> >>>>>> Barry Smith >> writes: >>> >> >>>>>> >>> >> >>>>>>> So if a BLAS errors with SIGBUS then it is always an input error >>> >> of just not proper double/complex alignment? Or some other very strange >>> >> thing? >>> >> >>>>>> >>> >> >>>>>> I would suspect memory corruption. >>> >> >>>>> >>> >> >>>>> >>> >> >>>>> Corruption meaning what specifically? >>> >> >>>>> >>> >> >>>>> The routines crashing are dgemv which only take double precision >>> >> arrays, regardless of what garbage is in those arrays i don't think there >>> >> can be BUS errors resulting. They don't take integer arrays whose >>> >> corruption could result in bad indexing and then BUS errors. >>> >> >>>>> >>> >> >>>>> So then it can only be corruption of the pointers passed in, correct? >>> >> >>>> >>> >> >>>> Such as those pointers pointing into data on the stack with incorrect >>> >> sizes. >>> >> >>> >>> >> >>> But won't incorrect sizes "usually" lead to SEGV not SEGBUS? >>> >> >>> >>> >> >>> My understanding was that roughly memory errors in the heap are SEGV >>> >> and memory errors on the stack are SIGBUS. Is that not true? >>> >> >>> >>> >> >>> Matt >>> >> >>> >>> >> >>> -- >>> >> >>> What most experimenters take for granted before they begin their >>> >> experiments is infinitely more interesting than any results to which their >>> >> experiments lead. >>> >> >>> -- Norbert Wiener >>> >> >>> >>> >> >>> https://www.cse.buffalo.edu/~knepley/ < >>> >> http://www.cse.buffalo.edu/~knepley/ > >>> >> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 27 23:36:51 2020 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 27 Aug 2020 23:36:51 -0500 Subject: [petsc-users] PetscFV and TS implicit In-Reply-To: References: <87mu2pgtdp.fsf@jedbrown.org> <01FA5D4D-A0CA-4ACB-ACC9-EB213E3B0D2F@petsc.dev> <2BF36064-AEC6-4795-BEE7-DAAF69119D2E@petsc.dev> Message-ID: <2E97A97A-593E-4649-A5F8-7986A5A7CC06@petsc.dev> I'm sorry I'm not the one to understand PetscComputePreconMatImpl, I know some of the words but cannot follow the tune. Matt, Jed, Stefano, and Pierre, and maybe others can maybe help. Regarding PetscJacobianFunction_JFNK, I don't understand what purpose it would serve, PETSc already has matrix-free based on finite differences that work automatically for TS, you won't be able to write anything substantially better than that. 
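For concreteness, the built-in path referred to above looks roughly like the following sketch (dm, ctx and U are assumed to already exist, error checking is omitted, and nothing here is taken from the attached code):

    TS   ts;
    SNES snes;

    TSCreate(PETSC_COMM_WORLD, &ts);
    TSSetDM(ts, dm);
    TSSetType(ts, TSBEULER);                              /* any implicit integrator */
    DMTSSetRHSFunctionLocal(dm, DMPlexTSComputeRHSFunctionFVM, &ctx);
    TSGetSNES(ts, &snes);
    SNESSetUseMatrixFree(snes, PETSC_TRUE, PETSC_FALSE);  /* Jacobian action by finite-differencing the RHS */
    TSSetFromOptions(ts);
    TSSolve(ts, U);

The same behaviour can be selected at run time with -snes_mf_operator instead of the SNESSetUseMatrixFree() call.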
With your specific code I am concerned that you call DMPlexTSComputeRHSFVM() but don't take into account the specific ODE integrator, that is you are differencing the RHS function but the nonlinear problem is actually defined by the full ODE u_t - RHS(u) = 0 discretized with a particular ODE method so your code will need to know the specific ODE integrator and the time-step. But perhaps I misunderstand the code. Barry > On Aug 27, 2020, at 1:15 AM, Thibault Bridel-Bertomeu wrote: > > Sorry Barry for the late reply. > > Le mar. 25 ao?t 2020 ? 15:19, Barry Smith > a ?crit : > > Yes, your use of the coloring is what I was thinking of. I don't think you need any of calls to the coloring code as it is managed in SNESComputeJacobianDefaultColor() if you don't provide it initially. Did that not work, just using -snes_fd_color? > > Yes it works with the command line flag too, I just wanted to write down the lines somewhere in case I needed them and I left them there in the end. > > Regarding direct solvers. Add the arguments > > --download-superlu_dist --download-metis --download-parmetis --download-mumps --download-scalapack --download-ptscotch > > to ./configure > > Then when you run the code you can use -pc_type lu -pc_factor_mat_solver_type superlu_dist or mumps > > Ah, thanks ! I haven't experimented much with the direct solvers in PETSc, I mostly use iterative resolution so far, but I'll have a look. > > With this first implementation using the automatic differentiation by coloring, I was able to solve with implicit time stepping problems involving the Euler equations or the Navier-Stokes equations (which contain a parabolic term) in both 2D and axisymmetric form. It is definitely a win for me. I played around the different SNES, KSP and PC I could use, and it turns out using respectively newtonls, gmres and sor with 5 iterations is probably the most robust combination, with which I was able to achieve start-up CFL numbers around 50 (based solely on the non-parabolic part of the systems of equations). > > Now, coming back to why I first sent a message in this mailing list, I am still trying to create the Jacobian and Preconditioner myself. As you can see in the attached PDF, I am still using my PetscJacobianFunction_JFNK and my PetscIJacobian routines, but they evolved since last time. Following your advice Barry I rewrote carefully the JFNK method to account for global vectors as input and I feel it should be okay now even though I don't really have a way of testing it : with the systems of equations I am trying to solve, no preconditioner yields a SNES divergence. So I wrote the PetscComputePreconMatImpl to get the preconditioner, the idea being I would like to get something like alpha * Id - grad (dF/dU) where grad represents the spatial differentiation operator and dF/dU is the exact jacobian in each cell this time, alpha being a shift parameter introduced by the implicit time stepping method. The exact dF/dU is computed from the state in a cell using the static NavierStokes2DJFunction. > To compute the gradient of this jacobian I took inspiration from the way the gradient is computed in the DMPlex_reconstruct_gradient_internal but I am not quite sure of what I am doing there. > If you don't mind, could you please tell me what you think of the way I approach this preconditioner computation ? 
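For reference, the alpha above has to be the shift that the TS hands to the IJacobian callback. With only a right-hand side set, the nonlinear residual solved at each backward-Euler step is

    G(u) = (u - u^n) / dt - RHS(u) = 0,

so the exact Jacobian is

    dG/du = (1/dt) I - dRHS/du,

i.e. alpha = 1/dt for backward Euler and, in general, whatever shift a the integrator passes in.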
> > Thank you so much for your help once again, > > Best regards, > > Thibault > > > Barry > > > > >> On Aug 25, 2020, at 2:06 AM, Thibault Bridel-Bertomeu > wrote: >> >> Hello everyone, >> >> Barry, I followed your recommendations and came up with the pieces of code that are in the attached PDF - mostly pages 1 & 3 are important, page 2 is almost entirely commented. >> >> I tried to use DMCreateColoring as the doc says it may produce a more accurate coloring, however it is not implemented for a Plex yet hence the call to Matcoloringcreate that you will see. I left the test DMHascoloring in case in a later release PETSc allows for the generation of the coloring from a Plex. >> >> Also, you'll see in the input file that contrary to what you suggested I am using the jacobi PC. It is simply because it appears that the way I compiled my PETSc does not support a PC LU or PC CHOLESKY (per the seg fault print in the console). Do I need scalapack or mumps or something else ? >> >> Altogether this implementation works and produces results that are correct physically speaking. Now I have to try and increase the CFL number a lot to see how robust this approach is. >> >> All in all, what do you think of this implementation, is it what you had in mind ? >> >> Thank you for your help, >> >> Thibault >> >> Le lun. 24 ao?t 2020 ? 22:16, Barry Smith > a ?crit : >> >> >>> On Aug 24, 2020, at 2:20 PM, Thibault Bridel-Bertomeu > wrote: >>> >>> Good evening everyone, >>> >>> Thanks Barry for your answer. >>> >>> Le lun. 24 ao?t 2020 ? 18:51, Barry Smith > a ?crit : >>> >>> >>>> On Aug 24, 2020, at 11:38 AM, Thibault Bridel-Bertomeu > wrote: >>>> >>>> Thank you Barry for taking the time to go through the code ! >>>> >>>> I indeed figured out this afternoon that the function related to the matrix-vector product is always handling global vectors. I corrected mine so that it compiles, but I have a feeling it won't run properly without a preconditioner. >>>> >>>> Anyways you are right, my PetscJacobianFunction_JFNK() aims at doing some basic finite-differencing ; user->RHS_ref is my F(U) if you see the system as dU/dt = F(U). As for the DMGlobalToLocal() it was there mainly because I had not realized the vectors I was manipulating were global. >>>> I will take your advice and try with just the SNESSetUseMatrixFree. >>>> I haven't quite fully understood what it does "under the hood" though: just calling SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE) before the TSSolve call is enough to ensure that the implicit matrix is computed ? Does it use the function we set as a RHS to build the matrix ? >>> >>> All it does is "replace" the A matrix with one automatically created for the job using MatCreateMFFD(). It does not touch the B matrix, it does not build the matrix but yes if does use the function to provide to do the differencing. >>> >>> OK, thank you. This MFFD Matrix is then called by the TS to construct the linear system that will be solved to advance the system of equations, right ? >>>> >>>> To create the preconditioner I will do as you suggest too, thank you. This matrix has to be as close as possible to the inverse of the implicit matrix to ensure that the eigenvalues of the system are as close to 1 as possible. Given the implicit matrix is built "automatically" thanks to the SNES matrix free capability, can we use that matrix as a starting point to the building of the preconditioner ? >>> >>> No the MatrixFree doesn't build a matrix, it can only do matrix-vector products with differencing. 
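Concretely, all the matrix-free operator provides is the action

    J(u) y  ~  ( F(u + h*y) - F(u) ) / h,

with the differencing parameter h chosen automatically from the norms of u and y; nothing is ever assembled that a factorization-based preconditioner could use, which is why a separate preconditioning matrix (or a shell preconditioner) still has to be supplied.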
>>> >>> My bad, wrong word. Yes of course it's all matrix-free hence it's just a functional, however maybe the inner mechanisms can be accessed and used for the preconditioner ? >> >> Probably not, it really only can do matrix-vector products. >> >>>> You were talking about the coloring capabilities in PETSc, is that where it can be applied ? >>> >>> Yes you can use that. See MatFDColoringCreate() but since you are using a DM in theory you can use -snes_fd_color and PETSc will manage everything for you so you don't have to write any code for Jacobians at all. Again it uses your function to do differences using coloring to be efficient to build the Jacobian for you. >>> >>> I read a bit about the coloring you are mentioning. As I understand it, it is another option to have a matrix-free Jacobian behavior during the Newton-Krylov iterations, right ? Either we use the SNESSetUseMatrixFree() alone, then it works using "basic" finite-differencing, or we use the SNESSetUseMatrixFree + MatFDColoringCreate & SNESComputeJacobianDefaultColor as an option to SNESSetJacobian to access the finite-differencing based on coloring. Is that right ? >>> Then if i come back to my preconditioner problem ... once you have set-up the implicit matrix with one or the other aforementioned matrix-free ways, how would you go around setting up the preconditioner ? In a matrix-free way too, or rather as a real matrix that we assemble ourselves this time, as you seemed to mean with the previous MatAij DMCreateMatrix ? >>> >>> Sorry if it seems like I am nagging, but I would really like to understand how to manipulate the matrix-free methods and structures in PETSc to run a time-implicit finite volume computation, it's so promising ! >> >> There are many many possibilities as we discussed in previous email, most with various limitations. >> >> When you use -snes_fd_color (or put code into the source like MatFDColoringCreate which is unnecessary a since you are doing the same thing as -snes_fd_color you get back the true Jacobian (approximated so in less digits than analytic) so you can use any preconditioner that you can use as if you built the true Jacobian yourself. >> >> I always recommend starting with -pc_type lu and making sure you are getting the correct answers to your problem and then worrying about the preconditioner. Faster preconditioner are JUST optimizations, nothing more, they should not change the quality of the solution to your PDE/ODE and you absolutely need to make sure your are getting correct quality answers before fiddling with the preconditioner. >> >> Once you have the solution correct and figured out a good preconditioner (assuming using the true Jacobian works for your discretization) then you can think about optimizing the computation of the Jacobian by doing it analytically finite volume by finite volume. But you shouldn't do any of that until you are sure that your implicit TS integrator for FV produces good numerical answers. >> >> Barry >> >> >> >> >>> >>> Thanks again, >> >>> Thibault >>> >>> Barry >>> >>> Internally it uses SNESComputeJacobianDefaultColor() if you are interested in what it does. >>> >>> >>> >>> >>>> >>>> Thank you so much again, >>>> >>>> Thibault >>>> >>>> >>>> Le lun. 24 ao?t 2020 ? 15:45, Barry Smith > a ?crit : >>>> >>>> I think the attached is wrong. 
>>>> >>>> >>>> >>>> The input to the matrix vector product for the Jacobian is always global vectors which means on each process the dimension is not the size of the DMGetLocalVector() it should be the VecGetLocalSize() of the DMGetGlobalVector() >>>> >>>> But you may be able to skip all this and have the DM create the shell matrix setting it sizes appropriately and you only need to supply the MATOP >>>> >>>> DMSetMatType(dm,MATSHELL); >>>> DMCreateMatrix(dm,&A); >>>> >>>> In fact, I also don't understand the PetscJacobianFunction_JFKN() function It seems to be doing finite differencing on the DMPlexTSComputeRHSFunctionFVM() assuming the current function value is in usr->RHS_ref. How is this different than just letting PETSc/SNES used finite differences to do the matrix-vector product. Your code seems rather complicated with the DMGlobalToLocal() which I don't understand what it is suppose to do there. >>>> >>>> I think you can just call >>>> >>>> TSGetSNES() >>>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE); >>>> >>>> and it will set up an internal matrix that does the finite differencing for you. Then you never need a shell matrix. >>>> >>>> >>>> Also to create the preconditioner matrix B this should work >>>> >>>> DMSetMatType(dm,MATAIJ); >>>> DMCreateMatrix(dm,&B); >>>> >>>> no need for you to figure out the sizes. >>>> >>>> >>>> Note that both A and B need to have the same dimensions on each process as the global vectors which I don't think your current code has. >>>> >>>> >>>> >>>> Barry >>>> >>>> >>>> >>>> >>>>> On Aug 24, 2020, at 12:56 AM, Thibault Bridel-Bertomeu > wrote: >>>>> >>>> >>>> >>>>> Barry, first of all, thank you very much for your detailed answer, I keep reading it to let it soak in - I might come back to you for more details if you do not mind. >>>>> >>>>> In the meantime, to fuel the conversation, I attach to this e-mail two pdfs containing the pieces of the code that regard what we are discussing. In the *timedisc.pdf, you'll find how I handle the initialization of the TS object, and in the *petscdefs.pdf you'll find the method that calls the TSSolve as well as the methods that are linked to the TS (the timestep adapt, the jacobian etc ...). [Sorry for the quality, I cannot do better than that sort of pdf ...] >>>>> >>>>> Based on what is in the structured code I sent you the other day, I rewrote the PetscJacobianFunction_JFNK. I think it should be all right, but although it compiles, execution raises a seg fault I think when I do >>>>> ierr = TSSetIJacobian(ts, A, A, PetscIJacobian, user); >>>>> saying that A does not have the right dimensions. It is quite new, I am still looking into where exactly the error is raised. What do you think of this implementation though, does it look correct in your expert eyes ? >>>>> As for what we really discussed so far, it's that PetscComputePreconMatImpl that I do not know how to implement (with the derivative of the jacobian based on the FVM object). >>>>> >>>>> I understand now that what I am showing you today might not be the right way to go if one wants to really use the PetscFV, but I just wanted to add those code lines to the conversation to have your feedback. >>>>> >>>>> Thank you again for your help, >>>>> >>>>> Thibault >>>>> >>>>> >>>> >>>> >>>>> Le ven. 21 ao?t 2020 ? 19:25, Barry Smith > a ?crit : >>>> >>>> >>>>> >>>>> >>>>>> On Aug 21, 2020, at 10:58 AM, Thibault Bridel-Bertomeu > wrote: >>>>>> >>>>>> Thank you Barry for the tip ! I?ll make sure to do that when everything is set. 
>>>>>> What I also meant is that there will not be any more direct way to set the preconditioner than to go through SNESSetJacobian after having assembled everything by hand ? Like, in my case, or in the more general case of fluid dynamics equations, the preconditioner is not a fun matrix to assemble, because for every cell the derivative of the physical flux jacobian has to be taken and put in the right block in the matrix - finite element style if you want. Is there a way to do that with Petsc methods, maybe short-circuiting the FEM based methods ? >>>>> >>>>> Thibault >>>>> >>>>> I am not sure what you mean but there are a couple of things that may be helpful. >>>>> >>>>> PCSHELL https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCSHELL.html <> allows you to build your own preconditioner (that can and often will use one or more of its own Mats, and KSP or PC inside it, or even use another PETScFV etc to build some of the sub matrices for you if it is appropriate), this approach means you never need to construct a "global" PETSc matrix from which PETSc builds the preconditioner. But you should only do this if the conventional approach is not reasonable for your problem. >>>>> >>>>> MATNEST https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MATNEST.html allows you to build a global matrix by building parts of it separately and even skipping parts you decide you don't need in the preconditioner. Conceptually it is the same as just creating a global matrix and filling up but the process is a bit different and something suitable for "multi physics" or "multi-equation" type applications. >>>>> >>>>> Of course what you put into PCSHELL and MATNEST will affect the convergence of the nonlinear solver. As Jed noted what you put in the "Jacobian" does not have to be directly the same mathematically as what you put into the TSSetI/RHSFunction with the caveat that it does have to appropriate spectral properties to result in a good preconditioner for the "true" Jacobian. >>>>> >>>>> Couple of other notes: >>>>> >>>>> The entire business of "Jacobian" matrix-free or not (with for example -snes_fd_operator) is tricky because as Jed noted if your finite volume scheme has non-differential terms such as if () tests. There is a concept of sub-differential for this type of thing but I know absolutely nothing about that and probably not worth investigating. >>>>> >>>>> In this situation you can avoid the "true" Jacobian completely (both for matrix-vector product and preconditioner) and use something else as Jed suggested a lower order scheme that is differentiable. This can work well for solving the nonlinear system or not depending on how suitable it is for your original "function" >>>>> >>>>> >>>>> 1) In theory at least you can have the Jacobian matrix-vector product computed directly using DMPLEX/PETScFV infrastructure (it would apply the Jacobian locally matrix-free using code similar to the code that evaluates the FV "function". I do no know if any of this code is written, it will be more efficient than -snes_mf_operator that evaluates the FV "function" and does traditional differencing to compute the Jacobian. Again it has the problem of non-differentialability if the function is not differential. But it could be done for a different (lower order scheme) that is differentiable. >>>>> >>>>> 2) You can have PETSc compute the Jacobian explicitly coloring and from that build the preconditioner, this allows you to avoid the hassle of writing the code for the derivatives yourself. 
This uses finite differences on your function and coloring of the graph to compute many columns of the Jacobian simultaneously and can be pretty efficient. Again if the function is not differential there can be issues of what the result means and will it work in a nonlinear solver. SNESComputeJacobianDefaultColor https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESComputeJacobianDefaultColor.html >>>>> >>>>> 3) Much more outlandish is to skip Newton and Jacobians completely and use the full approximation scheme SNESFAS https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNESFAS/SNESFAS.html this requires a grid hierarchy and appropriate way to interpolate up through the grid hierarchy your finite volume solutions. Probably not worth investigating unless you have lots of time on your hands and keen interest in this kind of stuff https://arxiv.org/pdf/1607.04254.pdf >>>>> >>>>> So to summarize, and Matt and Jed can correct my mistakes. >>>>> >>>>> 1) Form the full Jacobian from the original "function" using analytic approach use it for both the matrix-vector product and to build the preconditioner. Problem if full Jacobian not well defined mathematically. Tough to code, usually not practical. >>>>> >>>>> 2) Do any matrix free (any way) for the full Jacobian and >>>>> >>>>> a) build another "approximate" Jacobian (using any technique analytic or finite differences using matrix coloring on a new "lower order" "function") Still can have trouble if this original Jacobian is no well defined >>>>> >>>>> b) "write your own preconditioner" that internally can use anything in PETSc that approximately solves the Jacobian. Same potential problems if original Jacobian is not differential, plus convergence will depend on how good your own preconditioner approximates the inverse of the true Jacobian. >>>>> >>>>> 3) Use a lower Jacobian (computed anyway you want) for the matrix-vector product and the preconditioner. The problem of differentiability is gone but convergence of the nonlinear solver depends on how well lower order Jacobian is appropriate for the original "function" >>>>> >>>>> a) Form the "lower order" Jacobian analytically or with coloring and use for both matrix-vector product and building preconditioner. Note that switching between this and 2a is trivial. >>>>> >>>>> b) Do the "lower order" Jacobian matrix free and provide your own PCSHELL. Note that switching between this and 2b is trivial. >>>>> >>>>> Barry >>>>> >>>>> I would first try competing the "true" Jacobian via coloring, if that works and give satisfactory results (fast enough) then stop. >>>>> >>>>> Then I would do 2a/2b by writing my "function" using PETScFV and writing the "lower order function" via PETScFV and use matrix coloring to get the Jacobian from the second "lower order function". If this works well (either with 2a or 3a or both) then stop or you can compute the "lower order" Jacobian analytically (again using PetscFV) for a more efficient evaluation of the Jacobian. >>>>> >>>> >>>>> >>>>>> >>>>>> Thanks ! >>>>>> >>>>>> Thibault >>>>>> >>>> >>>>>> Le ven. 21 ao?t 2020 ? 17:22, Barry Smith > a ?crit : >>>> >>>> >>>>>> >>>>>> >>>>>>> On Aug 21, 2020, at 8:35 AM, Thibault Bridel-Bertomeu > wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> Le ven. 21 ao?t 2020 ? 15:23, Matthew Knepley > a ?crit : >>>>>>> On Fri, Aug 21, 2020 at 9:10 AM Thibault Bridel-Bertomeu > wrote: >>>>>>> Sorry, I sent too soon, I hit the wrong key. >>>>>>> >>>>>>> I wanted to say that context.npoints is the local number of cells. 
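(Picking up option 2b above, a PCSHELL wrapper typically has the shape sketched below; MyPCCtx, MyPCApply, Blow, inner and myctx are illustrative names only, and what the inner solver applies is entirely up to the application. Error checking is omitted.)

    typedef struct {
      Mat Blow;    /* user-assembled low-order approximation of the Jacobian */
      KSP inner;   /* solver used to apply its approximate inverse */
    } MyPCCtx;

    static PetscErrorCode MyPCApply(PC pc, Vec x, Vec y)
    {
      MyPCCtx *ctx;
      PetscFunctionBeginUser;
      PCShellGetContext(pc, (void **)&ctx);
      KSPSolve(ctx->inner, x, y);   /* y ~ Blow^{-1} x */
      PetscFunctionReturn(0);
    }

    /* hooked into the nonlinear solver that the TS uses internally */
    TSGetSNES(ts, &snes);
    SNESGetKSP(snes, &ksp);
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCSHELL);
    PCShellSetContext(pc, &myctx);
    PCShellSetApply(pc, MyPCApply);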
>>>>>>> >>>>>>> PetscRHSFunctionImpl allows to generate the hyperbolic part of the right hand side. >>>>>>> Then we have : >>>>>>> >>>>>>> PetscErrorCode PetscIJacobian( >>>>>>> TS ts, /*!< Time stepping object (see PETSc TS)*/ >>>>>>> PetscReal t, /*!< Current time */ >>>>>>> Vec Y, /*!< Solution vector */ >>>>>>> Vec Ydot, /*!< Time-derivative of solution vector */ >>>>>>> PetscReal a, /*!< Shift */ >>>>>>> Mat A, /*!< Jacobian matrix */ >>>>>>> Mat B, /*!< Preconditioning matrix */ >>>>>>> void *ctxt /*!< Application context */ >>>>>>> ) >>>>>>> { >>>>>>> PETScContext *context = (PETScContext*) ctxt; >>>>>>> HyPar *solver = context->solver; >>>>>>> _DECLARE_IERR_; >>>>>>> >>>>>>> PetscFunctionBegin; >>>>>>> solver->count_IJacobian++; >>>>>>> context->shift = a; >>>>>>> context->waqt = t; >>>>>>> /* Construct preconditioning matrix */ >>>>>>> if (context->flag_use_precon) { IERR PetscComputePreconMatImpl(B,Y,context); CHECKERR(ierr); } >>>>>>> >>>>>>> PetscFunctionReturn(0); >>>>>>> } >>>>>>> >>>>>>> and PetscJacobianFunction_JFNK which I bind to the matrix shell, computes the action of the jacobian on a vector : say U0 is the state of reference and Y the vector upon which to apply the JFNK method, then the PetscJacobianFunction_JFNK returns shift * Y - 1/epsilon * (F(U0 + epsilon*Y) - F(U0)) where F allows to evaluate the hyperbolic flux (shift comes from the TS). >>>>>>> The preconditioning matrix I compute as an approximation to the actual jacobian, that is shift * Identity - Derivative(dF/dU) where dF/dU is, in each cell, a 4x4 matrix that is known exactly for the system of equations I am solving, i.e. Euler equations. For the structured grid, I can loop on the cells and do that 'Derivative' thing at first order by simply taking a finite-difference like approximation with the neighboring cells, Derivative(phi) = phi_i - phi_{i-1} and I assemble the B matrix block by block (JFunction is the dF/dU) >>>>>>> >>>>>>> /* diagonal element */ >>>>>>> >>>>>>> >>>>>>> <> for (v=0; v>>>>>> >>>>>>> >>>>>>> <> ierr = solver->JFunction (values,(u+nvars*p),solver->physics ,dir,0); >>>>>>> >>>>>>> >>>>>>> <> _ArrayScale1D_ (values,(dxinv*iblank),(nvars*nvars)); >>>>>>> >>>>>>> >>>>>>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>>> >>>>>>> >>>>>>> <> >>>>>>> >>>>>>> >>>>>>> <> /* left neighbor */ >>>>>>> >>>>>>> >>>>>>> <> if (pgL >= 0) { >>>>>>> >>>>>>> >>>>>>> <> for (v=0; v>>>>>> >>>>>>> >>>>>>> <> ierr = solver->JFunction (values,(u+nvars*pL),solver->physics ,dir,1); >>>>>>> >>>>>>> >>>>>>> <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); >>>>>>> >>>>>>> >>>>>>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>>> >>>>>>> >>>>>>> <> } >>>>>>> >>>>>>> >>>>>>> <> >>>>>>> >>>>>>> >>>>>>> <> /* right neighbor */ >>>>>>> >>>>>>> >>>>>>> <> if (pgR >= 0) { >>>>>>> >>>>>>> >>>>>>> <> for (v=0; v>>>>>> >>>>>>> >>>>>>> <> ierr = solver->JFunction (values,(u+nvars*pR),solver->physics ,dir,-1); >>>>>>> >>>>>>> >>>>>>> <> _ArrayScale1D_ (values,(-dxinv*iblank),(nvars*nvars)); >>>>>>> >>>>>>> >>>>>>> <> ierr = MatSetValues(Pmat,nvars,rows,nvars,cols,values,ADD_VALUES); CHKERRQ(ierr); >>>>>>> >>>>>>> >>>>>>> <> } >>>>>>> >>>>>>> >>>>>>> >>>>>>> I do not know if I am clear here ... >>>>>>> Anyways, I am trying to figure out how to do this shell matrix and this preconditioner using all the FV and DMPlex artillery. >>>>>>> >>>>>>> Okay, that is very clear. 
We should be able to get the JFNK just with -snes_mf_operator, and put the approximate J construction in DMPlexComputeJacobian_Internal().
>>>>>>> There is an FV section already, and we could just add this. I would need to understand those entries in the pointwise Riemann sense that the other stuff is now.
>>>>>>>
>>>>>>> Ok, I had a quick look and if I understood correctly it would do the job. Setting the -snes_mf_operator flag would mean however that we have to go through SNESSetJacobian to set the Jacobian and the preconditioning matrix, wouldn't it ?
>>>>>>
>>>>>> Thibault,
>>>>>>
>>>>>> Since the TS implicit methods end up using SNES internally, the option should be available to you without requiring you to call the SNES routines directly.
>>>>>>
>>>>>> Once you have finalized your approach, and if for the implicit case you always work in the SNES mf operator mode, you can hardwire
>>>>>>
>>>>>> TSGetSNES(ts,&snes);
>>>>>> SNESSetUseMatrixFree(snes,PETSC_TRUE,PETSC_FALSE);
>>>>>>
>>>>>> in your code so you don't need to always provide the option -snes_mf_operator
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>> There might be calls to the Riemann solver to evaluate the dRHS / dU part, yes, but maybe it's possible to re-use what was computed for the RHS^n ?
>>>>>>> In the FV section the jacobian is set to identity, which I missed before, but it could explain why, when I used the following :
>>>>>>> TSSetType(ts, TSBEULER);
>>>>>>> DMTSSetIFunctionLocal(dm, DMPlexTSComputeIFunctionFEM , &ctx);
>>>>>>> DMTSSetIJacobianLocal(dm, DMPlexTSComputeIJacobianFEM , &ctx);
>>>>>>> with my FV discretization, nothing happened, right ?
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Thibault
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>> Le ven. 21 août 2020 à 14:55, Thibault Bridel-Bertomeu > a écrit :
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks Matthew and Jed for your input.
>>>>>>> I indeed envision an implicit solver in the sense Jed mentioned - Jiri Blazek's book is a nice intro to this concept.
>>>>>>>
>>>>>>> Matthew, I do not know exactly what to change right now because although I understand globally what the DMPlexComputeXXXX_Internal methods do, I cannot say for sure line by line what is happening.
>>>>>>> In a structured code, I have an implicit FVM solver with PETSc but I do not use any of the FV structure, not even a DM - I just use C arrays that I transform to PETSc Vec and Mat, build my IJacobian and my preconditioner, and give all that to a TS and it runs. I cannot figure out how to do it with the FV and the DM and all the underlying "shortcuts" that I want to use.
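For reference, the shell-matrix apply routine that the listing below registers with MATOP_MULT (the action shift * Y - (F(U0 + eps*Y) - F(U0)) / eps described earlier) might look roughly like the following sketch. The context struct, its members and the EvaluateRHS call are illustrative placeholders, not the actual HyPar code:

typedef struct {
  Vec        U0, Uper, F0, Fper;   /* reference state and work vectors */
  PetscReal  shift, jfnk_eps;      /* shift from the TS, differencing parameter */
  void      *solver;               /* whatever the RHS evaluation needs */
} JFNKCtx;                         /* placeholder context, not HyPar's PETScContext */

static PetscErrorCode JacobianShellMult(Mat A, Vec Y, Vec JY)
{
  JFNKCtx        *ctx;
  PetscReal       eps;
  PetscErrorCode  ierr;

  PetscFunctionBegin;
  ierr = MatShellGetContext(A,(void**)&ctx);CHKERRQ(ierr);
  eps  = ctx->jfnk_eps;
  /* Uper = U0 + eps*Y, then evaluate the hyperbolic RHS at both states */
  ierr = VecWAXPY(ctx->Uper,eps,Y,ctx->U0);CHKERRQ(ierr);
  ierr = EvaluateRHS(ctx->solver,ctx->Uper,ctx->Fper);CHKERRQ(ierr);  /* placeholder */
  ierr = EvaluateRHS(ctx->solver,ctx->U0,  ctx->F0  );CHKERRQ(ierr);  /* placeholder */
  /* JY = shift*Y - (F(U0+eps*Y) - F(U0))/eps */
  ierr = VecWAXPY(JY,-1.0,ctx->F0,ctx->Fper);CHKERRQ(ierr);           /* JY = Fper - F0 */
  ierr = VecScale(JY,-1.0/eps);CHKERRQ(ierr);
  ierr = VecAXPY(JY,ctx->shift,Y);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}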
>>>>>>> >>>>>>> Here is the top method for the structured code : >>>>>>> >>>> >>>> >>>>>>> int total_size = context.npoints * solver->nvars >>>>>>> ierr = TSSetRHSFunction(ts,PETSC_NULL,PetscRHSFunctionImpl,&context); CHKERRQ(ierr); >>>>>>> SNES snes; >>>>>>> KSP ksp; >>>>>>> PC pc; >>>>>>> SNESType snestype; >>>>>>> ierr = TSGetSNES(ts,&snes); CHKERRQ(ierr); >>>>>>> ierr = SNESGetType(snes,&snestype); CHKERRQ(ierr); >>>>>>> >>>>>>> flag_mat_a = 1; >>>>>>> ierr = MatCreateShell(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE, >>>>>>> PETSC_DETERMINE,&context,&A); CHKERRQ(ierr); >>>>>>> context.jfnk_eps = 1e-7; >>>>>>> ierr = PetscOptionsGetReal(NULL,NULL,"-jfnk_epsilon",&context.jfnk_eps,NULL); CHKERRQ(ierr); >>>>>>> ierr = MatShellSetOperation(A,MATOP_MULT,(void (*)(void))PetscJacobianFunction_JFNK); CHKERRQ(ierr); >>>>>>> ierr = MatSetUp(A); CHKERRQ(ierr); >>>>>>> >>>>>>> context.flag_use_precon = 0; >>>>>>> ierr = PetscOptionsGetBool(PETSC_NULL,PETSC_NULL,"-with_pc",(PetscBool*)(&context.flag_use_precon),PETSC_NULL); CHKERRQ(ierr); >>>>>>> >>>>>>> /* Set up preconditioner matrix */ >>>>>>> flag_mat_b = 1; >>>>>>> ierr = MatCreateAIJ(MPI_COMM_WORLD,total_size,total_size,PETSC_DETERMINE,PETSC_DETERMINE, >>>> >>>>>>> (solver->ndims*2+1)*solver->nvars,NULL, >>>>>>> 2*solver->ndims*solver->nvars,NULL,&B); CHKERRQ(ierr); >>>>>>> ierr = MatSetBlockSize(B,solver->nvars); >>>>>>> /* Set the RHSJacobian function for TS */ >>>> >>>> ierr = TSSetIJacobian(ts,A,B,PetscIJacobian,&context); CHKERRQ(ierr); >>>> >>>>>>> Thibault Bridel-Bertomeu >>>>>>> ? >>>>>>> Eng, MSc, PhD >>>>>>> Research Engineer >>>>>>> CEA/CESTA >>>>>>> 33114 LE BARP >>>>>>> Tel.: (+33)557046924 >>>>>>> Mob.: (+33)611025322 >>>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>>> >>>> >>>> >>>>>>> >>>>>>> Le jeu. 20 ao?t 2020 ? 18:43, Jed Brown > a ?crit : >>>>>>> Matthew Knepley > writes: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> > I could never get the FVM stuff to make sense to me for implicit methods. >>>>>>> >>>>>>> >>>>>>> > Here is my problem understanding. If you have an FVM method, it decides >>>>>>> >>>>>>> >>>>>>> > to move "stuff" from one cell to its neighboring cells depending on the >>>>>>> >>>>>>> >>>>>>> > solution to the Riemann problem on each face, which computed the flux. This >>>>>>> >>>>>>> >>>>>>> > is >>>>>>> >>>>>>> >>>>>>> > fine unless the timestep is so big that material can flow through into the >>>>>>> >>>>>>> >>>>>>> > cells beyond the neighbor. Then I should have considered the effect of the >>>>>>> >>>>>>> >>>>>>> > Riemann problem for those interfaces. That would be in the Jacobian, but I >>>>>>> >>>>>>> >>>>>>> > don't know how to compute that Jacobian. I guess you could do everything >>>>>>> >>>>>>> >>>>>>> > matrix-free, but without a preconditioner it seems hard. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> So long as we're using method of lines, the flux is just instantaneous flux, not integrated over some time step. It has the same meaning for implicit and explicit. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> An explicit method would be unstable if you took such a large time step (CFL) and an implicit method will not simultaneously be SSP and higher than first order, but it's still a consistent discretization of the problem. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> It's common (done in FUN3D and others) to precondition with a first-order method, where gradient reconstruction/limiting is skipped. 
That's what I'd recommend because limiting creates nasty nonlinearities and the resulting discretizations lack h-ellipticity which makes them very hard to solve. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>> >>>>>>> >>>>>>> >>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>> >>>>>> -- >>>>>> Thibault Bridel-Bertomeu >>>>>> ? >>>>>> Eng, MSc, PhD >>>>>> Research Engineer >>>>>> CEA/CESTA >>>>>> 33114 LE BARP >>>>>> Tel.: (+33)557046924 >>>>>> Mob.: (+33)611025322 >>>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>>> >>>>>> >>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Thibault Bridel-Bertomeu >>>> ? >>>> Eng, MSc, PhD >>>> Research Engineer >>>> CEA/CESTA >>>> 33114 LE BARP >>>> Tel.: (+33)557046924 >>>> Mob.: (+33)611025322 >>>> Mail: thibault.bridelbertomeu at gmail.com >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Fri Aug 28 03:30:01 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Fri, 28 Aug 2020 01:30:01 -0700 Subject: [petsc-users] Unexplained memory leaks Message-ID: Hi, I am building a 2D solver for the semiconductor Poisson-Boltzmann equation. I detected a memory leak when running the program using valgrind but I am unable to solve this issue as there are no signs in the valgrind output indicating that the source of the error is in the modules I have written. I am attaching you a text file containing the valgrind output. I have seen that a similar question was asked earlier (found here ) but I could not find a final solution to that problem. Could you let me know the source of the problem? Please let me know if you need any further information. Thank you, Pranay. ? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- ==3496== HEAP SUMMARY ==3496== in use at exit 13,861,509 bytes in 89 blocks ==3496== total heap usage 23,885 allocs, 23,796 frees, 67,052,683 bytes allocated ==3496== ==3496== 5 bytes in 1 blocks are definitely lost in loss record 1 of 89 ==3496== at 0x4C29BE3 malloc (vg_replace_malloc.c299) ==3496== by 0x9863949 strdup (in usrlib64libc-2.17.so) ==3496== by 0xEAA7DDF ==3496== by 0xE872782 ==3496== by 0xE879DB0 ==3496== by 0xE8700FE ==3496== by 0xE82C6A5 ==3496== by 0xE5EB5F3 ==3496== by 0xDFD78FA ==3496== by 0x9DE3000 orte_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-rte.so.40.0.0) ==3496== by 0x8EA28DD ompi_mpi_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EC8C7A PMPI_Init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== ==3496== 5 bytes in 1 blocks are definitely lost in loss record 2 of 89 ==3496== at 0x4C29BE3 malloc (vg_replace_malloc.c299) ==3496== by 0x9863949 strdup (in usrlib64libc-2.17.so) ==3496== by 0xEAA7D87 ==3496== by 0xE872782 ==3496== by 0xE879DB0 ==3496== by 0xE8700FE ==3496== by 0xE82C6A5 ==3496== by 0xE5EB5F3 ==3496== by 0xDFD78FA ==3496== by 0x9DE3000 orte_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-rte.so.40.0.0) ==3496== by 0x8EA28DD ompi_mpi_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EC8C7A PMPI_Init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== ==3496== 8 bytes in 1 blocks are definitely lost in loss record 3 of 89 ==3496== at 0x4C2B975 calloc (vg_replace_malloc.c711) ==3496== by 0xA0E9773 dlopen_open (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0x12824C63 ==3496== by 0x134516CF ==3496== by 0xA0C92D2 mca_base_framework_components_open (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0xA0E773A mca_btl_base_open (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0xA0D3C90 mca_base_framework_open (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0xA0D3C90 mca_base_framework_open (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0x8EA2E6C ompi_mpi_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EC8C7A PMPI_Init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8C463E7 MPI_INIT (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi_mpifh.so.40.0.0) ==3496== by 0x4F8AF56 petscinitialize_internal (zstart.c317) ==3496== ==3496== 8 bytes in 1 blocks are definitely lost in loss record 4 of 89 ==3496== at 0x4C2B975 calloc (vg_replace_malloc.c711) ==3496== by 0xA0E9773 dlopen_open (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0x12824C63 ==3496== by 0x13AA05E9 ==3496== by 0xA0E7B25 mca_btl_base_select (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0x12E35491 ==3496== by 0x8EEC95B mca_bml_base_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EA2E8B ompi_mpi_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EC8C7A PMPI_Init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8C463E7 MPI_INIT (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi_mpifh.so.40.0.0) ==3496== by 0x4F8AF56 petscinitialize_internal (zstart.c317) ==3496== by 0x4F8B996 petscinitialize_ (zstart.c504) ==3496== ==3496== 8 bytes in 1 blocks are 
definitely lost in loss record 5 of 89 ==3496== at 0x4C2B975 calloc (vg_replace_malloc.c711) ==3496== by 0xA0E9773 dlopen_open (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0x12824C63 ==3496== by 0x13EAF2F3 ==3496== by 0xA0E7B25 mca_btl_base_select (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0x12E35491 ==3496== by 0x8EEC95B mca_bml_base_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EA2E8B ompi_mpi_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EC8C7A PMPI_Init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8C463E7 MPI_INIT (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi_mpifh.so.40.0.0) ==3496== by 0x4F8AF56 petscinitialize_internal (zstart.c317) ==3496== by 0x4F8B996 petscinitialize_ (zstart.c504) ==3496== ==3496== 12 bytes in 1 blocks are definitely lost in loss record 6 of 89 ==3496== at 0x4C29BE3 malloc (vg_replace_malloc.c299) ==3496== by 0x9863949 strdup (in usrlib64libc-2.17.so) ==3496== by 0xEAABCFD ==3496== by 0xE882357 ==3496== by 0xE82C925 ==3496== by 0xE5EB5F3 ==3496== by 0xDFD78FA ==3496== by 0x9DE3000 orte_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-rte.so.40.0.0) ==3496== by 0x8EA28DD ompi_mpi_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EC8C7A PMPI_Init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8C463E7 MPI_INIT (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi_mpifh.so.40.0.0) ==3496== by 0x4F8AF56 petscinitialize_internal (zstart.c317) ==3496== ==3496== 13 bytes in 1 blocks are definitely lost in loss record 7 of 89 ==3496== at 0x4C29BE3 malloc (vg_replace_malloc.c299) ==3496== by 0x9863949 strdup (in usrlib64libc-2.17.so) ==3496== by 0x12824CFF ==3496== by 0x134516CF ==3496== by 0xA0C92D2 mca_base_framework_components_open (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0xA0E773A mca_btl_base_open (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0xA0D3C90 mca_base_framework_open (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0xA0D3C90 mca_base_framework_open (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0x8EA2E6C ompi_mpi_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EC8C7A PMPI_Init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8C463E7 MPI_INIT (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi_mpifh.so.40.0.0) ==3496== by 0x4F8AF56 petscinitialize_internal (zstart.c317) ==3496== ==3496== 13 bytes in 1 blocks are definitely lost in loss record 8 of 89 ==3496== at 0x4C29BE3 malloc (vg_replace_malloc.c299) ==3496== by 0x9863949 strdup (in usrlib64libc-2.17.so) ==3496== by 0x12824CFF ==3496== by 0x13AA05E9 ==3496== by 0xA0E7B25 mca_btl_base_select (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0x12E35491 ==3496== by 0x8EEC95B mca_bml_base_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EA2E8B ompi_mpi_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EC8C7A PMPI_Init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8C463E7 MPI_INIT (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi_mpifh.so.40.0.0) ==3496== by 0x4F8AF56 petscinitialize_internal (zstart.c317) ==3496== by 0x4F8B996 
petscinitialize_ (zstart.c504) ==3496== ==3496== 13 bytes in 1 blocks are definitely lost in loss record 9 of 89 ==3496== at 0x4C29BE3 malloc (vg_replace_malloc.c299) ==3496== by 0x9863949 strdup (in usrlib64libc-2.17.so) ==3496== by 0x12824CFF ==3496== by 0x13EAF2F3 ==3496== by 0xA0E7B25 mca_btl_base_select (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-pal.so.40.0.0) ==3496== by 0x12E35491 ==3496== by 0x8EEC95B mca_bml_base_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EA2E8B ompi_mpi_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EC8C7A PMPI_Init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8C463E7 MPI_INIT (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi_mpifh.so.40.0.0) ==3496== by 0x4F8AF56 petscinitialize_internal (zstart.c317) ==3496== by 0x4F8B996 petscinitialize_ (zstart.c504) ==3496== ==3496== 29 bytes in 1 blocks are definitely lost in loss record 14 of 89 ==3496== at 0x4C29BE3 malloc (vg_replace_malloc.c299) ==3496== by 0x9863949 strdup (in usrlib64libc-2.17.so) ==3496== by 0xF2BBCE7 ==3496== by 0x400F4C2 _dl_init (in usrlib64ld-2.17.so) ==3496== by 0x4013BD5 dl_open_worker (in usrlib64ld-2.17.so) ==3496== by 0x400F2D3 _dl_catch_error (in usrlib64ld-2.17.so) ==3496== by 0x40132CA _dl_open (in usrlib64ld-2.17.so) ==3496== by 0x85C8FBA dlopen_doit (in usrlib64libdl-2.17.so) ==3496== by 0x400F2D3 _dl_catch_error (in usrlib64ld-2.17.so) ==3496== by 0x85C95BC _dlerror_run (in usrlib64libdl-2.17.so) ==3496== by 0x85C9050 dlopen@@GLIBC_2.2.5 (in usrlib64libdl-2.17.so) ==3496== by 0xE87A5C1 ==3496== ==3496== 56 bytes in 1 blocks are definitely lost in loss record 60 of 89 ==3496== at 0x4C29BE3 malloc (vg_replace_malloc.c299) ==3496== by 0x11965489 ==3496== by 0x117539A4 ==3496== by 0x11544C25 ==3496== by 0x9E33E3C orte_oob_base_select (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-rte.so.40.0.0) ==3496== by 0x9E23074 orte_ess_base_app_setup (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-rte.so.40.0.0) ==3496== by 0xDFD786A ==3496== by 0x9DE3000 orte_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibopen-rte.so.40.0.0) ==3496== by 0x8EA28DD ompi_mpi_init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8EC8C7A PMPI_Init (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi.so.40.0.0) ==3496== by 0x8C463E7 MPI_INIT (in packages7xopenmpiagave3.0.0gcc6xnormalliblibmpi_mpifh.so.40.0.0) ==3496== by 0x4F8AF56 petscinitialize_internal (zstart.c317) ==3496== ==3496== 536 (24 direct, 512 indirect) bytes in 1 blocks are definitely lost in loss record 64 of 89 ==3496== at 0x4C29BE3 malloc (vg_replace_malloc.c299) ==3496== by 0x78FC2F4 _gfortrani_xmalloc (memory.c43) ==3496== by 0x79CC48A _gfortrani_fbuf_init (fbuf.c42) ==3496== by 0x79BD0C6 _gfortrani_new_unit (open.c615) ==3496== by 0x79BD70D already_open (open.c672) ==3496== by 0x79BD70D _gfortran_st_open (open.c837) ==3496== by 0x41D901 __poisson_petsc_MOD_find_profile (poisson_PETSc.F90496) ==3496== by 0x41E5D8 MAIN__ (main.F9021) ==3496== by 0x41E60E main (main.F902) ==3496== ==3496== 1,640 (320 direct, 1,320 indirect) bytes in 1 blocks are definitely lost in loss record 67 of 89 ==3496== at 0x4C2BB78 realloc (vg_replace_malloc.c785) ==3496== by 0xE813563 ==3496== by 0xE81C77D ==3496== by 0xA0EE0C7 event_process_active_single_queue (event.c1370) ==3496== by 0xA0EE0C7 event_process_active (event.c1440) ==3496== by 0xA0EE0C7 opal_libevent2022_event_base_loop (event.c1644) ==3496== by 
0xE8703CD ==3496== by 0x95C8E24 start_thread (in usrlib64libpthread-2.17.so) ==3496== by 0x98D534C clone (in usrlib64libc-2.17.so) ==3496== ==3496== LEAK SUMMARY ==3496== definitely lost 514 bytes in 13 blocks ==3496== indirectly lost 1,832 bytes in 36 blocks ==3496== possibly lost 0 bytes in 0 blocks ==3496== still reachable 13,859,163 bytes in 40 blocks ==3496== suppressed 0 bytes in 0 blocks ==3496== Reachable blocks (those to which a pointer was found) are not shown. ==3496== To see them, rerun with --leak-check=full --show-leak-kinds=all ==3496== ==3496== For counts of detected and suppressed errors, rerun with -v ==3496== ERROR SUMMARY 13 errors from 13 contexts (suppressed 0 from 0) From jroman at dsic.upv.es Fri Aug 28 04:52:48 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 28 Aug 2020 11:52:48 +0200 Subject: [petsc-users] About KSPConvergedReasonView In-Reply-To: <0A6FAFE4-41DE-494A-A8AA-2BF3E84F60D7@petsc.dev> References: <9bffed18-91a0-9ba0-54c4-c77b48d976a9@imperial.ac.uk> <0A6FAFE4-41DE-494A-A8AA-2BF3E84F60D7@petsc.dev> Message-ID: <56FD480D-F828-41E9-8831-8A26F77AD667@dsic.upv.es> Thibaut: the changes that Barry mentioned are already in master. Can you try? Barry: I think the issue is with the PetscViewer argument. KSPView() has a custom Fortran stub that calls PetscPatchDefaultViewers_Fortran(), but this is missing in KSPConvergedReasonView(). Jose > El 27 ago 2020, a las 17:59, Barry Smith escribi?: > > > This is probably due to the final argument being a character string which PETSc has difficulty managing by default for Fortran. > > I just removed the format argument from the call so this problem will gone soon. > > You could just comment out the call for now and put a print statement in instead. > > The problem will be fixed as soon as my merge request gets into master. > > Sorry about this. > > Barry > > > > > >> On Aug 27, 2020, at 10:25 AM, Thibaut Appel wrote: >> >> Dear PETSc users, >> >> I found out that (at least in the master branch) that KSPReasonView has been recently deprecated in favor of KSPConvergedReasonView. >> >> After changing my application code, I thought I was using the function correctly: >> >> CALL KSPGetConvergedReason(ksp,ksp_reason,ierr) >> CHKERRA(ierr) >> >> IF (ksp_reason < 0) THEN >> >> CALL KSPConvergedReasonView(ksp,PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_DEFAULT,ierr) >> CHKERRA(ierr) >> >> END IF >> >> but I still get the following backtrace >> >> Program received signal SIGSEGV, Segmentation fault. >> 0x00007ffff51dea59 in PetscObjectTypeCompare (obj=0x8, >> type_name=0x7ffff75b5b88 "ascii", same=0x7fffffffd7b8) >> at /home/Packages/petsc/src/sys/objects/destroy.c:160 >> 160 else if (!type_name || !obj->type_name) *same = PETSC_FALSE; >> (gdb) bt >> #0 0x00007ffff51dea59 in PetscObjectTypeCompare (obj=0x8, >> type_name=0x7ffff75b5b88 "ascii", same=0x7fffffffd7b8) >> at /home/Packages/petsc/src/sys/objects/destroy.c:160 >> #1 0x00007ffff6ba1d83 in KSPConvergedReasonView (ksp=0x555555bb2510, viewer=0x8, >> format=PETSC_VIEWER_DEFAULT) >> at /home/Packages/petsc/src/ksp/ksp/interface/itfunc.c:452 >> #2 0x00007ffff6beb37a in kspconvergedreasonview_ ( >> ksp=0x55555593f3c0 <__solver_MOD_ksp>, viewer=0x7fffffffda50, >> format=0x5555558fae48, __ierr=0x7fffffffda6c) >> at /home/Packages/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:295 >> #3 0x00005555555e040d in solver::solve_linear_problem (vec_rhs=..., vec_sol=...) 
>> at mod_solver.F90:1872 >> #4 0x0000555555614453 in solver::solve () at mod_solver.F90:164 >> #5 0x00005555555ba3c6 in main () at main.F90:67 >> #6 0x00005555555ba437 in main (argc=1, argv=0x7fffffffe17e) at main.F90:3 >> #7 0x00007ffff46fa1e3 in __libc_start_main () >> from /usr/lib/x86_64-linux-gnu/libc.so.6 >> #8 0x000055555555cd7e in _start () >> >> as if there was a type mismatch. Could anyone pinpoint what's wrong? >> >> Thank you, >> >> Thibaut >> > From t.appel17 at imperial.ac.uk Fri Aug 28 06:11:48 2020 From: t.appel17 at imperial.ac.uk (Thibaut Appel) Date: Fri, 28 Aug 2020 12:11:48 +0100 Subject: [petsc-users] About KSPConvergedReasonView In-Reply-To: <56FD480D-F828-41E9-8831-8A26F77AD667@dsic.upv.es> References: <9bffed18-91a0-9ba0-54c4-c77b48d976a9@imperial.ac.uk> <0A6FAFE4-41DE-494A-A8AA-2BF3E84F60D7@petsc.dev> <56FD480D-F828-41E9-8831-8A26F77AD667@dsic.upv.es> Message-ID: On 28/08/2020 10:52, Jose E. Roman wrote: > Thibaut: the changes that Barry mentioned are already in master. Can you try? > > Barry: I think the issue is with the PetscViewer argument. KSPView() has a custom Fortran stub that calls PetscPatchDefaultViewers_Fortran(), but this is missing in KSPConvergedReasonView(). > > Jose I reconfigured/compiled PETSc from the most recent master branch an hour ago; the following compiles fine: ??? CALL KSPSolve(ksp,vec_rhs,vec_sol,ierr) ??? CHKERRA(ierr) ??? CALL KSPGetConvergedReason(ksp,ksp_reason,ierr) ??? CHKERRA(ierr) ??? CALL KSPConvergedReasonView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr) ??? CHKERRA(ierr) but I still get a similar backtrace when KSPConvergedReasonView is called: Program received signal SIGSEGV, Segmentation fault. 0x00007ffff511e89c in PetscObjectTypeCompare (obj=0x8, type_name=0x7ffff7580128 "ascii", ??? same=0x7fffffffd7c8) at /home/thibaut/Packages/petsc/src/sys/objects/destroy.c:160 160??? ? else if (!type_name || !obj->type_name) *same = PETSC_FALSE; (gdb) backtrace #0? 0x00007ffff511e89c in PetscObjectTypeCompare (obj=0x8, type_name=0x7ffff7580128 "ascii", ??? same=0x7fffffffd7c8) at /home/Packages/petsc/src/sys/objects/destroy.c:160 #1? 0x00007ffff66ed4d3 in KSPConvergedReasonView (ksp=0x555555b44860, viewer=0x8) ??? at /home/Packages/petsc/src/ksp/ksp/interface/itfunc.c:455 #2? 0x00007ffff6733fca in kspconvergedreasonview_ (ksp=0x5555559403c0 <__solver_MOD_ksp>, ??? viewer=0x7fffffffda60, __ierr=0x7fffffffda6c) ??? at /home/Packages/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:295 #3? 0x00005555555e165d in solver::solve_linear_problem (vec_rhs=..., vec_sol=...) ??? at mod_solver.F90:1867 #4? 0x00005555556156fe in solver::solve () at mod_solver.F90:164 #5? 0x00005555555bb8f3 in main () at main.F90:67 #6? 0x00005555555bb964 in main (argc=1, argv=0x7fffffffe17d) at main.F90:3 #7? 0x00007ffff46f71e3 in __libc_start_main () from /usr/lib/x86_64-linux-gnu/libc.so.6 #8? 0x000055555555cd5e in _start () Thibaut >> El 27 ago 2020, a las 17:59, Barry Smith escribi?: >> >> >> This is probably due to the final argument being a character string which PETSc has difficulty managing by default for Fortran. >> >> I just removed the format argument from the call so this problem will gone soon. >> >> You could just comment out the call for now and put a print statement in instead. >> >> The problem will be fixed as soon as my merge request gets into master. >> >> Sorry about this. 
>> >> Barry >> >> >> >> >> >>> On Aug 27, 2020, at 10:25 AM, Thibaut Appel wrote: >>> >>> Dear PETSc users, >>> >>> I found out that (at least in the master branch) that KSPReasonView has been recently deprecated in favor of KSPConvergedReasonView. >>> >>> After changing my application code, I thought I was using the function correctly: >>> >>> CALL KSPGetConvergedReason(ksp,ksp_reason,ierr) >>> CHKERRA(ierr) >>> >>> IF (ksp_reason < 0) THEN >>> >>> CALL KSPConvergedReasonView(ksp,PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_DEFAULT,ierr) >>> CHKERRA(ierr) >>> >>> END IF >>> >>> but I still get the following backtrace >>> >>> Program received signal SIGSEGV, Segmentation fault. >>> 0x00007ffff51dea59 in PetscObjectTypeCompare (obj=0x8, >>> type_name=0x7ffff75b5b88 "ascii", same=0x7fffffffd7b8) >>> at /home/Packages/petsc/src/sys/objects/destroy.c:160 >>> 160 else if (!type_name || !obj->type_name) *same = PETSC_FALSE; >>> (gdb) bt >>> #0 0x00007ffff51dea59 in PetscObjectTypeCompare (obj=0x8, >>> type_name=0x7ffff75b5b88 "ascii", same=0x7fffffffd7b8) >>> at /home/Packages/petsc/src/sys/objects/destroy.c:160 >>> #1 0x00007ffff6ba1d83 in KSPConvergedReasonView (ksp=0x555555bb2510, viewer=0x8, >>> format=PETSC_VIEWER_DEFAULT) >>> at /home/Packages/petsc/src/ksp/ksp/interface/itfunc.c:452 >>> #2 0x00007ffff6beb37a in kspconvergedreasonview_ ( >>> ksp=0x55555593f3c0 <__solver_MOD_ksp>, viewer=0x7fffffffda50, >>> format=0x5555558fae48, __ierr=0x7fffffffda6c) >>> at /home/Packages/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:295 >>> #3 0x00005555555e040d in solver::solve_linear_problem (vec_rhs=..., vec_sol=...) >>> at mod_solver.F90:1872 >>> #4 0x0000555555614453 in solver::solve () at mod_solver.F90:164 >>> #5 0x00005555555ba3c6 in main () at main.F90:67 >>> #6 0x00005555555ba437 in main (argc=1, argv=0x7fffffffe17e) at main.F90:3 >>> #7 0x00007ffff46fa1e3 in __libc_start_main () >>> from /usr/lib/x86_64-linux-gnu/libc.so.6 >>> #8 0x000055555555cd7e in _start () >>> >>> as if there was a type mismatch. Could anyone pinpoint what's wrong? >>> >>> Thank you, >>> >>> Thibaut >>> From knepley at gmail.com Fri Aug 28 06:40:12 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Aug 2020 07:40:12 -0400 Subject: [petsc-users] Unexplained memory leaks In-Reply-To: References: Message-ID: OpenMPI leaks memory at the end. As you can see it is small, so they are unmotivated to fix that. Thanks, Matt On Fri, Aug 28, 2020 at 4:31 AM baikadi pranay wrote: > Hi, > I am building a 2D solver for the semiconductor Poisson-Boltzmann > equation. I detected a memory leak when running the program using valgrind > but I am unable to solve this issue as there are no signs in the valgrind > output indicating that the source of the error is in the modules I have > written. I am attaching you a text file containing the valgrind output. > > I have seen that a similar question was asked earlier (found here > ) > but I could not find a final solution to that problem. Could you let me > know the source of the problem? > > Please let me know if you need any further information. > > Thank you, > Pranay. > ? > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sblondel at utk.edu Fri Aug 28 09:49:59 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Fri, 28 Aug 2020 14:49:59 +0000 Subject: [petsc-users] Matrix Free Method questions Message-ID: Hi everyone, I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 28 10:24:32 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 28 Aug 2020 10:24:32 -0500 Subject: [petsc-users] About KSPConvergedReasonView In-Reply-To: References: <9bffed18-91a0-9ba0-54c4-c77b48d976a9@imperial.ac.uk> <0A6FAFE4-41DE-494A-A8AA-2BF3E84F60D7@petsc.dev> <56FD480D-F828-41E9-8831-8A26F77AD667@dsic.upv.es> Message-ID: This should never have worked from Fortran since the custom Fortran stub was missing, as Jose points out. I will make a MR with the needed correction code shortly. Barry > On Aug 28, 2020, at 6:11 AM, Thibaut Appel wrote: > > On 28/08/2020 10:52, Jose E. Roman wrote: >> Thibaut: the changes that Barry mentioned are already in master. Can you try? >> >> Barry: I think the issue is with the PetscViewer argument. KSPView() has a custom Fortran stub that calls PetscPatchDefaultViewers_Fortran(), but this is missing in KSPConvergedReasonView(). >> >> Jose > > I reconfigured/compiled PETSc from the most recent master branch an hour ago; the following compiles fine: > > CALL KSPSolve(ksp,vec_rhs,vec_sol,ierr) > CHKERRA(ierr) > > CALL KSPGetConvergedReason(ksp,ksp_reason,ierr) > CHKERRA(ierr) > > CALL KSPConvergedReasonView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr) > CHKERRA(ierr) > > but I still get a similar backtrace when KSPConvergedReasonView is called: > > Program received signal SIGSEGV, Segmentation fault. 
> 0x00007ffff511e89c in PetscObjectTypeCompare (obj=0x8, type_name=0x7ffff7580128 "ascii", > same=0x7fffffffd7c8) at /home/thibaut/Packages/petsc/src/sys/objects/destroy.c:160 > 160 else if (!type_name || !obj->type_name) *same = PETSC_FALSE; > (gdb) backtrace > #0 0x00007ffff511e89c in PetscObjectTypeCompare (obj=0x8, type_name=0x7ffff7580128 "ascii", > same=0x7fffffffd7c8) at /home/Packages/petsc/src/sys/objects/destroy.c:160 > #1 0x00007ffff66ed4d3 in KSPConvergedReasonView (ksp=0x555555b44860, viewer=0x8) > at /home/Packages/petsc/src/ksp/ksp/interface/itfunc.c:455 > #2 0x00007ffff6733fca in kspconvergedreasonview_ (ksp=0x5555559403c0 <__solver_MOD_ksp>, > viewer=0x7fffffffda60, __ierr=0x7fffffffda6c) > at /home/Packages/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:295 > #3 0x00005555555e165d in solver::solve_linear_problem (vec_rhs=..., vec_sol=...) > at mod_solver.F90:1867 > #4 0x00005555556156fe in solver::solve () at mod_solver.F90:164 > #5 0x00005555555bb8f3 in main () at main.F90:67 > #6 0x00005555555bb964 in main (argc=1, argv=0x7fffffffe17d) at main.F90:3 > #7 0x00007ffff46f71e3 in __libc_start_main () from /usr/lib/x86_64-linux-gnu/libc.so.6 > #8 0x000055555555cd5e in _start () > > Thibaut > >>> El 27 ago 2020, a las 17:59, Barry Smith escribi?: >>> >>> >>> This is probably due to the final argument being a character string which PETSc has difficulty managing by default for Fortran. >>> >>> I just removed the format argument from the call so this problem will gone soon. >>> >>> You could just comment out the call for now and put a print statement in instead. >>> >>> The problem will be fixed as soon as my merge request gets into master. >>> >>> Sorry about this. >>> >>> Barry >>> >>> >>> >>> >>> >>>> On Aug 27, 2020, at 10:25 AM, Thibaut Appel wrote: >>>> >>>> Dear PETSc users, >>>> >>>> I found out that (at least in the master branch) that KSPReasonView has been recently deprecated in favor of KSPConvergedReasonView. >>>> >>>> After changing my application code, I thought I was using the function correctly: >>>> >>>> CALL KSPGetConvergedReason(ksp,ksp_reason,ierr) >>>> CHKERRA(ierr) >>>> >>>> IF (ksp_reason < 0) THEN >>>> >>>> CALL KSPConvergedReasonView(ksp,PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_DEFAULT,ierr) >>>> CHKERRA(ierr) >>>> >>>> END IF >>>> >>>> but I still get the following backtrace >>>> >>>> Program received signal SIGSEGV, Segmentation fault. >>>> 0x00007ffff51dea59 in PetscObjectTypeCompare (obj=0x8, >>>> type_name=0x7ffff75b5b88 "ascii", same=0x7fffffffd7b8) >>>> at /home/Packages/petsc/src/sys/objects/destroy.c:160 >>>> 160 else if (!type_name || !obj->type_name) *same = PETSC_FALSE; >>>> (gdb) bt >>>> #0 0x00007ffff51dea59 in PetscObjectTypeCompare (obj=0x8, >>>> type_name=0x7ffff75b5b88 "ascii", same=0x7fffffffd7b8) >>>> at /home/Packages/petsc/src/sys/objects/destroy.c:160 >>>> #1 0x00007ffff6ba1d83 in KSPConvergedReasonView (ksp=0x555555bb2510, viewer=0x8, >>>> format=PETSC_VIEWER_DEFAULT) >>>> at /home/Packages/petsc/src/ksp/ksp/interface/itfunc.c:452 >>>> #2 0x00007ffff6beb37a in kspconvergedreasonview_ ( >>>> ksp=0x55555593f3c0 <__solver_MOD_ksp>, viewer=0x7fffffffda50, >>>> format=0x5555558fae48, __ierr=0x7fffffffda6c) >>>> at /home/Packages/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:295 >>>> #3 0x00005555555e040d in solver::solve_linear_problem (vec_rhs=..., vec_sol=...) 
>>>> at mod_solver.F90:1872 >>>> #4 0x0000555555614453 in solver::solve () at mod_solver.F90:164 >>>> #5 0x00005555555ba3c6 in main () at main.F90:67 >>>> #6 0x00005555555ba437 in main (argc=1, argv=0x7fffffffe17e) at main.F90:3 >>>> #7 0x00007ffff46fa1e3 in __libc_start_main () >>>> from /usr/lib/x86_64-linux-gnu/libc.so.6 >>>> #8 0x000055555555cd7e in _start () >>>> >>>> as if there was a type mismatch. Could anyone pinpoint what's wrong? >>>> >>>> Thank you, >>>> >>>> Thibaut >>>> From jed at jedbrown.org Fri Aug 28 10:28:00 2020 From: jed at jedbrown.org (Jed Brown) Date: Fri, 28 Aug 2020 09:28:00 -0600 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: Message-ID: <873646rdrz.fsf@jedbrown.org> "Blondel, Sophie via petsc-users" writes: > Hi everyone, > > I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 > > I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? Totally normal. Changing the MF method mainly just affects stability of the algorithm (with small consequences in vector work required to determine the differencing parameter). > And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. Could you share -log_view output for your standard method and the MF variant? From bsmith at petsc.dev Fri Aug 28 11:12:18 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 28 Aug 2020 11:12:18 -0500 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: Message-ID: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> Sophie, This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. > -pc_fieldsplit_detect_coupling creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. 
> -fieldsplit_0_pc_type sor Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. > -fieldsplit_1_pc_type redundant This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. ---- The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) Barry > On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users wrote: > > Hi everyone, > > I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 > > I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. 
Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? > > And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. > > Best, > > Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: From pranayreddy865 at gmail.com Fri Aug 28 12:32:29 2020 From: pranayreddy865 at gmail.com (baikadi pranay) Date: Fri, 28 Aug 2020 10:32:29 -0700 Subject: [petsc-users] Unexplained memory leaks In-Reply-To: References: Message-ID: Thank you Mark and Matthew. I will try to use the suppression file to hide the leaks. On a different note, I see a similar error related to MPI when I couple my poisson solver with the schrodinger solver (written with SLEPc). I am attaching a screenshot of the error. I was wondering if you could comment on this issue as well. Please let me know if you need any further information from me. Thank you. Best Regards, Pranay. ? On Fri, Aug 28, 2020 at 5:48 AM Mark Lohry wrote: > Looking through my own valgrind I saw OpenMPI mentions on their faq they > provide a suppression file to hide their known leaks: > https://www.open-mpi.org/faq/?category=debugging#valgrind_clean > > mpirun -np 2 valgrind --suppressions=$PREFIX/share/openmpi/openmpi-valgrind.supp > > > > On Fri, Aug 28, 2020 at 7:41 AM Matthew Knepley wrote: > >> OpenMPI leaks memory at the end. As you can see it is small, so they are >> unmotivated to fix that. >> >> Thanks, >> >> Matt >> >> On Fri, Aug 28, 2020 at 4:31 AM baikadi pranay >> wrote: >> >>> Hi, >>> I am building a 2D solver for the semiconductor Poisson-Boltzmann >>> equation. I detected a memory leak when running the program using valgrind >>> but I am unable to solve this issue as there are no signs in the valgrind >>> output indicating that the source of the error is in the modules I have >>> written. I am attaching you a text file containing the valgrind output. >>> >>> I have seen that a similar question was asked earlier (found here >>> ) >>> but I could not find a final solution to that problem. Could you let me >>> know the source of the problem? >>> >>> Please let me know if you need any further information. >>> >>> Thank you, >>> Pranay. >>> ? >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: schrod_poisson_MPI_error.JPG Type: image/jpeg Size: 196144 bytes Desc: not available URL: From knepley at gmail.com Fri Aug 28 12:36:47 2020 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Aug 2020 13:36:47 -0400 Subject: [petsc-users] Unexplained memory leaks In-Reply-To: References: Message-ID: On Fri, Aug 28, 2020 at 1:32 PM baikadi pranay wrote: > Thank you Mark and Matthew. I will try to use the suppression file to hide > the leaks. 
> > On a different note, I see a similar error related to MPI when I couple my > poisson solver with the schrodinger solver (written with SLEPc). I am > attaching a screenshot of the error. I was wondering if you could comment > on this issue as well. > > Please let me know if you need any further information from me. > It looks like you called an MPI function after you called PetscFinalize(). In order to do this, you have to call MPI_Init() before you call PetscInitialize(). Thanks, Matt > Thank you. > > Best Regards, > Pranay. > ? > > On Fri, Aug 28, 2020 at 5:48 AM Mark Lohry wrote: > >> Looking through my own valgrind I saw OpenMPI mentions on their faq they >> provide a suppression file to hide their known leaks: >> https://www.open-mpi.org/faq/?category=debugging#valgrind_clean >> >> mpirun -np 2 valgrind --suppressions=$PREFIX/share/openmpi/openmpi-valgrind.supp >> >> >> >> On Fri, Aug 28, 2020 at 7:41 AM Matthew Knepley >> wrote: >> >>> OpenMPI leaks memory at the end. As you can see it is small, so they are >>> unmotivated to fix that. >>> >>> Thanks, >>> >>> Matt >>> >>> On Fri, Aug 28, 2020 at 4:31 AM baikadi pranay >>> wrote: >>> >>>> Hi, >>>> I am building a 2D solver for the semiconductor Poisson-Boltzmann >>>> equation. I detected a memory leak when running the program using valgrind >>>> but I am unable to solve this issue as there are no signs in the valgrind >>>> output indicating that the source of the error is in the modules I have >>>> written. I am attaching you a text file containing the valgrind output. >>>> >>>> I have seen that a similar question was asked earlier (found here >>>> ) >>>> but I could not find a final solution to that problem. Could you let me >>>> know the source of the problem? >>>> >>>> Please let me know if you need any further information. >>>> >>>> Thank you, >>>> Pranay. >>>> ? >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Aug 28 13:02:07 2020 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 28 Aug 2020 13:02:07 -0500 (CDT) Subject: [petsc-users] Unexplained memory leaks In-Reply-To: References: Message-ID: Its best to build PETSc with --download-mpich for use with valgrind. Satish On Fri, 28 Aug 2020, baikadi pranay wrote: > Thank you Mark and Matthew. I will try to use the suppression file to hide > the leaks. > > On a different note, I see a similar error related to MPI when I couple my > poisson solver with the schrodinger solver (written with SLEPc). I am > attaching a screenshot of the error. I was wondering if you could comment > on this issue as well. > > Please let me know if you need any further information from me. > > Thank you. > > Best Regards, > Pranay. > ? 
> > On Fri, Aug 28, 2020 at 5:48 AM Mark Lohry wrote: > > > Looking through my own valgrind I saw OpenMPI mentions on their faq they > > provide a suppression file to hide their known leaks: > > https://www.open-mpi.org/faq/?category=debugging#valgrind_clean > > > > mpirun -np 2 valgrind --suppressions=$PREFIX/share/openmpi/openmpi-valgrind.supp > > > > > > > > On Fri, Aug 28, 2020 at 7:41 AM Matthew Knepley wrote: > > > >> OpenMPI leaks memory at the end. As you can see it is small, so they are > >> unmotivated to fix that. > >> > >> Thanks, > >> > >> Matt > >> > >> On Fri, Aug 28, 2020 at 4:31 AM baikadi pranay > >> wrote: > >> > >>> Hi, > >>> I am building a 2D solver for the semiconductor Poisson-Boltzmann > >>> equation. I detected a memory leak when running the program using valgrind > >>> but I am unable to solve this issue as there are no signs in the valgrind > >>> output indicating that the source of the error is in the modules I have > >>> written. I am attaching you a text file containing the valgrind output. > >>> > >>> I have seen that a similar question was asked earlier (found here > >>> ) > >>> but I could not find a final solution to that problem. Could you let me > >>> know the source of the problem? > >>> > >>> Please let me know if you need any further information. > >>> > >>> Thank you, > >>> Pranay. > >>> ? > >>> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which their > >> experiments lead. > >> -- Norbert Wiener > >> > >> https://www.cse.buffalo.edu/~knepley/ > >> > >> > > > From sblondel at utk.edu Fri Aug 28 16:11:06 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Fri, 28 Aug 2020 21:11:06 +0000 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> References: , <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> Message-ID: Thank you Jed and Barry, First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. To answer questions about the current per-conditioners: * I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in * this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? Cheers, Sophie ________________________________ De : Barry Smith Envoy? : vendredi 28 ao?t 2020 12:12 ? : Blondel, Sophie Cc : petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Objet : Re: [petsc-users] Matrix Free Method questions [External Email] Sophie, This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. 
I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. -pc_fieldsplit_detect_coupling creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. -fieldsplit_0_pc_type sor Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. -fieldsplit_1_pc_type redundant This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. ---- The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) Barry On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: Hi everyone, I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. 
Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_mf.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: log_std.txt URL: From bsmith at petsc.dev Fri Aug 28 17:31:05 2020 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 28 Aug 2020 17:31:05 -0500 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> Message-ID: <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> > On Aug 28, 2020, at 4:11 PM, Blondel, Sophie wrote: > > Thank you Jed and Barry, > > First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. > > To answer questions about the current per-conditioners: > I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in > this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 > I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? Yes, the number of MatMult is a good enough surrogate. So using matrix-free (which means no preconditioning) has 35846/160 ans = 224.0375 or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. Barry > > Cheers, > > Sophie > De : Barry Smith > > Envoy? : vendredi 28 ao?t 2020 12:12 > ? 
: Blondel, Sophie > > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > > Objet : Re: [petsc-users] Matrix Free Method questions > > [External Email] > > Sophie, > > This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. > > I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? > > In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. > >> -pc_fieldsplit_detect_coupling > > creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. > >> -fieldsplit_0_pc_type sor > > Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. > >> -fieldsplit_1_pc_type redundant > > > This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. > > ---- > The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. > > So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. > > The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. > > Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. > > Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. > > I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? 
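Since the application pushes and pops its options and initializes PETSc without command-line arguments (the reason -log_view was not being picked up), the Jacobi experiment can also be requested programmatically; a minimal sketch, assuming the values are inserted before the solver objects call their SetFromOptions routines, and using only the option names already mentioned in this thread:

    ierr = PetscOptionsSetValue(NULL, "-fieldsplit_0_pc_type", "jacobi");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue(NULL, "-ksp_monitor", NULL);CHKERRQ(ierr); /* watch the linear iteration counts */
    /* ... then create and configure the TS/SNES/KSP as usual so SetFromOptions sees these entries ... */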
> > Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) > > Barry > > > > > > > > > >> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: >> >> Hi everyone, >> >> I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 >> >> I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? >> >> And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. >> >> Best, >> >> Sophie > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Aug 29 11:53:04 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 29 Aug 2020 11:53:04 -0500 Subject: [petsc-users] About KSPConvergedReasonView In-Reply-To: References: <9bffed18-91a0-9ba0-54c4-c77b48d976a9@imperial.ac.uk> <0A6FAFE4-41DE-494A-A8AA-2BF3E84F60D7@petsc.dev> <56FD480D-F828-41E9-8831-8A26F77AD667@dsic.upv.es> Message-ID: <81FD7E5C-17B5-4774-BB42-503BD9FDDACB@petsc.dev> Branch barry/2020-08-29/fortran-stub-converged-reason-view should fix this, soon to be in master. Barry > On Aug 28, 2020, at 6:11 AM, Thibaut Appel wrote: > > On 28/08/2020 10:52, Jose E. Roman wrote: >> Thibaut: the changes that Barry mentioned are already in master. Can you try? >> >> Barry: I think the issue is with the PetscViewer argument. KSPView() has a custom Fortran stub that calls PetscPatchDefaultViewers_Fortran(), but this is missing in KSPConvergedReasonView(). >> >> Jose > > I reconfigured/compiled PETSc from the most recent master branch an hour ago; the following compiles fine: > > CALL KSPSolve(ksp,vec_rhs,vec_sol,ierr) > CHKERRA(ierr) > > CALL KSPGetConvergedReason(ksp,ksp_reason,ierr) > CHKERRA(ierr) > > CALL KSPConvergedReasonView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr) > CHKERRA(ierr) > > but I still get a similar backtrace when KSPConvergedReasonView is called: > > Program received signal SIGSEGV, Segmentation fault. 
> 0x00007ffff511e89c in PetscObjectTypeCompare (obj=0x8, type_name=0x7ffff7580128 "ascii", > same=0x7fffffffd7c8) at /home/thibaut/Packages/petsc/src/sys/objects/destroy.c:160 > 160 else if (!type_name || !obj->type_name) *same = PETSC_FALSE; > (gdb) backtrace > #0 0x00007ffff511e89c in PetscObjectTypeCompare (obj=0x8, type_name=0x7ffff7580128 "ascii", > same=0x7fffffffd7c8) at /home/Packages/petsc/src/sys/objects/destroy.c:160 > #1 0x00007ffff66ed4d3 in KSPConvergedReasonView (ksp=0x555555b44860, viewer=0x8) > at /home/Packages/petsc/src/ksp/ksp/interface/itfunc.c:455 > #2 0x00007ffff6733fca in kspconvergedreasonview_ (ksp=0x5555559403c0 <__solver_MOD_ksp>, > viewer=0x7fffffffda60, __ierr=0x7fffffffda6c) > at /home/Packages/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:295 > #3 0x00005555555e165d in solver::solve_linear_problem (vec_rhs=..., vec_sol=...) > at mod_solver.F90:1867 > #4 0x00005555556156fe in solver::solve () at mod_solver.F90:164 > #5 0x00005555555bb8f3 in main () at main.F90:67 > #6 0x00005555555bb964 in main (argc=1, argv=0x7fffffffe17d) at main.F90:3 > #7 0x00007ffff46f71e3 in __libc_start_main () from /usr/lib/x86_64-linux-gnu/libc.so.6 > #8 0x000055555555cd5e in _start () > > Thibaut > >>> El 27 ago 2020, a las 17:59, Barry Smith escribi?: >>> >>> >>> This is probably due to the final argument being a character string which PETSc has difficulty managing by default for Fortran. >>> >>> I just removed the format argument from the call so this problem will gone soon. >>> >>> You could just comment out the call for now and put a print statement in instead. >>> >>> The problem will be fixed as soon as my merge request gets into master. >>> >>> Sorry about this. >>> >>> Barry >>> >>> >>> >>> >>> >>>> On Aug 27, 2020, at 10:25 AM, Thibaut Appel wrote: >>>> >>>> Dear PETSc users, >>>> >>>> I found out that (at least in the master branch) that KSPReasonView has been recently deprecated in favor of KSPConvergedReasonView. >>>> >>>> After changing my application code, I thought I was using the function correctly: >>>> >>>> CALL KSPGetConvergedReason(ksp,ksp_reason,ierr) >>>> CHKERRA(ierr) >>>> >>>> IF (ksp_reason < 0) THEN >>>> >>>> CALL KSPConvergedReasonView(ksp,PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_DEFAULT,ierr) >>>> CHKERRA(ierr) >>>> >>>> END IF >>>> >>>> but I still get the following backtrace >>>> >>>> Program received signal SIGSEGV, Segmentation fault. >>>> 0x00007ffff51dea59 in PetscObjectTypeCompare (obj=0x8, >>>> type_name=0x7ffff75b5b88 "ascii", same=0x7fffffffd7b8) >>>> at /home/Packages/petsc/src/sys/objects/destroy.c:160 >>>> 160 else if (!type_name || !obj->type_name) *same = PETSC_FALSE; >>>> (gdb) bt >>>> #0 0x00007ffff51dea59 in PetscObjectTypeCompare (obj=0x8, >>>> type_name=0x7ffff75b5b88 "ascii", same=0x7fffffffd7b8) >>>> at /home/Packages/petsc/src/sys/objects/destroy.c:160 >>>> #1 0x00007ffff6ba1d83 in KSPConvergedReasonView (ksp=0x555555bb2510, viewer=0x8, >>>> format=PETSC_VIEWER_DEFAULT) >>>> at /home/Packages/petsc/src/ksp/ksp/interface/itfunc.c:452 >>>> #2 0x00007ffff6beb37a in kspconvergedreasonview_ ( >>>> ksp=0x55555593f3c0 <__solver_MOD_ksp>, viewer=0x7fffffffda50, >>>> format=0x5555558fae48, __ierr=0x7fffffffda6c) >>>> at /home/Packages/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:295 >>>> #3 0x00005555555e040d in solver::solve_linear_problem (vec_rhs=..., vec_sol=...) 
>>>> at mod_solver.F90:1872 >>>> #4 0x0000555555614453 in solver::solve () at mod_solver.F90:164 >>>> #5 0x00005555555ba3c6 in main () at main.F90:67 >>>> #6 0x00005555555ba437 in main (argc=1, argv=0x7fffffffe17e) at main.F90:3 >>>> #7 0x00007ffff46fa1e3 in __libc_start_main () >>>> from /usr/lib/x86_64-linux-gnu/libc.so.6 >>>> #8 0x000055555555cd7e in _start () >>>> >>>> as if there was a type mismatch. Could anyone pinpoint what's wrong? >>>> >>>> Thank you, >>>> >>>> Thibaut >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From eda.oktay at metu.edu.tr Sat Aug 29 14:23:37 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Sat, 29 Aug 2020 22:23:37 +0300 Subject: [petsc-users] Using edge-weights for partitioning Message-ID: Hi all, I am trying to partition a sparse matrix by using ParMETIS. I am converting my matrix to adjacency type and then applying partitioning. Default, I understood that partitioning doesn't use edge-weights. However, when I used the following codes I saw from ex15 and used "-test_use_edge_weights 1", I am getting the same results as when I don't consider edge weights. PetscBool use_edge_weights=PETSC_FALSE; PetscOptionsGetBool(NULL,NULL,"-test_use_edge_weights",&use_edge_weights,NULL); if (use_edge_weights) { MatPartitioningSetUseEdgeWeights(part,use_edge_weights); MatPartitioningGetUseEdgeWeights(part,&use_edge_weights); if (!use_edge_weights) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_INCOMP, "use_edge_weights flag does not setup correctly \n"); } My matrix does not consist of 1s and 0s, so I want partitioning to consider all the nonzero elements in the matrix as edge weights. Don't MatPartitioningSetUseEdgeWeights and MatPartitioningGetUseEdgeWeights do that? Should I add something more? In the page of MatPartitioningSetUseEdgeWeights, it is written that "If set use_edge_weights to TRUE, users need to make sure legal edge weights are stored in an ADJ matrix.". How can I make sure of this? I am trying to compare the use of ParMETIS with the spectral partitioning algorithm when I used a weighted Laplacian. Thanks! Eda -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Aug 29 14:37:31 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 29 Aug 2020 15:37:31 -0400 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: References: Message-ID: On Sat, Aug 29, 2020 at 3:24 PM Eda Oktay wrote: > Hi all, > > I am trying to partition a sparse matrix by using ParMETIS. I am > converting my matrix to adjacency type and then applying partitioning. > Default, I understood that partitioning doesn't use edge-weights. However, > when I used the following codes I saw from ex15 and used > "-test_use_edge_weights 1", I am getting the same results as when I don't > consider edge weights. > > PetscBool use_edge_weights=PETSC_FALSE; > > PetscOptionsGetBool(NULL,NULL,"-test_use_edge_weights",&use_edge_weights,NULL); > if (use_edge_weights) { > MatPartitioningSetUseEdgeWeights(part,use_edge_weights); > > MatPartitioningGetUseEdgeWeights(part,&use_edge_weights); > if (!use_edge_weights) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_INCOMP, > "use_edge_weights flag does not setup correctly \n"); > } > > My matrix does not consist of 1s and 0s, so I want partitioning to > consider all the nonzero elements in the matrix as edge weights. Don't > MatPartitioningSetUseEdgeWeights and MatPartitioningGetUseEdgeWeights do > that? Should I add something more? 
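For reference, a minimal sketch of the complete flow the fragment above is part of, with SymmA, part and is as illustrative names; the matrix is assumed to already hold legal (positive) values to serve as edge weights, and, as the replies below point out, it can be handed to MatPartitioningSetAdjacency() directly:

    Mat             SymmA;  /* assembled matrix to partition (assumed created elsewhere) */
    MatPartitioning part;
    IS              is;
    PetscErrorCode  ierr;

    ierr = MatPartitioningCreate(PETSC_COMM_WORLD, &part);CHKERRQ(ierr);
    ierr = MatPartitioningSetAdjacency(part, SymmA);CHKERRQ(ierr);          /* conversion to MATMPIADJ is handled internally */
    ierr = MatPartitioningSetUseEdgeWeights(part, PETSC_TRUE);CHKERRQ(ierr);
    ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr);               /* e.g. -mat_partitioning_type parmetis */
    ierr = MatPartitioningApply(part, &is);CHKERRQ(ierr);                   /* is holds the target rank for each local row */
    ierr = ISView(is, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
    ierr = ISDestroy(&is);CHKERRQ(ierr);
    ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr);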
In the page > of MatPartitioningSetUseEdgeWeights, it is written that "If set > use_edge_weights to TRUE, users need to make sure legal edge weights are > stored in an ADJ matrix.". How can I make sure of this? > This is a question for the ParMetis list. My memory says that the weights need to be non-negative, and for their optimization algorithm to work, they should be small, say < 10. Thanks, Matt > I am trying to compare the use of ParMETIS with the spectral partitioning > algorithm when I used a weighted Laplacian. > > Thanks! > > Eda > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sat Aug 29 18:59:53 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sat, 29 Aug 2020 18:59:53 -0500 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: References: Message-ID: > On Aug 29, 2020, at 2:23 PM, Eda Oktay wrote: > > Hi all, > > I am trying to partition a sparse matrix by using ParMETIS. I am converting my matrix to adjacency type and then applying partitioning. You don't need to do this. Just pass your original matrix directly into MatPartitioningSetAdjacency() it will handle any conversions needed. Edge weights need to be positive, since they represent how much communication is to take place over that link. You may need to force your matrix to have all positive values before giving it to MatPartitioningSetAdjacency and using edge weights. I this doesn't work than our code is broken, please send us a simple test case Question: Why are you partitioning a matrix? Is it for load balancing of solves or matrix vector products with the matrix? To reduce interprocess communication during solves or matrix vector products with the matrix? If so the numerical values in the matrix don't affect load balance or interprocess communication for these operations. Barry > Default, I understood that partitioning doesn't use edge-weights. However, when I used the following codes I saw from ex15 and used "-test_use_edge_weights 1", I am getting the same results as when I don't consider edge weights. > > PetscBool use_edge_weights=PETSC_FALSE; > PetscOptionsGetBool(NULL,NULL,"-test_use_edge_weights",&use_edge_weights,NULL); > if (use_edge_weights) { > MatPartitioningSetUseEdgeWeights(part,use_edge_weights); > > MatPartitioningGetUseEdgeWeights(part,&use_edge_weights); > if (!use_edge_weights) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_INCOMP, "use_edge_weights flag does not setup correctly \n"); > } > > My matrix does not consist of 1s and 0s, so I want partitioning to consider all the nonzero elements in the matrix as edge weights. Don't MatPartitioningSetUseEdgeWeights and MatPartitioningGetUseEdgeWeights do that? Should I add something more? In the page of MatPartitioningSetUseEdgeWeights, it is written that "If set use_edge_weights to TRUE, users need to make sure legal edge weights are stored in an ADJ matrix.". How can I make sure of this? > > I am trying to compare the use of ParMETIS with the spectral partitioning algorithm when I used a weighted Laplacian. > > Thanks! 
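On the positivity requirement above, a hedged sketch of one way to build an all-positive copy of a matrix (same nonzero pattern, absolute values of the entries) before it is handed to MatPartitioningSetAdjacency(); SymmA and Apos are illustrative names:

    Mat               Apos;
    PetscInt          rstart, rend, i, ncols, j;
    const PetscInt    *cols;
    const PetscScalar *vals;
    PetscScalar       *absvals;
    PetscErrorCode    ierr;

    ierr = MatDuplicate(SymmA, MAT_DO_NOT_COPY_VALUES, &Apos);CHKERRQ(ierr); /* same nonzero pattern, zeroed values */
    ierr = MatGetOwnershipRange(SymmA, &rstart, &rend);CHKERRQ(ierr);
    for (i = rstart; i < rend; i++) {
      ierr = MatGetRow(SymmA, i, &ncols, &cols, &vals);CHKERRQ(ierr);
      ierr = PetscMalloc1(ncols, &absvals);CHKERRQ(ierr);
      for (j = 0; j < ncols; j++) absvals[j] = PetscAbsScalar(vals[j]);
      ierr = MatSetValues(Apos, 1, &i, ncols, cols, absvals, INSERT_VALUES);CHKERRQ(ierr);
      ierr = PetscFree(absvals);CHKERRQ(ierr);
      ierr = MatRestoreRow(SymmA, i, &ncols, &cols, &vals);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(Apos, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(Apos, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

Apos, rather than SymmA itself, would then be the matrix given to MatPartitioningSetAdjacency() when edge weights are enabled.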
> > Eda > From eda.oktay at metu.edu.tr Sun Aug 30 02:57:46 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Sun, 30 Aug 2020 10:57:46 +0300 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: References: Message-ID: Dear Matt, First of all I figured out that I asked wrongly. It's not ParMETIS giving the same result. It is CHACO. ParMETIS gives different results when I use edge weights. Thanks! Dear Barry, I am trying to partition the matrix to compare the edge cuts when it is partitioned with CHACO, ParMETIS and the spectral partitioning algorithm with the k-means clustering (I wrote this code in PETSc). In the end, I will conclude that if a linear system is to be solved and the coefficient matrix is large in size, then partitioning the coefficient matrix by using one of these algorithms will help one to solve the linear system faster and with small communication. What is forcing matrix to have all positive values? Isn't it done by using MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights? I will send the test case but I am already passing my original matrix directly to SetAdjacency (SymmA is my symmetric matrix with positive values): ierr = MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); So, if ParMETIS gives different edge cut as it is expected, MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights works correctly. Why can't CHACO? Thanks! Eda Barry Smith , 30 A?u 2020 Paz, 03:00 tarihinde ?unu yazd?: > > > > On Aug 29, 2020, at 2:23 PM, Eda Oktay wrote: > > > > Hi all, > > > > I am trying to partition a sparse matrix by using ParMETIS. I am > converting my matrix to adjacency type and then applying partitioning. > > You don't need to do this. Just pass your original matrix directly into > MatPartitioningSetAdjacency() it will handle any conversions needed. > > Edge weights need to be positive, since they represent how much > communication is to take place over that link. You may need to force your > matrix to have all positive values before giving it to > MatPartitioningSetAdjacency and using edge weights. > > I this doesn't work than our code is broken, please send us a simple > test case > > Question: Why are you partitioning a matrix? Is it for load balancing of > solves or matrix vector products with the matrix? To reduce interprocess > communication during solves or matrix vector products with the matrix? If > so the numerical values in the matrix don't affect load balance or > interprocess communication for these operations. > > > Barry > > > > > > Default, I understood that partitioning doesn't use edge-weights. > However, when I used the following codes I saw from ex15 and used > "-test_use_edge_weights 1", I am getting the same results as when I don't > consider edge weights. 
> > > > PetscBool use_edge_weights=PETSC_FALSE; > > > PetscOptionsGetBool(NULL,NULL,"-test_use_edge_weights",&use_edge_weights,NULL); > > if (use_edge_weights) { > > MatPartitioningSetUseEdgeWeights(part,use_edge_weights); > > > > MatPartitioningGetUseEdgeWeights(part,&use_edge_weights); > > if (!use_edge_weights) > SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_INCOMP, "use_edge_weights flag does > not setup correctly \n"); > > } > > > > My matrix does not consist of 1s and 0s, so I want partitioning to > consider all the nonzero elements in the matrix as edge weights. Don't > MatPartitioningSetUseEdgeWeights and MatPartitioningGetUseEdgeWeights do > that? Should I add something more? In the page of > MatPartitioningSetUseEdgeWeights, it is written that "If set > use_edge_weights to TRUE, users need to make sure legal edge weights are > stored in an ADJ matrix.". How can I make sure of this? > > > > I am trying to compare the use of ParMETIS with the spectral > partitioning algorithm when I used a weighted Laplacian. > > > > Thanks! > > > > Eda > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eda.oktay at metu.edu.tr Sun Aug 30 03:05:09 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Sun, 30 Aug 2020 11:05:09 +0300 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: References: Message-ID: And is edge weights being less than 10 still valid? Although I am not getting any errors, when the elements in my matrix are larger than 10, even ParMETIS doesn't give different results. Eda Oktay , 30 A?u 2020 Paz, 10:57 tarihinde ?unu yazd?: > Dear Matt, > > First of all I figured out that I asked wrongly. It's not ParMETIS giving > the same result. It is CHACO. ParMETIS gives different results when I use > edge weights. > > Thanks! > > Dear Barry, > > I am trying to partition the matrix to compare the edge cuts when it is > partitioned with CHACO, ParMETIS and the spectral partitioning algorithm > with the k-means clustering (I wrote this code in PETSc). In the end, I > will conclude that if a linear system is to be solved and the coefficient > matrix is large in size, then partitioning the coefficient matrix by using > one of these algorithms will help one to solve the linear system faster and > with small communication. > > What is forcing matrix to have all positive values? Isn't it done by using > MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights? > > I will send the test case but I am already passing my original matrix > directly to SetAdjacency (SymmA is my symmetric matrix with positive > values): > > ierr = MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); > > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > > So, if ParMETIS gives different edge cut as it is expected, > MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights works > correctly. Why can't CHACO? > > Thanks! > > Eda > > Barry Smith , 30 A?u 2020 Paz, 03:00 tarihinde ?unu > yazd?: > >> >> >> > On Aug 29, 2020, at 2:23 PM, Eda Oktay wrote: >> > >> > Hi all, >> > >> > I am trying to partition a sparse matrix by using ParMETIS. I am >> converting my matrix to adjacency type and then applying partitioning. >> >> You don't need to do this. Just pass your original matrix directly into >> MatPartitioningSetAdjacency() it will handle any conversions needed. 
>> >> Edge weights need to be positive, since they represent how much >> communication is to take place over that link. You may need to force your >> matrix to have all positive values before giving it to >> MatPartitioningSetAdjacency and using edge weights. >> >> I this doesn't work than our code is broken, please send us a simple >> test case >> >> Question: Why are you partitioning a matrix? Is it for load balancing >> of solves or matrix vector products with the matrix? To reduce interprocess >> communication during solves or matrix vector products with the matrix? If >> so the numerical values in the matrix don't affect load balance or >> interprocess communication for these operations. >> >> >> Barry >> >> >> >> >> > Default, I understood that partitioning doesn't use edge-weights. >> However, when I used the following codes I saw from ex15 and used >> "-test_use_edge_weights 1", I am getting the same results as when I don't >> consider edge weights. >> > >> > PetscBool use_edge_weights=PETSC_FALSE; >> > >> PetscOptionsGetBool(NULL,NULL,"-test_use_edge_weights",&use_edge_weights,NULL); >> > if (use_edge_weights) { >> > MatPartitioningSetUseEdgeWeights(part,use_edge_weights); >> > >> > MatPartitioningGetUseEdgeWeights(part,&use_edge_weights); >> > if (!use_edge_weights) >> SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_INCOMP, "use_edge_weights flag does >> not setup correctly \n"); >> > } >> > >> > My matrix does not consist of 1s and 0s, so I want partitioning to >> consider all the nonzero elements in the matrix as edge weights. Don't >> MatPartitioningSetUseEdgeWeights and MatPartitioningGetUseEdgeWeights do >> that? Should I add something more? In the page of >> MatPartitioningSetUseEdgeWeights, it is written that "If set >> use_edge_weights to TRUE, users need to make sure legal edge weights are >> stored in an ADJ matrix.". How can I make sure of this? >> > >> > I am trying to compare the use of ParMETIS with the spectral >> partitioning algorithm when I used a weighted Laplacian. >> > >> > Thanks! >> > >> > Eda >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From eda.oktay at metu.edu.tr Sun Aug 30 03:17:18 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Sun, 30 Aug 2020 11:17:18 +0300 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: References: Message-ID: Dear Barry, I attached test code and a small binary matrix whose elements are less than 10 which you can use to test. In the code, as an output, the index set "partitioning" will be given. As you will see, when ParMETIS is used, index set changes when edge weights are used, whereas when the option changes to chaco, index set remains the same. Thanks! Eda Eda Oktay , 30 A?u 2020 Paz, 11:05 tarihinde ?unu yazd?: > And is edge weights being less than 10 still valid? Although I am not > getting any errors, when the elements in my matrix are larger than 10, even > ParMETIS doesn't give different results. > > Eda Oktay , 30 A?u 2020 Paz, 10:57 tarihinde ?unu > yazd?: > >> Dear Matt, >> >> First of all I figured out that I asked wrongly. It's not ParMETIS giving >> the same result. It is CHACO. ParMETIS gives different results when I use >> edge weights. >> >> Thanks! >> >> Dear Barry, >> >> I am trying to partition the matrix to compare the edge cuts when it is >> partitioned with CHACO, ParMETIS and the spectral partitioning algorithm >> with the k-means clustering (I wrote this code in PETSc). 
In the end, I >> will conclude that if a linear system is to be solved and the coefficient >> matrix is large in size, then partitioning the coefficient matrix by using >> one of these algorithms will help one to solve the linear system faster and >> with small communication. >> >> What is forcing matrix to have all positive values? Isn't it done by >> using MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights? >> >> I will send the test case but I am already passing my original matrix >> directly to SetAdjacency (SymmA is my symmetric matrix with positive >> values): >> >> ierr = >> MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); >> >> ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); >> ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); >> ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); >> >> So, if ParMETIS gives different edge cut as it is expected, >> MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights works >> correctly. Why can't CHACO? >> >> Thanks! >> >> Eda >> >> Barry Smith , 30 A?u 2020 Paz, 03:00 tarihinde ?unu >> yazd?: >> >>> >>> >>> > On Aug 29, 2020, at 2:23 PM, Eda Oktay wrote: >>> > >>> > Hi all, >>> > >>> > I am trying to partition a sparse matrix by using ParMETIS. I am >>> converting my matrix to adjacency type and then applying partitioning. >>> >>> You don't need to do this. Just pass your original matrix directly into >>> MatPartitioningSetAdjacency() it will handle any conversions needed. >>> >>> Edge weights need to be positive, since they represent how much >>> communication is to take place over that link. You may need to force your >>> matrix to have all positive values before giving it to >>> MatPartitioningSetAdjacency and using edge weights. >>> >>> I this doesn't work than our code is broken, please send us a simple >>> test case >>> >>> Question: Why are you partitioning a matrix? Is it for load balancing >>> of solves or matrix vector products with the matrix? To reduce interprocess >>> communication during solves or matrix vector products with the matrix? If >>> so the numerical values in the matrix don't affect load balance or >>> interprocess communication for these operations. >>> >>> >>> Barry >>> >>> >>> >>> >>> > Default, I understood that partitioning doesn't use edge-weights. >>> However, when I used the following codes I saw from ex15 and used >>> "-test_use_edge_weights 1", I am getting the same results as when I don't >>> consider edge weights. >>> > >>> > PetscBool use_edge_weights=PETSC_FALSE; >>> > >>> PetscOptionsGetBool(NULL,NULL,"-test_use_edge_weights",&use_edge_weights,NULL); >>> > if (use_edge_weights) { >>> > MatPartitioningSetUseEdgeWeights(part,use_edge_weights); >>> > >>> > MatPartitioningGetUseEdgeWeights(part,&use_edge_weights); >>> > if (!use_edge_weights) >>> SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_INCOMP, "use_edge_weights flag does >>> not setup correctly \n"); >>> > } >>> > >>> > My matrix does not consist of 1s and 0s, so I want partitioning to >>> consider all the nonzero elements in the matrix as edge weights. Don't >>> MatPartitioningSetUseEdgeWeights and MatPartitioningGetUseEdgeWeights do >>> that? Should I add something more? In the page of >>> MatPartitioningSetUseEdgeWeights, it is written that "If set >>> use_edge_weights to TRUE, users need to make sure legal edge weights are >>> stored in an ADJ matrix.". How can I make sure of this? 
>>> > >>> > I am trying to compare the use of ParMETIS with the spectral >>> partitioning algorithm when I used a weighted Laplacian. >>> > >>> > Thanks! >>> > >>> > Eda >>> > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_adj.zip Type: application/zip Size: 2408 bytes Desc: not available URL: From knepley at gmail.com Sun Aug 30 07:09:42 2020 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 30 Aug 2020 08:09:42 -0400 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: References: Message-ID: On Sun, Aug 30, 2020 at 3:58 AM Eda Oktay wrote: > Dear Matt, > > First of all I figured out that I asked wrongly. It's not ParMETIS giving > the same result. It is CHACO. ParMETIS gives different results when I use > edge weights. > ParMetis is the only partitioner that can use edge weights I believe. Thanks, Matt > Thanks! > > Dear Barry, > > I am trying to partition the matrix to compare the edge cuts when it is > partitioned with CHACO, ParMETIS and the spectral partitioning algorithm > with the k-means clustering (I wrote this code in PETSc). In the end, I > will conclude that if a linear system is to be solved and the coefficient > matrix is large in size, then partitioning the coefficient matrix by using > one of these algorithms will help one to solve the linear system faster and > with small communication. > > What is forcing matrix to have all positive values? Isn't it done by using > MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights? > > I will send the test case but I am already passing my original matrix > directly to SetAdjacency (SymmA is my symmetric matrix with positive > values): > > ierr = MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); > > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > > So, if ParMETIS gives different edge cut as it is expected, > MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights works > correctly. Why can't CHACO? > > Thanks! > > Eda > > Barry Smith , 30 A?u 2020 Paz, 03:00 tarihinde ?unu > yazd?: > >> >> >> > On Aug 29, 2020, at 2:23 PM, Eda Oktay wrote: >> > >> > Hi all, >> > >> > I am trying to partition a sparse matrix by using ParMETIS. I am >> converting my matrix to adjacency type and then applying partitioning. >> >> You don't need to do this. Just pass your original matrix directly into >> MatPartitioningSetAdjacency() it will handle any conversions needed. >> >> Edge weights need to be positive, since they represent how much >> communication is to take place over that link. You may need to force your >> matrix to have all positive values before giving it to >> MatPartitioningSetAdjacency and using edge weights. >> >> I this doesn't work than our code is broken, please send us a simple >> test case >> >> Question: Why are you partitioning a matrix? Is it for load balancing >> of solves or matrix vector products with the matrix? To reduce interprocess >> communication during solves or matrix vector products with the matrix? If >> so the numerical values in the matrix don't affect load balance or >> interprocess communication for these operations. >> >> >> Barry >> >> >> >> >> > Default, I understood that partitioning doesn't use edge-weights. 
>> However, when I used the following codes I saw from ex15 and used >> "-test_use_edge_weights 1", I am getting the same results as when I don't >> consider edge weights. >> > >> > PetscBool use_edge_weights=PETSC_FALSE; >> > >> PetscOptionsGetBool(NULL,NULL,"-test_use_edge_weights",&use_edge_weights,NULL); >> > if (use_edge_weights) { >> > MatPartitioningSetUseEdgeWeights(part,use_edge_weights); >> > >> > MatPartitioningGetUseEdgeWeights(part,&use_edge_weights); >> > if (!use_edge_weights) >> SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_INCOMP, "use_edge_weights flag does >> not setup correctly \n"); >> > } >> > >> > My matrix does not consist of 1s and 0s, so I want partitioning to >> consider all the nonzero elements in the matrix as edge weights. Don't >> MatPartitioningSetUseEdgeWeights and MatPartitioningGetUseEdgeWeights do >> that? Should I add something more? In the page of >> MatPartitioningSetUseEdgeWeights, it is written that "If set >> use_edge_weights to TRUE, users need to make sure legal edge weights are >> stored in an ADJ matrix.". How can I make sure of this? >> > >> > I am trying to compare the use of ParMETIS with the spectral >> partitioning algorithm when I used a weighted Laplacian. >> > >> > Thanks! >> > >> > Eda >> > >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sun Aug 30 07:33:50 2020 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 30 Aug 2020 08:33:50 -0400 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: References: Message-ID: > > > > > So, if ParMETIS gives different edge cut as it is expected, > MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights works > correctly. Why can't CHACO? > >> >> Chaco does not support using edge weights. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eda.oktay at metu.edu.tr Sun Aug 30 08:05:27 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Sun, 30 Aug 2020 16:05:27 +0300 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: References: Message-ID: Okay thank you so much! Mark Adams , 30 A?u 2020 Paz, 15:34 tarihinde ?unu yazd?: > >> >> >> So, if ParMETIS gives different edge cut as it is expected, >> MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights works >> correctly. Why can't CHACO? >> >>> >>> > Chaco does not support using edge weights. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Aug 30 09:35:18 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 30 Aug 2020 09:35:18 -0500 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: References: Message-ID: > On Aug 30, 2020, at 2:57 AM, Eda Oktay wrote: > > Dear Matt, > > First of all I figured out that I asked wrongly. It's not ParMETIS giving the same result. It is CHACO. ParMETIS gives different results when I use edge weights. > > Thanks! > > Dear Barry, > > I am trying to partition the matrix to compare the edge cuts when it is partitioned with CHACO, ParMETIS and the spectral partitioning algorithm with the k-means clustering (I wrote this code in PETSc). 
In the end, I will conclude that if a linear system is to be solved and the coefficient matrix is large in size, then partitioning the coefficient matrix by using one of these algorithms will help one to solve the linear system faster and with small communication. > > What is forcing matrix to have all positive values? Isn't it done by using MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights? > > I will send the test case but I am already passing my original matrix directly to SetAdjacency (SymmA is my symmetric matrix with positive values): > > ierr = MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); You should not need this. Just ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); ierr = MatPartitioningSetAdjacency(part,SymmA);CHKERRQ(ierr); ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); MatPartitioningSetAdjacency takes any MatType directly. > > So, if ParMETIS gives different edge cut as it is expected, MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights works correctly. Why can't CHACO? > > Thanks! > > Eda > > Barry Smith >, 30 A?u 2020 Paz, 03:00 tarihinde ?unu yazd?: > > > > On Aug 29, 2020, at 2:23 PM, Eda Oktay > wrote: > > > > Hi all, > > > > I am trying to partition a sparse matrix by using ParMETIS. I am converting my matrix to adjacency type and then applying partitioning. > > You don't need to do this. Just pass your original matrix directly into MatPartitioningSetAdjacency() it will handle any conversions needed. > > Edge weights need to be positive, since they represent how much communication is to take place over that link. You may need to force your matrix to have all positive values before giving it to MatPartitioningSetAdjacency and using edge weights. > > I this doesn't work than our code is broken, please send us a simple test case > > Question: Why are you partitioning a matrix? Is it for load balancing of solves or matrix vector products with the matrix? To reduce interprocess communication during solves or matrix vector products with the matrix? If so the numerical values in the matrix don't affect load balance or interprocess communication for these operations. > > > Barry > > > > > > Default, I understood that partitioning doesn't use edge-weights. However, when I used the following codes I saw from ex15 and used "-test_use_edge_weights 1", I am getting the same results as when I don't consider edge weights. > > > > PetscBool use_edge_weights=PETSC_FALSE; > > PetscOptionsGetBool(NULL,NULL,"-test_use_edge_weights",&use_edge_weights,NULL); > > if (use_edge_weights) { > > MatPartitioningSetUseEdgeWeights(part,use_edge_weights); > > > > MatPartitioningGetUseEdgeWeights(part,&use_edge_weights); > > if (!use_edge_weights) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_INCOMP, "use_edge_weights flag does not setup correctly \n"); > > } > > > > My matrix does not consist of 1s and 0s, so I want partitioning to consider all the nonzero elements in the matrix as edge weights. Don't MatPartitioningSetUseEdgeWeights and MatPartitioningGetUseEdgeWeights do that? Should I add something more? In the page of MatPartitioningSetUseEdgeWeights, it is written that "If set use_edge_weights to TRUE, users need to make sure legal edge weights are stored in an ADJ matrix.". How can I make sure of this? 
> > > > I am trying to compare the use of ParMETIS with the spectral partitioning algorithm when I used a weighted Laplacian. > > > > Thanks! > > > > Eda > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Sun Aug 30 13:35:52 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sun, 30 Aug 2020 20:35:52 +0200 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: References: Message-ID: <522CBDFF-1D05-41DC-A656-29C4A3A97DF0@dsic.upv.es> The user interface of Chaco includes an argument for edge weights (don't know which methods within Chaco take them into account). The original PETSc wrapper to Chaco was developed by a student of mine (together with Scotch, Party and Jostle) at around 2003. At that time, MatPartitioning only had support for vertex weights (edge weight support has been introduced very recently), so we just passed a NULL in this argument. Jose > El 30 ago 2020, a las 16:35, Barry Smith escribi?: > > > >> On Aug 30, 2020, at 2:57 AM, Eda Oktay wrote: >> >> Dear Matt, >> >> First of all I figured out that I asked wrongly. It's not ParMETIS giving the same result. It is CHACO. ParMETIS gives different results when I use edge weights. >> >> Thanks! >> >> Dear Barry, >> >> I am trying to partition the matrix to compare the edge cuts when it is partitioned with CHACO, ParMETIS and the spectral partitioning algorithm with the k-means clustering (I wrote this code in PETSc). In the end, I will conclude that if a linear system is to be solved and the coefficient matrix is large in size, then partitioning the coefficient matrix by using one of these algorithms will help one to solve the linear system faster and with small communication. >> >> What is forcing matrix to have all positive values? Isn't it done by using MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights? >> >> I will send the test case but I am already passing my original matrix directly to SetAdjacency (SymmA is my symmetric matrix with positive values): >> >> ierr = MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); >> ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); >> ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); >> ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > > You should not need this. Just > > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > ierr = MatPartitioningSetAdjacency(part,SymmA);CHKERRQ(ierr); > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > > > MatPartitioningSetAdjacency takes any MatType directly. > > >> >> So, if ParMETIS gives different edge cut as it is expected, MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights works correctly. Why can't CHACO? >> >> Thanks! >> >> Eda >> >> Barry Smith , 30 A?u 2020 Paz, 03:00 tarihinde ?unu yazd?: >> >> >> > On Aug 29, 2020, at 2:23 PM, Eda Oktay wrote: >> > >> > Hi all, >> > >> > I am trying to partition a sparse matrix by using ParMETIS. I am converting my matrix to adjacency type and then applying partitioning. >> >> You don't need to do this. Just pass your original matrix directly into MatPartitioningSetAdjacency() it will handle any conversions needed. >> >> Edge weights need to be positive, since they represent how much communication is to take place over that link. You may need to force your matrix to have all positive values before giving it to MatPartitioningSetAdjacency and using edge weights. 
>> >> I this doesn't work than our code is broken, please send us a simple test case >> >> Question: Why are you partitioning a matrix? Is it for load balancing of solves or matrix vector products with the matrix? To reduce interprocess communication during solves or matrix vector products with the matrix? If so the numerical values in the matrix don't affect load balance or interprocess communication for these operations. >> >> >> Barry >> >> >> >> >> > Default, I understood that partitioning doesn't use edge-weights. However, when I used the following codes I saw from ex15 and used "-test_use_edge_weights 1", I am getting the same results as when I don't consider edge weights. >> > >> > PetscBool use_edge_weights=PETSC_FALSE; >> > PetscOptionsGetBool(NULL,NULL,"-test_use_edge_weights",&use_edge_weights,NULL); >> > if (use_edge_weights) { >> > MatPartitioningSetUseEdgeWeights(part,use_edge_weights); >> > >> > MatPartitioningGetUseEdgeWeights(part,&use_edge_weights); >> > if (!use_edge_weights) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_INCOMP, "use_edge_weights flag does not setup correctly \n"); >> > } >> > >> > My matrix does not consist of 1s and 0s, so I want partitioning to consider all the nonzero elements in the matrix as edge weights. Don't MatPartitioningSetUseEdgeWeights and MatPartitioningGetUseEdgeWeights do that? Should I add something more? In the page of MatPartitioningSetUseEdgeWeights, it is written that "If set use_edge_weights to TRUE, users need to make sure legal edge weights are stored in an ADJ matrix.". How can I make sure of this? >> > >> > I am trying to compare the use of ParMETIS with the spectral partitioning algorithm when I used a weighted Laplacian. >> > >> > Thanks! >> > >> > Eda >> > From eda.oktay at metu.edu.tr Sun Aug 30 13:37:33 2020 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Sun, 30 Aug 2020 21:37:33 +0300 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: <522CBDFF-1D05-41DC-A656-29C4A3A97DF0@dsic.upv.es> References: <522CBDFF-1D05-41DC-A656-29C4A3A97DF0@dsic.upv.es> Message-ID: Okay, thank you so much! Eda On Sun, Aug 30, 2020, 9:36 PM Jose E. Roman wrote: > The user interface of Chaco includes an argument for edge weights (don't > know which methods within Chaco take them into account). The original PETSc > wrapper to Chaco was developed by a student of mine (together with Scotch, > Party and Jostle) at around 2003. At that time, MatPartitioning only had > support for vertex weights (edge weight support has been introduced very > recently), so we just passed a NULL in this argument. > > Jose > > > > El 30 ago 2020, a las 16:35, Barry Smith escribi?: > > > > > > > >> On Aug 30, 2020, at 2:57 AM, Eda Oktay wrote: > >> > >> Dear Matt, > >> > >> First of all I figured out that I asked wrongly. It's not ParMETIS > giving the same result. It is CHACO. ParMETIS gives different results when > I use edge weights. > >> > >> Thanks! > >> > >> Dear Barry, > >> > >> I am trying to partition the matrix to compare the edge cuts when it is > partitioned with CHACO, ParMETIS and the spectral partitioning algorithm > with the k-means clustering (I wrote this code in PETSc). In the end, I > will conclude that if a linear system is to be solved and the coefficient > matrix is large in size, then partitioning the coefficient matrix by using > one of these algorithms will help one to solve the linear system faster and > with small communication. > >> > >> What is forcing matrix to have all positive values? 
Isn't it done by > using MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights? > >> > >> I will send the test case but I am already passing my original matrix > directly to SetAdjacency (SymmA is my symmetric matrix with positive > values): > >> > >> ierr = > MatConvert(SymmA,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); > > >> ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > >> ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); > >> ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > > > > You should not need this. Just > > > > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > > ierr = MatPartitioningSetAdjacency(part,SymmA);CHKERRQ(ierr); > > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > > > > > > MatPartitioningSetAdjacency takes any MatType directly. > > > > > >> > >> So, if ParMETIS gives different edge cut as it is expected, > MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights works > correctly. Why can't CHACO? > >> > >> Thanks! > >> > >> Eda > >> > >> Barry Smith , 30 A?u 2020 Paz, 03:00 tarihinde ?unu > yazd?: > >> > >> > >> > On Aug 29, 2020, at 2:23 PM, Eda Oktay wrote: > >> > > >> > Hi all, > >> > > >> > I am trying to partition a sparse matrix by using ParMETIS. I am > converting my matrix to adjacency type and then applying partitioning. > >> > >> You don't need to do this. Just pass your original matrix directly > into MatPartitioningSetAdjacency() it will handle any conversions needed. > >> > >> Edge weights need to be positive, since they represent how much > communication is to take place over that link. You may need to force your > matrix to have all positive values before giving it to > MatPartitioningSetAdjacency and using edge weights. > >> > >> I this doesn't work than our code is broken, please send us a simple > test case > >> > >> Question: Why are you partitioning a matrix? Is it for load balancing > of solves or matrix vector products with the matrix? To reduce interprocess > communication during solves or matrix vector products with the matrix? If > so the numerical values in the matrix don't affect load balance or > interprocess communication for these operations. > >> > >> > >> Barry > >> > >> > >> > >> > >> > Default, I understood that partitioning doesn't use edge-weights. > However, when I used the following codes I saw from ex15 and used > "-test_use_edge_weights 1", I am getting the same results as when I don't > consider edge weights. > >> > > >> > PetscBool use_edge_weights=PETSC_FALSE; > >> > > PetscOptionsGetBool(NULL,NULL,"-test_use_edge_weights",&use_edge_weights,NULL); > >> > if (use_edge_weights) { > >> > MatPartitioningSetUseEdgeWeights(part,use_edge_weights); > >> > > >> > MatPartitioningGetUseEdgeWeights(part,&use_edge_weights); > >> > if (!use_edge_weights) > SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_INCOMP, "use_edge_weights flag does > not setup correctly \n"); > >> > } > >> > > >> > My matrix does not consist of 1s and 0s, so I want partitioning to > consider all the nonzero elements in the matrix as edge weights. Don't > MatPartitioningSetUseEdgeWeights and MatPartitioningGetUseEdgeWeights do > that? Should I add something more? In the page of > MatPartitioningSetUseEdgeWeights, it is written that "If set > use_edge_weights to TRUE, users need to make sure legal edge weights are > stored in an ADJ matrix.". How can I make sure of this? 
> >> > > >> > I am trying to compare the use of ParMETIS with the spectral > partitioning algorithm when I used a weighted Laplacian. > >> > > >> > Thanks! > >> > > >> > Eda > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Aug 30 17:19:05 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 30 Aug 2020 17:19:05 -0500 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: References: Message-ID: <97B3FB32-B92F-4AE4-BE55-6B9B9C51CEA6@petsc.dev> > On Aug 30, 2020, at 7:33 AM, Mark Adams wrote: > > > > > So, if ParMETIS gives different edge cut as it is expected, MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights works correctly. Why can't CHACO? > > > Chaco does not support using edge weights. The package interfaces that do not support edge weights should error if one requests partitioning with edge weights. Not everyone is born with the innate knowledge that the Chaco PETSc interface doesn't support edge weights. > https://gitlab.com/petsc/petsc/-/merge_requests/3119 -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Sun Aug 30 17:44:15 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Sun, 30 Aug 2020 16:44:15 -0600 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: <97B3FB32-B92F-4AE4-BE55-6B9B9C51CEA6@petsc.dev> References: <97B3FB32-B92F-4AE4-BE55-6B9B9C51CEA6@petsc.dev> Message-ID: I agreed, Barry. A year ago, I enabled edge-weights and vertex weights for only ParMETIS and PTScotch. I did not do the same thing for Chaco, Party, etc. It is straightforward to do that, and I could add an MR if needed. Thanks, Fande, On Sun, Aug 30, 2020 at 4:20 PM Barry Smith wrote: > > > On Aug 30, 2020, at 7:33 AM, Mark Adams wrote: > > >> >> >> So, if ParMETIS gives different edge cut as it is expected, >> MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights works >> correctly. Why can't CHACO? >> >>> >>> > Chaco does not support using edge weights. > > > The package interfaces that do not support edge weights should error if > one requests partitioning with edge weights. Not everyone is born with the > innate knowledge that the Chaco PETSc interface doesn't support edge > weights. > > > > > https://gitlab.com/petsc/petsc/-/merge_requests/3119 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elbueler at alaska.edu Sun Aug 30 17:44:34 2020 From: elbueler at alaska.edu (Ed Bueler) Date: Sun, 30 Aug 2020 14:44:34 -0800 Subject: [petsc-users] ARKIMEX produces incorrect values Message-ID: Dear PETSc -- I tried twice to make this an issue at the gitlab.com host site, but both times got "something went wrong (500)". So this is a bug report by old-fashioned means. I created a TS example, https://github.com/bueler/p4pdes-next/blob/master/c/fix-arkimex/ex54.c at my github, also attached. It solves a 2D linear ODE ``` x' + y' = 6 y y' = x ``` Pretty basic; the known exact solution is just exponentials. The code writes it as F(t,u,u')=G(t,u) and supplies all the pieces, namely IFunction,IJacobian,RHSFunction,RHSJacobian. Note both F and G must be seen by TS to get the correct solution. In summary, a boring (and valgrind-clean ;-)) example. For current master branch it runs fine for the fully-implicit methods (e.g. BDF, CN, ROSW) which can use the IFunction F, including with finite-differenced Jacobians. 
With BDF2, BDF2+-snes_fd, BDF6+tight tol., CN, BEULER, ROSW: $ ./ex54 error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 $ ./ex54 -snes_fd error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 $ ./ex54 -ts_rtol 1.0e-14 -ts_atol 1.0e-14 -ts_bdf_order 6 error norm at tf = 1.000000 from 388 steps: |u-u_exact| = 4.23624e-11 $ ./ex54 -ts_type beuler error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 6.71676e-01 $ ./ex54 -ts_type cn error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 2.22839e-03 $ ./ex54 -ts_type rosw error norm at tf = 1.000000 from 21 steps: |u-u_exact| = 5.64012e-03 But it produces wrong values with ARKIMEX: $ ./ex54 -ts_type arkimex error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01 Neither tightening tolerance nor changing type (`-ts_arkimex_type`) helps ARKIMEX. Thanks! Ed PS My book is at a late proofs stage, and out of my hands. It should appear SIAM Press in a couple of months. In all the examples in my book, only my diffusion-reaction system example using F(t,u,u') = G(t,u) is broken. Thus the motivation for a trivial ODE example as above. -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex54.c Type: text/x-csrc Size: 6584 bytes Desc: not available URL: From elbueler at alaska.edu Sun Aug 30 17:57:47 2020 From: elbueler at alaska.edu (Ed Bueler) Date: Sun, 30 Aug 2020 14:57:47 -0800 Subject: [petsc-users] ARKIMEX produces incorrect values In-Reply-To: References: Message-ID: Darn, sorry. I realize the ARKIMEX page does say "Methods with an explicit stage can only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + Ghat(t,X)." So my example does not do that. Is there a way for ARKIMEX to detect that dG/d(Xdot) = I? Ed On Sun, Aug 30, 2020 at 2:44 PM Ed Bueler wrote: > Dear PETSc -- > > I tried twice to make this an issue at the gitlab.com host site, but both > times got "something went wrong (500)". So this is a bug report by > old-fashioned means. > > I created a TS example, > https://github.com/bueler/p4pdes-next/blob/master/c/fix-arkimex/ex54.c at > my github, also attached. It solves a 2D linear ODE > ``` > x' + y' = 6 y > y' = x > ``` > Pretty basic; the known exact solution is just exponentials. The code > writes it as F(t,u,u')=G(t,u) and supplies all the pieces, namely > IFunction,IJacobian,RHSFunction,RHSJacobian. Note both F and G must be > seen by TS to get the correct solution. In summary, a boring (and > valgrind-clean ;-)) example. > > For current master branch it runs fine for the fully-implicit methods > (e.g. BDF, CN, ROSW) which can use the IFunction F, including with > finite-differenced Jacobians. 
With BDF2, BDF2+-snes_fd, BDF6+tight tol., > CN, BEULER, ROSW: > $ ./ex54 > error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 > $ ./ex54 -snes_fd > error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 > $ ./ex54 -ts_rtol 1.0e-14 -ts_atol 1.0e-14 -ts_bdf_order 6 > error norm at tf = 1.000000 from 388 steps: |u-u_exact| = 4.23624e-11 > $ ./ex54 -ts_type beuler > error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 6.71676e-01 > $ ./ex54 -ts_type cn > error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 2.22839e-03 > $ ./ex54 -ts_type rosw > error norm at tf = 1.000000 from 21 steps: |u-u_exact| = 5.64012e-03 > > But it produces wrong values with ARKIMEX: > $ ./ex54 -ts_type arkimex > error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01 > > Neither tightening tolerance nor changing type (`-ts_arkimex_type`) helps > ARKIMEX. > > Thanks! > > Ed > > PS My book is at a late proofs stage, and out of my hands. It should > appear SIAM Press in a couple of months. In all the examples in my book, > only my diffusion-reaction system example using F(t,u,u') = G(t,u) is > broken. Thus the motivation for a trivial ODE example as above. > > > -- > Ed Bueler > Dept of Mathematics and Statistics > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 306C Chapman > -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sun Aug 30 18:04:33 2020 From: jed at jedbrown.org (Jed Brown) Date: Sun, 30 Aug 2020 17:04:33 -0600 Subject: [petsc-users] ARKIMEX produces incorrect values In-Reply-To: References: Message-ID: <871rjnhh1a.fsf@jedbrown.org> Ed Bueler writes: > Darn, sorry. > > I realize the ARKIMEX page does say "Methods with an explicit stage can > only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot > + Ghat(t,X)." So my example does not do that. Is there a way for > ARKIMEX to detect that dG/d(Xdot) = I? Other than sampling its action on vectors? Not really; it's user code. Emil, per our thread the other day, here is an example of "misuse" by a very experienced user. We need a better way to detect or provide feedback to users. From elbueler at alaska.edu Sun Aug 30 18:04:55 2020 From: elbueler at alaska.edu (Ed Bueler) Date: Sun, 30 Aug 2020 15:04:55 -0800 Subject: [petsc-users] ARKIMEX produces incorrect values In-Reply-To: References: Message-ID: Actually, ARKIMEX is not off the hook. It still gets the wrong answer if told the whole thing is implicit: $ ./ex54 -ts_type arkimex -ts_arkimex_fully_implicit # WRONG (AND REALLY SLOW) error norm at tf = 1.000000 from 224 steps: |u-u_exact| = 2.76636e+00 versus $ ./ex54 -ts_type arkimex # WRONG BUT IFunction IS OF FLAGGED FORM error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01 $ ./ex54 -ts_type bdf # RIGHT error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 So I am not sure what "Methods with an explicit stage can only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + Ghat(t,X)." means. Ed On Sun, Aug 30, 2020 at 2:57 PM Ed Bueler wrote: > Darn, sorry. > > I realize the ARKIMEX page does say "Methods with an explicit stage can > only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot > + Ghat(t,X)." So my example does not do that. Is there a way for > ARKIMEX to detect that dG/d(Xdot) = I? 
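For reference, a compressed sketch (not the attached ex54.c itself) of the IFunction implied by the split F(t,u,u') = G(t,u) above, with F = [x' + y', y'] and G = [6y, x]; the Jacobian of F with respect to u' is [[1,1],[0,1]] rather than the identity, which is exactly the form the methods with an explicit stage do not support:

  PetscErrorCode FormIFunction(TS ts, PetscReal t, Vec u, Vec udot, Vec F, void *ctx)
  {
    PetscErrorCode    ierr;
    const PetscScalar *audot;
    PetscScalar       *aF;

    ierr = VecGetArrayRead(udot,&audot);CHKERRQ(ierr);
    ierr = VecGetArray(F,&aF);CHKERRQ(ierr);
    aF[0] = audot[0] + audot[1];   /* x' + y'  (the RHSFunction supplies 6 y) */
    aF[1] = audot[1];              /* y'       (the RHSFunction supplies x)   */
    ierr = VecRestoreArrayRead(udot,&audot);CHKERRQ(ierr);
    ierr = VecRestoreArray(F,&aF);CHKERRQ(ierr);
    return 0;
  }

As the rest of the thread works out, a system like this either needs F rewritten so that dF/d(u') is the identity, or the solver must be told about it with TSSetEquationType(ts,TS_EQ_IMPLICIT) and run with -ts_arkimex_fully_implicit.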
> > Ed > > On Sun, Aug 30, 2020 at 2:44 PM Ed Bueler wrote: > >> Dear PETSc -- >> >> I tried twice to make this an issue at the gitlab.com host site, but >> both times got "something went wrong (500)". So this is a bug report by >> old-fashioned means. >> >> I created a TS example, >> https://github.com/bueler/p4pdes-next/blob/master/c/fix-arkimex/ex54.c >> at my github, also attached. It solves a 2D linear ODE >> ``` >> x' + y' = 6 y >> y' = x >> ``` >> Pretty basic; the known exact solution is just exponentials. The code >> writes it as F(t,u,u')=G(t,u) and supplies all the pieces, namely >> IFunction,IJacobian,RHSFunction,RHSJacobian. Note both F and G must be >> seen by TS to get the correct solution. In summary, a boring (and >> valgrind-clean ;-)) example. >> >> For current master branch it runs fine for the fully-implicit methods >> (e.g. BDF, CN, ROSW) which can use the IFunction F, including with >> finite-differenced Jacobians. With BDF2, BDF2+-snes_fd, BDF6+tight tol., >> CN, BEULER, ROSW: >> $ ./ex54 >> error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 >> $ ./ex54 -snes_fd >> error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 >> $ ./ex54 -ts_rtol 1.0e-14 -ts_atol 1.0e-14 -ts_bdf_order 6 >> error norm at tf = 1.000000 from 388 steps: |u-u_exact| = 4.23624e-11 >> $ ./ex54 -ts_type beuler >> error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 6.71676e-01 >> $ ./ex54 -ts_type cn >> error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 2.22839e-03 >> $ ./ex54 -ts_type rosw >> error norm at tf = 1.000000 from 21 steps: |u-u_exact| = 5.64012e-03 >> >> But it produces wrong values with ARKIMEX: >> $ ./ex54 -ts_type arkimex >> error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01 >> >> Neither tightening tolerance nor changing type (`-ts_arkimex_type`) helps >> ARKIMEX. >> >> Thanks! >> >> Ed >> >> PS My book is at a late proofs stage, and out of my hands. It should >> appear SIAM Press in a couple of months. In all the examples in my book, >> only my diffusion-reaction system example using F(t,u,u') = G(t,u) is >> broken. Thus the motivation for a trivial ODE example as above. >> >> >> -- >> Ed Bueler >> Dept of Mathematics and Statistics >> University of Alaska Fairbanks >> Fairbanks, AK 99775-6660 >> 306C Chapman >> > > > -- > Ed Bueler > Dept of Mathematics and Statistics > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 306C Chapman > -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Aug 30 18:06:03 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 30 Aug 2020 18:06:03 -0500 Subject: [petsc-users] Using edge-weights for partitioning In-Reply-To: References: <97B3FB32-B92F-4AE4-BE55-6B9B9C51CEA6@petsc.dev> Message-ID: I don't think they are super important. Barry > On Aug 30, 2020, at 5:44 PM, Fande Kong wrote: > > I agreed, Barry. > > A year ago, I enabled edge-weights and vertex weights for only ParMETIS and PTScotch. I did not do the same thing for Chaco, Party, etc. > > It is straightforward to do that, and I could add an MR if needed. > > Thanks, > > Fande, > > On Sun, Aug 30, 2020 at 4:20 PM Barry Smith > wrote: > > >> On Aug 30, 2020, at 7:33 AM, Mark Adams > wrote: >> >> >> >> >> So, if ParMETIS gives different edge cut as it is expected, MatPartitioningGetUseEdgeWeights and MatPartitioningSetUseEdgeWeights works correctly. 
Why can't CHACO? >> >> >> Chaco does not support using edge weights. > > The package interfaces that do not support edge weights should error if one requests partitioning with edge weights. Not everyone is born with the innate knowledge that the Chaco PETSc interface doesn't support edge weights. > > >> > https://gitlab.com/petsc/petsc/-/merge_requests/3119 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Sun Aug 30 18:24:37 2020 From: bsmith at petsc.dev (Barry Smith) Date: Sun, 30 Aug 2020 18:24:37 -0500 Subject: [petsc-users] ARKIMEX produces incorrect values In-Reply-To: <871rjnhh1a.fsf@jedbrown.org> References: <871rjnhh1a.fsf@jedbrown.org> Message-ID: <27E06EBB-7BB0-42BB-B3CE-49B01EB4D6E6@petsc.dev> > On Aug 30, 2020, at 6:04 PM, Jed Brown wrote: > > Ed Bueler writes: > >> Darn, sorry. >> >> I realize the ARKIMEX page does say "Methods with an explicit stage can >> only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot >> + Ghat(t,X)." So my example does not do that. Is there a way for >> ARKIMEX to detect that dG/d(Xdot) = I? > > Other than sampling its action on vectors? Not really; it's user code. Call TSComputeIFunction(TS ts,PetscReal t,Vec U,Vec Udot,Vec Y,PetscBool imex) with Udot all 1 and rhsfunction turned off. Call again with Udot is zero, take the difference. If it does not return all 1 then you know the user has provided an unacceptable function? You could do this at the beginning of each TSSolve() for these picky methods in debug mode. > > Emil, per our thread the other day, here is an example of "misuse" by a very experienced user. We need a better way to detect or provide feedback to users. From elbueler at alaska.edu Mon Aug 31 01:15:57 2020 From: elbueler at alaska.edu (Ed Bueler) Date: Sun, 30 Aug 2020 22:15:57 -0800 Subject: [petsc-users] ARKIMEX produces incorrect values In-Reply-To: <27E06EBB-7BB0-42BB-B3CE-49B01EB4D6E6@petsc.dev> References: <871rjnhh1a.fsf@jedbrown.org> <27E06EBB-7BB0-42BB-B3CE-49B01EB4D6E6@petsc.dev> Message-ID: >>> I realize the ARKIMEX page does say "Methods with an explicit stage can >>> only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot >>> + Ghat(t,X)." So my example does not do that. Is there a way for >>> ARKIMEX to detect that dG/d(Xdot) = I? > >> Other than sampling its action on vectors? Not really; it's user code. > > Call TSComputeIFunction(TS ts,PetscReal t,Vec U,Vec Udot,Vec Y,PetscBool imex) with Udot all 1 and rhsfunction turned off. Call again with Udot is zero, take the difference. If it does not return all 1 then you know the user has provided an unacceptable function? > > You could do this at the beginning of each TSSolve() for these picky methods in debug mode. How about calling the IJacobian first with a=0 and then with a=1 and subtracting matrices, to see if the result is the identity? Ed On Sun, Aug 30, 2020 at 3:24 PM Barry Smith wrote: > > > > On Aug 30, 2020, at 6:04 PM, Jed Brown wrote: > > > > Ed Bueler writes: > > > >> Darn, sorry. > >> > >> I realize the ARKIMEX page does say "Methods with an explicit stage can > >> only be used with ODE in which the stiff part G(t,X,Xdot) has the form > Xdot > >> + Ghat(t,X)." So my example does not do that. Is there a way for > >> ARKIMEX to detect that dG/d(Xdot) = I? > > > > Other than sampling its action on vectors? Not really; it's user code. > > Call TSComputeIFunction(TS ts,PetscReal t,Vec U,Vec Udot,Vec Y,PetscBool > imex) with Udot all 1 and rhsfunction turned off. 
Call again with Udot is > zero, take the difference. If it does not return all 1 then you know the > user has provided an unacceptable function? > > You could do this at the beginning of each TSSolve() for these picky > methods in debug mode. > > > > > > Emil, per our thread the other day, here is an example of "misuse" by a > very experienced user. We need a better way to detect or provide feedback > to users. > > -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Mon Aug 31 04:32:50 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Mon, 31 Aug 2020 11:32:50 +0200 Subject: [petsc-users] DMAdaptLabel with triangle mesh Message-ID: Dear all, I have recently been playing around with the AMR capabilities embedded in PETSc for quad meshes using p4est. Based on the TS tutorial ex11, I was able to incorporate the AMR into a pre-existing code with different metrics for the adaptation process. Now I would like to do something similar using tri meshes. I read that compiling PETSc with Triangle (in 2D and Tetgen for 3D) gives access to refinement and coarsening capabilities on triangular meshes.When I try to execute the code with a triangular mesh (that i manipulate as a DMPLEX), it yields "Triangle 1700 has an invalid vertex index" when trying to adapt the mesh (the initial mesh indeed has 1700 cells). From what i could tell, it comes from the reconstruct method called by the triangulate method of triangle.c, the latter being called by either *DMPlexGenerate_Triangle *or *DMPlexRefine_Triangle *in PETSc, I cannot be sure. In substance, the code is the same as in ex11.c and the crash occurs in the first adaptation pass, i.e. an equivalent in ex11 is that it crashes after the SetInitialCondition in the first if (useAMR) located line 1835 when it calls adaptToleranceFVM (which I copied basically so the code is the same). Is the automatic mesh refinement feature on tri meshes supposed to work or am I trying something that has not been completed yet ? Thank you very much for your help, as always. Thibault Bridel-Bertomeu ? Eng, MSc, PhD Research Engineer CEA/CESTA 33114 LE BARP Tel.: (+33)557046924 Mob.: (+33)611025322 Mail: thibault.bridelbertomeu at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 31 05:55:36 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 31 Aug 2020 06:55:36 -0400 Subject: [petsc-users] DMAdaptLabel with triangle mesh In-Reply-To: References: Message-ID: On Mon, Aug 31, 2020 at 5:34 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Dear all, > > I have recently been playing around with the AMR capabilities embedded in > PETSc for quad meshes using p4est. Based on the TS tutorial ex11, I was > able to incorporate the AMR into a pre-existing code with different metrics > for the adaptation process. > Now I would like to do something similar using tri meshes. I read that > compiling PETSc with Triangle (in 2D and Tetgen for 3D) gives access to > refinement and coarsening capabilities on triangular meshes.When I try to > execute the code with a triangular mesh (that i manipulate as a DMPLEX), it > yields "Triangle 1700 has an invalid vertex index" when trying to adapt the > mesh (the initial mesh indeed has 1700 cells). 
From what i could tell, it > comes from the reconstruct method called by the triangulate method of > triangle.c, the latter being called by either *DMPlexGenerate_Triangle * > or *DMPlexRefine_Triangle *in PETSc, I cannot be sure. > > In substance, the code is the same as in ex11.c and the crash occurs in > the first adaptation pass, i.e. an equivalent in ex11 is that it crashes > after the SetInitialCondition in the first if (useAMR) located line 1835 > when it calls adaptToleranceFVM (which I copied basically so the code is > the same). > > Is the automatic mesh refinement feature on tri meshes supposed to work or > am I trying something that has not been completed yet ? > It is supposed to work, and does for some tests in the library. I stopped using it because it is inherently serial and it is isotropic. However, it should be fixed. Is there something I can run to help me track down the problem? Thanks, Matt > Thank you very much for your help, as always. > > Thibault Bridel-Bertomeu > ? > Eng, MSc, PhD > Research Engineer > CEA/CESTA > 33114 LE BARP > Tel.: (+33)557046924 > Mob.: (+33)611025322 > Mail: thibault.bridelbertomeu at gmail.com > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Mon Aug 31 08:45:11 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Mon, 31 Aug 2020 15:45:11 +0200 Subject: [petsc-users] DMAdaptLabel with triangle mesh In-Reply-To: References: Message-ID: Hi Matt, OK so I tried to replicate the problem starting from one of the tests in PETSc repo. I found https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/tests/ex20.c that actually uses DMAdaptLabel. Just add { DM gdm; DMPlexConstructGhostCells (dm, NULL, NULL, &gdm); DMDestroy (&dm); dm = gdm; } after line 24 where the box mesh is generated. Then compile and run with ex20 -dim 2.It should tell you that Triangle 18 has an invalid vertex index. That's the minimal example that I found that replicates the problem. Regarding the serial character of the technique, I tried with a distributed mesh and it works. So do you mean that intrinsically it gathers all the cells on the master proc before proceeding to the coarsening & refinement and only then broadcast the info back to the other processors ? Thanks, Thibault Le lun. 31 ao?t 2020 ? 12:55, Matthew Knepley a ?crit : > On Mon, Aug 31, 2020 at 5:34 AM Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > >> Dear all, >> >> I have recently been playing around with the AMR capabilities embedded in >> PETSc for quad meshes using p4est. Based on the TS tutorial ex11, I was >> able to incorporate the AMR into a pre-existing code with different metrics >> for the adaptation process. >> Now I would like to do something similar using tri meshes. I read that >> compiling PETSc with Triangle (in 2D and Tetgen for 3D) gives access to >> refinement and coarsening capabilities on triangular meshes.When I try to >> execute the code with a triangular mesh (that i manipulate as a DMPLEX), it >> yields "Triangle 1700 has an invalid vertex index" when trying to adapt the >> mesh (the initial mesh indeed has 1700 cells). 
From what i could tell, it >> comes from the reconstruct method called by the triangulate method of >> triangle.c, the latter being called by either *DMPlexGenerate_Triangle * >> or *DMPlexRefine_Triangle *in PETSc, I cannot be sure. >> >> In substance, the code is the same as in ex11.c and the crash occurs in >> the first adaptation pass, i.e. an equivalent in ex11 is that it crashes >> after the SetInitialCondition in the first if (useAMR) located line 1835 >> when it calls adaptToleranceFVM (which I copied basically so the code is >> the same). >> >> Is the automatic mesh refinement feature on tri meshes supposed to work >> or am I trying something that has not been completed yet ? >> > > It is supposed to work, and does for some tests in the library. I stopped > using it because it is inherently serial and it is isotropic. However, it > should be fixed. > Is there something I can run to help me track down the problem? > > Thanks, > > Matt > > >> Thank you very much for your help, as always. >> >> Thibault Bridel-Bertomeu >> ? >> Eng, MSc, PhD >> Research Engineer >> CEA/CESTA >> 33114 LE BARP >> Tel.: (+33)557046924 >> Mob.: (+33)611025322 >> Mail: thibault.bridelbertomeu at gmail.com >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From emconsta at anl.gov Mon Aug 31 09:09:09 2020 From: emconsta at anl.gov (Constantinescu, Emil M.) Date: Mon, 31 Aug 2020 14:09:09 +0000 Subject: [petsc-users] ARKIMEX produces incorrect values In-Reply-To: References: Message-ID: On 8/30/20 6:04 PM, Ed Bueler wrote: Actually, ARKIMEX is not off the hook. It still gets the wrong answer if told the whole thing is implicit: $ ./ex54 -ts_type arkimex -ts_arkimex_fully_implicit # WRONG (AND REALLY SLOW) error norm at tf = 1.000000 from 224 steps: |u-u_exact| = 2.76636e+00 Hi Ed, can you please add the following TSSetEquationType(ts,TS_EQ_IMPLICIT); before calling TSSolve and try again? This is described in Table 12 in the pdf doc. So that we improve our user experience, can you tell us what are your usual sources/starting points when implementing a new problem: 1- PDF doc 2- tutorials (if you find a good match) 3- own PETSc implementations 4- online function doc 5- other Thanks, Emil versus $ ./ex54 -ts_type arkimex # WRONG BUT IFunction IS OF FLAGGED FORM error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01 $ ./ex54 -ts_type bdf # RIGHT error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 So I am not sure what "Methods with an explicit stage can only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + Ghat(t,X)." means. Ed On Sun, Aug 30, 2020 at 2:57 PM Ed Bueler > wrote: Darn, sorry. I realize the ARKIMEX page does say "Methods with an explicit stage can only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + Ghat(t,X)." So my example does not do that. Is there a way for ARKIMEX to detect that dG/d(Xdot) = I? Ed On Sun, Aug 30, 2020 at 2:44 PM Ed Bueler > wrote: Dear PETSc -- I tried twice to make this an issue at the gitlab.com host site, but both times got "something went wrong (500)". So this is a bug report by old-fashioned means. I created a TS example, https://github.com/bueler/p4pdes-next/blob/master/c/fix-arkimex/ex54.c at my github, also attached. 
It solves a 2D linear ODE ``` x' + y' = 6 y y' = x ``` Pretty basic; the known exact solution is just exponentials. The code writes it as F(t,u,u')=G(t,u) and supplies all the pieces, namely IFunction,IJacobian,RHSFunction,RHSJacobian. Note both F and G must be seen by TS to get the correct solution. In summary, a boring (and valgrind-clean ;-)) example. For current master branch it runs fine for the fully-implicit methods (e.g. BDF, CN, ROSW) which can use the IFunction F, including with finite-differenced Jacobians. With BDF2, BDF2+-snes_fd, BDF6+tight tol., CN, BEULER, ROSW: $ ./ex54 error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 $ ./ex54 -snes_fd error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 $ ./ex54 -ts_rtol 1.0e-14 -ts_atol 1.0e-14 -ts_bdf_order 6 error norm at tf = 1.000000 from 388 steps: |u-u_exact| = 4.23624e-11 $ ./ex54 -ts_type beuler error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 6.71676e-01 $ ./ex54 -ts_type cn error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 2.22839e-03 $ ./ex54 -ts_type rosw error norm at tf = 1.000000 from 21 steps: |u-u_exact| = 5.64012e-03 But it produces wrong values with ARKIMEX: $ ./ex54 -ts_type arkimex error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01 Neither tightening tolerance nor changing type (`-ts_arkimex_type`) helps ARKIMEX. Thanks! Ed PS My book is at a late proofs stage, and out of my hands. It should appear SIAM Press in a couple of months. In all the examples in my book, only my diffusion-reaction system example using F(t,u,u') = G(t,u) is broken. Thus the motivation for a trivial ODE example as above. -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -- Emil M. Constantinescu, Ph.D. Computational Mathematician Argonne National Laboratory Mathematics and Computer Science Division Ph: 630-252-0926 http://www.mcs.anl.gov/~emconsta -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Aug 31 10:23:30 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 31 Aug 2020 09:23:30 -0600 Subject: [petsc-users] ARKIMEX produces incorrect values In-Reply-To: References: <871rjnhh1a.fsf@jedbrown.org> <27E06EBB-7BB0-42BB-B3CE-49B01EB4D6E6@petsc.dev> Message-ID: <87pn76g7pp.fsf@jedbrown.org> Ed Bueler writes: >>>> I realize the ARKIMEX page does say "Methods with an explicit stage can >>>> only be used with ODE in which the stiff part G(t,X,Xdot) has the form > Xdot >>>> + Ghat(t,X)." So my example does not do that. Is there a way for >>>> ARKIMEX to detect that dG/d(Xdot) = I? >> >>> Other than sampling its action on vectors? Not really; it's user code. >> > > Call TSComputeIFunction(TS ts,PetscReal t,Vec U,Vec Udot,Vec Y,PetscBool > > imex) with Udot all 1 and rhsfunction turned off. Call again with Udot is > > zero, take the difference. If it does not return all 1 then you know the > > user has provided an unacceptable function? That's weaker than sampling with random vectors. For example, it would "accept" any case in which the matrix has row sum of 1, which is true for permutation matrices, arbitrary mass matrices (with appropriate scaling), etc. 
It'd be more reliable to set Udot=random and check that the difference matches that random vector. > > You could do this at the beginning of each TSSolve() for these picky > > methods in debug mode. > > How about calling the IJacobian first with a=0 and then with a=1 and > subtracting matrices, to see if the result is the identity? That's more precise (assuming the IJacobian does not depend on Udot), but significantly more expensive in memory and time. From fdkong.jd at gmail.com Mon Aug 31 11:33:12 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Mon, 31 Aug 2020 10:33:12 -0600 Subject: [petsc-users] EPSMonitorSet Message-ID: Hi All, There is a statement on API EPSMonitorSet: "Sets an ADDITIONAL function to be called at every iteration to monitor the error estimates for each requested eigenpair." I was wondering how to replace SLEPc EPS monitors instead of adding one? I want to use my monitor only. Thanks, Fande, -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Aug 31 12:11:42 2020 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 31 Aug 2020 19:11:42 +0200 Subject: [petsc-users] EPSMonitorSet In-Reply-To: References: Message-ID: <6CAAFDEC-EE14-4E84-8979-62C339198809@dsic.upv.es> Call EPSMonitorCancel() before EPSMonitorSet(). Jose > El 31 ago 2020, a las 18:33, Fande Kong escribi?: > > Hi All, > > There is a statement on API EPSMonitorSet: > > "Sets an ADDITIONAL function to be called at every iteration to monitor the error estimates for each requested eigenpair." > > I was wondering how to replace SLEPc EPS monitors instead of adding one? I want to use my monitor only. > > Thanks, > > Fande, From elbueler at alaska.edu Mon Aug 31 12:17:46 2020 From: elbueler at alaska.edu (Ed Bueler) Date: Mon, 31 Aug 2020 09:17:46 -0800 Subject: [petsc-users] ARKIMEX produces incorrect values In-Reply-To: References: Message-ID: Emil -- Thanks for looking at this. > Hi Ed, can you please add the following > TSSetEquationType(ts,TS_EQ_IMPLICIT); > before calling TSSolve and try again? This is described in Table 12 in the pdf doc. Yep, that fixes it. After setting the TS_EQ_IMPLICIT flag programmatically I get: $ ./ex54 -ts_type arkimex -ts_arkimex_fully_implicit error norm at tf = 1.000000 from 12 steps: |u-u_exact| = 1.34500e-02 Without -ts_arkimex_fully_implicit we still get the wrong answer, but, as I understand it, we expect the wrong answer because dF/d(dudt) != I, correct? So -ts_arkimex_fully_implicit does not set this flag? > So that we improve our user experience, can you tell us what are your usual sources/starting points > when implementing a new problem: > 1- PDF doc Yes. Looked briefly at the PDF manual. E.g. I saw the tables for IMEX methods but my eyes glazed over. > 2- tutorials (if you find a good match) Yes. Looked at various html pages including the one for TSARKIMEX. But I missed the sentence "Methods with an explicit stage can only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + Ghat(t,X)." I did not expect that ARKIMEX had this restriction, and did not pick it up. > 3- own PETSc implementations Yes. I have my own diffusion-reaction system ( https://github.com/bueler/p4pdes/blob/master/c/ch5/pattern.c) in which ARKIMEX works well. (Or at least as far as I can tell. I don't have a manufactured solution for it, for example.) I am in the midst of tracking down a different kind of error, probably from DMDA callbacks, when I got distracted by the current issue. > 4- online function doc Yes. 
See above comment on TSARKIMEX page. By my memory I also looked at the TSSet{I,RHS}Jacobian() pages, for example, and probably others. > 5- other Not sure. Thanks, Ed On Mon, Aug 31, 2020 at 6:09 AM Constantinescu, Emil M. wrote: > > On 8/30/20 6:04 PM, Ed Bueler wrote: > > Actually, ARKIMEX is not off the hook. It still gets the wrong answer if > told the whole thing is implicit: > > $ ./ex54 -ts_type arkimex -ts_arkimex_fully_implicit # WRONG (AND > REALLY SLOW) > error norm at tf = 1.000000 from 224 steps: |u-u_exact| = 2.76636e+00 > > Hi Ed, can you please add the following > > TSSetEquationType (ts,TS_EQ_IMPLICIT ); > > before calling TSSolve and try again? This is described in Table 12 in the > pdf doc. > > > So that we improve our user experience, can you tell us what are your > usual sources/starting points when implementing a new problem: > > 1- PDF doc > > 2- tutorials (if you find a good match) > > 3- own PETSc implementations > > 4- online function doc > > 5- other > > Thanks, > Emil > > versus > > $ ./ex54 -ts_type arkimex # WRONG BUT IFunction IS OF FLAGGED FORM > error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01 > > $ ./ex54 -ts_type bdf # RIGHT > error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 > > So I am not sure what "Methods with an explicit stage can only be used > with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + > Ghat(t,X)." means. > > Ed > > > On Sun, Aug 30, 2020 at 2:57 PM Ed Bueler wrote: > >> Darn, sorry. >> >> I realize the ARKIMEX page does say "Methods with an explicit stage can >> only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot >> + Ghat(t,X)." So my example does not do that. Is there a way for >> ARKIMEX to detect that dG/d(Xdot) = I? >> >> Ed >> >> On Sun, Aug 30, 2020 at 2:44 PM Ed Bueler wrote: >> >>> Dear PETSc -- >>> >>> I tried twice to make this an issue at the gitlab.com host site, but >>> both times got "something went wrong (500)". So this is a bug report by >>> old-fashioned means. >>> >>> I created a TS example, >>> https://github.com/bueler/p4pdes-next/blob/master/c/fix-arkimex/ex54.c >>> at my github, also attached. It solves a 2D linear ODE >>> ``` >>> x' + y' = 6 y >>> y' = x >>> ``` >>> Pretty basic; the known exact solution is just exponentials. The code >>> writes it as F(t,u,u')=G(t,u) and supplies all the pieces, namely >>> IFunction,IJacobian,RHSFunction,RHSJacobian. Note both F and G must be >>> seen by TS to get the correct solution. In summary, a boring (and >>> valgrind-clean ;-)) example. >>> >>> For current master branch it runs fine for the fully-implicit methods >>> (e.g. BDF, CN, ROSW) which can use the IFunction F, including with >>> finite-differenced Jacobians. 
With BDF2, BDF2+-snes_fd, BDF6+tight tol., >>> CN, BEULER, ROSW: >>> $ ./ex54 >>> error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 >>> $ ./ex54 -snes_fd >>> error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 >>> $ ./ex54 -ts_rtol 1.0e-14 -ts_atol 1.0e-14 -ts_bdf_order 6 >>> error norm at tf = 1.000000 from 388 steps: |u-u_exact| = 4.23624e-11 >>> $ ./ex54 -ts_type beuler >>> error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 6.71676e-01 >>> $ ./ex54 -ts_type cn >>> error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 2.22839e-03 >>> $ ./ex54 -ts_type rosw >>> error norm at tf = 1.000000 from 21 steps: |u-u_exact| = 5.64012e-03 >>> >>> But it produces wrong values with ARKIMEX: >>> $ ./ex54 -ts_type arkimex >>> error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01 >>> >>> Neither tightening tolerance nor changing type (`-ts_arkimex_type`) >>> helps ARKIMEX. >>> >>> Thanks! >>> >>> Ed >>> >>> PS My book is at a late proofs stage, and out of my hands. It should >>> appear SIAM Press in a couple of months. In all the examples in my book, >>> only my diffusion-reaction system example using F(t,u,u') = G(t,u) is >>> broken. Thus the motivation for a trivial ODE example as above. >>> >>> >>> -- >>> Ed Bueler >>> Dept of Mathematics and Statistics >>> University of Alaska Fairbanks >>> Fairbanks, AK 99775-6660 >>> 306C Chapman >>> >> >> >> -- >> Ed Bueler >> Dept of Mathematics and Statistics >> University of Alaska Fairbanks >> Fairbanks, AK 99775-6660 >> 306C Chapman >> > > > -- > Ed Bueler > Dept of Mathematics and Statistics > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 306C Chapman > > -- > Emil M. Constantinescu, Ph.D. > Computational Mathematician > Argonne National Laboratory > Mathematics and Computer Science Division > > Ph: 630-252-0926http://www.mcs.anl.gov/~emconsta > > -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 31 12:58:46 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 31 Aug 2020 12:58:46 -0500 Subject: [petsc-users] ARKIMEX produces incorrect values In-Reply-To: <87pn76g7pp.fsf@jedbrown.org> References: <871rjnhh1a.fsf@jedbrown.org> <27E06EBB-7BB0-42BB-B3CE-49B01EB4D6E6@petsc.dev> <87pn76g7pp.fsf@jedbrown.org> Message-ID: Sure, random definitely is better. Seems like worth putting in (at least debug mode) since it will catch nearly all incorrect uses of this functionality. Barry > On Aug 31, 2020, at 10:23 AM, Jed Brown wrote: > > Ed Bueler writes: > >>>>> I realize the ARKIMEX page does say "Methods with an explicit stage can >>>>> only be used with ODE in which the stiff part G(t,X,Xdot) has the form >> Xdot >>>>> + Ghat(t,X)." So my example does not do that. Is there a way for >>>>> ARKIMEX to detect that dG/d(Xdot) = I? >>> >>>> Other than sampling its action on vectors? Not really; it's user code. >>> >>> Call TSComputeIFunction(TS ts,PetscReal t,Vec U,Vec Udot,Vec Y,PetscBool >>> imex) with Udot all 1 and rhsfunction turned off. Call again with Udot is >>> zero, take the difference. If it does not return all 1 then you know the >>> user has provided an unacceptable function? > > That's weaker than sampling with random vectors. For example, it would > "accept" any case in which the matrix has row sum of 1, which is true > for permutation matrices, arbitrary mass matrices (with appropriate > scaling), etc. 
It'd be more reliable to set Udot=random and check that > the difference matches that random vector. > >>> You could do this at the beginning of each TSSolve() for these picky >>> methods in debug mode. >> >> How about calling the IJacobian first with a=0 and then with a=1 and >> subtracting matrices, to see if the result is the identity? > > That's more precise (assuming the IJacobian does not depend on Udot), > but significantly more expensive in memory and time. From fdkong.jd at gmail.com Mon Aug 31 12:59:08 2020 From: fdkong.jd at gmail.com (Fande Kong) Date: Mon, 31 Aug 2020 11:59:08 -0600 Subject: [petsc-users] EPSMonitorSet In-Reply-To: <6CAAFDEC-EE14-4E84-8979-62C339198809@dsic.upv.es> References: <6CAAFDEC-EE14-4E84-8979-62C339198809@dsic.upv.es> Message-ID: Oh, cool. Thanks, Jose, I will try that. Fande, On Mon, Aug 31, 2020 at 11:11 AM Jose E. Roman wrote: > Call EPSMonitorCancel() before EPSMonitorSet(). > Jose > > > > El 31 ago 2020, a las 18:33, Fande Kong escribi?: > > > > Hi All, > > > > There is a statement on API EPSMonitorSet: > > > > "Sets an ADDITIONAL function to be called at every iteration to monitor > the error estimates for each requested eigenpair." > > > > I was wondering how to replace SLEPc EPS monitors instead of adding one? > I want to use my monitor only. > > > > Thanks, > > > > Fande, > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sblondel at utk.edu Mon Aug 31 13:13:30 2020 From: sblondel at utk.edu (Blondel, Sophie) Date: Mon, 31 Aug 2020 18:13:30 +0000 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> , <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> Message-ID: Hi Barry, I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: * 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) * 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) Cheers, Sophie ________________________________ De : Barry Smith Envoy? : vendredi 28 ao?t 2020 18:31 ? : Blondel, Sophie Cc : petsc-users at mcs.anl.gov ; xolotl-psi-development at lists.sourceforge.net Objet : Re: [petsc-users] Matrix Free Method questions On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: Thank you Jed and Barry, First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. 
To answer questions about the current per-conditioners: * I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in * this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? Yes, the number of MatMult is a good enough surrogate. So using matrix-free (which means no preconditioning) has 35846/160 ans = 224.0375 or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. Barry Cheers, Sophie ________________________________ De : Barry Smith > Envoy? : vendredi 28 ao?t 2020 12:12 ? : Blondel, Sophie > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > Objet : Re: [petsc-users] Matrix Free Method questions [External Email] Sophie, This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. -pc_fieldsplit_detect_coupling creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. -fieldsplit_0_pc_type sor Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. -fieldsplit_1_pc_type redundant This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. ---- The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. 
The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) Barry On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: Hi everyone, I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. Best, Sophie -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 31 13:35:00 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 31 Aug 2020 14:35:00 -0400 Subject: [petsc-users] DMAdaptLabel with triangle mesh In-Reply-To: References: Message-ID: On Mon, Aug 31, 2020 at 9:45 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Hi Matt, > > OK so I tried to replicate the problem starting from one of the tests in > PETSc repo. > I found > https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/tests/ex20.c that > actually uses DMAdaptLabel. > Just add > > { > DM gdm; > DMPlexConstructGhostCells (dm, NULL, NULL, &gdm); DMDestroy (&dm); > > dm = gdm; > > } > > after line 24 where the box mesh is generated. 
Then compile and run with ex20 -dim 2.It should tell you that Triangle 18 has an invalid vertex index. > That's the minimal example that I found that replicates the problem. > > Ah, okay. p4est knows to discard the ghost cells. I can add that to Triangle. > Regarding the serial character of the technique, I tried with a distributed mesh and it works. > > Hmm, it can't work. Maybe it appears to work. Triangle knows nothing about parallelism. So this must be feeding the local mesh to triangle and replacing it by a refined mesh, but the parallel boundaries will not be correct, and might not even match up. Thanks, Matt > So do you mean that intrinsically it gathers all the cells on the master proc before proceeding to the coarsening & refinement and only then broadcast the info back to the other processors ? > > Thanks, > > Thibault > > Le lun. 31 ao?t 2020 ? 12:55, Matthew Knepley a > ?crit : > >> On Mon, Aug 31, 2020 at 5:34 AM Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >>> Dear all, >>> >>> I have recently been playing around with the AMR capabilities embedded >>> in PETSc for quad meshes using p4est. Based on the TS tutorial ex11, I was >>> able to incorporate the AMR into a pre-existing code with different metrics >>> for the adaptation process. >>> Now I would like to do something similar using tri meshes. I read that >>> compiling PETSc with Triangle (in 2D and Tetgen for 3D) gives access to >>> refinement and coarsening capabilities on triangular meshes.When I try to >>> execute the code with a triangular mesh (that i manipulate as a DMPLEX), it >>> yields "Triangle 1700 has an invalid vertex index" when trying to adapt the >>> mesh (the initial mesh indeed has 1700 cells). From what i could tell, it >>> comes from the reconstruct method called by the triangulate method of >>> triangle.c, the latter being called by either *DMPlexGenerate_Triangle * >>> or *DMPlexRefine_Triangle *in PETSc, I cannot be sure. >>> >>> In substance, the code is the same as in ex11.c and the crash occurs in >>> the first adaptation pass, i.e. an equivalent in ex11 is that it crashes >>> after the SetInitialCondition in the first if (useAMR) located line 1835 >>> when it calls adaptToleranceFVM (which I copied basically so the code is >>> the same). >>> >>> Is the automatic mesh refinement feature on tri meshes supposed to work >>> or am I trying something that has not been completed yet ? >>> >> >> It is supposed to work, and does for some tests in the library. I stopped >> using it because it is inherently serial and it is isotropic. However, it >> should be fixed. >> Is there something I can run to help me track down the problem? >> >> Thanks, >> >> Matt >> >> >>> Thank you very much for your help, as always. >>> >>> Thibault Bridel-Bertomeu >>> ? >>> Eng, MSc, PhD >>> Research Engineer >>> CEA/CESTA >>> 33114 LE BARP >>> Tel.: (+33)557046924 >>> Mob.: (+33)611025322 >>> Mail: thibault.bridelbertomeu at gmail.com >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
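A compressed sketch of the reproduction described in this thread (ex20 itself reads the dimension and related settings from the options database, which is omitted here; argument lists follow the PETSc 3.13-era API):

  #include <petscdmplex.h>

  int main(int argc, char **argv)
  {
    DM             dm, gdm, dmAdapt;
    DMLabel        adaptLabel;
    PetscInt       cStart, cEnd, c;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
    ierr = DMPlexCreateBoxMesh(PETSC_COMM_WORLD,2,PETSC_TRUE,NULL,NULL,NULL,NULL,PETSC_TRUE,&dm);CHKERRQ(ierr);
    ierr = DMPlexConstructGhostCells(dm,NULL,NULL,&gdm);CHKERRQ(ierr);   /* the lines added to ex20 above */
    ierr = DMDestroy(&dm);CHKERRQ(ierr);
    dm   = gdm;
    ierr = DMLabelCreate(PETSC_COMM_SELF,"adapt",&adaptLabel);CHKERRQ(ierr);
    ierr = DMPlexGetHeightStratum(dm,0,&cStart,&cEnd);CHKERRQ(ierr);
    for (c = cStart; c < cEnd; ++c) {                                    /* mark every cell, FV ghosts included */
      ierr = DMLabelSetValue(adaptLabel,c,DM_ADAPT_REFINE);CHKERRQ(ierr);
    }
    ierr = DMAdaptLabel(dm,adaptLabel,&dmAdapt);CHKERRQ(ierr);           /* fails in Triangle when ghost cells are present */
    ierr = DMLabelDestroy(&adaptLabel);CHKERRQ(ierr);
    if (dmAdapt) {ierr = DMDestroy(&dmAdapt);CHKERRQ(ierr);}
    ierr = DMDestroy(&dm);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }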
URL: From KJiao at slb.com Mon Aug 31 13:51:13 2020 From: KJiao at slb.com (Kun Jiao) Date: Mon, 31 Aug 2020 18:51:13 +0000 Subject: [petsc-users] change matrix Message-ID: Hi Petsc Experts, Trying to do something like appending some rows (~100 rows) to an already created matrix, but could not find any document about it. Could anyone provide some information about it? Regards, Kun Schlumberger-Private -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 31 13:50:21 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 31 Aug 2020 13:50:21 -0500 Subject: [petsc-users] Matrix Free Method questions In-Reply-To: References: <5BDE8465-76BE-4132-BF4E-6784548AADC0@petsc.dev> <3329269A-EB37-41C9-9698-BA4631A1E18A@petsc.dev> Message-ID: <3E68F0AF-2F7D-4394-894A-3099EC80B9BC@petsc.dev> Sophie, Thanks. The factor of 4 is lot, the 1.5 not so bad. You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version. It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries). Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all. Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium? Call MatSetOptions(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi. A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just make using the PC subtype of Jacobi. Hopefully not slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiple. Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms. Again run with with -snes_mf_operator and the new PC subtype for Jacobi and you should again get the same convergence history. I made two steps because it makes it easier to validate and debug to get the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner but it requires you going through the Jacobian code and making sure only the needed parts are computed. If you have any questions please let me know. 
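A small sketch of the Step 1 pieces for a toy DMDA with dof = 3 in which only component 0 diffuses (the da is assumed to be created elsewhere; for thousands of components per grid point the DMDASetBlockFillsSparse() variant avoids the dof-by-dof fill arrays used here, and the option-locking call is MatSetOption()):

  PetscInt dfill[9] = {1, 0, 0,   /* diagonal (within-cell) block: keep only the diagonal entries  */
                       0, 1, 0,
                       0, 0, 1};
  PetscInt ofill[9] = {1, 0, 0,   /* neighbor-cell blocks: only the diffusing component 0 couples  */
                       0, 0, 0,
                       0, 0, 0};
  Mat      J;

  ierr = DMDASetBlockFills(da,dfill,ofill);CHKERRQ(ierr);
  ierr = DMCreateMatrix(da,&J);CHKERRQ(ierr);
  ierr = MatSetOption(J,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);CHKERRQ(ierr);
  /* hand J to TS/SNES as the preconditioning matrix and run with
     -snes_mf_operator so the operator itself is applied matrix-free */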
Barry > On Aug 31, 2020, at 1:13 PM, Blondel, Sophie wrote: > > Hi Barry, > > I ran the 2 cases to look at the effect of the Jacobi pre-conditionner: > 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi) > 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi) > Cheers, > > Sophie > De : Barry Smith > > Envoy? : vendredi 28 ao?t 2020 18:31 > ? : Blondel, Sophie > > Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > > Objet : Re: [petsc-users] Matrix Free Method questions > > > >> On Aug 28, 2020, at 4:11 PM, Blondel, Sophie > wrote: >> >> Thank you Jed and Barry, >> >> First, attached are the logs from the benchmark runs I did without (log_std.txt) and with MF method (log_mf.txt). It took me some trouble to get the -log_view to work because I'm using push and pop for the options which means that PETSc is initialized with no argument so the command line argument was not taken into account, but I guess this is for a separate discussion. >> >> To answer questions about the current per-conditioners: >> I used the same pre-conditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC related options at one point with the MF method but didn't see a change in runtime so I put them back in >> this benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1 >> I haven't tried a Jacobi PC instead of SOR, I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor? > > Yes, the number of MatMult is a good enough surrogate. > > So using matrix-free (which means no preconditioning) has > > 35846/160 > > ans = > > 224.0375 > > or 224 as many iterations. So even for this modest 1d problem preconditioning is doing a great deal. > > Barry > > > >> >> Cheers, >> >> Sophie >> De : Barry Smith > >> Envoy? : vendredi 28 ao?t 2020 12:12 >> ? : Blondel, Sophie > >> Cc : petsc-users at mcs.anl.gov >; xolotl-psi-development at lists.sourceforge.net > >> Objet : Re: [petsc-users] Matrix Free Method questions >> >> [External Email] >> >> Sophie, >> >> This is exactly what i would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations. >> >> I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument? >> >> In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do. >> >>> -pc_fieldsplit_detect_coupling >> >> creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells Since only the smallest cluster sizes have diffusion/advection this second set contains only the cluster size one variables. 
>> >>> -fieldsplit_0_pc_type sor >> >> Runs SOR on all the variables; you can think of this as running SOR on the reactions, it is a pretty good preconditioner for the reactions since the reactions are local, per cell. >> >>> -fieldsplit_1_pc_type redundant >> >> >> This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine, for larger problems and 2d and 3d presumably you have also -redundant_pc_type gamg to use algebraic multigrid for the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings in most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the cluster size of 1 so it is very small compared to the entire sparse matrix. >> >> ---- >> The first preconditioner SOR requires ALL the matrix entries which are almost all (except for the diffusion terms) the coupling between different size clusters within a cell. Especially each cell has its own sparse matrix of the size of total number of clusters, it is sparse but not super sparse. >> >> So the to significantly lower memory usage we need to remove the SOR and the storing of all the matrix entries but still have an efficient preconditioner for the "reaction" terms. >> >> The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse. >> >> Before anyone writes additional code we need to know if doing something along these lines does not ruin the convergence that. >> >> Have you used the same options as before but with -fieldsplit_0_pc_type jacobi ? (Not using any matrix free). We need to get an idea of how many more linear iterations it requires (not time, comparing time won't be helpful for this exercise.) We also need this information for realistic size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work ok and give misleading information about what happens for large problems. >> >> I suspect the iteration counts will shot up. Can you run some cases and see how the iteration counts change? >> >> Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.) >> >> Barry >> >> >> >> >> >> >> >> >> >>> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users > wrote: >>> >>> Hi everyone, >>> >>> I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. 
Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100 >>> >>> I started to compare the performance of the code without changing anything of the executable and simply adding "-snes_mf", I see a reduction of memory usage as expected and a benchmark that would usually take ~5min to run now takes ~50min. Reading the documentation I saw that there are a few option to play with the matrix free method like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I used and modified the values of each of these options separately but never saw a sizable change in runtime, is it expected? >>> >>> And are there other ways to make the matrix free method faster? I saw in the documentation that you can define your own per-conditioner for instance. Let me know if you need additional information about the PETSc setup in the application I use. >>> >>> Best, >>> >>> Sophie >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 31 13:54:33 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 31 Aug 2020 14:54:33 -0400 Subject: [petsc-users] change matrix In-Reply-To: References: Message-ID: On Mon, Aug 31, 2020 at 2:51 PM Kun Jiao via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi Petsc Experts, > > > > Trying to do something like appending some rows (~100 rows) to an already > created matrix, but could not find any document about it. > > > > Could anyone provide some information about it? > This is not possible. Once created, matrices are optimized for MatMult, not insertion. You just create another matrix with the extra rows and copy in. Thanks, Matt > Regards, > > Kun > > Schlumberger-Private > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 31 13:55:57 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 31 Aug 2020 13:55:57 -0500 Subject: [petsc-users] change matrix In-Reply-To: References: Message-ID: <89835093-5358-4AB5-AD4A-B9E213F830F0@petsc.dev> Kun, This is not possible, PETSc matrices have a static size (resizing in parallel is tricky so we don't support it). If it is more efficient to reuse the matrix entries than recompute them you can create a larger matrix and then loop over the old matrix calling MatGetRow() and then call MatSetValues() to copy that row into the new matrix. Barry > On Aug 31, 2020, at 1:51 PM, Kun Jiao via petsc-users wrote: > > Hi Petsc Experts, > > Trying to do something like appending some rows (~100 rows) to an already created matrix, but could not find any document about it. > > Could anyone provide some information about it? > > Regards, > Kun > > Schlumberger-Private -------------- next part -------------- An HTML attachment was scrubbed... 
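For illustration, a minimal sketch of the row copy Barry describes, assuming the larger matrix Cnew has already been created and preallocated with the extra rows placed after the existing global rows (so the old global row indices are unchanged); Aold and Cnew are placeholder names.

    PetscErrorCode     ierr;
    PetscInt           rstart, rend, row, ncols;
    const PetscInt    *cols;
    const PetscScalar *vals;

    ierr = MatGetOwnershipRange(Aold, &rstart, &rend);CHKERRQ(ierr);
    for (row = rstart; row < rend; row++) {
      ierr = MatGetRow(Aold, row, &ncols, &cols, &vals);CHKERRQ(ierr);
      ierr = MatSetValues(Cnew, 1, &row, ncols, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
      ierr = MatRestoreRow(Aold, row, &ncols, &cols, &vals);CHKERRQ(ierr);
    }
    /* ... MatSetValues() for the ~100 appended rows goes here ... */
    ierr = MatAssemblyBegin(Cnew, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(Cnew, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

Which rank owns the appended rows, and how Cnew is preallocated, will matter for performance.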
URL: From KJiao at slb.com Mon Aug 31 14:08:27 2020 From: KJiao at slb.com (Kun Jiao) Date: Mon, 31 Aug 2020 19:08:27 +0000 Subject: [petsc-users] [Ext] Re: change matrix In-Reply-To: <89835093-5358-4AB5-AD4A-B9E213F830F0@petsc.dev> References: <89835093-5358-4AB5-AD4A-B9E213F830F0@petsc.dev> Message-ID: If I am correct, to do this, it will double the peak memory usage. Is there any way no to double the peak memory usage? Schlumberger-Private From: Barry Smith Sent: Monday, August 31, 2020 1:56 PM To: Kun Jiao Cc: petsc-users Subject: [Ext] Re: [petsc-users] change matrix Kun, This is not possible, PETSc matrices have a static size (resizing in parallel is tricky so we don't support it). If it is more efficient to reuse the matrix entries than recompute them you can create a larger matrix and then loop over the old matrix calling MatGetRow() and then call MatSetValues() to copy that row into the new matrix. Barry On Aug 31, 2020, at 1:51 PM, Kun Jiao via petsc-users > wrote: Hi Petsc Experts, Trying to do something like appending some rows (~100 rows) to an already created matrix, but could not find any document about it. Could anyone provide some information about it? Regards, Kun Schlumberger-Private -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 31 14:11:10 2020 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 31 Aug 2020 14:11:10 -0500 Subject: [petsc-users] [Ext] change matrix In-Reply-To: References: <89835093-5358-4AB5-AD4A-B9E213F830F0@petsc.dev> Message-ID: <5D350A29-9AC9-42D9-80B5-918B295780A5@petsc.dev> > On Aug 31, 2020, at 2:08 PM, Kun Jiao wrote: > > If I am correct, to do this, it will double the peak memory usage. Yes > > Is there any way no to double the peak memory usage? The only way would be to destroy the old matrix, allocate a new one and recompute the entries. Barry Depending on the application etc the extra memory for storing two copies of the matrix may not be a fundamental problem. > > > > Schlumberger-Private > From: Barry Smith > > Sent: Monday, August 31, 2020 1:56 PM > To: Kun Jiao > > Cc: petsc-users > > Subject: [Ext] Re: [petsc-users] change matrix > > > Kun, > > This is not possible, PETSc matrices have a static size (resizing in parallel is tricky so we don't support it). > > If it is more efficient to reuse the matrix entries than recompute them you can create a larger matrix and then loop over the old matrix calling MatGetRow() and then call MatSetValues() to copy that row into the new matrix. > > Barry > > > > On Aug 31, 2020, at 1:51 PM, Kun Jiao via petsc-users > wrote: > > Hi Petsc Experts, > > Trying to do something like appending some rows (~100 rows) to an already created matrix, but could not find any document about it. > > Could anyone provide some information about it? > > Regards, > Kun > > Schlumberger-Private -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 31 14:20:12 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 31 Aug 2020 15:20:12 -0400 Subject: [petsc-users] [Ext] change matrix In-Reply-To: <5D350A29-9AC9-42D9-80B5-918B295780A5@petsc.dev> References: <89835093-5358-4AB5-AD4A-B9E213F830F0@petsc.dev> <5D350A29-9AC9-42D9-80B5-918B295780A5@petsc.dev> Message-ID: On Mon, Aug 31, 2020 at 3:12 PM Barry Smith wrote: > On Aug 31, 2020, at 2:08 PM, Kun Jiao wrote: > > If I am correct, to do this, it will double the peak memory usage. 
> > > Yes > > > Is there any way no to double the peak memory usage? > > > The only way would be to destroy the old matrix, allocate a new one and > recompute the entries. > > Barry > > Depending on the application etc the extra memory for storing two copies > of the matrix may not be a fundamental problem > The other thing is to look at why you are adding rows. If you know how many will eventually show up you can allocate them, but fill with zeros, etc. until you get the values. Thanks, Matt > > > Schlumberger-Private > *From:* Barry Smith > *Sent:* Monday, August 31, 2020 1:56 PM > *To:* Kun Jiao > *Cc:* petsc-users > *Subject:* [Ext] Re: [petsc-users] change matrix > > > Kun, > > This is not possible, PETSc matrices have a static size (resizing in > parallel is tricky so we don't support it). > > If it is more efficient to reuse the matrix entries than recompute them > you can create a larger matrix and then loop over the old matrix calling > MatGetRow() and then call MatSetValues() to copy that row into the new > matrix. > > Barry > > > > > On Aug 31, 2020, at 1:51 PM, Kun Jiao via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi Petsc Experts, > > Trying to do something like appending some rows (~100 rows) to an already > created matrix, but could not find any document about it. > > Could anyone provide some information about it? > > Regards, > Kun > > Schlumberger-Private > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From KJiao at slb.com Mon Aug 31 14:36:37 2020 From: KJiao at slb.com (Kun Jiao) Date: Mon, 31 Aug 2020 19:36:37 +0000 Subject: [petsc-users] [Ext] change matrix In-Reply-To: References: <89835093-5358-4AB5-AD4A-B9E213F830F0@petsc.dev> <5D350A29-9AC9-42D9-80B5-918B295780A5@petsc.dev>, Message-ID: <29e2b187-e354-4718-b218-ba39dd746f0e@email.android.com> thanks for the info. regards, kun On Aug 31, 2020 2:20 PM, Matthew Knepley wrote: On Mon, Aug 31, 2020 at 3:12 PM Barry Smith > wrote: On Aug 31, 2020, at 2:08 PM, Kun Jiao > wrote: If I am correct, to do this, it will double the peak memory usage. Yes Is there any way no to double the peak memory usage? The only way would be to destroy the old matrix, allocate a new one and recompute the entries. Barry Depending on the application etc the extra memory for storing two copies of the matrix may not be a fundamental problem The other thing is to look at why you are adding rows. If you know how many will eventually show up you can allocate them, but fill with zeros, etc. until you get the values. Thanks, Matt Schlumberger-Private From: Barry Smith > Sent: Monday, August 31, 2020 1:56 PM To: Kun Jiao > Cc: petsc-users > Subject: [Ext] Re: [petsc-users] change matrix Kun, This is not possible, PETSc matrices have a static size (resizing in parallel is tricky so we don't support it). If it is more efficient to reuse the matrix entries than recompute them you can create a larger matrix and then loop over the old matrix calling MatGetRow() and then call MatSetValues() to copy that row into the new matrix. Barry On Aug 31, 2020, at 1:51 PM, Kun Jiao via petsc-users > wrote: Hi Petsc Experts, Trying to do something like appending some rows (~100 rows) to an already created matrix, but could not find any document about it. Could anyone provide some information about it? 
Regards, Kun Schlumberger-Private -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thibault.bridelbertomeu at gmail.com Mon Aug 31 15:00:41 2020 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Mon, 31 Aug 2020 22:00:41 +0200 Subject: [petsc-users] DMAdaptLabel with triangle mesh In-Reply-To: References: Message-ID: Le lun. 31 ao?t 2020 ? 20:35, Matthew Knepley a ?crit : > On Mon, Aug 31, 2020 at 9:45 AM Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > >> Hi Matt, >> >> OK so I tried to replicate the problem starting from one of the tests in >> PETSc repo. >> I found >> https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/tests/ex20.c that >> actually uses DMAdaptLabel. >> Just add >> >> { >> >> DM gdm; >> >> >> >> DMPlexConstructGhostCells (dm, NULL, NULL, &gdm); >> >> DMDestroy (&dm); >> >> dm = gdm; >> >> } >> >> after line 24 where the box mesh is generated. Then compile and run with ex20 -dim 2. >> >> It should tell you that Triangle 18 has an invalid vertex index. >> >> That's the minimal example that I found that replicates the problem. >> >> Ah, okay. p4est knows to discard the ghost cells. I can add that to > Triangle. > I thought it was something like that, seeing what addition of code triggers the problem. Thanks for adding the treatment to Triangle ! > Regarding the serial character of the technique, I tried with a distributed mesh and it works. >> >> Hmm, it can't work. Maybe it appears to work. Triangle knows nothing > about parallelism. So this must be feeding the local mesh to triangle and > replacing it by > a refined mesh, but the parallel boundaries will not be correct, and might > not even match up. > Ok, yea, it appears to work. When asked to refine from scratch, not from AdaptLabel but with a -dm_refine order, the mesh is funky as if it was entirely re-made and the previous mesh thrown away. Can you think of a way where each processor would be able to call on Triangle on it?s own, with its own piece of mesh and maybe the surrounding ghost cells ? I imagine it could work for parallel refining of triangular meshes, couldn?t it ? Thanks for your replies, Have a great afternoon/evening ! Thibault > Thanks, > > Matt > >> So do you mean that intrinsically it gathers all the cells on the master proc before proceeding to the coarsening & refinement and only then broadcast the info back to the other processors ? >> >> Thanks, >> >> Thibault >> >> Le lun. 31 ao?t 2020 ? 12:55, Matthew Knepley a >> ?crit : >> >>> On Mon, Aug 31, 2020 at 5:34 AM Thibault Bridel-Bertomeu < >>> thibault.bridelbertomeu at gmail.com> wrote: >>> >>>> Dear all, >>>> >>>> I have recently been playing around with the AMR capabilities embedded >>>> in PETSc for quad meshes using p4est. Based on the TS tutorial ex11, I was >>>> able to incorporate the AMR into a pre-existing code with different metrics >>>> for the adaptation process. >>>> Now I would like to do something similar using tri meshes. 
I read that >>>> compiling PETSc with Triangle (in 2D and Tetgen for 3D) gives access to >>>> refinement and coarsening capabilities on triangular meshes.When I try to >>>> execute the code with a triangular mesh (that i manipulate as a DMPLEX), it >>>> yields "Triangle 1700 has an invalid vertex index" when trying to adapt the >>>> mesh (the initial mesh indeed has 1700 cells). From what i could tell, it >>>> comes from the reconstruct method called by the triangulate method of >>>> triangle.c, the latter being called by either >>>> *DMPlexGenerate_Triangle *or *DMPlexRefine_Triangle *in PETSc, I >>>> cannot be sure. >>>> >>>> In substance, the code is the same as in ex11.c and the crash occurs in >>>> the first adaptation pass, i.e. an equivalent in ex11 is that it crashes >>>> after the SetInitialCondition in the first if (useAMR) located line 1835 >>>> when it calls adaptToleranceFVM (which I copied basically so the code is >>>> the same). >>>> >>>> Is the automatic mesh refinement feature on tri meshes supposed to work >>>> or am I trying something that has not been completed yet ? >>>> >>> >>> It is supposed to work, and does for some tests in the library. I >>> stopped using it because it is inherently serial and it is isotropic. >>> However, it should be fixed. >>> Is there something I can run to help me track down the problem? >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thank you very much for your help, as always. >>>> >>>> Thibault Bridel-Bertomeu >>>> ? >>>> Eng, MSc, PhD >>>> Research Engineer >>>> CEA/CESTA >>>> 33114 LE BARP >>>> Tel.: (+33)557046924 >>>> Mob.: (+33)611025322 >>>> Mail: thibault.bridelbertomeu at gmail.com >>>> >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >>> >>> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- Thibault Bridel-Bertomeu ? Eng, MSc, PhD Research Engineer CEA/CESTA 33114 LE BARP Tel.: (+33)557046924 Mob.: (+33)611025322 Mail: thibault.bridelbertomeu at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon Aug 31 15:03:23 2020 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 31 Aug 2020 16:03:23 -0400 Subject: [petsc-users] DMAdaptLabel with triangle mesh In-Reply-To: References: Message-ID: On Mon, Aug 31, 2020 at 4:00 PM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > > > Le lun. 31 ao?t 2020 ? 20:35, Matthew Knepley a > ?crit : > >> On Mon, Aug 31, 2020 at 9:45 AM Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >>> Hi Matt, >>> >>> OK so I tried to replicate the problem starting from one of the tests in >>> PETSc repo. >>> I found >>> https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/tests/ex20.c that >>> actually uses DMAdaptLabel. >>> Just add >>> >>> { >>> >>> DM gdm; >>> >>> >>> >>> DMPlexConstructGhostCells (dm, NULL, NULL, &gdm); >>> >>> DMDestroy (&dm); >>> >>> dm = gdm; >>> >>> } >>> >>> after line 24 where the box mesh is generated. Then compile and run with ex20 -dim 2. >>> >>> It should tell you that Triangle 18 has an invalid vertex index. 
>>> >>> That's the minimal example that I found that replicates the problem. >>> >>> Ah, okay. p4est knows to discard the ghost cells. I can add that to >> Triangle. >> > > I thought it was something like that, seeing what addition of code > triggers the problem. > Thanks for adding the treatment to Triangle ! > >> Regarding the serial character of the technique, I tried with a distributed mesh and it works. >>> >>> Hmm, it can't work. Maybe it appears to work. Triangle knows nothing >> about parallelism. So this must be feeding the local mesh to triangle and >> replacing it by >> a refined mesh, but the parallel boundaries will not be correct, and >> might not even match up. >> > > Ok, yea, it appears to work. When asked to refine from scratch, not from > AdaptLabel but with a -dm_refine order, the mesh is funky as if it was > entirely re-made and the previous mesh thrown away. > Can you think of a way where each processor would be able to call on > Triangle on it?s own, with its own piece of mesh and maybe the surrounding > ghost cells ? I imagine it could work for parallel refining of triangular > meshes, couldn?t it ? > It turns out that his is a very hairy problem. That is why almost no parallel refinement packages exist. To my knowledge, this is only one: Pragmatic. We support that package, but it is in development, and we really need to update our interface. I am working on it, but too much stuff gets in the way. Thanks, Matt > Thanks for your replies, > Have a great afternoon/evening ! > > Thibault > > >> Thanks, >> >> Matt >> >>> So do you mean that intrinsically it gathers all the cells on the master proc before proceeding to the coarsening & refinement and only then broadcast the info back to the other processors ? >>> >>> Thanks, >>> >>> Thibault >>> >>> Le lun. 31 ao?t 2020 ? 12:55, Matthew Knepley a >>> ?crit : >>> >>>> On Mon, Aug 31, 2020 at 5:34 AM Thibault Bridel-Bertomeu < >>>> thibault.bridelbertomeu at gmail.com> wrote: >>>> >>>>> Dear all, >>>>> >>>>> I have recently been playing around with the AMR capabilities embedded >>>>> in PETSc for quad meshes using p4est. Based on the TS tutorial ex11, I was >>>>> able to incorporate the AMR into a pre-existing code with different metrics >>>>> for the adaptation process. >>>>> Now I would like to do something similar using tri meshes. I read that >>>>> compiling PETSc with Triangle (in 2D and Tetgen for 3D) gives access to >>>>> refinement and coarsening capabilities on triangular meshes.When I try to >>>>> execute the code with a triangular mesh (that i manipulate as a DMPLEX), it >>>>> yields "Triangle 1700 has an invalid vertex index" when trying to adapt the >>>>> mesh (the initial mesh indeed has 1700 cells). From what i could tell, it >>>>> comes from the reconstruct method called by the triangulate method of >>>>> triangle.c, the latter being called by either >>>>> *DMPlexGenerate_Triangle *or *DMPlexRefine_Triangle *in PETSc, I >>>>> cannot be sure. >>>>> >>>>> In substance, the code is the same as in ex11.c and the crash occurs >>>>> in the first adaptation pass, i.e. an equivalent in ex11 is that it crashes >>>>> after the SetInitialCondition in the first if (useAMR) located line 1835 >>>>> when it calls adaptToleranceFVM (which I copied basically so the code is >>>>> the same). >>>>> >>>>> Is the automatic mesh refinement feature on tri meshes supposed to >>>>> work or am I trying something that has not been completed yet ? >>>>> >>>> >>>> It is supposed to work, and does for some tests in the library. 
I >>>> stopped using it because it is inherently serial and it is isotropic. >>>> However, it should be fixed. >>>> Is there something I can run to help me track down the problem? >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thank you very much for your help, as always. >>>>> >>>>> Thibault Bridel-Bertomeu >>>>> ? >>>>> Eng, MSc, PhD >>>>> Research Engineer >>>>> CEA/CESTA >>>>> 33114 LE BARP >>>>> Tel.: (+33)557046924 >>>>> Mob.: (+33)611025322 >>>>> Mail: thibault.bridelbertomeu at gmail.com >>>>> >>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>>> >>>> >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> -- > Thibault Bridel-Bertomeu > ? > Eng, MSc, PhD > Research Engineer > CEA/CESTA > 33114 LE BARP > Tel.: (+33)557046924 > Mob.: (+33)611025322 > Mail: thibault.bridelbertomeu at gmail.com > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From emconsta at anl.gov Mon Aug 31 17:27:44 2020 From: emconsta at anl.gov (Constantinescu, Emil M.) Date: Mon, 31 Aug 2020 22:27:44 +0000 Subject: [petsc-users] ARKIMEX produces incorrect values In-Reply-To: References: Message-ID: On 8/31/20 12:17 PM, Ed Bueler wrote: Emil -- Thanks for looking at this. > Hi Ed, can you please add the following > TSSetEquationType(ts,TS_EQ_IMPLICIT); > before calling TSSolve and try again? This is described in Table 12 in the pdf doc. Yep, that fixes it. After setting the TS_EQ_IMPLICIT flag programmatically I get: It is only programmatic because it has to do with the form of RHS and IFunctions. $ ./ex54 -ts_type arkimex -ts_arkimex_fully_implicit error norm at tf = 1.000000 from 12 steps: |u-u_exact| = 1.34500e-02 Without -ts_arkimex_fully_implicit we still get the wrong answer, but, as I understand it, we expect the wrong answer because dF/d(dudt) != I, correct? Yes. I keep mixing F and G, but if you want to solve Mu'=H(u), then define the IFunction := M u_dot - H(u) then it should work with all time steppers. If you want to set the RHS of your ODE in the RHS function (so that you can use explicit integrators, too) you have to provide: IFunction := u_dot and RHSFunction := M^{-1}*H(u) [or solve Mx=H(u) in the RHS function]. Note that M u_dot - H(u) can only be solved by implicit solvers directly so IFunction := M u_dot and RHSFunction := H(u). Table 12 in the PDF doc explains these cases, but that can be improved as well. So -ts_arkimex_fully_implicit does not set this flag? No, its use is for when you have both IFunction (for stiff) and RHSfunction (for nonstiff) defined to solve Mu'=H(u) + W(u) and: 1- mass is identity: IFunction:= u_dot-H(u); RHSFunction:= W(u), or 2- mass is full rank, but not identity: IFunction:= M u_dot-H(u); RHSFunction:= M^{-1} * W(u) and you have a choice of using either an IMEX scheme [-ts_arkimex_fully_implicit false] or just the implicit part [-ts_arkimex_fully_implicit true]. 
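A minimal sketch of case 2 above, with hypothetical callback names (MyIFunction computing M u_dot - H(u), MyRHSFunction computing M^{-1} W(u)); the TSSetEquationType() call is the piece discussed at the top of this message.

    ierr = TSSetIFunction(ts, NULL, MyIFunction, &user);CHKERRQ(ierr);      /* M u_dot - H(u) */
    ierr = TSSetRHSFunction(ts, NULL, MyRHSFunction, &user);CHKERRQ(ierr);  /* M^{-1} W(u)    */
    ierr = TSSetType(ts, TSARKIMEX);CHKERRQ(ierr);
    ierr = TSSetEquationType(ts, TS_EQ_IMPLICIT);CHKERRQ(ierr);             /* dF/d(u_dot) = M, not I */
    ierr = TSSolve(ts, u);CHKERRQ(ierr);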
Thank you for your feedback on our short survey - it is very valuable in helping us crafting a less painful path to using all these options. Emil > So that we improve our user experience, can you tell us what are your usual sources/starting points > when implementing a new problem: > 1- PDF doc Yes. Looked briefly at the PDF manual. E.g. I saw the tables for IMEX methods but my eyes glazed over. > 2- tutorials (if you find a good match) Yes. Looked at various html pages including the one for TSARKIMEX. But I missed the sentence "Methods with an explicit stage can only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + Ghat(t,X)." I did not expect that ARKIMEX had this restriction, and did not pick it up. > 3- own PETSc implementations Yes. I have my own diffusion-reaction system (https://github.com/bueler/p4pdes/blob/master/c/ch5/pattern.c) in which ARKIMEX works well. (Or at least as far as I can tell. I don't have a manufactured solution for it, for example.) I am in the midst of tracking down a different kind of error, probably from DMDA callbacks, when I got distracted by the current issue. > 4- online function doc Yes. See above comment on TSARKIMEX page. By my memory I also looked at the TSSet{I,RHS}Jacobian() pages, for example, and probably others. > 5- other Not sure. Thanks, Ed On Mon, Aug 31, 2020 at 6:09 AM Constantinescu, Emil M. > wrote: On 8/30/20 6:04 PM, Ed Bueler wrote: Actually, ARKIMEX is not off the hook. It still gets the wrong answer if told the whole thing is implicit: $ ./ex54 -ts_type arkimex -ts_arkimex_fully_implicit # WRONG (AND REALLY SLOW) error norm at tf = 1.000000 from 224 steps: |u-u_exact| = 2.76636e+00 Hi Ed, can you please add the following TSSetEquationType(ts,TS_EQ_IMPLICIT); before calling TSSolve and try again? This is described in Table 12 in the pdf doc. So that we improve our user experience, can you tell us what are your usual sources/starting points when implementing a new problem: 1- PDF doc 2- tutorials (if you find a good match) 3- own PETSc implementations 4- online function doc 5- other Thanks, Emil versus $ ./ex54 -ts_type arkimex # WRONG BUT IFunction IS OF FLAGGED FORM error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01 $ ./ex54 -ts_type bdf # RIGHT error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 So I am not sure what "Methods with an explicit stage can only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + Ghat(t,X)." means. Ed On Sun, Aug 30, 2020 at 2:57 PM Ed Bueler > wrote: Darn, sorry. I realize the ARKIMEX page does say "Methods with an explicit stage can only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + Ghat(t,X)." So my example does not do that. Is there a way for ARKIMEX to detect that dG/d(Xdot) = I? Ed On Sun, Aug 30, 2020 at 2:44 PM Ed Bueler > wrote: Dear PETSc -- I tried twice to make this an issue at the gitlab.com host site, but both times got "something went wrong (500)". So this is a bug report by old-fashioned means. I created a TS example, https://github.com/bueler/p4pdes-next/blob/master/c/fix-arkimex/ex54.c at my github, also attached. It solves a 2D linear ODE ``` x' + y' = 6 y y' = x ``` Pretty basic; the known exact solution is just exponentials. The code writes it as F(t,u,u')=G(t,u) and supplies all the pieces, namely IFunction,IJacobian,RHSFunction,RHSJacobian. Note both F and G must be seen by TS to get the correct solution. In summary, a boring (and valgrind-clean ;-)) example. 
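For reference, a minimal sketch of one way such a split can look (this is not the actual ex54.c, only an illustration of F = [x'+y'; y'] and G = [6y; x]); as discussed earlier in the thread, ARKIMEX additionally needs TSSetEquationType(ts,TS_EQ_IMPLICIT) here because dF/d(u') is not the identity.

    static PetscErrorCode IFunction(TS ts, PetscReal t, Vec u, Vec udot, Vec F, void *ctx)
    {
      const PetscScalar *audot;
      PetscScalar       *aF;
      PetscErrorCode     ierr;

      ierr = VecGetArrayRead(udot, &audot);CHKERRQ(ierr);
      ierr = VecGetArray(F, &aF);CHKERRQ(ierr);
      aF[0] = audot[0] + audot[1];                  /* x' + y' */
      aF[1] = audot[1];                             /* y'      */
      ierr = VecRestoreArrayRead(udot, &audot);CHKERRQ(ierr);
      ierr = VecRestoreArray(F, &aF);CHKERRQ(ierr);
      return 0;
    }

    static PetscErrorCode RHSFunction(TS ts, PetscReal t, Vec u, Vec G, void *ctx)
    {
      const PetscScalar *au;
      PetscScalar       *aG;
      PetscErrorCode     ierr;

      ierr = VecGetArrayRead(u, &au);CHKERRQ(ierr);
      ierr = VecGetArray(G, &aG);CHKERRQ(ierr);
      aG[0] = 6.0 * au[1];                          /* 6 y */
      aG[1] = au[0];                                /* x   */
      ierr = VecRestoreArrayRead(u, &au);CHKERRQ(ierr);
      ierr = VecRestoreArray(G, &aG);CHKERRQ(ierr);
      return 0;
    }

    /* in main(), after TSCreate() and setting the solution vector: */
    ierr = TSSetIFunction(ts, NULL, IFunction, NULL);CHKERRQ(ierr);
    ierr = TSSetRHSFunction(ts, NULL, RHSFunction, NULL);CHKERRQ(ierr);
    ierr = TSSetEquationType(ts, TS_EQ_IMPLICIT);CHKERRQ(ierr);   /* because dF/d(u') != I */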
For current master branch it runs fine for the fully-implicit methods (e.g. BDF, CN, ROSW) which can use the IFunction F, including with finite-differenced Jacobians. With BDF2, BDF2+-snes_fd, BDF6+tight tol., CN, BEULER, ROSW: $ ./ex54 error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 $ ./ex54 -snes_fd error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 $ ./ex54 -ts_rtol 1.0e-14 -ts_atol 1.0e-14 -ts_bdf_order 6 error norm at tf = 1.000000 from 388 steps: |u-u_exact| = 4.23624e-11 $ ./ex54 -ts_type beuler error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 6.71676e-01 $ ./ex54 -ts_type cn error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 2.22839e-03 $ ./ex54 -ts_type rosw error norm at tf = 1.000000 from 21 steps: |u-u_exact| = 5.64012e-03 But it produces wrong values with ARKIMEX: $ ./ex54 -ts_type arkimex error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01 Neither tightening tolerance nor changing type (`-ts_arkimex_type`) helps ARKIMEX. Thanks! Ed PS My book is at a late proofs stage, and out of my hands. It should appear SIAM Press in a couple of months. In all the examples in my book, only my diffusion-reaction system example using F(t,u,u') = G(t,u) is broken. Thus the motivation for a trivial ODE example as above. -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -- Emil M. Constantinescu, Ph.D. Computational Mathematician Argonne National Laboratory Mathematics and Computer Science Division Ph: 630-252-0926 http://www.mcs.anl.gov/~emconsta -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -- Emil M. Constantinescu, Ph.D. Computational Mathematician Argonne National Laboratory Mathematics and Computer Science Division Ph: 630-252-0926 http://www.mcs.anl.gov/~emconsta -------------- next part -------------- An HTML attachment was scrubbed... URL: From elbueler at alaska.edu Mon Aug 31 19:32:22 2020 From: elbueler at alaska.edu (Ed Bueler) Date: Mon, 31 Aug 2020 16:32:22 -0800 Subject: [petsc-users] ARKIMEX produces incorrect values In-Reply-To: References: Message-ID: Emil -- When I use PETSc on various tasks, so far, I have separated (1) how I describe the problem structure for use by the PETSc component, and (2) the choice of solver. For TS I am confused about what is expected by the design you describe. I would like to describe my ODE system as clearly as possible and *then* go out and try/choose solver types. My understanding is that if I have a problem which can be written in the form F(t,u,u') = G(t,u), and if I do not want to pre-emptively restrict to *not* allowing IMEX, then I should put F into an IFunction and G into a RHSFunction. This is a *good* split, performance-wise, if in fact F contains the stiff part, but whether good or bad I have described the ODE system. Any fully-implicit method should now be able to handle this form F(t,u,u') = G(t,u), because for implicit methods there is no real distinction between F(t,u,u')=0 and F(t,u,u')=G(t,u). If an IMEX method is completely flexible, and so far ROSW seems to be flexible in this way (?), then I think it should also work with either form. If an IMEX method is restricted further by form, e.g. 
requiring dF/d(u') to be the identity, then wouldn't it make sense to have the user programmatically indicate that structural property? Such an indication is not about the desired solver but about the ODE. If the structural property held then we could proceed with the restricted-application method, e.g. ARKIMEX/EIMEX. Perhaps one could have this (proposed) functionality: TSSetLHSHasIdentity(TS,PETSC_TRUE) Or one might instead set an enumerate for whether dF/d(u') is I or M (invertible) or M (noninvertible for DAE) or a general nonlinear function: TSSetLHSStructureType(TS,TS_LHS_STRUCTURE_IDENTITY) TSSetLHSStructureType(TS,TS_LHS_STRUCTURE_INVERTIBLE) TSSetLHSStructureType(TS,TS_LHS_STRUCTURE_NONINVERTIBLE) TSSetLHSStructureType(TS,TS_LHS_STRUCTURE_NONLINEAR) Obviously the enumerate would only need to include structure which some method could exploit. What do you think? In any case, I am currently having trouble with the preferred way to describe e.g. diffusion-reaction PDEs. It seems to me I would want to supply all four of these IFunction IJacobian RHSFunction RHSJacobian so as to allow full-performance for both fully-implicit methods and IMEX methods. (And for typical examples I certainly *can* form RHSJacobian, for example.) But none of the src/ts/tutorials/ examples seem to unconditionally supply all four, and I can't tell (e.g. from -ts_view) which parts are seen and called by the various methods. Ed On Mon, Aug 31, 2020 at 2:27 PM Constantinescu, Emil M. wrote: > > > > > > > > > > > On 8/31/20 12:17 PM, Ed Bueler wrote: > > > > > > > Emil -- > > > > > > > Thanks for looking at this. > > > > > > > > > Hi Ed, can you please add the following > > > > TSSetEquationType(ts,TS_EQ_IMPLICIT); > > > > before calling TSSolve and try again? This is described in Table 12 in > the pdf doc. > > > > > > > Yep, that fixes it. After setting the TS_EQ_IMPLICIT flag > programmatically I get: > > > > > > > > > > > > It is only programmatic because it has to do with the form of RHS and > IFunctions. > > > > > > > > > $ ./ex54 -ts_type arkimex -ts_arkimex_fully_implicit > > > error norm at tf = 1.000000 from 12 steps: |u-u_exact| = 1.34500e-02 > > > > > > > > > > > > Without -ts_arkimex_fully_implicit we still get the wrong answer, but, as > I understand it, we expect the wrong answer because dF/d(dudt) != I, > correct? > > > > > > > > > > > > Yes. I keep mixing F and G, but if you want to solve Mu'=H(u), then define > the IFunction := M u_dot - H(u) then it should work with all time steppers. > > > > > > > If you want to set the RHS of your ODE in the RHS function (so that you > can use explicit integrators, too) you have to provide: > > > > > > > IFunction := u_dot and RHSFunction := M^{-1}*H(u) [or solve Mx=H(u) in the > RHS function]. > > > > > > > Note that M u_dot - H(u) can only be solved by implicit solvers directly > so IFunction := M u_dot and RHSFunction := H(u). Table 12 in the PDF doc > explains these cases, but that can be improved as well. > > > > > > > > So -ts_arkimex_fully_implicit does not set this flag? 
> > > > > > > > > > > > No, its use is for when you have both IFunction (for stiff) and > RHSfunction (for nonstiff) defined to solve Mu'=H(u) + W(u) and: > > > 1- mass is identity: IFunction:= u_dot-H(u); RHSFunction:= W(u), or > > > > > 2- mass is full rank, but not identity: IFunction:= M u_dot-H(u); > RHSFunction:= M^{-1} * W(u) > > > and you have a choice of using either an IMEX scheme > [-ts_arkimex_fully_implicit false] or just the implicit part > [-ts_arkimex_fully_implicit true]. > > > > > Thank you for your feedback on our short survey - it is very valuable in > helping us crafting a less painful path to using all these options. > > > Emil > > > > > > > > So that we improve our user experience, can you tell us what are your > usual sources/starting points > > > when implementing a new problem: > > > > 1- PDF doc > > > > > > Yes. Looked briefly at the PDF manual. E.g. I saw the tables for IMEX > methods but my eyes glazed over. > > > > > > > 2- tutorials (if you find a good match) > > > > > > > > Yes. Looked at various html pages including the one for TSARKIMEX. But I > missed the sentence "Methods with an explicit stage can only be used with > ODE in which the stiff part G(t,X,Xdot) has the form Xdot + Ghat(t,X)." I > did not expect that ARKIMEX > > had this restriction, and did not pick it up. > > > > > > > 3- own PETSc implementations > > > > > > Yes. I have my own diffusion-reaction system ( > https://github.com/bueler/p4pdes/blob/master/c/ch5/pattern.c) in which > ARKIMEX works well. (Or at least as far > > as I can tell. I don't have a manufactured solution for it, for > example.) I am in the midst of tracking down a different kind of error, > probably from DMDA callbacks, when I got distracted by the current issue. > > > > > > > 4- online function doc > > > > > > Yes. See above comment on TSARKIMEX page. By my memory I also looked at > the TSSet{I,RHS}Jacobian() pages, for example, and probably others. > > > > > > > 5- other > > > > > > > > Not sure. > > > > > > > > Thanks, > > > > > > > > Ed > > > > > > > > > > > > > > > On Mon, Aug 31, 2020 at 6:09 AM Constantinescu, Emil M. > wrote: > > > > > >> >> >> >> >> >> >> >> >> On 8/30/20 6:04 PM, Ed Bueler wrote: >> >> >> >> >> >> >> Actually, ARKIMEX is not off the hook. It still gets the wrong answer if >> told the whole thing is implicit: >> >> >> >> >> >> >> $ ./ex54 -ts_type arkimex -ts_arkimex_fully_implicit # WRONG (AND >> REALLY SLOW) >> >> >> error norm at tf = 1.000000 from 224 steps: |u-u_exact| = 2.76636e+00 >> >> >> >> >> >> >> >> >> >> >> >> Hi Ed, can you please add the following >> >> TSSetEquationType (ts,TS_EQ_IMPLICIT ); >> >> >> >> before calling TSSolve and try again? This is described in Table 12 in >> the pdf doc. 
>> >> >> >> >> >> >> >> >> >> So that we improve our user experience, can you tell us what are your >> usual sources/starting points when implementing a new problem: >> >> >> 1- PDF doc >> >> >> 2- tutorials (if you find a good match) >> >> >> >> >> 3- own PETSc implementations >> >> >> 4- online function doc >> >> >> 5- other >> >> >> >> >> Thanks, >> >> >> Emil >> >> >> >> >> >> >> >> >> versus >> >> >> >> >> >> >> >> $ ./ex54 -ts_type arkimex # WRONG BUT IFunction IS OF FLAGGED FORM >> >> >> error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01 >> >> >> >> >> >> >> >> >> >> $ ./ex54 -ts_type bdf # RIGHT >> >> >> error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 >> >> >> >> >> >> >> >> >> >> So I am not sure what "Methods with an explicit stage can only be used >> with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + >> Ghat(t,X)." means. >> >> >> >> >> >> >> >> Ed >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Sun, Aug 30, 2020 at 2:57 PM Ed Bueler wrote: >> >> >> >> >> >>> >>> Darn, sorry. >>> >>> >>> >>> >>> >>> >>> I realize the ARKIMEX page does say "Methods with an explicit stage can >>> only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot >>> + Ghat(t,X)." So my example does not do that. Is there a way for >>> ARKIMEX to detect that dG/d(Xdot) = I? >>> >>> >>> >>> >>> >>> >>> Ed >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Sun, Aug 30, 2020 at 2:44 PM Ed Bueler wrote: >>> >>> >>> >>> >>> >>>> >>>> >>>> >>>> Dear PETSc -- >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> I tried twice to make this an issue at the gitlab.com host site, but >>>> both times got "something went wrong (500)". So this is a bug report by >>>> old-fashioned means. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> I created a TS example, >>>> >>>> https://github.com/bueler/p4pdes-next/blob/master/c/fix-arkimex/ex54.c >>>> at my github, also attached. It solves a 2D linear ODE >>>> >>>> >>>> ``` >>>> >>>> >>>> x' + y' = 6 y >>>> >>>> >>>> y' = x >>>> >>>> >>>> ``` >>>> >>>> >>>> Pretty basic; the known exact solution is just exponentials. The code >>>> writes it as F(t,u,u')=G(t,u) and supplies all the pieces, namely >>>> IFunction,IJacobian,RHSFunction,RHSJacobian. Note both F and G must be >>>> seen by TS to get the correct solution. In summary, >>>> >>>> a boring (and valgrind-clean ;-)) example. >>>> >>>> >>>> >>>> >>>> For current master branch it runs fine for the fully-implicit methods >>>> (e.g. BDF, CN, ROSW) which can use the IFunction F, including with >>>> finite-differenced Jacobians. 
With BDF2, BDF2+-snes_fd, BDF6+tight tol., >>>> CN, BEULER, ROSW: >>>> >>>> >>>> $ ./ex54 >>>> >>>> >>>> error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 >>>> >>>> >>>> $ ./ex54 -snes_fd >>>> >>>> >>>> error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02 >>>> >>>> >>>> $ ./ex54 -ts_rtol 1.0e-14 -ts_atol 1.0e-14 -ts_bdf_order 6 >>>> >>>> >>>> error norm at tf = 1.000000 from 388 steps: |u-u_exact| = 4.23624e-11 >>>> >>>> >>>> $ ./ex54 -ts_type beuler >>>> >>>> >>>> error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 6.71676e-01 >>>> >>>> >>>> $ ./ex54 -ts_type cn >>>> >>>> >>>> error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 2.22839e-03 >>>> >>>> >>>> $ ./ex54 -ts_type rosw >>>> >>>> >>>> error norm at tf = 1.000000 from 21 steps: |u-u_exact| = 5.64012e-03 >>>> >>>> >>>> >>>> >>>> >>>> But it produces wrong values with ARKIMEX: >>>> >>>> >>>> $ ./ex54 -ts_type arkimex >>>> >>>> >>>> error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01 >>>> >>>> >>>> >>>> >>>> >>>> Neither tightening tolerance nor changing type (`-ts_arkimex_type`) >>>> helps ARKIMEX. >>>> >>>> >>>> >>>> >>>> >>>> Thanks! >>>> >>>> >>>> >>>> >>>> >>>> Ed >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> PS My book is at a late proofs stage, and out of my hands. It should >>>> appear SIAM Press in a couple of months. In all the examples in my book, >>>> only my diffusion-reaction system example using F(t,u,u') = G(t,u) is >>>> broken. Thus the motivation for a trivial >>>> >>>> ODE example as above. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> Ed Bueler >>>> >>>> >>>> Dept of Mathematics and Statistics >>>> >>>> >>>> University of Alaska Fairbanks >>>> >>>> >>>> Fairbanks, AK 99775-6660 >>>> >>>> >>>> 306C Chapman >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> Ed Bueler >>> >>> >>> Dept of Mathematics and Statistics >>> >>> >>> University of Alaska Fairbanks >>> >>> >>> Fairbanks, AK 99775-6660 >>> >>> >>> 306C Chapman >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> Ed Bueler >> >> >> Dept of Mathematics and Statistics >> >> >> University of Alaska Fairbanks >> >> >> Fairbanks, AK 99775-6660 >> >> >> 306C Chapman >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> Emil M. Constantinescu, Ph.D. >> >> Computational Mathematician >> >> Argonne National Laboratory >> >> Mathematics and Computer Science Division >> >> >> >> Ph: 630-252-0926 >> >> http://www.mcs.anl.gov/~emconsta >> >> >> >> >> >> > > > > > > > > > > > > -- > > > > > > > > > > > > > > > Ed Bueler > > > Dept of Mathematics and Statistics > > > University of Alaska Fairbanks > > > Fairbanks, AK 99775-6660 > > > 306C Chapman > > > > > > > > > > > > > > > > > > > -- > > Emil M. Constantinescu, Ph.D. > > Computational Mathematician > > Argonne National Laboratory > > Mathematics and Computer Science Division > > > > Ph: 630-252-0926 > > http://www.mcs.anl.gov/~emconsta > > > > > > > > -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From karl.linkui at gmail.com Mon Aug 31 22:11:00 2020 From: karl.linkui at gmail.com (Karl Lin) Date: Mon, 31 Aug 2020 22:11:00 -0500 Subject: [petsc-users] is there a function to append matrix Message-ID: If I have two matrix A and B with the same number of columns, same distribution pattern (column ownership pattern) among processes but different number of rows, is there a function to append B to A to make a new matrix C = [A; B]? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Aug 31 22:29:46 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 31 Aug 2020 21:29:46 -0600 Subject: [petsc-users] is there a function to append matrix In-Reply-To: References: Message-ID: <87lfhukwd1.fsf@jedbrown.org> Karl Lin writes: > If I have two matrix A and B with the same number of columns, same > distribution pattern (column ownership pattern) among processes but > different number of rows, is there a function to append B to A to make a > new matrix C = [A; B]? Thanks. Sort of; you can create a MatNest with the two matrices and (optionally) convert to AIJ format. Better, you can take the code that builds A and B, but call it on "local" submatrices; see MatGetLocalSubMatrix() or an example like src/snes/tutorials/ex28.c. From karl.linkui at gmail.com Mon Aug 31 22:51:59 2020 From: karl.linkui at gmail.com (Karl Lin) Date: Mon, 31 Aug 2020 22:51:59 -0500 Subject: [petsc-users] is there a function to append matrix In-Reply-To: <87lfhukwd1.fsf@jedbrown.org> References: <87lfhukwd1.fsf@jedbrown.org> Message-ID: Thanks for the quick reply. The reason why I want to do this is because I would like to build A and B separately first. Then do something with B by itself. Then scale B by a constant. Then append B to A to make C and continue some other matrix operations. I took a look at MatGetLocalSubMatrix() and there is this line: Depending on the format of mat, the returned submat may not implement MatMult (). Its communicator may be the same as mat, it may be PETSC_COMM_SELF , or some other subcomm of mat's. what is the format that will make submat not being able to do MatMult()? Thank you very much. On Mon, Aug 31, 2020 at 10:29 PM Jed Brown wrote: > Karl Lin writes: > > > If I have two matrix A and B with the same number of columns, same > > distribution pattern (column ownership pattern) among processes but > > different number of rows, is there a function to append B to A to make a > > new matrix C = [A; B]? Thanks. > > Sort of; you can create a MatNest with the two matrices and (optionally) > convert to AIJ format. > > Better, you can take the code that builds A and B, but call it on "local" > submatrices; see MatGetLocalSubMatrix() or an example like > src/snes/tutorials/ex28.c. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Aug 31 22:58:31 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 31 Aug 2020 21:58:31 -0600 Subject: [petsc-users] is there a function to append matrix In-Reply-To: References: <87lfhukwd1.fsf@jedbrown.org> Message-ID: <87h7sikv14.fsf@jedbrown.org> Karl Lin writes: > Thanks for the quick reply. The reason why I want to do this is because I > would like to build A and B separately first. Then do something with B by > itself. Then scale B by a constant. Then append B to A to make C and > continue some other matrix operations. 
I took a look at > MatGetLocalSubMatrix() and there is this line: > > Depending on the format of mat, the returned submat may not implement > MatMult > (). > Its communicator may be the same as mat, it may be PETSC_COMM_SELF > , > or some other subcomm of mat's. > > what is the format that will make submat not being able to do MatMult()? > Thank you very much. When called on matrices like AIJ, it only returns a Mat capable of doing assembly-related operations (like MatSetValuesLocal). If you use MatNest, it returns the matching submatrix (which is typically fully-functional), but MatNest does not support monolithic preconditioners like a sparse direct solver. (It's usually used with PCFieldSplit.) If you don't mind the extra time and space, you can MatConvert, otherwise assembly into the data structure you want (via MatGetLocalSubMatrix). > On Mon, Aug 31, 2020 at 10:29 PM Jed Brown wrote: > >> Karl Lin writes: >> >> > If I have two matrix A and B with the same number of columns, same >> > distribution pattern (column ownership pattern) among processes but >> > different number of rows, is there a function to append B to A to make a >> > new matrix C = [A; B]? Thanks. >> >> Sort of; you can create a MatNest with the two matrices and (optionally) >> convert to AIJ format. >> >> Better, you can take the code that builds A and B, but call it on "local" >> submatrices; see MatGetLocalSubMatrix() or an example like >> src/snes/tutorials/ex28.c. >> From karl.linkui at gmail.com Mon Aug 31 22:58:13 2020 From: karl.linkui at gmail.com (Karl Lin) Date: Mon, 31 Aug 2020 22:58:13 -0500 Subject: [petsc-users] is there a function to append matrix In-Reply-To: <87lfhukwd1.fsf@jedbrown.org> References: <87lfhukwd1.fsf@jedbrown.org> Message-ID: I guess another way to look at this is if I already build matrix A and MatAssembly has been called. Can I populate more rows to matrix A later on? With the number of columns and column ownership pattern not changed of course. Thank you. On Mon, Aug 31, 2020 at 10:29 PM Jed Brown wrote: > Karl Lin writes: > > > If I have two matrix A and B with the same number of columns, same > > distribution pattern (column ownership pattern) among processes but > > different number of rows, is there a function to append B to A to make a > > new matrix C = [A; B]? Thanks. > > Sort of; you can create a MatNest with the two matrices and (optionally) > convert to AIJ format. > > Better, you can take the code that builds A and B, but call it on "local" > submatrices; see MatGetLocalSubMatrix() or an example like > src/snes/tutorials/ex28.c. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Aug 31 23:00:55 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 31 Aug 2020 22:00:55 -0600 Subject: [petsc-users] is there a function to append matrix In-Reply-To: References: <87lfhukwd1.fsf@jedbrown.org> Message-ID: <87a6yakux4.fsf@jedbrown.org> Karl Lin writes: > I guess another way to look at this is if I already build matrix A and > MatAssembly has been called. Can I populate more rows to matrix A later on? > With the number of columns and column ownership pattern not changed of > course. Thank you. No. 
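For what it's worth, a minimal sketch of the MatNest route mentioned above (A, B, C, Caij, and alpha are placeholder names; the MatConvert() step is optional and makes a separate monolithic AIJ copy):

    Mat            mats[2], C, Caij;
    PetscScalar    alpha = 2.0;          /* example scale factor for B */
    PetscErrorCode ierr;

    ierr = MatScale(B, alpha);CHKERRQ(ierr);
    mats[0] = A;  mats[1] = B;           /* stacked by rows: C = [A; B] */
    ierr = MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 1, NULL, mats, &C);CHKERRQ(ierr);
    ierr = MatConvert(C, MATAIJ, MAT_INITIAL_MATRIX, &Caij);CHKERRQ(ierr);

Whether you keep the MATNEST C or the converted AIJ copy determines which operations and preconditioners are available, as noted above.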
From karl.linkui at gmail.com Mon Aug 31 23:19:34 2020 From: karl.linkui at gmail.com (Karl Lin) Date: Mon, 31 Aug 2020 23:19:34 -0500 Subject: [petsc-users] is there a function to append matrix In-Reply-To: <87a6yakux4.fsf@jedbrown.org> References: <87lfhukwd1.fsf@jedbrown.org> <87a6yakux4.fsf@jedbrown.org> Message-ID: Thanks for the feedback. What about if I build A to have as many rows as A and B and then later on use MatGetRow and MatSetValues to add B matrix entries to A? Can MatGetRow and MatSetValues be used after MatAssembly is called? B is much much smaller than A so the number of rows can be added to just the portion of A on one process. Will this work? Thanks. Regards. On Mon, Aug 31, 2020 at 11:00 PM Jed Brown wrote: > Karl Lin writes: > > > I guess another way to look at this is if I already build matrix A and > > MatAssembly has been called. Can I populate more rows to matrix A later > on? > > With the number of columns and column ownership pattern not changed of > > course. Thank you. > > No. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Aug 31 23:22:44 2020 From: jed at jedbrown.org (Jed Brown) Date: Mon, 31 Aug 2020 22:22:44 -0600 Subject: [petsc-users] is there a function to append matrix In-Reply-To: References: <87lfhukwd1.fsf@jedbrown.org> <87a6yakux4.fsf@jedbrown.org> Message-ID: <877dtektwr.fsf@jedbrown.org> Karl Lin writes: > Thanks for the feedback. What about if I build A to have as many rows as A > and B and then later on use MatGetRow and MatSetValues to add B matrix > entries to A? Can MatGetRow and MatSetValues be used after MatAssembly is > called? B is much much smaller than A so the number of rows can be added to > just the portion of A on one process. Will this work? Thanks. Regards. That would work fine, you'll just need to MatAssembly after your new MatSetValues. Note that you'll likely want to think about the distribution of B relative to A; you may not want B to come "at the end" because it'll all be on the last rank, versus dispersed over the ranks. This is especially true if those rows are heavier. > On Mon, Aug 31, 2020 at 11:00 PM Jed Brown wrote: > >> Karl Lin writes: >> >> > I guess another way to look at this is if I already build matrix A and >> > MatAssembly has been called. Can I populate more rows to matrix A later >> on? >> > With the number of columns and column ownership pattern not changed of >> > course. Thank you. >> >> No. >> From emconsta at anl.gov Mon Aug 31 23:59:38 2020 From: emconsta at anl.gov (Constantinescu, Emil M.) Date: Tue, 1 Sep 2020 04:59:38 +0000 Subject: [petsc-users] ARKIMEX produces incorrect values In-Reply-To: References: Message-ID: <5a194153-8239-9b88-c0fb-c4e3e8d9f135@anl.gov> Ed, I agree with you that there is a problem with how the interface is presented and guidelines need to be improved. That's why your feedback is so valuable - and we appreciate it. I created Table 12 (ref below) to make things more clear, but based on your and others feedback is definitely not enough. We will take your input under serious advisement, but until we improve the process, here's some the rationale below in context. On 8/31/20 7:32 PM, Ed Bueler wrote: Emil -- When I use PETSc on various tasks, so far, I have separated (1) how I describe the problem structure for use by the PETSc component, and (2) the choice of solver. For TS I am confused about what is expected by the design you describe. 
I would like to describe my ODE system as clearly as possible and *then* go out and try/choose solver types. I agree, I like to follow a similar process. But from implementation considerations this is sometimes difficult to maintain. ARKIMEX solves 6-7 different problems types ranging from stiff ODEs, IMEX ODEs, DAEs with or without trivial mass matrices. The decision we made >5years ago was to keep a single TS type and code. The user should specify the F,G functions and tell PETSc the type of problem (ODE/DAE/Mass) and internally we'd distinguish as opposed to re-implementing the same code in 6-7 TS types with tiny differences. My understanding is that if I have a problem which can be written in the form F(t,u,u') = G(t,u), and if I do not want to pre-emptively restrict to *not* allowing IMEX, then I should put F into an IFunction and G into a RHSFunction. This is a *good* split, performance-wise, if in fact F contains the stiff part, but whether good or bad I have described the ODE system. In general yes, but if the ODE is implicit (e.g., nontrivial M), then from the solver standpoint there is no reason to put anything in G() because most solvers will move G on the other side and include in F. That's why CN, BE, BDF worked in your example. None use G() explicitly. Any fully-implicit method should now be able to handle this form F(t,u,u') = G(t,u), because for implicit methods there is no real distinction between F(t,u,u')=0 and F(t,u,u')=G(t,u). If an IMEX method is completely flexible, and so far ROSW seems to be flexible in this way (?), then I think it should also work with either form. Ideally, yes. IMEX schemes for F(t,u,u')=G(t,u) and nontrivial M do not exist and special care is needed in order to account for that M that is buried inside F (Table 12). Exposing M as a separate object would make things easier on the user side on the surface but would create issues for other users (e.g., those using ALE methods) and upset Jed :). ROSW does the same thing as the other implicit methods by moving G on the other side and becoming part of F; but it uses explicit evaluations of this fatter F. So although you provide G, it is not treated separately by any of these methods. If an IMEX method is restricted further by form, e.g. requiring dF/d(u') to be the identity, then wouldn't it make sense to have the user programmatically indicate that structural property? Such an indication is not about the desired solver but about the ODE. If the structural property held then we could proceed with the restricted-application method, e.g. ARKIMEX/EIMEX. Perhaps one could have this (proposed) functionality: TSSetLHSHasIdentity(TS,PETSC_TRUE) Or one might instead set an enumerate for whether dF/d(u') is I or M (invertible) or M (noninvertible for DAE) or a general nonlinear function: TSSetLHSStructureType(TS,TS_LHS_STRUCTURE_IDENTITY) TSSetLHSStructureType(TS,TS_LHS_STRUCTURE_INVERTIBLE) TSSetLHSStructureType(TS,TS_LHS_STRUCTURE_NONINVERTIBLE) TSSetLHSStructureType(TS,TS_LHS_STRUCTURE_NONLINEAR) Obviously the enumerate would only need to include structure which some method could exploit. What do you think? Yes, that is precisely why we introduced EquationType and this is what it does to some extent: EXPLICIT_ODE=LHS_STRUCTURE_IDENTITY IMPLICT_ODE=STRUCTURE_INVERTIBLE; DAE=STRUCTURE_NONINVERTIBLE, but then it can be DAE with different diff index. The idea was to provide enough granularity that would be useful in a decision tree. 
In any case, I am currently having trouble with the preferred way to describe e.g. diffusion-reaction PDEs. It seems to me I would want to supply all four of IFunction, IJacobian, RHSFunction, RHSJacobian, so as to allow full performance for both fully-implicit methods and IMEX methods. (And for typical examples I certainly *can* form RHSJacobian, for example.) But none of the src/ts/tutorials/ examples seem to unconditionally supply all four, and I can't tell (e.g. from -ts_view) which parts are seen and called by the various methods.

Yes, that's correct; all 4 are desirable if available/feasible. Unfortunately, for now I'd suggest iterating a bit between stage (1) and (2): what does the problem look like, what solvers do I expect to use, and what are the solver requirements. E.g., if you have a stiff-nonstiff IMEX split and a mass matrix that can be accelerated by ARKIMEX, then you'd need to work through the different functions until we put up more reasonable guidance. I also think that -ts_view is the perfect place to print information about what the solver actually solves, based on the information it has. We'll try to formulate something like this. Examples can also be improved. Matt promised he'd do some smarter partitioning based on index sets that should make our lives easier.

Emil

Ed

On Mon, Aug 31, 2020 at 2:27 PM Constantinescu, Emil M. wrote:

On 8/31/20 12:17 PM, Ed Bueler wrote:

Emil --

Thanks for looking at this.

> Hi Ed, can you please add the following
> TSSetEquationType(ts,TS_EQ_IMPLICIT);
> before calling TSSolve and try again? This is described in Table 12 in the pdf doc.

Yep, that fixes it. After setting the TS_EQ_IMPLICIT flag programmatically I get:

It is only programmatic because it has to do with the form of the RHS and IFunctions.

$ ./ex54 -ts_type arkimex -ts_arkimex_fully_implicit
error norm at tf = 1.000000 from 12 steps: |u-u_exact| = 1.34500e-02

Without -ts_arkimex_fully_implicit we still get the wrong answer, but, as I understand it, we expect the wrong answer because dF/d(dudt) != I, correct?

Yes. I keep mixing F and G, but if you want to solve M u' = H(u), then define the IFunction := M u_dot - H(u) and it should work with all time steppers. If you want to set the RHS of your ODE in the RHS function (so that you can use explicit integrators, too) you have to provide: IFunction := u_dot and RHSFunction := M^{-1}*H(u) [or solve M x = H(u) in the RHS function]. Note that M u_dot - H(u) can only be solved by implicit solvers directly, so IFunction := M u_dot and RHSFunction := H(u). Table 12 in the PDF doc explains these cases, but that can be improved as well.

So -ts_arkimex_fully_implicit does not set this flag?

No, its use is for when you have both an IFunction (for the stiff part) and a RHSFunction (for the nonstiff part) defined to solve M u' = H(u) + W(u), and:

1- mass is identity: IFunction := u_dot - H(u); RHSFunction := W(u), or
2- mass is full rank, but not identity: IFunction := M u_dot - H(u); RHSFunction := M^{-1} * W(u)

and you have a choice of using either an IMEX scheme [-ts_arkimex_fully_implicit false] or just the implicit part [-ts_arkimex_fully_implicit true].
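As an illustration of case 1 above (identity mass matrix), a minimal sketch of the IFunction/RHSFunction pair an IMEX run would use for u' = H(u) + W(u); the names Hstiff, Wnonstiff, and user are placeholders, not functions from the thread or from ex54.c:

```c
/* Hypothetical helpers: Hstiff(t,U,F,ctx) fills F with H(u);
   Wnonstiff(t,U,G,ctx) fills G with W(u). */

/* Stiff part as an IFunction: F(t,u,u') = u' - H(u) */
PetscErrorCode FormIFunction(TS ts, PetscReal t, Vec U, Vec Udot, Vec F, void *ctx)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = Hstiff(t, U, F, ctx);CHKERRQ(ierr);    /* F = H(u)      */
  ierr = VecAYPX(F, -1.0, Udot);CHKERRQ(ierr);  /* F = u' - H(u) */
  PetscFunctionReturn(0);
}

/* Nonstiff part as a RHSFunction: G(t,u) = W(u) */
PetscErrorCode FormRHSFunction(TS ts, PetscReal t, Vec U, Vec G, void *ctx)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = Wnonstiff(t, U, G, ctx);CHKERRQ(ierr); /* G = W(u) */
  PetscFunctionReturn(0);
}

/* ... in the driver: */
ierr = TSSetIFunction(ts, NULL, FormIFunction, &user);CHKERRQ(ierr);
ierr = TSSetRHSFunction(ts, NULL, FormRHSFunction, &user);CHKERRQ(ierr);
/* -ts_type arkimex : IMEX, H treated implicitly and W explicitly;
   add -ts_arkimex_fully_implicit to avoid the explicit stages (see above). */
```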
Thank you for your feedback on our short survey - it is very valuable in helping us craft a less painful path to using all these options.

Emil

> So that we improve our user experience, can you tell us what are your usual sources/starting points
> when implementing a new problem:

> 1- PDF doc

Yes. Looked briefly at the PDF manual. E.g. I saw the tables for IMEX methods but my eyes glazed over.

> 2- tutorials (if you find a good match)

Yes. Looked at various HTML pages including the one for TSARKIMEX. But I missed the sentence "Methods with an explicit stage can only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + Ghat(t,X)." I did not expect that ARKIMEX had this restriction, and did not pick it up.

> 3- own PETSc implementations

Yes. I have my own diffusion-reaction system (https://github.com/bueler/p4pdes/blob/master/c/ch5/pattern.c) in which ARKIMEX works well. (Or at least as far as I can tell. I don't have a manufactured solution for it, for example.) I was in the midst of tracking down a different kind of error, probably from DMDA callbacks, when I got distracted by the current issue.

> 4- online function doc

Yes. See the above comment on the TSARKIMEX page. From memory, I also looked at the TSSet{I,RHS}Jacobian() pages, for example, and probably others.

> 5- other

Not sure.

Thanks,

Ed

On Mon, Aug 31, 2020 at 6:09 AM Constantinescu, Emil M. wrote:

On 8/30/20 6:04 PM, Ed Bueler wrote:

Actually, ARKIMEX is not off the hook. It still gets the wrong answer if told the whole thing is implicit:

$ ./ex54 -ts_type arkimex -ts_arkimex_fully_implicit   # WRONG (AND REALLY SLOW)
error norm at tf = 1.000000 from 224 steps: |u-u_exact| = 2.76636e+00

Hi Ed, can you please add the following

TSSetEquationType(ts,TS_EQ_IMPLICIT);

before calling TSSolve and try again? This is described in Table 12 in the pdf doc.

So that we improve our user experience, can you tell us what are your usual sources/starting points when implementing a new problem:
1- PDF doc
2- tutorials (if you find a good match)
3- own PETSc implementations
4- online function doc
5- other

Thanks,
Emil

versus

$ ./ex54 -ts_type arkimex   # WRONG BUT IFunction IS OF FLAGGED FORM
error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01
$ ./ex54 -ts_type bdf   # RIGHT
error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02

So I am not sure what "Methods with an explicit stage can only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + Ghat(t,X)." means.

Ed

On Sun, Aug 30, 2020 at 2:57 PM Ed Bueler wrote:

Darn, sorry. I realize the ARKIMEX page does say "Methods with an explicit stage can only be used with ODE in which the stiff part G(t,X,Xdot) has the form Xdot + Ghat(t,X)." So my example does not do that. Is there a way for ARKIMEX to detect that dG/d(Xdot) = I?

Ed

On Sun, Aug 30, 2020 at 2:44 PM Ed Bueler wrote:

Dear PETSc --

I tried twice to make this an issue at the gitlab.com host site, but both times got "something went wrong (500)". So this is a bug report by old-fashioned means.

I created a TS example, https://github.com/bueler/p4pdes-next/blob/master/c/fix-arkimex/ex54.c at my github, also attached. It solves a 2D linear ODE

```
x' + y' = 6 y
y' = x
```

Pretty basic; the known exact solution is just exponentials. The code writes it as F(t,u,u') = G(t,u) and supplies all the pieces, namely IFunction, IJacobian, RHSFunction, RHSJacobian. Note both F and G must be seen by TS to get the correct solution. In summary, a boring (and valgrind-clean ;-)) example.
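For reference, one natural way to encode this little system in the F(t,u,u') = G(t,u) form, with F = (x' + y', y') as the IFunction and G = (6y, x) as the RHSFunction, is sketched below; the actual ex54.c may differ in details, and the (constant) Jacobian callbacks are omitted. Note that dF/d(u') is not the identity here, which is exactly the structural issue discussed above.

```c
/* F(t,u,u') = [ x' + y' , y' ]   (dF/d(u') is NOT the identity) */
PetscErrorCode LinearIFunction(TS ts, PetscReal t, Vec U, Vec Udot, Vec F, void *ctx)
{
  PetscErrorCode    ierr;
  const PetscScalar *udot;
  PetscScalar       *f;

  PetscFunctionBeginUser;
  ierr = VecGetArrayRead(Udot, &udot);CHKERRQ(ierr);
  ierr = VecGetArray(F, &f);CHKERRQ(ierr);
  f[0] = udot[0] + udot[1];   /* x' + y' */
  f[1] = udot[1];             /* y'      */
  ierr = VecRestoreArrayRead(Udot, &udot);CHKERRQ(ierr);
  ierr = VecRestoreArray(F, &f);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* G(t,u) = [ 6 y , x ] */
PetscErrorCode LinearRHSFunction(TS ts, PetscReal t, Vec U, Vec G, void *ctx)
{
  PetscErrorCode    ierr;
  const PetscScalar *u;
  PetscScalar       *g;

  PetscFunctionBeginUser;
  ierr = VecGetArrayRead(U, &u);CHKERRQ(ierr);
  ierr = VecGetArray(G, &g);CHKERRQ(ierr);
  g[0] = 6.0 * u[1];
  g[1] = u[0];
  ierr = VecRestoreArrayRead(U, &u);CHKERRQ(ierr);
  ierr = VecRestoreArray(G, &g);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
```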
For the current master branch it runs fine for the fully-implicit methods (e.g. BDF, CN, ROSW) which can use the IFunction F, including with finite-differenced Jacobians. With BDF2, BDF2 + -snes_fd, BDF6 + tight tolerances, CN, BEULER, ROSW:

$ ./ex54
error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02
$ ./ex54 -snes_fd
error norm at tf = 1.000000 from 33 steps: |u-u_exact| = 9.29170e-02
$ ./ex54 -ts_rtol 1.0e-14 -ts_atol 1.0e-14 -ts_bdf_order 6
error norm at tf = 1.000000 from 388 steps: |u-u_exact| = 4.23624e-11
$ ./ex54 -ts_type beuler
error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 6.71676e-01
$ ./ex54 -ts_type cn
error norm at tf = 1.000000 from 100 steps: |u-u_exact| = 2.22839e-03
$ ./ex54 -ts_type rosw
error norm at tf = 1.000000 from 21 steps: |u-u_exact| = 5.64012e-03

But it produces wrong values with ARKIMEX:

$ ./ex54 -ts_type arkimex
error norm at tf = 1.000000 from 16 steps: |u-u_exact| = 1.93229e+01

Neither tightening the tolerances nor changing the type (`-ts_arkimex_type`) helps ARKIMEX.

Thanks!

Ed

PS My book is at a late proofs stage, and out of my hands. It should appear from SIAM Press in a couple of months. In all the examples in my book, only my diffusion-reaction system example using F(t,u,u') = G(t,u) is broken. Thus the motivation for a trivial ODE example as above.

--
Ed Bueler
Dept of Mathematics and Statistics
University of Alaska Fairbanks
Fairbanks, AK 99775-6660
306C Chapman

--
Emil M. Constantinescu, Ph.D.
Computational Mathematician
Argonne National Laboratory
Mathematics and Computer Science Division

Ph: 630-252-0926
http://www.mcs.anl.gov/~emconsta