From knepley at gmail.com Mon Aug 2 08:48:38 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 2 Aug 2021 09:48:38 -0400 Subject: [petsc-users] DMPlex box mesh periodicity bug (?) In-Reply-To: References: Message-ID: On Sat, Jul 31, 2021 at 6:01 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Dear all, > > I have noticed what I think is a bug with a 3D DMPlex box mesh with > periodic boundaries. > When I project a function onto it, it behaves as if the last row of cells > in X and in Y direction do not have the right coordinates. > > I attach to this email a minimal example that reproduces the bug (files > mwe_periodic_3d.F90, wrapper_petsc.c, wrapper_petsc.h90, makefile), as well > as the output of this code (initmesh.vtu for an output of the DM, > solution.vtu for an output of the data projected onto the mesh). There is > also a screenshot of what's going on. > If one considers the function I project onto the mesh, what should > normally happen is that there is a "hole" is the density field around the > x=0, y=0 region, the rest being equal to one. > > I hope it is just a mishandling from my end !! > Hi Thibault, 1) The main thing happening here is that visualization has some problems for completely periodic things. It is on my list to fix, but below other things. 2) Second, it looks like you need to localize coordinates again after creating ghost cells. Everything is in the right order if you use the command line to create the mesh. I have attached a C example where I do this. In your code, I think just adding Localize again will work. In my example, you can see 2D as well as 3D, and change from non-periodic to periodic to see what is happening. Thanks, Matt > Thank you in advance for your help, > > Cheers, > > Thibault > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: thibault.c Type: application/octet-stream Size: 2647 bytes Desc: not available URL: From thibault.bridelbertomeu at gmail.com Mon Aug 2 15:23:44 2021 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Mon, 2 Aug 2021 22:23:44 +0200 Subject: [petsc-users] DMPlex box mesh periodicity bug (?) In-Reply-To: References: Message-ID: Hi Matt, Thank you for taking the time to take a look ! 1) Yea ... I know most visualization softwares and most storage formats do not like periodicity (a.k.a. infinity) ... but fo CFD-research it's a great tool so we keep trying ... x) The thing that surprised me is that my mwe works perfectly well in 2D, and not in 3D : that's why I ventured to call it a bug ! 2) Hmm I tried adding a "DMLocalizeCoordinates" after the "DMPlexConstructGhostCells" but it does not change anything, the result is still exactly the same as I showed above. As for your code, it compiles fine and I can execute it as well, thanks ! However ... 1) when I run it in 2D and ask for the HDF5 output, and then run the petsc_gen_xdmf.py script, I get the attached result (sol_2D.png) : it seems either Paraview cannot handle what's in the HDF file or the HDF file does not contain something consistent, 2) when I run it in 3D and ask for the HDF5 output + run petsc_gen_xdmf.py I can see ... 
nothing : paraview does not show anything when I try to open the XDMF file, although it seems syntaxically correct (the HDF file as well) 3) when I run it in 3D and ask for VTU output, I get exactly the same thing as what I got with my F90 program (see sol_3D.png attached) - my command line is ./thibault -dm_plex_dim 3 -dm_plex_simplex 0 -dm_plex_box_faces 16,16,16 -dm_plex_box_lower -5,-5,-5 -dm_plex_box_upper 5,5,5 -dm_plex_box_bd periodic,periodic,periodic -dm_plex_create_fv_ghost_cells -dm_plex_periodic_cut -vec_view vtk:sol.vtu Does your piece of code yields different results on your end ? (P.S. I am using the main branch, commit id ae6adb75dd). Thank you very much for your support !! Thibault Le lun. 2 ao?t 2021 ? 15:48, Matthew Knepley a ?crit : > On Sat, Jul 31, 2021 at 6:01 AM Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > >> Dear all, >> >> I have noticed what I think is a bug with a 3D DMPlex box mesh with >> periodic boundaries. >> When I project a function onto it, it behaves as if the last row of cells >> in X and in Y direction do not have the right coordinates. >> >> I attach to this email a minimal example that reproduces the bug (files >> mwe_periodic_3d.F90, wrapper_petsc.c, wrapper_petsc.h90, makefile), as well >> as the output of this code (initmesh.vtu for an output of the DM, >> solution.vtu for an output of the data projected onto the mesh). There is >> also a screenshot of what's going on. >> If one considers the function I project onto the mesh, what should >> normally happen is that there is a "hole" is the density field around the >> x=0, y=0 region, the rest being equal to one. >> >> I hope it is just a mishandling from my end !! >> > > Hi Thibault, > > 1) The main thing happening here is that visualization has some problems > for completely periodic things. It is on my list to fix, but below other > things. > > 2) Second, it looks like you need to localize coordinates again after > creating ghost cells. Everything is in the right order if you use the > command line to create > the mesh. I have attached a C example where I do this. In your code, I > think just adding Localize again will work. > > In my example, you can see 2D as well as 3D, and change from non-periodic > to periodic to see what is happening. > > Thanks, > > Matt > > >> Thank you in advance for your help, >> >> Cheers, >> >> Thibault >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sol_2D.png Type: image/png Size: 1218883 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sol_3D.png Type: image/png Size: 1113773 bytes Desc: not available URL: From knepley at gmail.com Mon Aug 2 15:45:53 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 2 Aug 2021 16:45:53 -0400 Subject: [petsc-users] DMPlex box mesh periodicity bug (?) In-Reply-To: References: Message-ID: On Mon, Aug 2, 2021 at 4:23 PM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Hi Matt, > > Thank you for taking the time to take a look ! > > 1) Yea ... I know most visualization softwares and most storage formats do > not like periodicity (a.k.a. infinity) ... 
but fo CFD-research it's a great > tool so we keep trying ... x) The thing that surprised me is that my mwe > works perfectly well in 2D, and not in 3D : that's why I ventured to call > it a bug ! > > 2) Hmm I tried adding a "DMLocalizeCoordinates" after the > "DMPlexConstructGhostCells" but it does not change anything, the result is > still exactly the same as I showed above. > Hmm, I will try it. It should error telling you that VTK cannot handle meshes with localized coordinates, so something is going wrong there. > As for your code, it compiles fine and I can execute it as well, thanks ! > However ... > 1) when I run it in 2D and ask for the HDF5 output, and then run the > petsc_gen_xdmf.py script, I get the attached result (sol_2D.png) : it seems > either Paraview cannot handle what's in the HDF file or the HDF file does > not contain something consistent, > This is exactly what I get. Something is wrong with the periodic cut when I have a double point. If you use -dm_plex_box_bd periodic,none it looks fine. This is the bug with periodic visualization that I was talking about. > 2) when I run it in 3D and ask for the HDF5 output + run petsc_gen_xdmf.py > I can see ... nothing : paraview does not show anything when I try to open > the XDMF file, although it seems syntaxically correct (the HDF file as well) > Yes, Paraview misunderstands the mesh connections, so I must be telling it something wrong with the periodic cut. You can see that each periodic dimension vanishes from the display, but all the questions I ask it in tests are correct. > 3) when I run it in 3D and ask for VTU output, I get exactly the same > thing as what I got with my F90 program (see sol_3D.png attached) - my > command line is ./thibault -dm_plex_dim 3 -dm_plex_simplex 0 > -dm_plex_box_faces 16,16,16 -dm_plex_box_lower -5,-5,-5 -dm_plex_box_upper > 5,5,5 -dm_plex_box_bd periodic,periodic,periodic > -dm_plex_create_fv_ghost_cells -dm_plex_periodic_cut -vec_view vtk:sol.vtu > Yes, VTU output does not work at all with periodicity. It was only really intended to work with DMDA. I do everything with HDF5 now since it can handle multiple meshes, etc. I think the right thing to do is fix the "periodic cut" support for multiply periodic things, and for 3D. Thanks, Matt > Does your piece of code yields different results on your end ? > > (P.S. I am using the main branch, commit id ae6adb75dd). > > Thank you very much for your support !! > > Thibault > > > > Le lun. 2 ao?t 2021 ? 15:48, Matthew Knepley a ?crit : > >> On Sat, Jul 31, 2021 at 6:01 AM Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >>> Dear all, >>> >>> I have noticed what I think is a bug with a 3D DMPlex box mesh with >>> periodic boundaries. >>> When I project a function onto it, it behaves as if the last row of >>> cells in X and in Y direction do not have the right coordinates. >>> >>> I attach to this email a minimal example that reproduces the bug (files >>> mwe_periodic_3d.F90, wrapper_petsc.c, wrapper_petsc.h90, makefile), as well >>> as the output of this code (initmesh.vtu for an output of the DM, >>> solution.vtu for an output of the data projected onto the mesh). There is >>> also a screenshot of what's going on. >>> If one considers the function I project onto the mesh, what should >>> normally happen is that there is a "hole" is the density field around the >>> x=0, y=0 region, the rest being equal to one. >>> >>> I hope it is just a mishandling from my end !! 
>>> >> >> Hi Thibault, >> >> 1) The main thing happening here is that visualization has some problems >> for completely periodic things. It is on my list to fix, but below other >> things. >> >> 2) Second, it looks like you need to localize coordinates again after >> creating ghost cells. Everything is in the right order if you use the >> command line to create >> the mesh. I have attached a C example where I do this. In your code, >> I think just adding Localize again will work. >> >> In my example, you can see 2D as well as 3D, and change from non-periodic >> to periodic to see what is happening. >> >> Thanks, >> >> Matt >> >> >>> Thank you in advance for your help, >>> >>> Cheers, >>> >>> Thibault >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From aph at email.arizona.edu Mon Aug 2 16:06:33 2021 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Mon, 2 Aug 2021 14:06:33 -0700 Subject: [petsc-users] from Petsc 3.7.4.0 to 3.13.3.0 Message-ID: Hello, I recently updated our code from Petsc 3.7.4.0 to 3.13.3.0. Among other things I noticed is that all the includes (such as #include ) have now to be accompanied with use statements (such as use petscvec). It seems that due to the use statements the compiler is now way more strict. In our code, we can solve stability equations in real arithmetic or in complex arithmetic, where some subroutines are used for complex arithmetic and some other ones for real arithmetic. My question is, is it good practice to wrap around a Petsc call with the pre-compiler flag PETSC_USE_COMPLEX in order to avoid compilation error if that call is not used say in the complex part of the code? Example, the call to MatSetValuesBlocked below is not used in the complex arithmetic code, so to avoid a compilation error, I wrapped the call with PETSC_USE_COMPLEX==0 (Mat1 is a real array in this example) #if (PETSC_USE_COMPLEX==0) call MatSetValuesBlocked(self%fieldLHSMat_ps,1,ptLoc-1,1,colIndex-1, transpose(Mat1(1:ndim1,1:ndim2)),INSERT_VALUES,ierr) #endif Thanks, Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 2 20:45:54 2021 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 2 Aug 2021 20:45:54 -0500 Subject: [petsc-users] from Petsc 3.7.4.0 to 3.13.3.0 In-Reply-To: References: Message-ID: <01F246CF-0B93-4EC3-945B-DE40188A64EB@petsc.dev> It is find to use such conditional checks if needed. > On Aug 2, 2021, at 4:06 PM, Anthony Paul Haas wrote: > > Hello, > > I recently updated our code from Petsc 3.7.4.0 to 3.13.3.0. Among other things I noticed is that all the includes (such as #include ) have now to be accompanied with use statements (such as use petscvec). > > It seems that due to the use statements the compiler is now way more strict. In our code, we can solve stability equations in real arithmetic or in complex arithmetic, where some subroutines are used for complex arithmetic and some other ones for real arithmetic. 
> > My question is, is it good practice to wrap around a Petsc call with the pre-compiler flag PETSC_USE_COMPLEX in order to avoid compilation error if that call is not used say in the complex part of the code? > > Example, the call to MatSetValuesBlocked below is not used in the complex arithmetic code, so to avoid a compilation error, I wrapped the call with PETSC_USE_COMPLEX==0 (Mat1 is a real array in this example) > > #if (PETSC_USE_COMPLEX==0) > call MatSetValuesBlocked(self%fieldLHSMat_ps,1,ptLoc-1,1,colIndex-1,transpose(Mat1(1:ndim1,1:ndim2)),INSERT_VALUES,ierr) > #endif > > Thanks, > > Anthony > -------------- next part -------------- An HTML attachment was scrubbed... URL: From milan.pelletier at protonmail.com Tue Aug 3 05:39:31 2021 From: milan.pelletier at protonmail.com (Milan Pelletier) Date: Tue, 03 Aug 2021 10:39:31 +0000 Subject: [petsc-users] Is there an up-to-date list of GPU-supported preconditioners? Message-ID: Dear PETSc users, I would like to know if there is somewhere a list or table summarizing which preconditioners completely/partly run with CUDA kernels? Best regards, Milan Pelletier -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Aug 3 07:08:32 2021 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 3 Aug 2021 08:08:32 -0400 Subject: [petsc-users] Is there an up-to-date list of GPU-supported preconditioners? In-Reply-To: References: Message-ID: Hi Milan, I would say no. Our GPU support is under active development and moving fast. All built-in solvers work with the cuSparse back-end, at least nominally (eg, SOR smoothers do not work). '-mat_type cusparse' is probably the place to start. Our Hypre/GPU support is close. Our Kokkos-HIP back-end is up and running (I've tested it). Our HIP and SYCL back-ends are just coming online now. Mark On Tue, Aug 3, 2021 at 6:43 AM Milan Pelletier via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear PETSc users, > > I would like to know if there is somewhere a list or table summarizing > which preconditioners completely/partly run with CUDA kernels? > > Best regards, > Milan Pelletier > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_sagar at ce.iitr.ac.in Wed Aug 4 07:31:29 2021 From: s_sagar at ce.iitr.ac.in (SHIV SAGAR) Date: Wed, 4 Aug 2021 18:01:29 +0530 Subject: [petsc-users] Issue while running a PETSC example Message-ID: Dear Sir, I would like to extend my gratitude towards you for creating PETSc libraries for efficient computation. I am a PhD student studying Brittle Fracture using Finite Elements and I have been introduced to the idea of efficient computation using PETSc libraries. Being a beginner to PETSc and the Linux OS, I was having difficulties in running PETSc example in ~/petsc/src/ksp/ksp/tutorials$ make ex1 I get the following error: makefile:41: home/sagar/petsc/lib/petsc/conf/test: No such file or directory make: *** No rule to make target 'home/sagar/petsc/lib/petsc/conf/test'. Stop. I have set the environment variables PETSC_DIR = home/sagar/petsc and PETSC_ARCH = arch-linux2-c-debug If you could help me with this, I would be grateful and could continue using the libraries for much complex programs. Thank You Yours Faithfully Shiv Sagar PhD, IIT Roorkee -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Thu Aug 5 08:59:15 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 5 Aug 2021 09:59:15 -0400 Subject: [petsc-users] Issue while running a PETSC example In-Reply-To: References: Message-ID: On Thu, Aug 5, 2021 at 9:56 AM SHIV SAGAR wrote: > Dear Sir, > I would like to extend my gratitude towards you for creating PETSc > libraries for efficient computation. > > I am a PhD student studying Brittle Fracture using Finite Elements and I > have been introduced to the idea of efficient computation using PETSc > libraries. Being a beginner to PETSc and the Linux OS, I was having > difficulties in running PETSc example in > > ~/petsc/src/ksp/ksp/tutorials$ make ex1 > > I get the following error: > > makefile:41: home/sagar/petsc/lib/petsc/conf/test: No such file or > directory > make: *** No rule to make target 'home/sagar/petsc/lib/petsc/conf/test'. > Stop. > > I have set the environment variables PETSC_DIR = home/sagar/petsc > Hi Shiv, I think the problem is that you need PETSC_DIR = /home/sagar/petsc Thanks, Matt > and PETSC_ARCH = arch-linux2-c-debug > > If you could help me with this, I would be grateful and could continue > using the libraries for much complex programs. > > Thank You > Yours Faithfully > Shiv Sagar > PhD, IIT Roorkee > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From milan.pelletier at protonmail.com Fri Aug 6 07:51:23 2021 From: milan.pelletier at protonmail.com (Milan Pelletier) Date: Fri, 06 Aug 2021 12:51:23 +0000 Subject: [petsc-users] Using external matrix as ILU preconditioner Message-ID: Dear PETSc users, I would like to know if it is possible to provide PETSc with a ILU-preconditioner matrix computed externally beforehand. I tried and built a PCSHELL, to which I pass the externally-computed preconditioner matrix as "Pmat" using the KSPSetOperators function. Then I wanted to use that Pmat in PCApply by calling MatSolve as it seems to be done in the ILU case. Though, this fails since the mat->ops->solve (with mat being my PC Matrix) is a null pointer. I guess the way I set the matrix (as a MATSEQAIJ) is not sufficient for PETSc to know what function to use as MatSolve. How could I achieve providing my own ILU-decomposed matrix and feed PETSc's PCG with it? Is it actually possible? Thanks for your help, Milan Pelletier -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Aug 6 08:02:56 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 6 Aug 2021 09:02:56 -0400 Subject: [petsc-users] Using external matrix as ILU preconditioner In-Reply-To: References: Message-ID: PCSHELL is for adding your own preconditioner method (mat->ops->solve). Pmat is the matrix that you want the PC to use to compute the preconditioner. This is usually the same as Amat, the matrix that you want to use to apply the operator, but if you want to use say a matrix-free Amat, then you need to provide some sort of explicit matrix approximation of Amat for most preconditioners. If you have a matrix that is an approximation of the inverse of Amat that you simply want to apply then you would need to make a PCSHELL with a method that does that. 
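A minimal sketch of such a shell, with illustrative names only (a user context holding the externally computed matrix, applied here with MatMult on the assumption that it is an explicit approximate inverse), could look like:

/* Sketch only: "Minv" is an assumed user matrix approximating inv(Amat). */
typedef struct {
  Mat Minv;
} UserPC;

static PetscErrorCode UserPCApply(PC pc, Vec x, Vec y)
{
  UserPC        *ctx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PCShellGetContext(pc, (void **)&ctx);CHKERRQ(ierr);
  ierr = MatMult(ctx->Minv, x, y);CHKERRQ(ierr); /* y = Minv x, i.e. apply the approximate inverse */
  PetscFunctionReturn(0);
}

/* ... after KSPGetPC(ksp, &pc) ... */
ierr = PCSetType(pc, PCSHELL);CHKERRQ(ierr);
ierr = PCShellSetContext(pc, &user);CHKERRQ(ierr);
ierr = PCShellSetApply(pc, UserPCApply);CHKERRQ(ierr);

If what you have is instead a factored (ILU) matrix rather than an explicit approximate inverse, the apply routine would have to perform the corresponding triangular solves itself: MatSolve() is only wired up for matrices produced by a PETSc (or external-package) factorization, not for a plain assembled MATSEQAIJ, which is why the ops->solve pointer you hit was null.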
Mark On Fri, Aug 6, 2021 at 8:51 AM Milan Pelletier via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear PETSc users, > > I would like to know if it is possible to provide PETSc with a > ILU-preconditioner matrix computed externally beforehand. > I tried and built a PCSHELL, to which I pass the externally-computed > preconditioner matrix as "Pmat" using the KSPSetOperators function. Then I > wanted to use that Pmat in PCApply by calling MatSolve as it seems to be > done in the ILU case. Though, this fails since the mat->ops->solve (with > mat being my PC Matrix) is a null pointer. > > I guess the way I set the matrix (as a MATSEQAIJ) is not sufficient for > PETSc to know what function to use as MatSolve. > > How could I achieve providing my own ILU-decomposed matrix and feed > PETSc's PCG with it? Is it actually possible? > > Thanks for your help, > > Milan Pelletier > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aduarteg at utexas.edu Fri Aug 6 17:26:07 2021 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Fri, 6 Aug 2021 17:26:07 -0500 Subject: [petsc-users] PCFIELDSPLIT and Block Matrices Message-ID: Good morning, I am currently working on a PETSC application that will require a preconditioner that uses several block matrices. For now, I have a simple problem that I am solving with a dmda structured grid with two fields. For presentation purposes (I know petsc does not use this ordering), lets assume a vector ordering [u1,u2,...,uN,v1,v2,...vN] where u and v are my two fields with N number of grid points. The coupling between these two fields is weak enough that an efficient preconditioner can be formed as the matrix P = [A1, 0;0,A2] where A1 (dependent on u only) and A2 (dependent on v only) are block matrices of size NxN. Therefore, I only require two linear solves of the reduced systems. I am passing the preconditioner matrix P in the Jacobian function, and I hope this strategy is what I am telling PETSC to do with the following block of code: ierr = KSPGetPC(snes,&pc);CHKERRQ(ierr);CHKERRQ(ierr); ierr = PCSetType(pc,PCFIELDSPLIT);CHKERRQ(ierr); ierr = DMCreateFieldDecomposition(dau,NULL,NULL,&fields,NULL);CHKERRQ(ierr); ierr = PCFieldSplitSetIS(pc,NULL,fields[0]);CHKERRQ(ierr); ierr = PCFieldSplitSetIS(pc,NULL,fields[1]);CHKERRQ(ierr); Is this what is actually happening, or is the split also including some of the zero blocks on P? Second, for a future application, I will need a slightly more complicated strategy. It will require solving a similar matrix to P as specified above with more fields (Block diagonal for the fields), and then using the answer to those independent systems for a smaller local solves. In summary, if i have M fields and N grid points, I will solve M systems of size N then followed by using solution as the right hand side to solve N systems of size M. Is this something that the PCFIELDSPLIT can accomodate? Or will I have to implement my own PCSHELL? Thank you, -Alfredo -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Aug 6 21:11:45 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 6 Aug 2021 21:11:45 -0500 Subject: [petsc-users] PCFIELDSPLIT and Block Matrices In-Reply-To: References: Message-ID: <4CCA2E78-69B0-40AF-821F-D31EB7E523F6@petsc.dev> > On Aug 6, 2021, at 5:26 PM, Alfredo J Duarte Gomez wrote: > > Good morning, > > I am currently working on a PETSC application that will require a preconditioner that uses several block matrices. > > For now, I have a simple problem that I am solving with a dmda structured grid with two fields. For presentation purposes (I know petsc does not use this ordering), lets assume a vector ordering [u1,u2,...,uN,v1,v2,...vN] where u and v are my two fields with N number of grid points. The coupling between these two fields is weak enough that an efficient preconditioner can be formed as the matrix P = [A1, 0;0,A2] where A1 (dependent on u only) and A2 (dependent on v only) are block matrices of size NxN. Therefore, I only require two linear solves of the reduced systems. > > I am passing the preconditioner matrix P in the Jacobian function, and I hope this strategy is what I am telling PETSC to do with the following block of code: > > ierr = KSPGetPC(snes,&pc);CHKERRQ(ierr);CHKERRQ(ierr); > ierr = PCSetType(pc,PCFIELDSPLIT);CHKERRQ(ierr); > ierr = DMCreateFieldDecomposition(dau,NULL,NULL,&fields,NULL);CHKERRQ(ierr); > ierr = PCFieldSplitSetIS(pc,NULL,fields[0]);CHKERRQ(ierr); > ierr = PCFieldSplitSetIS(pc,NULL,fields[1]);CHKERRQ(ierr); > > Is this what is actually happening, or is the split also including some of the zero blocks on P? You should also use PCFieldSplitSetType(pc,PC_COMPOSITE_ADDITIVE) then the preconditioned problem will look like [ KSPSolve(A1) ; 0 ] [ A1 A12 ] [ 0 ; KSPSolve(A2) ] [ A21 A22 ] in other words the preconditioner is [A1 0 ] approximate inverse ( ) [ 0 A2 ] the computation is done efficiently and never uses the zero blocks. The default PCFieldSplitType is PC_COMPOSITE_MULTIPLICATIVE where the preconditioner system looks like [A1 0 ] approximate inverse ( ) [ A12 A2 ] The preconditioner is applied efficiently by first (approximately) solving with A1, then applying A12 to that (approximate) solution, removing it from the right hand side of the second block and then (approximately) solving with A2. Note that if you are using DMDA you don't need to write the above code you can use -pc_type fieldsplit -pc_fieldsplit_type additive and it will use the fields as you would like. > > Second, for a future application, I will need a slightly more complicated strategy. It will require solving a similar matrix to P as specified above with more fields (Block diagonal for the fields), and then using the answer to those independent systems for a smaller local solves. In summary, if i have M fields and N grid points, I will solve M systems of size N then followed by using solution as the right hand side to solve N systems of size M. It sounds like a block diagonal preconditioner (with one block per field) in one ordering then changing the ordering and doing another block diagonal preconditioner with one block for each grid point. PCFIELDSPLIT cannot do this since it basically works with a single ordering. You might be able to combine multiple preconditioners using PCCOMPOSITE that does a PCFIELDSPLIT then a PCPBJACOBI. Again you have a choice of additive or multiplicative formulation. You should not need to use PCSHELL. 
In fact you should not have to write any code, you should be able to control it completely from the options database with, maybe, -pc_type composite -pc_composite_pcs fieldsplit,pbjacobi -pc_composite_type additive You can also control the solvers used on the inner solves from the options database. If you run with -ksp_view it will show the options prefix for each inner solve and you can use them to control the fields solvers, for example, using gamg for one of the fields in the PCFIELDSPLIT. > > Is this something that the PCFIELDSPLIT can accomodate? Or will I have to implement my own PCSHELL? > > Thank you, > > -Alfredo > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Aug 6 22:38:45 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 6 Aug 2021 22:38:45 -0500 (CDT) Subject: [petsc-users] petsc-3.15.3 now available Message-ID: <36a33a9e-8d92-77da-413d-506ca8dc923@mcs.anl.gov> Dear PETSc users, The patch release petsc-3.15.3 is now available for download. https://petsc.org/release/download/ Satish From armandococo28 at gmail.com Tue Aug 10 04:35:11 2021 From: armandococo28 at gmail.com (Armando Coco) Date: Tue, 10 Aug 2021 10:35:11 +0100 Subject: [petsc-users] use of undeclared identifier PetscViewerHDF5Open - in Petsc 3.12 Message-ID: Hello, I am trying to compile a petsc program that calls PetscViewerHDF5Open. I have added #include in the header, but the compilation fails with error: use of undeclared identifier PetscViewerHDF5Open I have asked a colleague to run the same program in a newer version 3.14 or 3.15 and everything seems to work properly. Does it mean that I have to update my petsc version necessarily? Many Thanks Armando -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Aug 10 08:23:51 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Aug 2021 09:23:51 -0400 Subject: [petsc-users] use of undeclared identifier PetscViewerHDF5Open - in Petsc 3.12 In-Reply-To: References: Message-ID: On Tue, Aug 10, 2021 at 5:35 AM Armando Coco wrote: > Hello, > > I am trying to compile a petsc program that calls PetscViewerHDF5Open. > I have added #include in the header, but the > compilation fails with error: > use of undeclared identifier PetscViewerHDF5Open > > I have asked a colleague to run the same program in a newer version 3.14 > or 3.15 and everything seems to work properly. Does it mean that I have to > update my petsc version necessarily? > I see the declaration there in version 3.12: https://gitlab.com/petsc/petsc/-/blob/v3.12/include/petscviewerhdf5.h#L43 Can you send the entire error output? Thanks, Matt > Many Thanks > Armando > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 10 13:40:45 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Aug 2021 13:40:45 -0500 Subject: [petsc-users] use of undeclared identifier PetscViewerHDF5Open - in Petsc 3.12 In-Reply-To: References: Message-ID: <4CBB0226-798D-4A8B-BDDB-FD88042E231E@petsc.dev> Likely your install of PETSc was not configured for HDF5. 
Use ./configure --download-hdf5 > On Aug 10, 2021, at 8:23 AM, Matthew Knepley wrote: > > On Tue, Aug 10, 2021 at 5:35 AM Armando Coco > wrote: > Hello, > > I am trying to compile a petsc program that calls PetscViewerHDF5Open. > I have added #include in the header, but the compilation fails with error: > use of undeclared identifier PetscViewerHDF5Open > > I have asked a colleague to run the same program in a newer version 3.14 or 3.15 and everything seems to work properly. Does it mean that I have to update my petsc version necessarily? > > I see the declaration there in version 3.12: > > https://gitlab.com/petsc/petsc/-/blob/v3.12/include/petscviewerhdf5.h#L43 > > Can you send the entire error output? > > Thanks, > > Matt > > Many Thanks > Armando > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From armandococo28 at gmail.com Wed Aug 11 08:39:28 2021 From: armandococo28 at gmail.com (Armando Coco) Date: Wed, 11 Aug 2021 14:39:28 +0100 Subject: [petsc-users] use of undeclared identifier PetscViewerHDF5Open - in Petsc 3.12 In-Reply-To: <4CBB0226-798D-4A8B-BDDB-FD88042E231E@petsc.dev> References: <4CBB0226-798D-4A8B-BDDB-FD88042E231E@petsc.dev> Message-ID: Yes, it works with ./configure --download-hdf5 Thank you!! Armando Il giorno mar 10 ago 2021 alle ore 19:40 Barry Smith ha scritto: > > Likely your install of PETSc was not configured for HDF5. Use > ./configure --download-hdf5 > > > > On Aug 10, 2021, at 8:23 AM, Matthew Knepley wrote: > > On Tue, Aug 10, 2021 at 5:35 AM Armando Coco > wrote: > >> Hello, >> >> I am trying to compile a petsc program that calls PetscViewerHDF5Open. >> I have added #include in the header, but the >> compilation fails with error: >> use of undeclared identifier PetscViewerHDF5Open >> >> I have asked a colleague to run the same program in a newer version 3.14 >> or 3.15 and everything seems to work properly. Does it mean that I have to >> update my petsc version necessarily? >> > > I see the declaration there in version 3.12: > > > https://gitlab.com/petsc/petsc/-/blob/v3.12/include/petscviewerhdf5.h#L43 > > Can you send the entire error output? > > Thanks, > > Matt > > >> Many Thanks >> Armando >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aduarteg at utexas.edu Wed Aug 11 14:33:14 2021 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Wed, 11 Aug 2021 14:33:14 -0500 Subject: [petsc-users] Concatenating DM vectors Message-ID: Good morning, I am currently handling a structured dmda object with more than one field. In some intermediate operations, I have to create and handle vectors of a size that corresponds to the same dmda with one field only. After that, it would be very useful to concatenate these vectors and then use them with matrices of the size of the original dmda (more than one field), I hope the vectors keep their i,j structure from the dmda. 
I tried using VecConcatenate but it seems to be scrambling the vector without the i,j arrangement, and the only other way I can think of is using a for loop over every grid point which seems cumbersome. Any suggestions for this problem? -Alfredo -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlmackie862 at gmail.com Wed Aug 11 14:47:54 2021 From: rlmackie862 at gmail.com (Randall Mackie) Date: Wed, 11 Aug 2021 12:47:54 -0700 Subject: [petsc-users] Concatenating DM vectors In-Reply-To: References: Message-ID: <102AE53C-BAEA-4750-A997-3220EC45BFEE@gmail.com> Hi Alfredo Take a look at VecStrideGather and VecStrideScatter?.maybe these are what you want? https://petsc.org/release/docs/manualpages/Vec/VecStrideGather.html https://petsc.org/release/docs/manualpages/Vec/VecStrideScatter.html Randy M. > On Aug 11, 2021, at 12:33 PM, Alfredo J Duarte Gomez wrote: > > Good morning, > > I am currently handling a structured dmda object with more than one field. > > In some intermediate operations, I have to create and handle vectors of a size that corresponds to the same dmda with one field only. > > After that, it would be very useful to concatenate these vectors and then use them with matrices of the size of the original dmda (more than one field), I hope the vectors keep their i,j structure from the dmda. > > I tried using VecConcatenate but it seems to be scrambling the vector without the i,j arrangement, and the only other way I can think of is using a for loop over every grid point which seems cumbersome. > > Any suggestions for this problem? > > -Alfredo > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From aduarteg at utexas.edu Thu Aug 12 09:43:59 2021 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Thu, 12 Aug 2021 09:43:59 -0500 Subject: [petsc-users] Sparse Matrix Matrix Multiply Message-ID: Good morning, I am currently having some trouble in the creation of some matrices. I am using structured dmda objects to create matrices using the DMCreate() function. One of these matrices will be the result of a matrix-matrix product of two of these dm matrices. I know that the matrix product will have more nonzero entries or at least a bigger stencil than the original dm matrices, however I accounted for that when I set the DMDA stencil width in the initial creation. The problem is that even with that, the resulting matrix-matrix product has a bigger stencil as evidenced by failure in subsequent matrix copy/addition operations using SAME_NONZERO_PATTERN. Judging by the difference of the nonzero entries I believe that initial zero entries (the ones I initialized to eventually hold this expaned stencil) on the original dm matrices are being combined to further expand the stencil of the product matrix. Is there any way of getting a matrix-matrix product that will keep the same nonzero pattern as the dm matrices? I have tried both MatMatMult() and the MatProductCreate() sequence so far, but both produce nonzero patterns that do not match the dm nonzero pattern. Thank you, -Alfredo -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Thu Aug 12 10:31:10 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 12 Aug 2021 11:31:10 -0400 Subject: [petsc-users] Sparse Matrix Matrix Multiply In-Reply-To: References: Message-ID: On Thu, Aug 12, 2021 at 10:44 AM Alfredo J Duarte Gomez wrote: > Good morning, > > I am currently having some trouble in the creation of some matrices. > > I am using structured dmda objects to create matrices using the DMCreate() > function. > > One of these matrices will be the result of a matrix-matrix product of two > of these dm matrices. > > I know that the matrix product will have more nonzero entries or at least > a bigger stencil than the original dm matrices, however I accounted for > that when I set the DMDA stencil width in the initial creation. > By default, we put zeros into those locations, so you would expand that stencil when doing MatMatMult(). You can use -dm_preallocate_only to prevent the zeros from being included. However, then your target matrix would not have those locations, so you would need to turn that off before creating the product matrix, or you could just make two DMDA with different stencils, since they are really small. This later solutions sounds cleaner to me. Thanks, Matt > The problem is that even with that, the resulting matrix-matrix product > has a bigger stencil as evidenced by failure in subsequent matrix > copy/addition operations using SAME_NONZERO_PATTERN. > > Judging by the difference of the nonzero entries I believe that initial > zero entries (the ones I initialized to eventually hold this > expaned stencil) on the original dm matrices are being combined to further > expand the stencil of the product matrix. > > Is there any way of getting a matrix-matrix product that will keep the > same nonzero pattern as the dm matrices? > > I have tried both MatMatMult() and the MatProductCreate() sequence so far, > but both produce nonzero patterns that do not match the dm nonzero pattern. > > Thank you, > > -Alfredo > > > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 12 11:09:59 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 12 Aug 2021 11:09:59 -0500 Subject: [petsc-users] Sparse Matrix Matrix Multiply In-Reply-To: References: Message-ID: I don't understand. Why do you wish the new matrix-matrix product vector to have the same nonzero pattern as the basic dm matrix? If you multiple two dm matrices together it will generally have a larger stencil then the dm matrix but this is normal and the new product matrix handles it correctly. You should not copy this new "larger" matrix into a dm matrix. When you do MatAXPY() or MatAYPX() you should put the result into the product matrix, not the dm matrix and you can use SUBSET_NONZERO_PATTERN to make it reasonably efficient. Barry > On Aug 12, 2021, at 10:31 AM, Matthew Knepley wrote: > > On Thu, Aug 12, 2021 at 10:44 AM Alfredo J Duarte Gomez > wrote: > Good morning, > > I am currently having some trouble in the creation of some matrices. > > I am using structured dmda objects to create matrices using the DMCreate() function. 
> > One of these matrices will be the result of a matrix-matrix product of two of these dm matrices. > > I know that the matrix product will have more nonzero entries or at least a bigger stencil than the original dm matrices, however I accounted for that when I set the DMDA stencil width in the initial creation. > > By default, we put zeros into those locations, so you would expand that stencil when doing MatMatMult(). You can use > > -dm_preallocate_only > > to prevent the zeros from being included. However, then your target matrix would not have those locations, so you would > need to turn that off before creating the product matrix, or you could just make two DMDA with different stencils, since they > are really small. This later solutions sounds cleaner to me. > > Thanks, > > Matt > > The problem is that even with that, the resulting matrix-matrix product has a bigger stencil as evidenced by failure in subsequent matrix copy/addition operations using SAME_NONZERO_PATTERN. > > Judging by the difference of the nonzero entries I believe that initial zero entries (the ones I initialized to eventually hold this expaned stencil) on the original dm matrices are being combined to further expand the stencil of the product matrix. > > Is there any way of getting a matrix-matrix product that will keep the same nonzero pattern as the dm matrices? > > I have tried both MatMatMult() and the MatProductCreate() sequence so far, but both produce nonzero patterns that do not match the dm nonzero pattern. > > Thank you, > > -Alfredo > > > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel.td19 at outlook.com Fri Aug 13 23:19:19 2021 From: miguel.td19 at outlook.com (Miguel Angel Tapia) Date: Sat, 14 Aug 2021 04:19:19 +0000 Subject: [petsc-users] Numbering convention Message-ID: Hello. First of all thanks for the answers to the previous questions. They were really useful to me. Now I am facing a new problem. The code in which I want to implement DMPlex has a specific order in which it orders the elements that make up other elements for meshes of tetrahedra. But when I get the elements that make up some point of the DMPlex DAG they don't match what I need. So I would like to know what is the numbering convention for meshes of tetrahedra that is used in DMPlex? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Aug 14 08:27:52 2021 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 14 Aug 2021 09:27:52 -0400 Subject: [petsc-users] Numbering convention In-Reply-To: References: Message-ID: On Sat, Aug 14, 2021 at 12:19 AM Miguel Angel Tapia wrote: > Hello. First of all thanks for the answers to the previous questions. They > were really useful to me. > > Now I am facing a new problem. The code in which I want to implement > DMPlex has a specific order in which it orders the elements that make up > other elements for meshes of tetrahedra. But when I get the elements that > make up some point of the DMPlex DAG they don't match what I need. So I > would like to know what is the numbering convention for meshes of > tetrahedra that is used in DMPlex? 
> We need to be a little more precise, so I can understand what you need. We usually begin with a cell associated to a set of vertices. Our reference tetrahedron is composed of the vertices, numbered 0 to 3: (-1, -1, -1) -- (-1, 1, -1) -- (1, -1, -1) -- (-1, -1, 1) The triangular faces, numbers 0 to 3, are composed of the vertices {0, 1, 2}, {0, 3, 1}, {0, 2, 3}, {2, 1, 3} Notice that they all have outward normal. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zjorti at lanl.gov Mon Aug 16 18:56:36 2021 From: zjorti at lanl.gov (Jorti, Zakariae) Date: Mon, 16 Aug 2021 23:56:36 +0000 Subject: [petsc-users] malloc error Message-ID: Hello, I am using TSSolve to solve a linear problem. In the FormIJacobian function that I provide to TSSetIJacobian, I first set the coefficients of both J and Jpre matrices the same way (J and Jpre matrices are equal in the first step). Then I call MatAXPY to prepare Jpre (Jpre := Jpre - another_matrix. So, Jpre and J are not equal anymore). But I get the error once FormIJacobian is called the second time inside TSSolve: "[0]PETSC ERROR: New nonzero at (5,1) caused a malloc Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check". It looks like MatAXPY changes the allocation of Jpre, which the second FormIJacobian does not like unless Jpre is destroyed first. Do you have any suggestions to fix this malloc issue? Thanks. Best regards, Zakariae -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Aug 16 23:40:20 2021 From: jed at jedbrown.org (Jed Brown) Date: Mon, 16 Aug 2021 22:40:20 -0600 Subject: [petsc-users] malloc error In-Reply-To: References: Message-ID: <87sfz865vv.fsf@jedbrown.org> "Jorti, Zakariae via petsc-users" writes: > Hello, > > > I am using TSSolve to solve a linear problem. > > In the FormIJacobian function that I provide to TSSetIJacobian, I first set the coefficients of both J and Jpre matrices the same way (J and Jpre matrices are equal in the first step). Then I call MatAXPY to prepare Jpre (Jpre := Jpre - another_matrix. So, Jpre and J are not equal anymore). How do you call MatAXPY? What MatStructure arg are you passing? What is the sparsity pattern of another_matrix relative to Jpre? > > But I get the error once FormIJacobian is called the second time inside TSSolve: > > "[0]PETSC ERROR: New nonzero at (5,1) caused a malloc > > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check". > > > It looks like MatAXPY changes the allocation of Jpre, which the second FormIJacobian does not like unless Jpre is destroyed first. > > > Do you have any suggestions to fix this malloc issue? > > Thanks. > > > Best regards, > > > Zakariae From zjorti at lanl.gov Tue Aug 17 15:51:16 2021 From: zjorti at lanl.gov (Jorti, Zakariae) Date: Tue, 17 Aug 2021 20:51:16 +0000 Subject: [petsc-users] [EXTERNAL] Re: malloc error In-Reply-To: <87sfz865vv.fsf@jedbrown.org> References: , <87sfz865vv.fsf@jedbrown.org> Message-ID: <9220838398464eaf8961b78523988d90@lanl.gov> Hello, Thank you for your reply. The problem is now fixed. The issue was actually with MatZeroRowsIS that was called after MatAXPY to cancel the boundary rows of Jpre. It seems to change the non-zero pattern of Jpre. 
I added MatSetOption(Jpre,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE); to make sure it does not happen. Thanks. Zakariae ________________________________ From: Jed Brown Sent: Monday, August 16, 2021 10:40:20 PM To: Jorti, Zakariae; petsc-users at mcs.anl.gov Cc: Tang, Xianzhu Subject: [EXTERNAL] Re: [petsc-users] malloc error "Jorti, Zakariae via petsc-users" writes: > Hello, > > > I am using TSSolve to solve a linear problem. > > In the FormIJacobian function that I provide to TSSetIJacobian, I first set the coefficients of both J and Jpre matrices the same way (J and Jpre matrices are equal in the first step). Then I call MatAXPY to prepare Jpre (Jpre := Jpre - another_matrix. So, Jpre and J are not equal anymore). How do you call MatAXPY? What MatStructure arg are you passing? What is the sparsity pattern of another_matrix relative to Jpre? > > But I get the error once FormIJacobian is called the second time inside TSSolve: > > "[0]PETSC ERROR: New nonzero at (5,1) caused a malloc > > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check". > > > It looks like MatAXPY changes the allocation of Jpre, which the second FormIJacobian does not like unless Jpre is destroyed first. > > > Do you have any suggestions to fix this malloc issue? > > Thanks. > > > Best regards, > > > Zakariae -------------- next part -------------- An HTML attachment was scrubbed... URL: From yuf2 at rpi.edu Wed Aug 18 12:52:30 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Wed, 18 Aug 2021 13:52:30 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI Message-ID: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> Hi, I was trying to run a simulation with a PETSc-wrapped Hypre preconditioner, and encountered this problem: [dcs122:133012] Out of resources: all 4095 communicator IDs have been used. [19]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [19]PETSC ERROR: General MPI error [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error [19]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by CFSIfmyu Wed Aug 11 19:51:47 2021 [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 communicator IDs have been used. [18]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [18]PETSC ERROR: General MPI error [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error [18]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[18]PETSC ERROR: Petsc Release Version 3.15.2, unknown [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by CFSIfmyu Wed Aug 11 19:51:47 2021 [18]PETSC ERROR: Configure options --download-scalapack --download-mumps --download-hypre --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 --with-debugging=0 --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ [18]PETSC ERROR: #1 MatCreate_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 [18]PETSC ERROR: #2 MatSetType() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 [18]PETSC ERROR: #3 MatConvert_AIJ_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 [18]PETSC ERROR: #4 MatConvert() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 [18]PETSC ERROR: #5 PCSetUp_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 [18]PETSC ERROR: #6 PCSetUp() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 Configure options --download-scalapack --download-mumps --download-hypre --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 --with-debugging=0 --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ [19]PETSC ERROR: #1 MatCreate_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 [19]PETSC ERROR: #2 MatSetType() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 [19]PETSC ERROR: #3 MatConvert_AIJ_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 [19]PETSC ERROR: #4 MatConvert() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 [19]PETSC ERROR: #5 PCSetUp_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 [19]PETSC ERROR: #6 PCSetUp() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 It seems that MPI_Comm_dup() at petsc/src/mat/impls/hypre/mhypre.c:2120 caused the problem. Since mine is a time-dependent problem, MatCreate_HYPRE() is called every time the new system matrix is assembled. The above error message is reported after ~4095 calls of MatCreate_HYPRE(), which is around 455 time steps in my code. Here is some basic compiler information: IBM Spectrum MPI 10.4.0 GCC 8.4.1 I've never had this problem before with OpenMPI or MPICH implementation, so I was wondering if this can be resolved from my end, or it's an implementation specific problem. Thanks! Feimi -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Wed Aug 18 15:23:27 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 18 Aug 2021 15:23:27 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> Message-ID: On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu wrote: > Hi, > > I was trying to run a simulation with a PETSc-wrapped Hypre > preconditioner, and encountered this problem: > > [dcs122:133012] Out of resources: all 4095 communicator IDs have been used. 
> [19]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [19]PETSC ERROR: General MPI error > [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > [19]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown > [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by CFSIfmyu Wed > Aug 11 19:51:47 2021 > [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 communicator > IDs have been used. > [18]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [18]PETSC ERROR: General MPI error > [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > [18]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [18]PETSC ERROR: Petsc Release Version 3.15.2, unknown > [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by CFSIfmyu Wed > Aug 11 19:51:47 2021 > [18]PETSC ERROR: Configure options --download-scalapack --download-mumps > --download-hypre --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > --with-cudac=0 --with-debugging=0 > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > [18]PETSC ERROR: #1 > MatCreate_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > [18]PETSC ERROR: #2 MatSetType() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > [18]PETSC ERROR: #3 > MatConvert_AIJ_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > [18]PETSC ERROR: #4 MatConvert() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > [18]PETSC ERROR: #5 PCSetUp_HYPRE() > at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > [18]PETSC ERROR: #6 PCSetUp() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > Configure options --download-scalapack --download-mumps --download-hypre > --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 > --with-debugging=0 > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > [19]PETSC ERROR: #1 > MatCreate_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > [19]PETSC ERROR: #2 MatSetType() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > [19]PETSC ERROR: #3 > MatConvert_AIJ_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > [19]PETSC ERROR: #4 MatConvert() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > [19]PETSC ERROR: #5 PCSetUp_HYPRE() > at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > [19]PETSC ERROR: #6 PCSetUp() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > It seems that MPI_Comm_dup() at petsc/src/mat/impls/hypre/mhypre.c:2120 > caused the problem. Since mine is a time-dependent problem, > MatCreate_HYPRE() is called every time the new system matrix is assembled. > The above error message is reported after ~4095 calls of MatCreate_HYPRE(), > which is around 455 time steps in my code. Here is some basic compiler > information: > Can you destroy old matrices to free MPI communicators? Otherwise, you run into a limitation we knew before. 
> > IBM Spectrum MPI 10.4.0 > > GCC 8.4.1 > > I've never had this problem before with OpenMPI or MPICH implementation, > so I was wondering if this can be resolved from my end, or it's an > implementation specific problem. > > Thanks! > > Feimi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yuf2 at rpi.edu Wed Aug 18 15:31:30 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Wed, 18 Aug 2021 16:31:30 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> Message-ID: <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> Hi Junchao, Thank you for the suggestion! I'm using the deal.ii wrapper dealii::PETScWrappers::PreconditionBase to handle the PETSc preconditioners, and the wrappers does the destroy when the preconditioner is reinitialized or gets out of scope. I just double-checked, this is called to make sure the old matrices are destroyed: ?? void ?? PreconditionBase::clear() ?? { ???? matrix = nullptr; ???? if (pc != nullptr) ?????? { ???????? PetscErrorCode ierr = PCDestroy(&pc); ???????? pc????????????????? = nullptr; ???????? AssertThrow(ierr == 0, ExcPETScError(ierr)); ?????? } ?? } Thanks! Feimi On 8/18/21 4:23 PM, Junchao Zhang wrote: > > > > On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu > wrote: > > Hi, > > I was trying to run a simulation with a PETSc-wrapped Hypre > preconditioner, and encountered this problem: > > [dcs122:133012] Out of resources: all 4095 communicator IDs have > been used. > [19]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [19]PETSC ERROR: General MPI error > [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > [19]PETSC ERROR: See > https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble > shooting. > [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown > [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > CFSIfmyu Wed Aug 11 19:51:47 2021 > [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 > communicator IDs have been used. > [18]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [18]PETSC ERROR: General MPI error > [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > [18]PETSC ERROR: See > https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble > shooting. 
> [18]PETSC ERROR: Petsc Release Version 3.15.2, unknown > [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > CFSIfmyu Wed Aug 11 19:51:47 2021 > [18]PETSC ERROR: Configure options --download-scalapack > --download-mumps --download-hypre --with-cc=mpicc > --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 > --with-debugging=0 > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > [18]PETSC ERROR: #1 > MatCreate_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > [18]PETSC ERROR: #2 > MatSetType() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > [18]PETSC ERROR: #3 > MatConvert_AIJ_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > [18]PETSC ERROR: #4 > MatConvert() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > [18]PETSC ERROR: #5 > PCSetUp_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > [18]PETSC ERROR: #6 > PCSetUp() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > Configure options --download-scalapack --download-mumps > --download-hypre --with-cc=mpicc --with-cxx=mpicxx > --with-fc=mpif90 --with-cudac=0 --with-debugging=0 > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > [19]PETSC ERROR: #1 > MatCreate_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > [19]PETSC ERROR: #2 > MatSetType() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > [19]PETSC ERROR: #3 > MatConvert_AIJ_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > [19]PETSC ERROR: #4 > MatConvert() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > [19]PETSC ERROR: #5 > PCSetUp_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > [19]PETSC ERROR: #6 > PCSetUp() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > It seems that MPI_Comm_dup() at > petsc/src/mat/impls/hypre/mhypre.c:2120 caused the problem. Since > mine is a time-dependent problem, MatCreate_HYPRE() is called > every time the new system matrix is assembled. The above error > message is reported after ~4095 calls of MatCreate_HYPRE(), which > is around 455 time steps in my code. Here is some basic compiler > information: > > Can you destroy old matrices to free MPI communicators? Otherwise, you > run into a limitation we knew before. > > IBM Spectrum MPI 10.4.0 > > GCC 8.4.1 > > I've never had this problem before with OpenMPI or MPICH > implementation, so I was wondering if this can be resolved from my > end, or it's an implementation specific problem. > > Thanks! > > Feimi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yuf2 at rpi.edu Wed Aug 18 15:37:10 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Wed, 18 Aug 2021 16:37:10 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> Message-ID: <23b3ff76-d2fd-e966-a3d7-c8982c4c21ef@rpi.edu> My previous message may sound misleading. This problem happens despite the fact that the old matrices are destroyed. Feimi On 8/18/21 4:31 PM, Feimi Yu wrote: > > Hi Junchao, > > Thank you for the suggestion! 
I'm using the deal.ii wrapper > dealii::PETScWrappers::PreconditionBase to handle the PETSc > preconditioners, and the wrappers does the destroy when the > preconditioner is reinitialized or gets out of scope. I just > double-checked, this is called to make sure the old matrices are > destroyed: > > ?? void > ?? PreconditionBase::clear() > ?? { > ???? matrix = nullptr; > > ???? if (pc != nullptr) > ?????? { > ???????? PetscErrorCode ierr = PCDestroy(&pc); > ???????? pc????????????????? = nullptr; > ???????? AssertThrow(ierr == 0, ExcPETScError(ierr)); > ?????? } > ?? } > > Thanks! > > Feimi > > On 8/18/21 4:23 PM, Junchao Zhang wrote: >> >> >> >> On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu > > wrote: >> >> Hi, >> >> I was trying to run a simulation with a PETSc-wrapped Hypre >> preconditioner, and encountered this problem: >> >> [dcs122:133012] Out of resources: all 4095 communicator IDs have >> been used. >> [19]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [19]PETSC ERROR: General MPI error >> [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error >> [19]PETSC ERROR: See >> https://www.mcs.anl.gov/petsc/documentation/faq.html >> for >> trouble shooting. >> [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown >> [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by >> CFSIfmyu Wed Aug 11 19:51:47 2021 >> [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 >> communicator IDs have been used. >> [18]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [18]PETSC ERROR: General MPI error >> [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error >> [18]PETSC ERROR: See >> https://www.mcs.anl.gov/petsc/documentation/faq.html >> for >> trouble shooting. 
>> [18]PETSC ERROR: Petsc Release Version 3.15.2, unknown >> [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by >> CFSIfmyu Wed Aug 11 19:51:47 2021 >> [18]PETSC ERROR: Configure options --download-scalapack >> --download-mumps --download-hypre --with-cc=mpicc >> --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 >> --with-debugging=0 >> --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ >> [18]PETSC ERROR: #1 >> MatCreate_HYPRE() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 >> [18]PETSC ERROR: #2 >> MatSetType() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 >> [18]PETSC ERROR: #3 >> MatConvert_AIJ_HYPRE() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 >> [18]PETSC ERROR: #4 >> MatConvert() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 >> [18]PETSC ERROR: #5 >> PCSetUp_HYPRE() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 >> [18]PETSC ERROR: #6 >> PCSetUp() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 >> Configure options --download-scalapack --download-mumps >> --download-hypre --with-cc=mpicc --with-cxx=mpicxx >> --with-fc=mpif90 --with-cudac=0 --with-debugging=0 >> --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ >> [19]PETSC ERROR: #1 >> MatCreate_HYPRE() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 >> [19]PETSC ERROR: #2 >> MatSetType() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 >> [19]PETSC ERROR: #3 >> MatConvert_AIJ_HYPRE() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 >> [19]PETSC ERROR: #4 >> MatConvert() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 >> [19]PETSC ERROR: #5 >> PCSetUp_HYPRE() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 >> [19]PETSC ERROR: #6 >> PCSetUp() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 >> >> It seems that MPI_Comm_dup() at >> petsc/src/mat/impls/hypre/mhypre.c:2120 caused the problem. Since >> mine is a time-dependent problem, MatCreate_HYPRE() is called >> every time the new system matrix is assembled. The above error >> message is reported after ~4095 calls of MatCreate_HYPRE(), which >> is around 455 time steps in my code. Here is some basic compiler >> information: >> >> Can you destroy old matrices to free MPI communicators?? Otherwise, >> you run into a limitation we knew before. >> >> IBM Spectrum MPI 10.4.0 >> >> GCC 8.4.1 >> >> I've never had this problem before with OpenMPI or MPICH >> implementation, so I was wondering if this can be resolved from >> my end, or it's an implementation specific problem. >> >> Thanks! >> >> Feimi >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Aug 18 15:38:26 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 18 Aug 2021 15:38:26 -0500 (CDT) Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> Message-ID: <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> Is the communicator used to create PETSc objects MPI_COMM_WORLD? 
If so - try changing it to PETSC_COMM_WORLD Satish On Wed, 18 Aug 2021, Feimi Yu wrote: > Hi Junchao, > > Thank you for the suggestion! I'm using the deal.ii wrapper > dealii::PETScWrappers::PreconditionBase to handle the PETSc preconditioners, > and the wrappers does the destroy when the preconditioner is reinitialized or > gets out of scope. I just double-checked, this is called to make sure the old > matrices are destroyed: > > ?? void > ?? PreconditionBase::clear() > ?? { > ???? matrix = nullptr; > > ???? if (pc != nullptr) > ?????? { > ???????? PetscErrorCode ierr = PCDestroy(&pc); > ???????? pc????????????????? = nullptr; > ???????? AssertThrow(ierr == 0, ExcPETScError(ierr)); > ?????? } > ?? } > > Thanks! > > Feimi > > On 8/18/21 4:23 PM, Junchao Zhang wrote: > > > > > > > > On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu > > wrote: > > > > Hi, > > > > I was trying to run a simulation with a PETSc-wrapped Hypre > > preconditioner, and encountered this problem: > > > > [dcs122:133012] Out of resources: all 4095 communicator IDs have > > been used. > > [19]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [19]PETSC ERROR: General MPI error > > [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > > [19]PETSC ERROR: See > > https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble > > shooting. > > [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown > > [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > > CFSIfmyu Wed Aug 11 19:51:47 2021 > > [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 > > communicator IDs have been used. > > [18]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [18]PETSC ERROR: General MPI error > > [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > > [18]PETSC ERROR: See > > https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble > > shooting. 
> > [18]PETSC ERROR: Petsc Release Version 3.15.2, unknown > > [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > > CFSIfmyu Wed Aug 11 19:51:47 2021 > > [18]PETSC ERROR: Configure options --download-scalapack > > --download-mumps --download-hypre --with-cc=mpicc > > --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 > > --with-debugging=0 > > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > > [18]PETSC ERROR: #1 > > MatCreate_HYPRE() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > > [18]PETSC ERROR: #2 > > MatSetType() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > > [18]PETSC ERROR: #3 > > MatConvert_AIJ_HYPRE() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > > [18]PETSC ERROR: #4 > > MatConvert() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > > [18]PETSC ERROR: #5 > > PCSetUp_HYPRE() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > > [18]PETSC ERROR: #6 > > PCSetUp() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > Configure options --download-scalapack --download-mumps > > --download-hypre --with-cc=mpicc --with-cxx=mpicxx > > --with-fc=mpif90 --with-cudac=0 --with-debugging=0 > > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > > [19]PETSC ERROR: #1 > > MatCreate_HYPRE() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > > [19]PETSC ERROR: #2 > > MatSetType() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > > [19]PETSC ERROR: #3 > > MatConvert_AIJ_HYPRE() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > > [19]PETSC ERROR: #4 > > MatConvert() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > > [19]PETSC ERROR: #5 > > PCSetUp_HYPRE() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > > [19]PETSC ERROR: #6 > > PCSetUp() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > > > It seems that MPI_Comm_dup() at > > petsc/src/mat/impls/hypre/mhypre.c:2120 caused the problem. Since > > mine is a time-dependent problem, MatCreate_HYPRE() is called > > every time the new system matrix is assembled. The above error > > message is reported after ~4095 calls of MatCreate_HYPRE(), which > > is around 455 time steps in my code. Here is some basic compiler > > information: > > > > Can you destroy old matrices to free MPI communicators? Otherwise, you run > > into a limitation we knew before. > > > > IBM Spectrum MPI 10.4.0 > > > > GCC 8.4.1 > > > > I've never had this problem before with OpenMPI or MPICH > > implementation, so I was wondering if this can be resolved from my > > end, or it's an implementation specific problem. > > > > Thanks! > > > > Feimi > > > > From junchao.zhang at gmail.com Wed Aug 18 15:53:22 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 18 Aug 2021 15:53:22 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> Message-ID: Hi, Feimi, I need to consult Jed (cc'ed). 
Jed, is this an example of https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663? If Feimi really can not free matrices, then we just need to attach a hypre-comm to a petsc inner comm, and pass that to hypre. --Junchao Zhang On Wed, Aug 18, 2021 at 3:38 PM Satish Balay wrote: > Is the communicator used to create PETSc objects MPI_COMM_WORLD? > > If so - try changing it to PETSC_COMM_WORLD > > Satish > > On Wed, 18 Aug 2021, Feimi Yu wrote: > > > Hi Junchao, > > > > Thank you for the suggestion! I'm using the deal.ii wrapper > > dealii::PETScWrappers::PreconditionBase to handle the PETSc > preconditioners, > > and the wrappers does the destroy when the preconditioner is > reinitialized or > > gets out of scope. I just double-checked, this is called to make sure > the old > > matrices are destroyed: > > > > void > > PreconditionBase::clear() > > { > > matrix = nullptr; > > > > if (pc != nullptr) > > { > > PetscErrorCode ierr = PCDestroy(&pc); > > pc = nullptr; > > AssertThrow(ierr == 0, ExcPETScError(ierr)); > > } > > } > > > > Thanks! > > > > Feimi > > > > On 8/18/21 4:23 PM, Junchao Zhang wrote: > > > > > > > > > > > > On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu > > > wrote: > > > > > > Hi, > > > > > > I was trying to run a simulation with a PETSc-wrapped Hypre > > > preconditioner, and encountered this problem: > > > > > > [dcs122:133012] Out of resources: all 4095 communicator IDs have > > > been used. > > > [19]PETSC ERROR: --------------------- Error Message > > > -------------------------------------------------------------- > > > [19]PETSC ERROR: General MPI error > > > [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > > > [19]PETSC ERROR: See > > > https://www.mcs.anl.gov/petsc/documentation/faq.html > > > for trouble > > > shooting. > > > [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown > > > [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > > > CFSIfmyu Wed Aug 11 19:51:47 2021 > > > [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 > > > communicator IDs have been used. > > > [18]PETSC ERROR: --------------------- Error Message > > > -------------------------------------------------------------- > > > [18]PETSC ERROR: General MPI error > > > [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > > > [18]PETSC ERROR: See > > > https://www.mcs.anl.gov/petsc/documentation/faq.html > > > for trouble > > > shooting. 
> > > [18]PETSC ERROR: Petsc Release Version 3.15.2, unknown > > > [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > > > CFSIfmyu Wed Aug 11 19:51:47 2021 > > > [18]PETSC ERROR: Configure options --download-scalapack > > > --download-mumps --download-hypre --with-cc=mpicc > > > --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 > > > --with-debugging=0 > > > > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > > > [18]PETSC ERROR: #1 > > > MatCreate_HYPRE() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > > > [18]PETSC ERROR: #2 > > > MatSetType() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > > > [18]PETSC ERROR: #3 > > > MatConvert_AIJ_HYPRE() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > > > [18]PETSC ERROR: #4 > > > MatConvert() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > > > [18]PETSC ERROR: #5 > > > PCSetUp_HYPRE() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > > > [18]PETSC ERROR: #6 > > > PCSetUp() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > > Configure options --download-scalapack --download-mumps > > > --download-hypre --with-cc=mpicc --with-cxx=mpicxx > > > --with-fc=mpif90 --with-cudac=0 --with-debugging=0 > > > > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > > > [19]PETSC ERROR: #1 > > > MatCreate_HYPRE() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > > > [19]PETSC ERROR: #2 > > > MatSetType() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > > > [19]PETSC ERROR: #3 > > > MatConvert_AIJ_HYPRE() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > > > [19]PETSC ERROR: #4 > > > MatConvert() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > > > [19]PETSC ERROR: #5 > > > PCSetUp_HYPRE() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > > > [19]PETSC ERROR: #6 > > > PCSetUp() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > > > > > It seems that MPI_Comm_dup() at > > > petsc/src/mat/impls/hypre/mhypre.c:2120 caused the problem. Since > > > mine is a time-dependent problem, MatCreate_HYPRE() is called > > > every time the new system matrix is assembled. The above error > > > message is reported after ~4095 calls of MatCreate_HYPRE(), which > > > is around 455 time steps in my code. Here is some basic compiler > > > information: > > > > > > Can you destroy old matrices to free MPI communicators? Otherwise, you > run > > > into a limitation we knew before. > > > > > > IBM Spectrum MPI 10.4.0 > > > > > > GCC 8.4.1 > > > > > > I've never had this problem before with OpenMPI or MPICH > > > implementation, so I was wondering if this can be resolved from my > > > end, or it's an implementation specific problem. > > > > > > Thanks! > > > > > > Feimi > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yuf2 at rpi.edu Wed Aug 18 16:23:52 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Wed, 18 Aug 2021 17:23:52 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> Message-ID: <095ee8d1-56d9-7d4c-484c-7dc0b88e657c@rpi.edu> Hi Satish and Junchao, I just tried replacing all MPI_COMM_WORLD with PETSC_COMM_WORLD, but it didn't do the trick. One thing that interests me is that, I ran with 40 ranks but only 2 ranks reported the communicator error. I think this means at least the rest 38 ranks freed the communicators properly. Thanks! Feimi On 8/18/21 4:53 PM, Junchao Zhang wrote: > Hi, Feimi, > ? I need to?consult?Jed (cc'ed). > ? Jed, is this an example of > https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 > ? > If Feimi really can not free matrices, then we just need to attach a > hypre-comm to a petsc inner comm, and pass that to hypre. > > --Junchao Zhang > > > On Wed, Aug 18, 2021 at 3:38 PM Satish Balay > wrote: > > Is the communicator used to create PETSc objects MPI_COMM_WORLD? > > If so - try changing it to PETSC_COMM_WORLD > > Satish > > ?On Wed, 18 Aug 2021, Feimi Yu wrote: > > > Hi Junchao, > > > > Thank you for the suggestion! I'm using the deal.ii wrapper > > dealii::PETScWrappers::PreconditionBase to handle the PETSc > preconditioners, > > and the wrappers does the destroy when the preconditioner is > reinitialized or > > gets out of scope. I just double-checked, this is called to make > sure the old > > matrices are destroyed: > > > > ?? void > > ?? PreconditionBase::clear() > > ?? { > > ???? matrix = nullptr; > > > > ???? if (pc != nullptr) > > ?????? { > > ???????? PetscErrorCode ierr = PCDestroy(&pc); > > ???????? pc????????????????? = nullptr; > > ???????? AssertThrow(ierr == 0, ExcPETScError(ierr)); > > ?????? } > > ?? } > > > > Thanks! > > > > Feimi > > > > On 8/18/21 4:23 PM, Junchao Zhang wrote: > > > > > > > > > > > > On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu > > > >> wrote: > > > > > >? ? ?Hi, > > > > > >? ? ?I was trying to run a simulation with a PETSc-wrapped Hypre > > >? ? ?preconditioner, and encountered this problem: > > > > > >? ? ?[dcs122:133012] Out of resources: all 4095 communicator > IDs have > > >? ? ?been used. > > >? ? ?[19]PETSC ERROR: --------------------- Error Message > > > ?-------------------------------------------------------------- > > >? ? ?[19]PETSC ERROR: General MPI error > > >? ? ?[19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > > >? ? ?[19]PETSC ERROR: See > > > https://www.mcs.anl.gov/petsc/documentation/faq.html > > > >? ? ? > for trouble > > >? ? ?shooting. > > >? ? ?[19]PETSC ERROR: Petsc Release Version 3.15.2, unknown > > >? ? ?[19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > > >? ? ?CFSIfmyu Wed Aug 11 19:51:47 2021 > > >? ? ?[19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 > > >? ? ?communicator IDs have been used. > > >? ? ?[18]PETSC ERROR: --------------------- Error Message > > > ?-------------------------------------------------------------- > > >? ? ?[18]PETSC ERROR: General MPI error > > >? ? ?[18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > > >? ? ?[18]PETSC ERROR: See > > > https://www.mcs.anl.gov/petsc/documentation/faq.html > > > >? ? ? > for trouble > > >? ? ?shooting. > > >? ? 
?[18]PETSC ERROR: Petsc Release Version 3.15.2, unknown > > >? ? ?[18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > > >? ? ?CFSIfmyu Wed Aug 11 19:51:47 2021 > > >? ? ?[18]PETSC ERROR: Configure options --download-scalapack > > >? ? ?--download-mumps --download-hypre --with-cc=mpicc > > >? ? ?--with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 > > >? ? ?--with-debugging=0 > > > > ?--with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > > >? ? ?[18]PETSC ERROR: #1 > > > >? ? ?MatCreate_HYPRE() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > > >? ? ?[18]PETSC ERROR: #2 > > > >? ? ?MatSetType() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > > >? ? ?[18]PETSC ERROR: #3 > > > >? ? ?MatConvert_AIJ_HYPRE() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > > >? ? ?[18]PETSC ERROR: #4 > > > >? ? ?MatConvert() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > > >? ? ?[18]PETSC ERROR: #5 > > > >? ? ?PCSetUp_HYPRE() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > > >? ? ?[18]PETSC ERROR: #6 > > > >? ? ?PCSetUp() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > >? ? ?Configure options --download-scalapack --download-mumps > > >? ? ?--download-hypre --with-cc=mpicc --with-cxx=mpicxx > > >? ? ?--with-fc=mpif90 --with-cudac=0 --with-debugging=0 > > > > ?--with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > > >? ? ?[19]PETSC ERROR: #1 > > > >? ? ?MatCreate_HYPRE() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > > >? ? ?[19]PETSC ERROR: #2 > > > >? ? ?MatSetType() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > > >? ? ?[19]PETSC ERROR: #3 > > > >? ? ?MatConvert_AIJ_HYPRE() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > > >? ? ?[19]PETSC ERROR: #4 > > > >? ? ?MatConvert() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > > >? ? ?[19]PETSC ERROR: #5 > > > >? ? ?PCSetUp_HYPRE() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > > >? ? ?[19]PETSC ERROR: #6 > > > >? ? ?PCSetUp() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > > > > >? ? ?It seems that MPI_Comm_dup() at > > >? ? ?petsc/src/mat/impls/hypre/mhypre.c:2120 caused the > problem. Since > > >? ? ?mine is a time-dependent problem, MatCreate_HYPRE() is called > > >? ? ?every time the new system matrix is assembled. The above error > > >? ? ?message is reported after ~4095 calls of > MatCreate_HYPRE(), which > > >? ? ?is around 455 time steps in my code. Here is some basic > compiler > > >? ? ?information: > > > > > > Can you destroy old matrices to free MPI communicators? > Otherwise, you run > > > into a limitation we knew before. > > > > > >? ? ?IBM Spectrum MPI 10.4.0 > > > > > >? ? ?GCC 8.4.1 > > > > > >? ? ?I've never had this problem before with OpenMPI or MPICH > > >? ? ?implementation, so I was wondering if this can be resolved > from my > > >? ? ?end, or it's an implementation specific problem. > > > > > >? ? ?Thanks! > > > > > >? ? ?Feimi > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sayosale at hotmail.com Thu Aug 19 00:12:14 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Thu, 19 Aug 2021 05:12:14 +0000 Subject: [petsc-users] Improving efficiency of slepc usage Message-ID: Dear All, I am planning on using slepc to do a large number of eigenvalue calculations of a generalized eigenvalue problem, called from a program written in fortran using MPI. Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, and on smaller test problems everything is working well; the matrices are efficiently and correctly constructed and slepc returns the correct spectrum. I am just now starting to move towards now solving the full-size 'production run' problems, and would appreciate some general advice on how to improve the solver's performance. In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, although in other cases I will also be interested in finding the eigenvalues whose real part is close to zero. A) Calling slepc 's EPS solver with the following options: -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations (examining the monitor output it did appear to be very slowly approaching convergence). B) On the same problem I have also tried a shift-invert transformation using the options -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? In the case of A) I can see from watching the slepc videos that increasing ncv may help, but I am wondering , since 600 is a large number of iterations, whether there maybe something else going on - e.g. perhaps some alternative preconditioner may help ? In the case of B), I guess there must be some mistake in these command line options? Again, any advice will be greatly appreciated. Best wishes, Dan. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Aug 19 02:58:29 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 19 Aug 2021 09:58:29 +0200 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 Jose > El 19 ago 2021, a las 7:12, dazza simplythebest escribi?: > > Dear All, > I am planning on using slepc to do a large number of eigenvalue calculations > of a generalized eigenvalue problem, called from a program written in fortran using MPI. 
> Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > and on smaller test problems everything is working well; the matrices are efficiently and > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > towards now solving the full-size 'production run' problems, and would appreciate some > general advice on how to improve the solver's performance. > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > although in other cases I will also be interested in finding the eigenvalues whose real part > is close to zero. > > A) > Calling slepc 's EPS solver with the following options: > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > (examining the monitor output it did appear to be very slowly approaching convergence). > > B) > On the same problem I have also tried a shift-invert transformation using the options > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > In the case of A) I can see from watching the slepc videos that increasing ncv > may help, but I am wondering , since 600 is a large number of iterations, whether there > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > In the case of B), I guess there must be some mistake in these command line options? > Again, any advice will be greatly appreciated. > Best wishes, Dan. From jed at jedbrown.org Thu Aug 19 08:01:55 2021 From: jed at jedbrown.org (Jed Brown) Date: Thu, 19 Aug 2021 07:01:55 -0600 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> Message-ID: <878s0x6118.fsf@jedbrown.org> Junchao Zhang writes: > Hi, Feimi, > I need to consult Jed (cc'ed). > Jed, is this an example of > https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663? > If Feimi really can not free matrices, then we just need to attach a > hypre-comm to a petsc inner comm, and pass that to hypre. Are there a bunch of solves as in that case? My understanding is that one should be able to MPI_Comm_dup/MPI_Comm_free as many times as you like, but the implementation has limits on how many communicators can co-exist at any one time. The many-at-once is what we encountered in that 2018 thread. One way to check would be to use a debugger or tracer to examine the stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. case 1: we'll find lots of dups without frees (until the end) because the user really wants lots of these existing at the same time. case 2: dups are unfreed because of reference counting issue/inessential references In case 1, I think the solution is as outlined in the thread, PETSc can create an inner-comm for Hypre. 
I think I'd prefer to attach it to the outer comm instead of the PETSc inner comm, but perhaps a case could be made either way. From yuf2 at rpi.edu Thu Aug 19 14:08:00 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Thu, 19 Aug 2021 15:08:00 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <878s0x6118.fsf@jedbrown.org> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: Hi Jed, In my case, I only have 2 hypre preconditioners at the same time, and they do not solve simultaneously, so it might not be case 1. I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on my own machine (with OpenMPI), all the communicators are freed from my observation. I could not test it with Spectrum MPI on the clusters immediately because all the dependencies were built in release mode. However, as I mentioned, I haven't had this problem with OpenMPI before, so I'm not sure if this is really an MPI implementation problem, or just because Spectrum MPI has less limit for the number of communicators, and/or this also depends on how many MPI ranks are used, as only 2 out of 40 ranks reported the error. As a workaround, I replaced the MPI_Comm_dup() at petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also removed the MPI_Comm_free() in the hypre destroyer. My code runs fine with Spectrum MPI now, but I don't think this is a long-term solution. Thanks! Feimi On 8/19/21 9:01 AM, Jed Brown wrote: > Junchao Zhang writes: > >> Hi, Feimi, >> I need to consult Jed (cc'ed). >> Jed, is this an example of >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663? >> If Feimi really can not free matrices, then we just need to attach a >> hypre-comm to a petsc inner comm, and pass that to hypre. > Are there a bunch of solves as in that case? > > My understanding is that one should be able to MPI_Comm_dup/MPI_Comm_free as many times as you like, but the implementation has limits on how many communicators can co-exist at any one time. The many-at-once is what we encountered in that 2018 thread. > > One way to check would be to use a debugger or tracer to examine the stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. > > case 1: we'll find lots of dups without frees (until the end) because the user really wants lots of these existing at the same time. > > case 2: dups are unfreed because of reference counting issue/inessential references > > > In case 1, I think the solution is as outlined in the thread, PETSc can create an inner-comm for Hypre. I think I'd prefer to attach it to the outer comm instead of the PETSc inner comm, but perhaps a case could be made either way. From knepley at gmail.com Thu Aug 19 14:14:37 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 19 Aug 2021 14:14:37 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu wrote: > Hi Jed, > > In my case, I only have 2 hypre preconditioners at the same time, and > they do not solve simultaneously, so it might not be case 1. 
> > I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on > my own machine (with OpenMPI), all the communicators are freed from my > observation. I could not test it with Spectrum MPI on the clusters > immediately because all the dependencies were built in release mode. > However, as I mentioned, I haven't had this problem with OpenMPI before, > so I'm not sure if this is really an MPI implementation problem, or just > because Spectrum MPI has less limit for the number of communicators, > and/or this also depends on how many MPI ranks are used, as only 2 out > of 40 ranks reported the error. > > As a workaround, I replaced the MPI_Comm_dup() at > petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also > removed the MPI_Comm_free() in the hypre destroyer. My code runs fine > with Spectrum MPI now, but I don't think this is a long-term solution. > If that runs, then it is definitely an MPI implementation problem. Thanks, Matt > Thanks! > > Feimi > > On 8/19/21 9:01 AM, Jed Brown wrote: > > Junchao Zhang writes: > > > >> Hi, Feimi, > >> I need to consult Jed (cc'ed). > >> Jed, is this an example of > >> > https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 > ? > >> If Feimi really can not free matrices, then we just need to attach a > >> hypre-comm to a petsc inner comm, and pass that to hypre. > > Are there a bunch of solves as in that case? > > > > My understanding is that one should be able to > MPI_Comm_dup/MPI_Comm_free as many times as you like, but the > implementation has limits on how many communicators can co-exist at any one > time. The many-at-once is what we encountered in that 2018 thread. > > > > One way to check would be to use a debugger or tracer to examine the > stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. > > > > case 1: we'll find lots of dups without frees (until the end) because > the user really wants lots of these existing at the same time. > > > > case 2: dups are unfreed because of reference counting issue/inessential > references > > > > > > In case 1, I think the solution is as outlined in the thread, PETSc can > create an inner-comm for Hypre. I think I'd prefer to attach it to the > outer comm instead of the PETSc inner comm, but perhaps a case could be > made either way. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Aug 19 15:29:33 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 19 Aug 2021 15:29:33 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu wrote: > Hi Jed, > > In my case, I only have 2 hypre preconditioners at the same time, and > they do not solve simultaneously, so it might not be case 1. > > I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on > my own machine (with OpenMPI), all the communicators are freed from my > observation. I could not test it with Spectrum MPI on the clusters > immediately because all the dependencies were built in release mode. 
> However, as I mentioned, I haven't had this problem with OpenMPI before, > so I'm not sure if this is really an MPI implementation problem, or just > because Spectrum MPI has less limit for the number of communicators, > and/or this also depends on how many MPI ranks are used, as only 2 out > of 40 ranks reported the error. > You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are paired. As a workaround, I replaced the MPI_Comm_dup() at > petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also > removed the MPI_Comm_free() in the hypre destroyer. My code runs fine > with Spectrum MPI now, but I don't think this is a long-term solution. > > Thanks! > > Feimi > > On 8/19/21 9:01 AM, Jed Brown wrote: > > Junchao Zhang writes: > > > >> Hi, Feimi, > >> I need to consult Jed (cc'ed). > >> Jed, is this an example of > >> > https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 > ? > >> If Feimi really can not free matrices, then we just need to attach a > >> hypre-comm to a petsc inner comm, and pass that to hypre. > > Are there a bunch of solves as in that case? > > > > My understanding is that one should be able to > MPI_Comm_dup/MPI_Comm_free as many times as you like, but the > implementation has limits on how many communicators can co-exist at any one > time. The many-at-once is what we encountered in that 2018 thread. > > > > One way to check would be to use a debugger or tracer to examine the > stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. > > > > case 1: we'll find lots of dups without frees (until the end) because > the user really wants lots of these existing at the same time. > > > > case 2: dups are unfreed because of reference counting issue/inessential > references > > > > > > In case 1, I think the solution is as outlined in the thread, PETSc can > create an inner-comm for Hypre. I think I'd prefer to attach it to the > outer comm instead of the PETSc inner comm, but perhaps a case could be > made either way. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 20 00:33:24 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 20 Aug 2021 00:33:24 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: It sounds like maybe the Spectrum MPI_Comm_free() is not returning the comm to the "pool" as available for future use; a very buggy MPI implementation. This can easily be checked in a tiny standalone MPI program that simply comm dups and frees thousands of times in a loop. Could even be a configure test (that requires running an MPI program). I do not remember if we ever tested this possibility; maybe and I forgot. If this is the problem we can provide a "work around" that attributes the new comm (to be passed to hypre) to the old comm with a reference count value also in the attribute. When the hypre matrix is created that count is (with the new comm) is set to 1, when the hypre matrix is freed that count is set to zero (but the comm is not freed), in the next call to create the hypre matrix when the attribute is found, the count is zero so PETSc knows it can pass the same comm again to the new hypre matrix. 
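A rough sketch of that work-around idea, written with plain MPI attribute calls rather than PETSc's actual internals; the names HypreCommCache, PetscHypreCommGet and PetscHypreCommRestore, and the single-slot cache, are illustrative assumptions and not existing PETSc or hypre API:

/* Illustrative sketch only: duplicate one communicator per user communicator,
 * cache it in an MPI attribute, and hand the same communicator to each
 * successive hypre matrix instead of calling MPI_Comm_dup()/MPI_Comm_free()
 * for every matrix.  A flag marks whether it is currently checked out. */
#include <mpi.h>
#include <stdlib.h>

static int hypre_comm_keyval = MPI_KEYVAL_INVALID;

typedef struct {
  MPI_Comm comm;    /* the communicator handed to hypre             */
  int      in_use;  /* 1 while a hypre matrix is currently using it */
} HypreCommCache;

/* Check a communicator out for a new hypre matrix. */
static MPI_Comm PetscHypreCommGet(MPI_Comm usercomm)
{
  HypreCommCache *cache = NULL;
  int             found = 0;

  if (hypre_comm_keyval == MPI_KEYVAL_INVALID) {
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                           &hypre_comm_keyval, NULL);
  }
  MPI_Comm_get_attr(usercomm, hypre_comm_keyval, &cache, &found);
  if (!found) {                 /* first hypre matrix on this comm: dup once and remember it */
    cache = (HypreCommCache *)malloc(sizeof(*cache));
    MPI_Comm_dup(usercomm, &cache->comm);
    cache->in_use = 0;
    MPI_Comm_set_attr(usercomm, hypre_comm_keyval, cache);
  }
  cache->in_use = 1;            /* a fuller version would keep several entries and dup another comm if all are busy */
  return cache->comm;
}

/* Check the communicator back in when the hypre matrix is destroyed; it is NOT freed. */
static void PetscHypreCommRestore(MPI_Comm usercomm)
{
  HypreCommCache *cache = NULL;
  int             found = 0;

  MPI_Comm_get_attr(usercomm, hypre_comm_keyval, &cache, &found);
  if (found) cache->in_use = 0;
}

Since the duplicated communicator is created at most once per user communicator and is never freed while the program runs, repeated matrix creation can no longer exhaust a 4095-ID pool; the price is the restriction described next.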
This will only allow one simultaneous hypre matrix to be created from the original comm. To allow multiply simultaneous hypre matrix one could have multiple comms and counts in the attribute and just check them until one finds an available one to reuse (or creates yet another one if all the current ones are busy with hypre matrices). So it is the same model as DMGetXXVector() where vectors are checked out and then checked in to be available later. This would solve the currently reported problem (if it is a buggy MPI that does not properly free comms), but not solve the MOOSE problem where 10,000 comms are needed at the same time. Barry > On Aug 19, 2021, at 3:29 PM, Junchao Zhang wrote: > > > > > On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu > wrote: > Hi Jed, > > In my case, I only have 2 hypre preconditioners at the same time, and > they do not solve simultaneously, so it might not be case 1. > > I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on > my own machine (with OpenMPI), all the communicators are freed from my > observation. I could not test it with Spectrum MPI on the clusters > immediately because all the dependencies were built in release mode. > However, as I mentioned, I haven't had this problem with OpenMPI before, > so I'm not sure if this is really an MPI implementation problem, or just > because Spectrum MPI has less limit for the number of communicators, > and/or this also depends on how many MPI ranks are used, as only 2 out > of 40 ranks reported the error. > You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are paired. > > As a workaround, I replaced the MPI_Comm_dup() at > petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also > removed the MPI_Comm_free() in the hypre destroyer. My code runs fine > with Spectrum MPI now, but I don't think this is a long-term solution. > > Thanks! > > Feimi > > On 8/19/21 9:01 AM, Jed Brown wrote: > > Junchao Zhang > writes: > > > >> Hi, Feimi, > >> I need to consult Jed (cc'ed). > >> Jed, is this an example of > >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 ? > >> If Feimi really can not free matrices, then we just need to attach a > >> hypre-comm to a petsc inner comm, and pass that to hypre. > > Are there a bunch of solves as in that case? > > > > My understanding is that one should be able to MPI_Comm_dup/MPI_Comm_free as many times as you like, but the implementation has limits on how many communicators can co-exist at any one time. The many-at-once is what we encountered in that 2018 thread. > > > > One way to check would be to use a debugger or tracer to examine the stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. > > > > case 1: we'll find lots of dups without frees (until the end) because the user really wants lots of these existing at the same time. > > > > case 2: dups are unfreed because of reference counting issue/inessential references > > > > > > In case 1, I think the solution is as outlined in the thread, PETSc can create an inner-comm for Hypre. I think I'd prefer to attach it to the outer comm instead of the PETSc inner comm, but perhaps a case could be made either way. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Aug 20 00:52:06 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 20 Aug 2021 00:52:06 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: <130D0B03-EBDC-4FBF-A051-12FDE0B51CAD@petsc.dev> With a couple of new PETSc utility functions we could use this approach generically to provide communicators to all external packages instead of directly use the dup and free specifically for each external package as we do now. > On Aug 20, 2021, at 12:33 AM, Barry Smith wrote: > > > It sounds like maybe the Spectrum MPI_Comm_free() is not returning the comm to the "pool" as available for future use; a very buggy MPI implementation. This can easily be checked in a tiny standalone MPI program that simply comm dups and frees thousands of times in a loop. Could even be a configure test (that requires running an MPI program). I do not remember if we ever tested this possibility; maybe and I forgot. > > If this is the problem we can provide a "work around" that attributes the new comm (to be passed to hypre) to the old comm with a reference count value also in the attribute. When the hypre matrix is created that count is (with the new comm) is set to 1, when the hypre matrix is freed that count is set to zero (but the comm is not freed), in the next call to create the hypre matrix when the attribute is found, the count is zero so PETSc knows it can pass the same comm again to the new hypre matrix. > > This will only allow one simultaneous hypre matrix to be created from the original comm. To allow multiply simultaneous hypre matrix one could have multiple comms and counts in the attribute and just check them until one finds an available one to reuse (or creates yet another one if all the current ones are busy with hypre matrices). So it is the same model as DMGetXXVector() where vectors are checked out and then checked in to be available later. This would solve the currently reported problem (if it is a buggy MPI that does not properly free comms), but not solve the MOOSE problem where 10,000 comms are needed at the same time. > > Barry > > > > > >> On Aug 19, 2021, at 3:29 PM, Junchao Zhang > wrote: >> >> >> >> >> On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu > wrote: >> Hi Jed, >> >> In my case, I only have 2 hypre preconditioners at the same time, and >> they do not solve simultaneously, so it might not be case 1. >> >> I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on >> my own machine (with OpenMPI), all the communicators are freed from my >> observation. I could not test it with Spectrum MPI on the clusters >> immediately because all the dependencies were built in release mode. >> However, as I mentioned, I haven't had this problem with OpenMPI before, >> so I'm not sure if this is really an MPI implementation problem, or just >> because Spectrum MPI has less limit for the number of communicators, >> and/or this also depends on how many MPI ranks are used, as only 2 out >> of 40 ranks reported the error. >> You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are paired. 
>> >> As a workaround, I replaced the MPI_Comm_dup() at >> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also >> removed the MPI_Comm_free() in the hypre destroyer. My code runs fine >> with Spectrum MPI now, but I don't think this is a long-term solution. >> >> Thanks! >> >> Feimi >> >> On 8/19/21 9:01 AM, Jed Brown wrote: >> > Junchao Zhang > writes: >> > >> >> Hi, Feimi, >> >> I need to consult Jed (cc'ed). >> >> Jed, is this an example of >> >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 ? >> >> If Feimi really can not free matrices, then we just need to attach a >> >> hypre-comm to a petsc inner comm, and pass that to hypre. >> > Are there a bunch of solves as in that case? >> > >> > My understanding is that one should be able to MPI_Comm_dup/MPI_Comm_free as many times as you like, but the implementation has limits on how many communicators can co-exist at any one time. The many-at-once is what we encountered in that 2018 thread. >> > >> > One way to check would be to use a debugger or tracer to examine the stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. >> > >> > case 1: we'll find lots of dups without frees (until the end) because the user really wants lots of these existing at the same time. >> > >> > case 2: dups are unfreed because of reference counting issue/inessential references >> > >> > >> > In case 1, I think the solution is as outlined in the thread, PETSc can create an inner-comm for Hypre. I think I'd prefer to attach it to the outer comm instead of the PETSc inner comm, but perhaps a case could be made either way. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sayosale at hotmail.com Fri Aug 20 06:55:29 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Fri, 20 Aug 2021 11:55:29 +0000 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: Dear Jose, Many thanks for your response, I have been investigating this issue with a few more calculations today, hence the slightly delayed response. The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original larger matrix but to lower accuracy. Results Small matrix (N= 21168) - everything good! This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target and also for a zero -eps_target. Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert I have just double checked again that the code does run properly when we use the -eps_largest_real option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the machine I was running it on. 
Furthermore the eigenvalues seem to be ballpark correct; for this large higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line with expectations. Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , whether or not I use a non-zero -eps_target. For reference this is the command line used : -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt To be precise the code crashes soon after calling EPSSolve (it successfully calls MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message as soon as EPSsolve is called. Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using -eps_largest_real ? The fact that the program works and produces correct results when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially more memory for example ? I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to get that working for the full-size problem. Many thanks and best wishes, Dan. ________________________________ From: Jose E. Roman Sent: Thursday, August 19, 2021 7:58 AM To: dazza simplythebest Cc: PETSc Subject: Re: [petsc-users] Improving efficiency of slepc usage In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 Jose > El 19 ago 2021, a las 7:12, dazza simplythebest escribi?: > > Dear All, > I am planning on using slepc to do a large number of eigenvalue calculations > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > and on smaller test problems everything is working well; the matrices are efficiently and > correctly constructed and slepc returns the correct spectrum. 
I am just now starting to move > towards now solving the full-size 'production run' problems, and would appreciate some > general advice on how to improve the solver's performance. > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > although in other cases I will also be interested in finding the eigenvalues whose real part > is close to zero. > > A) > Calling slepc 's EPS solver with the following options: > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > (examining the monitor output it did appear to be very slowly approaching convergence). > > B) > On the same problem I have also tried a shift-invert transformation using the options > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > In the case of A) I can see from watching the slepc videos that increasing ncv > may help, but I am wondering , since 600 is a large number of iterations, whether there > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > In the case of B), I guess there must be some mistake in these command line options? > Again, any advice will be greatly appreciated. > Best wishes, Dan. -------------- next part -------------- An HTML attachment was scrubbed... URL: From numbersixvs at gmail.com Fri Aug 20 03:02:22 2021 From: numbersixvs at gmail.com (=?UTF-8?B?0J3QsNC30LTRgNCw0YfRkdCyINCS0LjQutGC0L7RgA==?=) Date: Fri, 20 Aug 2021 11:02:22 +0300 Subject: [petsc-users] Euclid or Boomeramg vs ILU: questions. Message-ID: *Hello, dear PETSc team!* I have a 3D elasticity with heterogeneous properties problem. There is unstructured grid with aspect ratio varied from 4 to 25. Dirichlet BCs (bottom zero displacements) are imposed via linear constraint equations using Lagrange multipliers. Also, Neumann (traction) BCs are imposed on side edges of mesh. Gravity load is also accounted for. I can solve this problem with *dgmres solver* and *ILU* as a *preconditioner*. But ILU doesn`t support parallel computing, so I decided to use Euclid or Boomeramg as a preconditioner. The issue is in slow convergence and high memory consumption, much higher, than for ILU. E.g., for source matrix size 2.14 GB with *ILU-0 preconditioning* memory consumption is about 5.9 GB, and the process converges due to 767 iterations, and with *Euclid-0 preconditioning* memory consumption is about 8.7 GB, and the process converges due to 1732 iterations. One of the following preconditioners is currently in use: *ILU-0, ILU-1, Hypre (Euclid), Hypre (boomeramg)*. As a result of computations *(logs and memory logs are attached)*, the following is established for preconditioners: 1. *ILU-0*: does not always provide convergence (or provides, but slow); uses an acceptable amount of RAM; does not support parallel computing. 2. *ILU-1*: stable; memory consumption is much higher than that of ILU-0; does not support parallel computing. 3. 
*Euclid*: provides very slow convergence, calculations are performed several times slower than for ILU-0; memory consumption greatly exceeds both ILU-0 and ILU-1; supports parallel computing. Also ?drop tolerance? doesn?t provide enough accuracy in some cells, so I don?t use it. 4. *Boomeramg*: provides very slow convergence, calculations are performed several times slower than for ILU-0; memory consumption greatly exceeds both ILU-0 and ILU-1; supports parallel computing. In this regard, the following questions arose: 1. Is this behavior expected for HYPRE in computations with 1 MPI process? If not, is that problem can be related to *PETSc* or *HYPRE*? 2. Hypre (Euclid) has much fewer parameters than ILU. Among them is the factorization level *"-pc_hypre_euclid_level : Factorization levels (None)"* and its default value looks very strange, moreover, it doesn?t matter what factor is chosen -2, -1 or 0. Could it be that the parameter is confused with Column pivot tolerance in ILU - *"-pc_factor_column_pivot <-2.: -2.>: Column pivot tolerance (used only for some factorization) (PCFactorSetColumnPivot)"*? 3. What preconditioner would you recommend to: optimize *convergence*, *memory* consumption, add *parallel computing*? 4. How can we theoretically estimate memory costs with *ILU, Euclid, Boomeramg*? 5. At what stage are memory leaks most likely? In any case, thank you so much for your attention! Will be grateful for any response. Kind regards, Viktor Nazdrachev R&D senior researcher Geosteering Technologies LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: logs.rar Type: application/octet-stream Size: 90710 bytes Desc: not available URL: From joauma.marichal at uclouvain.be Fri Aug 20 03:33:03 2021 From: joauma.marichal at uclouvain.be (Joauma Marichal) Date: Fri, 20 Aug 2021 08:33:03 +0000 Subject: [petsc-users] Parallelize in the y direction Message-ID: Dear Sir or Madam, I am looking for advice regarding some of PETSc functionnalities. I am currently using PETSc to solve the Navier-Stokes equations on a 3D mesh decomposed over several processors. However, until now, the processors are distributed along the x and z directions but not along the y one. Indeed, at some point in the algorithm, I must solve a tridiagonal system that depends only on y. Until now, I have therefore performed something like this: for(int k = cornp->zs, kzs+cornp->zm; ++k){ for(int i = cornp->xs, ixs+cornp->xm; ++i){ Create and solve a tridiagonal system for all the y coordinates (which are on the same process) } However, I would like to decompose my mesh in the y direction (as this should improve the code efficiency). I managed to do so by creating a system based on the 3D DM of all my case (so 1 system of size x*y*z). Unfortunately, this does not seem to be very efficient. Do you have some advice on how to cut in the y direction while still being able to solve x*z systems of size y? Should I create 1D DMs? Thanks a lot for your help. Best regards, Joauma Marichal -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Fri Aug 20 08:58:45 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 20 Aug 2021 08:58:45 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: Feimi, if it is easy to reproduce, could you give instructions on how to reproduce that? PS: Spectrum MPI is based on OpenMPI. I don't understand why it has the problem but OpenMPI does not. It could be a bug in petsc or user's code. For reference counting on MPI_Comm, we already have petsc inner comm. I think we can reuse that. --Junchao Zhang On Fri, Aug 20, 2021 at 12:33 AM Barry Smith wrote: > > It sounds like maybe the Spectrum MPI_Comm_free() is not returning the > comm to the "pool" as available for future use; a very buggy MPI > implementation. This can easily be checked in a tiny standalone MPI program > that simply comm dups and frees thousands of times in a loop. Could even be > a configure test (that requires running an MPI program). I do not remember > if we ever tested this possibility; maybe and I forgot. > > If this is the problem we can provide a "work around" that attributes > the new comm (to be passed to hypre) to the old comm with a reference count > value also in the attribute. When the hypre matrix is created that count is > (with the new comm) is set to 1, when the hypre matrix is freed that count > is set to zero (but the comm is not freed), in the next call to create the > hypre matrix when the attribute is found, the count is zero so PETSc knows > it can pass the same comm again to the new hypre matrix. > > This will only allow one simultaneous hypre matrix to be created from the > original comm. To allow multiply simultaneous hypre matrix one could have > multiple comms and counts in the attribute and just check them until one > finds an available one to reuse (or creates yet another one if all the > current ones are busy with hypre matrices). So it is the same model as > DMGetXXVector() where vectors are checked out and then checked in to be > available later. This would solve the currently reported problem (if it is > a buggy MPI that does not properly free comms), but not solve the MOOSE > problem where 10,000 comms are needed at the same time. > > Barry > > > > > > On Aug 19, 2021, at 3:29 PM, Junchao Zhang > wrote: > > > > > On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu wrote: > >> Hi Jed, >> >> In my case, I only have 2 hypre preconditioners at the same time, and >> they do not solve simultaneously, so it might not be case 1. >> >> I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on >> my own machine (with OpenMPI), all the communicators are freed from my >> observation. I could not test it with Spectrum MPI on the clusters >> immediately because all the dependencies were built in release mode. >> However, as I mentioned, I haven't had this problem with OpenMPI before, >> so I'm not sure if this is really an MPI implementation problem, or just >> because Spectrum MPI has less limit for the number of communicators, >> and/or this also depends on how many MPI ranks are used, as only 2 out >> of 40 ranks reported the error. >> > You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two > ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are > paired. 
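Instead of a debugger, the MPI profiling layer can do the pairing check automatically: intercept MPI_Comm_dup and MPI_Comm_free, count them, and forward to the PMPI_ entry points. The following is only a sketch of that idea (the rank number 38 simply echoes the suggestion above); it is not part of PETSc or hypre.

    /* comm_trace.c: compile and link into the application (or build as a
       shared library and LD_PRELOAD it) so these wrappers shadow the MPI
       library's own symbols. The counters are not thread safe. */
    #include <mpi.h>
    #include <stdio.h>

    static int ndup = 0, nfree = 0;

    int MPI_Comm_dup(MPI_Comm comm, MPI_Comm *newcomm)
    {
      int rank;
      PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
      ndup++;
      if (rank == 38)
        printf("[rank %d] MPI_Comm_dup  #%d (outstanding: %d)\n", rank, ndup, ndup - nfree);
      return PMPI_Comm_dup(comm, newcomm);
    }

    int MPI_Comm_free(MPI_Comm *comm)
    {
      int rank;
      PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
      nfree++;
      if (rank == 38)
        printf("[rank %d] MPI_Comm_free #%d (outstanding: %d)\n", rank, nfree, ndup - nfree);
      return PMPI_Comm_free(comm);
    }

If the outstanding count stays small and bounded while the error still appears, that points at the implementation not recycling freed contexts rather than at unpaired dups in PETSc, hypre, or the application.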
> > As a workaround, I replaced the MPI_Comm_dup() at > >> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also >> removed the MPI_Comm_free() in the hypre destroyer. My code runs fine >> with Spectrum MPI now, but I don't think this is a long-term solution. >> >> Thanks! >> >> Feimi >> >> On 8/19/21 9:01 AM, Jed Brown wrote: >> > Junchao Zhang writes: >> > >> >> Hi, Feimi, >> >> I need to consult Jed (cc'ed). >> >> Jed, is this an example of >> >> >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 >> ? >> >> If Feimi really can not free matrices, then we just need to attach a >> >> hypre-comm to a petsc inner comm, and pass that to hypre. >> > Are there a bunch of solves as in that case? >> > >> > My understanding is that one should be able to >> MPI_Comm_dup/MPI_Comm_free as many times as you like, but the >> implementation has limits on how many communicators can co-exist at any one >> time. The many-at-once is what we encountered in that 2018 thread. >> > >> > One way to check would be to use a debugger or tracer to examine the >> stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. >> > >> > case 1: we'll find lots of dups without frees (until the end) because >> the user really wants lots of these existing at the same time. >> > >> > case 2: dups are unfreed because of reference counting >> issue/inessential references >> > >> > >> > In case 1, I think the solution is as outlined in the thread, PETSc can >> create an inner-comm for Hypre. I think I'd prefer to attach it to the >> outer comm instead of the PETSc inner comm, but perhaps a case could be >> made either way. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 20 09:12:58 2021 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 20 Aug 2021 09:12:58 -0500 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest wrote: > Dear Jose, > Many thanks for your response, I have been investigating this issue > with a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow > an easier exploration of things > I first downsized the resolution of the underlying fluid solver while > keeping all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the > same physical problem as the original > larger matrix but to lower accuracy. > > *Results* > > *Small matrix (N= 21168) - everything good!* > This converged when using the -eps_largest_real approach (taking 92 > iterations for nev=10, > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert > approach, converging > very impressively in a single iteration ! Interestingly it did this both > for a non-zero -eps_target > and also for a zero -eps_target. > > *Large matrix (N=50400)- works for -eps_largest_real , fails for st_type > sinvert * > I have just double checked again that the code does run properly when we > use the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, > tol= -eps_tol 5.0e-4 , ncv = 300) > and with these parameters convergence was obtained in 164 iterations, > which took 6 hours on the > machine I was running it on. 
Furthermore the eigenvalues seem to be > ballpark correct; for this large > higher resolution case (although with lower slepc tolerance) we obtain > 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same > physical problem but at lower resolution case) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which > means the agreement is in line > with expectations. > > *Unfortunately though the code does still crash though when I try to do > shift-invert for the large matrix case *, > whether or not I use a non-zero -eps_target. For reference this is the > command line used : > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 > -st_type sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it > successfully calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and > EPSSetFromOptions). > By crashes I mean that I do not even get any error messages from > slepc/PETSC, and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran > 'KILLED BY SIGNAL: 9 (Killed)' message > as soon as EPSsolve is called. > Hi Dan, It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with -start_in_debugger which should launch the debugger (usually), and then type cont to continue, and then where to get the stack trace when it crashes, or 'bt' on lldb. Thanks, Matt > Do you have any ideas as to why this larger matrix case should fail when > using shift-invert but succeed when using > -eps_largest_real ? The fact that the program works and produces correct > results > when using the -eps_largest_real option suggests that there is probably > nothing wrong with the specification > of the problem or the matrices ? It is strange how there is no error > message from slepc / Petsc ... the > only idea I have at the moment is that perhaps max memory has been > exceeded, which could cause such a sudden > shutdown? For your reference when running the large matrix case with the > -eps_largest_real option I am using > about 36 GB of the 148GB available on this machine - does the shift > invert approach require substantially > more memory for example ? > > I would be very grateful if you have any suggestions to resolve this > issue or even ways to clarify it further, > the performance I have seen with the shift-invert for the small matrix is > so impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > ------------------------------ > *From:* Jose E. Roman > *Sent:* Thursday, August 19, 2021 7:58 AM > *To:* dazza simplythebest > *Cc:* PETSc > *Subject:* Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have > small magnitude. I would not say 600 iterations is a lot, you probably need > many more. In most cases, approach B) is better because it improves > convergence of eigenvalues close to the target, but it requires prior > knowledge of your spectrum distribution in order to choose an appropriate > target. > > In B) what do you mean that it crashes. 
If you get an error about > factorization, it means that your A-matrix is singular, In that case, try > using a nonzero target -eps_target 0.1 > > Jose > > > > El 19 ago 2021, a las 7:12, dazza simplythebest > escribi?: > > > > Dear All, > > I am planning on using slepc to do a large number of > eigenvalue calculations > > of a generalized eigenvalue problem, called from a program written in > fortran using MPI. > > Thus far I have successfully installed the slepc/PETSc software, both > locally and on a cluster, > > and on smaller test problems everything is working well; the matrices > are efficiently and > > correctly constructed and slepc returns the correct spectrum. I am just > now starting to move > > towards now solving the full-size 'production run' problems, and would > appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx > whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will > be tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues > with the largest real part, > > although in other cases I will also be interested in finding the > eigenvalues whose real part > > is close to zero. > > > > A) > > Calling slepc 's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol > 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any > eigenvalues within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly > approaching convergence). > > > > B) > > On the same problem I have also tried a shift-invert transformation > using the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > -in this case the code crashed at the point it tried to call slepc, so > perhaps I have incorrectly specified these options ? > > > > > > Does anyone have any suggestions as to how to improve this performance ( > or find out more about the problem) ? > > In the case of A) I can see from watching the slepc videos that > increasing ncv > > may help, but I am wondering , since 600 is a large number of > iterations, whether there > > maybe something else going on - e.g. perhaps some alternative > preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command > line options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 20 09:14:51 2021 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 20 Aug 2021 09:14:51 -0500 Subject: [petsc-users] Parallelize in the y direction In-Reply-To: References: Message-ID: On Fri, Aug 20, 2021 at 7:53 AM Joauma Marichal < joauma.marichal at uclouvain.be> wrote: > Dear Sir or Madam, > > I am looking for advice regarding some of PETSc functionnalities. I am > currently using PETSc to solve the Navier-Stokes equations on a 3D mesh > decomposed over several processors. However, until now, the processors are > distributed along the x and z directions but not along the y one. 
Indeed, > at some point in the algorithm, I must solve a tridiagonal system that > depends only on y. Until now, I have therefore performed something like > this: > for(int k = cornp->zs, kzs+cornp->zm; ++k){ > for(int i = cornp->xs, ixs+cornp->xm; ++i){ > Create and solve a tridiagonal system for all the y coordinates > (which are on the same process) > } > However, I would like to decompose my mesh in the y direction (as this > should improve the code efficiency). > I managed to do so by creating a system based on the 3D DM of all my case > (so 1 system of size x*y*z). Unfortunately, this does not seem to be very > efficient. > Do you have some advice on how to cut in the y direction while still being > able to solve x*z systems of size y? Should I create 1D DMs? > 1) Are you using a 3D DMDA? 2) Is the coupling much different in the x and z than in the y direction? Thanks, Matt > Thanks a lot for your help. > > Best regards, > > Joauma Marichal > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhugp01 at nus.edu.sg Fri Aug 20 09:59:03 2021 From: zhugp01 at nus.edu.sg (Guangpu Zhu) Date: Fri, 20 Aug 2021 14:59:03 +0000 Subject: [petsc-users] Using Elemetal with petsc4py to solve AX = B paralelly Message-ID: Dear Sir/Madam, I am trying to use the petsc4py to solve AX = B parallelly, where A is a large dense matrix. The Elemental package in petsc4py is very suitable for the dense matrix, but I can't find any example or learning material about it on the PETSc website and other websites. I am writing this e-mail to ask if you can kindly provide a minimal example for solving a linear system based on Elemental with petsc4py. I am looking forward to hearing from you. Thank you very much. Best, Guangpu Zhu --- Guangpu Zhu (???) Research Associate, Department of Mechanical Engineering National University of Singapore Personal E-mail: zhugpupc at gmail.com Phone: (+65) 87581879 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Aug 20 11:20:24 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 20 Aug 2021 18:20:24 +0200 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: <922F34AC-1EB5-4A63-B25F-11C0007BD372@dsic.upv.es> Maybe too much fill-in during factorization. Try using an external linear solver such as MUMPS as explained in section 3.4.1 of SLEPc's users manual. Jose > El 20 ago 2021, a las 16:12, Matthew Knepley escribi?: > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest wrote: > Dear Jose, > Many thanks for your response, I have been investigating this issue with a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things > I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original > larger matrix but to lower accuracy. > > Results > > Small matrix (N= 21168) - everything good! 
> This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging > very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target > and also for a zero -eps_target. > > Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert > I have just double checked again that the code does run properly when we use the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) > and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the > machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large > higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line > with expectations. > > Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , > whether or not I use a non-zero -eps_target. For reference this is the command line used : > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it successfully calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). > By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message > as soon as EPSsolve is called. > > Hi Dan, > > It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with > > -start_in_debugger > > which should launch the debugger (usually), and then type > > cont > > to continue, and then > > where > > to get the stack trace when it crashes, or 'bt' on lldb. > > Thanks, > > Matt > > Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using > -eps_largest_real ? The fact that the program works and produces correct results > when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification > of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the > only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden > shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using > about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially > more memory for example ? > > I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, > the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > From: Jose E. 
Roman > Sent: Thursday, August 19, 2021 7:58 AM > To: dazza simplythebest > Cc: PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. > > In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 > > Jose > > > > El 19 ago 2021, a las 7:12, dazza simplythebest escribi?: > > > > Dear All, > > I am planning on using slepc to do a large number of eigenvalue calculations > > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > > and on smaller test problems everything is working well; the matrices are efficiently and > > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > > towards now solving the full-size 'production run' problems, and would appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > > although in other cases I will also be interested in finding the eigenvalues whose real part > > is close to zero. > > > > A) > > Calling slepc 's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly approaching convergence). > > > > B) > > On the same problem I have also tried a shift-invert transformation using the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > > In the case of A) I can see from watching the slepc videos that increasing ncv > > may help, but I am wondering , since 600 is a large number of iterations, whether there > > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command line options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From mfadams at lbl.gov Fri Aug 20 13:21:29 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 20 Aug 2021 14:21:29 -0400 Subject: [petsc-users] Euclid or Boomeramg vs ILU: questions. 
In-Reply-To: References: Message-ID: Constraints are a pain with scalable/iterative solvers. If you order the constraints last then ILU should work as well as it can work, but AMG gets confused by the constraint equations. You could look at PETSc's Stokes solvers, but it would be best if you could remove the constrained equations from your system if they are just simple point wise BC's. Mark On Fri, Aug 20, 2021 at 8:53 AM ????????? ?????? wrote: > *Hello, dear PETSc team!* > > > > I have a 3D elasticity with heterogeneous properties problem. There is > unstructured grid with aspect ratio varied from 4 to 25. Dirichlet BCs > (bottom zero displacements) are imposed via linear constraint equations > using Lagrange multipliers. Also, Neumann (traction) BCs are imposed on > side edges of mesh. Gravity load is also accounted for. > > I can solve this problem with *dgmres solver* and *ILU* as a > *preconditioner*. But ILU doesn`t support parallel computing, so I > decided to use Euclid or Boomeramg as a preconditioner. The issue is in > slow convergence and high memory consumption, much higher, than for ILU. > > E.g., for source matrix size 2.14 GB with *ILU-0 preconditioning* memory > consumption is about 5.9 GB, and the process converges due to 767 > iterations, and with *Euclid-0 preconditioning* memory consumption is > about 8.7 GB, and the process converges due to 1732 iterations. > > One of the following preconditioners is currently in use: *ILU-0, ILU-1, > Hypre (Euclid), Hypre (boomeramg)*. > > As a result of computations *(logs and memory logs are attached)*, the > following is established for preconditioners: > > 1. *ILU-0*: does not always provide convergence (or provides, but slow); > uses an acceptable amount of RAM; does not support parallel computing. > > 2. *ILU-1*: stable; memory consumption is much higher than that of ILU-0; > does not support parallel computing. > > 3. *Euclid*: provides very slow convergence, calculations are performed > several times slower than for ILU-0; memory consumption greatly exceeds > both ILU-0 and ILU-1; supports parallel computing. Also ?drop tolerance? > doesn?t provide enough accuracy in some cells, so I don?t use it. > > 4. *Boomeramg*: provides very slow convergence, calculations are > performed several times slower than for ILU-0; memory consumption greatly > exceeds both ILU-0 and ILU-1; supports parallel computing. > > > > In this regard, the following questions arose: > > 1. Is this behavior expected for HYPRE in computations with 1 MPI process? > If not, is that problem can be related to *PETSc* or *HYPRE*? > > 2. Hypre (Euclid) has much fewer parameters than ILU. Among them is the > factorization level *"-pc_hypre_euclid_level : > Factorization levels (None)"* and its default value looks very strange, > moreover, it doesn?t matter what factor is chosen -2, -1 or 0. Could it be > that the parameter is confused with Column pivot tolerance in ILU - *"-pc_factor_column_pivot > <-2.: -2.>: Column pivot tolerance (used only for some factorization) > (PCFactorSetColumnPivot)"*? > > 3. What preconditioner would you recommend to: optimize *convergence*, > *memory* consumption, add *parallel computing*? > > 4. How can we theoretically estimate memory costs with *ILU, Euclid, > Boomeramg*? > > 5. At what stage are memory leaks most likely? > > > > In any case, thank you so much for your attention! Will be grateful for > any response. 
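To illustrate removing simple pointwise displacement BCs from the system rather than enforcing them with Lagrange multipliers, the usual PETSc pattern is to zero the constrained rows and columns of the assembled matrix, put 1 on the diagonal, and fix the right-hand side. A minimal sketch, assuming the global indices of the constrained dofs have already been gathered into rows[] (the names here are illustrative, not taken from the original code):

    #include <petscmat.h>

    /* Eliminate zero-displacement Dirichlet dofs from an assembled system
       A x = b instead of adding Lagrange multipliers. rows[] holds the
       global indices of the constrained dofs owned by this rank. */
    PetscErrorCode ApplyZeroDirichlet(Mat A, Vec b, PetscInt nbc, const PetscInt rows[])
    {
      Vec            ubc;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = VecDuplicate(b, &ubc);CHKERRQ(ierr);
      ierr = VecZeroEntries(ubc);CHKERRQ(ierr);   /* prescribed values are zero */
      /* zero the constrained rows and columns, put 1.0 on the diagonal,
         and adjust the right-hand side entries of those dofs */
      ierr = MatZeroRowsColumns(A, nbc, rows, 1.0, ubc, b);CHKERRQ(ierr);
      ierr = VecDestroy(&ubc);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

Zeroing the columns as well as the rows keeps the operator symmetric, which is generally friendlier to both incomplete-factorization and multigrid preconditioners than the unsymmetric constraint rows.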
> > Kind regards, > Viktor Nazdrachev > R&D senior researcher > Geosteering Technologies LLC > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Fri Aug 20 13:32:13 2021 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Fri, 20 Aug 2021 11:32:13 -0700 Subject: [petsc-users] Euclid or Boomeramg vs ILU: questions. In-Reply-To: References: Message-ID: <15e3a0b9-b16b-8a59-5801-bef9895f762d@berkeley.edu> Mark's suggestion will definitely help a lot.? Remove the displacement bc equations or include them in the matrix by zeroing out the row and putting a 1 on the diagonal.? The Lagrange multiplier will cause grief. On 8/20/21 11:21 AM, Mark Adams wrote: > Constraints are a pain with scalable/iterative solvers. If you order > the constraints last then ILU should work as well as it can work,?but > AMG gets confused by the constraint equations. > You could look at PETSc's?Stokes solvers, but it would be best if you > could remove the constrained equations from your system if they are > just simple point wise BC's. > Mark > > On Fri, Aug 20, 2021 at 8:53 AM ????????? ?????? > > wrote: > > *Hello, dear PETSc team!* > > I have a 3D elasticity with heterogeneous properties problem. > There is unstructured grid with aspect ratio varied from 4 to 25. > Dirichlet BCs (bottom zero displacements) are imposed via linear > constraint equations using Lagrange multipliers. Also, Neumann > (traction) BCs are imposed on side edges of mesh. Gravity load is > also accounted for. > > I can solve this problem with *dgmres solver*?and *ILU*?as a > *preconditioner*. But ILU doesn`t support parallel computing, so I > decided to use Euclid or Boomeramg as a preconditioner. The issue > is in slow convergence and high memory consumption, much higher, > than for ILU. > > E.g., for source matrix size 2.14 GB with *ILU-0 > preconditioning*?memory consumption is about 5.9 GB, and the > process converges due to 767 iterations, and with *Euclid-0 > preconditioning*?memory consumption is about 8.7 GB, and the > process converges due to 1732 iterations. > > One of the following preconditioners is currently in use: *ILU-0, > ILU-1, Hypre (Euclid), Hypre (boomeramg)*. > > As a result of computations */(logs and memory logs are > attached)/*, the following is established for preconditioners: > > 1. *ILU-0*: does not always provide convergence (or provides, but > slow); uses an acceptable amount of RAM; does not support parallel > computing. > > 2. *ILU-1*: stable; memory consumption is much higher than that of > ILU-0; does not support parallel computing. > > 3. *Euclid*: provides very slow convergence, calculations are > performed several times slower than for ILU-0; memory consumption > greatly exceeds both ILU-0 and ILU-1; supports parallel computing. > Also ?drop tolerance? doesn?t provide enough accuracy in some > cells, so I don?t use it. > > 4. *Boomeramg*: provides very slow convergence, calculations are > performed several times slower than for ILU-0; memory consumption > greatly exceeds both ILU-0 and ILU-1; supports parallel computing. > > In this regard, the following questions arose: > > 1. Is this behavior expected for HYPRE in computations with 1 MPI > process? If not, is that problem can be related to *PETSc*?or *HYPRE*? > > 2. Hypre (Euclid) has much fewer parameters than ILU. 
Among them > is the factorization level *"-pc_hypre_euclid_level formerly -2>: Factorization levels (None)"*?and its default value > looks very strange, moreover, it doesn?t matter what factor is > chosen -2, -1 or 0. Could it be that the parameter is confused > with Column pivot tolerance in ILU - *"-pc_factor_column_pivot > <-2.: -2.>: Column pivot tolerance (used only for some > factorization) (PCFactorSetColumnPivot)"*? > > 3. What preconditioner would you recommend to: optimize > *convergence*, *memory*?consumption, add *parallel computing*? > > 4. How can we theoretically estimate memory costs with *ILU, > Euclid, Boomeramg*? > > 5. At what stage are memory leaks most likely? > > In any case, thank you so much for your attention! Will be > grateful for any response. > > Kind regards, > Viktor Nazdrachev > R&D senior researcher > Geosteering Technologies LLC > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yuf2 at rpi.edu Fri Aug 20 13:54:22 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Fri, 20 Aug 2021 14:54:22 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: <59eab54c-ad95-c305-e233-f0f39d613011@rpi.edu> Hi Barry and Junchao, Actually I did a simple MPI "dup and free" test before with Spectrum MPI, but that one did not have any problem. I'm not a PETSc programmer as I mainly use deal.ii's PETSc wrappers, but I managed to write a minimal program based on petsc/src/mat/tests/ex98.c to reproduce my problem. This piece of code creates and destroys 10,000 instances of Hypre Parasail preconditioners (for my own code, it uses Euclid, but I don't think it matters). It runs fine with OpenMPI but reports the out of communicator error with Sepctrum MPI. The code is attached in the email. In case the attachment is not available, I also uploaded a copy on my google drive: https://drive.google.com/drive/folders/1DCf7lNlks8GjazvoP7c211ojNHLwFKL6?usp=sharing Thanks! Feimi On 8/20/21 9:58 AM, Junchao Zhang wrote: > Feimi, if it is easy to reproduce, could you give instructions on how > to reproduce that? > > PS: Spectrum MPI is based on OpenMPI.? I don't understand why it has > the problem but OpenMPI does not.? It could be a bug in petsc or > user's code.? For reference counting on MPI_Comm, we already have > petsc inner comm. I think we can reuse that. > > --Junchao Zhang > > > On Fri, Aug 20, 2021 at 12:33 AM Barry Smith > wrote: > > > ? It sounds like maybe the Spectrum MPI_Comm_free() is not > returning the comm to the "pool" as available for future use; a > very buggy MPI implementation. This can easily be checked in a > tiny standalone MPI program that simply comm dups and frees > thousands of times in a loop. Could even be a configure test (that > requires running an MPI program). I do not remember if we ever > tested this possibility; maybe and I forgot. > > ? If this is the problem we can provide a "work around" that > attributes the new comm (to be passed to hypre) to the old comm > with a reference count value also in the attribute. 
When the hypre > matrix is created that count is (with the new comm) is set to 1, > when the hypre matrix is freed that count is set to zero (but the > comm is not freed), in the next call to create the hypre matrix > when the attribute is found, the count is zero so PETSc knows it > can pass the same comm again to the new hypre matrix. > > This will only allow one simultaneous hypre matrix to be created > from the original comm. To allow multiply simultaneous hypre > matrix one could have multiple comms and counts in the attribute > and just check them until one finds an available one to reuse (or > creates yet another one if all the current ones are busy with > hypre matrices). So it is the same model as DMGetXXVector() where > vectors are checked out and then checked in to be available later. > This would solve the currently reported problem (if it is a buggy > MPI that does not properly free comms), but not solve the MOOSE > problem where 10,000 comms are needed at the same time. > > ? Barry > > > > > >> On Aug 19, 2021, at 3:29 PM, Junchao Zhang >> > wrote: >> >> >> >> >> On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu > > wrote: >> >> Hi Jed, >> >> In my case, I only have 2 hypre preconditioners at the same >> time, and >> they do not solve simultaneously, so it might not be case 1. >> >> I checked the stack for all the calls of >> MPI_Comm_dup/MPI_Comm_free on >> my own machine (with OpenMPI), all the communicators are >> freed from my >> observation. I could not test it with Spectrum MPI on the >> clusters >> immediately because all the dependencies were built in >> release mode. >> However, as I mentioned, I haven't had this problem with >> OpenMPI before, >> so I'm not sure if this is really an MPI implementation >> problem, or just >> because Spectrum MPI has less limit for the number of >> communicators, >> and/or this also depends on how many MPI ranks are used, as >> only 2 out >> of 40 ranks reported the error. >> >> You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the >> two ranks, e.g., if (myrank == 38) printf(...), to see if the >> dup/free are paired. >> ?As a workaround, I replaced the MPI_Comm_dup() at >> >> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy >> assignment, and also >> removed the MPI_Comm_free() in the hypre destroyer. My code >> runs fine >> with Spectrum MPI now, but I don't think this is a long-term >> solution. >> >> Thanks! >> >> Feimi >> >> On 8/19/21 9:01 AM, Jed Brown wrote: >> > Junchao Zhang > > writes: >> > >> >> Hi, Feimi, >> >>? ? I need to consult Jed (cc'ed). >> >>? ? Jed, is this an example of >> >> >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 >> ? >> >> If Feimi really can not free matrices, then we just need >> to attach a >> >> hypre-comm to a petsc inner comm, and pass that to hypre. >> > Are there a bunch of solves as in that case? >> > >> > My understanding is that one should be able to >> MPI_Comm_dup/MPI_Comm_free as many times as you like, but the >> implementation has limits on how many communicators can >> co-exist at any one time. The many-at-once is what we >> encountered in that 2018 thread. >> > >> > One way to check would be to use a debugger or tracer to >> examine the stack every time (P)MPI_Comm_dup and >> (P)MPI_Comm_free are called. >> > >> > case 1: we'll find lots of dups without frees (until the >> end) because the user really wants lots of these existing at >> the same time. 
>> > >> > case 2: dups are unfreed because of reference counting >> issue/inessential references >> > >> > >> > In case 1, I think the solution is as outlined in the >> thread, PETSc can create an inner-comm for Hypre. I think I'd >> prefer to attach it to the outer comm instead of the PETSc >> inner comm, but perhaps a case could be made either way. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hypre_precon_test.cpp Type: text/x-c++src Size: 3422 bytes Desc: not available URL: From yuf2 at rpi.edu Fri Aug 20 14:02:30 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Fri, 20 Aug 2021 15:02:30 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <59eab54c-ad95-c305-e233-f0f39d613011@rpi.edu> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> <59eab54c-ad95-c305-e233-f0f39d613011@rpi.edu> Message-ID: <21c1bf50-2e96-b924-529d-b1e78a319f78@rpi.edu> Sorry, I forgot to destroy the matrix after the loop, but anyway, the in-loop preconditioners are destroyed. Updated the code here and the google drive. Feimi On 8/20/21 2:54 PM, Feimi Yu wrote: > > Hi Barry and Junchao, > > Actually I did a simple MPI "dup and free" test before with Spectrum > MPI, but that one did not have any problem. I'm not a PETSc programmer > as I mainly use deal.ii's PETSc wrappers, but I managed to write a > minimal program based on petsc/src/mat/tests/ex98.c to reproduce my > problem. This piece of code creates and destroys 10,000 instances of > Hypre Parasail preconditioners (for my own code, it uses Euclid, but I > don't think it matters). It runs fine with OpenMPI but reports the out > of communicator error with Sepctrum MPI. The code is attached in the > email. In case the attachment is not available, I also uploaded a copy > on my google drive: > > https://drive.google.com/drive/folders/1DCf7lNlks8GjazvoP7c211ojNHLwFKL6?usp=sharing > > Thanks! > > Feimi > > On 8/20/21 9:58 AM, Junchao Zhang wrote: >> Feimi, if it is easy to reproduce, could you give instructions on how >> to reproduce that? >> >> PS: Spectrum MPI is based on OpenMPI.? I don't understand why it has >> the problem but OpenMPI does not.? It could be a bug in petsc or >> user's code.? For reference counting on MPI_Comm, we already have >> petsc inner comm. I think we can reuse that. >> >> --Junchao Zhang >> >> >> On Fri, Aug 20, 2021 at 12:33 AM Barry Smith > > wrote: >> >> >> ? It sounds like maybe the Spectrum MPI_Comm_free() is not >> returning the comm to the "pool" as available for future use; a >> very buggy MPI implementation. This can easily be checked in a >> tiny standalone MPI program that simply comm dups and frees >> thousands of times in a loop. Could even be a configure test >> (that requires running an MPI program). I do not remember if we >> ever tested this possibility; maybe and I forgot. >> >> ? If this is the problem we can provide a "work around" that >> attributes the new comm (to be passed to hypre) to the old comm >> with a reference count value also in the attribute. 
When the >> hypre matrix is created that count is (with the new comm) is set >> to 1, when the hypre matrix is freed that count is set to zero >> (but the comm is not freed), in the next call to create the hypre >> matrix when the attribute is found, the count is zero so PETSc >> knows it can pass the same comm again to the new hypre matrix. >> >> This will only allow one simultaneous hypre matrix to be created >> from the original comm. To allow multiply simultaneous hypre >> matrix one could have multiple comms and counts in the attribute >> and just check them until one finds an available one to reuse (or >> creates yet another one if all the current ones are busy with >> hypre matrices). So it is the same model as DMGetXXVector() where >> vectors are checked out and then checked in to be available >> later. This would solve the currently reported problem (if it is >> a buggy MPI that does not properly free comms), but not solve the >> MOOSE problem where 10,000 comms are needed at the same time. >> >> ? Barry >> >> >> >> >> >>> On Aug 19, 2021, at 3:29 PM, Junchao Zhang >>> > wrote: >>> >>> >>> >>> >>> On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu >> > wrote: >>> >>> Hi Jed, >>> >>> In my case, I only have 2 hypre preconditioners at the same >>> time, and >>> they do not solve simultaneously, so it might not be case 1. >>> >>> I checked the stack for all the calls of >>> MPI_Comm_dup/MPI_Comm_free on >>> my own machine (with OpenMPI), all the communicators are >>> freed from my >>> observation. I could not test it with Spectrum MPI on the >>> clusters >>> immediately because all the dependencies were built in >>> release mode. >>> However, as I mentioned, I haven't had this problem with >>> OpenMPI before, >>> so I'm not sure if this is really an MPI implementation >>> problem, or just >>> because Spectrum MPI has less limit for the number of >>> communicators, >>> and/or this also depends on how many MPI ranks are used, as >>> only 2 out >>> of 40 ranks reported the error. >>> >>> You can add printf around MPI_Comm_dup/MPI_Comm_free sites on >>> the two ranks, e.g., if (myrank == 38) printf(...), to see if >>> the dup/free are paired. >>> ?As a workaround, I replaced the MPI_Comm_dup() at >>> >>> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy >>> assignment, and also >>> removed the MPI_Comm_free() in the hypre destroyer. My code >>> runs fine >>> with Spectrum MPI now, but I don't think this is a long-term >>> solution. >>> >>> Thanks! >>> >>> Feimi >>> >>> On 8/19/21 9:01 AM, Jed Brown wrote: >>> > Junchao Zhang >> > writes: >>> > >>> >> Hi, Feimi, >>> >>? ? I need to consult Jed (cc'ed). >>> >>? ? Jed, is this an example of >>> >> >>> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 >>> ? >>> >> If Feimi really can not free matrices, then we just need >>> to attach a >>> >> hypre-comm to a petsc inner comm, and pass that to hypre. >>> > Are there a bunch of solves as in that case? >>> > >>> > My understanding is that one should be able to >>> MPI_Comm_dup/MPI_Comm_free as many times as you like, but >>> the implementation has limits on how many communicators can >>> co-exist at any one time. The many-at-once is what we >>> encountered in that 2018 thread. >>> > >>> > One way to check would be to use a debugger or tracer to >>> examine the stack every time (P)MPI_Comm_dup and >>> (P)MPI_Comm_free are called. 
>>> > >>> > case 1: we'll find lots of dups without frees (until the >>> end) because the user really wants lots of these existing at >>> the same time. >>> > >>> > case 2: dups are unfreed because of reference counting >>> issue/inessential references >>> > >>> > >>> > In case 1, I think the solution is as outlined in the >>> thread, PETSc can create an inner-comm for Hypre. I think >>> I'd prefer to attach it to the outer comm instead of the >>> PETSc inner comm, but perhaps a case could be made either way. >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hypre_precon_test.cpp Type: text/x-c++src Size: 3545 bytes Desc: not available URL: From elbueler at alaska.edu Fri Aug 20 14:11:36 2021 From: elbueler at alaska.edu (Ed Bueler) Date: Fri, 20 Aug 2021 11:11:36 -0800 Subject: [petsc-users] Euclid or Boomeramg vs ILU: questions. Message-ID: Viktor -- As a basic comment, note that ILU can be used in parallel, namely on each processor block, by either non-overlapping domain decomposition: -pc_type bjacobi -sub_pc_type ilu or with overlap: -pc_type asm -sub_pc_type ilu See the discussion of block Jacobi and ASM at https://petsc.org/release/docs/manual/ksp/#block-jacobi-and-overlapping-additive-schwarz-preconditioners Of course, no application of ILU will be generating optimal performance, but it looks like you are not yet getting that from AMG either. Ed On Fri, Aug 20, 2021 at 8:53 AM ????????? ?????? wrote: > *Hello, dear PETSc team!* > > > > I have a 3D elasticity with heterogeneous properties problem. There is > unstructured grid with aspect ratio varied from 4 to 25. Dirichlet BCs > (bottom zero displacements) are imposed via linear constraint equations > using Lagrange multipliers. Also, Neumann (traction) BCs are imposed on > side edges of mesh. Gravity load is also accounted for. > > I can solve this problem with *dgmres solver* and *ILU* as a > *preconditioner*. But ILU doesn`t support parallel computing, so I > decided to use Euclid or Boomeramg as a preconditioner. The issue is in > slow convergence and high memory consumption, much higher, than for ILU. > > E.g., for source matrix size 2.14 GB with *ILU-0 preconditioning* memory > consumption is about 5.9 GB, and the process converges due to 767 > iterations, and with *Euclid-0 preconditioning* memory consumption is > about 8.7 GB, and the process converges due to 1732 iterations. > > One of the following preconditioners is currently in use: *ILU-0, ILU-1, > Hypre (Euclid), Hypre (boomeramg)*. > > As a result of computations *(logs and memory logs are attached)*, the > following is established for preconditioners: > > 1. *ILU-0*: does not always provide convergence (or provides, but slow); > uses an acceptable amount of RAM; does not support parallel computing. > > 2. *ILU-1*: stable; memory consumption is much higher than that of ILU-0; > does not support parallel computing. > > 3. *Euclid*: provides very slow convergence, calculations are performed > several times slower than for ILU-0; memory consumption greatly exceeds > both ILU-0 and ILU-1; supports parallel computing. Also ?drop tolerance? > doesn?t provide enough accuracy in some cells, so I don?t use it. > > 4. *Boomeramg*: provides very slow convergence, calculations are > performed several times slower than for ILU-0; memory consumption greatly > exceeds both ILU-0 and ILU-1; supports parallel computing. 
> > > > In this regard, the following questions arose: > > 1. Is this behavior expected for HYPRE in computations with 1 MPI process? > If not, is that problem can be related to *PETSc* or *HYPRE*? > > 2. Hypre (Euclid) has much fewer parameters than ILU. Among them is the > factorization level *"-pc_hypre_euclid_level : > Factorization levels (None)"* and its default value looks very strange, > moreover, it doesn?t matter what factor is chosen -2, -1 or 0. Could it be > that the parameter is confused with Column pivot tolerance in ILU - *"-pc_factor_column_pivot > <-2.: -2.>: Column pivot tolerance (used only for some factorization) > (PCFactorSetColumnPivot)"*? > > 3. What preconditioner would you recommend to: optimize *convergence*, > *memory* consumption, add *parallel computing*? > > 4. How can we theoretically estimate memory costs with *ILU, Euclid, > Boomeramg*? > > 5. At what stage are memory leaks most likely? > > > > In any case, thank you so much for your attention! Will be grateful for > any response. > > Kind regards, > Viktor Nazdrachev > R&D senior researcher > Geosteering Technologies LLC -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Aug 20 16:14:11 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 20 Aug 2021 16:14:11 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <21c1bf50-2e96-b924-529d-b1e78a319f78@rpi.edu> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> <59eab54c-ad95-c305-e233-f0f39d613011@rpi.edu> <21c1bf50-2e96-b924-529d-b1e78a319f78@rpi.edu> Message-ID: Feimi, I'm able to reproduce the problem. I will have a look. Thanks a lot for the example. --Junchao Zhang On Fri, Aug 20, 2021 at 2:02 PM Feimi Yu wrote: > Sorry, I forgot to destroy the matrix after the loop, but anyway, the > in-loop preconditioners are destroyed. Updated the code here and the google > drive. > > Feimi > On 8/20/21 2:54 PM, Feimi Yu wrote: > > Hi Barry and Junchao, > > Actually I did a simple MPI "dup and free" test before with Spectrum MPI, > but that one did not have any problem. I'm not a PETSc programmer as I > mainly use deal.ii's PETSc wrappers, but I managed to write a minimal > program based on petsc/src/mat/tests/ex98.c to reproduce my problem. This > piece of code creates and destroys 10,000 instances of Hypre Parasail > preconditioners (for my own code, it uses Euclid, but I don't think it > matters). It runs fine with OpenMPI but reports the out of communicator > error with Sepctrum MPI. The code is attached in the email. In case the > attachment is not available, I also uploaded a copy on my google drive: > > > https://drive.google.com/drive/folders/1DCf7lNlks8GjazvoP7c211ojNHLwFKL6?usp=sharing > > Thanks! > > Feimi > On 8/20/21 9:58 AM, Junchao Zhang wrote: > > Feimi, if it is easy to reproduce, could you give instructions on how to > reproduce that? > > PS: Spectrum MPI is based on OpenMPI. I don't understand why it has the > problem but OpenMPI does not. It could be a bug in petsc or user's code. > For reference counting on MPI_Comm, we already have petsc inner comm. I > think we can reuse that. 
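A minimal sketch of that check-out / check-in attribute scheme, in the spirit of what Barry describes below (the struct and function names here are made up for illustration; only the MPI attribute-caching calls are real API):

#include <mpi.h>
#include <stdlib.h>

/* One inner comm is dup'ed a single time, cached on the outer comm as an
 * attribute together with a busy flag, and handed to hypre repeatedly,
 * instead of doing MPI_Comm_dup()/MPI_Comm_free() for every hypre matrix. */
typedef struct { MPI_Comm inner; int in_use; } InnerComm;
static int keyval = MPI_KEYVAL_INVALID;

static MPI_Comm InnerCommCheckout(MPI_Comm outer)
{
  InnerComm *c; int found;
  if (keyval == MPI_KEYVAL_INVALID)
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN, &keyval, NULL);
  MPI_Comm_get_attr(outer, keyval, &c, &found);
  if (!found) {                       /* first use on this comm: dup exactly once */
    c = (InnerComm *)malloc(sizeof(*c));
    MPI_Comm_dup(outer, &c->inner);
    c->in_use = 0;
    MPI_Comm_set_attr(outer, keyval, c);
  }
  c->in_use = 1;                      /* mark busy; a fuller version would keep a list of comms */
  return c->inner;
}

static void InnerCommCheckin(MPI_Comm outer)
{
  InnerComm *c; int found;
  MPI_Comm_get_attr(outer, keyval, &c, &found);
  if (found) c->in_use = 0;           /* release for reuse; the comm itself is never freed */
}

Whether such an attribute should live on the user's outer comm or on PETSc's existing inner comm is exactly the design question being discussed in the quoted messages below.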
> > --Junchao Zhang > > > On Fri, Aug 20, 2021 at 12:33 AM Barry Smith wrote: > >> >> It sounds like maybe the Spectrum MPI_Comm_free() is not returning the >> comm to the "pool" as available for future use; a very buggy MPI >> implementation. This can easily be checked in a tiny standalone MPI program >> that simply comm dups and frees thousands of times in a loop. Could even be >> a configure test (that requires running an MPI program). I do not remember >> if we ever tested this possibility; maybe and I forgot. >> >> If this is the problem we can provide a "work around" that attributes >> the new comm (to be passed to hypre) to the old comm with a reference count >> value also in the attribute. When the hypre matrix is created that count is >> (with the new comm) is set to 1, when the hypre matrix is freed that count >> is set to zero (but the comm is not freed), in the next call to create the >> hypre matrix when the attribute is found, the count is zero so PETSc knows >> it can pass the same comm again to the new hypre matrix. >> >> This will only allow one simultaneous hypre matrix to be created from the >> original comm. To allow multiply simultaneous hypre matrix one could have >> multiple comms and counts in the attribute and just check them until one >> finds an available one to reuse (or creates yet another one if all the >> current ones are busy with hypre matrices). So it is the same model as >> DMGetXXVector() where vectors are checked out and then checked in to be >> available later. This would solve the currently reported problem (if it is >> a buggy MPI that does not properly free comms), but not solve the MOOSE >> problem where 10,000 comms are needed at the same time. >> >> Barry >> >> >> >> >> >> On Aug 19, 2021, at 3:29 PM, Junchao Zhang >> wrote: >> >> >> >> >> On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu wrote: >> >>> Hi Jed, >>> >>> In my case, I only have 2 hypre preconditioners at the same time, and >>> they do not solve simultaneously, so it might not be case 1. >>> >>> I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on >>> my own machine (with OpenMPI), all the communicators are freed from my >>> observation. I could not test it with Spectrum MPI on the clusters >>> immediately because all the dependencies were built in release mode. >>> However, as I mentioned, I haven't had this problem with OpenMPI before, >>> so I'm not sure if this is really an MPI implementation problem, or just >>> because Spectrum MPI has less limit for the number of communicators, >>> and/or this also depends on how many MPI ranks are used, as only 2 out >>> of 40 ranks reported the error. >>> >> You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two >> ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are >> paired. >> >> As a workaround, I replaced the MPI_Comm_dup() at >> >>> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also >>> removed the MPI_Comm_free() in the hypre destroyer. My code runs fine >>> with Spectrum MPI now, but I don't think this is a long-term solution. >>> >>> Thanks! >>> >>> Feimi >>> >>> On 8/19/21 9:01 AM, Jed Brown wrote: >>> > Junchao Zhang writes: >>> > >>> >> Hi, Feimi, >>> >> I need to consult Jed (cc'ed). >>> >> Jed, is this an example of >>> >> >>> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 >>> ? >>> >> If Feimi really can not free matrices, then we just need to attach a >>> >> hypre-comm to a petsc inner comm, and pass that to hypre. 
>>> > Are there a bunch of solves as in that case? >>> > >>> > My understanding is that one should be able to >>> MPI_Comm_dup/MPI_Comm_free as many times as you like, but the >>> implementation has limits on how many communicators can co-exist at any one >>> time. The many-at-once is what we encountered in that 2018 thread. >>> > >>> > One way to check would be to use a debugger or tracer to examine the >>> stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. >>> > >>> > case 1: we'll find lots of dups without frees (until the end) because >>> the user really wants lots of these existing at the same time. >>> > >>> > case 2: dups are unfreed because of reference counting >>> issue/inessential references >>> > >>> > >>> > In case 1, I think the solution is as outlined in the thread, PETSc >>> can create an inner-comm for Hypre. I think I'd prefer to attach it to the >>> outer comm instead of the PETSc inner comm, but perhaps a case could be >>> made either way. >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 20 16:17:12 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 20 Aug 2021 16:17:12 -0500 Subject: [petsc-users] Parallelize in the y direction In-Reply-To: References: Message-ID: <8FDFD94E-155D-475D-A8EA-BD459A014B2A@petsc.dev> Trying to solve many "one-dimensional" problems each in parallel on different subset of ranks will be massive pain to do specifically. I recommend just forming a single matrix for all these systems and solving it with KSPSolve and block Jacobi preconditioning or even a parallel direct solver such as with -pc_type lu -pc_factor_mat_solver_type mumps Barry Yes, this single system, in a certain ordering is block diagonal (each block being tridiagonal) so contains "independent" subsystems; what I suggest above essentially takes advantage of this structure to be reasonably efficient, yet trivial to code. > On Aug 20, 2021, at 9:14 AM, Matthew Knepley wrote: > > On Fri, Aug 20, 2021 at 7:53 AM Joauma Marichal > wrote: > Dear Sir or Madam, > > I am looking for advice regarding some of PETSc functionnalities. I am currently using PETSc to solve the Navier-Stokes equations on a 3D mesh decomposed over several processors. However, until now, the processors are distributed along the x and z directions but not along the y one. Indeed, at some point in the algorithm, I must solve a tridiagonal system that depends only on y. Until now, I have therefore performed something like this: > for(int k = cornp->zs, kzs+cornp->zm; ++k){ > for(int i = cornp->xs, ixs+cornp->xm; ++i){ > Create and solve a tridiagonal system for all the y coordinates (which are on the same process) > } > However, I would like to decompose my mesh in the y direction (as this should improve the code efficiency). > I managed to do so by creating a system based on the 3D DM of all my case (so 1 system of size x*y*z). Unfortunately, this does not seem to be very efficient. > Do you have some advice on how to cut in the y direction while still being able to solve x*z systems of size y? Should I create 1D DMs? > > 1) Are you using a 3D DMDA? > > 2) Is the coupling much different in the x and z than in the y direction? > > Thanks, > > Matt > > Thanks a lot for your help. > > Best regards, > > Joauma Marichal > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Sat Aug 21 10:30:46 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Sat, 21 Aug 2021 10:30:46 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> <59eab54c-ad95-c305-e233-f0f39d613011@rpi.edu> <21c1bf50-2e96-b924-529d-b1e78a319f78@rpi.edu> Message-ID: I checked and found MPI_Comm_dup() and MPI_Comm_free() were called in pairs. So the MPI runtime should not complain about running out of resources. I guess there might be pending communications on communicators. But I've no means to know exactly. Per MPI manual, MPI_Comm_free() only marks a communicator object for deallocation. We can file a bug report to OLCF. With MPI source code, it should be easy for them to debug. --Junchao Zhang On Fri, Aug 20, 2021 at 4:14 PM Junchao Zhang wrote: > Feimi, > I'm able to reproduce the problem. I will have a look. Thanks a lot for > the example. > --Junchao Zhang > > > On Fri, Aug 20, 2021 at 2:02 PM Feimi Yu wrote: > >> Sorry, I forgot to destroy the matrix after the loop, but anyway, the >> in-loop preconditioners are destroyed. Updated the code here and the google >> drive. >> >> Feimi >> On 8/20/21 2:54 PM, Feimi Yu wrote: >> >> Hi Barry and Junchao, >> >> Actually I did a simple MPI "dup and free" test before with Spectrum MPI, >> but that one did not have any problem. I'm not a PETSc programmer as I >> mainly use deal.ii's PETSc wrappers, but I managed to write a minimal >> program based on petsc/src/mat/tests/ex98.c to reproduce my problem. This >> piece of code creates and destroys 10,000 instances of Hypre Parasail >> preconditioners (for my own code, it uses Euclid, but I don't think it >> matters). It runs fine with OpenMPI but reports the out of communicator >> error with Sepctrum MPI. The code is attached in the email. In case the >> attachment is not available, I also uploaded a copy on my google drive: >> >> >> https://drive.google.com/drive/folders/1DCf7lNlks8GjazvoP7c211ojNHLwFKL6?usp=sharing >> >> Thanks! >> >> Feimi >> On 8/20/21 9:58 AM, Junchao Zhang wrote: >> >> Feimi, if it is easy to reproduce, could you give instructions on how to >> reproduce that? >> >> PS: Spectrum MPI is based on OpenMPI. I don't understand why it has the >> problem but OpenMPI does not. It could be a bug in petsc or user's code. >> For reference counting on MPI_Comm, we already have petsc inner comm. I >> think we can reuse that. >> >> --Junchao Zhang >> >> >> On Fri, Aug 20, 2021 at 12:33 AM Barry Smith wrote: >> >>> >>> It sounds like maybe the Spectrum MPI_Comm_free() is not returning the >>> comm to the "pool" as available for future use; a very buggy MPI >>> implementation. This can easily be checked in a tiny standalone MPI program >>> that simply comm dups and frees thousands of times in a loop. Could even be >>> a configure test (that requires running an MPI program). I do not remember >>> if we ever tested this possibility; maybe and I forgot. >>> >>> If this is the problem we can provide a "work around" that attributes >>> the new comm (to be passed to hypre) to the old comm with a reference count >>> value also in the attribute. 
When the hypre matrix is created that count is >>> (with the new comm) is set to 1, when the hypre matrix is freed that count >>> is set to zero (but the comm is not freed), in the next call to create the >>> hypre matrix when the attribute is found, the count is zero so PETSc knows >>> it can pass the same comm again to the new hypre matrix. >>> >>> This will only allow one simultaneous hypre matrix to be created from >>> the original comm. To allow multiply simultaneous hypre matrix one could >>> have multiple comms and counts in the attribute and just check them until >>> one finds an available one to reuse (or creates yet another one if all the >>> current ones are busy with hypre matrices). So it is the same model as >>> DMGetXXVector() where vectors are checked out and then checked in to be >>> available later. This would solve the currently reported problem (if it is >>> a buggy MPI that does not properly free comms), but not solve the MOOSE >>> problem where 10,000 comms are needed at the same time. >>> >>> Barry >>> >>> >>> >>> >>> >>> On Aug 19, 2021, at 3:29 PM, Junchao Zhang >>> wrote: >>> >>> >>> >>> >>> On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu wrote: >>> >>>> Hi Jed, >>>> >>>> In my case, I only have 2 hypre preconditioners at the same time, and >>>> they do not solve simultaneously, so it might not be case 1. >>>> >>>> I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on >>>> my own machine (with OpenMPI), all the communicators are freed from my >>>> observation. I could not test it with Spectrum MPI on the clusters >>>> immediately because all the dependencies were built in release mode. >>>> However, as I mentioned, I haven't had this problem with OpenMPI >>>> before, >>>> so I'm not sure if this is really an MPI implementation problem, or >>>> just >>>> because Spectrum MPI has less limit for the number of communicators, >>>> and/or this also depends on how many MPI ranks are used, as only 2 out >>>> of 40 ranks reported the error. >>>> >>> You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two >>> ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are >>> paired. >>> >>> As a workaround, I replaced the MPI_Comm_dup() at >>> >>>> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and >>>> also >>>> removed the MPI_Comm_free() in the hypre destroyer. My code runs fine >>>> with Spectrum MPI now, but I don't think this is a long-term solution. >>>> >>>> Thanks! >>>> >>>> Feimi >>>> >>>> On 8/19/21 9:01 AM, Jed Brown wrote: >>>> > Junchao Zhang writes: >>>> > >>>> >> Hi, Feimi, >>>> >> I need to consult Jed (cc'ed). >>>> >> Jed, is this an example of >>>> >> >>>> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 >>>> ? >>>> >> If Feimi really can not free matrices, then we just need to attach a >>>> >> hypre-comm to a petsc inner comm, and pass that to hypre. >>>> > Are there a bunch of solves as in that case? >>>> > >>>> > My understanding is that one should be able to >>>> MPI_Comm_dup/MPI_Comm_free as many times as you like, but the >>>> implementation has limits on how many communicators can co-exist at any one >>>> time. The many-at-once is what we encountered in that 2018 thread. >>>> > >>>> > One way to check would be to use a debugger or tracer to examine the >>>> stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. >>>> > >>>> > case 1: we'll find lots of dups without frees (until the end) because >>>> the user really wants lots of these existing at the same time. 
>>>> > >>>> > case 2: dups are unfreed because of reference counting >>>> issue/inessential references >>>> > >>>> > >>>> > In case 1, I think the solution is as outlined in the thread, PETSc >>>> can create an inner-comm for Hypre. I think I'd prefer to attach it to the >>>> outer comm instead of the PETSc inner comm, but perhaps a case could be >>>> made either way. >>>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Sat Aug 21 14:08:12 2021 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Sat, 21 Aug 2021 19:08:12 +0000 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: On , 2021Aug20, at 00:33, Barry Smith > wrote: It sounds like maybe the Spectrum MPI_Comm_free() is not returning the comm to the "pool" as available for future use; 1. I can not find in the standard what the proper response is to running out of communicators. 2. Mpich on my laptop returns MPI_COMM_NULL after 2044 dups without free. 3. A million dups with free run in a second. 2b. Spectrum MPI on my P9 runs out after 4096 dups 3b. Dup and free million times takes more time than writing this email. Why do I keep hearing that OpenMPI is so great? Everything slightly non-standard I try is hopelessly slow and broken. Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Sun Aug 22 13:10:57 2021 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Sun, 22 Aug 2021 18:10:57 +0000 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: <7ABE1E51-AF1B-4A38-873F-62CA254CA366@tacc.utexas.edu> On , 2021Aug21, at 14:08, Victor Eijkhout > wrote: 3b. Dup and free million times takes more time than writing this email. Why do I keep hearing that OpenMPI is so great? Everything slightly non-standard I try is hopelessly slow and broken. Partial rehabilitation for OpenMPI: Finished dup/free'ing 1000000 communicators real 0m9.301s user 0m9.120s sys 0m0.114s So that?s only 4 times slower than Mvapich, but the ludicrously slow performance is only for Spectrum MPI, not OpenMPI per se. Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From janne.ruuskanen at tuni.fi Mon Aug 23 05:45:34 2021 From: janne.ruuskanen at tuni.fi (Janne Ruuskanen (TAU)) Date: Mon, 23 Aug 2021 10:45:34 +0000 Subject: [petsc-users] issues with mpi uni Message-ID: Hi, Assumingly, I have an issue using petsc and openmpi together in my c++ code. See the code there: https://github.com/halbux/sparselizard/blob/master/src/slmpi.cpp So when I run: slmpi::initialize(); slmpi::count(); slmpi::finalize(); I get the following error: *** The MPI_Comm_size() function was called before MPI_INIT was invoked. *** This is disallowed by the MPI standard. *** Your MPI job will now abort. Have you experienced anything similar with people trying to link openmpi and petsc into the same executable? 
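One cheap probe for this symptom, independent of PETSc (a sketch; report_mpi_state is a made-up helper one could paste right before the failing MPI_Comm_size() call, and it is valid C++ as well as C):

#include <mpi.h>
#include <stdio.h>

/* MPI_Initialized()/MPI_Finalized() are the only MPI calls the standard
 * permits before MPI_Init(), so this is safe to call anywhere. */
static void report_mpi_state(void)
{
  int initialized = 0, finalized = 0;
  MPI_Initialized(&initialized);
  MPI_Finalized(&finalized);
  printf("MPI state: initialized=%d finalized=%d\n", initialized, finalized);
}

If it prints initialized=0 even though slmpi::initialize() has already run, that would suggest the earlier MPI_Init resolved to a different library than the MPI_Comm_size that aborts.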
Best regards, Janne Ruuskanen From balay at mcs.anl.gov Mon Aug 23 08:44:43 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 23 Aug 2021 08:44:43 -0500 (CDT) Subject: [petsc-users] issues with mpi uni In-Reply-To: References: Message-ID: Did you build PETSc with the same openmpi [as what sparselizard is built with]? Satish On Mon, 23 Aug 2021, Janne Ruuskanen (TAU) wrote: > Hi, > > Assumingly, I have an issue using petsc and openmpi together in my c++ code. > > See the code there: > https://github.com/halbux/sparselizard/blob/master/src/slmpi.cpp > > > So when I run: > > slmpi::initialize(); > slmpi::count(); > slmpi::finalize(); > > I get the following error: > > > *** The MPI_Comm_size() function was called before MPI_INIT was invoked. > *** This is disallowed by the MPI standard. > *** Your MPI job will now abort. > > > Have you experienced anything similar with people trying to link openmpi and petsc into the same executable? > > Best regards, > Janne Ruuskanen > From asher.mancinelli at pnnl.gov Mon Aug 23 15:36:46 2021 From: asher.mancinelli at pnnl.gov (Mancinelli, Asher J) Date: Mon, 23 Aug 2021 20:36:46 +0000 Subject: [petsc-users] PETSc + Cray MPICH Build Error in User Code Message-ID: Hello all, We are attempting to build an application that relies on PETSc with Cray MPICH, and we're encountering the following build-time error: cd /exago/build/src/utils && hipcc -DHAVE_HIP -I/exago/include -I/exago/build -Ispack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/magma-2.6.1-l3ckgjdgsf4yhyzzb5zaibqg5u6lzgdb/include -isystem spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/mumps-5.4.0-3naioareijver7s2em5sdsejh7s74kvf/include -isystem /include -isystem spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include -O3 -DNDEBUG -fPIC -D__INSDIR__=\"\" -std=gnu++11 -MD -MT src/utils/CMakeFiles/UTILS_obj_static.dir/utils.cpp.o -MF CMakeFiles/UTILS_obj_static.dir/utils.cpp.o.d -o CMakeFiles/UTILS_obj_static.dir/utils.cpp.o -c /exago/src/utils/utils.cpp In file included from /exago/src/utils/utils.cpp:2: In file included from /exago/include/common.h:8: In file included from /spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include/petsc.h:5: In file included from /spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include/petscbag.h:4: /spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include/petscsys.h:211:6: error: "PETSc was configured with MPICH but now appears to be compiling using a non-MPICH mpi.h" # error "PETSc was configured with MPICH but now appears to be compiling using a non-MPICH mpi.h" ^ I've replaced some possibly sensitive paths with text in angle brackets for a description, eg . Is this a known issue? Is it apparent from this text that we're doing anything wrong? Our source may be found at this repository: https://gitlab.pnnl.gov/exasgd/frameworks/exago. [https://gitlab.pnnl.gov/assets/gitlab_logo-7ae504fe4f68fdebb3c2034e36621930cd36ea87924c11ff65dbcb8ed50dca58.png] ExaSGD / Frameworks / ExaGO ? GitLab PNNL GitLab - Scientific Software Collaboration Platform gitlab.pnnl.gov Cheers, Asher Mancinelli Research Computing Pacific Northwest National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Mon Aug 23 16:07:59 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 23 Aug 2021 16:07:59 -0500 Subject: [petsc-users] [petsc-maint] PETSc + Cray MPICH Build Error in User Code In-Reply-To: References: Message-ID: Could you send the configure.log of your petsc build? --Junchao Zhang On Mon, Aug 23, 2021 at 3:37 PM Mancinelli, Asher J via petsc-maint < petsc-maint at mcs.anl.gov> wrote: > Hello all, > > We are attempting to build an application that relies on PETSc with Cray > MPICH, and we're encountering the following build-time error: > > cd /exago/build/src/utils && hipcc -DHAVE_HIP -I/exago/include > -I/exago/build > -Ispack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/magma-2.6.1-l3ckgjdgsf4yhyzzb5zaibqg5u6lzgdb/include > -isystem > spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/mumps-5.4.0-3naioareijver7s2em5sdsejh7s74kvf/include > -isystem /include -isystem > spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include > -O3 -DNDEBUG -fPIC -D__INSDIR__=\"\" -std=gnu++11 -MD -MT > src/utils/CMakeFiles/UTILS_obj_static.dir/utils.cpp.o -MF > CMakeFiles/UTILS_obj_static.dir/utils.cpp.o.d -o > CMakeFiles/UTILS_obj_static.dir/utils.cpp.o -c > /exago/src/utils/utils.cpp > In file included from /exago/src/utils/utils.cpp:2: > In file included from /exago/include/common.h:8: > In file included from > /spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include/petsc.h:5: > In file included from > /spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include/petscbag.h:4: > /spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include/petscsys.h:211:6: > error: "PETSc was configured with MPICH but now appears to be compiling > using a non-MPICH mpi.h" > # error "PETSc was configured with MPICH but now appears to be > compiling using a non-MPICH mpi.h" > ^ > > I've replaced some possibly sensitive paths with text in angle brackets > for a description, eg . > > Is this a known issue? Is it apparent from this text that we're doing > anything wrong? > > Our source may be found at this repository: > https://gitlab.pnnl.gov/exasgd/frameworks/exago. > > ExaSGD / Frameworks / ExaGO ? GitLab > > PNNL GitLab - Scientific Software Collaboration Platform > gitlab.pnnl.gov > > > Cheers, > > *Asher Mancinelli* > > Research Computing > > *Pacific Northwest National Laboratory* > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janne.ruuskanen at tuni.fi Tue Aug 24 04:47:23 2021 From: janne.ruuskanen at tuni.fi (Janne Ruuskanen (TAU)) Date: Tue, 24 Aug 2021 09:47:23 +0000 Subject: [petsc-users] issues with mpi uni In-Reply-To: References: Message-ID: PETSc was built without mpi with the command: ./configure --with-openmp --with-mpi=0 --with-shared-libraries=1 --with-mumps-serial=1 --download-mumps --download-openblas --download-metis --download-slepc --with-debugging=0 --with-scalar-type=real --with-x=0 COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3'; so the MPI_UNI mpi wrapper of petsc collides in names with the actual MPI used to compile sparselizard. 
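For reference, the rebuild Satish's question points at (configuring PETSc against the very OpenMPI that sparselizard is compiled with) could look roughly like the sketch below; the MPI path is a placeholder, and with a real MPI the parallel MUMPS build pulls in ScaLAPACK, so --download-scalapack takes the place of --with-mumps-serial=1:

./configure --with-mpi-dir=/path/to/sparselizard/openmpi \
  --with-openmp --with-shared-libraries=1 \
  --download-mumps --download-scalapack \
  --download-openblas --download-metis --download-slepc \
  --with-debugging=0 --with-scalar-type=real --with-x=0 \
  COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3'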
-Janne -----Original Message----- From: Satish Balay Sent: Monday, August 23, 2021 4:45 PM To: Janne Ruuskanen (TAU) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] issues with mpi uni Did you build PETSc with the same openmpi [as what sparselizard is built with]? Satish On Mon, 23 Aug 2021, Janne Ruuskanen (TAU) wrote: > Hi, > > Assumingly, I have an issue using petsc and openmpi together in my c++ code. > > See the code there: > https://github.com/halbux/sparselizard/blob/master/src/slmpi.cpp > > > So when I run: > > slmpi::initialize(); > slmpi::count(); > slmpi::finalize(); > > I get the following error: > > > *** The MPI_Comm_size() function was called before MPI_INIT was invoked. > *** This is disallowed by the MPI standard. > *** Your MPI job will now abort. > > > Have you experienced anything similar with people trying to link openmpi and petsc into the same executable? > > Best regards, > Janne Ruuskanen > From knepley at gmail.com Tue Aug 24 06:06:50 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 24 Aug 2021 07:06:50 -0400 Subject: [petsc-users] issues with mpi uni In-Reply-To: References: Message-ID: On Tue, Aug 24, 2021 at 5:47 AM Janne Ruuskanen (TAU) < janne.ruuskanen at tuni.fi> wrote: > PETSc was built without mpi with the command: > > > ./configure --with-openmp --with-mpi=0 --with-shared-libraries=1 > --with-mumps-serial=1 --download-mumps --download-openblas --download-metis > --download-slepc --with-debugging=0 --with-scalar-type=real --with-x=0 > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3'; > > so the MPI_UNI mpi wrapper of petsc collides in names with the actual MPI > used to compile sparselizard. > Different MPI implementations are not ABI compatible and therefore cannot be used in the same program. You must build all libraries in an executable with the same MPI. Thus, rebuild PETSc with the same MPI as saprselizard. Thanks, Matt > -Janne > > > -----Original Message----- > From: Satish Balay > Sent: Monday, August 23, 2021 4:45 PM > To: Janne Ruuskanen (TAU) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] issues with mpi uni > > Did you build PETSc with the same openmpi [as what sparselizard is built > with]? > > Satish > > On Mon, 23 Aug 2021, Janne Ruuskanen (TAU) wrote: > > > Hi, > > > > Assumingly, I have an issue using petsc and openmpi together in my c++ > code. > > > > See the code there: > > https://github.com/halbux/sparselizard/blob/master/src/slmpi.cpp > > > > > > So when I run: > > > > slmpi::initialize(); > > slmpi::count(); > > slmpi::finalize(); > > > > I get the following error: > > > > > > *** The MPI_Comm_size() function was called before MPI_INIT was invoked. > > *** This is disallowed by the MPI standard. > > *** Your MPI job will now abort. > > > > > > Have you experienced anything similar with people trying to link openmpi > and petsc into the same executable? > > > > Best regards, > > Janne Ruuskanen > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sayosale at hotmail.com Tue Aug 24 07:47:12 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Tue, 24 Aug 2021 12:47:12 +0000 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: Dear Matthew and Jose, Apologies for the delayed reply, I had a couple of unforeseen days off this week. Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS to solve linear systems (the code is using a distributed MPI matrix to solve the generalised non-Hermitian complex problem). I have tried the gdb debugger as per Matthew's suggestion. Just to note in case someone else is following this that at first it didn't work (couldn't 'attach') , but after some googling I found a tip suggesting the command; echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope which seemed to get it working. I then first ran the debugger on the small matrix case that worked. That stopped in gdb almost immediately after starting execution with a report regarding 'nanosleep.c': ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. However, issuing the 'cont' command again caused the program to run through to the end of the execution w/out any problems, and with correct looking results, so I am guessing this error is not particularly important. I then tried the same debugging procedure on the large matrix case that fails. The code again stopped almost immediately after the start of execution with the same nanosleep error as before, and I was able to set the program running again with 'cont' (see full output below). I was running the code with 4 MPI processes, and so had 4 gdb windows appear. Thereafter the code ran for sometime until completing the matrix construction, and then one of the gdb process windows printed a Program terminated with signal SIGKILL, Killed. The program no longer exists. message. I then typed 'where' into this terminal but just received the message No stack. The other gdb windows basically seemed to be left in limbo until I issued the 'quit' command in the SIGKILL, and then they vanished. I paste the full output from the gdb window that recorded the SIGKILL below here. I guess it is necessary to somehow work out where the SIGKILL originates from ? Thanks once again, Dan. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./stab1.exe... Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919 Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for more, q to quit, c to continue without paging--cont /intel64_lin/libmkl_intel_lp64.so... 
(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... 
(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) Reading symbols from /lib64/ld-linux-x86-64.so.2... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, rem=rem at entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. (gdb) cont Continuing. [New Thread 0x7f9e49c02780 (LWP 676559)] [New Thread 0x7f9e49400800 (LWP 676560)] [New Thread 0x7f9e48bfe880 (LWP 676562)] [Thread 0x7f9e48bfe880 (LWP 676562) exited] [Thread 0x7f9e49400800 (LWP 676560) exited] [Thread 0x7f9e49c02780 (LWP 676559) exited] Program terminated with signal SIGKILL, Killed. The program no longer exists. (gdb) where No stack. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ________________________________ From: Matthew Knepley Sent: Friday, August 20, 2021 2:12 PM To: dazza simplythebest Cc: Jose E. 
Roman ; PETSc Subject: Re: [petsc-users] Improving efficiency of slepc usage On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest > wrote: Dear Jose, Many thanks for your response, I have been investigating this issue with a few more calculations today, hence the slightly delayed response. The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original larger matrix but to lower accuracy. Results Small matrix (N= 21168) - everything good! This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target and also for a zero -eps_target. Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert I have just double checked again that the code does run properly when we use the -eps_largest_real option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line with expectations. Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , whether or not I use a non-zero -eps_target. For reference this is the command line used : -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt To be precise the code crashes soon after calling EPSSolve (it successfully calls MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message as soon as EPSsolve is called. Hi Dan, It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with -start_in_debugger which should launch the debugger (usually), and then type cont to continue, and then where to get the stack trace when it crashes, or 'bt' on lldb. Thanks, Matt Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using -eps_largest_real ? The fact that the program works and produces correct results when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden shutdown? 
For your reference when running the large matrix case with the -eps_largest_real option I am using about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially more memory for example ? I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to get that working for the full-size problem. Many thanks and best wishes, Dan. ________________________________ From: Jose E. Roman > Sent: Thursday, August 19, 2021 7:58 AM To: dazza simplythebest > Cc: PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 Jose > El 19 ago 2021, a las 7:12, dazza simplythebest > escribi?: > > Dear All, > I am planning on using slepc to do a large number of eigenvalue calculations > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > and on smaller test problems everything is working well; the matrices are efficiently and > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > towards now solving the full-size 'production run' problems, and would appreciate some > general advice on how to improve the solver's performance. > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > although in other cases I will also be interested in finding the eigenvalues whose real part > is close to zero. > > A) > Calling slepc 's EPS solver with the following options: > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > (examining the monitor output it did appear to be very slowly approaching convergence). > > B) > On the same problem I have also tried a shift-invert transformation using the options > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > In the case of A) I can see from watching the slepc videos that increasing ncv > may help, but I am wondering , since 600 is a large number of iterations, whether there > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > In the case of B), I guess there must be some mistake in these command line options? 
> Again, any advice will be greatly appreciated. > Best wishes, Dan. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Aug 24 10:18:21 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 24 Aug 2021 10:18:21 -0500 (CDT) Subject: [petsc-users] issues with mpi uni In-Reply-To: References: Message-ID: MPI_UNI is name-spaced to avoid such conflicts. Don't know about mumps. But there could be corner cases where this issue comes up. And its best to have the same MPI across all packages that go into a binary anyway. Satish On Tue, 24 Aug 2021, Matthew Knepley wrote: > On Tue, Aug 24, 2021 at 5:47 AM Janne Ruuskanen (TAU) < > janne.ruuskanen at tuni.fi> wrote: > > > PETSc was built without mpi with the command: > > > > > > ./configure --with-openmp --with-mpi=0 --with-shared-libraries=1 > > --with-mumps-serial=1 --download-mumps --download-openblas --download-metis > > --download-slepc --with-debugging=0 --with-scalar-type=real --with-x=0 > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3'; > > > > so the MPI_UNI mpi wrapper of petsc collides in names with the actual MPI > > used to compile sparselizard. > > > > Different MPI implementations are not ABI compatible and therefore cannot > be used in the same program. You must > build all libraries in an executable with the same MPI. Thus, rebuild PETSc > with the same MPI as saprselizard. > > Thanks, > > Matt > > > > -Janne > > > > > > -----Original Message----- > > From: Satish Balay > > Sent: Monday, August 23, 2021 4:45 PM > > To: Janne Ruuskanen (TAU) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] issues with mpi uni > > > > Did you build PETSc with the same openmpi [as what sparselizard is built > > with]? > > > > Satish > > > > On Mon, 23 Aug 2021, Janne Ruuskanen (TAU) wrote: > > > > > Hi, > > > > > > Assumingly, I have an issue using petsc and openmpi together in my c++ > > code. > > > > > > See the code there: > > > https://github.com/halbux/sparselizard/blob/master/src/slmpi.cpp > > > > > > > > > So when I run: > > > > > > slmpi::initialize(); > > > slmpi::count(); > > > slmpi::finalize(); > > > > > > I get the following error: > > > > > > > > > *** The MPI_Comm_size() function was called before MPI_INIT was invoked. > > > *** This is disallowed by the MPI standard. > > > *** Your MPI job will now abort. > > > > > > > > > Have you experienced anything similar with people trying to link openmpi > > and petsc into the same executable? > > > > > > Best regards, > > > Janne Ruuskanen > > > > > > > > > From knepley at gmail.com Tue Aug 24 10:59:23 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 24 Aug 2021 11:59:23 -0400 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest wrote: > > Dear Matthew and Jose, > Apologies for the delayed reply, I had a couple of unforeseen days off > this week. > Firstly regarding Jose's suggestion re: MUMPS, the program is already > using MUMPS > to solve linear systems (the code is using a distributed MPI matrix to > solve the generalised > non-Hermitian complex problem). > > I have tried the gdb debugger as per Matthew's suggestion. 
> Just to note in case someone else is following this that at first it > didn't work (couldn't 'attach') , > but after some googling I found a tip suggesting the command; > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > which seemed to get it working. > > *I then first ran the debugger on the small matrix case that worked.* > That stopped in gdb almost immediately after starting execution > with a report regarding 'nanosleep.c': > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > However, issuing the 'cont' command again caused the program to run > through to the end of the > execution w/out any problems, and with correct looking results, so I am > guessing this error > is not particularly important. > We do that on purpose when the debugger starts up. Typing 'cont' is correct. > *I then tried the same debugging procedure on the large matrix case that > fails.* > The code again stopped almost immediately after the start of execution > with > the same nanosleep error as before, and I was able to set the program > running > again with 'cont' (see full output below). I was running the code with 4 > MPI processes, > and so had 4 gdb windows appear. Thereafter the code ran for sometime > until completing the > matrix construction, and then one of the gdb process windows printed a > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > message. I then typed 'where' into this terminal but just received the > message > No stack. > I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you have a large structure on the stack? Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you the prompt and then "where". Thanks, Matt > The other gdb windows basically seemed to be left in limbo until I issued > the 'quit' > command in the SIGKILL, and then they vanished. > > I paste the full output from the gdb window that recorded the SIGKILL > below here. > I guess it is necessary to somehow work out where the SIGKILL originates > from ? > > Thanks once again, > Dan. > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > Copyright (C) 2020 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later < > http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. > Type "show copying" and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from ./stab1.exe... > Attaching to program: > /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, > process 675919 > Reading symbols from > /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... 
> Reading symbols from > /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for > more, q to quit, c to continue without paging--cont > /intel64_lin/libmkl_intel_lp64.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > Reading symbols from > /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... 
> (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > Reading symbols from /lib64/ld-linux-x86-64.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, > clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, > rem=rem at entry=0x7ffdc641a9a0) at > ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or > directory. > (gdb) cont > Continuing. > [New Thread 0x7f9e49c02780 (LWP 676559)] > [New Thread 0x7f9e49400800 (LWP 676560)] > [New Thread 0x7f9e48bfe880 (LWP 676562)] > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > [Thread 0x7f9e49400800 (LWP 676560) exited] > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > (gdb) where > No stack. 
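For anyone else following the thread, the whole attach-and-backtrace sequence described above boils down to a few commands (a sketch only: the launcher, executable name, and process count are simply the ones used in this thread, and the ptrace_scope step is only needed on systems where gdb initially refuses to attach):

  echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope    # allow gdb to attach to a running process
  mpiexec -n 4 ./stab1.exe -start_in_debugger             # launches one gdb window per MPI rank
  (gdb) cont     # the initial stop inside clock_nanosleep is deliberate, just continue
  (gdb) where    # after the crash, or after Ctrl-C on the ranks that were not killed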
> > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - > > ------------------------------ > *From:* Matthew Knepley > *Sent:* Friday, August 20, 2021 2:12 PM > *To:* dazza simplythebest > *Cc:* Jose E. Roman ; PETSc > *Subject:* Re: [petsc-users] Improving efficiency of slepc usage > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest > wrote: > > Dear Jose, > Many thanks for your response, I have been investigating this issue > with a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow > an easier exploration of things > I first downsized the resolution of the underlying fluid solver while > keeping all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the > same physical problem as the original > larger matrix but to lower accuracy. > > *Results* > > *Small matrix (N= 21168) - everything good!* > This converged when using the -eps_largest_real approach (taking 92 > iterations for nev=10, > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert > approach, converging > very impressively in a single iteration ! Interestingly it did this both > for a non-zero -eps_target > and also for a zero -eps_target. > > *Large matrix (N=50400)- works for -eps_largest_real , fails for st_type > sinvert * > I have just double checked again that the code does run properly when we > use the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, > tol= -eps_tol 5.0e-4 , ncv = 300) > and with these parameters convergence was obtained in 164 iterations, > which took 6 hours on the > machine I was running it on. Furthermore the eigenvalues seem to be > ballpark correct; for this large > higher resolution case (although with lower slepc tolerance) we obtain > 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same > physical problem but at lower resolution case) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which > means the agreement is in line > with expectations. > > *Unfortunately though the code does still crash though when I try to do > shift-invert for the large matrix case *, > whether or not I use a non-zero -eps_target. For reference this is the > command line used : > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 > -st_type sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it > successfully calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and > EPSSetFromOptions). > By crashes I mean that I do not even get any error messages from > slepc/PETSC, and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran > 'KILLED BY SIGNAL: 9 (Killed)' message > as soon as EPSsolve is called. > > > Hi Dan, > > It would help track this error down if we had a stack trace. You can get a > stack trace from the debugger. You run with > > -start_in_debugger > > which should launch the debugger (usually), and then type > > cont > > to continue, and then > > where > > to get the stack trace when it crashes, or 'bt' on lldb. > > Thanks, > > Matt > > > Do you have any ideas as to why this larger matrix case should fail when > using shift-invert but succeed when using > -eps_largest_real ? 
The fact that the program works and produces correct > results > when using the -eps_largest_real option suggests that there is probably > nothing wrong with the specification > of the problem or the matrices ? It is strange how there is no error > message from slepc / Petsc ... the > only idea I have at the moment is that perhaps max memory has been > exceeded, which could cause such a sudden > shutdown? For your reference when running the large matrix case with the > -eps_largest_real option I am using > about 36 GB of the 148GB available on this machine - does the shift > invert approach require substantially > more memory for example ? > > I would be very grateful if you have any suggestions to resolve this > issue or even ways to clarify it further, > the performance I have seen with the shift-invert for the small matrix is > so impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > ------------------------------ > *From:* Jose E. Roman > *Sent:* Thursday, August 19, 2021 7:58 AM > *To:* dazza simplythebest > *Cc:* PETSc > *Subject:* Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have > small magnitude. I would not say 600 iterations is a lot, you probably need > many more. In most cases, approach B) is better because it improves > convergence of eigenvalues close to the target, but it requires prior > knowledge of your spectrum distribution in order to choose an appropriate > target. > > In B) what do you mean that it crashes. If you get an error about > factorization, it means that your A-matrix is singular, In that case, try > using a nonzero target -eps_target 0.1 > > Jose > > > > El 19 ago 2021, a las 7:12, dazza simplythebest > escribi?: > > > > Dear All, > > I am planning on using slepc to do a large number of > eigenvalue calculations > > of a generalized eigenvalue problem, called from a program written in > fortran using MPI. > > Thus far I have successfully installed the slepc/PETSc software, both > locally and on a cluster, > > and on smaller test problems everything is working well; the matrices > are efficiently and > > correctly constructed and slepc returns the correct spectrum. I am just > now starting to move > > towards now solving the full-size 'production run' problems, and would > appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx > whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will > be tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues > with the largest real part, > > although in other cases I will also be interested in finding the > eigenvalues whose real part > > is close to zero. > > > > A) > > Calling slepc 's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol > 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any > eigenvalues within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly > approaching convergence). 
> > > > B) > > On the same problem I have also tried a shift-invert transformation > using the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > -in this case the code crashed at the point it tried to call slepc, so > perhaps I have incorrectly specified these options ? > > > > > > Does anyone have any suggestions as to how to improve this performance ( > or find out more about the problem) ? > > In the case of A) I can see from watching the slepc videos that > increasing ncv > > may help, but I am wondering , since 600 is a large number of > iterations, whether there > > maybe something else going on - e.g. perhaps some alternative > preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command > line options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsabhishek1ags at gmail.com Tue Aug 24 22:03:31 2021 From: gsabhishek1ags at gmail.com (Abhishek G.S.) Date: Wed, 25 Aug 2021 08:33:31 +0530 Subject: [petsc-users] Static Library based app for petsc Message-ID: Hi, I am trying to develop a static-library-based app using petsc. The structure goes as, . ??? benchmarks ? ??? Test1 ? ??? main.cpp ? ??? Makefile ??? libTest ??? build ??? CMakeLists.txt ??? include ? ??? test.cpp ? ??? test.h ??? lib ??? libTest.a While this code compiles, I am unable to create a minimal working example for the same. The aim is to just print "Hello World". Why is it that nothing prints?. Is it something to do with the PETSC wrapper for cout? Also, I would like to know whether it's a good idea to go ahead with this kind of code structure. Thanks for the help. Code: https://github.com/gsabhishek/PetscStaticLibraryApp.git Thanks for the help -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Aug 24 23:31:22 2021 From: jed at jedbrown.org (Jed Brown) Date: Tue, 24 Aug 2021 22:31:22 -0600 Subject: [petsc-users] Static Library based app for petsc In-Reply-To: References: Message-ID: <87v93u402t.fsf@jedbrown.org> PETSc does not "wrap" cout. Creating a library first, with an executable front-end that most/all initial users will use is generally good design. More users of the library emerge as people try to do more advanced/custom things that are not appropriate to do with the executable. "Abhishek G.S." writes: > Hi, > I am trying to develop a static-library-based app using petsc. The > structure goes as, > . > ??? benchmarks > ? ??? Test1 > ? ??? main.cpp > ? ??? Makefile > ??? libTest > ??? build Are build products meant to go into this build/ directory, not under lib/ as you have it? > ??? CMakeLists.txt > ??? include > ? ??? test.cpp cpp files would usually go under src/ or almost anywhere but in include/ > ? ??? test.h > ??? lib > ??? libTest.a > > While this code compiles, I am unable to create a minimal working example > for the same. The aim is to just print "Hello World". Why is it that > nothing prints?. 
Is it something to do with the PETSC wrapper for cout? > Also, I would like to know whether it's a good idea to go ahead with this > kind of code structure. > > Thanks for the help. > > Code: https://github.com/gsabhishek/PetscStaticLibraryApp.git > > Thanks for the help From gsabhishek1ags at gmail.com Tue Aug 24 23:59:32 2021 From: gsabhishek1ags at gmail.com (Abhishek G.S.) Date: Wed, 25 Aug 2021 10:29:32 +0530 Subject: [petsc-users] Static Library based app for petsc In-Reply-To: <87v93u402t.fsf@jedbrown.org> References: <87v93u402t.fsf@jedbrown.org> Message-ID: Thanks for the reply. On Wed, 25 Aug 2021 at 10:01, Jed Brown wrote: > PETSc does not "wrap" cout. > What I meant here is that petsc has a custom output stream through PetscPrintf. I was wondering if that might have affected the stdout and hence the no print. The constructor in the libTest/include/test.h was just supposed to print a string when called in the benchmarks/Test1/main.cpp > Creating a library first, with an executable front-end that most/all > initial users will use is generally good design. More users of the library > emerge as people try to do more advanced/custom things that are not > appropriate to do with the executable. > Ok... This makes sense. (It would be great if you could point me towards some project whose structure I can borrow.) Why I did what I did was that if the petsc environment is encapsulated in the library, the rest of the code in the main.cpp would be outside. Since I was writing a code for a very small audience(mostly me) I thought it would be easier to debug if I was inside the petsc environment. > > "Abhishek G.S." writes: > > > Hi, > > I am trying to develop a static-library-based app using petsc. The > > structure goes as, > > . > > ??? benchmarks > > ? ??? Test1 > > ? ??? main.cpp > > ? ??? Makefile > > ??? libTest > > ??? build > > Are build products meant to go into this build/ directory, not under lib/ > as you have it? > I routed the static library output to the libTest/lib folder in the libTest/CMakeLists.txt. The /benchmarks/Test1/Makefile includes this to the ld path > > > ??? CMakeLists.txt > > ??? include > > ? ??? test.cpp > > cpp files would usually go under src/ or almost anywhere but in include/ > noted. > > > ? ??? test.h > > ??? lib > > ??? libTest.a > > > > While this code compiles, I am unable to create a minimal working example > > for the same. The aim is to just print "Hello World". Why is it that > > nothing prints?. Is it something to do with the PETSC wrapper for cout? > > Also, I would like to know whether it's a good idea to go ahead with this > > kind of code structure. > > > > Thanks for the help. > > > > Code: https://github.com/gsabhishek/PetscStaticLibraryApp.git > > > > Thanks for the help > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Aug 25 06:03:21 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 25 Aug 2021 07:03:21 -0400 Subject: [petsc-users] Static Library based app for petsc In-Reply-To: References: <87v93u402t.fsf@jedbrown.org> Message-ID: On Wed, Aug 25, 2021 at 1:00 AM Abhishek G.S. wrote: > Thanks for the reply. > > On Wed, 25 Aug 2021 at 10:01, Jed Brown wrote: > >> PETSc does not "wrap" cout. >> > What I meant here is that petsc has a custom output stream through > PetscPrintf. I was wondering if that might have affected the stdout and > hence the no print. 
The constructor in the libTest/include/test.h was just > supposed to print a string when called in the benchmarks/Test1/main.cpp > PetscPrintf() just calls printf() underneath. > >> Creating a library first, with an executable front-end that most/all >> initial users will use is generally good design. More users of the library >> emerge as people try to do more advanced/custom things that are not >> appropriate to do with the executable. >> > > Ok... This makes sense. (It would be great if you could point me towards > some project whose structure I can borrow.) > Why I did what I did was that if the petsc environment is encapsulated in > the library, the rest of the code in the main.cpp would be outside. Since I > was writing a code for a very small audience(mostly me) I thought it would > be easier to debug if I was inside the petsc environment. > There are many codes that use PETSc in this way. For example, https://petsc.org/release/#related-toolkits-libraries-that-use-petsc Thanks, Matt > > > >> >> "Abhishek G.S." writes: >> >> > Hi, >> > I am trying to develop a static-library-based app using petsc. The >> > structure goes as, >> > . >> > ??? benchmarks >> > ? ??? Test1 >> > ? ??? main.cpp >> > ? ??? Makefile >> > ??? libTest >> > ??? build >> >> Are build products meant to go into this build/ directory, not under lib/ >> as you have it? >> > > I routed the static library output to the libTest/lib folder in the > libTest/CMakeLists.txt. > The /benchmarks/Test1/Makefile includes this to the ld path > > >> >> > ??? CMakeLists.txt >> > ??? include >> > ? ??? test.cpp >> >> cpp files would usually go under src/ or almost anywhere but in include/ >> > > noted. > > >> >> > ? ??? test.h >> > ??? lib >> > ??? libTest.a >> > >> > While this code compiles, I am unable to create a minimal working >> example >> > for the same. The aim is to just print "Hello World". Why is it that >> > nothing prints?. Is it something to do with the PETSC wrapper for cout? >> > Also, I would like to know whether it's a good idea to go ahead with >> this >> > kind of code structure. >> > >> > Thanks for the help. >> > >> > Code: https://github.com/gsabhishek/PetscStaticLibraryApp.git >> > >> > Thanks for the help >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sayosale at hotmail.com Wed Aug 25 07:11:58 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Wed, 25 Aug 2021 12:11:58 +0000 Subject: [petsc-users] Fw: Improving efficiency of slepc usage In-Reply-To: References: Message-ID: ________________________________ From: dazza simplythebest Sent: Wednesday, August 25, 2021 12:08 PM To: Matthew Knepley Subject: Re: [petsc-users] Improving efficiency of slepc usage ?Dear Matthew and Jose, I have derived a smaller program from the original program by constructing matrices of the same size, but filling their entries randomly instead of computing the correct fluid dynamics values just to allow faster experimentation. This modified code's behaviour seems to be similar, with the code again failing for the large matrix case with the SIGKILL error, so I first report results from that code here. Firstly I can confirm that I am using Fortran , and I am compiling with the intel compiler, which it seems places automatic arrays on the stack. 
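To illustrate the kind of declaration in question, here is a schematic sketch (the names are made up and this is not the actual routine): an array dimensioned by a dummy argument is an automatic array and, with ifort's defaults, is placed on the stack, whereas an allocatable array is placed on the heap.

      subroutine build_matrix(ndim)
        implicit none
        integer, intent(in)     :: ndim
        complex*16              :: work(ndim, ndim)    ! automatic array   -> lives on the stack
        complex*16, allocatable :: work_h(:, :)        ! allocatable array -> lives on the heap
        allocate(work_h(ndim, ndim))
        ! ... fill and use the arrays here ...
        deallocate(work_h)
      end subroutine build_matrix

If a too-large automatic array of this sort really is the culprit, then making it allocatable, raising the stack limit (ulimit -s), or compiling with ifort's -heap-arrays option should presumably avoid the overflow.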
The stacksize, as determined by ulimit -a, is reported to be : stack size (kbytes, -s) 8192 [1] Okay, so I followed your suggestion and used ctrl-c followed by 'where' in one of the non-SIGKILL gdb windows. I have pasted the output into the bottom of this email (see [1] output) - it does look like the problem occurs somewhere in the call to the MUMPS solver ? [2] I have also today gained access to another workstation, and so have tried running the (original) code on that machine. This new machine has two (more powerful) CPU nodes and a larger memory (both machines feature Intel Xeon processors). On this new machine the large matrix case again failed with the familiar SIGKILL report when I used 16 or 12 MPI processes, ran to the end w/out error for 4 or 6 MPI processes, and failed but with a PETSC error message when I used 8 MPI processes, which I have pasted below (see [2] output). Does this point to some sort of resource demand that exceeds some limit as the number of MPI processes increases ? Many thanks once again, Dan. [2] output [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in external library [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [0]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Error in external library [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [1]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [1]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [1]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [1]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [1]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Error in external library [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [2]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [2]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [2]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [2]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [2]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Error in external library [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [3]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [3]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [3]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [3]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [3]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [3]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [3]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [3]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [4]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [4]PETSC ERROR: Error in external library [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [4]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [4]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [4]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [4]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [4]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [4]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [4]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [4]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [4]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [5]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [5]PETSC ERROR: Error in external library [5]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [5]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [5]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [5]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [5]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [5]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [5]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [5]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [5]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [5]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [6]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [6]PETSC ERROR: Error in external library [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045 [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [6]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [6]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [6]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [6]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [6]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [6]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [6]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [6]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [6]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [7]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [7]PETSC ERROR: Error in external library [7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21841925 [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [7]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [7]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [7]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [7]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [7]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [7]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [7]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [7]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [7]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [0]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [0]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [0]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [0]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [0]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [0]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [0]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [0]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [1]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [1]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [1]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [1]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [2]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [2]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [2]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [2]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 
[3]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [1] output Continuing. [New Thread 0x7f6f5b2d2780 (LWP 794037)] [New Thread 0x7f6f5aad0800 (LWP 794040)] [New Thread 0x7f6f5a2ce880 (LWP 794041)] ^C Thread 1 "my.exe" received signal SIGINT, Interrupt. 0x00007f72904927b0 in ofi_fastlock_release_noop () from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so (gdb) where #0 0x00007f72904927b0 in ofi_fastlock_release_noop () from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so #1 0x00007f729049354b in ofi_cq_readfrom () from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so #2 0x00007f728ffe8f0e in rxm_ep_do_progress () from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, comm=1, flag=0x0, status=0xffffffffffffffff) at /usr/include/rdma/fi_tagged.h:109 #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) at ../../src/binding/fortran/mpif_h/iprobef.c:276 #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, blocking=, --Type for more, q to quit, c to continue without paging--cont irecv=, message_received=, msgsou=1, msgtag=-1, status=..., bufr=..., lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, root=, opassw=0, opeliw=0, itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, lrgroups=...) at zfac_process_message.F:730 #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., liw=, a=..., la=, nstk_steps=..., nbprocfils=..., nd=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., nstepsdone=1690339657, opass=, opeli=, nelva=50400, comp=259581, maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv=, noffnegpv=, nb22t1=, nb22t2=, nbtiny=, det_exp=, det_mant=, det_sign=, ptrist=..., ptrast=..., pimaster=..., pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., lpool=, rinfo=, posfac=, iwpos=, lrlu=, iptrlu=, lrlus=, leaf=, nbroot=, nbrtot=, uu=, icntl=, ptlust=..., ptrfac=..., info=, keep=, keep8=, procnode_steps=..., slavef=, myid=, comm_nodes=, myid_nodes=, bufr=..., lbufr=0, lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, lrgroups=...) 
at zfac_par_m.F:182 #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., la=, liw=, sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., ptrar=..., ldptrar=, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, icntl=, info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., slavef=-1889504640, comm_nodes=-2048052411, myid=, myid_nodes=-1683330500, bufr=..., lbufr=, lbufr_bytes=, zmumps_lbuf=, intarr=..., dblarr=..., root=, nelt=, frtptr=..., frtelt=..., comm_load=, ass_irecv=, seuil=, seuil_ldlt_niv2=, mem_distrib=, dkeep=, pivnul_list=..., lpn_list=, lrgroups=...) at zfac_b.F:243 #10 0x00007f7308610ff7 in zmumps_fac_driver (id=) at zfac_driver.F:2421 #11 0x00007f7308569256 in zmumps (id=) at zmumps_driver.F:1883 #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=, comm_f77=, n=, nblk=1, icntl=..., cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289 #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, A=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, mat=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 #20 0x00007f7309130462 in STSetUp (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., b_pet=..., jthisone=, isize=) at small_slepc_example_program.F:322 #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 #26 0x00000000004023f2 in main () #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0
, argc=14, argv=0x7ffda7b024e8, init=, fini=, rtld_fini=, stack_end=0x7ffda7b024d8) at ../csu/libc-start.c:308 #28 0x00000000004022fe in _start () ________________________________ From: Matthew Knepley Sent: Tuesday, August 24, 2021 3:59 PM To: dazza simplythebest Cc: Jose E. Roman ; PETSc Subject: Re: [petsc-users] Improving efficiency of slepc usage On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest > wrote: Dear Matthew and Jose, Apologies for the delayed reply, I had a couple of unforeseen days off this week. Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS to solve linear systems (the code is using a distributed MPI matrix to solve the generalised non-Hermitian complex problem). I have tried the gdb debugger as per Matthew's suggestion. Just to note in case someone else is following this that at first it didn't work (couldn't 'attach') , but after some googling I found a tip suggesting the command; echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope which seemed to get it working. I then first ran the debugger on the small matrix case that worked. That stopped in gdb almost immediately after starting execution with a report regarding 'nanosleep.c': ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. However, issuing the 'cont' command again caused the program to run through to the end of the execution w/out any problems, and with correct looking results, so I am guessing this error is not particularly important. We do that on purpose when the debugger starts up. Typing 'cont' is correct. I then tried the same debugging procedure on the large matrix case that fails. The code again stopped almost immediately after the start of execution with the same nanosleep error as before, and I was able to set the program running again with 'cont' (see full output below). I was running the code with 4 MPI processes, and so had 4 gdb windows appear. Thereafter the code ran for sometime until completing the matrix construction, and then one of the gdb process windows printed a Program terminated with signal SIGKILL, Killed. The program no longer exists. message. I then typed 'where' into this terminal but just received the message No stack. I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you have a large structure on the stack? Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you the prompt and then "where". Thanks, Matt The other gdb windows basically seemed to be left in limbo until I issued the 'quit' command in the SIGKILL, and then they vanished. I paste the full output from the gdb window that recorded the SIGKILL below here. I guess it is necessary to somehow work out where the SIGKILL originates from ? Thanks once again, Dan. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. 
This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./stab1.exe... Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919 Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for more, q to quit, c to continue without paging--cont /intel64_lin/libmkl_intel_lp64.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... 
(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) Reading symbols from /lib64/ld-linux-x86-64.so.2... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, rem=rem at entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. (gdb) cont Continuing. 
[New Thread 0x7f9e49c02780 (LWP 676559)] [New Thread 0x7f9e49400800 (LWP 676560)] [New Thread 0x7f9e48bfe880 (LWP 676562)] [Thread 0x7f9e48bfe880 (LWP 676562) exited] [Thread 0x7f9e49400800 (LWP 676560) exited] [Thread 0x7f9e49c02780 (LWP 676559) exited] Program terminated with signal SIGKILL, Killed. The program no longer exists. (gdb) where No stack. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ________________________________ From: Matthew Knepley > Sent: Friday, August 20, 2021 2:12 PM To: dazza simplythebest > Cc: Jose E. Roman >; PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest > wrote: Dear Jose, Many thanks for your response, I have been investigating this issue with a few more calculations today, hence the slightly delayed response. The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original larger matrix but to lower accuracy. Results Small matrix (N= 21168) - everything good! This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target and also for a zero -eps_target. Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert I have just double checked again that the code does run properly when we use the -eps_largest_real option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line with expectations. Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , whether or not I use a non-zero -eps_target. For reference this is the command line used : -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt To be precise the code crashes soon after calling EPSSolve (it successfully calls MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message as soon as EPSsolve is called. Hi Dan, It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with -start_in_debugger which should launch the debugger (usually), and then type cont to continue, and then where to get the stack trace when it crashes, or 'bt' on lldb. 
Thanks, Matt Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using -eps_largest_real ? The fact that the program works and produces correct results when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially more memory for example ? I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to get that working for the full-size problem. Many thanks and best wishes, Dan. ________________________________ From: Jose E. Roman > Sent: Thursday, August 19, 2021 7:58 AM To: dazza simplythebest > Cc: PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 Jose > El 19 ago 2021, a las 7:12, dazza simplythebest > escribi?: > > Dear All, > I am planning on using slepc to do a large number of eigenvalue calculations > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > and on smaller test problems everything is working well; the matrices are efficiently and > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > towards now solving the full-size 'production run' problems, and would appreciate some > general advice on how to improve the solver's performance. > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > although in other cases I will also be interested in finding the eigenvalues whose real part > is close to zero. > > A) > Calling slepc 's EPS solver with the following options: > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > (examining the monitor output it did appear to be very slowly approaching convergence). 
> > B) > On the same problem I have also tried a shift-invert transformation using the options > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > In the case of A) I can see from watching the slepc videos that increasing ncv > may help, but I am wondering , since 600 is a large number of iterations, whether there > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > In the case of B), I guess there must be some mistake in these command line options? > Again, any advice will be greatly appreciated. > Best wishes, Dan. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsharma4189 at gmail.com Wed Aug 25 08:07:34 2021 From: gsharma4189 at gmail.com (govind sharma) Date: Wed, 25 Aug 2021 18:37:34 +0530 Subject: [petsc-users] laplace_equation Message-ID: Hi, I want to solve the 2D Laplace equation, as a starting point, with petsc4py in parallel using mpi4py. Any examples or tutorials? Regards, Govind Sharma PhD scholar, Indian Institute of Technology, Delhi -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Wed Aug 25 08:40:57 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 25 Aug 2021 15:40:57 +0200 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: MUMPS documentation (section 8) indicates that the meaning of INFOG(1)=-9 is insufficient workspace. Try running with -st_mat_mumps_icntl_14 <percentage>, where <percentage> is the percentage by which you want to increase the workspace, e.g. 50 or 100 or more. See ex43.c for an example showing how to set this option in code. Jose > El 25 ago 2021, a las 14:11, dazza simplythebest escribió: > > > > From: dazza simplythebest > Sent: Wednesday, August 25, 2021 12:08 PM > To: Matthew Knepley > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > Dear Matthew and Jose, > I have derived a smaller program from the original program by constructing > matrices of the same size, but filling their entries randomly instead of computing the correct > fluid dynamics values just to allow faster experimentation. This modified code's behaviour seems > to be similar, with the code again failing for the large matrix case with the SIGKILL error, so I first report > results from that code here. Firstly I can confirm that I am using Fortran, and I am compiling with the > intel compiler, which it seems places automatic arrays on the stack. The stacksize, as determined > by ulimit -a, is reported to be : > stack size (kbytes, -s) 8192 > > [1] Okay, so I followed your suggestion and used ctrl-c followed by 'where' in one of the non-SIGKILL gdb windows. > I have pasted the output into the bottom of this email (see [1] output) - it does look like the problem occurs somewhere in the call > to the MUMPS solver ?
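For reference, a minimal sketch of how Jose's suggestion above, raising ICNTL(14) from code rather than from the command line, might look in C for a shift-and-invert EPS. The helper name SetMumpsWorkspace and the call sequence are only an illustration of the usual ST -> KSP -> PC -> factor-matrix chain, not a copy of ex43.c; the percentage passed in (e.g. 50 or 100) is the value Jose suggests trying.

#include <slepceps.h>

static PetscErrorCode SetMumpsWorkspace(EPS eps, PetscInt percentage)
{
  ST             st;
  KSP            ksp;
  PC             pc;
  Mat            F;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
  ierr = STSetType(st,STSINVERT);CHKERRQ(ierr);               /* shift-and-invert spectral transform */
  ierr = STGetKSP(st,&ksp);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);            /* direct solve of the shifted system */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc,MATSOLVERMUMPS);CHKERRQ(ierr);
  ierr = PCFactorSetUpMatSolverType(pc);CHKERRQ(ierr);        /* create the MUMPS factor matrix now */
  ierr = PCFactorGetMatrix(pc,&F);CHKERRQ(ierr);
  ierr = MatMumpsSetIcntl(F,14,percentage);CHKERRQ(ierr);     /* ICNTL(14): % increase of estimated workspace */
  /* MatMumpsSetIcntl(F,23,mb) would instead cap the working memory (MB) per process; mb is a placeholder. */
  PetscFunctionReturn(0);
}

Called after EPSSetOperators() and before EPSSolve(), this should be roughly equivalent to the command-line route -st_type sinvert -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps -st_mat_mumps_icntl_14 <percentage>.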
> > [2] I have also today gained access to another workstation, and so have tried running the (original) code on that machine. > This new machine has two (more powerful) CPU nodes and a larger memory (both machines feature Intel Xeon processors). > On this new machine the large matrix case again failed with the familiar SIGKILL report when I used 16 or 12 MPI > processes, ran to the end w/out error for 4 or 6 MPI processes, and failed but with a PETSC error message > when I used 8 MPI processes, which I have pasted below (see [2] output). Does this point to some sort of resource > demand that exceeds some limit as the number of MPI processes increases ? > > Many thanks once again, > Dan. > > [2] output > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [0]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [1]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [1]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [1]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [1]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [2]PETSC ERROR: Error in external library > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [2]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [2]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [2]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [2]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [2]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [3]PETSC ERROR: Error in external library > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [3]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [3]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [3]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [3]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [3]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [3]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [3]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [4]PETSC ERROR: Error in external library > [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [4]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [4]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [4]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [4]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [4]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [4]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [4]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [4]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [5]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [5]PETSC ERROR: Error in external library > [5]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [5]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [5]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [5]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [5]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [5]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [5]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [5]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [5]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [5]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [6]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [6]PETSC ERROR: Error in external library > [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045 > > [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [6]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [6]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [6]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [6]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [6]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [6]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [6]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [6]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [6]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [7]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [7]PETSC ERROR: Error in external library > [7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21841925 > > [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [7]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [7]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [7]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [7]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [7]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [7]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [7]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [7]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [7]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [0]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [0]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [0]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [0]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [0]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [0]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [0]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [0]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [1]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [1]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [1]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [1]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [2]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [2]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [2]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [2]PETSC ERROR: #9 EPSSolve() at 
/data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [3]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > > > [1] output > > Continuing. > [New Thread 0x7f6f5b2d2780 (LWP 794037)] > [New Thread 0x7f6f5aad0800 (LWP 794040)] > [New Thread 0x7f6f5a2ce880 (LWP 794041)] > ^C > Thread 1 "my.exe" received signal SIGINT, Interrupt. > 0x00007f72904927b0 in ofi_fastlock_release_noop () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > (gdb) where > #0 0x00007f72904927b0 in ofi_fastlock_release_noop () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #1 0x00007f729049354b in ofi_cq_readfrom () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #2 0x00007f728ffe8f0e in rxm_ep_do_progress () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, > comm=1, flag=0x0, status=0xffffffffffffffff) > at /usr/include/rdma/fi_tagged.h:109 > #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, > v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) > at ../../src/binding/fortran/mpif_h/iprobef.c:276 > #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, > blocking=, > > --Type for more, q to quit, c to continue without paging--cont > irecv=, message_received=, msgsou=1, msgtag=-1, status=..., bufr=..., lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, root=, opassw=0, opeliw=0, itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, lrgroups=...) 
at zfac_process_message.F:730 > #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., liw=, a=..., la=, nstk_steps=..., nbprocfils=..., nd=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., nstepsdone=1690339657, opass=, opeli=, nelva=50400, comp=259581, maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv=, noffnegpv=, nb22t1=, nb22t2=, nbtiny=, det_exp=, det_mant=, det_sign=, ptrist=..., ptrast=..., pimaster=..., pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., lpool=, rinfo=, posfac=, iwpos=, lrlu=, iptrlu=, lrlus=, leaf=, nbroot=, nbrtot=, uu=, icntl=, ptlust=..., ptrfac=..., info=, keep=, keep8=, procnode_steps=..., slavef=, myid=, comm_nodes=, myid_nodes=, bufr=..., lbufr=0, lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, lrgroups=...) at zfac_par_m.F:182 > #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., la=, liw=, sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., ptrar=..., ldptrar=, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, icntl=, info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., slavef=-1889504640, comm_nodes=-2048052411, myid=, myid_nodes=-1683330500, bufr=..., lbufr=, lbufr_bytes=, zmumps_lbuf=, intarr=..., dblarr=..., root=, nelt=, frtptr=..., frtelt=..., comm_load=, ass_irecv=, seuil=, seuil_ldlt_niv2=, mem_distrib=, dkeep=, pivnul_list=..., lpn_list=, lrgroups=...) at zfac_b.F:243 > #10 0x00007f7308610ff7 in zmumps_fac_driver (id=) at zfac_driver.F:2421 > #11 0x00007f7308569256 in zmumps (id=) at zmumps_driver.F:1883 > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=, comm_f77=, n=, nblk=1, icntl=..., cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289 > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, A=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, mat=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., b_pet=..., jthisone=, isize=) at small_slepc_example_program.F:322 > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 > #26 0x00000000004023f2 in main () > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0
, argc=14, argv=0x7ffda7b024e8, init=, fini=, rtld_fini=, stack_end=0x7ffda7b024d8) at ../csu/libc-start.c:308 > #28 0x00000000004022fe in _start () > > From: Matthew Knepley > Sent: Tuesday, August 24, 2021 3:59 PM > To: dazza simplythebest > Cc: Jose E. Roman ; PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest wrote: > > Dear Matthew and Jose, > Apologies for the delayed reply, I had a couple of unforeseen days off this week. > Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS > to solve linear systems (the code is using a distributed MPI matrix to solve the generalised > non-Hermitian complex problem). > > I have tried the gdb debugger as per Matthew's suggestion. > Just to note in case someone else is following this that at first it didn't work (couldn't 'attach') , > but after some googling I found a tip suggesting the command; > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > which seemed to get it working. > > I then first ran the debugger on the small matrix case that worked. > That stopped in gdb almost immediately after starting execution > with a report regarding 'nanosleep.c': > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > However, issuing the 'cont' command again caused the program to run through to the end of the > execution w/out any problems, and with correct looking results, so I am guessing this error > is not particularly important. > > We do that on purpose when the debugger starts up. Typing 'cont' is correct. > > I then tried the same debugging procedure on the large matrix case that fails. > The code again stopped almost immediately after the start of execution with > the same nanosleep error as before, and I was able to set the program running > again with 'cont' (see full output below). I was running the code with 4 MPI processes, > and so had 4 gdb windows appear. Thereafter the code ran for sometime until completing the > matrix construction, and then one of the gdb process windows printed a > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > message. I then typed 'where' into this terminal but just received the message > No stack. > > I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays > on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed > the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you > have a large structure on the stack? > > Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you > the prompt and then "where". > > Thanks, > > Matt > > The other gdb windows basically seemed to be left in limbo until I issued the 'quit' > command in the SIGKILL, and then they vanished. > > I paste the full output from the gdb window that recorded the SIGKILL below here. > I guess it is necessary to somehow work out where the SIGKILL originates from ? > > Thanks once again, > Dan. > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > Copyright (C) 2020 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later > This is free software: you are free to change and redistribute it. 
> There is NO WARRANTY, to the extent permitted by law. > Type "show copying" and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from ./stab1.exe... > Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919 > Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... > Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for more, q to quit, c to continue without paging--cont > /intel64_lin/libmkl_intel_lp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... 
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... > (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > Reading symbols from /lib64/ld-linux-x86-64.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... 
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, rem=rem at entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > (gdb) cont > Continuing. > [New Thread 0x7f9e49c02780 (LWP 676559)] > [New Thread 0x7f9e49400800 (LWP 676560)] > [New Thread 0x7f9e48bfe880 (LWP 676562)] > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > [Thread 0x7f9e49400800 (LWP 676560) exited] > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > (gdb) where > No stack. > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > From: Matthew Knepley > Sent: Friday, August 20, 2021 2:12 PM > To: dazza simplythebest > Cc: Jose E. Roman ; PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest wrote: > Dear Jose, > Many thanks for your response, I have been investigating this issue with a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things > I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original > larger matrix but to lower accuracy. > > Results > > Small matrix (N= 21168) - everything good! > This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging > very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target > and also for a zero -eps_target. > > Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert > I have just double checked again that the code does run properly when we use the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) > and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the > machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large > higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line > with expectations. > > Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , > whether or not I use a non-zero -eps_target. For reference this is the command line used : > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it successfully calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). 
> By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message > as soon as EPSsolve is called. > > Hi Dan, > > It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with > > -start_in_debugger > > which should launch the debugger (usually), and then type > > cont > > to continue, and then > > where > > to get the stack trace when it crashes, or 'bt' on lldb. > > Thanks, > > Matt > > Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using > -eps_largest_real ? The fact that the program works and produces correct results > when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification > of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the > only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden > shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using > about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially > more memory for example ? > > I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, > the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > From: Jose E. Roman > Sent: Thursday, August 19, 2021 7:58 AM > To: dazza simplythebest > Cc: PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. > > In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 > > Jose > > > > El 19 ago 2021, a las 7:12, dazza simplythebest escribi?: > > > > Dear All, > > I am planning on using slepc to do a large number of eigenvalue calculations > > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > > and on smaller test problems everything is working well; the matrices are efficiently and > > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > > towards now solving the full-size 'production run' problems, and would appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > > although in other cases I will also be interested in finding the eigenvalues whose real part > > is close to zero. 
> > > > A) > > Calling slepc 's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly approaching convergence). > > > > B) > > On the same problem I have also tried a shift-invert transformation using the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > > In the case of A) I can see from watching the slepc videos that increasing ncv > > may help, but I am wondering , since 600 is a large number of iterations, whether there > > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command line options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Wed Aug 25 11:17:09 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 25 Aug 2021 12:17:09 -0400 Subject: [petsc-users] laplace_equation In-Reply-To: References: Message-ID: On Wed, Aug 25, 2021 at 9:06 AM govind sharma wrote: > Hi, > > I want to solve 2D laplace equations at the starting level with petsc4py > in parallel using mpi4py. > > Any examples or tutorials? > https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/demo/poisson2d/poisson2d.py Thanks, Matt > Regards, > Govind Sharma > Phd scholar, Indian Institute of Technology, Delhi > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sayosale at hotmail.com Thu Aug 26 07:32:12 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Thu, 26 Aug 2021 12:32:12 +0000 Subject: [petsc-users] Improving efficiency of slepc usage -memory management when using shift-invert In-Reply-To: References: Message-ID: Dear Jose and Matthew, Many thanks for your assistance, this would seem to explain what the problem was. So judging by this test case, there seems to be a memory vs computational time tradeoff involved in choosing whether to shift-invert or not; the shift-invert will greatly reduce the number of required iterations ,but will require a higher memory cost ? 
I have been trying a few values of -st_mat_mumps_icntl_14 (and also the alternative -st_mat_mumps_icntl_23) today but have not yet been able to select one that fits onto the workstation I am using (although it seems that setting these parameters seems to guarantee that an error message is generated at least). Thus I will probably need to reduce the number of MPI processes and thereby reduce the memory requirement). In this regard the MUMPS documentation suggests that a hybrid MPI-OpenMP approach is optimum for their software, whereas I remember reading somewhere else that openmp threading was not a good choice for using PETSC, would you have any general advice on this ? I was thinking maybe that a version of slepc / petsc compiled against openmp, and with the number of threads set appropriately, but not explicitly using openmp directives in the user's code may be the way forward ? That way PETSC will (?) just ignore the threading whereas threading will be available to MUMPS when execution is passed to those routines ? Many thanks once again, Dan. ________________________________ From: Jose E. Roman Sent: Wednesday, August 25, 2021 1:40 PM To: dazza simplythebest Cc: PETSc Subject: Re: [petsc-users] Improving efficiency of slepc usage MUMPS documentation (section 8) indicates that the meaning of INFOG(1)=-9 is insuficient workspace. Try running with -st_mat_mumps_icntl_14 where is the percentage in which you want to increase the workspace, e.g. 50 or 100 or more. See ex43.c for an example showing how to set this option in code. Jose > El 25 ago 2021, a las 14:11, dazza simplythebest escribi?: > > > > From: dazza simplythebest > Sent: Wednesday, August 25, 2021 12:08 PM > To: Matthew Knepley > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > ?Dear Matthew and Jose, > I have derived a smaller program from the original program by constructing > matrices of the same size, but filling their entries randomly instead of computing the correct > fluid dynamics values just to allow faster experimentation. This modified code's behaviour seems > to be similar, with the code again failing for the large matrix case with the SIGKILL error, so I first report > results from that code here. Firstly I can confirm that I am using Fortran , and I am compiling with the > intel compiler, which it seems places automatic arrays on the stack. The stacksize, as determined > by ulimit -a, is reported to be : > stack size (kbytes, -s) 8192 > > [1] Okay, so I followed your suggestion and used ctrl-c followed by 'where' in one of the non-SIGKILL gdb windows. > I have pasted the output into the bottom of this email (see [1] output) - it does look like the problem occurs somewhere in the call > to the MUMPS solver ? > > [2] I have also today gained access to another workstation, and so have tried running the (original) code on that machine. > This new machine has two (more powerful) CPU nodes and a larger memory (both machines feature Intel Xeon processors). > On this new machine the large matrix case again failed with the familiar SIGKILL report when I used 16 or 12 MPI > processes, ran to the end w/out error for 4 or 6 MPI processes, and failed but with a PETSC error message > when I used 8 MPI processes, which I have pasted below (see [2] output). Does this point to some sort of resource > demand that exceeds some limit as the number of MPI processes increases ? > > Many thanks once again, > Dan. 
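On the memory side of Dan's questions above (how many MPI processes will fit, and whether -st_mat_mumps_icntl_23 is a better handle than -st_mat_mumps_icntl_14), one option is to let MUMPS report its own estimates from a run that does complete, for instance the 4- or 6-process runs that finish, and size the production run from those numbers. The following is a hedged sketch only: MatMumpsGetInfog() is the PETSc accessor, but the INFOG indices used here (16 for the estimated MB on the most loaded process, 17 for the estimated total) are quoted from memory and should be checked against section 8 of the MUMPS users' guide. For the hybrid MPI/OpenMP question, it may also be worth looking at the MATSOLVERMUMPS manual page for the -mat_mumps_use_omp_threads option, which (if memory serves) is available in builds configured --with-openmp, as this one is.

#include <slepceps.h>

/* Sketch: query MUMPS memory estimates after EPSSetUp() or EPSSolve() has
   succeeded, so the analysis phase has run; INFOG indices to be verified. */
static PetscErrorCode ReportMumpsMemory(EPS eps)
{
  ST             st;
  KSP            ksp;
  PC             pc;
  Mat            F;
  PetscInt       maxmb,totmb;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
  ierr = STGetKSP(st,&ksp);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCFactorGetMatrix(pc,&F);CHKERRQ(ierr);
  ierr = MatMumpsGetInfog(F,16,&maxmb);CHKERRQ(ierr);   /* estimated MB on the most loaded process (check manual) */
  ierr = MatMumpsGetInfog(F,17,&totmb);CHKERRQ(ierr);   /* estimated MB summed over all processes (check manual)  */
  ierr = PetscPrintf(PETSC_COMM_WORLD,"MUMPS memory estimates: %D MB max per process, %D MB total\n",maxmb,totmb);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

A per-process figure obtained this way on a run that fits gives something concrete to compare against the 148 GB available when deciding how many ranks the full-size problem can use.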
> > [2] output > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [0]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [1]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [1]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [1]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [1]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [2]PETSC ERROR: Error in external library > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [2]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [2]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [2]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [2]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [2]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [3]PETSC ERROR: Error in external library > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [3]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [3]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [3]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [3]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [3]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [3]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [3]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: --------------------- Error Message 
-------------------------------------------------------------- > [4]PETSC ERROR: Error in external library > [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [4]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [4]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [4]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [4]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [4]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [4]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [4]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [4]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [5]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [5]PETSC ERROR: Error in external library > [5]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [5]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [5]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [5]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [5]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [5]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [5]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [5]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [5]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [5]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [6]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [6]PETSC ERROR: Error in external library > [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045 > > [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [6]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [6]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [6]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [6]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [6]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [6]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [6]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [6]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [6]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [7]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [7]PETSC ERROR: Error in external library > [7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21841925 > > [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [7]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [7]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [7]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [7]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [7]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [7]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [7]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [7]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [7]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [0]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [0]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [0]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [0]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [0]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [0]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [0]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [0]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [1]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [1]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [1]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [1]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [2]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [2]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [2]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [2]PETSC ERROR: #9 EPSSolve() at 
/data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [3]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > > > [1] output > > Continuing. > [New Thread 0x7f6f5b2d2780 (LWP 794037)] > [New Thread 0x7f6f5aad0800 (LWP 794040)] > [New Thread 0x7f6f5a2ce880 (LWP 794041)] > ^C > Thread 1 "my.exe" received signal SIGINT, Interrupt. > 0x00007f72904927b0 in ofi_fastlock_release_noop () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > (gdb) where > #0 0x00007f72904927b0 in ofi_fastlock_release_noop () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #1 0x00007f729049354b in ofi_cq_readfrom () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #2 0x00007f728ffe8f0e in rxm_ep_do_progress () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, > comm=1, flag=0x0, status=0xffffffffffffffff) > at /usr/include/rdma/fi_tagged.h:109 > #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, > v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) > at ../../src/binding/fortran/mpif_h/iprobef.c:276 > #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, > blocking=, > > --Type for more, q to quit, c to continue without paging--cont > irecv=, message_received=, msgsou=1, msgtag=-1, status=..., bufr=..., lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, root=, opassw=0, opeliw=0, itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, lrgroups=...) 
at zfac_process_message.F:730 > #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., liw=, a=..., la=, nstk_steps=..., nbprocfils=..., nd=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., nstepsdone=1690339657, opass=, opeli=, nelva=50400, comp=259581, maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv=, noffnegpv=, nb22t1=, nb22t2=, nbtiny=, det_exp=, det_mant=, det_sign=, ptrist=..., ptrast=..., pimaster=..., pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., lpool=, rinfo=, posfac=, iwpos=, lrlu=, iptrlu=, lrlus=, leaf=, nbroot=, nbrtot=, uu=, icntl=, ptlust=..., ptrfac=..., info=, keep=, keep8=, procnode_steps=..., slavef=, myid=, comm_nodes=, myid_nodes=, bufr=..., lbufr=0, lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, lrgroups=...) at zfac_par_m.F:182 > #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., la=, liw=, sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., ptrar=..., ldptrar=, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, icntl=, info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., slavef=-1889504640, comm_nodes=-2048052411, myid=, myid_nodes=-1683330500, bufr=..., lbufr=, lbufr_bytes=, zmumps_lbuf=, intarr=..., dblarr=..., root=, nelt=, frtptr=..., frtelt=..., comm_load=, ass_irecv=, seuil=, seuil_ldlt_niv2=, mem_distrib=, dkeep=, pivnul_list=..., lpn_list=, lrgroups=...) at zfac_b.F:243 > #10 0x00007f7308610ff7 in zmumps_fac_driver (id=) at zfac_driver.F:2421 > #11 0x00007f7308569256 in zmumps (id=) at zmumps_driver.F:1883 > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=, comm_f77=, n=, nblk=1, icntl=..., cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289 > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, A=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, mat=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., b_pet=..., jthisone=, isize=) at small_slepc_example_program.F:322 > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 > #26 0x00000000004023f2 in main () > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0
, argc=14, argv=0x7ffda7b024e8, init=, fini=, rtld_fini=, stack_end=0x7ffda7b024d8) at ../csu/libc-start.c:308 > #28 0x00000000004022fe in _start () > > From: Matthew Knepley > Sent: Tuesday, August 24, 2021 3:59 PM > To: dazza simplythebest > Cc: Jose E. Roman ; PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest wrote: > > Dear Matthew and Jose, > Apologies for the delayed reply, I had a couple of unforeseen days off this week. > Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS > to solve linear systems (the code is using a distributed MPI matrix to solve the generalised > non-Hermitian complex problem). > > I have tried the gdb debugger as per Matthew's suggestion. > Just to note in case someone else is following this that at first it didn't work (couldn't 'attach') , > but after some googling I found a tip suggesting the command; > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > which seemed to get it working. > > I then first ran the debugger on the small matrix case that worked. > That stopped in gdb almost immediately after starting execution > with a report regarding 'nanosleep.c': > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > However, issuing the 'cont' command again caused the program to run through to the end of the > execution w/out any problems, and with correct looking results, so I am guessing this error > is not particularly important. > > We do that on purpose when the debugger starts up. Typing 'cont' is correct. > > I then tried the same debugging procedure on the large matrix case that fails. > The code again stopped almost immediately after the start of execution with > the same nanosleep error as before, and I was able to set the program running > again with 'cont' (see full output below). I was running the code with 4 MPI processes, > and so had 4 gdb windows appear. Thereafter the code ran for sometime until completing the > matrix construction, and then one of the gdb process windows printed a > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > message. I then typed 'where' into this terminal but just received the message > No stack. > > I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays > on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed > the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you > have a large structure on the stack? > > Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you > the prompt and then "where". > > Thanks, > > Matt > > The other gdb windows basically seemed to be left in limbo until I issued the 'quit' > command in the SIGKILL, and then they vanished. > > I paste the full output from the gdb window that recorded the SIGKILL below here. > I guess it is necessary to somehow work out where the SIGKILL originates from ? > > Thanks once again, > Dan. > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > Copyright (C) 2020 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later > This is free software: you are free to change and redistribute it. 
> There is NO WARRANTY, to the extent permitted by law. > Type "show copying" and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from ./stab1.exe... > Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919 > Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... > Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for more, q to quit, c to continue without paging--cont > /intel64_lin/libmkl_intel_lp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... 
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... > (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > Reading symbols from /lib64/ld-linux-x86-64.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... 
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, rem=rem at entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > (gdb) cont > Continuing. > [New Thread 0x7f9e49c02780 (LWP 676559)] > [New Thread 0x7f9e49400800 (LWP 676560)] > [New Thread 0x7f9e48bfe880 (LWP 676562)] > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > [Thread 0x7f9e49400800 (LWP 676560) exited] > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > (gdb) where > No stack. > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > From: Matthew Knepley > Sent: Friday, August 20, 2021 2:12 PM > To: dazza simplythebest > Cc: Jose E. Roman ; PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest wrote: > Dear Jose, > Many thanks for your response, I have been investigating this issue with a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things > I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original > larger matrix but to lower accuracy. > > Results > > Small matrix (N= 21168) - everything good! > This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging > very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target > and also for a zero -eps_target. > > Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert > I have just double checked again that the code does run properly when we use the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) > and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the > machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large > higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line > with expectations. > > Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , > whether or not I use a non-zero -eps_target. For reference this is the command line used : > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it successfully calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). 
> By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message > as soon as EPSsolve is called. > > Hi Dan, > > It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with > > -start_in_debugger > > which should launch the debugger (usually), and then type > > cont > > to continue, and then > > where > > to get the stack trace when it crashes, or 'bt' on lldb. > > Thanks, > > Matt > > Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using > -eps_largest_real ? The fact that the program works and produces correct results > when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification > of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the > only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden > shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using > about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially > more memory for example ? > > I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, > the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > From: Jose E. Roman > Sent: Thursday, August 19, 2021 7:58 AM > To: dazza simplythebest > Cc: PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. > > In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 > > Jose > > > > El 19 ago 2021, a las 7:12, dazza simplythebest escribi?: > > > > Dear All, > > I am planning on using slepc to do a large number of eigenvalue calculations > > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > > and on smaller test problems everything is working well; the matrices are efficiently and > > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > > towards now solving the full-size 'production run' problems, and would appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > > although in other cases I will also be interested in finding the eigenvalues whose real part > > is close to zero. 
> > > > A) > > Calling slepc 's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly approaching convergence). > > > > B) > > On the same problem I have also tried a shift-invert transformation using the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > > In the case of A) I can see from watching the slepc videos that increasing ncv > > may help, but I am wondering , since 600 is a large number of iterations, whether there > > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command line options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Aug 26 10:29:44 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 26 Aug 2021 10:29:44 -0500 Subject: [petsc-users] Improving efficiency of slepc usage -memory management when using shift-invert In-Reply-To: References: Message-ID: Hello, Dan, You might want to have a look the manual at https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html Thanks. --Junchao Zhang On Thu, Aug 26, 2021 at 7:32 AM dazza simplythebest wrote: > Dear Jose and Matthew, > Many thanks for your assistance, this would seem to > explain what the problem was. > So judging by this test case, there seems to be a memory vs computational > time tradeoff involved > in choosing whether to shift-invert or not; the shift-invert will greatly > reduce the > number of required iterations ,but will require a higher memory cost ? > I have been trying a few values of -st_mat_mumps_icntl_14 (and also the > alternative > -st_mat_mumps_icntl_23) today but have not yet been able to select one > that fits onto the > workstation I am using (although it seems that setting these parameters > seems to guarantee > that an error message is generated at least). > > Thus I will probably need to reduce the number of MPI > processes and thereby reduce the memory requirement). In this regard the > MUMPS documentation > suggests that a hybrid MPI-OpenMP approach is optimum for their software, > whereas I remember reading > somewhere else that openmp threading was not a good choice for using > PETSC, would you have any > general advice on this ? 
I was thinking maybe that a version of slepc / > petsc compiled against openmp, > and with the number of threads set appropriately, but not explicitly > using openmp directives in > the user's code may be the way forward ? That way PETSC will (?) just > ignore the threading whereas > threading will be available to MUMPS when execution is passed to those > routines ? > > Many thanks once again, > Dan. > > > > ------------------------------ > *From:* Jose E. Roman > *Sent:* Wednesday, August 25, 2021 1:40 PM > *To:* dazza simplythebest > *Cc:* PETSc > *Subject:* Re: [petsc-users] Improving efficiency of slepc usage > > MUMPS documentation (section 8) indicates that the meaning of INFOG(1)=-9 > is insuficient workspace. Try running with > -st_mat_mumps_icntl_14 > where is the percentage in which you want to increase the > workspace, e.g. 50 or 100 or more. > > See ex43.c for an example showing how to set this option in code. > > Jose > > > > El 25 ago 2021, a las 14:11, dazza simplythebest > escribi?: > > > > > > > > From: dazza simplythebest > > Sent: Wednesday, August 25, 2021 12:08 PM > > To: Matthew Knepley > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > ?Dear Matthew and Jose, > > I have derived a smaller > program from the original program by constructing > > matrices of the same size, but filling their entries randomly instead of > computing the correct > > fluid dynamics values just to allow faster experimentation. This > modified code's behaviour seems > > to be similar, with the code again failing for the large matrix case > with the SIGKILL error, so I first report > > results from that code here. Firstly I can confirm that I am using > Fortran , and I am compiling with the > > intel compiler, which it seems places automatic arrays on the stack. > The stacksize, as determined > > by ulimit -a, is reported to be : > > stack size (kbytes, -s) 8192 > > > > [1] Okay, so I followed your suggestion and used ctrl-c followed by > 'where' in one of the non-SIGKILL gdb windows. > > I have pasted the output into the bottom of this email (see [1] output) > - it does look like the problem occurs somewhere in the call > > to the MUMPS solver ? > > > > [2] I have also today gained access to another workstation, and so have > tried running the (original) code on that machine. > > This new machine has two (more powerful) CPU nodes and a larger memory > (both machines feature Intel Xeon processors). > > On this new machine the large matrix case again failed with the familiar > SIGKILL report when I used 16 or 12 MPI > > processes, ran to the end w/out error for 4 or 6 MPI processes, and > failed but with a PETSC error message > > when I used 8 MPI processes, which I have pasted below (see [2] > output). Does this point to some sort of resource > > demand that exceeds some limit as the number of MPI processes increases ? > > > > Many thanks once again, > > Dan. > > > > [2] output > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Error in external library > > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [0]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: Error in external library > > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [1]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [1]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [1]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [1]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [1]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [2]PETSC ERROR: Error in external library > > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [2]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [2]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [2]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [2]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [2]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [3]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [3]PETSC ERROR: Error in external library > > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [3]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [3]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [3]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [3]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [3]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [3]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [3]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [3]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [4]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [4]PETSC ERROR: Error in external library > > [4]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [4]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [4]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [4]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [4]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [4]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [4]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [4]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [4]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [4]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [5]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [5]PETSC ERROR: Error in external library > > [5]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [5]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [5]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [5]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [5]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [5]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [5]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [5]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [5]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [5]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [6]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [6]PETSC ERROR: Error in external library > > [6]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=21891045 > > > > [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [6]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [6]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [6]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [6]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [6]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [6]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [6]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [6]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [6]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [7]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [7]PETSC ERROR: Error in external library > > [7]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=21841925 > > > > [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [7]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [7]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [7]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [7]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [7]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [7]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [7]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [7]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [7]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [0]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [0]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [0]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [0]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [0]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [0]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [0]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [0]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [1]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [1]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [1]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [1]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [2]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [2]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [2]PETSC ERROR: #8 
EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [2]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [3]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > > > > > > > [1] output > > > > Continuing. > > [New Thread 0x7f6f5b2d2780 (LWP 794037)] > > [New Thread 0x7f6f5aad0800 (LWP 794040)] > > [New Thread 0x7f6f5a2ce880 (LWP 794041)] > > ^C > > Thread 1 "my.exe" received signal SIGINT, Interrupt. > > 0x00007f72904927b0 in ofi_fastlock_release_noop () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > (gdb) where > > #0 0x00007f72904927b0 in ofi_fastlock_release_noop () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > #1 0x00007f729049354b in ofi_cq_readfrom () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > #2 0x00007f728ffe8f0e in rxm_ep_do_progress () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, > > comm=1, flag=0x0, status=0xffffffffffffffff) > > at /usr/include/rdma/fi_tagged.h:109 > > #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, > > v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) > > at ../../src/binding/fortran/mpif_h/iprobef.c:276 > > #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, > > blocking= 0x1>, > > > > --Type for more, q to quit, c to continue without paging--cont > > irecv=, > message_received= 0xffffffffffffffff>, msgsou=1, msgtag=-1, status=..., bufr=..., > lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, > iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, > lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, > ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., > pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, > nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, > root= 766016 bytes, which is more than max-value-size>, opassw=0, opeliw=0, > itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., > intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., > frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., > istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, > lrgroups=...) 
at zfac_process_message.F:730 > > #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., > liw=, a=..., > la= 0xffffffffffffffff>, nstk_steps=..., nbprocfils=..., nd=..., fils=..., > step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., > tab_pos_in_pere=..., nstepsdone=1690339657, opass= Cannot access memory at address 0x5>, opeli= access memory at address 0x0>, nelva=50400, comp=259581, > maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv= Cannot access memory at address 0x2>, noffnegpv= Cannot access memory at address 0x0>, nb22t1= Cannot access memory at address 0x0>, nb22t2= Cannot access memory at address 0x0>, nbtiny= Cannot access memory at address 0x0>, det_exp= Cannot access memory at address 0x0>, det_mant= Cannot access memory at address 0x0>, det_sign= Cannot access memory at address 0x0>, ptrist=..., ptrast=..., pimaster=..., > pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., > lpool=, > rinfo=, > posfac=, > iwpos=, > lrlu=, > iptrlu=, > lrlus=, > leaf=, > nbroot=, > nbrtot=, > uu=, > icntl=, > ptlust=..., ptrfac=..., info= at address 0x0>, keep= address 0x3ff0000000000000>, keep8= memory at address 0x0>, procnode_steps=..., slavef= Cannot access memory at address 0x4ffffffff>, myid= Cannot access memory at address 0xffffffff>, comm_nodes= variable: Cannot access memory at address 0x0>, myid_nodes= variable: Cannot access memory at address 0x0>, bufr=..., lbufr=0, > lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, > frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, > seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, > mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, > lrgroups=...) at zfac_par_m.F:182 > > #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., > la=, > liw=, > sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., > frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., > ptrar=..., ldptrar= 0x0>, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., > rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, > icntl=, > info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., > slavef=-1889504640, comm_nodes=-2048052411, myid= Cannot access memory at address 0x81160>, myid_nodes=-1683330500, bufr=..., > lbufr=, > lbufr_bytes= 0xc4e0>, zmumps_lbuf= address 0x4>, intarr=..., dblarr=..., root= access memory at address 0x11dbec>, nelt= access memory at address 0x3>, frtptr=..., frtelt=..., comm_load= reading variable: Cannot access memory at address 0x0>, ass_irecv= reading variable: Cannot access memory at address 0x0>, seuil= reading variable: Cannot access memory at address 0x0>, > seuil_ldlt_niv2= 0x0>, mem_distrib= 0x0>, dkeep=, > pivnul_list=..., lpn_list= address 0x0>, lrgroups=...) 
at zfac_b.F:243 > > #10 0x00007f7308610ff7 in zmumps_fac_driver (id= value of type `zmumps_struc' requires 386095520 bytes, which is more than > max-value-size>) at zfac_driver.F:2421 > > #11 0x00007f7308569256 in zmumps (id= type `zmumps_struc' requires 386095520 bytes, which is more than > max-value-size>) at zmumps_driver.F:1883 > > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par= variable: Cannot access memory at address 0x1>, comm_f77= variable: Cannot access memory at address 0x0>, n= Cannot access memory at address 0xffffffffffffffff>, nblk=1, icntl=..., > cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, > jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, > irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., > a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, > a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, > perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, > info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, > size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., > schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, > rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, > rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., > rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., > irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, > nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, > schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., > ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., > tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, > save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289 > > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 > > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, > A=0x7ffda7afdae0, info=0x1) at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 > > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, > mat=0x7ffda7afdae0, info=0x1) at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, > __ierr=0x7ffda7afdae0) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 > > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., > b_pet=..., jthisone= address 0x1>, isize= address 0x0>) at small_slepc_example_program.F:322 > > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 > > #26 0x00000000004023f2 in main () > > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0
, > argc=14, argv=0x7ffda7b024e8, init=, fini=, > rtld_fini=, stack_end=0x7ffda7b024d8) at > ../csu/libc-start.c:308 > > #28 0x00000000004022fe in _start () > > > > From: Matthew Knepley > > Sent: Tuesday, August 24, 2021 3:59 PM > > To: dazza simplythebest > > Cc: Jose E. Roman ; PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest < > sayosale at hotmail.com> wrote: > > > > Dear Matthew and Jose, > > Apologies for the delayed reply, I had a couple of unforeseen days > off this week. > > Firstly regarding Jose's suggestion re: MUMPS, the program is already > using MUMPS > > to solve linear systems (the code is using a distributed MPI matrix to > solve the generalised > > non-Hermitian complex problem). > > > > I have tried the gdb debugger as per Matthew's suggestion. > > Just to note in case someone else is following this that at first it > didn't work (couldn't 'attach') , > > but after some googling I found a tip suggesting the command; > > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > > which seemed to get it working. > > > > I then first ran the debugger on the small matrix case that worked. > > That stopped in gdb almost immediately after starting execution > > with a report regarding 'nanosleep.c': > > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > > However, issuing the 'cont' command again caused the program to run > through to the end of the > > execution w/out any problems, and with correct looking results, so I am > guessing this error > > is not particularly important. > > > > We do that on purpose when the debugger starts up. Typing 'cont' is > correct. > > > > I then tried the same debugging procedure on the large matrix case that > fails. > > The code again stopped almost immediately after the start of execution > with > > the same nanosleep error as before, and I was able to set the program > running > > again with 'cont' (see full output below). I was running the code with > 4 MPI processes, > > and so had 4 gdb windows appear. Thereafter the code ran for sometime > until completing the > > matrix construction, and then one of the gdb process windows printed a > > Program terminated with signal SIGKILL, Killed. > > The program no longer exists. > > message. I then typed 'where' into this terminal but just received the > message > > No stack. > > > > I have only seen this behavior one other time, and it was with Fortran. > Fortran allows you to declare really big arrays > > on the stack by putting them at the start of a function (rather than F90 > malloc). When I had one of those arrays exceed > > the stack space, I got this kind of an error where everything is > destroyed rather than just stopping. Could it be that you > > have a large structure on the stack? > > > > Second, you can at least look at the stack for the processes that were > not killed. You type Ctrl-C, which should give you > > the prompt and then "where". > > > > Thanks, > > > > Matt > > > > The other gdb windows basically seemed to be left in limbo until I > issued the 'quit' > > command in the SIGKILL, and then they vanished. > > > > I paste the full output from the gdb window that recorded the SIGKILL > below here. > > I guess it is necessary to somehow work out where the SIGKILL originates > from ? > > > > Thanks once again, > > Dan. 
> > > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > > Copyright (C) 2020 Free Software Foundation, Inc. > > License GPLv3+: GNU GPL version 3 or later < > http://gnu.org/licenses/gpl.html> > > This is free software: you are free to change and redistribute it. > > There is NO WARRANTY, to the extent permitted by law. > > Type "show copying" and "show warranty" for details. > > This GDB was configured as "x86_64-linux-gnu". > > Type "show configuration" for configuration details. > > For bug reporting instructions, please see: > > . > > Find the GDB manual and other documentation resources online at: > > . > > > > For help, type "help". > > Type "apropos word" to search for commands related to "word"... > > Reading symbols from ./stab1.exe... > > Attaching to program: > /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, > process 675919 > > Reading symbols from > /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... > > Reading symbols from > /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for > more, q to quit, c to continue without paging--cont > > /intel64_lin/libmkl_intel_lp64.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > > Reading symbols from > /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > > Reading symbols from > /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > > [Thread debugging using libthread_db enabled] > > Using host libthread_db library > "/lib/x86_64-linux-gnu/libthread_db.so.1". > > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > > Reading symbols from > /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... 
> > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... > > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... > > (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > > (No debugging symbols found in > /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > > Reading symbols from /lib64/ld-linux-x86-64.so.2... > > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... > > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... 
> > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... > > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, > clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, > rem=rem at entry=0x7ffdc641a9a0) at > ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or > directory. > > (gdb) cont > > Continuing. > > [New Thread 0x7f9e49c02780 (LWP 676559)] > > [New Thread 0x7f9e49400800 (LWP 676560)] > > [New Thread 0x7f9e48bfe880 (LWP 676562)] > > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > > [Thread 0x7f9e49400800 (LWP 676560) exited] > > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > > > Program terminated with signal SIGKILL, Killed. > > The program no longer exists. > > (gdb) where > > No stack. > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - > > > > From: Matthew Knepley > > Sent: Friday, August 20, 2021 2:12 PM > > To: dazza simplythebest > > Cc: Jose E. Roman ; PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest < > sayosale at hotmail.com> wrote: > > Dear Jose, > > Many thanks for your response, I have been investigating this issue > with a few more calculations > > today, hence the slightly delayed response. > > > > The problem is actually derived from a fluid dynamics problem, so to > allow an easier exploration of things > > I first downsized the resolution of the underlying fluid solver while > keeping all the physical parameters > > the same - i.e. I would get a smaller matrix that should be solving the > same physical problem as the original > > larger matrix but to lower accuracy. > > > > Results > > > > Small matrix (N= 21168) - everything good! > > This converged when using the -eps_largest_real approach (taking 92 > iterations for nev=10, > > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert > approach, converging > > very impressively in a single iteration ! Interestingly it did this both > for a non-zero -eps_target > > and also for a zero -eps_target. > > > > Large matrix (N=50400)- works for -eps_largest_real , fails for st_type > sinvert > > I have just double checked again that the code does run properly when we > use the -eps_largest_real > > option - indeed I ran it with a small nev and large tolerance (nev = 4, > tol= -eps_tol 5.0e-4 , ncv = 300) > > and with these parameters convergence was obtained in 164 iterations, > which took 6 hours on the > > machine I was running it on. 
Furthermore the eigenvalues seem to be > ballpark correct; for this large > > higher resolution case (although with lower slepc tolerance) we obtain > 1789.56816314173 -4724.51319554773i > > as the eigenvalue with largest real part, while the smaller matrix > (same physical problem but at lower resolution case) > > found this eigenvalue to be 1831.11845726501 -4787.54519511345i, which > means the agreement is in line > > with expectations. > > > > Unfortunately, though, the code does still crash when I try to do > shift-invert for the large matrix case, > > whether or not I use a non-zero -eps_target. For reference this is the > command line used: > > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 > -st_type sinvert -eps_monitor :monitor_output05.txt > > To be precise the code crashes soon after calling EPSSolve (it > successfully calls > > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and > EPSSetFromOptions). > > By crashes I mean that I do not even get any error messages from > slepc/PETSC, and do not even get the > > 'EPS Object: 16 MPI processes' message - I simply get an MPI/Fortran > 'KILLED BY SIGNAL: 9 (Killed)' message > > as soon as EPSSolve is called. > > > > Hi Dan, > > > > It would help track this error down if we had a stack trace. You can get > a stack trace from the debugger. You run with > > > > -start_in_debugger > > > > which should launch the debugger (usually), and then type > > > > cont > > > > to continue, and then > > > > where > > > > to get the stack trace when it crashes, or 'bt' on lldb. > > > > Thanks, > > > > Matt > > > > Do you have any ideas as to why this larger matrix case should fail when > using shift-invert but succeed when using > > -eps_largest_real ? The fact that the program works and produces correct > results > > when using the -eps_largest_real option suggests that there is probably > nothing wrong with the specification > > of the problem or the matrices ? It is strange how there is no error > message from slepc / Petsc ... the > > only idea I have at the moment is that perhaps max memory has been > exceeded, which could cause such a sudden > > shutdown? For your reference when running the large matrix case with the > -eps_largest_real option I am using > > about 36 GB of the 148GB available on this machine - does the > shift-invert approach require substantially > > more memory for example ? > > > > I would be very grateful if you have any suggestions to resolve this > issue or even ways to clarify it further, > > the performance I have seen with the shift-invert for the small matrix > is so impressive it would be great to > > get that working for the full-size problem. > > > > Many thanks and best wishes, > > Dan. > > > > > > > > From: Jose E. Roman > > Sent: Thursday, August 19, 2021 7:58 AM > > To: dazza simplythebest > > Cc: PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > In A) convergence may be slow, especially if the wanted eigenvalues have > small magnitude. I would not say 600 iterations is a lot, you probably need > many more. In most cases, approach B) is better because it improves > convergence of eigenvalues close to the target, but it requires prior > knowledge of your spectrum distribution in order to choose an appropriate > target. > > > > In B) what do you mean that it crashes?
If you get an error about > factorization, it means that your A-matrix is singular. In that case, try > using a nonzero target -eps_target 0.1 > > Jose > > > > > El 19 ago 2021, a las 7:12, dazza simplythebest > escribió: > > > > > > Dear All, > > > I am planning on using slepc to do a large number of > eigenvalue calculations > > > of a generalized eigenvalue problem, called from a program written in > Fortran using MPI. > > > Thus far I have successfully installed the slepc/PETSc software, both > locally and on a cluster, > > > and on smaller test problems everything is working well; the matrices > are efficiently and > > > correctly constructed and slepc returns the correct spectrum. I am > just now starting to move > > > towards solving the full-size 'production run' problems, and would > appreciate some > > > general advice on how to improve the solver's performance. > > > > > > In particular, I am currently trying to solve the problem Ax = lambda > Bx whose matrices > > > are of size 50000 (this is the smallest 'production run' problem I > will be tackling), and are > > > complex, non-Hermitian. In most cases I aim to find the eigenvalues > with the largest real part, > > > although in other cases I will also be interested in finding the > eigenvalues whose real part > > > is close to zero. > > > > > > A) > > > Calling slepc's EPS solver with the following options: > > > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 > -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > > > > led to the code successfully running, but failing to find any > eigenvalues within the maximum 600 iterations > > > (examining the monitor output it did appear to be very slowly > approaching convergence). > > > > > > B) > > > On the same problem I have also tried a shift-invert transformation > using the options > > > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > > > -in this case the code crashed at the point it tried to call slepc, so > perhaps I have incorrectly specified these options ? > > > > > > > > > Does anyone have any suggestions as to how to improve this performance > (or find out more about the problem) ? > > > In the case of A) I can see from watching the slepc videos that > increasing ncv > > > may help, but I am wondering, since 600 is a large number of > iterations, whether there > > > may be something else going on - e.g. perhaps some alternative > preconditioner may help ? > > > In the case of B), I guess there must be some mistake in these command > line options? > > > Again, any advice will be greatly appreciated. > > > Best wishes, Dan. > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From knepley at gmail.com Thu Aug 26 10:53:40 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 26 Aug 2021 11:53:40 -0400 Subject: [petsc-users] Improving efficiency of slepc usage - memory management when using shift-invert In-Reply-To: References: Message-ID: On Thu, Aug 26, 2021 at 8:32 AM dazza simplythebest wrote: > Dear Jose and Matthew, > Many thanks for your assistance, this would seem to > explain what the problem was. > So judging by this test case, there seems to be a memory vs computational > time tradeoff involved > in choosing whether to shift-invert or not; the shift-invert will greatly > reduce the > number of required iterations, but will require a higher memory cost ? > I have been trying a few values of -st_mat_mumps_icntl_14 (and also the > alternative > -st_mat_mumps_icntl_23) today but have not yet been able to select one > that fits onto the > workstation I am using (although setting these parameters > does at least seem to guarantee > that an error message is generated). > > Thus I will probably need to reduce the number of MPI > processes (and thereby reduce the memory requirement). In this regard the > MUMPS documentation > suggests that a hybrid MPI-OpenMP approach is optimum for their software, > whereas I remember reading > somewhere else that openmp threading was not a good choice for using > PETSc; would you have any > general advice on this ? > Memory does not really track the number of MPI processes. MUMPS does a lot of things redundantly. For minimum memory, I would suggest trying SuperLU_dist: --download-superlu_dist. I do not think OpenMP will have much influence at all. Thanks, Matt > I was thinking maybe that a version of slepc / petsc compiled against > openmp, > and with the number of threads set appropriately, but not explicitly > using openmp directives in > the user's code may be the way forward ? That way PETSc will (?) just > ignore the threading whereas > threading will be available to MUMPS when execution is passed to those > routines ? > > Many thanks once again, > Dan. > > > > ------------------------------ > *From:* Jose E. Roman > *Sent:* Wednesday, August 25, 2021 1:40 PM > *To:* dazza simplythebest > *Cc:* PETSc > *Subject:* Re: [petsc-users] Improving efficiency of slepc usage > > MUMPS documentation (section 8) indicates that the meaning of INFOG(1)=-9 > is insufficient workspace. Try running with > -st_mat_mumps_icntl_14 set to the percentage by which you want to increase the > workspace, e.g. 50 or 100 or more. > > See ex43.c for an example showing how to set this option in code. > > Jose > > > > El 25 ago 2021, a las 14:11, dazza simplythebest > escribió: > > > > > > > > From: dazza simplythebest > > Sent: Wednesday, August 25, 2021 12:08 PM > > To: Matthew Knepley > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > Dear Matthew and Jose, > > I have derived a smaller > program from the original program by constructing > > matrices of the same size, but filling their entries randomly instead of > computing the correct > > fluid dynamics values just to allow faster experimentation. This > modified code's behaviour seems > > to be similar, with the code again failing for the large matrix case > with the SIGKILL error, so I first report > > results from that code here. Firstly I can confirm that I am using > Fortran, and I am compiling with the > > Intel compiler, which it seems places automatic arrays on the stack.
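> > To illustrate the kind of declaration I mean (this is only a schematic sketch written for this email, not code taken from my actual program), an automatic array is a local array dimensioned by a dummy argument, which as far as I understand ifort places on the stack, whereas an allocatable array is taken from the heap:
> >
> > subroutine work_arrays(n)
> >   implicit none
> >   integer, intent(in)          :: n
> >   complex(kind=8)              :: a_auto(n,n)   ! automatic array - storage is taken from the stack
> >   complex(kind=8), allocatable :: a_heap(:,:)   ! allocatable array - storage is taken from the heap
> >   allocate(a_heap(n,n))
> >   a_auto = (0.0d0, 0.0d0)
> >   a_heap = (0.0d0, 0.0d0)
> >   deallocate(a_heap)
> > end subroutine work_arrays
> >
> > For matrices of the size considered here even a single such n x n complex array would be vastly larger than the stack limit I quote below, so if my code does contain work arrays declared in the first style that could well be the culprit (I gather ifort also has a -heap-arrays flag that moves automatic arrays onto the heap, although I have not yet tried it).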
> The stacksize, as determined > > by ulimit -a, is reported to be : > > stack size (kbytes, -s) 8192 > > > > [1] Okay, so I followed your suggestion and used ctrl-c followed by > 'where' in one of the non-SIGKILL gdb windows. > > I have pasted the output into the bottom of this email (see [1] output) > - it does look like the problem occurs somewhere in the call > > to the MUMPS solver ? > > > > [2] I have also today gained access to another workstation, and so have > tried running the (original) code on that machine. > > This new machine has two (more powerful) CPU nodes and a larger memory > (both machines feature Intel Xeon processors). > > On this new machine the large matrix case again failed with the familiar > SIGKILL report when I used 16 or 12 MPI > > processes, ran to the end w/out error for 4 or 6 MPI processes, and > failed but with a PETSC error message > > when I used 8 MPI processes, which I have pasted below (see [2] > output). Does this point to some sort of resource > > demand that exceeds some limit as the number of MPI processes increases ? > > > > Many thanks once again, > > Dan. > > > > [2] output > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Error in external library > > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [0]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: Error in external library > > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [1]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [1]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [1]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [1]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [1]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [2]PETSC ERROR: Error in external library > > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [2]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [2]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [2]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [2]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [2]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [3]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [3]PETSC ERROR: Error in external library > > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [3]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [3]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [3]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [3]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [3]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [3]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [3]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [3]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [4]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [4]PETSC ERROR: Error in external library > > [4]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [4]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [4]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [4]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [4]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [4]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [4]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [4]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [4]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [4]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [5]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [5]PETSC ERROR: Error in external library > > [5]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [5]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [5]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [5]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [5]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [5]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [5]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [5]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [5]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [5]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [6]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [6]PETSC ERROR: Error in external library > > [6]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=21891045 > > > > [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [6]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [6]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [6]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [6]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [6]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [6]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [6]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [6]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [6]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [7]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [7]PETSC ERROR: Error in external library > > [7]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=21841925 > > > > [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [7]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [7]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [7]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [7]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [7]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [7]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [7]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [7]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [7]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [0]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [0]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [0]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [0]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [0]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [0]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [0]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [0]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [1]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [1]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [1]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [1]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [2]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [2]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [2]PETSC ERROR: #8 
EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [2]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [3]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > > > > > > > [1] output > > > > Continuing. > > [New Thread 0x7f6f5b2d2780 (LWP 794037)] > > [New Thread 0x7f6f5aad0800 (LWP 794040)] > > [New Thread 0x7f6f5a2ce880 (LWP 794041)] > > ^C > > Thread 1 "my.exe" received signal SIGINT, Interrupt. > > 0x00007f72904927b0 in ofi_fastlock_release_noop () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > (gdb) where > > #0 0x00007f72904927b0 in ofi_fastlock_release_noop () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > #1 0x00007f729049354b in ofi_cq_readfrom () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > #2 0x00007f728ffe8f0e in rxm_ep_do_progress () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, > > comm=1, flag=0x0, status=0xffffffffffffffff) > > at /usr/include/rdma/fi_tagged.h:109 > > #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, > > v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) > > at ../../src/binding/fortran/mpif_h/iprobef.c:276 > > #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, > > blocking= 0x1>, > > > > --Type for more, q to quit, c to continue without paging--cont > > irecv=, > message_received= 0xffffffffffffffff>, msgsou=1, msgtag=-1, status=..., bufr=..., > lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, > iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, > lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, > ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., > pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, > nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, > root= 766016 bytes, which is more than max-value-size>, opassw=0, opeliw=0, > itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., > intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., > frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., > istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, > lrgroups=...) 
at zfac_process_message.F:730 > > #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., > liw=, a=..., > la= 0xffffffffffffffff>, nstk_steps=..., nbprocfils=..., nd=..., fils=..., > step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., > tab_pos_in_pere=..., nstepsdone=1690339657, opass= Cannot access memory at address 0x5>, opeli= access memory at address 0x0>, nelva=50400, comp=259581, > maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv= Cannot access memory at address 0x2>, noffnegpv= Cannot access memory at address 0x0>, nb22t1= Cannot access memory at address 0x0>, nb22t2= Cannot access memory at address 0x0>, nbtiny= Cannot access memory at address 0x0>, det_exp= Cannot access memory at address 0x0>, det_mant= Cannot access memory at address 0x0>, det_sign= Cannot access memory at address 0x0>, ptrist=..., ptrast=..., pimaster=..., > pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., > lpool=, > rinfo=, > posfac=, > iwpos=, > lrlu=, > iptrlu=, > lrlus=, > leaf=, > nbroot=, > nbrtot=, > uu=, > icntl=, > ptlust=..., ptrfac=..., info= at address 0x0>, keep= address 0x3ff0000000000000>, keep8= memory at address 0x0>, procnode_steps=..., slavef= Cannot access memory at address 0x4ffffffff>, myid= Cannot access memory at address 0xffffffff>, comm_nodes= variable: Cannot access memory at address 0x0>, myid_nodes= variable: Cannot access memory at address 0x0>, bufr=..., lbufr=0, > lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, > frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, > seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, > mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, > lrgroups=...) at zfac_par_m.F:182 > > #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., > la=, > liw=, > sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., > frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., > ptrar=..., ldptrar= 0x0>, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., > rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, > icntl=, > info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., > slavef=-1889504640, comm_nodes=-2048052411, myid= Cannot access memory at address 0x81160>, myid_nodes=-1683330500, bufr=..., > lbufr=, > lbufr_bytes= 0xc4e0>, zmumps_lbuf= address 0x4>, intarr=..., dblarr=..., root= access memory at address 0x11dbec>, nelt= access memory at address 0x3>, frtptr=..., frtelt=..., comm_load= reading variable: Cannot access memory at address 0x0>, ass_irecv= reading variable: Cannot access memory at address 0x0>, seuil= reading variable: Cannot access memory at address 0x0>, > seuil_ldlt_niv2= 0x0>, mem_distrib= 0x0>, dkeep=, > pivnul_list=..., lpn_list= address 0x0>, lrgroups=...) 
at zfac_b.F:243 > > #10 0x00007f7308610ff7 in zmumps_fac_driver (id= value of type `zmumps_struc' requires 386095520 bytes, which is more than > max-value-size>) at zfac_driver.F:2421 > > #11 0x00007f7308569256 in zmumps (id= type `zmumps_struc' requires 386095520 bytes, which is more than > max-value-size>) at zmumps_driver.F:1883 > > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par= variable: Cannot access memory at address 0x1>, comm_f77= variable: Cannot access memory at address 0x0>, n= Cannot access memory at address 0xffffffffffffffff>, nblk=1, icntl=..., > cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, > jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, > irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., > a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, > a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, > perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, > info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, > size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., > schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, > rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, > rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., > rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., > irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, > nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, > schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., > ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., > tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, > save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289 > > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 > > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, > A=0x7ffda7afdae0, info=0x1) at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 > > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, > mat=0x7ffda7afdae0, info=0x1) at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, > __ierr=0x7ffda7afdae0) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 > > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., > b_pet=..., jthisone= address 0x1>, isize= address 0x0>) at small_slepc_example_program.F:322 > > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 > > #26 0x00000000004023f2 in main () > > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0
, > argc=14, argv=0x7ffda7b024e8, init=, fini=, > rtld_fini=, stack_end=0x7ffda7b024d8) at > ../csu/libc-start.c:308 > > #28 0x00000000004022fe in _start () > > > > From: Matthew Knepley > > Sent: Tuesday, August 24, 2021 3:59 PM > > To: dazza simplythebest > > Cc: Jose E. Roman ; PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest < > sayosale at hotmail.com> wrote: > > > > Dear Matthew and Jose, > > Apologies for the delayed reply, I had a couple of unforeseen days > off this week. > > Firstly regarding Jose's suggestion re: MUMPS, the program is already > using MUMPS > > to solve linear systems (the code is using a distributed MPI matrix to > solve the generalised > > non-Hermitian complex problem). > > > > I have tried the gdb debugger as per Matthew's suggestion. > > Just to note in case someone else is following this that at first it > didn't work (couldn't 'attach') , > > but after some googling I found a tip suggesting the command; > > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > > which seemed to get it working. > > > > I then first ran the debugger on the small matrix case that worked. > > That stopped in gdb almost immediately after starting execution > > with a report regarding 'nanosleep.c': > > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > > However, issuing the 'cont' command again caused the program to run > through to the end of the > > execution w/out any problems, and with correct looking results, so I am > guessing this error > > is not particularly important. > > > > We do that on purpose when the debugger starts up. Typing 'cont' is > correct. > > > > I then tried the same debugging procedure on the large matrix case that > fails. > > The code again stopped almost immediately after the start of execution > with > > the same nanosleep error as before, and I was able to set the program > running > > again with 'cont' (see full output below). I was running the code with > 4 MPI processes, > > and so had 4 gdb windows appear. Thereafter the code ran for sometime > until completing the > > matrix construction, and then one of the gdb process windows printed a > > Program terminated with signal SIGKILL, Killed. > > The program no longer exists. > > message. I then typed 'where' into this terminal but just received the > message > > No stack. > > > > I have only seen this behavior one other time, and it was with Fortran. > Fortran allows you to declare really big arrays > > on the stack by putting them at the start of a function (rather than F90 > malloc). When I had one of those arrays exceed > > the stack space, I got this kind of an error where everything is > destroyed rather than just stopping. Could it be that you > > have a large structure on the stack? > > > > Second, you can at least look at the stack for the processes that were > not killed. You type Ctrl-C, which should give you > > the prompt and then "where". > > > > Thanks, > > > > Matt > > > > The other gdb windows basically seemed to be left in limbo until I > issued the 'quit' > > command in the SIGKILL, and then they vanished. > > > > I paste the full output from the gdb window that recorded the SIGKILL > below here. > > I guess it is necessary to somehow work out where the SIGKILL originates > from ? > > > > Thanks once again, > > Dan. 
> > > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > > Copyright (C) 2020 Free Software Foundation, Inc. > > License GPLv3+: GNU GPL version 3 or later < > http://gnu.org/licenses/gpl.html> > > This is free software: you are free to change and redistribute it. > > There is NO WARRANTY, to the extent permitted by law. > > Type "show copying" and "show warranty" for details. > > This GDB was configured as "x86_64-linux-gnu". > > Type "show configuration" for configuration details. > > For bug reporting instructions, please see: > > . > > Find the GDB manual and other documentation resources online at: > > . > > > > For help, type "help". > > Type "apropos word" to search for commands related to "word"... > > Reading symbols from ./stab1.exe... > > Attaching to program: > /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, > process 675919 > > Reading symbols from > /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... > > Reading symbols from > /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for > more, q to quit, c to continue without paging--cont > > /intel64_lin/libmkl_intel_lp64.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > > Reading symbols from > /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > > Reading symbols from > /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > > [Thread debugging using libthread_db enabled] > > Using host libthread_db library > "/lib/x86_64-linux-gnu/libthread_db.so.1". > > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > > Reading symbols from > /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... 
> > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... > > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... > > (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > > (No debugging symbols found in > /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > > Reading symbols from /lib64/ld-linux-x86-64.so.2... > > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... > > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... 
> > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... > > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, > clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, > rem=rem at entry=0x7ffdc641a9a0) at > ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or > directory. > > (gdb) cont > > Continuing. > > [New Thread 0x7f9e49c02780 (LWP 676559)] > > [New Thread 0x7f9e49400800 (LWP 676560)] > > [New Thread 0x7f9e48bfe880 (LWP 676562)] > > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > > [Thread 0x7f9e49400800 (LWP 676560) exited] > > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > > > Program terminated with signal SIGKILL, Killed. > > The program no longer exists. > > (gdb) where > > No stack. > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - > > > > From: Matthew Knepley > > Sent: Friday, August 20, 2021 2:12 PM > > To: dazza simplythebest > > Cc: Jose E. Roman ; PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest < > sayosale at hotmail.com> wrote: > > Dear Jose, > > Many thanks for your response, I have been investigating this issue > with a few more calculations > > today, hence the slightly delayed response. > > > > The problem is actually derived from a fluid dynamics problem, so to > allow an easier exploration of things > > I first downsized the resolution of the underlying fluid solver while > keeping all the physical parameters > > the same - i.e. I would get a smaller matrix that should be solving the > same physical problem as the original > > larger matrix but to lower accuracy. > > > > Results > > > > Small matrix (N= 21168) - everything good! > > This converged when using the -eps_largest_real approach (taking 92 > iterations for nev=10, > > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert > approach, converging > > very impressively in a single iteration ! Interestingly it did this both > for a non-zero -eps_target > > and also for a zero -eps_target. > > > > Large matrix (N=50400)- works for -eps_largest_real , fails for st_type > sinvert > > I have just double checked again that the code does run properly when we > use the -eps_largest_real > > option - indeed I ran it with a small nev and large tolerance (nev = 4, > tol= -eps_tol 5.0e-4 , ncv = 300) > > and with these parameters convergence was obtained in 164 iterations, > which took 6 hours on the > > machine I was running it on. 
Furthermore the eigenvalues seem to be > ballpark correct; for this large > > higher resolution case (although with lower slepc tolerance) we obtain > 1789.56816314173 -4724.51319554773i > > as the eigenvalue with largest real part, while the smaller matrix > (same physical problem but at lower resolution case) > > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which > means the agreement is in line > > with expectations. > > > > Unfortunately though the code does still crash though when I try to do > shift-invert for the large matrix case , > > whether or not I use a non-zero -eps_target. For reference this is the > command line used : > > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 > -st_type sinvert -eps_monitor :monitor_output05.txt > > To be precise the code crashes soon after calling EPSSolve (it > successfully calls > > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and > EPSSetFromOptions). > > By crashes I mean that I do not even get any error messages from > slepc/PETSC, and do not even get the > > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran > 'KILLED BY SIGNAL: 9 (Killed)' message > > as soon as EPSsolve is called. > > > > Hi Dan, > > > > It would help track this error down if we had a stack trace. You can get > a stack trace from the debugger. You run with > > > > -start_in_debugger > > > > which should launch the debugger (usually), and then type > > > > cont > > > > to continue, and then > > > > where > > > > to get the stack trace when it crashes, or 'bt' on lldb. > > > > Thanks, > > > > Matt > > > > Do you have any ideas as to why this larger matrix case should fail when > using shift-invert but succeed when using > > -eps_largest_real ? The fact that the program works and produces correct > results > > when using the -eps_largest_real option suggests that there is probably > nothing wrong with the specification > > of the problem or the matrices ? It is strange how there is no error > message from slepc / Petsc ... the > > only idea I have at the moment is that perhaps max memory has been > exceeded, which could cause such a sudden > > shutdown? For your reference when running the large matrix case with the > -eps_largest_real option I am using > > about 36 GB of the 148GB available on this machine - does the shift > invert approach require substantially > > more memory for example ? > > > > I would be very grateful if you have any suggestions to resolve this > issue or even ways to clarify it further, > > the performance I have seen with the shift-invert for the small matrix > is so impressive it would be great to > > get that working for the full-size problem. > > > > Many thanks and best wishes, > > Dan. > > > > > > > > From: Jose E. Roman > > Sent: Thursday, August 19, 2021 7:58 AM > > To: dazza simplythebest > > Cc: PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > In A) convergence may be slow, especially if the wanted eigenvalues have > small magnitude. I would not say 600 iterations is a lot, you probably need > many more. In most cases, approach B) is better because it improves > convergence of eigenvalues close to the target, but it requires prior > knowledge of your spectrum distribution in order to choose an appropriate > target. > > > > In B) what do you mean that it crashes. 
If you get an error about > factorization, it means that your A-matrix is singular, In that case, try > using a nonzero target -eps_target 0.1 > > > > Jose > > > > > > > El 19 ago 2021, a las 7:12, dazza simplythebest > escribi?: > > > > > > Dear All, > > > I am planning on using slepc to do a large number of > eigenvalue calculations > > > of a generalized eigenvalue problem, called from a program written in > fortran using MPI. > > > Thus far I have successfully installed the slepc/PETSc software, both > locally and on a cluster, > > > and on smaller test problems everything is working well; the matrices > are efficiently and > > > correctly constructed and slepc returns the correct spectrum. I am > just now starting to move > > > towards now solving the full-size 'production run' problems, and would > appreciate some > > > general advice on how to improve the solver's performance. > > > > > > In particular, I am currently trying to solve the problem Ax = lambda > Bx whose matrices > > > are of size 50000 (this is the smallest 'production run' problem I > will be tackling), and are > > > complex, non-Hermitian. In most cases I aim to find the eigenvalues > with the largest real part, > > > although in other cases I will also be interested in finding the > eigenvalues whose real part > > > is close to zero. > > > > > > A) > > > Calling slepc 's EPS solver with the following options: > > > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 > -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > > > > led to the code successfully running, but failing to find any > eigenvalues within the maximum 600 iterations > > > (examining the monitor output it did appear to be very slowly > approaching convergence). > > > > > > B) > > > On the same problem I have also tried a shift-invert transformation > using the options > > > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > > > -in this case the code crashed at the point it tried to call slepc, so > perhaps I have incorrectly specified these options ? > > > > > > > > > Does anyone have any suggestions as to how to improve this performance > ( or find out more about the problem) ? > > > In the case of A) I can see from watching the slepc videos that > increasing ncv > > > may help, but I am wondering , since 600 is a large number of > iterations, whether there > > > maybe something else going on - e.g. perhaps some alternative > preconditioner may help ? > > > In the case of B), I guess there must be some mistake in these command > line options? > > > Again, any advice will be greatly appreciated. > > > Best wishes, Dan. > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mfadams at lbl.gov Fri Aug 27 07:05:45 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 27 Aug 2021 08:05:45 -0400 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 Message-ID: I have a user (cc'ed) that has a C++ code and is using a PETSc that I built. He is getting this runtime error. 'make check' runs clean and I built snes/tutorial/ex1 manually, to get a link line, and it ran fine. I appended the users link line and my test. I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild PETSc using that, maybe we just need to make sure we are both using the same underlying compiler or should they use mpiCC? Thanks, Mark [e13n16:591873] *** Process received signal *** [e13n16:591873] Signal: Segmentation fault (11) [e13n16:591873] Signal code: Invalid permissions (2) [e13n16:591873] Failing at address: 0x102c87e0 [e13n16:591873] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** [e13n16:591872] Signal: Segmentation fault (11) [e13n16:591872] Signal code: Invalid permissions (2) [e13n16:591872] Failing at address: 0x102c87e0 [e13n16:591872] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** [e13n16:591871] Signal: Segmentation fault (11) [e13n16:591871] Signal code: Invalid permissions (2) [e13n16:591871] Failing at address: 0x102c87e0 [e13n16:591871] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] [e13n16:591871] [ 1] /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** [e13n16:591874] Signal: Segmentation fault (11) [e13n16:591874] Signal code: Invalid permissions (2) [e13n16:591874] Failing at address: 0x102c87e0 [e13n16:591874] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] [e13n16:591874] [ 1] /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] [e13n16:591874] [ 2] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] [e13n16:591874] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] [e13n16:591874] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] [e13n16:591874] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] [e13n16:591874] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] [e13n16:591874] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] [e13n16:591874] [ 8] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] [e13n16:591871] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] [e13n16:591871] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] [e13n16:591871] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] [e13n16:591871] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] [e13n16:591871] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] [e13n16:591871] [ 8] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] [e13n16:591871] [ 9] 
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] [e13n16:591871] *** End of error message *** /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] [e13n16:591874] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] [e13n16:591874] *** End of error message *** /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] [e13n16:591872] [ 2] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] [e13n16:591872] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] [e13n16:591872] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] [e13n16:591872] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] [e13n16:591872] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] [e13n16:591872] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] [e13n16:591872] [ 8] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] [e13n16:591872] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] [e13n16:591872] *** End of error message *** /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] [e13n16:591873] [ 2] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] [e13n16:591873] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] [e13n16:591873] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] [e13n16:591873] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] [e13n16:591873] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] [e13n16:591873] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] [e13n16:591873] [ 8] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] [e13n16:591873] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] [e13n16:591873] *** End of error message *** ERROR: One or more process (first noticed rank 1) terminated with signal 11 (core dumped) /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 liblibxgc-es-cpp.a /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so 
/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so libs/pspline/libpspline.a libs/camtimers/libtimers.a /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a /usr/lib64/libcuda.so /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lnvf -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 19:39 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 PETSC_ARCH="" ex1 *mpicc* -fPIC -g -fast -fPIC -g -fast -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include ex1.c -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc --version* *nvc 21.7-0 linuxpower target on Linuxpower* NVIDIA Compilers and Tools Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1 ./ex1 -ksp_monitor 0 KSP Residual norm 6.041522986797e+00 1 KSP Residual norm 1.042493382631e+00 2 KSP Residual norm 7.950907844730e-16 0 KSP Residual norm 4.786756692342e+00 1 KSP Residual norm 1.426392207750e-01 2 KSP Residual norm 1.801079604472e-15 0 KSP Residual norm 2.986456323228e+00 1 KSP Residual norm 7.669888809223e-02 2 KSP Residual norm 3.744083117256e-16 0 KSP Residual norm 2.306244667700e-01 1 KSP Residual norm 1.355550749587e-02 2 KSP Residual norm 5.845524837731e-17 0 KSP Residual norm 1.936314002654e-03 1 KSP Residual norm 2.125593590819e-04 2 KSP Residual norm 6.987141455073e-20 0 KSP Residual norm 1.435593531990e-07 1 KSP Residual norm 2.588271385567e-08 2 KSP Residual norm 3.942196167935e-23 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sayosale at hotmail.com Fri Aug 27 09:12:56 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Fri, 27 Aug 2021 14:12:56 +0000 Subject: [petsc-users] Improving efficiency of slepc usage -memory management when using shift-invert In-Reply-To: References: Message-ID: Dear All, Okay, thanks for the tip and all the guidance this far - I will also investigate superLU as the linear solver. I have a good test problem now at least ! Have a good weekend and many thanks once again, Dan. ________________________________ From: Matthew Knepley Sent: Thursday, August 26, 2021 3:53 PM To: dazza simplythebest Cc: Jose E. Roman ; PETSc Subject: Re: [petsc-users] Improving efficiency of slepc usage -memory management when using shift-invert On Thu, Aug 26, 2021 at 8:32 AM dazza simplythebest > wrote: Dear Jose and Matthew, Many thanks for your assistance, this would seem to explain what the problem was. So judging by this test case, there seems to be a memory vs computational time tradeoff involved in choosing whether to shift-invert or not; the shift-invert will greatly reduce the number of required iterations ,but will require a higher memory cost ? I have been trying a few values of -st_mat_mumps_icntl_14 (and also the alternative -st_mat_mumps_icntl_23) today but have not yet been able to select one that fits onto the workstation I am using (although it seems that setting these parameters seems to guarantee that an error message is generated at least). Thus I will probably need to reduce the number of MPI processes and thereby reduce the memory requirement). In this regard the MUMPS documentation suggests that a hybrid MPI-OpenMP approach is optimum for their software, whereas I remember reading somewhere else that openmp threading was not a good choice for using PETSC, would you have any general advice on this ? Memory does not really track the number of MPI processes. MUMPS does a lot of things redundantly. For minimum memory, I would suggest trying SuperLU_dist: --download-superlu_dist I do not think OpenMP will have much influence at all. Thanks, Matt I was thinking maybe that a version of slepc / petsc compiled against openmp, and with the number of threads set appropriately, but not explicitly using openmp directives in the user's code may be the way forward ? That way PETSC will (?) just ignore the threading whereas threading will be available to MUMPS when execution is passed to those routines ? Many thanks once again, Dan. ________________________________ From: Jose E. 
Roman > Sent: Wednesday, August 25, 2021 1:40 PM To: dazza simplythebest > Cc: PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage MUMPS documentation (section 8) indicates that the meaning of INFOG(1)=-9 is insufficient workspace. Try running with -st_mat_mumps_icntl_14 <percentage>, where <percentage> is the percentage by which you want to increase the workspace, e.g. 50 or 100 or more. See ex43.c for an example showing how to set this option in code. Jose > On 25 Aug 2021, at 14:11, dazza simplythebest wrote: > > > > From: dazza simplythebest > > Sent: Wednesday, August 25, 2021 12:08 PM > To: Matthew Knepley > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > Dear Matthew and Jose, > I have derived a smaller program from the original program by constructing > matrices of the same size, but filling their entries randomly instead of computing the correct > fluid dynamics values just to allow faster experimentation. This modified code's behaviour seems > to be similar, with the code again failing for the large matrix case with the SIGKILL error, so I first report > results from that code here. Firstly I can confirm that I am using Fortran, and I am compiling with the > Intel compiler, which it seems places automatic arrays on the stack. The stack size, as determined > by ulimit -a, is reported to be: > stack size (kbytes, -s) 8192 > > [1] Okay, so I followed your suggestion and used ctrl-c followed by 'where' in one of the non-SIGKILL gdb windows. > I have pasted the output into the bottom of this email (see [1] output) - it does look like the problem occurs somewhere in the call > to the MUMPS solver ? > > [2] I have also today gained access to another workstation, and so have tried running the (original) code on that machine. > This new machine has two (more powerful) CPU nodes and a larger memory (both machines feature Intel Xeon processors). > On this new machine the large matrix case again failed with the familiar SIGKILL report when I used 16 or 12 MPI > processes, ran to the end w/out error for 4 or 6 MPI processes, and failed but with a PETSc error message > when I used 8 MPI processes, which I have pasted below (see [2] output). Does this point to some sort of resource > demand that exceeds some limit as the number of MPI processes increases ? > > Many thanks once again, > Dan. > > [2] output > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [0]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [1]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [1]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [1]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [1]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [2]PETSC ERROR: Error in external library > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [2]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [2]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [2]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [2]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [2]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [3]PETSC ERROR: Error in external library > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [3]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [3]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [3]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [3]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [3]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [3]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [3]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: --------------------- Error Message 
-------------------------------------------------------------- > [4]PETSC ERROR: Error in external library > [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [4]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [4]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [4]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [4]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [4]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [4]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [4]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [4]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [5]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [5]PETSC ERROR: Error in external library > [5]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [5]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [5]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [5]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [5]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [5]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [5]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [5]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [5]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [5]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [6]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [6]PETSC ERROR: Error in external library > [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045 > > [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [6]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [6]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [6]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [6]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [6]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [6]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [6]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [6]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [6]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [7]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [7]PETSC ERROR: Error in external library > [7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21841925 > > [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [7]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [7]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [7]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [7]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [7]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [7]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [7]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [7]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [7]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [0]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [0]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [0]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [0]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [0]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [0]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [0]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [0]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [1]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [1]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [1]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [1]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [2]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [2]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [2]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [2]PETSC ERROR: #9 EPSSolve() at 
/data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [3]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > > > [1] output > > Continuing. > [New Thread 0x7f6f5b2d2780 (LWP 794037)] > [New Thread 0x7f6f5aad0800 (LWP 794040)] > [New Thread 0x7f6f5a2ce880 (LWP 794041)] > ^C > Thread 1 "my.exe" received signal SIGINT, Interrupt. > 0x00007f72904927b0 in ofi_fastlock_release_noop () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > (gdb) where > #0 0x00007f72904927b0 in ofi_fastlock_release_noop () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #1 0x00007f729049354b in ofi_cq_readfrom () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #2 0x00007f728ffe8f0e in rxm_ep_do_progress () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, > comm=1, flag=0x0, status=0xffffffffffffffff) > at /usr/include/rdma/fi_tagged.h:109 > #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, > v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) > at ../../src/binding/fortran/mpif_h/iprobef.c:276 > #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, > blocking=, > > --Type for more, q to quit, c to continue without paging--cont > irecv=, message_received=, msgsou=1, msgtag=-1, status=..., bufr=..., lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, root=, opassw=0, opeliw=0, itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, lrgroups=...) 
at zfac_process_message.F:730 > #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., liw=, a=..., la=, nstk_steps=..., nbprocfils=..., nd=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., nstepsdone=1690339657, opass=, opeli=, nelva=50400, comp=259581, maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv=, noffnegpv=, nb22t1=, nb22t2=, nbtiny=, det_exp=, det_mant=, det_sign=, ptrist=..., ptrast=..., pimaster=..., pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., lpool=, rinfo=, posfac=, iwpos=, lrlu=, iptrlu=, lrlus=, leaf=, nbroot=, nbrtot=, uu=, icntl=, ptlust=..., ptrfac=..., info=, keep=, keep8=, procnode_steps=..., slavef=, myid=, comm_nodes=, myid_nodes=, bufr=..., lbufr=0, lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, lrgroups=...) at zfac_par_m.F:182 > #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., la=, liw=, sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., ptrar=..., ldptrar=, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, icntl=, info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., slavef=-1889504640, comm_nodes=-2048052411, myid=, myid_nodes=-1683330500, bufr=..., lbufr=, lbufr_bytes=, zmumps_lbuf=, intarr=..., dblarr=..., root=, nelt=, frtptr=..., frtelt=..., comm_load=, ass_irecv=, seuil=, seuil_ldlt_niv2=, mem_distrib=, dkeep=, pivnul_list=..., lpn_list=, lrgroups=...) at zfac_b.F:243 > #10 0x00007f7308610ff7 in zmumps_fac_driver (id=) at zfac_driver.F:2421 > #11 0x00007f7308569256 in zmumps (id=) at zmumps_driver.F:1883 > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=, comm_f77=, n=, nblk=1, icntl=..., cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289 > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, A=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, mat=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., b_pet=..., jthisone=, isize=) at small_slepc_example_program.F:322 > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 > #26 0x00000000004023f2 in main () > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0
, argc=14, argv=0x7ffda7b024e8, init=, fini=, rtld_fini=, stack_end=0x7ffda7b024d8) at ../csu/libc-start.c:308 > #28 0x00000000004022fe in _start () > > From: Matthew Knepley > > Sent: Tuesday, August 24, 2021 3:59 PM > To: dazza simplythebest > > Cc: Jose E. Roman >; PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest > wrote: > > Dear Matthew and Jose, > Apologies for the delayed reply, I had a couple of unforeseen days off this week. > Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS > to solve linear systems (the code is using a distributed MPI matrix to solve the generalised > non-Hermitian complex problem). > > I have tried the gdb debugger as per Matthew's suggestion. > Just to note in case someone else is following this that at first it didn't work (couldn't 'attach') , > but after some googling I found a tip suggesting the command; > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > which seemed to get it working. > > I then first ran the debugger on the small matrix case that worked. > That stopped in gdb almost immediately after starting execution > with a report regarding 'nanosleep.c': > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > However, issuing the 'cont' command again caused the program to run through to the end of the > execution w/out any problems, and with correct looking results, so I am guessing this error > is not particularly important. > > We do that on purpose when the debugger starts up. Typing 'cont' is correct. > > I then tried the same debugging procedure on the large matrix case that fails. > The code again stopped almost immediately after the start of execution with > the same nanosleep error as before, and I was able to set the program running > again with 'cont' (see full output below). I was running the code with 4 MPI processes, > and so had 4 gdb windows appear. Thereafter the code ran for sometime until completing the > matrix construction, and then one of the gdb process windows printed a > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > message. I then typed 'where' into this terminal but just received the message > No stack. > > I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays > on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed > the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you > have a large structure on the stack? > > Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you > the prompt and then "where". > > Thanks, > > Matt > > The other gdb windows basically seemed to be left in limbo until I issued the 'quit' > command in the SIGKILL, and then they vanished. > > I paste the full output from the gdb window that recorded the SIGKILL below here. > I guess it is necessary to somehow work out where the SIGKILL originates from ? > > Thanks once again, > Dan. > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > Copyright (C) 2020 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later > This is free software: you are free to change and redistribute it. 
> There is NO WARRANTY, to the extent permitted by law. > Type "show copying" and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from ./stab1.exe... > Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919 > Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... > Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for more, q to quit, c to continue without paging--cont > /intel64_lin/libmkl_intel_lp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... 
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... > (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > Reading symbols from /lib64/ld-linux-x86-64.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... 
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, rem=rem at entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > (gdb) cont > Continuing. > [New Thread 0x7f9e49c02780 (LWP 676559)] > [New Thread 0x7f9e49400800 (LWP 676560)] > [New Thread 0x7f9e48bfe880 (LWP 676562)] > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > [Thread 0x7f9e49400800 (LWP 676560) exited] > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > (gdb) where > No stack. > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > From: Matthew Knepley > > Sent: Friday, August 20, 2021 2:12 PM > To: dazza simplythebest > > Cc: Jose E. Roman >; PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest > wrote: > Dear Jose, > Many thanks for your response, I have been investigating this issue with a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things > I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original > larger matrix but to lower accuracy. > > Results > > Small matrix (N= 21168) - everything good! > This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging > very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target > and also for a zero -eps_target. > > Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert > I have just double checked again that the code does run properly when we use the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) > and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the > machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large > higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line > with expectations. > > Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , > whether or not I use a non-zero -eps_target. For reference this is the command line used : > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it successfully calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). 
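As a side note on the gdb session shown above: a SIGKILL that leaves gdb with "No stack" is normally delivered from outside the program, most often by the kernel's out-of-memory killer or by the batch/resource manager, rather than by PETSc or SLEPc themselves. If that is suspected, the kernel log on the compute node can usually confirm it (a sketch; the exact message wording depends on the kernel version):

   dmesg -T | grep -i -E 'out of memory|killed process'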
> By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message > as soon as EPSsolve is called. > > Hi Dan, > > It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with > > -start_in_debugger > > which should launch the debugger (usually), and then type > > cont > > to continue, and then > > where > > to get the stack trace when it crashes, or 'bt' on lldb. > > Thanks, > > Matt > > Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using > -eps_largest_real ? The fact that the program works and produces correct results > when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification > of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the > only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden > shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using > about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially > more memory for example ? > > I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, > the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > From: Jose E. Roman > > Sent: Thursday, August 19, 2021 7:58 AM > To: dazza simplythebest > > Cc: PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. > > In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 > > Jose > > > > El 19 ago 2021, a las 7:12, dazza simplythebest > escribi?: > > > > Dear All, > > I am planning on using slepc to do a large number of eigenvalue calculations > > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > > and on smaller test problems everything is working well; the matrices are efficiently and > > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > > towards now solving the full-size 'production run' problems, and would appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > > although in other cases I will also be interested in finding the eigenvalues whose real part > > is close to zero. 
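For context on the MUMPS messages near the top of this thread: INFOG(1)=-9 during numerical factorization means the workspace MUMPS allocated from its analysis-phase estimate was too small, and INFO(2) gives a rough measure of how much was missing. That is consistent with the memory concern raised above, since shift-invert factors (A - sigma*B) with a direct solver and therefore needs far more memory than a Krylov iteration on A alone. A common workaround is to let MUMPS over-allocate its estimate via ICNTL(14). A sketch of the kind of command line being discussed is below; the executable name is taken from the logs, and the option prefixes are an assumption (the MUMPS option may need an st_ or other prefix depending on how the factored matrix is created), so treat it as illustrative rather than a verified recipe:

   mpiexec -n 16 ./stab1.exe -eps_nev 10 -eps_ncv 300 -eps_target 0.1 \
       -st_type sinvert -st_ksp_type preonly -st_pc_type lu \
       -st_pc_factor_mat_solver_type mumps \
       -mat_mumps_icntl_14 50 \
       -eps_monitor -log_view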
> > > > A) > > Calling slepc 's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly approaching convergence). > > > > B) > > On the same problem I have also tried a shift-invert transformation using the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > > In the case of A) I can see from watching the slepc videos that increasing ncv > > may help, but I am wondering , since 600 is a large number of iterations, whether there > > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command line options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Aug 27 09:49:59 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 27 Aug 2021 09:49:59 -0500 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: > I have a user (cc'ed) that has a C++ code and is using a PETSc that I > built. He is getting this runtime error. > > 'make check' runs clean and I built snes/tutorial/ex1 manually, to get a > link line, and it ran fine. > I appended the users link line and my test. > > I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild PETSc > using that, maybe we just need to make sure we are both using the same > underlying compiler or should they use mpiCC? > It looks like they used nvcc_wrapper to replace nvcc. You can ask them to use nvcc directly to see what happens. But the error happened in petsc initialization, petscsys_petscinitializenohelp, so I doubt it helps. The easy way is to just attach a debugger. 
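For reference, attaching to an already-running rank (rather than launching under the debugger) might look roughly like the sketch below. The executable name is taken from the logs above, pgrep -n simply picks the newest matching process, and the ptrace_scope step repeats the Yama workaround mentioned in the earlier thread, which only applies where that restriction is enabled and sudo is available:

   # allow gdb to attach to a process it did not start (Yama-restricted systems)
   echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
   # attach to one rank, let it continue, then inspect the stack when it faults
   gdb -p $(pgrep -n xgc-es-cpp)
   (gdb) cont
   (gdb) where     # or 'bt'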
> > Thanks, > Mark > > > [e13n16:591873] *** Process received signal *** > > [e13n16:591873] Signal: Segmentation fault (11) > > [e13n16:591873] Signal code: Invalid permissions (2) > > [e13n16:591873] Failing at address: 0x102c87e0 > > [e13n16:591873] [ 0] > linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > > [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** > > [e13n16:591872] Signal: Segmentation fault (11) > > [e13n16:591872] Signal code: Invalid permissions (2) > > [e13n16:591872] Failing at address: 0x102c87e0 > > [e13n16:591872] [ 0] > linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > > [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** > > [e13n16:591871] Signal: Segmentation fault (11) > > [e13n16:591871] Signal code: Invalid permissions (2) > > [e13n16:591871] Failing at address: 0x102c87e0 > > [e13n16:591871] [ 0] > linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > > [e13n16:591871] [ 1] > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > > [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** > > [e13n16:591874] Signal: Segmentation fault (11) > > [e13n16:591874] Signal code: Invalid permissions (2) > > [e13n16:591874] Failing at address: 0x102c87e0 > > [e13n16:591874] [ 0] > linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > > [e13n16:591874] [ 1] > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > > [e13n16:591874] [ 2] > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > > [e13n16:591874] [ 3] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > > [e13n16:591874] [ 4] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > > [e13n16:591874] [ 5] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > > [e13n16:591874] [ 6] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > > [e13n16:591874] [ 7] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > > [e13n16:591874] [ 8] > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > > [e13n16:591871] [ 3] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > > [e13n16:591871] [ 4] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > > [e13n16:591871] [ 5] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > > [e13n16:591871] [ 6] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > > [e13n16:591871] [ 7] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > > [e13n16:591871] [ 8] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > > [e13n16:591871] [ 9] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > > [e13n16:591871] *** End of error message *** > > > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > > [e13n16:591874] [ 9] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > > [e13n16:591874] *** End of error message *** > > > 
/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > > [e13n16:591872] [ 2] > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > > [e13n16:591872] [ 3] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > > [e13n16:591872] [ 4] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > > [e13n16:591872] [ 5] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > > [e13n16:591872] [ 6] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > > [e13n16:591872] [ 7] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > > [e13n16:591872] [ 8] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > > [e13n16:591872] [ 9] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > > [e13n16:591872] *** End of error message *** > > > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > > [e13n16:591873] [ 2] > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > > [e13n16:591873] [ 3] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > > [e13n16:591873] [ 4] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > > [e13n16:591873] [ 5] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > > [e13n16:591873] [ 6] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > > [e13n16:591873] [ 7] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > > [e13n16:591873] [ 8] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > > [e13n16:591873] [ 9] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > > [e13n16:591873] *** End of error message *** > > ERROR: One or more process (first noticed rank 1) terminated with signal > 11 (core dumped) > > > > > /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper > -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o > bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 > liblibxgc-es-cpp.a > /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so > /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so > /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so > libs/pspline/libpspline.a libs/camtimers/libtimers.a > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so > 
/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 > /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 > /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a > /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a > /usr/lib64/libcuda.so > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so > /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr > -lmpi_ibm_mpifh -lnvf > -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 > > > > 19:39 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ > make > PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 > PETSC_ARCH="" ex1 > *mpicc* -fPIC -g -fast -fPIC -g -fast > -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include > ex1.c > -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib > -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib > -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib > -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib > -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib > -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib > -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib > -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib > -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 > -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 > -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib > -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib > -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 > -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis > -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 > -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp > -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 > 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc > --version* > > *nvc 21.7-0 linuxpower target on Linuxpower* > NVIDIA Compilers and Tools > Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ > jsrun -n 1 ./ex1 -ksp_monitor > 0 KSP Residual norm 6.041522986797e+00 > 1 KSP Residual norm 1.042493382631e+00 > 2 KSP Residual norm 7.950907844730e-16 > 0 KSP Residual norm 4.786756692342e+00 > 1 KSP Residual norm 1.426392207750e-01 > 2 KSP Residual norm 1.801079604472e-15 > 0 KSP Residual norm 2.986456323228e+00 > 1 KSP Residual norm 7.669888809223e-02 > 2 KSP Residual norm 3.744083117256e-16 > 0 KSP Residual norm 2.306244667700e-01 > 1 KSP Residual norm 1.355550749587e-02 > 2 KSP Residual norm 5.845524837731e-17 > 0 KSP Residual norm 1.936314002654e-03 > 1 KSP Residual norm 2.125593590819e-04 > 2 KSP Residual norm 6.987141455073e-20 > 0 KSP Residual norm 1.435593531990e-07 > 1 KSP Residual norm 2.588271385567e-08 > 2 KSP Residual norm 3.942196167935e-23 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Aug 27 13:52:42 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 27 Aug 2021 14:52:42 -0400 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: I think the problem is that I build with MPICC and they use nvcc_wrapper. I could just try building PETSc with CC=nvcc_wrapper, but it was not clear if this was the way to go. I will try it. Thanks, Mark On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang wrote: > > > > On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: > >> I have a user (cc'ed) that has a C++ code and is using a PETSc that I >> built. He is getting this runtime error. >> >> 'make check' runs clean and I built snes/tutorial/ex1 manually, to get a >> link line, and it ran fine. >> I appended the users link line and my test. >> >> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild PETSc >> using that, maybe we just need to make sure we are both using the same >> underlying compiler or should they use mpiCC? >> > It looks like they used nvcc_wrapper to replace nvcc. You can ask them to > use nvcc directly to see what happens. But the error happened in petsc > initialization, petscsys_petscinitializenohelp, so I doubt it helps. The > easy way is to just attach a debugger. 
> >> >> Thanks, >> Mark >> >> >> [e13n16:591873] *** Process received signal *** >> >> [e13n16:591873] Signal: Segmentation fault (11) >> >> [e13n16:591873] Signal code: Invalid permissions (2) >> >> [e13n16:591873] Failing at address: 0x102c87e0 >> >> [e13n16:591873] [ 0] >> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >> >> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >> >> [e13n16:591872] Signal: Segmentation fault (11) >> >> [e13n16:591872] Signal code: Invalid permissions (2) >> >> [e13n16:591872] Failing at address: 0x102c87e0 >> >> [e13n16:591872] [ 0] >> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >> >> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >> >> [e13n16:591871] Signal: Segmentation fault (11) >> >> [e13n16:591871] Signal code: Invalid permissions (2) >> >> [e13n16:591871] Failing at address: 0x102c87e0 >> >> [e13n16:591871] [ 0] >> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >> >> [e13n16:591871] [ 1] >> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >> >> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >> >> [e13n16:591874] Signal: Segmentation fault (11) >> >> [e13n16:591874] Signal code: Invalid permissions (2) >> >> [e13n16:591874] Failing at address: 0x102c87e0 >> >> [e13n16:591874] [ 0] >> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >> >> [e13n16:591874] [ 1] >> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >> >> [e13n16:591874] [ 2] >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >> >> [e13n16:591874] [ 3] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >> >> [e13n16:591874] [ 4] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >> >> [e13n16:591874] [ 5] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >> >> [e13n16:591874] [ 6] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >> >> [e13n16:591874] [ 7] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >> >> [e13n16:591874] [ 8] >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >> >> [e13n16:591871] [ 3] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >> >> [e13n16:591871] [ 4] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >> >> [e13n16:591871] [ 5] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >> >> [e13n16:591871] [ 6] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >> >> [e13n16:591871] [ 7] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >> >> [e13n16:591871] [ 8] >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >> >> [e13n16:591871] [ 9] >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >> >> [e13n16:591871] *** End of error message *** >> >> >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >> >> [e13n16:591874] [ 9] >> 
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >> >> [e13n16:591874] *** End of error message *** >> >> >> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >> >> [e13n16:591872] [ 2] >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >> >> [e13n16:591872] [ 3] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >> >> [e13n16:591872] [ 4] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >> >> [e13n16:591872] [ 5] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >> >> [e13n16:591872] [ 6] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >> >> [e13n16:591872] [ 7] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >> >> [e13n16:591872] [ 8] >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >> >> [e13n16:591872] [ 9] >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >> >> [e13n16:591872] *** End of error message *** >> >> >> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >> >> [e13n16:591873] [ 2] >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >> >> [e13n16:591873] [ 3] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >> >> [e13n16:591873] [ 4] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >> >> [e13n16:591873] [ 5] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >> >> [e13n16:591873] [ 6] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >> >> [e13n16:591873] [ 7] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >> >> [e13n16:591873] [ 8] >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >> >> [e13n16:591873] [ 9] >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >> >> [e13n16:591873] *** End of error message *** >> >> ERROR: One or more process (first noticed rank 1) terminated with signal >> 11 (core dumped) >> >> >> >> >> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >> bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >> liblibxgc-es-cpp.a >> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >> 
/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >> libs/pspline/libpspline.a libs/camtimers/libtimers.a >> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >> /usr/lib64/libcuda.so >> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >> -lmpi_ibm_mpifh -lnvf >> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >> >> >> >> 19:39 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ >> make >> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >> PETSC_ARCH="" ex1 >> *mpicc* -fPIC -g -fast -fPIC -g -fast >> -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >> ex1.c >> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >> -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >> --version* >> >> *nvc 21.7-0 linuxpower target on Linuxpower* >> NVIDIA Compilers and Tools >> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
>> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ >> jsrun -n 1 ./ex1 -ksp_monitor >> 0 KSP Residual norm 6.041522986797e+00 >> 1 KSP Residual norm 1.042493382631e+00 >> 2 KSP Residual norm 7.950907844730e-16 >> 0 KSP Residual norm 4.786756692342e+00 >> 1 KSP Residual norm 1.426392207750e-01 >> 2 KSP Residual norm 1.801079604472e-15 >> 0 KSP Residual norm 2.986456323228e+00 >> 1 KSP Residual norm 7.669888809223e-02 >> 2 KSP Residual norm 3.744083117256e-16 >> 0 KSP Residual norm 2.306244667700e-01 >> 1 KSP Residual norm 1.355550749587e-02 >> 2 KSP Residual norm 5.845524837731e-17 >> 0 KSP Residual norm 1.936314002654e-03 >> 1 KSP Residual norm 2.125593590819e-04 >> 2 KSP Residual norm 6.987141455073e-20 >> 0 KSP Residual norm 1.435593531990e-07 >> 1 KSP Residual norm 2.588271385567e-08 >> 2 KSP Residual norm 3.942196167935e-23 >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Aug 27 14:55:52 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 27 Aug 2021 14:55:52 -0500 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: On Fri, Aug 27, 2021, 1:52 PM Mark Adams wrote: > I think the problem is that I build with MPICC and they use nvcc_wrapper. > I could just try building PETSc with CC=nvcc_wrapper, but it was not > clear if this was the way to go. > --with-nvcc=nvcc_wrapper > I will try it. > Thanks, > Mark > > On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang > wrote: > >> >> >> >> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: >> >>> I have a user (cc'ed) that has a C++ code and is using a PETSc that I >>> built. He is getting this runtime error. >>> >>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to get a >>> link line, and it ran fine. >>> I appended the users link line and my test. >>> >>> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild >>> PETSc using that, maybe we just need to make sure we are both using the >>> same underlying compiler or should they use mpiCC? >>> >> It looks like they used nvcc_wrapper to replace nvcc. You can ask them >> to use nvcc directly to see what happens. But the error happened in petsc >> initialization, petscsys_petscinitializenohelp, so I doubt it helps. >> The easy way is to just attach a debugger. 
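One way to check the "same underlying compiler" concern raised earlier in this thread is to compare what each toolchain actually invokes. The sketch below assumes Kokkos' nvcc_wrapper, which, as far as I know, falls back to g++ as the host compiler unless NVCC_WRAPPER_DEFAULT_COMPILER is set, so that variable name should be verified against the installed wrapper before relying on it:

   # underlying C compiler behind the MPI wrapper PETSc was built with (here: nvc 21.7)
   mpicc --version
   # host compiler that nvcc_wrapper hands to nvcc (wrapper default, typically g++, if unset)
   echo "${NVCC_WRAPPER_DEFAULT_COMPILER:-unset (wrapper default)}"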
>> >>> >>> Thanks, >>> Mark >>> >>> >>> [e13n16:591873] *** Process received signal *** >>> >>> [e13n16:591873] Signal: Segmentation fault (11) >>> >>> [e13n16:591873] Signal code: Invalid permissions (2) >>> >>> [e13n16:591873] Failing at address: 0x102c87e0 >>> >>> [e13n16:591873] [ 0] >>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>> >>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >>> >>> [e13n16:591872] Signal: Segmentation fault (11) >>> >>> [e13n16:591872] Signal code: Invalid permissions (2) >>> >>> [e13n16:591872] Failing at address: 0x102c87e0 >>> >>> [e13n16:591872] [ 0] >>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>> >>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >>> >>> [e13n16:591871] Signal: Segmentation fault (11) >>> >>> [e13n16:591871] Signal code: Invalid permissions (2) >>> >>> [e13n16:591871] Failing at address: 0x102c87e0 >>> >>> [e13n16:591871] [ 0] >>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>> >>> [e13n16:591871] [ 1] >>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>> >>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >>> >>> [e13n16:591874] Signal: Segmentation fault (11) >>> >>> [e13n16:591874] Signal code: Invalid permissions (2) >>> >>> [e13n16:591874] Failing at address: 0x102c87e0 >>> >>> [e13n16:591874] [ 0] >>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>> >>> [e13n16:591874] [ 1] >>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>> >>> [e13n16:591874] [ 2] >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>> >>> [e13n16:591874] [ 3] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>> >>> [e13n16:591874] [ 4] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>> >>> [e13n16:591874] [ 5] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>> >>> [e13n16:591874] [ 6] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>> >>> [e13n16:591874] [ 7] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>> >>> [e13n16:591874] [ 8] >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>> >>> [e13n16:591871] [ 3] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>> >>> [e13n16:591871] [ 4] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>> >>> [e13n16:591871] [ 5] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>> >>> [e13n16:591871] [ 6] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>> >>> [e13n16:591871] [ 7] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>> >>> [e13n16:591871] [ 8] >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>> >>> [e13n16:591871] [ 9] >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>> >>> [e13n16:591871] *** End of error message *** >>> >>> >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>> >>> [e13n16:591874] [ 9] >>> 
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>> >>> [e13n16:591874] *** End of error message *** >>> >>> >>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>> >>> [e13n16:591872] [ 2] >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>> >>> [e13n16:591872] [ 3] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>> >>> [e13n16:591872] [ 4] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>> >>> [e13n16:591872] [ 5] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>> >>> [e13n16:591872] [ 6] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>> >>> [e13n16:591872] [ 7] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>> >>> [e13n16:591872] [ 8] >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>> >>> [e13n16:591872] [ 9] >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>> >>> [e13n16:591872] *** End of error message *** >>> >>> >>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>> >>> [e13n16:591873] [ 2] >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>> >>> [e13n16:591873] [ 3] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>> >>> [e13n16:591873] [ 4] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>> >>> [e13n16:591873] [ 5] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>> >>> [e13n16:591873] [ 6] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>> >>> [e13n16:591873] [ 7] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>> >>> [e13n16:591873] [ 8] >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>> >>> [e13n16:591873] [ 9] >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>> >>> [e13n16:591873] *** End of error message *** >>> >>> ERROR: One or more process (first noticed rank 1) terminated with >>> signal 11 (core dumped) >>> >>> >>> >>> >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >>> bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>> liblibxgc-es-cpp.a >>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >>> 
/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >>> libs/pspline/libpspline.a libs/camtimers/libtimers.a >>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >>> /usr/lib64/libcuda.so >>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >>> -lmpi_ibm_mpifh -lnvf >>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>> >>> >>> >>> 19:39 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ >>> make >>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >>> PETSC_ARCH="" ex1 >>> *mpicc* -fPIC -g -fast -fPIC -g -fast >>> -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >>> ex1.c >>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>> -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >>> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >>> --version* >>> >>> *nvc 21.7-0 linuxpower target on Linuxpower* >>> NVIDIA Compilers and Tools >>> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights >>> reserved. 
>>> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ >>> jsrun -n 1 ./ex1 -ksp_monitor >>> 0 KSP Residual norm 6.041522986797e+00 >>> 1 KSP Residual norm 1.042493382631e+00 >>> 2 KSP Residual norm 7.950907844730e-16 >>> 0 KSP Residual norm 4.786756692342e+00 >>> 1 KSP Residual norm 1.426392207750e-01 >>> 2 KSP Residual norm 1.801079604472e-15 >>> 0 KSP Residual norm 2.986456323228e+00 >>> 1 KSP Residual norm 7.669888809223e-02 >>> 2 KSP Residual norm 3.744083117256e-16 >>> 0 KSP Residual norm 2.306244667700e-01 >>> 1 KSP Residual norm 1.355550749587e-02 >>> 2 KSP Residual norm 5.845524837731e-17 >>> 0 KSP Residual norm 1.936314002654e-03 >>> 1 KSP Residual norm 2.125593590819e-04 >>> 2 KSP Residual norm 6.987141455073e-20 >>> 0 KSP Residual norm 1.435593531990e-07 >>> 1 KSP Residual norm 2.588271385567e-08 >>> 2 KSP Residual norm 3.942196167935e-23 >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Aug 27 15:28:13 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 27 Aug 2021 16:28:13 -0400 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang wrote: > > > On Fri, Aug 27, 2021, 1:52 PM Mark Adams wrote: > >> I think the problem is that I build with MPICC and they use nvcc_wrapper. >> I could just try building PETSc with CC=nvcc_wrapper, but it was not >> clear if this was the way to go. >> > --with-nvcc=nvcc_wrapper > What do I specify for cc and CC? > I will try it. >> Thanks, >> Mark >> >> On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang >> wrote: >> >>> >>> >>> >>> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: >>> >>>> I have a user (cc'ed) that has a C++ code and is using a PETSc that I >>>> built. He is getting this runtime error. >>>> >>>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to get >>>> a link line, and it ran fine. >>>> I appended the users link line and my test. >>>> >>>> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild >>>> PETSc using that, maybe we just need to make sure we are both using the >>>> same underlying compiler or should they use mpiCC? >>>> >>> It looks like they used nvcc_wrapper to replace nvcc. You can ask them >>> to use nvcc directly to see what happens. But the error happened in petsc >>> initialization, petscsys_petscinitializenohelp, so I doubt it helps. >>> The easy way is to just attach a debugger. 
>>> >>>> >>>> Thanks, >>>> Mark >>>> >>>> >>>> [e13n16:591873] *** Process received signal *** >>>> >>>> [e13n16:591873] Signal: Segmentation fault (11) >>>> >>>> [e13n16:591873] Signal code: Invalid permissions (2) >>>> >>>> [e13n16:591873] Failing at address: 0x102c87e0 >>>> >>>> [e13n16:591873] [ 0] >>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>> >>>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >>>> >>>> [e13n16:591872] Signal: Segmentation fault (11) >>>> >>>> [e13n16:591872] Signal code: Invalid permissions (2) >>>> >>>> [e13n16:591872] Failing at address: 0x102c87e0 >>>> >>>> [e13n16:591872] [ 0] >>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>> >>>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >>>> >>>> [e13n16:591871] Signal: Segmentation fault (11) >>>> >>>> [e13n16:591871] Signal code: Invalid permissions (2) >>>> >>>> [e13n16:591871] Failing at address: 0x102c87e0 >>>> >>>> [e13n16:591871] [ 0] >>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>> >>>> [e13n16:591871] [ 1] >>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>> >>>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >>>> >>>> [e13n16:591874] Signal: Segmentation fault (11) >>>> >>>> [e13n16:591874] Signal code: Invalid permissions (2) >>>> >>>> [e13n16:591874] Failing at address: 0x102c87e0 >>>> >>>> [e13n16:591874] [ 0] >>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>> >>>> [e13n16:591874] [ 1] >>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>> >>>> [e13n16:591874] [ 2] >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>> >>>> [e13n16:591874] [ 3] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>> >>>> [e13n16:591874] [ 4] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>> >>>> [e13n16:591874] [ 5] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>> >>>> [e13n16:591874] [ 6] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>> >>>> [e13n16:591874] [ 7] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>> >>>> [e13n16:591874] [ 8] >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>> >>>> [e13n16:591871] [ 3] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>> >>>> [e13n16:591871] [ 4] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>> >>>> [e13n16:591871] [ 5] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>> >>>> [e13n16:591871] [ 6] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>> >>>> [e13n16:591871] [ 7] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>> >>>> [e13n16:591871] [ 8] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>> >>>> [e13n16:591871] [ 9] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>> >>>> [e13n16:591871] *** End of error message *** >>>> >>>> >>>> 
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>> >>>> [e13n16:591874] [ 9] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>> >>>> [e13n16:591874] *** End of error message *** >>>> >>>> >>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>> >>>> [e13n16:591872] [ 2] >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>> >>>> [e13n16:591872] [ 3] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>> >>>> [e13n16:591872] [ 4] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>> >>>> [e13n16:591872] [ 5] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>> >>>> [e13n16:591872] [ 6] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>> >>>> [e13n16:591872] [ 7] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>> >>>> [e13n16:591872] [ 8] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>> >>>> [e13n16:591872] [ 9] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>> >>>> [e13n16:591872] *** End of error message *** >>>> >>>> >>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>> >>>> [e13n16:591873] [ 2] >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>> >>>> [e13n16:591873] [ 3] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>> >>>> [e13n16:591873] [ 4] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>> >>>> [e13n16:591873] [ 5] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>> >>>> [e13n16:591873] [ 6] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>> >>>> [e13n16:591873] [ 7] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>> >>>> [e13n16:591873] [ 8] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>> >>>> [e13n16:591873] [ 9] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>> >>>> [e13n16:591873] *** End of error message *** >>>> >>>> ERROR: One or more process (first noticed rank 1) terminated with >>>> signal 11 (core dumped) >>>> >>>> >>>> >>>> >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>>> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >>>> bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>> liblibxgc-es-cpp.a >>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >>>> 
/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >>>> libs/pspline/libpspline.a libs/camtimers/libtimers.a >>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >>>> /usr/lib64/libcuda.so >>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >>>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >>>> -lmpi_ibm_mpifh -lnvf >>>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>> >>>> >>>> >>>> 19:39 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ >>>> make >>>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >>>> PETSC_ARCH="" ex1 >>>> *mpicc* -fPIC -g -fast -fPIC -g -fast >>>> -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >>>> ex1.c >>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>> -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >>>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >>>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >>>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >>>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >>>> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >>>> --version* >>>> >>>> *nvc 21.7-0 linuxpower target on Linuxpower* >>>> NVIDIA Compilers and Tools >>>> Copyright (c) 2021, NVIDIA CORPORATION & 
AFFILIATES. All rights >>>> reserved. >>>> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ >>>> jsrun -n 1 ./ex1 -ksp_monitor >>>> 0 KSP Residual norm 6.041522986797e+00 >>>> 1 KSP Residual norm 1.042493382631e+00 >>>> 2 KSP Residual norm 7.950907844730e-16 >>>> 0 KSP Residual norm 4.786756692342e+00 >>>> 1 KSP Residual norm 1.426392207750e-01 >>>> 2 KSP Residual norm 1.801079604472e-15 >>>> 0 KSP Residual norm 2.986456323228e+00 >>>> 1 KSP Residual norm 7.669888809223e-02 >>>> 2 KSP Residual norm 3.744083117256e-16 >>>> 0 KSP Residual norm 2.306244667700e-01 >>>> 1 KSP Residual norm 1.355550749587e-02 >>>> 2 KSP Residual norm 5.845524837731e-17 >>>> 0 KSP Residual norm 1.936314002654e-03 >>>> 1 KSP Residual norm 2.125593590819e-04 >>>> 2 KSP Residual norm 6.987141455073e-20 >>>> 0 KSP Residual norm 1.435593531990e-07 >>>> 1 KSP Residual norm 2.588271385567e-08 >>>> 2 KSP Residual norm 3.942196167935e-23 >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 78776 bytes Desc: not available URL: From junchao.zhang at gmail.com Fri Aug 27 16:03:44 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 27 Aug 2021 16:03:44 -0500 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: I don't understand the configure options --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/ *nvcc_wrapper* --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun -g 1" *--with-cuda=0* --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0 --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b Why do you need to use nvcc_wrapper if you do not want to use cuda? In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, you also need --with-clanguage=c++ --Junchao Zhang On Fri, Aug 27, 2021 at 3:28 PM Mark Adams wrote: > > > On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang > wrote: > >> >> >> On Fri, Aug 27, 2021, 1:52 PM Mark Adams wrote: >> >>> I think the problem is that I build with MPICC and they use nvcc_wrapper. >>> I could just try building PETSc with CC=nvcc_wrapper, but it was not >>> clear if this was the way to go. >>> >> --with-nvcc=nvcc_wrapper >> > > What do I specify for cc and CC? > > >> I will try it. >>> Thanks, >>> Mark >>> >>> On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang >>> wrote: >>> >>>> >>>> >>>> >>>> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: >>>> >>>>> I have a user (cc'ed) that has a C++ code and is using a PETSc that I >>>>> built. He is getting this runtime error. >>>>> >>>>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to get >>>>> a link line, and it ran fine. >>>>> I appended the users link line and my test. >>>>> >>>>> I see that they are using Kokkos' "nvcc_wrapper". 
Should I rebuild >>>>> PETSc using that, maybe we just need to make sure we are both using the >>>>> same underlying compiler or should they use mpiCC? >>>>> >>>> It looks like they used nvcc_wrapper to replace nvcc. You can ask them >>>> to use nvcc directly to see what happens. But the error happened in petsc >>>> initialization, petscsys_petscinitializenohelp, so I doubt it helps. >>>> The easy way is to just attach a debugger. >>>> >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>> [e13n16:591873] *** Process received signal *** >>>>> >>>>> [e13n16:591873] Signal: Segmentation fault (11) >>>>> >>>>> [e13n16:591873] Signal code: Invalid permissions (2) >>>>> >>>>> [e13n16:591873] Failing at address: 0x102c87e0 >>>>> >>>>> [e13n16:591873] [ 0] >>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>> >>>>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >>>>> >>>>> [e13n16:591872] Signal: Segmentation fault (11) >>>>> >>>>> [e13n16:591872] Signal code: Invalid permissions (2) >>>>> >>>>> [e13n16:591872] Failing at address: 0x102c87e0 >>>>> >>>>> [e13n16:591872] [ 0] >>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>> >>>>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >>>>> >>>>> [e13n16:591871] Signal: Segmentation fault (11) >>>>> >>>>> [e13n16:591871] Signal code: Invalid permissions (2) >>>>> >>>>> [e13n16:591871] Failing at address: 0x102c87e0 >>>>> >>>>> [e13n16:591871] [ 0] >>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>> >>>>> [e13n16:591871] [ 1] >>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>> >>>>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >>>>> >>>>> [e13n16:591874] Signal: Segmentation fault (11) >>>>> >>>>> [e13n16:591874] Signal code: Invalid permissions (2) >>>>> >>>>> [e13n16:591874] Failing at address: 0x102c87e0 >>>>> >>>>> [e13n16:591874] [ 0] >>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>> >>>>> [e13n16:591874] [ 1] >>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>> >>>>> [e13n16:591874] [ 2] >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>> >>>>> [e13n16:591874] [ 3] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>> >>>>> [e13n16:591874] [ 4] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>> >>>>> [e13n16:591874] [ 5] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>> >>>>> [e13n16:591874] [ 6] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>> >>>>> [e13n16:591874] [ 7] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>> >>>>> [e13n16:591874] [ 8] >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>> >>>>> [e13n16:591871] [ 3] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>> >>>>> [e13n16:591871] [ 4] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>> >>>>> [e13n16:591871] [ 5] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>> >>>>> [e13n16:591871] [ 6] >>>>> 
/ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>> >>>>> [e13n16:591871] [ 7] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>> >>>>> [e13n16:591871] [ 8] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>> >>>>> [e13n16:591871] [ 9] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>> >>>>> [e13n16:591871] *** End of error message *** >>>>> >>>>> >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>> >>>>> [e13n16:591874] [ 9] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>> >>>>> [e13n16:591874] *** End of error message *** >>>>> >>>>> >>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>> >>>>> [e13n16:591872] [ 2] >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>> >>>>> [e13n16:591872] [ 3] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>> >>>>> [e13n16:591872] [ 4] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>> >>>>> [e13n16:591872] [ 5] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>> >>>>> [e13n16:591872] [ 6] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>> >>>>> [e13n16:591872] [ 7] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>> >>>>> [e13n16:591872] [ 8] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>> >>>>> [e13n16:591872] [ 9] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>> >>>>> [e13n16:591872] *** End of error message *** >>>>> >>>>> >>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>> >>>>> [e13n16:591873] [ 2] >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>> >>>>> [e13n16:591873] [ 3] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>> >>>>> [e13n16:591873] [ 4] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>> >>>>> [e13n16:591873] [ 5] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>> >>>>> [e13n16:591873] [ 6] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>> >>>>> [e13n16:591873] [ 7] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>> >>>>> [e13n16:591873] [ 8] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>> >>>>> [e13n16:591873] [ 9] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>> >>>>> [e13n16:591873] *** End of error message *** >>>>> >>>>> ERROR: One or more process (first noticed rank 1) terminated with >>>>> signal 11 (core dumped) >>>>> >>>>> >>>>> >>>>> >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>>>> -arch=sm_70 
CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >>>>> bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>> liblibxgc-es-cpp.a >>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >>>>> libs/pspline/libpspline.a libs/camtimers/libtimers.a >>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >>>>> /usr/lib64/libcuda.so >>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >>>>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >>>>> -lmpi_ibm_mpifh -lnvf >>>>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>> >>>>> >>>>> >>>>> 19:39 main= >>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make >>>>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >>>>> PETSC_ARCH="" ex1 >>>>> *mpicc* -fPIC -g -fast -fPIC -g -fast >>>>> -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >>>>> ex1.c >>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>> 
-Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >>>>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >>>>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >>>>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >>>>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >>>>> 19:40 main= >>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >>>>> --version* >>>>> >>>>> *nvc 21.7-0 linuxpower target on Linuxpower* >>>>> NVIDIA Compilers and Tools >>>>> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights >>>>> reserved. >>>>> 19:40 main= >>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1 >>>>> ./ex1 -ksp_monitor >>>>> 0 KSP Residual norm 6.041522986797e+00 >>>>> 1 KSP Residual norm 1.042493382631e+00 >>>>> 2 KSP Residual norm 7.950907844730e-16 >>>>> 0 KSP Residual norm 4.786756692342e+00 >>>>> 1 KSP Residual norm 1.426392207750e-01 >>>>> 2 KSP Residual norm 1.801079604472e-15 >>>>> 0 KSP Residual norm 2.986456323228e+00 >>>>> 1 KSP Residual norm 7.669888809223e-02 >>>>> 2 KSP Residual norm 3.744083117256e-16 >>>>> 0 KSP Residual norm 2.306244667700e-01 >>>>> 1 KSP Residual norm 1.355550749587e-02 >>>>> 2 KSP Residual norm 5.845524837731e-17 >>>>> 0 KSP Residual norm 1.936314002654e-03 >>>>> 1 KSP Residual norm 2.125593590819e-04 >>>>> 2 KSP Residual norm 6.987141455073e-20 >>>>> 0 KSP Residual norm 1.435593531990e-07 >>>>> 1 KSP Residual norm 2.588271385567e-08 >>>>> 2 KSP Residual norm 3.942196167935e-23 >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Aug 27 17:05:44 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 27 Aug 2021 18:05:44 -0400 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: On Fri, Aug 27, 2021 at 5:03 PM Junchao Zhang wrote: > I don't understand the configure options > > > --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/ > *nvcc_wrapper* > --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper > --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort > COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" > CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun > -g 1" *--with-cuda=0* > --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper > --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0 > --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc > --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b > > Why do you need to use nvcc_wrapper if you do not want to use cuda? > That code that is having a problem links with nvcc_wrapper. They get a segv that I sent earlier, in PetscInitialize so I figure I should use the same compiler / linker. They use CUDA, but we don't need PETSc to use CUDA now. > In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, you > also need --with-clanguage=c++ > I rebuilt PETSc with mpicc, mpiCC, mpif90 and --with-nvcc=nvcc_wrapper and that built make check works. I gave it to them to test. 
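For reference, the configure line for that rebuild is roughly the following. Only the
compiler options are an intentional change; everything else is carried over from the
opt-nvhpc21.7b configure quoted above, so treat this as a sketch rather than the exact
command:

# sketch only: MPI wrapper compilers, nvcc_wrapper passed separately; other options as before
./configure --with-cc=mpicc --with-cxx=mpiCC --with-fc=mpif90 \
  --with-nvcc=nvcc_wrapper --with-cuda=0 \
  COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" \
  --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun -g 1" \
  --download-metis --download-parmetis --with-x=0 --with-debugging=0 \
  PETSC_ARCH=arch-summit-opt-nvhpc \
  --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b
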
Thanks, Mark > > --Junchao Zhang > > > On Fri, Aug 27, 2021 at 3:28 PM Mark Adams wrote: > >> >> >> On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang >> wrote: >> >>> >>> >>> On Fri, Aug 27, 2021, 1:52 PM Mark Adams wrote: >>> >>>> I think the problem is that I build with MPICC and they use nvcc_wrapper. >>>> I could just try building PETSc with CC=nvcc_wrapper, but it was not >>>> clear if this was the way to go. >>>> >>> --with-nvcc=nvcc_wrapper >>> >> >> What do I specify for cc and CC? >> >> >>> I will try it. >>>> Thanks, >>>> Mark >>>> >>>> On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang >>>> wrote: >>>> >>>>> >>>>> >>>>> >>>>> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: >>>>> >>>>>> I have a user (cc'ed) that has a C++ code and is using a PETSc that I >>>>>> built. He is getting this runtime error. >>>>>> >>>>>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to >>>>>> get a link line, and it ran fine. >>>>>> I appended the users link line and my test. >>>>>> >>>>>> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild >>>>>> PETSc using that, maybe we just need to make sure we are both using the >>>>>> same underlying compiler or should they use mpiCC? >>>>>> >>>>> It looks like they used nvcc_wrapper to replace nvcc. You can ask >>>>> them to use nvcc directly to see what happens. But the error happened in >>>>> petsc initialization, petscsys_petscinitializenohelp, so I doubt it >>>>> helps. The easy way is to just attach a debugger. >>>>> >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>>> >>>>>> [e13n16:591873] *** Process received signal *** >>>>>> >>>>>> [e13n16:591873] Signal: Segmentation fault (11) >>>>>> >>>>>> [e13n16:591873] Signal code: Invalid permissions (2) >>>>>> >>>>>> [e13n16:591873] Failing at address: 0x102c87e0 >>>>>> >>>>>> [e13n16:591873] [ 0] >>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>> >>>>>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >>>>>> >>>>>> [e13n16:591872] Signal: Segmentation fault (11) >>>>>> >>>>>> [e13n16:591872] Signal code: Invalid permissions (2) >>>>>> >>>>>> [e13n16:591872] Failing at address: 0x102c87e0 >>>>>> >>>>>> [e13n16:591872] [ 0] >>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>> >>>>>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >>>>>> >>>>>> [e13n16:591871] Signal: Segmentation fault (11) >>>>>> >>>>>> [e13n16:591871] Signal code: Invalid permissions (2) >>>>>> >>>>>> [e13n16:591871] Failing at address: 0x102c87e0 >>>>>> >>>>>> [e13n16:591871] [ 0] >>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>> >>>>>> [e13n16:591871] [ 1] >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>> >>>>>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >>>>>> >>>>>> [e13n16:591874] Signal: Segmentation fault (11) >>>>>> >>>>>> [e13n16:591874] Signal code: Invalid permissions (2) >>>>>> >>>>>> [e13n16:591874] Failing at address: 0x102c87e0 >>>>>> >>>>>> [e13n16:591874] [ 0] >>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>> >>>>>> [e13n16:591874] [ 1] >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>> >>>>>> [e13n16:591874] [ 2] >>>>>> 
/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>> >>>>>> [e13n16:591874] [ 3] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>> >>>>>> [e13n16:591874] [ 4] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>> >>>>>> [e13n16:591874] [ 5] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>> >>>>>> [e13n16:591874] [ 6] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>> >>>>>> [e13n16:591874] [ 7] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>> >>>>>> [e13n16:591874] [ 8] >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>> >>>>>> [e13n16:591871] [ 3] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>> >>>>>> [e13n16:591871] [ 4] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>> >>>>>> [e13n16:591871] [ 5] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>> >>>>>> [e13n16:591871] [ 6] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>> >>>>>> [e13n16:591871] [ 7] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>> >>>>>> [e13n16:591871] [ 8] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>> >>>>>> [e13n16:591871] [ 9] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>> >>>>>> [e13n16:591871] *** End of error message *** >>>>>> >>>>>> >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>> >>>>>> [e13n16:591874] [ 9] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>> >>>>>> [e13n16:591874] *** End of error message *** >>>>>> >>>>>> >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>> >>>>>> [e13n16:591872] [ 2] >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>> >>>>>> [e13n16:591872] [ 3] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>> >>>>>> [e13n16:591872] [ 4] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>> >>>>>> [e13n16:591872] [ 5] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>> >>>>>> [e13n16:591872] [ 6] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>> >>>>>> [e13n16:591872] [ 7] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>> >>>>>> [e13n16:591872] [ 8] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>> >>>>>> [e13n16:591872] [ 9] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>> >>>>>> [e13n16:591872] *** End of error message *** >>>>>> >>>>>> >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>> >>>>>> [e13n16:591873] [ 2] >>>>>> 
/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>> >>>>>> [e13n16:591873] [ 3] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>> >>>>>> [e13n16:591873] [ 4] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>> >>>>>> [e13n16:591873] [ 5] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>> >>>>>> [e13n16:591873] [ 6] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>> >>>>>> [e13n16:591873] [ 7] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>> >>>>>> [e13n16:591873] [ 8] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>> >>>>>> [e13n16:591873] [ 9] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>> >>>>>> [e13n16:591873] *** End of error message *** >>>>>> >>>>>> ERROR: One or more process (first noticed rank 1) terminated with >>>>>> signal 11 (core dumped) >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>>>>> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >>>>>> bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>>> liblibxgc-es-cpp.a >>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >>>>>> libs/pspline/libpspline.a libs/camtimers/libtimers.a >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >>>>>> /usr/lib64/libcuda.so >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >>>>>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >>>>>> -lmpi_ibm_mpifh -lnvf >>>>>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>>> >>>>>> >>>>>> >>>>>> 19:39 main= >>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make >>>>>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >>>>>> PETSC_ARCH="" ex1 >>>>>> *mpicc* -fPIC -g -fast -fPIC -g -fast >>>>>> 
-I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >>>>>> ex1.c >>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>>> -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >>>>>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >>>>>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >>>>>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >>>>>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >>>>>> 19:40 main= >>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >>>>>> --version* >>>>>> >>>>>> *nvc 21.7-0 linuxpower target on Linuxpower* >>>>>> NVIDIA Compilers and Tools >>>>>> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights >>>>>> reserved. >>>>>> 19:40 main= >>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1 >>>>>> ./ex1 -ksp_monitor >>>>>> 0 KSP Residual norm 6.041522986797e+00 >>>>>> 1 KSP Residual norm 1.042493382631e+00 >>>>>> 2 KSP Residual norm 7.950907844730e-16 >>>>>> 0 KSP Residual norm 4.786756692342e+00 >>>>>> 1 KSP Residual norm 1.426392207750e-01 >>>>>> 2 KSP Residual norm 1.801079604472e-15 >>>>>> 0 KSP Residual norm 2.986456323228e+00 >>>>>> 1 KSP Residual norm 7.669888809223e-02 >>>>>> 2 KSP Residual norm 3.744083117256e-16 >>>>>> 0 KSP Residual norm 2.306244667700e-01 >>>>>> 1 KSP Residual norm 1.355550749587e-02 >>>>>> 2 KSP Residual norm 5.845524837731e-17 >>>>>> 0 KSP Residual norm 1.936314002654e-03 >>>>>> 1 KSP Residual norm 2.125593590819e-04 >>>>>> 2 KSP Residual norm 6.987141455073e-20 >>>>>> 0 KSP Residual norm 1.435593531990e-07 >>>>>> 1 KSP Residual norm 2.588271385567e-08 >>>>>> 2 KSP Residual norm 3.942196167935e-23 >>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Aug 27 17:16:43 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 27 Aug 2021 18:16:43 -0400 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: And I found that this C++ code calls PetscIntiialize from Fortran code. Hence the Fortran library in the call stack. 
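Schematically, the pattern is something like this minimal Fortran sketch (the names here
are made up for illustration, this is not their actual source):

! minimal sketch: a Fortran routine, called from the C++ main through the usual
! Fortran bindings, does the PETSc initialization; the character argument is
! presumably what the nvfortran runtime (the pgf90_str_copy_klen frame in the
! trace) is copying when it crashes
subroutine init_petsc(ierr)
#include <petsc/finclude/petscsys.h>
  use petscsys
  implicit none
  PetscErrorCode ierr
  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
end subroutine init_petsc
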
F90 tests work but our tests are pure Fortran. Should they be using nvcc_wrapper (a Kokkos version) as a linker? Thanks, Mark On Fri, Aug 27, 2021 at 6:05 PM Mark Adams wrote: > > > On Fri, Aug 27, 2021 at 5:03 PM Junchao Zhang > wrote: > >> I don't understand the configure options >> >> >> --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/ >> *nvcc_wrapper* >> --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >> --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort >> COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" >> CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun >> -g 1" *--with-cuda=0* >> --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >> --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0 >> --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc >> --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b >> >> Why do you need to use nvcc_wrapper if you do not want to use cuda? >> > > That code that is having a problem links with nvcc_wrapper. > They get a segv that I sent earlier, in PetscInitialize so I figure I > should use the same compiler / linker. > They use CUDA, but we don't need PETSc to use CUDA now. > > >> In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, you >> also need --with-clanguage=c++ >> > > I rebuilt PETSc with mpicc, mpiCC, mpif90 and --with-nvcc=nvcc_wrapper > and that built make check works. I gave it to them to test. > > Thanks, > Mark > > >> >> --Junchao Zhang >> >> >> On Fri, Aug 27, 2021 at 3:28 PM Mark Adams wrote: >> >>> >>> >>> On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang >>> wrote: >>> >>>> >>>> >>>> On Fri, Aug 27, 2021, 1:52 PM Mark Adams wrote: >>>> >>>>> I think the problem is that I build with MPICC and they use nvcc_wrapper. >>>>> I could just try building PETSc with CC=nvcc_wrapper, but it was not >>>>> clear if this was the way to go. >>>>> >>>> --with-nvcc=nvcc_wrapper >>>> >>> >>> What do I specify for cc and CC? >>> >>> >>>> I will try it. >>>>> Thanks, >>>>> Mark >>>>> >>>>> On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang < >>>>> junchao.zhang at gmail.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: >>>>>> >>>>>>> I have a user (cc'ed) that has a C++ code and is using a PETSc that >>>>>>> I built. He is getting this runtime error. >>>>>>> >>>>>>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to >>>>>>> get a link line, and it ran fine. >>>>>>> I appended the users link line and my test. >>>>>>> >>>>>>> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild >>>>>>> PETSc using that, maybe we just need to make sure we are both using the >>>>>>> same underlying compiler or should they use mpiCC? >>>>>>> >>>>>> It looks like they used nvcc_wrapper to replace nvcc. You can ask >>>>>> them to use nvcc directly to see what happens. But the error happened in >>>>>> petsc initialization, petscsys_petscinitializenohelp, so I doubt it >>>>>> helps. The easy way is to just attach a debugger. 
>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>>> >>>>>>> [e13n16:591873] *** Process received signal *** >>>>>>> >>>>>>> [e13n16:591873] Signal: Segmentation fault (11) >>>>>>> >>>>>>> [e13n16:591873] Signal code: Invalid permissions (2) >>>>>>> >>>>>>> [e13n16:591873] Failing at address: 0x102c87e0 >>>>>>> >>>>>>> [e13n16:591873] [ 0] >>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>> >>>>>>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >>>>>>> >>>>>>> [e13n16:591872] Signal: Segmentation fault (11) >>>>>>> >>>>>>> [e13n16:591872] Signal code: Invalid permissions (2) >>>>>>> >>>>>>> [e13n16:591872] Failing at address: 0x102c87e0 >>>>>>> >>>>>>> [e13n16:591872] [ 0] >>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>> >>>>>>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >>>>>>> >>>>>>> [e13n16:591871] Signal: Segmentation fault (11) >>>>>>> >>>>>>> [e13n16:591871] Signal code: Invalid permissions (2) >>>>>>> >>>>>>> [e13n16:591871] Failing at address: 0x102c87e0 >>>>>>> >>>>>>> [e13n16:591871] [ 0] >>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>> >>>>>>> [e13n16:591871] [ 1] >>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>> >>>>>>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >>>>>>> >>>>>>> [e13n16:591874] Signal: Segmentation fault (11) >>>>>>> >>>>>>> [e13n16:591874] Signal code: Invalid permissions (2) >>>>>>> >>>>>>> [e13n16:591874] Failing at address: 0x102c87e0 >>>>>>> >>>>>>> [e13n16:591874] [ 0] >>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>> >>>>>>> [e13n16:591874] [ 1] >>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>> >>>>>>> [e13n16:591874] [ 2] >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>> >>>>>>> [e13n16:591874] [ 3] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>> >>>>>>> [e13n16:591874] [ 4] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>> >>>>>>> [e13n16:591874] [ 5] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>> >>>>>>> [e13n16:591874] [ 6] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>> >>>>>>> [e13n16:591874] [ 7] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>> >>>>>>> [e13n16:591874] [ 8] >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>> >>>>>>> [e13n16:591871] [ 3] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>> >>>>>>> [e13n16:591871] [ 4] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>> >>>>>>> [e13n16:591871] [ 5] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>> >>>>>>> [e13n16:591871] [ 6] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>> >>>>>>> [e13n16:591871] [ 7] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>> >>>>>>> [e13n16:591871] [ 8] >>>>>>> 
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>> >>>>>>> [e13n16:591871] [ 9] >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>> >>>>>>> [e13n16:591871] *** End of error message *** >>>>>>> >>>>>>> >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>> >>>>>>> [e13n16:591874] [ 9] >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>> >>>>>>> [e13n16:591874] *** End of error message *** >>>>>>> >>>>>>> >>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>> >>>>>>> [e13n16:591872] [ 2] >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>> >>>>>>> [e13n16:591872] [ 3] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>> >>>>>>> [e13n16:591872] [ 4] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>> >>>>>>> [e13n16:591872] [ 5] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>> >>>>>>> [e13n16:591872] [ 6] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>> >>>>>>> [e13n16:591872] [ 7] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>> >>>>>>> [e13n16:591872] [ 8] >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>> >>>>>>> [e13n16:591872] [ 9] >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>> >>>>>>> [e13n16:591872] *** End of error message *** >>>>>>> >>>>>>> >>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>> >>>>>>> [e13n16:591873] [ 2] >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>> >>>>>>> [e13n16:591873] [ 3] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>> >>>>>>> [e13n16:591873] [ 4] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>> >>>>>>> [e13n16:591873] [ 5] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>> >>>>>>> [e13n16:591873] [ 6] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>> >>>>>>> [e13n16:591873] [ 7] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>> >>>>>>> [e13n16:591873] [ 8] >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>> >>>>>>> [e13n16:591873] [ 9] >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>> >>>>>>> [e13n16:591873] *** End of error message *** >>>>>>> >>>>>>> ERROR: One or more process (first noticed rank 1) terminated with >>>>>>> signal 11 (core dumped) >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>>>>>> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >>>>>>> bin/xgc-es-cpp 
-Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>>>> liblibxgc-es-cpp.a >>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >>>>>>> libs/pspline/libpspline.a libs/camtimers/libtimers.a >>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >>>>>>> /usr/lib64/libcuda.so >>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >>>>>>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >>>>>>> -lmpi_ibm_mpifh -lnvf >>>>>>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>>>> >>>>>>> >>>>>>> >>>>>>> 19:39 main= >>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make >>>>>>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >>>>>>> PETSC_ARCH="" ex1 >>>>>>> *mpicc* -fPIC -g -fast -fPIC -g -fast >>>>>>> -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >>>>>>> ex1.c >>>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>>>> 
-Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>>>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>>>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >>>>>>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >>>>>>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >>>>>>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >>>>>>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >>>>>>> 19:40 main= >>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >>>>>>> --version* >>>>>>> >>>>>>> *nvc 21.7-0 linuxpower target on Linuxpower* >>>>>>> NVIDIA Compilers and Tools >>>>>>> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights >>>>>>> reserved. >>>>>>> 19:40 main= >>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1 >>>>>>> ./ex1 -ksp_monitor >>>>>>> 0 KSP Residual norm 6.041522986797e+00 >>>>>>> 1 KSP Residual norm 1.042493382631e+00 >>>>>>> 2 KSP Residual norm 7.950907844730e-16 >>>>>>> 0 KSP Residual norm 4.786756692342e+00 >>>>>>> 1 KSP Residual norm 1.426392207750e-01 >>>>>>> 2 KSP Residual norm 1.801079604472e-15 >>>>>>> 0 KSP Residual norm 2.986456323228e+00 >>>>>>> 1 KSP Residual norm 7.669888809223e-02 >>>>>>> 2 KSP Residual norm 3.744083117256e-16 >>>>>>> 0 KSP Residual norm 2.306244667700e-01 >>>>>>> 1 KSP Residual norm 1.355550749587e-02 >>>>>>> 2 KSP Residual norm 5.845524837731e-17 >>>>>>> 0 KSP Residual norm 1.936314002654e-03 >>>>>>> 1 KSP Residual norm 2.125593590819e-04 >>>>>>> 2 KSP Residual norm 6.987141455073e-20 >>>>>>> 0 KSP Residual norm 1.435593531990e-07 >>>>>>> 1 KSP Residual norm 2.588271385567e-08 >>>>>>> 2 KSP Residual norm 3.942196167935e-23 >>>>>>> >>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Aug 27 18:07:11 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 27 Aug 2021 18:07:11 -0500 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: On Fri, Aug 27, 2021 at 5:16 PM Mark Adams wrote: > And I found that this C++ code calls PetscIntiialize from Fortran code. > Hence the Fortran library in the call stack. > > F90 tests work but our tests are pure Fortran. > > Should they be using nvcc_wrapper (a Kokkos version) as a linker? > I don't think so. @Satish. 
> Thanks, > Mark > > On Fri, Aug 27, 2021 at 6:05 PM Mark Adams wrote: > >> >> >> On Fri, Aug 27, 2021 at 5:03 PM Junchao Zhang >> wrote: >> >>> I don't understand the configure options >>> >>> >>> --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/ >>> *nvcc_wrapper* >>> --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>> --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort >>> COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" >>> CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun >>> -g 1" *--with-cuda=0* >>> --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>> --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0 >>> --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc >>> --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b >>> >>> Why do you need to use nvcc_wrapper if you do not want to use cuda? >>> >> >> That code that is having a problem links with nvcc_wrapper. >> They get a segv that I sent earlier, in PetscInitialize so I figure I >> should use the same compiler / linker. >> They use CUDA, but we don't need PETSc to use CUDA now. >> >> >>> In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, >>> you also need --with-clanguage=c++ >>> >> >> I rebuilt PETSc with mpicc, mpiCC, mpif90 and --with-nvcc=nvcc_wrapper >> and that built make check works. I gave it to them to test. >> >> Thanks, >> Mark >> >> >>> >>> --Junchao Zhang >>> >>> >>> On Fri, Aug 27, 2021 at 3:28 PM Mark Adams wrote: >>> >>>> >>>> >>>> On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang >>>> wrote: >>>> >>>>> >>>>> >>>>> On Fri, Aug 27, 2021, 1:52 PM Mark Adams wrote: >>>>> >>>>>> I think the problem is that I build with MPICC and they use nvcc_wrapper. >>>>>> I could just try building PETSc with CC=nvcc_wrapper, but it was not >>>>>> clear if this was the way to go. >>>>>> >>>>> --with-nvcc=nvcc_wrapper >>>>> >>>> >>>> What do I specify for cc and CC? >>>> >>>> >>>>> I will try it. >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>>> On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang < >>>>>> junchao.zhang at gmail.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: >>>>>>> >>>>>>>> I have a user (cc'ed) that has a C++ code and is using a PETSc that >>>>>>>> I built. He is getting this runtime error. >>>>>>>> >>>>>>>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to >>>>>>>> get a link line, and it ran fine. >>>>>>>> I appended the users link line and my test. >>>>>>>> >>>>>>>> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild >>>>>>>> PETSc using that, maybe we just need to make sure we are both using the >>>>>>>> same underlying compiler or should they use mpiCC? >>>>>>>> >>>>>>> It looks like they used nvcc_wrapper to replace nvcc. You can ask >>>>>>> them to use nvcc directly to see what happens. But the error happened in >>>>>>> petsc initialization, petscsys_petscinitializenohelp, so I doubt it >>>>>>> helps. The easy way is to just attach a debugger. 
>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mark >>>>>>>> >>>>>>>> >>>>>>>> [e13n16:591873] *** Process received signal *** >>>>>>>> >>>>>>>> [e13n16:591873] Signal: Segmentation fault (11) >>>>>>>> >>>>>>>> [e13n16:591873] Signal code: Invalid permissions (2) >>>>>>>> >>>>>>>> [e13n16:591873] Failing at address: 0x102c87e0 >>>>>>>> >>>>>>>> [e13n16:591873] [ 0] >>>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>>> >>>>>>>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >>>>>>>> >>>>>>>> [e13n16:591872] Signal: Segmentation fault (11) >>>>>>>> >>>>>>>> [e13n16:591872] Signal code: Invalid permissions (2) >>>>>>>> >>>>>>>> [e13n16:591872] Failing at address: 0x102c87e0 >>>>>>>> >>>>>>>> [e13n16:591872] [ 0] >>>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>>> >>>>>>>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >>>>>>>> >>>>>>>> [e13n16:591871] Signal: Segmentation fault (11) >>>>>>>> >>>>>>>> [e13n16:591871] Signal code: Invalid permissions (2) >>>>>>>> >>>>>>>> [e13n16:591871] Failing at address: 0x102c87e0 >>>>>>>> >>>>>>>> [e13n16:591871] [ 0] >>>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>>> >>>>>>>> [e13n16:591871] [ 1] >>>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>>> >>>>>>>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >>>>>>>> >>>>>>>> [e13n16:591874] Signal: Segmentation fault (11) >>>>>>>> >>>>>>>> [e13n16:591874] Signal code: Invalid permissions (2) >>>>>>>> >>>>>>>> [e13n16:591874] Failing at address: 0x102c87e0 >>>>>>>> >>>>>>>> [e13n16:591874] [ 0] >>>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>>> >>>>>>>> [e13n16:591874] [ 1] >>>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>>> >>>>>>>> [e13n16:591874] [ 2] >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>>> >>>>>>>> [e13n16:591874] [ 3] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>>> >>>>>>>> [e13n16:591874] [ 4] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>>> >>>>>>>> [e13n16:591874] [ 5] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>>> >>>>>>>> [e13n16:591874] [ 6] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>>> >>>>>>>> [e13n16:591874] [ 7] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>>> >>>>>>>> [e13n16:591874] [ 8] >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>>> >>>>>>>> [e13n16:591871] [ 3] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>>> >>>>>>>> [e13n16:591871] [ 4] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>>> >>>>>>>> [e13n16:591871] [ 5] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>>> >>>>>>>> [e13n16:591871] [ 6] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>>> >>>>>>>> [e13n16:591871] [ 7] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>>> 
>>>>>>>> [e13n16:591871] [ 8] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>>> >>>>>>>> [e13n16:591871] [ 9] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>>> >>>>>>>> [e13n16:591871] *** End of error message *** >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>>> >>>>>>>> [e13n16:591874] [ 9] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>>> >>>>>>>> [e13n16:591874] *** End of error message *** >>>>>>>> >>>>>>>> >>>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>>> >>>>>>>> [e13n16:591872] [ 2] >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>>> >>>>>>>> [e13n16:591872] [ 3] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>>> >>>>>>>> [e13n16:591872] [ 4] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>>> >>>>>>>> [e13n16:591872] [ 5] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>>> >>>>>>>> [e13n16:591872] [ 6] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>>> >>>>>>>> [e13n16:591872] [ 7] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>>> >>>>>>>> [e13n16:591872] [ 8] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>>> >>>>>>>> [e13n16:591872] [ 9] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>>> >>>>>>>> [e13n16:591872] *** End of error message *** >>>>>>>> >>>>>>>> >>>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>>> >>>>>>>> [e13n16:591873] [ 2] >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>>> >>>>>>>> [e13n16:591873] [ 3] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>>> >>>>>>>> [e13n16:591873] [ 4] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>>> >>>>>>>> [e13n16:591873] [ 5] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>>> >>>>>>>> [e13n16:591873] [ 6] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>>> >>>>>>>> [e13n16:591873] [ 7] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>>> >>>>>>>> [e13n16:591873] [ 8] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>>> >>>>>>>> [e13n16:591873] [ 9] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>>> >>>>>>>> [e13n16:591873] *** End of error message *** >>>>>>>> >>>>>>>> ERROR: One or more process (first noticed rank 1) terminated with >>>>>>>> signal 11 (core dumped) >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper 
>>>>>>>> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >>>>>>>> bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>>>>> liblibxgc-es-cpp.a >>>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >>>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >>>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >>>>>>>> libs/pspline/libpspline.a libs/camtimers/libtimers.a >>>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >>>>>>>> /usr/lib64/libcuda.so >>>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >>>>>>>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >>>>>>>> -lmpi_ibm_mpifh -lnvf >>>>>>>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 19:39 main= >>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make >>>>>>>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >>>>>>>> PETSC_ARCH="" ex1 >>>>>>>> *mpicc* -fPIC -g -fast -fPIC -g -fast >>>>>>>> -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >>>>>>>> ex1.c >>>>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>>>>> 
-L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>>>>> -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>>>>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>>>>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >>>>>>>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >>>>>>>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >>>>>>>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >>>>>>>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >>>>>>>> 19:40 main= >>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >>>>>>>> --version* >>>>>>>> >>>>>>>> *nvc 21.7-0 linuxpower target on Linuxpower* >>>>>>>> NVIDIA Compilers and Tools >>>>>>>> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights >>>>>>>> reserved. >>>>>>>> 19:40 main= >>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1 >>>>>>>> ./ex1 -ksp_monitor >>>>>>>> 0 KSP Residual norm 6.041522986797e+00 >>>>>>>> 1 KSP Residual norm 1.042493382631e+00 >>>>>>>> 2 KSP Residual norm 7.950907844730e-16 >>>>>>>> 0 KSP Residual norm 4.786756692342e+00 >>>>>>>> 1 KSP Residual norm 1.426392207750e-01 >>>>>>>> 2 KSP Residual norm 1.801079604472e-15 >>>>>>>> 0 KSP Residual norm 2.986456323228e+00 >>>>>>>> 1 KSP Residual norm 7.669888809223e-02 >>>>>>>> 2 KSP Residual norm 3.744083117256e-16 >>>>>>>> 0 KSP Residual norm 2.306244667700e-01 >>>>>>>> 1 KSP Residual norm 1.355550749587e-02 >>>>>>>> 2 KSP Residual norm 5.845524837731e-17 >>>>>>>> 0 KSP Residual norm 1.936314002654e-03 >>>>>>>> 1 KSP Residual norm 2.125593590819e-04 >>>>>>>> 2 KSP Residual norm 6.987141455073e-20 >>>>>>>> 0 KSP Residual norm 1.435593531990e-07 >>>>>>>> 1 KSP Residual norm 2.588271385567e-08 >>>>>>>> 2 KSP Residual norm 3.942196167935e-23 >>>>>>>> >>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 27 23:19:06 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 27 Aug 2021 23:19:06 -0500 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: <05F13947-EAD7-442F-9346-F8203131AFD5@petsc.dev> > On Aug 27, 2021, at 5:05 PM, Mark Adams wrote: > > > > On Fri, Aug 27, 2021 at 5:03 PM Junchao Zhang > wrote: > I don't understand the configure options > > --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun -g 1" --with-cuda=0 --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0 --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b > > Why do you need to use nvcc_wrapper if you do not want to use cuda? > > That code that is having a problem links with nvcc_wrapper. 
> They get a segv that I sent earlier, in PetscInitialize so I figure I should use the same compiler / linker. > They use CUDA, but we don't need PETSc to use CUDA now. > > In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, you also need --with-clanguage=c++ > > I rebuilt PETSc with mpicc, mpiCC, mpif90 and --with-nvcc=nvcc_wrapper and that built make check works. I gave it to them to test. This is an odd way to do it. The Kokkos nvcc_wrapper wraps the nvcc compiler to allow it to compile Kokkos code and link it against the Kokkos libraries; so using nvcc_wrapper as nvcc is strangely recursive; sure everything in PETSc/Kokkos may build ok (assuming the nvcc that the nvcc_wrapper uses is correct for the situation and uses a correct underlying C++) but it is freakish. PETSc should just be built with the same nvcc that the nvcc_wrapper is using and using the same inner C++ compiler. I suspect the crashes came from the /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper not using the same nvcc and internal C++ compiler as PETSc is ending up using. But if it works, I guess it works. Perhaps when PETSc does not build Kokkos we should have a --with-kokkos-nvcc-wrapper= to allow setting the wrapper. Barry > > Thanks, > Mark > > > --Junchao Zhang > > > On Fri, Aug 27, 2021 at 3:28 PM Mark Adams > wrote: > > > On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang > wrote: > > > On Fri, Aug 27, 2021, 1:52 PM Mark Adams > wrote: > I think the problem is that I build with MPICC and they use nvcc_wrapper. I could just try building PETSc with CC=nvcc_wrapper, but it was not clear if this was the way to go. > --with-nvcc=nvcc_wrapper > > What do I specify for cc and CC? > > I will try it. > Thanks, > Mark > > On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang > wrote: > > > > On Fri, Aug 27, 2021 at 7:06 AM Mark Adams > wrote: > I have a user (cc'ed) that has a C++ code and is using a PETSc that I built. He is getting this runtime error. > > 'make check' runs clean and I built snes/tutorial/ex1 manually, to get a link line, and it ran fine. > I appended the users link line and my test. > > I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild PETSc using that, maybe we just need to make sure we are both using the same underlying compiler or should they use mpiCC? > It looks like they used nvcc_wrapper to replace nvcc. You can ask them to use nvcc directly to see what happens. But the error happened in petsc initialization, petscsys_petscinitializenohelp, so I doubt it helps. The easy way is to just attach a debugger. 
> > Thanks, > Mark > > > [e13n16:591873] *** Process received signal *** > [e13n16:591873] Signal: Segmentation fault (11) > [e13n16:591873] Signal code: Invalid permissions (2) > [e13n16:591873] Failing at address: 0x102c87e0 > [e13n16:591873] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** > [e13n16:591872] Signal: Segmentation fault (11) > [e13n16:591872] Signal code: Invalid permissions (2) > [e13n16:591872] Failing at address: 0x102c87e0 > [e13n16:591872] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** > [e13n16:591871] Signal: Segmentation fault (11) > [e13n16:591871] Signal code: Invalid permissions (2) > [e13n16:591871] Failing at address: 0x102c87e0 > [e13n16:591871] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > [e13n16:591871] [ 1] /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** > [e13n16:591874] Signal: Segmentation fault (11) > [e13n16:591874] Signal code: Invalid permissions (2) > [e13n16:591874] Failing at address: 0x102c87e0 > [e13n16:591874] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > [e13n16:591874] [ 1] /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > [e13n16:591874] [ 2] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > [e13n16:591874] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > [e13n16:591874] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > [e13n16:591874] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > [e13n16:591874] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > [e13n16:591874] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > [e13n16:591874] [ 8] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > [e13n16:591871] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > [e13n16:591871] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > [e13n16:591871] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > [e13n16:591871] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > [e13n16:591871] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > [e13n16:591871] [ 8] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > [e13n16:591871] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > [e13n16:591871] *** End of error message *** > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > [e13n16:591874] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > [e13n16:591874] *** End of error message *** > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > [e13n16:591872] [ 2] 
/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > [e13n16:591872] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > [e13n16:591872] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > [e13n16:591872] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > [e13n16:591872] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > [e13n16:591872] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > [e13n16:591872] [ 8] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > [e13n16:591872] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > [e13n16:591872] *** End of error message *** > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > [e13n16:591873] [ 2] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > [e13n16:591873] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > [e13n16:591873] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > [e13n16:591873] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > [e13n16:591873] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > [e13n16:591873] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > [e13n16:591873] [ 8] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > [e13n16:591873] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > [e13n16:591873] *** End of error message *** > ERROR: One or more process (first noticed rank 1) terminated with signal 11 (core dumped) > > > > /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 liblibxgc-es-cpp.a /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so libs/pspline/libpspline.a libs/camtimers/libtimers.a /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 
/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a /usr/lib64/libcuda.so /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lnvf -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 > > > 19:39 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 PETSC_ARCH="" ex1 > mpicc -fPIC -g -fast -fPIC -g -fast -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include ex1.c -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 > 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ mpicc --version > > nvc 21.7-0 linuxpower target on Linuxpower > NVIDIA Compilers and Tools > Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1 ./ex1 -ksp_monitor > 0 KSP Residual norm 6.041522986797e+00 > 1 KSP Residual norm 1.042493382631e+00 > 2 KSP Residual norm 7.950907844730e-16 > 0 KSP Residual norm 4.786756692342e+00 > 1 KSP Residual norm 1.426392207750e-01 > 2 KSP Residual norm 1.801079604472e-15 > 0 KSP Residual norm 2.986456323228e+00 > 1 KSP Residual norm 7.669888809223e-02 > 2 KSP Residual norm 3.744083117256e-16 > 0 KSP Residual norm 2.306244667700e-01 > 1 KSP Residual norm 1.355550749587e-02 > 2 KSP Residual norm 5.845524837731e-17 > 0 KSP Residual norm 1.936314002654e-03 > 1 KSP Residual norm 2.125593590819e-04 > 2 KSP Residual norm 6.987141455073e-20 > 0 KSP Residual norm 1.435593531990e-07 > 1 KSP Residual norm 2.588271385567e-08 > 2 KSP Residual norm 3.942196167935e-23 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sat Aug 28 06:40:20 2021 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 28 Aug 2021 07:40:20 -0400 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: <05F13947-EAD7-442F-9346-F8203131AFD5@petsc.dev> References: <05F13947-EAD7-442F-9346-F8203131AFD5@petsc.dev> Message-ID: cc'ing Robert who is taking over from Aaron for a few days. Robert, I suggest hoisting PetscInitailize into main or at least a C call of some sort. This error in pgf90_str_copy_klen might be be avoided by not giving PetscInitialize a string ('petsc.rc') and linking petsc.rc --> .petscrc (PETSc will look for .petscrc by default). more below. On Sat, Aug 28, 2021 at 12:19 AM Barry Smith wrote: > > > On Aug 27, 2021, at 5:05 PM, Mark Adams wrote: > > > > On Fri, Aug 27, 2021 at 5:03 PM Junchao Zhang > wrote: > >> I don't understand the configure options >> >> >> --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/ >> *nvcc_wrapper* >> --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >> --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort >> COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" >> CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun >> -g 1" *--with-cuda=0* >> --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >> --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0 >> --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc >> --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b >> >> Why do you need to use nvcc_wrapper if you do not want to use cuda? >> > > That code that is having a problem links with nvcc_wrapper. > They get a segv that I sent earlier, in PetscInitialize so I figure I > should use the same compiler / linker. > They use CUDA, but we don't need PETSc to use CUDA now. > > >> In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, you >> also need --with-clanguage=c++ >> > > I rebuilt PETSc with mpicc, mpiCC, mpif90 and --with-nvcc=nvcc_wrapper > and that built make check works. I gave it to them to test. > > > This is an odd way to do it. 
The Kokkos nvcc_wrapper wraps the nvcc > compiler to allow it to compile Kokkos code and link it against the Kokkos > libraries; so using nvcc_wrapper as nvcc is strangely recursive; sure > everything in PETSc/Kokkos may build ok (assuming the nvcc that the > nvcc_wrapper uses is correct for the situation and uses a correct > underlying C++) but it is freakish. PETSc should just be built with the > same nvcc that the nvcc_wrapper is using and using the same inner C++ > compiler. > Yes, this is convoluted. Thanks for your take on this. That said, they have been struggling to get Kokkos to build with nvhpc and I can see this is what they have compiling and are pressing on with a milestone that is due soon. Anyway, I found that PetscInitialize is called from Fortran code (in 25+ years have we ever seen a C++ code call PetscInitialize from a Fortran subroutine?), which should be fine. Just unusual. This explains the error coming from a Fortran library: [e13n16:591874] [ 1] >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>> >>>>> They are using this Fortran compiler and so I built PETSc with it: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort I see: 07:16 1 ~$ which mpif90 /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpif90 so this is the nvhpc-21.7 Fortran. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.jamond at cea.fr Mon Aug 30 13:17:16 2021 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Mon, 30 Aug 2021 20:17:16 +0200 Subject: [petsc-users] KSPSolve with MPIAIJ with non-square 'diagonal parts' Message-ID: <7f8f699a-e9b5-768d-828c-b8a923f0fe75@cea.fr> Hello, I am sorry because I surely miss something, but I cannot manage to solve a problem with a MPIAIJ matrix which has non-square 'diagonal parts'. I copy/paste at the bottom of this message a very simple piece of code which causes me troubles. In this code I try to do 'x=KSP(A)*b' (with gmres/jacobi), but this fails whereas a matrix multiplication 'b=A*x' seems to work. Is ksp with such a matrix supposed to work (I can't find anything in the documentation about that, so I guess that it is...)? Many thanks, Olivier NB: this code should be launched with exactly 3 procs

#include "petscsys.h" /* framework routines */
#include "petscvec.h" /* vectors */
#include "petscmat.h" /* matrices */
#include "petscksp.h"

#include <vector>
#include <iostream>
#include <numeric>
#include <cstdlib>

static char help[] = "Trying to solve a linear system on a sub-block of a matrix\n\n";
int main(int argc, char **argv)
{
  MPI_Init(NULL, NULL);
  PetscErrorCode ierr;
  ierr = PetscInitialize(&argc, &argv, NULL, help);
  CHKERRQ(ierr);

  // clang-format off
  std::vector<std::vector<double>> AA = {
      { 1,  2,  0, /**/ 0,  3, /**/ 0,  0,  4},
      { 0,  5,  6, /**/ 7,  0, /**/ 0,  8,  0},
      { 9,  0, 10, /**/11,  0, /**/ 0, 12,  0},
      //---------------------------------------
      {13,  0, 14, /**/15, 16, /**/17,  0,  0},
      { 0, 18,  0, /**/19, 20, /**/21,  0,  0},
      { 0,  0,  0, /**/22, 23, /**/ 1, 24,  0},
      //--------------------------------------
      {25, 26, 27, /**/ 0,  0, /**/28, 29,  0},
      {30,  0,  0, /**/31, 32, /**/33,  0, 34},
  };

  std::vector<double> bb = {1.,
                            1.,
                            1.,
                            //
                            1.,
                            1.,
                            1.,
                            //
                            1.,
                            1.};

  std::vector<int> nDofsRow = {3, 3, 2};
  std::vector<int> nDofsCol = {3, 2, 3};
  // clang-format on

  int NDofs = std::accumulate(nDofsRow.begin(), nDofsRow.end(), 0);

  int pRank, nProc;
  MPI_Comm_rank(PETSC_COMM_WORLD, &pRank);
  MPI_Comm_size(PETSC_COMM_WORLD, &nProc);

  if (nProc != 3) {
    std::cerr << "THIS TEST MUST BE LAUNCHED WITH EXACTLY 3 PROCS\n";
    abort();
  }

  Mat A;
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetType(A, MATMPIAIJ);
  MatSetSizes(A, nDofsRow[pRank], nDofsCol[pRank], PETSC_DETERMINE, PETSC_DETERMINE);
  MatMPIAIJSetPreallocation(A, NDofs, NULL, NDofs, NULL);

  Vec b;
  VecCreate(PETSC_COMM_WORLD, &b);
  VecSetType(b, VECMPI);
  VecSetSizes(b, nDofsRow[pRank], PETSC_DECIDE);

  if (pRank == 0) {
    for (int i = 0; i < NDofs; ++i) {
      for (int j = 0; j < NDofs; ++j) {
        if (AA[i][j] != 0.) {
          MatSetValue(A, i, j, AA[i][j], ADD_VALUES);
        }
      }
      VecSetValue(b, i, bb[i], ADD_VALUES);
    }
  }

  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  VecAssemblyBegin(b);
  VecAssemblyEnd(b);

  PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD, PETSC_VIEWER_ASCII_DENSE);
  MatView(A, PETSC_VIEWER_STDOUT_WORLD);
  VecView(b, PETSC_VIEWER_STDOUT_WORLD);

  KSP ksp;
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);

  PC pc;
  KSPGetPC(ksp, &pc);
  PCSetFromOptions(pc);

  Vec x;
  MatCreateVecs(A, &x, NULL);
  ierr = KSPSolve(ksp, b, x);    // this fails
  MatMult(A, x, b);              // whereas this seems to be ok...

  VecView(x, PETSC_VIEWER_STDOUT_WORLD);

  MPI_Finalize();

  return 0;
}

From stefano.zampini at gmail.com Mon Aug 30 15:42:21 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 30 Aug 2021 23:42:21 +0300 Subject: [petsc-users] KSPSolve with MPIAIJ with non-square 'diagonal parts' In-Reply-To: <7f8f699a-e9b5-768d-828c-b8a923f0fe75@cea.fr> References: <7f8f699a-e9b5-768d-828c-b8a923f0fe75@cea.fr> Message-ID: What is the error you are getting from the KSP? The default solver in parallel is BlockJacobi+ILU, which does not work for non-square matrices. You do not need to call PCSetFromOptions on the pc. Just call KSPSetFromOptions and run with -pc_type none > On Aug 30, 2021, at 9:17 PM, Olivier Jamond wrote: > > Hello, > > I am sorry because I surely miss something, but I cannot manage to solve a problem with a MPIAIJ matrix which has non-square 'diagonal parts'. > > I copy/paste at the bottom of this message a very simple piece of code which causes me troubles. In this code I try to do 'x=KSP(A)*b' (with gmres/jacobi), but this fails whereas a matrix multiplication 'b=A*x' seems to work. Is ksp with such a matrix supposed to work (I can't find anything in the documentation about that, so I guess that it is...)?
> > Many thanks, > Olivier > > NB: this code should be launched with exactly 3 procs > > #include "petscsys.h" /* framework routines */ > #include "petscvec.h" /* vectors */ > #include "petscmat.h" /* matrices */ > #include "petscksp.h" > > #include > #include > #include > #include > > static char help[] = "Trying to solve a linear system on a sub-block of a matrix\n\n"; > int main(int argc, char **argv) > { > MPI_Init(NULL, NULL); > PetscErrorCode ierr; > ierr = PetscInitialize(&argc, &argv, NULL, help); > CHKERRQ(ierr); > > // clang-format off > std::vector> AA = { > { 1, 2, 0, /**/ 0, 3, /**/ 0, 0, 4}, > { 0, 5, 6, /**/ 7, 0, /**/ 0, 8, 0}, > { 9, 0, 10, /**/11, 0, /**/ 0, 12, 0}, > //--------------------------------------- > {13, 0, 14, /**/15, 16, /**/17, 0, 0}, > { 0, 18, 0, /**/19, 20, /**/21, 0, 0}, > { 0, 0, 0, /**/22, 23, /**/ 1, 24, 0}, > //-------------------------------------- > {25, 26, 27, /**/ 0, 0, /**/28, 29, 0}, > {30, 0, 0, /**/31, 32, /**/33, 0, 34}, > }; > > > std::vector bb = {1., > 1., > 1., > // > 1., > 1., > 1., > // > 1., > 1.}; > > > std::vector nDofsRow = {3, 3, 2}; > std::vector nDofsCol = {3, 2, 3}; > // clang-format on > > int NDofs = std::accumulate(nDofsRow.begin(), nDofsRow.end(), 0); > > int pRank, nProc; > MPI_Comm_rank(PETSC_COMM_WORLD, &pRank); > MPI_Comm_size(PETSC_COMM_WORLD, &nProc); > > if (nProc != 3) { > std::cerr << "THIS TEST MUST BE LAUNCHED WITH EXACTLY 3 PROCS\n"; > abort(); > } > > Mat A; > MatCreate(PETSC_COMM_WORLD, &A); > MatSetType(A, MATMPIAIJ); > MatSetSizes(A, nDofsRow[pRank], nDofsCol[pRank], PETSC_DETERMINE, PETSC_DETERMINE); > MatMPIAIJSetPreallocation(A, NDofs, NULL, NDofs, NULL); > > Vec b; > VecCreate(PETSC_COMM_WORLD, &b); > VecSetType(b, VECMPI); > VecSetSizes(b, nDofsRow[pRank], PETSC_DECIDE); > > if (pRank == 0) { > for (int i = 0; i < NDofs; ++i) { > for (int j = 0; j < NDofs; ++j) { > if (AA[i][j] != 0.) { > MatSetValue(A, i, j, AA[i][j], ADD_VALUES); > } > } > VecSetValue(b, i, bb[i], ADD_VALUES); > } > } > > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > VecAssemblyBegin(b); > VecAssemblyEnd(b); > > PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD, PETSC_VIEWER_ASCII_DENSE); > MatView(A, PETSC_VIEWER_STDOUT_WORLD); > VecView(b, PETSC_VIEWER_STDOUT_WORLD); > > KSP ksp; > KSPCreate(PETSC_COMM_WORLD, &ksp); > KSPSetOperators(ksp, A, A); > KSPSetFromOptions(ksp); > > PC pc; > KSPGetPC(ksp, &pc); > PCSetFromOptions(pc); > > Vec x; > MatCreateVecs(A, &x, NULL); > ierr = KSPSolve(ksp, b, x); // this fails > MatMult(A, x, b); // whereas the seems to be ok... > > VecView(x, PETSC_VIEWER_STDOUT_WORLD); > > MPI_Finalize(); > > return 0; > } > From knepley at gmail.com Mon Aug 30 16:31:02 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 30 Aug 2021 17:31:02 -0400 Subject: [petsc-users] KSPSolve with MPIAIJ with non-square 'diagonal parts' In-Reply-To: References: <7f8f699a-e9b5-768d-828c-b8a923f0fe75@cea.fr> Message-ID: On Mon, Aug 30, 2021 at 4:42 PM Stefano Zampini wrote: > What is the error you are getting from the KSP? Default solver in parallel > in BlockJacobi+ILU which does not work for non-square matrices. You do not > need to call PCSetFromOptions on the pc. Just call KSPSetFromOptions and > run with -pc_type none > I have a more basic question. The idea of GMRES is the following: We build a space, {r, A r, A^2 r, ...} for the solution This means that r and Ar must have compatible layouts. It does not sound like this is the case for you. 
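To spell the layout point out with the sizes used in the test above (a worked restatement using only quantities already in the example): GMRES builds its iterates from the Krylov subspace

  \[
    \mathcal{K}_k(A, r_0) = \operatorname{span}\{\, r_0,\ A r_0,\ A^2 r_0,\ \dots,\ A^{k-1} r_0 \,\}, \qquad x_k \in x_0 + \mathcal{K}_k(A, r_0),
  \]

so A has to map a vector back onto a vector with the same parallel layout. With nDofsRow = {3, 3, 2} and nDofsCol = {3, 2, 3}, the residual r = b - A x is laid out like the rows, while the input of A (and the x obtained from MatCreateVecs(A, &x, NULL)) is laid out like the columns, so r and A r are distributed differently on each process even though the global matrix is 8 x 8.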
Thanks, Matt > > On Aug 30, 2021, at 9:17 PM, Olivier Jamond > wrote: > > > > Hello, > > > > I am sorry because I surely miss something, but I cannot manage to solve > a problem with a MPIAIJ matrix which has non-square 'diagonal parts'. > > > > I copy/paste at the bottom of this message a very simple piece of code > which causes me troubles. I this code I try to do 'x=KSP(A)*b' (with > gmres/jacobi), but this fails whereas a matrix multiplication 'b=A*x' seems > to work. Is ksp with such a matrix supposed to work (I can't find anything > in the documentation about that, so I guess that it is...) ? > > > > Many thanks, > > Olivier > > > > NB: this code should be launched with exactly 3 procs > > > > #include "petscsys.h" /* framework routines */ > > #include "petscvec.h" /* vectors */ > > #include "petscmat.h" /* matrices */ > > #include "petscksp.h" > > > > #include > > #include > > #include > > #include > > > > static char help[] = "Trying to solve a linear system on a sub-block of > a matrix\n\n"; > > int main(int argc, char **argv) > > { > > MPI_Init(NULL, NULL); > > PetscErrorCode ierr; > > ierr = PetscInitialize(&argc, &argv, NULL, help); > > CHKERRQ(ierr); > > > > // clang-format off > > std::vector> AA = { > > { 1, 2, 0, /**/ 0, 3, /**/ 0, 0, 4}, > > { 0, 5, 6, /**/ 7, 0, /**/ 0, 8, 0}, > > { 9, 0, 10, /**/11, 0, /**/ 0, 12, 0}, > > //--------------------------------------- > > {13, 0, 14, /**/15, 16, /**/17, 0, 0}, > > { 0, 18, 0, /**/19, 20, /**/21, 0, 0}, > > { 0, 0, 0, /**/22, 23, /**/ 1, 24, 0}, > > //-------------------------------------- > > {25, 26, 27, /**/ 0, 0, /**/28, 29, 0}, > > {30, 0, 0, /**/31, 32, /**/33, 0, 34}, > > }; > > > > > > std::vector bb = {1., > > 1., > > 1., > > // > > 1., > > 1., > > 1., > > // > > 1., > > 1.}; > > > > > > std::vector nDofsRow = {3, 3, 2}; > > std::vector nDofsCol = {3, 2, 3}; > > // clang-format on > > > > int NDofs = std::accumulate(nDofsRow.begin(), nDofsRow.end(), 0); > > > > int pRank, nProc; > > MPI_Comm_rank(PETSC_COMM_WORLD, &pRank); > > MPI_Comm_size(PETSC_COMM_WORLD, &nProc); > > > > if (nProc != 3) { > > std::cerr << "THIS TEST MUST BE LAUNCHED WITH EXACTLY 3 PROCS\n"; > > abort(); > > } > > > > Mat A; > > MatCreate(PETSC_COMM_WORLD, &A); > > MatSetType(A, MATMPIAIJ); > > MatSetSizes(A, nDofsRow[pRank], nDofsCol[pRank], PETSC_DETERMINE, > PETSC_DETERMINE); > > MatMPIAIJSetPreallocation(A, NDofs, NULL, NDofs, NULL); > > > > Vec b; > > VecCreate(PETSC_COMM_WORLD, &b); > > VecSetType(b, VECMPI); > > VecSetSizes(b, nDofsRow[pRank], PETSC_DECIDE); > > > > if (pRank == 0) { > > for (int i = 0; i < NDofs; ++i) { > > for (int j = 0; j < NDofs; ++j) { > > if (AA[i][j] != 0.) { > > MatSetValue(A, i, j, AA[i][j], ADD_VALUES); > > } > > } > > VecSetValue(b, i, bb[i], ADD_VALUES); > > } > > } > > > > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > VecAssemblyBegin(b); > > VecAssemblyEnd(b); > > > > PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD, > PETSC_VIEWER_ASCII_DENSE); > > MatView(A, PETSC_VIEWER_STDOUT_WORLD); > > VecView(b, PETSC_VIEWER_STDOUT_WORLD); > > > > KSP ksp; > > KSPCreate(PETSC_COMM_WORLD, &ksp); > > KSPSetOperators(ksp, A, A); > > KSPSetFromOptions(ksp); > > > > PC pc; > > KSPGetPC(ksp, &pc); > > PCSetFromOptions(pc); > > > > Vec x; > > MatCreateVecs(A, &x, NULL); > > ierr = KSPSolve(ksp, b, x); // this fails > > MatMult(A, x, b); // whereas the seems to be ok... 
> > > > VecView(x, PETSC_VIEWER_STDOUT_WORLD); > > > > MPI_Finalize(); > > > > return 0; > > } > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Aug 30 18:39:23 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 16:39:23 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error Message-ID: Dear PETSc dev team, I am compiling petsc 3.15.3 and got following compiling error petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary operator before token "(" 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) Any idea what I did wrong? Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Aug 30 18:52:45 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 30 Aug 2021 18:52:45 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: Are you using --download-mumps or pre-installed mumps? If using pre-installed - try --download-mumps. If you still have issues - send us configure.log and make.log from the failed build. Satish On Mon, 30 Aug 2021, Sam Guo wrote: > Dear PETSc dev team, > I am compiling petsc 3.15.3 and got following compiling error > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > operator before token "(" > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > Any idea what I did wrong? > > Thanks, > Sam > From sam.guo at cd-adapco.com Mon Aug 30 18:56:29 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 16:56:29 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: I use pre-installed On Mon, Aug 30, 2021 at 4:53 PM Satish Balay wrote: > > Are you using --download-mumps or pre-installed mumps? If using > pre-installed - try --download-mumps. > > If you still have issues - send us configure.log and make.log from the > failed build. > > Satish > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > Dear PETSc dev team, > > I am compiling petsc 3.15.3 and got following compiling error > > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > > operator before token "(" > > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > Any idea what I did wrong? > > > > Thanks, > > Sam > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Aug 30 19:10:45 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 17:10:45 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: Attached please find the configure.log. I use my own CMake. I have defined -DPETSC_HAVE_MUMPS. Thanks. On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: > I use pre-installed > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay wrote: > >> >> Are you using --download-mumps or pre-installed mumps? If using >> pre-installed - try --download-mumps. >> >> If you still have issues - send us configure.log and make.log from the >> failed build. 
>> >> Satish >> >> On Mon, 30 Aug 2021, Sam Guo wrote: >> >> > Dear PETSc dev team, >> > I am compiling petsc 3.15.3 and got following compiling error >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary >> > operator before token "(" >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >> > Any idea what I did wrong? >> > >> > Thanks, >> > Sam >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 88711 bytes Desc: not available URL: From bantingl at myumanitoba.ca Mon Aug 30 19:13:38 2021 From: bantingl at myumanitoba.ca (Lucas Banting) Date: Tue, 31 Aug 2021 00:13:38 +0000 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: Dumb question but are you configuring with scalapack? ________________________________ From: petsc-users on behalf of Sam Guo Sent: Monday, August 30, 2021 7:10:45 PM To: petsc-users Subject: Re: [petsc-users] PETSc 3.15.3 compiling error Attached please find the configure.log. I use my own CMake. I have defined -DPETSC_HAVE_MUMPS. Thanks. On Mon, Aug 30, 2021 at 4:56 PM Sam Guo > wrote: I use pre-installed On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > wrote: Are you using --download-mumps or pre-installed mumps? If using pre-installed - try --download-mumps. If you still have issues - send us configure.log and make.log from the failed build. Satish On Mon, 30 Aug 2021, Sam Guo wrote: > Dear PETSc dev team, > I am compiling petsc 3.15.3 and got following compiling error > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > operator before token "(" > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > Any idea what I did wrong? > > Thanks, > Sam > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Aug 30 19:17:10 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 17:17:10 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: I don't use scalapack. On Mon, Aug 30, 2021 at 5:13 PM Lucas Banting wrote: > Dumb question but are you configuring with scalapack? > ------------------------------ > *From:* petsc-users on behalf of Sam > Guo > *Sent:* Monday, August 30, 2021 7:10:45 PM > *To:* petsc-users > *Subject:* Re: [petsc-users] PETSc 3.15.3 compiling error > > Attached please find the configure.log. I use my own CMake. I have > defined -DPETSC_HAVE_MUMPS. Thanks. > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: > > I use pre-installed > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay wrote: > > > Are you using --download-mumps or pre-installed mumps? If using > pre-installed - try --download-mumps. > > If you still have issues - send us configure.log and make.log from the > failed build. > > Satish > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > Dear PETSc dev team, > > I am compiling petsc 3.15.3 and got following compiling error > > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > > operator before token "(" > > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > Any idea what I did wrong? > > > > Thanks, > > Sam > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bantingl at myumanitoba.ca Mon Aug 30 19:21:33 2021 From: bantingl at myumanitoba.ca (Lucas Banting) Date: Tue, 31 Aug 2021 00:21:33 +0000 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: I believe it is a dependency of mumps based on the configure.log ending. Package mumps requested but dependency scalapack not requested ________________________________ From: Sam Guo Sent: Monday, August 30, 2021 7:17:10 PM To: Lucas Banting Cc: petsc-users Subject: Re: [petsc-users] PETSc 3.15.3 compiling error I don't use scalapack. On Mon, Aug 30, 2021 at 5:13 PM Lucas Banting > wrote: Dumb question but are you configuring with scalapack? ________________________________ From: petsc-users > on behalf of Sam Guo > Sent: Monday, August 30, 2021 7:10:45 PM To: petsc-users > Subject: Re: [petsc-users] PETSc 3.15.3 compiling error Attached please find the configure.log. I use my own CMake. I have defined -DPETSC_HAVE_MUMPS. Thanks. On Mon, Aug 30, 2021 at 4:56 PM Sam Guo > wrote: I use pre-installed On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > wrote: Are you using --download-mumps or pre-installed mumps? If using pre-installed - try --download-mumps. If you still have issues - send us configure.log and make.log from the failed build. Satish On Mon, 30 Aug 2021, Sam Guo wrote: > Dear PETSc dev team, > I am compiling petsc 3.15.3 and got following compiling error > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > operator before token "(" > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > Any idea what I did wrong? > > Thanks, > Sam > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Aug 30 19:22:33 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 17:22:33 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: My pre-installed MUMPS defines the dummy for blacs and scalapack. On Mon, Aug 30, 2021 at 5:17 PM Sam Guo wrote: > I don't use scalapack. > > On Mon, Aug 30, 2021 at 5:13 PM Lucas Banting > wrote: > >> Dumb question but are you configuring with scalapack? >> ------------------------------ >> *From:* petsc-users on behalf of Sam >> Guo >> *Sent:* Monday, August 30, 2021 7:10:45 PM >> *To:* petsc-users >> *Subject:* Re: [petsc-users] PETSc 3.15.3 compiling error >> >> Attached please find the configure.log. I use my own CMake. I have >> defined -DPETSC_HAVE_MUMPS. Thanks. >> >> On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: >> >> I use pre-installed >> >> On Mon, Aug 30, 2021 at 4:53 PM Satish Balay wrote: >> >> >> Are you using --download-mumps or pre-installed mumps? If using >> pre-installed - try --download-mumps. >> >> If you still have issues - send us configure.log and make.log from the >> failed build. >> >> Satish >> >> On Mon, 30 Aug 2021, Sam Guo wrote: >> >> > Dear PETSc dev team, >> > I am compiling petsc 3.15.3 and got following compiling error >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary >> > operator before token "(" >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >> > Any idea what I did wrong? >> > >> > Thanks, >> > Sam >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sam.guo at cd-adapco.com Mon Aug 30 19:26:11 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 17:26:11 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: I am able to compile PETSc 3.11.3 with my pre-installed MUMPS. On Mon, Aug 30, 2021 at 5:22 PM Sam Guo wrote: > My pre-installed MUMPS defines the dummy for blacs and scalapack. > > On Mon, Aug 30, 2021 at 5:17 PM Sam Guo wrote: > >> I don't use scalapack. >> >> On Mon, Aug 30, 2021 at 5:13 PM Lucas Banting >> wrote: >> >>> Dumb question but are you configuring with scalapack? >>> ------------------------------ >>> *From:* petsc-users on behalf of Sam >>> Guo >>> *Sent:* Monday, August 30, 2021 7:10:45 PM >>> *To:* petsc-users >>> *Subject:* Re: [petsc-users] PETSc 3.15.3 compiling error >>> >>> Attached please find the configure.log. I use my own CMake. I have >>> defined -DPETSC_HAVE_MUMPS. Thanks. >>> >>> On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: >>> >>> I use pre-installed >>> >>> On Mon, Aug 30, 2021 at 4:53 PM Satish Balay wrote: >>> >>> >>> Are you using --download-mumps or pre-installed mumps? If using >>> pre-installed - try --download-mumps. >>> >>> If you still have issues - send us configure.log and make.log from the >>> failed build. >>> >>> Satish >>> >>> On Mon, 30 Aug 2021, Sam Guo wrote: >>> >>> > Dear PETSc dev team, >>> > I am compiling petsc 3.15.3 and got following compiling error >>> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary >>> > operator before token "(" >>> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >>> > Any idea what I did wrong? >>> > >>> > Thanks, >>> > Sam >>> > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Aug 30 22:22:33 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 30 Aug 2021 22:22:33 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: Use the additional option: -with-mumps-serial Satish On Mon, 30 Aug 2021, Sam Guo wrote: > Attached please find the configure.log. I use my own CMake. I have > defined -DPETSC_HAVE_MUMPS. Thanks. > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: > > > I use pre-installed > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay wrote: > > > >> > >> Are you using --download-mumps or pre-installed mumps? If using > >> pre-installed - try --download-mumps. > >> > >> If you still have issues - send us configure.log and make.log from the > >> failed build. > >> > >> Satish > >> > >> On Mon, 30 Aug 2021, Sam Guo wrote: > >> > >> > Dear PETSc dev team, > >> > I am compiling petsc 3.15.3 and got following compiling error > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > >> > operator before token "(" > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > >> > Any idea what I did wrong? > >> > > >> > Thanks, > >> > Sam > >> > > >> > >> > From sam.guo at cd-adapco.com Mon Aug 30 23:26:37 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 21:26:37 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: Same compiling error with --with-mumps-serial=1. On Mon, Aug 30, 2021 at 8:22 PM Satish Balay wrote: > Use the additional option: -with-mumps-serial > > Satish > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > Attached please find the configure.log. I use my own CMake. I have > > defined -DPETSC_HAVE_MUMPS. Thanks. 
> > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: > > > > > I use pre-installed > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > wrote: > > > > > >> > > >> Are you using --download-mumps or pre-installed mumps? If using > > >> pre-installed - try --download-mumps. > > >> > > >> If you still have issues - send us configure.log and make.log from the > > >> failed build. > > >> > > >> Satish > > >> > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > >> > > >> > Dear PETSc dev team, > > >> > I am compiling petsc 3.15.3 and got following compiling error > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing > binary > > >> > operator before token "(" > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > >> > Any idea what I did wrong? > > >> > > > >> > Thanks, > > >> > Sam > > >> > > > >> > > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Aug 30 23:42:55 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 30 Aug 2021 23:42:55 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: please resend the logs Satish On Mon, 30 Aug 2021, Sam Guo wrote: > Same compiling error with --with-mumps-serial=1. > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay wrote: > > > Use the additional option: -with-mumps-serial > > > > Satish > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > Attached please find the configure.log. I use my own CMake. I have > > > defined -DPETSC_HAVE_MUMPS. Thanks. > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: > > > > > > > I use pre-installed > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > > wrote: > > > > > > > >> > > > >> Are you using --download-mumps or pre-installed mumps? If using > > > >> pre-installed - try --download-mumps. > > > >> > > > >> If you still have issues - send us configure.log and make.log from the > > > >> failed build. > > > >> > > > >> Satish > > > >> > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > > >> > > > >> > Dear PETSc dev team, > > > >> > I am compiling petsc 3.15.3 and got following compiling error > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing > > binary > > > >> > operator before token "(" > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > > >> > Any idea what I did wrong? > > > >> > > > > >> > Thanks, > > > >> > Sam > > > >> > > > > >> > > > >> > > > > > > > > From balay at mcs.anl.gov Mon Aug 30 23:47:47 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 30 Aug 2021 23:47:47 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> Also - what do you have for: grep MUMPS_VERSION /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h Satish On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > please resend the logs > > Satish > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > Same compiling error with --with-mumps-serial=1. > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay wrote: > > > > > Use the additional option: -with-mumps-serial > > > > > > Satish > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > Attached please find the configure.log. I use my own CMake. I have > > > > defined -DPETSC_HAVE_MUMPS. Thanks. 
> > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: > > > > > > > > > I use pre-installed > > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > > > wrote: > > > > > > > > > >> > > > > >> Are you using --download-mumps or pre-installed mumps? If using > > > > >> pre-installed - try --download-mumps. > > > > >> > > > > >> If you still have issues - send us configure.log and make.log from the > > > > >> failed build. > > > > >> > > > > >> Satish > > > > >> > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > > > >> > > > > >> > Dear PETSc dev team, > > > > >> > I am compiling petsc 3.15.3 and got following compiling error > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing > > > binary > > > > >> > operator before token "(" > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > > > >> > Any idea what I did wrong? > > > > >> > > > > > >> > Thanks, > > > > >> > Sam > > > > >> > > > > > >> > > > > >> > > > > > > > > > > > > > From patrick.sanan at gmail.com Tue Aug 31 04:52:31 2021 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Tue, 31 Aug 2021 11:52:31 +0200 Subject: [petsc-users] Postdoctoral position at ETH Zurich: Geodynamics / HPC / Julia Message-ID: The Geophysical Fluid Dynamics group at ETH Zurich (Switzerland) is seeking a postdoctoral appointee to work for about 2.5 years on an ambitious project involving developing a Julia-based library for GPU-accelerated multiphysics solvers based on pseudotransient relaxation. Of particular interest for this audience might be that a major component of the proposed work is to make these solvers available via PETSc (as a SNES implementation), thus exposing them for use within a host of existing HPC applications, including those involved in this specific project. We'll accept applications until the position is filled, but for full consideration please apply before October 1, 2021. Full information is in the ad at the following link, and please feel free to contact me directly! https://github.com/psanan/gpu4geo_postdoc_ad/ Best, Patrick -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at uninsubria.it Tue Aug 31 09:50:11 2021 From: matteo.semplice at uninsubria.it (Matteo Semplice) Date: Tue, 31 Aug 2021 16:50:11 +0200 Subject: [petsc-users] Mat preallocation in case of variable stencils Message-ID: Hi. We are writing a code for a FD scheme on an irregular domain and thus the local stencil is quite variable: we have inner nodes, boundary nodes and inactive nodes, each with their own stencil type and offset with respect to the grid node. We currently create a matrix with DMCreateMatrix on a DMDA and for now have set the option MAT_NEW_NONZERO_LOCATIONS to PETSC_TRUE, but its time to render the code memory-efficient. The layout created automatically is correct for inner nodes, wrong for boundary ones (off-centered stencils) and redundant for outer nodes. After the preprocessing stage (including stencil creation) we'd be in position to set the nonzero pattern properly. Do we need to start from a Mat created by CreateMatrix? Or is it ok to call DMCreateMatrix (so that the splitting among CPUs and the block size are set by PETSc) and then call a MatSetPreallocation routine? Also, I've seen in some examples that you call the Seq and the MPI preallocation routines in a row. Does this work because MatMPIAIJSetPreallocation silently does nothing on a Seq matrix and viceversa? Thanks ??? 
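A minimal sketch of one way to do this, not code from this thread -- da is the DMDA already in use, and dnnz[i] / onnz[i] are assumed to hold the diagonal-block and off-diagonal-block nonzero counts of local row i produced by the stencil preprocessing (see also the MatXAIJSetPreallocation suggestion in the reply that follows):

Mat A;
DMCreateMatrix(da, &A);   /* keeps the parallel splitting and block size chosen by PETSc */
/* One call that covers SeqAIJ/MPIAIJ (and BAIJ/SBAIJ) instead of calling each
 * type-specific routine; bs = 1 here, dnnz/onnz are per-local-row counts. */
MatXAIJSetPreallocation(A, 1, dnnz, onnz, NULL, NULL);
/* Optional: once the pattern is exact, ignore insertions outside it instead of mallocing. */
MatSetOption(A, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);

And the guess about the Seq and MPI routines is right: each one is a no-op on a matrix of the other type, which is also what lets MatXAIJSetPreallocation dispatch to whichever one applies.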
Matteo From jed at jedbrown.org Tue Aug 31 10:32:21 2021 From: jed at jedbrown.org (Jed Brown) Date: Tue, 31 Aug 2021 09:32:21 -0600 Subject: [petsc-users] Mat preallocation in case of variable stencils In-Reply-To: References: Message-ID: <87k0k1zl2y.fsf@jedbrown.org> Matteo Semplice writes: > Hi. > > We are writing a code for a FD scheme on an irregular domain and thus > the local stencil is quite variable: we have inner nodes, boundary nodes > and inactive nodes, each with their own stencil type and offset with > respect to the grid node. We currently create a matrix with > DMCreateMatrix on a DMDA and for now have set the option > MAT_NEW_NONZERO_LOCATIONS to PETSC_TRUE, but its time to render the code > memory-efficient. The layout created automatically is correct for inner > nodes, wrong for boundary ones (off-centered stencils) and redundant for > outer nodes. > > After the preprocessing stage (including stencil creation) we'd be in > position to set the nonzero pattern properly. > > Do we need to start from a Mat created by CreateMatrix? Or is it ok to > call DMCreateMatrix (so that the splitting among CPUs and the block size > are set by PETSc) and then call a MatSetPreallocation routine? You can call MatXAIJSetPreallocation after. It'll handle all matrix types so you don't have to shepherd data for all the specific preallocations. > Also, I've seen in some examples that you call the Seq and the MPI > preallocation routines in a row. Does this work because > MatMPIAIJSetPreallocation silently does nothing on a Seq matrix and > viceversa? > > Thanks > > ??? Matteo From sam.guo at cd-adapco.com Tue Aug 31 16:42:57 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 31 Aug 2021 14:42:57 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> Message-ID: Attached please find the latest configure.log. 
grep MUMPS_VERSION /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef MUMPS_VERSION /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define MUMPS_VERSION "5.2.1" /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef MUMPS_VERSION_MAX_LEN /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define MUMPS_VERSION_MAX_LEN 30 /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef MUMPS_VERSION /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define MUMPS_VERSION "5.2.1" /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef MUMPS_VERSION_MAX_LEN /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define MUMPS_VERSION_MAX_LEN 30 /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef MUMPS_VERSION /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define MUMPS_VERSION "5.2.1" /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef MUMPS_VERSION_MAX_LEN /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define MUMPS_VERSION_MAX_LEN 30 /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef MUMPS_VERSION /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define MUMPS_VERSION "5.2.1" /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef MUMPS_VERSION_MAX_LEN /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define MUMPS_VERSION_MAX_LEN 30 /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; On Mon, Aug 30, 2021 at 9:47 PM Satish Balay wrote: > Also - what do you have for: > > grep MUMPS_VERSION > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > Satish > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > > > please resend the logs > > > > Satish > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > Same compiling error with --with-mumps-serial=1. > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay > wrote: > > > > > > > Use the additional option: -with-mumps-serial > > > > > > > > Satish > > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > Attached please find the configure.log. I use my own CMake. 
I have > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > > > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo > wrote: > > > > > > > > > > > I use pre-installed > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > > > > wrote: > > > > > > > > > > > >> > > > > > >> Are you using --download-mumps or pre-installed mumps? If using > > > > > >> pre-installed - try --download-mumps. > > > > > >> > > > > > >> If you still have issues - send us configure.log and make.log > from the > > > > > >> failed build. > > > > > >> > > > > > >> Satish > > > > > >> > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > >> > > > > > >> > Dear PETSc dev team, > > > > > >> > I am compiling petsc 3.15.3 and got following compiling > error > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: > missing > > > > binary > > > > > >> > operator before token "(" > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > > > > >> > Any idea what I did wrong? > > > > > >> > > > > > > >> > Thanks, > > > > > >> > Sam > > > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 88745 bytes Desc: not available URL: From balay at mcs.anl.gov Tue Aug 31 18:47:11 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 31 Aug 2021 18:47:11 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> Message-ID: <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Package mumps requested requires Fortran but compiler turned off. ******************************************************************************* i.e remove '--with-fc=0' and rerun configure. Satish On Tue, 31 Aug 2021, Sam Guo wrote: > Attached please find the latest configure.log. 
> > grep MUMPS_VERSION > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > MUMPS_VERSION > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > MUMPS_VERSION "5.2.1" > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > MUMPS_VERSION_MAX_LEN > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > MUMPS_VERSION_MAX_LEN 30 > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > MUMPS_VERSION > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > MUMPS_VERSION "5.2.1" > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > MUMPS_VERSION_MAX_LEN > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > MUMPS_VERSION_MAX_LEN 30 > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > MUMPS_VERSION > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > MUMPS_VERSION "5.2.1" > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > MUMPS_VERSION_MAX_LEN > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > MUMPS_VERSION_MAX_LEN 30 > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > MUMPS_VERSION > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > MUMPS_VERSION "5.2.1" > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > MUMPS_VERSION_MAX_LEN > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > MUMPS_VERSION_MAX_LEN 30 > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay wrote: > > > Also - what do you have for: > > > > grep MUMPS_VERSION > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > > > Satish > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > > > > > please resend the logs > > > > > > Satish > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > Same compiling error with --with-mumps-serial=1. 
> > > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay > > wrote: > > > > > > > > > Use the additional option: -with-mumps-serial > > > > > > > > > > Satish > > > > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > > > Attached please find the configure.log. I use my own CMake. I have > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo > > wrote: > > > > > > > > > > > > > I use pre-installed > > > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > > > > > wrote: > > > > > > > > > > > > > >> > > > > > > >> Are you using --download-mumps or pre-installed mumps? If using > > > > > > >> pre-installed - try --download-mumps. > > > > > > >> > > > > > > >> If you still have issues - send us configure.log and make.log > > from the > > > > > > >> failed build. > > > > > > >> > > > > > > >> Satish > > > > > > >> > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > >> > > > > > > >> > Dear PETSc dev team, > > > > > > >> > I am compiling petsc 3.15.3 and got following compiling > > error > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: > > missing > > > > > binary > > > > > > >> > operator before token "(" > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > > > > > >> > Any idea what I did wrong? > > > > > > >> > > > > > > > >> > Thanks, > > > > > > >> > Sam > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
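For reference, a sketch of configure invocations along the lines suggested in this thread; compiler names and paths are placeholders, not taken from the logs. The common point is that the MUMPS interface needs a Fortran compiler (so no --with-fc=0), and that telling configure about MUMPS is also what generates the PETSC_PKG_MUMPS_VERSION_GE macro that the compile error at mumps.c:52 trips over; defining PETSC_HAVE_MUMPS by hand in an external CMake build does not provide it.

# Sketch only -- placeholders throughout.
# Option A: let configure build MUMPS and its ScaLAPACK dependency.
./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 \
    --download-mumps --download-scalapack

# Option B: point configure at a pre-installed sequential MUMPS
# (no ScaLAPACK/BLACS, matching the "dummy" libraries mentioned above),
# keeping a Fortran compiler enabled. Add whatever else the local MUMPS
# build links against (e.g. its bundled sequential MPI stub) to the lib line.
./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran \
    --with-mumps-serial=1 \
    --with-mumps-include=/path/to/mumps/include \
    --with-mumps-lib="-L/path/to/mumps/lib -ldmumps -lmumps_common -lpord"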