[petsc-users] 2^32 integer problems
Satish Balay
balay at mcs.anl.gov
Sun Jun 2 11:52:24 CDT 2024
A few suggestions:
- try building with gcc/gfortran - the compiler will likely flag issues (warnings) with the sources - that might be the cause of some of the errors.
- try using the PetscInt datatype across all sources (i.e., use the .F90 suffix - and include the PETSc include files) - to avoid any lingering integer-size mismatch (and as a fix for some of the above warnings) - see the sketch below.
- and then - you might be able to simplify your makefile to be more portable [using a PETSc-formatted makefile]
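For reference - a minimal sketch of that .F90/PetscInt pattern (assuming a recent PETSc, 3.17 or later, which provides PetscCallA; the Vec calls are only illustrative, not from your code):

! sketch.F90 - the capital .F90 suffix makes the preprocessor run, which the
! petsc/finclude headers need
program sketch
#include <petsc/finclude/petscvec.h>
  use petscvec
  implicit none
  PetscErrorCode :: ierr
  PetscInt       :: n   ! always matches PETSc's index width (32 or 64 bit)
  Vec            :: x

  PetscCallA(PetscInitialize(PETSC_NULL_CHARACTER, ierr))
  n = 1000
  PetscCallA(VecCreate(PETSC_COMM_WORLD, x, ierr))
  PetscCallA(VecSetSizes(x, PETSC_DECIDE, n, ierr))
  PetscCallA(VecSetFromOptions(x, ierr))
  PetscCallA(VecDestroy(x, ierr))
  PetscCallA(PetscFinalize(ierr))
end program sketch

With this setup every integer handed to PETSc is declared PetscInt, so the same source stays correct whether PETSc was configured with 32-bit or 64-bit indices.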
Satish
On Sun, 2 Jun 2024, Matthew Knepley wrote:
> On Sun, Jun 2, 2024 at 10:27 AM Matthew Knepley <knepley at gmail.com> wrote:
>
> > On Sat, Jun 1, 2024 at 11:39 PM Carpenter, Mark H. (LARC-D302) via
> > petsc-users <petsc-users at mcs.anl.gov> wrote:
> >
> >>
> >> Mark Carpenter, NASA Langley.
> >>
> >>
> >>
> >> I am a novice PETSc user of about 10 years. I’ve built a DG-FEM code
> >> with PETSc as one of the solver paths (I have my own as well).
> >> Furthermore, I use PETSc for MPI communication.
> >>
> >>
> >>
> >> I’m running the DG-FEM code on our NAS supercomputer. Everything works
> >> when my integer sizes are small. When I exceed the 2^32 limit of integer
> >> arithmetic the code fails in very strange ways.
> >>
> >> The users that originally set up the petsc infrastructure in the code are
> >> no longer at NASA and I’m “dead in the water”.
> >>
> >
> One additional point. I have looked at the error message. When you make
> PETSc calls, each call should be wrapped in PetscCall(). Here is a Fortran
> example:
>
>
> https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/tutorials/ex22f.F90?ref_type=heads
>
> This checks the return value after each call and ends early if there is an
> error. It would make your
> error output much more readable.
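> For concreteness, a minimal sketch of that pattern inside a subroutine
> (assuming a recent PETSc, 3.17 or later; the KSP solve is only illustrative,
> not taken from your code):
>
> subroutine my_solve(ksp, b, x, ierr)
> #include <petsc/finclude/petscksp.h>
>   use petscksp
>   implicit none
>   KSP            :: ksp
>   Vec            :: b, x
>   PetscErrorCode :: ierr
>
>   ! PetscCall() checks ierr and returns to the caller on failure; in the
>   ! main program the analogous macro is PetscCallA(), which aborts with a
>   ! full error trace.
>   PetscCall(KSPSolve(ksp, b, x, ierr))
> end subroutine my_solve
>
> Older PETSc versions spell this as CHKERRQ(ierr) (or CHKERRA(ierr) in the
> main program) after each call.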
>
> Thanks,
>
> Matt
>
>
> >
> >>
> >> I think I’ve promoted all the integers that are problematic in my code
> >> (F95). On the PETSc side I’ve tried:
> >>
> >> 1. Reinstalling PETSc with --with-64-bit-integers (no luck).
> >>
> >>
> > That option does not exist, so this will not work.
> >
> >
> >>
> >> 2. Reinstalling PETSc with --with-64-bit-integers and
> >> --with-64-bit-indices (the code will not compile with these options;
> >> additional variables on the F90 side require promotion, and then the
> >> errors cascade through the code when making PETSc calls).
> >>
> >>
> > We should fix this. I feel confident we can get the code to compile.
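> > For what it's worth, a minimal sketch of the usual shape of the fix, under
> > the assumption that the cascade comes from default-kind integers being
> > passed where PETSc now expects 64-bit indices (the names are invented, not
> > from your code):
> >
> > subroutine set_sizes(x, n_elem, n_dof, ierr)
> > #include <petsc/finclude/petscvec.h>
> >   use petscvec
> >   implicit none
> >   Vec            :: x
> >   integer        :: n_elem, n_dof  ! per-element counts can stay default kind
> >   PetscErrorCode :: ierr
> >   PetscInt       :: n_global       ! 8 bytes once --with-64-bit-indices is on
> >
> >   ! Promote before multiplying; otherwise the product wraps in 32-bit
> >   ! arithmetic long before it is stored in the 64-bit PetscInt.
> >   n_global = int(n_elem, kind(n_global)) * n_dof
> >
> >   ! Counts and indices passed to PETSc must be PetscInt as well; leftover
> >   ! default-kind arguments are what makes the compile errors cascade.
> >   PetscCall(VecSetSizes(x, PETSC_DECIDE, n_global, ierr))
> > end subroutine set_sizes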
> >
> >
> >>
> >> 3. It’s possible that I’ve missed offending integers, but the PETSc
> >> error messages are so cryptic that I can’t even tell where it is failing.
> >>
> >>
> >>
> >> Further complicating matters:
> >>
> >> The problem by definition needs to be HUGE. Problem sizes requiring 1000
> >> cores (10^6 elements at P5) are needed to experience the errors, which
> >> involves waiting in queues for ½ day at least.
> >>
> >>
> >>
> >> Attached are:
> >>
> >> 1. The install script used to install PETSc on our machine
> >> 2. The makefile used on the Fortran side
> >> 3. A data dump from an offending simulation (which is huge and in which I
> >> can’t see any useful information).
> >>
> >>
> >>
> >> How do I attack this problem?
> >>
> >> (I’ve never gotten debugging working properly).
> >>
> >
> > Let's get the install for 64-bit indices to work. So:
> >
> > 1) Configure PETSc, adding --with-64-bit-indices to the configure line. Does
> > this work? If not, send configure.log
> >
> > 2) Compile PETSc. Does this work? If not, send make.log
> >
> > 3) Compile your code. Does this work? If not, send all output.
> >
> > 4) Do one of the 1/2 day runs and let us know what happens. An alternative
> > is to run a small number
> > of processes on a large memory workstation. We do this to test at the
> > lab.
> >
> > Thanks,
> >
> > Matt
> >
> >
> >> Mark
> >>
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> > experiments is infinitely more interesting than any results to which their
> > experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
> >
>
>
>