From knepley at gmail.com Mon Aug 2 08:48:38 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 2 Aug 2021 09:48:38 -0400 Subject: [petsc-users] DMPlex box mesh periodicity bug (?) In-Reply-To: References: Message-ID: On Sat, Jul 31, 2021 at 6:01 AM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Dear all, > > I have noticed what I think is a bug with a 3D DMPlex box mesh with > periodic boundaries. > When I project a function onto it, it behaves as if the last row of cells > in X and in Y direction do not have the right coordinates. > > I attach to this email a minimal example that reproduces the bug (files > mwe_periodic_3d.F90, wrapper_petsc.c, wrapper_petsc.h90, makefile), as well > as the output of this code (initmesh.vtu for an output of the DM, > solution.vtu for an output of the data projected onto the mesh). There is > also a screenshot of what's going on. > If one considers the function I project onto the mesh, what should > normally happen is that there is a "hole" is the density field around the > x=0, y=0 region, the rest being equal to one. > > I hope it is just a mishandling from my end !! > Hi Thibault, 1) The main thing happening here is that visualization has some problems for completely periodic things. It is on my list to fix, but below other things. 2) Second, it looks like you need to localize coordinates again after creating ghost cells. Everything is in the right order if you use the command line to create the mesh. I have attached a C example where I do this. In your code, I think just adding Localize again will work. In my example, you can see 2D as well as 3D, and change from non-periodic to periodic to see what is happening. Thanks, Matt > Thank you in advance for your help, > > Cheers, > > Thibault > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: thibault.c Type: application/octet-stream Size: 2647 bytes Desc: not available URL: From thibault.bridelbertomeu at gmail.com Mon Aug 2 15:23:44 2021 From: thibault.bridelbertomeu at gmail.com (Thibault Bridel-Bertomeu) Date: Mon, 2 Aug 2021 22:23:44 +0200 Subject: [petsc-users] DMPlex box mesh periodicity bug (?) In-Reply-To: References: Message-ID: Hi Matt, Thank you for taking the time to take a look ! 1) Yea ... I know most visualization softwares and most storage formats do not like periodicity (a.k.a. infinity) ... but fo CFD-research it's a great tool so we keep trying ... x) The thing that surprised me is that my mwe works perfectly well in 2D, and not in 3D : that's why I ventured to call it a bug ! 2) Hmm I tried adding a "DMLocalizeCoordinates" after the "DMPlexConstructGhostCells" but it does not change anything, the result is still exactly the same as I showed above. As for your code, it compiles fine and I can execute it as well, thanks ! However ... 1) when I run it in 2D and ask for the HDF5 output, and then run the petsc_gen_xdmf.py script, I get the attached result (sol_2D.png) : it seems either Paraview cannot handle what's in the HDF file or the HDF file does not contain something consistent, 2) when I run it in 3D and ask for the HDF5 output + run petsc_gen_xdmf.py I can see ... 
nothing : paraview does not show anything when I try to open the XDMF file, although it seems syntaxically correct (the HDF file as well) 3) when I run it in 3D and ask for VTU output, I get exactly the same thing as what I got with my F90 program (see sol_3D.png attached) - my command line is ./thibault -dm_plex_dim 3 -dm_plex_simplex 0 -dm_plex_box_faces 16,16,16 -dm_plex_box_lower -5,-5,-5 -dm_plex_box_upper 5,5,5 -dm_plex_box_bd periodic,periodic,periodic -dm_plex_create_fv_ghost_cells -dm_plex_periodic_cut -vec_view vtk:sol.vtu Does your piece of code yields different results on your end ? (P.S. I am using the main branch, commit id ae6adb75dd). Thank you very much for your support !! Thibault Le lun. 2 ao?t 2021 ? 15:48, Matthew Knepley a ?crit : > On Sat, Jul 31, 2021 at 6:01 AM Thibault Bridel-Bertomeu < > thibault.bridelbertomeu at gmail.com> wrote: > >> Dear all, >> >> I have noticed what I think is a bug with a 3D DMPlex box mesh with >> periodic boundaries. >> When I project a function onto it, it behaves as if the last row of cells >> in X and in Y direction do not have the right coordinates. >> >> I attach to this email a minimal example that reproduces the bug (files >> mwe_periodic_3d.F90, wrapper_petsc.c, wrapper_petsc.h90, makefile), as well >> as the output of this code (initmesh.vtu for an output of the DM, >> solution.vtu for an output of the data projected onto the mesh). There is >> also a screenshot of what's going on. >> If one considers the function I project onto the mesh, what should >> normally happen is that there is a "hole" is the density field around the >> x=0, y=0 region, the rest being equal to one. >> >> I hope it is just a mishandling from my end !! >> > > Hi Thibault, > > 1) The main thing happening here is that visualization has some problems > for completely periodic things. It is on my list to fix, but below other > things. > > 2) Second, it looks like you need to localize coordinates again after > creating ghost cells. Everything is in the right order if you use the > command line to create > the mesh. I have attached a C example where I do this. In your code, I > think just adding Localize again will work. > > In my example, you can see 2D as well as 3D, and change from non-periodic > to periodic to see what is happening. > > Thanks, > > Matt > > >> Thank you in advance for your help, >> >> Cheers, >> >> Thibault >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sol_2D.png Type: image/png Size: 1218883 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: sol_3D.png Type: image/png Size: 1113773 bytes Desc: not available URL: From knepley at gmail.com Mon Aug 2 15:45:53 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 2 Aug 2021 16:45:53 -0400 Subject: [petsc-users] DMPlex box mesh periodicity bug (?) In-Reply-To: References: Message-ID: On Mon, Aug 2, 2021 at 4:23 PM Thibault Bridel-Bertomeu < thibault.bridelbertomeu at gmail.com> wrote: > Hi Matt, > > Thank you for taking the time to take a look ! > > 1) Yea ... I know most visualization softwares and most storage formats do > not like periodicity (a.k.a. infinity) ... 
but fo CFD-research it's a great > tool so we keep trying ... x) The thing that surprised me is that my mwe > works perfectly well in 2D, and not in 3D : that's why I ventured to call > it a bug ! > > 2) Hmm I tried adding a "DMLocalizeCoordinates" after the > "DMPlexConstructGhostCells" but it does not change anything, the result is > still exactly the same as I showed above. > Hmm, I will try it. It should error telling you that VTK cannot handle meshes with localized coordinates, so something is going wrong there. > As for your code, it compiles fine and I can execute it as well, thanks ! > However ... > 1) when I run it in 2D and ask for the HDF5 output, and then run the > petsc_gen_xdmf.py script, I get the attached result (sol_2D.png) : it seems > either Paraview cannot handle what's in the HDF file or the HDF file does > not contain something consistent, > This is exactly what I get. Something is wrong with the periodic cut when I have a double point. If you use -dm_plex_box_bd periodic,none it looks fine. This is the bug with periodic visualization that I was talking about. > 2) when I run it in 3D and ask for the HDF5 output + run petsc_gen_xdmf.py > I can see ... nothing : paraview does not show anything when I try to open > the XDMF file, although it seems syntaxically correct (the HDF file as well) > Yes, Paraview misunderstands the mesh connections, so I must be telling it something wrong with the periodic cut. You can see that each periodic dimension vanishes from the display, but all the questions I ask it in tests are correct. > 3) when I run it in 3D and ask for VTU output, I get exactly the same > thing as what I got with my F90 program (see sol_3D.png attached) - my > command line is ./thibault -dm_plex_dim 3 -dm_plex_simplex 0 > -dm_plex_box_faces 16,16,16 -dm_plex_box_lower -5,-5,-5 -dm_plex_box_upper > 5,5,5 -dm_plex_box_bd periodic,periodic,periodic > -dm_plex_create_fv_ghost_cells -dm_plex_periodic_cut -vec_view vtk:sol.vtu > Yes, VTU output does not work at all with periodicity. It was only really intended to work with DMDA. I do everything with HDF5 now since it can handle multiple meshes, etc. I think the right thing to do is fix the "periodic cut" support for multiply periodic things, and for 3D. Thanks, Matt > Does your piece of code yields different results on your end ? > > (P.S. I am using the main branch, commit id ae6adb75dd). > > Thank you very much for your support !! > > Thibault > > > > Le lun. 2 ao?t 2021 ? 15:48, Matthew Knepley a ?crit : > >> On Sat, Jul 31, 2021 at 6:01 AM Thibault Bridel-Bertomeu < >> thibault.bridelbertomeu at gmail.com> wrote: >> >>> Dear all, >>> >>> I have noticed what I think is a bug with a 3D DMPlex box mesh with >>> periodic boundaries. >>> When I project a function onto it, it behaves as if the last row of >>> cells in X and in Y direction do not have the right coordinates. >>> >>> I attach to this email a minimal example that reproduces the bug (files >>> mwe_periodic_3d.F90, wrapper_petsc.c, wrapper_petsc.h90, makefile), as well >>> as the output of this code (initmesh.vtu for an output of the DM, >>> solution.vtu for an output of the data projected onto the mesh). There is >>> also a screenshot of what's going on. >>> If one considers the function I project onto the mesh, what should >>> normally happen is that there is a "hole" is the density field around the >>> x=0, y=0 region, the rest being equal to one. >>> >>> I hope it is just a mishandling from my end !! 
>>> >> >> Hi Thibault, >> >> 1) The main thing happening here is that visualization has some problems >> for completely periodic things. It is on my list to fix, but below other >> things. >> >> 2) Second, it looks like you need to localize coordinates again after >> creating ghost cells. Everything is in the right order if you use the >> command line to create >> the mesh. I have attached a C example where I do this. In your code, >> I think just adding Localize again will work. >> >> In my example, you can see 2D as well as 3D, and change from non-periodic >> to periodic to see what is happening. >> >> Thanks, >> >> Matt >> >> >>> Thank you in advance for your help, >>> >>> Cheers, >>> >>> Thibault >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From aph at email.arizona.edu Mon Aug 2 16:06:33 2021 From: aph at email.arizona.edu (Anthony Paul Haas) Date: Mon, 2 Aug 2021 14:06:33 -0700 Subject: [petsc-users] from Petsc 3.7.4.0 to 3.13.3.0 Message-ID: Hello, I recently updated our code from Petsc 3.7.4.0 to 3.13.3.0. Among other things I noticed is that all the includes (such as #include ) have now to be accompanied with use statements (such as use petscvec). It seems that due to the use statements the compiler is now way more strict. In our code, we can solve stability equations in real arithmetic or in complex arithmetic, where some subroutines are used for complex arithmetic and some other ones for real arithmetic. My question is, is it good practice to wrap around a Petsc call with the pre-compiler flag PETSC_USE_COMPLEX in order to avoid compilation error if that call is not used say in the complex part of the code? Example, the call to MatSetValuesBlocked below is not used in the complex arithmetic code, so to avoid a compilation error, I wrapped the call with PETSC_USE_COMPLEX==0 (Mat1 is a real array in this example) #if (PETSC_USE_COMPLEX==0) call MatSetValuesBlocked(self%fieldLHSMat_ps,1,ptLoc-1,1,colIndex-1, transpose(Mat1(1:ndim1,1:ndim2)),INSERT_VALUES,ierr) #endif Thanks, Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Mon Aug 2 20:45:54 2021 From: bsmith at petsc.dev (Barry Smith) Date: Mon, 2 Aug 2021 20:45:54 -0500 Subject: [petsc-users] from Petsc 3.7.4.0 to 3.13.3.0 In-Reply-To: References: Message-ID: <01F246CF-0B93-4EC3-945B-DE40188A64EB@petsc.dev> It is find to use such conditional checks if needed. > On Aug 2, 2021, at 4:06 PM, Anthony Paul Haas wrote: > > Hello, > > I recently updated our code from Petsc 3.7.4.0 to 3.13.3.0. Among other things I noticed is that all the includes (such as #include ) have now to be accompanied with use statements (such as use petscvec). > > It seems that due to the use statements the compiler is now way more strict. In our code, we can solve stability equations in real arithmetic or in complex arithmetic, where some subroutines are used for complex arithmetic and some other ones for real arithmetic. 
> > My question is, is it good practice to wrap around a Petsc call with the pre-compiler flag PETSC_USE_COMPLEX in order to avoid compilation error if that call is not used say in the complex part of the code? > > Example, the call to MatSetValuesBlocked below is not used in the complex arithmetic code, so to avoid a compilation error, I wrapped the call with PETSC_USE_COMPLEX==0 (Mat1 is a real array in this example) > > #if (PETSC_USE_COMPLEX==0) > call MatSetValuesBlocked(self%fieldLHSMat_ps,1,ptLoc-1,1,colIndex-1,transpose(Mat1(1:ndim1,1:ndim2)),INSERT_VALUES,ierr) > #endif > > Thanks, > > Anthony > -------------- next part -------------- An HTML attachment was scrubbed... URL: From milan.pelletier at protonmail.com Tue Aug 3 05:39:31 2021 From: milan.pelletier at protonmail.com (Milan Pelletier) Date: Tue, 03 Aug 2021 10:39:31 +0000 Subject: [petsc-users] Is there an up-to-date list of GPU-supported preconditioners? Message-ID: Dear PETSc users, I would like to know if there is somewhere a list or table summarizing which preconditioners completely/partly run with CUDA kernels? Best regards, Milan Pelletier -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue Aug 3 07:08:32 2021 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 3 Aug 2021 08:08:32 -0400 Subject: [petsc-users] Is there an up-to-date list of GPU-supported preconditioners? In-Reply-To: References: Message-ID: Hi Milan, I would say no. Our GPU support is under active development and moving fast. All built-in solvers work with the cuSparse back-end, at least nominally (eg, SOR smoothers do not work). '-mat_type cusparse' is probably the place to start. Our Hypre/GPU support is close. Our Kokkos-HIP back-end is up and running (I've tested it). Our HIP and SYCL back-ends are just coming online now. Mark On Tue, Aug 3, 2021 at 6:43 AM Milan Pelletier via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear PETSc users, > > I would like to know if there is somewhere a list or table summarizing > which preconditioners completely/partly run with CUDA kernels? > > Best regards, > Milan Pelletier > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_sagar at ce.iitr.ac.in Wed Aug 4 07:31:29 2021 From: s_sagar at ce.iitr.ac.in (SHIV SAGAR) Date: Wed, 4 Aug 2021 18:01:29 +0530 Subject: [petsc-users] Issue while running a PETSC example Message-ID: Dear Sir, I would like to extend my gratitude towards you for creating PETSc libraries for efficient computation. I am a PhD student studying Brittle Fracture using Finite Elements and I have been introduced to the idea of efficient computation using PETSc libraries. Being a beginner to PETSc and the Linux OS, I was having difficulties in running PETSc example in ~/petsc/src/ksp/ksp/tutorials$ make ex1 I get the following error: makefile:41: home/sagar/petsc/lib/petsc/conf/test: No such file or directory make: *** No rule to make target 'home/sagar/petsc/lib/petsc/conf/test'. Stop. I have set the environment variables PETSC_DIR = home/sagar/petsc and PETSC_ARCH = arch-linux2-c-debug If you could help me with this, I would be grateful and could continue using the libraries for much complex programs. Thank You Yours Faithfully Shiv Sagar PhD, IIT Roorkee -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Thu Aug 5 08:59:15 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 5 Aug 2021 09:59:15 -0400 Subject: [petsc-users] Issue while running a PETSC example In-Reply-To: References: Message-ID: On Thu, Aug 5, 2021 at 9:56 AM SHIV SAGAR wrote: > Dear Sir, > I would like to extend my gratitude towards you for creating PETSc > libraries for efficient computation. > > I am a PhD student studying Brittle Fracture using Finite Elements and I > have been introduced to the idea of efficient computation using PETSc > libraries. Being a beginner to PETSc and the Linux OS, I was having > difficulties in running PETSc example in > > ~/petsc/src/ksp/ksp/tutorials$ make ex1 > > I get the following error: > > makefile:41: home/sagar/petsc/lib/petsc/conf/test: No such file or > directory > make: *** No rule to make target 'home/sagar/petsc/lib/petsc/conf/test'. > Stop. > > I have set the environment variables PETSC_DIR = home/sagar/petsc > Hi Shiv, I think the problem is that you need PETSC_DIR = /home/sagar/petsc Thanks, Matt > and PETSC_ARCH = arch-linux2-c-debug > > If you could help me with this, I would be grateful and could continue > using the libraries for much complex programs. > > Thank You > Yours Faithfully > Shiv Sagar > PhD, IIT Roorkee > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From milan.pelletier at protonmail.com Fri Aug 6 07:51:23 2021 From: milan.pelletier at protonmail.com (Milan Pelletier) Date: Fri, 06 Aug 2021 12:51:23 +0000 Subject: [petsc-users] Using external matrix as ILU preconditioner Message-ID: Dear PETSc users, I would like to know if it is possible to provide PETSc with a ILU-preconditioner matrix computed externally beforehand. I tried and built a PCSHELL, to which I pass the externally-computed preconditioner matrix as "Pmat" using the KSPSetOperators function. Then I wanted to use that Pmat in PCApply by calling MatSolve as it seems to be done in the ILU case. Though, this fails since the mat->ops->solve (with mat being my PC Matrix) is a null pointer. I guess the way I set the matrix (as a MATSEQAIJ) is not sufficient for PETSc to know what function to use as MatSolve. How could I achieve providing my own ILU-decomposed matrix and feed PETSc's PCG with it? Is it actually possible? Thanks for your help, Milan Pelletier -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Aug 6 08:02:56 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 6 Aug 2021 09:02:56 -0400 Subject: [petsc-users] Using external matrix as ILU preconditioner In-Reply-To: References: Message-ID: PCSHELL is for adding your own preconditioner method (mat->ops->solve). Pmat is the matrix that you want the PC to use to compute the preconditioner. This is usually the same as Amat, the matrix that you want to use to apply the operator, but if you want to use say a matrix-free Amat, then you need to provide some sort of explicit matrix approximation of Amat for most preconditioners. If you have a matrix that is an approximation of the inverse of Amat that you simply want to apply then you would need to make a PCSHELL with a method that does that. 
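A minimal sketch of such a shell, with illustrative names only (a user context holding the externally computed matrix, applied here with MatMult on the assumption that it is an explicit approximate inverse), could look like:

/* Sketch only: "Minv" is an assumed user matrix approximating inv(Amat). */
typedef struct {
  Mat Minv;
} UserPC;

static PetscErrorCode UserPCApply(PC pc, Vec x, Vec y)
{
  UserPC        *ctx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PCShellGetContext(pc, (void **)&ctx);CHKERRQ(ierr);
  ierr = MatMult(ctx->Minv, x, y);CHKERRQ(ierr); /* y = Minv x, i.e. apply the approximate inverse */
  PetscFunctionReturn(0);
}

/* ... after KSPGetPC(ksp, &pc) ... */
ierr = PCSetType(pc, PCSHELL);CHKERRQ(ierr);
ierr = PCShellSetContext(pc, &user);CHKERRQ(ierr);
ierr = PCShellSetApply(pc, UserPCApply);CHKERRQ(ierr);

If what you have is instead a factored (ILU) matrix rather than an explicit approximate inverse, the apply routine would have to perform the corresponding triangular solves itself: MatSolve() is only wired up for matrices produced by a PETSc (or external-package) factorization, not for a plain assembled MATSEQAIJ, which is why the ops->solve pointer you hit was null.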
Mark On Fri, Aug 6, 2021 at 8:51 AM Milan Pelletier via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear PETSc users, > > I would like to know if it is possible to provide PETSc with a > ILU-preconditioner matrix computed externally beforehand. > I tried and built a PCSHELL, to which I pass the externally-computed > preconditioner matrix as "Pmat" using the KSPSetOperators function. Then I > wanted to use that Pmat in PCApply by calling MatSolve as it seems to be > done in the ILU case. Though, this fails since the mat->ops->solve (with > mat being my PC Matrix) is a null pointer. > > I guess the way I set the matrix (as a MATSEQAIJ) is not sufficient for > PETSc to know what function to use as MatSolve. > > How could I achieve providing my own ILU-decomposed matrix and feed > PETSc's PCG with it? Is it actually possible? > > Thanks for your help, > > Milan Pelletier > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aduarteg at utexas.edu Fri Aug 6 17:26:07 2021 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Fri, 6 Aug 2021 17:26:07 -0500 Subject: [petsc-users] PCFIELDSPLIT and Block Matrices Message-ID: Good morning, I am currently working on a PETSC application that will require a preconditioner that uses several block matrices. For now, I have a simple problem that I am solving with a dmda structured grid with two fields. For presentation purposes (I know petsc does not use this ordering), lets assume a vector ordering [u1,u2,...,uN,v1,v2,...vN] where u and v are my two fields with N number of grid points. The coupling between these two fields is weak enough that an efficient preconditioner can be formed as the matrix P = [A1, 0;0,A2] where A1 (dependent on u only) and A2 (dependent on v only) are block matrices of size NxN. Therefore, I only require two linear solves of the reduced systems. I am passing the preconditioner matrix P in the Jacobian function, and I hope this strategy is what I am telling PETSC to do with the following block of code: ierr = KSPGetPC(snes,&pc);CHKERRQ(ierr);CHKERRQ(ierr); ierr = PCSetType(pc,PCFIELDSPLIT);CHKERRQ(ierr); ierr = DMCreateFieldDecomposition(dau,NULL,NULL,&fields,NULL);CHKERRQ(ierr); ierr = PCFieldSplitSetIS(pc,NULL,fields[0]);CHKERRQ(ierr); ierr = PCFieldSplitSetIS(pc,NULL,fields[1]);CHKERRQ(ierr); Is this what is actually happening, or is the split also including some of the zero blocks on P? Second, for a future application, I will need a slightly more complicated strategy. It will require solving a similar matrix to P as specified above with more fields (Block diagonal for the fields), and then using the answer to those independent systems for a smaller local solves. In summary, if i have M fields and N grid points, I will solve M systems of size N then followed by using solution as the right hand side to solve N systems of size M. Is this something that the PCFIELDSPLIT can accomodate? Or will I have to implement my own PCSHELL? Thank you, -Alfredo -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Aug 6 21:11:45 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 6 Aug 2021 21:11:45 -0500 Subject: [petsc-users] PCFIELDSPLIT and Block Matrices In-Reply-To: References: Message-ID: <4CCA2E78-69B0-40AF-821F-D31EB7E523F6@petsc.dev> > On Aug 6, 2021, at 5:26 PM, Alfredo J Duarte Gomez wrote: > > Good morning, > > I am currently working on a PETSC application that will require a preconditioner that uses several block matrices. > > For now, I have a simple problem that I am solving with a dmda structured grid with two fields. For presentation purposes (I know petsc does not use this ordering), lets assume a vector ordering [u1,u2,...,uN,v1,v2,...vN] where u and v are my two fields with N number of grid points. The coupling between these two fields is weak enough that an efficient preconditioner can be formed as the matrix P = [A1, 0;0,A2] where A1 (dependent on u only) and A2 (dependent on v only) are block matrices of size NxN. Therefore, I only require two linear solves of the reduced systems. > > I am passing the preconditioner matrix P in the Jacobian function, and I hope this strategy is what I am telling PETSC to do with the following block of code: > > ierr = KSPGetPC(snes,&pc);CHKERRQ(ierr);CHKERRQ(ierr); > ierr = PCSetType(pc,PCFIELDSPLIT);CHKERRQ(ierr); > ierr = DMCreateFieldDecomposition(dau,NULL,NULL,&fields,NULL);CHKERRQ(ierr); > ierr = PCFieldSplitSetIS(pc,NULL,fields[0]);CHKERRQ(ierr); > ierr = PCFieldSplitSetIS(pc,NULL,fields[1]);CHKERRQ(ierr); > > Is this what is actually happening, or is the split also including some of the zero blocks on P? You should also use PCFieldSplitSetType(pc,PC_COMPOSITE_ADDITIVE) then the preconditioned problem will look like [ KSPSolve(A1) ; 0 ] [ A1 A12 ] [ 0 ; KSPSolve(A2) ] [ A21 A22 ] in other words the preconditioner is [A1 0 ] approximate inverse ( ) [ 0 A2 ] the computation is done efficiently and never uses the zero blocks. The default PCFieldSplitType is PC_COMPOSITE_MULTIPLICATIVE where the preconditioner system looks like [A1 0 ] approximate inverse ( ) [ A12 A2 ] The preconditioner is applied efficiently by first (approximately) solving with A1, then applying A12 to that (approximate) solution, removing it from the right hand side of the second block and then (approximately) solving with A2. Note that if you are using DMDA you don't need to write the above code you can use -pc_type fieldsplit -pc_fieldsplit_type additive and it will use the fields as you would like. > > Second, for a future application, I will need a slightly more complicated strategy. It will require solving a similar matrix to P as specified above with more fields (Block diagonal for the fields), and then using the answer to those independent systems for a smaller local solves. In summary, if i have M fields and N grid points, I will solve M systems of size N then followed by using solution as the right hand side to solve N systems of size M. It sounds like a block diagonal preconditioner (with one block per field) in one ordering then changing the ordering and doing another block diagonal preconditioner with one block for each grid point. PCFIELDSPLIT cannot do this since it basically works with a single ordering. You might be able to combine multiple preconditioners using PCCOMPOSITE that does a PCFIELDSPLIT then a PCPBJACOBI. Again you have a choice of additive or multiplicative formulation. You should not need to use PCSHELL. 
In fact you should not have to write any code, you should be able to control it completely from the options database with, maybe, -pc_type composite -pc_composite_pcs fieldsplit,pbjacobi -pc_composite_type additive You can also control the solvers used on the inner solves from the options database. If you run with -ksp_view it will show the options prefix for each inner solve and you can use them to control the fields solvers, for example, using gamg for one of the fields in the PCFIELDSPLIT. > > Is this something that the PCFIELDSPLIT can accomodate? Or will I have to implement my own PCSHELL? > > Thank you, > > -Alfredo > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Aug 6 22:38:45 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 6 Aug 2021 22:38:45 -0500 (CDT) Subject: [petsc-users] petsc-3.15.3 now available Message-ID: <36a33a9e-8d92-77da-413d-506ca8dc923@mcs.anl.gov> Dear PETSc users, The patch release petsc-3.15.3 is now available for download. https://petsc.org/release/download/ Satish From armandococo28 at gmail.com Tue Aug 10 04:35:11 2021 From: armandococo28 at gmail.com (Armando Coco) Date: Tue, 10 Aug 2021 10:35:11 +0100 Subject: [petsc-users] use of undeclared identifier PetscViewerHDF5Open - in Petsc 3.12 Message-ID: Hello, I am trying to compile a petsc program that calls PetscViewerHDF5Open. I have added #include in the header, but the compilation fails with error: use of undeclared identifier PetscViewerHDF5Open I have asked a colleague to run the same program in a newer version 3.14 or 3.15 and everything seems to work properly. Does it mean that I have to update my petsc version necessarily? Many Thanks Armando -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Aug 10 08:23:51 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 10 Aug 2021 09:23:51 -0400 Subject: [petsc-users] use of undeclared identifier PetscViewerHDF5Open - in Petsc 3.12 In-Reply-To: References: Message-ID: On Tue, Aug 10, 2021 at 5:35 AM Armando Coco wrote: > Hello, > > I am trying to compile a petsc program that calls PetscViewerHDF5Open. > I have added #include in the header, but the > compilation fails with error: > use of undeclared identifier PetscViewerHDF5Open > > I have asked a colleague to run the same program in a newer version 3.14 > or 3.15 and everything seems to work properly. Does it mean that I have to > update my petsc version necessarily? > I see the declaration there in version 3.12: https://gitlab.com/petsc/petsc/-/blob/v3.12/include/petscviewerhdf5.h#L43 Can you send the entire error output? Thanks, Matt > Many Thanks > Armando > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Tue Aug 10 13:40:45 2021 From: bsmith at petsc.dev (Barry Smith) Date: Tue, 10 Aug 2021 13:40:45 -0500 Subject: [petsc-users] use of undeclared identifier PetscViewerHDF5Open - in Petsc 3.12 In-Reply-To: References: Message-ID: <4CBB0226-798D-4A8B-BDDB-FD88042E231E@petsc.dev> Likely your install of PETSc was not configured for HDF5. 
Use ./configure --download-hdf5 > On Aug 10, 2021, at 8:23 AM, Matthew Knepley wrote: > > On Tue, Aug 10, 2021 at 5:35 AM Armando Coco > wrote: > Hello, > > I am trying to compile a petsc program that calls PetscViewerHDF5Open. > I have added #include in the header, but the compilation fails with error: > use of undeclared identifier PetscViewerHDF5Open > > I have asked a colleague to run the same program in a newer version 3.14 or 3.15 and everything seems to work properly. Does it mean that I have to update my petsc version necessarily? > > I see the declaration there in version 3.12: > > https://gitlab.com/petsc/petsc/-/blob/v3.12/include/petscviewerhdf5.h#L43 > > Can you send the entire error output? > > Thanks, > > Matt > > Many Thanks > Armando > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From armandococo28 at gmail.com Wed Aug 11 08:39:28 2021 From: armandococo28 at gmail.com (Armando Coco) Date: Wed, 11 Aug 2021 14:39:28 +0100 Subject: [petsc-users] use of undeclared identifier PetscViewerHDF5Open - in Petsc 3.12 In-Reply-To: <4CBB0226-798D-4A8B-BDDB-FD88042E231E@petsc.dev> References: <4CBB0226-798D-4A8B-BDDB-FD88042E231E@petsc.dev> Message-ID: Yes, it works with ./configure --download-hdf5 Thank you!! Armando Il giorno mar 10 ago 2021 alle ore 19:40 Barry Smith ha scritto: > > Likely your install of PETSc was not configured for HDF5. Use > ./configure --download-hdf5 > > > > On Aug 10, 2021, at 8:23 AM, Matthew Knepley wrote: > > On Tue, Aug 10, 2021 at 5:35 AM Armando Coco > wrote: > >> Hello, >> >> I am trying to compile a petsc program that calls PetscViewerHDF5Open. >> I have added #include in the header, but the >> compilation fails with error: >> use of undeclared identifier PetscViewerHDF5Open >> >> I have asked a colleague to run the same program in a newer version 3.14 >> or 3.15 and everything seems to work properly. Does it mean that I have to >> update my petsc version necessarily? >> > > I see the declaration there in version 3.12: > > > https://gitlab.com/petsc/petsc/-/blob/v3.12/include/petscviewerhdf5.h#L43 > > Can you send the entire error output? > > Thanks, > > Matt > > >> Many Thanks >> Armando >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aduarteg at utexas.edu Wed Aug 11 14:33:14 2021 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Wed, 11 Aug 2021 14:33:14 -0500 Subject: [petsc-users] Concatenating DM vectors Message-ID: Good morning, I am currently handling a structured dmda object with more than one field. In some intermediate operations, I have to create and handle vectors of a size that corresponds to the same dmda with one field only. After that, it would be very useful to concatenate these vectors and then use them with matrices of the size of the original dmda (more than one field), I hope the vectors keep their i,j structure from the dmda. 
I tried using VecConcatenate but it seems to be scrambling the vector without the i,j arrangement, and the only other way I can think of is using a for loop over every grid point which seems cumbersome. Any suggestions for this problem? -Alfredo -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlmackie862 at gmail.com Wed Aug 11 14:47:54 2021 From: rlmackie862 at gmail.com (Randall Mackie) Date: Wed, 11 Aug 2021 12:47:54 -0700 Subject: [petsc-users] Concatenating DM vectors In-Reply-To: References: Message-ID: <102AE53C-BAEA-4750-A997-3220EC45BFEE@gmail.com> Hi Alfredo Take a look at VecStrideGather and VecStrideScatter?.maybe these are what you want? https://petsc.org/release/docs/manualpages/Vec/VecStrideGather.html https://petsc.org/release/docs/manualpages/Vec/VecStrideScatter.html Randy M. > On Aug 11, 2021, at 12:33 PM, Alfredo J Duarte Gomez wrote: > > Good morning, > > I am currently handling a structured dmda object with more than one field. > > In some intermediate operations, I have to create and handle vectors of a size that corresponds to the same dmda with one field only. > > After that, it would be very useful to concatenate these vectors and then use them with matrices of the size of the original dmda (more than one field), I hope the vectors keep their i,j structure from the dmda. > > I tried using VecConcatenate but it seems to be scrambling the vector without the i,j arrangement, and the only other way I can think of is using a for loop over every grid point which seems cumbersome. > > Any suggestions for this problem? > > -Alfredo > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From aduarteg at utexas.edu Thu Aug 12 09:43:59 2021 From: aduarteg at utexas.edu (Alfredo J Duarte Gomez) Date: Thu, 12 Aug 2021 09:43:59 -0500 Subject: [petsc-users] Sparse Matrix Matrix Multiply Message-ID: Good morning, I am currently having some trouble in the creation of some matrices. I am using structured dmda objects to create matrices using the DMCreate() function. One of these matrices will be the result of a matrix-matrix product of two of these dm matrices. I know that the matrix product will have more nonzero entries or at least a bigger stencil than the original dm matrices, however I accounted for that when I set the DMDA stencil width in the initial creation. The problem is that even with that, the resulting matrix-matrix product has a bigger stencil as evidenced by failure in subsequent matrix copy/addition operations using SAME_NONZERO_PATTERN. Judging by the difference of the nonzero entries I believe that initial zero entries (the ones I initialized to eventually hold this expaned stencil) on the original dm matrices are being combined to further expand the stencil of the product matrix. Is there any way of getting a matrix-matrix product that will keep the same nonzero pattern as the dm matrices? I have tried both MatMatMult() and the MatProductCreate() sequence so far, but both produce nonzero patterns that do not match the dm nonzero pattern. Thank you, -Alfredo -- Alfredo Duarte Graduate Research Assistant The University of Texas at Austin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Thu Aug 12 10:31:10 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 12 Aug 2021 11:31:10 -0400 Subject: [petsc-users] Sparse Matrix Matrix Multiply In-Reply-To: References: Message-ID: On Thu, Aug 12, 2021 at 10:44 AM Alfredo J Duarte Gomez wrote: > Good morning, > > I am currently having some trouble in the creation of some matrices. > > I am using structured dmda objects to create matrices using the DMCreate() > function. > > One of these matrices will be the result of a matrix-matrix product of two > of these dm matrices. > > I know that the matrix product will have more nonzero entries or at least > a bigger stencil than the original dm matrices, however I accounted for > that when I set the DMDA stencil width in the initial creation. > By default, we put zeros into those locations, so you would expand that stencil when doing MatMatMult(). You can use -dm_preallocate_only to prevent the zeros from being included. However, then your target matrix would not have those locations, so you would need to turn that off before creating the product matrix, or you could just make two DMDA with different stencils, since they are really small. This later solutions sounds cleaner to me. Thanks, Matt > The problem is that even with that, the resulting matrix-matrix product > has a bigger stencil as evidenced by failure in subsequent matrix > copy/addition operations using SAME_NONZERO_PATTERN. > > Judging by the difference of the nonzero entries I believe that initial > zero entries (the ones I initialized to eventually hold this > expaned stencil) on the original dm matrices are being combined to further > expand the stencil of the product matrix. > > Is there any way of getting a matrix-matrix product that will keep the > same nonzero pattern as the dm matrices? > > I have tried both MatMatMult() and the MatProductCreate() sequence so far, > but both produce nonzero patterns that do not match the dm nonzero pattern. > > Thank you, > > -Alfredo > > > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Thu Aug 12 11:09:59 2021 From: bsmith at petsc.dev (Barry Smith) Date: Thu, 12 Aug 2021 11:09:59 -0500 Subject: [petsc-users] Sparse Matrix Matrix Multiply In-Reply-To: References: Message-ID: I don't understand. Why do you wish the new matrix-matrix product vector to have the same nonzero pattern as the basic dm matrix? If you multiple two dm matrices together it will generally have a larger stencil then the dm matrix but this is normal and the new product matrix handles it correctly. You should not copy this new "larger" matrix into a dm matrix. When you do MatAXPY() or MatAYPX() you should put the result into the product matrix, not the dm matrix and you can use SUBSET_NONZERO_PATTERN to make it reasonably efficient. Barry > On Aug 12, 2021, at 10:31 AM, Matthew Knepley wrote: > > On Thu, Aug 12, 2021 at 10:44 AM Alfredo J Duarte Gomez > wrote: > Good morning, > > I am currently having some trouble in the creation of some matrices. > > I am using structured dmda objects to create matrices using the DMCreate() function. 
> > One of these matrices will be the result of a matrix-matrix product of two of these dm matrices. > > I know that the matrix product will have more nonzero entries or at least a bigger stencil than the original dm matrices, however I accounted for that when I set the DMDA stencil width in the initial creation. > > By default, we put zeros into those locations, so you would expand that stencil when doing MatMatMult(). You can use > > -dm_preallocate_only > > to prevent the zeros from being included. However, then your target matrix would not have those locations, so you would > need to turn that off before creating the product matrix, or you could just make two DMDA with different stencils, since they > are really small. This later solutions sounds cleaner to me. > > Thanks, > > Matt > > The problem is that even with that, the resulting matrix-matrix product has a bigger stencil as evidenced by failure in subsequent matrix copy/addition operations using SAME_NONZERO_PATTERN. > > Judging by the difference of the nonzero entries I believe that initial zero entries (the ones I initialized to eventually hold this expaned stencil) on the original dm matrices are being combined to further expand the stencil of the product matrix. > > Is there any way of getting a matrix-matrix product that will keep the same nonzero pattern as the dm matrices? > > I have tried both MatMatMult() and the MatProductCreate() sequence so far, but both produce nonzero patterns that do not match the dm nonzero pattern. > > Thank you, > > -Alfredo > > > > -- > Alfredo Duarte > Graduate Research Assistant > The University of Texas at Austin > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel.td19 at outlook.com Fri Aug 13 23:19:19 2021 From: miguel.td19 at outlook.com (Miguel Angel Tapia) Date: Sat, 14 Aug 2021 04:19:19 +0000 Subject: [petsc-users] Numbering convention Message-ID: Hello. First of all thanks for the answers to the previous questions. They were really useful to me. Now I am facing a new problem. The code in which I want to implement DMPlex has a specific order in which it orders the elements that make up other elements for meshes of tetrahedra. But when I get the elements that make up some point of the DMPlex DAG they don't match what I need. So I would like to know what is the numbering convention for meshes of tetrahedra that is used in DMPlex? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Aug 14 08:27:52 2021 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 14 Aug 2021 09:27:52 -0400 Subject: [petsc-users] Numbering convention In-Reply-To: References: Message-ID: On Sat, Aug 14, 2021 at 12:19 AM Miguel Angel Tapia wrote: > Hello. First of all thanks for the answers to the previous questions. They > were really useful to me. > > Now I am facing a new problem. The code in which I want to implement > DMPlex has a specific order in which it orders the elements that make up > other elements for meshes of tetrahedra. But when I get the elements that > make up some point of the DMPlex DAG they don't match what I need. So I > would like to know what is the numbering convention for meshes of > tetrahedra that is used in DMPlex? 
> We need to be a little more precise, so I can understand what you need. We usually begin with a cell associated to a set of vertices. Our reference tetrahedron is composed of the vertices, numbered 0 to 3: (-1, -1, -1) -- (-1, 1, -1) -- (1, -1, -1) -- (-1, -1, 1) The triangular faces, numbers 0 to 3, are composed of the vertices {0, 1, 2}, {0, 3, 1}, {0, 2, 3}, {2, 1, 3} Notice that they all have outward normal. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zjorti at lanl.gov Mon Aug 16 18:56:36 2021 From: zjorti at lanl.gov (Jorti, Zakariae) Date: Mon, 16 Aug 2021 23:56:36 +0000 Subject: [petsc-users] malloc error Message-ID: Hello, I am using TSSolve to solve a linear problem. In the FormIJacobian function that I provide to TSSetIJacobian, I first set the coefficients of both J and Jpre matrices the same way (J and Jpre matrices are equal in the first step). Then I call MatAXPY to prepare Jpre (Jpre := Jpre - another_matrix. So, Jpre and J are not equal anymore). But I get the error once FormIJacobian is called the second time inside TSSolve: "[0]PETSC ERROR: New nonzero at (5,1) caused a malloc Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check". It looks like MatAXPY changes the allocation of Jpre, which the second FormIJacobian does not like unless Jpre is destroyed first. Do you have any suggestions to fix this malloc issue? Thanks. Best regards, Zakariae -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Aug 16 23:40:20 2021 From: jed at jedbrown.org (Jed Brown) Date: Mon, 16 Aug 2021 22:40:20 -0600 Subject: [petsc-users] malloc error In-Reply-To: References: Message-ID: <87sfz865vv.fsf@jedbrown.org> "Jorti, Zakariae via petsc-users" writes: > Hello, > > > I am using TSSolve to solve a linear problem. > > In the FormIJacobian function that I provide to TSSetIJacobian, I first set the coefficients of both J and Jpre matrices the same way (J and Jpre matrices are equal in the first step). Then I call MatAXPY to prepare Jpre (Jpre := Jpre - another_matrix. So, Jpre and J are not equal anymore). How do you call MatAXPY? What MatStructure arg are you passing? What is the sparsity pattern of another_matrix relative to Jpre? > > But I get the error once FormIJacobian is called the second time inside TSSolve: > > "[0]PETSC ERROR: New nonzero at (5,1) caused a malloc > > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check". > > > It looks like MatAXPY changes the allocation of Jpre, which the second FormIJacobian does not like unless Jpre is destroyed first. > > > Do you have any suggestions to fix this malloc issue? > > Thanks. > > > Best regards, > > > Zakariae From zjorti at lanl.gov Tue Aug 17 15:51:16 2021 From: zjorti at lanl.gov (Jorti, Zakariae) Date: Tue, 17 Aug 2021 20:51:16 +0000 Subject: [petsc-users] [EXTERNAL] Re: malloc error In-Reply-To: <87sfz865vv.fsf@jedbrown.org> References: , <87sfz865vv.fsf@jedbrown.org> Message-ID: <9220838398464eaf8961b78523988d90@lanl.gov> Hello, Thank you for your reply. The problem is now fixed. The issue was actually with MatZeroRowsIS that was called after MatAXPY to cancel the boundary rows of Jpre. It seems to change the non-zero pattern of Jpre. 
I added MatSetOption(Jpre,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE); to make sure it does not happen. Thanks. Zakariae ________________________________ From: Jed Brown Sent: Monday, August 16, 2021 10:40:20 PM To: Jorti, Zakariae; petsc-users at mcs.anl.gov Cc: Tang, Xianzhu Subject: [EXTERNAL] Re: [petsc-users] malloc error "Jorti, Zakariae via petsc-users" writes: > Hello, > > > I am using TSSolve to solve a linear problem. > > In the FormIJacobian function that I provide to TSSetIJacobian, I first set the coefficients of both J and Jpre matrices the same way (J and Jpre matrices are equal in the first step). Then I call MatAXPY to prepare Jpre (Jpre := Jpre - another_matrix. So, Jpre and J are not equal anymore). How do you call MatAXPY? What MatStructure arg are you passing? What is the sparsity pattern of another_matrix relative to Jpre? > > But I get the error once FormIJacobian is called the second time inside TSSolve: > > "[0]PETSC ERROR: New nonzero at (5,1) caused a malloc > > Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check". > > > It looks like MatAXPY changes the allocation of Jpre, which the second FormIJacobian does not like unless Jpre is destroyed first. > > > Do you have any suggestions to fix this malloc issue? > > Thanks. > > > Best regards, > > > Zakariae -------------- next part -------------- An HTML attachment was scrubbed... URL: From yuf2 at rpi.edu Wed Aug 18 12:52:30 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Wed, 18 Aug 2021 13:52:30 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI Message-ID: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> Hi, I was trying to run a simulation with a PETSc-wrapped Hypre preconditioner, and encountered this problem: [dcs122:133012] Out of resources: all 4095 communicator IDs have been used. [19]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [19]PETSC ERROR: General MPI error [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error [19]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by CFSIfmyu Wed Aug 11 19:51:47 2021 [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 communicator IDs have been used. [18]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [18]PETSC ERROR: General MPI error [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error [18]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[18]PETSC ERROR: Petsc Release Version 3.15.2, unknown [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by CFSIfmyu Wed Aug 11 19:51:47 2021 [18]PETSC ERROR: Configure options --download-scalapack --download-mumps --download-hypre --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 --with-debugging=0 --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ [18]PETSC ERROR: #1 MatCreate_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 [18]PETSC ERROR: #2 MatSetType() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 [18]PETSC ERROR: #3 MatConvert_AIJ_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 [18]PETSC ERROR: #4 MatConvert() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 [18]PETSC ERROR: #5 PCSetUp_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 [18]PETSC ERROR: #6 PCSetUp() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 Configure options --download-scalapack --download-mumps --download-hypre --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 --with-debugging=0 --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ [19]PETSC ERROR: #1 MatCreate_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 [19]PETSC ERROR: #2 MatSetType() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 [19]PETSC ERROR: #3 MatConvert_AIJ_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 [19]PETSC ERROR: #4 MatConvert() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 [19]PETSC ERROR: #5 PCSetUp_HYPRE() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 [19]PETSC ERROR: #6 PCSetUp() at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 It seems that MPI_Comm_dup() at petsc/src/mat/impls/hypre/mhypre.c:2120 caused the problem. Since mine is a time-dependent problem, MatCreate_HYPRE() is called every time the new system matrix is assembled. The above error message is reported after ~4095 calls of MatCreate_HYPRE(), which is around 455 time steps in my code. Here is some basic compiler information: IBM Spectrum MPI 10.4.0 GCC 8.4.1 I've never had this problem before with OpenMPI or MPICH implementation, so I was wondering if this can be resolved from my end, or it's an implementation specific problem. Thanks! Feimi -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Wed Aug 18 15:23:27 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 18 Aug 2021 15:23:27 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> Message-ID: On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu wrote: > Hi, > > I was trying to run a simulation with a PETSc-wrapped Hypre > preconditioner, and encountered this problem: > > [dcs122:133012] Out of resources: all 4095 communicator IDs have been used. 
> [19]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [19]PETSC ERROR: General MPI error > [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > [19]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown > [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by CFSIfmyu Wed > Aug 11 19:51:47 2021 > [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 communicator > IDs have been used. > [18]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [18]PETSC ERROR: General MPI error > [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > [18]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [18]PETSC ERROR: Petsc Release Version 3.15.2, unknown > [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by CFSIfmyu Wed > Aug 11 19:51:47 2021 > [18]PETSC ERROR: Configure options --download-scalapack --download-mumps > --download-hypre --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > --with-cudac=0 --with-debugging=0 > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > [18]PETSC ERROR: #1 > MatCreate_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > [18]PETSC ERROR: #2 MatSetType() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > [18]PETSC ERROR: #3 > MatConvert_AIJ_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > [18]PETSC ERROR: #4 MatConvert() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > [18]PETSC ERROR: #5 PCSetUp_HYPRE() > at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > [18]PETSC ERROR: #6 PCSetUp() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > Configure options --download-scalapack --download-mumps --download-hypre > --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 > --with-debugging=0 > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > [19]PETSC ERROR: #1 > MatCreate_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > [19]PETSC ERROR: #2 MatSetType() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > [19]PETSC ERROR: #3 > MatConvert_AIJ_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > [19]PETSC ERROR: #4 MatConvert() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > [19]PETSC ERROR: #5 PCSetUp_HYPRE() > at /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > [19]PETSC ERROR: #6 PCSetUp() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > It seems that MPI_Comm_dup() at petsc/src/mat/impls/hypre/mhypre.c:2120 > caused the problem. Since mine is a time-dependent problem, > MatCreate_HYPRE() is called every time the new system matrix is assembled. > The above error message is reported after ~4095 calls of MatCreate_HYPRE(), > which is around 455 time steps in my code. Here is some basic compiler > information: > Can you destroy old matrices to free MPI communicators? Otherwise, you run into a limitation we knew before. 
> > IBM Spectrum MPI 10.4.0 > > GCC 8.4.1 > > I've never had this problem before with OpenMPI or MPICH implementation, > so I was wondering if this can be resolved from my end, or it's an > implementation specific problem. > > Thanks! > > Feimi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yuf2 at rpi.edu Wed Aug 18 15:31:30 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Wed, 18 Aug 2021 16:31:30 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> Message-ID: <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> Hi Junchao, Thank you for the suggestion! I'm using the deal.ii wrapper dealii::PETScWrappers::PreconditionBase to handle the PETSc preconditioners, and the wrappers does the destroy when the preconditioner is reinitialized or gets out of scope. I just double-checked, this is called to make sure the old matrices are destroyed: ?? void ?? PreconditionBase::clear() ?? { ???? matrix = nullptr; ???? if (pc != nullptr) ?????? { ???????? PetscErrorCode ierr = PCDestroy(&pc); ???????? pc????????????????? = nullptr; ???????? AssertThrow(ierr == 0, ExcPETScError(ierr)); ?????? } ?? } Thanks! Feimi On 8/18/21 4:23 PM, Junchao Zhang wrote: > > > > On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu > wrote: > > Hi, > > I was trying to run a simulation with a PETSc-wrapped Hypre > preconditioner, and encountered this problem: > > [dcs122:133012] Out of resources: all 4095 communicator IDs have > been used. > [19]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [19]PETSC ERROR: General MPI error > [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > [19]PETSC ERROR: See > https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble > shooting. > [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown > [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > CFSIfmyu Wed Aug 11 19:51:47 2021 > [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 > communicator IDs have been used. > [18]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [18]PETSC ERROR: General MPI error > [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > [18]PETSC ERROR: See > https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble > shooting. 
> [18]PETSC ERROR: Petsc Release Version 3.15.2, unknown > [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > CFSIfmyu Wed Aug 11 19:51:47 2021 > [18]PETSC ERROR: Configure options --download-scalapack > --download-mumps --download-hypre --with-cc=mpicc > --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 > --with-debugging=0 > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > [18]PETSC ERROR: #1 > MatCreate_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > [18]PETSC ERROR: #2 > MatSetType() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > [18]PETSC ERROR: #3 > MatConvert_AIJ_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > [18]PETSC ERROR: #4 > MatConvert() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > [18]PETSC ERROR: #5 > PCSetUp_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > [18]PETSC ERROR: #6 > PCSetUp() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > Configure options --download-scalapack --download-mumps > --download-hypre --with-cc=mpicc --with-cxx=mpicxx > --with-fc=mpif90 --with-cudac=0 --with-debugging=0 > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > [19]PETSC ERROR: #1 > MatCreate_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > [19]PETSC ERROR: #2 > MatSetType() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > [19]PETSC ERROR: #3 > MatConvert_AIJ_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > [19]PETSC ERROR: #4 > MatConvert() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > [19]PETSC ERROR: #5 > PCSetUp_HYPRE() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > [19]PETSC ERROR: #6 > PCSetUp() at > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > It seems that MPI_Comm_dup() at > petsc/src/mat/impls/hypre/mhypre.c:2120 caused the problem. Since > mine is a time-dependent problem, MatCreate_HYPRE() is called > every time the new system matrix is assembled. The above error > message is reported after ~4095 calls of MatCreate_HYPRE(), which > is around 455 time steps in my code. Here is some basic compiler > information: > > Can you destroy old matrices to free MPI communicators? Otherwise, you > run into a limitation we knew before. > > IBM Spectrum MPI 10.4.0 > > GCC 8.4.1 > > I've never had this problem before with OpenMPI or MPICH > implementation, so I was wondering if this can be resolved from my > end, or it's an implementation specific problem. > > Thanks! > > Feimi > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yuf2 at rpi.edu Wed Aug 18 15:37:10 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Wed, 18 Aug 2021 16:37:10 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> Message-ID: <23b3ff76-d2fd-e966-a3d7-c8982c4c21ef@rpi.edu> My previous message may sound misleading. This problem happens despite the fact that the old matrices are destroyed. Feimi On 8/18/21 4:31 PM, Feimi Yu wrote: > > Hi Junchao, > > Thank you for the suggestion! 
I'm using the deal.ii wrapper > dealii::PETScWrappers::PreconditionBase to handle the PETSc > preconditioners, and the wrappers does the destroy when the > preconditioner is reinitialized or gets out of scope. I just > double-checked, this is called to make sure the old matrices are > destroyed: > > ?? void > ?? PreconditionBase::clear() > ?? { > ???? matrix = nullptr; > > ???? if (pc != nullptr) > ?????? { > ???????? PetscErrorCode ierr = PCDestroy(&pc); > ???????? pc????????????????? = nullptr; > ???????? AssertThrow(ierr == 0, ExcPETScError(ierr)); > ?????? } > ?? } > > Thanks! > > Feimi > > On 8/18/21 4:23 PM, Junchao Zhang wrote: >> >> >> >> On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu > > wrote: >> >> Hi, >> >> I was trying to run a simulation with a PETSc-wrapped Hypre >> preconditioner, and encountered this problem: >> >> [dcs122:133012] Out of resources: all 4095 communicator IDs have >> been used. >> [19]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [19]PETSC ERROR: General MPI error >> [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error >> [19]PETSC ERROR: See >> https://www.mcs.anl.gov/petsc/documentation/faq.html >> for >> trouble shooting. >> [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown >> [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by >> CFSIfmyu Wed Aug 11 19:51:47 2021 >> [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 >> communicator IDs have been used. >> [18]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [18]PETSC ERROR: General MPI error >> [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error >> [18]PETSC ERROR: See >> https://www.mcs.anl.gov/petsc/documentation/faq.html >> for >> trouble shooting. 
>> [18]PETSC ERROR: Petsc Release Version 3.15.2, unknown >> [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by >> CFSIfmyu Wed Aug 11 19:51:47 2021 >> [18]PETSC ERROR: Configure options --download-scalapack >> --download-mumps --download-hypre --with-cc=mpicc >> --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 >> --with-debugging=0 >> --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ >> [18]PETSC ERROR: #1 >> MatCreate_HYPRE() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 >> [18]PETSC ERROR: #2 >> MatSetType() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 >> [18]PETSC ERROR: #3 >> MatConvert_AIJ_HYPRE() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 >> [18]PETSC ERROR: #4 >> MatConvert() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 >> [18]PETSC ERROR: #5 >> PCSetUp_HYPRE() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 >> [18]PETSC ERROR: #6 >> PCSetUp() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 >> Configure options --download-scalapack --download-mumps >> --download-hypre --with-cc=mpicc --with-cxx=mpicxx >> --with-fc=mpif90 --with-cudac=0 --with-debugging=0 >> --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ >> [19]PETSC ERROR: #1 >> MatCreate_HYPRE() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 >> [19]PETSC ERROR: #2 >> MatSetType() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 >> [19]PETSC ERROR: #3 >> MatConvert_AIJ_HYPRE() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 >> [19]PETSC ERROR: #4 >> MatConvert() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 >> [19]PETSC ERROR: #5 >> PCSetUp_HYPRE() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 >> [19]PETSC ERROR: #6 >> PCSetUp() at >> /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 >> >> It seems that MPI_Comm_dup() at >> petsc/src/mat/impls/hypre/mhypre.c:2120 caused the problem. Since >> mine is a time-dependent problem, MatCreate_HYPRE() is called >> every time the new system matrix is assembled. The above error >> message is reported after ~4095 calls of MatCreate_HYPRE(), which >> is around 455 time steps in my code. Here is some basic compiler >> information: >> >> Can you destroy old matrices to free MPI communicators?? Otherwise, >> you run into a limitation we knew before. >> >> IBM Spectrum MPI 10.4.0 >> >> GCC 8.4.1 >> >> I've never had this problem before with OpenMPI or MPICH >> implementation, so I was wondering if this can be resolved from >> my end, or it's an implementation specific problem. >> >> Thanks! >> >> Feimi >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed Aug 18 15:38:26 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 18 Aug 2021 15:38:26 -0500 (CDT) Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> Message-ID: <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> Is the communicator used to create PETSc objects MPI_COMM_WORLD? 
If so - try changing it to PETSC_COMM_WORLD Satish On Wed, 18 Aug 2021, Feimi Yu wrote: > Hi Junchao, > > Thank you for the suggestion! I'm using the deal.ii wrapper > dealii::PETScWrappers::PreconditionBase to handle the PETSc preconditioners, > and the wrappers does the destroy when the preconditioner is reinitialized or > gets out of scope. I just double-checked, this is called to make sure the old > matrices are destroyed: > > ?? void > ?? PreconditionBase::clear() > ?? { > ???? matrix = nullptr; > > ???? if (pc != nullptr) > ?????? { > ???????? PetscErrorCode ierr = PCDestroy(&pc); > ???????? pc????????????????? = nullptr; > ???????? AssertThrow(ierr == 0, ExcPETScError(ierr)); > ?????? } > ?? } > > Thanks! > > Feimi > > On 8/18/21 4:23 PM, Junchao Zhang wrote: > > > > > > > > On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu > > wrote: > > > > Hi, > > > > I was trying to run a simulation with a PETSc-wrapped Hypre > > preconditioner, and encountered this problem: > > > > [dcs122:133012] Out of resources: all 4095 communicator IDs have > > been used. > > [19]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [19]PETSC ERROR: General MPI error > > [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > > [19]PETSC ERROR: See > > https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble > > shooting. > > [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown > > [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > > CFSIfmyu Wed Aug 11 19:51:47 2021 > > [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 > > communicator IDs have been used. > > [18]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [18]PETSC ERROR: General MPI error > > [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > > [18]PETSC ERROR: See > > https://www.mcs.anl.gov/petsc/documentation/faq.html > > for trouble > > shooting. 
> > [18]PETSC ERROR: Petsc Release Version 3.15.2, unknown > > [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > > CFSIfmyu Wed Aug 11 19:51:47 2021 > > [18]PETSC ERROR: Configure options --download-scalapack > > --download-mumps --download-hypre --with-cc=mpicc > > --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 > > --with-debugging=0 > > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > > [18]PETSC ERROR: #1 > > MatCreate_HYPRE() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > > [18]PETSC ERROR: #2 > > MatSetType() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > > [18]PETSC ERROR: #3 > > MatConvert_AIJ_HYPRE() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > > [18]PETSC ERROR: #4 > > MatConvert() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > > [18]PETSC ERROR: #5 > > PCSetUp_HYPRE() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > > [18]PETSC ERROR: #6 > > PCSetUp() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > Configure options --download-scalapack --download-mumps > > --download-hypre --with-cc=mpicc --with-cxx=mpicxx > > --with-fc=mpif90 --with-cudac=0 --with-debugging=0 > > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > > [19]PETSC ERROR: #1 > > MatCreate_HYPRE() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > > [19]PETSC ERROR: #2 > > MatSetType() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > > [19]PETSC ERROR: #3 > > MatConvert_AIJ_HYPRE() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > > [19]PETSC ERROR: #4 > > MatConvert() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > > [19]PETSC ERROR: #5 > > PCSetUp_HYPRE() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > > [19]PETSC ERROR: #6 > > PCSetUp() at > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > > > It seems that MPI_Comm_dup() at > > petsc/src/mat/impls/hypre/mhypre.c:2120 caused the problem. Since > > mine is a time-dependent problem, MatCreate_HYPRE() is called > > every time the new system matrix is assembled. The above error > > message is reported after ~4095 calls of MatCreate_HYPRE(), which > > is around 455 time steps in my code. Here is some basic compiler > > information: > > > > Can you destroy old matrices to free MPI communicators? Otherwise, you run > > into a limitation we knew before. > > > > IBM Spectrum MPI 10.4.0 > > > > GCC 8.4.1 > > > > I've never had this problem before with OpenMPI or MPICH > > implementation, so I was wondering if this can be resolved from my > > end, or it's an implementation specific problem. > > > > Thanks! > > > > Feimi > > > > From junchao.zhang at gmail.com Wed Aug 18 15:53:22 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Wed, 18 Aug 2021 15:53:22 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> Message-ID: Hi, Feimi, I need to consult Jed (cc'ed). 
Jed, is this an example of https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663? If Feimi really can not free matrices, then we just need to attach a hypre-comm to a petsc inner comm, and pass that to hypre. --Junchao Zhang On Wed, Aug 18, 2021 at 3:38 PM Satish Balay wrote: > Is the communicator used to create PETSc objects MPI_COMM_WORLD? > > If so - try changing it to PETSC_COMM_WORLD > > Satish > > On Wed, 18 Aug 2021, Feimi Yu wrote: > > > Hi Junchao, > > > > Thank you for the suggestion! I'm using the deal.ii wrapper > > dealii::PETScWrappers::PreconditionBase to handle the PETSc > preconditioners, > > and the wrappers does the destroy when the preconditioner is > reinitialized or > > gets out of scope. I just double-checked, this is called to make sure > the old > > matrices are destroyed: > > > > void > > PreconditionBase::clear() > > { > > matrix = nullptr; > > > > if (pc != nullptr) > > { > > PetscErrorCode ierr = PCDestroy(&pc); > > pc = nullptr; > > AssertThrow(ierr == 0, ExcPETScError(ierr)); > > } > > } > > > > Thanks! > > > > Feimi > > > > On 8/18/21 4:23 PM, Junchao Zhang wrote: > > > > > > > > > > > > On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu > > > wrote: > > > > > > Hi, > > > > > > I was trying to run a simulation with a PETSc-wrapped Hypre > > > preconditioner, and encountered this problem: > > > > > > [dcs122:133012] Out of resources: all 4095 communicator IDs have > > > been used. > > > [19]PETSC ERROR: --------------------- Error Message > > > -------------------------------------------------------------- > > > [19]PETSC ERROR: General MPI error > > > [19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > > > [19]PETSC ERROR: See > > > https://www.mcs.anl.gov/petsc/documentation/faq.html > > > for trouble > > > shooting. > > > [19]PETSC ERROR: Petsc Release Version 3.15.2, unknown > > > [19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > > > CFSIfmyu Wed Aug 11 19:51:47 2021 > > > [19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 > > > communicator IDs have been used. > > > [18]PETSC ERROR: --------------------- Error Message > > > -------------------------------------------------------------- > > > [18]PETSC ERROR: General MPI error > > > [18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > > > [18]PETSC ERROR: See > > > https://www.mcs.anl.gov/petsc/documentation/faq.html > > > for trouble > > > shooting. 
> > > [18]PETSC ERROR: Petsc Release Version 3.15.2, unknown > > > [18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > > > CFSIfmyu Wed Aug 11 19:51:47 2021 > > > [18]PETSC ERROR: Configure options --download-scalapack > > > --download-mumps --download-hypre --with-cc=mpicc > > > --with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 > > > --with-debugging=0 > > > > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > > > [18]PETSC ERROR: #1 > > > MatCreate_HYPRE() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > > > [18]PETSC ERROR: #2 > > > MatSetType() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > > > [18]PETSC ERROR: #3 > > > MatConvert_AIJ_HYPRE() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > > > [18]PETSC ERROR: #4 > > > MatConvert() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > > > [18]PETSC ERROR: #5 > > > PCSetUp_HYPRE() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > > > [18]PETSC ERROR: #6 > > > PCSetUp() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > > Configure options --download-scalapack --download-mumps > > > --download-hypre --with-cc=mpicc --with-cxx=mpicxx > > > --with-fc=mpif90 --with-cudac=0 --with-debugging=0 > > > > --with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > > > [19]PETSC ERROR: #1 > > > MatCreate_HYPRE() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > > > [19]PETSC ERROR: #2 > > > MatSetType() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > > > [19]PETSC ERROR: #3 > > > MatConvert_AIJ_HYPRE() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > > > [19]PETSC ERROR: #4 > > > MatConvert() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > > > [19]PETSC ERROR: #5 > > > PCSetUp_HYPRE() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > > > [19]PETSC ERROR: #6 > > > PCSetUp() at > > > > /gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > > > > > It seems that MPI_Comm_dup() at > > > petsc/src/mat/impls/hypre/mhypre.c:2120 caused the problem. Since > > > mine is a time-dependent problem, MatCreate_HYPRE() is called > > > every time the new system matrix is assembled. The above error > > > message is reported after ~4095 calls of MatCreate_HYPRE(), which > > > is around 455 time steps in my code. Here is some basic compiler > > > information: > > > > > > Can you destroy old matrices to free MPI communicators? Otherwise, you > run > > > into a limitation we knew before. > > > > > > IBM Spectrum MPI 10.4.0 > > > > > > GCC 8.4.1 > > > > > > I've never had this problem before with OpenMPI or MPICH > > > implementation, so I was wondering if this can be resolved from my > > > end, or it's an implementation specific problem. > > > > > > Thanks! > > > > > > Feimi > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yuf2 at rpi.edu Wed Aug 18 16:23:52 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Wed, 18 Aug 2021 17:23:52 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> Message-ID: <095ee8d1-56d9-7d4c-484c-7dc0b88e657c@rpi.edu> Hi Satish and Junchao, I just tried replacing all MPI_COMM_WORLD with PETSC_COMM_WORLD, but it didn't do the trick. One thing that interests me is that, I ran with 40 ranks but only 2 ranks reported the communicator error. I think this means at least the rest 38 ranks freed the communicators properly. Thanks! Feimi On 8/18/21 4:53 PM, Junchao Zhang wrote: > Hi, Feimi, > ? I need to?consult?Jed (cc'ed). > ? Jed, is this an example of > https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 > ? > If Feimi really can not free matrices, then we just need to attach a > hypre-comm to a petsc inner comm, and pass that to hypre. > > --Junchao Zhang > > > On Wed, Aug 18, 2021 at 3:38 PM Satish Balay > wrote: > > Is the communicator used to create PETSc objects MPI_COMM_WORLD? > > If so - try changing it to PETSC_COMM_WORLD > > Satish > > ?On Wed, 18 Aug 2021, Feimi Yu wrote: > > > Hi Junchao, > > > > Thank you for the suggestion! I'm using the deal.ii wrapper > > dealii::PETScWrappers::PreconditionBase to handle the PETSc > preconditioners, > > and the wrappers does the destroy when the preconditioner is > reinitialized or > > gets out of scope. I just double-checked, this is called to make > sure the old > > matrices are destroyed: > > > > ?? void > > ?? PreconditionBase::clear() > > ?? { > > ???? matrix = nullptr; > > > > ???? if (pc != nullptr) > > ?????? { > > ???????? PetscErrorCode ierr = PCDestroy(&pc); > > ???????? pc????????????????? = nullptr; > > ???????? AssertThrow(ierr == 0, ExcPETScError(ierr)); > > ?????? } > > ?? } > > > > Thanks! > > > > Feimi > > > > On 8/18/21 4:23 PM, Junchao Zhang wrote: > > > > > > > > > > > > On Wed, Aug 18, 2021 at 12:52 PM Feimi Yu > > > >> wrote: > > > > > >? ? ?Hi, > > > > > >? ? ?I was trying to run a simulation with a PETSc-wrapped Hypre > > >? ? ?preconditioner, and encountered this problem: > > > > > >? ? ?[dcs122:133012] Out of resources: all 4095 communicator > IDs have > > >? ? ?been used. > > >? ? ?[19]PETSC ERROR: --------------------- Error Message > > > ?-------------------------------------------------------------- > > >? ? ?[19]PETSC ERROR: General MPI error > > >? ? ?[19]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > > >? ? ?[19]PETSC ERROR: See > > > https://www.mcs.anl.gov/petsc/documentation/faq.html > > > >? ? ? > for trouble > > >? ? ?shooting. > > >? ? ?[19]PETSC ERROR: Petsc Release Version 3.15.2, unknown > > >? ? ?[19]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > > >? ? ?CFSIfmyu Wed Aug 11 19:51:47 2021 > > >? ? ?[19]PETSC ERROR: [dcs122:133010] Out of resources: all 4095 > > >? ? ?communicator IDs have been used. > > >? ? ?[18]PETSC ERROR: --------------------- Error Message > > > ?-------------------------------------------------------------- > > >? ? ?[18]PETSC ERROR: General MPI error > > >? ? ?[18]PETSC ERROR: MPI error 17 MPI_ERR_INTERN: internal error > > >? ? ?[18]PETSC ERROR: See > > > https://www.mcs.anl.gov/petsc/documentation/faq.html > > > >? ? ? > for trouble > > >? ? ?shooting. > > >? ? 
?[18]PETSC ERROR: Petsc Release Version 3.15.2, unknown > > >? ? ?[18]PETSC ERROR: ./main on a arch-linux-c-opt named dcs122 by > > >? ? ?CFSIfmyu Wed Aug 11 19:51:47 2021 > > >? ? ?[18]PETSC ERROR: Configure options --download-scalapack > > >? ? ?--download-mumps --download-hypre --with-cc=mpicc > > >? ? ?--with-cxx=mpicxx --with-fc=mpif90 --with-cudac=0 > > >? ? ?--with-debugging=0 > > > > ?--with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > > >? ? ?[18]PETSC ERROR: #1 > > > >? ? ?MatCreate_HYPRE() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > > >? ? ?[18]PETSC ERROR: #2 > > > >? ? ?MatSetType() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > > >? ? ?[18]PETSC ERROR: #3 > > > >? ? ?MatConvert_AIJ_HYPRE() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > > >? ? ?[18]PETSC ERROR: #4 > > > >? ? ?MatConvert() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > > >? ? ?[18]PETSC ERROR: #5 > > > >? ? ?PCSetUp_HYPRE() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > > >? ? ?[18]PETSC ERROR: #6 > > > >? ? ?PCSetUp() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > >? ? ?Configure options --download-scalapack --download-mumps > > >? ? ?--download-hypre --with-cc=mpicc --with-cxx=mpicxx > > >? ? ?--with-fc=mpif90 --with-cudac=0 --with-debugging=0 > > > > ?--with-blaslapack-dir=/gpfs/u/home/CFSI/CFSIfmyu/barn-shared/dcs-rh8/lapack-build/ > > >? ? ?[19]PETSC ERROR: #1 > > > >? ? ?MatCreate_HYPRE() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:2120 > > >? ? ?[19]PETSC ERROR: #2 > > > >? ? ?MatSetType() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matreg.c:91 > > >? ? ?[19]PETSC ERROR: #3 > > > >? ? ?MatConvert_AIJ_HYPRE() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/impls/hypre/mhypre.c:392 > > >? ? ?[19]PETSC ERROR: #4 > > > >? ? ?MatConvert() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/mat/interface/matrix.c:4439 > > >? ? ?[19]PETSC ERROR: #5 > > > >? ? ?PCSetUp_HYPRE() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/impls/hypre/hypre.c:240 > > >? ? ?[19]PETSC ERROR: #6 > > > >? ? ?PCSetUp() at > > > > ?/gpfs/u/barn/CFSI/shared/dcs-rh8/petsc/src/ksp/pc/interface/precon.c:1015 > > > > > >? ? ?It seems that MPI_Comm_dup() at > > >? ? ?petsc/src/mat/impls/hypre/mhypre.c:2120 caused the > problem. Since > > >? ? ?mine is a time-dependent problem, MatCreate_HYPRE() is called > > >? ? ?every time the new system matrix is assembled. The above error > > >? ? ?message is reported after ~4095 calls of > MatCreate_HYPRE(), which > > >? ? ?is around 455 time steps in my code. Here is some basic > compiler > > >? ? ?information: > > > > > > Can you destroy old matrices to free MPI communicators? > Otherwise, you run > > > into a limitation we knew before. > > > > > >? ? ?IBM Spectrum MPI 10.4.0 > > > > > >? ? ?GCC 8.4.1 > > > > > >? ? ?I've never had this problem before with OpenMPI or MPICH > > >? ? ?implementation, so I was wondering if this can be resolved > from my > > >? ? ?end, or it's an implementation specific problem. > > > > > >? ? ?Thanks! > > > > > >? ? ?Feimi > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sayosale at hotmail.com Thu Aug 19 00:12:14 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Thu, 19 Aug 2021 05:12:14 +0000 Subject: [petsc-users] Improving efficiency of slepc usage Message-ID: Dear All, I am planning on using slepc to do a large number of eigenvalue calculations of a generalized eigenvalue problem, called from a program written in fortran using MPI. Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, and on smaller test problems everything is working well; the matrices are efficiently and correctly constructed and slepc returns the correct spectrum. I am just now starting to move towards now solving the full-size 'production run' problems, and would appreciate some general advice on how to improve the solver's performance. In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, although in other cases I will also be interested in finding the eigenvalues whose real part is close to zero. A) Calling slepc 's EPS solver with the following options: -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations (examining the monitor output it did appear to be very slowly approaching convergence). B) On the same problem I have also tried a shift-invert transformation using the options -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? In the case of A) I can see from watching the slepc videos that increasing ncv may help, but I am wondering , since 600 is a large number of iterations, whether there maybe something else going on - e.g. perhaps some alternative preconditioner may help ? In the case of B), I guess there must be some mistake in these command line options? Again, any advice will be greatly appreciated. Best wishes, Dan. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Thu Aug 19 02:58:29 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Thu, 19 Aug 2021 09:58:29 +0200 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 Jose > El 19 ago 2021, a las 7:12, dazza simplythebest escribi?: > > Dear All, > I am planning on using slepc to do a large number of eigenvalue calculations > of a generalized eigenvalue problem, called from a program written in fortran using MPI. 
> Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > and on smaller test problems everything is working well; the matrices are efficiently and > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > towards now solving the full-size 'production run' problems, and would appreciate some > general advice on how to improve the solver's performance. > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > although in other cases I will also be interested in finding the eigenvalues whose real part > is close to zero. > > A) > Calling slepc 's EPS solver with the following options: > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > (examining the monitor output it did appear to be very slowly approaching convergence). > > B) > On the same problem I have also tried a shift-invert transformation using the options > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > In the case of A) I can see from watching the slepc videos that increasing ncv > may help, but I am wondering , since 600 is a large number of iterations, whether there > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > In the case of B), I guess there must be some mistake in these command line options? > Again, any advice will be greatly appreciated. > Best wishes, Dan. From jed at jedbrown.org Thu Aug 19 08:01:55 2021 From: jed at jedbrown.org (Jed Brown) Date: Thu, 19 Aug 2021 07:01:55 -0600 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> Message-ID: <878s0x6118.fsf@jedbrown.org> Junchao Zhang writes: > Hi, Feimi, > I need to consult Jed (cc'ed). > Jed, is this an example of > https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663? > If Feimi really can not free matrices, then we just need to attach a > hypre-comm to a petsc inner comm, and pass that to hypre. Are there a bunch of solves as in that case? My understanding is that one should be able to MPI_Comm_dup/MPI_Comm_free as many times as you like, but the implementation has limits on how many communicators can co-exist at any one time. The many-at-once is what we encountered in that 2018 thread. One way to check would be to use a debugger or tracer to examine the stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. case 1: we'll find lots of dups without frees (until the end) because the user really wants lots of these existing at the same time. case 2: dups are unfreed because of reference counting issue/inessential references In case 1, I think the solution is as outlined in the thread, PETSc can create an inner-comm for Hypre. 
I think I'd prefer to attach it to the outer comm instead of the PETSc inner comm, but perhaps a case could be made either way. From yuf2 at rpi.edu Thu Aug 19 14:08:00 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Thu, 19 Aug 2021 15:08:00 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <878s0x6118.fsf@jedbrown.org> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: Hi Jed, In my case, I only have 2 hypre preconditioners at the same time, and they do not solve simultaneously, so it might not be case 1. I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on my own machine (with OpenMPI), all the communicators are freed from my observation. I could not test it with Spectrum MPI on the clusters immediately because all the dependencies were built in release mode. However, as I mentioned, I haven't had this problem with OpenMPI before, so I'm not sure if this is really an MPI implementation problem, or just because Spectrum MPI has less limit for the number of communicators, and/or this also depends on how many MPI ranks are used, as only 2 out of 40 ranks reported the error. As a workaround, I replaced the MPI_Comm_dup() at petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also removed the MPI_Comm_free() in the hypre destroyer. My code runs fine with Spectrum MPI now, but I don't think this is a long-term solution. Thanks! Feimi On 8/19/21 9:01 AM, Jed Brown wrote: > Junchao Zhang writes: > >> Hi, Feimi, >> I need to consult Jed (cc'ed). >> Jed, is this an example of >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663? >> If Feimi really can not free matrices, then we just need to attach a >> hypre-comm to a petsc inner comm, and pass that to hypre. > Are there a bunch of solves as in that case? > > My understanding is that one should be able to MPI_Comm_dup/MPI_Comm_free as many times as you like, but the implementation has limits on how many communicators can co-exist at any one time. The many-at-once is what we encountered in that 2018 thread. > > One way to check would be to use a debugger or tracer to examine the stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. > > case 1: we'll find lots of dups without frees (until the end) because the user really wants lots of these existing at the same time. > > case 2: dups are unfreed because of reference counting issue/inessential references > > > In case 1, I think the solution is as outlined in the thread, PETSc can create an inner-comm for Hypre. I think I'd prefer to attach it to the outer comm instead of the PETSc inner comm, but perhaps a case could be made either way. From knepley at gmail.com Thu Aug 19 14:14:37 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 19 Aug 2021 14:14:37 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu wrote: > Hi Jed, > > In my case, I only have 2 hypre preconditioners at the same time, and > they do not solve simultaneously, so it might not be case 1. 
> > I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on > my own machine (with OpenMPI), all the communicators are freed from my > observation. I could not test it with Spectrum MPI on the clusters > immediately because all the dependencies were built in release mode. > However, as I mentioned, I haven't had this problem with OpenMPI before, > so I'm not sure if this is really an MPI implementation problem, or just > because Spectrum MPI has less limit for the number of communicators, > and/or this also depends on how many MPI ranks are used, as only 2 out > of 40 ranks reported the error. > > As a workaround, I replaced the MPI_Comm_dup() at > petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also > removed the MPI_Comm_free() in the hypre destroyer. My code runs fine > with Spectrum MPI now, but I don't think this is a long-term solution. > If that runs, then it is definitely an MPI implementation problem. Thanks, Matt > Thanks! > > Feimi > > On 8/19/21 9:01 AM, Jed Brown wrote: > > Junchao Zhang writes: > > > >> Hi, Feimi, > >> I need to consult Jed (cc'ed). > >> Jed, is this an example of > >> > https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 > ? > >> If Feimi really can not free matrices, then we just need to attach a > >> hypre-comm to a petsc inner comm, and pass that to hypre. > > Are there a bunch of solves as in that case? > > > > My understanding is that one should be able to > MPI_Comm_dup/MPI_Comm_free as many times as you like, but the > implementation has limits on how many communicators can co-exist at any one > time. The many-at-once is what we encountered in that 2018 thread. > > > > One way to check would be to use a debugger or tracer to examine the > stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. > > > > case 1: we'll find lots of dups without frees (until the end) because > the user really wants lots of these existing at the same time. > > > > case 2: dups are unfreed because of reference counting issue/inessential > references > > > > > > In case 1, I think the solution is as outlined in the thread, PETSc can > create an inner-comm for Hypre. I think I'd prefer to attach it to the > outer comm instead of the PETSc inner comm, but perhaps a case could be > made either way. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Aug 19 15:29:33 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 19 Aug 2021 15:29:33 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu wrote: > Hi Jed, > > In my case, I only have 2 hypre preconditioners at the same time, and > they do not solve simultaneously, so it might not be case 1. > > I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on > my own machine (with OpenMPI), all the communicators are freed from my > observation. I could not test it with Spectrum MPI on the clusters > immediately because all the dependencies were built in release mode. 
> However, as I mentioned, I haven't had this problem with OpenMPI before, > so I'm not sure if this is really an MPI implementation problem, or just > because Spectrum MPI has less limit for the number of communicators, > and/or this also depends on how many MPI ranks are used, as only 2 out > of 40 ranks reported the error. > You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are paired. As a workaround, I replaced the MPI_Comm_dup() at > petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also > removed the MPI_Comm_free() in the hypre destroyer. My code runs fine > with Spectrum MPI now, but I don't think this is a long-term solution. > > Thanks! > > Feimi > > On 8/19/21 9:01 AM, Jed Brown wrote: > > Junchao Zhang writes: > > > >> Hi, Feimi, > >> I need to consult Jed (cc'ed). > >> Jed, is this an example of > >> > https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 > ? > >> If Feimi really can not free matrices, then we just need to attach a > >> hypre-comm to a petsc inner comm, and pass that to hypre. > > Are there a bunch of solves as in that case? > > > > My understanding is that one should be able to > MPI_Comm_dup/MPI_Comm_free as many times as you like, but the > implementation has limits on how many communicators can co-exist at any one > time. The many-at-once is what we encountered in that 2018 thread. > > > > One way to check would be to use a debugger or tracer to examine the > stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. > > > > case 1: we'll find lots of dups without frees (until the end) because > the user really wants lots of these existing at the same time. > > > > case 2: dups are unfreed because of reference counting issue/inessential > references > > > > > > In case 1, I think the solution is as outlined in the thread, PETSc can > create an inner-comm for Hypre. I think I'd prefer to attach it to the > outer comm instead of the PETSc inner comm, but perhaps a case could be > made either way. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 20 00:33:24 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 20 Aug 2021 00:33:24 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: It sounds like maybe the Spectrum MPI_Comm_free() is not returning the comm to the "pool" as available for future use; a very buggy MPI implementation. This can easily be checked in a tiny standalone MPI program that simply comm dups and frees thousands of times in a loop. Could even be a configure test (that requires running an MPI program). I do not remember if we ever tested this possibility; maybe and I forgot. If this is the problem we can provide a "work around" that attributes the new comm (to be passed to hypre) to the old comm with a reference count value also in the attribute. When the hypre matrix is created that count is (with the new comm) is set to 1, when the hypre matrix is freed that count is set to zero (but the comm is not freed), in the next call to create the hypre matrix when the attribute is found, the count is zero so PETSc knows it can pass the same comm again to the new hypre matrix. 
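A rough sketch of that work-around idea, written with plain MPI attribute calls rather than PETSc's actual internals; the names HypreCommCache, PetscHypreCommGet and PetscHypreCommRestore, and the single-slot cache, are illustrative assumptions and not existing PETSc or hypre API:

/* Illustrative sketch only: duplicate one communicator per user communicator,
 * cache it in an MPI attribute, and hand the same communicator to each
 * successive hypre matrix instead of calling MPI_Comm_dup()/MPI_Comm_free()
 * for every matrix.  A flag marks whether it is currently checked out. */
#include <mpi.h>
#include <stdlib.h>

static int hypre_comm_keyval = MPI_KEYVAL_INVALID;

typedef struct {
  MPI_Comm comm;    /* the communicator handed to hypre             */
  int      in_use;  /* 1 while a hypre matrix is currently using it */
} HypreCommCache;

/* Check a communicator out for a new hypre matrix. */
static MPI_Comm PetscHypreCommGet(MPI_Comm usercomm)
{
  HypreCommCache *cache = NULL;
  int             found = 0;

  if (hypre_comm_keyval == MPI_KEYVAL_INVALID) {
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                           &hypre_comm_keyval, NULL);
  }
  MPI_Comm_get_attr(usercomm, hypre_comm_keyval, &cache, &found);
  if (!found) {                 /* first hypre matrix on this comm: dup once and remember it */
    cache = (HypreCommCache *)malloc(sizeof(*cache));
    MPI_Comm_dup(usercomm, &cache->comm);
    cache->in_use = 0;
    MPI_Comm_set_attr(usercomm, hypre_comm_keyval, cache);
  }
  cache->in_use = 1;            /* a fuller version would keep several entries and dup another comm if all are busy */
  return cache->comm;
}

/* Check the communicator back in when the hypre matrix is destroyed; it is NOT freed. */
static void PetscHypreCommRestore(MPI_Comm usercomm)
{
  HypreCommCache *cache = NULL;
  int             found = 0;

  MPI_Comm_get_attr(usercomm, hypre_comm_keyval, &cache, &found);
  if (found) cache->in_use = 0;
}

Since the duplicated communicator is created at most once per user communicator and is never freed while the program runs, repeated matrix creation can no longer exhaust a 4095-ID pool; the price is the restriction described next.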
This will only allow one simultaneous hypre matrix to be created from the original comm. To allow multiply simultaneous hypre matrix one could have multiple comms and counts in the attribute and just check them until one finds an available one to reuse (or creates yet another one if all the current ones are busy with hypre matrices). So it is the same model as DMGetXXVector() where vectors are checked out and then checked in to be available later. This would solve the currently reported problem (if it is a buggy MPI that does not properly free comms), but not solve the MOOSE problem where 10,000 comms are needed at the same time. Barry > On Aug 19, 2021, at 3:29 PM, Junchao Zhang wrote: > > > > > On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu > wrote: > Hi Jed, > > In my case, I only have 2 hypre preconditioners at the same time, and > they do not solve simultaneously, so it might not be case 1. > > I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on > my own machine (with OpenMPI), all the communicators are freed from my > observation. I could not test it with Spectrum MPI on the clusters > immediately because all the dependencies were built in release mode. > However, as I mentioned, I haven't had this problem with OpenMPI before, > so I'm not sure if this is really an MPI implementation problem, or just > because Spectrum MPI has less limit for the number of communicators, > and/or this also depends on how many MPI ranks are used, as only 2 out > of 40 ranks reported the error. > You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are paired. > > As a workaround, I replaced the MPI_Comm_dup() at > petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also > removed the MPI_Comm_free() in the hypre destroyer. My code runs fine > with Spectrum MPI now, but I don't think this is a long-term solution. > > Thanks! > > Feimi > > On 8/19/21 9:01 AM, Jed Brown wrote: > > Junchao Zhang > writes: > > > >> Hi, Feimi, > >> I need to consult Jed (cc'ed). > >> Jed, is this an example of > >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 ? > >> If Feimi really can not free matrices, then we just need to attach a > >> hypre-comm to a petsc inner comm, and pass that to hypre. > > Are there a bunch of solves as in that case? > > > > My understanding is that one should be able to MPI_Comm_dup/MPI_Comm_free as many times as you like, but the implementation has limits on how many communicators can co-exist at any one time. The many-at-once is what we encountered in that 2018 thread. > > > > One way to check would be to use a debugger or tracer to examine the stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. > > > > case 1: we'll find lots of dups without frees (until the end) because the user really wants lots of these existing at the same time. > > > > case 2: dups are unfreed because of reference counting issue/inessential references > > > > > > In case 1, I think the solution is as outlined in the thread, PETSc can create an inner-comm for Hypre. I think I'd prefer to attach it to the outer comm instead of the PETSc inner comm, but perhaps a case could be made either way. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at petsc.dev Fri Aug 20 00:52:06 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 20 Aug 2021 00:52:06 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: <130D0B03-EBDC-4FBF-A051-12FDE0B51CAD@petsc.dev> With a couple of new PETSc utility functions we could use this approach generically to provide communicators to all external packages instead of directly use the dup and free specifically for each external package as we do now. > On Aug 20, 2021, at 12:33 AM, Barry Smith wrote: > > > It sounds like maybe the Spectrum MPI_Comm_free() is not returning the comm to the "pool" as available for future use; a very buggy MPI implementation. This can easily be checked in a tiny standalone MPI program that simply comm dups and frees thousands of times in a loop. Could even be a configure test (that requires running an MPI program). I do not remember if we ever tested this possibility; maybe and I forgot. > > If this is the problem we can provide a "work around" that attributes the new comm (to be passed to hypre) to the old comm with a reference count value also in the attribute. When the hypre matrix is created that count is (with the new comm) is set to 1, when the hypre matrix is freed that count is set to zero (but the comm is not freed), in the next call to create the hypre matrix when the attribute is found, the count is zero so PETSc knows it can pass the same comm again to the new hypre matrix. > > This will only allow one simultaneous hypre matrix to be created from the original comm. To allow multiply simultaneous hypre matrix one could have multiple comms and counts in the attribute and just check them until one finds an available one to reuse (or creates yet another one if all the current ones are busy with hypre matrices). So it is the same model as DMGetXXVector() where vectors are checked out and then checked in to be available later. This would solve the currently reported problem (if it is a buggy MPI that does not properly free comms), but not solve the MOOSE problem where 10,000 comms are needed at the same time. > > Barry > > > > > >> On Aug 19, 2021, at 3:29 PM, Junchao Zhang > wrote: >> >> >> >> >> On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu > wrote: >> Hi Jed, >> >> In my case, I only have 2 hypre preconditioners at the same time, and >> they do not solve simultaneously, so it might not be case 1. >> >> I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on >> my own machine (with OpenMPI), all the communicators are freed from my >> observation. I could not test it with Spectrum MPI on the clusters >> immediately because all the dependencies were built in release mode. >> However, as I mentioned, I haven't had this problem with OpenMPI before, >> so I'm not sure if this is really an MPI implementation problem, or just >> because Spectrum MPI has less limit for the number of communicators, >> and/or this also depends on how many MPI ranks are used, as only 2 out >> of 40 ranks reported the error. >> You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are paired. 
>> >> As a workaround, I replaced the MPI_Comm_dup() at >> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also >> removed the MPI_Comm_free() in the hypre destroyer. My code runs fine >> with Spectrum MPI now, but I don't think this is a long-term solution. >> >> Thanks! >> >> Feimi >> >> On 8/19/21 9:01 AM, Jed Brown wrote: >> > Junchao Zhang > writes: >> > >> >> Hi, Feimi, >> >> I need to consult Jed (cc'ed). >> >> Jed, is this an example of >> >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 ? >> >> If Feimi really can not free matrices, then we just need to attach a >> >> hypre-comm to a petsc inner comm, and pass that to hypre. >> > Are there a bunch of solves as in that case? >> > >> > My understanding is that one should be able to MPI_Comm_dup/MPI_Comm_free as many times as you like, but the implementation has limits on how many communicators can co-exist at any one time. The many-at-once is what we encountered in that 2018 thread. >> > >> > One way to check would be to use a debugger or tracer to examine the stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. >> > >> > case 1: we'll find lots of dups without frees (until the end) because the user really wants lots of these existing at the same time. >> > >> > case 2: dups are unfreed because of reference counting issue/inessential references >> > >> > >> > In case 1, I think the solution is as outlined in the thread, PETSc can create an inner-comm for Hypre. I think I'd prefer to attach it to the outer comm instead of the PETSc inner comm, but perhaps a case could be made either way. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sayosale at hotmail.com Fri Aug 20 06:55:29 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Fri, 20 Aug 2021 11:55:29 +0000 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: Dear Jose, Many thanks for your response, I have been investigating this issue with a few more calculations today, hence the slightly delayed response. The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original larger matrix but to lower accuracy. Results Small matrix (N= 21168) - everything good! This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target and also for a zero -eps_target. Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert I have just double checked again that the code does run properly when we use the -eps_largest_real option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the machine I was running it on. 
Furthermore the eigenvalues seem to be ballpark correct; for this large higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line with expectations. Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , whether or not I use a non-zero -eps_target. For reference this is the command line used : -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt To be precise the code crashes soon after calling EPSSolve (it successfully calls MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message as soon as EPSsolve is called. Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using -eps_largest_real ? The fact that the program works and produces correct results when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially more memory for example ? I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to get that working for the full-size problem. Many thanks and best wishes, Dan. ________________________________ From: Jose E. Roman Sent: Thursday, August 19, 2021 7:58 AM To: dazza simplythebest Cc: PETSc Subject: Re: [petsc-users] Improving efficiency of slepc usage In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 Jose > El 19 ago 2021, a las 7:12, dazza simplythebest escribi?: > > Dear All, > I am planning on using slepc to do a large number of eigenvalue calculations > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > and on smaller test problems everything is working well; the matrices are efficiently and > correctly constructed and slepc returns the correct spectrum. 
I am just now starting to move > towards now solving the full-size 'production run' problems, and would appreciate some > general advice on how to improve the solver's performance. > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > although in other cases I will also be interested in finding the eigenvalues whose real part > is close to zero. > > A) > Calling slepc 's EPS solver with the following options: > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > (examining the monitor output it did appear to be very slowly approaching convergence). > > B) > On the same problem I have also tried a shift-invert transformation using the options > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > In the case of A) I can see from watching the slepc videos that increasing ncv > may help, but I am wondering , since 600 is a large number of iterations, whether there > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > In the case of B), I guess there must be some mistake in these command line options? > Again, any advice will be greatly appreciated. > Best wishes, Dan. -------------- next part -------------- An HTML attachment was scrubbed... URL: From numbersixvs at gmail.com Fri Aug 20 03:02:22 2021 From: numbersixvs at gmail.com (=?UTF-8?B?0J3QsNC30LTRgNCw0YfRkdCyINCS0LjQutGC0L7RgA==?=) Date: Fri, 20 Aug 2021 11:02:22 +0300 Subject: [petsc-users] Euclid or Boomeramg vs ILU: questions. Message-ID: *Hello, dear PETSc team!* I have a 3D elasticity with heterogeneous properties problem. There is unstructured grid with aspect ratio varied from 4 to 25. Dirichlet BCs (bottom zero displacements) are imposed via linear constraint equations using Lagrange multipliers. Also, Neumann (traction) BCs are imposed on side edges of mesh. Gravity load is also accounted for. I can solve this problem with *dgmres solver* and *ILU* as a *preconditioner*. But ILU doesn`t support parallel computing, so I decided to use Euclid or Boomeramg as a preconditioner. The issue is in slow convergence and high memory consumption, much higher, than for ILU. E.g., for source matrix size 2.14 GB with *ILU-0 preconditioning* memory consumption is about 5.9 GB, and the process converges due to 767 iterations, and with *Euclid-0 preconditioning* memory consumption is about 8.7 GB, and the process converges due to 1732 iterations. One of the following preconditioners is currently in use: *ILU-0, ILU-1, Hypre (Euclid), Hypre (boomeramg)*. As a result of computations *(logs and memory logs are attached)*, the following is established for preconditioners: 1. *ILU-0*: does not always provide convergence (or provides, but slow); uses an acceptable amount of RAM; does not support parallel computing. 2. *ILU-1*: stable; memory consumption is much higher than that of ILU-0; does not support parallel computing. 3. 
*Euclid*: provides very slow convergence, calculations are performed several times slower than for ILU-0; memory consumption greatly exceeds both ILU-0 and ILU-1; supports parallel computing. Also ?drop tolerance? doesn?t provide enough accuracy in some cells, so I don?t use it. 4. *Boomeramg*: provides very slow convergence, calculations are performed several times slower than for ILU-0; memory consumption greatly exceeds both ILU-0 and ILU-1; supports parallel computing. In this regard, the following questions arose: 1. Is this behavior expected for HYPRE in computations with 1 MPI process? If not, is that problem can be related to *PETSc* or *HYPRE*? 2. Hypre (Euclid) has much fewer parameters than ILU. Among them is the factorization level *"-pc_hypre_euclid_level : Factorization levels (None)"* and its default value looks very strange, moreover, it doesn?t matter what factor is chosen -2, -1 or 0. Could it be that the parameter is confused with Column pivot tolerance in ILU - *"-pc_factor_column_pivot <-2.: -2.>: Column pivot tolerance (used only for some factorization) (PCFactorSetColumnPivot)"*? 3. What preconditioner would you recommend to: optimize *convergence*, *memory* consumption, add *parallel computing*? 4. How can we theoretically estimate memory costs with *ILU, Euclid, Boomeramg*? 5. At what stage are memory leaks most likely? In any case, thank you so much for your attention! Will be grateful for any response. Kind regards, Viktor Nazdrachev R&D senior researcher Geosteering Technologies LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: logs.rar Type: application/octet-stream Size: 90710 bytes Desc: not available URL: From joauma.marichal at uclouvain.be Fri Aug 20 03:33:03 2021 From: joauma.marichal at uclouvain.be (Joauma Marichal) Date: Fri, 20 Aug 2021 08:33:03 +0000 Subject: [petsc-users] Parallelize in the y direction Message-ID: Dear Sir or Madam, I am looking for advice regarding some of PETSc functionnalities. I am currently using PETSc to solve the Navier-Stokes equations on a 3D mesh decomposed over several processors. However, until now, the processors are distributed along the x and z directions but not along the y one. Indeed, at some point in the algorithm, I must solve a tridiagonal system that depends only on y. Until now, I have therefore performed something like this: for(int k = cornp->zs, kzs+cornp->zm; ++k){ for(int i = cornp->xs, ixs+cornp->xm; ++i){ Create and solve a tridiagonal system for all the y coordinates (which are on the same process) } However, I would like to decompose my mesh in the y direction (as this should improve the code efficiency). I managed to do so by creating a system based on the 3D DM of all my case (so 1 system of size x*y*z). Unfortunately, this does not seem to be very efficient. Do you have some advice on how to cut in the y direction while still being able to solve x*z systems of size y? Should I create 1D DMs? Thanks a lot for your help. Best regards, Joauma Marichal -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Fri Aug 20 08:58:45 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 20 Aug 2021 08:58:45 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: Feimi, if it is easy to reproduce, could you give instructions on how to reproduce that? PS: Spectrum MPI is based on OpenMPI. I don't understand why it has the problem but OpenMPI does not. It could be a bug in petsc or user's code. For reference counting on MPI_Comm, we already have petsc inner comm. I think we can reuse that. --Junchao Zhang On Fri, Aug 20, 2021 at 12:33 AM Barry Smith wrote: > > It sounds like maybe the Spectrum MPI_Comm_free() is not returning the > comm to the "pool" as available for future use; a very buggy MPI > implementation. This can easily be checked in a tiny standalone MPI program > that simply comm dups and frees thousands of times in a loop. Could even be > a configure test (that requires running an MPI program). I do not remember > if we ever tested this possibility; maybe and I forgot. > > If this is the problem we can provide a "work around" that attributes > the new comm (to be passed to hypre) to the old comm with a reference count > value also in the attribute. When the hypre matrix is created that count is > (with the new comm) is set to 1, when the hypre matrix is freed that count > is set to zero (but the comm is not freed), in the next call to create the > hypre matrix when the attribute is found, the count is zero so PETSc knows > it can pass the same comm again to the new hypre matrix. > > This will only allow one simultaneous hypre matrix to be created from the > original comm. To allow multiply simultaneous hypre matrix one could have > multiple comms and counts in the attribute and just check them until one > finds an available one to reuse (or creates yet another one if all the > current ones are busy with hypre matrices). So it is the same model as > DMGetXXVector() where vectors are checked out and then checked in to be > available later. This would solve the currently reported problem (if it is > a buggy MPI that does not properly free comms), but not solve the MOOSE > problem where 10,000 comms are needed at the same time. > > Barry > > > > > > On Aug 19, 2021, at 3:29 PM, Junchao Zhang > wrote: > > > > > On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu wrote: > >> Hi Jed, >> >> In my case, I only have 2 hypre preconditioners at the same time, and >> they do not solve simultaneously, so it might not be case 1. >> >> I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on >> my own machine (with OpenMPI), all the communicators are freed from my >> observation. I could not test it with Spectrum MPI on the clusters >> immediately because all the dependencies were built in release mode. >> However, as I mentioned, I haven't had this problem with OpenMPI before, >> so I'm not sure if this is really an MPI implementation problem, or just >> because Spectrum MPI has less limit for the number of communicators, >> and/or this also depends on how many MPI ranks are used, as only 2 out >> of 40 ranks reported the error. >> > You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two > ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are > paired. 
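Instead of a debugger, the MPI profiling layer can do the pairing check automatically: intercept MPI_Comm_dup and MPI_Comm_free, count them, and forward to the PMPI_ entry points. The following is only a sketch of that idea (the rank number 38 simply echoes the suggestion above); it is not part of PETSc or hypre.

    /* comm_trace.c: compile and link into the application (or build as a
       shared library and LD_PRELOAD it) so these wrappers shadow the MPI
       library's own symbols. The counters are not thread safe. */
    #include <mpi.h>
    #include <stdio.h>

    static int ndup = 0, nfree = 0;

    int MPI_Comm_dup(MPI_Comm comm, MPI_Comm *newcomm)
    {
      int rank;
      PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
      ndup++;
      if (rank == 38)
        printf("[rank %d] MPI_Comm_dup  #%d (outstanding: %d)\n", rank, ndup, ndup - nfree);
      return PMPI_Comm_dup(comm, newcomm);
    }

    int MPI_Comm_free(MPI_Comm *comm)
    {
      int rank;
      PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
      nfree++;
      if (rank == 38)
        printf("[rank %d] MPI_Comm_free #%d (outstanding: %d)\n", rank, nfree, ndup - nfree);
      return PMPI_Comm_free(comm);
    }

If the outstanding count stays small and bounded while the error still appears, that points at the implementation not recycling freed contexts rather than at unpaired dups in PETSc, hypre, or the application.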
> > As a workaround, I replaced the MPI_Comm_dup() at > >> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also >> removed the MPI_Comm_free() in the hypre destroyer. My code runs fine >> with Spectrum MPI now, but I don't think this is a long-term solution. >> >> Thanks! >> >> Feimi >> >> On 8/19/21 9:01 AM, Jed Brown wrote: >> > Junchao Zhang writes: >> > >> >> Hi, Feimi, >> >> I need to consult Jed (cc'ed). >> >> Jed, is this an example of >> >> >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 >> ? >> >> If Feimi really can not free matrices, then we just need to attach a >> >> hypre-comm to a petsc inner comm, and pass that to hypre. >> > Are there a bunch of solves as in that case? >> > >> > My understanding is that one should be able to >> MPI_Comm_dup/MPI_Comm_free as many times as you like, but the >> implementation has limits on how many communicators can co-exist at any one >> time. The many-at-once is what we encountered in that 2018 thread. >> > >> > One way to check would be to use a debugger or tracer to examine the >> stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. >> > >> > case 1: we'll find lots of dups without frees (until the end) because >> the user really wants lots of these existing at the same time. >> > >> > case 2: dups are unfreed because of reference counting >> issue/inessential references >> > >> > >> > In case 1, I think the solution is as outlined in the thread, PETSc can >> create an inner-comm for Hypre. I think I'd prefer to attach it to the >> outer comm instead of the PETSc inner comm, but perhaps a case could be >> made either way. >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 20 09:12:58 2021 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 20 Aug 2021 09:12:58 -0500 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest wrote: > Dear Jose, > Many thanks for your response, I have been investigating this issue > with a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow > an easier exploration of things > I first downsized the resolution of the underlying fluid solver while > keeping all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the > same physical problem as the original > larger matrix but to lower accuracy. > > *Results* > > *Small matrix (N= 21168) - everything good!* > This converged when using the -eps_largest_real approach (taking 92 > iterations for nev=10, > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert > approach, converging > very impressively in a single iteration ! Interestingly it did this both > for a non-zero -eps_target > and also for a zero -eps_target. > > *Large matrix (N=50400)- works for -eps_largest_real , fails for st_type > sinvert * > I have just double checked again that the code does run properly when we > use the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, > tol= -eps_tol 5.0e-4 , ncv = 300) > and with these parameters convergence was obtained in 164 iterations, > which took 6 hours on the > machine I was running it on. 
Furthermore the eigenvalues seem to be > ballpark correct; for this large > higher resolution case (although with lower slepc tolerance) we obtain > 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same > physical problem but at lower resolution case) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which > means the agreement is in line > with expectations. > > *Unfortunately though the code does still crash though when I try to do > shift-invert for the large matrix case *, > whether or not I use a non-zero -eps_target. For reference this is the > command line used : > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 > -st_type sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it > successfully calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and > EPSSetFromOptions). > By crashes I mean that I do not even get any error messages from > slepc/PETSC, and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran > 'KILLED BY SIGNAL: 9 (Killed)' message > as soon as EPSsolve is called. > Hi Dan, It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with -start_in_debugger which should launch the debugger (usually), and then type cont to continue, and then where to get the stack trace when it crashes, or 'bt' on lldb. Thanks, Matt > Do you have any ideas as to why this larger matrix case should fail when > using shift-invert but succeed when using > -eps_largest_real ? The fact that the program works and produces correct > results > when using the -eps_largest_real option suggests that there is probably > nothing wrong with the specification > of the problem or the matrices ? It is strange how there is no error > message from slepc / Petsc ... the > only idea I have at the moment is that perhaps max memory has been > exceeded, which could cause such a sudden > shutdown? For your reference when running the large matrix case with the > -eps_largest_real option I am using > about 36 GB of the 148GB available on this machine - does the shift > invert approach require substantially > more memory for example ? > > I would be very grateful if you have any suggestions to resolve this > issue or even ways to clarify it further, > the performance I have seen with the shift-invert for the small matrix is > so impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > ------------------------------ > *From:* Jose E. Roman > *Sent:* Thursday, August 19, 2021 7:58 AM > *To:* dazza simplythebest > *Cc:* PETSc > *Subject:* Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have > small magnitude. I would not say 600 iterations is a lot, you probably need > many more. In most cases, approach B) is better because it improves > convergence of eigenvalues close to the target, but it requires prior > knowledge of your spectrum distribution in order to choose an appropriate > target. > > In B) what do you mean that it crashes. 
If you get an error about > factorization, it means that your A-matrix is singular, In that case, try > using a nonzero target -eps_target 0.1 > > Jose > > > > El 19 ago 2021, a las 7:12, dazza simplythebest > escribi?: > > > > Dear All, > > I am planning on using slepc to do a large number of > eigenvalue calculations > > of a generalized eigenvalue problem, called from a program written in > fortran using MPI. > > Thus far I have successfully installed the slepc/PETSc software, both > locally and on a cluster, > > and on smaller test problems everything is working well; the matrices > are efficiently and > > correctly constructed and slepc returns the correct spectrum. I am just > now starting to move > > towards now solving the full-size 'production run' problems, and would > appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx > whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will > be tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues > with the largest real part, > > although in other cases I will also be interested in finding the > eigenvalues whose real part > > is close to zero. > > > > A) > > Calling slepc 's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol > 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any > eigenvalues within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly > approaching convergence). > > > > B) > > On the same problem I have also tried a shift-invert transformation > using the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > -in this case the code crashed at the point it tried to call slepc, so > perhaps I have incorrectly specified these options ? > > > > > > Does anyone have any suggestions as to how to improve this performance ( > or find out more about the problem) ? > > In the case of A) I can see from watching the slepc videos that > increasing ncv > > may help, but I am wondering , since 600 is a large number of > iterations, whether there > > maybe something else going on - e.g. perhaps some alternative > preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command > line options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Aug 20 09:14:51 2021 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 20 Aug 2021 09:14:51 -0500 Subject: [petsc-users] Parallelize in the y direction In-Reply-To: References: Message-ID: On Fri, Aug 20, 2021 at 7:53 AM Joauma Marichal < joauma.marichal at uclouvain.be> wrote: > Dear Sir or Madam, > > I am looking for advice regarding some of PETSc functionnalities. I am > currently using PETSc to solve the Navier-Stokes equations on a 3D mesh > decomposed over several processors. However, until now, the processors are > distributed along the x and z directions but not along the y one. 
Indeed, > at some point in the algorithm, I must solve a tridiagonal system that > depends only on y. Until now, I have therefore performed something like > this: > for(int k = cornp->zs, kzs+cornp->zm; ++k){ > for(int i = cornp->xs, ixs+cornp->xm; ++i){ > Create and solve a tridiagonal system for all the y coordinates > (which are on the same process) > } > However, I would like to decompose my mesh in the y direction (as this > should improve the code efficiency). > I managed to do so by creating a system based on the 3D DM of all my case > (so 1 system of size x*y*z). Unfortunately, this does not seem to be very > efficient. > Do you have some advice on how to cut in the y direction while still being > able to solve x*z systems of size y? Should I create 1D DMs? > 1) Are you using a 3D DMDA? 2) Is the coupling much different in the x and z than in the y direction? Thanks, Matt > Thanks a lot for your help. > > Best regards, > > Joauma Marichal > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhugp01 at nus.edu.sg Fri Aug 20 09:59:03 2021 From: zhugp01 at nus.edu.sg (Guangpu Zhu) Date: Fri, 20 Aug 2021 14:59:03 +0000 Subject: [petsc-users] Using Elemetal with petsc4py to solve AX = B paralelly Message-ID: Dear Sir/Madam, I am trying to use the petsc4py to solve AX = B parallelly, where A is a large dense matrix. The Elemental package in petsc4py is very suitable for the dense matrix, but I can't find any example or learning material about it on the PETSc website and other websites. I am writing this e-mail to ask if you can kindly provide a minimal example for solving a linear system based on Elemental with petsc4py. I am looking forward to hearing from you. Thank you very much. Best, Guangpu Zhu --- Guangpu Zhu (???) Research Associate, Department of Mechanical Engineering National University of Singapore Personal E-mail: zhugpupc at gmail.com Phone: (+65) 87581879 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Fri Aug 20 11:20:24 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Fri, 20 Aug 2021 18:20:24 +0200 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: <922F34AC-1EB5-4A63-B25F-11C0007BD372@dsic.upv.es> Maybe too much fill-in during factorization. Try using an external linear solver such as MUMPS as explained in section 3.4.1 of SLEPc's users manual. Jose > El 20 ago 2021, a las 16:12, Matthew Knepley escribi?: > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest wrote: > Dear Jose, > Many thanks for your response, I have been investigating this issue with a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things > I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original > larger matrix but to lower accuracy. > > Results > > Small matrix (N= 21168) - everything good! 
> This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging > very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target > and also for a zero -eps_target. > > Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert > I have just double checked again that the code does run properly when we use the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) > and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the > machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large > higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line > with expectations. > > Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , > whether or not I use a non-zero -eps_target. For reference this is the command line used : > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it successfully calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). > By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message > as soon as EPSsolve is called. > > Hi Dan, > > It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with > > -start_in_debugger > > which should launch the debugger (usually), and then type > > cont > > to continue, and then > > where > > to get the stack trace when it crashes, or 'bt' on lldb. > > Thanks, > > Matt > > Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using > -eps_largest_real ? The fact that the program works and produces correct results > when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification > of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the > only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden > shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using > about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially > more memory for example ? > > I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, > the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > From: Jose E. 
Roman > Sent: Thursday, August 19, 2021 7:58 AM > To: dazza simplythebest > Cc: PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. > > In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 > > Jose > > > > El 19 ago 2021, a las 7:12, dazza simplythebest escribi?: > > > > Dear All, > > I am planning on using slepc to do a large number of eigenvalue calculations > > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > > and on smaller test problems everything is working well; the matrices are efficiently and > > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > > towards now solving the full-size 'production run' problems, and would appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > > although in other cases I will also be interested in finding the eigenvalues whose real part > > is close to zero. > > > > A) > > Calling slepc 's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly approaching convergence). > > > > B) > > On the same problem I have also tried a shift-invert transformation using the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > > In the case of A) I can see from watching the slepc videos that increasing ncv > > may help, but I am wondering , since 600 is a large number of iterations, whether there > > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command line options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From mfadams at lbl.gov Fri Aug 20 13:21:29 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 20 Aug 2021 14:21:29 -0400 Subject: [petsc-users] Euclid or Boomeramg vs ILU: questions. 
In-Reply-To: References: Message-ID: Constraints are a pain with scalable/iterative solvers. If you order the constraints last then ILU should work as well as it can work, but AMG gets confused by the constraint equations. You could look at PETSc's Stokes solvers, but it would be best if you could remove the constrained equations from your system if they are just simple point wise BC's. Mark On Fri, Aug 20, 2021 at 8:53 AM ????????? ?????? wrote: > *Hello, dear PETSc team!* > > > > I have a 3D elasticity with heterogeneous properties problem. There is > unstructured grid with aspect ratio varied from 4 to 25. Dirichlet BCs > (bottom zero displacements) are imposed via linear constraint equations > using Lagrange multipliers. Also, Neumann (traction) BCs are imposed on > side edges of mesh. Gravity load is also accounted for. > > I can solve this problem with *dgmres solver* and *ILU* as a > *preconditioner*. But ILU doesn`t support parallel computing, so I > decided to use Euclid or Boomeramg as a preconditioner. The issue is in > slow convergence and high memory consumption, much higher, than for ILU. > > E.g., for source matrix size 2.14 GB with *ILU-0 preconditioning* memory > consumption is about 5.9 GB, and the process converges due to 767 > iterations, and with *Euclid-0 preconditioning* memory consumption is > about 8.7 GB, and the process converges due to 1732 iterations. > > One of the following preconditioners is currently in use: *ILU-0, ILU-1, > Hypre (Euclid), Hypre (boomeramg)*. > > As a result of computations *(logs and memory logs are attached)*, the > following is established for preconditioners: > > 1. *ILU-0*: does not always provide convergence (or provides, but slow); > uses an acceptable amount of RAM; does not support parallel computing. > > 2. *ILU-1*: stable; memory consumption is much higher than that of ILU-0; > does not support parallel computing. > > 3. *Euclid*: provides very slow convergence, calculations are performed > several times slower than for ILU-0; memory consumption greatly exceeds > both ILU-0 and ILU-1; supports parallel computing. Also ?drop tolerance? > doesn?t provide enough accuracy in some cells, so I don?t use it. > > 4. *Boomeramg*: provides very slow convergence, calculations are > performed several times slower than for ILU-0; memory consumption greatly > exceeds both ILU-0 and ILU-1; supports parallel computing. > > > > In this regard, the following questions arose: > > 1. Is this behavior expected for HYPRE in computations with 1 MPI process? > If not, is that problem can be related to *PETSc* or *HYPRE*? > > 2. Hypre (Euclid) has much fewer parameters than ILU. Among them is the > factorization level *"-pc_hypre_euclid_level : > Factorization levels (None)"* and its default value looks very strange, > moreover, it doesn?t matter what factor is chosen -2, -1 or 0. Could it be > that the parameter is confused with Column pivot tolerance in ILU - *"-pc_factor_column_pivot > <-2.: -2.>: Column pivot tolerance (used only for some factorization) > (PCFactorSetColumnPivot)"*? > > 3. What preconditioner would you recommend to: optimize *convergence*, > *memory* consumption, add *parallel computing*? > > 4. How can we theoretically estimate memory costs with *ILU, Euclid, > Boomeramg*? > > 5. At what stage are memory leaks most likely? > > > > In any case, thank you so much for your attention! Will be grateful for > any response. 
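To illustrate removing simple pointwise displacement BCs from the system rather than enforcing them with Lagrange multipliers, the usual PETSc pattern is to zero the constrained rows and columns of the assembled matrix, put 1 on the diagonal, and fix the right-hand side. A minimal sketch, assuming the global indices of the constrained dofs have already been gathered into rows[] (the names here are illustrative, not taken from the original code):

    #include <petscmat.h>

    /* Eliminate zero-displacement Dirichlet dofs from an assembled system
       A x = b instead of adding Lagrange multipliers. rows[] holds the
       global indices of the constrained dofs owned by this rank. */
    PetscErrorCode ApplyZeroDirichlet(Mat A, Vec b, PetscInt nbc, const PetscInt rows[])
    {
      Vec            ubc;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = VecDuplicate(b, &ubc);CHKERRQ(ierr);
      ierr = VecZeroEntries(ubc);CHKERRQ(ierr);   /* prescribed values are zero */
      /* zero the constrained rows and columns, put 1.0 on the diagonal,
         and adjust the right-hand side entries of those dofs */
      ierr = MatZeroRowsColumns(A, nbc, rows, 1.0, ubc, b);CHKERRQ(ierr);
      ierr = VecDestroy(&ubc);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

Zeroing the columns as well as the rows keeps the operator symmetric, which is generally friendlier to both incomplete-factorization and multigrid preconditioners than the unsymmetric constraint rows.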
> > Kind regards, > Viktor Nazdrachev > R&D senior researcher > Geosteering Technologies LLC > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Fri Aug 20 13:32:13 2021 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Fri, 20 Aug 2021 11:32:13 -0700 Subject: [petsc-users] Euclid or Boomeramg vs ILU: questions. In-Reply-To: References: Message-ID: <15e3a0b9-b16b-8a59-5801-bef9895f762d@berkeley.edu> Mark's suggestion will definitely help a lot.? Remove the displacement bc equations or include them in the matrix by zeroing out the row and putting a 1 on the diagonal.? The Lagrange multiplier will cause grief. On 8/20/21 11:21 AM, Mark Adams wrote: > Constraints are a pain with scalable/iterative solvers. If you order > the constraints last then ILU should work as well as it can work,?but > AMG gets confused by the constraint equations. > You could look at PETSc's?Stokes solvers, but it would be best if you > could remove the constrained equations from your system if they are > just simple point wise BC's. > Mark > > On Fri, Aug 20, 2021 at 8:53 AM ????????? ?????? > > wrote: > > *Hello, dear PETSc team!* > > I have a 3D elasticity with heterogeneous properties problem. > There is unstructured grid with aspect ratio varied from 4 to 25. > Dirichlet BCs (bottom zero displacements) are imposed via linear > constraint equations using Lagrange multipliers. Also, Neumann > (traction) BCs are imposed on side edges of mesh. Gravity load is > also accounted for. > > I can solve this problem with *dgmres solver*?and *ILU*?as a > *preconditioner*. But ILU doesn`t support parallel computing, so I > decided to use Euclid or Boomeramg as a preconditioner. The issue > is in slow convergence and high memory consumption, much higher, > than for ILU. > > E.g., for source matrix size 2.14 GB with *ILU-0 > preconditioning*?memory consumption is about 5.9 GB, and the > process converges due to 767 iterations, and with *Euclid-0 > preconditioning*?memory consumption is about 8.7 GB, and the > process converges due to 1732 iterations. > > One of the following preconditioners is currently in use: *ILU-0, > ILU-1, Hypre (Euclid), Hypre (boomeramg)*. > > As a result of computations */(logs and memory logs are > attached)/*, the following is established for preconditioners: > > 1. *ILU-0*: does not always provide convergence (or provides, but > slow); uses an acceptable amount of RAM; does not support parallel > computing. > > 2. *ILU-1*: stable; memory consumption is much higher than that of > ILU-0; does not support parallel computing. > > 3. *Euclid*: provides very slow convergence, calculations are > performed several times slower than for ILU-0; memory consumption > greatly exceeds both ILU-0 and ILU-1; supports parallel computing. > Also ?drop tolerance? doesn?t provide enough accuracy in some > cells, so I don?t use it. > > 4. *Boomeramg*: provides very slow convergence, calculations are > performed several times slower than for ILU-0; memory consumption > greatly exceeds both ILU-0 and ILU-1; supports parallel computing. > > In this regard, the following questions arose: > > 1. Is this behavior expected for HYPRE in computations with 1 MPI > process? If not, is that problem can be related to *PETSc*?or *HYPRE*? > > 2. Hypre (Euclid) has much fewer parameters than ILU. 
Among them > is the factorization level *"-pc_hypre_euclid_level formerly -2>: Factorization levels (None)"*?and its default value > looks very strange, moreover, it doesn?t matter what factor is > chosen -2, -1 or 0. Could it be that the parameter is confused > with Column pivot tolerance in ILU - *"-pc_factor_column_pivot > <-2.: -2.>: Column pivot tolerance (used only for some > factorization) (PCFactorSetColumnPivot)"*? > > 3. What preconditioner would you recommend to: optimize > *convergence*, *memory*?consumption, add *parallel computing*? > > 4. How can we theoretically estimate memory costs with *ILU, > Euclid, Boomeramg*? > > 5. At what stage are memory leaks most likely? > > In any case, thank you so much for your attention! Will be > grateful for any response. > > Kind regards, > Viktor Nazdrachev > R&D senior researcher > Geosteering Technologies LLC > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yuf2 at rpi.edu Fri Aug 20 13:54:22 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Fri, 20 Aug 2021 14:54:22 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: <59eab54c-ad95-c305-e233-f0f39d613011@rpi.edu> Hi Barry and Junchao, Actually I did a simple MPI "dup and free" test before with Spectrum MPI, but that one did not have any problem. I'm not a PETSc programmer as I mainly use deal.ii's PETSc wrappers, but I managed to write a minimal program based on petsc/src/mat/tests/ex98.c to reproduce my problem. This piece of code creates and destroys 10,000 instances of Hypre Parasail preconditioners (for my own code, it uses Euclid, but I don't think it matters). It runs fine with OpenMPI but reports the out of communicator error with Sepctrum MPI. The code is attached in the email. In case the attachment is not available, I also uploaded a copy on my google drive: https://drive.google.com/drive/folders/1DCf7lNlks8GjazvoP7c211ojNHLwFKL6?usp=sharing Thanks! Feimi On 8/20/21 9:58 AM, Junchao Zhang wrote: > Feimi, if it is easy to reproduce, could you give instructions on how > to reproduce that? > > PS: Spectrum MPI is based on OpenMPI.? I don't understand why it has > the problem but OpenMPI does not.? It could be a bug in petsc or > user's code.? For reference counting on MPI_Comm, we already have > petsc inner comm. I think we can reuse that. > > --Junchao Zhang > > > On Fri, Aug 20, 2021 at 12:33 AM Barry Smith > wrote: > > > ? It sounds like maybe the Spectrum MPI_Comm_free() is not > returning the comm to the "pool" as available for future use; a > very buggy MPI implementation. This can easily be checked in a > tiny standalone MPI program that simply comm dups and frees > thousands of times in a loop. Could even be a configure test (that > requires running an MPI program). I do not remember if we ever > tested this possibility; maybe and I forgot. > > ? If this is the problem we can provide a "work around" that > attributes the new comm (to be passed to hypre) to the old comm > with a reference count value also in the attribute. 
When the hypre > matrix is created that count is (with the new comm) is set to 1, > when the hypre matrix is freed that count is set to zero (but the > comm is not freed), in the next call to create the hypre matrix > when the attribute is found, the count is zero so PETSc knows it > can pass the same comm again to the new hypre matrix. > > This will only allow one simultaneous hypre matrix to be created > from the original comm. To allow multiply simultaneous hypre > matrix one could have multiple comms and counts in the attribute > and just check them until one finds an available one to reuse (or > creates yet another one if all the current ones are busy with > hypre matrices). So it is the same model as DMGetXXVector() where > vectors are checked out and then checked in to be available later. > This would solve the currently reported problem (if it is a buggy > MPI that does not properly free comms), but not solve the MOOSE > problem where 10,000 comms are needed at the same time. > > ? Barry > > > > > >> On Aug 19, 2021, at 3:29 PM, Junchao Zhang >> > wrote: >> >> >> >> >> On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu > > wrote: >> >> Hi Jed, >> >> In my case, I only have 2 hypre preconditioners at the same >> time, and >> they do not solve simultaneously, so it might not be case 1. >> >> I checked the stack for all the calls of >> MPI_Comm_dup/MPI_Comm_free on >> my own machine (with OpenMPI), all the communicators are >> freed from my >> observation. I could not test it with Spectrum MPI on the >> clusters >> immediately because all the dependencies were built in >> release mode. >> However, as I mentioned, I haven't had this problem with >> OpenMPI before, >> so I'm not sure if this is really an MPI implementation >> problem, or just >> because Spectrum MPI has less limit for the number of >> communicators, >> and/or this also depends on how many MPI ranks are used, as >> only 2 out >> of 40 ranks reported the error. >> >> You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the >> two ranks, e.g., if (myrank == 38) printf(...), to see if the >> dup/free are paired. >> ?As a workaround, I replaced the MPI_Comm_dup() at >> >> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy >> assignment, and also >> removed the MPI_Comm_free() in the hypre destroyer. My code >> runs fine >> with Spectrum MPI now, but I don't think this is a long-term >> solution. >> >> Thanks! >> >> Feimi >> >> On 8/19/21 9:01 AM, Jed Brown wrote: >> > Junchao Zhang > > writes: >> > >> >> Hi, Feimi, >> >>? ? I need to consult Jed (cc'ed). >> >>? ? Jed, is this an example of >> >> >> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 >> ? >> >> If Feimi really can not free matrices, then we just need >> to attach a >> >> hypre-comm to a petsc inner comm, and pass that to hypre. >> > Are there a bunch of solves as in that case? >> > >> > My understanding is that one should be able to >> MPI_Comm_dup/MPI_Comm_free as many times as you like, but the >> implementation has limits on how many communicators can >> co-exist at any one time. The many-at-once is what we >> encountered in that 2018 thread. >> > >> > One way to check would be to use a debugger or tracer to >> examine the stack every time (P)MPI_Comm_dup and >> (P)MPI_Comm_free are called. >> > >> > case 1: we'll find lots of dups without frees (until the >> end) because the user really wants lots of these existing at >> the same time. 
>> > >> > case 2: dups are unfreed because of reference counting >> issue/inessential references >> > >> > >> > In case 1, I think the solution is as outlined in the >> thread, PETSc can create an inner-comm for Hypre. I think I'd >> prefer to attach it to the outer comm instead of the PETSc >> inner comm, but perhaps a case could be made either way. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hypre_precon_test.cpp Type: text/x-c++src Size: 3422 bytes Desc: not available URL: From yuf2 at rpi.edu Fri Aug 20 14:02:30 2021 From: yuf2 at rpi.edu (Feimi Yu) Date: Fri, 20 Aug 2021 15:02:30 -0400 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <59eab54c-ad95-c305-e233-f0f39d613011@rpi.edu> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> <59eab54c-ad95-c305-e233-f0f39d613011@rpi.edu> Message-ID: <21c1bf50-2e96-b924-529d-b1e78a319f78@rpi.edu> Sorry, I forgot to destroy the matrix after the loop, but anyway, the in-loop preconditioners are destroyed. Updated the code here and the google drive. Feimi On 8/20/21 2:54 PM, Feimi Yu wrote: > > Hi Barry and Junchao, > > Actually I did a simple MPI "dup and free" test before with Spectrum > MPI, but that one did not have any problem. I'm not a PETSc programmer > as I mainly use deal.ii's PETSc wrappers, but I managed to write a > minimal program based on petsc/src/mat/tests/ex98.c to reproduce my > problem. This piece of code creates and destroys 10,000 instances of > Hypre Parasail preconditioners (for my own code, it uses Euclid, but I > don't think it matters). It runs fine with OpenMPI but reports the out > of communicator error with Sepctrum MPI. The code is attached in the > email. In case the attachment is not available, I also uploaded a copy > on my google drive: > > https://drive.google.com/drive/folders/1DCf7lNlks8GjazvoP7c211ojNHLwFKL6?usp=sharing > > Thanks! > > Feimi > > On 8/20/21 9:58 AM, Junchao Zhang wrote: >> Feimi, if it is easy to reproduce, could you give instructions on how >> to reproduce that? >> >> PS: Spectrum MPI is based on OpenMPI.? I don't understand why it has >> the problem but OpenMPI does not.? It could be a bug in petsc or >> user's code.? For reference counting on MPI_Comm, we already have >> petsc inner comm. I think we can reuse that. >> >> --Junchao Zhang >> >> >> On Fri, Aug 20, 2021 at 12:33 AM Barry Smith > > wrote: >> >> >> ? It sounds like maybe the Spectrum MPI_Comm_free() is not >> returning the comm to the "pool" as available for future use; a >> very buggy MPI implementation. This can easily be checked in a >> tiny standalone MPI program that simply comm dups and frees >> thousands of times in a loop. Could even be a configure test >> (that requires running an MPI program). I do not remember if we >> ever tested this possibility; maybe and I forgot. >> >> ? If this is the problem we can provide a "work around" that >> attributes the new comm (to be passed to hypre) to the old comm >> with a reference count value also in the attribute. 
When the >> hypre matrix is created that count is (with the new comm) is set >> to 1, when the hypre matrix is freed that count is set to zero >> (but the comm is not freed), in the next call to create the hypre >> matrix when the attribute is found, the count is zero so PETSc >> knows it can pass the same comm again to the new hypre matrix. >> >> This will only allow one simultaneous hypre matrix to be created >> from the original comm. To allow multiply simultaneous hypre >> matrix one could have multiple comms and counts in the attribute >> and just check them until one finds an available one to reuse (or >> creates yet another one if all the current ones are busy with >> hypre matrices). So it is the same model as DMGetXXVector() where >> vectors are checked out and then checked in to be available >> later. This would solve the currently reported problem (if it is >> a buggy MPI that does not properly free comms), but not solve the >> MOOSE problem where 10,000 comms are needed at the same time. >> >> ? Barry >> >> >> >> >> >>> On Aug 19, 2021, at 3:29 PM, Junchao Zhang >>> > wrote: >>> >>> >>> >>> >>> On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu >> > wrote: >>> >>> Hi Jed, >>> >>> In my case, I only have 2 hypre preconditioners at the same >>> time, and >>> they do not solve simultaneously, so it might not be case 1. >>> >>> I checked the stack for all the calls of >>> MPI_Comm_dup/MPI_Comm_free on >>> my own machine (with OpenMPI), all the communicators are >>> freed from my >>> observation. I could not test it with Spectrum MPI on the >>> clusters >>> immediately because all the dependencies were built in >>> release mode. >>> However, as I mentioned, I haven't had this problem with >>> OpenMPI before, >>> so I'm not sure if this is really an MPI implementation >>> problem, or just >>> because Spectrum MPI has less limit for the number of >>> communicators, >>> and/or this also depends on how many MPI ranks are used, as >>> only 2 out >>> of 40 ranks reported the error. >>> >>> You can add printf around MPI_Comm_dup/MPI_Comm_free sites on >>> the two ranks, e.g., if (myrank == 38) printf(...), to see if >>> the dup/free are paired. >>> ?As a workaround, I replaced the MPI_Comm_dup() at >>> >>> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy >>> assignment, and also >>> removed the MPI_Comm_free() in the hypre destroyer. My code >>> runs fine >>> with Spectrum MPI now, but I don't think this is a long-term >>> solution. >>> >>> Thanks! >>> >>> Feimi >>> >>> On 8/19/21 9:01 AM, Jed Brown wrote: >>> > Junchao Zhang >> > writes: >>> > >>> >> Hi, Feimi, >>> >>? ? I need to consult Jed (cc'ed). >>> >>? ? Jed, is this an example of >>> >> >>> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 >>> ? >>> >> If Feimi really can not free matrices, then we just need >>> to attach a >>> >> hypre-comm to a petsc inner comm, and pass that to hypre. >>> > Are there a bunch of solves as in that case? >>> > >>> > My understanding is that one should be able to >>> MPI_Comm_dup/MPI_Comm_free as many times as you like, but >>> the implementation has limits on how many communicators can >>> co-exist at any one time. The many-at-once is what we >>> encountered in that 2018 thread. >>> > >>> > One way to check would be to use a debugger or tracer to >>> examine the stack every time (P)MPI_Comm_dup and >>> (P)MPI_Comm_free are called. 
>>> > >>> > case 1: we'll find lots of dups without frees (until the >>> end) because the user really wants lots of these existing at >>> the same time. >>> > >>> > case 2: dups are unfreed because of reference counting >>> issue/inessential references >>> > >>> > >>> > In case 1, I think the solution is as outlined in the >>> thread, PETSc can create an inner-comm for Hypre. I think >>> I'd prefer to attach it to the outer comm instead of the >>> PETSc inner comm, but perhaps a case could be made either way. >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hypre_precon_test.cpp Type: text/x-c++src Size: 3545 bytes Desc: not available URL: From elbueler at alaska.edu Fri Aug 20 14:11:36 2021 From: elbueler at alaska.edu (Ed Bueler) Date: Fri, 20 Aug 2021 11:11:36 -0800 Subject: [petsc-users] Euclid or Boomeramg vs ILU: questions. Message-ID: Viktor -- As a basic comment, note that ILU can be used in parallel, namely on each processor block, by either non-overlapping domain decomposition: -pc_type bjacobi -sub_pc_type ilu or with overlap: -pc_type asm -sub_pc_type ilu See the discussion of block Jacobi and ASM at https://petsc.org/release/docs/manual/ksp/#block-jacobi-and-overlapping-additive-schwarz-preconditioners Of course, no application of ILU will be generating optimal performance, but it looks like you are not yet getting that from AMG either. Ed On Fri, Aug 20, 2021 at 8:53 AM ????????? ?????? wrote: > *Hello, dear PETSc team!* > > > > I have a 3D elasticity with heterogeneous properties problem. There is > unstructured grid with aspect ratio varied from 4 to 25. Dirichlet BCs > (bottom zero displacements) are imposed via linear constraint equations > using Lagrange multipliers. Also, Neumann (traction) BCs are imposed on > side edges of mesh. Gravity load is also accounted for. > > I can solve this problem with *dgmres solver* and *ILU* as a > *preconditioner*. But ILU doesn`t support parallel computing, so I > decided to use Euclid or Boomeramg as a preconditioner. The issue is in > slow convergence and high memory consumption, much higher, than for ILU. > > E.g., for source matrix size 2.14 GB with *ILU-0 preconditioning* memory > consumption is about 5.9 GB, and the process converges due to 767 > iterations, and with *Euclid-0 preconditioning* memory consumption is > about 8.7 GB, and the process converges due to 1732 iterations. > > One of the following preconditioners is currently in use: *ILU-0, ILU-1, > Hypre (Euclid), Hypre (boomeramg)*. > > As a result of computations *(logs and memory logs are attached)*, the > following is established for preconditioners: > > 1. *ILU-0*: does not always provide convergence (or provides, but slow); > uses an acceptable amount of RAM; does not support parallel computing. > > 2. *ILU-1*: stable; memory consumption is much higher than that of ILU-0; > does not support parallel computing. > > 3. *Euclid*: provides very slow convergence, calculations are performed > several times slower than for ILU-0; memory consumption greatly exceeds > both ILU-0 and ILU-1; supports parallel computing. Also ?drop tolerance? > doesn?t provide enough accuracy in some cells, so I don?t use it. > > 4. *Boomeramg*: provides very slow convergence, calculations are > performed several times slower than for ILU-0; memory consumption greatly > exceeds both ILU-0 and ILU-1; supports parallel computing. 
> > > > In this regard, the following questions arose: > > 1. Is this behavior expected for HYPRE in computations with 1 MPI process? > If not, is that problem can be related to *PETSc* or *HYPRE*? > > 2. Hypre (Euclid) has much fewer parameters than ILU. Among them is the > factorization level *"-pc_hypre_euclid_level : > Factorization levels (None)"* and its default value looks very strange, > moreover, it doesn?t matter what factor is chosen -2, -1 or 0. Could it be > that the parameter is confused with Column pivot tolerance in ILU - *"-pc_factor_column_pivot > <-2.: -2.>: Column pivot tolerance (used only for some factorization) > (PCFactorSetColumnPivot)"*? > > 3. What preconditioner would you recommend to: optimize *convergence*, > *memory* consumption, add *parallel computing*? > > 4. How can we theoretically estimate memory costs with *ILU, Euclid, > Boomeramg*? > > 5. At what stage are memory leaks most likely? > > > > In any case, thank you so much for your attention! Will be grateful for > any response. > > Kind regards, > Viktor Nazdrachev > R&D senior researcher > Geosteering Technologies LLC -- Ed Bueler Dept of Mathematics and Statistics University of Alaska Fairbanks Fairbanks, AK 99775-6660 306C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Aug 20 16:14:11 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 20 Aug 2021 16:14:11 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: <21c1bf50-2e96-b924-529d-b1e78a319f78@rpi.edu> References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> <59eab54c-ad95-c305-e233-f0f39d613011@rpi.edu> <21c1bf50-2e96-b924-529d-b1e78a319f78@rpi.edu> Message-ID: Feimi, I'm able to reproduce the problem. I will have a look. Thanks a lot for the example. --Junchao Zhang On Fri, Aug 20, 2021 at 2:02 PM Feimi Yu wrote: > Sorry, I forgot to destroy the matrix after the loop, but anyway, the > in-loop preconditioners are destroyed. Updated the code here and the google > drive. > > Feimi > On 8/20/21 2:54 PM, Feimi Yu wrote: > > Hi Barry and Junchao, > > Actually I did a simple MPI "dup and free" test before with Spectrum MPI, > but that one did not have any problem. I'm not a PETSc programmer as I > mainly use deal.ii's PETSc wrappers, but I managed to write a minimal > program based on petsc/src/mat/tests/ex98.c to reproduce my problem. This > piece of code creates and destroys 10,000 instances of Hypre Parasail > preconditioners (for my own code, it uses Euclid, but I don't think it > matters). It runs fine with OpenMPI but reports the out of communicator > error with Sepctrum MPI. The code is attached in the email. In case the > attachment is not available, I also uploaded a copy on my google drive: > > > https://drive.google.com/drive/folders/1DCf7lNlks8GjazvoP7c211ojNHLwFKL6?usp=sharing > > Thanks! > > Feimi > On 8/20/21 9:58 AM, Junchao Zhang wrote: > > Feimi, if it is easy to reproduce, could you give instructions on how to > reproduce that? > > PS: Spectrum MPI is based on OpenMPI. I don't understand why it has the > problem but OpenMPI does not. It could be a bug in petsc or user's code. > For reference counting on MPI_Comm, we already have petsc inner comm. I > think we can reuse that. 
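A minimal sketch of that check-out / check-in attribute scheme, in the spirit of what Barry describes below (the struct and function names here are made up for illustration; only the MPI attribute-caching calls are real API):

#include <mpi.h>
#include <stdlib.h>

/* One inner comm is dup'ed a single time, cached on the outer comm as an
 * attribute together with a busy flag, and handed to hypre repeatedly,
 * instead of doing MPI_Comm_dup()/MPI_Comm_free() for every hypre matrix. */
typedef struct { MPI_Comm inner; int in_use; } InnerComm;
static int keyval = MPI_KEYVAL_INVALID;

static MPI_Comm InnerCommCheckout(MPI_Comm outer)
{
  InnerComm *c; int found;
  if (keyval == MPI_KEYVAL_INVALID)
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN, &keyval, NULL);
  MPI_Comm_get_attr(outer, keyval, &c, &found);
  if (!found) {                       /* first use on this comm: dup exactly once */
    c = (InnerComm *)malloc(sizeof(*c));
    MPI_Comm_dup(outer, &c->inner);
    c->in_use = 0;
    MPI_Comm_set_attr(outer, keyval, c);
  }
  c->in_use = 1;                      /* mark busy; a fuller version would keep a list of comms */
  return c->inner;
}

static void InnerCommCheckin(MPI_Comm outer)
{
  InnerComm *c; int found;
  MPI_Comm_get_attr(outer, keyval, &c, &found);
  if (found) c->in_use = 0;           /* release for reuse; the comm itself is never freed */
}

Whether such an attribute should live on the user's outer comm or on PETSc's existing inner comm is exactly the design question being discussed in the quoted messages below.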
> > --Junchao Zhang > > > On Fri, Aug 20, 2021 at 12:33 AM Barry Smith wrote: > >> >> It sounds like maybe the Spectrum MPI_Comm_free() is not returning the >> comm to the "pool" as available for future use; a very buggy MPI >> implementation. This can easily be checked in a tiny standalone MPI program >> that simply comm dups and frees thousands of times in a loop. Could even be >> a configure test (that requires running an MPI program). I do not remember >> if we ever tested this possibility; maybe and I forgot. >> >> If this is the problem we can provide a "work around" that attributes >> the new comm (to be passed to hypre) to the old comm with a reference count >> value also in the attribute. When the hypre matrix is created that count is >> (with the new comm) is set to 1, when the hypre matrix is freed that count >> is set to zero (but the comm is not freed), in the next call to create the >> hypre matrix when the attribute is found, the count is zero so PETSc knows >> it can pass the same comm again to the new hypre matrix. >> >> This will only allow one simultaneous hypre matrix to be created from the >> original comm. To allow multiply simultaneous hypre matrix one could have >> multiple comms and counts in the attribute and just check them until one >> finds an available one to reuse (or creates yet another one if all the >> current ones are busy with hypre matrices). So it is the same model as >> DMGetXXVector() where vectors are checked out and then checked in to be >> available later. This would solve the currently reported problem (if it is >> a buggy MPI that does not properly free comms), but not solve the MOOSE >> problem where 10,000 comms are needed at the same time. >> >> Barry >> >> >> >> >> >> On Aug 19, 2021, at 3:29 PM, Junchao Zhang >> wrote: >> >> >> >> >> On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu wrote: >> >>> Hi Jed, >>> >>> In my case, I only have 2 hypre preconditioners at the same time, and >>> they do not solve simultaneously, so it might not be case 1. >>> >>> I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on >>> my own machine (with OpenMPI), all the communicators are freed from my >>> observation. I could not test it with Spectrum MPI on the clusters >>> immediately because all the dependencies were built in release mode. >>> However, as I mentioned, I haven't had this problem with OpenMPI before, >>> so I'm not sure if this is really an MPI implementation problem, or just >>> because Spectrum MPI has less limit for the number of communicators, >>> and/or this also depends on how many MPI ranks are used, as only 2 out >>> of 40 ranks reported the error. >>> >> You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two >> ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are >> paired. >> >> As a workaround, I replaced the MPI_Comm_dup() at >> >>> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and also >>> removed the MPI_Comm_free() in the hypre destroyer. My code runs fine >>> with Spectrum MPI now, but I don't think this is a long-term solution. >>> >>> Thanks! >>> >>> Feimi >>> >>> On 8/19/21 9:01 AM, Jed Brown wrote: >>> > Junchao Zhang writes: >>> > >>> >> Hi, Feimi, >>> >> I need to consult Jed (cc'ed). >>> >> Jed, is this an example of >>> >> >>> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 >>> ? >>> >> If Feimi really can not free matrices, then we just need to attach a >>> >> hypre-comm to a petsc inner comm, and pass that to hypre. 
>>> > Are there a bunch of solves as in that case? >>> > >>> > My understanding is that one should be able to >>> MPI_Comm_dup/MPI_Comm_free as many times as you like, but the >>> implementation has limits on how many communicators can co-exist at any one >>> time. The many-at-once is what we encountered in that 2018 thread. >>> > >>> > One way to check would be to use a debugger or tracer to examine the >>> stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. >>> > >>> > case 1: we'll find lots of dups without frees (until the end) because >>> the user really wants lots of these existing at the same time. >>> > >>> > case 2: dups are unfreed because of reference counting >>> issue/inessential references >>> > >>> > >>> > In case 1, I think the solution is as outlined in the thread, PETSc >>> can create an inner-comm for Hypre. I think I'd prefer to attach it to the >>> outer comm instead of the PETSc inner comm, but perhaps a case could be >>> made either way. >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 20 16:17:12 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 20 Aug 2021 16:17:12 -0500 Subject: [petsc-users] Parallelize in the y direction In-Reply-To: References: Message-ID: <8FDFD94E-155D-475D-A8EA-BD459A014B2A@petsc.dev> Trying to solve many "one-dimensional" problems each in parallel on different subset of ranks will be massive pain to do specifically. I recommend just forming a single matrix for all these systems and solving it with KSPSolve and block Jacobi preconditioning or even a parallel direct solver such as with -pc_type lu -pc_factor_mat_solver_type mumps Barry Yes, this single system, in a certain ordering is block diagonal (each block being tridiagonal) so contains "independent" subsystems; what I suggest above essentially takes advantage of this structure to be reasonably efficient, yet trivial to code. > On Aug 20, 2021, at 9:14 AM, Matthew Knepley wrote: > > On Fri, Aug 20, 2021 at 7:53 AM Joauma Marichal > wrote: > Dear Sir or Madam, > > I am looking for advice regarding some of PETSc functionnalities. I am currently using PETSc to solve the Navier-Stokes equations on a 3D mesh decomposed over several processors. However, until now, the processors are distributed along the x and z directions but not along the y one. Indeed, at some point in the algorithm, I must solve a tridiagonal system that depends only on y. Until now, I have therefore performed something like this: > for(int k = cornp->zs, kzs+cornp->zm; ++k){ > for(int i = cornp->xs, ixs+cornp->xm; ++i){ > Create and solve a tridiagonal system for all the y coordinates (which are on the same process) > } > However, I would like to decompose my mesh in the y direction (as this should improve the code efficiency). > I managed to do so by creating a system based on the 3D DM of all my case (so 1 system of size x*y*z). Unfortunately, this does not seem to be very efficient. > Do you have some advice on how to cut in the y direction while still being able to solve x*z systems of size y? Should I create 1D DMs? > > 1) Are you using a 3D DMDA? > > 2) Is the coupling much different in the x and z than in the y direction? > > Thanks, > > Matt > > Thanks a lot for your help. > > Best regards, > > Joauma Marichal > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Sat Aug 21 10:30:46 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Sat, 21 Aug 2021 10:30:46 -0500 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> <59eab54c-ad95-c305-e233-f0f39d613011@rpi.edu> <21c1bf50-2e96-b924-529d-b1e78a319f78@rpi.edu> Message-ID: I checked and found MPI_Comm_dup() and MPI_Comm_free() were called in pairs. So the MPI runtime should not complain about running out of resources. I guess there might be pending communications on communicators. But I've no means to know exactly. Per MPI manual, MPI_Comm_free() only marks a communicator object for deallocation. We can file a bug report to OLCF. With MPI source code, it should be easy for them to debug. --Junchao Zhang On Fri, Aug 20, 2021 at 4:14 PM Junchao Zhang wrote: > Feimi, > I'm able to reproduce the problem. I will have a look. Thanks a lot for > the example. > --Junchao Zhang > > > On Fri, Aug 20, 2021 at 2:02 PM Feimi Yu wrote: > >> Sorry, I forgot to destroy the matrix after the loop, but anyway, the >> in-loop preconditioners are destroyed. Updated the code here and the google >> drive. >> >> Feimi >> On 8/20/21 2:54 PM, Feimi Yu wrote: >> >> Hi Barry and Junchao, >> >> Actually I did a simple MPI "dup and free" test before with Spectrum MPI, >> but that one did not have any problem. I'm not a PETSc programmer as I >> mainly use deal.ii's PETSc wrappers, but I managed to write a minimal >> program based on petsc/src/mat/tests/ex98.c to reproduce my problem. This >> piece of code creates and destroys 10,000 instances of Hypre Parasail >> preconditioners (for my own code, it uses Euclid, but I don't think it >> matters). It runs fine with OpenMPI but reports the out of communicator >> error with Sepctrum MPI. The code is attached in the email. In case the >> attachment is not available, I also uploaded a copy on my google drive: >> >> >> https://drive.google.com/drive/folders/1DCf7lNlks8GjazvoP7c211ojNHLwFKL6?usp=sharing >> >> Thanks! >> >> Feimi >> On 8/20/21 9:58 AM, Junchao Zhang wrote: >> >> Feimi, if it is easy to reproduce, could you give instructions on how to >> reproduce that? >> >> PS: Spectrum MPI is based on OpenMPI. I don't understand why it has the >> problem but OpenMPI does not. It could be a bug in petsc or user's code. >> For reference counting on MPI_Comm, we already have petsc inner comm. I >> think we can reuse that. >> >> --Junchao Zhang >> >> >> On Fri, Aug 20, 2021 at 12:33 AM Barry Smith wrote: >> >>> >>> It sounds like maybe the Spectrum MPI_Comm_free() is not returning the >>> comm to the "pool" as available for future use; a very buggy MPI >>> implementation. This can easily be checked in a tiny standalone MPI program >>> that simply comm dups and frees thousands of times in a loop. Could even be >>> a configure test (that requires running an MPI program). I do not remember >>> if we ever tested this possibility; maybe and I forgot. >>> >>> If this is the problem we can provide a "work around" that attributes >>> the new comm (to be passed to hypre) to the old comm with a reference count >>> value also in the attribute. 
When the hypre matrix is created that count is >>> (with the new comm) is set to 1, when the hypre matrix is freed that count >>> is set to zero (but the comm is not freed), in the next call to create the >>> hypre matrix when the attribute is found, the count is zero so PETSc knows >>> it can pass the same comm again to the new hypre matrix. >>> >>> This will only allow one simultaneous hypre matrix to be created from >>> the original comm. To allow multiply simultaneous hypre matrix one could >>> have multiple comms and counts in the attribute and just check them until >>> one finds an available one to reuse (or creates yet another one if all the >>> current ones are busy with hypre matrices). So it is the same model as >>> DMGetXXVector() where vectors are checked out and then checked in to be >>> available later. This would solve the currently reported problem (if it is >>> a buggy MPI that does not properly free comms), but not solve the MOOSE >>> problem where 10,000 comms are needed at the same time. >>> >>> Barry >>> >>> >>> >>> >>> >>> On Aug 19, 2021, at 3:29 PM, Junchao Zhang >>> wrote: >>> >>> >>> >>> >>> On Thu, Aug 19, 2021 at 2:08 PM Feimi Yu wrote: >>> >>>> Hi Jed, >>>> >>>> In my case, I only have 2 hypre preconditioners at the same time, and >>>> they do not solve simultaneously, so it might not be case 1. >>>> >>>> I checked the stack for all the calls of MPI_Comm_dup/MPI_Comm_free on >>>> my own machine (with OpenMPI), all the communicators are freed from my >>>> observation. I could not test it with Spectrum MPI on the clusters >>>> immediately because all the dependencies were built in release mode. >>>> However, as I mentioned, I haven't had this problem with OpenMPI >>>> before, >>>> so I'm not sure if this is really an MPI implementation problem, or >>>> just >>>> because Spectrum MPI has less limit for the number of communicators, >>>> and/or this also depends on how many MPI ranks are used, as only 2 out >>>> of 40 ranks reported the error. >>>> >>> You can add printf around MPI_Comm_dup/MPI_Comm_free sites on the two >>> ranks, e.g., if (myrank == 38) printf(...), to see if the dup/free are >>> paired. >>> >>> As a workaround, I replaced the MPI_Comm_dup() at >>> >>>> petsc/src/mat/impls/hypre/mhypre.c:2120 with a copy assignment, and >>>> also >>>> removed the MPI_Comm_free() in the hypre destroyer. My code runs fine >>>> with Spectrum MPI now, but I don't think this is a long-term solution. >>>> >>>> Thanks! >>>> >>>> Feimi >>>> >>>> On 8/19/21 9:01 AM, Jed Brown wrote: >>>> > Junchao Zhang writes: >>>> > >>>> >> Hi, Feimi, >>>> >> I need to consult Jed (cc'ed). >>>> >> Jed, is this an example of >>>> >> >>>> https://lists.mcs.anl.gov/mailman/htdig/petsc-dev/2018-April/thread.html#22663 >>>> ? >>>> >> If Feimi really can not free matrices, then we just need to attach a >>>> >> hypre-comm to a petsc inner comm, and pass that to hypre. >>>> > Are there a bunch of solves as in that case? >>>> > >>>> > My understanding is that one should be able to >>>> MPI_Comm_dup/MPI_Comm_free as many times as you like, but the >>>> implementation has limits on how many communicators can co-exist at any one >>>> time. The many-at-once is what we encountered in that 2018 thread. >>>> > >>>> > One way to check would be to use a debugger or tracer to examine the >>>> stack every time (P)MPI_Comm_dup and (P)MPI_Comm_free are called. >>>> > >>>> > case 1: we'll find lots of dups without frees (until the end) because >>>> the user really wants lots of these existing at the same time. 
>>>> > >>>> > case 2: dups are unfreed because of reference counting >>>> issue/inessential references >>>> > >>>> > >>>> > In case 1, I think the solution is as outlined in the thread, PETSc >>>> can create an inner-comm for Hypre. I think I'd prefer to attach it to the >>>> outer comm instead of the PETSc inner comm, but perhaps a case could be >>>> made either way. >>>> >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Sat Aug 21 14:08:12 2021 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Sat, 21 Aug 2021 19:08:12 +0000 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: On , 2021Aug20, at 00:33, Barry Smith > wrote: It sounds like maybe the Spectrum MPI_Comm_free() is not returning the comm to the "pool" as available for future use; 1. I can not find in the standard what the proper response is to running out of communicators. 2. Mpich on my laptop returns MPI_COMM_NULL after 2044 dups without free. 3. A million dups with free run in a second. 2b. Spectrum MPI on my P9 runs out after 4096 dups 3b. Dup and free million times takes more time than writing this email. Why do I keep hearing that OpenMPI is so great? Everything slightly non-standard I try is hopelessly slow and broken. Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eijkhout at tacc.utexas.edu Sun Aug 22 13:10:57 2021 From: eijkhout at tacc.utexas.edu (Victor Eijkhout) Date: Sun, 22 Aug 2021 18:10:57 +0000 Subject: [petsc-users] Reaching limit number of communicator with Spectrum MPI In-Reply-To: References: <1b4063db-9c32-931e-4b7b-962180651f65@rpi.edu> <71f57283-93b4-2470-7f34-0b2af7309e00@rpi.edu> <1ce1902e-4335-29d4-033-42e219e6355c@mcs.anl.gov> <878s0x6118.fsf@jedbrown.org> Message-ID: <7ABE1E51-AF1B-4A38-873F-62CA254CA366@tacc.utexas.edu> On , 2021Aug21, at 14:08, Victor Eijkhout > wrote: 3b. Dup and free million times takes more time than writing this email. Why do I keep hearing that OpenMPI is so great? Everything slightly non-standard I try is hopelessly slow and broken. Partial rehabilitation for OpenMPI: Finished dup/free'ing 1000000 communicators real 0m9.301s user 0m9.120s sys 0m0.114s So that?s only 4 times slower than Mvapich, but the ludicrously slow performance is only for Spectrum MPI, not OpenMPI per se. Victor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From janne.ruuskanen at tuni.fi Mon Aug 23 05:45:34 2021 From: janne.ruuskanen at tuni.fi (Janne Ruuskanen (TAU)) Date: Mon, 23 Aug 2021 10:45:34 +0000 Subject: [petsc-users] issues with mpi uni Message-ID: Hi, Assumingly, I have an issue using petsc and openmpi together in my c++ code. See the code there: https://github.com/halbux/sparselizard/blob/master/src/slmpi.cpp So when I run: slmpi::initialize(); slmpi::count(); slmpi::finalize(); I get the following error: *** The MPI_Comm_size() function was called before MPI_INIT was invoked. *** This is disallowed by the MPI standard. *** Your MPI job will now abort. Have you experienced anything similar with people trying to link openmpi and petsc into the same executable? 
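One cheap probe for this symptom, independent of PETSc (a sketch; report_mpi_state is a made-up helper one could paste right before the failing MPI_Comm_size() call, and it is valid C++ as well as C):

#include <mpi.h>
#include <stdio.h>

/* MPI_Initialized()/MPI_Finalized() are the only MPI calls the standard
 * permits before MPI_Init(), so this is safe to call anywhere. */
static void report_mpi_state(void)
{
  int initialized = 0, finalized = 0;
  MPI_Initialized(&initialized);
  MPI_Finalized(&finalized);
  printf("MPI state: initialized=%d finalized=%d\n", initialized, finalized);
}

If it prints initialized=0 even though slmpi::initialize() has already run, that would suggest the earlier MPI_Init resolved to a different library than the MPI_Comm_size that aborts.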
Best regards, Janne Ruuskanen From balay at mcs.anl.gov Mon Aug 23 08:44:43 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 23 Aug 2021 08:44:43 -0500 (CDT) Subject: [petsc-users] issues with mpi uni In-Reply-To: References: Message-ID: Did you build PETSc with the same openmpi [as what sparselizard is built with]? Satish On Mon, 23 Aug 2021, Janne Ruuskanen (TAU) wrote: > Hi, > > Assumingly, I have an issue using petsc and openmpi together in my c++ code. > > See the code there: > https://github.com/halbux/sparselizard/blob/master/src/slmpi.cpp > > > So when I run: > > slmpi::initialize(); > slmpi::count(); > slmpi::finalize(); > > I get the following error: > > > *** The MPI_Comm_size() function was called before MPI_INIT was invoked. > *** This is disallowed by the MPI standard. > *** Your MPI job will now abort. > > > Have you experienced anything similar with people trying to link openmpi and petsc into the same executable? > > Best regards, > Janne Ruuskanen > From asher.mancinelli at pnnl.gov Mon Aug 23 15:36:46 2021 From: asher.mancinelli at pnnl.gov (Mancinelli, Asher J) Date: Mon, 23 Aug 2021 20:36:46 +0000 Subject: [petsc-users] PETSc + Cray MPICH Build Error in User Code Message-ID: Hello all, We are attempting to build an application that relies on PETSc with Cray MPICH, and we're encountering the following build-time error: cd /exago/build/src/utils && hipcc -DHAVE_HIP -I/exago/include -I/exago/build -Ispack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/magma-2.6.1-l3ckgjdgsf4yhyzzb5zaibqg5u6lzgdb/include -isystem spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/mumps-5.4.0-3naioareijver7s2em5sdsejh7s74kvf/include -isystem /include -isystem spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include -O3 -DNDEBUG -fPIC -D__INSDIR__=\"\" -std=gnu++11 -MD -MT src/utils/CMakeFiles/UTILS_obj_static.dir/utils.cpp.o -MF CMakeFiles/UTILS_obj_static.dir/utils.cpp.o.d -o CMakeFiles/UTILS_obj_static.dir/utils.cpp.o -c /exago/src/utils/utils.cpp In file included from /exago/src/utils/utils.cpp:2: In file included from /exago/include/common.h:8: In file included from /spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include/petsc.h:5: In file included from /spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include/petscbag.h:4: /spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include/petscsys.h:211:6: error: "PETSc was configured with MPICH but now appears to be compiling using a non-MPICH mpi.h" # error "PETSc was configured with MPICH but now appears to be compiling using a non-MPICH mpi.h" ^ I've replaced some possibly sensitive paths with text in angle brackets for a description, eg . Is this a known issue? Is it apparent from this text that we're doing anything wrong? Our source may be found at this repository: https://gitlab.pnnl.gov/exasgd/frameworks/exago. [https://gitlab.pnnl.gov/assets/gitlab_logo-7ae504fe4f68fdebb3c2034e36621930cd36ea87924c11ff65dbcb8ed50dca58.png] ExaSGD / Frameworks / ExaGO ? GitLab PNNL GitLab - Scientific Software Collaboration Platform gitlab.pnnl.gov Cheers, Asher Mancinelli Research Computing Pacific Northwest National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From junchao.zhang at gmail.com Mon Aug 23 16:07:59 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Mon, 23 Aug 2021 16:07:59 -0500 Subject: [petsc-users] [petsc-maint] PETSc + Cray MPICH Build Error in User Code In-Reply-To: References: Message-ID: Could you send the configure.log of your petsc build? --Junchao Zhang On Mon, Aug 23, 2021 at 3:37 PM Mancinelli, Asher J via petsc-maint < petsc-maint at mcs.anl.gov> wrote: > Hello all, > > We are attempting to build an application that relies on PETSc with Cray > MPICH, and we're encountering the following build-time error: > > cd /exago/build/src/utils && hipcc -DHAVE_HIP -I/exago/include > -I/exago/build > -Ispack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/magma-2.6.1-l3ckgjdgsf4yhyzzb5zaibqg5u6lzgdb/include > -isystem > spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/mumps-5.4.0-3naioareijver7s2em5sdsejh7s74kvf/include > -isystem /include -isystem > spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include > -O3 -DNDEBUG -fPIC -D__INSDIR__=\"\" -std=gnu++11 -MD -MT > src/utils/CMakeFiles/UTILS_obj_static.dir/utils.cpp.o -MF > CMakeFiles/UTILS_obj_static.dir/utils.cpp.o.d -o > CMakeFiles/UTILS_obj_static.dir/utils.cpp.o -c > /exago/src/utils/utils.cpp > In file included from /exago/src/utils/utils.cpp:2: > In file included from /exago/include/common.h:8: > In file included from > /spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include/petsc.h:5: > In file included from > /spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include/petscbag.h:4: > /spack/opt/spack/cray-sles15-zen2/clang-12.0.0-rocm4.2-mpich/petsc-3.14.1-bzve7phvhb7sf6ikzmm3jwgzjwgnm4ro/include/petscsys.h:211:6: > error: "PETSc was configured with MPICH but now appears to be compiling > using a non-MPICH mpi.h" > # error "PETSc was configured with MPICH but now appears to be > compiling using a non-MPICH mpi.h" > ^ > > I've replaced some possibly sensitive paths with text in angle brackets > for a description, eg . > > Is this a known issue? Is it apparent from this text that we're doing > anything wrong? > > Our source may be found at this repository: > https://gitlab.pnnl.gov/exasgd/frameworks/exago. > > ExaSGD / Frameworks / ExaGO ? GitLab > > PNNL GitLab - Scientific Software Collaboration Platform > gitlab.pnnl.gov > > > Cheers, > > *Asher Mancinelli* > > Research Computing > > *Pacific Northwest National Laboratory* > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From janne.ruuskanen at tuni.fi Tue Aug 24 04:47:23 2021 From: janne.ruuskanen at tuni.fi (Janne Ruuskanen (TAU)) Date: Tue, 24 Aug 2021 09:47:23 +0000 Subject: [petsc-users] issues with mpi uni In-Reply-To: References: Message-ID: PETSc was built without mpi with the command: ./configure --with-openmp --with-mpi=0 --with-shared-libraries=1 --with-mumps-serial=1 --download-mumps --download-openblas --download-metis --download-slepc --with-debugging=0 --with-scalar-type=real --with-x=0 COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3'; so the MPI_UNI mpi wrapper of petsc collides in names with the actual MPI used to compile sparselizard. 
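For reference, the rebuild Satish's question points at (configuring PETSc against the very OpenMPI that sparselizard is compiled with) could look roughly like the sketch below; the MPI path is a placeholder, and with a real MPI the parallel MUMPS build pulls in ScaLAPACK, so --download-scalapack takes the place of --with-mumps-serial=1:

./configure --with-mpi-dir=/path/to/sparselizard/openmpi \
  --with-openmp --with-shared-libraries=1 \
  --download-mumps --download-scalapack \
  --download-openblas --download-metis --download-slepc \
  --with-debugging=0 --with-scalar-type=real --with-x=0 \
  COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3'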
-Janne -----Original Message----- From: Satish Balay Sent: Monday, August 23, 2021 4:45 PM To: Janne Ruuskanen (TAU) Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] issues with mpi uni Did you build PETSc with the same openmpi [as what sparselizard is built with]? Satish On Mon, 23 Aug 2021, Janne Ruuskanen (TAU) wrote: > Hi, > > Assumingly, I have an issue using petsc and openmpi together in my c++ code. > > See the code there: > https://github.com/halbux/sparselizard/blob/master/src/slmpi.cpp > > > So when I run: > > slmpi::initialize(); > slmpi::count(); > slmpi::finalize(); > > I get the following error: > > > *** The MPI_Comm_size() function was called before MPI_INIT was invoked. > *** This is disallowed by the MPI standard. > *** Your MPI job will now abort. > > > Have you experienced anything similar with people trying to link openmpi and petsc into the same executable? > > Best regards, > Janne Ruuskanen > From knepley at gmail.com Tue Aug 24 06:06:50 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 24 Aug 2021 07:06:50 -0400 Subject: [petsc-users] issues with mpi uni In-Reply-To: References: Message-ID: On Tue, Aug 24, 2021 at 5:47 AM Janne Ruuskanen (TAU) < janne.ruuskanen at tuni.fi> wrote: > PETSc was built without mpi with the command: > > > ./configure --with-openmp --with-mpi=0 --with-shared-libraries=1 > --with-mumps-serial=1 --download-mumps --download-openblas --download-metis > --download-slepc --with-debugging=0 --with-scalar-type=real --with-x=0 > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3'; > > so the MPI_UNI mpi wrapper of petsc collides in names with the actual MPI > used to compile sparselizard. > Different MPI implementations are not ABI compatible and therefore cannot be used in the same program. You must build all libraries in an executable with the same MPI. Thus, rebuild PETSc with the same MPI as saprselizard. Thanks, Matt > -Janne > > > -----Original Message----- > From: Satish Balay > Sent: Monday, August 23, 2021 4:45 PM > To: Janne Ruuskanen (TAU) > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] issues with mpi uni > > Did you build PETSc with the same openmpi [as what sparselizard is built > with]? > > Satish > > On Mon, 23 Aug 2021, Janne Ruuskanen (TAU) wrote: > > > Hi, > > > > Assumingly, I have an issue using petsc and openmpi together in my c++ > code. > > > > See the code there: > > https://github.com/halbux/sparselizard/blob/master/src/slmpi.cpp > > > > > > So when I run: > > > > slmpi::initialize(); > > slmpi::count(); > > slmpi::finalize(); > > > > I get the following error: > > > > > > *** The MPI_Comm_size() function was called before MPI_INIT was invoked. > > *** This is disallowed by the MPI standard. > > *** Your MPI job will now abort. > > > > > > Have you experienced anything similar with people trying to link openmpi > and petsc into the same executable? > > > > Best regards, > > Janne Ruuskanen > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sayosale at hotmail.com Tue Aug 24 07:47:12 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Tue, 24 Aug 2021 12:47:12 +0000 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: Dear Matthew and Jose, Apologies for the delayed reply, I had a couple of unforeseen days off this week. Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS to solve linear systems (the code is using a distributed MPI matrix to solve the generalised non-Hermitian complex problem). I have tried the gdb debugger as per Matthew's suggestion. Just to note in case someone else is following this that at first it didn't work (couldn't 'attach') , but after some googling I found a tip suggesting the command; echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope which seemed to get it working. I then first ran the debugger on the small matrix case that worked. That stopped in gdb almost immediately after starting execution with a report regarding 'nanosleep.c': ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. However, issuing the 'cont' command again caused the program to run through to the end of the execution w/out any problems, and with correct looking results, so I am guessing this error is not particularly important. I then tried the same debugging procedure on the large matrix case that fails. The code again stopped almost immediately after the start of execution with the same nanosleep error as before, and I was able to set the program running again with 'cont' (see full output below). I was running the code with 4 MPI processes, and so had 4 gdb windows appear. Thereafter the code ran for sometime until completing the matrix construction, and then one of the gdb process windows printed a Program terminated with signal SIGKILL, Killed. The program no longer exists. message. I then typed 'where' into this terminal but just received the message No stack. The other gdb windows basically seemed to be left in limbo until I issued the 'quit' command in the SIGKILL, and then they vanished. I paste the full output from the gdb window that recorded the SIGKILL below here. I guess it is necessary to somehow work out where the SIGKILL originates from ? Thanks once again, Dan. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./stab1.exe... Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919 Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for more, q to quit, c to continue without paging--cont /intel64_lin/libmkl_intel_lp64.so... 
(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... 
(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) Reading symbols from /lib64/ld-linux-x86-64.so.2... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, rem=rem at entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. (gdb) cont Continuing. [New Thread 0x7f9e49c02780 (LWP 676559)] [New Thread 0x7f9e49400800 (LWP 676560)] [New Thread 0x7f9e48bfe880 (LWP 676562)] [Thread 0x7f9e48bfe880 (LWP 676562) exited] [Thread 0x7f9e49400800 (LWP 676560) exited] [Thread 0x7f9e49c02780 (LWP 676559) exited] Program terminated with signal SIGKILL, Killed. The program no longer exists. (gdb) where No stack. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ________________________________ From: Matthew Knepley Sent: Friday, August 20, 2021 2:12 PM To: dazza simplythebest Cc: Jose E. 
Roman ; PETSc Subject: Re: [petsc-users] Improving efficiency of slepc usage On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest > wrote: Dear Jose, Many thanks for your response, I have been investigating this issue with a few more calculations today, hence the slightly delayed response. The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original larger matrix but to lower accuracy. Results Small matrix (N= 21168) - everything good! This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target and also for a zero -eps_target. Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert I have just double checked again that the code does run properly when we use the -eps_largest_real option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line with expectations. Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , whether or not I use a non-zero -eps_target. For reference this is the command line used : -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt To be precise the code crashes soon after calling EPSSolve (it successfully calls MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message as soon as EPSsolve is called. Hi Dan, It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with -start_in_debugger which should launch the debugger (usually), and then type cont to continue, and then where to get the stack trace when it crashes, or 'bt' on lldb. Thanks, Matt Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using -eps_largest_real ? The fact that the program works and produces correct results when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden shutdown? 
For your reference when running the large matrix case with the -eps_largest_real option I am using about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially more memory for example ? I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to get that working for the full-size problem. Many thanks and best wishes, Dan. ________________________________ From: Jose E. Roman > Sent: Thursday, August 19, 2021 7:58 AM To: dazza simplythebest > Cc: PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 Jose > El 19 ago 2021, a las 7:12, dazza simplythebest > escribi?: > > Dear All, > I am planning on using slepc to do a large number of eigenvalue calculations > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > and on smaller test problems everything is working well; the matrices are efficiently and > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > towards now solving the full-size 'production run' problems, and would appreciate some > general advice on how to improve the solver's performance. > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > although in other cases I will also be interested in finding the eigenvalues whose real part > is close to zero. > > A) > Calling slepc 's EPS solver with the following options: > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > (examining the monitor output it did appear to be very slowly approaching convergence). > > B) > On the same problem I have also tried a shift-invert transformation using the options > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > In the case of A) I can see from watching the slepc videos that increasing ncv > may help, but I am wondering , since 600 is a large number of iterations, whether there > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > In the case of B), I guess there must be some mistake in these command line options? 
> Again, any advice will be greatly appreciated. > Best wishes, Dan. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Aug 24 10:18:21 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 24 Aug 2021 10:18:21 -0500 (CDT) Subject: [petsc-users] issues with mpi uni In-Reply-To: References: Message-ID: MPI_UNI is name-spaced to avoid such conflicts. Don't know about mumps. But there could be corner cases where this issue comes up. And its best to have the same MPI across all packages that go into a binary anyway. Satish On Tue, 24 Aug 2021, Matthew Knepley wrote: > On Tue, Aug 24, 2021 at 5:47 AM Janne Ruuskanen (TAU) < > janne.ruuskanen at tuni.fi> wrote: > > > PETSc was built without mpi with the command: > > > > > > ./configure --with-openmp --with-mpi=0 --with-shared-libraries=1 > > --with-mumps-serial=1 --download-mumps --download-openblas --download-metis > > --download-slepc --with-debugging=0 --with-scalar-type=real --with-x=0 > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3'; > > > > so the MPI_UNI mpi wrapper of petsc collides in names with the actual MPI > > used to compile sparselizard. > > > > Different MPI implementations are not ABI compatible and therefore cannot > be used in the same program. You must > build all libraries in an executable with the same MPI. Thus, rebuild PETSc > with the same MPI as saprselizard. > > Thanks, > > Matt > > > > -Janne > > > > > > -----Original Message----- > > From: Satish Balay > > Sent: Monday, August 23, 2021 4:45 PM > > To: Janne Ruuskanen (TAU) > > Cc: petsc-users at mcs.anl.gov > > Subject: Re: [petsc-users] issues with mpi uni > > > > Did you build PETSc with the same openmpi [as what sparselizard is built > > with]? > > > > Satish > > > > On Mon, 23 Aug 2021, Janne Ruuskanen (TAU) wrote: > > > > > Hi, > > > > > > Assumingly, I have an issue using petsc and openmpi together in my c++ > > code. > > > > > > See the code there: > > > https://github.com/halbux/sparselizard/blob/master/src/slmpi.cpp > > > > > > > > > So when I run: > > > > > > slmpi::initialize(); > > > slmpi::count(); > > > slmpi::finalize(); > > > > > > I get the following error: > > > > > > > > > *** The MPI_Comm_size() function was called before MPI_INIT was invoked. > > > *** This is disallowed by the MPI standard. > > > *** Your MPI job will now abort. > > > > > > > > > Have you experienced anything similar with people trying to link openmpi > > and petsc into the same executable? > > > > > > Best regards, > > > Janne Ruuskanen > > > > > > > > > From knepley at gmail.com Tue Aug 24 10:59:23 2021 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 24 Aug 2021 11:59:23 -0400 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest wrote: > > Dear Matthew and Jose, > Apologies for the delayed reply, I had a couple of unforeseen days off > this week. > Firstly regarding Jose's suggestion re: MUMPS, the program is already > using MUMPS > to solve linear systems (the code is using a distributed MPI matrix to > solve the generalised > non-Hermitian complex problem). > > I have tried the gdb debugger as per Matthew's suggestion. 
> Just to note in case someone else is following this that at first it > didn't work (couldn't 'attach') , > but after some googling I found a tip suggesting the command; > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > which seemed to get it working. > > *I then first ran the debugger on the small matrix case that worked.* > That stopped in gdb almost immediately after starting execution > with a report regarding 'nanosleep.c': > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > However, issuing the 'cont' command again caused the program to run > through to the end of the > execution w/out any problems, and with correct looking results, so I am > guessing this error > is not particularly important. > We do that on purpose when the debugger starts up. Typing 'cont' is correct. > *I then tried the same debugging procedure on the large matrix case that > fails.* > The code again stopped almost immediately after the start of execution > with > the same nanosleep error as before, and I was able to set the program > running > again with 'cont' (see full output below). I was running the code with 4 > MPI processes, > and so had 4 gdb windows appear. Thereafter the code ran for sometime > until completing the > matrix construction, and then one of the gdb process windows printed a > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > message. I then typed 'where' into this terminal but just received the > message > No stack. > I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you have a large structure on the stack? Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you the prompt and then "where". Thanks, Matt > The other gdb windows basically seemed to be left in limbo until I issued > the 'quit' > command in the SIGKILL, and then they vanished. > > I paste the full output from the gdb window that recorded the SIGKILL > below here. > I guess it is necessary to somehow work out where the SIGKILL originates > from ? > > Thanks once again, > Dan. > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > Copyright (C) 2020 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later < > http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. > Type "show copying" and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from ./stab1.exe... > Attaching to program: > /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, > process 675919 > Reading symbols from > /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... 
> Reading symbols from > /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for > more, q to quit, c to continue without paging--cont > /intel64_lin/libmkl_intel_lp64.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > Reading symbols from > /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... 
> (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > Reading symbols from /lib64/ld-linux-x86-64.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, > clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, > rem=rem at entry=0x7ffdc641a9a0) at > ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or > directory. > (gdb) cont > Continuing. > [New Thread 0x7f9e49c02780 (LWP 676559)] > [New Thread 0x7f9e49400800 (LWP 676560)] > [New Thread 0x7f9e48bfe880 (LWP 676562)] > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > [Thread 0x7f9e49400800 (LWP 676560) exited] > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > (gdb) where > No stack. 
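For anyone else following the thread, the whole attach-and-backtrace sequence described above boils down to a few commands (a sketch only: the launcher, executable name, and process count are simply the ones used in this thread, and the ptrace_scope step is only needed on systems where gdb initially refuses to attach):

  echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope    # allow gdb to attach to a running process
  mpiexec -n 4 ./stab1.exe -start_in_debugger             # launches one gdb window per MPI rank
  (gdb) cont     # the initial stop inside clock_nanosleep is deliberate, just continue
  (gdb) where    # after the crash, or after Ctrl-C on the ranks that were not killed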
> > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - > > ------------------------------ > *From:* Matthew Knepley > *Sent:* Friday, August 20, 2021 2:12 PM > *To:* dazza simplythebest > *Cc:* Jose E. Roman ; PETSc > *Subject:* Re: [petsc-users] Improving efficiency of slepc usage > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest > wrote: > > Dear Jose, > Many thanks for your response, I have been investigating this issue > with a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow > an easier exploration of things > I first downsized the resolution of the underlying fluid solver while > keeping all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the > same physical problem as the original > larger matrix but to lower accuracy. > > *Results* > > *Small matrix (N= 21168) - everything good!* > This converged when using the -eps_largest_real approach (taking 92 > iterations for nev=10, > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert > approach, converging > very impressively in a single iteration ! Interestingly it did this both > for a non-zero -eps_target > and also for a zero -eps_target. > > *Large matrix (N=50400)- works for -eps_largest_real , fails for st_type > sinvert * > I have just double checked again that the code does run properly when we > use the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, > tol= -eps_tol 5.0e-4 , ncv = 300) > and with these parameters convergence was obtained in 164 iterations, > which took 6 hours on the > machine I was running it on. Furthermore the eigenvalues seem to be > ballpark correct; for this large > higher resolution case (although with lower slepc tolerance) we obtain > 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same > physical problem but at lower resolution case) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which > means the agreement is in line > with expectations. > > *Unfortunately though the code does still crash though when I try to do > shift-invert for the large matrix case *, > whether or not I use a non-zero -eps_target. For reference this is the > command line used : > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 > -st_type sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it > successfully calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and > EPSSetFromOptions). > By crashes I mean that I do not even get any error messages from > slepc/PETSC, and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran > 'KILLED BY SIGNAL: 9 (Killed)' message > as soon as EPSsolve is called. > > > Hi Dan, > > It would help track this error down if we had a stack trace. You can get a > stack trace from the debugger. You run with > > -start_in_debugger > > which should launch the debugger (usually), and then type > > cont > > to continue, and then > > where > > to get the stack trace when it crashes, or 'bt' on lldb. > > Thanks, > > Matt > > > Do you have any ideas as to why this larger matrix case should fail when > using shift-invert but succeed when using > -eps_largest_real ? 
The fact that the program works and produces correct > results > when using the -eps_largest_real option suggests that there is probably > nothing wrong with the specification > of the problem or the matrices ? It is strange how there is no error > message from slepc / Petsc ... the > only idea I have at the moment is that perhaps max memory has been > exceeded, which could cause such a sudden > shutdown? For your reference when running the large matrix case with the > -eps_largest_real option I am using > about 36 GB of the 148GB available on this machine - does the shift > invert approach require substantially > more memory for example ? > > I would be very grateful if you have any suggestions to resolve this > issue or even ways to clarify it further, > the performance I have seen with the shift-invert for the small matrix is > so impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > ------------------------------ > *From:* Jose E. Roman > *Sent:* Thursday, August 19, 2021 7:58 AM > *To:* dazza simplythebest > *Cc:* PETSc > *Subject:* Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have > small magnitude. I would not say 600 iterations is a lot, you probably need > many more. In most cases, approach B) is better because it improves > convergence of eigenvalues close to the target, but it requires prior > knowledge of your spectrum distribution in order to choose an appropriate > target. > > In B) what do you mean that it crashes. If you get an error about > factorization, it means that your A-matrix is singular, In that case, try > using a nonzero target -eps_target 0.1 > > Jose > > > > El 19 ago 2021, a las 7:12, dazza simplythebest > escribi?: > > > > Dear All, > > I am planning on using slepc to do a large number of > eigenvalue calculations > > of a generalized eigenvalue problem, called from a program written in > fortran using MPI. > > Thus far I have successfully installed the slepc/PETSc software, both > locally and on a cluster, > > and on smaller test problems everything is working well; the matrices > are efficiently and > > correctly constructed and slepc returns the correct spectrum. I am just > now starting to move > > towards now solving the full-size 'production run' problems, and would > appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx > whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will > be tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues > with the largest real part, > > although in other cases I will also be interested in finding the > eigenvalues whose real part > > is close to zero. > > > > A) > > Calling slepc 's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol > 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any > eigenvalues within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly > approaching convergence). 
> > > > B) > > On the same problem I have also tried a shift-invert transformation > using the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > -in this case the code crashed at the point it tried to call slepc, so > perhaps I have incorrectly specified these options ? > > > > > > Does anyone have any suggestions as to how to improve this performance ( > or find out more about the problem) ? > > In the case of A) I can see from watching the slepc videos that > increasing ncv > > may help, but I am wondering , since 600 is a large number of > iterations, whether there > > maybe something else going on - e.g. perhaps some alternative > preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command > line options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsabhishek1ags at gmail.com Tue Aug 24 22:03:31 2021 From: gsabhishek1ags at gmail.com (Abhishek G.S.) Date: Wed, 25 Aug 2021 08:33:31 +0530 Subject: [petsc-users] Static Library based app for petsc Message-ID: Hi, I am trying to develop a static-library-based app using petsc. The structure goes as, . ??? benchmarks ? ??? Test1 ? ??? main.cpp ? ??? Makefile ??? libTest ??? build ??? CMakeLists.txt ??? include ? ??? test.cpp ? ??? test.h ??? lib ??? libTest.a While this code compiles, I am unable to create a minimal working example for the same. The aim is to just print "Hello World". Why is it that nothing prints?. Is it something to do with the PETSC wrapper for cout? Also, I would like to know whether it's a good idea to go ahead with this kind of code structure. Thanks for the help. Code: https://github.com/gsabhishek/PetscStaticLibraryApp.git Thanks for the help -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Aug 24 23:31:22 2021 From: jed at jedbrown.org (Jed Brown) Date: Tue, 24 Aug 2021 22:31:22 -0600 Subject: [petsc-users] Static Library based app for petsc In-Reply-To: References: Message-ID: <87v93u402t.fsf@jedbrown.org> PETSc does not "wrap" cout. Creating a library first, with an executable front-end that most/all initial users will use is generally good design. More users of the library emerge as people try to do more advanced/custom things that are not appropriate to do with the executable. "Abhishek G.S." writes: > Hi, > I am trying to develop a static-library-based app using petsc. The > structure goes as, > . > ??? benchmarks > ? ??? Test1 > ? ??? main.cpp > ? ??? Makefile > ??? libTest > ??? build Are build products meant to go into this build/ directory, not under lib/ as you have it? > ??? CMakeLists.txt > ??? include > ? ??? test.cpp cpp files would usually go under src/ or almost anywhere but in include/ > ? ??? test.h > ??? lib > ??? libTest.a > > While this code compiles, I am unable to create a minimal working example > for the same. The aim is to just print "Hello World". Why is it that > nothing prints?. 
Is it something to do with the PETSC wrapper for cout? > Also, I would like to know whether it's a good idea to go ahead with this > kind of code structure. > > Thanks for the help. > > Code: https://github.com/gsabhishek/PetscStaticLibraryApp.git > > Thanks for the help From gsabhishek1ags at gmail.com Tue Aug 24 23:59:32 2021 From: gsabhishek1ags at gmail.com (Abhishek G.S.) Date: Wed, 25 Aug 2021 10:29:32 +0530 Subject: [petsc-users] Static Library based app for petsc In-Reply-To: <87v93u402t.fsf@jedbrown.org> References: <87v93u402t.fsf@jedbrown.org> Message-ID: Thanks for the reply. On Wed, 25 Aug 2021 at 10:01, Jed Brown wrote: > PETSc does not "wrap" cout. > What I meant here is that petsc has a custom output stream through PetscPrintf. I was wondering if that might have affected the stdout and hence the no print. The constructor in the libTest/include/test.h was just supposed to print a string when called in the benchmarks/Test1/main.cpp > Creating a library first, with an executable front-end that most/all > initial users will use is generally good design. More users of the library > emerge as people try to do more advanced/custom things that are not > appropriate to do with the executable. > Ok... This makes sense. (It would be great if you could point me towards some project whose structure I can borrow.) Why I did what I did was that if the petsc environment is encapsulated in the library, the rest of the code in the main.cpp would be outside. Since I was writing a code for a very small audience(mostly me) I thought it would be easier to debug if I was inside the petsc environment. > > "Abhishek G.S." writes: > > > Hi, > > I am trying to develop a static-library-based app using petsc. The > > structure goes as, > > . > > ??? benchmarks > > ? ??? Test1 > > ? ??? main.cpp > > ? ??? Makefile > > ??? libTest > > ??? build > > Are build products meant to go into this build/ directory, not under lib/ > as you have it? > I routed the static library output to the libTest/lib folder in the libTest/CMakeLists.txt. The /benchmarks/Test1/Makefile includes this to the ld path > > > ??? CMakeLists.txt > > ??? include > > ? ??? test.cpp > > cpp files would usually go under src/ or almost anywhere but in include/ > noted. > > > ? ??? test.h > > ??? lib > > ??? libTest.a > > > > While this code compiles, I am unable to create a minimal working example > > for the same. The aim is to just print "Hello World". Why is it that > > nothing prints?. Is it something to do with the PETSC wrapper for cout? > > Also, I would like to know whether it's a good idea to go ahead with this > > kind of code structure. > > > > Thanks for the help. > > > > Code: https://github.com/gsabhishek/PetscStaticLibraryApp.git > > > > Thanks for the help > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Aug 25 06:03:21 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 25 Aug 2021 07:03:21 -0400 Subject: [petsc-users] Static Library based app for petsc In-Reply-To: References: <87v93u402t.fsf@jedbrown.org> Message-ID: On Wed, Aug 25, 2021 at 1:00 AM Abhishek G.S. wrote: > Thanks for the reply. > > On Wed, 25 Aug 2021 at 10:01, Jed Brown wrote: > >> PETSc does not "wrap" cout. >> > What I meant here is that petsc has a custom output stream through > PetscPrintf. I was wondering if that might have affected the stdout and > hence the no print. 
The constructor in the libTest/include/test.h was just > supposed to print a string when called in the benchmarks/Test1/main.cpp > PetscPrintf() just calls printf() underneath. > >> Creating a library first, with an executable front-end that most/all >> initial users will use is generally good design. More users of the library >> emerge as people try to do more advanced/custom things that are not >> appropriate to do with the executable. >> > > Ok... This makes sense. (It would be great if you could point me towards > some project whose structure I can borrow.) > Why I did what I did was that if the petsc environment is encapsulated in > the library, the rest of the code in the main.cpp would be outside. Since I > was writing a code for a very small audience(mostly me) I thought it would > be easier to debug if I was inside the petsc environment. > There are many codes that use PETSc in this way. For example, https://petsc.org/release/#related-toolkits-libraries-that-use-petsc Thanks, Matt > > > >> >> "Abhishek G.S." writes: >> >> > Hi, >> > I am trying to develop a static-library-based app using petsc. The >> > structure goes as, >> > . >> > ??? benchmarks >> > ? ??? Test1 >> > ? ??? main.cpp >> > ? ??? Makefile >> > ??? libTest >> > ??? build >> >> Are build products meant to go into this build/ directory, not under lib/ >> as you have it? >> > > I routed the static library output to the libTest/lib folder in the > libTest/CMakeLists.txt. > The /benchmarks/Test1/Makefile includes this to the ld path > > >> >> > ??? CMakeLists.txt >> > ??? include >> > ? ??? test.cpp >> >> cpp files would usually go under src/ or almost anywhere but in include/ >> > > noted. > > >> >> > ? ??? test.h >> > ??? lib >> > ??? libTest.a >> > >> > While this code compiles, I am unable to create a minimal working >> example >> > for the same. The aim is to just print "Hello World". Why is it that >> > nothing prints?. Is it something to do with the PETSC wrapper for cout? >> > Also, I would like to know whether it's a good idea to go ahead with >> this >> > kind of code structure. >> > >> > Thanks for the help. >> > >> > Code: https://github.com/gsabhishek/PetscStaticLibraryApp.git >> > >> > Thanks for the help >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sayosale at hotmail.com Wed Aug 25 07:11:58 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Wed, 25 Aug 2021 12:11:58 +0000 Subject: [petsc-users] Fw: Improving efficiency of slepc usage In-Reply-To: References: Message-ID: ________________________________ From: dazza simplythebest Sent: Wednesday, August 25, 2021 12:08 PM To: Matthew Knepley Subject: Re: [petsc-users] Improving efficiency of slepc usage ?Dear Matthew and Jose, I have derived a smaller program from the original program by constructing matrices of the same size, but filling their entries randomly instead of computing the correct fluid dynamics values just to allow faster experimentation. This modified code's behaviour seems to be similar, with the code again failing for the large matrix case with the SIGKILL error, so I first report results from that code here. Firstly I can confirm that I am using Fortran , and I am compiling with the intel compiler, which it seems places automatic arrays on the stack. 
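To illustrate the kind of declaration in question, here is a schematic sketch (the names are made up and this is not the actual routine): an array dimensioned by a dummy argument is an automatic array and, with ifort's defaults, is placed on the stack, whereas an allocatable array is placed on the heap.

      subroutine build_matrix(ndim)
        implicit none
        integer, intent(in)     :: ndim
        complex*16              :: work(ndim, ndim)    ! automatic array   -> lives on the stack
        complex*16, allocatable :: work_h(:, :)        ! allocatable array -> lives on the heap
        allocate(work_h(ndim, ndim))
        ! ... fill and use the arrays here ...
        deallocate(work_h)
      end subroutine build_matrix

If a too-large automatic array of this sort really is the culprit, then making it allocatable, raising the stack limit (ulimit -s), or compiling with ifort's -heap-arrays option should presumably avoid the overflow.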
The stacksize, as determined by ulimit -a, is reported to be : stack size (kbytes, -s) 8192 [1] Okay, so I followed your suggestion and used ctrl-c followed by 'where' in one of the non-SIGKILL gdb windows. I have pasted the output into the bottom of this email (see [1] output) - it does look like the problem occurs somewhere in the call to the MUMPS solver ? [2] I have also today gained access to another workstation, and so have tried running the (original) code on that machine. This new machine has two (more powerful) CPU nodes and a larger memory (both machines feature Intel Xeon processors). On this new machine the large matrix case again failed with the familiar SIGKILL report when I used 16 or 12 MPI processes, ran to the end w/out error for 4 or 6 MPI processes, and failed but with a PETSC error message when I used 8 MPI processes, which I have pasted below (see [2] output). Does this point to some sort of resource demand that exceeds some limit as the number of MPI processes increases ? Many thanks once again, Dan. [2] output [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in external library [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [0]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Error in external library [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [1]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [1]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [1]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [1]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [1]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Error in external library [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [2]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [2]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [2]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [2]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [2]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Error in external library [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [3]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [3]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [3]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [3]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [3]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [3]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [3]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [3]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [4]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [4]PETSC ERROR: Error in external library [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [4]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [4]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [4]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [4]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [4]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [4]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [4]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [4]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [4]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [5]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [5]PETSC ERROR: Error in external library [5]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [5]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [5]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [5]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [5]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [5]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [5]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [5]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [5]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [5]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [6]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [6]PETSC ERROR: Error in external library [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045 [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [6]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [6]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [6]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [6]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [6]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [6]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [6]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [6]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [6]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [7]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [7]PETSC ERROR: Error in external library [7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21841925 [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 [7]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 [7]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [7]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [7]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [7]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [7]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [7]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [7]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [7]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [0]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 [0]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 [0]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 [0]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 [0]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [0]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [0]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [0]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [1]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [1]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [1]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [1]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [2]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 [2]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 [2]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 [2]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 
[3]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 [1] output Continuing. [New Thread 0x7f6f5b2d2780 (LWP 794037)] [New Thread 0x7f6f5aad0800 (LWP 794040)] [New Thread 0x7f6f5a2ce880 (LWP 794041)] ^C Thread 1 "my.exe" received signal SIGINT, Interrupt. 0x00007f72904927b0 in ofi_fastlock_release_noop () from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so (gdb) where #0 0x00007f72904927b0 in ofi_fastlock_release_noop () from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so #1 0x00007f729049354b in ofi_cq_readfrom () from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so #2 0x00007f728ffe8f0e in rxm_ep_do_progress () from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, comm=1, flag=0x0, status=0xffffffffffffffff) at /usr/include/rdma/fi_tagged.h:109 #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) at ../../src/binding/fortran/mpif_h/iprobef.c:276 #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, blocking=, --Type for more, q to quit, c to continue without paging--cont irecv=, message_received=, msgsou=1, msgtag=-1, status=..., bufr=..., lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, root=, opassw=0, opeliw=0, itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, lrgroups=...) at zfac_process_message.F:730 #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., liw=, a=..., la=, nstk_steps=..., nbprocfils=..., nd=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., nstepsdone=1690339657, opass=, opeli=, nelva=50400, comp=259581, maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv=, noffnegpv=, nb22t1=, nb22t2=, nbtiny=, det_exp=, det_mant=, det_sign=, ptrist=..., ptrast=..., pimaster=..., pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., lpool=, rinfo=, posfac=, iwpos=, lrlu=, iptrlu=, lrlus=, leaf=, nbroot=, nbrtot=, uu=, icntl=, ptlust=..., ptrfac=..., info=, keep=, keep8=, procnode_steps=..., slavef=, myid=, comm_nodes=, myid_nodes=, bufr=..., lbufr=0, lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, lrgroups=...) 
at zfac_par_m.F:182 #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., la=, liw=, sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., ptrar=..., ldptrar=, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, icntl=, info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., slavef=-1889504640, comm_nodes=-2048052411, myid=, myid_nodes=-1683330500, bufr=..., lbufr=, lbufr_bytes=, zmumps_lbuf=, intarr=..., dblarr=..., root=, nelt=, frtptr=..., frtelt=..., comm_load=, ass_irecv=, seuil=, seuil_ldlt_niv2=, mem_distrib=, dkeep=, pivnul_list=..., lpn_list=, lrgroups=...) at zfac_b.F:243 #10 0x00007f7308610ff7 in zmumps_fac_driver (id=) at zfac_driver.F:2421 #11 0x00007f7308569256 in zmumps (id=) at zmumps_driver.F:1883 #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=, comm_f77=, n=, nblk=1, icntl=..., cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289 #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, A=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, mat=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 #20 0x00007f7309130462 in STSetUp (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., b_pet=..., jthisone=, isize=) at small_slepc_example_program.F:322 #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 #26 0x00000000004023f2 in main () #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0
, argc=14, argv=0x7ffda7b024e8, init=, fini=, rtld_fini=, stack_end=0x7ffda7b024d8) at ../csu/libc-start.c:308 #28 0x00000000004022fe in _start () ________________________________ From: Matthew Knepley Sent: Tuesday, August 24, 2021 3:59 PM To: dazza simplythebest Cc: Jose E. Roman ; PETSc Subject: Re: [petsc-users] Improving efficiency of slepc usage On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest > wrote: Dear Matthew and Jose, Apologies for the delayed reply, I had a couple of unforeseen days off this week. Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS to solve linear systems (the code is using a distributed MPI matrix to solve the generalised non-Hermitian complex problem). I have tried the gdb debugger as per Matthew's suggestion. Just to note in case someone else is following this that at first it didn't work (couldn't 'attach') , but after some googling I found a tip suggesting the command; echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope which seemed to get it working. I then first ran the debugger on the small matrix case that worked. That stopped in gdb almost immediately after starting execution with a report regarding 'nanosleep.c': ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. However, issuing the 'cont' command again caused the program to run through to the end of the execution w/out any problems, and with correct looking results, so I am guessing this error is not particularly important. We do that on purpose when the debugger starts up. Typing 'cont' is correct. I then tried the same debugging procedure on the large matrix case that fails. The code again stopped almost immediately after the start of execution with the same nanosleep error as before, and I was able to set the program running again with 'cont' (see full output below). I was running the code with 4 MPI processes, and so had 4 gdb windows appear. Thereafter the code ran for sometime until completing the matrix construction, and then one of the gdb process windows printed a Program terminated with signal SIGKILL, Killed. The program no longer exists. message. I then typed 'where' into this terminal but just received the message No stack. I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you have a large structure on the stack? Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you the prompt and then "where". Thanks, Matt The other gdb windows basically seemed to be left in limbo until I issued the 'quit' command in the SIGKILL, and then they vanished. I paste the full output from the gdb window that recorded the SIGKILL below here. I guess it is necessary to somehow work out where the SIGKILL originates from ? Thanks once again, Dan. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. 
This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: . For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./stab1.exe... Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919 Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for more, q to quit, c to continue without paging--cont /intel64_lin/libmkl_intel_lp64.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... 
(No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) Reading symbols from /lib64/ld-linux-x86-64.so.2... Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, rem=rem at entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. (gdb) cont Continuing. 
[New Thread 0x7f9e49c02780 (LWP 676559)] [New Thread 0x7f9e49400800 (LWP 676560)] [New Thread 0x7f9e48bfe880 (LWP 676562)] [Thread 0x7f9e48bfe880 (LWP 676562) exited] [Thread 0x7f9e49400800 (LWP 676560) exited] [Thread 0x7f9e49c02780 (LWP 676559) exited] Program terminated with signal SIGKILL, Killed. The program no longer exists. (gdb) where No stack. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ________________________________ From: Matthew Knepley > Sent: Friday, August 20, 2021 2:12 PM To: dazza simplythebest > Cc: Jose E. Roman >; PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest > wrote: Dear Jose, Many thanks for your response, I have been investigating this issue with a few more calculations today, hence the slightly delayed response. The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original larger matrix but to lower accuracy. Results Small matrix (N= 21168) - everything good! This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target and also for a zero -eps_target. Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert I have just double checked again that the code does run properly when we use the -eps_largest_real option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line with expectations. Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , whether or not I use a non-zero -eps_target. For reference this is the command line used : -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt To be precise the code crashes soon after calling EPSSolve (it successfully calls MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message as soon as EPSsolve is called. Hi Dan, It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with -start_in_debugger which should launch the debugger (usually), and then type cont to continue, and then where to get the stack trace when it crashes, or 'bt' on lldb. 
Thanks, Matt Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using -eps_largest_real ? The fact that the program works and produces correct results when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially more memory for example ? I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to get that working for the full-size problem. Many thanks and best wishes, Dan. ________________________________ From: Jose E. Roman > Sent: Thursday, August 19, 2021 7:58 AM To: dazza simplythebest > Cc: PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 Jose > El 19 ago 2021, a las 7:12, dazza simplythebest > escribi?: > > Dear All, > I am planning on using slepc to do a large number of eigenvalue calculations > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > and on smaller test problems everything is working well; the matrices are efficiently and > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > towards now solving the full-size 'production run' problems, and would appreciate some > general advice on how to improve the solver's performance. > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > although in other cases I will also be interested in finding the eigenvalues whose real part > is close to zero. > > A) > Calling slepc 's EPS solver with the following options: > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > (examining the monitor output it did appear to be very slowly approaching convergence). 
> > B) > On the same problem I have also tried a shift-invert transformation using the options > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > In the case of A) I can see from watching the slepc videos that increasing ncv > may help, but I am wondering , since 600 is a large number of iterations, whether there > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > In the case of B), I guess there must be some mistake in these command line options? > Again, any advice will be greatly appreciated. > Best wishes, Dan. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From gsharma4189 at gmail.com Wed Aug 25 08:07:34 2021 From: gsharma4189 at gmail.com (govind sharma) Date: Wed, 25 Aug 2021 18:37:34 +0530 Subject: [petsc-users] laplace_equation Message-ID: Hi, I want to solve the 2D Laplace equation, as a starting point, with petsc4py in parallel using mpi4py. Any examples or tutorials? Regards, Govind Sharma PhD scholar, Indian Institute of Technology, Delhi -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Wed Aug 25 08:40:57 2021 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 25 Aug 2021 15:40:57 +0200 Subject: [petsc-users] Improving efficiency of slepc usage In-Reply-To: References: Message-ID: MUMPS documentation (section 8) indicates that the meaning of INFOG(1)=-9 is insufficient workspace. Try running with -st_mat_mumps_icntl_14 <percentage>, where <percentage> is the percentage by which you want to increase the workspace, e.g. 50 or 100 or more. See ex43.c for an example showing how to set this option in code. Jose > El 25 ago 2021, a las 14:11, dazza simplythebest escribió: > > > > From: dazza simplythebest > Sent: Wednesday, August 25, 2021 12:08 PM > To: Matthew Knepley > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > Dear Matthew and Jose, > I have derived a smaller program from the original program by constructing > matrices of the same size, but filling their entries randomly instead of computing the correct > fluid dynamics values just to allow faster experimentation. This modified code's behaviour seems > to be similar, with the code again failing for the large matrix case with the SIGKILL error, so I first report > results from that code here. Firstly I can confirm that I am using Fortran, and I am compiling with the > intel compiler, which it seems places automatic arrays on the stack. The stacksize, as determined > by ulimit -a, is reported to be : > stack size (kbytes, -s) 8192 > > [1] Okay, so I followed your suggestion and used ctrl-c followed by 'where' in one of the non-SIGKILL gdb windows. > I have pasted the output into the bottom of this email (see [1] output) - it does look like the problem occurs somewhere in the call > to the MUMPS solver ?
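For reference, a minimal sketch of how Jose's suggestion above, raising ICNTL(14) from code rather than from the command line, might look in C for a shift-and-invert EPS. The helper name SetMumpsWorkspace and the call sequence are only an illustration of the usual ST -> KSP -> PC -> factor-matrix chain, not a copy of ex43.c; the percentage passed in (e.g. 50 or 100) is the value Jose suggests trying.

#include <slepceps.h>

static PetscErrorCode SetMumpsWorkspace(EPS eps, PetscInt percentage)
{
  ST             st;
  KSP            ksp;
  PC             pc;
  Mat            F;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
  ierr = STSetType(st,STSINVERT);CHKERRQ(ierr);               /* shift-and-invert spectral transform */
  ierr = STGetKSP(st,&ksp);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);            /* direct solve of the shifted system */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverType(pc,MATSOLVERMUMPS);CHKERRQ(ierr);
  ierr = PCFactorSetUpMatSolverType(pc);CHKERRQ(ierr);        /* create the MUMPS factor matrix now */
  ierr = PCFactorGetMatrix(pc,&F);CHKERRQ(ierr);
  ierr = MatMumpsSetIcntl(F,14,percentage);CHKERRQ(ierr);     /* ICNTL(14): % increase of estimated workspace */
  /* MatMumpsSetIcntl(F,23,mb) would instead cap the working memory (MB) per process; mb is a placeholder. */
  PetscFunctionReturn(0);
}

Called after EPSSetOperators() and before EPSSolve(), this should be roughly equivalent to the command-line route -st_type sinvert -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type mumps -st_mat_mumps_icntl_14 <percentage>.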
> > [2] I have also today gained access to another workstation, and so have tried running the (original) code on that machine. > This new machine has two (more powerful) CPU nodes and a larger memory (both machines feature Intel Xeon processors). > On this new machine the large matrix case again failed with the familiar SIGKILL report when I used 16 or 12 MPI > processes, ran to the end w/out error for 4 or 6 MPI processes, and failed but with a PETSC error message > when I used 8 MPI processes, which I have pasted below (see [2] output). Does this point to some sort of resource > demand that exceeds some limit as the number of MPI processes increases ? > > Many thanks once again, > Dan. > > [2] output > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [0]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [1]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [1]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [1]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [1]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [2]PETSC ERROR: Error in external library > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [2]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [2]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [2]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [2]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [2]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [3]PETSC ERROR: Error in external library > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [3]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [3]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [3]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [3]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [3]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [3]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [3]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [4]PETSC ERROR: Error in external library > [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [4]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [4]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [4]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [4]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [4]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [4]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [4]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [4]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [5]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [5]PETSC ERROR: Error in external library > [5]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [5]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [5]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [5]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [5]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [5]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [5]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [5]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [5]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [5]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [6]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [6]PETSC ERROR: Error in external library > [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045 > > [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [6]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [6]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [6]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [6]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [6]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [6]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [6]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [6]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [6]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [7]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [7]PETSC ERROR: Error in external library > [7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21841925 > > [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [7]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [7]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [7]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [7]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [7]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [7]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [7]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [7]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [7]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [0]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [0]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [0]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [0]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [0]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [0]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [0]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [0]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [1]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [1]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [1]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [1]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [2]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [2]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [2]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [2]PETSC ERROR: #9 EPSSolve() at 
/data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [3]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > > > [1] output > > Continuing. > [New Thread 0x7f6f5b2d2780 (LWP 794037)] > [New Thread 0x7f6f5aad0800 (LWP 794040)] > [New Thread 0x7f6f5a2ce880 (LWP 794041)] > ^C > Thread 1 "my.exe" received signal SIGINT, Interrupt. > 0x00007f72904927b0 in ofi_fastlock_release_noop () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > (gdb) where > #0 0x00007f72904927b0 in ofi_fastlock_release_noop () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #1 0x00007f729049354b in ofi_cq_readfrom () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #2 0x00007f728ffe8f0e in rxm_ep_do_progress () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, > comm=1, flag=0x0, status=0xffffffffffffffff) > at /usr/include/rdma/fi_tagged.h:109 > #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, > v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) > at ../../src/binding/fortran/mpif_h/iprobef.c:276 > #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, > blocking=, > > --Type for more, q to quit, c to continue without paging--cont > irecv=, message_received=, msgsou=1, msgtag=-1, status=..., bufr=..., lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, root=, opassw=0, opeliw=0, itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, lrgroups=...) 
at zfac_process_message.F:730 > #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., liw=, a=..., la=, nstk_steps=..., nbprocfils=..., nd=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., nstepsdone=1690339657, opass=, opeli=, nelva=50400, comp=259581, maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv=, noffnegpv=, nb22t1=, nb22t2=, nbtiny=, det_exp=, det_mant=, det_sign=, ptrist=..., ptrast=..., pimaster=..., pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., lpool=, rinfo=, posfac=, iwpos=, lrlu=, iptrlu=, lrlus=, leaf=, nbroot=, nbrtot=, uu=, icntl=, ptlust=..., ptrfac=..., info=, keep=, keep8=, procnode_steps=..., slavef=, myid=, comm_nodes=, myid_nodes=, bufr=..., lbufr=0, lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, lrgroups=...) at zfac_par_m.F:182 > #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., la=, liw=, sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., ptrar=..., ldptrar=, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, icntl=, info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., slavef=-1889504640, comm_nodes=-2048052411, myid=, myid_nodes=-1683330500, bufr=..., lbufr=, lbufr_bytes=, zmumps_lbuf=, intarr=..., dblarr=..., root=, nelt=, frtptr=..., frtelt=..., comm_load=, ass_irecv=, seuil=, seuil_ldlt_niv2=, mem_distrib=, dkeep=, pivnul_list=..., lpn_list=, lrgroups=...) at zfac_b.F:243 > #10 0x00007f7308610ff7 in zmumps_fac_driver (id=) at zfac_driver.F:2421 > #11 0x00007f7308569256 in zmumps (id=) at zmumps_driver.F:1883 > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=, comm_f77=, n=, nblk=1, icntl=..., cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289 > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, A=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, mat=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., b_pet=..., jthisone=, isize=) at small_slepc_example_program.F:322 > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 > #26 0x00000000004023f2 in main () > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0
, argc=14, argv=0x7ffda7b024e8, init=, fini=, rtld_fini=, stack_end=0x7ffda7b024d8) at ../csu/libc-start.c:308 > #28 0x00000000004022fe in _start () > > From: Matthew Knepley > Sent: Tuesday, August 24, 2021 3:59 PM > To: dazza simplythebest > Cc: Jose E. Roman ; PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest wrote: > > Dear Matthew and Jose, > Apologies for the delayed reply, I had a couple of unforeseen days off this week. > Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS > to solve linear systems (the code is using a distributed MPI matrix to solve the generalised > non-Hermitian complex problem). > > I have tried the gdb debugger as per Matthew's suggestion. > Just to note in case someone else is following this that at first it didn't work (couldn't 'attach') , > but after some googling I found a tip suggesting the command; > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > which seemed to get it working. > > I then first ran the debugger on the small matrix case that worked. > That stopped in gdb almost immediately after starting execution > with a report regarding 'nanosleep.c': > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > However, issuing the 'cont' command again caused the program to run through to the end of the > execution w/out any problems, and with correct looking results, so I am guessing this error > is not particularly important. > > We do that on purpose when the debugger starts up. Typing 'cont' is correct. > > I then tried the same debugging procedure on the large matrix case that fails. > The code again stopped almost immediately after the start of execution with > the same nanosleep error as before, and I was able to set the program running > again with 'cont' (see full output below). I was running the code with 4 MPI processes, > and so had 4 gdb windows appear. Thereafter the code ran for sometime until completing the > matrix construction, and then one of the gdb process windows printed a > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > message. I then typed 'where' into this terminal but just received the message > No stack. > > I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays > on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed > the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you > have a large structure on the stack? > > Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you > the prompt and then "where". > > Thanks, > > Matt > > The other gdb windows basically seemed to be left in limbo until I issued the 'quit' > command in the SIGKILL, and then they vanished. > > I paste the full output from the gdb window that recorded the SIGKILL below here. > I guess it is necessary to somehow work out where the SIGKILL originates from ? > > Thanks once again, > Dan. > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > Copyright (C) 2020 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later > This is free software: you are free to change and redistribute it. 
> There is NO WARRANTY, to the extent permitted by law. > Type "show copying" and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from ./stab1.exe... > Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919 > Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... > Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for more, q to quit, c to continue without paging--cont > /intel64_lin/libmkl_intel_lp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... 
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... > (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > Reading symbols from /lib64/ld-linux-x86-64.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... 
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, rem=rem at entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > (gdb) cont > Continuing. > [New Thread 0x7f9e49c02780 (LWP 676559)] > [New Thread 0x7f9e49400800 (LWP 676560)] > [New Thread 0x7f9e48bfe880 (LWP 676562)] > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > [Thread 0x7f9e49400800 (LWP 676560) exited] > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > (gdb) where > No stack. > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > From: Matthew Knepley > Sent: Friday, August 20, 2021 2:12 PM > To: dazza simplythebest > Cc: Jose E. Roman ; PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest wrote: > Dear Jose, > Many thanks for your response, I have been investigating this issue with a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things > I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original > larger matrix but to lower accuracy. > > Results > > Small matrix (N= 21168) - everything good! > This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging > very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target > and also for a zero -eps_target. > > Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert > I have just double checked again that the code does run properly when we use the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) > and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the > machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large > higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line > with expectations. > > Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , > whether or not I use a non-zero -eps_target. For reference this is the command line used : > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it successfully calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). 
> By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message > as soon as EPSsolve is called. > > Hi Dan, > > It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with > > -start_in_debugger > > which should launch the debugger (usually), and then type > > cont > > to continue, and then > > where > > to get the stack trace when it crashes, or 'bt' on lldb. > > Thanks, > > Matt > > Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using > -eps_largest_real ? The fact that the program works and produces correct results > when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification > of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the > only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden > shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using > about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially > more memory for example ? > > I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, > the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > From: Jose E. Roman > Sent: Thursday, August 19, 2021 7:58 AM > To: dazza simplythebest > Cc: PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. > > In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 > > Jose > > > > El 19 ago 2021, a las 7:12, dazza simplythebest escribi?: > > > > Dear All, > > I am planning on using slepc to do a large number of eigenvalue calculations > > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > > and on smaller test problems everything is working well; the matrices are efficiently and > > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > > towards now solving the full-size 'production run' problems, and would appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > > although in other cases I will also be interested in finding the eigenvalues whose real part > > is close to zero. 
> > > > A) > > Calling slepc 's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly approaching convergence). > > > > B) > > On the same problem I have also tried a shift-invert transformation using the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > > In the case of A) I can see from watching the slepc videos that increasing ncv > > may help, but I am wondering , since 600 is a large number of iterations, whether there > > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command line options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From knepley at gmail.com Wed Aug 25 11:17:09 2021 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 25 Aug 2021 12:17:09 -0400 Subject: [petsc-users] laplace_equation In-Reply-To: References: Message-ID: On Wed, Aug 25, 2021 at 9:06 AM govind sharma wrote: > Hi, > > I want to solve 2D laplace equations at the starting level with petsc4py > in parallel using mpi4py. > > Any examples or tutorials? > https://gitlab.com/petsc/petsc/-/blob/main/src/binding/petsc4py/demo/poisson2d/poisson2d.py Thanks, Matt > Regards, > Govind Sharma > Phd scholar, Indian Institute of Technology, Delhi > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sayosale at hotmail.com Thu Aug 26 07:32:12 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Thu, 26 Aug 2021 12:32:12 +0000 Subject: [petsc-users] Improving efficiency of slepc usage -memory management when using shift-invert In-Reply-To: References: Message-ID: Dear Jose and Matthew, Many thanks for your assistance, this would seem to explain what the problem was. So judging by this test case, there seems to be a memory vs computational time tradeoff involved in choosing whether to shift-invert or not; the shift-invert will greatly reduce the number of required iterations ,but will require a higher memory cost ? 
I have been trying a few values of -st_mat_mumps_icntl_14 (and also the alternative -st_mat_mumps_icntl_23) today but have not yet been able to select one that fits onto the workstation I am using (although it seems that setting these parameters seems to guarantee that an error message is generated at least). Thus I will probably need to reduce the number of MPI processes and thereby reduce the memory requirement). In this regard the MUMPS documentation suggests that a hybrid MPI-OpenMP approach is optimum for their software, whereas I remember reading somewhere else that openmp threading was not a good choice for using PETSC, would you have any general advice on this ? I was thinking maybe that a version of slepc / petsc compiled against openmp, and with the number of threads set appropriately, but not explicitly using openmp directives in the user's code may be the way forward ? That way PETSC will (?) just ignore the threading whereas threading will be available to MUMPS when execution is passed to those routines ? Many thanks once again, Dan. ________________________________ From: Jose E. Roman Sent: Wednesday, August 25, 2021 1:40 PM To: dazza simplythebest Cc: PETSc Subject: Re: [petsc-users] Improving efficiency of slepc usage MUMPS documentation (section 8) indicates that the meaning of INFOG(1)=-9 is insuficient workspace. Try running with -st_mat_mumps_icntl_14 where is the percentage in which you want to increase the workspace, e.g. 50 or 100 or more. See ex43.c for an example showing how to set this option in code. Jose > El 25 ago 2021, a las 14:11, dazza simplythebest escribi?: > > > > From: dazza simplythebest > Sent: Wednesday, August 25, 2021 12:08 PM > To: Matthew Knepley > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > ?Dear Matthew and Jose, > I have derived a smaller program from the original program by constructing > matrices of the same size, but filling their entries randomly instead of computing the correct > fluid dynamics values just to allow faster experimentation. This modified code's behaviour seems > to be similar, with the code again failing for the large matrix case with the SIGKILL error, so I first report > results from that code here. Firstly I can confirm that I am using Fortran , and I am compiling with the > intel compiler, which it seems places automatic arrays on the stack. The stacksize, as determined > by ulimit -a, is reported to be : > stack size (kbytes, -s) 8192 > > [1] Okay, so I followed your suggestion and used ctrl-c followed by 'where' in one of the non-SIGKILL gdb windows. > I have pasted the output into the bottom of this email (see [1] output) - it does look like the problem occurs somewhere in the call > to the MUMPS solver ? > > [2] I have also today gained access to another workstation, and so have tried running the (original) code on that machine. > This new machine has two (more powerful) CPU nodes and a larger memory (both machines feature Intel Xeon processors). > On this new machine the large matrix case again failed with the familiar SIGKILL report when I used 16 or 12 MPI > processes, ran to the end w/out error for 4 or 6 MPI processes, and failed but with a PETSC error message > when I used 8 MPI processes, which I have pasted below (see [2] output). Does this point to some sort of resource > demand that exceeds some limit as the number of MPI processes increases ? > > Many thanks once again, > Dan. 
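On the memory side of Dan's questions above (how many MPI processes will fit, and whether -st_mat_mumps_icntl_23 is a better handle than -st_mat_mumps_icntl_14), one option is to let MUMPS report its own estimates from a run that does complete, for instance the 4- or 6-process runs that finish, and size the production run from those numbers. The following is a hedged sketch only: MatMumpsGetInfog() is the PETSc accessor, but the INFOG indices used here (16 for the estimated MB on the most loaded process, 17 for the estimated total) are quoted from memory and should be checked against section 8 of the MUMPS users' guide. For the hybrid MPI/OpenMP question, it may also be worth looking at the MATSOLVERMUMPS manual page for the -mat_mumps_use_omp_threads option, which (if memory serves) is available in builds configured --with-openmp, as this one is.

#include <slepceps.h>

/* Sketch: query MUMPS memory estimates after EPSSetUp() or EPSSolve() has
   succeeded, so the analysis phase has run; INFOG indices to be verified. */
static PetscErrorCode ReportMumpsMemory(EPS eps)
{
  ST             st;
  KSP            ksp;
  PC             pc;
  Mat            F;
  PetscInt       maxmb,totmb;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
  ierr = STGetKSP(st,&ksp);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCFactorGetMatrix(pc,&F);CHKERRQ(ierr);
  ierr = MatMumpsGetInfog(F,16,&maxmb);CHKERRQ(ierr);   /* estimated MB on the most loaded process (check manual) */
  ierr = MatMumpsGetInfog(F,17,&totmb);CHKERRQ(ierr);   /* estimated MB summed over all processes (check manual)  */
  ierr = PetscPrintf(PETSC_COMM_WORLD,"MUMPS memory estimates: %D MB max per process, %D MB total\n",maxmb,totmb);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

A per-process figure obtained this way on a run that fits gives something concrete to compare against the 148 GB available when deciding how many ranks the full-size problem can use.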
> > [2] output > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [0]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [1]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [1]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [1]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [1]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [2]PETSC ERROR: Error in external library > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [2]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [2]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [2]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [2]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [2]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [3]PETSC ERROR: Error in external library > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [3]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [3]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [3]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [3]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [3]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [3]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [3]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: --------------------- Error Message 
-------------------------------------------------------------- > [4]PETSC ERROR: Error in external library > [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [4]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [4]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [4]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [4]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [4]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [4]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [4]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [4]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [5]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [5]PETSC ERROR: Error in external library > [5]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [5]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [5]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [5]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [5]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [5]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [5]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [5]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [5]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [5]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [6]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [6]PETSC ERROR: Error in external library > [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045 > > [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [6]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [6]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [6]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [6]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [6]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [6]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [6]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [6]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [6]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [7]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [7]PETSC ERROR: Error in external library > [7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21841925 > > [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [7]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [7]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [7]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [7]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [7]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [7]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [7]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [7]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [7]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [0]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [0]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [0]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [0]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [0]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [0]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [0]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [0]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [1]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [1]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [1]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [1]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [2]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [2]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [2]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [2]PETSC ERROR: #9 EPSSolve() at 
/data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [3]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > > > [1] output > > Continuing. > [New Thread 0x7f6f5b2d2780 (LWP 794037)] > [New Thread 0x7f6f5aad0800 (LWP 794040)] > [New Thread 0x7f6f5a2ce880 (LWP 794041)] > ^C > Thread 1 "my.exe" received signal SIGINT, Interrupt. > 0x00007f72904927b0 in ofi_fastlock_release_noop () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > (gdb) where > #0 0x00007f72904927b0 in ofi_fastlock_release_noop () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #1 0x00007f729049354b in ofi_cq_readfrom () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #2 0x00007f728ffe8f0e in rxm_ep_do_progress () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, > comm=1, flag=0x0, status=0xffffffffffffffff) > at /usr/include/rdma/fi_tagged.h:109 > #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, > v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) > at ../../src/binding/fortran/mpif_h/iprobef.c:276 > #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, > blocking=, > > --Type for more, q to quit, c to continue without paging--cont > irecv=, message_received=, msgsou=1, msgtag=-1, status=..., bufr=..., lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, root=, opassw=0, opeliw=0, itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, lrgroups=...) 
at zfac_process_message.F:730 > #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., liw=, a=..., la=, nstk_steps=..., nbprocfils=..., nd=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., nstepsdone=1690339657, opass=, opeli=, nelva=50400, comp=259581, maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv=, noffnegpv=, nb22t1=, nb22t2=, nbtiny=, det_exp=, det_mant=, det_sign=, ptrist=..., ptrast=..., pimaster=..., pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., lpool=, rinfo=, posfac=, iwpos=, lrlu=, iptrlu=, lrlus=, leaf=, nbroot=, nbrtot=, uu=, icntl=, ptlust=..., ptrfac=..., info=, keep=, keep8=, procnode_steps=..., slavef=, myid=, comm_nodes=, myid_nodes=, bufr=..., lbufr=0, lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, lrgroups=...) at zfac_par_m.F:182 > #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., la=, liw=, sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., ptrar=..., ldptrar=, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, icntl=, info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., slavef=-1889504640, comm_nodes=-2048052411, myid=, myid_nodes=-1683330500, bufr=..., lbufr=, lbufr_bytes=, zmumps_lbuf=, intarr=..., dblarr=..., root=, nelt=, frtptr=..., frtelt=..., comm_load=, ass_irecv=, seuil=, seuil_ldlt_niv2=, mem_distrib=, dkeep=, pivnul_list=..., lpn_list=, lrgroups=...) at zfac_b.F:243 > #10 0x00007f7308610ff7 in zmumps_fac_driver (id=) at zfac_driver.F:2421 > #11 0x00007f7308569256 in zmumps (id=) at zmumps_driver.F:1883 > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=, comm_f77=, n=, nblk=1, icntl=..., cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289 > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, A=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, mat=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., b_pet=..., jthisone=, isize=) at small_slepc_example_program.F:322 > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 > #26 0x00000000004023f2 in main () > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0
, argc=14, argv=0x7ffda7b024e8, init=, fini=, rtld_fini=, stack_end=0x7ffda7b024d8) at ../csu/libc-start.c:308 > #28 0x00000000004022fe in _start () > > From: Matthew Knepley > Sent: Tuesday, August 24, 2021 3:59 PM > To: dazza simplythebest > Cc: Jose E. Roman ; PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest wrote: > > Dear Matthew and Jose, > Apologies for the delayed reply, I had a couple of unforeseen days off this week. > Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS > to solve linear systems (the code is using a distributed MPI matrix to solve the generalised > non-Hermitian complex problem). > > I have tried the gdb debugger as per Matthew's suggestion. > Just to note in case someone else is following this that at first it didn't work (couldn't 'attach') , > but after some googling I found a tip suggesting the command; > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > which seemed to get it working. > > I then first ran the debugger on the small matrix case that worked. > That stopped in gdb almost immediately after starting execution > with a report regarding 'nanosleep.c': > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > However, issuing the 'cont' command again caused the program to run through to the end of the > execution w/out any problems, and with correct looking results, so I am guessing this error > is not particularly important. > > We do that on purpose when the debugger starts up. Typing 'cont' is correct. > > I then tried the same debugging procedure on the large matrix case that fails. > The code again stopped almost immediately after the start of execution with > the same nanosleep error as before, and I was able to set the program running > again with 'cont' (see full output below). I was running the code with 4 MPI processes, > and so had 4 gdb windows appear. Thereafter the code ran for sometime until completing the > matrix construction, and then one of the gdb process windows printed a > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > message. I then typed 'where' into this terminal but just received the message > No stack. > > I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays > on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed > the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you > have a large structure on the stack? > > Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you > the prompt and then "where". > > Thanks, > > Matt > > The other gdb windows basically seemed to be left in limbo until I issued the 'quit' > command in the SIGKILL, and then they vanished. > > I paste the full output from the gdb window that recorded the SIGKILL below here. > I guess it is necessary to somehow work out where the SIGKILL originates from ? > > Thanks once again, > Dan. > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > Copyright (C) 2020 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later > This is free software: you are free to change and redistribute it. 
> There is NO WARRANTY, to the extent permitted by law. > Type "show copying" and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from ./stab1.exe... > Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919 > Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... > Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for more, q to quit, c to continue without paging--cont > /intel64_lin/libmkl_intel_lp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... 
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... > (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > Reading symbols from /lib64/ld-linux-x86-64.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... 
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, rem=rem at entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > (gdb) cont > Continuing. > [New Thread 0x7f9e49c02780 (LWP 676559)] > [New Thread 0x7f9e49400800 (LWP 676560)] > [New Thread 0x7f9e48bfe880 (LWP 676562)] > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > [Thread 0x7f9e49400800 (LWP 676560) exited] > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > (gdb) where > No stack. > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > From: Matthew Knepley > Sent: Friday, August 20, 2021 2:12 PM > To: dazza simplythebest > Cc: Jose E. Roman ; PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest wrote: > Dear Jose, > Many thanks for your response, I have been investigating this issue with a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things > I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original > larger matrix but to lower accuracy. > > Results > > Small matrix (N= 21168) - everything good! > This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging > very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target > and also for a zero -eps_target. > > Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert > I have just double checked again that the code does run properly when we use the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) > and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the > machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large > higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line > with expectations. > > Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , > whether or not I use a non-zero -eps_target. For reference this is the command line used : > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it successfully calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). 
> By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message > as soon as EPSsolve is called. > > Hi Dan, > > It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with > > -start_in_debugger > > which should launch the debugger (usually), and then type > > cont > > to continue, and then > > where > > to get the stack trace when it crashes, or 'bt' on lldb. > > Thanks, > > Matt > > Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using > -eps_largest_real ? The fact that the program works and produces correct results > when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification > of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the > only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden > shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using > about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially > more memory for example ? > > I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, > the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > From: Jose E. Roman > Sent: Thursday, August 19, 2021 7:58 AM > To: dazza simplythebest > Cc: PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. > > In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 > > Jose > > > > El 19 ago 2021, a las 7:12, dazza simplythebest escribi?: > > > > Dear All, > > I am planning on using slepc to do a large number of eigenvalue calculations > > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > > and on smaller test problems everything is working well; the matrices are efficiently and > > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > > towards now solving the full-size 'production run' problems, and would appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > > although in other cases I will also be interested in finding the eigenvalues whose real part > > is close to zero. 
> > > > A) > > Calling slepc 's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly approaching convergence). > > > > B) > > On the same problem I have also tried a shift-invert transformation using the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > > In the case of A) I can see from watching the slepc videos that increasing ncv > > may help, but I am wondering , since 600 is a large number of iterations, whether there > > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command line options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Thu Aug 26 10:29:44 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Thu, 26 Aug 2021 10:29:44 -0500 Subject: [petsc-users] Improving efficiency of slepc usage -memory management when using shift-invert In-Reply-To: References: Message-ID: Hello, Dan, You might want to have a look the manual at https://petsc.org/release/docs/manualpages/Mat/MATSOLVERMUMPS.html Thanks. --Junchao Zhang On Thu, Aug 26, 2021 at 7:32 AM dazza simplythebest wrote: > Dear Jose and Matthew, > Many thanks for your assistance, this would seem to > explain what the problem was. > So judging by this test case, there seems to be a memory vs computational > time tradeoff involved > in choosing whether to shift-invert or not; the shift-invert will greatly > reduce the > number of required iterations ,but will require a higher memory cost ? > I have been trying a few values of -st_mat_mumps_icntl_14 (and also the > alternative > -st_mat_mumps_icntl_23) today but have not yet been able to select one > that fits onto the > workstation I am using (although it seems that setting these parameters > seems to guarantee > that an error message is generated at least). > > Thus I will probably need to reduce the number of MPI > processes and thereby reduce the memory requirement). In this regard the > MUMPS documentation > suggests that a hybrid MPI-OpenMP approach is optimum for their software, > whereas I remember reading > somewhere else that openmp threading was not a good choice for using > PETSC, would you have any > general advice on this ? 
I was thinking maybe that a version of slepc / > petsc compiled against openmp, > and with the number of threads set appropriately, but not explicitly > using openmp directives in > the user's code may be the way forward ? That way PETSC will (?) just > ignore the threading whereas > threading will be available to MUMPS when execution is passed to those > routines ? > > Many thanks once again, > Dan. > > > > ------------------------------ > *From:* Jose E. Roman > *Sent:* Wednesday, August 25, 2021 1:40 PM > *To:* dazza simplythebest > *Cc:* PETSc > *Subject:* Re: [petsc-users] Improving efficiency of slepc usage > > MUMPS documentation (section 8) indicates that the meaning of INFOG(1)=-9 > is insuficient workspace. Try running with > -st_mat_mumps_icntl_14 > where is the percentage in which you want to increase the > workspace, e.g. 50 or 100 or more. > > See ex43.c for an example showing how to set this option in code. > > Jose > > > > El 25 ago 2021, a las 14:11, dazza simplythebest > escribi?: > > > > > > > > From: dazza simplythebest > > Sent: Wednesday, August 25, 2021 12:08 PM > > To: Matthew Knepley > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > ?Dear Matthew and Jose, > > I have derived a smaller > program from the original program by constructing > > matrices of the same size, but filling their entries randomly instead of > computing the correct > > fluid dynamics values just to allow faster experimentation. This > modified code's behaviour seems > > to be similar, with the code again failing for the large matrix case > with the SIGKILL error, so I first report > > results from that code here. Firstly I can confirm that I am using > Fortran , and I am compiling with the > > intel compiler, which it seems places automatic arrays on the stack. > The stacksize, as determined > > by ulimit -a, is reported to be : > > stack size (kbytes, -s) 8192 > > > > [1] Okay, so I followed your suggestion and used ctrl-c followed by > 'where' in one of the non-SIGKILL gdb windows. > > I have pasted the output into the bottom of this email (see [1] output) > - it does look like the problem occurs somewhere in the call > > to the MUMPS solver ? > > > > [2] I have also today gained access to another workstation, and so have > tried running the (original) code on that machine. > > This new machine has two (more powerful) CPU nodes and a larger memory > (both machines feature Intel Xeon processors). > > On this new machine the large matrix case again failed with the familiar > SIGKILL report when I used 16 or 12 MPI > > processes, ran to the end w/out error for 4 or 6 MPI processes, and > failed but with a PETSC error message > > when I used 8 MPI processes, which I have pasted below (see [2] > output). Does this point to some sort of resource > > demand that exceeds some limit as the number of MPI processes increases ? > > > > Many thanks once again, > > Dan. > > > > [2] output > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Error in external library > > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [0]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: Error in external library > > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [1]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [1]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [1]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [1]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [1]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [2]PETSC ERROR: Error in external library > > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [2]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [2]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [2]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [2]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [2]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [3]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [3]PETSC ERROR: Error in external library > > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [3]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [3]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [3]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [3]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [3]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [3]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [3]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [3]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [4]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [4]PETSC ERROR: Error in external library > > [4]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [4]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [4]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [4]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [4]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [4]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [4]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [4]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [4]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [4]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [5]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [5]PETSC ERROR: Error in external library > > [5]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [5]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [5]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [5]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [5]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [5]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [5]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [5]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [5]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [5]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [6]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [6]PETSC ERROR: Error in external library > > [6]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=21891045 > > > > [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [6]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [6]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [6]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [6]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [6]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [6]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [6]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [6]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [6]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [7]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [7]PETSC ERROR: Error in external library > > [7]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=21841925 > > > > [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [7]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [7]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [7]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [7]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [7]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [7]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [7]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [7]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [7]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [0]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [0]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [0]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [0]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [0]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [0]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [0]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [0]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [1]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [1]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [1]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [1]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [2]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [2]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [2]PETSC ERROR: #8 
EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [2]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [3]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > > > > > > > [1] output > > > > Continuing. > > [New Thread 0x7f6f5b2d2780 (LWP 794037)] > > [New Thread 0x7f6f5aad0800 (LWP 794040)] > > [New Thread 0x7f6f5a2ce880 (LWP 794041)] > > ^C > > Thread 1 "my.exe" received signal SIGINT, Interrupt. > > 0x00007f72904927b0 in ofi_fastlock_release_noop () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > (gdb) where > > #0 0x00007f72904927b0 in ofi_fastlock_release_noop () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > #1 0x00007f729049354b in ofi_cq_readfrom () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > #2 0x00007f728ffe8f0e in rxm_ep_do_progress () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, > > comm=1, flag=0x0, status=0xffffffffffffffff) > > at /usr/include/rdma/fi_tagged.h:109 > > #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, > > v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) > > at ../../src/binding/fortran/mpif_h/iprobef.c:276 > > #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, > > blocking= 0x1>, > > > > --Type for more, q to quit, c to continue without paging--cont > > irecv=, > message_received= 0xffffffffffffffff>, msgsou=1, msgtag=-1, status=..., bufr=..., > lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, > iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, > lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, > ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., > pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, > nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, > root= 766016 bytes, which is more than max-value-size>, opassw=0, opeliw=0, > itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., > intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., > frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., > istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, > lrgroups=...) 
at zfac_process_message.F:730 > > #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., > liw=, a=..., > la= 0xffffffffffffffff>, nstk_steps=..., nbprocfils=..., nd=..., fils=..., > step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., > tab_pos_in_pere=..., nstepsdone=1690339657, opass= Cannot access memory at address 0x5>, opeli= access memory at address 0x0>, nelva=50400, comp=259581, > maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv= Cannot access memory at address 0x2>, noffnegpv= Cannot access memory at address 0x0>, nb22t1= Cannot access memory at address 0x0>, nb22t2= Cannot access memory at address 0x0>, nbtiny= Cannot access memory at address 0x0>, det_exp= Cannot access memory at address 0x0>, det_mant= Cannot access memory at address 0x0>, det_sign= Cannot access memory at address 0x0>, ptrist=..., ptrast=..., pimaster=..., > pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., > lpool=, > rinfo=, > posfac=, > iwpos=, > lrlu=, > iptrlu=, > lrlus=, > leaf=, > nbroot=, > nbrtot=, > uu=, > icntl=, > ptlust=..., ptrfac=..., info= at address 0x0>, keep= address 0x3ff0000000000000>, keep8= memory at address 0x0>, procnode_steps=..., slavef= Cannot access memory at address 0x4ffffffff>, myid= Cannot access memory at address 0xffffffff>, comm_nodes= variable: Cannot access memory at address 0x0>, myid_nodes= variable: Cannot access memory at address 0x0>, bufr=..., lbufr=0, > lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, > frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, > seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, > mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, > lrgroups=...) at zfac_par_m.F:182 > > #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., > la=, > liw=, > sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., > frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., > ptrar=..., ldptrar= 0x0>, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., > rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, > icntl=, > info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., > slavef=-1889504640, comm_nodes=-2048052411, myid= Cannot access memory at address 0x81160>, myid_nodes=-1683330500, bufr=..., > lbufr=, > lbufr_bytes= 0xc4e0>, zmumps_lbuf= address 0x4>, intarr=..., dblarr=..., root= access memory at address 0x11dbec>, nelt= access memory at address 0x3>, frtptr=..., frtelt=..., comm_load= reading variable: Cannot access memory at address 0x0>, ass_irecv= reading variable: Cannot access memory at address 0x0>, seuil= reading variable: Cannot access memory at address 0x0>, > seuil_ldlt_niv2= 0x0>, mem_distrib= 0x0>, dkeep=, > pivnul_list=..., lpn_list= address 0x0>, lrgroups=...) 
at zfac_b.F:243 > > #10 0x00007f7308610ff7 in zmumps_fac_driver (id= value of type `zmumps_struc' requires 386095520 bytes, which is more than > max-value-size>) at zfac_driver.F:2421 > > #11 0x00007f7308569256 in zmumps (id= type `zmumps_struc' requires 386095520 bytes, which is more than > max-value-size>) at zmumps_driver.F:1883 > > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par= variable: Cannot access memory at address 0x1>, comm_f77= variable: Cannot access memory at address 0x0>, n= Cannot access memory at address 0xffffffffffffffff>, nblk=1, icntl=..., > cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, > jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, > irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., > a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, > a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, > perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, > info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, > size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., > schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, > rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, > rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., > rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., > irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, > nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, > schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., > ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., > tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, > save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289 > > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 > > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, > A=0x7ffda7afdae0, info=0x1) at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 > > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, > mat=0x7ffda7afdae0, info=0x1) at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, > __ierr=0x7ffda7afdae0) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 > > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., > b_pet=..., jthisone= address 0x1>, isize= address 0x0>) at small_slepc_example_program.F:322 > > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 > > #26 0x00000000004023f2 in main () > > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0
, > argc=14, argv=0x7ffda7b024e8, init=, fini=, > rtld_fini=, stack_end=0x7ffda7b024d8) at > ../csu/libc-start.c:308 > > #28 0x00000000004022fe in _start () > > > > From: Matthew Knepley > > Sent: Tuesday, August 24, 2021 3:59 PM > > To: dazza simplythebest > > Cc: Jose E. Roman ; PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest < > sayosale at hotmail.com> wrote: > > > > Dear Matthew and Jose, > > Apologies for the delayed reply, I had a couple of unforeseen days > off this week. > > Firstly regarding Jose's suggestion re: MUMPS, the program is already > using MUMPS > > to solve linear systems (the code is using a distributed MPI matrix to > solve the generalised > > non-Hermitian complex problem). > > > > I have tried the gdb debugger as per Matthew's suggestion. > > Just to note in case someone else is following this that at first it > didn't work (couldn't 'attach') , > > but after some googling I found a tip suggesting the command; > > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > > which seemed to get it working. > > > > I then first ran the debugger on the small matrix case that worked. > > That stopped in gdb almost immediately after starting execution > > with a report regarding 'nanosleep.c': > > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > > However, issuing the 'cont' command again caused the program to run > through to the end of the > > execution w/out any problems, and with correct looking results, so I am > guessing this error > > is not particularly important. > > > > We do that on purpose when the debugger starts up. Typing 'cont' is > correct. > > > > I then tried the same debugging procedure on the large matrix case that > fails. > > The code again stopped almost immediately after the start of execution > with > > the same nanosleep error as before, and I was able to set the program > running > > again with 'cont' (see full output below). I was running the code with > 4 MPI processes, > > and so had 4 gdb windows appear. Thereafter the code ran for sometime > until completing the > > matrix construction, and then one of the gdb process windows printed a > > Program terminated with signal SIGKILL, Killed. > > The program no longer exists. > > message. I then typed 'where' into this terminal but just received the > message > > No stack. > > > > I have only seen this behavior one other time, and it was with Fortran. > Fortran allows you to declare really big arrays > > on the stack by putting them at the start of a function (rather than F90 > malloc). When I had one of those arrays exceed > > the stack space, I got this kind of an error where everything is > destroyed rather than just stopping. Could it be that you > > have a large structure on the stack? > > > > Second, you can at least look at the stack for the processes that were > not killed. You type Ctrl-C, which should give you > > the prompt and then "where". > > > > Thanks, > > > > Matt > > > > The other gdb windows basically seemed to be left in limbo until I > issued the 'quit' > > command in the SIGKILL, and then they vanished. > > > > I paste the full output from the gdb window that recorded the SIGKILL > below here. > > I guess it is necessary to somehow work out where the SIGKILL originates > from ? > > > > Thanks once again, > > Dan. 
> > > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > > Copyright (C) 2020 Free Software Foundation, Inc. > > License GPLv3+: GNU GPL version 3 or later < > http://gnu.org/licenses/gpl.html> > > This is free software: you are free to change and redistribute it. > > There is NO WARRANTY, to the extent permitted by law. > > Type "show copying" and "show warranty" for details. > > This GDB was configured as "x86_64-linux-gnu". > > Type "show configuration" for configuration details. > > For bug reporting instructions, please see: > > . > > Find the GDB manual and other documentation resources online at: > > . > > > > For help, type "help". > > Type "apropos word" to search for commands related to "word"... > > Reading symbols from ./stab1.exe... > > Attaching to program: > /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, > process 675919 > > Reading symbols from > /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... > > Reading symbols from > /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for > more, q to quit, c to continue without paging--cont > > /intel64_lin/libmkl_intel_lp64.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > > Reading symbols from > /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > > Reading symbols from > /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > > [Thread debugging using libthread_db enabled] > > Using host libthread_db library > "/lib/x86_64-linux-gnu/libthread_db.so.1". > > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > > Reading symbols from > /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... 
> > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... > > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... > > (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > > (No debugging symbols found in > /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > > Reading symbols from /lib64/ld-linux-x86-64.so.2... > > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... > > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... 
> > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... > > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, > clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, > rem=rem at entry=0x7ffdc641a9a0) at > ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or > directory. > > (gdb) cont > > Continuing. > > [New Thread 0x7f9e49c02780 (LWP 676559)] > > [New Thread 0x7f9e49400800 (LWP 676560)] > > [New Thread 0x7f9e48bfe880 (LWP 676562)] > > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > > [Thread 0x7f9e49400800 (LWP 676560) exited] > > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > > > Program terminated with signal SIGKILL, Killed. > > The program no longer exists. > > (gdb) where > > No stack. > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - > > > > From: Matthew Knepley > > Sent: Friday, August 20, 2021 2:12 PM > > To: dazza simplythebest > > Cc: Jose E. Roman ; PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest < > sayosale at hotmail.com> wrote: > > Dear Jose, > > Many thanks for your response, I have been investigating this issue > with a few more calculations > > today, hence the slightly delayed response. > > > > The problem is actually derived from a fluid dynamics problem, so to > allow an easier exploration of things > > I first downsized the resolution of the underlying fluid solver while > keeping all the physical parameters > > the same - i.e. I would get a smaller matrix that should be solving the > same physical problem as the original > > larger matrix but to lower accuracy. > > > > Results > > > > Small matrix (N= 21168) - everything good! > > This converged when using the -eps_largest_real approach (taking 92 > iterations for nev=10, > > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert > approach, converging > > very impressively in a single iteration ! Interestingly it did this both > for a non-zero -eps_target > > and also for a zero -eps_target. > > > > Large matrix (N=50400)- works for -eps_largest_real , fails for st_type > sinvert > > I have just double checked again that the code does run properly when we > use the -eps_largest_real > > option - indeed I ran it with a small nev and large tolerance (nev = 4, > tol= -eps_tol 5.0e-4 , ncv = 300) > > and with these parameters convergence was obtained in 164 iterations, > which took 6 hours on the > > machine I was running it on. 
Furthermore the eigenvalues seem to be > ballpark correct; for this large > > higher resolution case (although with lower slepc tolerance) we obtain > 1789.56816314173 -4724.51319554773i > > as the eigenvalue with largest real part, while the smaller matrix > (same physical problem but at lower resolution case) > > found this eigenvalue to be 1831.11845726501 -4787.54519511345i, which > means the agreement is in line > > with expectations. > > > > Unfortunately, though, the code does still crash when I try to do > shift-invert for the large matrix case, > > whether or not I use a non-zero -eps_target. For reference this is the > command line used: > > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 > -st_type sinvert -eps_monitor :monitor_output05.txt > > To be precise the code crashes soon after calling EPSSolve (it > successfully calls > > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and > EPSSetFromOptions). > > By crashes I mean that I do not even get any error messages from > slepc/PETSC, and do not even get the > > 'EPS Object: 16 MPI processes' message - I simply get an MPI/Fortran > 'KILLED BY SIGNAL: 9 (Killed)' message > > as soon as EPSSolve is called. > > > > Hi Dan, > > > > It would help track this error down if we had a stack trace. You can get > a stack trace from the debugger. You run with > > > > -start_in_debugger > > > > which should launch the debugger (usually), and then type > > > > cont > > > > to continue, and then > > > > where > > > > to get the stack trace when it crashes, or 'bt' on lldb. > > > > Thanks, > > > > Matt > > > > Do you have any ideas as to why this larger matrix case should fail when > using shift-invert but succeed when using > > -eps_largest_real ? The fact that the program works and produces correct > results > > when using the -eps_largest_real option suggests that there is probably > nothing wrong with the specification > > of the problem or the matrices ? It is strange how there is no error > message from slepc / Petsc ... the > > only idea I have at the moment is that perhaps max memory has been > exceeded, which could cause such a sudden > > shutdown? For your reference when running the large matrix case with the > -eps_largest_real option I am using > > about 36 GB of the 148GB available on this machine - does the > shift-invert approach require substantially > > more memory for example ? > > > > I would be very grateful if you have any suggestions to resolve this > issue or even ways to clarify it further, > > the performance I have seen with the shift-invert for the small matrix > is so impressive it would be great to > > get that working for the full-size problem. > > > > Many thanks and best wishes, > > Dan. > > > > > > > > From: Jose E. Roman > > Sent: Thursday, August 19, 2021 7:58 AM > > To: dazza simplythebest > > Cc: PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > In A) convergence may be slow, especially if the wanted eigenvalues have > small magnitude. I would not say 600 iterations is a lot, you probably need > many more. In most cases, approach B) is better because it improves > convergence of eigenvalues close to the target, but it requires prior > knowledge of your spectrum distribution in order to choose an appropriate > target. > > > > In B) what do you mean that it crashes?
If you get an error about > factorization, it means that your A-matrix is singular. In that case, try > using a nonzero target -eps_target 0.1 > > Jose > > > > > El 19 ago 2021, a las 7:12, dazza simplythebest > escribió: > > > > > > Dear All, > > > I am planning on using slepc to do a large number of > eigenvalue calculations > > > of a generalized eigenvalue problem, called from a program written in > Fortran using MPI. > > > Thus far I have successfully installed the slepc/PETSc software, both > locally and on a cluster, > > > and on smaller test problems everything is working well; the matrices > are efficiently and > > > correctly constructed and slepc returns the correct spectrum. I am > just now starting to move > > > towards solving the full-size 'production run' problems, and would > appreciate some > > > general advice on how to improve the solver's performance. > > > > > > In particular, I am currently trying to solve the problem Ax = lambda > Bx whose matrices > > > are of size 50000 (this is the smallest 'production run' problem I > will be tackling), and are > > > complex, non-Hermitian. In most cases I aim to find the eigenvalues > with the largest real part, > > > although in other cases I will also be interested in finding the > eigenvalues whose real part > > > is close to zero. > > > > > > A) > > > Calling slepc's EPS solver with the following options: > > > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 > -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > > > > led to the code successfully running, but failing to find any > eigenvalues within the maximum 600 iterations > > > (examining the monitor output it did appear to be very slowly > approaching convergence). > > > > > > B) > > > On the same problem I have also tried a shift-invert transformation > using the options > > > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > > > -in this case the code crashed at the point it tried to call slepc, so > perhaps I have incorrectly specified these options ? > > > > > > > > > Does anyone have any suggestions as to how to improve this performance > (or find out more about the problem) ? > > > In the case of A) I can see from watching the slepc videos that > increasing ncv > > > may help, but I am wondering, since 600 is a large number of > iterations, whether there > > > may be something else going on - e.g. perhaps some alternative > preconditioner may help ? > > > In the case of B), I guess there must be some mistake in these command > line options? > > > Again, any advice will be greatly appreciated. > > > Best wishes, Dan. > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From knepley at gmail.com Thu Aug 26 10:53:40 2021 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 26 Aug 2021 11:53:40 -0400 Subject: [petsc-users] Improving efficiency of slepc usage - memory management when using shift-invert In-Reply-To: References: Message-ID: On Thu, Aug 26, 2021 at 8:32 AM dazza simplythebest wrote: > Dear Jose and Matthew, > Many thanks for your assistance, this would seem to > explain what the problem was. > So judging by this test case, there seems to be a memory vs computational > time tradeoff involved > in choosing whether to shift-invert or not; the shift-invert will greatly > reduce the > number of required iterations, but will require a higher memory cost ? > I have been trying a few values of -st_mat_mumps_icntl_14 (and also the > alternative > -st_mat_mumps_icntl_23) today but have not yet been able to select one > that fits onto the > workstation I am using (although setting these parameters > does at least seem to guarantee > that an error message is generated). > > Thus I will probably need to reduce the number of MPI > processes (and thereby reduce the memory requirement). In this regard the > MUMPS documentation > suggests that a hybrid MPI-OpenMP approach is optimum for their software, > whereas I remember reading > somewhere else that openmp threading was not a good choice for using > PETSc; would you have any > general advice on this ? > Memory does not really track the number of MPI processes. MUMPS does a lot of things redundantly. For minimum memory, I would suggest trying SuperLU_dist: --download-superlu_dist. I do not think OpenMP will have much influence at all. Thanks, Matt > I was thinking maybe that a version of slepc / petsc compiled against > openmp, > and with the number of threads set appropriately, but not explicitly > using openmp directives in > the user's code may be the way forward ? That way PETSc will (?) just > ignore the threading whereas > threading will be available to MUMPS when execution is passed to those > routines ? > > Many thanks once again, > Dan. > > > > ------------------------------ > *From:* Jose E. Roman > *Sent:* Wednesday, August 25, 2021 1:40 PM > *To:* dazza simplythebest > *Cc:* PETSc > *Subject:* Re: [petsc-users] Improving efficiency of slepc usage > > MUMPS documentation (section 8) indicates that the meaning of INFOG(1)=-9 > is insufficient workspace. Try running with > -st_mat_mumps_icntl_14 set to the percentage by which you want to increase the > workspace, e.g. 50 or 100 or more. > > See ex43.c for an example showing how to set this option in code. > > Jose > > > > El 25 ago 2021, a las 14:11, dazza simplythebest > escribió: > > > > > > > > From: dazza simplythebest > > Sent: Wednesday, August 25, 2021 12:08 PM > > To: Matthew Knepley > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > Dear Matthew and Jose, > > I have derived a smaller > program from the original program by constructing > > matrices of the same size, but filling their entries randomly instead of > computing the correct > > fluid dynamics values just to allow faster experimentation. This > modified code's behaviour seems > > to be similar, with the code again failing for the large matrix case > with the SIGKILL error, so I first report > > results from that code here. Firstly I can confirm that I am using > Fortran, and I am compiling with the > > Intel compiler, which it seems places automatic arrays on the stack.
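> > To illustrate the kind of declaration I mean (this is only a schematic sketch written for this email, not code taken from my actual program), an automatic array is a local array dimensioned by a dummy argument, which as far as I understand ifort places on the stack, whereas an allocatable array is taken from the heap:
> >
> > subroutine work_arrays(n)
> >   implicit none
> >   integer, intent(in)          :: n
> >   complex(kind=8)              :: a_auto(n,n)   ! automatic array - storage is taken from the stack
> >   complex(kind=8), allocatable :: a_heap(:,:)   ! allocatable array - storage is taken from the heap
> >   allocate(a_heap(n,n))
> >   a_auto = (0.0d0, 0.0d0)
> >   a_heap = (0.0d0, 0.0d0)
> >   deallocate(a_heap)
> > end subroutine work_arrays
> >
> > For matrices of the size considered here even a single such n x n complex array would be vastly larger than the stack limit I quote below, so if my code does contain work arrays declared in the first style that could well be the culprit (I gather ifort also has a -heap-arrays flag that moves automatic arrays onto the heap, although I have not yet tried it).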
> The stacksize, as determined > > by ulimit -a, is reported to be : > > stack size (kbytes, -s) 8192 > > > > [1] Okay, so I followed your suggestion and used ctrl-c followed by > 'where' in one of the non-SIGKILL gdb windows. > > I have pasted the output into the bottom of this email (see [1] output) > - it does look like the problem occurs somewhere in the call > > to the MUMPS solver ? > > > > [2] I have also today gained access to another workstation, and so have > tried running the (original) code on that machine. > > This new machine has two (more powerful) CPU nodes and a larger memory > (both machines feature Intel Xeon processors). > > On this new machine the large matrix case again failed with the familiar > SIGKILL report when I used 16 or 12 MPI > > processes, ran to the end w/out error for 4 or 6 MPI processes, and > failed but with a PETSC error message > > when I used 8 MPI processes, which I have pasted below (see [2] > output). Does this point to some sort of resource > > demand that exceeds some limit as the number of MPI processes increases ? > > > > Many thanks once again, > > Dan. > > > > [2] output > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Error in external library > > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [0]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [1]PETSC ERROR: Error in external library > > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [1]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [1]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [1]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [1]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [1]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [2]PETSC ERROR: Error in external library > > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [2]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [2]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [2]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [2]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [2]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [3]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [3]PETSC ERROR: Error in external library > > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [3]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [3]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [3]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [3]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [3]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [3]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [3]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [3]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [4]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [4]PETSC ERROR: Error in external library > > [4]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [4]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [4]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [4]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [4]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [4]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [4]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [4]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [4]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [4]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [5]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [5]PETSC ERROR: Error in external library > > [5]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=6 > > > > [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [5]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [5]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [5]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [5]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [5]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [5]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [5]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [5]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [5]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [6]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [6]PETSC ERROR: Error in external library > > [6]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=21891045 > > > > [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [6]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [6]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [6]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [6]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [6]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [6]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [6]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [6]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [6]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [7]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [7]PETSC ERROR: Error in external library > > [7]PETSC ERROR: Error reported by MUMPS in numerical factorization > phase: INFOG(1)=-9, INFO(2)=21841925 > > > > [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > > [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren > Wed Aug 25 11:18:48 2021 > > [7]PETSC ERROR: Configure options > ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs > --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort > --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" > CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex > --with-precision=double --with-debugging=0 --with-openmp > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --download-mumps --download-scalapack --download-cmake > PETSC_ARCH=arch-omp_nodbug > > [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > > [7]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [7]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [7]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [7]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [7]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [7]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [7]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [7]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [0]PETSC ERROR: #2 MatLUFactorNumeric() at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > [0]PETSC ERROR: #3 PCSetUp_LU() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > [0]PETSC ERROR: #4 PCSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > [0]PETSC ERROR: #5 KSPSetUp() at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > [0]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [0]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [0]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [0]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [1]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [1]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [1]PETSC ERROR: #8 EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [1]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [2]PETSC ERROR: #6 STSetUp_Sinvert() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > [2]PETSC ERROR: #7 STSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > [2]PETSC ERROR: #8 
EPSSetUp() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > [2]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > [3]PETSC ERROR: #9 EPSSolve() at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > > > > > > > [1] output > > > > Continuing. > > [New Thread 0x7f6f5b2d2780 (LWP 794037)] > > [New Thread 0x7f6f5aad0800 (LWP 794040)] > > [New Thread 0x7f6f5a2ce880 (LWP 794041)] > > ^C > > Thread 1 "my.exe" received signal SIGINT, Interrupt. > > 0x00007f72904927b0 in ofi_fastlock_release_noop () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > (gdb) where > > #0 0x00007f72904927b0 in ofi_fastlock_release_noop () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > #1 0x00007f729049354b in ofi_cq_readfrom () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > > #2 0x00007f728ffe8f0e in rxm_ep_do_progress () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () > > from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > > #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, > > comm=1, flag=0x0, status=0xffffffffffffffff) > > at /usr/include/rdma/fi_tagged.h:109 > > #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, > > v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) > > at ../../src/binding/fortran/mpif_h/iprobef.c:276 > > #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, > > blocking= 0x1>, > > > > --Type for more, q to quit, c to continue without paging--cont > > irecv=, > message_received= 0xffffffffffffffff>, msgsou=1, msgtag=-1, status=..., bufr=..., > lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, > iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, > lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, > ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., > pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, > nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, > root= 766016 bytes, which is more than max-value-size>, opassw=0, opeliw=0, > itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., > intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., > frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., > istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, > lrgroups=...) 
at zfac_process_message.F:730 > > #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., > liw=, a=..., > la= 0xffffffffffffffff>, nstk_steps=..., nbprocfils=..., nd=..., fils=..., > step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., > tab_pos_in_pere=..., nstepsdone=1690339657, opass= Cannot access memory at address 0x5>, opeli= access memory at address 0x0>, nelva=50400, comp=259581, > maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv= Cannot access memory at address 0x2>, noffnegpv= Cannot access memory at address 0x0>, nb22t1= Cannot access memory at address 0x0>, nb22t2= Cannot access memory at address 0x0>, nbtiny= Cannot access memory at address 0x0>, det_exp= Cannot access memory at address 0x0>, det_mant= Cannot access memory at address 0x0>, det_sign= Cannot access memory at address 0x0>, ptrist=..., ptrast=..., pimaster=..., > pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., > lpool=, > rinfo=, > posfac=, > iwpos=, > lrlu=, > iptrlu=, > lrlus=, > leaf=, > nbroot=, > nbrtot=, > uu=, > icntl=, > ptlust=..., ptrfac=..., info= at address 0x0>, keep= address 0x3ff0000000000000>, keep8= memory at address 0x0>, procnode_steps=..., slavef= Cannot access memory at address 0x4ffffffff>, myid= Cannot access memory at address 0xffffffff>, comm_nodes= variable: Cannot access memory at address 0x0>, myid_nodes= variable: Cannot access memory at address 0x0>, bufr=..., lbufr=0, > lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, > frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, > seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, > mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, > lrgroups=...) at zfac_par_m.F:182 > > #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., > la=, > liw=, > sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., > frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., > ptrar=..., ldptrar= 0x0>, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., > rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, > icntl=, > info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., > slavef=-1889504640, comm_nodes=-2048052411, myid= Cannot access memory at address 0x81160>, myid_nodes=-1683330500, bufr=..., > lbufr=, > lbufr_bytes= 0xc4e0>, zmumps_lbuf= address 0x4>, intarr=..., dblarr=..., root= access memory at address 0x11dbec>, nelt= access memory at address 0x3>, frtptr=..., frtelt=..., comm_load= reading variable: Cannot access memory at address 0x0>, ass_irecv= reading variable: Cannot access memory at address 0x0>, seuil= reading variable: Cannot access memory at address 0x0>, > seuil_ldlt_niv2= 0x0>, mem_distrib= 0x0>, dkeep=, > pivnul_list=..., lpn_list= address 0x0>, lrgroups=...) 
at zfac_b.F:243 > > #10 0x00007f7308610ff7 in zmumps_fac_driver (id= value of type `zmumps_struc' requires 386095520 bytes, which is more than > max-value-size>) at zfac_driver.F:2421 > > #11 0x00007f7308569256 in zmumps (id= type `zmumps_struc' requires 386095520 bytes, which is more than > max-value-size>) at zmumps_driver.F:1883 > > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par= variable: Cannot access memory at address 0x1>, comm_f77= variable: Cannot access memory at address 0x0>, n= Cannot access memory at address 0xffffffffffffffff>, nblk=1, icntl=..., > cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, > jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, > irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., > a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, > a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, > perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, > info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, > size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., > schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, > rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, > rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., > rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., > irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, > nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, > schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., > ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., > tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, > save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289 > > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 > > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, > A=0x7ffda7afdae0, info=0x1) at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 > > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, > mat=0x7ffda7afdae0, info=0x1) at > /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at > /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, > __ierr=0x7ffda7afdae0) at > /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 > > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., > b_pet=..., jthisone= address 0x1>, isize= address 0x0>) at small_slepc_example_program.F:322 > > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 > > #26 0x00000000004023f2 in main () > > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0
, > argc=14, argv=0x7ffda7b024e8, init=, fini=, > rtld_fini=, stack_end=0x7ffda7b024d8) at > ../csu/libc-start.c:308 > > #28 0x00000000004022fe in _start () > > > > From: Matthew Knepley > > Sent: Tuesday, August 24, 2021 3:59 PM > > To: dazza simplythebest > > Cc: Jose E. Roman ; PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest < > sayosale at hotmail.com> wrote: > > > > Dear Matthew and Jose, > > Apologies for the delayed reply, I had a couple of unforeseen days > off this week. > > Firstly regarding Jose's suggestion re: MUMPS, the program is already > using MUMPS > > to solve linear systems (the code is using a distributed MPI matrix to > solve the generalised > > non-Hermitian complex problem). > > > > I have tried the gdb debugger as per Matthew's suggestion. > > Just to note in case someone else is following this that at first it > didn't work (couldn't 'attach') , > > but after some googling I found a tip suggesting the command; > > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > > which seemed to get it working. > > > > I then first ran the debugger on the small matrix case that worked. > > That stopped in gdb almost immediately after starting execution > > with a report regarding 'nanosleep.c': > > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > > However, issuing the 'cont' command again caused the program to run > through to the end of the > > execution w/out any problems, and with correct looking results, so I am > guessing this error > > is not particularly important. > > > > We do that on purpose when the debugger starts up. Typing 'cont' is > correct. > > > > I then tried the same debugging procedure on the large matrix case that > fails. > > The code again stopped almost immediately after the start of execution > with > > the same nanosleep error as before, and I was able to set the program > running > > again with 'cont' (see full output below). I was running the code with > 4 MPI processes, > > and so had 4 gdb windows appear. Thereafter the code ran for sometime > until completing the > > matrix construction, and then one of the gdb process windows printed a > > Program terminated with signal SIGKILL, Killed. > > The program no longer exists. > > message. I then typed 'where' into this terminal but just received the > message > > No stack. > > > > I have only seen this behavior one other time, and it was with Fortran. > Fortran allows you to declare really big arrays > > on the stack by putting them at the start of a function (rather than F90 > malloc). When I had one of those arrays exceed > > the stack space, I got this kind of an error where everything is > destroyed rather than just stopping. Could it be that you > > have a large structure on the stack? > > > > Second, you can at least look at the stack for the processes that were > not killed. You type Ctrl-C, which should give you > > the prompt and then "where". > > > > Thanks, > > > > Matt > > > > The other gdb windows basically seemed to be left in limbo until I > issued the 'quit' > > command in the SIGKILL, and then they vanished. > > > > I paste the full output from the gdb window that recorded the SIGKILL > below here. > > I guess it is necessary to somehow work out where the SIGKILL originates > from ? > > > > Thanks once again, > > Dan. 
> > > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > > Copyright (C) 2020 Free Software Foundation, Inc. > > License GPLv3+: GNU GPL version 3 or later < > http://gnu.org/licenses/gpl.html> > > This is free software: you are free to change and redistribute it. > > There is NO WARRANTY, to the extent permitted by law. > > Type "show copying" and "show warranty" for details. > > This GDB was configured as "x86_64-linux-gnu". > > Type "show configuration" for configuration details. > > For bug reporting instructions, please see: > > . > > Find the GDB manual and other documentation resources online at: > > . > > > > For help, type "help". > > Type "apropos word" to search for commands related to "word"... > > Reading symbols from ./stab1.exe... > > Attaching to program: > /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, > process 675919 > > Reading symbols from > /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... > > Reading symbols from > /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for > more, q to quit, c to continue without paging--cont > > /intel64_lin/libmkl_intel_lp64.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > > Reading symbols from > /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > > Reading symbols from > /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > > [Thread debugging using libthread_db enabled] > > Using host libthread_db library > "/lib/x86_64-linux-gnu/libthread_db.so.1". > > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > > Reading symbols from > /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... 
> > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... > > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... > > (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > > (No debugging symbols found in > /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > > Reading symbols from /lib64/ld-linux-x86-64.so.2... > > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... > > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... 
> > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > > Reading symbols from > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > > (No debugging symbols found in > /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... > > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, > clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, > rem=rem at entry=0x7ffdc641a9a0) at > ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or > directory. > > (gdb) cont > > Continuing. > > [New Thread 0x7f9e49c02780 (LWP 676559)] > > [New Thread 0x7f9e49400800 (LWP 676560)] > > [New Thread 0x7f9e48bfe880 (LWP 676562)] > > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > > [Thread 0x7f9e49400800 (LWP 676560) exited] > > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > > > Program terminated with signal SIGKILL, Killed. > > The program no longer exists. > > (gdb) where > > No stack. > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - - - - - - - - > > > > From: Matthew Knepley > > Sent: Friday, August 20, 2021 2:12 PM > > To: dazza simplythebest > > Cc: Jose E. Roman ; PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest < > sayosale at hotmail.com> wrote: > > Dear Jose, > > Many thanks for your response, I have been investigating this issue > with a few more calculations > > today, hence the slightly delayed response. > > > > The problem is actually derived from a fluid dynamics problem, so to > allow an easier exploration of things > > I first downsized the resolution of the underlying fluid solver while > keeping all the physical parameters > > the same - i.e. I would get a smaller matrix that should be solving the > same physical problem as the original > > larger matrix but to lower accuracy. > > > > Results > > > > Small matrix (N= 21168) - everything good! > > This converged when using the -eps_largest_real approach (taking 92 > iterations for nev=10, > > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert > approach, converging > > very impressively in a single iteration ! Interestingly it did this both > for a non-zero -eps_target > > and also for a zero -eps_target. > > > > Large matrix (N=50400)- works for -eps_largest_real , fails for st_type > sinvert > > I have just double checked again that the code does run properly when we > use the -eps_largest_real > > option - indeed I ran it with a small nev and large tolerance (nev = 4, > tol= -eps_tol 5.0e-4 , ncv = 300) > > and with these parameters convergence was obtained in 164 iterations, > which took 6 hours on the > > machine I was running it on. 
Furthermore the eigenvalues seem to be > ballpark correct; for this large > > higher resolution case (although with lower slepc tolerance) we obtain > 1789.56816314173 -4724.51319554773i > > as the eigenvalue with largest real part, while the smaller matrix > (same physical problem but at lower resolution case) > > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which > means the agreement is in line > > with expectations. > > > > Unfortunately though the code does still crash though when I try to do > shift-invert for the large matrix case , > > whether or not I use a non-zero -eps_target. For reference this is the > command line used : > > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 > -st_type sinvert -eps_monitor :monitor_output05.txt > > To be precise the code crashes soon after calling EPSSolve (it > successfully calls > > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and > EPSSetFromOptions). > > By crashes I mean that I do not even get any error messages from > slepc/PETSC, and do not even get the > > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran > 'KILLED BY SIGNAL: 9 (Killed)' message > > as soon as EPSsolve is called. > > > > Hi Dan, > > > > It would help track this error down if we had a stack trace. You can get > a stack trace from the debugger. You run with > > > > -start_in_debugger > > > > which should launch the debugger (usually), and then type > > > > cont > > > > to continue, and then > > > > where > > > > to get the stack trace when it crashes, or 'bt' on lldb. > > > > Thanks, > > > > Matt > > > > Do you have any ideas as to why this larger matrix case should fail when > using shift-invert but succeed when using > > -eps_largest_real ? The fact that the program works and produces correct > results > > when using the -eps_largest_real option suggests that there is probably > nothing wrong with the specification > > of the problem or the matrices ? It is strange how there is no error > message from slepc / Petsc ... the > > only idea I have at the moment is that perhaps max memory has been > exceeded, which could cause such a sudden > > shutdown? For your reference when running the large matrix case with the > -eps_largest_real option I am using > > about 36 GB of the 148GB available on this machine - does the shift > invert approach require substantially > > more memory for example ? > > > > I would be very grateful if you have any suggestions to resolve this > issue or even ways to clarify it further, > > the performance I have seen with the shift-invert for the small matrix > is so impressive it would be great to > > get that working for the full-size problem. > > > > Many thanks and best wishes, > > Dan. > > > > > > > > From: Jose E. Roman > > Sent: Thursday, August 19, 2021 7:58 AM > > To: dazza simplythebest > > Cc: PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > > > In A) convergence may be slow, especially if the wanted eigenvalues have > small magnitude. I would not say 600 iterations is a lot, you probably need > many more. In most cases, approach B) is better because it improves > convergence of eigenvalues close to the target, but it requires prior > knowledge of your spectrum distribution in order to choose an appropriate > target. > > > > In B) what do you mean that it crashes. 
If you get an error about > factorization, it means that your A-matrix is singular, In that case, try > using a nonzero target -eps_target 0.1 > > > > Jose > > > > > > > El 19 ago 2021, a las 7:12, dazza simplythebest > escribi?: > > > > > > Dear All, > > > I am planning on using slepc to do a large number of > eigenvalue calculations > > > of a generalized eigenvalue problem, called from a program written in > fortran using MPI. > > > Thus far I have successfully installed the slepc/PETSc software, both > locally and on a cluster, > > > and on smaller test problems everything is working well; the matrices > are efficiently and > > > correctly constructed and slepc returns the correct spectrum. I am > just now starting to move > > > towards now solving the full-size 'production run' problems, and would > appreciate some > > > general advice on how to improve the solver's performance. > > > > > > In particular, I am currently trying to solve the problem Ax = lambda > Bx whose matrices > > > are of size 50000 (this is the smallest 'production run' problem I > will be tackling), and are > > > complex, non-Hermitian. In most cases I aim to find the eigenvalues > with the largest real part, > > > although in other cases I will also be interested in finding the > eigenvalues whose real part > > > is close to zero. > > > > > > A) > > > Calling slepc 's EPS solver with the following options: > > > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 > -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > > > > led to the code successfully running, but failing to find any > eigenvalues within the maximum 600 iterations > > > (examining the monitor output it did appear to be very slowly > approaching convergence). > > > > > > B) > > > On the same problem I have also tried a shift-invert transformation > using the options > > > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > > > -in this case the code crashed at the point it tried to call slepc, so > perhaps I have incorrectly specified these options ? > > > > > > > > > Does anyone have any suggestions as to how to improve this performance > ( or find out more about the problem) ? > > > In the case of A) I can see from watching the slepc videos that > increasing ncv > > > may help, but I am wondering , since 600 is a large number of > iterations, whether there > > > maybe something else going on - e.g. perhaps some alternative > preconditioner may help ? > > > In the case of B), I guess there must be some mistake in these command > line options? > > > Again, any advice will be greatly appreciated. > > > Best wishes, Dan. > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mfadams at lbl.gov Fri Aug 27 07:05:45 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 27 Aug 2021 08:05:45 -0400 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 Message-ID: I have a user (cc'ed) that has a C++ code and is using a PETSc that I built. He is getting this runtime error. 'make check' runs clean and I built snes/tutorial/ex1 manually, to get a link line, and it ran fine. I appended the users link line and my test. I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild PETSc using that, maybe we just need to make sure we are both using the same underlying compiler or should they use mpiCC? Thanks, Mark [e13n16:591873] *** Process received signal *** [e13n16:591873] Signal: Segmentation fault (11) [e13n16:591873] Signal code: Invalid permissions (2) [e13n16:591873] Failing at address: 0x102c87e0 [e13n16:591873] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** [e13n16:591872] Signal: Segmentation fault (11) [e13n16:591872] Signal code: Invalid permissions (2) [e13n16:591872] Failing at address: 0x102c87e0 [e13n16:591872] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** [e13n16:591871] Signal: Segmentation fault (11) [e13n16:591871] Signal code: Invalid permissions (2) [e13n16:591871] Failing at address: 0x102c87e0 [e13n16:591871] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] [e13n16:591871] [ 1] /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** [e13n16:591874] Signal: Segmentation fault (11) [e13n16:591874] Signal code: Invalid permissions (2) [e13n16:591874] Failing at address: 0x102c87e0 [e13n16:591874] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] [e13n16:591874] [ 1] /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] [e13n16:591874] [ 2] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] [e13n16:591874] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] [e13n16:591874] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] [e13n16:591874] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] [e13n16:591874] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] [e13n16:591874] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] [e13n16:591874] [ 8] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] [e13n16:591871] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] [e13n16:591871] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] [e13n16:591871] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] [e13n16:591871] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] [e13n16:591871] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] [e13n16:591871] [ 8] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] [e13n16:591871] [ 9] 
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] [e13n16:591871] *** End of error message *** /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] [e13n16:591874] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] [e13n16:591874] *** End of error message *** /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] [e13n16:591872] [ 2] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] [e13n16:591872] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] [e13n16:591872] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] [e13n16:591872] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] [e13n16:591872] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] [e13n16:591872] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] [e13n16:591872] [ 8] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] [e13n16:591872] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] [e13n16:591872] *** End of error message *** /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] [e13n16:591873] [ 2] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] [e13n16:591873] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] [e13n16:591873] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] [e13n16:591873] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] [e13n16:591873] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] [e13n16:591873] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] [e13n16:591873] [ 8] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] [e13n16:591873] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] [e13n16:591873] *** End of error message *** ERROR: One or more process (first noticed rank 1) terminated with signal 11 (core dumped) /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 liblibxgc-es-cpp.a /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so 
/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so libs/pspline/libpspline.a libs/camtimers/libtimers.a /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a /usr/lib64/libcuda.so /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lnvf -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 19:39 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 PETSC_ARCH="" ex1 *mpicc* -fPIC -g -fast -fPIC -g -fast -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include ex1.c -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc --version* *nvc 21.7-0 linuxpower target on Linuxpower* NVIDIA Compilers and Tools Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1 ./ex1 -ksp_monitor 0 KSP Residual norm 6.041522986797e+00 1 KSP Residual norm 1.042493382631e+00 2 KSP Residual norm 7.950907844730e-16 0 KSP Residual norm 4.786756692342e+00 1 KSP Residual norm 1.426392207750e-01 2 KSP Residual norm 1.801079604472e-15 0 KSP Residual norm 2.986456323228e+00 1 KSP Residual norm 7.669888809223e-02 2 KSP Residual norm 3.744083117256e-16 0 KSP Residual norm 2.306244667700e-01 1 KSP Residual norm 1.355550749587e-02 2 KSP Residual norm 5.845524837731e-17 0 KSP Residual norm 1.936314002654e-03 1 KSP Residual norm 2.125593590819e-04 2 KSP Residual norm 6.987141455073e-20 0 KSP Residual norm 1.435593531990e-07 1 KSP Residual norm 2.588271385567e-08 2 KSP Residual norm 3.942196167935e-23 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sayosale at hotmail.com Fri Aug 27 09:12:56 2021 From: sayosale at hotmail.com (dazza simplythebest) Date: Fri, 27 Aug 2021 14:12:56 +0000 Subject: [petsc-users] Improving efficiency of slepc usage -memory management when using shift-invert In-Reply-To: References: Message-ID: Dear All, Okay, thanks for the tip and all the guidance this far - I will also investigate superLU as the linear solver. I have a good test problem now at least ! Have a good weekend and many thanks once again, Dan. ________________________________ From: Matthew Knepley Sent: Thursday, August 26, 2021 3:53 PM To: dazza simplythebest Cc: Jose E. Roman ; PETSc Subject: Re: [petsc-users] Improving efficiency of slepc usage -memory management when using shift-invert On Thu, Aug 26, 2021 at 8:32 AM dazza simplythebest > wrote: Dear Jose and Matthew, Many thanks for your assistance, this would seem to explain what the problem was. So judging by this test case, there seems to be a memory vs computational time tradeoff involved in choosing whether to shift-invert or not; the shift-invert will greatly reduce the number of required iterations ,but will require a higher memory cost ? I have been trying a few values of -st_mat_mumps_icntl_14 (and also the alternative -st_mat_mumps_icntl_23) today but have not yet been able to select one that fits onto the workstation I am using (although it seems that setting these parameters seems to guarantee that an error message is generated at least). Thus I will probably need to reduce the number of MPI processes and thereby reduce the memory requirement). In this regard the MUMPS documentation suggests that a hybrid MPI-OpenMP approach is optimum for their software, whereas I remember reading somewhere else that openmp threading was not a good choice for using PETSC, would you have any general advice on this ? Memory does not really track the number of MPI processes. MUMPS does a lot of things redundantly. For minimum memory, I would suggest trying SuperLU_dist: --download-superlu_dist I do not think OpenMP will have much influence at all. Thanks, Matt I was thinking maybe that a version of slepc / petsc compiled against openmp, and with the number of threads set appropriately, but not explicitly using openmp directives in the user's code may be the way forward ? That way PETSC will (?) just ignore the threading whereas threading will be available to MUMPS when execution is passed to those routines ? Many thanks once again, Dan. ________________________________ From: Jose E. 
Roman > Sent: Wednesday, August 25, 2021 1:40 PM To: dazza simplythebest > Cc: PETSc > Subject: Re: [petsc-users] Improving efficiency of slepc usage MUMPS documentation (section 8) indicates that the meaning of INFOG(1)=-9 is insufficient workspace. Try running with -st_mat_mumps_icntl_14 <percentage>, where <percentage> is the percentage by which you want to increase the workspace, e.g. 50 or 100 or more. See ex43.c for an example showing how to set this option in code. Jose > On 25 Aug 2021, at 14:11, dazza simplythebest wrote: > > > > From: dazza simplythebest > > Sent: Wednesday, August 25, 2021 12:08 PM > To: Matthew Knepley > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > Dear Matthew and Jose, > I have derived a smaller program from the original program by constructing > matrices of the same size, but filling their entries randomly instead of computing the correct > fluid dynamics values just to allow faster experimentation. This modified code's behaviour seems > to be similar, with the code again failing for the large matrix case with the SIGKILL error, so I first report > results from that code here. Firstly I can confirm that I am using Fortran, and I am compiling with the > Intel compiler, which it seems places automatic arrays on the stack. The stack size, as determined > by ulimit -a, is reported to be: > stack size (kbytes, -s) 8192 > > [1] Okay, so I followed your suggestion and used ctrl-c followed by 'where' in one of the non-SIGKILL gdb windows. > I have pasted the output into the bottom of this email (see [1] output) - it does look like the problem occurs somewhere in the call > to the MUMPS solver ? > > [2] I have also today gained access to another workstation, and so have tried running the (original) code on that machine. > This new machine has two (more powerful) CPU nodes and a larger memory (both machines feature Intel Xeon processors). > On this new machine the large matrix case again failed with the familiar SIGKILL report when I used 16 or 12 MPI > processes, ran to the end w/out error for 4 or 6 MPI processes, and failed but with a PETSc error message > when I used 8 MPI processes, which I have pasted below (see [2] output). Does this point to some sort of resource > demand that exceeds some limit as the number of MPI processes increases ? > > Many thanks once again, > Dan. > > [2] output > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [0]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [0]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [0]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [1]PETSC ERROR: Error in external library > [1]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [1]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [1]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [1]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [1]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [1]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [1]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [1]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [2]PETSC ERROR: Error in external library > [2]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [2]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [2]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [2]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [2]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [2]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [2]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [2]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [2]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [3]PETSC ERROR: Error in external library > [3]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [3]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [3]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [3]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [3]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [3]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [3]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [3]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [3]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [3]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [3]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [3]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [3]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: --------------------- Error Message 
-------------------------------------------------------------- > [4]PETSC ERROR: Error in external library > [4]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [4]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [4]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [4]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [4]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [4]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [4]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [4]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [4]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [4]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [4]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [4]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [4]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [4]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [5]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [5]PETSC ERROR: Error in external library > [5]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=6 > > [5]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [5]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [5]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [5]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [5]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [5]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [5]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [5]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [5]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [5]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [5]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [5]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [5]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [6]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [6]PETSC ERROR: Error in external library > [6]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21891045 > > [6]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [6]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [6]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [6]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [6]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [6]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [6]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [6]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [6]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [6]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [6]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [6]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [6]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [7]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [7]PETSC ERROR: Error in external library > [7]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=21841925 > > [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [7]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 > [7]PETSC ERROR: ./stab1.exe on a arch-omp_nodbug named super02 by darren Wed Aug 25 11:18:48 2021 > [7]PETSC ERROR: Configure options ----with-debugging=0--package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double --with-debugging=0 --with-openmp --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-cmake PETSC_ARCH=arch-omp_nodbug > [7]PETSC ERROR: #1 MatFactorNumeric_MUMPS() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1686 > [7]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [7]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [7]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [7]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [7]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [7]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [7]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [7]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [0]PETSC ERROR: #2 MatLUFactorNumeric() at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > [0]PETSC ERROR: #3 PCSetUp_LU() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > [0]PETSC ERROR: #4 PCSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > [0]PETSC ERROR: #5 KSPSetUp() at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > [0]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [0]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [0]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [0]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [1]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [1]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [1]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [1]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [2]PETSC ERROR: #6 STSetUp_Sinvert() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > [2]PETSC ERROR: #7 STSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > [2]PETSC ERROR: #8 EPSSetUp() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > [2]PETSC ERROR: #9 EPSSolve() at 
/data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > [3]PETSC ERROR: #9 EPSSolve() at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > > > > [1] output > > Continuing. > [New Thread 0x7f6f5b2d2780 (LWP 794037)] > [New Thread 0x7f6f5aad0800 (LWP 794040)] > [New Thread 0x7f6f5a2ce880 (LWP 794041)] > ^C > Thread 1 "my.exe" received signal SIGINT, Interrupt. > 0x00007f72904927b0 in ofi_fastlock_release_noop () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > (gdb) where > #0 0x00007f72904927b0 in ofi_fastlock_release_noop () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #1 0x00007f729049354b in ofi_cq_readfrom () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so > #2 0x00007f728ffe8f0e in rxm_ep_do_progress () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #3 0x00007f728ffe2b7d in rxm_ep_recv_common_flags () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #4 0x00007f728ffe30f8 in rxm_ep_trecvmsg () > from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so > #5 0x00007f72fe6b8c3e in PMPI_Iprobe (source=14090824, tag=-1481647392, > comm=1, flag=0x0, status=0xffffffffffffffff) > at /usr/include/rdma/fi_tagged.h:109 > #6 0x00007f72ff3d7fad in pmpi_iprobe_ (v1=0xd70248, v2=0x7ffda7afdae0, > v3=0x1, v4=0x0, v5=0xffffffffffffffff, ierr=0xd6fc90) > at ../../src/binding/fortran/mpif_h/iprobef.c:276 > #7 0x00007f730855b6e2 in zmumps_try_recvtreat (comm_load=1, ass_irecv=0, > blocking=, > > --Type for more, q to quit, c to continue without paging--cont > irecv=, message_received=, msgsou=1, msgtag=-1, status=..., bufr=..., lbufr=320782504, lbufr_bytes=1283130016, procnode_steps=..., posfac=1, iwpos=1, iwposcb=292535, iptrlu=2039063816, lrlu=2039063816, lrlus=2039063816, n=50400, iw=..., liw=292563, a=..., la=2611636796, ptrist=..., ptlust=..., ptrfac=..., ptrast=..., step=..., pimaster=..., pamaster=..., nstk_s=..., comp=0, iflag=0, ierror=0, comm=-1006632958, nbprocfils=..., ipool=..., lpool=5, leaf=1, nbfin=4, myid=1, slavef=4, root=, opassw=0, opeliw=0, itloc=..., rhs_mumps=..., fils=..., dad=..., ptrarw=..., ptraiw=..., intarr=..., dblarr=..., icntl=..., keep=..., keep8=..., dkeep=..., nd=..., frere=..., lptrar=50400, nelt=1, frtptr=..., frtelt=..., istep_to_iniv2=..., tab_pos_in_pere=..., stack_right_authorized=4294967295, lrgroups=...) 
at zfac_process_message.F:730 > #8 0x00007f73087076e2 in zmumps_fac_par_m::zmumps_fac_par (n=1, iw=..., liw=, a=..., la=, nstk_steps=..., nbprocfils=..., nd=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., nstepsdone=1690339657, opass=, opeli=, nelva=50400, comp=259581, maxfrt=-1889517576, nmaxnpiv=-1195144887, ntotpv=, noffnegpv=, nb22t1=, nb22t2=, nbtiny=, det_exp=, det_mant=, det_sign=, ptrist=..., ptrast=..., pimaster=..., pamaster=..., ptrarw=..., ptraiw=..., itloc=..., rhs_mumps=..., ipool=..., lpool=, rinfo=, posfac=, iwpos=, lrlu=, iptrlu=, lrlus=, leaf=, nbroot=, nbrtot=, uu=, icntl=, ptlust=..., ptrfac=..., info=, keep=, keep8=, procnode_steps=..., slavef=, myid=, comm_nodes=, myid_nodes=, bufr=..., lbufr=0, lbufr_bytes=5, intarr=..., dblarr=..., root=..., perm=..., nelt=0, frtptr=..., frtelt=..., lptrar=3, comm_load=-30, ass_irecv=30, seuil=2.1219957909652723e-314, seuil_ldlt_niv2=4.2439866417681519e-314, mem_distrib=..., ne=..., dkeep=..., pivnul_list=..., lpn_list=0, lrgroups=...) at zfac_par_m.F:182 > #9 0x00007f730865af7a in zmumps_fac_b (n=1, s_is_pointers=..., la=, liw=, sym_perm=..., na=..., lna=1, ne_steps=..., nfsiz=..., fils=..., step=..., frere=..., dad=..., cand=..., istep_to_iniv2=..., tab_pos_in_pere=..., ptrar=..., ldptrar=, ptrist=..., ptlust_s=..., ptrfac=..., iw1=..., iw2=..., itloc=..., rhs_mumps=..., pool=..., lpool=-1889529280, cntl1=-5.3576889161551131e-255, icntl=, info=..., rinfo=..., keep=..., keep8=..., procnode_steps=..., slavef=-1889504640, comm_nodes=-2048052411, myid=, myid_nodes=-1683330500, bufr=..., lbufr=, lbufr_bytes=, zmumps_lbuf=, intarr=..., dblarr=..., root=, nelt=, frtptr=..., frtelt=..., comm_load=, ass_irecv=, seuil=, seuil_ldlt_niv2=, mem_distrib=, dkeep=, pivnul_list=..., lpn_list=, lrgroups=...) at zfac_b.F:243 > #10 0x00007f7308610ff7 in zmumps_fac_driver (id=) at zfac_driver.F:2421 > #11 0x00007f7308569256 in zmumps (id=) at zmumps_driver.F:1883 > #12 0x00007f73084cf756 in zmumps_f77 (job=1, sym=0, par=, comm_f77=, n=, nblk=1, icntl=..., cntl=..., keep=..., dkeep=..., keep8=..., nz=0, nnz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0, nz_loc=0, nnz_loc=304384739, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1, a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=..., eltvarhere=0, a_elt=..., a_elthere=0, blkptr=..., blkptrhere=0, blkvar=..., blkvarhere=0, perm_in=..., perm_inhere=0, rhs=..., rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=..., rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=..., listvar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0, colsca=..., colscahere=0, rowsca=..., rowscahere=0, instance_number=1, nrhs=1, lrhs=0, lredrhs=0, rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0, rhs_loc=..., rhs_lochere=0, irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0, isol_loc=..., isol_lochere=0, irhs_loc=..., irhs_lochere=0, nz_rhs=0, lsol_loc=0, lrhs_loc=0, nloc_rhs=0, schur_mloc=0, schur_nloc=0, schur_lld=0, mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=..., write_problem=..., save_dir=..., save_prefix=..., tmpdirlen=20, prefixlen=20, write_problemlen=20, save_dirlen=20, save_prefixlen=20, metis_options=...) 
at zmumps_f77.F:289 > #13 0x00007f73084cd391 in zmumps_c (mumps_par=0xd70248) at mumps_c.c:485 > #14 0x00007f7307c035ad in MatFactorNumeric_MUMPS (F=0xd70248, A=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/impls/aij/mpi/mumps/mumps.c:1683 > #15 0x00007f7307765a8b in MatLUFactorNumeric (fact=0xd70248, mat=0x7ffda7afdae0, info=0x1) at /data/work/slepc/PETSC/petsc-3.15.0/src/mat/interface/matrix.c:3195 > #16 0x00007f73081b8427 in PCSetUp_LU (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/impls/factor/lu/lu.c:131 > #17 0x00007f7308214939 in PCSetUp (pc=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/pc/interface/precon.c:1015 > #18 0x00007f73082260ae in KSPSetUp (ksp=0xd70248) at /data/work/slepc/PETSC/petsc-3.15.0/src/ksp/ksp/interface/itfunc.c:406 > #19 0x00007f7309114959 in STSetUp_Sinvert (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/impls/sinvert/sinvert.c:123 > #20 0x00007f7309130462 in STSetUp (st=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/sys/classes/st/interface/stsolve.c:582 > #21 0x00007f73092504af in EPSSetUp (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssetup.c:350 > #22 0x00007f7309253635 in EPSSolve (eps=0xd70248) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/epssolve.c:136 > #23 0x00007f7309259c8d in epssolve_ (eps=0xd70248, __ierr=0x7ffda7afdae0) at /data/work/slepc/SLEPC/slepc-3.15.1/src/eps/interface/ftn-auto/epssolvef.c:85 > #24 0x0000000000403c19 in all_stab_routines::solve_by_slepc2 (a_pet=..., b_pet=..., jthisone=, isize=) at small_slepc_example_program.F:322 > #25 0x00000000004025a0 in slepit () at small_slepc_example_program.F:549 > #26 0x00000000004023f2 in main () > #27 0x00007f72fb8380b3 in __libc_start_main (main=0x4023c0
, argc=14, argv=0x7ffda7b024e8, init=, fini=, rtld_fini=, stack_end=0x7ffda7b024d8) at ../csu/libc-start.c:308 > #28 0x00000000004022fe in _start () > > From: Matthew Knepley > > Sent: Tuesday, August 24, 2021 3:59 PM > To: dazza simplythebest > > Cc: Jose E. Roman >; PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Tue, Aug 24, 2021 at 8:47 AM dazza simplythebest > wrote: > > Dear Matthew and Jose, > Apologies for the delayed reply, I had a couple of unforeseen days off this week. > Firstly regarding Jose's suggestion re: MUMPS, the program is already using MUMPS > to solve linear systems (the code is using a distributed MPI matrix to solve the generalised > non-Hermitian complex problem). > > I have tried the gdb debugger as per Matthew's suggestion. > Just to note in case someone else is following this that at first it didn't work (couldn't 'attach') , > but after some googling I found a tip suggesting the command; > echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope > which seemed to get it working. > > I then first ran the debugger on the small matrix case that worked. > That stopped in gdb almost immediately after starting execution > with a report regarding 'nanosleep.c': > ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > However, issuing the 'cont' command again caused the program to run through to the end of the > execution w/out any problems, and with correct looking results, so I am guessing this error > is not particularly important. > > We do that on purpose when the debugger starts up. Typing 'cont' is correct. > > I then tried the same debugging procedure on the large matrix case that fails. > The code again stopped almost immediately after the start of execution with > the same nanosleep error as before, and I was able to set the program running > again with 'cont' (see full output below). I was running the code with 4 MPI processes, > and so had 4 gdb windows appear. Thereafter the code ran for sometime until completing the > matrix construction, and then one of the gdb process windows printed a > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > message. I then typed 'where' into this terminal but just received the message > No stack. > > I have only seen this behavior one other time, and it was with Fortran. Fortran allows you to declare really big arrays > on the stack by putting them at the start of a function (rather than F90 malloc). When I had one of those arrays exceed > the stack space, I got this kind of an error where everything is destroyed rather than just stopping. Could it be that you > have a large structure on the stack? > > Second, you can at least look at the stack for the processes that were not killed. You type Ctrl-C, which should give you > the prompt and then "where". > > Thanks, > > Matt > > The other gdb windows basically seemed to be left in limbo until I issued the 'quit' > command in the SIGKILL, and then they vanished. > > I paste the full output from the gdb window that recorded the SIGKILL below here. > I guess it is necessary to somehow work out where the SIGKILL originates from ? > > Thanks once again, > Dan. > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2 > Copyright (C) 2020 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later > This is free software: you are free to change and redistribute it. 
> There is NO WARRANTY, to the extent permitted by law. > Type "show copying" and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > . > Find the GDB manual and other documentation resources online at: > . > > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from ./stab1.exe... > Attaching to program: /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp6/stab1.exe, process 675919 > Reading symbols from /data/work/slepc/SLEPC/slepc-3.15.1/arch-omp_nodbug/lib/libslepc.so.3.15... > Reading symbols from /data/work/slepc/PETSC/petsc-3.15.0/arch-omp_nodbug/lib/libpetsc.so.3.15... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib--Type for more, q to quit, c to continue without paging--cont > /intel64_lin/libmkl_intel_lp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_lp64.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_core.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_blacs_intelmpi_lp64.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libiomp5.dbg... > Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.31.so... > Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0... > Reading symbols from /usr/lib/debug/.build-id/e5/4761f7b554d0fcc1562959665d93dffbebdaf0.debug... > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libstdc++.so.6) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpifort.so.12... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.dbg... > Reading symbols from /lib/x86_64-linux-gnu/librt.so.1... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libifport.so.5) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so... 
> (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libimf.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libsvml.so) > Reading symbols from /lib/x86_64-linux-gnu/libm.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirc.so) > Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1... > (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libquadmath.so.0... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libquadmath.so.0) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/libmpi_ilp64.so) > Reading symbols from /lib/x86_64-linux-gnu/libc.so.6... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libirng.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/compiler/lib/intel64_lin/libintlc.so.5) > Reading symbols from /lib64/ld-linux-x86-64.so.2... > Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so... > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/libfabric.so.1) > Reading symbols from /usr/lib/x86_64-linux-gnu/libnuma.so... > (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libnuma.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libtcp-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libsockets-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so) > Reading symbols from /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so... > (No debugging symbols found in /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so) > Reading symbols from /usr/lib/x86_64-linux-gnu/libpsm2.so.2... 
> (No debugging symbols found in /usr/lib/x86_64-linux-gnu/libpsm2.so.2) > 0x00007fac4d0d8334 in __GI___clock_nanosleep (clock_id=, clock_id at entry=0, flags=flags at entry=0, req=req at entry=0x7ffdc641a9a0, rem=rem at entry=0x7ffdc641a9a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 > 78 ../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory. > (gdb) cont > Continuing. > [New Thread 0x7f9e49c02780 (LWP 676559)] > [New Thread 0x7f9e49400800 (LWP 676560)] > [New Thread 0x7f9e48bfe880 (LWP 676562)] > [Thread 0x7f9e48bfe880 (LWP 676562) exited] > [Thread 0x7f9e49400800 (LWP 676560) exited] > [Thread 0x7f9e49c02780 (LWP 676559) exited] > > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > (gdb) where > No stack. > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > From: Matthew Knepley > > Sent: Friday, August 20, 2021 2:12 PM > To: dazza simplythebest > > Cc: Jose E. Roman >; PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > On Fri, Aug 20, 2021 at 6:55 AM dazza simplythebest > wrote: > Dear Jose, > Many thanks for your response, I have been investigating this issue with a few more calculations > today, hence the slightly delayed response. > > The problem is actually derived from a fluid dynamics problem, so to allow an easier exploration of things > I first downsized the resolution of the underlying fluid solver while keeping all the physical parameters > the same - i.e. I would get a smaller matrix that should be solving the same physical problem as the original > larger matrix but to lower accuracy. > > Results > > Small matrix (N= 21168) - everything good! > This converged when using the -eps_largest_real approach (taking 92 iterations for nev=10, > tol= 5.0000E-06 and ncv = 300), and also when using the shift-invert approach, converging > very impressively in a single iteration ! Interestingly it did this both for a non-zero -eps_target > and also for a zero -eps_target. > > Large matrix (N=50400)- works for -eps_largest_real , fails for st_type sinvert > I have just double checked again that the code does run properly when we use the -eps_largest_real > option - indeed I ran it with a small nev and large tolerance (nev = 4, tol= -eps_tol 5.0e-4 , ncv = 300) > and with these parameters convergence was obtained in 164 iterations, which took 6 hours on the > machine I was running it on. Furthermore the eigenvalues seem to be ballpark correct; for this large > higher resolution case (although with lower slepc tolerance) we obtain 1789.56816314173 -4724.51319554773i > as the eigenvalue with largest real part, while the smaller matrix (same physical problem but at lower resolution case) > found this eigenvalue to be 1831.11845726501 -4787.54519511345i , which means the agreement is in line > with expectations. > > Unfortunately though the code does still crash though when I try to do shift-invert for the large matrix case , > whether or not I use a non-zero -eps_target. For reference this is the command line used : > -eps_nev 10 -eps_ncv 300 -log_view -eps_view -eps_target 0.1 -st_type sinvert -eps_monitor :monitor_output05.txt > To be precise the code crashes soon after calling EPSSolve (it successfully calls > MatCreateVecs, EPSCreate, EPSSetOperators, EPSSetProblemType and EPSSetFromOptions). 
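As a side note on the gdb session shown above: a SIGKILL that leaves gdb with "No stack" is normally delivered from outside the program, most often by the kernel's out-of-memory killer or by the batch/resource manager, rather than by PETSc or SLEPc themselves. If that is suspected, the kernel log on the compute node can usually confirm it (a sketch; the exact message wording depends on the kernel version):

   dmesg -T | grep -i -E 'out of memory|killed process'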
> By crashes I mean that I do not even get any error messages from slepc/PETSC, and do not even get the > 'EPS Object: 16 MPI processes' message - I simply get a MPI/Fortran 'KILLED BY SIGNAL: 9 (Killed)' message > as soon as EPSsolve is called. > > Hi Dan, > > It would help track this error down if we had a stack trace. You can get a stack trace from the debugger. You run with > > -start_in_debugger > > which should launch the debugger (usually), and then type > > cont > > to continue, and then > > where > > to get the stack trace when it crashes, or 'bt' on lldb. > > Thanks, > > Matt > > Do you have any ideas as to why this larger matrix case should fail when using shift-invert but succeed when using > -eps_largest_real ? The fact that the program works and produces correct results > when using the -eps_largest_real option suggests that there is probably nothing wrong with the specification > of the problem or the matrices ? It is strange how there is no error message from slepc / Petsc ... the > only idea I have at the moment is that perhaps max memory has been exceeded, which could cause such a sudden > shutdown? For your reference when running the large matrix case with the -eps_largest_real option I am using > about 36 GB of the 148GB available on this machine - does the shift invert approach require substantially > more memory for example ? > > I would be very grateful if you have any suggestions to resolve this issue or even ways to clarify it further, > the performance I have seen with the shift-invert for the small matrix is so impressive it would be great to > get that working for the full-size problem. > > Many thanks and best wishes, > Dan. > > > > From: Jose E. Roman > > Sent: Thursday, August 19, 2021 7:58 AM > To: dazza simplythebest > > Cc: PETSc > > Subject: Re: [petsc-users] Improving efficiency of slepc usage > > In A) convergence may be slow, especially if the wanted eigenvalues have small magnitude. I would not say 600 iterations is a lot, you probably need many more. In most cases, approach B) is better because it improves convergence of eigenvalues close to the target, but it requires prior knowledge of your spectrum distribution in order to choose an appropriate target. > > In B) what do you mean that it crashes. If you get an error about factorization, it means that your A-matrix is singular, In that case, try using a nonzero target -eps_target 0.1 > > Jose > > > > El 19 ago 2021, a las 7:12, dazza simplythebest > escribi?: > > > > Dear All, > > I am planning on using slepc to do a large number of eigenvalue calculations > > of a generalized eigenvalue problem, called from a program written in fortran using MPI. > > Thus far I have successfully installed the slepc/PETSc software, both locally and on a cluster, > > and on smaller test problems everything is working well; the matrices are efficiently and > > correctly constructed and slepc returns the correct spectrum. I am just now starting to move > > towards now solving the full-size 'production run' problems, and would appreciate some > > general advice on how to improve the solver's performance. > > > > In particular, I am currently trying to solve the problem Ax = lambda Bx whose matrices > > are of size 50000 (this is the smallest 'production run' problem I will be tackling), and are > > complex, non-Hermitian. In most cases I aim to find the eigenvalues with the largest real part, > > although in other cases I will also be interested in finding the eigenvalues whose real part > > is close to zero. 
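For context on the MUMPS messages near the top of this thread: INFOG(1)=-9 during numerical factorization means the workspace MUMPS allocated from its analysis-phase estimate was too small, and INFO(2) gives a rough measure of how much was missing. That is consistent with the memory concern raised above, since shift-invert factors (A - sigma*B) with a direct solver and therefore needs far more memory than a Krylov iteration on A alone. A common workaround is to let MUMPS over-allocate its estimate via ICNTL(14). A sketch of the kind of command line being discussed is below; the executable name is taken from the logs, and the option prefixes are an assumption (the MUMPS option may need an st_ or other prefix depending on how the factored matrix is created), so treat it as illustrative rather than a verified recipe:

   mpiexec -n 16 ./stab1.exe -eps_nev 10 -eps_ncv 300 -eps_target 0.1 \
       -st_type sinvert -st_ksp_type preonly -st_pc_type lu \
       -st_pc_factor_mat_solver_type mumps \
       -mat_mumps_icntl_14 50 \
       -eps_monitor -log_view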
> > > > A) > > Calling slepc 's EPS solver with the following options: > > > > -eps_nev 10 -log_view -eps_view -eps_max_it 600 -eps_ncv 140 -eps_tol 5.0e-6 -eps_largest_real -eps_monitor :monitor_output.txt > > > > > > led to the code successfully running, but failing to find any eigenvalues within the maximum 600 iterations > > (examining the monitor output it did appear to be very slowly approaching convergence). > > > > B) > > On the same problem I have also tried a shift-invert transformation using the options > > > > -eps_nev 10 -eps_ncv 140 -eps_target 0.0+0.0i -st_type sinvert > > > > -in this case the code crashed at the point it tried to call slepc, so perhaps I have incorrectly specified these options ? > > > > > > Does anyone have any suggestions as to how to improve this performance ( or find out more about the problem) ? > > In the case of A) I can see from watching the slepc videos that increasing ncv > > may help, but I am wondering , since 600 is a large number of iterations, whether there > > maybe something else going on - e.g. perhaps some alternative preconditioner may help ? > > In the case of B), I guess there must be some mistake in these command line options? > > Again, any advice will be greatly appreciated. > > Best wishes, Dan. > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Aug 27 09:49:59 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 27 Aug 2021 09:49:59 -0500 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: > I have a user (cc'ed) that has a C++ code and is using a PETSc that I > built. He is getting this runtime error. > > 'make check' runs clean and I built snes/tutorial/ex1 manually, to get a > link line, and it ran fine. > I appended the users link line and my test. > > I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild PETSc > using that, maybe we just need to make sure we are both using the same > underlying compiler or should they use mpiCC? > It looks like they used nvcc_wrapper to replace nvcc. You can ask them to use nvcc directly to see what happens. But the error happened in petsc initialization, petscsys_petscinitializenohelp, so I doubt it helps. The easy way is to just attach a debugger. 
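For reference, attaching to an already-running rank (rather than launching under the debugger) might look roughly like the sketch below. The executable name is taken from the logs above, pgrep -n simply picks the newest matching process, and the ptrace_scope step repeats the Yama workaround mentioned in the earlier thread, which only applies where that restriction is enabled and sudo is available:

   # allow gdb to attach to a process it did not start (Yama-restricted systems)
   echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
   # attach to one rank, let it continue, then inspect the stack when it faults
   gdb -p $(pgrep -n xgc-es-cpp)
   (gdb) cont
   (gdb) where     # or 'bt'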
> > Thanks, > Mark > > > [e13n16:591873] *** Process received signal *** > > [e13n16:591873] Signal: Segmentation fault (11) > > [e13n16:591873] Signal code: Invalid permissions (2) > > [e13n16:591873] Failing at address: 0x102c87e0 > > [e13n16:591873] [ 0] > linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > > [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** > > [e13n16:591872] Signal: Segmentation fault (11) > > [e13n16:591872] Signal code: Invalid permissions (2) > > [e13n16:591872] Failing at address: 0x102c87e0 > > [e13n16:591872] [ 0] > linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > > [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** > > [e13n16:591871] Signal: Segmentation fault (11) > > [e13n16:591871] Signal code: Invalid permissions (2) > > [e13n16:591871] Failing at address: 0x102c87e0 > > [e13n16:591871] [ 0] > linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > > [e13n16:591871] [ 1] > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > > [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** > > [e13n16:591874] Signal: Segmentation fault (11) > > [e13n16:591874] Signal code: Invalid permissions (2) > > [e13n16:591874] Failing at address: 0x102c87e0 > > [e13n16:591874] [ 0] > linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > > [e13n16:591874] [ 1] > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > > [e13n16:591874] [ 2] > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > > [e13n16:591874] [ 3] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > > [e13n16:591874] [ 4] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > > [e13n16:591874] [ 5] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > > [e13n16:591874] [ 6] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > > [e13n16:591874] [ 7] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > > [e13n16:591874] [ 8] > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > > [e13n16:591871] [ 3] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > > [e13n16:591871] [ 4] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > > [e13n16:591871] [ 5] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > > [e13n16:591871] [ 6] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > > [e13n16:591871] [ 7] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > > [e13n16:591871] [ 8] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > > [e13n16:591871] [ 9] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > > [e13n16:591871] *** End of error message *** > > > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > > [e13n16:591874] [ 9] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > > [e13n16:591874] *** End of error message *** > > > 
/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > > [e13n16:591872] [ 2] > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > > [e13n16:591872] [ 3] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > > [e13n16:591872] [ 4] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > > [e13n16:591872] [ 5] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > > [e13n16:591872] [ 6] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > > [e13n16:591872] [ 7] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > > [e13n16:591872] [ 8] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > > [e13n16:591872] [ 9] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > > [e13n16:591872] *** End of error message *** > > > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > > [e13n16:591873] [ 2] > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > > [e13n16:591873] [ 3] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > > [e13n16:591873] [ 4] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > > [e13n16:591873] [ 5] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > > [e13n16:591873] [ 6] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > > [e13n16:591873] [ 7] > /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > > [e13n16:591873] [ 8] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > > [e13n16:591873] [ 9] > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > > [e13n16:591873] *** End of error message *** > > ERROR: One or more process (first noticed rank 1) terminated with signal > 11 (core dumped) > > > > > /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper > -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o > bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 > liblibxgc-es-cpp.a > /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so > /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so > /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so > /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so > libs/pspline/libpspline.a libs/camtimers/libtimers.a > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so > 
/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 > /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 > /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a > /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a > /usr/lib64/libcuda.so > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so > /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr > -lmpi_ibm_mpifh -lnvf > -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 > > > > 19:39 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ > make > PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 > PETSC_ARCH="" ex1 > *mpicc* -fPIC -g -fast -fPIC -g -fast > -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include > ex1.c > -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib > -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib > -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib > -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib > -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib > -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib > -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib > -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib > -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 > -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 > -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib > -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib > -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 > -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis > -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 > -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp > -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 > 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc > --version* > > *nvc 21.7-0 linuxpower target on Linuxpower* > NVIDIA Compilers and Tools > Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ > jsrun -n 1 ./ex1 -ksp_monitor > 0 KSP Residual norm 6.041522986797e+00 > 1 KSP Residual norm 1.042493382631e+00 > 2 KSP Residual norm 7.950907844730e-16 > 0 KSP Residual norm 4.786756692342e+00 > 1 KSP Residual norm 1.426392207750e-01 > 2 KSP Residual norm 1.801079604472e-15 > 0 KSP Residual norm 2.986456323228e+00 > 1 KSP Residual norm 7.669888809223e-02 > 2 KSP Residual norm 3.744083117256e-16 > 0 KSP Residual norm 2.306244667700e-01 > 1 KSP Residual norm 1.355550749587e-02 > 2 KSP Residual norm 5.845524837731e-17 > 0 KSP Residual norm 1.936314002654e-03 > 1 KSP Residual norm 2.125593590819e-04 > 2 KSP Residual norm 6.987141455073e-20 > 0 KSP Residual norm 1.435593531990e-07 > 1 KSP Residual norm 2.588271385567e-08 > 2 KSP Residual norm 3.942196167935e-23 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Aug 27 13:52:42 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 27 Aug 2021 14:52:42 -0400 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: I think the problem is that I build with MPICC and they use nvcc_wrapper. I could just try building PETSc with CC=nvcc_wrapper, but it was not clear if this was the way to go. I will try it. Thanks, Mark On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang wrote: > > > > On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: > >> I have a user (cc'ed) that has a C++ code and is using a PETSc that I >> built. He is getting this runtime error. >> >> 'make check' runs clean and I built snes/tutorial/ex1 manually, to get a >> link line, and it ran fine. >> I appended the users link line and my test. >> >> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild PETSc >> using that, maybe we just need to make sure we are both using the same >> underlying compiler or should they use mpiCC? >> > It looks like they used nvcc_wrapper to replace nvcc. You can ask them to > use nvcc directly to see what happens. But the error happened in petsc > initialization, petscsys_petscinitializenohelp, so I doubt it helps. The > easy way is to just attach a debugger. 
> >> >> Thanks, >> Mark >> >> >> [e13n16:591873] *** Process received signal *** >> >> [e13n16:591873] Signal: Segmentation fault (11) >> >> [e13n16:591873] Signal code: Invalid permissions (2) >> >> [e13n16:591873] Failing at address: 0x102c87e0 >> >> [e13n16:591873] [ 0] >> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >> >> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >> >> [e13n16:591872] Signal: Segmentation fault (11) >> >> [e13n16:591872] Signal code: Invalid permissions (2) >> >> [e13n16:591872] Failing at address: 0x102c87e0 >> >> [e13n16:591872] [ 0] >> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >> >> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >> >> [e13n16:591871] Signal: Segmentation fault (11) >> >> [e13n16:591871] Signal code: Invalid permissions (2) >> >> [e13n16:591871] Failing at address: 0x102c87e0 >> >> [e13n16:591871] [ 0] >> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >> >> [e13n16:591871] [ 1] >> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >> >> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >> >> [e13n16:591874] Signal: Segmentation fault (11) >> >> [e13n16:591874] Signal code: Invalid permissions (2) >> >> [e13n16:591874] Failing at address: 0x102c87e0 >> >> [e13n16:591874] [ 0] >> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >> >> [e13n16:591874] [ 1] >> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >> >> [e13n16:591874] [ 2] >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >> >> [e13n16:591874] [ 3] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >> >> [e13n16:591874] [ 4] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >> >> [e13n16:591874] [ 5] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >> >> [e13n16:591874] [ 6] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >> >> [e13n16:591874] [ 7] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >> >> [e13n16:591874] [ 8] >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >> >> [e13n16:591871] [ 3] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >> >> [e13n16:591871] [ 4] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >> >> [e13n16:591871] [ 5] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >> >> [e13n16:591871] [ 6] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >> >> [e13n16:591871] [ 7] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >> >> [e13n16:591871] [ 8] >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >> >> [e13n16:591871] [ 9] >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >> >> [e13n16:591871] *** End of error message *** >> >> >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >> >> [e13n16:591874] [ 9] >> 
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >> >> [e13n16:591874] *** End of error message *** >> >> >> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >> >> [e13n16:591872] [ 2] >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >> >> [e13n16:591872] [ 3] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >> >> [e13n16:591872] [ 4] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >> >> [e13n16:591872] [ 5] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >> >> [e13n16:591872] [ 6] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >> >> [e13n16:591872] [ 7] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >> >> [e13n16:591872] [ 8] >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >> >> [e13n16:591872] [ 9] >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >> >> [e13n16:591872] *** End of error message *** >> >> >> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >> >> [e13n16:591873] [ 2] >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >> >> [e13n16:591873] [ 3] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >> >> [e13n16:591873] [ 4] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >> >> [e13n16:591873] [ 5] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >> >> [e13n16:591873] [ 6] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >> >> [e13n16:591873] [ 7] >> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >> >> [e13n16:591873] [ 8] >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >> >> [e13n16:591873] [ 9] >> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >> >> [e13n16:591873] *** End of error message *** >> >> ERROR: One or more process (first noticed rank 1) terminated with signal >> 11 (core dumped) >> >> >> >> >> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >> bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >> liblibxgc-es-cpp.a >> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >> 
/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >> libs/pspline/libpspline.a libs/camtimers/libtimers.a >> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >> /usr/lib64/libcuda.so >> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >> -lmpi_ibm_mpifh -lnvf >> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >> >> >> >> 19:39 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ >> make >> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >> PETSC_ARCH="" ex1 >> *mpicc* -fPIC -g -fast -fPIC -g -fast >> -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >> ex1.c >> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >> -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >> --version* >> >> *nvc 21.7-0 linuxpower target on Linuxpower* >> NVIDIA Compilers and Tools >> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
>> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ >> jsrun -n 1 ./ex1 -ksp_monitor >> 0 KSP Residual norm 6.041522986797e+00 >> 1 KSP Residual norm 1.042493382631e+00 >> 2 KSP Residual norm 7.950907844730e-16 >> 0 KSP Residual norm 4.786756692342e+00 >> 1 KSP Residual norm 1.426392207750e-01 >> 2 KSP Residual norm 1.801079604472e-15 >> 0 KSP Residual norm 2.986456323228e+00 >> 1 KSP Residual norm 7.669888809223e-02 >> 2 KSP Residual norm 3.744083117256e-16 >> 0 KSP Residual norm 2.306244667700e-01 >> 1 KSP Residual norm 1.355550749587e-02 >> 2 KSP Residual norm 5.845524837731e-17 >> 0 KSP Residual norm 1.936314002654e-03 >> 1 KSP Residual norm 2.125593590819e-04 >> 2 KSP Residual norm 6.987141455073e-20 >> 0 KSP Residual norm 1.435593531990e-07 >> 1 KSP Residual norm 2.588271385567e-08 >> 2 KSP Residual norm 3.942196167935e-23 >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Aug 27 14:55:52 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 27 Aug 2021 14:55:52 -0500 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: On Fri, Aug 27, 2021, 1:52 PM Mark Adams wrote: > I think the problem is that I build with MPICC and they use nvcc_wrapper. > I could just try building PETSc with CC=nvcc_wrapper, but it was not > clear if this was the way to go. > --with-nvcc=nvcc_wrapper > I will try it. > Thanks, > Mark > > On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang > wrote: > >> >> >> >> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: >> >>> I have a user (cc'ed) that has a C++ code and is using a PETSc that I >>> built. He is getting this runtime error. >>> >>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to get a >>> link line, and it ran fine. >>> I appended the users link line and my test. >>> >>> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild >>> PETSc using that, maybe we just need to make sure we are both using the >>> same underlying compiler or should they use mpiCC? >>> >> It looks like they used nvcc_wrapper to replace nvcc. You can ask them >> to use nvcc directly to see what happens. But the error happened in petsc >> initialization, petscsys_petscinitializenohelp, so I doubt it helps. >> The easy way is to just attach a debugger. 
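One way to check the "same underlying compiler" concern raised earlier in this thread is to compare what each toolchain actually invokes. The sketch below assumes Kokkos' nvcc_wrapper, which, as far as I know, falls back to g++ as the host compiler unless NVCC_WRAPPER_DEFAULT_COMPILER is set, so that variable name should be verified against the installed wrapper before relying on it:

   # underlying C compiler behind the MPI wrapper PETSc was built with (here: nvc 21.7)
   mpicc --version
   # host compiler that nvcc_wrapper hands to nvcc (wrapper default, typically g++, if unset)
   echo "${NVCC_WRAPPER_DEFAULT_COMPILER:-unset (wrapper default)}"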
>> >>> >>> Thanks, >>> Mark >>> >>> >>> [e13n16:591873] *** Process received signal *** >>> >>> [e13n16:591873] Signal: Segmentation fault (11) >>> >>> [e13n16:591873] Signal code: Invalid permissions (2) >>> >>> [e13n16:591873] Failing at address: 0x102c87e0 >>> >>> [e13n16:591873] [ 0] >>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>> >>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >>> >>> [e13n16:591872] Signal: Segmentation fault (11) >>> >>> [e13n16:591872] Signal code: Invalid permissions (2) >>> >>> [e13n16:591872] Failing at address: 0x102c87e0 >>> >>> [e13n16:591872] [ 0] >>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>> >>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >>> >>> [e13n16:591871] Signal: Segmentation fault (11) >>> >>> [e13n16:591871] Signal code: Invalid permissions (2) >>> >>> [e13n16:591871] Failing at address: 0x102c87e0 >>> >>> [e13n16:591871] [ 0] >>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>> >>> [e13n16:591871] [ 1] >>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>> >>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >>> >>> [e13n16:591874] Signal: Segmentation fault (11) >>> >>> [e13n16:591874] Signal code: Invalid permissions (2) >>> >>> [e13n16:591874] Failing at address: 0x102c87e0 >>> >>> [e13n16:591874] [ 0] >>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>> >>> [e13n16:591874] [ 1] >>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>> >>> [e13n16:591874] [ 2] >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>> >>> [e13n16:591874] [ 3] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>> >>> [e13n16:591874] [ 4] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>> >>> [e13n16:591874] [ 5] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>> >>> [e13n16:591874] [ 6] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>> >>> [e13n16:591874] [ 7] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>> >>> [e13n16:591874] [ 8] >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>> >>> [e13n16:591871] [ 3] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>> >>> [e13n16:591871] [ 4] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>> >>> [e13n16:591871] [ 5] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>> >>> [e13n16:591871] [ 6] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>> >>> [e13n16:591871] [ 7] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>> >>> [e13n16:591871] [ 8] >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>> >>> [e13n16:591871] [ 9] >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>> >>> [e13n16:591871] *** End of error message *** >>> >>> >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>> >>> [e13n16:591874] [ 9] >>> 
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>> >>> [e13n16:591874] *** End of error message *** >>> >>> >>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>> >>> [e13n16:591872] [ 2] >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>> >>> [e13n16:591872] [ 3] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>> >>> [e13n16:591872] [ 4] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>> >>> [e13n16:591872] [ 5] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>> >>> [e13n16:591872] [ 6] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>> >>> [e13n16:591872] [ 7] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>> >>> [e13n16:591872] [ 8] >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>> >>> [e13n16:591872] [ 9] >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>> >>> [e13n16:591872] *** End of error message *** >>> >>> >>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>> >>> [e13n16:591873] [ 2] >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>> >>> [e13n16:591873] [ 3] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>> >>> [e13n16:591873] [ 4] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>> >>> [e13n16:591873] [ 5] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>> >>> [e13n16:591873] [ 6] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>> >>> [e13n16:591873] [ 7] >>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>> >>> [e13n16:591873] [ 8] >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>> >>> [e13n16:591873] [ 9] >>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>> >>> [e13n16:591873] *** End of error message *** >>> >>> ERROR: One or more process (first noticed rank 1) terminated with >>> signal 11 (core dumped) >>> >>> >>> >>> >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >>> bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>> liblibxgc-es-cpp.a >>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >>> 
/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >>> libs/pspline/libpspline.a libs/camtimers/libtimers.a >>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >>> /usr/lib64/libcuda.so >>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >>> -lmpi_ibm_mpifh -lnvf >>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>> >>> >>> >>> 19:39 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ >>> make >>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >>> PETSC_ARCH="" ex1 >>> *mpicc* -fPIC -g -fast -fPIC -g -fast >>> -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >>> ex1.c >>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>> -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >>> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >>> --version* >>> >>> *nvc 21.7-0 linuxpower target on Linuxpower* >>> NVIDIA Compilers and Tools >>> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights >>> reserved. 
>>> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ >>> jsrun -n 1 ./ex1 -ksp_monitor >>> 0 KSP Residual norm 6.041522986797e+00 >>> 1 KSP Residual norm 1.042493382631e+00 >>> 2 KSP Residual norm 7.950907844730e-16 >>> 0 KSP Residual norm 4.786756692342e+00 >>> 1 KSP Residual norm 1.426392207750e-01 >>> 2 KSP Residual norm 1.801079604472e-15 >>> 0 KSP Residual norm 2.986456323228e+00 >>> 1 KSP Residual norm 7.669888809223e-02 >>> 2 KSP Residual norm 3.744083117256e-16 >>> 0 KSP Residual norm 2.306244667700e-01 >>> 1 KSP Residual norm 1.355550749587e-02 >>> 2 KSP Residual norm 5.845524837731e-17 >>> 0 KSP Residual norm 1.936314002654e-03 >>> 1 KSP Residual norm 2.125593590819e-04 >>> 2 KSP Residual norm 6.987141455073e-20 >>> 0 KSP Residual norm 1.435593531990e-07 >>> 1 KSP Residual norm 2.588271385567e-08 >>> 2 KSP Residual norm 3.942196167935e-23 >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Aug 27 15:28:13 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 27 Aug 2021 16:28:13 -0400 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang wrote: > > > On Fri, Aug 27, 2021, 1:52 PM Mark Adams wrote: > >> I think the problem is that I build with MPICC and they use nvcc_wrapper. >> I could just try building PETSc with CC=nvcc_wrapper, but it was not >> clear if this was the way to go. >> > --with-nvcc=nvcc_wrapper > What do I specify for cc and CC? > I will try it. >> Thanks, >> Mark >> >> On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang >> wrote: >> >>> >>> >>> >>> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: >>> >>>> I have a user (cc'ed) that has a C++ code and is using a PETSc that I >>>> built. He is getting this runtime error. >>>> >>>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to get >>>> a link line, and it ran fine. >>>> I appended the users link line and my test. >>>> >>>> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild >>>> PETSc using that, maybe we just need to make sure we are both using the >>>> same underlying compiler or should they use mpiCC? >>>> >>> It looks like they used nvcc_wrapper to replace nvcc. You can ask them >>> to use nvcc directly to see what happens. But the error happened in petsc >>> initialization, petscsys_petscinitializenohelp, so I doubt it helps. >>> The easy way is to just attach a debugger. 
>>> >>>> >>>> Thanks, >>>> Mark >>>> >>>> >>>> [e13n16:591873] *** Process received signal *** >>>> >>>> [e13n16:591873] Signal: Segmentation fault (11) >>>> >>>> [e13n16:591873] Signal code: Invalid permissions (2) >>>> >>>> [e13n16:591873] Failing at address: 0x102c87e0 >>>> >>>> [e13n16:591873] [ 0] >>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>> >>>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >>>> >>>> [e13n16:591872] Signal: Segmentation fault (11) >>>> >>>> [e13n16:591872] Signal code: Invalid permissions (2) >>>> >>>> [e13n16:591872] Failing at address: 0x102c87e0 >>>> >>>> [e13n16:591872] [ 0] >>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>> >>>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >>>> >>>> [e13n16:591871] Signal: Segmentation fault (11) >>>> >>>> [e13n16:591871] Signal code: Invalid permissions (2) >>>> >>>> [e13n16:591871] Failing at address: 0x102c87e0 >>>> >>>> [e13n16:591871] [ 0] >>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>> >>>> [e13n16:591871] [ 1] >>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>> >>>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >>>> >>>> [e13n16:591874] Signal: Segmentation fault (11) >>>> >>>> [e13n16:591874] Signal code: Invalid permissions (2) >>>> >>>> [e13n16:591874] Failing at address: 0x102c87e0 >>>> >>>> [e13n16:591874] [ 0] >>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>> >>>> [e13n16:591874] [ 1] >>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>> >>>> [e13n16:591874] [ 2] >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>> >>>> [e13n16:591874] [ 3] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>> >>>> [e13n16:591874] [ 4] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>> >>>> [e13n16:591874] [ 5] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>> >>>> [e13n16:591874] [ 6] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>> >>>> [e13n16:591874] [ 7] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>> >>>> [e13n16:591874] [ 8] >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>> >>>> [e13n16:591871] [ 3] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>> >>>> [e13n16:591871] [ 4] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>> >>>> [e13n16:591871] [ 5] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>> >>>> [e13n16:591871] [ 6] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>> >>>> [e13n16:591871] [ 7] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>> >>>> [e13n16:591871] [ 8] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>> >>>> [e13n16:591871] [ 9] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>> >>>> [e13n16:591871] *** End of error message *** >>>> >>>> >>>> 
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>> >>>> [e13n16:591874] [ 9] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>> >>>> [e13n16:591874] *** End of error message *** >>>> >>>> >>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>> >>>> [e13n16:591872] [ 2] >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>> >>>> [e13n16:591872] [ 3] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>> >>>> [e13n16:591872] [ 4] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>> >>>> [e13n16:591872] [ 5] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>> >>>> [e13n16:591872] [ 6] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>> >>>> [e13n16:591872] [ 7] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>> >>>> [e13n16:591872] [ 8] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>> >>>> [e13n16:591872] [ 9] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>> >>>> [e13n16:591872] *** End of error message *** >>>> >>>> >>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>> >>>> [e13n16:591873] [ 2] >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>> >>>> [e13n16:591873] [ 3] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>> >>>> [e13n16:591873] [ 4] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>> >>>> [e13n16:591873] [ 5] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>> >>>> [e13n16:591873] [ 6] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>> >>>> [e13n16:591873] [ 7] >>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>> >>>> [e13n16:591873] [ 8] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>> >>>> [e13n16:591873] [ 9] >>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>> >>>> [e13n16:591873] *** End of error message *** >>>> >>>> ERROR: One or more process (first noticed rank 1) terminated with >>>> signal 11 (core dumped) >>>> >>>> >>>> >>>> >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>>> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >>>> bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>> liblibxgc-es-cpp.a >>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >>>> 
/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >>>> libs/pspline/libpspline.a libs/camtimers/libtimers.a >>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >>>> /usr/lib64/libcuda.so >>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >>>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >>>> -lmpi_ibm_mpifh -lnvf >>>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>> >>>> >>>> >>>> 19:39 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ >>>> make >>>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >>>> PETSC_ARCH="" ex1 >>>> *mpicc* -fPIC -g -fast -fPIC -g -fast >>>> -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >>>> ex1.c >>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>> -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >>>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >>>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >>>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >>>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >>>> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >>>> --version* >>>> >>>> *nvc 21.7-0 linuxpower target on Linuxpower* >>>> NVIDIA Compilers and Tools >>>> Copyright (c) 2021, NVIDIA CORPORATION & 
AFFILIATES. All rights >>>> reserved. >>>> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ >>>> jsrun -n 1 ./ex1 -ksp_monitor >>>> 0 KSP Residual norm 6.041522986797e+00 >>>> 1 KSP Residual norm 1.042493382631e+00 >>>> 2 KSP Residual norm 7.950907844730e-16 >>>> 0 KSP Residual norm 4.786756692342e+00 >>>> 1 KSP Residual norm 1.426392207750e-01 >>>> 2 KSP Residual norm 1.801079604472e-15 >>>> 0 KSP Residual norm 2.986456323228e+00 >>>> 1 KSP Residual norm 7.669888809223e-02 >>>> 2 KSP Residual norm 3.744083117256e-16 >>>> 0 KSP Residual norm 2.306244667700e-01 >>>> 1 KSP Residual norm 1.355550749587e-02 >>>> 2 KSP Residual norm 5.845524837731e-17 >>>> 0 KSP Residual norm 1.936314002654e-03 >>>> 1 KSP Residual norm 2.125593590819e-04 >>>> 2 KSP Residual norm 6.987141455073e-20 >>>> 0 KSP Residual norm 1.435593531990e-07 >>>> 1 KSP Residual norm 2.588271385567e-08 >>>> 2 KSP Residual norm 3.942196167935e-23 >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: application/octet-stream Size: 78776 bytes Desc: not available URL: From junchao.zhang at gmail.com Fri Aug 27 16:03:44 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 27 Aug 2021 16:03:44 -0500 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: I don't understand the configure options --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/ *nvcc_wrapper* --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun -g 1" *--with-cuda=0* --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0 --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b Why do you need to use nvcc_wrapper if you do not want to use cuda? In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, you also need --with-clanguage=c++ --Junchao Zhang On Fri, Aug 27, 2021 at 3:28 PM Mark Adams wrote: > > > On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang > wrote: > >> >> >> On Fri, Aug 27, 2021, 1:52 PM Mark Adams wrote: >> >>> I think the problem is that I build with MPICC and they use nvcc_wrapper. >>> I could just try building PETSc with CC=nvcc_wrapper, but it was not >>> clear if this was the way to go. >>> >> --with-nvcc=nvcc_wrapper >> > > What do I specify for cc and CC? > > >> I will try it. >>> Thanks, >>> Mark >>> >>> On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang >>> wrote: >>> >>>> >>>> >>>> >>>> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: >>>> >>>>> I have a user (cc'ed) that has a C++ code and is using a PETSc that I >>>>> built. He is getting this runtime error. >>>>> >>>>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to get >>>>> a link line, and it ran fine. >>>>> I appended the users link line and my test. >>>>> >>>>> I see that they are using Kokkos' "nvcc_wrapper". 
Should I rebuild >>>>> PETSc using that, maybe we just need to make sure we are both using the >>>>> same underlying compiler or should they use mpiCC? >>>>> >>>> It looks like they used nvcc_wrapper to replace nvcc. You can ask them >>>> to use nvcc directly to see what happens. But the error happened in petsc >>>> initialization, petscsys_petscinitializenohelp, so I doubt it helps. >>>> The easy way is to just attach a debugger. >>>> >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>>> >>>>> [e13n16:591873] *** Process received signal *** >>>>> >>>>> [e13n16:591873] Signal: Segmentation fault (11) >>>>> >>>>> [e13n16:591873] Signal code: Invalid permissions (2) >>>>> >>>>> [e13n16:591873] Failing at address: 0x102c87e0 >>>>> >>>>> [e13n16:591873] [ 0] >>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>> >>>>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >>>>> >>>>> [e13n16:591872] Signal: Segmentation fault (11) >>>>> >>>>> [e13n16:591872] Signal code: Invalid permissions (2) >>>>> >>>>> [e13n16:591872] Failing at address: 0x102c87e0 >>>>> >>>>> [e13n16:591872] [ 0] >>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>> >>>>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >>>>> >>>>> [e13n16:591871] Signal: Segmentation fault (11) >>>>> >>>>> [e13n16:591871] Signal code: Invalid permissions (2) >>>>> >>>>> [e13n16:591871] Failing at address: 0x102c87e0 >>>>> >>>>> [e13n16:591871] [ 0] >>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>> >>>>> [e13n16:591871] [ 1] >>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>> >>>>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >>>>> >>>>> [e13n16:591874] Signal: Segmentation fault (11) >>>>> >>>>> [e13n16:591874] Signal code: Invalid permissions (2) >>>>> >>>>> [e13n16:591874] Failing at address: 0x102c87e0 >>>>> >>>>> [e13n16:591874] [ 0] >>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>> >>>>> [e13n16:591874] [ 1] >>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>> >>>>> [e13n16:591874] [ 2] >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>> >>>>> [e13n16:591874] [ 3] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>> >>>>> [e13n16:591874] [ 4] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>> >>>>> [e13n16:591874] [ 5] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>> >>>>> [e13n16:591874] [ 6] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>> >>>>> [e13n16:591874] [ 7] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>> >>>>> [e13n16:591874] [ 8] >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>> >>>>> [e13n16:591871] [ 3] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>> >>>>> [e13n16:591871] [ 4] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>> >>>>> [e13n16:591871] [ 5] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>> >>>>> [e13n16:591871] [ 6] >>>>> 
/ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>> >>>>> [e13n16:591871] [ 7] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>> >>>>> [e13n16:591871] [ 8] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>> >>>>> [e13n16:591871] [ 9] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>> >>>>> [e13n16:591871] *** End of error message *** >>>>> >>>>> >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>> >>>>> [e13n16:591874] [ 9] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>> >>>>> [e13n16:591874] *** End of error message *** >>>>> >>>>> >>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>> >>>>> [e13n16:591872] [ 2] >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>> >>>>> [e13n16:591872] [ 3] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>> >>>>> [e13n16:591872] [ 4] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>> >>>>> [e13n16:591872] [ 5] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>> >>>>> [e13n16:591872] [ 6] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>> >>>>> [e13n16:591872] [ 7] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>> >>>>> [e13n16:591872] [ 8] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>> >>>>> [e13n16:591872] [ 9] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>> >>>>> [e13n16:591872] *** End of error message *** >>>>> >>>>> >>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>> >>>>> [e13n16:591873] [ 2] >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>> >>>>> [e13n16:591873] [ 3] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>> >>>>> [e13n16:591873] [ 4] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>> >>>>> [e13n16:591873] [ 5] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>> >>>>> [e13n16:591873] [ 6] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>> >>>>> [e13n16:591873] [ 7] >>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>> >>>>> [e13n16:591873] [ 8] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>> >>>>> [e13n16:591873] [ 9] >>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>> >>>>> [e13n16:591873] *** End of error message *** >>>>> >>>>> ERROR: One or more process (first noticed rank 1) terminated with >>>>> signal 11 (core dumped) >>>>> >>>>> >>>>> >>>>> >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>>>> -arch=sm_70 
CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >>>>> bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>> liblibxgc-es-cpp.a >>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >>>>> libs/pspline/libpspline.a libs/camtimers/libtimers.a >>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >>>>> /usr/lib64/libcuda.so >>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >>>>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >>>>> -lmpi_ibm_mpifh -lnvf >>>>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>> >>>>> >>>>> >>>>> 19:39 main= >>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make >>>>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >>>>> PETSC_ARCH="" ex1 >>>>> *mpicc* -fPIC -g -fast -fPIC -g -fast >>>>> -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >>>>> ex1.c >>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>> 
-Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >>>>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >>>>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >>>>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >>>>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >>>>> 19:40 main= >>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >>>>> --version* >>>>> >>>>> *nvc 21.7-0 linuxpower target on Linuxpower* >>>>> NVIDIA Compilers and Tools >>>>> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights >>>>> reserved. >>>>> 19:40 main= >>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1 >>>>> ./ex1 -ksp_monitor >>>>> 0 KSP Residual norm 6.041522986797e+00 >>>>> 1 KSP Residual norm 1.042493382631e+00 >>>>> 2 KSP Residual norm 7.950907844730e-16 >>>>> 0 KSP Residual norm 4.786756692342e+00 >>>>> 1 KSP Residual norm 1.426392207750e-01 >>>>> 2 KSP Residual norm 1.801079604472e-15 >>>>> 0 KSP Residual norm 2.986456323228e+00 >>>>> 1 KSP Residual norm 7.669888809223e-02 >>>>> 2 KSP Residual norm 3.744083117256e-16 >>>>> 0 KSP Residual norm 2.306244667700e-01 >>>>> 1 KSP Residual norm 1.355550749587e-02 >>>>> 2 KSP Residual norm 5.845524837731e-17 >>>>> 0 KSP Residual norm 1.936314002654e-03 >>>>> 1 KSP Residual norm 2.125593590819e-04 >>>>> 2 KSP Residual norm 6.987141455073e-20 >>>>> 0 KSP Residual norm 1.435593531990e-07 >>>>> 1 KSP Residual norm 2.588271385567e-08 >>>>> 2 KSP Residual norm 3.942196167935e-23 >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Aug 27 17:05:44 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 27 Aug 2021 18:05:44 -0400 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: On Fri, Aug 27, 2021 at 5:03 PM Junchao Zhang wrote: > I don't understand the configure options > > > --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/ > *nvcc_wrapper* > --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper > --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort > COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" > CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun > -g 1" *--with-cuda=0* > --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper > --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0 > --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc > --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b > > Why do you need to use nvcc_wrapper if you do not want to use cuda? > That code that is having a problem links with nvcc_wrapper. They get a segv that I sent earlier, in PetscInitialize so I figure I should use the same compiler / linker. They use CUDA, but we don't need PETSc to use CUDA now. > In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, you > also need --with-clanguage=c++ > I rebuilt PETSc with mpicc, mpiCC, mpif90 and --with-nvcc=nvcc_wrapper and that built make check works. I gave it to them to test. 
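For reference, the configure line for that rebuild is roughly the following. Only the
compiler options are an intentional change; everything else is carried over from the
opt-nvhpc21.7b configure quoted above, so treat this as a sketch rather than the exact
command:

# sketch only: MPI wrapper compilers, nvcc_wrapper passed separately; other options as before
./configure --with-cc=mpicc --with-cxx=mpiCC --with-fc=mpif90 \
  --with-nvcc=nvcc_wrapper --with-cuda=0 \
  COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" \
  --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun -g 1" \
  --download-metis --download-parmetis --with-x=0 --with-debugging=0 \
  PETSC_ARCH=arch-summit-opt-nvhpc \
  --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b
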
Thanks, Mark > > --Junchao Zhang > > > On Fri, Aug 27, 2021 at 3:28 PM Mark Adams wrote: > >> >> >> On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang >> wrote: >> >>> >>> >>> On Fri, Aug 27, 2021, 1:52 PM Mark Adams wrote: >>> >>>> I think the problem is that I build with MPICC and they use nvcc_wrapper. >>>> I could just try building PETSc with CC=nvcc_wrapper, but it was not >>>> clear if this was the way to go. >>>> >>> --with-nvcc=nvcc_wrapper >>> >> >> What do I specify for cc and CC? >> >> >>> I will try it. >>>> Thanks, >>>> Mark >>>> >>>> On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang >>>> wrote: >>>> >>>>> >>>>> >>>>> >>>>> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: >>>>> >>>>>> I have a user (cc'ed) that has a C++ code and is using a PETSc that I >>>>>> built. He is getting this runtime error. >>>>>> >>>>>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to >>>>>> get a link line, and it ran fine. >>>>>> I appended the users link line and my test. >>>>>> >>>>>> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild >>>>>> PETSc using that, maybe we just need to make sure we are both using the >>>>>> same underlying compiler or should they use mpiCC? >>>>>> >>>>> It looks like they used nvcc_wrapper to replace nvcc. You can ask >>>>> them to use nvcc directly to see what happens. But the error happened in >>>>> petsc initialization, petscsys_petscinitializenohelp, so I doubt it >>>>> helps. The easy way is to just attach a debugger. >>>>> >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>>> >>>>>> [e13n16:591873] *** Process received signal *** >>>>>> >>>>>> [e13n16:591873] Signal: Segmentation fault (11) >>>>>> >>>>>> [e13n16:591873] Signal code: Invalid permissions (2) >>>>>> >>>>>> [e13n16:591873] Failing at address: 0x102c87e0 >>>>>> >>>>>> [e13n16:591873] [ 0] >>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>> >>>>>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >>>>>> >>>>>> [e13n16:591872] Signal: Segmentation fault (11) >>>>>> >>>>>> [e13n16:591872] Signal code: Invalid permissions (2) >>>>>> >>>>>> [e13n16:591872] Failing at address: 0x102c87e0 >>>>>> >>>>>> [e13n16:591872] [ 0] >>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>> >>>>>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >>>>>> >>>>>> [e13n16:591871] Signal: Segmentation fault (11) >>>>>> >>>>>> [e13n16:591871] Signal code: Invalid permissions (2) >>>>>> >>>>>> [e13n16:591871] Failing at address: 0x102c87e0 >>>>>> >>>>>> [e13n16:591871] [ 0] >>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>> >>>>>> [e13n16:591871] [ 1] >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>> >>>>>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >>>>>> >>>>>> [e13n16:591874] Signal: Segmentation fault (11) >>>>>> >>>>>> [e13n16:591874] Signal code: Invalid permissions (2) >>>>>> >>>>>> [e13n16:591874] Failing at address: 0x102c87e0 >>>>>> >>>>>> [e13n16:591874] [ 0] >>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>> >>>>>> [e13n16:591874] [ 1] >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>> >>>>>> [e13n16:591874] [ 2] >>>>>> 
/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>> >>>>>> [e13n16:591874] [ 3] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>> >>>>>> [e13n16:591874] [ 4] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>> >>>>>> [e13n16:591874] [ 5] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>> >>>>>> [e13n16:591874] [ 6] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>> >>>>>> [e13n16:591874] [ 7] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>> >>>>>> [e13n16:591874] [ 8] >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>> >>>>>> [e13n16:591871] [ 3] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>> >>>>>> [e13n16:591871] [ 4] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>> >>>>>> [e13n16:591871] [ 5] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>> >>>>>> [e13n16:591871] [ 6] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>> >>>>>> [e13n16:591871] [ 7] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>> >>>>>> [e13n16:591871] [ 8] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>> >>>>>> [e13n16:591871] [ 9] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>> >>>>>> [e13n16:591871] *** End of error message *** >>>>>> >>>>>> >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>> >>>>>> [e13n16:591874] [ 9] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>> >>>>>> [e13n16:591874] *** End of error message *** >>>>>> >>>>>> >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>> >>>>>> [e13n16:591872] [ 2] >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>> >>>>>> [e13n16:591872] [ 3] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>> >>>>>> [e13n16:591872] [ 4] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>> >>>>>> [e13n16:591872] [ 5] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>> >>>>>> [e13n16:591872] [ 6] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>> >>>>>> [e13n16:591872] [ 7] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>> >>>>>> [e13n16:591872] [ 8] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>> >>>>>> [e13n16:591872] [ 9] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>> >>>>>> [e13n16:591872] *** End of error message *** >>>>>> >>>>>> >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>> >>>>>> [e13n16:591873] [ 2] >>>>>> 
/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>> >>>>>> [e13n16:591873] [ 3] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>> >>>>>> [e13n16:591873] [ 4] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>> >>>>>> [e13n16:591873] [ 5] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>> >>>>>> [e13n16:591873] [ 6] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>> >>>>>> [e13n16:591873] [ 7] >>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>> >>>>>> [e13n16:591873] [ 8] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>> >>>>>> [e13n16:591873] [ 9] >>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>> >>>>>> [e13n16:591873] *** End of error message *** >>>>>> >>>>>> ERROR: One or more process (first noticed rank 1) terminated with >>>>>> signal 11 (core dumped) >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>>>>> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >>>>>> bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>>> liblibxgc-es-cpp.a >>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >>>>>> libs/pspline/libpspline.a libs/camtimers/libtimers.a >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >>>>>> /usr/lib64/libcuda.so >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >>>>>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >>>>>> -lmpi_ibm_mpifh -lnvf >>>>>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>>> >>>>>> >>>>>> >>>>>> 19:39 main= >>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make >>>>>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >>>>>> PETSC_ARCH="" ex1 >>>>>> *mpicc* -fPIC -g -fast -fPIC -g -fast >>>>>> 
-I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >>>>>> ex1.c >>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>>> -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >>>>>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >>>>>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >>>>>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >>>>>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >>>>>> 19:40 main= >>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >>>>>> --version* >>>>>> >>>>>> *nvc 21.7-0 linuxpower target on Linuxpower* >>>>>> NVIDIA Compilers and Tools >>>>>> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights >>>>>> reserved. >>>>>> 19:40 main= >>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1 >>>>>> ./ex1 -ksp_monitor >>>>>> 0 KSP Residual norm 6.041522986797e+00 >>>>>> 1 KSP Residual norm 1.042493382631e+00 >>>>>> 2 KSP Residual norm 7.950907844730e-16 >>>>>> 0 KSP Residual norm 4.786756692342e+00 >>>>>> 1 KSP Residual norm 1.426392207750e-01 >>>>>> 2 KSP Residual norm 1.801079604472e-15 >>>>>> 0 KSP Residual norm 2.986456323228e+00 >>>>>> 1 KSP Residual norm 7.669888809223e-02 >>>>>> 2 KSP Residual norm 3.744083117256e-16 >>>>>> 0 KSP Residual norm 2.306244667700e-01 >>>>>> 1 KSP Residual norm 1.355550749587e-02 >>>>>> 2 KSP Residual norm 5.845524837731e-17 >>>>>> 0 KSP Residual norm 1.936314002654e-03 >>>>>> 1 KSP Residual norm 2.125593590819e-04 >>>>>> 2 KSP Residual norm 6.987141455073e-20 >>>>>> 0 KSP Residual norm 1.435593531990e-07 >>>>>> 1 KSP Residual norm 2.588271385567e-08 >>>>>> 2 KSP Residual norm 3.942196167935e-23 >>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri Aug 27 17:16:43 2021 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 27 Aug 2021 18:16:43 -0400 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: And I found that this C++ code calls PetscIntiialize from Fortran code. Hence the Fortran library in the call stack. 
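Schematically, the pattern is something like this minimal Fortran sketch (the names here
are made up for illustration, this is not their actual source):

! minimal sketch: a Fortran routine, called from the C++ main through the usual
! Fortran bindings, does the PETSc initialization; the character argument is
! presumably what the nvfortran runtime (the pgf90_str_copy_klen frame in the
! trace) is copying when it crashes
subroutine init_petsc(ierr)
#include <petsc/finclude/petscsys.h>
  use petscsys
  implicit none
  PetscErrorCode ierr
  call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
end subroutine init_petsc
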
F90 tests work but our tests are pure Fortran. Should they be using nvcc_wrapper (a Kokkos version) as a linker? Thanks, Mark On Fri, Aug 27, 2021 at 6:05 PM Mark Adams wrote: > > > On Fri, Aug 27, 2021 at 5:03 PM Junchao Zhang > wrote: > >> I don't understand the configure options >> >> >> --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/ >> *nvcc_wrapper* >> --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >> --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort >> COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" >> CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun >> -g 1" *--with-cuda=0* >> --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >> --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0 >> --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc >> --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b >> >> Why do you need to use nvcc_wrapper if you do not want to use cuda? >> > > That code that is having a problem links with nvcc_wrapper. > They get a segv that I sent earlier, in PetscInitialize so I figure I > should use the same compiler / linker. > They use CUDA, but we don't need PETSc to use CUDA now. > > >> In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, you >> also need --with-clanguage=c++ >> > > I rebuilt PETSc with mpicc, mpiCC, mpif90 and --with-nvcc=nvcc_wrapper > and that built make check works. I gave it to them to test. > > Thanks, > Mark > > >> >> --Junchao Zhang >> >> >> On Fri, Aug 27, 2021 at 3:28 PM Mark Adams wrote: >> >>> >>> >>> On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang >>> wrote: >>> >>>> >>>> >>>> On Fri, Aug 27, 2021, 1:52 PM Mark Adams wrote: >>>> >>>>> I think the problem is that I build with MPICC and they use nvcc_wrapper. >>>>> I could just try building PETSc with CC=nvcc_wrapper, but it was not >>>>> clear if this was the way to go. >>>>> >>>> --with-nvcc=nvcc_wrapper >>>> >>> >>> What do I specify for cc and CC? >>> >>> >>>> I will try it. >>>>> Thanks, >>>>> Mark >>>>> >>>>> On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang < >>>>> junchao.zhang at gmail.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: >>>>>> >>>>>>> I have a user (cc'ed) that has a C++ code and is using a PETSc that >>>>>>> I built. He is getting this runtime error. >>>>>>> >>>>>>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to >>>>>>> get a link line, and it ran fine. >>>>>>> I appended the users link line and my test. >>>>>>> >>>>>>> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild >>>>>>> PETSc using that, maybe we just need to make sure we are both using the >>>>>>> same underlying compiler or should they use mpiCC? >>>>>>> >>>>>> It looks like they used nvcc_wrapper to replace nvcc. You can ask >>>>>> them to use nvcc directly to see what happens. But the error happened in >>>>>> petsc initialization, petscsys_petscinitializenohelp, so I doubt it >>>>>> helps. The easy way is to just attach a debugger. 
>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Mark >>>>>>> >>>>>>> >>>>>>> [e13n16:591873] *** Process received signal *** >>>>>>> >>>>>>> [e13n16:591873] Signal: Segmentation fault (11) >>>>>>> >>>>>>> [e13n16:591873] Signal code: Invalid permissions (2) >>>>>>> >>>>>>> [e13n16:591873] Failing at address: 0x102c87e0 >>>>>>> >>>>>>> [e13n16:591873] [ 0] >>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>> >>>>>>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >>>>>>> >>>>>>> [e13n16:591872] Signal: Segmentation fault (11) >>>>>>> >>>>>>> [e13n16:591872] Signal code: Invalid permissions (2) >>>>>>> >>>>>>> [e13n16:591872] Failing at address: 0x102c87e0 >>>>>>> >>>>>>> [e13n16:591872] [ 0] >>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>> >>>>>>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >>>>>>> >>>>>>> [e13n16:591871] Signal: Segmentation fault (11) >>>>>>> >>>>>>> [e13n16:591871] Signal code: Invalid permissions (2) >>>>>>> >>>>>>> [e13n16:591871] Failing at address: 0x102c87e0 >>>>>>> >>>>>>> [e13n16:591871] [ 0] >>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>> >>>>>>> [e13n16:591871] [ 1] >>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>> >>>>>>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >>>>>>> >>>>>>> [e13n16:591874] Signal: Segmentation fault (11) >>>>>>> >>>>>>> [e13n16:591874] Signal code: Invalid permissions (2) >>>>>>> >>>>>>> [e13n16:591874] Failing at address: 0x102c87e0 >>>>>>> >>>>>>> [e13n16:591874] [ 0] >>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>> >>>>>>> [e13n16:591874] [ 1] >>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>> >>>>>>> [e13n16:591874] [ 2] >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>> >>>>>>> [e13n16:591874] [ 3] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>> >>>>>>> [e13n16:591874] [ 4] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>> >>>>>>> [e13n16:591874] [ 5] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>> >>>>>>> [e13n16:591874] [ 6] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>> >>>>>>> [e13n16:591874] [ 7] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>> >>>>>>> [e13n16:591874] [ 8] >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>> >>>>>>> [e13n16:591871] [ 3] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>> >>>>>>> [e13n16:591871] [ 4] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>> >>>>>>> [e13n16:591871] [ 5] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>> >>>>>>> [e13n16:591871] [ 6] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>> >>>>>>> [e13n16:591871] [ 7] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>> >>>>>>> [e13n16:591871] [ 8] >>>>>>> 
/usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>> >>>>>>> [e13n16:591871] [ 9] >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>> >>>>>>> [e13n16:591871] *** End of error message *** >>>>>>> >>>>>>> >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>> >>>>>>> [e13n16:591874] [ 9] >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>> >>>>>>> [e13n16:591874] *** End of error message *** >>>>>>> >>>>>>> >>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>> >>>>>>> [e13n16:591872] [ 2] >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>> >>>>>>> [e13n16:591872] [ 3] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>> >>>>>>> [e13n16:591872] [ 4] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>> >>>>>>> [e13n16:591872] [ 5] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>> >>>>>>> [e13n16:591872] [ 6] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>> >>>>>>> [e13n16:591872] [ 7] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>> >>>>>>> [e13n16:591872] [ 8] >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>> >>>>>>> [e13n16:591872] [ 9] >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>> >>>>>>> [e13n16:591872] *** End of error message *** >>>>>>> >>>>>>> >>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>> >>>>>>> [e13n16:591873] [ 2] >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>> >>>>>>> [e13n16:591873] [ 3] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>> >>>>>>> [e13n16:591873] [ 4] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>> >>>>>>> [e13n16:591873] [ 5] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>> >>>>>>> [e13n16:591873] [ 6] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>> >>>>>>> [e13n16:591873] [ 7] >>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>> >>>>>>> [e13n16:591873] [ 8] >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>> >>>>>>> [e13n16:591873] [ 9] >>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>> >>>>>>> [e13n16:591873] *** End of error message *** >>>>>>> >>>>>>> ERROR: One or more process (first noticed rank 1) terminated with >>>>>>> signal 11 (core dumped) >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>>>>>> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >>>>>>> bin/xgc-es-cpp 
-Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>>>> liblibxgc-es-cpp.a >>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >>>>>>> libs/pspline/libpspline.a libs/camtimers/libtimers.a >>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >>>>>>> /usr/lib64/libcuda.so >>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >>>>>>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >>>>>>> -lmpi_ibm_mpifh -lnvf >>>>>>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>>>> >>>>>>> >>>>>>> >>>>>>> 19:39 main= >>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make >>>>>>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >>>>>>> PETSC_ARCH="" ex1 >>>>>>> *mpicc* -fPIC -g -fast -fPIC -g -fast >>>>>>> -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >>>>>>> ex1.c >>>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>>>> 
-Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>>>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>>>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >>>>>>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >>>>>>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >>>>>>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >>>>>>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >>>>>>> 19:40 main= >>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >>>>>>> --version* >>>>>>> >>>>>>> *nvc 21.7-0 linuxpower target on Linuxpower* >>>>>>> NVIDIA Compilers and Tools >>>>>>> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights >>>>>>> reserved. >>>>>>> 19:40 main= >>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1 >>>>>>> ./ex1 -ksp_monitor >>>>>>> 0 KSP Residual norm 6.041522986797e+00 >>>>>>> 1 KSP Residual norm 1.042493382631e+00 >>>>>>> 2 KSP Residual norm 7.950907844730e-16 >>>>>>> 0 KSP Residual norm 4.786756692342e+00 >>>>>>> 1 KSP Residual norm 1.426392207750e-01 >>>>>>> 2 KSP Residual norm 1.801079604472e-15 >>>>>>> 0 KSP Residual norm 2.986456323228e+00 >>>>>>> 1 KSP Residual norm 7.669888809223e-02 >>>>>>> 2 KSP Residual norm 3.744083117256e-16 >>>>>>> 0 KSP Residual norm 2.306244667700e-01 >>>>>>> 1 KSP Residual norm 1.355550749587e-02 >>>>>>> 2 KSP Residual norm 5.845524837731e-17 >>>>>>> 0 KSP Residual norm 1.936314002654e-03 >>>>>>> 1 KSP Residual norm 2.125593590819e-04 >>>>>>> 2 KSP Residual norm 6.987141455073e-20 >>>>>>> 0 KSP Residual norm 1.435593531990e-07 >>>>>>> 1 KSP Residual norm 2.588271385567e-08 >>>>>>> 2 KSP Residual norm 3.942196167935e-23 >>>>>>> >>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From junchao.zhang at gmail.com Fri Aug 27 18:07:11 2021 From: junchao.zhang at gmail.com (Junchao Zhang) Date: Fri, 27 Aug 2021 18:07:11 -0500 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: On Fri, Aug 27, 2021 at 5:16 PM Mark Adams wrote: > And I found that this C++ code calls PetscIntiialize from Fortran code. > Hence the Fortran library in the call stack. > > F90 tests work but our tests are pure Fortran. > > Should they be using nvcc_wrapper (a Kokkos version) as a linker? > I don't think so. @Satish. 
> Thanks, > Mark > > On Fri, Aug 27, 2021 at 6:05 PM Mark Adams wrote: > >> >> >> On Fri, Aug 27, 2021 at 5:03 PM Junchao Zhang >> wrote: >> >>> I don't understand the configure options >>> >>> >>> --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/ >>> *nvcc_wrapper* >>> --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>> --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort >>> COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" >>> CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun >>> -g 1" *--with-cuda=0* >>> --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >>> --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0 >>> --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc >>> --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b >>> >>> Why do you need to use nvcc_wrapper if you do not want to use cuda? >>> >> >> That code that is having a problem links with nvcc_wrapper. >> They get a segv that I sent earlier, in PetscInitialize so I figure I >> should use the same compiler / linker. >> They use CUDA, but we don't need PETSc to use CUDA now. >> >> >>> In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, >>> you also need --with-clanguage=c++ >>> >> >> I rebuilt PETSc with mpicc, mpiCC, mpif90 and --with-nvcc=nvcc_wrapper >> and that built make check works. I gave it to them to test. >> >> Thanks, >> Mark >> >> >>> >>> --Junchao Zhang >>> >>> >>> On Fri, Aug 27, 2021 at 3:28 PM Mark Adams wrote: >>> >>>> >>>> >>>> On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang >>>> wrote: >>>> >>>>> >>>>> >>>>> On Fri, Aug 27, 2021, 1:52 PM Mark Adams wrote: >>>>> >>>>>> I think the problem is that I build with MPICC and they use nvcc_wrapper. >>>>>> I could just try building PETSc with CC=nvcc_wrapper, but it was not >>>>>> clear if this was the way to go. >>>>>> >>>>> --with-nvcc=nvcc_wrapper >>>>> >>>> >>>> What do I specify for cc and CC? >>>> >>>> >>>>> I will try it. >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>>> On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang < >>>>>> junchao.zhang at gmail.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Aug 27, 2021 at 7:06 AM Mark Adams wrote: >>>>>>> >>>>>>>> I have a user (cc'ed) that has a C++ code and is using a PETSc that >>>>>>>> I built. He is getting this runtime error. >>>>>>>> >>>>>>>> 'make check' runs clean and I built snes/tutorial/ex1 manually, to >>>>>>>> get a link line, and it ran fine. >>>>>>>> I appended the users link line and my test. >>>>>>>> >>>>>>>> I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild >>>>>>>> PETSc using that, maybe we just need to make sure we are both using the >>>>>>>> same underlying compiler or should they use mpiCC? >>>>>>>> >>>>>>> It looks like they used nvcc_wrapper to replace nvcc. You can ask >>>>>>> them to use nvcc directly to see what happens. But the error happened in >>>>>>> petsc initialization, petscsys_petscinitializenohelp, so I doubt it >>>>>>> helps. The easy way is to just attach a debugger. 
>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Mark >>>>>>>> >>>>>>>> >>>>>>>> [e13n16:591873] *** Process received signal *** >>>>>>>> >>>>>>>> [e13n16:591873] Signal: Segmentation fault (11) >>>>>>>> >>>>>>>> [e13n16:591873] Signal code: Invalid permissions (2) >>>>>>>> >>>>>>>> [e13n16:591873] Failing at address: 0x102c87e0 >>>>>>>> >>>>>>>> [e13n16:591873] [ 0] >>>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>>> >>>>>>>> [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** >>>>>>>> >>>>>>>> [e13n16:591872] Signal: Segmentation fault (11) >>>>>>>> >>>>>>>> [e13n16:591872] Signal code: Invalid permissions (2) >>>>>>>> >>>>>>>> [e13n16:591872] Failing at address: 0x102c87e0 >>>>>>>> >>>>>>>> [e13n16:591872] [ 0] >>>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>>> >>>>>>>> [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** >>>>>>>> >>>>>>>> [e13n16:591871] Signal: Segmentation fault (11) >>>>>>>> >>>>>>>> [e13n16:591871] Signal code: Invalid permissions (2) >>>>>>>> >>>>>>>> [e13n16:591871] Failing at address: 0x102c87e0 >>>>>>>> >>>>>>>> [e13n16:591871] [ 0] >>>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>>> >>>>>>>> [e13n16:591871] [ 1] >>>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>>> >>>>>>>> [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** >>>>>>>> >>>>>>>> [e13n16:591874] Signal: Segmentation fault (11) >>>>>>>> >>>>>>>> [e13n16:591874] Signal code: Invalid permissions (2) >>>>>>>> >>>>>>>> [e13n16:591874] Failing at address: 0x102c87e0 >>>>>>>> >>>>>>>> [e13n16:591874] [ 0] >>>>>>>> linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] >>>>>>>> >>>>>>>> [e13n16:591874] [ 1] >>>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>>> >>>>>>>> [e13n16:591874] [ 2] >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>>> >>>>>>>> [e13n16:591874] [ 3] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>>> >>>>>>>> [e13n16:591874] [ 4] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>>> >>>>>>>> [e13n16:591874] [ 5] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>>> >>>>>>>> [e13n16:591874] [ 6] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>>> >>>>>>>> [e13n16:591874] [ 7] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>>> >>>>>>>> [e13n16:591874] [ 8] >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>>> >>>>>>>> [e13n16:591871] [ 3] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>>> >>>>>>>> [e13n16:591871] [ 4] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>>> >>>>>>>> [e13n16:591871] [ 5] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>>> >>>>>>>> [e13n16:591871] [ 6] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>>> >>>>>>>> [e13n16:591871] [ 7] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>>> 
>>>>>>>> [e13n16:591871] [ 8] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>>> >>>>>>>> [e13n16:591871] [ 9] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>>> >>>>>>>> [e13n16:591871] *** End of error message *** >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>>> >>>>>>>> [e13n16:591874] [ 9] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>>> >>>>>>>> [e13n16:591874] *** End of error message *** >>>>>>>> >>>>>>>> >>>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>>> >>>>>>>> [e13n16:591872] [ 2] >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>>> >>>>>>>> [e13n16:591872] [ 3] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>>> >>>>>>>> [e13n16:591872] [ 4] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>>> >>>>>>>> [e13n16:591872] [ 5] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>>> >>>>>>>> [e13n16:591872] [ 6] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>>> >>>>>>>> [e13n16:591872] [ 7] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>>> >>>>>>>> [e13n16:591872] [ 8] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>>> >>>>>>>> [e13n16:591872] [ 9] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>>> >>>>>>>> [e13n16:591872] *** End of error message *** >>>>>>>> >>>>>>>> >>>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>>>> >>>>>>>> [e13n16:591873] [ 2] >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] >>>>>>>> >>>>>>>> [e13n16:591873] [ 3] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] >>>>>>>> >>>>>>>> [e13n16:591873] [ 4] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] >>>>>>>> >>>>>>>> [e13n16:591873] [ 5] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] >>>>>>>> >>>>>>>> [e13n16:591873] [ 6] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] >>>>>>>> >>>>>>>> [e13n16:591873] [ 7] >>>>>>>> /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] >>>>>>>> >>>>>>>> [e13n16:591873] [ 8] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] >>>>>>>> >>>>>>>> [e13n16:591873] [ 9] >>>>>>>> /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] >>>>>>>> >>>>>>>> [e13n16:591873] *** End of error message *** >>>>>>>> >>>>>>>> ERROR: One or more process (first noticed rank 1) terminated with >>>>>>>> signal 11 (core dumped) >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper 
>>>>>>>> -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o >>>>>>>> bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>>>>> liblibxgc-es-cpp.a >>>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so >>>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so >>>>>>>> /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so >>>>>>>> libs/pspline/libpspline.a libs/camtimers/libtimers.a >>>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a >>>>>>>> /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a >>>>>>>> /usr/lib64/libcuda.so >>>>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so >>>>>>>> /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr >>>>>>>> -lmpi_ibm_mpifh -lnvf >>>>>>>> -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 19:39 main= >>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make >>>>>>>> PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 >>>>>>>> PETSC_ARCH="" ex1 >>>>>>>> *mpicc* -fPIC -g -fast -fPIC -g -fast >>>>>>>> -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include >>>>>>>> ex1.c >>>>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>>> -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>>> -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib >>>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib >>>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>>>>> -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib >>>>>>>> -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>>>>> 
-L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 >>>>>>>> -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>>>>> -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib >>>>>>>> -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 >>>>>>>> -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis >>>>>>>> -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 >>>>>>>> -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp >>>>>>>> -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 >>>>>>>> 19:40 main= >>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ *mpicc >>>>>>>> --version* >>>>>>>> >>>>>>>> *nvc 21.7-0 linuxpower target on Linuxpower* >>>>>>>> NVIDIA Compilers and Tools >>>>>>>> Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights >>>>>>>> reserved. >>>>>>>> 19:40 main= >>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1 >>>>>>>> ./ex1 -ksp_monitor >>>>>>>> 0 KSP Residual norm 6.041522986797e+00 >>>>>>>> 1 KSP Residual norm 1.042493382631e+00 >>>>>>>> 2 KSP Residual norm 7.950907844730e-16 >>>>>>>> 0 KSP Residual norm 4.786756692342e+00 >>>>>>>> 1 KSP Residual norm 1.426392207750e-01 >>>>>>>> 2 KSP Residual norm 1.801079604472e-15 >>>>>>>> 0 KSP Residual norm 2.986456323228e+00 >>>>>>>> 1 KSP Residual norm 7.669888809223e-02 >>>>>>>> 2 KSP Residual norm 3.744083117256e-16 >>>>>>>> 0 KSP Residual norm 2.306244667700e-01 >>>>>>>> 1 KSP Residual norm 1.355550749587e-02 >>>>>>>> 2 KSP Residual norm 5.845524837731e-17 >>>>>>>> 0 KSP Residual norm 1.936314002654e-03 >>>>>>>> 1 KSP Residual norm 2.125593590819e-04 >>>>>>>> 2 KSP Residual norm 6.987141455073e-20 >>>>>>>> 0 KSP Residual norm 1.435593531990e-07 >>>>>>>> 1 KSP Residual norm 2.588271385567e-08 >>>>>>>> 2 KSP Residual norm 3.942196167935e-23 >>>>>>>> >>>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at petsc.dev Fri Aug 27 23:19:06 2021 From: bsmith at petsc.dev (Barry Smith) Date: Fri, 27 Aug 2021 23:19:06 -0500 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: References: Message-ID: <05F13947-EAD7-442F-9346-F8203131AFD5@petsc.dev> > On Aug 27, 2021, at 5:05 PM, Mark Adams wrote: > > > > On Fri, Aug 27, 2021 at 5:03 PM Junchao Zhang > wrote: > I don't understand the configure options > > --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun -g 1" --with-cuda=0 --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0 --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b > > Why do you need to use nvcc_wrapper if you do not want to use cuda? > > That code that is having a problem links with nvcc_wrapper. 
> They get a segv that I sent earlier, in PetscInitialize so I figure I should use the same compiler / linker. > They use CUDA, but we don't need PETSc to use CUDA now. > > In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, you also need --with-clanguage=c++ > > I rebuilt PETSc with mpicc, mpiCC, mpif90 and --with-nvcc=nvcc_wrapper and that built make check works. I gave it to them to test. This is an odd way to do it. The Kokkos nvcc_wrapper wraps the nvcc compiler to allow it to compile Kokkos code and link it against the Kokkos libraries; so using nvcc_wrapper as nvcc is strangely recursive; sure everything in PETSc/Kokkos may build ok (assuming the nvcc that the nvcc_wrapper uses is correct for the situation and uses a correct underlying C++) but it is freakish. PETSc should just be built with the same nvcc that the nvcc_wrapper is using and using the same inner C++ compiler. I suspect the crashes came from the /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper not using the same nvcc and internal C++ compiler as PETSc is ending up using. But if it works, I guess it works. Perhaps when PETSc does not build Kokkos we should have a --with-kokkos-nvcc-wrapper= to allow setting the wrapper. Barry > > Thanks, > Mark > > > --Junchao Zhang > > > On Fri, Aug 27, 2021 at 3:28 PM Mark Adams > wrote: > > > On Fri, Aug 27, 2021 at 3:56 PM Junchao Zhang > wrote: > > > On Fri, Aug 27, 2021, 1:52 PM Mark Adams > wrote: > I think the problem is that I build with MPICC and they use nvcc_wrapper. I could just try building PETSc with CC=nvcc_wrapper, but it was not clear if this was the way to go. > --with-nvcc=nvcc_wrapper > > What do I specify for cc and CC? > > I will try it. > Thanks, > Mark > > On Fri, Aug 27, 2021 at 10:50 AM Junchao Zhang > wrote: > > > > On Fri, Aug 27, 2021 at 7:06 AM Mark Adams > wrote: > I have a user (cc'ed) that has a C++ code and is using a PETSc that I built. He is getting this runtime error. > > 'make check' runs clean and I built snes/tutorial/ex1 manually, to get a link line, and it ran fine. > I appended the users link line and my test. > > I see that they are using Kokkos' "nvcc_wrapper". Should I rebuild PETSc using that, maybe we just need to make sure we are both using the same underlying compiler or should they use mpiCC? > It looks like they used nvcc_wrapper to replace nvcc. You can ask them to use nvcc directly to see what happens. But the error happened in petsc initialization, petscsys_petscinitializenohelp, so I doubt it helps. The easy way is to just attach a debugger. 
> > Thanks, > Mark > > > [e13n16:591873] *** Process received signal *** > [e13n16:591873] Signal: Segmentation fault (11) > [e13n16:591873] Signal code: Invalid permissions (2) > [e13n16:591873] Failing at address: 0x102c87e0 > [e13n16:591873] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > [e13n16:591873] [ 1] [e13n16:591872] *** Process received signal *** > [e13n16:591872] Signal: Segmentation fault (11) > [e13n16:591872] Signal code: Invalid permissions (2) > [e13n16:591872] Failing at address: 0x102c87e0 > [e13n16:591872] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > [e13n16:591872] [ 1] [e13n16:591871] *** Process received signal *** > [e13n16:591871] Signal: Segmentation fault (11) > [e13n16:591871] Signal code: Invalid permissions (2) > [e13n16:591871] Failing at address: 0x102c87e0 > [e13n16:591871] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > [e13n16:591871] [ 1] /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > [e13n16:591871] [ 2] [e13n16:591874] *** Process received signal *** > [e13n16:591874] Signal: Segmentation fault (11) > [e13n16:591874] Signal code: Invalid permissions (2) > [e13n16:591874] Failing at address: 0x102c87e0 > [e13n16:591874] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x2000000504d8] > [e13n16:591874] [ 1] /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > [e13n16:591874] [ 2] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > [e13n16:591874] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > [e13n16:591874] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > [e13n16:591874] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > [e13n16:591874] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > [e13n16:591874] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > [e13n16:591874] [ 8] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > [e13n16:591871] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > [e13n16:591871] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > [e13n16:591871] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > [e13n16:591871] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > [e13n16:591871] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > [e13n16:591871] [ 8] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > [e13n16:591871] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > [e13n16:591871] *** End of error message *** > /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > [e13n16:591874] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > [e13n16:591874] *** End of error message *** > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > [e13n16:591872] [ 2] 
/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > [e13n16:591872] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > [e13n16:591872] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > [e13n16:591872] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > [e13n16:591872] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > [e13n16:591872] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > [e13n16:591872] [ 8] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > [e13n16:591872] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > [e13n16:591872] *** End of error message *** > /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] > [e13n16:591873] [ 2] /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so.3.015(petscsys_petscinitializenohelp_+0xf4)[0x20000097b3ec] > [e13n16:591873] [ 3] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10131dd8] > [e13n16:591873] [ 4] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015c60] > [e13n16:591873] [ 5] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x1005a8b0] > [e13n16:591873] [ 6] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10015b14] > [e13n16:591873] [ 7] /ccs/home/scheinberg/new_install/build/bin/xgc-es-cpp[0x10014cd0] > [e13n16:591873] [ 8] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(+0x24078)[0x200005934078] > [e13n16:591873] [ 9] /usr/lib/gcc/ppc64le-redhat-linux/8/../../../../lib64/power9/libc.so.6(__libc_start_main+0xb4)[0x200005934264] > [e13n16:591873] *** End of error message *** > ERROR: One or more process (first noticed rank 1) terminated with signal 11 (core dumped) > > > > /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper -arch=sm_70 CMakeFiles/xgc-es-cpp.dir/xgc-es-cpp_build_info.F90.o -o bin/xgc-es-cpp -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib:/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 liblibxgc-es-cpp.a /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/liblapack.so /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64/libblas.so /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libpetsc.so /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libparmetis.so /gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b/lib/libmetis.so /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/fftw-3.3.9-bzi7deue27ijd7xm4zn7pt22u4sj47g4/lib/libfftw3.so libs/pspline/libpspline.a libs/camtimers/libtimers.a /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libacchost.so /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran_mpi.so.2.7.1 /gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64/libadios2_fortran.so.2.7.1 
/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscontainers.a /gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/DEFAULT/install/lib64/libkokkoscore.a /usr/lib64/libcuda.so /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.0/lib64/libcudart.so /usr/lib64/libdl.so -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lnvf -Wl,-rpath-link,/gpfs/alpine/world-shared/phy122/lib/install/summit/adios2/devel/nvhpc/lib64 > > > 19:39 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ make PETSC_DIR=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7 PETSC_ARCH="" ex1 > mpicc -fPIC -g -fast -fPIC -g -fast -I/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/include ex1.c -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -Wl,-rpath,/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -L/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7/lib -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/lib -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/hdf5-1.10.7-nfhjvzsshg5qihqv44y5ji6ihsqpd73v/lib -Wl,-rpath,/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 -L/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/netlib-lapack-3.9.1-b5iqtudpwjumes5gsdol3bzsh7qlv7mf/lib64 -Wl,-rpath,/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib -L/autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib -Wl,-rpath,/usr/lib/gcc/ppc64le-redhat-linux/8 -L/usr/lib/gcc/ppc64le-redhat-linux/8 -lpetsc -llapack -lblas -lparmetis -lmetis -lstdc++ -ldl -lpthread -lmpiprofilesupport -lmpi_ibm_usempif08 -lmpi_ibm_usempi_ignore_tkr -lmpi_ibm_mpifh -lmpi_ibm -lnvf -lnvomp -latomic -lnvhpcatm -lnvcpumath -lnvc -lrt -lm -lgcc_s -lstdc++ -ldl -o ex1 > 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ mpicc --version > > nvc 21.7-0 linuxpower target on Linuxpower > NVIDIA Compilers and Tools > Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
> 19:40 main= /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials$ jsrun -n 1 ./ex1 -ksp_monitor > 0 KSP Residual norm 6.041522986797e+00 > 1 KSP Residual norm 1.042493382631e+00 > 2 KSP Residual norm 7.950907844730e-16 > 0 KSP Residual norm 4.786756692342e+00 > 1 KSP Residual norm 1.426392207750e-01 > 2 KSP Residual norm 1.801079604472e-15 > 0 KSP Residual norm 2.986456323228e+00 > 1 KSP Residual norm 7.669888809223e-02 > 2 KSP Residual norm 3.744083117256e-16 > 0 KSP Residual norm 2.306244667700e-01 > 1 KSP Residual norm 1.355550749587e-02 > 2 KSP Residual norm 5.845524837731e-17 > 0 KSP Residual norm 1.936314002654e-03 > 1 KSP Residual norm 2.125593590819e-04 > 2 KSP Residual norm 6.987141455073e-20 > 0 KSP Residual norm 1.435593531990e-07 > 1 KSP Residual norm 2.588271385567e-08 > 2 KSP Residual norm 3.942196167935e-23 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sat Aug 28 06:40:20 2021 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 28 Aug 2021 07:40:20 -0400 Subject: [petsc-users] runtime error on Summit with nvhpc21.7 In-Reply-To: <05F13947-EAD7-442F-9346-F8203131AFD5@petsc.dev> References: <05F13947-EAD7-442F-9346-F8203131AFD5@petsc.dev> Message-ID: cc'ing Robert who is taking over from Aaron for a few days. Robert, I suggest hoisting PetscInitailize into main or at least a C call of some sort. This error in pgf90_str_copy_klen might be be avoided by not giving PetscInitialize a string ('petsc.rc') and linking petsc.rc --> .petscrc (PETSc will look for .petscrc by default). more below. On Sat, Aug 28, 2021 at 12:19 AM Barry Smith wrote: > > > On Aug 27, 2021, at 5:05 PM, Mark Adams wrote: > > > > On Fri, Aug 27, 2021 at 5:03 PM Junchao Zhang > wrote: > >> I don't understand the configure options >> >> >> --with-cc=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/ >> *nvcc_wrapper* >> --with-cxx=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >> --with-fc=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort >> COPTFLAGS="-g -fast" CXXOPTFLAGS="-g -fast" FOPTFLAGS="-g -fast" >> CUDAFLAGS="-ccbin nvc++" --with-ssl=0 --with-batch=0 --with-mpiexec="jsrun >> -g 1" *--with-cuda=0* >> --with-cudac=/gpfs/alpine/world-shared/phy122/lib/install/summit/kokkos/nvhpc21.7/bin/nvcc_wrapper >> --with-cuda-gencodearch=70 --download-metis --download-parmetis --with-x=0 >> --with-debugging=0 PETSC_ARCH=arch-summit-opt-nvhpc >> --prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/petsc/current/opt-nvhpc21.7b >> >> Why do you need to use nvcc_wrapper if you do not want to use cuda? >> > > That code that is having a problem links with nvcc_wrapper. > They get a segv that I sent earlier, in PetscInitialize so I figure I > should use the same compiler / linker. > They use CUDA, but we don't need PETSc to use CUDA now. > > >> In addition, nvcc_wrapper is a C++ compiler. Using it for --with-cc=, you >> also need --with-clanguage=c++ >> > > I rebuilt PETSc with mpicc, mpiCC, mpif90 and --with-nvcc=nvcc_wrapper > and that built make check works. I gave it to them to test. > > > This is an odd way to do it. 
The Kokkos nvcc_wrapper wraps the nvcc > compiler to allow it to compile Kokkos code and link it against the Kokkos > libraries; so using nvcc_wrapper as nvcc is strangely recursive; sure > everything in PETSc/Kokkos may build ok (assuming the nvcc that the > nvcc_wrapper uses is correct for the situation and uses a correct > underlying C++) but it is freakish. PETSc should just be built with the > same nvcc that the nvcc_wrapper is using and using the same inner C++ > compiler. > Yes, this is convoluted. Thanks for your take on this. That said, they have been struggling to get Kokkos to build with nvhpc and I can see this is what they have compiling and are pressing on with a milestone that is due soon. Anyway, I found that PetscInitialize is called from Fortran code (in 25+ years have we ever seen a C++ code call PetscInitialize from a Fortran subroutine?), which should be fine. Just unusual. This explains the error coming from a Fortran library: [e13n16:591874] [ 1] >>>>>> /autofs/nccs-svm1_sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/compilers/lib/libnvf.so(pgf90_str_copy_klen+0x1fc)[0x200004a79ee4] >>>>>> >>>>> They are using this Fortran compiler and so I built PETSc with it: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpifort I see: 07:16 1 ~$ which mpif90 /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/nvhpc-21.7/spectrum-mpi-10.4.0.3-20210112-nv7jd363ym3n4zpgornfbq6bh4tqjyak/bin/mpif90 so this is the nvhpc-21.7 Fortran. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.jamond at cea.fr Mon Aug 30 13:17:16 2021 From: olivier.jamond at cea.fr (Olivier Jamond) Date: Mon, 30 Aug 2021 20:17:16 +0200 Subject: [petsc-users] KSPSolve with MPIAIJ with non-square 'diagonal parts' Message-ID: <7f8f699a-e9b5-768d-828c-b8a923f0fe75@cea.fr> Hello, I am sorry because I surely miss something, but I cannot manage to solve a problem with a MPIAIJ matrix which has non-square 'diagonal parts'. I copy/paste at the bottom of this message a very simple piece of code which causes me troubles. In this code I try to do 'x=KSP(A)*b' (with gmres/jacobi), but this fails whereas a matrix multiplication 'b=A*x' seems to work. Is ksp with such a matrix supposed to work (I can't find anything in the documentation about that, so I guess that it is...)? Many thanks, Olivier NB: this code should be launched with exactly 3 procs

#include "petscsys.h" /* framework routines */
#include "petscvec.h" /* vectors */
#include "petscmat.h" /* matrices */
#include "petscksp.h"

#include <vector>
#include <iostream>
#include <numeric>
#include <cstdlib>

static char help[] = "Trying to solve a linear system on a sub-block of a matrix\n\n";
int main(int argc, char **argv)
{
  MPI_Init(NULL, NULL);
  PetscErrorCode ierr;
  ierr = PetscInitialize(&argc, &argv, NULL, help);
  CHKERRQ(ierr);

  // clang-format off
  std::vector<std::vector<double>> AA = {
      { 1,  2,  0, /**/ 0,  3, /**/ 0,  0,  4},
      { 0,  5,  6, /**/ 7,  0, /**/ 0,  8,  0},
      { 9,  0, 10, /**/11,  0, /**/ 0, 12,  0},
      //---------------------------------------
      {13,  0, 14, /**/15, 16, /**/17,  0,  0},
      { 0, 18,  0, /**/19, 20, /**/21,  0,  0},
      { 0,  0,  0, /**/22, 23, /**/ 1, 24,  0},
      //--------------------------------------
      {25, 26, 27, /**/ 0,  0, /**/28, 29,  0},
      {30,  0,  0, /**/31, 32, /**/33,  0, 34},
  };

  std::vector<double> bb = {1.,
                            1.,
                            1.,
                            //
                            1.,
                            1.,
                            1.,
                            //
                            1.,
                            1.};

  std::vector<int> nDofsRow = {3, 3, 2};
  std::vector<int> nDofsCol = {3, 2, 3};
  // clang-format on

  int NDofs = std::accumulate(nDofsRow.begin(), nDofsRow.end(), 0);

  int pRank, nProc;
  MPI_Comm_rank(PETSC_COMM_WORLD, &pRank);
  MPI_Comm_size(PETSC_COMM_WORLD, &nProc);

  if (nProc != 3) {
    std::cerr << "THIS TEST MUST BE LAUNCHED WITH EXACTLY 3 PROCS\n";
    abort();
  }

  Mat A;
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetType(A, MATMPIAIJ);
  MatSetSizes(A, nDofsRow[pRank], nDofsCol[pRank], PETSC_DETERMINE, PETSC_DETERMINE);
  MatMPIAIJSetPreallocation(A, NDofs, NULL, NDofs, NULL);

  Vec b;
  VecCreate(PETSC_COMM_WORLD, &b);
  VecSetType(b, VECMPI);
  VecSetSizes(b, nDofsRow[pRank], PETSC_DECIDE);

  if (pRank == 0) {
    for (int i = 0; i < NDofs; ++i) {
      for (int j = 0; j < NDofs; ++j) {
        if (AA[i][j] != 0.) {
          MatSetValue(A, i, j, AA[i][j], ADD_VALUES);
        }
      }
      VecSetValue(b, i, bb[i], ADD_VALUES);
    }
  }

  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  VecAssemblyBegin(b);
  VecAssemblyEnd(b);

  PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD, PETSC_VIEWER_ASCII_DENSE);
  MatView(A, PETSC_VIEWER_STDOUT_WORLD);
  VecView(b, PETSC_VIEWER_STDOUT_WORLD);

  KSP ksp;
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);

  PC pc;
  KSPGetPC(ksp, &pc);
  PCSetFromOptions(pc);

  Vec x;
  MatCreateVecs(A, &x, NULL);
  ierr = KSPSolve(ksp, b, x);    // this fails
  MatMult(A, x, b);              // whereas this seems to be ok...

  VecView(x, PETSC_VIEWER_STDOUT_WORLD);

  MPI_Finalize();

  return 0;
}

From stefano.zampini at gmail.com Mon Aug 30 15:42:21 2021 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 30 Aug 2021 23:42:21 +0300 Subject: [petsc-users] KSPSolve with MPIAIJ with non-square 'diagonal parts' In-Reply-To: <7f8f699a-e9b5-768d-828c-b8a923f0fe75@cea.fr> References: <7f8f699a-e9b5-768d-828c-b8a923f0fe75@cea.fr> Message-ID: What is the error you are getting from the KSP? The default solver in parallel is BlockJacobi+ILU, which does not work for non-square matrices. You do not need to call PCSetFromOptions on the pc. Just call KSPSetFromOptions and run with -pc_type none > On Aug 30, 2021, at 9:17 PM, Olivier Jamond wrote: > > Hello, > > I am sorry because I surely miss something, but I cannot manage to solve a problem with a MPIAIJ matrix which has non-square 'diagonal parts'. > > I copy/paste at the bottom of this message a very simple piece of code which causes me troubles. In this code I try to do 'x=KSP(A)*b' (with gmres/jacobi), but this fails whereas a matrix multiplication 'b=A*x' seems to work. Is ksp with such a matrix supposed to work (I can't find anything in the documentation about that, so I guess that it is...)?
> > Many thanks, > Olivier > > NB: this code should be launched with exactly 3 procs > > #include "petscsys.h" /* framework routines */ > #include "petscvec.h" /* vectors */ > #include "petscmat.h" /* matrices */ > #include "petscksp.h" > > #include > #include > #include > #include > > static char help[] = "Trying to solve a linear system on a sub-block of a matrix\n\n"; > int main(int argc, char **argv) > { > MPI_Init(NULL, NULL); > PetscErrorCode ierr; > ierr = PetscInitialize(&argc, &argv, NULL, help); > CHKERRQ(ierr); > > // clang-format off > std::vector> AA = { > { 1, 2, 0, /**/ 0, 3, /**/ 0, 0, 4}, > { 0, 5, 6, /**/ 7, 0, /**/ 0, 8, 0}, > { 9, 0, 10, /**/11, 0, /**/ 0, 12, 0}, > //--------------------------------------- > {13, 0, 14, /**/15, 16, /**/17, 0, 0}, > { 0, 18, 0, /**/19, 20, /**/21, 0, 0}, > { 0, 0, 0, /**/22, 23, /**/ 1, 24, 0}, > //-------------------------------------- > {25, 26, 27, /**/ 0, 0, /**/28, 29, 0}, > {30, 0, 0, /**/31, 32, /**/33, 0, 34}, > }; > > > std::vector bb = {1., > 1., > 1., > // > 1., > 1., > 1., > // > 1., > 1.}; > > > std::vector nDofsRow = {3, 3, 2}; > std::vector nDofsCol = {3, 2, 3}; > // clang-format on > > int NDofs = std::accumulate(nDofsRow.begin(), nDofsRow.end(), 0); > > int pRank, nProc; > MPI_Comm_rank(PETSC_COMM_WORLD, &pRank); > MPI_Comm_size(PETSC_COMM_WORLD, &nProc); > > if (nProc != 3) { > std::cerr << "THIS TEST MUST BE LAUNCHED WITH EXACTLY 3 PROCS\n"; > abort(); > } > > Mat A; > MatCreate(PETSC_COMM_WORLD, &A); > MatSetType(A, MATMPIAIJ); > MatSetSizes(A, nDofsRow[pRank], nDofsCol[pRank], PETSC_DETERMINE, PETSC_DETERMINE); > MatMPIAIJSetPreallocation(A, NDofs, NULL, NDofs, NULL); > > Vec b; > VecCreate(PETSC_COMM_WORLD, &b); > VecSetType(b, VECMPI); > VecSetSizes(b, nDofsRow[pRank], PETSC_DECIDE); > > if (pRank == 0) { > for (int i = 0; i < NDofs; ++i) { > for (int j = 0; j < NDofs; ++j) { > if (AA[i][j] != 0.) { > MatSetValue(A, i, j, AA[i][j], ADD_VALUES); > } > } > VecSetValue(b, i, bb[i], ADD_VALUES); > } > } > > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > VecAssemblyBegin(b); > VecAssemblyEnd(b); > > PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD, PETSC_VIEWER_ASCII_DENSE); > MatView(A, PETSC_VIEWER_STDOUT_WORLD); > VecView(b, PETSC_VIEWER_STDOUT_WORLD); > > KSP ksp; > KSPCreate(PETSC_COMM_WORLD, &ksp); > KSPSetOperators(ksp, A, A); > KSPSetFromOptions(ksp); > > PC pc; > KSPGetPC(ksp, &pc); > PCSetFromOptions(pc); > > Vec x; > MatCreateVecs(A, &x, NULL); > ierr = KSPSolve(ksp, b, x); // this fails > MatMult(A, x, b); // whereas the seems to be ok... > > VecView(x, PETSC_VIEWER_STDOUT_WORLD); > > MPI_Finalize(); > > return 0; > } > From knepley at gmail.com Mon Aug 30 16:31:02 2021 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 30 Aug 2021 17:31:02 -0400 Subject: [petsc-users] KSPSolve with MPIAIJ with non-square 'diagonal parts' In-Reply-To: References: <7f8f699a-e9b5-768d-828c-b8a923f0fe75@cea.fr> Message-ID: On Mon, Aug 30, 2021 at 4:42 PM Stefano Zampini wrote: > What is the error you are getting from the KSP? Default solver in parallel > in BlockJacobi+ILU which does not work for non-square matrices. You do not > need to call PCSetFromOptions on the pc. Just call KSPSetFromOptions and > run with -pc_type none > I have a more basic question. The idea of GMRES is the following: We build a space, {r, A r, A^2 r, ...} for the solution This means that r and Ar must have compatible layouts. It does not sound like this is the case for you. 
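To spell the layout point out with the sizes used in the test above (a worked restatement using only quantities already in the example): GMRES builds its iterates from the Krylov subspace

  \[
    \mathcal{K}_k(A, r_0) = \operatorname{span}\{\, r_0,\ A r_0,\ A^2 r_0,\ \dots,\ A^{k-1} r_0 \,\}, \qquad x_k \in x_0 + \mathcal{K}_k(A, r_0),
  \]

so A has to map a vector back onto a vector with the same parallel layout. With nDofsRow = {3, 3, 2} and nDofsCol = {3, 2, 3}, the residual r = b - A x is laid out like the rows, while the input of A (and the x obtained from MatCreateVecs(A, &x, NULL)) is laid out like the columns, so r and A r are distributed differently on each process even though the global matrix is 8 x 8.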
Thanks, Matt > > On Aug 30, 2021, at 9:17 PM, Olivier Jamond > wrote: > > > > Hello, > > > > I am sorry because I surely miss something, but I cannot manage to solve > a problem with a MPIAIJ matrix which has non-square 'diagonal parts'. > > > > I copy/paste at the bottom of this message a very simple piece of code > which causes me troubles. I this code I try to do 'x=KSP(A)*b' (with > gmres/jacobi), but this fails whereas a matrix multiplication 'b=A*x' seems > to work. Is ksp with such a matrix supposed to work (I can't find anything > in the documentation about that, so I guess that it is...) ? > > > > Many thanks, > > Olivier > > > > NB: this code should be launched with exactly 3 procs > > > > #include "petscsys.h" /* framework routines */ > > #include "petscvec.h" /* vectors */ > > #include "petscmat.h" /* matrices */ > > #include "petscksp.h" > > > > #include > > #include > > #include > > #include > > > > static char help[] = "Trying to solve a linear system on a sub-block of > a matrix\n\n"; > > int main(int argc, char **argv) > > { > > MPI_Init(NULL, NULL); > > PetscErrorCode ierr; > > ierr = PetscInitialize(&argc, &argv, NULL, help); > > CHKERRQ(ierr); > > > > // clang-format off > > std::vector> AA = { > > { 1, 2, 0, /**/ 0, 3, /**/ 0, 0, 4}, > > { 0, 5, 6, /**/ 7, 0, /**/ 0, 8, 0}, > > { 9, 0, 10, /**/11, 0, /**/ 0, 12, 0}, > > //--------------------------------------- > > {13, 0, 14, /**/15, 16, /**/17, 0, 0}, > > { 0, 18, 0, /**/19, 20, /**/21, 0, 0}, > > { 0, 0, 0, /**/22, 23, /**/ 1, 24, 0}, > > //-------------------------------------- > > {25, 26, 27, /**/ 0, 0, /**/28, 29, 0}, > > {30, 0, 0, /**/31, 32, /**/33, 0, 34}, > > }; > > > > > > std::vector bb = {1., > > 1., > > 1., > > // > > 1., > > 1., > > 1., > > // > > 1., > > 1.}; > > > > > > std::vector nDofsRow = {3, 3, 2}; > > std::vector nDofsCol = {3, 2, 3}; > > // clang-format on > > > > int NDofs = std::accumulate(nDofsRow.begin(), nDofsRow.end(), 0); > > > > int pRank, nProc; > > MPI_Comm_rank(PETSC_COMM_WORLD, &pRank); > > MPI_Comm_size(PETSC_COMM_WORLD, &nProc); > > > > if (nProc != 3) { > > std::cerr << "THIS TEST MUST BE LAUNCHED WITH EXACTLY 3 PROCS\n"; > > abort(); > > } > > > > Mat A; > > MatCreate(PETSC_COMM_WORLD, &A); > > MatSetType(A, MATMPIAIJ); > > MatSetSizes(A, nDofsRow[pRank], nDofsCol[pRank], PETSC_DETERMINE, > PETSC_DETERMINE); > > MatMPIAIJSetPreallocation(A, NDofs, NULL, NDofs, NULL); > > > > Vec b; > > VecCreate(PETSC_COMM_WORLD, &b); > > VecSetType(b, VECMPI); > > VecSetSizes(b, nDofsRow[pRank], PETSC_DECIDE); > > > > if (pRank == 0) { > > for (int i = 0; i < NDofs; ++i) { > > for (int j = 0; j < NDofs; ++j) { > > if (AA[i][j] != 0.) { > > MatSetValue(A, i, j, AA[i][j], ADD_VALUES); > > } > > } > > VecSetValue(b, i, bb[i], ADD_VALUES); > > } > > } > > > > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > VecAssemblyBegin(b); > > VecAssemblyEnd(b); > > > > PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD, > PETSC_VIEWER_ASCII_DENSE); > > MatView(A, PETSC_VIEWER_STDOUT_WORLD); > > VecView(b, PETSC_VIEWER_STDOUT_WORLD); > > > > KSP ksp; > > KSPCreate(PETSC_COMM_WORLD, &ksp); > > KSPSetOperators(ksp, A, A); > > KSPSetFromOptions(ksp); > > > > PC pc; > > KSPGetPC(ksp, &pc); > > PCSetFromOptions(pc); > > > > Vec x; > > MatCreateVecs(A, &x, NULL); > > ierr = KSPSolve(ksp, b, x); // this fails > > MatMult(A, x, b); // whereas the seems to be ok... 
> > > > VecView(x, PETSC_VIEWER_STDOUT_WORLD); > > > > MPI_Finalize(); > > > > return 0; > > } > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Aug 30 18:39:23 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 16:39:23 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error Message-ID: Dear PETSc dev team, I am compiling petsc 3.15.3 and got following compiling error petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary operator before token "(" 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) Any idea what I did wrong? Thanks, Sam -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Aug 30 18:52:45 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 30 Aug 2021 18:52:45 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: Are you using --download-mumps or pre-installed mumps? If using pre-installed - try --download-mumps. If you still have issues - send us configure.log and make.log from the failed build. Satish On Mon, 30 Aug 2021, Sam Guo wrote: > Dear PETSc dev team, > I am compiling petsc 3.15.3 and got following compiling error > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > operator before token "(" > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > Any idea what I did wrong? > > Thanks, > Sam > From sam.guo at cd-adapco.com Mon Aug 30 18:56:29 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 16:56:29 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: I use pre-installed On Mon, Aug 30, 2021 at 4:53 PM Satish Balay wrote: > > Are you using --download-mumps or pre-installed mumps? If using > pre-installed - try --download-mumps. > > If you still have issues - send us configure.log and make.log from the > failed build. > > Satish > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > Dear PETSc dev team, > > I am compiling petsc 3.15.3 and got following compiling error > > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > > operator before token "(" > > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > Any idea what I did wrong? > > > > Thanks, > > Sam > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Aug 30 19:10:45 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 17:10:45 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: Attached please find the configure.log. I use my own CMake. I have defined -DPETSC_HAVE_MUMPS. Thanks. On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: > I use pre-installed > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay wrote: > >> >> Are you using --download-mumps or pre-installed mumps? If using >> pre-installed - try --download-mumps. >> >> If you still have issues - send us configure.log and make.log from the >> failed build. 
>> >> Satish >> >> On Mon, 30 Aug 2021, Sam Guo wrote: >> >> > Dear PETSc dev team, >> > I am compiling petsc 3.15.3 and got following compiling error >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary >> > operator before token "(" >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >> > Any idea what I did wrong? >> > >> > Thanks, >> > Sam >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 88711 bytes Desc: not available URL: From bantingl at myumanitoba.ca Mon Aug 30 19:13:38 2021 From: bantingl at myumanitoba.ca (Lucas Banting) Date: Tue, 31 Aug 2021 00:13:38 +0000 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: Dumb question but are you configuring with scalapack? ________________________________ From: petsc-users on behalf of Sam Guo Sent: Monday, August 30, 2021 7:10:45 PM To: petsc-users Subject: Re: [petsc-users] PETSc 3.15.3 compiling error Attached please find the configure.log. I use my own CMake. I have defined -DPETSC_HAVE_MUMPS. Thanks. On Mon, Aug 30, 2021 at 4:56 PM Sam Guo > wrote: I use pre-installed On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > wrote: Are you using --download-mumps or pre-installed mumps? If using pre-installed - try --download-mumps. If you still have issues - send us configure.log and make.log from the failed build. Satish On Mon, 30 Aug 2021, Sam Guo wrote: > Dear PETSc dev team, > I am compiling petsc 3.15.3 and got following compiling error > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > operator before token "(" > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > Any idea what I did wrong? > > Thanks, > Sam > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Aug 30 19:17:10 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 17:17:10 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: I don't use scalapack. On Mon, Aug 30, 2021 at 5:13 PM Lucas Banting wrote: > Dumb question but are you configuring with scalapack? > ------------------------------ > *From:* petsc-users on behalf of Sam > Guo > *Sent:* Monday, August 30, 2021 7:10:45 PM > *To:* petsc-users > *Subject:* Re: [petsc-users] PETSc 3.15.3 compiling error > > Attached please find the configure.log. I use my own CMake. I have > defined -DPETSC_HAVE_MUMPS. Thanks. > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: > > I use pre-installed > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay wrote: > > > Are you using --download-mumps or pre-installed mumps? If using > pre-installed - try --download-mumps. > > If you still have issues - send us configure.log and make.log from the > failed build. > > Satish > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > Dear PETSc dev team, > > I am compiling petsc 3.15.3 and got following compiling error > > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > > operator before token "(" > > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > Any idea what I did wrong? > > > > Thanks, > > Sam > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bantingl at myumanitoba.ca Mon Aug 30 19:21:33 2021 From: bantingl at myumanitoba.ca (Lucas Banting) Date: Tue, 31 Aug 2021 00:21:33 +0000 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: I believe it is a dependency of mumps based on the configure.log ending. Package mumps requested but dependency scalapack not requested ________________________________ From: Sam Guo Sent: Monday, August 30, 2021 7:17:10 PM To: Lucas Banting Cc: petsc-users Subject: Re: [petsc-users] PETSc 3.15.3 compiling error I don't use scalapack. On Mon, Aug 30, 2021 at 5:13 PM Lucas Banting > wrote: Dumb question but are you configuring with scalapack? ________________________________ From: petsc-users > on behalf of Sam Guo > Sent: Monday, August 30, 2021 7:10:45 PM To: petsc-users > Subject: Re: [petsc-users] PETSc 3.15.3 compiling error Attached please find the configure.log. I use my own CMake. I have defined -DPETSC_HAVE_MUMPS. Thanks. On Mon, Aug 30, 2021 at 4:56 PM Sam Guo > wrote: I use pre-installed On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > wrote: Are you using --download-mumps or pre-installed mumps? If using pre-installed - try --download-mumps. If you still have issues - send us configure.log and make.log from the failed build. Satish On Mon, 30 Aug 2021, Sam Guo wrote: > Dear PETSc dev team, > I am compiling petsc 3.15.3 and got following compiling error > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > operator before token "(" > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > Any idea what I did wrong? > > Thanks, > Sam > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sam.guo at cd-adapco.com Mon Aug 30 19:22:33 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 17:22:33 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: My pre-installed MUMPS defines the dummy for blacs and scalapack. On Mon, Aug 30, 2021 at 5:17 PM Sam Guo wrote: > I don't use scalapack. > > On Mon, Aug 30, 2021 at 5:13 PM Lucas Banting > wrote: > >> Dumb question but are you configuring with scalapack? >> ------------------------------ >> *From:* petsc-users on behalf of Sam >> Guo >> *Sent:* Monday, August 30, 2021 7:10:45 PM >> *To:* petsc-users >> *Subject:* Re: [petsc-users] PETSc 3.15.3 compiling error >> >> Attached please find the configure.log. I use my own CMake. I have >> defined -DPETSC_HAVE_MUMPS. Thanks. >> >> On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: >> >> I use pre-installed >> >> On Mon, Aug 30, 2021 at 4:53 PM Satish Balay wrote: >> >> >> Are you using --download-mumps or pre-installed mumps? If using >> pre-installed - try --download-mumps. >> >> If you still have issues - send us configure.log and make.log from the >> failed build. >> >> Satish >> >> On Mon, 30 Aug 2021, Sam Guo wrote: >> >> > Dear PETSc dev team, >> > I am compiling petsc 3.15.3 and got following compiling error >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary >> > operator before token "(" >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >> > Any idea what I did wrong? >> > >> > Thanks, >> > Sam >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sam.guo at cd-adapco.com Mon Aug 30 19:26:11 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 17:26:11 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: I am able to compile PETSc 3.11.3 with my pre-installed MUMPS. On Mon, Aug 30, 2021 at 5:22 PM Sam Guo wrote: > My pre-installed MUMPS defines the dummy for blacs and scalapack. > > On Mon, Aug 30, 2021 at 5:17 PM Sam Guo wrote: > >> I don't use scalapack. >> >> On Mon, Aug 30, 2021 at 5:13 PM Lucas Banting >> wrote: >> >>> Dumb question but are you configuring with scalapack? >>> ------------------------------ >>> *From:* petsc-users on behalf of Sam >>> Guo >>> *Sent:* Monday, August 30, 2021 7:10:45 PM >>> *To:* petsc-users >>> *Subject:* Re: [petsc-users] PETSc 3.15.3 compiling error >>> >>> Attached please find the configure.log. I use my own CMake. I have >>> defined -DPETSC_HAVE_MUMPS. Thanks. >>> >>> On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: >>> >>> I use pre-installed >>> >>> On Mon, Aug 30, 2021 at 4:53 PM Satish Balay wrote: >>> >>> >>> Are you using --download-mumps or pre-installed mumps? If using >>> pre-installed - try --download-mumps. >>> >>> If you still have issues - send us configure.log and make.log from the >>> failed build. >>> >>> Satish >>> >>> On Mon, 30 Aug 2021, Sam Guo wrote: >>> >>> > Dear PETSc dev team, >>> > I am compiling petsc 3.15.3 and got following compiling error >>> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary >>> > operator before token "(" >>> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) >>> > Any idea what I did wrong? >>> > >>> > Thanks, >>> > Sam >>> > >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Aug 30 22:22:33 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 30 Aug 2021 22:22:33 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: Use the additional option: -with-mumps-serial Satish On Mon, 30 Aug 2021, Sam Guo wrote: > Attached please find the configure.log. I use my own CMake. I have > defined -DPETSC_HAVE_MUMPS. Thanks. > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: > > > I use pre-installed > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay wrote: > > > >> > >> Are you using --download-mumps or pre-installed mumps? If using > >> pre-installed - try --download-mumps. > >> > >> If you still have issues - send us configure.log and make.log from the > >> failed build. > >> > >> Satish > >> > >> On Mon, 30 Aug 2021, Sam Guo wrote: > >> > >> > Dear PETSc dev team, > >> > I am compiling petsc 3.15.3 and got following compiling error > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing binary > >> > operator before token "(" > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > >> > Any idea what I did wrong? > >> > > >> > Thanks, > >> > Sam > >> > > >> > >> > From sam.guo at cd-adapco.com Mon Aug 30 23:26:37 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Mon, 30 Aug 2021 21:26:37 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: Same compiling error with --with-mumps-serial=1. On Mon, Aug 30, 2021 at 8:22 PM Satish Balay wrote: > Use the additional option: -with-mumps-serial > > Satish > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > Attached please find the configure.log. I use my own CMake. I have > > defined -DPETSC_HAVE_MUMPS. Thanks. 
> > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: > > > > > I use pre-installed > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > wrote: > > > > > >> > > >> Are you using --download-mumps or pre-installed mumps? If using > > >> pre-installed - try --download-mumps. > > >> > > >> If you still have issues - send us configure.log and make.log from the > > >> failed build. > > >> > > >> Satish > > >> > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > >> > > >> > Dear PETSc dev team, > > >> > I am compiling petsc 3.15.3 and got following compiling error > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing > binary > > >> > operator before token "(" > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > >> > Any idea what I did wrong? > > >> > > > >> > Thanks, > > >> > Sam > > >> > > > >> > > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Mon Aug 30 23:42:55 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 30 Aug 2021 23:42:55 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: please resend the logs Satish On Mon, 30 Aug 2021, Sam Guo wrote: > Same compiling error with --with-mumps-serial=1. > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay wrote: > > > Use the additional option: -with-mumps-serial > > > > Satish > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > Attached please find the configure.log. I use my own CMake. I have > > > defined -DPETSC_HAVE_MUMPS. Thanks. > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: > > > > > > > I use pre-installed > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > > wrote: > > > > > > > >> > > > >> Are you using --download-mumps or pre-installed mumps? If using > > > >> pre-installed - try --download-mumps. > > > >> > > > >> If you still have issues - send us configure.log and make.log from the > > > >> failed build. > > > >> > > > >> Satish > > > >> > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > > >> > > > >> > Dear PETSc dev team, > > > >> > I am compiling petsc 3.15.3 and got following compiling error > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing > > binary > > > >> > operator before token "(" > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > > >> > Any idea what I did wrong? > > > >> > > > > >> > Thanks, > > > >> > Sam > > > >> > > > > >> > > > >> > > > > > > > > From balay at mcs.anl.gov Mon Aug 30 23:47:47 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Mon, 30 Aug 2021 23:47:47 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: Message-ID: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> Also - what do you have for: grep MUMPS_VERSION /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h Satish On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > please resend the logs > > Satish > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > Same compiling error with --with-mumps-serial=1. > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay wrote: > > > > > Use the additional option: -with-mumps-serial > > > > > > Satish > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > Attached please find the configure.log. I use my own CMake. I have > > > > defined -DPETSC_HAVE_MUMPS. Thanks. 
> > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo wrote: > > > > > > > > > I use pre-installed > > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > > > wrote: > > > > > > > > > >> > > > > >> Are you using --download-mumps or pre-installed mumps? If using > > > > >> pre-installed - try --download-mumps. > > > > >> > > > > >> If you still have issues - send us configure.log and make.log from the > > > > >> failed build. > > > > >> > > > > >> Satish > > > > >> > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > > > >> > > > > >> > Dear PETSc dev team, > > > > >> > I am compiling petsc 3.15.3 and got following compiling error > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: missing > > > binary > > > > >> > operator before token "(" > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > > > >> > Any idea what I did wrong? > > > > >> > > > > > >> > Thanks, > > > > >> > Sam > > > > >> > > > > > >> > > > > >> > > > > > > > > > > > > > From patrick.sanan at gmail.com Tue Aug 31 04:52:31 2021 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Tue, 31 Aug 2021 11:52:31 +0200 Subject: [petsc-users] Postdoctoral position at ETH Zurich: Geodynamics / HPC / Julia Message-ID: The Geophysical Fluid Dynamics group at ETH Zurich (Switzerland) is seeking a postdoctoral appointee to work for about 2.5 years on an ambitious project involving developing a Julia-based library for GPU-accelerated multiphysics solvers based on pseudotransient relaxation. Of particular interest for this audience might be that a major component of the proposed work is to make these solvers available via PETSc (as a SNES implementation), thus exposing them for use within a host of existing HPC applications, including those involved in this specific project. We'll accept applications until the position is filled, but for full consideration please apply before October 1, 2021. Full information is in the ad at the following link, and please feel free to contact me directly! https://github.com/psanan/gpu4geo_postdoc_ad/ Best, Patrick -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.semplice at uninsubria.it Tue Aug 31 09:50:11 2021 From: matteo.semplice at uninsubria.it (Matteo Semplice) Date: Tue, 31 Aug 2021 16:50:11 +0200 Subject: [petsc-users] Mat preallocation in case of variable stencils Message-ID: Hi. We are writing a code for a FD scheme on an irregular domain and thus the local stencil is quite variable: we have inner nodes, boundary nodes and inactive nodes, each with their own stencil type and offset with respect to the grid node. We currently create a matrix with DMCreateMatrix on a DMDA and for now have set the option MAT_NEW_NONZERO_LOCATIONS to PETSC_TRUE, but its time to render the code memory-efficient. The layout created automatically is correct for inner nodes, wrong for boundary ones (off-centered stencils) and redundant for outer nodes. After the preprocessing stage (including stencil creation) we'd be in position to set the nonzero pattern properly. Do we need to start from a Mat created by CreateMatrix? Or is it ok to call DMCreateMatrix (so that the splitting among CPUs and the block size are set by PETSc) and then call a MatSetPreallocation routine? Also, I've seen in some examples that you call the Seq and the MPI preallocation routines in a row. Does this work because MatMPIAIJSetPreallocation silently does nothing on a Seq matrix and viceversa? Thanks ??? 
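A minimal sketch of one way to do this, not code from this thread -- da is the DMDA already in use, and dnnz[i] / onnz[i] are assumed to hold the diagonal-block and off-diagonal-block nonzero counts of local row i produced by the stencil preprocessing (see also the MatXAIJSetPreallocation suggestion in the reply that follows):

Mat A;
DMCreateMatrix(da, &A);   /* keeps the parallel splitting and block size chosen by PETSc */
/* One call that covers SeqAIJ/MPIAIJ (and BAIJ/SBAIJ) instead of calling each
 * type-specific routine; bs = 1 here, dnnz/onnz are per-local-row counts. */
MatXAIJSetPreallocation(A, 1, dnnz, onnz, NULL, NULL);
/* Optional: once the pattern is exact, ignore insertions outside it instead of mallocing. */
MatSetOption(A, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);

And the guess about the Seq and MPI routines is right: each one is a no-op on a matrix of the other type, which is also what lets MatXAIJSetPreallocation dispatch to whichever one applies.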
Matteo From jed at jedbrown.org Tue Aug 31 10:32:21 2021 From: jed at jedbrown.org (Jed Brown) Date: Tue, 31 Aug 2021 09:32:21 -0600 Subject: [petsc-users] Mat preallocation in case of variable stencils In-Reply-To: References: Message-ID: <87k0k1zl2y.fsf@jedbrown.org> Matteo Semplice writes: > Hi. > > We are writing a code for a FD scheme on an irregular domain and thus > the local stencil is quite variable: we have inner nodes, boundary nodes > and inactive nodes, each with their own stencil type and offset with > respect to the grid node. We currently create a matrix with > DMCreateMatrix on a DMDA and for now have set the option > MAT_NEW_NONZERO_LOCATIONS to PETSC_TRUE, but its time to render the code > memory-efficient. The layout created automatically is correct for inner > nodes, wrong for boundary ones (off-centered stencils) and redundant for > outer nodes. > > After the preprocessing stage (including stencil creation) we'd be in > position to set the nonzero pattern properly. > > Do we need to start from a Mat created by CreateMatrix? Or is it ok to > call DMCreateMatrix (so that the splitting among CPUs and the block size > are set by PETSc) and then call a MatSetPreallocation routine? You can call MatXAIJSetPreallocation after. It'll handle all matrix types so you don't have to shepherd data for all the specific preallocations. > Also, I've seen in some examples that you call the Seq and the MPI > preallocation routines in a row. Does this work because > MatMPIAIJSetPreallocation silently does nothing on a Seq matrix and > viceversa? > > Thanks > > ??? Matteo From sam.guo at cd-adapco.com Tue Aug 31 16:42:57 2021 From: sam.guo at cd-adapco.com (Sam Guo) Date: Tue, 31 Aug 2021 14:42:57 -0700 Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> Message-ID: Attached please find the latest configure.log. 
grep MUMPS_VERSION /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef MUMPS_VERSION /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define MUMPS_VERSION "5.2.1" /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef MUMPS_VERSION_MAX_LEN /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define MUMPS_VERSION_MAX_LEN 30 /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef MUMPS_VERSION /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define MUMPS_VERSION "5.2.1" /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef MUMPS_VERSION_MAX_LEN /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define MUMPS_VERSION_MAX_LEN 30 /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef MUMPS_VERSION /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define MUMPS_VERSION "5.2.1" /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef MUMPS_VERSION_MAX_LEN /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define MUMPS_VERSION_MAX_LEN 30 /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef MUMPS_VERSION /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define MUMPS_VERSION "5.2.1" /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef MUMPS_VERSION_MAX_LEN /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define MUMPS_VERSION_MAX_LEN 30 /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; On Mon, Aug 30, 2021 at 9:47 PM Satish Balay wrote: > Also - what do you have for: > > grep MUMPS_VERSION > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > Satish > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > > > please resend the logs > > > > Satish > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > Same compiling error with --with-mumps-serial=1. > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay > wrote: > > > > > > > Use the additional option: -with-mumps-serial > > > > > > > > Satish > > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > Attached please find the configure.log. I use my own CMake. 
I have > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > > > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo > wrote: > > > > > > > > > > > I use pre-installed > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > > > > wrote: > > > > > > > > > > > >> > > > > > >> Are you using --download-mumps or pre-installed mumps? If using > > > > > >> pre-installed - try --download-mumps. > > > > > >> > > > > > >> If you still have issues - send us configure.log and make.log > from the > > > > > >> failed build. > > > > > >> > > > > > >> Satish > > > > > >> > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > >> > > > > > >> > Dear PETSc dev team, > > > > > >> > I am compiling petsc 3.15.3 and got following compiling > error > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: > missing > > > > binary > > > > > >> > operator before token "(" > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > > > > >> > Any idea what I did wrong? > > > > > >> > > > > > > >> > Thanks, > > > > > >> > Sam > > > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 88745 bytes Desc: not available URL: From balay at mcs.anl.gov Tue Aug 31 18:47:11 2021 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 31 Aug 2021 18:47:11 -0500 (CDT) Subject: [petsc-users] PETSc 3.15.3 compiling error In-Reply-To: References: <65d5cb9a-2dc0-8362-6a7-5acf784e7138@mcs.anl.gov> Message-ID: <575fd7-61c5-b983-5ad0-4c2748b6b6d2@mcs.anl.gov> ******************************************************************************* UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details): ------------------------------------------------------------------------------- Package mumps requested requires Fortran but compiler turned off. ******************************************************************************* i.e remove '--with-fc=0' and rerun configure. Satish On Tue, 31 Aug 2021, Sam Guo wrote: > Attached please find the latest configure.log. 
> > grep MUMPS_VERSION > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > MUMPS_VERSION > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > MUMPS_VERSION "5.2.1" > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#ifndef > MUMPS_VERSION_MAX_LEN > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h:#define > MUMPS_VERSION_MAX_LEN 30 > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/cmumps_c.h: > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > MUMPS_VERSION > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > MUMPS_VERSION "5.2.1" > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#ifndef > MUMPS_VERSION_MAX_LEN > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h:#define > MUMPS_VERSION_MAX_LEN 30 > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/dmumps_c.h: > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > MUMPS_VERSION > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > MUMPS_VERSION "5.2.1" > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#ifndef > MUMPS_VERSION_MAX_LEN > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h:#define > MUMPS_VERSION_MAX_LEN 30 > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/smumps_c.h: > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > MUMPS_VERSION > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > MUMPS_VERSION "5.2.1" > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#ifndef > MUMPS_VERSION_MAX_LEN > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h:#define > MUMPS_VERSION_MAX_LEN 30 > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/zmumps_c.h: > char version_number[MUMPS_VERSION_MAX_LEN + 1 + 1]; > > On Mon, Aug 30, 2021 at 9:47 PM Satish Balay wrote: > > > Also - what do you have for: > > > > grep MUMPS_VERSION > > /u/cd4hhv/dev4/mumps/5.2.1-vanilla-parmetis3.2.0-openmp-cda-001/linux-x86_64-2.3.4/include/*.h > > > > Satish > > > > On Mon, 30 Aug 2021, Satish Balay via petsc-users wrote: > > > > > please resend the logs > > > > > > Satish > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > Same compiling error with --with-mumps-serial=1. 
> > > > > > > > On Mon, Aug 30, 2021 at 8:22 PM Satish Balay > > wrote: > > > > > > > > > Use the additional option: -with-mumps-serial > > > > > > > > > > Satish > > > > > > > > > > On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > > > > > > Attached please find the configure.log. I use my own CMake. I have > > > > > > defined -DPETSC_HAVE_MUMPS. Thanks. > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:56 PM Sam Guo > > wrote: > > > > > > > > > > > > > I use pre-installed > > > > > > > > > > > > > > On Mon, Aug 30, 2021 at 4:53 PM Satish Balay > > > > > wrote: > > > > > > > > > > > > > >> > > > > > > >> Are you using --download-mumps or pre-installed mumps? If using > > > > > > >> pre-installed - try --download-mumps. > > > > > > >> > > > > > > >> If you still have issues - send us configure.log and make.log > > from the > > > > > > >> failed build. > > > > > > >> > > > > > > >> Satish > > > > > > >> > > > > > > >> On Mon, 30 Aug 2021, Sam Guo wrote: > > > > > > >> > > > > > > >> > Dear PETSc dev team, > > > > > > >> > I am compiling petsc 3.15.3 and got following compiling > > error > > > > > > >> > petsc/src/mat/impls/aij/mpi/mumps/mumps.c:52:31: error: > > missing > > > > > binary > > > > > > >> > operator before token "(" > > > > > > >> > 52 | #if PETSC_PKG_MUMPS_VERSION_GE(5,3,0) > > > > > > >> > Any idea what I did wrong? > > > > > > >> > > > > > > > >> > Thanks, > > > > > > >> > Sam > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > >
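For reference, a sketch of configure invocations along the lines suggested in this thread; compiler names and paths are placeholders, not taken from the logs. The common point is that the MUMPS interface needs a Fortran compiler (so no --with-fc=0), and that telling configure about MUMPS is also what generates the PETSC_PKG_MUMPS_VERSION_GE macro that the compile error at mumps.c:52 trips over; defining PETSC_HAVE_MUMPS by hand in an external CMake build does not provide it.

# Sketch only -- placeholders throughout.
# Option A: let configure build MUMPS and its ScaLAPACK dependency.
./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 \
    --download-mumps --download-scalapack

# Option B: point configure at a pre-installed sequential MUMPS
# (no ScaLAPACK/BLACS, matching the "dummy" libraries mentioned above),
# keeping a Fortran compiler enabled. Add whatever else the local MUMPS
# build links against (e.g. its bundled sequential MPI stub) to the lib line.
./configure --with-cc=gcc --with-cxx=g++ --with-fc=gfortran \
    --with-mumps-serial=1 \
    --with-mumps-include=/path/to/mumps/include \
    --with-mumps-lib="-L/path/to/mumps/lib -ldmumps -lmumps_common -lpord"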