From doss0032 at umn.edu Mon May 1 15:13:31 2017 From: doss0032 at umn.edu (Scott Dossa) Date: Mon, 1 May 2017 15:13:31 -0500 Subject: [petsc-users] Call KSP routine before each timestep Message-ID: Hi All, I'm looking to pass a vector between a KSP and TS routine. The KSP routine must be called before each timestep, and the solution vector is needed for the TS routine. Normally, TSSolve() runs over all timesteps, but in my case, I'd like to be able to add a routine before each timestep. Can someone direct me to an example script or briefly explain a case which shows how to control time stepping such that one could achieve something along the lines of: while (step < maxsteps+1){ KSPSolve(ksp, v, p); /* solves for Vec p and passes this info onto TS */ TSSolve(ts, u); /* only iterate for 1 timestep */ } The function TSSetPreStep() seemed promising, but it can only take TS as arguments which may not be sufficient to pass a global vector. Thank you in advance. Scott Dossa -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 1 15:24:20 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 1 May 2017 15:24:20 -0500 Subject: [petsc-users] Call KSP routine before each timestep In-Reply-To: References: Message-ID: On Mon, May 1, 2017 at 3:13 PM, Scott Dossa wrote: > Hi All, > > I'm looking to pass a vector between a KSP and TS routine. The KSP routine > must be called before each timestep, and the solution vector is needed for > the TS routine. Normally, TSSolve() runs over all timesteps, but in my > case, I'd like to be able to add a routine before each timestep. > > Can someone direct me to an example script or briefly explain a case which > shows how to control time stepping such that one could achieve something > along the lines of: > > while (step < maxsteps+1){ > KSPSolve(ksp, v, p); /* solves for Vec p and passes this info onto > TS */ > TSSolve(ts, u); /* only iterate for 1 timestep */ > } > > The function TSSetPreStep() seemed promising, but it can only take TS as > arguments which may not be sufficient to pass a global vector. > Yes, this is the correct thing. You can a) Just attach a Vec to the TS using PetscObjectCompose(), but that is ugly so you can b) Make a context structure, and stick it in the TS using http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/TS/TSSetApplicationContext.html That is also where the KSP should go. Thanks, Matt > Thank you in advance. > Scott Dossa > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon May 1 15:32:22 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 1 May 2017 15:32:22 -0500 Subject: [petsc-users] Call KSP routine before each timestep In-Reply-To: References: Message-ID: Scott - Are you doing some kind of pressure projection method? PETSc-developers - should this functionality be directly added to TS since it comes up fairly often? Barry > On May 1, 2017, at 3:24 PM, Matthew Knepley wrote: > > On Mon, May 1, 2017 at 3:13 PM, Scott Dossa wrote: > Hi All, > > I'm looking to pass a vector between a KSP and TS routine. The KSP routine must be called before each timestep, and the solution vector is needed for the TS routine. 
Normally, TSSolve() runs over all timesteps, but in my case, I'd like to be able to add a routine before each timestep. > > Can someone direct me to an example script or briefly explain a case which shows how to control time stepping such that one could achieve something along the lines of: > > while (step < maxsteps+1){ > KSPSolve(ksp, v, p); /* solves for Vec p and passes this info onto TS */ > TSSolve(ts, u); /* only iterate for 1 timestep */ > } > > The function TSSetPreStep() seemed promising, but it can only take TS as arguments which may not be sufficient to pass a global vector. > > Yes, this is the correct thing. You can > > a) Just attach a Vec to the TS using PetscObjectCompose(), but that is ugly so you can > > b) Make a context structure, and stick it in the TS using > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/TS/TSSetApplicationContext.html > > That is also where the KSP should go. > > Thanks, > > Matt > > Thank you in advance. > Scott Dossa > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From doss0032 at umn.edu Mon May 1 16:27:45 2017 From: doss0032 at umn.edu (Scott Dossa) Date: Mon, 1 May 2017 16:27:45 -0500 Subject: [petsc-users] Call KSP routine before each timestep In-Reply-To: References: Message-ID: Hi All, Matt: Thank you! Using the application context is a good approach to pass the vector information. Can you also direct me to which command allows TSSolve to be only called for one timestep / start at the correct timestep? When TSSolve() is called, it always resets to timestep 0. Barry: Yes, this is a pressure projection method where one needs the pressure field at each timestep to solve for the velocity field. I will likely have more follow up questions as I quick write this up. Thank you both for your input. -Scott Dossa On Mon, May 1, 2017 at 3:32 PM, Barry Smith wrote: > > Scott - Are you doing some kind of pressure projection method? > > PETSc-developers - should this functionality be directly added to TS > since it comes up fairly often? > > Barry > > > > > On May 1, 2017, at 3:24 PM, Matthew Knepley wrote: > > > > On Mon, May 1, 2017 at 3:13 PM, Scott Dossa wrote: > > Hi All, > > > > I'm looking to pass a vector between a KSP and TS routine. The KSP > routine must be called before each timestep, and the solution vector is > needed for the TS routine. Normally, TSSolve() runs over all timesteps, but > in my case, I'd like to be able to add a routine before each timestep. > > > > Can someone direct me to an example script or briefly explain a case > which shows how to control time stepping such that one could achieve > something along the lines of: > > > > while (step < maxsteps+1){ > > KSPSolve(ksp, v, p); /* solves for Vec p and passes this info > onto TS */ > > TSSolve(ts, u); /* only iterate for 1 timestep */ > > } > > > > The function TSSetPreStep() seemed promising, but it can only take TS as > arguments which may not be sufficient to pass a global vector. > > > > Yes, this is the correct thing. You can > > > > a) Just attach a Vec to the TS using PetscObjectCompose(), but that is > ugly so you can > > > > b) Make a context structure, and stick it in the TS using > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/TS/ > TSSetApplicationContext.html > > > > That is also where the KSP should go. > > > > Thanks, > > > > Matt > > > > Thank you in advance. 
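
A rough sketch of option (b) quoted above: stash the KSP and the vectors in an application context and run the extra solve from a pre-step hook. The names AppCtx and PreStep (and the member names) are illustrative, not PETSc names, and error checking is abbreviated.

#include <petscts.h>

typedef struct {
  KSP ksp;   /* solver for the auxiliary (e.g. pressure) system */
  Vec v, p;  /* its right-hand side and solution */
} AppCtx;

/* called by TS at the beginning of every time step */
PetscErrorCode PreStep(TS ts)
{
  AppCtx         *user;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = TSGetApplicationContext(ts, &user);CHKERRQ(ierr);
  /* solve for p; the TS residual routines can read it from the same context */
  ierr = KSPSolve(user->ksp, user->v, user->p);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* in main(), after creating ts, ksp, v, p: */
AppCtx user = {ksp, v, p};
ierr = TSSetApplicationContext(ts, &user);CHKERRQ(ierr);
ierr = TSSetPreStep(ts, PreStep);CHKERRQ(ierr);
ierr = TSSolve(ts, u);CHKERRQ(ierr);

With this arrangement TSSolve() is called once for the whole run; there is no need to drive the stepping loop by hand or to call TSStep() directly.
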
> > Scott Dossa > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 1 16:42:07 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 1 May 2017 16:42:07 -0500 Subject: [petsc-users] Call KSP routine before each timestep In-Reply-To: References: Message-ID: On Mon, May 1, 2017 at 4:27 PM, Scott Dossa wrote: > Hi All, > > Matt: > Thank you! Using the application context is a good approach to pass the > vector information. Can you also direct me to which command allows TSSolve > to be only called for one timestep / start at the correct timestep? When > TSSolve() is called, it always resets to timestep 0. > You should not need that since PreStep will be called at the beginning of each step, but just in case http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/TS/TSStep.html although using this is tricky so I do not recommend it. > Barry: > Yes, this is a pressure projection method where one needs the pressure > field at each timestep to solve for the velocity field. > If it was me, I would not do it this way, but its somewhat a matter of taste. It makes more sense to me to formulate the whole system as a DAE, meaning time derivatives on some things (v) and not others (p). Then use a DAE timestepper and your fluid solver can be formulated as pressure projection using PCFIELDSPLIT. This way, if you want to use another kind of fluid solver, you can, whereas now you are stuck with the alternation of projection of momentum update. Thanks, Matt > I will likely have more follow up questions as I quick write this up. > Thank you both for your input. > -Scott Dossa > > On Mon, May 1, 2017 at 3:32 PM, Barry Smith wrote: > >> >> Scott - Are you doing some kind of pressure projection method? >> >> PETSc-developers - should this functionality be directly added to TS >> since it comes up fairly often? >> >> Barry >> >> >> >> > On May 1, 2017, at 3:24 PM, Matthew Knepley wrote: >> > >> > On Mon, May 1, 2017 at 3:13 PM, Scott Dossa wrote: >> > Hi All, >> > >> > I'm looking to pass a vector between a KSP and TS routine. The KSP >> routine must be called before each timestep, and the solution vector is >> needed for the TS routine. Normally, TSSolve() runs over all timesteps, but >> in my case, I'd like to be able to add a routine before each timestep. >> > >> > Can someone direct me to an example script or briefly explain a case >> which shows how to control time stepping such that one could achieve >> something along the lines of: >> > >> > while (step < maxsteps+1){ >> > KSPSolve(ksp, v, p); /* solves for Vec p and passes this info >> onto TS */ >> > TSSolve(ts, u); /* only iterate for 1 timestep */ >> > } >> > >> > The function TSSetPreStep() seemed promising, but it can only take TS >> as arguments which may not be sufficient to pass a global vector. >> > >> > Yes, this is the correct thing. You can >> > >> > a) Just attach a Vec to the TS using PetscObjectCompose(), but that >> is ugly so you can >> > >> > b) Make a context structure, and stick it in the TS using >> > >> > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages >> /TS/TSSetApplicationContext.html >> > >> > That is also where the KSP should go. >> > >> > Thanks, >> > >> > Matt >> > >> > Thank you in advance. 
>> > Scott Dossa >> > >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon May 1 19:14:56 2017 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 1 May 2017 20:14:56 -0400 Subject: [petsc-users] SNES error Message-ID: I get this SNES failure and I don't understand what the problem is. The rtol is 1.e-6 and the first iteration reduces the residual by 9 orders of magnitude. Yet, TS is not satisfied. What is going on here? mpiexec -n 1 ./vml -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 -snes_rtol 1.e-6 -snes_stol 1.e-6 -ts_type cn -snes_fd -pc_type lu -ksp_type preonly -x_petscspace_order 1 -x_petscspace_poly_tensor -v_petscspace_order 1 -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 -ts_final_time 1e10 -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 -thermal_temps 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo -12,-12 -domainx_hi 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 -x_vec_view hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view hdf5:v.h5::append -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view hdf5:prex.h5::append .... 0 SNES Function norm 4.097052680599e+00 1 SNES Function norm 1.213148652908e-09 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3659-g699918129c GIT Date: 2017-04-26 08:18:35 -0400 [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local by markadams Mon May 1 19:21:32 2017 [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" FOPTFLAGS="-O3 -g -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1 --download-hypre=1 --download-ml=1 --download-triangle=1 --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 1 19:51:46 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 1 May 2017 19:51:46 -0500 Subject: [petsc-users] SNES error In-Reply-To: References: Message-ID: Run with -snes_converged_reason. Matt On Mon, May 1, 2017 at 7:14 PM, Mark Adams wrote: > I get this SNES failure and I don't understand what the problem is. The > rtol is 1.e-6 and the first iteration reduces the residual by 9 orders of > magnitude. Yet, TS is not satisfied. What is going on here? 
> > mpiexec -n 1 ./vml -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 > -snes_rtol 1.e-6 -snes_stol 1.e-6 -ts_type cn -snes_fd -pc_type lu > -ksp_type preonly -x_petscspace_order 1 -x_petscspace_poly_tensor > -v_petscspace_order 1 -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 > -ts_final_time 1e10 -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 > -thermal_temps 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo > -12,-12 -domainx_hi 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 > -x_vec_view hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view > hdf5:v.h5::append -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view > hdf5:prex.h5::append > .... > > 0 SNES Function norm 4.097052680599e+00 > 1 SNES Function norm 1.213148652908e-09 > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: > [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, > increase -ts_max_snes_failures or make negative to attempt recovery > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3659-g699918129c > GIT Date: 2017-04-26 08:18:35 -0400 > [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local by > markadams Mon May 1 19:21:32 2017 > [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ > COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" FOPTFLAGS="-O3 -g > -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1 > --download-hypre=1 --download-ml=1 --download-triangle=1 > --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist > --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 > PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon May 1 21:25:24 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 1 May 2017 21:25:24 -0500 Subject: [petsc-users] SNES error In-Reply-To: References: Message-ID: <677760BF-5666-4C9D-A064-B495ACD80889@mcs.anl.gov> and -snes_linesearch_monitor -ts_adapt_monitor > On May 1, 2017, at 7:51 PM, Matthew Knepley wrote: > > Run with -snes_converged_reason. > > Matt > > On Mon, May 1, 2017 at 7:14 PM, Mark Adams wrote: > I get this SNES failure and I don't understand what the problem is. The rtol is 1.e-6 and the first iteration reduces the residual by 9 orders of magnitude. Yet, TS is not satisfied. What is going on here? > > mpiexec -n 1 ./vml -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 -snes_rtol 1.e-6 -snes_stol 1.e-6 -ts_type cn -snes_fd -pc_type lu -ksp_type preonly -x_petscspace_order 1 -x_petscspace_poly_tensor -v_petscspace_order 1 -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 -ts_final_time 1e10 -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 -thermal_temps 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo -12,-12 -domainx_hi 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 -x_vec_view hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view hdf5:v.h5::append -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view hdf5:prex.h5::append > .... 
> > 0 SNES Function norm 4.097052680599e+00 > 1 SNES Function norm 1.213148652908e-09 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: > [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3659-g699918129c GIT Date: 2017-04-26 08:18:35 -0400 > [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local by markadams Mon May 1 19:21:32 2017 > [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" FOPTFLAGS="-O3 -g -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1 --download-hypre=1 --download-ml=1 --download-triangle=1 --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener From emconsta at mcs.anl.gov Mon May 1 22:06:27 2017 From: emconsta at mcs.anl.gov (Emil Constantinescu) Date: Mon, 1 May 2017 22:06:27 -0500 Subject: [petsc-users] Call KSP routine before each timestep In-Reply-To: References: Message-ID: On 5/1/17 4:42 PM, Matthew Knepley wrote: > On Mon, May 1, 2017 at 4:27 PM, Scott Dossa > wrote: > > Hi All, > > Matt: > Thank you! Using the application context is a good approach to pass > the vector information. Can you also direct me to which command > allows TSSolve to be only called for one timestep / start at the > correct timestep? When TSSolve() is called, it always resets to > timestep 0. > > > You should not need that since PreStep will be called at the beginning > of each step, but just in case > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/TS/TSStep.html > > although using this is tricky so I do not recommend it. If it's a projection you may need to set the PostStep and (or) PostStage if using multistage methods (http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/TS/TSSetPostStage.html#TSSetPostStage); otherwise the last step may not be div free. > Barry: > Yes, this is a pressure projection method where one needs the > pressure field at each timestep to solve for the velocity field. > > > If it was me, I would not do it this way, but its somewhat a matter of > taste. It makes more sense to me to formulate the whole > system as a DAE, meaning time derivatives on some things (v) and not > others (p). Then use a DAE timestepper and your > fluid solver can be formulated as pressure projection using > PCFIELDSPLIT. This way, if you want to use another kind of fluid > solver, you can, whereas now you are stuck with the alternation of > projection of momentum update. Yes, formulating it as a DAE is desirable; however, if you project it separately you have access to significantly more time steppers. Emil > Thanks, > > Matt > > I will likely have more follow up questions as I quick write this > up. Thank you both for your input. > -Scott Dossa > > On Mon, May 1, 2017 at 3:32 PM, Barry Smith > wrote: > > > Scott - Are you doing some kind of pressure projection method? 
> > PETSc-developers - should this functionality be directly > added to TS since it comes up fairly often? > > Barry > > > > > On May 1, 2017, at 3:24 PM, Matthew Knepley > > wrote: > > > > On Mon, May 1, 2017 at 3:13 PM, Scott Dossa > wrote: > > Hi All, > > > > I'm looking to pass a vector between a KSP and TS routine. > The KSP routine must be called before each timestep, and the > solution vector is needed for the TS routine. Normally, > TSSolve() runs over all timesteps, but in my case, I'd like to > be able to add a routine before each timestep. > > > > Can someone direct me to an example script or briefly explain > a case which shows how to control time stepping such that one > could achieve something along the lines of: > > > > while (step < maxsteps+1){ > > KSPSolve(ksp, v, p); /* solves for Vec p and passes > this info onto TS */ > > TSSolve(ts, u); /* only iterate for 1 timestep */ > > } > > > > The function TSSetPreStep() seemed promising, but it can only > take TS as arguments which may not be sufficient to pass a > global vector. > > > > Yes, this is the correct thing. You can > > > > a) Just attach a Vec to the TS using PetscObjectCompose(), > but that is ugly so you can > > > > b) Make a context structure, and stick it in the TS using > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/TS/TSSetApplicationContext.html > > > > > That is also where the KSP should go. > > > > Thanks, > > > > Matt > > > > Thank you in advance. > > Scott Dossa > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin > their experiments is infinitely more interesting than any > results to which their experiments lead. > > -- Norbert Wiener > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. 
> -- Norbert Wiener From mfadams at lbl.gov Tue May 2 10:10:18 2017 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 2 May 2017 11:10:18 -0400 Subject: [petsc-users] SNES error In-Reply-To: <677760BF-5666-4C9D-A064-B495ACD80889@mcs.anl.gov> References: <677760BF-5666-4C9D-A064-B495ACD80889@mcs.anl.gov> Message-ID: /Users/markadams/Codes/petsc/arch-macosx-gnu-O/bin/mpiexec -n 1 ./vml -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 -snes_rtol 1.e-6 -snes_stol 1.e-6 -ts_type cn -snes_fd -pc_type lu -ksp_type preonly -x_petscspace_order 1 -x_petscspace_poly_tensor -v_petscspace_order 1 -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 -ts_final_time 1e10 -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 -thermal_temps 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo -12,-12 -domainx_hi 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 -x_vec_view hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view hdf5:v.h5::append -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view hdf5:prex.h5::append -snes_converged_reason -snes_linesearch_monitor -ts_adapt_monitor main call SetupXDiscretization main call SetInitialConditionDomain VMLViewX DMGetOutputSequenceNumber=-1, cmd_str=-x_pre_vec_view 0) species 0: charge density= -2.3940791757186e+00, z-momentum= 5.9851979392559e-01, energy= 3.2314073646197e-01, thermal-flux= 2.4419137539877e-01 0) Normalized: charge density= -2.3940791757186e+00, z momentum= 5.9851979392559e-01, energy= 3.2314073646197e-01, thermal flux= 2.4419137539877e-01, local: 64 X cells, 81 X vertices VMLViewX DMGetOutputSequenceNumber=0, cmd_str=(null) VMLViewV DMGetOutputSequenceNumber=-1 0 SNES Function norm 4.097052680599e+00 1 SNES Function norm 1.213148652908e-09 Nonlinear solve did not converge due to DIVERGED_FUNCTION_COUNT iterations 1 TSAdapt none step 0 stage rejected t=0 + 1.000e-01, nonlinear solve failures 1 greater than current TS allowed [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3659-g699918129c GIT Date: 2017-04-26 08:18:35 -0400 [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local by markadams Tue May 2 11:04:02 2017 [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" FOPTFLAGS="-O3 -g -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1 --download-hypre=1 --download-ml=1 --download-triangle=1 --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 On Mon, May 1, 2017 at 10:25 PM, Barry Smith wrote: > > and > > -snes_linesearch_monitor > -ts_adapt_monitor > > > > On May 1, 2017, at 7:51 PM, Matthew Knepley wrote: > > > > Run with -snes_converged_reason. > > > > Matt > > > > On Mon, May 1, 2017 at 7:14 PM, Mark Adams wrote: > > I get this SNES failure and I don't understand what the problem is. The > rtol is 1.e-6 and the first iteration reduces the residual by 9 orders of > magnitude. Yet, TS is not satisfied. What is going on here? 
> > > > mpiexec -n 1 ./vml -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 > -snes_rtol 1.e-6 -snes_stol 1.e-6 -ts_type cn -snes_fd -pc_type lu > -ksp_type preonly -x_petscspace_order 1 -x_petscspace_poly_tensor > -v_petscspace_order 1 -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 > -ts_final_time 1e10 -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 > -thermal_temps 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo > -12,-12 -domainx_hi 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 > -x_vec_view hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view > hdf5:v.h5::append -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view > hdf5:prex.h5::append > > .... > > > > 0 SNES Function norm 4.097052680599e+00 > > 1 SNES Function norm 1.213148652908e-09 > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: > > [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, > increase -ts_max_snes_failures or make negative to attempt recovery > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3659-g699918129c > GIT Date: 2017-04-26 08:18:35 -0400 > > [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local by > markadams Mon May 1 19:21:32 2017 > > [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ > COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" FOPTFLAGS="-O3 -g > -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1 > --download-hypre=1 --download-ml=1 --download-triangle=1 > --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist > --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 > PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 > > > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue May 2 10:18:53 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 2 May 2017 10:18:53 -0500 Subject: [petsc-users] SNES error In-Reply-To: References: <677760BF-5666-4C9D-A064-B495ACD80889@mcs.anl.gov> Message-ID: On Tue, May 2, 2017 at 10:10 AM, Mark Adams wrote: > /Users/markadams/Codes/petsc/arch-macosx-gnu-O/bin/mpiexec -n 1 ./vml > -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 -snes_rtol 1.e-6 -snes_stol > 1.e-6 -ts_type cn -snes_fd -pc_type lu -ksp_type preonly > -x_petscspace_order 1 -x_petscspace_poly_tensor -v_petscspace_order 1 > -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 -ts_final_time 1e10 > -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 -thermal_temps > 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo -12,-12 -domainx_hi > 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 -x_vec_view > hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view hdf5:v.h5::append > -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view hdf5:prex.h5::append > -snes_converged_reason -snes_linesearch_monitor -ts_adapt_monitor > main call SetupXDiscretization > main call SetInitialConditionDomain > VMLViewX DMGetOutputSequenceNumber=-1, > cmd_str=-x_pre_vec_view > 0) species 0: charge density= -2.3940791757186e+00, z-momentum= > 5.9851979392559e-01, energy= 3.2314073646197e-01, thermal-flux= > 2.4419137539877e-01 > 0) Normalized: charge density= -2.3940791757186e+00, z momentum= > 5.9851979392559e-01, energy= 3.2314073646197e-01, thermal flux= > 2.4419137539877e-01, local: 64 X cells, 81 X vertices > VMLViewX DMGetOutputSequenceNumber=0, cmd_str=(null) > VMLViewV DMGetOutputSequenceNumber=-1 > 0 SNES Function norm 4.097052680599e+00 > 1 SNES Function norm 1.213148652908e-09 > Nonlinear solve did not converge due to DIVERGED_FUNCTION_COUNT > iterations 1 > Neat! Mark, I think this has to do with you calling SNESEvaluateFunc() inside another one. We limit the number of function evaluations to 10,000 by default, mostly to corral line searches. I think you hit this, and thus need to up the count. Thanks, Matt > TSAdapt none step 0 stage rejected t=0 + 1.000e-01, > nonlinear solve failures 1 greater than current TS allowed > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: > [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, > increase -ts_max_snes_failures or make negative to attempt recovery > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3659-g699918129c > GIT Date: 2017-04-26 08:18:35 -0400 > [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local by > markadams Tue May 2 11:04:02 2017 > [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ > COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" FOPTFLAGS="-O3 -g > -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1 > --download-hypre=1 --download-ml=1 --download-triangle=1 > --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist > --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 > PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 > > > On Mon, May 1, 2017 at 10:25 PM, Barry Smith wrote: > >> >> and >> >> -snes_linesearch_monitor >> -ts_adapt_monitor >> >> >> > On May 1, 2017, at 7:51 PM, Matthew Knepley wrote: >> > >> > Run with -snes_converged_reason. 
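
If the failure really is the function-evaluation cap Matt describes, one way to raise it from the code is the following sketch (100000 is an arbitrary larger cap; the usual ierr/CHKERRQ boilerplate is assumed):

SNES snes;
ierr = TSGetSNES(ts, &snes);CHKERRQ(ierr);
/* keep the existing tolerances, raise only the maximum number of function evaluations */
ierr = SNESSetTolerances(snes, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, 100000);CHKERRQ(ierr);

The same thing is available on the command line as -snes_max_funcs 100000, and the error message above already points at -ts_max_snes_failures for letting TS retry a step instead of aborting.
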
>> > >> > Matt >> > >> > On Mon, May 1, 2017 at 7:14 PM, Mark Adams wrote: >> > I get this SNES failure and I don't understand what the problem is. The >> rtol is 1.e-6 and the first iteration reduces the residual by 9 orders of >> magnitude. Yet, TS is not satisfied. What is going on here? >> > >> > mpiexec -n 1 ./vml -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 >> -snes_rtol 1.e-6 -snes_stol 1.e-6 -ts_type cn -snes_fd -pc_type lu >> -ksp_type preonly -x_petscspace_order 1 -x_petscspace_poly_tensor >> -v_petscspace_order 1 -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 >> -ts_final_time 1e10 -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 >> -thermal_temps 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo >> -12,-12 -domainx_hi 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 >> -x_vec_view hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view >> hdf5:v.h5::append -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view >> hdf5:prex.h5::append >> > .... >> > >> > 0 SNES Function norm 4.097052680599e+00 >> > 1 SNES Function norm 1.213148652908e-09 >> > [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> > [0]PETSC ERROR: >> > [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, >> increase -ts_max_snes_failures or make negative to attempt recovery >> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> > [0]PETSC ERROR: Petsc Development GIT revision: >> v3.7.6-3659-g699918129c GIT Date: 2017-04-26 08:18:35 -0400 >> > [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local by >> markadams Mon May 1 19:21:32 2017 >> > [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ >> COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" FOPTFLAGS="-O3 -g >> -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1 >> --download-hypre=1 --download-ml=1 --download-triangle=1 >> --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist >> --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 >> PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Wed May 3 02:29:13 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Wed, 3 May 2017 09:29:13 +0200 Subject: [petsc-users] strange convergence In-Reply-To: <87wpa3wd5j.fsf@jedbrown.org> References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> <87wpa3wd5j.fsf@jedbrown.org> Message-ID: Dear Jed If I understood you correctly you suggest to avoid penalty by using the Lagrange multiplier for the mortar constraint? In this case it leads to the use of discrete Lagrange multiplier space. Do you or anyone already have experience using discrete Lagrange multiplier space with Petsc? 
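
For reference, one common starting point for the saddle-point system that a discrete Lagrange multiplier produces (zero block on the multiplier diagonal) is PCFIELDSPLIT in Schur-complement mode rather than a penalty. A sketch of the options, with splits 0 and 1 standing for the displacement and multiplier blocks:

-pc_type fieldsplit -pc_fieldsplit_type schur
-pc_fieldsplit_schur_fact_type full
-pc_fieldsplit_schur_precondition selfp
-fieldsplit_0_pc_type hypre
-fieldsplit_1_ksp_type gmres -fieldsplit_1_pc_type jacobi

How well selfp approximates the true Schur complement depends on the mortar coupling, so this is a starting point, not a recipe.
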
There is also similar question on stackexchange https://scicomp.stackexchange.com/questions/25113/preconditioners-and-discrete-lagrange-multipliers Giang On Sat, Apr 29, 2017 at 3:34 PM, Jed Brown wrote: > Hoang Giang Bui writes: > > > Hi Barry > > > > The first block is from a standard solid mechanics discretization based > on > > balance of momentum equation. There is some material involved but in > > principal it's well-posed elasticity equation with positive definite > > tangent operator. The "gluing business" uses the mortar method to keep > the > > continuity of displacement. Instead of using Lagrange multiplier to treat > > the constraint I used penalty method to penalize the energy. The > > discretization form of mortar is quite simple > > > > \int_{\Gamma_1} { rho * (\delta u_1 - \delta u_2) * (u_1 - u_2) dA } > > > > rho is penalty parameter. In the simulation I initially set it low (~E) > to > > preserve the conditioning of the system. > > There are two things that can go wrong here with AMG: > > * The penalty term can mess up the strength of connection heuristics > such that you get poor choice of C-points (classical AMG like > BoomerAMG) or poor choice of aggregates (smoothed aggregation). > > * The penalty term can prevent Jacobi smoothing from being effective; in > this case, it can lead to poor coarse basis functions (higher energy > than they should be) and poor smoothing in an MG cycle. You can fix > the poor smoothing in the MG cycle by using a stronger smoother, like > ASM with some overlap. > > I'm generally not a fan of penalty methods due to the irritating > tradeoffs and often poor solver performance. > > > In the figure below, the colorful blocks are u_1 and the base is u_2. > Both > > u_1 and u_2 use isoparametric quadratic approximation. > > > > ? > > Snapshot.png > > drive_web> > > ??? > > > > Giang > > > > On Fri, Apr 28, 2017 at 6:21 PM, Barry Smith wrote: > > > >> > >> Ok, so boomerAMG algebraic multigrid is not good for the first block. > >> You mentioned the first block has two things glued together? AMG is > >> fantastic for certain problems but doesn't work for everything. > >> > >> Tell us more about the first block, what PDE it comes from, what > >> discretization, and what the "gluing business" is and maybe we'll have > >> suggestions for how to precondition it. > >> > >> Barry > >> > >> > On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui > wrote: > >> > > >> > It's in fact quite good > >> > > >> > Residual norms for fieldsplit_u_ solve. > >> > 0 KSP Residual norm 4.014715925568e+00 > >> > 1 KSP Residual norm 2.160497019264e-10 > >> > Residual norms for fieldsplit_wp_ solve. > >> > 0 KSP Residual norm 0.000000000000e+00 > >> > 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm > >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 > >> > Residual norms for fieldsplit_u_ solve. > >> > 0 KSP Residual norm 9.999999999416e-01 > >> > 1 KSP Residual norm 7.118380416383e-11 > >> > Residual norms for fieldsplit_wp_ solve. > >> > 0 KSP Residual norm 0.000000000000e+00 > >> > 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm > >> 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 > >> > Linear solve converged due to CONVERGED_ATOL iterations 1 > >> > > >> > Giang > >> > > >> > On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith > wrote: > >> > > >> > Run again using LU on both blocks to see what happens. 
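
Concretely, that LU-on-both-blocks check amounts to something like the following options (split names u and wp as used elsewhere in this thread):

-pc_type fieldsplit
-fieldsplit_u_ksp_type preonly -fieldsplit_u_pc_type lu
-fieldsplit_wp_ksp_type preonly -fieldsplit_wp_pc_type lu
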
> >> > > >> > > >> > > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui > >> wrote: > >> > > > >> > > I have changed the way to tie the nonconforming mesh. It seems the > >> matrix now is better > >> > > > >> > > with -pc_type lu the output is > >> > > 0 KSP preconditioned resid norm 3.308678584240e-01 true resid norm > >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 > >> > > 1 KSP preconditioned resid norm 2.004313395301e-12 true resid norm > >> 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 > >> > > Linear solve converged due to CONVERGED_ATOL iterations 1 > >> > > > >> > > > >> > > with -pc_type fieldsplit -fieldsplit_u_pc_type hypre > >> -fieldsplit_wp_pc_type lu the convergence is slow > >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm > >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 > >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm > >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 > >> > > ... > >> > > 824 KSP preconditioned resid norm 1.018542387738e-09 true resid norm > >> 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 > >> > > 825 KSP preconditioned resid norm 9.743727947637e-10 true resid norm > >> 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 > >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 > >> > > > >> > > checking with additional -fieldsplit_u_ksp_type richardson > >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 > >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor > >> -fieldsplit_wp_ksp_max_it 1 gives > >> > > > >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm > >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 > >> > > Residual norms for fieldsplit_u_ solve. > >> > > 0 KSP Residual norm 5.803507549280e-01 > >> > > 1 KSP Residual norm 2.069538175950e-01 > >> > > Residual norms for fieldsplit_wp_ solve. > >> > > 0 KSP Residual norm 0.000000000000e+00 > >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm > >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 > >> > > Residual norms for fieldsplit_u_ solve. > >> > > 0 KSP Residual norm 7.831796195225e-01 > >> > > 1 KSP Residual norm 1.734608520110e-01 > >> > > Residual norms for fieldsplit_wp_ solve. > >> > > 0 KSP Residual norm 0.000000000000e+00 > >> > > .... > >> > > 823 KSP preconditioned resid norm 1.065070135605e-09 true resid norm > >> 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 > >> > > Residual norms for fieldsplit_u_ solve. > >> > > 0 KSP Residual norm 6.113806394327e-01 > >> > > 1 KSP Residual norm 1.535465290944e-01 > >> > > Residual norms for fieldsplit_wp_ solve. > >> > > 0 KSP Residual norm 0.000000000000e+00 > >> > > 824 KSP preconditioned resid norm 1.018542387746e-09 true resid norm > >> 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 > >> > > Residual norms for fieldsplit_u_ solve. > >> > > 0 KSP Residual norm 6.123437055586e-01 > >> > > 1 KSP Residual norm 1.524661826133e-01 > >> > > Residual norms for fieldsplit_wp_ solve. > >> > > 0 KSP Residual norm 0.000000000000e+00 > >> > > 825 KSP preconditioned resid norm 9.743727947718e-10 true resid norm > >> 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 > >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 > >> > > > >> > > > >> > > The residual for wp block is zero since in this first step the rhs > is > >> zero. As can see in the output, the multigrid does not perform well to > >> reduce the residual in the sub-solve. 
Is my observation right? what can > be > >> done to improve this? > >> > > > >> > > > >> > > Giang > >> > > > >> > > On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith > >> wrote: > >> > > > >> > > This can happen in the matrix is singular or nearly singular or > if > >> the factorization generates small pivots, which can occur for even > >> nonsingular problems if the matrix is poorly scaled or just plain nasty. > >> > > > >> > > > >> > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui > >> wrote: > >> > > > > >> > > > It took a while, here I send you the output > >> > > > > >> > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid > norm > >> 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > >> > > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid > norm > >> 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 > >> > > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid > norm > >> 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 > >> > > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid > norm > >> 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 > >> > > > Linear solve converged due to CONVERGED_ATOL iterations 3 > >> > > > KSP Object: 4 MPI processes > >> > > > type: gmres > >> > > > GMRES: restart=1000, using Modified Gram-Schmidt > >> Orthogonalization > >> > > > GMRES: happy breakdown tolerance 1e-30 > >> > > > maximum iterations=1000, initial guess is zero > >> > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > >> > > > left preconditioning > >> > > > using PRECONDITIONED norm type for convergence test > >> > > > PC Object: 4 MPI processes > >> > > > type: lu > >> > > > LU: out-of-place factorization > >> > > > tolerance for zero pivot 2.22045e-14 > >> > > > matrix ordering: natural > >> > > > factor fill ratio given 0, needed 0 > >> > > > Factored matrix follows: > >> > > > Mat Object: 4 MPI processes > >> > > > type: mpiaij > >> > > > rows=973051, cols=973051 > >> > > > package used to perform factorization: pastix > >> > > > Error : 3.24786e-14 > >> > > > total: nonzeros=0, allocated nonzeros=0 > >> > > > total number of mallocs used during MatSetValues calls > =0 > >> > > > PaStiX run parameters: > >> > > > Matrix type : Unsymmetric > >> > > > Level of printing (0,1,2): 0 > >> > > > Number of refinements iterations : 3 > >> > > > Error : 3.24786e-14 > >> > > > linear system matrix = precond matrix: > >> > > > Mat Object: 4 MPI processes > >> > > > type: mpiaij > >> > > > rows=973051, cols=973051 > >> > > > Error : 3.24786e-14 > >> > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > >> > > > total number of mallocs used during MatSetValues calls =0 > >> > > > using I-node (on process 0) routines: found 78749 nodes, > limit > >> used is 5 > >> > > > Error : 3.24786e-14 > >> > > > > >> > > > It doesn't do as you said. Something is not right here. I will > look > >> in depth. > >> > > > > >> > > > Giang > >> > > > > >> > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith > >> wrote: > >> > > > > >> > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui < > hgbk2008 at gmail.com> > >> wrote: > >> > > > > > >> > > > > Good catch. I get this for the very first step, maybe at that > time > >> the rhs_w is zero. > >> > > > > >> > > > With the multiplicative composition the right hand side of the > >> second solve is the initial right hand side of the second solve minus > >> A_10*x where x is the solution to the first sub solve and A_10 is the > lower > >> left block of the outer matrix. 
So unless both the initial right hand > side > >> has a zero for the second block and A_10 is identically zero the right > hand > >> side for the second sub solve should not be zero. Is A_10 == 0? > >> > > > > >> > > > > >> > > > > In the later step, it shows 2 step convergence > >> > > > > > >> > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > 0 KSP Residual norm 3.165886479830e+04 > >> > > > > 1 KSP Residual norm 2.905922877684e-01 > >> > > > > Residual norms for fieldsplit_wp_ solve. > >> > > > > 0 KSP Residual norm 2.397669419027e-01 > >> > > > > 1 KSP Residual norm 0.000000000000e+00 > >> > > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid > >> norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 > >> > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > 0 KSP Residual norm 9.999891813771e-01 > >> > > > > 1 KSP Residual norm 1.512000395579e-05 > >> > > > > Residual norms for fieldsplit_wp_ solve. > >> > > > > 0 KSP Residual norm 8.192702188243e-06 > >> > > > > 1 KSP Residual norm 0.000000000000e+00 > >> > > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid > >> norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 > >> > > > > >> > > > The outer residual norms are still wonky, the preconditioned > >> residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which > is a > >> huge drop but the 7.963616922323e+05 drops very much less > >> 7.135927677844e+04. This is not normal. > >> > > > > >> > > > What if you just use -pc_type lu for the entire system (no > >> fieldsplit), does the true residual drop to almost zero in the first > >> iteration (as it should?). Send the output. > >> > > > > >> > > > > >> > > > > >> > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > 0 KSP Residual norm 6.946213936597e-01 > >> > > > > 1 KSP Residual norm 1.195514007343e-05 > >> > > > > Residual norms for fieldsplit_wp_ solve. > >> > > > > 0 KSP Residual norm 1.025694497535e+00 > >> > > > > 1 KSP Residual norm 0.000000000000e+00 > >> > > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid > >> norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 > >> > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > 0 KSP Residual norm 7.255149996405e-01 > >> > > > > 1 KSP Residual norm 6.583512434218e-06 > >> > > > > Residual norms for fieldsplit_wp_ solve. > >> > > > > 0 KSP Residual norm 1.015229700337e+00 > >> > > > > 1 KSP Residual norm 0.000000000000e+00 > >> > > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid > >> norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 > >> > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > 0 KSP Residual norm 3.512243341400e-01 > >> > > > > 1 KSP Residual norm 2.032490351200e-06 > >> > > > > Residual norms for fieldsplit_wp_ solve. > >> > > > > 0 KSP Residual norm 1.282327290982e+00 > >> > > > > 1 KSP Residual norm 0.000000000000e+00 > >> > > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid > >> norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 > >> > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > 0 KSP Residual norm 3.423609338053e-01 > >> > > > > 1 KSP Residual norm 4.213703301972e-07 > >> > > > > Residual norms for fieldsplit_wp_ solve. 
> >> > > > > 0 KSP Residual norm 1.157384757538e+00 > >> > > > > 1 KSP Residual norm 0.000000000000e+00 > >> > > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid > >> norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 > >> > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > 0 KSP Residual norm 3.838596289995e-01 > >> > > > > 1 KSP Residual norm 9.927864176103e-08 > >> > > > > Residual norms for fieldsplit_wp_ solve. > >> > > > > 0 KSP Residual norm 1.066298905618e+00 > >> > > > > 1 KSP Residual norm 0.000000000000e+00 > >> > > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid > >> norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 > >> > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > 0 KSP Residual norm 4.624964188094e-01 > >> > > > > 1 KSP Residual norm 6.418229775372e-08 > >> > > > > Residual norms for fieldsplit_wp_ solve. > >> > > > > 0 KSP Residual norm 9.800784311614e-01 > >> > > > > 1 KSP Residual norm 0.000000000000e+00 > >> > > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid > >> norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 > >> > > > > Linear solve converged due to CONVERGED_ATOL iterations 7 > >> > > > > > >> > > > > The outer operator is an explicit matrix. > >> > > > > > >> > > > > Giang > >> > > > > > >> > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith < > bsmith at mcs.anl.gov> > >> wrote: > >> > > > > > >> > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui < > hgbk2008 at gmail.com> > >> wrote: > >> > > > > > > >> > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better > >> convergence. I still used 4 procs though, probably with 1 proc it should > >> also be the same. > >> > > > > > > >> > > > > > The u block used a Nitsche-type operator to connect two > >> non-matching domains. I don't think it will leave some rigid body motion > >> leads to not sufficient constraints. Maybe you have other idea? > >> > > > > > > >> > > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > > 0 KSP Residual norm 3.129067184300e+05 > >> > > > > > 1 KSP Residual norm 5.906261468196e-01 > >> > > > > > Residual norms for fieldsplit_wp_ solve. > >> > > > > > 0 KSP Residual norm 0.000000000000e+00 > >> > > > > > >> > > > > ^^^^ something is wrong here. The sub solve should not be > >> starting with a 0 residual (this means the right hand side for this sub > >> solve is zero which it should not be). > >> > > > > > >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > >> > > > > > >> > > > > > >> > > > > How are you providing the outer operator? As an explicit > matrix > >> or with some shell matrix? > >> > > > > > >> > > > > > >> > > > > > >> > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true > resid > >> norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > >> > > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > > 0 KSP Residual norm 9.999955993437e-01 > >> > > > > > 1 KSP Residual norm 4.019774691831e-06 > >> > > > > > Residual norms for fieldsplit_wp_ solve. > >> > > > > > 0 KSP Residual norm 0.000000000000e+00 > >> > > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true > resid > >> norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 > >> > > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > > 0 KSP Residual norm 1.000012180204e+00 > >> > > > > > 1 KSP Residual norm 1.017367950422e-05 > >> > > > > > Residual norms for fieldsplit_wp_ solve. 
> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 > >> > > > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true > resid > >> norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 > >> > > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > > 0 KSP Residual norm 1.000004200085e+00 > >> > > > > > 1 KSP Residual norm 6.231613102458e-06 > >> > > > > > Residual norms for fieldsplit_wp_ solve. > >> > > > > > 0 KSP Residual norm 0.000000000000e+00 > >> > > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true > resid > >> norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 > >> > > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 > >> > > > > > KSP Object: 4 MPI processes > >> > > > > > type: gmres > >> > > > > > GMRES: restart=1000, using Modified Gram-Schmidt > >> Orthogonalization > >> > > > > > GMRES: happy breakdown tolerance 1e-30 > >> > > > > > maximum iterations=1000, initial guess is zero > >> > > > > > tolerances: relative=1e-20, absolute=1e-09, > divergence=10000 > >> > > > > > left preconditioning > >> > > > > > using PRECONDITIONED norm type for convergence test > >> > > > > > PC Object: 4 MPI processes > >> > > > > > type: fieldsplit > >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits > = 2 > >> > > > > > Solver info for each split is in the following KSP > objects: > >> > > > > > Split number 0 Defined by IS > >> > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > >> > > > > > type: richardson > >> > > > > > Richardson: damping factor=1 > >> > > > > > maximum iterations=1, initial guess is zero > >> > > > > > tolerances: relative=1e-05, absolute=1e-50, > >> divergence=10000 > >> > > > > > left preconditioning > >> > > > > > using PRECONDITIONED norm type for convergence test > >> > > > > > PC Object: (fieldsplit_u_) 4 MPI processes > >> > > > > > type: lu > >> > > > > > LU: out-of-place factorization > >> > > > > > tolerance for zero pivot 2.22045e-14 > >> > > > > > matrix ordering: natural > >> > > > > > factor fill ratio given 0, needed 0 > >> > > > > > Factored matrix follows: > >> > > > > > Mat Object: 4 MPI processes > >> > > > > > type: mpiaij > >> > > > > > rows=938910, cols=938910 > >> > > > > > package used to perform factorization: pastix > >> > > > > > total: nonzeros=0, allocated nonzeros=0 > >> > > > > > Error : 3.36878e-14 > >> > > > > > total number of mallocs used during MatSetValues > calls > >> =0 > >> > > > > > PaStiX run parameters: > >> > > > > > Matrix type : > Unsymmetric > >> > > > > > Level of printing (0,1,2): 0 > >> > > > > > Number of refinements iterations : 3 > >> > > > > > Error : 3.36878e-14 > >> > > > > > linear system matrix = precond matrix: > >> > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes > >> > > > > > type: mpiaij > >> > > > > > rows=938910, cols=938910, bs=3 > >> > > > > > Error : 3.36878e-14 > >> > > > > > Error : 3.36878e-14 > >> > > > > > total: nonzeros=8.60906e+07, allocated > >> nonzeros=8.60906e+07 > >> > > > > > total number of mallocs used during MatSetValues > calls =0 > >> > > > > > using I-node (on process 0) routines: found 78749 > >> nodes, limit used is 5 > >> > > > > > Split number 1 Defined by IS > >> > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > >> > > > > > type: richardson > >> > > > > > Richardson: damping factor=1 > >> > > > > > maximum iterations=1, initial guess is zero > >> > > > > > tolerances: relative=1e-05, absolute=1e-50, > >> divergence=10000 > >> > > > > > left preconditioning > >> > > > > > using PRECONDITIONED 
norm type for convergence test > >> > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > >> > > > > > type: lu > >> > > > > > LU: out-of-place factorization > >> > > > > > tolerance for zero pivot 2.22045e-14 > >> > > > > > matrix ordering: natural > >> > > > > > factor fill ratio given 0, needed 0 > >> > > > > > Factored matrix follows: > >> > > > > > Mat Object: 4 MPI processes > >> > > > > > type: mpiaij > >> > > > > > rows=34141, cols=34141 > >> > > > > > package used to perform factorization: pastix > >> > > > > > Error : -nan > >> > > > > > Error : -nan > >> > > > > > Error : -nan > >> > > > > > total: nonzeros=0, allocated nonzeros=0 > >> > > > > > total number of mallocs used during MatSetValues > >> calls =0 > >> > > > > > PaStiX run parameters: > >> > > > > > Matrix type : Symmetric > >> > > > > > Level of printing (0,1,2): 0 > >> > > > > > Number of refinements iterations : 0 > >> > > > > > Error : -nan > >> > > > > > linear system matrix = precond matrix: > >> > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > >> > > > > > type: mpiaij > >> > > > > > rows=34141, cols=34141 > >> > > > > > total: nonzeros=485655, allocated nonzeros=485655 > >> > > > > > total number of mallocs used during MatSetValues > calls =0 > >> > > > > > not using I-node (on process 0) routines > >> > > > > > linear system matrix = precond matrix: > >> > > > > > Mat Object: 4 MPI processes > >> > > > > > type: mpiaij > >> > > > > > rows=973051, cols=973051 > >> > > > > > total: nonzeros=9.90037e+07, allocated > nonzeros=9.90037e+07 > >> > > > > > total number of mallocs used during MatSetValues calls =0 > >> > > > > > using I-node (on process 0) routines: found 78749 nodes, > >> limit used is 5 > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > Giang > >> > > > > > > >> > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith < > >> bsmith at mcs.anl.gov> wrote: > >> > > > > > > >> > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui < > >> hgbk2008 at gmail.com> wrote: > >> > > > > > > > >> > > > > > > Dear Matt/Barry > >> > > > > > > > >> > > > > > > With your options, it results in > >> > > > > > > > >> > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true > >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > >> > > > > > > Residual norms for fieldsplit_u_ solve. > >> > > > > > > 0 KSP Residual norm 2.407308987203e+36 > >> > > > > > > 1 KSP Residual norm 5.797185652683e+72 > >> > > > > > > >> > > > > > It looks like Matt is right, hypre is seemly producing useless > >> garbage. > >> > > > > > > >> > > > > > First how do things run on one process. If you have similar > >> problems then debug on one process (debugging any kind of problem is > always > >> far easy on one process). > >> > > > > > > >> > > > > > First run with -fieldsplit_u_type lu (instead of using hypre) > to > >> see if that works or also produces something bad. > >> > > > > > > >> > > > > > What is the operator and the boundary conditions for u? It > could > >> be singular. > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > Residual norms for fieldsplit_wp_ solve. > >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 > >> > > > > > > ... > >> > > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true > >> resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 > >> > > > > > > Residual norms for fieldsplit_u_ solve. 
> >> > > > > > > 0 KSP Residual norm 1.533726746719e+36 > >> > > > > > > 1 KSP Residual norm 3.692757392261e+72 > >> > > > > > > Residual norms for fieldsplit_wp_ solve. > >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 > >> > > > > > > > >> > > > > > > Do you suggest that the pastix solver for the "wp" block > >> encounters small pivot? In addition, seem like the "u" block is also > >> singular. > >> > > > > > > > >> > > > > > > Giang > >> > > > > > > > >> > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith < > >> bsmith at mcs.anl.gov> wrote: > >> > > > > > > > >> > > > > > > Huge preconditioned norms but normal unpreconditioned > norms > >> almost always come from a very small pivot in an LU or ILU > factorization. > >> > > > > > > > >> > > > > > > The first thing to do is monitor the two sub solves. Run > >> with the additional options -fieldsplit_u_ksp_type richardson > >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 > >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor > >> -fieldsplit_wp_ksp_max_it 1 > >> > > > > > > > >> > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < > >> hgbk2008 at gmail.com> wrote: > >> > > > > > > > > >> > > > > > > > Hello > >> > > > > > > > > >> > > > > > > > I encountered a strange convergence behavior that I have > >> trouble to understand > >> > > > > > > > > >> > > > > > > > KSPSetFromOptions completed > >> > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true > >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > >> > > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true > >> resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > >> > > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true > >> resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > >> > > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true > >> resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > >> > > > > > > > ..... 
> >> > > > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true > >> resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > >> > > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true > >> resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > >> > > > > > > > Linear solve did not converge due to DIVERGED_ITS > iterations > >> 1000 > >> > > > > > > > KSP Object: 4 MPI processes > >> > > > > > > > type: gmres > >> > > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt > >> Orthogonalization > >> > > > > > > > GMRES: happy breakdown tolerance 1e-30 > >> > > > > > > > maximum iterations=1000, initial guess is zero > >> > > > > > > > tolerances: relative=1e-20, absolute=1e-09, > >> divergence=10000 > >> > > > > > > > left preconditioning > >> > > > > > > > using PRECONDITIONED norm type for convergence test > >> > > > > > > > PC Object: 4 MPI processes > >> > > > > > > > type: fieldsplit > >> > > > > > > > FieldSplit with MULTIPLICATIVE composition: total > splits > >> = 2 > >> > > > > > > > Solver info for each split is in the following KSP > >> objects: > >> > > > > > > > Split number 0 Defined by IS > >> > > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > >> > > > > > > > type: preonly > >> > > > > > > > maximum iterations=10000, initial guess is zero > >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, > >> divergence=10000 > >> > > > > > > > left preconditioning > >> > > > > > > > using NONE norm type for convergence test > >> > > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes > >> > > > > > > > type: hypre > >> > > > > > > > HYPRE BoomerAMG preconditioning > >> > > > > > > > HYPRE BoomerAMG: Cycle type V > >> > > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 > >> > > > > > > > HYPRE BoomerAMG: Maximum number of iterations PER > >> hypre call 1 > >> > > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre > >> call 0 > >> > > > > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > >> > > > > > > > HYPRE BoomerAMG: Interpolation truncation factor 0 > >> > > > > > > > HYPRE BoomerAMG: Interpolation: max elements per > row > >> 0 > >> > > > > > > > HYPRE BoomerAMG: Number of levels of aggressive > >> coarsening 0 > >> > > > > > > > HYPRE BoomerAMG: Number of paths for aggressive > >> coarsening 1 > >> > > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 > >> > > > > > > > HYPRE BoomerAMG: Sweeps down 1 > >> > > > > > > > HYPRE BoomerAMG: Sweeps up 1 > >> > > > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 > >> > > > > > > > HYPRE BoomerAMG: Relax down > >> symmetric-SOR/Jacobi > >> > > > > > > > HYPRE BoomerAMG: Relax up > >> symmetric-SOR/Jacobi > >> > > > > > > > HYPRE BoomerAMG: Relax on coarse > >> Gaussian-elimination > >> > > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 > >> > > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 > >> > > > > > > > HYPRE BoomerAMG: Using CF-relaxation > >> > > > > > > > HYPRE BoomerAMG: Measure type local > >> > > > > > > > HYPRE BoomerAMG: Coarsen type PMIS > >> > > > > > > > HYPRE BoomerAMG: Interpolation type classical > >> > > > > > > > linear system matrix = precond matrix: > >> > > > > > > > Mat Object: (fieldsplit_u_) 4 MPI > processes > >> > > > > > > > type: mpiaij > >> > > > > > > > rows=938910, cols=938910, bs=3 > >> > > > > > > > total: nonzeros=8.60906e+07, allocated > >> nonzeros=8.60906e+07 > >> > > > > > > > total number of mallocs used during MatSetValues > >> calls =0 > >> > > > > > > > using I-node (on process 0) routines: found 
> 78749 > >> nodes, limit used is 5 > >> > > > > > > > Split number 1 Defined by IS > >> > > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > >> > > > > > > > type: preonly > >> > > > > > > > maximum iterations=10000, initial guess is zero > >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, > >> divergence=10000 > >> > > > > > > > left preconditioning > >> > > > > > > > using NONE norm type for convergence test > >> > > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > >> > > > > > > > type: lu > >> > > > > > > > LU: out-of-place factorization > >> > > > > > > > tolerance for zero pivot 2.22045e-14 > >> > > > > > > > matrix ordering: natural > >> > > > > > > > factor fill ratio given 0, needed 0 > >> > > > > > > > Factored matrix follows: > >> > > > > > > > Mat Object: 4 MPI processes > >> > > > > > > > type: mpiaij > >> > > > > > > > rows=34141, cols=34141 > >> > > > > > > > package used to perform factorization: > pastix > >> > > > > > > > Error : -nan > >> > > > > > > > Error : -nan > >> > > > > > > > total: nonzeros=0, allocated nonzeros=0 > >> > > > > > > > Error : -nan > >> > > > > > > > total number of mallocs used during MatSetValues > calls =0 > >> > > > > > > > PaStiX run parameters: > >> > > > > > > > Matrix type : > >> Symmetric > >> > > > > > > > Level of printing (0,1,2): 0 > >> > > > > > > > Number of refinements iterations : 0 > >> > > > > > > > Error : -nan > >> > > > > > > > linear system matrix = precond matrix: > >> > > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI > processes > >> > > > > > > > type: mpiaij > >> > > > > > > > rows=34141, cols=34141 > >> > > > > > > > total: nonzeros=485655, allocated nonzeros=485655 > >> > > > > > > > total number of mallocs used during MatSetValues > >> calls =0 > >> > > > > > > > not using I-node (on process 0) routines > >> > > > > > > > linear system matrix = precond matrix: > >> > > > > > > > Mat Object: 4 MPI processes > >> > > > > > > > type: mpiaij > >> > > > > > > > rows=973051, cols=973051 > >> > > > > > > > total: nonzeros=9.90037e+07, allocated > >> nonzeros=9.90037e+07 > >> > > > > > > > total number of mallocs used during MatSetValues > calls =0 > >> > > > > > > > using I-node (on process 0) routines: found 78749 > >> nodes, limit used is 5 > >> > > > > > > > > >> > > > > > > > The pattern of convergence gives a hint that this system > is > >> somehow bad/singular. But I don't know why the preconditioned error > goes up > >> too high. Anyone has an idea? > >> > > > > > > > > >> > > > > > > > Best regards > >> > > > > > > > Giang Bui > >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > > >> > > > >> > > > >> > > >> > > >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Wed May 3 02:45:04 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Wed, 03 May 2017 07:45:04 +0000 Subject: [petsc-users] strange convergence In-Reply-To: References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> <87wpa3wd5j.fsf@jedbrown.org> Message-ID: On Wed, 3 May 2017 at 09:29, Hoang Giang Bui wrote: > Dear Jed > > If I understood you correctly you suggest to avoid penalty by using the > Lagrange multiplier for the mortar constraint? In this case it leads to the > use of discrete Lagrange multiplier space. 
Do you or anyone already have > experience using discrete Lagrange multiplier space with Petsc? > Yes - this is similar to solving incompressible Stokes in which the pressure is a Lagrange multiplier enforcing the div(v)=0 constraint. Robust preconditioners for this problem are constructed using PCFIELDSPLIT. Thanks, Dave > There is also similar question on stackexchange > > https://scicomp.stackexchange.com/questions/25113/preconditioners-and-discrete-lagrange-multipliers > > Giang > > On Sat, Apr 29, 2017 at 3:34 PM, Jed Brown wrote: > >> Hoang Giang Bui writes: >> >> > Hi Barry >> > >> > The first block is from a standard solid mechanics discretization based >> on >> > balance of momentum equation. There is some material involved but in >> > principal it's well-posed elasticity equation with positive definite >> > tangent operator. The "gluing business" uses the mortar method to keep >> the >> > continuity of displacement. Instead of using Lagrange multiplier to >> treat >> > the constraint I used penalty method to penalize the energy. The >> > discretization form of mortar is quite simple >> > >> > \int_{\Gamma_1} { rho * (\delta u_1 - \delta u_2) * (u_1 - u_2) dA } >> > >> > rho is penalty parameter. In the simulation I initially set it low (~E) >> to >> > preserve the conditioning of the system. >> >> There are two things that can go wrong here with AMG: >> >> * The penalty term can mess up the strength of connection heuristics >> such that you get poor choice of C-points (classical AMG like >> BoomerAMG) or poor choice of aggregates (smoothed aggregation). >> >> * The penalty term can prevent Jacobi smoothing from being effective; in >> this case, it can lead to poor coarse basis functions (higher energy >> than they should be) and poor smoothing in an MG cycle. You can fix >> the poor smoothing in the MG cycle by using a stronger smoother, like >> ASM with some overlap. >> >> I'm generally not a fan of penalty methods due to the irritating >> tradeoffs and often poor solver performance. >> >> > In the figure below, the colorful blocks are u_1 and the base is u_2. >> Both >> > u_1 and u_2 use isoparametric quadratic approximation. >> > >> > ? >> > Snapshot.png >> > < >> https://drive.google.com/file/d/0Bw8Hmu0-YGQXc2hKQ1BhQ1I4OEU/view?usp=drive_web >> > >> > ??? >> > >> > Giang >> > >> > On Fri, Apr 28, 2017 at 6:21 PM, Barry Smith >> wrote: >> > >> >> >> >> Ok, so boomerAMG algebraic multigrid is not good for the first block. >> >> You mentioned the first block has two things glued together? AMG is >> >> fantastic for certain problems but doesn't work for everything. >> >> >> >> Tell us more about the first block, what PDE it comes from, what >> >> discretization, and what the "gluing business" is and maybe we'll have >> >> suggestions for how to precondition it. >> >> >> >> Barry >> >> >> >> > On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui >> wrote: >> >> > >> >> > It's in fact quite good >> >> > >> >> > Residual norms for fieldsplit_u_ solve. >> >> > 0 KSP Residual norm 4.014715925568e+00 >> >> > 1 KSP Residual norm 2.160497019264e-10 >> >> > Residual norms for fieldsplit_wp_ solve. >> >> > 0 KSP Residual norm 0.000000000000e+00 >> >> > 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm >> >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > Residual norms for fieldsplit_u_ solve. >> >> > 0 KSP Residual norm 9.999999999416e-01 >> >> > 1 KSP Residual norm 7.118380416383e-11 >> >> > Residual norms for fieldsplit_wp_ solve. 
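A minimal sketch of the Schur-complement PCFIELDSPLIT setup Dave describes above for a saddle-point system in which the second block is a discrete Lagrange multiplier. The split names "u" and "lm" and the sub-preconditioner choices are placeholders that depend on how the application registers its splits; they are not options taken from this thread:

  -ksp_type fgmres
  -pc_type fieldsplit
  -pc_fieldsplit_type schur
  -pc_fieldsplit_schur_fact_type full
  -pc_fieldsplit_schur_precondition selfp
  -fieldsplit_u_ksp_type preonly
  -fieldsplit_u_pc_type gamg
  -fieldsplit_lm_ksp_type gmres
  -fieldsplit_lm_ksp_rtol 1e-8
  -fieldsplit_lm_pc_type jacobi

With -pc_fieldsplit_type schur the multiplier block is handled through an approximate Schur complement, which is the standard PCFIELDSPLIT approach for Stokes-like problems; "selfp" builds the Schur preconditioner from A11 - A10 inv(diag(A00)) A01. FGMRES is used as the outer Krylov method because the inner iterations make the preconditioner change from one application to the next.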
>> >> > 0 KSP Residual norm 0.000000000000e+00 >> >> > 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm >> >> 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 >> >> > Linear solve converged due to CONVERGED_ATOL iterations 1 >> >> > >> >> > Giang >> >> > >> >> > On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith >> wrote: >> >> > >> >> > Run again using LU on both blocks to see what happens. >> >> > >> >> > >> >> > > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui >> >> wrote: >> >> > > >> >> > > I have changed the way to tie the nonconforming mesh. It seems the >> >> matrix now is better >> >> > > >> >> > > with -pc_type lu the output is >> >> > > 0 KSP preconditioned resid norm 3.308678584240e-01 true resid >> norm >> >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > 1 KSP preconditioned resid norm 2.004313395301e-12 true resid >> norm >> >> 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 >> >> > > Linear solve converged due to CONVERGED_ATOL iterations 1 >> >> > > >> >> > > >> >> > > with -pc_type fieldsplit -fieldsplit_u_pc_type hypre >> >> -fieldsplit_wp_pc_type lu the convergence is slow >> >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid >> norm >> >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid >> norm >> >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >> >> > > ... >> >> > > 824 KSP preconditioned resid norm 1.018542387738e-09 true resid >> norm >> >> 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 >> >> > > 825 KSP preconditioned resid norm 9.743727947637e-10 true resid >> norm >> >> 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 >> >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >> >> > > >> >> > > checking with additional -fieldsplit_u_ksp_type richardson >> >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >> >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >> >> -fieldsplit_wp_ksp_max_it 1 gives >> >> > > >> >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid >> norm >> >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > Residual norms for fieldsplit_u_ solve. >> >> > > 0 KSP Residual norm 5.803507549280e-01 >> >> > > 1 KSP Residual norm 2.069538175950e-01 >> >> > > Residual norms for fieldsplit_wp_ solve. >> >> > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid >> norm >> >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >> >> > > Residual norms for fieldsplit_u_ solve. >> >> > > 0 KSP Residual norm 7.831796195225e-01 >> >> > > 1 KSP Residual norm 1.734608520110e-01 >> >> > > Residual norms for fieldsplit_wp_ solve. >> >> > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > .... >> >> > > 823 KSP preconditioned resid norm 1.065070135605e-09 true resid >> norm >> >> 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 >> >> > > Residual norms for fieldsplit_u_ solve. >> >> > > 0 KSP Residual norm 6.113806394327e-01 >> >> > > 1 KSP Residual norm 1.535465290944e-01 >> >> > > Residual norms for fieldsplit_wp_ solve. >> >> > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > 824 KSP preconditioned resid norm 1.018542387746e-09 true resid >> norm >> >> 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 >> >> > > Residual norms for fieldsplit_u_ solve. 
>> >> > > 0 KSP Residual norm 6.123437055586e-01 >> >> > > 1 KSP Residual norm 1.524661826133e-01 >> >> > > Residual norms for fieldsplit_wp_ solve. >> >> > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > 825 KSP preconditioned resid norm 9.743727947718e-10 true resid >> norm >> >> 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 >> >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >> >> > > >> >> > > >> >> > > The residual for wp block is zero since in this first step the rhs >> is >> >> zero. As can see in the output, the multigrid does not perform well to >> >> reduce the residual in the sub-solve. Is my observation right? what >> can be >> >> done to improve this? >> >> > > >> >> > > >> >> > > Giang >> >> > > >> >> > > On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith >> >> wrote: >> >> > > >> >> > > This can happen in the matrix is singular or nearly singular or >> if >> >> the factorization generates small pivots, which can occur for even >> >> nonsingular problems if the matrix is poorly scaled or just plain >> nasty. >> >> > > >> >> > > >> >> > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui > > >> >> wrote: >> >> > > > >> >> > > > It took a while, here I send you the output >> >> > > > >> >> > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid >> norm >> >> 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid >> norm >> >> 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 >> >> > > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid >> norm >> >> 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 >> >> > > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid >> norm >> >> 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 >> >> > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >> >> > > > KSP Object: 4 MPI processes >> >> > > > type: gmres >> >> > > > GMRES: restart=1000, using Modified Gram-Schmidt >> >> Orthogonalization >> >> > > > GMRES: happy breakdown tolerance 1e-30 >> >> > > > maximum iterations=1000, initial guess is zero >> >> > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >> >> > > > left preconditioning >> >> > > > using PRECONDITIONED norm type for convergence test >> >> > > > PC Object: 4 MPI processes >> >> > > > type: lu >> >> > > > LU: out-of-place factorization >> >> > > > tolerance for zero pivot 2.22045e-14 >> >> > > > matrix ordering: natural >> >> > > > factor fill ratio given 0, needed 0 >> >> > > > Factored matrix follows: >> >> > > > Mat Object: 4 MPI processes >> >> > > > type: mpiaij >> >> > > > rows=973051, cols=973051 >> >> > > > package used to perform factorization: pastix >> >> > > > Error : 3.24786e-14 >> >> > > > total: nonzeros=0, allocated nonzeros=0 >> >> > > > total number of mallocs used during MatSetValues calls >> =0 >> >> > > > PaStiX run parameters: >> >> > > > Matrix type : Unsymmetric >> >> > > > Level of printing (0,1,2): 0 >> >> > > > Number of refinements iterations : 3 >> >> > > > Error : 3.24786e-14 >> >> > > > linear system matrix = precond matrix: >> >> > > > Mat Object: 4 MPI processes >> >> > > > type: mpiaij >> >> > > > rows=973051, cols=973051 >> >> > > > Error : 3.24786e-14 >> >> > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >> >> > > > total number of mallocs used during MatSetValues calls =0 >> >> > > > using I-node (on process 0) routines: found 78749 nodes, >> limit >> >> used is 5 >> >> > > > Error : 3.24786e-14 >> >> > > > >> >> 
> > > It doesn't do as you said. Something is not right here. I will >> look >> >> in depth. >> >> > > > >> >> > > > Giang >> >> > > > >> >> > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith > > >> >> wrote: >> >> > > > >> >> > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui < >> hgbk2008 at gmail.com> >> >> wrote: >> >> > > > > >> >> > > > > Good catch. I get this for the very first step, maybe at that >> time >> >> the rhs_w is zero. >> >> > > > >> >> > > > With the multiplicative composition the right hand side of >> the >> >> second solve is the initial right hand side of the second solve minus >> >> A_10*x where x is the solution to the first sub solve and A_10 is the >> lower >> >> left block of the outer matrix. So unless both the initial right hand >> side >> >> has a zero for the second block and A_10 is identically zero the right >> hand >> >> side for the second sub solve should not be zero. Is A_10 == 0? >> >> > > > >> >> > > > >> >> > > > > In the later step, it shows 2 step convergence >> >> > > > > >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 3.165886479830e+04 >> >> > > > > 1 KSP Residual norm 2.905922877684e-01 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 2.397669419027e-01 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid >> >> norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 9.999891813771e-01 >> >> > > > > 1 KSP Residual norm 1.512000395579e-05 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 8.192702188243e-06 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid >> >> norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 >> >> > > > >> >> > > > The outer residual norms are still wonky, the preconditioned >> >> residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which >> is a >> >> huge drop but the 7.963616922323e+05 drops very much less >> >> 7.135927677844e+04. This is not normal. >> >> > > > >> >> > > > What if you just use -pc_type lu for the entire system (no >> >> fieldsplit), does the true residual drop to almost zero in the first >> >> iteration (as it should?). Send the output. >> >> > > > >> >> > > > >> >> > > > >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 6.946213936597e-01 >> >> > > > > 1 KSP Residual norm 1.195514007343e-05 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 1.025694497535e+00 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid >> >> norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 7.255149996405e-01 >> >> > > > > 1 KSP Residual norm 6.583512434218e-06 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 1.015229700337e+00 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid >> >> norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 >> >> > > > > Residual norms for fieldsplit_u_ solve. 
>> >> > > > > 0 KSP Residual norm 3.512243341400e-01 >> >> > > > > 1 KSP Residual norm 2.032490351200e-06 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 1.282327290982e+00 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid >> >> norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 3.423609338053e-01 >> >> > > > > 1 KSP Residual norm 4.213703301972e-07 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 1.157384757538e+00 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid >> >> norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 3.838596289995e-01 >> >> > > > > 1 KSP Residual norm 9.927864176103e-08 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 1.066298905618e+00 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid >> >> norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 4.624964188094e-01 >> >> > > > > 1 KSP Residual norm 6.418229775372e-08 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 9.800784311614e-01 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid >> >> norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 >> >> > > > > Linear solve converged due to CONVERGED_ATOL iterations 7 >> >> > > > > >> >> > > > > The outer operator is an explicit matrix. >> >> > > > > >> >> > > > > Giang >> >> > > > > >> >> > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith < >> bsmith at mcs.anl.gov> >> >> wrote: >> >> > > > > >> >> > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui < >> hgbk2008 at gmail.com> >> >> wrote: >> >> > > > > > >> >> > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better >> >> convergence. I still used 4 procs though, probably with 1 proc it >> should >> >> also be the same. >> >> > > > > > >> >> > > > > > The u block used a Nitsche-type operator to connect two >> >> non-matching domains. I don't think it will leave some rigid body >> motion >> >> leads to not sufficient constraints. Maybe you have other idea? >> >> > > > > > >> >> > > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > > 0 KSP Residual norm 3.129067184300e+05 >> >> > > > > > 1 KSP Residual norm 5.906261468196e-01 >> >> > > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > > > >> >> > > > > ^^^^ something is wrong here. The sub solve should not be >> >> starting with a 0 residual (this means the right hand side for this sub >> >> solve is zero which it should not be). >> >> > > > > >> >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 >> >> > > > > >> >> > > > > >> >> > > > > How are you providing the outer operator? As an explicit >> matrix >> >> or with some shell matrix? 
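Regarding Barry's question about how the outer operator is provided: when the operator is an assembled (explicit) matrix, splits that -ksp_view reports as "Defined by IS" are typically registered along the following lines. This is only an illustrative sketch, with A, b, x, isu and iswp assumed to be the application's assembled matrix, vectors and index sets; it is not code from this thread:

  #include <petscksp.h>
  /* Fragment assumed to sit inside the application's solve routine, after
     PetscInitialize(). Assumed to exist already: Mat A (the assembled MPIAIJ
     outer operator), Vec b, x, and index sets isu, iswp holding the global
     dof numbers of the u and wp blocks. Error checking omitted for brevity. */
  KSP ksp;
  PC  pc;
  KSPCreate(PETSC_COMM_WORLD,&ksp);
  KSPSetOperators(ksp,A,A);
  KSPGetPC(ksp,&pc);
  PCSetType(pc,PCFIELDSPLIT);
  PCFieldSplitSetIS(pc,"u",isu);    /* creates the -fieldsplit_u_  option prefix */
  PCFieldSplitSetIS(pc,"wp",iswp);  /* creates the -fieldsplit_wp_ option prefix */
  KSPSetFromOptions(ksp);           /* picks up -pc_fieldsplit_type and sub-solver options */
  KSPSolve(ksp,b,x);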
>> >> > > > > >> >> > > > > >> >> > > > > >> >> > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true >> resid >> >> norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > > 0 KSP Residual norm 9.999955993437e-01 >> >> > > > > > 1 KSP Residual norm 4.019774691831e-06 >> >> > > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true >> resid >> >> norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 >> >> > > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > > 0 KSP Residual norm 1.000012180204e+00 >> >> > > > > > 1 KSP Residual norm 1.017367950422e-05 >> >> > > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true >> resid >> >> norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 >> >> > > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > > 0 KSP Residual norm 1.000004200085e+00 >> >> > > > > > 1 KSP Residual norm 6.231613102458e-06 >> >> > > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true >> resid >> >> norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 >> >> > > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >> >> > > > > > KSP Object: 4 MPI processes >> >> > > > > > type: gmres >> >> > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >> >> Orthogonalization >> >> > > > > > GMRES: happy breakdown tolerance 1e-30 >> >> > > > > > maximum iterations=1000, initial guess is zero >> >> > > > > > tolerances: relative=1e-20, absolute=1e-09, >> divergence=10000 >> >> > > > > > left preconditioning >> >> > > > > > using PRECONDITIONED norm type for convergence test >> >> > > > > > PC Object: 4 MPI processes >> >> > > > > > type: fieldsplit >> >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits >> = 2 >> >> > > > > > Solver info for each split is in the following KSP >> objects: >> >> > > > > > Split number 0 Defined by IS >> >> > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >> >> > > > > > type: richardson >> >> > > > > > Richardson: damping factor=1 >> >> > > > > > maximum iterations=1, initial guess is zero >> >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >> >> divergence=10000 >> >> > > > > > left preconditioning >> >> > > > > > using PRECONDITIONED norm type for convergence test >> >> > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >> >> > > > > > type: lu >> >> > > > > > LU: out-of-place factorization >> >> > > > > > tolerance for zero pivot 2.22045e-14 >> >> > > > > > matrix ordering: natural >> >> > > > > > factor fill ratio given 0, needed 0 >> >> > > > > > Factored matrix follows: >> >> > > > > > Mat Object: 4 MPI processes >> >> > > > > > type: mpiaij >> >> > > > > > rows=938910, cols=938910 >> >> > > > > > package used to perform factorization: pastix >> >> > > > > > total: nonzeros=0, allocated nonzeros=0 >> >> > > > > > Error : 3.36878e-14 >> >> > > > > > total number of mallocs used during MatSetValues >> calls >> >> =0 >> >> > > > > > PaStiX run parameters: >> >> > > > > > Matrix type : >> Unsymmetric >> >> > > > > > Level of printing (0,1,2): 0 >> >> > > > > > Number of refinements iterations : 3 >> >> > > > > > Error : 3.36878e-14 >> >> > 
> > > > linear system matrix = precond matrix: >> >> > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes >> >> > > > > > type: mpiaij >> >> > > > > > rows=938910, cols=938910, bs=3 >> >> > > > > > Error : 3.36878e-14 >> >> > > > > > Error : 3.36878e-14 >> >> > > > > > total: nonzeros=8.60906e+07, allocated >> >> nonzeros=8.60906e+07 >> >> > > > > > total number of mallocs used during MatSetValues >> calls =0 >> >> > > > > > using I-node (on process 0) routines: found 78749 >> >> nodes, limit used is 5 >> >> > > > > > Split number 1 Defined by IS >> >> > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >> >> > > > > > type: richardson >> >> > > > > > Richardson: damping factor=1 >> >> > > > > > maximum iterations=1, initial guess is zero >> >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >> >> divergence=10000 >> >> > > > > > left preconditioning >> >> > > > > > using PRECONDITIONED norm type for convergence test >> >> > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >> >> > > > > > type: lu >> >> > > > > > LU: out-of-place factorization >> >> > > > > > tolerance for zero pivot 2.22045e-14 >> >> > > > > > matrix ordering: natural >> >> > > > > > factor fill ratio given 0, needed 0 >> >> > > > > > Factored matrix follows: >> >> > > > > > Mat Object: 4 MPI processes >> >> > > > > > type: mpiaij >> >> > > > > > rows=34141, cols=34141 >> >> > > > > > package used to perform factorization: pastix >> >> > > > > > Error : -nan >> >> > > > > > Error : -nan >> >> > > > > > Error : -nan >> >> > > > > > total: nonzeros=0, allocated nonzeros=0 >> >> > > > > > total number of mallocs used during >> MatSetValues >> >> calls =0 >> >> > > > > > PaStiX run parameters: >> >> > > > > > Matrix type : >> Symmetric >> >> > > > > > Level of printing (0,1,2): 0 >> >> > > > > > Number of refinements iterations : 0 >> >> > > > > > Error : -nan >> >> > > > > > linear system matrix = precond matrix: >> >> > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes >> >> > > > > > type: mpiaij >> >> > > > > > rows=34141, cols=34141 >> >> > > > > > total: nonzeros=485655, allocated nonzeros=485655 >> >> > > > > > total number of mallocs used during MatSetValues >> calls =0 >> >> > > > > > not using I-node (on process 0) routines >> >> > > > > > linear system matrix = precond matrix: >> >> > > > > > Mat Object: 4 MPI processes >> >> > > > > > type: mpiaij >> >> > > > > > rows=973051, cols=973051 >> >> > > > > > total: nonzeros=9.90037e+07, allocated >> nonzeros=9.90037e+07 >> >> > > > > > total number of mallocs used during MatSetValues calls =0 >> >> > > > > > using I-node (on process 0) routines: found 78749 >> nodes, >> >> limit used is 5 >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > Giang >> >> > > > > > >> >> > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith < >> >> bsmith at mcs.anl.gov> wrote: >> >> > > > > > >> >> > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui < >> >> hgbk2008 at gmail.com> wrote: >> >> > > > > > > >> >> > > > > > > Dear Matt/Barry >> >> > > > > > > >> >> > > > > > > With your options, it results in >> >> > > > > > > >> >> > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true >> >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > > > 0 KSP Residual norm 2.407308987203e+36 >> >> > > > > > > 1 KSP Residual norm 5.797185652683e+72 >> >> > > > > > >> >> > > > > > It looks like Matt is right, hypre is seemly producing >> useless >> >> garbage. 
>> >> > > > > > >> >> > > > > > First how do things run on one process. If you have similar >> >> problems then debug on one process (debugging any kind of problem is >> always >> >> far easy on one process). >> >> > > > > > >> >> > > > > > First run with -fieldsplit_u_type lu (instead of using >> hypre) to >> >> see if that works or also produces something bad. >> >> > > > > > >> >> > > > > > What is the operator and the boundary conditions for u? It >> could >> >> be singular. >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > > > > > ... >> >> > > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true >> >> resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 >> >> > > > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > > > 0 KSP Residual norm 1.533726746719e+36 >> >> > > > > > > 1 KSP Residual norm 3.692757392261e+72 >> >> > > > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > > > > > >> >> > > > > > > Do you suggest that the pastix solver for the "wp" block >> >> encounters small pivot? In addition, seem like the "u" block is also >> >> singular. >> >> > > > > > > >> >> > > > > > > Giang >> >> > > > > > > >> >> > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith < >> >> bsmith at mcs.anl.gov> wrote: >> >> > > > > > > >> >> > > > > > > Huge preconditioned norms but normal unpreconditioned >> norms >> >> almost always come from a very small pivot in an LU or ILU >> factorization. >> >> > > > > > > >> >> > > > > > > The first thing to do is monitor the two sub solves. Run >> >> with the additional options -fieldsplit_u_ksp_type richardson >> >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >> >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >> >> -fieldsplit_wp_ksp_max_it 1 >> >> > > > > > > >> >> > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < >> >> hgbk2008 at gmail.com> wrote: >> >> > > > > > > > >> >> > > > > > > > Hello >> >> > > > > > > > >> >> > > > > > > > I encountered a strange convergence behavior that I have >> >> trouble to understand >> >> > > > > > > > >> >> > > > > > > > KSPSetFromOptions completed >> >> > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true >> >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true >> >> resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 >> >> > > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true >> >> resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 >> >> > > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true >> >> resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 >> >> > > > > > > > ..... 
>> >> > > > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true >> >> resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 >> >> > > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 >> true >> >> resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 >> >> > > > > > > > Linear solve did not converge due to DIVERGED_ITS >> iterations >> >> 1000 >> >> > > > > > > > KSP Object: 4 MPI processes >> >> > > > > > > > type: gmres >> >> > > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >> >> Orthogonalization >> >> > > > > > > > GMRES: happy breakdown tolerance 1e-30 >> >> > > > > > > > maximum iterations=1000, initial guess is zero >> >> > > > > > > > tolerances: relative=1e-20, absolute=1e-09, >> >> divergence=10000 >> >> > > > > > > > left preconditioning >> >> > > > > > > > using PRECONDITIONED norm type for convergence test >> >> > > > > > > > PC Object: 4 MPI processes >> >> > > > > > > > type: fieldsplit >> >> > > > > > > > FieldSplit with MULTIPLICATIVE composition: total >> splits >> >> = 2 >> >> > > > > > > > Solver info for each split is in the following KSP >> >> objects: >> >> > > > > > > > Split number 0 Defined by IS >> >> > > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >> >> > > > > > > > type: preonly >> >> > > > > > > > maximum iterations=10000, initial guess is zero >> >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >> >> divergence=10000 >> >> > > > > > > > left preconditioning >> >> > > > > > > > using NONE norm type for convergence test >> >> > > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >> >> > > > > > > > type: hypre >> >> > > > > > > > HYPRE BoomerAMG preconditioning >> >> > > > > > > > HYPRE BoomerAMG: Cycle type V >> >> > > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 >> >> > > > > > > > HYPRE BoomerAMG: Maximum number of iterations PER >> >> hypre call 1 >> >> > > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre >> >> call 0 >> >> > > > > > > > HYPRE BoomerAMG: Threshold for strong coupling >> 0.6 >> >> > > > > > > > HYPRE BoomerAMG: Interpolation truncation factor >> 0 >> >> > > > > > > > HYPRE BoomerAMG: Interpolation: max elements per >> row >> >> 0 >> >> > > > > > > > HYPRE BoomerAMG: Number of levels of aggressive >> >> coarsening 0 >> >> > > > > > > > HYPRE BoomerAMG: Number of paths for aggressive >> >> coarsening 1 >> >> > > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 >> >> > > > > > > > HYPRE BoomerAMG: Sweeps down 1 >> >> > > > > > > > HYPRE BoomerAMG: Sweeps up 1 >> >> > > > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 >> >> > > > > > > > HYPRE BoomerAMG: Relax down >> >> symmetric-SOR/Jacobi >> >> > > > > > > > HYPRE BoomerAMG: Relax up >> >> symmetric-SOR/Jacobi >> >> > > > > > > > HYPRE BoomerAMG: Relax on coarse >> >> Gaussian-elimination >> >> > > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 >> >> > > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 >> >> > > > > > > > HYPRE BoomerAMG: Using CF-relaxation >> >> > > > > > > > HYPRE BoomerAMG: Measure type local >> >> > > > > > > > HYPRE BoomerAMG: Coarsen type PMIS >> >> > > > > > > > HYPRE BoomerAMG: Interpolation type classical >> >> > > > > > > > linear system matrix = precond matrix: >> >> > > > > > > > Mat Object: (fieldsplit_u_) 4 MPI >> processes >> >> > > > > > > > type: mpiaij >> >> > > > > > > > rows=938910, cols=938910, bs=3 >> >> > > > > > > > total: nonzeros=8.60906e+07, allocated >> >> nonzeros=8.60906e+07 >> >> > > > > > > > total number of mallocs used during 
MatSetValues >> >> calls =0 >> >> > > > > > > > using I-node (on process 0) routines: found >> 78749 >> >> nodes, limit used is 5 >> >> > > > > > > > Split number 1 Defined by IS >> >> > > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >> >> > > > > > > > type: preonly >> >> > > > > > > > maximum iterations=10000, initial guess is zero >> >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >> >> divergence=10000 >> >> > > > > > > > left preconditioning >> >> > > > > > > > using NONE norm type for convergence test >> >> > > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >> >> > > > > > > > type: lu >> >> > > > > > > > LU: out-of-place factorization >> >> > > > > > > > tolerance for zero pivot 2.22045e-14 >> >> > > > > > > > matrix ordering: natural >> >> > > > > > > > factor fill ratio given 0, needed 0 >> >> > > > > > > > Factored matrix follows: >> >> > > > > > > > Mat Object: 4 MPI processes >> >> > > > > > > > type: mpiaij >> >> > > > > > > > rows=34141, cols=34141 >> >> > > > > > > > package used to perform factorization: >> pastix >> >> > > > > > > > Error : -nan >> >> > > > > > > > Error : -nan >> >> > > > > > > > total: nonzeros=0, allocated nonzeros=0 >> >> > > > > > > > Error : -nan >> >> > > > > > > > total number of mallocs used during MatSetValues >> calls =0 >> >> > > > > > > > PaStiX run parameters: >> >> > > > > > > > Matrix type : >> >> Symmetric >> >> > > > > > > > Level of printing (0,1,2): 0 >> >> > > > > > > > Number of refinements iterations : 0 >> >> > > > > > > > Error : -nan >> >> > > > > > > > linear system matrix = precond matrix: >> >> > > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI >> processes >> >> > > > > > > > type: mpiaij >> >> > > > > > > > rows=34141, cols=34141 >> >> > > > > > > > total: nonzeros=485655, allocated nonzeros=485655 >> >> > > > > > > > total number of mallocs used during MatSetValues >> >> calls =0 >> >> > > > > > > > not using I-node (on process 0) routines >> >> > > > > > > > linear system matrix = precond matrix: >> >> > > > > > > > Mat Object: 4 MPI processes >> >> > > > > > > > type: mpiaij >> >> > > > > > > > rows=973051, cols=973051 >> >> > > > > > > > total: nonzeros=9.90037e+07, allocated >> >> nonzeros=9.90037e+07 >> >> > > > > > > > total number of mallocs used during MatSetValues >> calls =0 >> >> > > > > > > > using I-node (on process 0) routines: found 78749 >> >> nodes, limit used is 5 >> >> > > > > > > > >> >> > > > > > > > The pattern of convergence gives a hint that this system >> is >> >> somehow bad/singular. But I don't know why the preconditioned error >> goes up >> >> too high. Anyone has an idea? >> >> > > > > > > > >> >> > > > > > > > Best regards >> >> > > > > > > > Giang Bui >> >> > > > > > > > >> >> > > > > > > >> >> > > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > >> >> > > > > >> >> > > > >> >> > > > >> >> > > >> >> > > >> >> > >> >> > >> >> >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Lukasz.Kaczmarczyk at glasgow.ac.uk Wed May 3 02:53:46 2017 From: Lukasz.Kaczmarczyk at glasgow.ac.uk (Lukasz Kaczmarczyk) Date: Wed, 3 May 2017 07:53:46 +0000 Subject: [petsc-users] strange convergence In-Reply-To: References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> <87wpa3wd5j.fsf@jedbrown.org> Message-ID: <80368283-C55F-49AD-B986-83AD0CD72338@glasgow.ac.uk> On 3 May 2017, at 08:29, Hoang Giang Bui > wrote: Dear Jed If I understood you correctly you suggest to avoid penalty by using the Lagrange multiplier for the mortar constraint? In this case it leads to the use of discrete Lagrange multiplier space. Do you or anyone already have experience using discrete Lagrange multiplier space with Petsc? There is also similar question on stackexchange https://scicomp.stackexchange.com/questions/25113/preconditioners-and-discrete-lagrange-multipliers Hello, FIELDSPLIT solver can help with this. We apply this for slightly different problem, but with Lagrange multipliers, see this http://mofem.eng.gla.ac.uk/mofem/html/cell__forces_8cpp.html we working as well on mortar contact, but at development stage we use LU https://doi.org/10.5281/zenodo.439739 Hope that will be somehow helpful, Lukasz Giang On Sat, Apr 29, 2017 at 3:34 PM, Jed Brown > wrote: Hoang Giang Bui > writes: > Hi Barry > > The first block is from a standard solid mechanics discretization based on > balance of momentum equation. There is some material involved but in > principal it's well-posed elasticity equation with positive definite > tangent operator. The "gluing business" uses the mortar method to keep the > continuity of displacement. Instead of using Lagrange multiplier to treat > the constraint I used penalty method to penalize the energy. The > discretization form of mortar is quite simple > > \int_{\Gamma_1} { rho * (\delta u_1 - \delta u_2) * (u_1 - u_2) dA } > > rho is penalty parameter. In the simulation I initially set it low (~E) to > preserve the conditioning of the system. There are two things that can go wrong here with AMG: * The penalty term can mess up the strength of connection heuristics such that you get poor choice of C-points (classical AMG like BoomerAMG) or poor choice of aggregates (smoothed aggregation). * The penalty term can prevent Jacobi smoothing from being effective; in this case, it can lead to poor coarse basis functions (higher energy than they should be) and poor smoothing in an MG cycle. You can fix the poor smoothing in the MG cycle by using a stronger smoother, like ASM with some overlap. I'm generally not a fan of penalty methods due to the irritating tradeoffs and often poor solver performance. > In the figure below, the colorful blocks are u_1 and the base is u_2. Both > u_1 and u_2 use isoparametric quadratic approximation. > > ? > Snapshot.png > > ??? > > Giang > > On Fri, Apr 28, 2017 at 6:21 PM, Barry Smith > wrote: > >> >> Ok, so boomerAMG algebraic multigrid is not good for the first block. >> You mentioned the first block has two things glued together? AMG is >> fantastic for certain problems but doesn't work for everything. >> >> Tell us more about the first block, what PDE it comes from, what >> discretization, and what the "gluing business" is and maybe we'll have >> suggestions for how to precondition it. 
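One concrete way to try the stronger (ASM with overlap) smoother Jed suggests above for the u block is to switch that split from hypre to PETSc's native GAMG, whose level smoothers are ordinary KSP/PC objects selectable from the options database. The options below are only a sketch of that experiment, not settings used anywhere in this thread:

  -fieldsplit_u_pc_type gamg
  -fieldsplit_u_mg_levels_ksp_type richardson
  -fieldsplit_u_mg_levels_ksp_max_it 1
  -fieldsplit_u_mg_levels_pc_type asm
  -fieldsplit_u_mg_levels_pc_asm_overlap 1
  -fieldsplit_u_mg_levels_sub_pc_type ilu

If the penalty rows are also polluting the strength-of-connection heuristics, the coarsening threshold (-fieldsplit_u_pc_gamg_threshold) is another knob that can be experimented with.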
>> >> Barry >> >> > On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui > wrote: >> > >> > It's in fact quite good >> > >> > Residual norms for fieldsplit_u_ solve. >> > 0 KSP Residual norm 4.014715925568e+00 >> > 1 KSP Residual norm 2.160497019264e-10 >> > Residual norms for fieldsplit_wp_ solve. >> > 0 KSP Residual norm 0.000000000000e+00 >> > 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > Residual norms for fieldsplit_u_ solve. >> > 0 KSP Residual norm 9.999999999416e-01 >> > 1 KSP Residual norm 7.118380416383e-11 >> > Residual norms for fieldsplit_wp_ solve. >> > 0 KSP Residual norm 0.000000000000e+00 >> > 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm >> 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 >> > Linear solve converged due to CONVERGED_ATOL iterations 1 >> > >> > Giang >> > >> > On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith > wrote: >> > >> > Run again using LU on both blocks to see what happens. >> > >> > >> > > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui > >> wrote: >> > > >> > > I have changed the way to tie the nonconforming mesh. It seems the >> matrix now is better >> > > >> > > with -pc_type lu the output is >> > > 0 KSP preconditioned resid norm 3.308678584240e-01 true resid norm >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > 1 KSP preconditioned resid norm 2.004313395301e-12 true resid norm >> 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 >> > > Linear solve converged due to CONVERGED_ATOL iterations 1 >> > > >> > > >> > > with -pc_type fieldsplit -fieldsplit_u_pc_type hypre >> -fieldsplit_wp_pc_type lu the convergence is slow >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >> > > ... >> > > 824 KSP preconditioned resid norm 1.018542387738e-09 true resid norm >> 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 >> > > 825 KSP preconditioned resid norm 9.743727947637e-10 true resid norm >> 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >> > > >> > > checking with additional -fieldsplit_u_ksp_type richardson >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >> -fieldsplit_wp_ksp_max_it 1 gives >> > > >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 5.803507549280e-01 >> > > 1 KSP Residual norm 2.069538175950e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 7.831796195225e-01 >> > > 1 KSP Residual norm 1.734608520110e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > .... >> > > 823 KSP preconditioned resid norm 1.065070135605e-09 true resid norm >> 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 >> > > Residual norms for fieldsplit_u_ solve. 
>> > > 0 KSP Residual norm 6.113806394327e-01 >> > > 1 KSP Residual norm 1.535465290944e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > 824 KSP preconditioned resid norm 1.018542387746e-09 true resid norm >> 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 6.123437055586e-01 >> > > 1 KSP Residual norm 1.524661826133e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > 825 KSP preconditioned resid norm 9.743727947718e-10 true resid norm >> 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >> > > >> > > >> > > The residual for wp block is zero since in this first step the rhs is >> zero. As can see in the output, the multigrid does not perform well to >> reduce the residual in the sub-solve. Is my observation right? what can be >> done to improve this? >> > > >> > > >> > > Giang >> > > >> > > On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith > >> wrote: >> > > >> > > This can happen in the matrix is singular or nearly singular or if >> the factorization generates small pivots, which can occur for even >> nonsingular problems if the matrix is poorly scaled or just plain nasty. >> > > >> > > >> > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui > >> wrote: >> > > > >> > > > It took a while, here I send you the output >> > > > >> > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm >> 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm >> 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 >> > > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm >> 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 >> > > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm >> 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 >> > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >> > > > KSP Object: 4 MPI processes >> > > > type: gmres >> > > > GMRES: restart=1000, using Modified Gram-Schmidt >> Orthogonalization >> > > > GMRES: happy breakdown tolerance 1e-30 >> > > > maximum iterations=1000, initial guess is zero >> > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >> > > > left preconditioning >> > > > using PRECONDITIONED norm type for convergence test >> > > > PC Object: 4 MPI processes >> > > > type: lu >> > > > LU: out-of-place factorization >> > > > tolerance for zero pivot 2.22045e-14 >> > > > matrix ordering: natural >> > > > factor fill ratio given 0, needed 0 >> > > > Factored matrix follows: >> > > > Mat Object: 4 MPI processes >> > > > type: mpiaij >> > > > rows=973051, cols=973051 >> > > > package used to perform factorization: pastix >> > > > Error : 3.24786e-14 >> > > > total: nonzeros=0, allocated nonzeros=0 >> > > > total number of mallocs used during MatSetValues calls =0 >> > > > PaStiX run parameters: >> > > > Matrix type : Unsymmetric >> > > > Level of printing (0,1,2): 0 >> > > > Number of refinements iterations : 3 >> > > > Error : 3.24786e-14 >> > > > linear system matrix = precond matrix: >> > > > Mat Object: 4 MPI processes >> > > > type: mpiaij >> > > > rows=973051, cols=973051 >> > > > Error : 3.24786e-14 >> > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >> > > > total number of mallocs used during MatSetValues calls =0 >> > > > using I-node (on 
process 0) routines: found 78749 nodes, limit >> used is 5 >> > > > Error : 3.24786e-14 >> > > > >> > > > It doesn't do as you said. Something is not right here. I will look >> in depth. >> > > > >> > > > Giang >> > > > >> > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith > >> wrote: >> > > > >> > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui > >> wrote: >> > > > > >> > > > > Good catch. I get this for the very first step, maybe at that time >> the rhs_w is zero. >> > > > >> > > > With the multiplicative composition the right hand side of the >> second solve is the initial right hand side of the second solve minus >> A_10*x where x is the solution to the first sub solve and A_10 is the lower >> left block of the outer matrix. So unless both the initial right hand side >> has a zero for the second block and A_10 is identically zero the right hand >> side for the second sub solve should not be zero. Is A_10 == 0? >> > > > >> > > > >> > > > > In the later step, it shows 2 step convergence >> > > > > >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.165886479830e+04 >> > > > > 1 KSP Residual norm 2.905922877684e-01 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 2.397669419027e-01 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid >> norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 9.999891813771e-01 >> > > > > 1 KSP Residual norm 1.512000395579e-05 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 8.192702188243e-06 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid >> norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 >> > > > >> > > > The outer residual norms are still wonky, the preconditioned >> residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which is a >> huge drop but the 7.963616922323e+05 drops very much less >> 7.135927677844e+04. This is not normal. >> > > > >> > > > What if you just use -pc_type lu for the entire system (no >> fieldsplit), does the true residual drop to almost zero in the first >> iteration (as it should?). Send the output. >> > > > >> > > > >> > > > >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 6.946213936597e-01 >> > > > > 1 KSP Residual norm 1.195514007343e-05 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.025694497535e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid >> norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 7.255149996405e-01 >> > > > > 1 KSP Residual norm 6.583512434218e-06 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.015229700337e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid >> norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.512243341400e-01 >> > > > > 1 KSP Residual norm 2.032490351200e-06 >> > > > > Residual norms for fieldsplit_wp_ solve. 
>> > > > > 0 KSP Residual norm 1.282327290982e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid >> norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.423609338053e-01 >> > > > > 1 KSP Residual norm 4.213703301972e-07 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.157384757538e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid >> norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.838596289995e-01 >> > > > > 1 KSP Residual norm 9.927864176103e-08 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.066298905618e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid >> norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 4.624964188094e-01 >> > > > > 1 KSP Residual norm 6.418229775372e-08 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 9.800784311614e-01 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid >> norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 >> > > > > Linear solve converged due to CONVERGED_ATOL iterations 7 >> > > > > >> > > > > The outer operator is an explicit matrix. >> > > > > >> > > > > Giang >> > > > > >> > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith > >> wrote: >> > > > > >> > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui > >> wrote: >> > > > > > >> > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better >> convergence. I still used 4 procs though, probably with 1 proc it should >> also be the same. >> > > > > > >> > > > > > The u block used a Nitsche-type operator to connect two >> non-matching domains. I don't think it will leave some rigid body motion >> leads to not sufficient constraints. Maybe you have other idea? >> > > > > > >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 3.129067184300e+05 >> > > > > > 1 KSP Residual norm 5.906261468196e-01 >> > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > >> > > > > ^^^^ something is wrong here. The sub solve should not be >> starting with a 0 residual (this means the right hand side for this sub >> solve is zero which it should not be). >> > > > > >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 >> > > > > >> > > > > >> > > > > How are you providing the outer operator? As an explicit matrix >> or with some shell matrix? >> > > > > >> > > > > >> > > > > >> > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid >> norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 9.999955993437e-01 >> > > > > > 1 KSP Residual norm 4.019774691831e-06 >> > > > > > Residual norms for fieldsplit_wp_ solve. 
>> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid >> norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 1.000012180204e+00 >> > > > > > 1 KSP Residual norm 1.017367950422e-05 >> > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true resid >> norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 1.000004200085e+00 >> > > > > > 1 KSP Residual norm 6.231613102458e-06 >> > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid >> norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 >> > > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >> > > > > > KSP Object: 4 MPI processes >> > > > > > type: gmres >> > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >> Orthogonalization >> > > > > > GMRES: happy breakdown tolerance 1e-30 >> > > > > > maximum iterations=1000, initial guess is zero >> > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >> > > > > > left preconditioning >> > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > PC Object: 4 MPI processes >> > > > > > type: fieldsplit >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 >> > > > > > Solver info for each split is in the following KSP objects: >> > > > > > Split number 0 Defined by IS >> > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >> > > > > > type: richardson >> > > > > > Richardson: damping factor=1 >> > > > > > maximum iterations=1, initial guess is zero >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > left preconditioning >> > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >> > > > > > type: lu >> > > > > > LU: out-of-place factorization >> > > > > > tolerance for zero pivot 2.22045e-14 >> > > > > > matrix ordering: natural >> > > > > > factor fill ratio given 0, needed 0 >> > > > > > Factored matrix follows: >> > > > > > Mat Object: 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=938910, cols=938910 >> > > > > > package used to perform factorization: pastix >> > > > > > total: nonzeros=0, allocated nonzeros=0 >> > > > > > Error : 3.36878e-14 >> > > > > > total number of mallocs used during MatSetValues calls >> =0 >> > > > > > PaStiX run parameters: >> > > > > > Matrix type : Unsymmetric >> > > > > > Level of printing (0,1,2): 0 >> > > > > > Number of refinements iterations : 3 >> > > > > > Error : 3.36878e-14 >> > > > > > linear system matrix = precond matrix: >> > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=938910, cols=938910, bs=3 >> > > > > > Error : 3.36878e-14 >> > > > > > Error : 3.36878e-14 >> > > > > > total: nonzeros=8.60906e+07, allocated >> nonzeros=8.60906e+07 >> > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > using I-node (on process 0) routines: found 78749 >> nodes, limit used is 5 >> > > > > > Split number 1 Defined by IS >> > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > type: richardson >> > > > > 
> Richardson: damping factor=1 >> > > > > > maximum iterations=1, initial guess is zero >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > left preconditioning >> > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > type: lu >> > > > > > LU: out-of-place factorization >> > > > > > tolerance for zero pivot 2.22045e-14 >> > > > > > matrix ordering: natural >> > > > > > factor fill ratio given 0, needed 0 >> > > > > > Factored matrix follows: >> > > > > > Mat Object: 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=34141, cols=34141 >> > > > > > package used to perform factorization: pastix >> > > > > > Error : -nan >> > > > > > Error : -nan >> > > > > > Error : -nan >> > > > > > total: nonzeros=0, allocated nonzeros=0 >> > > > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > > > PaStiX run parameters: >> > > > > > Matrix type : Symmetric >> > > > > > Level of printing (0,1,2): 0 >> > > > > > Number of refinements iterations : 0 >> > > > > > Error : -nan >> > > > > > linear system matrix = precond matrix: >> > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=34141, cols=34141 >> > > > > > total: nonzeros=485655, allocated nonzeros=485655 >> > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > not using I-node (on process 0) routines >> > > > > > linear system matrix = precond matrix: >> > > > > > Mat Object: 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=973051, cols=973051 >> > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >> > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > using I-node (on process 0) routines: found 78749 nodes, >> limit used is 5 >> > > > > > >> > > > > > >> > > > > > >> > > > > > Giang >> > > > > > >> > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith < >> bsmith at mcs.anl.gov> wrote: >> > > > > > >> > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui < >> hgbk2008 at gmail.com> wrote: >> > > > > > > >> > > > > > > Dear Matt/Barry >> > > > > > > >> > > > > > > With your options, it results in >> > > > > > > >> > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > > 0 KSP Residual norm 2.407308987203e+36 >> > > > > > > 1 KSP Residual norm 5.797185652683e+72 >> > > > > > >> > > > > > It looks like Matt is right, hypre is seemly producing useless >> garbage. >> > > > > > >> > > > > > First how do things run on one process. If you have similar >> problems then debug on one process (debugging any kind of problem is always >> far easy on one process). >> > > > > > >> > > > > > First run with -fieldsplit_u_type lu (instead of using hypre) to >> see if that works or also produces something bad. >> > > > > > >> > > > > > What is the operator and the boundary conditions for u? It could >> be singular. >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > > ... >> > > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true >> resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 >> > > > > > > Residual norms for fieldsplit_u_ solve. 
>> > > > > > > 0 KSP Residual norm 1.533726746719e+36 >> > > > > > > 1 KSP Residual norm 3.692757392261e+72 >> > > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > > >> > > > > > > Do you suggest that the pastix solver for the "wp" block >> encounters small pivot? In addition, seem like the "u" block is also >> singular. >> > > > > > > >> > > > > > > Giang >> > > > > > > >> > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith < >> bsmith at mcs.anl.gov> wrote: >> > > > > > > >> > > > > > > Huge preconditioned norms but normal unpreconditioned norms >> almost always come from a very small pivot in an LU or ILU factorization. >> > > > > > > >> > > > > > > The first thing to do is monitor the two sub solves. Run >> with the additional options -fieldsplit_u_ksp_type richardson >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >> -fieldsplit_wp_ksp_max_it 1 >> > > > > > > >> > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < >> hgbk2008 at gmail.com> wrote: >> > > > > > > > >> > > > > > > > Hello >> > > > > > > > >> > > > > > > > I encountered a strange convergence behavior that I have >> trouble to understand >> > > > > > > > >> > > > > > > > KSPSetFromOptions completed >> > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true >> resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 >> > > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true >> resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 >> > > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true >> resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 >> > > > > > > > ..... 
>> > > > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true >> resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 >> > > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true >> resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 >> > > > > > > > Linear solve did not converge due to DIVERGED_ITS iterations >> 1000 >> > > > > > > > KSP Object: 4 MPI processes >> > > > > > > > type: gmres >> > > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >> Orthogonalization >> > > > > > > > GMRES: happy breakdown tolerance 1e-30 >> > > > > > > > maximum iterations=1000, initial guess is zero >> > > > > > > > tolerances: relative=1e-20, absolute=1e-09, >> divergence=10000 >> > > > > > > > left preconditioning >> > > > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > > > PC Object: 4 MPI processes >> > > > > > > > type: fieldsplit >> > > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits >> = 2 >> > > > > > > > Solver info for each split is in the following KSP >> objects: >> > > > > > > > Split number 0 Defined by IS >> > > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >> > > > > > > > type: preonly >> > > > > > > > maximum iterations=10000, initial guess is zero >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > > > left preconditioning >> > > > > > > > using NONE norm type for convergence test >> > > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >> > > > > > > > type: hypre >> > > > > > > > HYPRE BoomerAMG preconditioning >> > > > > > > > HYPRE BoomerAMG: Cycle type V >> > > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 >> > > > > > > > HYPRE BoomerAMG: Maximum number of iterations PER >> hypre call 1 >> > > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre >> call 0 >> > > > > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 >> > > > > > > > HYPRE BoomerAMG: Interpolation truncation factor 0 >> > > > > > > > HYPRE BoomerAMG: Interpolation: max elements per row >> 0 >> > > > > > > > HYPRE BoomerAMG: Number of levels of aggressive >> coarsening 0 >> > > > > > > > HYPRE BoomerAMG: Number of paths for aggressive >> coarsening 1 >> > > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 >> > > > > > > > HYPRE BoomerAMG: Sweeps down 1 >> > > > > > > > HYPRE BoomerAMG: Sweeps up 1 >> > > > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 >> > > > > > > > HYPRE BoomerAMG: Relax down >> symmetric-SOR/Jacobi >> > > > > > > > HYPRE BoomerAMG: Relax up >> symmetric-SOR/Jacobi >> > > > > > > > HYPRE BoomerAMG: Relax on coarse >> Gaussian-elimination >> > > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 >> > > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 >> > > > > > > > HYPRE BoomerAMG: Using CF-relaxation >> > > > > > > > HYPRE BoomerAMG: Measure type local >> > > > > > > > HYPRE BoomerAMG: Coarsen type PMIS >> > > > > > > > HYPRE BoomerAMG: Interpolation type classical >> > > > > > > > linear system matrix = precond matrix: >> > > > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=938910, cols=938910, bs=3 >> > > > > > > > total: nonzeros=8.60906e+07, allocated >> nonzeros=8.60906e+07 >> > > > > > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > > > > > using I-node (on process 0) routines: found 78749 >> nodes, limit used is 5 >> > > > > > > > Split number 1 Defined by IS >> > > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >> > > > > 
> > > type: preonly >> > > > > > > > maximum iterations=10000, initial guess is zero >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > > > left preconditioning >> > > > > > > > using NONE norm type for convergence test >> > > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > > > type: lu >> > > > > > > > LU: out-of-place factorization >> > > > > > > > tolerance for zero pivot 2.22045e-14 >> > > > > > > > matrix ordering: natural >> > > > > > > > factor fill ratio given 0, needed 0 >> > > > > > > > Factored matrix follows: >> > > > > > > > Mat Object: 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=34141, cols=34141 >> > > > > > > > package used to perform factorization: pastix >> > > > > > > > Error : -nan >> > > > > > > > Error : -nan >> > > > > > > > total: nonzeros=0, allocated nonzeros=0 >> > > > > > > > Error : -nan >> > > > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > > > PaStiX run parameters: >> > > > > > > > Matrix type : >> Symmetric >> > > > > > > > Level of printing (0,1,2): 0 >> > > > > > > > Number of refinements iterations : 0 >> > > > > > > > Error : -nan >> > > > > > > > linear system matrix = precond matrix: >> > > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=34141, cols=34141 >> > > > > > > > total: nonzeros=485655, allocated nonzeros=485655 >> > > > > > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > > > > > not using I-node (on process 0) routines >> > > > > > > > linear system matrix = precond matrix: >> > > > > > > > Mat Object: 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=973051, cols=973051 >> > > > > > > > total: nonzeros=9.90037e+07, allocated >> nonzeros=9.90037e+07 >> > > > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > > > using I-node (on process 0) routines: found 78749 >> nodes, limit used is 5 >> > > > > > > > >> > > > > > > > The pattern of convergence gives a hint that this system is >> somehow bad/singular. But I don't know why the preconditioned error goes up >> too high. Anyone has an idea? >> > > > > > > > >> > > > > > > > Best regards >> > > > > > > > Giang Bui >> > > > > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > > >> > > > >> > > > >> > > >> > > >> > >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 3 07:22:59 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 3 May 2017 07:22:59 -0500 Subject: [petsc-users] strange convergence In-Reply-To: References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> <87wpa3wd5j.fsf@jedbrown.org> Message-ID: On Wed, May 3, 2017 at 2:29 AM, Hoang Giang Bui wrote: > Dear Jed > > If I understood you correctly you suggest to avoid penalty by using the > Lagrange multiplier for the mortar constraint? In this case it leads to the > use of discrete Lagrange multiplier space. > Sorry for being ignorant here, but why is the space "discrete"? It looks like you should have a continuum formulation of the mortar as well. Maybe I do not understand something fundamental. 
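(Only as a generic sketch of what such a discrete multiplier space leads to; the matrices below are illustrative, not the actual operators from this problem. A mortar constraint enforced with a Lagrange multiplier ends up as a saddle point system

    [ K   B^T ] [  u     ]   [ f ]
    [ B    0  ] [ lambda ] = [ g ]

with K the elasticity block and B the discrete mortar coupling assembled from something like \int_{\Gamma_1} { \delta lambda * (u_1 - u_2) dA }. This is the kind of block system PCFIELDSPLIT can treat in Schur mode, e.g. -pc_type fieldsplit -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type full, provided a reasonable preconditioner for the Schur complement is available.)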
>From this (https://en.wikipedia.org/wiki/Mortar_methods) short description, it seems that mortars begin from a continuum formulation, but are then reduced to the discrete level. This is no problem if done consistently, as for instance in the FETI method where efficient preconditioners exist. Thanks, Matt > Do you or anyone already have experience using discrete Lagrange > multiplier space with Petsc? > > There is also similar question on stackexchange > https://scicomp.stackexchange.com/questions/25113/ > preconditioners-and-discrete-lagrange-multipliers > > Giang > > On Sat, Apr 29, 2017 at 3:34 PM, Jed Brown wrote: > >> Hoang Giang Bui writes: >> >> > Hi Barry >> > >> > The first block is from a standard solid mechanics discretization based >> on >> > balance of momentum equation. There is some material involved but in >> > principal it's well-posed elasticity equation with positive definite >> > tangent operator. The "gluing business" uses the mortar method to keep >> the >> > continuity of displacement. Instead of using Lagrange multiplier to >> treat >> > the constraint I used penalty method to penalize the energy. The >> > discretization form of mortar is quite simple >> > >> > \int_{\Gamma_1} { rho * (\delta u_1 - \delta u_2) * (u_1 - u_2) dA } >> > >> > rho is penalty parameter. In the simulation I initially set it low (~E) >> to >> > preserve the conditioning of the system. >> >> There are two things that can go wrong here with AMG: >> >> * The penalty term can mess up the strength of connection heuristics >> such that you get poor choice of C-points (classical AMG like >> BoomerAMG) or poor choice of aggregates (smoothed aggregation). >> >> * The penalty term can prevent Jacobi smoothing from being effective; in >> this case, it can lead to poor coarse basis functions (higher energy >> than they should be) and poor smoothing in an MG cycle. You can fix >> the poor smoothing in the MG cycle by using a stronger smoother, like >> ASM with some overlap. >> >> I'm generally not a fan of penalty methods due to the irritating >> tradeoffs and often poor solver performance. >> >> > In the figure below, the colorful blocks are u_1 and the base is u_2. >> Both >> > u_1 and u_2 use isoparametric quadratic approximation. >> > >> > ? >> > Snapshot.png >> > > U/view?usp=drive_web> >> > ??? >> > >> > Giang >> > >> > On Fri, Apr 28, 2017 at 6:21 PM, Barry Smith >> wrote: >> > >> >> >> >> Ok, so boomerAMG algebraic multigrid is not good for the first block. >> >> You mentioned the first block has two things glued together? AMG is >> >> fantastic for certain problems but doesn't work for everything. >> >> >> >> Tell us more about the first block, what PDE it comes from, what >> >> discretization, and what the "gluing business" is and maybe we'll have >> >> suggestions for how to precondition it. >> >> >> >> Barry >> >> >> >> > On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui >> wrote: >> >> > >> >> > It's in fact quite good >> >> > >> >> > Residual norms for fieldsplit_u_ solve. >> >> > 0 KSP Residual norm 4.014715925568e+00 >> >> > 1 KSP Residual norm 2.160497019264e-10 >> >> > Residual norms for fieldsplit_wp_ solve. >> >> > 0 KSP Residual norm 0.000000000000e+00 >> >> > 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm >> >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > Residual norms for fieldsplit_u_ solve. >> >> > 0 KSP Residual norm 9.999999999416e-01 >> >> > 1 KSP Residual norm 7.118380416383e-11 >> >> > Residual norms for fieldsplit_wp_ solve. 
>> >> > 0 KSP Residual norm 0.000000000000e+00 >> >> > 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm >> >> 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 >> >> > Linear solve converged due to CONVERGED_ATOL iterations 1 >> >> > >> >> > Giang >> >> > >> >> > On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith >> wrote: >> >> > >> >> > Run again using LU on both blocks to see what happens. >> >> > >> >> > >> >> > > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui >> >> wrote: >> >> > > >> >> > > I have changed the way to tie the nonconforming mesh. It seems the >> >> matrix now is better >> >> > > >> >> > > with -pc_type lu the output is >> >> > > 0 KSP preconditioned resid norm 3.308678584240e-01 true resid >> norm >> >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > 1 KSP preconditioned resid norm 2.004313395301e-12 true resid >> norm >> >> 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 >> >> > > Linear solve converged due to CONVERGED_ATOL iterations 1 >> >> > > >> >> > > >> >> > > with -pc_type fieldsplit -fieldsplit_u_pc_type hypre >> >> -fieldsplit_wp_pc_type lu the convergence is slow >> >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid >> norm >> >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid >> norm >> >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >> >> > > ... >> >> > > 824 KSP preconditioned resid norm 1.018542387738e-09 true resid >> norm >> >> 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 >> >> > > 825 KSP preconditioned resid norm 9.743727947637e-10 true resid >> norm >> >> 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 >> >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >> >> > > >> >> > > checking with additional -fieldsplit_u_ksp_type richardson >> >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >> >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >> >> -fieldsplit_wp_ksp_max_it 1 gives >> >> > > >> >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid >> norm >> >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > Residual norms for fieldsplit_u_ solve. >> >> > > 0 KSP Residual norm 5.803507549280e-01 >> >> > > 1 KSP Residual norm 2.069538175950e-01 >> >> > > Residual norms for fieldsplit_wp_ solve. >> >> > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid >> norm >> >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >> >> > > Residual norms for fieldsplit_u_ solve. >> >> > > 0 KSP Residual norm 7.831796195225e-01 >> >> > > 1 KSP Residual norm 1.734608520110e-01 >> >> > > Residual norms for fieldsplit_wp_ solve. >> >> > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > .... >> >> > > 823 KSP preconditioned resid norm 1.065070135605e-09 true resid >> norm >> >> 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 >> >> > > Residual norms for fieldsplit_u_ solve. >> >> > > 0 KSP Residual norm 6.113806394327e-01 >> >> > > 1 KSP Residual norm 1.535465290944e-01 >> >> > > Residual norms for fieldsplit_wp_ solve. >> >> > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > 824 KSP preconditioned resid norm 1.018542387746e-09 true resid >> norm >> >> 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 >> >> > > Residual norms for fieldsplit_u_ solve. 
>> >> > > 0 KSP Residual norm 6.123437055586e-01 >> >> > > 1 KSP Residual norm 1.524661826133e-01 >> >> > > Residual norms for fieldsplit_wp_ solve. >> >> > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > 825 KSP preconditioned resid norm 9.743727947718e-10 true resid >> norm >> >> 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 >> >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >> >> > > >> >> > > >> >> > > The residual for wp block is zero since in this first step the rhs >> is >> >> zero. As can see in the output, the multigrid does not perform well to >> >> reduce the residual in the sub-solve. Is my observation right? what >> can be >> >> done to improve this? >> >> > > >> >> > > >> >> > > Giang >> >> > > >> >> > > On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith >> >> wrote: >> >> > > >> >> > > This can happen in the matrix is singular or nearly singular or >> if >> >> the factorization generates small pivots, which can occur for even >> >> nonsingular problems if the matrix is poorly scaled or just plain >> nasty. >> >> > > >> >> > > >> >> > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui > > >> >> wrote: >> >> > > > >> >> > > > It took a while, here I send you the output >> >> > > > >> >> > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid >> norm >> >> 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid >> norm >> >> 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 >> >> > > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid >> norm >> >> 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 >> >> > > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid >> norm >> >> 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 >> >> > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >> >> > > > KSP Object: 4 MPI processes >> >> > > > type: gmres >> >> > > > GMRES: restart=1000, using Modified Gram-Schmidt >> >> Orthogonalization >> >> > > > GMRES: happy breakdown tolerance 1e-30 >> >> > > > maximum iterations=1000, initial guess is zero >> >> > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >> >> > > > left preconditioning >> >> > > > using PRECONDITIONED norm type for convergence test >> >> > > > PC Object: 4 MPI processes >> >> > > > type: lu >> >> > > > LU: out-of-place factorization >> >> > > > tolerance for zero pivot 2.22045e-14 >> >> > > > matrix ordering: natural >> >> > > > factor fill ratio given 0, needed 0 >> >> > > > Factored matrix follows: >> >> > > > Mat Object: 4 MPI processes >> >> > > > type: mpiaij >> >> > > > rows=973051, cols=973051 >> >> > > > package used to perform factorization: pastix >> >> > > > Error : 3.24786e-14 >> >> > > > total: nonzeros=0, allocated nonzeros=0 >> >> > > > total number of mallocs used during MatSetValues calls >> =0 >> >> > > > PaStiX run parameters: >> >> > > > Matrix type : Unsymmetric >> >> > > > Level of printing (0,1,2): 0 >> >> > > > Number of refinements iterations : 3 >> >> > > > Error : 3.24786e-14 >> >> > > > linear system matrix = precond matrix: >> >> > > > Mat Object: 4 MPI processes >> >> > > > type: mpiaij >> >> > > > rows=973051, cols=973051 >> >> > > > Error : 3.24786e-14 >> >> > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >> >> > > > total number of mallocs used during MatSetValues calls =0 >> >> > > > using I-node (on process 0) routines: found 78749 nodes, >> limit >> >> used is 5 >> >> > > > Error : 3.24786e-14 >> >> > > > >> >> 
> > > It doesn't do as you said. Something is not right here. I will >> look >> >> in depth. >> >> > > > >> >> > > > Giang >> >> > > > >> >> > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith > > >> >> wrote: >> >> > > > >> >> > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui < >> hgbk2008 at gmail.com> >> >> wrote: >> >> > > > > >> >> > > > > Good catch. I get this for the very first step, maybe at that >> time >> >> the rhs_w is zero. >> >> > > > >> >> > > > With the multiplicative composition the right hand side of >> the >> >> second solve is the initial right hand side of the second solve minus >> >> A_10*x where x is the solution to the first sub solve and A_10 is the >> lower >> >> left block of the outer matrix. So unless both the initial right hand >> side >> >> has a zero for the second block and A_10 is identically zero the right >> hand >> >> side for the second sub solve should not be zero. Is A_10 == 0? >> >> > > > >> >> > > > >> >> > > > > In the later step, it shows 2 step convergence >> >> > > > > >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 3.165886479830e+04 >> >> > > > > 1 KSP Residual norm 2.905922877684e-01 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 2.397669419027e-01 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid >> >> norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 9.999891813771e-01 >> >> > > > > 1 KSP Residual norm 1.512000395579e-05 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 8.192702188243e-06 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid >> >> norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 >> >> > > > >> >> > > > The outer residual norms are still wonky, the preconditioned >> >> residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which >> is a >> >> huge drop but the 7.963616922323e+05 drops very much less >> >> 7.135927677844e+04. This is not normal. >> >> > > > >> >> > > > What if you just use -pc_type lu for the entire system (no >> >> fieldsplit), does the true residual drop to almost zero in the first >> >> iteration (as it should?). Send the output. >> >> > > > >> >> > > > >> >> > > > >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 6.946213936597e-01 >> >> > > > > 1 KSP Residual norm 1.195514007343e-05 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 1.025694497535e+00 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid >> >> norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 7.255149996405e-01 >> >> > > > > 1 KSP Residual norm 6.583512434218e-06 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 1.015229700337e+00 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid >> >> norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 >> >> > > > > Residual norms for fieldsplit_u_ solve. 
>> >> > > > > 0 KSP Residual norm 3.512243341400e-01 >> >> > > > > 1 KSP Residual norm 2.032490351200e-06 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 1.282327290982e+00 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid >> >> norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 3.423609338053e-01 >> >> > > > > 1 KSP Residual norm 4.213703301972e-07 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 1.157384757538e+00 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid >> >> norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 3.838596289995e-01 >> >> > > > > 1 KSP Residual norm 9.927864176103e-08 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 1.066298905618e+00 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid >> >> norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 >> >> > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > 0 KSP Residual norm 4.624964188094e-01 >> >> > > > > 1 KSP Residual norm 6.418229775372e-08 >> >> > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > 0 KSP Residual norm 9.800784311614e-01 >> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> >> > > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid >> >> norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 >> >> > > > > Linear solve converged due to CONVERGED_ATOL iterations 7 >> >> > > > > >> >> > > > > The outer operator is an explicit matrix. >> >> > > > > >> >> > > > > Giang >> >> > > > > >> >> > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith < >> bsmith at mcs.anl.gov> >> >> wrote: >> >> > > > > >> >> > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui < >> hgbk2008 at gmail.com> >> >> wrote: >> >> > > > > > >> >> > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better >> >> convergence. I still used 4 procs though, probably with 1 proc it >> should >> >> also be the same. >> >> > > > > > >> >> > > > > > The u block used a Nitsche-type operator to connect two >> >> non-matching domains. I don't think it will leave some rigid body >> motion >> >> leads to not sufficient constraints. Maybe you have other idea? >> >> > > > > > >> >> > > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > > 0 KSP Residual norm 3.129067184300e+05 >> >> > > > > > 1 KSP Residual norm 5.906261468196e-01 >> >> > > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > > > >> >> > > > > ^^^^ something is wrong here. The sub solve should not be >> >> starting with a 0 residual (this means the right hand side for this sub >> >> solve is zero which it should not be). >> >> > > > > >> >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 >> >> > > > > >> >> > > > > >> >> > > > > How are you providing the outer operator? As an explicit >> matrix >> >> or with some shell matrix? 
>> >> > > > > >> >> > > > > >> >> > > > > >> >> > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true >> resid >> >> norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > > 0 KSP Residual norm 9.999955993437e-01 >> >> > > > > > 1 KSP Residual norm 4.019774691831e-06 >> >> > > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true >> resid >> >> norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 >> >> > > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > > 0 KSP Residual norm 1.000012180204e+00 >> >> > > > > > 1 KSP Residual norm 1.017367950422e-05 >> >> > > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true >> resid >> >> norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 >> >> > > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > > 0 KSP Residual norm 1.000004200085e+00 >> >> > > > > > 1 KSP Residual norm 6.231613102458e-06 >> >> > > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true >> resid >> >> norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 >> >> > > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >> >> > > > > > KSP Object: 4 MPI processes >> >> > > > > > type: gmres >> >> > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >> >> Orthogonalization >> >> > > > > > GMRES: happy breakdown tolerance 1e-30 >> >> > > > > > maximum iterations=1000, initial guess is zero >> >> > > > > > tolerances: relative=1e-20, absolute=1e-09, >> divergence=10000 >> >> > > > > > left preconditioning >> >> > > > > > using PRECONDITIONED norm type for convergence test >> >> > > > > > PC Object: 4 MPI processes >> >> > > > > > type: fieldsplit >> >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits >> = 2 >> >> > > > > > Solver info for each split is in the following KSP >> objects: >> >> > > > > > Split number 0 Defined by IS >> >> > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >> >> > > > > > type: richardson >> >> > > > > > Richardson: damping factor=1 >> >> > > > > > maximum iterations=1, initial guess is zero >> >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >> >> divergence=10000 >> >> > > > > > left preconditioning >> >> > > > > > using PRECONDITIONED norm type for convergence test >> >> > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >> >> > > > > > type: lu >> >> > > > > > LU: out-of-place factorization >> >> > > > > > tolerance for zero pivot 2.22045e-14 >> >> > > > > > matrix ordering: natural >> >> > > > > > factor fill ratio given 0, needed 0 >> >> > > > > > Factored matrix follows: >> >> > > > > > Mat Object: 4 MPI processes >> >> > > > > > type: mpiaij >> >> > > > > > rows=938910, cols=938910 >> >> > > > > > package used to perform factorization: pastix >> >> > > > > > total: nonzeros=0, allocated nonzeros=0 >> >> > > > > > Error : 3.36878e-14 >> >> > > > > > total number of mallocs used during MatSetValues >> calls >> >> =0 >> >> > > > > > PaStiX run parameters: >> >> > > > > > Matrix type : >> Unsymmetric >> >> > > > > > Level of printing (0,1,2): 0 >> >> > > > > > Number of refinements iterations : 3 >> >> > > > > > Error : 3.36878e-14 >> >> > 
> > > > linear system matrix = precond matrix: >> >> > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes >> >> > > > > > type: mpiaij >> >> > > > > > rows=938910, cols=938910, bs=3 >> >> > > > > > Error : 3.36878e-14 >> >> > > > > > Error : 3.36878e-14 >> >> > > > > > total: nonzeros=8.60906e+07, allocated >> >> nonzeros=8.60906e+07 >> >> > > > > > total number of mallocs used during MatSetValues >> calls =0 >> >> > > > > > using I-node (on process 0) routines: found 78749 >> >> nodes, limit used is 5 >> >> > > > > > Split number 1 Defined by IS >> >> > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >> >> > > > > > type: richardson >> >> > > > > > Richardson: damping factor=1 >> >> > > > > > maximum iterations=1, initial guess is zero >> >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >> >> divergence=10000 >> >> > > > > > left preconditioning >> >> > > > > > using PRECONDITIONED norm type for convergence test >> >> > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >> >> > > > > > type: lu >> >> > > > > > LU: out-of-place factorization >> >> > > > > > tolerance for zero pivot 2.22045e-14 >> >> > > > > > matrix ordering: natural >> >> > > > > > factor fill ratio given 0, needed 0 >> >> > > > > > Factored matrix follows: >> >> > > > > > Mat Object: 4 MPI processes >> >> > > > > > type: mpiaij >> >> > > > > > rows=34141, cols=34141 >> >> > > > > > package used to perform factorization: pastix >> >> > > > > > Error : -nan >> >> > > > > > Error : -nan >> >> > > > > > Error : -nan >> >> > > > > > total: nonzeros=0, allocated nonzeros=0 >> >> > > > > > total number of mallocs used during >> MatSetValues >> >> calls =0 >> >> > > > > > PaStiX run parameters: >> >> > > > > > Matrix type : >> Symmetric >> >> > > > > > Level of printing (0,1,2): 0 >> >> > > > > > Number of refinements iterations : 0 >> >> > > > > > Error : -nan >> >> > > > > > linear system matrix = precond matrix: >> >> > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes >> >> > > > > > type: mpiaij >> >> > > > > > rows=34141, cols=34141 >> >> > > > > > total: nonzeros=485655, allocated nonzeros=485655 >> >> > > > > > total number of mallocs used during MatSetValues >> calls =0 >> >> > > > > > not using I-node (on process 0) routines >> >> > > > > > linear system matrix = precond matrix: >> >> > > > > > Mat Object: 4 MPI processes >> >> > > > > > type: mpiaij >> >> > > > > > rows=973051, cols=973051 >> >> > > > > > total: nonzeros=9.90037e+07, allocated >> nonzeros=9.90037e+07 >> >> > > > > > total number of mallocs used during MatSetValues calls =0 >> >> > > > > > using I-node (on process 0) routines: found 78749 >> nodes, >> >> limit used is 5 >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > Giang >> >> > > > > > >> >> > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith < >> >> bsmith at mcs.anl.gov> wrote: >> >> > > > > > >> >> > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui < >> >> hgbk2008 at gmail.com> wrote: >> >> > > > > > > >> >> > > > > > > Dear Matt/Barry >> >> > > > > > > >> >> > > > > > > With your options, it results in >> >> > > > > > > >> >> > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true >> >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > > > 0 KSP Residual norm 2.407308987203e+36 >> >> > > > > > > 1 KSP Residual norm 5.797185652683e+72 >> >> > > > > > >> >> > > > > > It looks like Matt is right, hypre is seemly producing >> useless >> >> garbage. 
>> >> > > > > > >> >> > > > > > First how do things run on one process. If you have similar >> >> problems then debug on one process (debugging any kind of problem is >> always >> >> far easy on one process). >> >> > > > > > >> >> > > > > > First run with -fieldsplit_u_type lu (instead of using >> hypre) to >> >> see if that works or also produces something bad. >> >> > > > > > >> >> > > > > > What is the operator and the boundary conditions for u? It >> could >> >> be singular. >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > > > > > ... >> >> > > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true >> >> resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 >> >> > > > > > > Residual norms for fieldsplit_u_ solve. >> >> > > > > > > 0 KSP Residual norm 1.533726746719e+36 >> >> > > > > > > 1 KSP Residual norm 3.692757392261e+72 >> >> > > > > > > Residual norms for fieldsplit_wp_ solve. >> >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >> >> > > > > > > >> >> > > > > > > Do you suggest that the pastix solver for the "wp" block >> >> encounters small pivot? In addition, seem like the "u" block is also >> >> singular. >> >> > > > > > > >> >> > > > > > > Giang >> >> > > > > > > >> >> > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith < >> >> bsmith at mcs.anl.gov> wrote: >> >> > > > > > > >> >> > > > > > > Huge preconditioned norms but normal unpreconditioned >> norms >> >> almost always come from a very small pivot in an LU or ILU >> factorization. >> >> > > > > > > >> >> > > > > > > The first thing to do is monitor the two sub solves. Run >> >> with the additional options -fieldsplit_u_ksp_type richardson >> >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >> >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >> >> -fieldsplit_wp_ksp_max_it 1 >> >> > > > > > > >> >> > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < >> >> hgbk2008 at gmail.com> wrote: >> >> > > > > > > > >> >> > > > > > > > Hello >> >> > > > > > > > >> >> > > > > > > > I encountered a strange convergence behavior that I have >> >> trouble to understand >> >> > > > > > > > >> >> > > > > > > > KSPSetFromOptions completed >> >> > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true >> >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >> >> > > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true >> >> resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 >> >> > > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true >> >> resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 >> >> > > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true >> >> resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 >> >> > > > > > > > ..... 
>> >> > > > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true >> >> resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 >> >> > > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 >> true >> >> resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 >> >> > > > > > > > Linear solve did not converge due to DIVERGED_ITS >> iterations >> >> 1000 >> >> > > > > > > > KSP Object: 4 MPI processes >> >> > > > > > > > type: gmres >> >> > > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >> >> Orthogonalization >> >> > > > > > > > GMRES: happy breakdown tolerance 1e-30 >> >> > > > > > > > maximum iterations=1000, initial guess is zero >> >> > > > > > > > tolerances: relative=1e-20, absolute=1e-09, >> >> divergence=10000 >> >> > > > > > > > left preconditioning >> >> > > > > > > > using PRECONDITIONED norm type for convergence test >> >> > > > > > > > PC Object: 4 MPI processes >> >> > > > > > > > type: fieldsplit >> >> > > > > > > > FieldSplit with MULTIPLICATIVE composition: total >> splits >> >> = 2 >> >> > > > > > > > Solver info for each split is in the following KSP >> >> objects: >> >> > > > > > > > Split number 0 Defined by IS >> >> > > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >> >> > > > > > > > type: preonly >> >> > > > > > > > maximum iterations=10000, initial guess is zero >> >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >> >> divergence=10000 >> >> > > > > > > > left preconditioning >> >> > > > > > > > using NONE norm type for convergence test >> >> > > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >> >> > > > > > > > type: hypre >> >> > > > > > > > HYPRE BoomerAMG preconditioning >> >> > > > > > > > HYPRE BoomerAMG: Cycle type V >> >> > > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 >> >> > > > > > > > HYPRE BoomerAMG: Maximum number of iterations PER >> >> hypre call 1 >> >> > > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre >> >> call 0 >> >> > > > > > > > HYPRE BoomerAMG: Threshold for strong coupling >> 0.6 >> >> > > > > > > > HYPRE BoomerAMG: Interpolation truncation factor >> 0 >> >> > > > > > > > HYPRE BoomerAMG: Interpolation: max elements per >> row >> >> 0 >> >> > > > > > > > HYPRE BoomerAMG: Number of levels of aggressive >> >> coarsening 0 >> >> > > > > > > > HYPRE BoomerAMG: Number of paths for aggressive >> >> coarsening 1 >> >> > > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 >> >> > > > > > > > HYPRE BoomerAMG: Sweeps down 1 >> >> > > > > > > > HYPRE BoomerAMG: Sweeps up 1 >> >> > > > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 >> >> > > > > > > > HYPRE BoomerAMG: Relax down >> >> symmetric-SOR/Jacobi >> >> > > > > > > > HYPRE BoomerAMG: Relax up >> >> symmetric-SOR/Jacobi >> >> > > > > > > > HYPRE BoomerAMG: Relax on coarse >> >> Gaussian-elimination >> >> > > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 >> >> > > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 >> >> > > > > > > > HYPRE BoomerAMG: Using CF-relaxation >> >> > > > > > > > HYPRE BoomerAMG: Measure type local >> >> > > > > > > > HYPRE BoomerAMG: Coarsen type PMIS >> >> > > > > > > > HYPRE BoomerAMG: Interpolation type classical >> >> > > > > > > > linear system matrix = precond matrix: >> >> > > > > > > > Mat Object: (fieldsplit_u_) 4 MPI >> processes >> >> > > > > > > > type: mpiaij >> >> > > > > > > > rows=938910, cols=938910, bs=3 >> >> > > > > > > > total: nonzeros=8.60906e+07, allocated >> >> nonzeros=8.60906e+07 >> >> > > > > > > > total number of mallocs used during 
MatSetValues >> >> calls =0 >> >> > > > > > > > using I-node (on process 0) routines: found >> 78749 >> >> nodes, limit used is 5 >> >> > > > > > > > Split number 1 Defined by IS >> >> > > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >> >> > > > > > > > type: preonly >> >> > > > > > > > maximum iterations=10000, initial guess is zero >> >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >> >> divergence=10000 >> >> > > > > > > > left preconditioning >> >> > > > > > > > using NONE norm type for convergence test >> >> > > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >> >> > > > > > > > type: lu >> >> > > > > > > > LU: out-of-place factorization >> >> > > > > > > > tolerance for zero pivot 2.22045e-14 >> >> > > > > > > > matrix ordering: natural >> >> > > > > > > > factor fill ratio given 0, needed 0 >> >> > > > > > > > Factored matrix follows: >> >> > > > > > > > Mat Object: 4 MPI processes >> >> > > > > > > > type: mpiaij >> >> > > > > > > > rows=34141, cols=34141 >> >> > > > > > > > package used to perform factorization: >> pastix >> >> > > > > > > > Error : -nan >> >> > > > > > > > Error : -nan >> >> > > > > > > > total: nonzeros=0, allocated nonzeros=0 >> >> > > > > > > > Error : -nan >> >> > > > > > > > total number of mallocs used during MatSetValues >> calls =0 >> >> > > > > > > > PaStiX run parameters: >> >> > > > > > > > Matrix type : >> >> Symmetric >> >> > > > > > > > Level of printing (0,1,2): 0 >> >> > > > > > > > Number of refinements iterations : 0 >> >> > > > > > > > Error : -nan >> >> > > > > > > > linear system matrix = precond matrix: >> >> > > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI >> processes >> >> > > > > > > > type: mpiaij >> >> > > > > > > > rows=34141, cols=34141 >> >> > > > > > > > total: nonzeros=485655, allocated nonzeros=485655 >> >> > > > > > > > total number of mallocs used during MatSetValues >> >> calls =0 >> >> > > > > > > > not using I-node (on process 0) routines >> >> > > > > > > > linear system matrix = precond matrix: >> >> > > > > > > > Mat Object: 4 MPI processes >> >> > > > > > > > type: mpiaij >> >> > > > > > > > rows=973051, cols=973051 >> >> > > > > > > > total: nonzeros=9.90037e+07, allocated >> >> nonzeros=9.90037e+07 >> >> > > > > > > > total number of mallocs used during MatSetValues >> calls =0 >> >> > > > > > > > using I-node (on process 0) routines: found 78749 >> >> nodes, limit used is 5 >> >> > > > > > > > >> >> > > > > > > > The pattern of convergence gives a hint that this system >> is >> >> somehow bad/singular. But I don't know why the preconditioned error >> goes up >> >> too high. Anyone has an idea? >> >> > > > > > > > >> >> > > > > > > > Best regards >> >> > > > > > > > Giang Bui >> >> > > > > > > > >> >> > > > > > > >> >> > > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > >> >> > > > > >> >> > > > >> >> > > > >> >> > > >> >> > > >> >> > >> >> > >> >> >> >> >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From Lukasz.Kaczmarczyk at glasgow.ac.uk  Wed May  3 07:55:19 2017
From: Lukasz.Kaczmarczyk at glasgow.ac.uk (Lukasz Kaczmarczyk)
Date: Wed, 3 May 2017 12:55:19 +0000
Subject: [petsc-users] strange convergence
In-Reply-To: 
References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov>
 <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov>
 <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov>
 <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov>
 <87wpa3wd5j.fsf@jedbrown.org>
Message-ID: 

On 3 May 2017, at 13:22, Matthew Knepley > wrote:

On Wed, May 3, 2017 at 2:29 AM, Hoang Giang Bui > wrote:
Dear Jed

If I understood you correctly you suggest to avoid penalty by using the Lagrange multiplier for the mortar constraint? In this case it leads to the use of discrete Lagrange multiplier space.

Sorry for being ignorant here, but why is the space "discrete"? It looks like you should have a continuum formulation of the mortar as well. Maybe I do not understand something fundamental.

From this (https://en.wikipedia.org/wiki/Mortar_methods) short description, it seems that mortars begin from a continuum formulation, but are then reduced to the discrete level. This is no problem if done consistently, as for instance in the FETI method where efficient preconditioners exist.

Hello,

I copied the wrong link to the mortar method; for how we implemented it, see the presentation http://doi.org/10.5281/zenodo.556996

You are right that we always start from a continuum formulation; on this we apply some discretisation, and in the end the Lagrange multiplier is expressed by a finite vector of discrete unknowns. It is better to formulate the problem first for the continuum; you have better control over what you are doing and over the stability of the solution. Of course, you can add some constraints at the discrete level, after you have discretised the problem, but implicitly you still have some continuous space for the Lagrange multipliers, associated with the shape functions you use to discretise the problem.

In our problem we try to avoid rebuilding the system of equations each time the contact area changes. We are going to construct a DM sub-problem for each body in contact, with each sub-problem solved using MG (the adjacency of those matrices is fixed in time). Everything will be put into a nested matrix with a separate block for the Lagrange multipliers (whose adjacency will change in each time step). For the Lagrange multipliers we are going to use FIELDSPLIT with a Schur complement. I need to look at the FETI method in more detail; we are still at the development stage for the contact problem, and a direct solver works for now for the small problems we have at this point.

In our code we use higher-order elements with a hierarchical basis, and for this we use a specialised MG solver. As you can see here, it works pretty well for moderate-size problems, <100M
http://mofem.eng.gla.ac.uk/mofem/html/_p_c_m_g_set_up_via_approx_orders_8cpp.html

Regards,
Lukasz

Thanks, Matt

Do you or anyone already have experience using discrete Lagrange multiplier space with Petsc?

There is also similar question on stackexchange
https://scicomp.stackexchange.com/questions/25113/preconditioners-and-discrete-lagrange-multipliers

Giang

On Sat, Apr 29, 2017 at 3:34 PM, Jed Brown > wrote:
Hoang Giang Bui > writes:

> Hi Barry
>
> The first block is from a standard solid mechanics discretization based on
> balance of momentum equation. There is some material involved but in
> principal it's well-posed elasticity equation with positive definite
> tangent operator. 
The "gluing business" uses the mortar method to keep the > continuity of displacement. Instead of using Lagrange multiplier to treat > the constraint I used penalty method to penalize the energy. The > discretization form of mortar is quite simple > > \int_{\Gamma_1} { rho * (\delta u_1 - \delta u_2) * (u_1 - u_2) dA } > > rho is penalty parameter. In the simulation I initially set it low (~E) to > preserve the conditioning of the system. There are two things that can go wrong here with AMG: * The penalty term can mess up the strength of connection heuristics such that you get poor choice of C-points (classical AMG like BoomerAMG) or poor choice of aggregates (smoothed aggregation). * The penalty term can prevent Jacobi smoothing from being effective; in this case, it can lead to poor coarse basis functions (higher energy than they should be) and poor smoothing in an MG cycle. You can fix the poor smoothing in the MG cycle by using a stronger smoother, like ASM with some overlap. I'm generally not a fan of penalty methods due to the irritating tradeoffs and often poor solver performance. > In the figure below, the colorful blocks are u_1 and the base is u_2. Both > u_1 and u_2 use isoparametric quadratic approximation. > > ? > Snapshot.png > > ??? > > Giang > > On Fri, Apr 28, 2017 at 6:21 PM, Barry Smith > wrote: > >> >> Ok, so boomerAMG algebraic multigrid is not good for the first block. >> You mentioned the first block has two things glued together? AMG is >> fantastic for certain problems but doesn't work for everything. >> >> Tell us more about the first block, what PDE it comes from, what >> discretization, and what the "gluing business" is and maybe we'll have >> suggestions for how to precondition it. >> >> Barry >> >> > On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui > wrote: >> > >> > It's in fact quite good >> > >> > Residual norms for fieldsplit_u_ solve. >> > 0 KSP Residual norm 4.014715925568e+00 >> > 1 KSP Residual norm 2.160497019264e-10 >> > Residual norms for fieldsplit_wp_ solve. >> > 0 KSP Residual norm 0.000000000000e+00 >> > 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > Residual norms for fieldsplit_u_ solve. >> > 0 KSP Residual norm 9.999999999416e-01 >> > 1 KSP Residual norm 7.118380416383e-11 >> > Residual norms for fieldsplit_wp_ solve. >> > 0 KSP Residual norm 0.000000000000e+00 >> > 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm >> 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 >> > Linear solve converged due to CONVERGED_ATOL iterations 1 >> > >> > Giang >> > >> > On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith > wrote: >> > >> > Run again using LU on both blocks to see what happens. >> > >> > >> > > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui > >> wrote: >> > > >> > > I have changed the way to tie the nonconforming mesh. 
It seems the >> matrix now is better >> > > >> > > with -pc_type lu the output is >> > > 0 KSP preconditioned resid norm 3.308678584240e-01 true resid norm >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > 1 KSP preconditioned resid norm 2.004313395301e-12 true resid norm >> 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 >> > > Linear solve converged due to CONVERGED_ATOL iterations 1 >> > > >> > > >> > > with -pc_type fieldsplit -fieldsplit_u_pc_type hypre >> -fieldsplit_wp_pc_type lu the convergence is slow >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >> > > ... >> > > 824 KSP preconditioned resid norm 1.018542387738e-09 true resid norm >> 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 >> > > 825 KSP preconditioned resid norm 9.743727947637e-10 true resid norm >> 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >> > > >> > > checking with additional -fieldsplit_u_ksp_type richardson >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >> -fieldsplit_wp_ksp_max_it 1 gives >> > > >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 5.803507549280e-01 >> > > 1 KSP Residual norm 2.069538175950e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 7.831796195225e-01 >> > > 1 KSP Residual norm 1.734608520110e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > .... >> > > 823 KSP preconditioned resid norm 1.065070135605e-09 true resid norm >> 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 6.113806394327e-01 >> > > 1 KSP Residual norm 1.535465290944e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > 824 KSP preconditioned resid norm 1.018542387746e-09 true resid norm >> 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 6.123437055586e-01 >> > > 1 KSP Residual norm 1.524661826133e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > 825 KSP preconditioned resid norm 9.743727947718e-10 true resid norm >> 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >> > > >> > > >> > > The residual for wp block is zero since in this first step the rhs is >> zero. As can see in the output, the multigrid does not perform well to >> reduce the residual in the sub-solve. Is my observation right? what can be >> done to improve this? 
>> > > >> > > >> > > Giang >> > > >> > > On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith > >> wrote: >> > > >> > > This can happen in the matrix is singular or nearly singular or if >> the factorization generates small pivots, which can occur for even >> nonsingular problems if the matrix is poorly scaled or just plain nasty. >> > > >> > > >> > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui > >> wrote: >> > > > >> > > > It took a while, here I send you the output >> > > > >> > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm >> 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm >> 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 >> > > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm >> 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 >> > > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm >> 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 >> > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >> > > > KSP Object: 4 MPI processes >> > > > type: gmres >> > > > GMRES: restart=1000, using Modified Gram-Schmidt >> Orthogonalization >> > > > GMRES: happy breakdown tolerance 1e-30 >> > > > maximum iterations=1000, initial guess is zero >> > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >> > > > left preconditioning >> > > > using PRECONDITIONED norm type for convergence test >> > > > PC Object: 4 MPI processes >> > > > type: lu >> > > > LU: out-of-place factorization >> > > > tolerance for zero pivot 2.22045e-14 >> > > > matrix ordering: natural >> > > > factor fill ratio given 0, needed 0 >> > > > Factored matrix follows: >> > > > Mat Object: 4 MPI processes >> > > > type: mpiaij >> > > > rows=973051, cols=973051 >> > > > package used to perform factorization: pastix >> > > > Error : 3.24786e-14 >> > > > total: nonzeros=0, allocated nonzeros=0 >> > > > total number of mallocs used during MatSetValues calls =0 >> > > > PaStiX run parameters: >> > > > Matrix type : Unsymmetric >> > > > Level of printing (0,1,2): 0 >> > > > Number of refinements iterations : 3 >> > > > Error : 3.24786e-14 >> > > > linear system matrix = precond matrix: >> > > > Mat Object: 4 MPI processes >> > > > type: mpiaij >> > > > rows=973051, cols=973051 >> > > > Error : 3.24786e-14 >> > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >> > > > total number of mallocs used during MatSetValues calls =0 >> > > > using I-node (on process 0) routines: found 78749 nodes, limit >> used is 5 >> > > > Error : 3.24786e-14 >> > > > >> > > > It doesn't do as you said. Something is not right here. I will look >> in depth. >> > > > >> > > > Giang >> > > > >> > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith > >> wrote: >> > > > >> > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui > >> wrote: >> > > > > >> > > > > Good catch. I get this for the very first step, maybe at that time >> the rhs_w is zero. >> > > > >> > > > With the multiplicative composition the right hand side of the >> second solve is the initial right hand side of the second solve minus >> A_10*x where x is the solution to the first sub solve and A_10 is the lower >> left block of the outer matrix. So unless both the initial right hand side >> has a zero for the second block and A_10 is identically zero the right hand >> side for the second sub solve should not be zero. Is A_10 == 0? 
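In LaTeX form, the relation just described is, with notation assumed for illustration only (b_1 the second block of the initial right-hand side, x_0 the solution of the first sub solve, A_10 the lower-left block of the outer matrix; these symbols are not PETSc names):

  % Right-hand side seen by the second sub solve in a multiplicative fieldsplit
  % (a sketch under the assumed 2x2 block notation above, not output from PETSc)
  \[
    \tilde{b}_1 = b_1 - A_{10}\, x_0 ,
  \]
  % hence \tilde{b}_1 = 0 only if b_1 = 0 and A_{10} x_0 = 0, e.g. if A_{10} \equiv 0.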
>> > > > >> > > > >> > > > > In the later step, it shows 2 step convergence >> > > > > >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.165886479830e+04 >> > > > > 1 KSP Residual norm 2.905922877684e-01 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 2.397669419027e-01 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid >> norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 9.999891813771e-01 >> > > > > 1 KSP Residual norm 1.512000395579e-05 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 8.192702188243e-06 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid >> norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 >> > > > >> > > > The outer residual norms are still wonky, the preconditioned >> residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which is a >> huge drop but the 7.963616922323e+05 drops very much less >> 7.135927677844e+04. This is not normal. >> > > > >> > > > What if you just use -pc_type lu for the entire system (no >> fieldsplit), does the true residual drop to almost zero in the first >> iteration (as it should?). Send the output. >> > > > >> > > > >> > > > >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 6.946213936597e-01 >> > > > > 1 KSP Residual norm 1.195514007343e-05 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.025694497535e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid >> norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 7.255149996405e-01 >> > > > > 1 KSP Residual norm 6.583512434218e-06 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.015229700337e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid >> norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.512243341400e-01 >> > > > > 1 KSP Residual norm 2.032490351200e-06 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.282327290982e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid >> norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.423609338053e-01 >> > > > > 1 KSP Residual norm 4.213703301972e-07 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.157384757538e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid >> norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.838596289995e-01 >> > > > > 1 KSP Residual norm 9.927864176103e-08 >> > > > > Residual norms for fieldsplit_wp_ solve. 
>> > > > > 0 KSP Residual norm 1.066298905618e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid >> norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 4.624964188094e-01 >> > > > > 1 KSP Residual norm 6.418229775372e-08 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 9.800784311614e-01 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid >> norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 >> > > > > Linear solve converged due to CONVERGED_ATOL iterations 7 >> > > > > >> > > > > The outer operator is an explicit matrix. >> > > > > >> > > > > Giang >> > > > > >> > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith > >> wrote: >> > > > > >> > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui > >> wrote: >> > > > > > >> > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better >> convergence. I still used 4 procs though, probably with 1 proc it should >> also be the same. >> > > > > > >> > > > > > The u block used a Nitsche-type operator to connect two >> non-matching domains. I don't think it will leave some rigid body motion >> leads to not sufficient constraints. Maybe you have other idea? >> > > > > > >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 3.129067184300e+05 >> > > > > > 1 KSP Residual norm 5.906261468196e-01 >> > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > >> > > > > ^^^^ something is wrong here. The sub solve should not be >> starting with a 0 residual (this means the right hand side for this sub >> solve is zero which it should not be). >> > > > > >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 >> > > > > >> > > > > >> > > > > How are you providing the outer operator? As an explicit matrix >> or with some shell matrix? >> > > > > >> > > > > >> > > > > >> > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid >> norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 9.999955993437e-01 >> > > > > > 1 KSP Residual norm 4.019774691831e-06 >> > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid >> norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 1.000012180204e+00 >> > > > > > 1 KSP Residual norm 1.017367950422e-05 >> > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true resid >> norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 1.000004200085e+00 >> > > > > > 1 KSP Residual norm 6.231613102458e-06 >> > > > > > Residual norms for fieldsplit_wp_ solve. 
>> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid >> norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 >> > > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >> > > > > > KSP Object: 4 MPI processes >> > > > > > type: gmres >> > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >> Orthogonalization >> > > > > > GMRES: happy breakdown tolerance 1e-30 >> > > > > > maximum iterations=1000, initial guess is zero >> > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >> > > > > > left preconditioning >> > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > PC Object: 4 MPI processes >> > > > > > type: fieldsplit >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 >> > > > > > Solver info for each split is in the following KSP objects: >> > > > > > Split number 0 Defined by IS >> > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >> > > > > > type: richardson >> > > > > > Richardson: damping factor=1 >> > > > > > maximum iterations=1, initial guess is zero >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > left preconditioning >> > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >> > > > > > type: lu >> > > > > > LU: out-of-place factorization >> > > > > > tolerance for zero pivot 2.22045e-14 >> > > > > > matrix ordering: natural >> > > > > > factor fill ratio given 0, needed 0 >> > > > > > Factored matrix follows: >> > > > > > Mat Object: 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=938910, cols=938910 >> > > > > > package used to perform factorization: pastix >> > > > > > total: nonzeros=0, allocated nonzeros=0 >> > > > > > Error : 3.36878e-14 >> > > > > > total number of mallocs used during MatSetValues calls >> =0 >> > > > > > PaStiX run parameters: >> > > > > > Matrix type : Unsymmetric >> > > > > > Level of printing (0,1,2): 0 >> > > > > > Number of refinements iterations : 3 >> > > > > > Error : 3.36878e-14 >> > > > > > linear system matrix = precond matrix: >> > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=938910, cols=938910, bs=3 >> > > > > > Error : 3.36878e-14 >> > > > > > Error : 3.36878e-14 >> > > > > > total: nonzeros=8.60906e+07, allocated >> nonzeros=8.60906e+07 >> > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > using I-node (on process 0) routines: found 78749 >> nodes, limit used is 5 >> > > > > > Split number 1 Defined by IS >> > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > type: richardson >> > > > > > Richardson: damping factor=1 >> > > > > > maximum iterations=1, initial guess is zero >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > left preconditioning >> > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > type: lu >> > > > > > LU: out-of-place factorization >> > > > > > tolerance for zero pivot 2.22045e-14 >> > > > > > matrix ordering: natural >> > > > > > factor fill ratio given 0, needed 0 >> > > > > > Factored matrix follows: >> > > > > > Mat Object: 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=34141, cols=34141 >> > > > > > package used to perform factorization: pastix >> > > > > > Error : -nan >> > > > > > Error : -nan >> > > > > > Error : 
-nan >> > > > > > total: nonzeros=0, allocated nonzeros=0 >> > > > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > > > PaStiX run parameters: >> > > > > > Matrix type : Symmetric >> > > > > > Level of printing (0,1,2): 0 >> > > > > > Number of refinements iterations : 0 >> > > > > > Error : -nan >> > > > > > linear system matrix = precond matrix: >> > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=34141, cols=34141 >> > > > > > total: nonzeros=485655, allocated nonzeros=485655 >> > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > not using I-node (on process 0) routines >> > > > > > linear system matrix = precond matrix: >> > > > > > Mat Object: 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=973051, cols=973051 >> > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >> > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > using I-node (on process 0) routines: found 78749 nodes, >> limit used is 5 >> > > > > > >> > > > > > >> > > > > > >> > > > > > Giang >> > > > > > >> > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith < >> bsmith at mcs.anl.gov> wrote: >> > > > > > >> > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui < >> hgbk2008 at gmail.com> wrote: >> > > > > > > >> > > > > > > Dear Matt/Barry >> > > > > > > >> > > > > > > With your options, it results in >> > > > > > > >> > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > > 0 KSP Residual norm 2.407308987203e+36 >> > > > > > > 1 KSP Residual norm 5.797185652683e+72 >> > > > > > >> > > > > > It looks like Matt is right, hypre is seemly producing useless >> garbage. >> > > > > > >> > > > > > First how do things run on one process. If you have similar >> problems then debug on one process (debugging any kind of problem is always >> far easy on one process). >> > > > > > >> > > > > > First run with -fieldsplit_u_type lu (instead of using hypre) to >> see if that works or also produces something bad. >> > > > > > >> > > > > > What is the operator and the boundary conditions for u? It could >> be singular. >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > > ... >> > > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true >> resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 >> > > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > > 0 KSP Residual norm 1.533726746719e+36 >> > > > > > > 1 KSP Residual norm 3.692757392261e+72 >> > > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > > >> > > > > > > Do you suggest that the pastix solver for the "wp" block >> encounters small pivot? In addition, seem like the "u" block is also >> singular. >> > > > > > > >> > > > > > > Giang >> > > > > > > >> > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith < >> bsmith at mcs.anl.gov> wrote: >> > > > > > > >> > > > > > > Huge preconditioned norms but normal unpreconditioned norms >> almost always come from a very small pivot in an LU or ILU factorization. >> > > > > > > >> > > > > > > The first thing to do is monitor the two sub solves. 
Run >> with the additional options -fieldsplit_u_ksp_type richardson >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >> -fieldsplit_wp_ksp_max_it 1 >> > > > > > > >> > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < >> hgbk2008 at gmail.com> wrote: >> > > > > > > > >> > > > > > > > Hello >> > > > > > > > >> > > > > > > > I encountered a strange convergence behavior that I have >> trouble to understand >> > > > > > > > >> > > > > > > > KSPSetFromOptions completed >> > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true >> resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 >> > > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true >> resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 >> > > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true >> resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 >> > > > > > > > ..... >> > > > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true >> resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 >> > > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true >> resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 >> > > > > > > > Linear solve did not converge due to DIVERGED_ITS iterations >> 1000 >> > > > > > > > KSP Object: 4 MPI processes >> > > > > > > > type: gmres >> > > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >> Orthogonalization >> > > > > > > > GMRES: happy breakdown tolerance 1e-30 >> > > > > > > > maximum iterations=1000, initial guess is zero >> > > > > > > > tolerances: relative=1e-20, absolute=1e-09, >> divergence=10000 >> > > > > > > > left preconditioning >> > > > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > > > PC Object: 4 MPI processes >> > > > > > > > type: fieldsplit >> > > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits >> = 2 >> > > > > > > > Solver info for each split is in the following KSP >> objects: >> > > > > > > > Split number 0 Defined by IS >> > > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >> > > > > > > > type: preonly >> > > > > > > > maximum iterations=10000, initial guess is zero >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > > > left preconditioning >> > > > > > > > using NONE norm type for convergence test >> > > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >> > > > > > > > type: hypre >> > > > > > > > HYPRE BoomerAMG preconditioning >> > > > > > > > HYPRE BoomerAMG: Cycle type V >> > > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 >> > > > > > > > HYPRE BoomerAMG: Maximum number of iterations PER >> hypre call 1 >> > > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre >> call 0 >> > > > > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 >> > > > > > > > HYPRE BoomerAMG: Interpolation truncation factor 0 >> > > > > > > > HYPRE BoomerAMG: Interpolation: max elements per row >> 0 >> > > > > > > > HYPRE BoomerAMG: Number of levels of aggressive >> coarsening 0 >> > > > > > > > HYPRE BoomerAMG: Number of paths for aggressive >> coarsening 1 >> > > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 >> > > > > > > > HYPRE BoomerAMG: Sweeps down 1 >> > > > > > > > HYPRE BoomerAMG: Sweeps up 1 >> > > > > > > > HYPRE 
BoomerAMG: Sweeps on coarse 1 >> > > > > > > > HYPRE BoomerAMG: Relax down >> symmetric-SOR/Jacobi >> > > > > > > > HYPRE BoomerAMG: Relax up >> symmetric-SOR/Jacobi >> > > > > > > > HYPRE BoomerAMG: Relax on coarse >> Gaussian-elimination >> > > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 >> > > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 >> > > > > > > > HYPRE BoomerAMG: Using CF-relaxation >> > > > > > > > HYPRE BoomerAMG: Measure type local >> > > > > > > > HYPRE BoomerAMG: Coarsen type PMIS >> > > > > > > > HYPRE BoomerAMG: Interpolation type classical >> > > > > > > > linear system matrix = precond matrix: >> > > > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=938910, cols=938910, bs=3 >> > > > > > > > total: nonzeros=8.60906e+07, allocated >> nonzeros=8.60906e+07 >> > > > > > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > > > > > using I-node (on process 0) routines: found 78749 >> nodes, limit used is 5 >> > > > > > > > Split number 1 Defined by IS >> > > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > > > type: preonly >> > > > > > > > maximum iterations=10000, initial guess is zero >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > > > left preconditioning >> > > > > > > > using NONE norm type for convergence test >> > > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > > > type: lu >> > > > > > > > LU: out-of-place factorization >> > > > > > > > tolerance for zero pivot 2.22045e-14 >> > > > > > > > matrix ordering: natural >> > > > > > > > factor fill ratio given 0, needed 0 >> > > > > > > > Factored matrix follows: >> > > > > > > > Mat Object: 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=34141, cols=34141 >> > > > > > > > package used to perform factorization: pastix >> > > > > > > > Error : -nan >> > > > > > > > Error : -nan >> > > > > > > > total: nonzeros=0, allocated nonzeros=0 >> > > > > > > > Error : -nan >> > > > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > > > PaStiX run parameters: >> > > > > > > > Matrix type : >> Symmetric >> > > > > > > > Level of printing (0,1,2): 0 >> > > > > > > > Number of refinements iterations : 0 >> > > > > > > > Error : -nan >> > > > > > > > linear system matrix = precond matrix: >> > > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=34141, cols=34141 >> > > > > > > > total: nonzeros=485655, allocated nonzeros=485655 >> > > > > > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > > > > > not using I-node (on process 0) routines >> > > > > > > > linear system matrix = precond matrix: >> > > > > > > > Mat Object: 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=973051, cols=973051 >> > > > > > > > total: nonzeros=9.90037e+07, allocated >> nonzeros=9.90037e+07 >> > > > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > > > using I-node (on process 0) routines: found 78749 >> nodes, limit used is 5 >> > > > > > > > >> > > > > > > > The pattern of convergence gives a hint that this system is >> somehow bad/singular. But I don't know why the preconditioned error goes up >> too high. Anyone has an idea? 
>> > > > > > > > >> > > > > > > > Best regards >> > > > > > > > Giang Bui >> > > > > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > > >> > > > >> > > > >> > > >> > > >> > >> > >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Wed May 3 10:01:43 2017 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 3 May 2017 11:01:43 -0400 Subject: [petsc-users] GAMG scaling Message-ID: (Hong), what is the current state of optimizing RAP for scaling? Nate, is driving 3D elasticity problems at scaling with GAMG and we are working out performance problems. They are hitting problems at ~1.5B dof problems on a basic Cray (XC30 I think). Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed May 3 11:17:22 2017 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 3 May 2017 11:17:22 -0500 Subject: [petsc-users] GAMG scaling In-Reply-To: References: Message-ID: Mark, Below is the copy of my email sent to you on Feb 27: I implemented scalable MatPtAP and did comparisons of three implementations using ex56.c on alcf cetus machine (this machine has small memory, 1GB/core): - nonscalable PtAP: use an array of length PN to do dense axpy - scalable PtAP: do sparse axpy without use of PN array - hypre PtAP. The results are attached. Summary: - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP - scalable PtAP is 4x faster than hypre PtAP - hypre uses less memory (see job.ne399.n63.np1000.sh) Based on above observation, I set the default PtAP algorithm as 'nonscalable'. When PN > local estimated nonzero of C=PtAP, then switch default to 'scalable'. User can overwrite default. For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get MatPtAP 3.6224e+01 (nonscalable for small mats, scalable for larger ones) scalable MatPtAP 4.6129e+01 hypre 1.9389e+02 This work in on petsc-master. Give it a try. If you encounter any problem, let me know. Hong On Wed, May 3, 2017 at 10:01 AM, Mark Adams wrote: > (Hong), what is the current state of optimizing RAP for scaling? > > Nate, is driving 3D elasticity problems at scaling with GAMG and we are > working out performance problems. They are hitting problems at ~1.5B dof > problems on a basic Cray (XC30 I think). > > Thanks, > Mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: out_ex56_cetus_short Type: application/octet-stream Size: 5377 bytes Desc: not available URL: From fande.kong at inl.gov Wed May 3 13:24:29 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Wed, 3 May 2017 12:24:29 -0600 Subject: [petsc-users] log_view for the master branch Message-ID: Hi, I am using the current master branch. The log_view gives me the summary as follows, and the "WARNING" box repeats three times. Are we intending to do so? Thanks, Fande, ************************************************************************************************************************ *** WIDEN YOUR WINDOW TO 120 CHARACTERS. 
Use 'enscript -r -fCourier9' to print this document *** ************************************************************************************************************************ ---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## ./ex29 on a arch-darwin-c-debug-master named FN604208 with 1 processor, by kongf Wed May 3 12:28:23 2017 Using Petsc Development GIT revision: v3.7.6-3529-g76c7fe0 GIT Date: 2017-05-03 08:46:23 -0500 Max Max/Min Avg Total Time (sec): 1.350e-02 1.00000 1.350e-02 Objects: 4.100e+01 1.00000 4.100e+01 Flop: 3.040e+02 1.00000 3.040e+02 3.040e+02 Flop/sec: 2.251e+04 1.00000 2.251e+04 2.251e+04 Memory: 1.576e+05 1.00000 1.576e+05 MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 MPI Reductions: 0.000e+00 0.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flop and VecAXPY() for complex vectors of length N --> 8N flop Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 1.3483e-02 99.8% 3.0400e+02 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flop: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). %T - percent time in this phase %F - percent flop in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. 
# # # ########################################################## Event Count Time (sec) Flop --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage KSPGMRESOrthog 1 1.0 1.3617e-04 1.0 3.50e+01 1.0 0.0e+00 0.0e+00 0.0e+00 1 12 0 0 0 1 12 0 0 0 0 KSPSetUp 1 1.0 4.1097e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 KSPSolve 1 1.0 1.4596e-03 1.0 2.85e+02 1.0 0.0e+00 0.0e+00 0.0e+00 11 94 0 0 0 11 94 0 0 0 0 VecMDot 1 1.0 1.7958e-05 1.0 1.70e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 6 0 0 0 0 6 0 0 0 1 VecNorm 2 1.0 1.9152e-05 1.0 3.40e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 11 0 0 0 0 11 0 0 0 2 VecScale 1 1.0 4.4771e-05 1.0 9.00e+00 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 0 VecCopy 1 1.0 1.2218e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 10 1.0 7.3789e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecAXPY 1 1.0 6.3397e-05 1.0 1.80e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 6 0 0 0 0 6 0 0 0 0 VecMAXPY 2 1.0 4.8989e-05 1.0 3.60e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 12 0 0 0 0 12 0 0 0 1 VecAssemblyBegin 2 1.0 7.5148e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 2 1.0 7.5093e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 2 1.0 9.5865e-05 1.0 4.30e+01 1.0 0.0e+00 0.0e+00 0.0e+00 1 14 0 0 0 1 14 0 0 0 0 MatMult 1 1.0 1.3781e-05 1.0 5.70e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 19 0 0 0 0 19 0 0 0 4 MatSolve 2 1.0 7.4019e-04 1.0 1.14e+02 1.0 0.0e+00 0.0e+00 0.0e+00 5 38 0 0 0 5 38 0 0 0 0 MatLUFactorNum 1 1.0 2.8001e-05 1.0 1.90e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 6 0 0 0 0 6 0 0 0 1 MatILUFactorSym 1 1.0 9.1556e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 MatAssemblyBegin 2 1.0 7.7938e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 2 1.0 4.5131e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 1 1.0 4.0429e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.7907e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 PCSetUp 1 1.0 5.8597e-04 1.0 1.90e+01 1.0 0.0e+00 0.0e+00 0.0e+00 4 6 0 0 0 4 6 0 0 0 0 PCApply 2 1.0 7.8497e-04 1.0 1.14e+02 1.0 0.0e+00 0.0e+00 0.0e+00 6 38 0 0 0 6 38 0 0 0 0 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Krylov Solver 1 1 18408 0. DMKSP interface 1 1 648 0. Vector 12 12 19224 0. Vector Scatter 2 2 1312 0. Matrix 2 2 7380 0. Distributed Mesh 3 3 14960 0. Index Set 7 7 5632 0. IS L to G Mapping 2 2 1368 0. Star Forest Bipartite Graph 6 6 4864 0. Discrete System 3 3 2596 0. Preconditioner 1 1 1000 0. Viewer 1 0 0 0. 
======================================================================================================================== Average time to get PetscTime(): 4.50294e-08 #PETSc Option Table entries: -log_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --download-hypre=1 --with-ssl=0 --with-debugging=yes --with-pic=1 --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-fblaslapack=1 --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 --download-scalapack=1 --download-mumps=1 CC=mpicc CXX=mpicxx FC=mpif90 F77=mpif77 F90=mpif90 CFLAGS="-fPIC -fopenmp" CXXFLAGS="-fPIC -fopenmp" FFLAGS="-fPIC -fopenmp" FCFLAGS="-fPIC -fopenmp" F90FLAGS="-fPIC -fopenmp" F77FLAGS="-fPIC -fopenmp" PETSC_ARCH=arch-darwin-c-debug-master ----------------------------------------- Libraries compiled on Wed May 3 11:04:44 2017 on FN604208 Machine characteristics: Darwin-15.5.0-x86_64-i386-64bit Using PETSc directory: /Users/kongf/projects/petsc Using PETSc arch: arch-darwin-c-debug-master ----------------------------------------- Using C compiler: mpicc -fPIC -fopenmp -g3 ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -fPIC -fopenmp -g ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/Users/kongf/projects/petsc/arch-darwin-c-debug-master/include -I/Users/kongf/projects/petsc/include -I/Users/kongf/projects/petsc/include -I/Users/kongf/projects/petsc/arch-darwin-c-debug-master/include -I/opt/X11/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/Users/kongf/projects/petsc/arch-darwin-c-debug-master/lib -L/Users/kongf/projects/petsc/arch-darwin-c-debug-master/lib -lpetsc -Wl,-rpath,/Users/kongf/projects/petsc/arch-darwin-c-debug-master/lib -L/Users/kongf/projects/petsc/arch-darwin-c-debug-master/lib -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -Wl,-rpath,/opt/moose/mpich/mpich-3.2/clang-opt/lib -L/opt/moose/mpich/mpich-3.2/clang-opt/lib -Wl,-rpath,/opt/moose/llvm-3.9.0/lib -L/opt/moose/llvm-3.9.0/lib -Wl,-rpath,/opt/moose/llvm-3.9.0/lib/clang/3.9.0/lib/darwin -L/opt/moose/llvm-3.9.0/lib/clang/3.9.0/lib/darwin -Wl,-rpath,/opt/moose/gcc-6.2.0/lib/gcc/x86_64-apple-darwin15.6.0/6.2.0 -L/opt/moose/gcc-6.2.0/lib/gcc/x86_64-apple-darwin15.6.0/6.2.0 -Wl,-rpath,/opt/moose/gcc-6.2.0/lib -L/opt/moose/gcc-6.2.0/lib -Wl,-rpath,/opt/moose/llvm-3.9.0/bin/../lib/clang/3.9.0/lib/darwin -L/opt/moose/llvm-3.9.0/bin/../lib/clang/3.9.0/lib/darwin -lsuperlu_dist -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lflapack -lfblas -lparmetis -lmetis -lX11 -lclang_rt.osx -lmpifort -lgfortran -lgomp -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx -ldl -lmpi -lpmpi -lomp -lSystem -lclang_rt.osx -ldl ----------------------------------------- ########################################################## # # # WARNING!!! # # # # This code was compiled with a debugging option, # # To get timing results run ./configure # # using --with-debugging=no, the performance will # # be generally two or three times faster. # # # ########################################################## -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Wed May 3 13:27:47 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 3 May 2017 13:27:47 -0500 Subject: [petsc-users] log_view for the master branch In-Reply-To: References: Message-ID: On Wed, May 3, 2017 at 1:24 PM, Kong, Fande wrote: > Hi, > > I am using the current master branch. The log_view gives me the summary as > follows, and the "WARNING" box repeats three times. Are we intending to do > so? > Yep, Barry is Really Freaking Serious@ that you should not interpret these numbers without optimization on. Matt > Thanks, > > Fande, > > > ************************************************************ > ************************************************************ > *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r > -fCourier9' to print this document *** > ************************************************************ > ************************************************************ > > ---------------------------------------------- PETSc Performance Summary: > ---------------------------------------------- > > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option, # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > > ./ex29 on a arch-darwin-c-debug-master named FN604208 with 1 processor, by > kongf Wed May 3 12:28:23 2017 > Using Petsc Development GIT revision: v3.7.6-3529-g76c7fe0 GIT Date: > 2017-05-03 08:46:23 -0500 > > Max Max/Min Avg Total > Time (sec): 1.350e-02 1.00000 1.350e-02 > Objects: 4.100e+01 1.00000 4.100e+01 > Flop: 3.040e+02 1.00000 3.040e+02 3.040e+02 > Flop/sec: 2.251e+04 1.00000 2.251e+04 2.251e+04 > Memory: 1.576e+05 1.00000 1.576e+05 > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > MPI Reductions: 0.000e+00 0.00000 > > Flop counting convention: 1 flop = 1 real number operation of type > (multiply/divide/add/subtract) > e.g., VecAXPY() for real vectors of length N > --> 2N flop > and VecAXPY() for complex vectors of length N > --> 8N flop > > Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages > --- -- Message Lengths -- -- Reductions -- > Avg %Total Avg %Total counts > %Total Avg %Total counts %Total > 0: Main Stage: 1.3483e-02 99.8% 3.0400e+02 100.0% 0.000e+00 > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > ------------------------------------------------------------ > ------------------------------------------------------------ > See the 'Profiling' chapter of the users' manual for details on > interpreting output. > Phase summary info: > Count: number of times phase was executed > Time and Flop: Max - maximum over all processors > Ratio - ratio of maximum to minimum over all processors > Mess: number of messages sent > Avg. len: average message length (bytes) > Reduct: number of global reductions > Global: entire computation > Stage: stages of a computation. Set stages with PetscLogStagePush() and > PetscLogStagePop(). 
> %T - percent time in this phase %F - percent flop in this > phase > %M - percent messages in this phase %L - percent message lengths > in this phase > %R - percent reductions in this phase > Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over > all processors) > ------------------------------------------------------------ > ------------------------------------------------------------ > > > ########################################################## > # # > # WARNING!!! # > # # > # This code was compiled with a debugging option, # > # To get timing results run ./configure # > # using --with-debugging=no, the performance will # > # be generally two or three times faster. # > # # > ########################################################## > > > Event Count Time (sec) > Flop --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------ > ------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > KSPGMRESOrthog 1 1.0 1.3617e-04 1.0 3.50e+01 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 12 0 0 0 1 12 0 0 0 0 > KSPSetUp 1 1.0 4.1097e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 3 0 0 0 0 3 0 0 0 0 0 > KSPSolve 1 1.0 1.4596e-03 1.0 2.85e+02 1.0 0.0e+00 0.0e+00 > 0.0e+00 11 94 0 0 0 11 94 0 0 0 0 > VecMDot 1 1.0 1.7958e-05 1.0 1.70e+01 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 6 0 0 0 0 6 0 0 0 1 > VecNorm 2 1.0 1.9152e-05 1.0 3.40e+01 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 11 0 0 0 0 11 0 0 0 2 > VecScale 1 1.0 4.4771e-05 1.0 9.00e+00 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 0 > VecCopy 1 1.0 1.2218e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 10 1.0 7.3789e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > VecAXPY 1 1.0 6.3397e-05 1.0 1.80e+01 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 6 0 0 0 0 6 0 0 0 0 > VecMAXPY 2 1.0 4.8989e-05 1.0 3.60e+01 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 12 0 0 0 0 12 0 0 0 1 > VecAssemblyBegin 2 1.0 7.5148e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAssemblyEnd 2 1.0 7.5093e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecNormalize 2 1.0 9.5865e-05 1.0 4.30e+01 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 14 0 0 0 1 14 0 0 0 0 > MatMult 1 1.0 1.3781e-05 1.0 5.70e+01 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 19 0 0 0 0 19 0 0 0 4 > MatSolve 2 1.0 7.4019e-04 1.0 1.14e+02 1.0 0.0e+00 0.0e+00 > 0.0e+00 5 38 0 0 0 5 38 0 0 0 0 > MatLUFactorNum 1 1.0 2.8001e-05 1.0 1.90e+01 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 6 0 0 0 0 6 0 0 0 1 > MatILUFactorSym 1 1.0 9.1556e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > MatAssemblyBegin 2 1.0 7.7938e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 2 1.0 4.5131e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRowIJ 1 1.0 4.0429e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.7907e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 > PCSetUp 1 1.0 5.8597e-04 1.0 1.90e+01 1.0 0.0e+00 0.0e+00 > 0.0e+00 4 6 0 0 0 4 6 0 0 0 0 > PCApply 2 1.0 7.8497e-04 1.0 1.14e+02 1.0 0.0e+00 0.0e+00 > 0.0e+00 6 38 0 0 0 6 38 0 0 0 0 > ------------------------------------------------------------ > ------------------------------------------------------------ > > Memory usage is given in bytes: > > Object Type Creations Destructions Memory Descendants' Mem. 
> Reports information only for process 0. > > --- Event Stage 0: Main Stage > > Krylov Solver 1 1 18408 0. > DMKSP interface 1 1 648 0. > Vector 12 12 19224 0. > Vector Scatter 2 2 1312 0. > Matrix 2 2 7380 0. > Distributed Mesh 3 3 14960 0. > Index Set 7 7 5632 0. > IS L to G Mapping 2 2 1368 0. > Star Forest Bipartite Graph 6 6 4864 0. > Discrete System 3 3 2596 0. > Preconditioner 1 1 1000 0. > Viewer 1 0 0 0. > ============================================================ > ============================================================ > Average time to get PetscTime(): 4.50294e-08 > #PETSc Option Table entries: > -log_view > #End of PETSc Option Table entries > Compiled without FORTRAN kernels > Compiled with full precision matrices (default) > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > Configure options: --download-hypre=1 --with-ssl=0 --with-debugging=yes > --with-pic=1 --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx > --with-fc=mpif90 --download-fblaslapack=1 --download-metis=1 > --download-parmetis=1 --download-superlu_dist=1 --download-scalapack=1 > --download-mumps=1 CC=mpicc CXX=mpicxx FC=mpif90 F77=mpif77 F90=mpif90 > CFLAGS="-fPIC -fopenmp" CXXFLAGS="-fPIC -fopenmp" FFLAGS="-fPIC -fopenmp" > FCFLAGS="-fPIC -fopenmp" F90FLAGS="-fPIC -fopenmp" F77FLAGS="-fPIC > -fopenmp" PETSC_ARCH=arch-darwin-c-debug-master > ----------------------------------------- > Libraries compiled on Wed May 3 11:04:44 2017 on FN604208 > Machine characteristics: Darwin-15.5.0-x86_64-i386-64bit > Using PETSc directory: /Users/kongf/projects/petsc > Using PETSc arch: arch-darwin-c-debug-master > ----------------------------------------- > > Using C compiler: mpicc -fPIC -fopenmp -g3 ${COPTFLAGS} ${CFLAGS} > Using Fortran compiler: mpif90 -fPIC -fopenmp -g ${FOPTFLAGS} ${FFLAGS} > ----------------------------------------- > > Using include paths: -I/Users/kongf/projects/petsc/ > arch-darwin-c-debug-master/include -I/Users/kongf/projects/petsc/include > -I/Users/kongf/projects/petsc/include -I/Users/kongf/projects/petsc/ > arch-darwin-c-debug-master/include -I/opt/X11/include > ----------------------------------------- > > Using C linker: mpicc > Using Fortran linker: mpif90 > Using libraries: -Wl,-rpath,/Users/kongf/projects/petsc/arch-darwin-c-debug-master/lib > -L/Users/kongf/projects/petsc/arch-darwin-c-debug-master/lib -lpetsc > -Wl,-rpath,/Users/kongf/projects/petsc/arch-darwin-c-debug-master/lib > -L/Users/kongf/projects/petsc/arch-darwin-c-debug-master/lib > -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -Wl,-rpath,/opt/moose/mpich/mpich-3.2/clang-opt/lib > -L/opt/moose/mpich/mpich-3.2/clang-opt/lib -Wl,-rpath,/opt/moose/llvm-3.9.0/lib > -L/opt/moose/llvm-3.9.0/lib -Wl,-rpath,/opt/moose/llvm-3.9.0/lib/clang/3.9.0/lib/darwin > -L/opt/moose/llvm-3.9.0/lib/clang/3.9.0/lib/darwin > -Wl,-rpath,/opt/moose/gcc-6.2.0/lib/gcc/x86_64-apple-darwin15.6.0/6.2.0 > -L/opt/moose/gcc-6.2.0/lib/gcc/x86_64-apple-darwin15.6.0/6.2.0 > -Wl,-rpath,/opt/moose/gcc-6.2.0/lib -L/opt/moose/gcc-6.2.0/lib > -Wl,-rpath,/opt/moose/llvm-3.9.0/bin/../lib/clang/3.9.0/lib/darwin > -L/opt/moose/llvm-3.9.0/bin/../lib/clang/3.9.0/lib/darwin -lsuperlu_dist > -lHYPRE -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord > -lscalapack -lflapack -lfblas -lparmetis -lmetis -lX11 -lclang_rt.osx > -lmpifort -lgfortran -lgomp -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx > -lmpicxx -lc++ -lclang_rt.osx -ldl -lmpi -lpmpi -lomp -lSystem > -lclang_rt.osx -ldl > 
-----------------------------------------
>
> ##########################################################
> #                                                        #
> #                       WARNING!!!                       #
> #                                                        #
> #   This code was compiled with a debugging option,      #
> #   To get timing results run ./configure                #
> #   using --with-debugging=no, the performance will      #
> #   be generally two or three times faster.              #
> #                                                        #
> ##########################################################
>

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mfadams at lbl.gov Wed May 3 14:08:03 2017
From: mfadams at lbl.gov (Mark Adams)
Date: Wed, 3 May 2017 15:08:03 -0400
Subject: [petsc-users] GAMG scaling
In-Reply-To: 
References: 
Message-ID: 

Hong, the input files do not seem to be accessible. What are the command line options? (I don't see a "rap" or "scale" in the source).

On Wed, May 3, 2017 at 12:17 PM, Hong wrote:
> Mark,
> Below is the copy of my email sent to you on Feb 27:
>
> I implemented scalable MatPtAP and did comparisons of three
> implementations using ex56.c on alcf cetus machine (this machine has
> small memory, 1GB/core):
> - nonscalable PtAP: use an array of length PN to do dense axpy
> - scalable PtAP: do sparse axpy without use of PN array
> - hypre PtAP.
>
> The results are attached. Summary:
> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
> - scalable PtAP is 4x faster than hypre PtAP
> - hypre uses less memory (see job.ne399.n63.np1000.sh)
>
> Based on above observation, I set the default PtAP algorithm as
> 'nonscalable'.
> When PN > local estimated nonzero of C=PtAP, then switch default to
> 'scalable'.
> User can overwrite default.
>
> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
> MatPtAP             3.6224e+01 (nonscalable for small mats, scalable for larger ones)
> scalable MatPtAP    4.6129e+01
> hypre               1.9389e+02
>
> This work in on petsc-master. Give it a try. If you encounter any problem,
> let me know.
>
> Hong
>
> On Wed, May 3, 2017 at 10:01 AM, Mark Adams wrote:
>
>> (Hong), what is the current state of optimizing RAP for scaling?
>>
>> Nate, is driving 3D elasticity problems at scaling with GAMG and we are
>> working out performance problems. They are hitting problems at ~1.5B dof
>> problems on a basic Cray (XC30 I think).
>>
>> Thanks,
>> Mark
>>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hgbk2008 at gmail.com Wed May 3 16:19:01 2017
From: hgbk2008 at gmail.com (Hoang Giang Bui)
Date: Wed, 3 May 2017 23:19:01 +0200
Subject: [petsc-users] strange convergence
In-Reply-To: 
References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> <87wpa3wd5j.fsf@jedbrown.org>
Message-ID: 

Hi Lukasz,

Thanks for sharing the very interesting slides. Both of you are right: the mortar method starts from a continuum argument and is then reduced to a discrete space by discretizing the Lagrange multiplier. However, the way the interpolation space is chosen has implications for the properties of the mortar matrices. For example, the dual mortar space can help to reduce the multipliers by static condensation, but it creates some numerical oscillation. In my opinion it is not stable, even though a very sound theoretical foundation has been developed.
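A minimal LaTeX sketch of the static condensation referred to here, under the usual dual-mortar assumption that a biorthogonal multiplier basis makes the slave-side mortar matrix diagonal; the symbols D, M, u_s, u_m are assumptions for illustration and are not taken from this thread:

  % Discrete mortar tying constraint (assumed notation):  D u_s - M u_m = 0.
  % With dual (biorthogonal) multipliers D is diagonal, so
  \[
    u_s = D^{-1} M\, u_m ,
  \]
  % and the multipliers can likewise be recovered locally from the slave-side
  % equilibrium rows, i.e. they need not be kept as global unknowns.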
Both the standard and the dual mortar approach have drawbacks for high-order contact, because negative shape-function values can create spurious negative nodal gaps. How do you cope with that case in your code? However, this question may be a bit off-topic. Coming back to the main question: for mesh "gluing" using the mortar method, the Schur matrix is S = [0 -D^T M^T] A^-1 [0 D M]^T, which has the form A_10 (A_00)^-1 A_01 since A_11 = 0. The magnitude of S (~E^-1) is too small compared to A_00 (which is ~E for elasticity). I think in some cases it is also rank deficient if three Lagrange multipliers are used per node (S is very ill-conditioned even though A_00 is well-conditioned). I'm skeptical here: do you really solve the mortar system with a Schur complement? Giang On Wed, May 3, 2017 at 2:55 PM, Lukasz Kaczmarczyk < Lukasz.Kaczmarczyk at glasgow.ac.uk> wrote: > > On 3 May 2017, at 13:22, Matthew Knepley wrote: > > On Wed, May 3, 2017 at 2:29 AM, Hoang Giang Bui wrote > : > >> Dear Jed >> >> If I understood you correctly you suggest to avoid penalty by using the >> Lagrange multiplier for the mortar constraint? In this case it leads to the >> use of discrete Lagrange multiplier space. >> > > Sorry for being ignorant here, but why is the space "discrete"? It looks > like you should have a continuum formulation > of the mortar as well. Maybe I do not understand something fundamental. > From this (https://en.wikipedia.org/wiki/Mortar_methods) > short description, it seems that mortars begin from a continuum > formulation, but are then reduced to the discrete level. This is no > problem if done consistently, as for instance in the FETI method where > efficient preconditioners exist. > > > Hello, > > I copied the wrong link to mortar method, how we implemented it, see > presentation http://doi.org/10.5281/zenodo.556996 > > You right that we always start from continuum formulation, on this we > apply some discretisation, at the end Lagrange multiplier is expressed by a > finite vector of discrete unknowns. It is better to formulate problem first > for the continuum; you have better control on what you are doing and > stability of the solution. > > Of course, you can add some constraints at the discreet level, after you > discretised problem, but implicitly you have some continuous space for > Lagrange multipliers, which is associated with shape functions which you > use to discretise problem. > > In our problem which we have, we try to avoid rebuilding of the system of > equations each time contact area is changing. We going to construct DM > sub-problem for each body in contact, each sub-problem going to be solved > using MG (adjacency for those matrices is fixed in time). All will go to > put in nested matrix with the separate block for Lagrange multipliers > (adjacency will change in each time step). For solving Lagrange > multipliers we going to use FIELDSPLIT using Schur complement. I need to > look more detail to FETI method, at are still at development stage for > contact problem and direct solver works, for now, small problems at that > point. > > In our code, we using higher order elements with hierarchical base, for > this we using specialise MG solver, as you can see here, it works pretty > well for moderate size problems, <100M > http://mofem.eng.gla.ac.uk/mofem/html/_p_c_m_g_set_up_via_ap > prox_orders_8cpp.html > > Regards, > Lukasz > > > > Thanks, > > Matt > > >> Do you or anyone already have experience using discrete Lagrange >> multiplier space with Petsc?
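A minimal sketch of the kind of PCFIELDSPLIT/Schur-complement setup being discussed, in runtime options (the split names u and lm are placeholders for however the index sets are registered, and selfp only builds a rough Schur preconditioner from A_10 diag(A_00)^-1 A_01):

-pc_type fieldsplit -pc_fieldsplit_type schur
-pc_fieldsplit_schur_fact_type full
-pc_fieldsplit_schur_precondition selfp
-fieldsplit_u_ksp_type preonly -fieldsplit_u_pc_type lu
-fieldsplit_lm_ksp_type gmres -fieldsplit_lm_ksp_rtol 1e-8

When the zero block is the multiplier block, -pc_fieldsplit_detect_saddle_point can also be used to let PETSc pick the splits.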
>> >> There is also similar question on stackexchange >> https://scicomp.stackexchange.com/questions/25113/preconditi >> oners-and-discrete-lagrange-multipliers >> >> Giang >> >> On Sat, Apr 29, 2017 at 3:34 PM, Jed Brown wrote: >> >>> Hoang Giang Bui writes: >>> >>> > Hi Barry >>> > >>> > The first block is from a standard solid mechanics discretization >>> based on >>> > balance of momentum equation. There is some material involved but in >>> > principal it's well-posed elasticity equation with positive definite >>> > tangent operator. The "gluing business" uses the mortar method to keep >>> the >>> > continuity of displacement. Instead of using Lagrange multiplier to >>> treat >>> > the constraint I used penalty method to penalize the energy. The >>> > discretization form of mortar is quite simple >>> > >>> > \int_{\Gamma_1} { rho * (\delta u_1 - \delta u_2) * (u_1 - u_2) dA } >>> > >>> > rho is penalty parameter. In the simulation I initially set it low >>> (~E) to >>> > preserve the conditioning of the system. >>> >>> There are two things that can go wrong here with AMG: >>> >>> * The penalty term can mess up the strength of connection heuristics >>> such that you get poor choice of C-points (classical AMG like >>> BoomerAMG) or poor choice of aggregates (smoothed aggregation). >>> >>> * The penalty term can prevent Jacobi smoothing from being effective; in >>> this case, it can lead to poor coarse basis functions (higher energy >>> than they should be) and poor smoothing in an MG cycle. You can fix >>> the poor smoothing in the MG cycle by using a stronger smoother, like >>> ASM with some overlap. >>> >>> I'm generally not a fan of penalty methods due to the irritating >>> tradeoffs and often poor solver performance. >>> >>> > In the figure below, the colorful blocks are u_1 and the base is u_2. >>> Both >>> > u_1 and u_2 use isoparametric quadratic approximation. >>> > >>> > ? >>> > Snapshot.png >>> > >> U/view?usp=drive_web> >>> > ??? >>> > >>> > Giang >>> > >>> > On Fri, Apr 28, 2017 at 6:21 PM, Barry Smith >>> wrote: >>> > >>> >> >>> >> Ok, so boomerAMG algebraic multigrid is not good for the first >>> block. >>> >> You mentioned the first block has two things glued together? AMG is >>> >> fantastic for certain problems but doesn't work for everything. >>> >> >>> >> Tell us more about the first block, what PDE it comes from, what >>> >> discretization, and what the "gluing business" is and maybe we'll have >>> >> suggestions for how to precondition it. >>> >> >>> >> Barry >>> >> >>> >> > On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui >>> wrote: >>> >> > >>> >> > It's in fact quite good >>> >> > >>> >> > Residual norms for fieldsplit_u_ solve. >>> >> > 0 KSP Residual norm 4.014715925568e+00 >>> >> > 1 KSP Residual norm 2.160497019264e-10 >>> >> > Residual norms for fieldsplit_wp_ solve. >>> >> > 0 KSP Residual norm 0.000000000000e+00 >>> >> > 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm >>> >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >>> >> > Residual norms for fieldsplit_u_ solve. >>> >> > 0 KSP Residual norm 9.999999999416e-01 >>> >> > 1 KSP Residual norm 7.118380416383e-11 >>> >> > Residual norms for fieldsplit_wp_ solve. 
>>> >> > 0 KSP Residual norm 0.000000000000e+00 >>> >> > 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm >>> >> 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 >>> >> > Linear solve converged due to CONVERGED_ATOL iterations 1 >>> >> > >>> >> > Giang >>> >> > >>> >> > On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith >>> wrote: >>> >> > >>> >> > Run again using LU on both blocks to see what happens. >>> >> > >>> >> > >>> >> > > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui >>> >> wrote: >>> >> > > >>> >> > > I have changed the way to tie the nonconforming mesh. It seems the >>> >> matrix now is better >>> >> > > >>> >> > > with -pc_type lu the output is >>> >> > > 0 KSP preconditioned resid norm 3.308678584240e-01 true resid >>> norm >>> >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >>> >> > > 1 KSP preconditioned resid norm 2.004313395301e-12 true resid >>> norm >>> >> 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 >>> >> > > Linear solve converged due to CONVERGED_ATOL iterations 1 >>> >> > > >>> >> > > >>> >> > > with -pc_type fieldsplit -fieldsplit_u_pc_type hypre >>> >> -fieldsplit_wp_pc_type lu the convergence is slow >>> >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid >>> norm >>> >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >>> >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid >>> norm >>> >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >>> >> > > ... >>> >> > > 824 KSP preconditioned resid norm 1.018542387738e-09 true resid >>> norm >>> >> 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 >>> >> > > 825 KSP preconditioned resid norm 9.743727947637e-10 true resid >>> norm >>> >> 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 >>> >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >>> >> > > >>> >> > > checking with additional -fieldsplit_u_ksp_type richardson >>> >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >>> >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >>> >> -fieldsplit_wp_ksp_max_it 1 gives >>> >> > > >>> >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid >>> norm >>> >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >>> >> > > Residual norms for fieldsplit_u_ solve. >>> >> > > 0 KSP Residual norm 5.803507549280e-01 >>> >> > > 1 KSP Residual norm 2.069538175950e-01 >>> >> > > Residual norms for fieldsplit_wp_ solve. >>> >> > > 0 KSP Residual norm 0.000000000000e+00 >>> >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid >>> norm >>> >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >>> >> > > Residual norms for fieldsplit_u_ solve. >>> >> > > 0 KSP Residual norm 7.831796195225e-01 >>> >> > > 1 KSP Residual norm 1.734608520110e-01 >>> >> > > Residual norms for fieldsplit_wp_ solve. >>> >> > > 0 KSP Residual norm 0.000000000000e+00 >>> >> > > .... >>> >> > > 823 KSP preconditioned resid norm 1.065070135605e-09 true resid >>> norm >>> >> 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 >>> >> > > Residual norms for fieldsplit_u_ solve. >>> >> > > 0 KSP Residual norm 6.113806394327e-01 >>> >> > > 1 KSP Residual norm 1.535465290944e-01 >>> >> > > Residual norms for fieldsplit_wp_ solve. >>> >> > > 0 KSP Residual norm 0.000000000000e+00 >>> >> > > 824 KSP preconditioned resid norm 1.018542387746e-09 true resid >>> norm >>> >> 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 >>> >> > > Residual norms for fieldsplit_u_ solve. 
>>> >> > > 0 KSP Residual norm 6.123437055586e-01 >>> >> > > 1 KSP Residual norm 1.524661826133e-01 >>> >> > > Residual norms for fieldsplit_wp_ solve. >>> >> > > 0 KSP Residual norm 0.000000000000e+00 >>> >> > > 825 KSP preconditioned resid norm 9.743727947718e-10 true resid >>> norm >>> >> 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 >>> >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >>> >> > > >>> >> > > >>> >> > > The residual for wp block is zero since in this first step the >>> rhs is >>> >> zero. As can see in the output, the multigrid does not perform well to >>> >> reduce the residual in the sub-solve. Is my observation right? what >>> can be >>> >> done to improve this? >>> >> > > >>> >> > > >>> >> > > Giang >>> >> > > >>> >> > > On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith >> > >>> >> wrote: >>> >> > > >>> >> > > This can happen in the matrix is singular or nearly singular >>> or if >>> >> the factorization generates small pivots, which can occur for even >>> >> nonsingular problems if the matrix is poorly scaled or just plain >>> nasty. >>> >> > > >>> >> > > >>> >> > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui < >>> hgbk2008 at gmail.com> >>> >> wrote: >>> >> > > > >>> >> > > > It took a while, here I send you the output >>> >> > > > >>> >> > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid >>> norm >>> >> 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >>> >> > > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid >>> norm >>> >> 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 >>> >> > > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid >>> norm >>> >> 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 >>> >> > > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid >>> norm >>> >> 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 >>> >> > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >>> >> > > > KSP Object: 4 MPI processes >>> >> > > > type: gmres >>> >> > > > GMRES: restart=1000, using Modified Gram-Schmidt >>> >> Orthogonalization >>> >> > > > GMRES: happy breakdown tolerance 1e-30 >>> >> > > > maximum iterations=1000, initial guess is zero >>> >> > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >>> >> > > > left preconditioning >>> >> > > > using PRECONDITIONED norm type for convergence test >>> >> > > > PC Object: 4 MPI processes >>> >> > > > type: lu >>> >> > > > LU: out-of-place factorization >>> >> > > > tolerance for zero pivot 2.22045e-14 >>> >> > > > matrix ordering: natural >>> >> > > > factor fill ratio given 0, needed 0 >>> >> > > > Factored matrix follows: >>> >> > > > Mat Object: 4 MPI processes >>> >> > > > type: mpiaij >>> >> > > > rows=973051, cols=973051 >>> >> > > > package used to perform factorization: pastix >>> >> > > > Error : 3.24786e-14 >>> >> > > > total: nonzeros=0, allocated nonzeros=0 >>> >> > > > total number of mallocs used during MatSetValues >>> calls =0 >>> >> > > > PaStiX run parameters: >>> >> > > > Matrix type : Unsymmetric >>> >> > > > Level of printing (0,1,2): 0 >>> >> > > > Number of refinements iterations : 3 >>> >> > > > Error : 3.24786e-14 >>> >> > > > linear system matrix = precond matrix: >>> >> > > > Mat Object: 4 MPI processes >>> >> > > > type: mpiaij >>> >> > > > rows=973051, cols=973051 >>> >> > > > Error : 3.24786e-14 >>> >> > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >>> >> > > > total number of mallocs used during MatSetValues calls =0 >>> >> > > > using I-node (on 
process 0) routines: found 78749 nodes, >>> limit >>> >> used is 5 >>> >> > > > Error : 3.24786e-14 >>> >> > > > >>> >> > > > It doesn't do as you said. Something is not right here. I will >>> look >>> >> in depth. >>> >> > > > >>> >> > > > Giang >>> >> > > > >>> >> > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith < >>> bsmith at mcs.anl.gov> >>> >> wrote: >>> >> > > > >>> >> > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui < >>> hgbk2008 at gmail.com> >>> >> wrote: >>> >> > > > > >>> >> > > > > Good catch. I get this for the very first step, maybe at that >>> time >>> >> the rhs_w is zero. >>> >> > > > >>> >> > > > With the multiplicative composition the right hand side of >>> the >>> >> second solve is the initial right hand side of the second solve minus >>> >> A_10*x where x is the solution to the first sub solve and A_10 is the >>> lower >>> >> left block of the outer matrix. So unless both the initial right hand >>> side >>> >> has a zero for the second block and A_10 is identically zero the >>> right hand >>> >> side for the second sub solve should not be zero. Is A_10 == 0? >>> >> > > > >>> >> > > > >>> >> > > > > In the later step, it shows 2 step convergence >>> >> > > > > >>> >> > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > 0 KSP Residual norm 3.165886479830e+04 >>> >> > > > > 1 KSP Residual norm 2.905922877684e-01 >>> >> > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > 0 KSP Residual norm 2.397669419027e-01 >>> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >>> >> > > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true >>> resid >>> >> norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 >>> >> > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > 0 KSP Residual norm 9.999891813771e-01 >>> >> > > > > 1 KSP Residual norm 1.512000395579e-05 >>> >> > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > 0 KSP Residual norm 8.192702188243e-06 >>> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >>> >> > > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true >>> resid >>> >> norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 >>> >> > > > >>> >> > > > The outer residual norms are still wonky, the preconditioned >>> >> residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 >>> which is a >>> >> huge drop but the 7.963616922323e+05 drops very much less >>> >> 7.135927677844e+04. This is not normal. >>> >> > > > >>> >> > > > What if you just use -pc_type lu for the entire system (no >>> >> fieldsplit), does the true residual drop to almost zero in the first >>> >> iteration (as it should?). Send the output. >>> >> > > > >>> >> > > > >>> >> > > > >>> >> > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > 0 KSP Residual norm 6.946213936597e-01 >>> >> > > > > 1 KSP Residual norm 1.195514007343e-05 >>> >> > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > 0 KSP Residual norm 1.025694497535e+00 >>> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >>> >> > > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true >>> resid >>> >> norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 >>> >> > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > 0 KSP Residual norm 7.255149996405e-01 >>> >> > > > > 1 KSP Residual norm 6.583512434218e-06 >>> >> > > > > Residual norms for fieldsplit_wp_ solve. 
>>> >> > > > > 0 KSP Residual norm 1.015229700337e+00 >>> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >>> >> > > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true >>> resid >>> >> norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 >>> >> > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > 0 KSP Residual norm 3.512243341400e-01 >>> >> > > > > 1 KSP Residual norm 2.032490351200e-06 >>> >> > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > 0 KSP Residual norm 1.282327290982e+00 >>> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >>> >> > > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true >>> resid >>> >> norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 >>> >> > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > 0 KSP Residual norm 3.423609338053e-01 >>> >> > > > > 1 KSP Residual norm 4.213703301972e-07 >>> >> > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > 0 KSP Residual norm 1.157384757538e+00 >>> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >>> >> > > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true >>> resid >>> >> norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 >>> >> > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > 0 KSP Residual norm 3.838596289995e-01 >>> >> > > > > 1 KSP Residual norm 9.927864176103e-08 >>> >> > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > 0 KSP Residual norm 1.066298905618e+00 >>> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >>> >> > > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true >>> resid >>> >> norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 >>> >> > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > 0 KSP Residual norm 4.624964188094e-01 >>> >> > > > > 1 KSP Residual norm 6.418229775372e-08 >>> >> > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > 0 KSP Residual norm 9.800784311614e-01 >>> >> > > > > 1 KSP Residual norm 0.000000000000e+00 >>> >> > > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true >>> resid >>> >> norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 >>> >> > > > > Linear solve converged due to CONVERGED_ATOL iterations 7 >>> >> > > > > >>> >> > > > > The outer operator is an explicit matrix. >>> >> > > > > >>> >> > > > > Giang >>> >> > > > > >>> >> > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith < >>> bsmith at mcs.anl.gov> >>> >> wrote: >>> >> > > > > >>> >> > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui < >>> hgbk2008 at gmail.com> >>> >> wrote: >>> >> > > > > > >>> >> > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better >>> >> convergence. I still used 4 procs though, probably with 1 proc it >>> should >>> >> also be the same. >>> >> > > > > > >>> >> > > > > > The u block used a Nitsche-type operator to connect two >>> >> non-matching domains. I don't think it will leave some rigid body >>> motion >>> >> leads to not sufficient constraints. Maybe you have other idea? >>> >> > > > > > >>> >> > > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > > 0 KSP Residual norm 3.129067184300e+05 >>> >> > > > > > 1 KSP Residual norm 5.906261468196e-01 >>> >> > > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >>> >> > > > > >>> >> > > > > ^^^^ something is wrong here. The sub solve should not be >>> >> starting with a 0 residual (this means the right hand side for this >>> sub >>> >> solve is zero which it should not be). 
>>> >> > > > > >>> >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 >>> >> > > > > >>> >> > > > > >>> >> > > > > How are you providing the outer operator? As an explicit >>> matrix >>> >> or with some shell matrix? >>> >> > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true >>> resid >>> >> norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >>> >> > > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > > 0 KSP Residual norm 9.999955993437e-01 >>> >> > > > > > 1 KSP Residual norm 4.019774691831e-06 >>> >> > > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >>> >> > > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true >>> resid >>> >> norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 >>> >> > > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > > 0 KSP Residual norm 1.000012180204e+00 >>> >> > > > > > 1 KSP Residual norm 1.017367950422e-05 >>> >> > > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >>> >> > > > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true >>> resid >>> >> norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 >>> >> > > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > > 0 KSP Residual norm 1.000004200085e+00 >>> >> > > > > > 1 KSP Residual norm 6.231613102458e-06 >>> >> > > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >>> >> > > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true >>> resid >>> >> norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 >>> >> > > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >>> >> > > > > > KSP Object: 4 MPI processes >>> >> > > > > > type: gmres >>> >> > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >>> >> Orthogonalization >>> >> > > > > > GMRES: happy breakdown tolerance 1e-30 >>> >> > > > > > maximum iterations=1000, initial guess is zero >>> >> > > > > > tolerances: relative=1e-20, absolute=1e-09, >>> divergence=10000 >>> >> > > > > > left preconditioning >>> >> > > > > > using PRECONDITIONED norm type for convergence test >>> >> > > > > > PC Object: 4 MPI processes >>> >> > > > > > type: fieldsplit >>> >> > > > > > FieldSplit with MULTIPLICATIVE composition: total >>> splits = 2 >>> >> > > > > > Solver info for each split is in the following KSP >>> objects: >>> >> > > > > > Split number 0 Defined by IS >>> >> > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >>> >> > > > > > type: richardson >>> >> > > > > > Richardson: damping factor=1 >>> >> > > > > > maximum iterations=1, initial guess is zero >>> >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >>> >> divergence=10000 >>> >> > > > > > left preconditioning >>> >> > > > > > using PRECONDITIONED norm type for convergence test >>> >> > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >>> >> > > > > > type: lu >>> >> > > > > > LU: out-of-place factorization >>> >> > > > > > tolerance for zero pivot 2.22045e-14 >>> >> > > > > > matrix ordering: natural >>> >> > > > > > factor fill ratio given 0, needed 0 >>> >> > > > > > Factored matrix follows: >>> >> > > > > > Mat Object: 4 MPI processes >>> >> > > > > > type: mpiaij >>> >> > > > > > rows=938910, cols=938910 >>> >> > > > > > package used to perform factorization: pastix >>> >> > > > > > total: nonzeros=0, allocated nonzeros=0 >>> >> > > > > > Error : 3.36878e-14 
>>> >> > > > > > total number of mallocs used during MatSetValues >>> calls >>> >> =0 >>> >> > > > > > PaStiX run parameters: >>> >> > > > > > Matrix type : >>> Unsymmetric >>> >> > > > > > Level of printing (0,1,2): 0 >>> >> > > > > > Number of refinements iterations : 3 >>> >> > > > > > Error : 3.36878e-14 >>> >> > > > > > linear system matrix = precond matrix: >>> >> > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes >>> >> > > > > > type: mpiaij >>> >> > > > > > rows=938910, cols=938910, bs=3 >>> >> > > > > > Error : 3.36878e-14 >>> >> > > > > > Error : 3.36878e-14 >>> >> > > > > > total: nonzeros=8.60906e+07, allocated >>> >> nonzeros=8.60906e+07 >>> >> > > > > > total number of mallocs used during MatSetValues >>> calls =0 >>> >> > > > > > using I-node (on process 0) routines: found 78749 >>> >> nodes, limit used is 5 >>> >> > > > > > Split number 1 Defined by IS >>> >> > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >>> >> > > > > > type: richardson >>> >> > > > > > Richardson: damping factor=1 >>> >> > > > > > maximum iterations=1, initial guess is zero >>> >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >>> >> divergence=10000 >>> >> > > > > > left preconditioning >>> >> > > > > > using PRECONDITIONED norm type for convergence test >>> >> > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >>> >> > > > > > type: lu >>> >> > > > > > LU: out-of-place factorization >>> >> > > > > > tolerance for zero pivot 2.22045e-14 >>> >> > > > > > matrix ordering: natural >>> >> > > > > > factor fill ratio given 0, needed 0 >>> >> > > > > > Factored matrix follows: >>> >> > > > > > Mat Object: 4 MPI processes >>> >> > > > > > type: mpiaij >>> >> > > > > > rows=34141, cols=34141 >>> >> > > > > > package used to perform factorization: pastix >>> >> > > > > > Error : -nan >>> >> > > > > > Error : -nan >>> >> > > > > > Error : -nan >>> >> > > > > > total: nonzeros=0, allocated nonzeros=0 >>> >> > > > > > total number of mallocs used during >>> MatSetValues >>> >> calls =0 >>> >> > > > > > PaStiX run parameters: >>> >> > > > > > Matrix type : >>> Symmetric >>> >> > > > > > Level of printing (0,1,2): 0 >>> >> > > > > > Number of refinements iterations : 0 >>> >> > > > > > Error : -nan >>> >> > > > > > linear system matrix = precond matrix: >>> >> > > > > > Mat Object: (fieldsplit_wp_) 4 MPI >>> processes >>> >> > > > > > type: mpiaij >>> >> > > > > > rows=34141, cols=34141 >>> >> > > > > > total: nonzeros=485655, allocated nonzeros=485655 >>> >> > > > > > total number of mallocs used during MatSetValues >>> calls =0 >>> >> > > > > > not using I-node (on process 0) routines >>> >> > > > > > linear system matrix = precond matrix: >>> >> > > > > > Mat Object: 4 MPI processes >>> >> > > > > > type: mpiaij >>> >> > > > > > rows=973051, cols=973051 >>> >> > > > > > total: nonzeros=9.90037e+07, allocated >>> nonzeros=9.90037e+07 >>> >> > > > > > total number of mallocs used during MatSetValues calls >>> =0 >>> >> > > > > > using I-node (on process 0) routines: found 78749 >>> nodes, >>> >> limit used is 5 >>> >> > > > > > >>> >> > > > > > >>> >> > > > > > >>> >> > > > > > Giang >>> >> > > > > > >>> >> > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith < >>> >> bsmith at mcs.anl.gov> wrote: >>> >> > > > > > >>> >> > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui < >>> >> hgbk2008 at gmail.com> wrote: >>> >> > > > > > > >>> >> > > > > > > Dear Matt/Barry >>> >> > > > > > > >>> >> > > > > > > With your options, it results in >>> >> > > > > > > >>> >> > > > > > > 0 KSP 
preconditioned resid norm 1.106709687386e+31 true >>> >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >>> >> > > > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > > > 0 KSP Residual norm 2.407308987203e+36 >>> >> > > > > > > 1 KSP Residual norm 5.797185652683e+72 >>> >> > > > > > >>> >> > > > > > It looks like Matt is right, hypre is seemly producing >>> useless >>> >> garbage. >>> >> > > > > > >>> >> > > > > > First how do things run on one process. If you have similar >>> >> problems then debug on one process (debugging any kind of problem is >>> always >>> >> far easy on one process). >>> >> > > > > > >>> >> > > > > > First run with -fieldsplit_u_type lu (instead of using >>> hypre) to >>> >> see if that works or also produces something bad. >>> >> > > > > > >>> >> > > > > > What is the operator and the boundary conditions for u? It >>> could >>> >> be singular. >>> >> > > > > > >>> >> > > > > > >>> >> > > > > > >>> >> > > > > > >>> >> > > > > > >>> >> > > > > > >>> >> > > > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >>> >> > > > > > > ... >>> >> > > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true >>> >> resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 >>> >> > > > > > > Residual norms for fieldsplit_u_ solve. >>> >> > > > > > > 0 KSP Residual norm 1.533726746719e+36 >>> >> > > > > > > 1 KSP Residual norm 3.692757392261e+72 >>> >> > > > > > > Residual norms for fieldsplit_wp_ solve. >>> >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >>> >> > > > > > > >>> >> > > > > > > Do you suggest that the pastix solver for the "wp" block >>> >> encounters small pivot? In addition, seem like the "u" block is also >>> >> singular. >>> >> > > > > > > >>> >> > > > > > > Giang >>> >> > > > > > > >>> >> > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith < >>> >> bsmith at mcs.anl.gov> wrote: >>> >> > > > > > > >>> >> > > > > > > Huge preconditioned norms but normal unpreconditioned >>> norms >>> >> almost always come from a very small pivot in an LU or ILU >>> factorization. >>> >> > > > > > > >>> >> > > > > > > The first thing to do is monitor the two sub solves. >>> Run >>> >> with the additional options -fieldsplit_u_ksp_type richardson >>> >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >>> >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >>> >> -fieldsplit_wp_ksp_max_it 1 >>> >> > > > > > > >>> >> > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < >>> >> hgbk2008 at gmail.com> wrote: >>> >> > > > > > > > >>> >> > > > > > > > Hello >>> >> > > > > > > > >>> >> > > > > > > > I encountered a strange convergence behavior that I have >>> >> trouble to understand >>> >> > > > > > > > >>> >> > > > > > > > KSPSetFromOptions completed >>> >> > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 >>> true >>> >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >>> >> > > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 >>> true >>> >> resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 >>> >> > > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 >>> true >>> >> resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 >>> >> > > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 >>> true >>> >> resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 >>> >> > > > > > > > ..... 
>>> >> > > > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 >>> true >>> >> resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 >>> >> > > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 >>> true >>> >> resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 >>> >> > > > > > > > Linear solve did not converge due to DIVERGED_ITS >>> iterations >>> >> 1000 >>> >> > > > > > > > KSP Object: 4 MPI processes >>> >> > > > > > > > type: gmres >>> >> > > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >>> >> Orthogonalization >>> >> > > > > > > > GMRES: happy breakdown tolerance 1e-30 >>> >> > > > > > > > maximum iterations=1000, initial guess is zero >>> >> > > > > > > > tolerances: relative=1e-20, absolute=1e-09, >>> >> divergence=10000 >>> >> > > > > > > > left preconditioning >>> >> > > > > > > > using PRECONDITIONED norm type for convergence test >>> >> > > > > > > > PC Object: 4 MPI processes >>> >> > > > > > > > type: fieldsplit >>> >> > > > > > > > FieldSplit with MULTIPLICATIVE composition: total >>> splits >>> >> = 2 >>> >> > > > > > > > Solver info for each split is in the following KSP >>> >> objects: >>> >> > > > > > > > Split number 0 Defined by IS >>> >> > > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >>> >> > > > > > > > type: preonly >>> >> > > > > > > > maximum iterations=10000, initial guess is zero >>> >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >>> >> divergence=10000 >>> >> > > > > > > > left preconditioning >>> >> > > > > > > > using NONE norm type for convergence test >>> >> > > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >>> >> > > > > > > > type: hypre >>> >> > > > > > > > HYPRE BoomerAMG preconditioning >>> >> > > > > > > > HYPRE BoomerAMG: Cycle type V >>> >> > > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 >>> >> > > > > > > > HYPRE BoomerAMG: Maximum number of iterations >>> PER >>> >> hypre call 1 >>> >> > > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre >>> >> call 0 >>> >> > > > > > > > HYPRE BoomerAMG: Threshold for strong coupling >>> 0.6 >>> >> > > > > > > > HYPRE BoomerAMG: Interpolation truncation >>> factor 0 >>> >> > > > > > > > HYPRE BoomerAMG: Interpolation: max elements >>> per row >>> >> 0 >>> >> > > > > > > > HYPRE BoomerAMG: Number of levels of aggressive >>> >> coarsening 0 >>> >> > > > > > > > HYPRE BoomerAMG: Number of paths for aggressive >>> >> coarsening 1 >>> >> > > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 >>> >> > > > > > > > HYPRE BoomerAMG: Sweeps down 1 >>> >> > > > > > > > HYPRE BoomerAMG: Sweeps up 1 >>> >> > > > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 >>> >> > > > > > > > HYPRE BoomerAMG: Relax down >>> >> symmetric-SOR/Jacobi >>> >> > > > > > > > HYPRE BoomerAMG: Relax up >>> >> symmetric-SOR/Jacobi >>> >> > > > > > > > HYPRE BoomerAMG: Relax on coarse >>> >> Gaussian-elimination >>> >> > > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 >>> >> > > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 >>> >> > > > > > > > HYPRE BoomerAMG: Using CF-relaxation >>> >> > > > > > > > HYPRE BoomerAMG: Measure type local >>> >> > > > > > > > HYPRE BoomerAMG: Coarsen type PMIS >>> >> > > > > > > > HYPRE BoomerAMG: Interpolation type classical >>> >> > > > > > > > linear system matrix = precond matrix: >>> >> > > > > > > > Mat Object: (fieldsplit_u_) 4 MPI >>> processes >>> >> > > > > > > > type: mpiaij >>> >> > > > > > > > rows=938910, cols=938910, bs=3 >>> >> > > > > > > > total: nonzeros=8.60906e+07, allocated >>> >> 
nonzeros=8.60906e+07 >>> >> > > > > > > > total number of mallocs used during MatSetValues >>> >> calls =0 >>> >> > > > > > > > using I-node (on process 0) routines: found >>> 78749 >>> >> nodes, limit used is 5 >>> >> > > > > > > > Split number 1 Defined by IS >>> >> > > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >>> >> > > > > > > > type: preonly >>> >> > > > > > > > maximum iterations=10000, initial guess is zero >>> >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >>> >> divergence=10000 >>> >> > > > > > > > left preconditioning >>> >> > > > > > > > using NONE norm type for convergence test >>> >> > > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >>> >> > > > > > > > type: lu >>> >> > > > > > > > LU: out-of-place factorization >>> >> > > > > > > > tolerance for zero pivot 2.22045e-14 >>> >> > > > > > > > matrix ordering: natural >>> >> > > > > > > > factor fill ratio given 0, needed 0 >>> >> > > > > > > > Factored matrix follows: >>> >> > > > > > > > Mat Object: 4 MPI processes >>> >> > > > > > > > type: mpiaij >>> >> > > > > > > > rows=34141, cols=34141 >>> >> > > > > > > > package used to perform factorization: >>> pastix >>> >> > > > > > > > Error : -nan >>> >> > > > > > > > Error : -nan >>> >> > > > > > > > total: nonzeros=0, allocated nonzeros=0 >>> >> > > > > > > > Error : -nan >>> >> > > > > > > > total number of mallocs used during MatSetValues >>> calls =0 >>> >> > > > > > > > PaStiX run parameters: >>> >> > > > > > > > Matrix type : >>> >> Symmetric >>> >> > > > > > > > Level of printing (0,1,2): 0 >>> >> > > > > > > > Number of refinements iterations : 0 >>> >> > > > > > > > Error : -nan >>> >> > > > > > > > linear system matrix = precond matrix: >>> >> > > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI >>> processes >>> >> > > > > > > > type: mpiaij >>> >> > > > > > > > rows=34141, cols=34141 >>> >> > > > > > > > total: nonzeros=485655, allocated >>> nonzeros=485655 >>> >> > > > > > > > total number of mallocs used during MatSetValues >>> >> calls =0 >>> >> > > > > > > > not using I-node (on process 0) routines >>> >> > > > > > > > linear system matrix = precond matrix: >>> >> > > > > > > > Mat Object: 4 MPI processes >>> >> > > > > > > > type: mpiaij >>> >> > > > > > > > rows=973051, cols=973051 >>> >> > > > > > > > total: nonzeros=9.90037e+07, allocated >>> >> nonzeros=9.90037e+07 >>> >> > > > > > > > total number of mallocs used during MatSetValues >>> calls =0 >>> >> > > > > > > > using I-node (on process 0) routines: found 78749 >>> >> nodes, limit used is 5 >>> >> > > > > > > > >>> >> > > > > > > > The pattern of convergence gives a hint that this >>> system is >>> >> somehow bad/singular. But I don't know why the preconditioned error >>> goes up >>> >> too high. Anyone has an idea? >>> >> > > > > > > > >>> >> > > > > > > > Best regards >>> >> > > > > > > > Giang Bui >>> >> > > > > > > > >>> >> > > > > > > >>> >> > > > > > > >>> >> > > > > > >>> >> > > > > > >>> >> > > > > >>> >> > > > > >>> >> > > > >>> >> > > > >>> >> > > >>> >> > > >>> >> > >>> >> > >>> >> >>> >> >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Lukasz.Kaczmarczyk at glasgow.ac.uk Wed May 3 18:19:51 2017 From: Lukasz.Kaczmarczyk at glasgow.ac.uk (Lukasz Kaczmarczyk) Date: Wed, 3 May 2017 23:19:51 +0000 Subject: [petsc-users] strange convergence In-Reply-To: References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> <87wpa3wd5j.fsf@jedbrown.org> Message-ID: <4E1AE917-3FB2-4EE9-88A5-AB3EA612D5D6@glasgow.ac.uk> On 3 May 2017, at 22:19, Hoang Giang Bui > wrote: Hi Lukasz, thanks for sharing very interesting slide. Both of you are right, the mortar method starts from continuum argument then reduce to discrete space by discretizing the Lagrange multiplier. However, the way to choose the interpolation space has some implication on the properties of the mortar matrices. For example, the dual mortar space can help to reduce the multiplier by static condensation but it creates some numerical oscillation. In my opinion I think it's not stable despite a very sound theoretical foundation is developed. Both standard and dual mortar approach impose some drawbacks for high order contact because of negative shape function can create some spurious negative nodal gap. How do you cope with that case in your code? We exploit that we can set approx. order independently to Lagrange multiplier and displacements. Having hierarchical base, we have dofs on vertices, edges, faces and volumes (in a case of displacements) we can apply order to each entity independently. This gives us some control over spurious modes, we do not see them at the moment, but this is work in progress, and we do not enough testing. It is as well important where you approximate Lagrange multipliers, master or slave side. We have as well some flexibility of choosing a base; right choice would be to use Bernstein polynomial which has only positive values. We do have them yet, but we can use now Legendre, Lobatto (integrated Legendre) or Jacobi. Many things to test, more to read, and see what will happen. However, this question may be a bit off-topic. Come back to the main question, for mesh "gluing" using mortar method, the Schur matrix is S=[0 -D^T M^T] A^-1 [0 D M]^T, has the form of A_10 (A_00)^-1 A01 since A_11=0. The magnitude of S (~E^-1) is too small compare to A_00 (which is ~E for elasticity). I think in some case it's also rank deficient if three Lagrange multiplier is used per node (S is very ill-conditioned although A_00 is well). I'm skeptical here do you really solve the system of mortar with schur complement? You could be right here about Schur complement. We do not try this yet, but note that you can multiply constraint by the scalar, this is exploited for example in Popp_et_al-2009-A finite deformation mortar contact formulation using a primal?dual active set strategy, where constrain equation is scaled by constant equal to the young modulus. Lukasz Giang On Wed, May 3, 2017 at 2:55 PM, Lukasz Kaczmarczyk > wrote: On 3 May 2017, at 13:22, Matthew Knepley > wrote: On Wed, May 3, 2017 at 2:29 AM, Hoang Giang Bui > wrote: Dear Jed If I understood you correctly you suggest to avoid penalty by using the Lagrange multiplier for the mortar constraint? In this case it leads to the use of discrete Lagrange multiplier space. Sorry for being ignorant here, but why is the space "discrete"? It looks like you should have a continuum formulation of the mortar as well. Maybe I do not understand something fundamental. 
From this (https://en.wikipedia.org/wiki/Mortar_methods) short description, it seems that mortars begin from a continuum formulation, but are then reduced to the discrete level. This is no problem if done consistently, as for instance in the FETI method where efficient preconditioners exist. Hello, I copied the wrong link to mortar method, how we implemented it, see presentation http://doi.org/10.5281/zenodo.556996 You right that we always start from continuum formulation, on this we apply some discretisation, at the end Lagrange multiplier is expressed by a finite vector of discrete unknowns. It is better to formulate problem first for the continuum; you have better control on what you are doing and stability of the solution. Of course, you can add some constraints at the discreet level, after you discretised problem, but implicitly you have some continuous space for Lagrange multipliers, which is associated with shape functions which you use to discretise problem. In our problem which we have, we try to avoid rebuilding of the system of equations each time contact area is changing. We going to construct DM sub-problem for each body in contact, each sub-problem going to be solved using MG (adjacency for those matrices is fixed in time). All will go to put in nested matrix with the separate block for Lagrange multipliers (adjacency will change in each time step). For solving Lagrange multipliers we going to use FIELDSPLIT using Schur complement. I need to look more detail to FETI method, at are still at development stage for contact problem and direct solver works, for now, small problems at that point. In our code, we using higher order elements with hierarchical base, for this we using specialise MG solver, as you can see here, it works pretty well for moderate size problems, <100M http://mofem.eng.gla.ac.uk/mofem/html/_p_c_m_g_set_up_via_approx_orders_8cpp.html Regards, Lukasz Thanks, Matt Do you or anyone already have experience using discrete Lagrange multiplier space with Petsc? There is also similar question on stackexchange https://scicomp.stackexchange.com/questions/25113/preconditioners-and-discrete-lagrange-multipliers Giang On Sat, Apr 29, 2017 at 3:34 PM, Jed Brown > wrote: Hoang Giang Bui > writes: > Hi Barry > > The first block is from a standard solid mechanics discretization based on > balance of momentum equation. There is some material involved but in > principal it's well-posed elasticity equation with positive definite > tangent operator. The "gluing business" uses the mortar method to keep the > continuity of displacement. Instead of using Lagrange multiplier to treat > the constraint I used penalty method to penalize the energy. The > discretization form of mortar is quite simple > > \int_{\Gamma_1} { rho * (\delta u_1 - \delta u_2) * (u_1 - u_2) dA } > > rho is penalty parameter. In the simulation I initially set it low (~E) to > preserve the conditioning of the system. There are two things that can go wrong here with AMG: * The penalty term can mess up the strength of connection heuristics such that you get poor choice of C-points (classical AMG like BoomerAMG) or poor choice of aggregates (smoothed aggregation). * The penalty term can prevent Jacobi smoothing from being effective; in this case, it can lead to poor coarse basis functions (higher energy than they should be) and poor smoothing in an MG cycle. You can fix the poor smoothing in the MG cycle by using a stronger smoother, like ASM with some overlap. 
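As a rough illustration of the stronger-smoother suggestion above, in PETSc options (a sketch only, using GAMG for the AMG; the right smoother settings are problem-dependent):

-pc_type gamg
-mg_levels_ksp_type richardson
-mg_levels_pc_type asm
-mg_levels_pc_asm_overlap 1
-mg_levels_sub_pc_type ilu

When the AMG sits inside a fieldsplit, the same options are spelled with the split prefix, e.g. -fieldsplit_u_mg_levels_pc_type asm; BoomerAMG selects its smoother through its own -pc_hypre_boomeramg_relax_type_* options instead.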
I'm generally not a fan of penalty methods due to the irritating tradeoffs and often poor solver performance. > In the figure below, the colorful blocks are u_1 and the base is u_2. Both > u_1 and u_2 use isoparametric quadratic approximation. > > ? > Snapshot.png > > ??? > > Giang > > On Fri, Apr 28, 2017 at 6:21 PM, Barry Smith > wrote: > >> >> Ok, so boomerAMG algebraic multigrid is not good for the first block. >> You mentioned the first block has two things glued together? AMG is >> fantastic for certain problems but doesn't work for everything. >> >> Tell us more about the first block, what PDE it comes from, what >> discretization, and what the "gluing business" is and maybe we'll have >> suggestions for how to precondition it. >> >> Barry >> >> > On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui > wrote: >> > >> > It's in fact quite good >> > >> > Residual norms for fieldsplit_u_ solve. >> > 0 KSP Residual norm 4.014715925568e+00 >> > 1 KSP Residual norm 2.160497019264e-10 >> > Residual norms for fieldsplit_wp_ solve. >> > 0 KSP Residual norm 0.000000000000e+00 >> > 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > Residual norms for fieldsplit_u_ solve. >> > 0 KSP Residual norm 9.999999999416e-01 >> > 1 KSP Residual norm 7.118380416383e-11 >> > Residual norms for fieldsplit_wp_ solve. >> > 0 KSP Residual norm 0.000000000000e+00 >> > 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm >> 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 >> > Linear solve converged due to CONVERGED_ATOL iterations 1 >> > >> > Giang >> > >> > On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith > wrote: >> > >> > Run again using LU on both blocks to see what happens. >> > >> > >> > > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui > >> wrote: >> > > >> > > I have changed the way to tie the nonconforming mesh. It seems the >> matrix now is better >> > > >> > > with -pc_type lu the output is >> > > 0 KSP preconditioned resid norm 3.308678584240e-01 true resid norm >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > 1 KSP preconditioned resid norm 2.004313395301e-12 true resid norm >> 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 >> > > Linear solve converged due to CONVERGED_ATOL iterations 1 >> > > >> > > >> > > with -pc_type fieldsplit -fieldsplit_u_pc_type hypre >> -fieldsplit_wp_pc_type lu the convergence is slow >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >> > > ... >> > > 824 KSP preconditioned resid norm 1.018542387738e-09 true resid norm >> 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 >> > > 825 KSP preconditioned resid norm 9.743727947637e-10 true resid norm >> 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >> > > >> > > checking with additional -fieldsplit_u_ksp_type richardson >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >> -fieldsplit_wp_ksp_max_it 1 gives >> > > >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > Residual norms for fieldsplit_u_ solve. 
>> > > 0 KSP Residual norm 5.803507549280e-01 >> > > 1 KSP Residual norm 2.069538175950e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 7.831796195225e-01 >> > > 1 KSP Residual norm 1.734608520110e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > .... >> > > 823 KSP preconditioned resid norm 1.065070135605e-09 true resid norm >> 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 6.113806394327e-01 >> > > 1 KSP Residual norm 1.535465290944e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > 824 KSP preconditioned resid norm 1.018542387746e-09 true resid norm >> 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 6.123437055586e-01 >> > > 1 KSP Residual norm 1.524661826133e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > 825 KSP preconditioned resid norm 9.743727947718e-10 true resid norm >> 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >> > > >> > > >> > > The residual for wp block is zero since in this first step the rhs is >> zero. As can see in the output, the multigrid does not perform well to >> reduce the residual in the sub-solve. Is my observation right? what can be >> done to improve this? >> > > >> > > >> > > Giang >> > > >> > > On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith > >> wrote: >> > > >> > > This can happen in the matrix is singular or nearly singular or if >> the factorization generates small pivots, which can occur for even >> nonsingular problems if the matrix is poorly scaled or just plain nasty. 
>> > > >> > > >> > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui > >> wrote: >> > > > >> > > > It took a while, here I send you the output >> > > > >> > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm >> 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm >> 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 >> > > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm >> 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 >> > > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm >> 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 >> > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >> > > > KSP Object: 4 MPI processes >> > > > type: gmres >> > > > GMRES: restart=1000, using Modified Gram-Schmidt >> Orthogonalization >> > > > GMRES: happy breakdown tolerance 1e-30 >> > > > maximum iterations=1000, initial guess is zero >> > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >> > > > left preconditioning >> > > > using PRECONDITIONED norm type for convergence test >> > > > PC Object: 4 MPI processes >> > > > type: lu >> > > > LU: out-of-place factorization >> > > > tolerance for zero pivot 2.22045e-14 >> > > > matrix ordering: natural >> > > > factor fill ratio given 0, needed 0 >> > > > Factored matrix follows: >> > > > Mat Object: 4 MPI processes >> > > > type: mpiaij >> > > > rows=973051, cols=973051 >> > > > package used to perform factorization: pastix >> > > > Error : 3.24786e-14 >> > > > total: nonzeros=0, allocated nonzeros=0 >> > > > total number of mallocs used during MatSetValues calls =0 >> > > > PaStiX run parameters: >> > > > Matrix type : Unsymmetric >> > > > Level of printing (0,1,2): 0 >> > > > Number of refinements iterations : 3 >> > > > Error : 3.24786e-14 >> > > > linear system matrix = precond matrix: >> > > > Mat Object: 4 MPI processes >> > > > type: mpiaij >> > > > rows=973051, cols=973051 >> > > > Error : 3.24786e-14 >> > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >> > > > total number of mallocs used during MatSetValues calls =0 >> > > > using I-node (on process 0) routines: found 78749 nodes, limit >> used is 5 >> > > > Error : 3.24786e-14 >> > > > >> > > > It doesn't do as you said. Something is not right here. I will look >> in depth. >> > > > >> > > > Giang >> > > > >> > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith > >> wrote: >> > > > >> > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui > >> wrote: >> > > > > >> > > > > Good catch. I get this for the very first step, maybe at that time >> the rhs_w is zero. >> > > > >> > > > With the multiplicative composition the right hand side of the >> second solve is the initial right hand side of the second solve minus >> A_10*x where x is the solution to the first sub solve and A_10 is the lower >> left block of the outer matrix. So unless both the initial right hand side >> has a zero for the second block and A_10 is identically zero the right hand >> side for the second sub solve should not be zero. Is A_10 == 0? >> > > > >> > > > >> > > > > In the later step, it shows 2 step convergence >> > > > > >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.165886479830e+04 >> > > > > 1 KSP Residual norm 2.905922877684e-01 >> > > > > Residual norms for fieldsplit_wp_ solve. 
>> > > > > 0 KSP Residual norm 2.397669419027e-01 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid >> norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 9.999891813771e-01 >> > > > > 1 KSP Residual norm 1.512000395579e-05 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 8.192702188243e-06 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid >> norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 >> > > > >> > > > The outer residual norms are still wonky, the preconditioned >> residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which is a >> huge drop but the 7.963616922323e+05 drops very much less >> 7.135927677844e+04. This is not normal. >> > > > >> > > > What if you just use -pc_type lu for the entire system (no >> fieldsplit), does the true residual drop to almost zero in the first >> iteration (as it should?). Send the output. >> > > > >> > > > >> > > > >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 6.946213936597e-01 >> > > > > 1 KSP Residual norm 1.195514007343e-05 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.025694497535e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid >> norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 7.255149996405e-01 >> > > > > 1 KSP Residual norm 6.583512434218e-06 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.015229700337e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid >> norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.512243341400e-01 >> > > > > 1 KSP Residual norm 2.032490351200e-06 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.282327290982e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid >> norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.423609338053e-01 >> > > > > 1 KSP Residual norm 4.213703301972e-07 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.157384757538e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid >> norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.838596289995e-01 >> > > > > 1 KSP Residual norm 9.927864176103e-08 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.066298905618e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid >> norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 4.624964188094e-01 >> > > > > 1 KSP Residual norm 6.418229775372e-08 >> > > > > Residual norms for fieldsplit_wp_ solve. 
>> > > > > 0 KSP Residual norm 9.800784311614e-01 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid >> norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 >> > > > > Linear solve converged due to CONVERGED_ATOL iterations 7 >> > > > > >> > > > > The outer operator is an explicit matrix. >> > > > > >> > > > > Giang >> > > > > >> > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith > >> wrote: >> > > > > >> > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui > >> wrote: >> > > > > > >> > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better >> convergence. I still used 4 procs though, probably with 1 proc it should >> also be the same. >> > > > > > >> > > > > > The u block used a Nitsche-type operator to connect two >> non-matching domains. I don't think it will leave some rigid body motion >> leads to not sufficient constraints. Maybe you have other idea? >> > > > > > >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 3.129067184300e+05 >> > > > > > 1 KSP Residual norm 5.906261468196e-01 >> > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > >> > > > > ^^^^ something is wrong here. The sub solve should not be >> starting with a 0 residual (this means the right hand side for this sub >> solve is zero which it should not be). >> > > > > >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 >> > > > > >> > > > > >> > > > > How are you providing the outer operator? As an explicit matrix >> or with some shell matrix? >> > > > > >> > > > > >> > > > > >> > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid >> norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 9.999955993437e-01 >> > > > > > 1 KSP Residual norm 4.019774691831e-06 >> > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid >> norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 1.000012180204e+00 >> > > > > > 1 KSP Residual norm 1.017367950422e-05 >> > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true resid >> norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 1.000004200085e+00 >> > > > > > 1 KSP Residual norm 6.231613102458e-06 >> > > > > > Residual norms for fieldsplit_wp_ solve. 
>> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid >> norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 >> > > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >> > > > > > KSP Object: 4 MPI processes >> > > > > > type: gmres >> > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >> Orthogonalization >> > > > > > GMRES: happy breakdown tolerance 1e-30 >> > > > > > maximum iterations=1000, initial guess is zero >> > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >> > > > > > left preconditioning >> > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > PC Object: 4 MPI processes >> > > > > > type: fieldsplit >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 >> > > > > > Solver info for each split is in the following KSP objects: >> > > > > > Split number 0 Defined by IS >> > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >> > > > > > type: richardson >> > > > > > Richardson: damping factor=1 >> > > > > > maximum iterations=1, initial guess is zero >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > left preconditioning >> > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >> > > > > > type: lu >> > > > > > LU: out-of-place factorization >> > > > > > tolerance for zero pivot 2.22045e-14 >> > > > > > matrix ordering: natural >> > > > > > factor fill ratio given 0, needed 0 >> > > > > > Factored matrix follows: >> > > > > > Mat Object: 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=938910, cols=938910 >> > > > > > package used to perform factorization: pastix >> > > > > > total: nonzeros=0, allocated nonzeros=0 >> > > > > > Error : 3.36878e-14 >> > > > > > total number of mallocs used during MatSetValues calls >> =0 >> > > > > > PaStiX run parameters: >> > > > > > Matrix type : Unsymmetric >> > > > > > Level of printing (0,1,2): 0 >> > > > > > Number of refinements iterations : 3 >> > > > > > Error : 3.36878e-14 >> > > > > > linear system matrix = precond matrix: >> > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=938910, cols=938910, bs=3 >> > > > > > Error : 3.36878e-14 >> > > > > > Error : 3.36878e-14 >> > > > > > total: nonzeros=8.60906e+07, allocated >> nonzeros=8.60906e+07 >> > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > using I-node (on process 0) routines: found 78749 >> nodes, limit used is 5 >> > > > > > Split number 1 Defined by IS >> > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > type: richardson >> > > > > > Richardson: damping factor=1 >> > > > > > maximum iterations=1, initial guess is zero >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > left preconditioning >> > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > type: lu >> > > > > > LU: out-of-place factorization >> > > > > > tolerance for zero pivot 2.22045e-14 >> > > > > > matrix ordering: natural >> > > > > > factor fill ratio given 0, needed 0 >> > > > > > Factored matrix follows: >> > > > > > Mat Object: 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=34141, cols=34141 >> > > > > > package used to perform factorization: pastix >> > > > > > Error : -nan >> > > > > > Error : -nan >> > > > > > Error : 
-nan >> > > > > > total: nonzeros=0, allocated nonzeros=0 >> > > > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > > > PaStiX run parameters: >> > > > > > Matrix type : Symmetric >> > > > > > Level of printing (0,1,2): 0 >> > > > > > Number of refinements iterations : 0 >> > > > > > Error : -nan >> > > > > > linear system matrix = precond matrix: >> > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=34141, cols=34141 >> > > > > > total: nonzeros=485655, allocated nonzeros=485655 >> > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > not using I-node (on process 0) routines >> > > > > > linear system matrix = precond matrix: >> > > > > > Mat Object: 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=973051, cols=973051 >> > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >> > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > using I-node (on process 0) routines: found 78749 nodes, >> limit used is 5 >> > > > > > >> > > > > > >> > > > > > >> > > > > > Giang >> > > > > > >> > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith < >> bsmith at mcs.anl.gov> wrote: >> > > > > > >> > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui < >> hgbk2008 at gmail.com> wrote: >> > > > > > > >> > > > > > > Dear Matt/Barry >> > > > > > > >> > > > > > > With your options, it results in >> > > > > > > >> > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > > 0 KSP Residual norm 2.407308987203e+36 >> > > > > > > 1 KSP Residual norm 5.797185652683e+72 >> > > > > > >> > > > > > It looks like Matt is right, hypre is seemly producing useless >> garbage. >> > > > > > >> > > > > > First how do things run on one process. If you have similar >> problems then debug on one process (debugging any kind of problem is always >> far easy on one process). >> > > > > > >> > > > > > First run with -fieldsplit_u_type lu (instead of using hypre) to >> see if that works or also produces something bad. >> > > > > > >> > > > > > What is the operator and the boundary conditions for u? It could >> be singular. >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > > ... >> > > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true >> resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 >> > > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > > 0 KSP Residual norm 1.533726746719e+36 >> > > > > > > 1 KSP Residual norm 3.692757392261e+72 >> > > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > > >> > > > > > > Do you suggest that the pastix solver for the "wp" block >> encounters small pivot? In addition, seem like the "u" block is also >> singular. >> > > > > > > >> > > > > > > Giang >> > > > > > > >> > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith < >> bsmith at mcs.anl.gov> wrote: >> > > > > > > >> > > > > > > Huge preconditioned norms but normal unpreconditioned norms >> almost always come from a very small pivot in an LU or ILU factorization. >> > > > > > > >> > > > > > > The first thing to do is monitor the two sub solves. 
Run >> with the additional options -fieldsplit_u_ksp_type richardson >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >> -fieldsplit_wp_ksp_max_it 1 >> > > > > > > >> > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < >> hgbk2008 at gmail.com> wrote: >> > > > > > > > >> > > > > > > > Hello >> > > > > > > > >> > > > > > > > I encountered a strange convergence behavior that I have >> trouble to understand >> > > > > > > > >> > > > > > > > KSPSetFromOptions completed >> > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true >> resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 >> > > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true >> resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 >> > > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true >> resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 >> > > > > > > > ..... >> > > > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true >> resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 >> > > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true >> resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 >> > > > > > > > Linear solve did not converge due to DIVERGED_ITS iterations >> 1000 >> > > > > > > > KSP Object: 4 MPI processes >> > > > > > > > type: gmres >> > > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >> Orthogonalization >> > > > > > > > GMRES: happy breakdown tolerance 1e-30 >> > > > > > > > maximum iterations=1000, initial guess is zero >> > > > > > > > tolerances: relative=1e-20, absolute=1e-09, >> divergence=10000 >> > > > > > > > left preconditioning >> > > > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > > > PC Object: 4 MPI processes >> > > > > > > > type: fieldsplit >> > > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits >> = 2 >> > > > > > > > Solver info for each split is in the following KSP >> objects: >> > > > > > > > Split number 0 Defined by IS >> > > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >> > > > > > > > type: preonly >> > > > > > > > maximum iterations=10000, initial guess is zero >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > > > left preconditioning >> > > > > > > > using NONE norm type for convergence test >> > > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >> > > > > > > > type: hypre >> > > > > > > > HYPRE BoomerAMG preconditioning >> > > > > > > > HYPRE BoomerAMG: Cycle type V >> > > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 >> > > > > > > > HYPRE BoomerAMG: Maximum number of iterations PER >> hypre call 1 >> > > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre >> call 0 >> > > > > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 >> > > > > > > > HYPRE BoomerAMG: Interpolation truncation factor 0 >> > > > > > > > HYPRE BoomerAMG: Interpolation: max elements per row >> 0 >> > > > > > > > HYPRE BoomerAMG: Number of levels of aggressive >> coarsening 0 >> > > > > > > > HYPRE BoomerAMG: Number of paths for aggressive >> coarsening 1 >> > > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 >> > > > > > > > HYPRE BoomerAMG: Sweeps down 1 >> > > > > > > > HYPRE BoomerAMG: Sweeps up 1 >> > > > > > > > HYPRE 
BoomerAMG: Sweeps on coarse 1 >> > > > > > > > HYPRE BoomerAMG: Relax down >> symmetric-SOR/Jacobi >> > > > > > > > HYPRE BoomerAMG: Relax up >> symmetric-SOR/Jacobi >> > > > > > > > HYPRE BoomerAMG: Relax on coarse >> Gaussian-elimination >> > > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 >> > > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 >> > > > > > > > HYPRE BoomerAMG: Using CF-relaxation >> > > > > > > > HYPRE BoomerAMG: Measure type local >> > > > > > > > HYPRE BoomerAMG: Coarsen type PMIS >> > > > > > > > HYPRE BoomerAMG: Interpolation type classical >> > > > > > > > linear system matrix = precond matrix: >> > > > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=938910, cols=938910, bs=3 >> > > > > > > > total: nonzeros=8.60906e+07, allocated >> nonzeros=8.60906e+07 >> > > > > > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > > > > > using I-node (on process 0) routines: found 78749 >> nodes, limit used is 5 >> > > > > > > > Split number 1 Defined by IS >> > > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > > > type: preonly >> > > > > > > > maximum iterations=10000, initial guess is zero >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > > > left preconditioning >> > > > > > > > using NONE norm type for convergence test >> > > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > > > type: lu >> > > > > > > > LU: out-of-place factorization >> > > > > > > > tolerance for zero pivot 2.22045e-14 >> > > > > > > > matrix ordering: natural >> > > > > > > > factor fill ratio given 0, needed 0 >> > > > > > > > Factored matrix follows: >> > > > > > > > Mat Object: 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=34141, cols=34141 >> > > > > > > > package used to perform factorization: pastix >> > > > > > > > Error : -nan >> > > > > > > > Error : -nan >> > > > > > > > total: nonzeros=0, allocated nonzeros=0 >> > > > > > > > Error : -nan >> > > > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > > > PaStiX run parameters: >> > > > > > > > Matrix type : >> Symmetric >> > > > > > > > Level of printing (0,1,2): 0 >> > > > > > > > Number of refinements iterations : 0 >> > > > > > > > Error : -nan >> > > > > > > > linear system matrix = precond matrix: >> > > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=34141, cols=34141 >> > > > > > > > total: nonzeros=485655, allocated nonzeros=485655 >> > > > > > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > > > > > not using I-node (on process 0) routines >> > > > > > > > linear system matrix = precond matrix: >> > > > > > > > Mat Object: 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=973051, cols=973051 >> > > > > > > > total: nonzeros=9.90037e+07, allocated >> nonzeros=9.90037e+07 >> > > > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > > > using I-node (on process 0) routines: found 78749 >> nodes, limit used is 5 >> > > > > > > > >> > > > > > > > The pattern of convergence gives a hint that this system is >> somehow bad/singular. But I don't know why the preconditioned error goes up >> too high. Anyone has an idea? 
>> > > > > > > > >> > > > > > > > Best regards >> > > > > > > > Giang Bui >> > > > > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > > >> > > > >> > > > >> > > >> > > >> > >> > >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed May 3 21:05:57 2017 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 3 May 2017 21:05:57 -0500 Subject: [petsc-users] GAMG scaling In-Reply-To: References: Message-ID: I basically used 'runex56' and set '-ne' be compatible with np. Then I used option '-matptap_via scalable' '-matptap_via hypre' '-matptap_via nonscalable' I attached a job script below. In master branch, I set default as 'nonscalable' for small - medium size matrices, and automatically switch to 'scalable' when matrix size gets larger. Petsc solver uses MatPtAP, which does local RAP to reduce communication and accelerate computation. I suggest you simply use default setting. Let me know if you encounter trouble. Hong job.ne174.n8.np125.sh: runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 -pc_gamg_reuse_interpolation true -ksp_converged_reason -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 -pc_gamg_repartition false -pc_mg_cycle_type v -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via scalable > log.ne174.n8.np125.scalable runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 -pc_gamg_reuse_interpolation true -ksp_converged_reason -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 -pc_gamg_repartition false -pc_mg_cycle_type v -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via hypre > log.ne174.n8.np125.hypre runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 -pc_gamg_reuse_interpolation true -ksp_converged_reason -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 -pc_gamg_repartition false -pc_mg_cycle_type v 
-pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via nonscalable > log.ne174.n8.np125.nonscalable runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 -pc_gamg_reuse_interpolation true -ksp_converged_reason -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 -pc_gamg_repartition false -pc_mg_cycle_type v -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg -ksp_monitor -log_view > log.ne174.n8.np125 On Wed, May 3, 2017 at 2:08 PM, Mark Adams wrote: > Hong,the input files do not seem to be accessible. What are the command > line option? (I don't see a "rap" or "scale" in the source). > > > > On Wed, May 3, 2017 at 12:17 PM, Hong wrote: > >> Mark, >> Below is the copy of my email sent to you on Feb 27: >> >> I implemented scalable MatPtAP and did comparisons of three >> implementations using ex56.c on alcf cetus machine (this machine has >> small memory, 1GB/core): >> - nonscalable PtAP: use an array of length PN to do dense axpy >> - scalable PtAP: do sparse axpy without use of PN array >> - hypre PtAP. >> >> The results are attached. Summary: >> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP >> - scalable PtAP is 4x faster than hypre PtAP >> - hypre uses less memory (see job.ne399.n63.np1000.sh) >> >> Based on above observation, I set the default PtAP algorithm as >> 'nonscalable'. >> When PN > local estimated nonzero of C=PtAP, then switch default to >> 'scalable'. >> User can overwrite default. >> >> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get >> MatPtAP 3.6224e+01 (nonscalable for small mats, >> scalable for larger ones) >> scalable MatPtAP 4.6129e+01 >> hypre 1.9389e+02 >> >> This work in on petsc-master. Give it a try. If you encounter any >> problem, let me know. >> >> Hong >> >> On Wed, May 3, 2017 at 10:01 AM, Mark Adams wrote: >> >>> (Hong), what is the current state of optimizing RAP for scaling? >>> >>> Nate, is driving 3D elasticity problems at scaling with GAMG and we are >>> working out performance problems. They are hitting problems at ~1.5B dof >>> problems on a basic Cray (XC30 I think). >>> >>> Thanks, >>> Mark >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu May 4 07:44:04 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 4 May 2017 08:44:04 -0400 Subject: [petsc-users] GAMG scaling In-Reply-To: References: Message-ID: Thanks Hong, I am not seeing these options with -help ... On Wed, May 3, 2017 at 10:05 PM, Hong wrote: > I basically used 'runex56' and set '-ne' be compatible with np. > Then I used option > '-matptap_via scalable' > '-matptap_via hypre' > '-matptap_via nonscalable' > > I attached a job script below. > > In master branch, I set default as 'nonscalable' for small - medium size > matrices, and automatically switch to 'scalable' when matrix size gets > larger. > > Petsc solver uses MatPtAP, which does local RAP to reduce communication > and accelerate computation. 
> I suggest you simply use default setting. Let me know if you encounter > trouble. > > Hong > > job.ne174.n8.np125.sh: > runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne > 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 > -pc_gamg_reuse_interpolation true -ksp_converged_reason > -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 > -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev > -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg > -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu > -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 > -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 > -pc_gamg_repartition false -pc_mg_cycle_type v -pc_gamg_use_parallel_coarse_grid_solver > -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg -ksp_monitor -log_view > -matptap_via scalable > log.ne174.n8.np125.scalable > > runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne > 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 > -pc_gamg_reuse_interpolation true -ksp_converged_reason > -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 > -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev > -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg > -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu > -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 > -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 > -pc_gamg_repartition false -pc_mg_cycle_type v -pc_gamg_use_parallel_coarse_grid_solver > -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg -ksp_monitor -log_view > -matptap_via hypre > log.ne174.n8.np125.hypre > > runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne > 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 > -pc_gamg_reuse_interpolation true -ksp_converged_reason > -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 > -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev > -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg > -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu > -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 > -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 > -pc_gamg_repartition false -pc_mg_cycle_type v -pc_gamg_use_parallel_coarse_grid_solver > -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg -ksp_monitor -log_view > -matptap_via nonscalable > log.ne174.n8.np125.nonscalable > > runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne > 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 > -pc_gamg_reuse_interpolation true -ksp_converged_reason > -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg > -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 > -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev > -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg > -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu > -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 > -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 > -pc_gamg_repartition false -pc_mg_cycle_type v -pc_gamg_use_parallel_coarse_grid_solver > -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg -ksp_monitor -log_view > > log.ne174.n8.np125 > > On Wed, May 3, 2017 at 2:08 PM, Mark 
Adams wrote: > >> Hong,the input files do not seem to be accessible. What are the command >> line option? (I don't see a "rap" or "scale" in the source). >> >> >> >> On Wed, May 3, 2017 at 12:17 PM, Hong wrote: >> >>> Mark, >>> Below is the copy of my email sent to you on Feb 27: >>> >>> I implemented scalable MatPtAP and did comparisons of three >>> implementations using ex56.c on alcf cetus machine (this machine has >>> small memory, 1GB/core): >>> - nonscalable PtAP: use an array of length PN to do dense axpy >>> - scalable PtAP: do sparse axpy without use of PN array >>> - hypre PtAP. >>> >>> The results are attached. Summary: >>> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP >>> - scalable PtAP is 4x faster than hypre PtAP >>> - hypre uses less memory (see job.ne399.n63.np1000.sh) >>> >>> Based on above observation, I set the default PtAP algorithm as >>> 'nonscalable'. >>> When PN > local estimated nonzero of C=PtAP, then switch default to >>> 'scalable'. >>> User can overwrite default. >>> >>> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get >>> MatPtAP 3.6224e+01 (nonscalable for small mats, >>> scalable for larger ones) >>> scalable MatPtAP 4.6129e+01 >>> hypre 1.9389e+02 >>> >>> This work in on petsc-master. Give it a try. If you encounter any >>> problem, let me know. >>> >>> Hong >>> >>> On Wed, May 3, 2017 at 10:01 AM, Mark Adams wrote: >>> >>>> (Hong), what is the current state of optimizing RAP for scaling? >>>> >>>> Nate, is driving 3D elasticity problems at scaling with GAMG and we are >>>> working out performance problems. They are hitting problems at ~1.5B dof >>>> problems on a basic Cray (XC30 I think). >>>> >>>> Thanks, >>>> Mark >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu May 4 08:09:10 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 4 May 2017 09:09:10 -0400 Subject: [petsc-users] SNES error In-Reply-To: References: <677760BF-5666-4C9D-A064-B495ACD80889@mcs.anl.gov> Message-ID: OK, that makes sense, it fails when my velocity grid gets not tiny. I can use tine velocity grids for now. 
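For reference: the function-evaluation cap that Matt describes in the quoted reply below is part of the standard SNES tolerances. A minimal, illustrative sketch of raising it, where the value 100000 is only a placeholder and not a tuned recommendation:

    -snes_max_funcs 100000

or equivalently from the application code, leaving the other tolerances at their current values:

    /* SNESSetTolerances(snes, abstol, rtol, stol, max_its, max_funcs);
       only the last argument, the maximum number of allowed function
       evaluations, is raised here; 100000 is an arbitrary example value */
    ierr = SNESSetTolerances(snes, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, 100000);CHKERRQ(ierr);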
On Tue, May 2, 2017 at 11:18 AM, Matthew Knepley wrote: > On Tue, May 2, 2017 at 10:10 AM, Mark Adams wrote: > >> /Users/markadams/Codes/petsc/arch-macosx-gnu-O/bin/mpiexec -n 1 ./vml >> -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 -snes_rtol 1.e-6 -snes_stol >> 1.e-6 -ts_type cn -snes_fd -pc_type lu -ksp_type preonly >> -x_petscspace_order 1 -x_petscspace_poly_tensor -v_petscspace_order 1 >> -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 -ts_final_time 1e10 >> -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 -thermal_temps >> 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo -12,-12 -domainx_hi >> 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 -x_vec_view >> hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view hdf5:v.h5::append >> -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view hdf5:prex.h5::append >> -snes_converged_reason -snes_linesearch_monitor -ts_adapt_monitor >> main call SetupXDiscretization >> main call SetInitialConditionDomain >> VMLViewX DMGetOutputSequenceNumber=-1, >> cmd_str=-x_pre_vec_view >> 0) species 0: charge density= -2.3940791757186e+00, z-momentum= >> 5.9851979392559e-01, energy= 3.2314073646197e-01, thermal-flux= >> 2.4419137539877e-01 >> 0) Normalized: charge density= -2.3940791757186e+00, z >> momentum= 5.9851979392559e-01, energy= 3.2314073646197e-01, thermal flux= >> 2.4419137539877e-01, local: 64 X cells, 81 X vertices >> VMLViewX DMGetOutputSequenceNumber=0, cmd_str=(null) >> VMLViewV DMGetOutputSequenceNumber=-1 >> 0 SNES Function norm 4.097052680599e+00 >> 1 SNES Function norm 1.213148652908e-09 >> Nonlinear solve did not converge due to DIVERGED_FUNCTION_COUNT >> iterations 1 >> > > Neat! Mark, I think this has to do with you calling SNESEvaluateFunc() > inside another one. We limit the number of function evaluations > to 10,000 by default, mostly to corral line searches. I think you hit > this, and thus need to up the count. > > Thanks, > > Matt > > >> TSAdapt none step 0 stage rejected t=0 + 1.000e-01, >> nonlinear solve failures 1 greater than current TS allowed >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: >> [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, >> increase -ts_max_snes_failures or make negative to attempt recovery >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3659-g699918129c >> GIT Date: 2017-04-26 08:18:35 -0400 >> [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local by >> markadams Tue May 2 11:04:02 2017 >> [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ >> COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" FOPTFLAGS="-O3 -g >> -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1 >> --download-hypre=1 --download-ml=1 --download-triangle=1 >> --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist >> --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 >> PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 >> >> >> On Mon, May 1, 2017 at 10:25 PM, Barry Smith wrote: >> >>> >>> and >>> >>> -snes_linesearch_monitor >>> -ts_adapt_monitor >>> >>> >>> > On May 1, 2017, at 7:51 PM, Matthew Knepley wrote: >>> > >>> > Run with -snes_converged_reason. 
>>> > >>> > Matt >>> > >>> > On Mon, May 1, 2017 at 7:14 PM, Mark Adams wrote: >>> > I get this SNES failure and I don't understand what the problem is. >>> The rtol is 1.e-6 and the first iteration reduces the residual by 9 orders >>> of magnitude. Yet, TS is not satisfied. What is going on here? >>> > >>> > mpiexec -n 1 ./vml -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 >>> -snes_rtol 1.e-6 -snes_stol 1.e-6 -ts_type cn -snes_fd -pc_type lu >>> -ksp_type preonly -x_petscspace_order 1 -x_petscspace_poly_tensor >>> -v_petscspace_order 1 -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 >>> -ts_final_time 1e10 -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 >>> -thermal_temps 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo >>> -12,-12 -domainx_hi 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 >>> -x_vec_view hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view >>> hdf5:v.h5::append -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view >>> hdf5:prex.h5::append >>> > .... >>> > >>> > 0 SNES Function norm 4.097052680599e+00 >>> > 1 SNES Function norm 1.213148652908e-09 >>> > [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> > [0]PETSC ERROR: >>> > [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, >>> increase -ts_max_snes_failures or make negative to attempt recovery >>> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>> ocumentation/faq.html for trouble shooting. >>> > [0]PETSC ERROR: Petsc Development GIT revision: >>> v3.7.6-3659-g699918129c GIT Date: 2017-04-26 08:18:35 -0400 >>> > [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local by >>> markadams Mon May 1 19:21:32 2017 >>> > [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ >>> COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" FOPTFLAGS="-O3 -g >>> -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1 >>> --download-hypre=1 --download-ml=1 --download-triangle=1 >>> --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist >>> --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 >>> PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 >>> > >>> > >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > -- Norbert Wiener >>> >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From natacha.bereux at gmail.com Thu May 4 09:17:33 2017 From: natacha.bereux at gmail.com (Natacha BEREUX) Date: Thu, 4 May 2017 16:17:33 +0200 Subject: [petsc-users] Configure nested PCFIELDSPLIT with general index sets In-Reply-To: References: <6496846F-19F8-4494-87E1-DDC390513370@imperial.ac.uk> Message-ID: Dear Matt, I re-checked the master branch. To be precise, I downloaded the nightly tarball this morning (from http://ftp.mcs.anl.gov/pub/petsc/petsc-master.tar.gz) I am sure that the Fortran interface of DMSellSetCreateFieldDecomposition is missing. And it is quite tricky to add it. I have tried to write something in src/dm/impls/shell/ftn-custom/zdmshellf.c but I am not familiar with callbacks. Any help would be greatly appreciated! 
Best regards Natacha On Fri, Apr 28, 2017 at 8:11 PM, Matthew Knepley wrote: > On Fri, Apr 28, 2017 at 1:09 PM, Matthew Knepley > wrote: > >> On Fri, Apr 28, 2017 at 11:48 AM, Natacha BEREUX < >> natacha.bereux at gmail.com> wrote: >> >>> Dear Matt, >>> Sorry for my (very) late reply. >>> I was not able to find the Fortran interface of >>> DMSellSetCreateFieldDecomposition in the late petsc-3.7.6 fortran (and >>> my code still fails to link). >>> I have the feeling that it is missing in the master branch. >>> And I was not able to get it on bitbucket either. >>> Is there a branch from which I can pull your commit ? >>> >> >> I would either: >> >> a) Use the 'next' branch >> >> or >> >> b) wait until Monday for me to merge to 'master' >> >> This merge has been held up, but can now go forward. >> > > I just checked master. It was already merged. Please recheck your master. > > Thanks, > > Matt > > >> Thanks, >> >> Matt >> >> >>> Thans a lot for your help, >>> Natacha >>> >>> On Thu, Mar 30, 2017 at 9:25 PM, Matthew Knepley >>> wrote: >>> >>>> On Wed, Mar 22, 2017 at 1:45 PM, Natacha BEREUX < >>>> natacha.bereux at gmail.com> wrote: >>>> >>>>> Hello Matt, >>>>> Thanks a lot for your answers. >>>>> Since I am working on a large FEM Fortran code, I have to stick to >>>>> Fortran. >>>>> Do you know if someone plans to add this Fortran interface? Or may be >>>>> I could do it myself ? Is this particular interface very hard to add ? >>>>> Perhaps could I mimic some other interface ? >>>>> What would you advise ? >>>>> >>>> >>>> I have added the interface in branch knepley/feature-fortran-compose. >>>> I also put this in the 'next' branch. It >>>> should make it to master soon. There is a test in >>>> sys/examples/tests/ex13f >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Best regards, >>>>> Natacha >>>>> >>>>> On Wed, Mar 22, 2017 at 12:33 PM, Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Wed, Mar 22, 2017 at 10:03 AM, Natacha BEREUX < >>>>>> natacha.bereux at gmail.com> wrote: >>>>>> >>>>>>> Hello, >>>>>>> if my understanding is correct, the approach proposed by Matt and >>>>>>> Lawrence is the following : >>>>>>> - create a DMShell (DMShellCreate) >>>>>>> - define my own CreateFieldDecomposition to return the index sets I >>>>>>> need (for displacement, pressure and temperature degrees of freedom) : >>>>>>> myCreateFieldDecomposition(... ) >>>>>>> - set it in the DMShell ( DMShellSetCreateFieldDecomposition) >>>>>>> - then sets the DM in KSP context (KSPSetDM) >>>>>>> >>>>>>> I have some more questions >>>>>>> - I did not succeed in setting my own CreateFieldDecomposition in >>>>>>> the DMShell : link fails with " unknown reference to ? >>>>>>> dmshellsetcreatefielddecomposition_ ?. Could it be a Fortran >>>>>>> problem (I am using Fortran)? Is this routine available in PETSc Fortran >>>>>>> interface ? \ >>>>>>> >>>>>> >>>>>> Yes, exactly. The Fortran interface for passing function pointers is >>>>>> complex, and no one has added this function yet. >>>>>> >>>>>> >>>>>>> - CreateFieldDecomposition is supposed to return an array of dms (to >>>>>>> define the fields). I am not able to return such datas. Do I return a >>>>>>> PETSC_NULL_OBJECT instead ? >>>>>>> >>>>>> >>>>>> Yes. >>>>>> >>>>>> >>>>>>> - do I have to provide something else to define the DMShell ? >>>>>>> >>>>>> >>>>>> I think you will have to return local and global vectors, but this >>>>>> just means creating a vector of the correct size and distribution. 
>>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Thanks a lot for your help >>>>>>> Natacha >>>>>>> >>>>>>> On Tue, Mar 21, 2017 at 2:44 PM, Natacha BEREUX < >>>>>>> natacha.bereux at gmail.com> wrote: >>>>>>> >>>>>>>> Thanks for your quick answers. To be honest, I am not familiar at >>>>>>>> all with DMShells and DMPlexes. But since it is what I need, I am going to >>>>>>>> try it. >>>>>>>> Thanks again for your advices, >>>>>>>> Natacha >>>>>>>> >>>>>>>> On Tue, Mar 21, 2017 at 2:27 PM, Lawrence Mitchell < >>>>>>>> lawrence.mitchell at imperial.ac.uk> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> > On 21 Mar 2017, at 13:24, Matthew Knepley >>>>>>>>> wrote: >>>>>>>>> > >>>>>>>>> > I think the remedy is as easy as specifying a DMShell that has a >>>>>>>>> PetscSection (DMSetDefaultSection) with your ordering, and >>>>>>>>> > I think this is how Firedrake (http://www.firedrakeproject.org/) >>>>>>>>> does it. >>>>>>>>> >>>>>>>>> We actually don't use a section, but we do provide >>>>>>>>> DMCreateFieldDecomposition_Shell. >>>>>>>>> >>>>>>>>> If you have a section that describes all the fields, then I think >>>>>>>>> if the DMShell knows about it, you effectively get the same behaviour as >>>>>>>>> DMPlex (which does the decomposition in the same manner?). >>>>>>>>> >>>>>>>>> > However, I usually use a DMPlex which knows about my >>>>>>>>> > mesh, so I am not sure if this strategy has any holes. >>>>>>>>> >>>>>>>>> I haven't noticed anything yet. >>>>>>>>> >>>>>>>>> Lawrence >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 4 10:07:59 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 4 May 2017 10:07:59 -0500 Subject: [petsc-users] Configure nested PCFIELDSPLIT with general index sets In-Reply-To: References: <6496846F-19F8-4494-87E1-DDC390513370@imperial.ac.uk> Message-ID: On Thu, May 4, 2017 at 9:17 AM, Natacha BEREUX wrote: > Dear Matt, > I re-checked the master branch. To be precise, I downloaded the nightly > tarball this morning (from http://ftp.mcs.anl.gov/pub/ > petsc/petsc-master.tar.gz) > I am sure that the Fortran interface of DMSellSetCreateFieldDecomposition > is missing. > And it is quite tricky to add it. I have tried to write something in > src/dm/impls/shell/ftn-custom/zdmshellf.c but I am not familiar with > callbacks. > Any help would be greatly appreciated! > I added PetscObjectCompose() to Fortran, so you could compose IS objects when needed. Setting function pointers from Fortran is indeed complicated and I do not yet know how to do it. 
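A minimal sketch of the C-wrapper route suggested just below, under these assumptions: the Fortran code first attaches its index sets to the shell DM with PetscObjectCompose() (the keys "is_u", "is_p", "is_T" are placeholders), the shell callback has the same signature as DMCreateFieldDecomposition(), returning NULL name/DM lists is acceptable, and the wrapper name my_dmshell_setup plus its Fortran binding are left to the usual name-mangling or ISO C binding details:

    #include <petscdmshell.h>

    /* Callback with the (assumed) DMCreateFieldDecomposition() signature:
       hand back the index sets that were composed onto the DM beforehand. */
    static PetscErrorCode MyCreateFieldDecomposition(DM dm, PetscInt *len, char ***namelist, IS **islist, DM **dmlist)
    {
      IS             is_u, is_p, is_T;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = PetscObjectQuery((PetscObject)dm, "is_u", (PetscObject*)&is_u);CHKERRQ(ierr);
      ierr = PetscObjectQuery((PetscObject)dm, "is_p", (PetscObject*)&is_p);CHKERRQ(ierr);
      ierr = PetscObjectQuery((PetscObject)dm, "is_T", (PetscObject*)&is_T);CHKERRQ(ierr);
      *len = 3;
      ierr = PetscMalloc1(3, islist);CHKERRQ(ierr);
      (*islist)[0] = is_u; (*islist)[1] = is_p; (*islist)[2] = is_T;
      /* the caller destroys what it receives, so take extra references */
      ierr = PetscObjectReference((PetscObject)is_u);CHKERRQ(ierr);
      ierr = PetscObjectReference((PetscObject)is_p);CHKERRQ(ierr);
      ierr = PetscObjectReference((PetscObject)is_T);CHKERRQ(ierr);
      if (namelist) *namelist = NULL;  /* assumed acceptable; names could be allocated instead */
      if (dmlist)   *dmlist   = NULL;
      PetscFunctionReturn(0);
    }

    /* Tiny C wrapper, to be called (via the usual Fortran/C binding) after the
       DMShell is created, that installs the decomposition callback. */
    PetscErrorCode my_dmshell_setup(DM dm)
    {
      PetscErrorCode ierr;
      PetscFunctionBeginUser;
      ierr = DMShellSetCreateFieldDecomposition(dm, MyCreateFieldDecomposition);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

As noted earlier in the thread, the shell DM may also need template vectors of the right size and distribution (for example via DMShellSetGlobalVector()/DMShellSetLocalVector()) before it is handed to KSPSetDM().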
Could you submit and Issue ( https://bitbucket.org/petsc/petsc/issues?status=new&status=open) and someone will add this as soon as we have time? In the meantime, it would not be hard to create the DMShell in C and have a small C wrapper for your Fortran function to create the decomposition. Thanks, Matt > Best regards > Natacha > > On Fri, Apr 28, 2017 at 8:11 PM, Matthew Knepley > wrote: > >> On Fri, Apr 28, 2017 at 1:09 PM, Matthew Knepley >> wrote: >> >>> On Fri, Apr 28, 2017 at 11:48 AM, Natacha BEREUX < >>> natacha.bereux at gmail.com> wrote: >>> >>>> Dear Matt, >>>> Sorry for my (very) late reply. >>>> I was not able to find the Fortran interface of >>>> DMSellSetCreateFieldDecomposition in the late petsc-3.7.6 fortran (and >>>> my code still fails to link). >>>> I have the feeling that it is missing in the master branch. >>>> And I was not able to get it on bitbucket either. >>>> Is there a branch from which I can pull your commit ? >>>> >>> >>> I would either: >>> >>> a) Use the 'next' branch >>> >>> or >>> >>> b) wait until Monday for me to merge to 'master' >>> >>> This merge has been held up, but can now go forward. >>> >> >> I just checked master. It was already merged. Please recheck your master. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thans a lot for your help, >>>> Natacha >>>> >>>> On Thu, Mar 30, 2017 at 9:25 PM, Matthew Knepley >>>> wrote: >>>> >>>>> On Wed, Mar 22, 2017 at 1:45 PM, Natacha BEREUX < >>>>> natacha.bereux at gmail.com> wrote: >>>>> >>>>>> Hello Matt, >>>>>> Thanks a lot for your answers. >>>>>> Since I am working on a large FEM Fortran code, I have to stick to >>>>>> Fortran. >>>>>> Do you know if someone plans to add this Fortran interface? Or may >>>>>> be I could do it myself ? Is this particular interface very hard to add ? >>>>>> Perhaps could I mimic some other interface ? >>>>>> What would you advise ? >>>>>> >>>>> >>>>> I have added the interface in branch knepley/feature-fortran-compose. >>>>> I also put this in the 'next' branch. It >>>>> should make it to master soon. There is a test in >>>>> sys/examples/tests/ex13f >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Best regards, >>>>>> Natacha >>>>>> >>>>>> On Wed, Mar 22, 2017 at 12:33 PM, Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Wed, Mar 22, 2017 at 10:03 AM, Natacha BEREUX < >>>>>>> natacha.bereux at gmail.com> wrote: >>>>>>> >>>>>>>> Hello, >>>>>>>> if my understanding is correct, the approach proposed by Matt and >>>>>>>> Lawrence is the following : >>>>>>>> - create a DMShell (DMShellCreate) >>>>>>>> - define my own CreateFieldDecomposition to return the index sets I >>>>>>>> need (for displacement, pressure and temperature degrees of freedom) : >>>>>>>> myCreateFieldDecomposition(... ) >>>>>>>> - set it in the DMShell ( DMShellSetCreateFieldDecomposition) >>>>>>>> - then sets the DM in KSP context (KSPSetDM) >>>>>>>> >>>>>>>> I have some more questions >>>>>>>> - I did not succeed in setting my own CreateFieldDecomposition in >>>>>>>> the DMShell : link fails with " unknown reference to ? >>>>>>>> dmshellsetcreatefielddecomposition_ ?. Could it be a Fortran >>>>>>>> problem (I am using Fortran)? Is this routine available in PETSc Fortran >>>>>>>> interface ? \ >>>>>>>> >>>>>>> >>>>>>> Yes, exactly. The Fortran interface for passing function pointers is >>>>>>> complex, and no one has added this function yet. >>>>>>> >>>>>>> >>>>>>>> - CreateFieldDecomposition is supposed to return an array of dms >>>>>>>> (to define the fields). 
I am not able to return such datas. Do I return a >>>>>>>> PETSC_NULL_OBJECT instead ? >>>>>>>> >>>>>>> >>>>>>> Yes. >>>>>>> >>>>>>> >>>>>>>> - do I have to provide something else to define the DMShell ? >>>>>>>> >>>>>>> >>>>>>> I think you will have to return local and global vectors, but this >>>>>>> just means creating a vector of the correct size and distribution. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> Thanks a lot for your help >>>>>>>> Natacha >>>>>>>> >>>>>>>> On Tue, Mar 21, 2017 at 2:44 PM, Natacha BEREUX < >>>>>>>> natacha.bereux at gmail.com> wrote: >>>>>>>> >>>>>>>>> Thanks for your quick answers. To be honest, I am not familiar at >>>>>>>>> all with DMShells and DMPlexes. But since it is what I need, I am going to >>>>>>>>> try it. >>>>>>>>> Thanks again for your advices, >>>>>>>>> Natacha >>>>>>>>> >>>>>>>>> On Tue, Mar 21, 2017 at 2:27 PM, Lawrence Mitchell < >>>>>>>>> lawrence.mitchell at imperial.ac.uk> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> > On 21 Mar 2017, at 13:24, Matthew Knepley >>>>>>>>>> wrote: >>>>>>>>>> > >>>>>>>>>> > I think the remedy is as easy as specifying a DMShell that has >>>>>>>>>> a PetscSection (DMSetDefaultSection) with your ordering, and >>>>>>>>>> > I think this is how Firedrake (http://www.firedrakeproject.org/) >>>>>>>>>> does it. >>>>>>>>>> >>>>>>>>>> We actually don't use a section, but we do provide >>>>>>>>>> DMCreateFieldDecomposition_Shell. >>>>>>>>>> >>>>>>>>>> If you have a section that describes all the fields, then I think >>>>>>>>>> if the DMShell knows about it, you effectively get the same behaviour as >>>>>>>>>> DMPlex (which does the decomposition in the same manner?). >>>>>>>>>> >>>>>>>>>> > However, I usually use a DMPlex which knows about my >>>>>>>>>> > mesh, so I am not sure if this strategy has any holes. >>>>>>>>>> >>>>>>>>>> I haven't noticed anything yet. >>>>>>>>>> >>>>>>>>>> Lawrence >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Thu May 4 10:33:51 2017 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 4 May 2017 10:33:51 -0500 Subject: [petsc-users] GAMG scaling In-Reply-To: References: Message-ID: Mark: > > I am not seeing these options with -help ... > Hmm, this might be a bug - I'll check it. Hong > > On Wed, May 3, 2017 at 10:05 PM, Hong wrote: > >> I basically used 'runex56' and set '-ne' be compatible with np. 
>> Then I used option >> '-matptap_via scalable' >> '-matptap_via hypre' >> '-matptap_via nonscalable' >> >> I attached a job script below. >> >> In master branch, I set default as 'nonscalable' for small - medium size >> matrices, and automatically switch to 'scalable' when matrix size gets >> larger. >> >> Petsc solver uses MatPtAP, which does local RAP to reduce communication >> and accelerate computation. >> I suggest you simply use default setting. Let me know if you encounter >> trouble. >> >> Hong >> >> job.ne174.n8.np125.sh: >> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 >> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 >> -pc_gamg_reuse_interpolation true -ksp_converged_reason >> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 >> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev >> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg >> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu >> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 >> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 >> -pc_gamg_repartition false -pc_mg_cycle_type v >> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi >> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via scalable > >> log.ne174.n8.np125.scalable >> >> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 >> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 >> -pc_gamg_reuse_interpolation true -ksp_converged_reason >> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 >> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev >> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg >> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu >> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 >> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 >> -pc_gamg_repartition false -pc_mg_cycle_type v >> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi >> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via hypre > >> log.ne174.n8.np125.hypre >> >> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 >> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 >> -pc_gamg_reuse_interpolation true -ksp_converged_reason >> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 >> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev >> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg >> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu >> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 >> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 >> -pc_gamg_repartition false -pc_mg_cycle_type v >> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi >> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via nonscalable > >> log.ne174.n8.np125.nonscalable >> >> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 >> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 >> -pc_gamg_reuse_interpolation true -ksp_converged_reason >> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg >> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 >> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev >> 
-mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg >> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu >> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 >> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 >> -pc_gamg_repartition false -pc_mg_cycle_type v >> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi >> -mg_coarse_ksp_type cg -ksp_monitor -log_view > log.ne174.n8.np125 >> >> On Wed, May 3, 2017 at 2:08 PM, Mark Adams wrote: >> >>> Hong,the input files do not seem to be accessible. What are the command >>> line option? (I don't see a "rap" or "scale" in the source). >>> >>> >>> >>> On Wed, May 3, 2017 at 12:17 PM, Hong wrote: >>> >>>> Mark, >>>> Below is the copy of my email sent to you on Feb 27: >>>> >>>> I implemented scalable MatPtAP and did comparisons of three >>>> implementations using ex56.c on alcf cetus machine (this machine has >>>> small memory, 1GB/core): >>>> - nonscalable PtAP: use an array of length PN to do dense axpy >>>> - scalable PtAP: do sparse axpy without use of PN array >>>> - hypre PtAP. >>>> >>>> The results are attached. Summary: >>>> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP >>>> - scalable PtAP is 4x faster than hypre PtAP >>>> - hypre uses less memory (see job.ne399.n63.np1000.sh) >>>> >>>> Based on above observation, I set the default PtAP algorithm as >>>> 'nonscalable'. >>>> When PN > local estimated nonzero of C=PtAP, then switch default to >>>> 'scalable'. >>>> User can overwrite default. >>>> >>>> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get >>>> MatPtAP 3.6224e+01 (nonscalable for small mats, >>>> scalable for larger ones) >>>> scalable MatPtAP 4.6129e+01 >>>> hypre 1.9389e+02 >>>> >>>> This work in on petsc-master. Give it a try. If you encounter any >>>> problem, let me know. >>>> >>>> Hong >>>> >>>> On Wed, May 3, 2017 at 10:01 AM, Mark Adams wrote: >>>> >>>>> (Hong), what is the current state of optimizing RAP for scaling? >>>>> >>>>> Nate, is driving 3D elasticity problems at scaling with GAMG and we >>>>> are working out performance problems. They are hitting problems at ~1.5B >>>>> dof problems on a basic Cray (XC30 I think). >>>>> >>>>> Thanks, >>>>> Mark >>>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Thu May 4 12:10:12 2017 From: hng.email at gmail.com (Hom Nath Gharti) Date: Thu, 4 May 2017 13:10:12 -0400 Subject: [petsc-users] Suggestion for large scale Poisson's solver Message-ID: Dear all, I am trying to solve a Poisson's equation on the Earth models with the following information - Degrees of freedom ~300,000,000 - I use MPIAIJ matrix - Coefficient matrix is symmetric and doesn't change with time steps - Need to compute for a large number of time steps Which solver/preconditioner is the most efficient for this problem? I would be grateful for your suggestion. 
Thanks, Hom Nath From bsmith at mcs.anl.gov Thu May 4 12:47:04 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 4 May 2017 12:47:04 -0500 Subject: [petsc-users] Suggestion for large scale Poisson's solver In-Reply-To: References: Message-ID: > On May 4, 2017, at 12:10 PM, Hom Nath Gharti wrote: > > Dear all, > > I am trying to solve a Poisson's equation on the Earth models with the > following information > > - Degrees of freedom ~300,000,000 > - I use MPIAIJ matrix > - Coefficient matrix is symmetric and doesn't change with time steps > - Need to compute for a large number of time steps > > Which solver/preconditioner is the most efficient for this problem? I > would be grateful for your suggestion. Geometric multigrid is always best if you can use it. If not I would use hypre BoomerAMG, it should have very good convergence. > > Thanks, > Hom Nath From hng.email at gmail.com Thu May 4 12:52:50 2017 From: hng.email at gmail.com (Hom Nath Gharti) Date: Thu, 4 May 2017 13:52:50 -0400 Subject: [petsc-users] Suggestion for large scale Poisson's solver In-Reply-To: References: Message-ID: Thanks, Barry. Is there a way to take advantage of the fact that the matrix remains same during time steps? On Thu, May 4, 2017 at 1:47 PM, Barry Smith wrote: > >> On May 4, 2017, at 12:10 PM, Hom Nath Gharti wrote: >> >> Dear all, >> >> I am trying to solve a Poisson's equation on the Earth models with the >> following information >> >> - Degrees of freedom ~300,000,000 >> - I use MPIAIJ matrix >> - Coefficient matrix is symmetric and doesn't change with time steps >> - Need to compute for a large number of time steps >> >> Which solver/preconditioner is the most efficient for this problem? I >> would be grateful for your suggestion. > > Geometric multigrid is always best if you can use it. If not I would use hypre BoomerAMG, it should have very good convergence. > > >> >> Thanks, >> Hom Nath > From bsmith at mcs.anl.gov Thu May 4 13:00:13 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 4 May 2017 13:00:13 -0500 Subject: [petsc-users] Suggestion for large scale Poisson's solver In-Reply-To: References: Message-ID: <0B8CA754-9781-48F1-B5F5-CF67E52E5AC0@mcs.anl.gov> > On May 4, 2017, at 12:52 PM, Hom Nath Gharti wrote: > > Thanks, Barry. Is there a way to take advantage of the fact that the > matrix remains same during time steps? Yes since you are not changing the matrix it will construct the preconditioner once and just use it forever. > > On Thu, May 4, 2017 at 1:47 PM, Barry Smith wrote: >> >>> On May 4, 2017, at 12:10 PM, Hom Nath Gharti wrote: >>> >>> Dear all, >>> >>> I am trying to solve a Poisson's equation on the Earth models with the >>> following information >>> >>> - Degrees of freedom ~300,000,000 >>> - I use MPIAIJ matrix >>> - Coefficient matrix is symmetric and doesn't change with time steps >>> - Need to compute for a large number of time steps >>> >>> Which solver/preconditioner is the most efficient for this problem? I >>> would be grateful for your suggestion. >> >> Geometric multigrid is always best if you can use it. If not I would use hypre BoomerAMG, it should have very good convergence. 
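A minimal sketch of that setup in C, assuming the assembled Poisson matrix A and the vectors b, x already exist and that only the right-hand side changes between time steps (A, b, x, nsteps and UpdateRHS are placeholder names, not from this thread; error checking is omitted):

    KSP      ksp;
    PC       pc;
    PetscInt step;

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);        /* the operator never changes */
    KSPSetType(ksp, KSPCG);            /* symmetric matrix, so CG */
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCHYPRE);            /* or PCMG / PCGAMG for geometric / algebraic multigrid */
    PCHYPRESetType(pc, "boomeramg");
    KSPSetFromOptions(ksp);            /* -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg also works */

    for (step = 0; step < nsteps; step++) {
      UpdateRHS(b, step);              /* placeholder: only b changes each step */
      KSPSolve(ksp, b, x);             /* preconditioner is built on the first solve */
    }
    KSPDestroy(&ksp);

Because KSPSetOperators() is never called again inside the loop, the BoomerAMG setup is done on the first KSPSolve() and reused for every later step, which is the reuse Barry confirms in his follow-up below.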
>> >> >>> >>> Thanks, >>> Hom Nath >> From hng.email at gmail.com Thu May 4 13:02:42 2017 From: hng.email at gmail.com (Hom Nath Gharti) Date: Thu, 4 May 2017 14:02:42 -0400 Subject: [petsc-users] Suggestion for large scale Poisson's solver In-Reply-To: <0B8CA754-9781-48F1-B5F5-CF67E52E5AC0@mcs.anl.gov> References: <0B8CA754-9781-48F1-B5F5-CF67E52E5AC0@mcs.anl.gov> Message-ID: Thanks a lot! On Thu, May 4, 2017 at 2:00 PM, Barry Smith wrote: > >> On May 4, 2017, at 12:52 PM, Hom Nath Gharti wrote: >> >> Thanks, Barry. Is there a way to take advantage of the fact that the >> matrix remains same during time steps? > > Yes since you are not changing the matrix it will construct the preconditioner once and just use it forever. > > >> >> On Thu, May 4, 2017 at 1:47 PM, Barry Smith wrote: >>> >>>> On May 4, 2017, at 12:10 PM, Hom Nath Gharti wrote: >>>> >>>> Dear all, >>>> >>>> I am trying to solve a Poisson's equation on the Earth models with the >>>> following information >>>> >>>> - Degrees of freedom ~300,000,000 >>>> - I use MPIAIJ matrix >>>> - Coefficient matrix is symmetric and doesn't change with time steps >>>> - Need to compute for a large number of time steps >>>> >>>> Which solver/preconditioner is the most efficient for this problem? I >>>> would be grateful for your suggestion. >>> >>> Geometric multigrid is always best if you can use it. If not I would use hypre BoomerAMG, it should have very good convergence. >>> >>> >>>> >>>> Thanks, >>>> Hom Nath >>> > From hzhang at mcs.anl.gov Thu May 4 14:33:11 2017 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 4 May 2017 14:33:11 -0500 Subject: [petsc-users] GAMG scaling In-Reply-To: References: Message-ID: Mark, Fixed https://bitbucket.org/petsc/petsc/commits/68eacb73b84ae7f3fd7363217d47f23a8f967155 Run ex56 gives mpiexec -n 8 ./ex56 -ne 13 ... -h |grep via -mattransposematmult_via Algorithmic approach (choose one of) scalable nonscalable matmatmult (MatTransposeMatMult) -matmatmult_via Algorithmic approach (choose one of) scalable nonscalable hypre (MatMatMult) -matptap_via Algorithmic approach (choose one of) scalable nonscalable hypre (MatPtAP) ... I'll merge it to master after regression tests. Hong On Thu, May 4, 2017 at 10:33 AM, Hong wrote: > Mark: >> >> I am not seeing these options with -help ... >> > Hmm, this might be a bug - I'll check it. > Hong > > >> >> On Wed, May 3, 2017 at 10:05 PM, Hong wrote: >> >>> I basically used 'runex56' and set '-ne' be compatible with np. >>> Then I used option >>> '-matptap_via scalable' >>> '-matptap_via hypre' >>> '-matptap_via nonscalable' >>> >>> I attached a job script below. >>> >>> In master branch, I set default as 'nonscalable' for small - medium size >>> matrices, and automatically switch to 'scalable' when matrix size gets >>> larger. >>> >>> Petsc solver uses MatPtAP, which does local RAP to reduce communication >>> and accelerate computation. >>> I suggest you simply use default setting. Let me know if you encounter >>> trouble. 
>>> >>> Hong >>> >>> job.ne174.n8.np125.sh: >>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 >>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 >>> -pc_gamg_reuse_interpolation true -ksp_converged_reason >>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg >>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 >>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev >>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg >>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu >>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 >>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 >>> -pc_gamg_repartition false -pc_mg_cycle_type v >>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi >>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via scalable > >>> log.ne174.n8.np125.scalable >>> >>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 >>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 >>> -pc_gamg_reuse_interpolation true -ksp_converged_reason >>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg >>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 >>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev >>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg >>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu >>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 >>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 >>> -pc_gamg_repartition false -pc_mg_cycle_type v >>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi >>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via hypre > >>> log.ne174.n8.np125.hypre >>> >>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 >>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 >>> -pc_gamg_reuse_interpolation true -ksp_converged_reason >>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg >>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 >>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev >>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg >>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu >>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 >>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 >>> -pc_gamg_repartition false -pc_mg_cycle_type v >>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi >>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via nonscalable > >>> log.ne174.n8.np125.nonscalable >>> >>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 >>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1 >>> -pc_gamg_reuse_interpolation true -ksp_converged_reason >>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg >>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1 >>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev >>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg >>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu >>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01 >>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30 >>> -pc_gamg_repartition false -pc_mg_cycle_type v >>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi >>> -mg_coarse_ksp_type cg -ksp_monitor -log_view > 
log.ne174.n8.np125 >>> >>> On Wed, May 3, 2017 at 2:08 PM, Mark Adams wrote: >>> >>>> Hong,the input files do not seem to be accessible. What are the command >>>> line option? (I don't see a "rap" or "scale" in the source). >>>> >>>> >>>> >>>> On Wed, May 3, 2017 at 12:17 PM, Hong wrote: >>>> >>>>> Mark, >>>>> Below is the copy of my email sent to you on Feb 27: >>>>> >>>>> I implemented scalable MatPtAP and did comparisons of three >>>>> implementations using ex56.c on alcf cetus machine (this machine has >>>>> small memory, 1GB/core): >>>>> - nonscalable PtAP: use an array of length PN to do dense axpy >>>>> - scalable PtAP: do sparse axpy without use of PN array >>>>> - hypre PtAP. >>>>> >>>>> The results are attached. Summary: >>>>> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre >>>>> PtAP >>>>> - scalable PtAP is 4x faster than hypre PtAP >>>>> - hypre uses less memory (see job.ne399.n63.np1000.sh) >>>>> >>>>> Based on above observation, I set the default PtAP algorithm as >>>>> 'nonscalable'. >>>>> When PN > local estimated nonzero of C=PtAP, then switch default to >>>>> 'scalable'. >>>>> User can overwrite default. >>>>> >>>>> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get >>>>> MatPtAP 3.6224e+01 (nonscalable for small mats, >>>>> scalable for larger ones) >>>>> scalable MatPtAP 4.6129e+01 >>>>> hypre 1.9389e+02 >>>>> >>>>> This work in on petsc-master. Give it a try. If you encounter any >>>>> problem, let me know. >>>>> >>>>> Hong >>>>> >>>>> On Wed, May 3, 2017 at 10:01 AM, Mark Adams wrote: >>>>> >>>>>> (Hong), what is the current state of optimizing RAP for scaling? >>>>>> >>>>>> Nate, is driving 3D elasticity problems at scaling with GAMG and we >>>>>> are working out performance problems. They are hitting problems at ~1.5B >>>>>> dof problems on a basic Cray (XC30 I think). >>>>>> >>>>>> Thanks, >>>>>> Mark >>>>>> >>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From natacha.bereux at gmail.com Thu May 4 15:39:15 2017 From: natacha.bereux at gmail.com (Natacha BEREUX) Date: Thu, 4 May 2017 22:39:15 +0200 Subject: [petsc-users] Configure nested PCFIELDSPLIT with general index sets In-Reply-To: References: <6496846F-19F8-4494-87E1-DDC390513370@imperial.ac.uk> Message-ID: Thanks for your explanation. It is much clearer now. I have just submitted an issue on the bugtracker (for DMShellSetCreateFieldDecomposition Fortran interface) I am going to work on your other proposals (using PetscObjectCompose or wrap my decomposition). I'll let you know what it gives ! Thanks a lot Natacha On Thu, May 4, 2017 at 5:07 PM, Matthew Knepley wrote: > On Thu, May 4, 2017 at 9:17 AM, Natacha BEREUX > wrote: > >> Dear Matt, >> I re-checked the master branch. To be precise, I downloaded the nightly >> tarball this morning (from http://ftp.mcs.anl.gov/pub/pet >> sc/petsc-master.tar.gz) >> I am sure that the Fortran interface of DMSellSetCreateFieldDecomposition >> is missing. >> And it is quite tricky to add it. I have tried to write something in >> src/dm/impls/shell/ftn-custom/zdmshellf.c but I am not familiar with >> callbacks. >> Any help would be greatly appreciated! >> > > I added PetscObjectCompose() to Fortran, so you could compose IS objects > when needed. Setting function pointers from Fortran is indeed > complicated and I do not yet know how to do it. 
Could you submit and Issue > (https://bitbucket.org/petsc/petsc/issues?status=new&status=open) > and someone will add this as soon as we have time? > > In the meantime, it would not be hard to create the DMShell in C and have > a small C wrapper for your Fortran function to create the decomposition. > > Thanks, > > Matt > > >> Best regards >> Natacha >> >> On Fri, Apr 28, 2017 at 8:11 PM, Matthew Knepley >> wrote: >> >>> On Fri, Apr 28, 2017 at 1:09 PM, Matthew Knepley >>> wrote: >>> >>>> On Fri, Apr 28, 2017 at 11:48 AM, Natacha BEREUX < >>>> natacha.bereux at gmail.com> wrote: >>>> >>>>> Dear Matt, >>>>> Sorry for my (very) late reply. >>>>> I was not able to find the Fortran interface of >>>>> DMSellSetCreateFieldDecomposition in the late petsc-3.7.6 fortran >>>>> (and my code still fails to link). >>>>> I have the feeling that it is missing in the master branch. >>>>> And I was not able to get it on bitbucket either. >>>>> Is there a branch from which I can pull your commit ? >>>>> >>>> >>>> I would either: >>>> >>>> a) Use the 'next' branch >>>> >>>> or >>>> >>>> b) wait until Monday for me to merge to 'master' >>>> >>>> This merge has been held up, but can now go forward. >>>> >>> >>> I just checked master. It was already merged. Please recheck your master. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thans a lot for your help, >>>>> Natacha >>>>> >>>>> On Thu, Mar 30, 2017 at 9:25 PM, Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Wed, Mar 22, 2017 at 1:45 PM, Natacha BEREUX < >>>>>> natacha.bereux at gmail.com> wrote: >>>>>> >>>>>>> Hello Matt, >>>>>>> Thanks a lot for your answers. >>>>>>> Since I am working on a large FEM Fortran code, I have to stick to >>>>>>> Fortran. >>>>>>> Do you know if someone plans to add this Fortran interface? Or may >>>>>>> be I could do it myself ? Is this particular interface very hard to add ? >>>>>>> Perhaps could I mimic some other interface ? >>>>>>> What would you advise ? >>>>>>> >>>>>> >>>>>> I have added the interface in branch knepley/feature-fortran-compose. >>>>>> I also put this in the 'next' branch. It >>>>>> should make it to master soon. There is a test in >>>>>> sys/examples/tests/ex13f >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Best regards, >>>>>>> Natacha >>>>>>> >>>>>>> On Wed, Mar 22, 2017 at 12:33 PM, Matthew Knepley >>>>>> > wrote: >>>>>>> >>>>>>>> On Wed, Mar 22, 2017 at 10:03 AM, Natacha BEREUX < >>>>>>>> natacha.bereux at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hello, >>>>>>>>> if my understanding is correct, the approach proposed by Matt and >>>>>>>>> Lawrence is the following : >>>>>>>>> - create a DMShell (DMShellCreate) >>>>>>>>> - define my own CreateFieldDecomposition to return the index sets >>>>>>>>> I need (for displacement, pressure and temperature degrees of freedom) : >>>>>>>>> myCreateFieldDecomposition(... ) >>>>>>>>> - set it in the DMShell ( DMShellSetCreateFieldDecomposition) >>>>>>>>> - then sets the DM in KSP context (KSPSetDM) >>>>>>>>> >>>>>>>>> I have some more questions >>>>>>>>> - I did not succeed in setting my own CreateFieldDecomposition in >>>>>>>>> the DMShell : link fails with " unknown reference to ? >>>>>>>>> dmshellsetcreatefielddecomposition_ ?. Could it be a Fortran >>>>>>>>> problem (I am using Fortran)? Is this routine available in PETSc Fortran >>>>>>>>> interface ? \ >>>>>>>>> >>>>>>>> >>>>>>>> Yes, exactly. The Fortran interface for passing function pointers >>>>>>>> is complex, and no one has added this function yet. 
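A rough, untested sketch of the small C wrapper Matt suggests above, assuming the application has already built its three index sets and composed them onto the DM under the hypothetical keys "is_u", "is_p" and "is_T" (these names and MyCreateFieldDecomposition are not from the thread):

    static PetscErrorCode MyCreateFieldDecomposition(DM dm, PetscInt *nfields, char ***names, IS **is, DM **dms)
    {
      IS             is_u, is_p, is_T;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      /* retrieve the index sets the application composed onto the DMShell */
      ierr = PetscObjectQuery((PetscObject)dm, "is_u", (PetscObject*)&is_u);CHKERRQ(ierr);
      ierr = PetscObjectQuery((PetscObject)dm, "is_p", (PetscObject*)&is_p);CHKERRQ(ierr);
      ierr = PetscObjectQuery((PetscObject)dm, "is_T", (PetscObject*)&is_T);CHKERRQ(ierr);
      *nfields = 3;
      if (names) {
        ierr = PetscMalloc1(3, names);CHKERRQ(ierr);
        ierr = PetscStrallocpy("displacement", &(*names)[0]);CHKERRQ(ierr);
        ierr = PetscStrallocpy("pressure",     &(*names)[1]);CHKERRQ(ierr);
        ierr = PetscStrallocpy("temperature",  &(*names)[2]);CHKERRQ(ierr);
      }
      if (is) {
        ierr = PetscMalloc1(3, is);CHKERRQ(ierr);
        (*is)[0] = is_u; (*is)[1] = is_p; (*is)[2] = is_T;
        /* the caller destroys these, so take an extra reference */
        ierr = PetscObjectReference((PetscObject)is_u);CHKERRQ(ierr);
        ierr = PetscObjectReference((PetscObject)is_p);CHKERRQ(ierr);
        ierr = PetscObjectReference((PetscObject)is_T);CHKERRQ(ierr);
      }
      if (dms) *dms = NULL;            /* no sub-DMs, as discussed in the quoted thread */
      PetscFunctionReturn(0);
    }

    /* wrapper called once from the Fortran code; x is a template global vector */
    DMShellCreate(PETSC_COMM_WORLD, &dm);
    DMShellSetGlobalVector(dm, x);
    PetscObjectCompose((PetscObject)dm, "is_u", (PetscObject)is_u);   /* likewise is_p, is_T */
    DMShellSetCreateFieldDecomposition(dm, MyCreateFieldDecomposition);
    KSPSetDM(ksp, dm);
    KSPSetDMActive(ksp, PETSC_FALSE); /* the operator is still supplied by the application */

With the DMShell attached to the KSP, PCFieldSplit should then be able to pick the splits up on its own, so -pc_type fieldsplit plus the usual -fieldsplit_* options work without any Fortran callback.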
>>>>>>>> >>>>>>>> >>>>>>>>> - CreateFieldDecomposition is supposed to return an array of dms >>>>>>>>> (to define the fields). I am not able to return such datas. Do I return a >>>>>>>>> PETSC_NULL_OBJECT instead ? >>>>>>>>> >>>>>>>> >>>>>>>> Yes. >>>>>>>> >>>>>>>> >>>>>>>>> - do I have to provide something else to define the DMShell ? >>>>>>>>> >>>>>>>> >>>>>>>> I think you will have to return local and global vectors, but this >>>>>>>> just means creating a vector of the correct size and distribution. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> Thanks a lot for your help >>>>>>>>> Natacha >>>>>>>>> >>>>>>>>> On Tue, Mar 21, 2017 at 2:44 PM, Natacha BEREUX < >>>>>>>>> natacha.bereux at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Thanks for your quick answers. To be honest, I am not familiar at >>>>>>>>>> all with DMShells and DMPlexes. But since it is what I need, I am going to >>>>>>>>>> try it. >>>>>>>>>> Thanks again for your advices, >>>>>>>>>> Natacha >>>>>>>>>> >>>>>>>>>> On Tue, Mar 21, 2017 at 2:27 PM, Lawrence Mitchell < >>>>>>>>>> lawrence.mitchell at imperial.ac.uk> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> > On 21 Mar 2017, at 13:24, Matthew Knepley >>>>>>>>>>> wrote: >>>>>>>>>>> > >>>>>>>>>>> > I think the remedy is as easy as specifying a DMShell that has >>>>>>>>>>> a PetscSection (DMSetDefaultSection) with your ordering, and >>>>>>>>>>> > I think this is how Firedrake (http://www.firedrakeproject.o >>>>>>>>>>> rg/) does it. >>>>>>>>>>> >>>>>>>>>>> We actually don't use a section, but we do provide >>>>>>>>>>> DMCreateFieldDecomposition_Shell. >>>>>>>>>>> >>>>>>>>>>> If you have a section that describes all the fields, then I >>>>>>>>>>> think if the DMShell knows about it, you effectively get the same behaviour >>>>>>>>>>> as DMPlex (which does the decomposition in the same manner?). >>>>>>>>>>> >>>>>>>>>>> > However, I usually use a DMPlex which knows about my >>>>>>>>>>> > mesh, so I am not sure if this strategy has any holes. >>>>>>>>>>> >>>>>>>>>>> I haven't noticed anything yet. >>>>>>>>>>> >>>>>>>>>>> Lawrence >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pvsang002 at gmail.com Fri May 5 05:45:30 2017 From: pvsang002 at gmail.com (Pham Pham) Date: Fri, 5 May 2017 18:45:30 +0800 Subject: [petsc-users] Installation question In-Reply-To: References: Message-ID: *Hi,* *I can configure now, but fail when testing:* [mpepvs at atlas7-c10 petsc-3.7.5]$ make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt test Running test examples to verify correct installation Using PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 and PETSC_ARCH=arch-linux-cxx-opt Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI process See http://www.mcs.anl.gov/petsc/documentation/faq.html mpiexec_atlas7-c10: cannot connect to local mpd (/tmp/mpd2.console_mpepvs); possible causes: 1. no mpd is running on this host 2. an mpd is running but was started without a "console" (-n option) Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI processes See http://www.mcs.anl.gov/petsc/documentation/faq.html mpiexec_atlas7-c10: cannot connect to local mpd (/tmp/mpd2.console_mpepvs); possible causes: 1. no mpd is running on this host 2. an mpd is running but was started without a "console" (-n option) Possible error running Fortran example src/snes/examples/tutorials/ex5f with 1 MPI process See http://www.mcs.anl.gov/petsc/documentation/faq.html mpiexec_atlas7-c10: cannot connect to local mpd (/tmp/mpd2.console_mpepvs); possible causes: 1. no mpd is running on this host 2. an mpd is running but was started without a "console" (-n option) Completed test examples ========================================= Now to evaluate the computer systems you plan use - do: make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt streams *Please help on this.* *Many thanks!* On Thu, Apr 20, 2017 at 2:02 AM, Satish Balay wrote: > Sorry - should have mentioned: > > do 'rm -rf arch-linux-cxx-opt' and rerun configure again. > > The mpich install from previous build [that is currently in > arch-linux-cxx-opt/] > is conflicting with --with-mpi-dir=/app1/centos6.3/gnu/mvapich2-1.9/ > > Satish > > > On Wed, 19 Apr 2017, Pham Pham wrote: > > > I reconfigured PETSs with installed MPI, however, I got serous error: > > > > **************************ERROR************************************* > > Error during compile, check arch-linux-cxx-opt/lib/petsc/conf/make.log > > Send it and arch-linux-cxx-opt/lib/petsc/conf/configure.log to > > petsc-maint at mcs.anl.gov > > ******************************************************************** > > > > Please explain what is happening? > > > > Thank you very much. > > > > > > > > > > On Wed, Apr 19, 2017 at 11:43 PM, Satish Balay > wrote: > > > > > Presumably your cluster already has a recommended MPI to use [which is > > > already installed. So you should use that - instead of > > > --download-mpich=1 > > > > > > Satish > > > > > > On Wed, 19 Apr 2017, Pham Pham wrote: > > > > > > > Hi, > > > > > > > > I just installed petsc-3.7.5 into my university cluster. When > evaluating > > > > the computer system, PETSc reports "It appears you have 1 node(s)", I > > > donot > > > > understand this, since the system is a multinodes system. Could you > > > please > > > > explain this to me? > > > > > > > > Thank you very much. > > > > > > > > S. 
> > > > > > > > Output: > > > > ========================================= > > > > Now to evaluate the computer systems you plan use - do: > > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > PETSC_ARCH=arch-linux-cxx-opt streams > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > PETSC_ARCH=arch-linux-cxx-opt > > > > streams > > > > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > PETSC_ARCH=arch-linux-cxx-opt > > > > streams > > > > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx -o > > > > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing > > > > -Wno-unknown-pragmas -fvisibility=hidden -g -O > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include > > > > `pwd`/MPIVersion.c > > > > Running streams with > > > > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec ' > > > using > > > > 'NPMAX=12' > > > > Number of MPI processes 1 Processor names atlas7-c10 > > > > Triad: 9137.5025 Rate (MB/s) > > > > Number of MPI processes 2 Processor names atlas7-c10 atlas7-c10 > > > > Triad: 9707.2815 Rate (MB/s) > > > > Number of MPI processes 3 Processor names atlas7-c10 atlas7-c10 > > > atlas7-c10 > > > > Triad: 13559.5275 Rate (MB/s) > > > > Number of MPI processes 4 Processor names atlas7-c10 atlas7-c10 > > > atlas7-c10 > > > > atlas7-c10 > > > > Triad: 14193.0597 Rate (MB/s) > > > > Number of MPI processes 5 Processor names atlas7-c10 atlas7-c10 > > > atlas7-c10 > > > > atlas7-c10 atlas7-c10 > > > > Triad: 14492.9234 Rate (MB/s) > > > > Number of MPI processes 6 Processor names atlas7-c10 atlas7-c10 > > > atlas7-c10 > > > > atlas7-c10 atlas7-c10 atlas7-c10 > > > > Triad: 15476.5912 Rate (MB/s) > > > > Number of MPI processes 7 Processor names atlas7-c10 atlas7-c10 > > > atlas7-c10 > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > Triad: 15148.7388 Rate (MB/s) > > > > Number of MPI processes 8 Processor names atlas7-c10 atlas7-c10 > > > atlas7-c10 > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > Triad: 15799.1290 Rate (MB/s) > > > > Number of MPI processes 9 Processor names atlas7-c10 atlas7-c10 > > > atlas7-c10 > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > Triad: 15671.3104 Rate (MB/s) > > > > Number of MPI processes 10 Processor names atlas7-c10 atlas7-c10 > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > atlas7-c10 atlas7-c10 > > > > Triad: 15601.4754 Rate (MB/s) > > > > Number of MPI processes 11 Processor names atlas7-c10 atlas7-c10 > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > atlas7-c10 atlas7-c10 atlas7-c10 > > > > Triad: 15434.5790 Rate (MB/s) > > > > Number of MPI processes 12 Processor names atlas7-c10 atlas7-c10 > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > Triad: 15134.1263 Rate (MB/s) > > > > ------------------------------------------------ > > > > np speedup > > > > 1 1.0 > > > > 2 1.06 > > > > 3 1.48 > > > > 4 1.55 > > > > 5 1.59 > > > > 6 1.69 > > > > 7 1.66 > > > > 8 1.73 > > > > 9 1.72 > > > > 10 1.71 > > > > 11 1.69 > > > > 12 1.66 > > > > Estimation of possible speedup of MPI programs based on Streams > > > benchmark. 
> > > > It appears you have 1 node(s) > > > > Unable to plot speedup to a file > > > > Unable to open matplotlib to plot speedup > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: text/x-log Size: 102067 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 6026195 bytes Desc: not available URL: From balay at mcs.anl.gov Fri May 5 09:02:53 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 5 May 2017 09:02:53 -0500 Subject: [petsc-users] Installation question In-Reply-To: References: Message-ID: With Intel MPI - its best to use mpiexec.hydra [and not mpiexec] So you can do: make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt MPIEXEC=mpiexec.hydra test [you can also specify --with-mpiexec=mpiexec.hydra at configure time] Satish On Fri, 5 May 2017, Pham Pham wrote: > *Hi,* > *I can configure now, but fail when testing:* > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt > test Running test examples to verify correct installation > Using PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 and > PETSC_ARCH=arch-linux-cxx-opt > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI > process > See http://www.mcs.anl.gov/petsc/documentation/faq.html > mpiexec_atlas7-c10: cannot connect to local mpd (/tmp/mpd2.console_mpepvs); > possible causes: > 1. no mpd is running on this host > 2. an mpd is running but was started without a "console" (-n option) > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI > processes > See http://www.mcs.anl.gov/petsc/documentation/faq.html > mpiexec_atlas7-c10: cannot connect to local mpd (/tmp/mpd2.console_mpepvs); > possible causes: > 1. no mpd is running on this host > 2. an mpd is running but was started without a "console" (-n option) > Possible error running Fortran example src/snes/examples/tutorials/ex5f > with 1 MPI process > See http://www.mcs.anl.gov/petsc/documentation/faq.html > mpiexec_atlas7-c10: cannot connect to local mpd (/tmp/mpd2.console_mpepvs); > possible causes: > 1. no mpd is running on this host > 2. an mpd is running but was started without a "console" (-n option) > Completed test examples > ========================================= > Now to evaluate the computer systems you plan use - do: > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > PETSC_ARCH=arch-linux-cxx-opt streams > > > > > *Please help on this.* > *Many thanks!* > > > On Thu, Apr 20, 2017 at 2:02 AM, Satish Balay wrote: > > > Sorry - should have mentioned: > > > > do 'rm -rf arch-linux-cxx-opt' and rerun configure again. 
> > > > The mpich install from previous build [that is currently in > > arch-linux-cxx-opt/] > > is conflicting with --with-mpi-dir=/app1/centos6.3/gnu/mvapich2-1.9/ > > > > Satish > > > > > > On Wed, 19 Apr 2017, Pham Pham wrote: > > > > > I reconfigured PETSs with installed MPI, however, I got serous error: > > > > > > **************************ERROR************************************* > > > Error during compile, check arch-linux-cxx-opt/lib/petsc/conf/make.log > > > Send it and arch-linux-cxx-opt/lib/petsc/conf/configure.log to > > > petsc-maint at mcs.anl.gov > > > ******************************************************************** > > > > > > Please explain what is happening? > > > > > > Thank you very much. > > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 11:43 PM, Satish Balay > > wrote: > > > > > > > Presumably your cluster already has a recommended MPI to use [which is > > > > already installed. So you should use that - instead of > > > > --download-mpich=1 > > > > > > > > Satish > > > > > > > > On Wed, 19 Apr 2017, Pham Pham wrote: > > > > > > > > > Hi, > > > > > > > > > > I just installed petsc-3.7.5 into my university cluster. When > > evaluating > > > > > the computer system, PETSc reports "It appears you have 1 node(s)", I > > > > donot > > > > > understand this, since the system is a multinodes system. Could you > > > > please > > > > > explain this to me? > > > > > > > > > > Thank you very much. > > > > > > > > > > S. > > > > > > > > > > Output: > > > > > ========================================= > > > > > Now to evaluate the computer systems you plan use - do: > > > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > > PETSC_ARCH=arch-linux-cxx-opt streams > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > PETSC_ARCH=arch-linux-cxx-opt > > > > > streams > > > > > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > PETSC_ARCH=arch-linux-cxx-opt > > > > > streams > > > > > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx -o > > > > > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing > > > > > -Wno-unknown-pragmas -fvisibility=hidden -g -O > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include > > > > > `pwd`/MPIVersion.c > > > > > Running streams with > > > > > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec ' > > > > using > > > > > 'NPMAX=12' > > > > > Number of MPI processes 1 Processor names atlas7-c10 > > > > > Triad: 9137.5025 Rate (MB/s) > > > > > Number of MPI processes 2 Processor names atlas7-c10 atlas7-c10 > > > > > Triad: 9707.2815 Rate (MB/s) > > > > > Number of MPI processes 3 Processor names atlas7-c10 atlas7-c10 > > > > atlas7-c10 > > > > > Triad: 13559.5275 Rate (MB/s) > > > > > Number of MPI processes 4 Processor names atlas7-c10 atlas7-c10 > > > > atlas7-c10 > > > > > atlas7-c10 > > > > > Triad: 14193.0597 Rate (MB/s) > > > > > Number of MPI processes 5 Processor names atlas7-c10 atlas7-c10 > > > > atlas7-c10 > > > > > atlas7-c10 atlas7-c10 > > > > > Triad: 14492.9234 Rate (MB/s) > > > > > Number of MPI processes 6 Processor names atlas7-c10 atlas7-c10 > > > > atlas7-c10 > > > > > atlas7-c10 atlas7-c10 atlas7-c10 > > > > > Triad: 15476.5912 Rate (MB/s) > > > > > Number of MPI processes 7 Processor names atlas7-c10 atlas7-c10 > > > > atlas7-c10 > > > > > atlas7-c10 atlas7-c10 atlas7-c10 
atlas7-c10 > > > > > Triad: 15148.7388 Rate (MB/s) > > > > > Number of MPI processes 8 Processor names atlas7-c10 atlas7-c10 > > > > atlas7-c10 > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > Triad: 15799.1290 Rate (MB/s) > > > > > Number of MPI processes 9 Processor names atlas7-c10 atlas7-c10 > > > > atlas7-c10 > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > Triad: 15671.3104 Rate (MB/s) > > > > > Number of MPI processes 10 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > atlas7-c10 atlas7-c10 > > > > > Triad: 15601.4754 Rate (MB/s) > > > > > Number of MPI processes 11 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > atlas7-c10 atlas7-c10 atlas7-c10 > > > > > Triad: 15434.5790 Rate (MB/s) > > > > > Number of MPI processes 12 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > Triad: 15134.1263 Rate (MB/s) > > > > > ------------------------------------------------ > > > > > np speedup > > > > > 1 1.0 > > > > > 2 1.06 > > > > > 3 1.48 > > > > > 4 1.55 > > > > > 5 1.59 > > > > > 6 1.69 > > > > > 7 1.66 > > > > > 8 1.73 > > > > > 9 1.72 > > > > > 10 1.71 > > > > > 11 1.69 > > > > > 12 1.66 > > > > > Estimation of possible speedup of MPI programs based on Streams > > > > benchmark. > > > > > It appears you have 1 node(s) > > > > > Unable to plot speedup to a file > > > > > Unable to open matplotlib to plot speedup > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > > > > > > > > > > > > > > > > > > > From pvsang002 at gmail.com Fri May 5 10:18:29 2017 From: pvsang002 at gmail.com (Pham Pham) Date: Fri, 5 May 2017 23:18:29 +0800 Subject: [petsc-users] Installation question In-Reply-To: References: Message-ID: Hi Satish, It runs now, and shows a bad speed up: Please help to improve this. Thank you. ? On Fri, May 5, 2017 at 10:02 PM, Satish Balay wrote: > With Intel MPI - its best to use mpiexec.hydra [and not mpiexec] > > So you can do: > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > PETSC_ARCH=arch-linux-cxx-opt MPIEXEC=mpiexec.hydra test > > > [you can also specify --with-mpiexec=mpiexec.hydra at configure time] > > Satish > > > On Fri, 5 May 2017, Pham Pham wrote: > > > *Hi,* > > *I can configure now, but fail when testing:* > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > PETSC_ARCH=arch-linux-cxx-opt > > test Running test examples to verify correct installation > > Using PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 and > > PETSC_ARCH=arch-linux-cxx-opt > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI > > process > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > mpiexec_atlas7-c10: cannot connect to local mpd > (/tmp/mpd2.console_mpepvs); > > possible causes: > > 1. no mpd is running on this host > > 2. an mpd is running but was started without a "console" (-n option) > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI > > processes > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > mpiexec_atlas7-c10: cannot connect to local mpd > (/tmp/mpd2.console_mpepvs); > > possible causes: > > 1. no mpd is running on this host > > 2. 
an mpd is running but was started without a "console" (-n option) > > Possible error running Fortran example src/snes/examples/tutorials/ex5f > > with 1 MPI process > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > mpiexec_atlas7-c10: cannot connect to local mpd > (/tmp/mpd2.console_mpepvs); > > possible causes: > > 1. no mpd is running on this host > > 2. an mpd is running but was started without a "console" (-n option) > > Completed test examples > > ========================================= > > Now to evaluate the computer systems you plan use - do: > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > PETSC_ARCH=arch-linux-cxx-opt streams > > > > > > > > > > *Please help on this.* > > *Many thanks!* > > > > > > On Thu, Apr 20, 2017 at 2:02 AM, Satish Balay wrote: > > > > > Sorry - should have mentioned: > > > > > > do 'rm -rf arch-linux-cxx-opt' and rerun configure again. > > > > > > The mpich install from previous build [that is currently in > > > arch-linux-cxx-opt/] > > > is conflicting with --with-mpi-dir=/app1/centos6.3/gnu/mvapich2-1.9/ > > > > > > Satish > > > > > > > > > On Wed, 19 Apr 2017, Pham Pham wrote: > > > > > > > I reconfigured PETSs with installed MPI, however, I got serous error: > > > > > > > > **************************ERROR************************************* > > > > Error during compile, check arch-linux-cxx-opt/lib/petsc/ > conf/make.log > > > > Send it and arch-linux-cxx-opt/lib/petsc/conf/configure.log to > > > > petsc-maint at mcs.anl.gov > > > > ******************************************************************** > > > > > > > > Please explain what is happening? > > > > > > > > Thank you very much. > > > > > > > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 11:43 PM, Satish Balay > > > wrote: > > > > > > > > > Presumably your cluster already has a recommended MPI to use > [which is > > > > > already installed. So you should use that - instead of > > > > > --download-mpich=1 > > > > > > > > > > Satish > > > > > > > > > > On Wed, 19 Apr 2017, Pham Pham wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > I just installed petsc-3.7.5 into my university cluster. When > > > evaluating > > > > > > the computer system, PETSc reports "It appears you have 1 > node(s)", I > > > > > donot > > > > > > understand this, since the system is a multinodes system. Could > you > > > > > please > > > > > > explain this to me? > > > > > > > > > > > > Thank you very much. > > > > > > > > > > > > S. 
> > > > > > > > > > > > Output: > > > > > > ========================================= > > > > > > Now to evaluate the computer systems you plan use - do: > > > > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > > > PETSC_ARCH=arch-linux-cxx-opt streams > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > > PETSC_ARCH=arch-linux-cxx-opt > > > > > > streams > > > > > > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory > > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > > PETSC_ARCH=arch-linux-cxx-opt > > > > > > streams > > > > > > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx > -o > > > > > > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing > > > > > > -Wno-unknown-pragmas -fvisibility=hidden -g -O > > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include > > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include > > > > > > `pwd`/MPIVersion.c > > > > > > Running streams with > > > > > > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec > ' > > > > > using > > > > > > 'NPMAX=12' > > > > > > Number of MPI processes 1 Processor names atlas7-c10 > > > > > > Triad: 9137.5025 Rate (MB/s) > > > > > > Number of MPI processes 2 Processor names atlas7-c10 atlas7-c10 > > > > > > Triad: 9707.2815 Rate (MB/s) > > > > > > Number of MPI processes 3 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > Triad: 13559.5275 Rate (MB/s) > > > > > > Number of MPI processes 4 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > atlas7-c10 > > > > > > Triad: 14193.0597 Rate (MB/s) > > > > > > Number of MPI processes 5 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 > > > > > > Triad: 14492.9234 Rate (MB/s) > > > > > > Number of MPI processes 6 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > Triad: 15476.5912 Rate (MB/s) > > > > > > Number of MPI processes 7 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > Triad: 15148.7388 Rate (MB/s) > > > > > > Number of MPI processes 8 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > Triad: 15799.1290 Rate (MB/s) > > > > > > Number of MPI processes 9 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > Triad: 15671.3104 Rate (MB/s) > > > > > > Number of MPI processes 10 Processor names atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 > > > > > > Triad: 15601.4754 Rate (MB/s) > > > > > > Number of MPI processes 11 Processor names atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > Triad: 15434.5790 Rate (MB/s) > > > > > > Number of MPI processes 12 Processor names atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > Triad: 15134.1263 Rate (MB/s) > > > > > > ------------------------------------------------ > > > > > > np speedup > > > > > > 1 1.0 > > > > > > 2 1.06 > > > > > > 3 1.48 > > > > > > 4 1.55 > > > > > > 
5 1.59 > > > > > > 6 1.69 > > > > > > 7 1.66 > > > > > > 8 1.73 > > > > > > 9 1.72 > > > > > > 10 1.71 > > > > > > 11 1.69 > > > > > > 12 1.66 > > > > > > Estimation of possible speedup of MPI programs based on Streams > > > > > benchmark. > > > > > > It appears you have 1 node(s) > > > > > > Unable to plot speedup to a file > > > > > > Unable to open matplotlib to plot speedup > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: scaling.png Type: image/png Size: 46047 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.log Type: text/x-log Size: 636 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test.log Type: text/x-log Size: 636 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: text/x-log Size: 102045 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 4616950 bytes Desc: not available URL: From niko.karin at gmail.com Fri May 5 11:14:03 2017 From: niko.karin at gmail.com (Karin&NiKo) Date: Fri, 5 May 2017 18:14:03 +0200 Subject: [petsc-users] Using SNES in a legacy code Message-ID: Dear PETSc team, I am part of the development team of legacy fortran code with a tailored Newton's method. The software is already using PETSc's linear solvers and we enjoy it. Now I would like to evaluate the SNES solver. I have already extracted a function in order to compute the Jacobian and another one to compute the residual. But there is something I cannot figure out : at each Newton's iteration, our solver needs to know the unknowns value in order to compute the Jacobian. But the increment vector is computed within the SNES. How can I synchronize PETSc's vector of unknowns and mine? Is there some kind of SNESSetPostSolveShell ? Thanks for developping PETSc, Nicolas -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri May 5 11:26:25 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 5 May 2017 11:26:25 -0500 Subject: [petsc-users] Installation question In-Reply-To: References: Message-ID: On Fri, May 5, 2017 at 10:18 AM, Pham Pham wrote: > Hi Satish, > > It runs now, and shows a bad speed up: > Please help to improve this. > http://www.mcs.anl.gov/petsc/documentation/faq.html#computers The short answer is: You cannot improve this without buying a different machine. This is a fundamental algorithmic limitation that cannot be helped by threads, or vectorization, or anything else. Matt > Thank you. > > > ? 
> > On Fri, May 5, 2017 at 10:02 PM, Satish Balay wrote: > >> With Intel MPI - its best to use mpiexec.hydra [and not mpiexec] >> >> So you can do: >> >> make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> PETSC_ARCH=arch-linux-cxx-opt MPIEXEC=mpiexec.hydra test >> >> >> [you can also specify --with-mpiexec=mpiexec.hydra at configure time] >> >> Satish >> >> >> On Fri, 5 May 2017, Pham Pham wrote: >> >> > *Hi,* >> > *I can configure now, but fail when testing:* >> > >> > [mpepvs at atlas7-c10 petsc-3.7.5]$ make >> > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> PETSC_ARCH=arch-linux-cxx-opt >> > test Running test examples to verify correct installation >> > Using PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 and >> > PETSC_ARCH=arch-linux-cxx-opt >> > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 >> MPI >> > process >> > See http://www.mcs.anl.gov/petsc/documentation/faq.html >> > mpiexec_atlas7-c10: cannot connect to local mpd >> (/tmp/mpd2.console_mpepvs); >> > possible causes: >> > 1. no mpd is running on this host >> > 2. an mpd is running but was started without a "console" (-n option) >> > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 >> MPI >> > processes >> > See http://www.mcs.anl.gov/petsc/documentation/faq.html >> > mpiexec_atlas7-c10: cannot connect to local mpd >> (/tmp/mpd2.console_mpepvs); >> > possible causes: >> > 1. no mpd is running on this host >> > 2. an mpd is running but was started without a "console" (-n option) >> > Possible error running Fortran example src/snes/examples/tutorials/ex5f >> > with 1 MPI process >> > See http://www.mcs.anl.gov/petsc/documentation/faq.html >> > mpiexec_atlas7-c10: cannot connect to local mpd >> (/tmp/mpd2.console_mpepvs); >> > possible causes: >> > 1. no mpd is running on this host >> > 2. an mpd is running but was started without a "console" (-n option) >> > Completed test examples >> > ========================================= >> > Now to evaluate the computer systems you plan use - do: >> > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> > PETSC_ARCH=arch-linux-cxx-opt streams >> > >> > >> > >> > >> > *Please help on this.* >> > *Many thanks!* >> > >> > >> > On Thu, Apr 20, 2017 at 2:02 AM, Satish Balay >> wrote: >> > >> > > Sorry - should have mentioned: >> > > >> > > do 'rm -rf arch-linux-cxx-opt' and rerun configure again. >> > > >> > > The mpich install from previous build [that is currently in >> > > arch-linux-cxx-opt/] >> > > is conflicting with --with-mpi-dir=/app1/centos6.3/gnu/mvapich2-1.9/ >> > > >> > > Satish >> > > >> > > >> > > On Wed, 19 Apr 2017, Pham Pham wrote: >> > > >> > > > I reconfigured PETSs with installed MPI, however, I got serous >> error: >> > > > >> > > > **************************ERROR***************************** >> ******** >> > > > Error during compile, check arch-linux-cxx-opt/lib/petsc/c >> onf/make.log >> > > > Send it and arch-linux-cxx-opt/lib/petsc/conf/configure.log to >> > > > petsc-maint at mcs.anl.gov >> > > > ************************************************************ >> ******** >> > > > >> > > > Please explain what is happening? >> > > > >> > > > Thank you very much. >> > > > >> > > > >> > > > >> > > > >> > > > On Wed, Apr 19, 2017 at 11:43 PM, Satish Balay >> > > wrote: >> > > > >> > > > > Presumably your cluster already has a recommended MPI to use >> [which is >> > > > > already installed. 
So you should use that - instead of >> > > > > --download-mpich=1 >> > > > > >> > > > > Satish >> > > > > >> > > > > On Wed, 19 Apr 2017, Pham Pham wrote: >> > > > > >> > > > > > Hi, >> > > > > > >> > > > > > I just installed petsc-3.7.5 into my university cluster. When >> > > evaluating >> > > > > > the computer system, PETSc reports "It appears you have 1 >> node(s)", I >> > > > > donot >> > > > > > understand this, since the system is a multinodes system. Could >> you >> > > > > please >> > > > > > explain this to me? >> > > > > > >> > > > > > Thank you very much. >> > > > > > >> > > > > > S. >> > > > > > >> > > > > > Output: >> > > > > > ========================================= >> > > > > > Now to evaluate the computer systems you plan use - do: >> > > > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> > > > > > PETSC_ARCH=arch-linux-cxx-opt streams >> > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make >> > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> > > > > PETSC_ARCH=arch-linux-cxx-opt >> > > > > > streams >> > > > > > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory >> > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> > > > > PETSC_ARCH=arch-linux-cxx-opt >> > > > > > streams >> > > > > > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx >> -o >> > > > > > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing >> > > > > > -Wno-unknown-pragmas -fvisibility=hidden -g -O >> > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include >> > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include >> > > > > > `pwd`/MPIVersion.c >> > > > > > Running streams with >> > > > > > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec >> ' >> > > > > using >> > > > > > 'NPMAX=12' >> > > > > > Number of MPI processes 1 Processor names atlas7-c10 >> > > > > > Triad: 9137.5025 Rate (MB/s) >> > > > > > Number of MPI processes 2 Processor names atlas7-c10 atlas7-c10 >> > > > > > Triad: 9707.2815 Rate (MB/s) >> > > > > > Number of MPI processes 3 Processor names atlas7-c10 atlas7-c10 >> > > > > atlas7-c10 >> > > > > > Triad: 13559.5275 Rate (MB/s) >> > > > > > Number of MPI processes 4 Processor names atlas7-c10 atlas7-c10 >> > > > > atlas7-c10 >> > > > > > atlas7-c10 >> > > > > > Triad: 14193.0597 Rate (MB/s) >> > > > > > Number of MPI processes 5 Processor names atlas7-c10 atlas7-c10 >> > > > > atlas7-c10 >> > > > > > atlas7-c10 atlas7-c10 >> > > > > > Triad: 14492.9234 Rate (MB/s) >> > > > > > Number of MPI processes 6 Processor names atlas7-c10 atlas7-c10 >> > > > > atlas7-c10 >> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 >> > > > > > Triad: 15476.5912 Rate (MB/s) >> > > > > > Number of MPI processes 7 Processor names atlas7-c10 atlas7-c10 >> > > > > atlas7-c10 >> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> > > > > > Triad: 15148.7388 Rate (MB/s) >> > > > > > Number of MPI processes 8 Processor names atlas7-c10 atlas7-c10 >> > > > > atlas7-c10 >> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> > > > > > Triad: 15799.1290 Rate (MB/s) >> > > > > > Number of MPI processes 9 Processor names atlas7-c10 atlas7-c10 >> > > > > atlas7-c10 >> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> atlas7-c10 >> > > > > > Triad: 15671.3104 Rate (MB/s) >> > > > > > Number of MPI processes 10 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 atlas7-c10 >> > > > > > Triad: 15601.4754 Rate 
(MB/s) >> > > > > > Number of MPI processes 11 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 >> > > > > > Triad: 15434.5790 Rate (MB/s) >> > > > > > Number of MPI processes 12 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> > > > > > Triad: 15134.1263 Rate (MB/s) >> > > > > > ------------------------------------------------ >> > > > > > np speedup >> > > > > > 1 1.0 >> > > > > > 2 1.06 >> > > > > > 3 1.48 >> > > > > > 4 1.55 >> > > > > > 5 1.59 >> > > > > > 6 1.69 >> > > > > > 7 1.66 >> > > > > > 8 1.73 >> > > > > > 9 1.72 >> > > > > > 10 1.71 >> > > > > > 11 1.69 >> > > > > > 12 1.66 >> > > > > > Estimation of possible speedup of MPI programs based on Streams >> > > > > benchmark. >> > > > > > It appears you have 1 node(s) >> > > > > > Unable to plot speedup to a file >> > > > > > Unable to open matplotlib to plot speedup >> > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ >> > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ >> > > > > > >> > > > > >> > > > > >> > > > >> > > >> > > >> > >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: scaling.png Type: image/png Size: 46047 bytes Desc: not available URL: From knepley at gmail.com Fri May 5 11:28:38 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 5 May 2017 11:28:38 -0500 Subject: [petsc-users] Using SNES in a legacy code In-Reply-To: References: Message-ID: On Fri, May 5, 2017 at 11:14 AM, Karin&NiKo wrote: > Dear PETSc team, > > I am part of the development team of legacy fortran code with a tailored > Newton's method. The software is already using PETSc's linear solvers and > we enjoy it. Now I would like to evaluate the SNES solver. > I have already extracted a function in order to compute the Jacobian and > another one to compute the residual. > But there is something I cannot figure out : at each Newton's iteration, > our solver needs to know the unknowns value in order to compute the > Jacobian. But the increment vector is computed within the SNES. > The FormJacobian() function that you pass in gets the current guess as an argument. Thanks, Matt > How can I synchronize PETSc's vector of unknowns and mine? Is there some > kind of SNESSetPostSolveShell ? > > Thanks for developping PETSc, > Nicolas > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From tinap89 at yahoo.com Mon May 8 01:13:51 2017 From: tinap89 at yahoo.com (Tina Patel) Date: Mon, 8 May 2017 06:13:51 +0000 (UTC) Subject: [petsc-users] PETSc module using modules~ both using PETSc sys, DMDA, vec, etc References: <992950763.4463130.1494224031189.ref@mail.yahoo.com> Message-ID: <992950763.4463130.1494224031189@mail.yahoo.com> Hello, I created a few standalone programs that use a DMDA structure, calculate and create matrices. 
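Returning to Nicolas's SNES question above, a minimal sketch of what Matt describes: the current Newton iterate arrives as the second argument of the Jacobian callback, so no extra synchronization is needed (AppCtx and MyAssembleJacobian are placeholders for the legacy code, not PETSc names):

    typedef struct {
      /* whatever the legacy assembly routines need */
    } AppCtx;

    static PetscErrorCode FormJacobian(SNES snes, Vec x, Mat J, Mat P, void *ctx)
    {
      AppCtx            *user = (AppCtx*)ctx;
      const PetscScalar *xarr;
      PetscErrorCode     ierr;

      PetscFunctionBegin;
      ierr = VecGetArrayRead(x, &xarr);CHKERRQ(ierr);   /* x is the current iterate from SNES */
      /* MyAssembleJacobian(user, xarr, P);  placeholder for the existing assembly routine */
      ierr = VecRestoreArrayRead(x, &xarr);CHKERRQ(ierr);
      ierr = MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

    /* registration; the residual callback receives the same current iterate */
    SNESSetJacobian(snes, J, J, FormJacobian, &user);

If a hook is still wanted at the start of every Newton iteration, SNESSetUpdate() provides one, but for passing the unknowns to the assembly routines the callback argument is enough.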
However, now that I am trying to combine them using a main and using the files as modules, the header files seem to consistently conflict. I am currently only trying to compile 3 files out of the several that i have completed.? when trying to compile 2 modules that i have, where the 1st module uses the 2nd module.?Common errors that i am getting is the? "symbol 'xxx' ... conflicts with symbol from module 'utils', use-associated at ..."and"Cannot change attributes of USE-associated symbol xxxx at" Is there a method to go about this? thank you. -Tina -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon May 8 08:37:17 2017 From: jed at jedbrown.org (Jed Brown) Date: Mon, 08 May 2017 07:37:17 -0600 Subject: [petsc-users] PETSc module using modules~ both using PETSc sys, DMDA, vec, etc In-Reply-To: <992950763.4463130.1494224031189@mail.yahoo.com> References: <992950763.4463130.1494224031189.ref@mail.yahoo.com> <992950763.4463130.1494224031189@mail.yahoo.com> Message-ID: <87r2zzbhbm.fsf@jedbrown.org> Tina Patel writes: > Hello, > I created a few standalone programs that use a DMDA structure, calculate and create matrices. However, now that I am trying to combine them using a main and using the files as modules, the header files seem to consistently conflict. I am currently only trying to compile 3 files out of the several that i have completed.? > when trying to compile 2 modules that i have, where the 1st module uses the 2nd module.?Common errors that i am getting is the? > "symbol 'xxx' ... conflicts with symbol from module 'utils', use-associated at ..."and"Cannot change attributes of USE-associated symbol xxxx at" > Is there a method to go about this? Are these errors related to PETSc? Note that if you reuse names in different modules, you may need to do a selective import. use modulename, only: foo, bar=>baz -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From bsmith at mcs.anl.gov Mon May 8 11:25:58 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 8 May 2017 11:25:58 -0500 Subject: [petsc-users] PETSc module using modules~ both using PETSc sys, DMDA, vec, etc In-Reply-To: <992950763.4463130.1494224031189@mail.yahoo.com> References: <992950763.4463130.1494224031189.ref@mail.yahoo.com> <992950763.4463130.1494224031189@mail.yahoo.com> Message-ID: Which version of PETSc are you using. The way we handle Fortran in the git master branch of the repository makes this easier than in the current release so you might consider upgrading. Send the entire error messages, likely you are creating two modules that contain the same PETSc variables which is not allowed with modules. Barry > On May 8, 2017, at 1:13 AM, Tina Patel wrote: > > Hello, > > I created a few standalone programs that use a DMDA structure, calculate and create matrices. However, now that I am trying to combine them using a main and using the files as modules, the header files seem to consistently conflict. I am currently only trying to compile 3 files out of the several that i have completed. > > when trying to compile 2 modules that i have, where the 1st module uses the 2nd module. > Common errors that i am getting is the > > "symbol 'xxx' ... conflicts with symbol from module 'utils', use-associated at ..." > and > "Cannot change attributes of USE-associated symbol xxxx at" > > Is there a method to go about this? thank you. 
> > -Tina From leidy-catherine.ramirez-villalba at ec-nantes.fr Tue May 9 11:04:06 2017 From: leidy-catherine.ramirez-villalba at ec-nantes.fr (Leidy Catherine Ramirez Villalba) Date: Tue, 9 May 2017 18:04:06 +0200 (CEST) Subject: [petsc-users] Optimization time parallel version of structural solver Message-ID: <432790006.651849.1494345846204.JavaMail.zimbra@ec-nantes.fr> Dear PETSc team, I'm currently working on the parallelization of the assembling of a system, previously assembled in a serial way (manual), but solved using PETSc in parallel. The problem I have is that when comparing computational time with the previous implementation, it seem that the parallel version is slower than the serial one. The type of matrices we deal with are sparse and might change their size in a significant order (kind of contact problems, where relations between elements change). For the example I'm using, for giving an example, the initial size of the matrix is : 139905, after several iteratinos it changes to: 141501 and finally to: 254172. The system is assembled and solved at each iteration and the matrix can not be re-used, therefore for each new iteration the matrix is set to zero keeping the previous non-zero pattern, and the option 'MAT_NEW_NONZERO_LOCATIONS' is set to 'TRUE'. In order to do the assembling I use the function 'MatSetValues' , inserting 3 lines and 3 rows, which might not be next to each other, and thus might no constitute a block. I believe that, what makes an important difference in time is the fact of adding almost the double of elements (from 139905 to 254172), but i don't know how what could I implement to retain a larger preallocation or to solve in any other way. I don't know, neither, in advance the position of new elements so that I can think in placing zeros to, maybe, generate a pre-pattern. Do you have any idea of how could I improve the time of the parallel version? Thanks in advance! Regards, Catherine -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 9 11:11:24 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 9 May 2017 11:11:24 -0500 Subject: [petsc-users] Optimization time parallel version of structural solver In-Reply-To: <432790006.651849.1494345846204.JavaMail.zimbra@ec-nantes.fr> References: <432790006.651849.1494345846204.JavaMail.zimbra@ec-nantes.fr> Message-ID: On Tue, May 9, 2017 at 11:04 AM, "Leidy Catherine Ramirez Villalba" < leidy-catherine.ramirez-villalba at ec-nantes.fr> wrote: > Dear PETSc team, > > I'm currently working on the parallelization of the assembling of a > system, previously assembled in a serial way (manual), but solved using > PETSc in parallel. > The problem I have is that when comparing computational time with the > previous implementation, it seem that the parallel version is slower than > the serial one. > > The type of matrices we deal with are sparse and might change their size > in a significant order (kind of contact problems, where relations between > elements change). > For the example I'm using, for giving an example, the initial size of the > matrix is : 139905, after several iteratinos it changes to: 141501 and > finally to: 254172. > > The system is assembled and solved at each iteration and the matrix can > not be re-used, therefore for each new iteration the matrix is set to zero > keeping the previous non-zero pattern, and the option > 'MAT_NEW_NONZERO_LOCATIONS' is set to 'TRUE'. 
> In order to do the assembling I use the function 'MatSetValues' , > inserting 3 lines and 3 rows, which might not be next to each other, and > thus might no constitute a block. > > I believe that, what makes an important difference in time is the fact of > adding almost the double of elements (from 139905 to 254172), but i don't > know how what could I implement to retain a larger preallocation or to > solve in any other way. > I don't know, neither, in advance the position of new elements so that I > can think in placing zeros to, maybe, generate a pre-pattern. > 1) Conceivably, the difference in parallel might be that you are setting elements owned by other processes. However for all performance questions, we need to see the output of -log_view 2) Certainly inserting new elements is slow because reallocating the matrix is slow. It would be faster to a) Throw away the first matrix b) Count all nonzeros in the second matrix c) Set preallocation for the second matrix d) Fill in the second matrix Thanks, Matt > Do you have any idea of how could I improve the time of the parallel > version? > > Thanks in advance! > > Regards, > Catherine > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From pvsang002 at gmail.com Thu May 11 07:08:24 2017 From: pvsang002 at gmail.com (Pham Pham) Date: Thu, 11 May 2017 19:08:24 +0700 Subject: [petsc-users] Installation question In-Reply-To: References: Message-ID: Hi Matt, Thank you for the reply. I am using University HPC which has multiple nodes, and should be good for parallel computing. The bad performance might be due to the way I install and run PETSc... Looking at the output when running streams, I can see that the Processor names were the same. Does that mean only one processor involved in computing, did it cause the bad performance? Thank you very much. Ph. 
Below is testing output: [mpepvs at atlas5-c01 petsc-3.7.5]$ make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt streams cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt streams /app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/bin/mpicxx -o MPIVersion.o c -wd1572 -g -O3 -fPIC -I/home/svu/mpepvs/petsc/petsc-3.7.5/include -I/hom e/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include -I/app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/include `pwd`/MPIVersion.c +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ The version of PETSc you are using is out-of-date, we recommend updating to the new release Available Version: 3.7.6 Installed Version: 3.7.5 http://www.mcs.anl.gov/petsc/download/index.html +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Running streams with 'mpiexec.hydra ' using 'NPMAX=12' Number of MPI processes 1 Processor names atlas5-c01 Triad: 11026.7604 Rate (MB/s) Number of MPI processes 2 Processor names atlas5-c01 atlas5-c01 Triad: 14669.6730 Rate (MB/s) Number of MPI processes 3 Processor names atlas5-c01 atlas5-c01 atlas5-c01 Triad: 12848.2644 Rate (MB/s) Number of MPI processes 4 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 Triad: 15033.7687 Rate (MB/s) Number of MPI processes 5 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 Triad: 13299.3830 Rate (MB/s) Number of MPI processes 6 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 Triad: 14382.2116 Rate (MB/s) Number of MPI processes 7 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 Triad: 13194.2573 Rate (MB/s) Number of MPI processes 8 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 Triad: 14199.7255 Rate (MB/s) Number of MPI processes 9 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 Triad: 13045.8946 Rate (MB/s) Number of MPI processes 10 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 Triad: 13058.3283 Rate (MB/s) Number of MPI processes 11 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 Triad: 13037.3334 Rate (MB/s) Number of MPI processes 12 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 Triad: 12526.6096 Rate (MB/s) ------------------------------------------------ np speedup 1 1.0 2 1.33 3 1.17 4 1.36 5 1.21 6 1.3 7 1.2 8 1.29 9 1.18 10 1.18 11 1.18 12 1.14 Estimation of possible speedup of MPI programs based on Streams benchmark. It appears you have 1 node(s) See graph in the file src/benchmarks/streams/scaling.png On Fri, May 5, 2017 at 11:26 PM, Matthew Knepley wrote: > On Fri, May 5, 2017 at 10:18 AM, Pham Pham wrote: > >> Hi Satish, >> >> It runs now, and shows a bad speed up: >> Please help to improve this. >> > > http://www.mcs.anl.gov/petsc/documentation/faq.html#computers > > The short answer is: You cannot improve this without buying a different > machine. This is > a fundamental algorithmic limitation that cannot be helped by threads, or > vectorization, or > anything else. 
> > Matt > > >> Thank you. >> >> >> ? >> >> On Fri, May 5, 2017 at 10:02 PM, Satish Balay wrote: >> >>> With Intel MPI - its best to use mpiexec.hydra [and not mpiexec] >>> >>> So you can do: >>> >>> make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>> PETSC_ARCH=arch-linux-cxx-opt MPIEXEC=mpiexec.hydra test >>> >>> >>> [you can also specify --with-mpiexec=mpiexec.hydra at configure time] >>> >>> Satish >>> >>> >>> On Fri, 5 May 2017, Pham Pham wrote: >>> >>> > *Hi,* >>> > *I can configure now, but fail when testing:* >>> > >>> > [mpepvs at atlas7-c10 petsc-3.7.5]$ make >>> > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>> PETSC_ARCH=arch-linux-cxx-opt >>> > test Running test examples to verify correct installation >>> > Using PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 and >>> > PETSC_ARCH=arch-linux-cxx-opt >>> > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 >>> MPI >>> > process >>> > See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> > mpiexec_atlas7-c10: cannot connect to local mpd >>> (/tmp/mpd2.console_mpepvs); >>> > possible causes: >>> > 1. no mpd is running on this host >>> > 2. an mpd is running but was started without a "console" (-n option) >>> > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 >>> MPI >>> > processes >>> > See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> > mpiexec_atlas7-c10: cannot connect to local mpd >>> (/tmp/mpd2.console_mpepvs); >>> > possible causes: >>> > 1. no mpd is running on this host >>> > 2. an mpd is running but was started without a "console" (-n option) >>> > Possible error running Fortran example src/snes/examples/tutorials/ex >>> 5f >>> > with 1 MPI process >>> > See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> > mpiexec_atlas7-c10: cannot connect to local mpd >>> (/tmp/mpd2.console_mpepvs); >>> > possible causes: >>> > 1. no mpd is running on this host >>> > 2. an mpd is running but was started without a "console" (-n option) >>> > Completed test examples >>> > ========================================= >>> > Now to evaluate the computer systems you plan use - do: >>> > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>> > PETSC_ARCH=arch-linux-cxx-opt streams >>> > >>> > >>> > >>> > >>> > *Please help on this.* >>> > *Many thanks!* >>> > >>> > >>> > On Thu, Apr 20, 2017 at 2:02 AM, Satish Balay >>> wrote: >>> > >>> > > Sorry - should have mentioned: >>> > > >>> > > do 'rm -rf arch-linux-cxx-opt' and rerun configure again. >>> > > >>> > > The mpich install from previous build [that is currently in >>> > > arch-linux-cxx-opt/] >>> > > is conflicting with --with-mpi-dir=/app1/centos6.3/gnu/mvapich2-1.9/ >>> > > >>> > > Satish >>> > > >>> > > >>> > > On Wed, 19 Apr 2017, Pham Pham wrote: >>> > > >>> > > > I reconfigured PETSs with installed MPI, however, I got serous >>> error: >>> > > > >>> > > > **************************ERROR***************************** >>> ******** >>> > > > Error during compile, check arch-linux-cxx-opt/lib/petsc/c >>> onf/make.log >>> > > > Send it and arch-linux-cxx-opt/lib/petsc/conf/configure.log to >>> > > > petsc-maint at mcs.anl.gov >>> > > > ************************************************************ >>> ******** >>> > > > >>> > > > Please explain what is happening? >>> > > > >>> > > > Thank you very much. 
>>> > > > >>> > > > >>> > > > >>> > > > >>> > > > On Wed, Apr 19, 2017 at 11:43 PM, Satish Balay >>> > > wrote: >>> > > > >>> > > > > Presumably your cluster already has a recommended MPI to use >>> [which is >>> > > > > already installed. So you should use that - instead of >>> > > > > --download-mpich=1 >>> > > > > >>> > > > > Satish >>> > > > > >>> > > > > On Wed, 19 Apr 2017, Pham Pham wrote: >>> > > > > >>> > > > > > Hi, >>> > > > > > >>> > > > > > I just installed petsc-3.7.5 into my university cluster. When >>> > > evaluating >>> > > > > > the computer system, PETSc reports "It appears you have 1 >>> node(s)", I >>> > > > > donot >>> > > > > > understand this, since the system is a multinodes system. >>> Could you >>> > > > > please >>> > > > > > explain this to me? >>> > > > > > >>> > > > > > Thank you very much. >>> > > > > > >>> > > > > > S. >>> > > > > > >>> > > > > > Output: >>> > > > > > ========================================= >>> > > > > > Now to evaluate the computer systems you plan use - do: >>> > > > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>> > > > > > PETSC_ARCH=arch-linux-cxx-opt streams >>> > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make >>> > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>> > > > > PETSC_ARCH=arch-linux-cxx-opt >>> > > > > > streams >>> > > > > > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory >>> > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>> > > > > PETSC_ARCH=arch-linux-cxx-opt >>> > > > > > streams >>> > > > > > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx >>> -o >>> > > > > > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing >>> > > > > > -Wno-unknown-pragmas -fvisibility=hidden -g -O >>> > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include >>> > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/incl >>> ude >>> > > > > > `pwd`/MPIVersion.c >>> > > > > > Running streams with >>> > > > > > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec >>> ' >>> > > > > using >>> > > > > > 'NPMAX=12' >>> > > > > > Number of MPI processes 1 Processor names atlas7-c10 >>> > > > > > Triad: 9137.5025 Rate (MB/s) >>> > > > > > Number of MPI processes 2 Processor names atlas7-c10 >>> atlas7-c10 >>> > > > > > Triad: 9707.2815 Rate (MB/s) >>> > > > > > Number of MPI processes 3 Processor names atlas7-c10 >>> atlas7-c10 >>> > > > > atlas7-c10 >>> > > > > > Triad: 13559.5275 Rate (MB/s) >>> > > > > > Number of MPI processes 4 Processor names atlas7-c10 >>> atlas7-c10 >>> > > > > atlas7-c10 >>> > > > > > atlas7-c10 >>> > > > > > Triad: 14193.0597 Rate (MB/s) >>> > > > > > Number of MPI processes 5 Processor names atlas7-c10 >>> atlas7-c10 >>> > > > > atlas7-c10 >>> > > > > > atlas7-c10 atlas7-c10 >>> > > > > > Triad: 14492.9234 Rate (MB/s) >>> > > > > > Number of MPI processes 6 Processor names atlas7-c10 >>> atlas7-c10 >>> > > > > atlas7-c10 >>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 >>> > > > > > Triad: 15476.5912 Rate (MB/s) >>> > > > > > Number of MPI processes 7 Processor names atlas7-c10 >>> atlas7-c10 >>> > > > > atlas7-c10 >>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>> > > > > > Triad: 15148.7388 Rate (MB/s) >>> > > > > > Number of MPI processes 8 Processor names atlas7-c10 >>> atlas7-c10 >>> > > > > atlas7-c10 >>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>> > > > > > Triad: 15799.1290 Rate (MB/s) >>> > > > > > Number of MPI processes 9 Processor names atlas7-c10 >>> atlas7-c10 >>> > > > > atlas7-c10 >>> > 
> > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>> atlas7-c10 >>> > > > > > Triad: 15671.3104 Rate (MB/s) >>> > > > > > Number of MPI processes 10 Processor names atlas7-c10 >>> atlas7-c10 >>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>> atlas7-c10 >>> > > > > > atlas7-c10 atlas7-c10 >>> > > > > > Triad: 15601.4754 Rate (MB/s) >>> > > > > > Number of MPI processes 11 Processor names atlas7-c10 >>> atlas7-c10 >>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>> atlas7-c10 >>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 >>> > > > > > Triad: 15434.5790 Rate (MB/s) >>> > > > > > Number of MPI processes 12 Processor names atlas7-c10 >>> atlas7-c10 >>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>> atlas7-c10 >>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>> > > > > > Triad: 15134.1263 Rate (MB/s) >>> > > > > > ------------------------------------------------ >>> > > > > > np speedup >>> > > > > > 1 1.0 >>> > > > > > 2 1.06 >>> > > > > > 3 1.48 >>> > > > > > 4 1.55 >>> > > > > > 5 1.59 >>> > > > > > 6 1.69 >>> > > > > > 7 1.66 >>> > > > > > 8 1.73 >>> > > > > > 9 1.72 >>> > > > > > 10 1.71 >>> > > > > > 11 1.69 >>> > > > > > 12 1.66 >>> > > > > > Estimation of possible speedup of MPI programs based on Streams >>> > > > > benchmark. >>> > > > > > It appears you have 1 node(s) >>> > > > > > Unable to plot speedup to a file >>> > > > > > Unable to open matplotlib to plot speedup >>> > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ >>> > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ >>> > > > > > >>> > > > > >>> > > > > >>> > > > >>> > > >>> > > >>> > >>> >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: scaling.png Type: image/png Size: 46047 bytes Desc: not available URL: From knepley at gmail.com Thu May 11 07:27:19 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 11 May 2017 07:27:19 -0500 Subject: [petsc-users] Installation question In-Reply-To: References: Message-ID: On Thu, May 11, 2017 at 7:08 AM, Pham Pham wrote: > Hi Matt, > > Thank you for the reply. > > I am using University HPC which has multiple nodes, and should be good for > parallel computing. The bad performance might be due to the way I install > and run PETSc... > > Looking at the output when running streams, I can see that the Processor > names were the same. > Does that mean only one processor involved in computing, did it cause the > bad performance? > Yes. From the data, it appears that the kind of processor you have has 12 cores, but only enough memory bandwidth to support 1.5 cores. Try running the STREAMS with only 1 process per node. This is a setting in your submission script, but it is different for every cluster. Thus I would ask the local sysdamin for this machine to help you do that. You should see almost perfect scaling with that configuration. You might also try 2 processes per node to compare. Thanks, Matt > Thank you very much. > > Ph. 
> > Below is testing output: > > [mpepvs at atlas5-c01 petsc-3.7.5]$ make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > PETSC_ARCH=arch-linux-cxx-opt streams > > > > > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > PETSC_ARCH=arch-linux-cxx-opt streams > /app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/bin/mpicxx -o > MPIVersion.o c -wd1572 -g -O3 -fPIC -I/home/svu/mpepvs/petsc/petsc-3.7.5/include > -I/hom > > > e/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include > -I/app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/include > `pwd`/MPIVersion.c > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > +++++++++++++++++++++++++++++++ > The version of PETSc you are using is out-of-date, we recommend updating > to the new release > Available Version: 3.7.6 Installed Version: 3.7.5 > http://www.mcs.anl.gov/petsc/download/index.html > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > +++++++++++++++++++++++++++++++ > Running streams with 'mpiexec.hydra ' using 'NPMAX=12' > Number of MPI processes 1 Processor names atlas5-c01 > Triad: 11026.7604 Rate (MB/s) > Number of MPI processes 2 Processor names atlas5-c01 atlas5-c01 > Triad: 14669.6730 Rate (MB/s) > Number of MPI processes 3 Processor names atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 12848.2644 Rate (MB/s) > Number of MPI processes 4 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 > Triad: 15033.7687 Rate (MB/s) > Number of MPI processes 5 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 13299.3830 Rate (MB/s) > Number of MPI processes 6 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 14382.2116 Rate (MB/s) > Number of MPI processes 7 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 13194.2573 Rate (MB/s) > Number of MPI processes 8 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 14199.7255 Rate (MB/s) > Number of MPI processes 9 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 13045.8946 Rate (MB/s) > Number of MPI processes 10 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 > Triad: 13058.3283 Rate (MB/s) > Number of MPI processes 11 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 13037.3334 Rate (MB/s) > Number of MPI processes 12 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 12526.6096 Rate (MB/s) > ------------------------------------------------ > np speedup > 1 1.0 > 2 1.33 > 3 1.17 > 4 1.36 > 5 1.21 > 6 1.3 > 7 1.2 > 8 1.29 > 9 1.18 > 10 1.18 > 11 1.18 > 12 1.14 > Estimation of possible speedup of MPI programs based on Streams benchmark. > It appears you have 1 node(s) > See graph in the file src/benchmarks/streams/scaling.png > > On Fri, May 5, 2017 at 11:26 PM, Matthew Knepley > wrote: > >> On Fri, May 5, 2017 at 10:18 AM, Pham Pham wrote: >> >>> Hi Satish, >>> >>> It runs now, and shows a bad speed up: >>> Please help to improve this. 
>>> >> >> http://www.mcs.anl.gov/petsc/documentation/faq.html#computers >> >> The short answer is: You cannot improve this without buying a different >> machine. This is >> a fundamental algorithmic limitation that cannot be helped by threads, or >> vectorization, or >> anything else. >> >> Matt >> >> >>> Thank you. >>> >>> >>> ? >>> >>> On Fri, May 5, 2017 at 10:02 PM, Satish Balay wrote: >>> >>>> With Intel MPI - its best to use mpiexec.hydra [and not mpiexec] >>>> >>>> So you can do: >>>> >>>> make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>>> PETSC_ARCH=arch-linux-cxx-opt MPIEXEC=mpiexec.hydra test >>>> >>>> >>>> [you can also specify --with-mpiexec=mpiexec.hydra at configure time] >>>> >>>> Satish >>>> >>>> >>>> On Fri, 5 May 2017, Pham Pham wrote: >>>> >>>> > *Hi,* >>>> > *I can configure now, but fail when testing:* >>>> > >>>> > [mpepvs at atlas7-c10 petsc-3.7.5]$ make >>>> > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>>> PETSC_ARCH=arch-linux-cxx-opt >>>> > test Running test examples to verify correct installation >>>> > Using PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 and >>>> > PETSC_ARCH=arch-linux-cxx-opt >>>> > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 >>>> MPI >>>> > process >>>> > See http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> > mpiexec_atlas7-c10: cannot connect to local mpd >>>> (/tmp/mpd2.console_mpepvs); >>>> > possible causes: >>>> > 1. no mpd is running on this host >>>> > 2. an mpd is running but was started without a "console" (-n option) >>>> > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 >>>> MPI >>>> > processes >>>> > See http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> > mpiexec_atlas7-c10: cannot connect to local mpd >>>> (/tmp/mpd2.console_mpepvs); >>>> > possible causes: >>>> > 1. no mpd is running on this host >>>> > 2. an mpd is running but was started without a "console" (-n option) >>>> > Possible error running Fortran example src/snes/examples/tutorials/ex >>>> 5f >>>> > with 1 MPI process >>>> > See http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> > mpiexec_atlas7-c10: cannot connect to local mpd >>>> (/tmp/mpd2.console_mpepvs); >>>> > possible causes: >>>> > 1. no mpd is running on this host >>>> > 2. an mpd is running but was started without a "console" (-n option) >>>> > Completed test examples >>>> > ========================================= >>>> > Now to evaluate the computer systems you plan use - do: >>>> > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>>> > PETSC_ARCH=arch-linux-cxx-opt streams >>>> > >>>> > >>>> > >>>> > >>>> > *Please help on this.* >>>> > *Many thanks!* >>>> > >>>> > >>>> > On Thu, Apr 20, 2017 at 2:02 AM, Satish Balay >>>> wrote: >>>> > >>>> > > Sorry - should have mentioned: >>>> > > >>>> > > do 'rm -rf arch-linux-cxx-opt' and rerun configure again. 
>>>> > > >>>> > > The mpich install from previous build [that is currently in >>>> > > arch-linux-cxx-opt/] >>>> > > is conflicting with --with-mpi-dir=/app1/centos6.3 >>>> /gnu/mvapich2-1.9/ >>>> > > >>>> > > Satish >>>> > > >>>> > > >>>> > > On Wed, 19 Apr 2017, Pham Pham wrote: >>>> > > >>>> > > > I reconfigured PETSs with installed MPI, however, I got serous >>>> error: >>>> > > > >>>> > > > **************************ERROR***************************** >>>> ******** >>>> > > > Error during compile, check arch-linux-cxx-opt/lib/petsc/c >>>> onf/make.log >>>> > > > Send it and arch-linux-cxx-opt/lib/petsc/conf/configure.log to >>>> > > > petsc-maint at mcs.anl.gov >>>> > > > ************************************************************ >>>> ******** >>>> > > > >>>> > > > Please explain what is happening? >>>> > > > >>>> > > > Thank you very much. >>>> > > > >>>> > > > >>>> > > > >>>> > > > >>>> > > > On Wed, Apr 19, 2017 at 11:43 PM, Satish Balay >>> > >>>> > > wrote: >>>> > > > >>>> > > > > Presumably your cluster already has a recommended MPI to use >>>> [which is >>>> > > > > already installed. So you should use that - instead of >>>> > > > > --download-mpich=1 >>>> > > > > >>>> > > > > Satish >>>> > > > > >>>> > > > > On Wed, 19 Apr 2017, Pham Pham wrote: >>>> > > > > >>>> > > > > > Hi, >>>> > > > > > >>>> > > > > > I just installed petsc-3.7.5 into my university cluster. When >>>> > > evaluating >>>> > > > > > the computer system, PETSc reports "It appears you have 1 >>>> node(s)", I >>>> > > > > donot >>>> > > > > > understand this, since the system is a multinodes system. >>>> Could you >>>> > > > > please >>>> > > > > > explain this to me? >>>> > > > > > >>>> > > > > > Thank you very much. >>>> > > > > > >>>> > > > > > S. >>>> > > > > > >>>> > > > > > Output: >>>> > > > > > ========================================= >>>> > > > > > Now to evaluate the computer systems you plan use - do: >>>> > > > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>>> > > > > > PETSC_ARCH=arch-linux-cxx-opt streams >>>> > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make >>>> > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>>> > > > > PETSC_ARCH=arch-linux-cxx-opt >>>> > > > > > streams >>>> > > > > > cd src/benchmarks/streams; /usr/bin/gmake >>>> --no-print-directory >>>> > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>>> > > > > PETSC_ARCH=arch-linux-cxx-opt >>>> > > > > > streams >>>> > > > > > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx >>>> -o >>>> > > > > > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing >>>> > > > > > -Wno-unknown-pragmas -fvisibility=hidden -g -O >>>> > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include >>>> > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/incl >>>> ude >>>> > > > > > `pwd`/MPIVersion.c >>>> > > > > > Running streams with >>>> > > > > > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec >>>> ' >>>> > > > > using >>>> > > > > > 'NPMAX=12' >>>> > > > > > Number of MPI processes 1 Processor names atlas7-c10 >>>> > > > > > Triad: 9137.5025 Rate (MB/s) >>>> > > > > > Number of MPI processes 2 Processor names atlas7-c10 >>>> atlas7-c10 >>>> > > > > > Triad: 9707.2815 Rate (MB/s) >>>> > > > > > Number of MPI processes 3 Processor names atlas7-c10 >>>> atlas7-c10 >>>> > > > > atlas7-c10 >>>> > > > > > Triad: 13559.5275 Rate (MB/s) >>>> > > > > > Number of MPI processes 4 Processor names atlas7-c10 >>>> atlas7-c10 >>>> > > > > atlas7-c10 >>>> > > > > > atlas7-c10 >>>> > > > > > 
Triad: 14193.0597 Rate (MB/s) >>>> > > > > > Number of MPI processes 5 Processor names atlas7-c10 >>>> atlas7-c10 >>>> > > > > atlas7-c10 >>>> > > > > > atlas7-c10 atlas7-c10 >>>> > > > > > Triad: 14492.9234 Rate (MB/s) >>>> > > > > > Number of MPI processes 6 Processor names atlas7-c10 >>>> atlas7-c10 >>>> > > > > atlas7-c10 >>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 >>>> > > > > > Triad: 15476.5912 Rate (MB/s) >>>> > > > > > Number of MPI processes 7 Processor names atlas7-c10 >>>> atlas7-c10 >>>> > > > > atlas7-c10 >>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>> > > > > > Triad: 15148.7388 Rate (MB/s) >>>> > > > > > Number of MPI processes 8 Processor names atlas7-c10 >>>> atlas7-c10 >>>> > > > > atlas7-c10 >>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>> > > > > > Triad: 15799.1290 Rate (MB/s) >>>> > > > > > Number of MPI processes 9 Processor names atlas7-c10 >>>> atlas7-c10 >>>> > > > > atlas7-c10 >>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>> atlas7-c10 >>>> > > > > > Triad: 15671.3104 Rate (MB/s) >>>> > > > > > Number of MPI processes 10 Processor names atlas7-c10 >>>> atlas7-c10 >>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>> atlas7-c10 >>>> > > > > > atlas7-c10 atlas7-c10 >>>> > > > > > Triad: 15601.4754 Rate (MB/s) >>>> > > > > > Number of MPI processes 11 Processor names atlas7-c10 >>>> atlas7-c10 >>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>> atlas7-c10 >>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 >>>> > > > > > Triad: 15434.5790 Rate (MB/s) >>>> > > > > > Number of MPI processes 12 Processor names atlas7-c10 >>>> atlas7-c10 >>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>> atlas7-c10 >>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>> > > > > > Triad: 15134.1263 Rate (MB/s) >>>> > > > > > ------------------------------------------------ >>>> > > > > > np speedup >>>> > > > > > 1 1.0 >>>> > > > > > 2 1.06 >>>> > > > > > 3 1.48 >>>> > > > > > 4 1.55 >>>> > > > > > 5 1.59 >>>> > > > > > 6 1.69 >>>> > > > > > 7 1.66 >>>> > > > > > 8 1.73 >>>> > > > > > 9 1.72 >>>> > > > > > 10 1.71 >>>> > > > > > 11 1.69 >>>> > > > > > 12 1.66 >>>> > > > > > Estimation of possible speedup of MPI programs based on >>>> Streams >>>> > > > > benchmark. >>>> > > > > > It appears you have 1 node(s) >>>> > > > > > Unable to plot speedup to a file >>>> > > > > > Unable to open matplotlib to plot speedup >>>> > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ >>>> > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ >>>> > > > > > >>>> > > > > >>>> > > > > >>>> > > > >>>> > > >>>> > > >>>> > >>>> >>>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: scaling.png Type: image/png Size: 46047 bytes Desc: not available URL: From gbisht at lbl.gov Thu May 11 11:24:39 2017 From: gbisht at lbl.gov (Gautam Bisht) Date: Thu, 11 May 2017 09:24:39 -0700 Subject: [petsc-users] [petsc-dev] For Fortran users of PETSc development version In-Reply-To: <4726FF04-1D07-455F-88B4-FF2488DCAE07@mcs.anl.gov> References: <4726FF04-1D07-455F-88B4-FF2488DCAE07@mcs.anl.gov> Message-ID: Hi Barry, I'm wondering if these changes will be a part of future 3.8.0 release or 4.0.0 release. And, do you have a tentative timeline when such a release tag would be made? -Gautam. On Sun, Dec 4, 2016 at 11:13 AM, Barry Smith wrote: > > Jed noticed a small mistake in my description. It is type(tXXX) not > type(iXXX) if you chose to declare your variables that way. Note that > declaring them via type(tXXX) or XXX is identical (XXX is just a macro for > type(tXXX)). > > Barry > > > > On Dec 4, 2016, at 11:57 AM, Barry Smith wrote: > > > > > > For Fortran users of the PETSc development (git master branch) version > > > > > > I have updated and simplified the Fortran usage of PETSc in the past > few weeks. I will put the branch barry/fortran-update into the master > branch on Monday. The usage changes are > > > > A) for each Fortran function (and main) use the following > > > > subroutine mysubroutine(.....) > > #include > > use petscxxx > > implicit none > > > > For example if you are using SNES in your code you would have > > > > #include > > use petscsnes > > implicit none > > > > B) Instead of PETSC_NULL_OBJECT you must pass PETSC_NULL_XXX (for > example PETSC_NULL_VEC) using the specific object type XXX that the > function call is expecting. > > > > C) Objects can be declared either as XXX a or type(iXXX) a, for > example Mat a or type(iMat) a. (Note that previously for those who used > types it was type(Mat) but that can no longer be used. > > > > Notes: > > > > 1) There are no longer any .h90 files that may be included > > > > 2) Like C the include files are now nested so you no longer need to > include for example > > > > #include > > #include > > #include > > #include > > #include > > > > you can just include > > > > #include > > > > 3) there is now type checking of most function calls. This will help > eliminate bugs due to incorrect calling sequences. Note that Fortran > distinguishes between a argument that is a scalar (zero dimensional array), > a one dimensional array and a two dimensional array (etc). So you may get > compile warnings because you are passing in an array when PETSc expects a > scalar or vis-versa. If you get these simply fix your declaration of the > variable to match what is expected. In some routines like MatSetValues() > and friends you can pass either scalars, one dimensional arrays or two > dimensional arrays, if you get errors here please send mail to > petsc-maint at mcs.anl.gov and include enough of your code so we can see the > dimensions of all your variables so we can fix the problems. > > > > 4) You can continue to use either fixed (.F extension) or free format > (.F90 extension) for your source > > > > 5) All the examples in PETSc have been updated so consult them for > clarifications. > > > > > > Please report any problems to petsc-maint at mcs.anl.gov > > > > Thanks > > > > Barry > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From leejearl at 126.com Thu May 11 22:57:03 2017 From: leejearl at 126.com (=?GBK?B?wO68vg==?=) Date: Fri, 12 May 2017 11:57:03 +0800 (CST) Subject: [petsc-users] how to get the vertices belongs to a control volume in 2D? Message-ID: <298d38df.50bb.15bfacd8032.Coremail.leejearl@126.com> Hi developers: I have such a question that I want to get the vertices of a cell. I know I can get the points by 1. Getting the faces of a cell such as "DMPlexGetCone(dm, c, &faces"; 2. Getting the vertices of every face of the cell such as "DMPlexGetCone(dm, f, &vertices)". Then I can obtain the vertices belongs to a cell. Is there any concise routine which I can choose to get the vertices of a cell directly? Thanks leejearl -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbaker112 at outlook.de Fri May 12 01:50:41 2017 From: mbaker112 at outlook.de (Matt Baker) Date: Fri, 12 May 2017 06:50:41 +0000 Subject: [petsc-users] Some general questions Message-ID: Hello, I have a few questions on how to improve performance of my program. I'm solving Poisson's equation on a (large) 3D FD grid with Dirichlet boundary conditions and multiple right hand sides. I set up the matrix and everything's working fine so far, but I'm sure the solving process could go faster. I know multigrid is generally the best preconditioner in such a case and algebraic multigrid currently works best. So generally speaking: Should I make the effort of symmetrizising the system matrix? I know how to do it, but it would probably take some time. CG does currently work, but is not competitive against other methods, so I guess the matrix might not be "symmetric enough"? For the various multigrid preconditioners: I always read that the problem should be solved exactly on the coarsest grid, but wouldn't an iterative solver do the same job if its provided accuracy is high enough, since the coarse discretization and the subsequent interpolation process introduce errors themselves? I submit my program to a batch system, but PETSc was compiled on the login node with different hardware. Is this affecting performance? What parts of the configuration process should I perform on a compute node then? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Fri May 12 02:25:43 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 12 May 2017 08:25:43 +0100 Subject: [petsc-users] Some general questions In-Reply-To: References: Message-ID: On 12 May 2017 at 07:50, Matt Baker wrote: > Hello, > > > I have a few questions on how to improve performance of my program. I'm > solving Poisson's equation on a (large) 3D FD grid with Dirichlet boundary > conditions and multiple right hand sides. I set up the matrix and > everything's working fine so far, but I'm sure the solving process could go > faster. I know multigrid is generally the best preconditioner in such a > case and algebraic multigrid currently works best. > If you use a DMDA for your FD problem, consider using PCMG with Galerkin. It will set up a geometric multigrid hierarchy. Depending on the specifics of your Poisson problem (constant coefficient versus highly hetegoneous), geometric MG is likely superior (faster time to solution) than AMG. > > So generally speaking: > > > Should I make the effort of symmetrizising the system matrix? I know how > to do it, but it would probably take some time. 
CG does currently work, but > is not competitive against other methods, so I guess the matrix might not > be "symmetric enough"? > In its basic form, CG is only guaranteed to converge with with an SPD operator. If you want to use CG, definitely do the work and make the operator symmetric. > > For the various multigrid preconditioners: I always read that the problem > should be solved exactly on the coarsest grid, but wouldn't an iterative > solver do the same job if its provided accuracy is high enough, since the > coarse discretization and the subsequent interpolation process introduce > errors themselves? > Yes iterative can work well. If your Poisson problem has a constant coefficient, rtol 1.0e-1 is likely a sufficient tolerance to use for an the coarse grid solve (e.g. overall convergence of solve won't be affected). If the Poisson problem has a highly variable coefficient (jumps of O(1e3) or more), or it has very large gradients say 1e3 variation over a few cells, then you will have to perform a more accurate iterative coarse level solve (say rtol 1e-4 to 1e-6). Note that the numbers for rtol I quote are purely empirical. > > I submit my program to a batch system, but PETSc was compiled on the login > node with different hardware. Is this affecting performance? What parts of > the configuration process should I perform on a compute node then? > If the login and compute nodes are fundamentally different, you should configure petsc with the option --with-batch and following the instructions. Thanks, Dave > > Thanks. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lawrence.mitchell at imperial.ac.uk Fri May 12 03:48:37 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Fri, 12 May 2017 09:48:37 +0100 Subject: [petsc-users] how to get the vertices belongs to a control volume in 2D? In-Reply-To: <298d38df.50bb.15bfacd8032.Coremail.leejearl@126.com> References: <298d38df.50bb.15bfacd8032.Coremail.leejearl@126.com> Message-ID: <14503BBE-816D-4287-B724-AFC857AE63F1@imperial.ac.uk> > On 12 May 2017, at 04:57, ?? wrote: > > Hi developers: > I have such a question that I want to get the vertices of a cell. I know I can get the points by > 1. Getting the faces of a cell such as "DMPlexGetCone(dm, c, &faces"; > 2. Getting the vertices of every face of the cell such as "DMPlexGetCone(dm, f, &vertices)". > > Then I can obtain the vertices belongs to a cell. Is there any concise routine which I can choose to get the > vertices of a cell directly? You should use the interface for the transitive closure. Find bounds of points that are vertices: DMPlexGetDepthStratum(dm, &vStart, &vEnd); ... DMPlexGetTransitiveClosure(dm, c, PETSC_TRUE, &nclosure, &closure); for (PetscInt i = 0; i < nclosure; i++) { const PetscInt p = closure[2*i]; if (p >= vStart && p < vEnd) { p is a vertex } } This works regardless of the topological dimension of the "cell" point you are using (the same code is good to find the vertices in the closure of a facet, say). Matt's course notes (http://www.caam.rice.edu/~caam519/CSBook.pdf) have nice pictures that help understand this language in section 7.1. 
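For completeness, a minimal sketch of that loop with the matching restore call; the closure array should start out NULL so Plex manages the scratch storage, and the depth argument 0 selects the vertex stratum (argument order as in the 3.7-era DMPlex API, so treat this as a sketch rather than verbatim usage):

  PetscInt  vStart, vEnd, nclosure, i;
  PetscInt *closure = NULL;   /* must be NULL-initialized; Plex hands back a scratch array */

  DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd);               /* depth 0 = vertices */
  DMPlexGetTransitiveClosure(dm, c, PETSC_TRUE, &nclosure, &closure);
  for (i = 0; i < nclosure; i++) {
    const PetscInt p = closure[2*i];                          /* entries come as (point, orientation) pairs */
    if (p >= vStart && p < vEnd) {
      /* p is a vertex in the closure of cell c */
    }
  }
  DMPlexRestoreTransitiveClosure(dm, c, PETSC_TRUE, &nclosure, &closure);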
Cheers, Lawrence From knepley at gmail.com Fri May 12 04:08:23 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 12 May 2017 04:08:23 -0500 Subject: [petsc-users] Some general questions In-Reply-To: References: Message-ID: On Fri, May 12, 2017 at 2:25 AM, Dave May wrote: > > > On 12 May 2017 at 07:50, Matt Baker wrote: > >> Hello, >> >> >> I have a few questions on how to improve performance of my program. I'm >> solving Poisson's equation on a (large) 3D FD grid with Dirichlet boundary >> conditions and multiple right hand sides. I set up the matrix and >> everything's working fine so far, but I'm sure the solving process could go >> faster. I know multigrid is generally the best preconditioner in such a >> case and algebraic multigrid currently works best. >> > > If you use a DMDA for your FD problem, consider using PCMG with Galerkin. > It will set up a geometric multigrid hierarchy. Depending on the specifics > of your Poisson problem (constant coefficient versus highly hetegoneous), > geometric MG is likely superior (faster time to solution) than AMG. > > >> >> So generally speaking: >> >> >> Should I make the effort of symmetrizising the system matrix? I know how >> to do it, but it would probably take some time. CG does currently work, but >> is not competitive against other methods, so I guess the matrix might not >> be "symmetric enough"? >> > > In its basic form, CG is only guaranteed to converge with with an SPD > operator. > If you want to use CG, definitely do the work and make the operator > symmetric. > For Poisson, CG is never, ever ever, ever ever faster than Full Multigrid (FMG). Don't use it. All the people publishing that are idiots :) but of course try it out for yourself with -pc_mg_type full It should converge to discretization error in 1 iterate if the smoother is strong enough (you might need to use 2 iterates on the downsmooth). > >> For the various multigrid preconditioners: I always read that the problem >> should be solved exactly on the coarsest grid, but wouldn't an iterative >> solver do the same job if its provided accuracy is high enough, since the >> coarse discretization and the subsequent interpolation process introduce >> errors themselves? >> > > Yes iterative can work well. If your Poisson problem has a constant > coefficient, rtol 1.0e-1 is likely a sufficient tolerance to use for an the > coarse grid solve (e.g. overall convergence of solve won't be affected). If > the Poisson problem has a highly variable coefficient (jumps of O(1e3) or > more), or it has very large gradients say 1e3 variation over a few cells, > then you will have to perform a more accurate iterative coarse level solve > (say rtol 1e-4 to 1e-6). Note that the numbers for rtol I quote are purely > empirical. > Always compare to direct. I don't think anything beats direct on problems the size of your coarse problem. If iterative is winning, likely your coarse problem is too big. However, again you can try it yourself easily with options. Matt > >> I submit my program to a batch system, but PETSc was compiled on the >> login node with different hardware. Is this affecting performance? What >> parts of the configuration process should I perform on a compute node then? >> > If the login and compute nodes are fundamentally different, you should > configure petsc with the option > --with-batch > and following the instructions. > > Thanks, > Dave > >> >> Thanks. 
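To make the geometric multigrid suggestion concrete, here is a minimal sketch (not taken from this thread) of driving a 3D constant-coefficient Poisson solve through a DMDA so that PCMG can build the grid hierarchy itself. ComputeMatrix and ComputeRHS are assumed user callbacks in the style of the KSP tutorials, and the option names are indicative rather than verbatim:

  /* Sketch only: ComputeMatrix()/ComputeRHS() are assumed user routines that assemble
     the 7-point FD Laplacian and the right-hand side on the DMDA. */
  DM  da;
  KSP ksp;

  DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
               DMDA_STENCIL_STAR, 17, 17, 17, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
               1, 1, NULL, NULL, NULL, &da);
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetDM(ksp, da);                                 /* lets PCMG coarsen/refine the DMDA for its levels */
  KSPSetComputeOperators(ksp, ComputeMatrix, NULL);
  KSPSetComputeRHS(ksp, ComputeRHS, NULL);
  KSPSetFromOptions(ksp);                            /* e.g. -da_refine 4 -pc_type mg -pc_mg_galerkin -pc_mg_type full */
  KSPSolve(ksp, NULL, NULL);
  KSPDestroy(&ksp);
  DMDestroy(&da);

With KSPSetDM() the number of levels follows from how far the DMDA is refined, so the coarsest problem stays small enough for the direct coarse solve recommended above.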
>> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri May 12 04:10:12 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 12 May 2017 04:10:12 -0500 Subject: [petsc-users] how to get the vertices belongs to a control volume in 2D? In-Reply-To: <14503BBE-816D-4287-B724-AFC857AE63F1@imperial.ac.uk> References: <298d38df.50bb.15bfacd8032.Coremail.leejearl@126.com> <14503BBE-816D-4287-B724-AFC857AE63F1@imperial.ac.uk> Message-ID: On Fri, May 12, 2017 at 3:48 AM, Lawrence Mitchell < lawrence.mitchell at imperial.ac.uk> wrote: > > > On 12 May 2017, at 04:57, ?? wrote: > > > > Hi developers: > > I have such a question that I want to get the vertices of a cell. I > know I can get the points by > > 1. Getting the faces of a cell such as "DMPlexGetCone(dm, c, &faces"; > > 2. Getting the vertices of every face of the cell such as > "DMPlexGetCone(dm, f, &vertices)". > > > > Then I can obtain the vertices belongs to a cell. Is there any > concise routine which I can choose to get the > > vertices of a cell directly? > > You should use the interface for the transitive closure. > > Find bounds of points that are vertices: > > DMPlexGetDepthStratum(dm, &vStart, &vEnd); > > ... > DMPlexGetTransitiveClosure(dm, c, PETSC_TRUE, &nclosure, &closure); > for (PetscInt i = 0; i < nclosure; i++) { > const PetscInt p = closure[2*i]; > if (p >= vStart && p < vEnd) { > p is a vertex > } > } > > This works regardless of the topological dimension of the "cell" point you > are using (the same code is good to find the vertices in the closure of a > facet, say). > > Matt's course notes (http://www.caam.rice.edu/~caam519/CSBook.pdf) have > nice pictures that help understand this language in section 7.1. > Also note that this is fine for getting vertices if you want to do topological things. However, if what you really want is some function over the vertices (like coordinates), you should use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMPlexVecGetClosure.html Thanks, Matt > Cheers, > > Lawrence -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From leejearl at 126.com Mon May 15 03:20:44 2017 From: leejearl at 126.com (leejearl) Date: Mon, 15 May 2017 16:20:44 +0800 Subject: [petsc-users] how to get the vertices belongs to a control volume in 2D? In-Reply-To: References: <298d38df.50bb.15bfacd8032.Coremail.leejearl@126.com> <14503BBE-816D-4287-B724-AFC857AE63F1@imperial.ac.uk> Message-ID: <0db70c6b-1c1e-6b28-c4c9-66ae05a08ca8@126.com> Hi, all: Thanks for your kind reply. Matt's course notes looks very nice. Thanks, leejearl On 2017?05?12? 17:10, Matthew Knepley wrote: > On Fri, May 12, 2017 at 3:48 AM, Lawrence Mitchell > > wrote: > > > > On 12 May 2017, at 04:57, ?? > wrote: > > > > Hi developers: > > I have such a question that I want to get the vertices of a > cell. I know I can get the points by > > 1. Getting the faces of a cell such as "DMPlexGetCone(dm, c, > &faces"; > > 2. Getting the vertices of every face of the cell such as > "DMPlexGetCone(dm, f, &vertices)". > > > > Then I can obtain the vertices belongs to a cell. 
Is there > any concise routine which I can choose to get the > > vertices of a cell directly? > > You should use the interface for the transitive closure. > > Find bounds of points that are vertices: > > DMPlexGetDepthStratum(dm, &vStart, &vEnd); > > ... > DMPlexGetTransitiveClosure(dm, c, PETSC_TRUE, &nclosure, &closure); > for (PetscInt i = 0; i < nclosure; i++) { > const PetscInt p = closure[2*i]; > if (p >= vStart && p < vEnd) { > p is a vertex > } > } > > This works regardless of the topological dimension of the "cell" > point you are using (the same code is good to find the vertices in > the closure of a facet, say). > > Matt's course notes (http://www.caam.rice.edu/~caam519/CSBook.pdf > ) have nice > pictures that help understand this language in section 7.1. > > > Also note that this is fine for getting vertices if you want to do > topological things. However, if what you really want is > some function over the vertices (like coordinates), you should use > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMPlexVecGetClosure.html > > Thanks, > > Matt > > Cheers, > > Lawrence > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ibarletta at inogs.it Mon May 15 10:37:48 2017 From: ibarletta at inogs.it (Barletta, Ivano) Date: Mon, 15 May 2017 17:37:48 +0200 Subject: [petsc-users] Problem with IS and VecScatter Message-ID: Hello users/developers I'm trying to build a vecscatter object to migrate data from a vector x to a vector x2 having same global size but different parallel layout. Prior to this, I build an Index Set using the method ISCreateStride The IS is created correctly, since the program returns ierr=0 when I call the subroutine (I'm using Fortran 90). but when I run the program in parallel I get this error 0:[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- 0:[0]PETSC ERROR: Unknown type. Check for miss-spelling or missing package: http://www.mcs.anl.gov/petsc/documentation/installation.html#external 0:[0]PETSC ERROR: Unknown IS type: general 0:[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 0:[0]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 0:[0]PETSC ERROR: ./opa on a arch-linux2-c-debug named n419.cluster.net by ib04116 Mon May 15 17:19:25 2017 0:[0]PETSC ERROR: Configure options --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicpc --with-mpiexec=mpirun --with-blas-lapack-dir=/users/home/opt/intel/composer_xe_2013/mkl --with-scalapack-lib="-L/users/home/opt/intel/composer_xe_2013/mkl//lib/intel64 -lmkl_scalapack_ilp64 -lmkl_blacs_intelmpi_ilp64" --with-scalapack-include=/users/home/opt/intel/composer_xe_2013/mkl/include --download-metis --download-parmetis --download-mumps --download-superlu --with-debugging=yes CFLAGS=-I/users/home/opt/netcdf/netcdf-4.3/include -I/users/home/opt/szip/szip-2.1/include -I/users/home/opt/hdf5/hdf5-1.8.11/include -I/usr/include FFLAGS=-xHost -no-prec-div -O3 -I/users/home/opt/netcdf/netcdf-4.3/include LDFLAGS=-L/users/home/opt/netcdf/netcdf-4.3/lib -lnetcdff -L/users/home/opt/hdf5/hdf5-1.8.11/lib -L/users/home/opt/netcdf/netcdf-4.3/lib -L/usr/lib64/ -lz -lgpfs -lnetcdf -lcurl -lnetcdf The program cannot complete the scatter process and remains hanging. 
One odd thing is that, though I create a stride, the IS object is marked as general. An even more odd thing is that I've used the same code in a simple test case and everything worked fine... Have you got any hint about this? Thanks Ivano -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon May 15 12:26:04 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 15 May 2017 12:26:04 -0500 Subject: [petsc-users] Problem with IS and VecScatter In-Reply-To: References: Message-ID: <92FF3A9E-FEAD-442B-BDED-423FF47F4DFC@mcs.anl.gov> First run with valgrind: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind then run with -start_in_debugger and look at the variables where it crashes to see why it might crash. > On May 15, 2017, at 10:37 AM, Barletta, Ivano wrote: > > Hello users/developers > > I'm trying to build a vecscatter object to > migrate data from a vector x to a vector x2 > having same global size but different parallel layout. > > Prior to this, I build an Index Set using the method > > ISCreateStride > > The IS is created correctly, since the program returns > ierr=0 when I call the subroutine (I'm using Fortran 90). > > but when I run the program in parallel I get this error > > 0:[0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > 0:[0]PETSC ERROR: Unknown type. Check for miss-spelling or missing package: http://www.mcs.anl.gov/petsc/documentation/installation.html#external > 0:[0]PETSC ERROR: Unknown IS type: general > 0:[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > 0:[0]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 > 0:[0]PETSC ERROR: ./opa on a arch-linux2-c-debug named n419.cluster.net by ib04116 Mon May 15 17:19:25 2017 > 0:[0]PETSC ERROR: Configure options --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicpc --with-mpiexec=mpirun --with-blas-lapack-dir=/users/home/opt/intel/composer_xe_2013/mkl --with-scalapack-lib="-L/users/home/opt/intel/composer_xe_2013/mkl//lib/intel64 -lmkl_scalapack_ilp64 -lmkl_blacs_intelmpi_ilp64" --with-scalapack-include=/users/home/opt/intel/composer_xe_2013/mkl/include --download-metis --download-parmetis --download-mumps --download-superlu --with-debugging=yes CFLAGS=-I/users/home/opt/netcdf/netcdf-4.3/include -I/users/home/opt/szip/szip-2.1/include -I/users/home/opt/hdf5/hdf5-1.8.11/include -I/usr/include FFLAGS=-xHost -no-prec-div -O3 -I/users/home/opt/netcdf/netcdf-4.3/include LDFLAGS=-L/users/home/opt/netcdf/netcdf-4.3/lib -lnetcdff -L/users/home/opt/hdf5/hdf5-1.8.11/lib -L/users/home/opt/netcdf/netcdf-4.3/lib -L/usr/lib64/ -lz -lgpfs -lnetcdf -lcurl -lnetcdf > > The program cannot complete the scatter process > and remains hanging. > > One odd thing is that, though I create a stride, the IS > object is marked as general. An even more odd thing is > that I've used the same code in a simple test case and > everything worked fine... > > Have you got any hint about this? > > Thanks > Ivano > > > From mfadams at lbl.gov Mon May 15 17:03:09 2017 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 15 May 2017 18:03:09 -0400 Subject: [petsc-users] SNES error In-Reply-To: References: <677760BF-5666-4C9D-A064-B495ACD80889@mcs.anl.gov> Message-ID: I could use this fix for this global op counter that is getting trigger because I have an operator inside of another operator. 
Thanks, On Thu, May 4, 2017 at 9:09 AM, Mark Adams wrote: > OK, that makes sense, it fails when my velocity grid gets not tiny. > > I can use tine velocity grids for now. > > On Tue, May 2, 2017 at 11:18 AM, Matthew Knepley > wrote: > >> On Tue, May 2, 2017 at 10:10 AM, Mark Adams wrote: >> >>> /Users/markadams/Codes/petsc/arch-macosx-gnu-O/bin/mpiexec -n 1 ./vml >>> -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 -snes_rtol 1.e-6 -snes_stol >>> 1.e-6 -ts_type cn -snes_fd -pc_type lu -ksp_type preonly >>> -x_petscspace_order 1 -x_petscspace_poly_tensor -v_petscspace_order 1 >>> -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 -ts_final_time 1e10 >>> -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 -thermal_temps >>> 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo -12,-12 -domainx_hi >>> 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 -x_vec_view >>> hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view hdf5:v.h5::append >>> -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view hdf5:prex.h5::append >>> -snes_converged_reason -snes_linesearch_monitor -ts_adapt_monitor >>> main call SetupXDiscretization >>> main call SetInitialConditionDomain >>> VMLViewX DMGetOutputSequenceNumber=-1, >>> cmd_str=-x_pre_vec_view >>> 0) species 0: charge density= -2.3940791757186e+00, z-momentum= >>> 5.9851979392559e-01, energy= 3.2314073646197e-01, thermal-flux= >>> 2.4419137539877e-01 >>> 0) Normalized: charge density= -2.3940791757186e+00, z >>> momentum= 5.9851979392559e-01, energy= 3.2314073646197e-01, thermal flux= >>> 2.4419137539877e-01, local: 64 X cells, 81 X vertices >>> VMLViewX DMGetOutputSequenceNumber=0, cmd_str=(null) >>> VMLViewV DMGetOutputSequenceNumber=-1 >>> 0 SNES Function norm 4.097052680599e+00 >>> 1 SNES Function norm 1.213148652908e-09 >>> Nonlinear solve did not converge due to DIVERGED_FUNCTION_COUNT >>> iterations 1 >>> >> >> Neat! Mark, I think this has to do with you calling SNESEvaluateFunc() >> inside another one. We limit the number of function evaluations >> to 10,000 by default, mostly to corral line searches. I think you hit >> this, and thus need to up the count. >> >> Thanks, >> >> Matt >> >> >>> TSAdapt none step 0 stage rejected t=0 + 1.000e-01, >>> nonlinear solve failures 1 greater than current TS allowed >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: >>> [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, >>> increase -ts_max_snes_failures or make negative to attempt recovery >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. 
>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3659-g699918129c >>> GIT Date: 2017-04-26 08:18:35 -0400 >>> [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local by >>> markadams Tue May 2 11:04:02 2017 >>> [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ >>> COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" FOPTFLAGS="-O3 -g >>> -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1 >>> --download-hypre=1 --download-ml=1 --download-triangle=1 >>> --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist >>> --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 >>> PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 >>> >>> >>> On Mon, May 1, 2017 at 10:25 PM, Barry Smith wrote: >>> >>>> >>>> and >>>> >>>> -snes_linesearch_monitor >>>> -ts_adapt_monitor >>>> >>>> >>>> > On May 1, 2017, at 7:51 PM, Matthew Knepley >>>> wrote: >>>> > >>>> > Run with -snes_converged_reason. >>>> > >>>> > Matt >>>> > >>>> > On Mon, May 1, 2017 at 7:14 PM, Mark Adams wrote: >>>> > I get this SNES failure and I don't understand what the problem is. >>>> The rtol is 1.e-6 and the first iteration reduces the residual by 9 orders >>>> of magnitude. Yet, TS is not satisfied. What is going on here? >>>> > >>>> > mpiexec -n 1 ./vml -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 >>>> -snes_rtol 1.e-6 -snes_stol 1.e-6 -ts_type cn -snes_fd -pc_type lu >>>> -ksp_type preonly -x_petscspace_order 1 -x_petscspace_poly_tensor >>>> -v_petscspace_order 1 -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 >>>> -ts_final_time 1e10 -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 >>>> -thermal_temps 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo >>>> -12,-12 -domainx_hi 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 >>>> -x_vec_view hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view >>>> hdf5:v.h5::append -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view >>>> hdf5:prex.h5::append >>>> > .... >>>> > >>>> > 0 SNES Function norm 4.097052680599e+00 >>>> > 1 SNES Function norm 1.213148652908e-09 >>>> > [0]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> > [0]PETSC ERROR: >>>> > [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, >>>> increase -ts_max_snes_failures or make negative to attempt recovery >>>> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>> ocumentation/faq.html for trouble shooting. >>>> > [0]PETSC ERROR: Petsc Development GIT revision: >>>> v3.7.6-3659-g699918129c GIT Date: 2017-04-26 08:18:35 -0400 >>>> > [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local >>>> by markadams Mon May 1 19:21:32 2017 >>>> > [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ >>>> COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" FOPTFLAGS="-O3 -g >>>> -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1 >>>> --download-hypre=1 --download-ml=1 --download-triangle=1 >>>> --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist >>>> --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 >>>> PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 >>>> > >>>> > >>>> > >>>> > >>>> > -- >>>> > What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. 
>>>> > -- Norbert Wiener >>>> >>>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 15 17:12:20 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 15 May 2017 17:12:20 -0500 Subject: [petsc-users] SNES error In-Reply-To: References: <677760BF-5666-4C9D-A064-B495ACD80889@mcs.anl.gov> Message-ID: On Mon, May 15, 2017 at 5:03 PM, Mark Adams wrote: > I could use this fix for this global op counter that is getting trigger > because I have an operator inside of another operator. > Set the last arugment to PETSC_MAX_INT http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetTolerances.html#SNESSetTolerances Matt > Thanks, > > On Thu, May 4, 2017 at 9:09 AM, Mark Adams wrote: > >> OK, that makes sense, it fails when my velocity grid gets not tiny. >> >> I can use tine velocity grids for now. >> >> On Tue, May 2, 2017 at 11:18 AM, Matthew Knepley >> wrote: >> >>> On Tue, May 2, 2017 at 10:10 AM, Mark Adams wrote: >>> >>>> /Users/markadams/Codes/petsc/arch-macosx-gnu-O/bin/mpiexec -n 1 ./vml >>>> -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 -snes_rtol 1.e-6 -snes_stol >>>> 1.e-6 -ts_type cn -snes_fd -pc_type lu -ksp_type preonly >>>> -x_petscspace_order 1 -x_petscspace_poly_tensor -v_petscspace_order 1 >>>> -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 -ts_final_time 1e10 >>>> -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 -thermal_temps >>>> 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo -12,-12 -domainx_hi >>>> 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 -x_vec_view >>>> hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view hdf5:v.h5::append >>>> -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view hdf5:prex.h5::append >>>> -snes_converged_reason -snes_linesearch_monitor -ts_adapt_monitor >>>> main call SetupXDiscretization >>>> main call SetInitialConditionDomain >>>> VMLViewX DMGetOutputSequenceNumber=-1, >>>> cmd_str=-x_pre_vec_view >>>> 0) species 0: charge density= -2.3940791757186e+00, z-momentum= >>>> 5.9851979392559e-01, energy= 3.2314073646197e-01, thermal-flux= >>>> 2.4419137539877e-01 >>>> 0) Normalized: charge density= -2.3940791757186e+00, z >>>> momentum= 5.9851979392559e-01, energy= 3.2314073646197e-01, thermal flux= >>>> 2.4419137539877e-01, local: 64 X cells, 81 X vertices >>>> VMLViewX DMGetOutputSequenceNumber=0, cmd_str=(null) >>>> VMLViewV DMGetOutputSequenceNumber=-1 >>>> 0 SNES Function norm 4.097052680599e+00 >>>> 1 SNES Function norm 1.213148652908e-09 >>>> Nonlinear solve did not converge due to DIVERGED_FUNCTION_COUNT >>>> iterations 1 >>>> >>> >>> Neat! Mark, I think this has to do with you calling SNESEvaluateFunc() >>> inside another one. We limit the number of function evaluations >>> to 10,000 by default, mostly to corral line searches. I think you hit >>> this, and thus need to up the count. 
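A minimal sketch of the fix Matt describes above, raising the function-evaluation limit; this assumes the SNES in question is the one owned by the TS (as in the runs shown), and only the last argument of SNESSetTolerances, the maximum number of function evaluations, is changed:

  SNES snes;
  ierr = TSGetSNES(ts, &snes);CHKERRQ(ierr);
  /* keep the default tolerances, lift only the maximum number of function evaluations */
  ierr = SNESSetTolerances(snes, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_MAX_INT);CHKERRQ(ierr);

The options-database route would be -snes_max_funcs with a suitably large value.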
>>> >>> Thanks, >>> >>> Matt >>> >>> >>>> TSAdapt none step 0 stage rejected t=0 + 1.000e-01, >>>> nonlinear solve failures 1 greater than current TS allowed >>>> [0]PETSC ERROR: --------------------- Error Message >>>> -------------------------------------------------------------- >>>> [0]PETSC ERROR: >>>> [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, >>>> increase -ts_max_snes_failures or make negative to attempt recovery >>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>>> for trouble shooting. >>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3659-g699918129c >>>> GIT Date: 2017-04-26 08:18:35 -0400 >>>> [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local by >>>> markadams Tue May 2 11:04:02 2017 >>>> [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ >>>> COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" FOPTFLAGS="-O3 -g >>>> -mavx2" --download-mpich=1 --download-parmetis=1 --download-metis=1 >>>> --download-hypre=1 --download-ml=1 --download-triangle=1 >>>> --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist >>>> --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 >>>> PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 >>>> >>>> >>>> On Mon, May 1, 2017 at 10:25 PM, Barry Smith >>>> wrote: >>>> >>>>> >>>>> and >>>>> >>>>> -snes_linesearch_monitor >>>>> -ts_adapt_monitor >>>>> >>>>> >>>>> > On May 1, 2017, at 7:51 PM, Matthew Knepley >>>>> wrote: >>>>> > >>>>> > Run with -snes_converged_reason. >>>>> > >>>>> > Matt >>>>> > >>>>> > On Mon, May 1, 2017 at 7:14 PM, Mark Adams wrote: >>>>> > I get this SNES failure and I don't understand what the problem is. >>>>> The rtol is 1.e-6 and the first iteration reduces the residual by 9 orders >>>>> of magnitude. Yet, TS is not satisfied. What is going on here? >>>>> > >>>>> > mpiexec -n 1 ./vml -v_coord_cylinder -x_dm_refine 2 -v_dm_refine 2 >>>>> -snes_rtol 1.e-6 -snes_stol 1.e-6 -ts_type cn -snes_fd -pc_type lu >>>>> -ksp_type preonly -x_petscspace_order 1 -x_petscspace_poly_tensor >>>>> -v_petscspace_order 1 -v_petscspace_poly_tensor -ts_dt .1 -ts_max_steps 10 >>>>> -ts_final_time 1e10 -verbose 3 -num_species 1 -snes_monitor -masses 1,2,4 >>>>> -thermal_temps 30,30,30 -domainv_lo -2,-2 -domainv_hi 2,2 -domainx_lo >>>>> -12,-12 -domainx_hi 12,12 -E 0,0 -blobx_radius 2 -x_dm_view hdf5:x.h5 >>>>> -x_vec_view hdf5:x.h5::append -v_dm_view hdf5:v.h5 -v_vec_view >>>>> hdf5:v.h5::append -x_pre_dm_view hdf5:prex.h5 -x_pre_vec_view >>>>> hdf5:prex.h5::append >>>>> > .... >>>>> > >>>>> > 0 SNES Function norm 4.097052680599e+00 >>>>> > 1 SNES Function norm 1.213148652908e-09 >>>>> > [0]PETSC ERROR: --------------------- Error Message >>>>> -------------------------------------------------------------- >>>>> > [0]PETSC ERROR: >>>>> > [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, >>>>> increase -ts_max_snes_failures or make negative to attempt recovery >>>>> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d >>>>> ocumentation/faq.html for trouble shooting. 
>>>>> > [0]PETSC ERROR: Petsc Development GIT revision: >>>>> v3.7.6-3659-g699918129c GIT Date: 2017-04-26 08:18:35 -0400 >>>>> > [0]PETSC ERROR: ./vml on a arch-macosx-gnu-O named MarksMac-5.local >>>>> by markadams Mon May 1 19:21:32 2017 >>>>> > [0]PETSC ERROR: Configure options --with-cc=clang >>>>> --with-cc++=clang++ COPTFLAGS="-O3 -g -mavx2" CXXOPTFLAGS="-O3 -g -mavx2" >>>>> FOPTFLAGS="-O3 -g -mavx2" --download-mpich=1 --download-parmetis=1 >>>>> --download-metis=1 --download-hypre=1 --download-ml=1 --download-triangle=1 >>>>> --download-ctetgen=1 --download-p4est=1 --with-x=0 --download-superlu_dist >>>>> --download-superlu --download-ctetgen --with-debugging=0 --download-hdf5=1 >>>>> PETSC_ARCH=arch-macosx-gnu-O --download-chaco --with-viewfromoptions=1 >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > -- >>>>> > What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> > -- Norbert Wiener >>>>> >>>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed May 17 17:22:24 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 17 May 2017 17:22:24 -0500 Subject: [petsc-users] [petsc-dev] For Fortran users of PETSc development version In-Reply-To: References: <4726FF04-1D07-455F-88B4-FF2488DCAE07@mcs.anl.gov> Message-ID: <57A4F0B7-AFDC-4FE8-920A-A9700F827360@mcs.anl.gov> > On May 11, 2017, at 11:24 AM, Gautam Bisht wrote: > > Hi Barry, > > I'm wondering if these changes will be a part of future 3.8.0 release or 4.0.0 release. And, do you have a tentative timeline when such a release tag would be made? This will be in 3.8.0 and we hope to make the release in early June. Barry > > -Gautam. > > On Sun, Dec 4, 2016 at 11:13 AM, Barry Smith wrote: > > Jed noticed a small mistake in my description. It is type(tXXX) not type(iXXX) if you chose to declare your variables that way. Note that declaring them via type(tXXX) or XXX is identical (XXX is just a macro for type(tXXX)). > > Barry > > > > On Dec 4, 2016, at 11:57 AM, Barry Smith wrote: > > > > > > For Fortran users of the PETSc development (git master branch) version > > > > > > I have updated and simplified the Fortran usage of PETSc in the past few weeks. I will put the branch barry/fortran-update into the master branch on Monday. The usage changes are > > > > A) for each Fortran function (and main) use the following > > > > subroutine mysubroutine(.....) > > #include > > use petscxxx > > implicit none > > > > For example if you are using SNES in your code you would have > > > > #include > > use petscsnes > > implicit none > > > > B) Instead of PETSC_NULL_OBJECT you must pass PETSC_NULL_XXX (for example PETSC_NULL_VEC) using the specific object type XXX that the function call is expecting. > > > > C) Objects can be declared either as XXX a or type(iXXX) a, for example Mat a or type(iMat) a. (Note that previously for those who used types it was type(Mat) but that can no longer be used. 
> > > > Notes: > > > > 1) There are no longer any .h90 files that may be included > > > > 2) Like C the include files are now nested so you no longer need to include for example > > > > #include > > #include > > #include > > #include > > #include > > > > you can just include > > > > #include > > > > 3) there is now type checking of most function calls. This will help eliminate bugs due to incorrect calling sequences. Note that Fortran distinguishes between a argument that is a scalar (zero dimensional array), a one dimensional array and a two dimensional array (etc). So you may get compile warnings because you are passing in an array when PETSc expects a scalar or vis-versa. If you get these simply fix your declaration of the variable to match what is expected. In some routines like MatSetValues() and friends you can pass either scalars, one dimensional arrays or two dimensional arrays, if you get errors here please send mail to petsc-maint at mcs.anl.gov and include enough of your code so we can see the dimensions of all your variables so we can fix the problems. > > > > 4) You can continue to use either fixed (.F extension) or free format (.F90 extension) for your source > > > > 5) All the examples in PETSc have been updated so consult them for clarifications. > > > > > > Please report any problems to petsc-maint at mcs.anl.gov > > > > Thanks > > > > Barry > > > > > > From Fabian.Jakub at physik.uni-muenchen.de Thu May 18 05:11:40 2017 From: Fabian.Jakub at physik.uni-muenchen.de (Fabian.Jakub) Date: Thu, 18 May 2017 12:11:40 +0200 Subject: [petsc-users] Problems with PetscObjectViewFromOptions in Fortran Message-ID: <8603540a-38de-ead0-8690-3a9e5d063e7f@physik.uni-muenchen.de> Dear Petsc Team, I have a problem with object viewing through PetscObjectViewFromOptions The C Version works fine, e.g. static char help[] = "Testing multiple PetscObjectViewFromOptions"; #include int main(int argc,char **argv) { DM dmA, dmB; PetscInitialize(&argc,&argv,(char*)0,help); PetscErrorCode ierr; ierr = DMPlexCreate(PETSC_COMM_WORLD, &dmA); CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject) dmA, "DMPlex_A"); CHKERRQ(ierr); ierr = DMPlexCreate(PETSC_COMM_WORLD, &dmB); CHKERRQ(ierr); ierr = PetscObjectSetName((PetscObject) dmB, "DMPlex_B"); CHKERRQ(ierr); PetscObjectViewFromOptions((PetscObject) dmA, NULL, "-dmA"); PetscObjectViewFromOptions((PetscObject) dmB, NULL, "-dmB"); ierr = DMDestroy(&dmA); CHKERRQ(ierr); ierr = DMDestroy(&dmB); CHKERRQ(ierr); PetscFinalize(); } and running it with -help, correctly produces the options and views as: -dmA -dmB but the equivalent in Fortran, e.g.: program main #include "petsc/finclude/petsc.h" use petsc implicit none PetscErrorCode :: ierr DM :: dmA, dmB call PetscInitialize(PETSC_NULL_CHARACTER, ierr); CHKERRQ(ierr) call DMPlexCreate(PETSC_COMM_WORLD, dmA, ierr);CHKERRQ(ierr) call PetscObjectSetName(dmA, 'DMPlex_A', ierr);CHKERRQ(ierr) call DMPlexCreate(PETSC_COMM_WORLD, dmB, ierr);CHKERRQ(ierr) call PetscObjectSetName(dmB, 'DMPlex_B', ierr);CHKERRQ(ierr) call PetscObjectViewFromOptions(dmA, PETSC_NULL_CHARACTER, "-dmA", ierr); CHKERRQ(ierr) call PetscObjectViewFromOptions(dmB, PETSC_NULL_CHARACTER, "-dmB", ierr); CHKERRQ(ierr) call DMDestroy(dmA, ierr);CHKERRQ(ierr) call DMDestroy(dmB, ierr);CHKERRQ(ierr) call PetscFinalize(ierr) end program produces the options to be: -dmA-dmB -dmB While this works as expected when running with: ./example -dmA-dmB -dmB This is not intuitive. Is the hickup on my side or is it somewhere in the Fortran stubs? 
Please, let me know if you need more details on the build or if you cannot reproduce this. Many thanks, Fabian Petsc Development GIT revision: v3.7.6-3910-gd04c6f6 GIT Date: 2017-05-15 17:09:20 -0500 ./configure \ --with-cc=$(which mpicc) \ --with-fc=$(which mpif90) \ --with-cxx=$(which mpicxx) \ --with-fortran \ --with-fortran-interfaces \ --with-shared-libraries=1 \ --download-hdf5 \ --download-triangle \ --download-ctetgen \ --with-cmake=$(which cmake) \ --with-debugging=1 \ COPTFLAGS='-O2 ' \ FOPTFLAGS='-O2 ' \ \ && make all test GNU Fortran (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 (Open MPI) 1.10.2 Complete output of -help -info (Fortran Version): [0] petscinitialize_internal(): (Fortran):PETSc successfully started: procs 1 [0] PetscGetHostName(): Rejecting domainname, likely is NIS met-ws-740m19.(none) [0] petscinitialize_internal(): Running on machine: met-ws-740m19 ------Additional PETSc component options-------- -log_exclude: -info_exclude: ----------------------------------------------- [0] PetscCommDuplicate(): Duplicating a communicator 47693199447680 11260976 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 47693199447680 11260976 [0] PetscCommDuplicate(): Using internal PETSc communicator 47693199447680 11260976 [0] PetscCommDuplicate(): Using internal PETSc communicator 47693199447680 11260976 [0] PetscCommDuplicate(): Using internal PETSc communicator 47693199447680 11260976 [0] PetscCommDuplicate(): Using internal PETSc communicator 47693199447680 11260976 [0] PetscCommDuplicate(): Using internal PETSc communicator 47693199447680 11260976 [0] PetscCommDuplicate(): Using internal PETSc communicator 47693199447680 11260976 -dmA-dmB ascii[:[filename][:[format][:append]]]: Prints object to stdout or ASCII file (PetscOptionsGetViewer) -dmA-dmB binary[:[filename][:[format][:append]]]: Saves object to a binary file (PetscOptionsGetViewer) -dmA-dmB draw[:drawtype[:filename]] Draws object (PetscOptionsGetViewer) -dmA-dmB socket[:port]: Pushes object to a Unix socket (PetscOptionsGetViewer) -dmA-dmB saws[:communicatorname]: Publishes object to SAWs (PetscOptionsGetViewer) DM Object: DMPlex_A 1 MPI processes type: plex [0] PetscCommDuplicate(): Duplicating a communicator 47693199449728 13799472 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 47693199449728 13799472 DMPlex_A in 0 dimensions: 0-cells: 0 -dmB ascii[:[filename][:[format][:append]]]: Prints object to stdout or ASCII file (PetscOptionsGetViewer) -dmB binary[:[filename][:[format][:append]]]: Saves object to a binary file (PetscOptionsGetViewer) -dmB draw[:drawtype[:filename]] Draws object (PetscOptionsGetViewer) -dmB socket[:port]: Pushes object to a Unix socket (PetscOptionsGetViewer) -dmB saws[:communicatorname]: Publishes object to SAWs (PetscOptionsGetViewer) DM Object: DMPlex_B 1 MPI processes type: plex [0] PetscCommDuplicate(): Using internal PETSc communicator 47693199449728 13799472 [0] PetscCommDuplicate(): Using internal PETSc communicator 47693199449728 13799472 DMPlex_B in 0 dimensions: 0-cells: 0 [0] Petsc_DelComm_Inner(): Removing reference to PETSc communicator embedded in a user MPI_Comm 13799472 [0] Petsc_DelComm_Outer(): User MPI_Comm 47693199449728 is being freed after removing reference from inner PETSc comm to this outer comm [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 13799472 [0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm 13799472 [0] PetscFinalize(): PetscFinalize() called [0] PetscGetHostName(): Rejecting 
domainname, likely is NIS met-ws-740m19.(none) [0] PetscFOpen(): Opening file Log.0 [0] Petsc_DelViewer(): Removing viewer data attribute in an MPI_Comm 11260976 [0] Petsc_DelComm_Inner(): Removing reference to PETSc communicator embedded in a user MPI_Comm 11260976 [0] Petsc_DelComm_Outer(): User MPI_Comm 47693199447680 is being freed after removing reference from inner PETSc comm to this outer comm [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 11260976 [0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm 11260976 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From mbaker112 at outlook.de Thu May 18 08:38:41 2017 From: mbaker112 at outlook.de (Matt Baker) Date: Thu, 18 May 2017 13:38:41 +0000 Subject: [petsc-users] Regarding the conjugate gradient method In-Reply-To: References: Message-ID: Hello, just a quick question: The CG method is generally derived for spd matrices. However, the PETSc man page states Notes: The PCG method requires both the matrix and preconditioner to be symmetric positive (or negative) (semi) definite Only left preconditioning is supported. Does this mean CG works for semi-definite problems as well? Is that guaranteed then? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 18 08:46:29 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 18 May 2017 08:46:29 -0500 Subject: [petsc-users] Regarding the conjugate gradient method In-Reply-To: References: Message-ID: On Thu, May 18, 2017 at 8:38 AM, Matt Baker wrote: > Hello, > > > just a quick question: > > The CG method is generally derived for spd matrices. However, the PETSc > man page states > > > Notes: The PCG method requires both the matrix and preconditioner to be > symmetric positive (or negative) (semi) definite Only left preconditioning > is supported. > > > Does this mean CG works for semi-definite problems as well? Is that > guaranteed then? > I believe that CG converges to A^+ b if b is in the range space of A, but I would have to look it up. Its probably in Hestenes, "Optimization theory: the finite dimensional case" 1975 Thanks, Matt > Thanks. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu May 18 16:01:18 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 18 May 2017 16:01:18 -0500 Subject: [petsc-users] [Yousef Saad] Preconditioning-17 Travel awards Message-ID: <8737c1yj4x.fsf@jedbrown.org> Travel awards for early career researchers are available. -------------- next part -------------- An embedded message was scrubbed... From: saad at cs.umn.edu (Yousef Saad) Subject: Preconditioning-17 Travel awards Date: Thu, 18 May 2017 14:26:33 -0500 (CDT) Size: 3408 URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From ling.zou at inl.gov Fri May 19 09:25:39 2017 From: ling.zou at inl.gov (Zou, Ling) Date: Fri, 19 May 2017 08:25:39 -0600 Subject: [petsc-users] Understanding log summary Message-ID: Hi All, In terms of code performance, sometimes people would ask for info about total non-linear iteration numbers, total linear iteration numbers, etc. I suppose all these could be found in the log summary. For the attached log summary, can I say? total non-linear iteration number = 573 total linear iteration number = 2321 Thank you. Ling ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 607 1.0 1.8729e-04 1.0 1.95e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1040 VecMDot 2321 1.0 1.1075e-03 1.0 2.87e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 2590 VecNorm 5422 1.0 1.2229e-03 1.0 1.74e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 1423 VecScale 5822 1.0 1.2764e-03 1.0 9.37e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 734 VecCopy 14334 1.0 1.8302e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 1231 1.0 3.0700e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 14961 1.0 3.3679e-03 1.0 4.82e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 1430 VecWAXPY 20842 1.0 5.5537e-03 1.0 4.00e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 721 VecMAXPY 2894 1.0 1.4292e-03 1.0 3.62e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 7 0 0 0 0 7 0 0 0 2536 VecSetRandom 34 1.0 1.0322e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecReduceArith 1146 1.0 2.9907e-04 1.0 3.68e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1230 VecReduceComm 573 1.0 9.2384e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 2894 1.0 1.9604e-03 1.0 1.39e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 712 SNESJacobianEval 573 1.0 2.2410e+00 1.0 5.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00 62 11 0 0 0 62 11 0 0 0 2 MatMult MF 2928 1.0 5.7922e-01 1.0 2.83e+06 1.0 0.0e+00 0.0e+00 0.0e+00 16 5 0 0 0 16 5 0 0 0 5 MatMult 2928 1.0 5.7963e-01 1.0 2.83e+06 1.0 0.0e+00 0.0e+00 0.0e+00 16 5 0 0 0 16 5 0 0 0 5 MatSolve 2894 1.0 7.3171e-03 1.0 1.76e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 33 0 0 0 0 33 0 0 0 2405 MatLUFactorNum 573 1.0 1.4733e-02 1.0 1.71e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 32 0 0 0 0 32 0 0 0 1158 MatILUFactorSym 1 1.0 2.4543e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 1147 1.0 7.9204e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1147 1.0 2.5825e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 2 1.0 6.3280e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.5754e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 573 1.0 5.7120e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatFDColorCreate 1 1.0 2.0541e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatFDColorSetUp 1 1.0 1.6103e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatFDColorApply 
573 1.0 2.2386e+00 1.0 5.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00 62 11 0 0 0 62 11 0 0 0 2 MatFDColorFunc 11460 1.0 2.2264e+00 1.0 1.85e+06 1.0 0.0e+00 0.0e+00 0.0e+00 62 3 0 0 0 62 3 0 0 0 1 MatColoringApply 1 1.0 3.9990e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 2321 1.0 2.9685e-03 1.0 5.75e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 11 0 0 0 0 11 0 0 0 1935 KSPSetUp 573 1.0 2.3291e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 573 1.0 5.0164e-01 1.0 4.50e+07 1.0 0.0e+00 0.0e+00 0.0e+00 14 85 0 0 0 14 85 0 0 0 90 PCSetUp 573 1.0 1.5172e-02 1.0 1.71e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 32 0 0 0 0 32 0 0 0 1124 PCApply 2894 1.0 7.8614e-03 1.0 1.76e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 33 0 0 0 0 33 0 0 0 2239 ------------------------------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri May 19 13:02:05 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 19 May 2017 13:02:05 -0500 Subject: [petsc-users] Understanding log summary In-Reply-To: References: Message-ID: <8D633024-16A3-41C6-8C08-5F8AA82BBD70@mcs.anl.gov> > On May 19, 2017, at 9:25 AM, Zou, Ling wrote: > > Hi All, > > In terms of code performance, sometimes people would ask for info about total non-linear iteration numbers, total linear iteration numbers, etc. I suppose all these could be found in the log summary. For the attached log summary, can I say? > total non-linear iteration number = 573 > total linear iteration number = 2321 Yes, The log file is kind of funny. It spends 62% of the time in MatFDColorApply() which is computing the Jacobian via differencing and coloring, this is a lot of time. You might consider lagging the Jacobian; that is not recompute the Jacobian for each new linear solve. You can use -snes_lag_jacobian 2 or -snes_lag_jacobian 3 etc and see how this affects the run time. Barry > MatMult MF > > Thank you. 
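For reference, the lagging Barry suggests above can also be requested in code rather than on the command line; a minimal sketch, assuming a SNES handle is available:

  /* rebuild the Jacobian only every 2nd nonlinear iteration instead of every iteration */
  ierr = SNESSetLagJacobian(snes, 2);CHKERRQ(ierr);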
> > Ling > > > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flops --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > VecDot 607 1.0 1.8729e-04 1.0 1.95e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1040 > VecMDot 2321 1.0 1.1075e-03 1.0 2.87e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 2590 > VecNorm 5422 1.0 1.2229e-03 1.0 1.74e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 1423 > VecScale 5822 1.0 1.2764e-03 1.0 9.37e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 734 > VecCopy 14334 1.0 1.8302e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 1231 1.0 3.0700e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 14961 1.0 3.3679e-03 1.0 4.82e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 1430 > VecWAXPY 20842 1.0 5.5537e-03 1.0 4.00e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 721 > VecMAXPY 2894 1.0 1.4292e-03 1.0 3.62e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 7 0 0 0 0 7 0 0 0 2536 > VecSetRandom 34 1.0 1.0322e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecReduceArith 1146 1.0 2.9907e-04 1.0 3.68e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1230 > VecReduceComm 573 1.0 9.2384e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecNormalize 2894 1.0 1.9604e-03 1.0 1.39e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 712 > SNESJacobianEval 573 1.0 2.2410e+00 1.0 5.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00 62 11 0 0 0 62 11 0 0 0 2 > MatMult MF 2928 1.0 5.7922e-01 1.0 2.83e+06 1.0 0.0e+00 0.0e+00 0.0e+00 16 5 0 0 0 16 5 0 0 0 5 > MatMult 2928 1.0 5.7963e-01 1.0 2.83e+06 1.0 0.0e+00 0.0e+00 0.0e+00 16 5 0 0 0 16 5 0 0 0 5 > MatSolve 2894 1.0 7.3171e-03 1.0 1.76e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 33 0 0 0 0 33 0 0 0 2405 > MatLUFactorNum 573 1.0 1.4733e-02 1.0 1.71e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 32 0 0 0 0 32 0 0 0 1158 > MatILUFactorSym 1 1.0 2.4543e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 1147 1.0 7.9204e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1147 1.0 2.5825e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRowIJ 2 1.0 6.3280e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.5754e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatZeroEntries 573 1.0 5.7120e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatFDColorCreate 1 1.0 2.0541e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatFDColorSetUp 1 1.0 1.6103e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatFDColorApply 573 1.0 2.2386e+00 1.0 5.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00 62 11 0 0 0 62 11 0 0 0 2 > MatFDColorFunc 11460 1.0 2.2264e+00 1.0 1.85e+06 1.0 0.0e+00 0.0e+00 0.0e+00 62 3 0 0 0 62 3 0 0 0 1 > MatColoringApply 1 1.0 3.9990e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPGMRESOrthog 2321 1.0 2.9685e-03 1.0 5.75e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 11 0 0 0 0 11 0 0 0 1935 > KSPSetUp 573 1.0 2.3291e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 573 1.0 5.0164e-01 1.0 4.50e+07 1.0 
0.0e+00 0.0e+00 0.0e+00 14 85 0 0 0 14 85 0 0 0 90 > PCSetUp 573 1.0 1.5172e-02 1.0 1.71e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 32 0 0 0 0 32 0 0 0 1124 > PCApply 2894 1.0 7.8614e-03 1.0 1.76e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 33 0 0 0 0 33 0 0 0 2239 > ------------------------------------------------------------------------------------------------------------------------ > From ling.zou at inl.gov Fri May 19 13:12:25 2017 From: ling.zou at inl.gov (Zou, Ling) Date: Fri, 19 May 2017 12:12:25 -0600 Subject: [petsc-users] Understanding log summary In-Reply-To: <8D633024-16A3-41C6-8C08-5F8AA82BBD70@mcs.anl.gov> References: <8D633024-16A3-41C6-8C08-5F8AA82BBD70@mcs.anl.gov> Message-ID: Barry, thanks for your comments and advise. Lagging Jacobian evaluation certainly helped (just tested it). However, eventually the finite differencing for Jacobian evaluation should be replaced with some sort of approximated Jacobian evaluation subroutine. So at this moment I don't worry too much on its cost. Thanks again, Ling On Fri, May 19, 2017 at 12:02 PM, Barry Smith wrote: > > > On May 19, 2017, at 9:25 AM, Zou, Ling wrote: > > > > Hi All, > > > > In terms of code performance, sometimes people would ask for info about > total non-linear iteration numbers, total linear iteration numbers, etc. I > suppose all these could be found in the log summary. For the attached log > summary, can I say? > > total non-linear iteration number = 573 > > total linear iteration number = 2321 > > Yes, > > The log file is kind of funny. It spends 62% of the time in > MatFDColorApply() which is computing the Jacobian via differencing and > coloring, this is a lot of time. You might consider lagging the Jacobian; > that is not recompute the Jacobian for each new linear solve. You can use > -snes_lag_jacobian 2 or -snes_lag_jacobian 3 etc and see how this affects > the run time. > > Barry > > > > > > MatMult MF > > > > Thank you. 
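On replacing the finite-difference/coloring Jacobian mentioned above: if the solves go through SNES, an approximate hand-coded Jacobian is registered roughly as sketched below. MyApproxJacobian, snes, J and P are illustrative placeholders, not anything from this thread:

  PetscErrorCode MyApproxJacobian(SNES snes, Vec x, Mat J, Mat P, void *ctx)
  {
    PetscErrorCode ierr;
    /* fill P (the preconditioning matrix) with the approximate Jacobian entries based on x */
    ierr = MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    if (J != P) {
      ierr = MatAssemblyBegin(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    }
    return 0;
  }

  /* registered in place of the coloring-based default */
  ierr = SNESSetJacobian(snes, J, P, MyApproxJacobian, NULL);CHKERRQ(ierr);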
> > > > Ling > > > > > > > > ------------------------------------------------------------ > ------------------------------------------------------------ > > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > ------------------------------------------------------------ > ------------------------------------------------------------ > > > > --- Event Stage 0: Main Stage > > > > VecDot 607 1.0 1.8729e-04 1.0 1.95e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 1040 > > VecMDot 2321 1.0 1.1075e-03 1.0 2.87e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 5 0 0 0 0 5 0 0 0 2590 > > VecNorm 5422 1.0 1.2229e-03 1.0 1.74e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 1423 > > VecScale 5822 1.0 1.2764e-03 1.0 9.37e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 734 > > VecCopy 14334 1.0 1.8302e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecSet 1231 1.0 3.0700e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecAXPY 14961 1.0 3.3679e-03 1.0 4.82e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 9 0 0 0 0 9 0 0 0 1430 > > VecWAXPY 20842 1.0 5.5537e-03 1.0 4.00e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 8 0 0 0 0 8 0 0 0 721 > > VecMAXPY 2894 1.0 1.4292e-03 1.0 3.62e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 7 0 0 0 0 7 0 0 0 2536 > > VecSetRandom 34 1.0 1.0322e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecReduceArith 1146 1.0 2.9907e-04 1.0 3.68e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 1230 > > VecReduceComm 573 1.0 9.2384e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > VecNormalize 2894 1.0 1.9604e-03 1.0 1.39e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 712 > > SNESJacobianEval 573 1.0 2.2410e+00 1.0 5.60e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 62 11 0 0 0 62 11 0 0 0 2 > > MatMult MF 2928 1.0 5.7922e-01 1.0 2.83e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 16 5 0 0 0 16 5 0 0 0 5 > > MatMult 2928 1.0 5.7963e-01 1.0 2.83e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 16 5 0 0 0 16 5 0 0 0 5 > > MatSolve 2894 1.0 7.3171e-03 1.0 1.76e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 33 0 0 0 0 33 0 0 0 2405 > > MatLUFactorNum 573 1.0 1.4733e-02 1.0 1.71e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 32 0 0 0 0 32 0 0 0 1158 > > MatILUFactorSym 1 1.0 2.4543e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyBegin 1147 1.0 7.9204e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatAssemblyEnd 1147 1.0 2.5825e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatGetRowIJ 2 1.0 6.3280e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatGetOrdering 1 1.0 1.5754e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatZeroEntries 573 1.0 5.7120e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatFDColorCreate 1 1.0 2.0541e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatFDColorSetUp 1 1.0 1.6103e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > MatFDColorApply 573 1.0 2.2386e+00 1.0 5.60e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 62 11 0 0 0 62 11 0 0 0 2 > > MatFDColorFunc 11460 1.0 2.2264e+00 1.0 1.85e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 62 3 0 0 0 62 3 0 0 0 1 > > MatColoringApply 1 1.0 3.9990e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPGMRESOrthog 2321 1.0 2.9685e-03 1.0 5.75e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 11 0 0 0 0 11 0 0 
0 1935 > > KSPSetUp 573 1.0 2.3291e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > KSPSolve 573 1.0 5.0164e-01 1.0 4.50e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 14 85 0 0 0 14 85 0 0 0 90 > > PCSetUp 573 1.0 1.5172e-02 1.0 1.71e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 32 0 0 0 0 32 0 0 0 1124 > > PCApply 2894 1.0 7.8614e-03 1.0 1.76e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 33 0 0 0 0 33 0 0 0 2239 > > ------------------------------------------------------------ > ------------------------------------------------------------ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Fri May 19 13:53:47 2017 From: hongzhang at anl.gov (Zhang, Hong) Date: Fri, 19 May 2017 18:53:47 +0000 Subject: [petsc-users] Understanding log summary In-Reply-To: References: Message-ID: <633AA7C7-6B11-4DAE-B855-90873FD8B330@anl.gov> On May 19, 2017, at 9:25 AM, Zou, Ling > wrote: Hi All, In terms of code performance, sometimes people would ask for info about total non-linear iteration numbers, total linear iteration numbers, etc. I suppose all these could be found in the log summary. For the attached log summary, can I say? total non-linear iteration number = 573 total linear iteration number = 2321 Usually SNESSolve corresponds to the number of nonlinear iterations in the summary. It seems that you are solving some linear systems with GMRES. And there are 573 linear solves with 2321 GMRES iterations in total. Hong (Mr.) Thank you. Ling ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 607 1.0 1.8729e-04 1.0 1.95e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1040 VecMDot 2321 1.0 1.1075e-03 1.0 2.87e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 2590 VecNorm 5422 1.0 1.2229e-03 1.0 1.74e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 1423 VecScale 5822 1.0 1.2764e-03 1.0 9.37e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 734 VecCopy 14334 1.0 1.8302e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 1231 1.0 3.0700e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 14961 1.0 3.3679e-03 1.0 4.82e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 1430 VecWAXPY 20842 1.0 5.5537e-03 1.0 4.00e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 721 VecMAXPY 2894 1.0 1.4292e-03 1.0 3.62e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 7 0 0 0 0 7 0 0 0 2536 VecSetRandom 34 1.0 1.0322e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecReduceArith 1146 1.0 2.9907e-04 1.0 3.68e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1230 VecReduceComm 573 1.0 9.2384e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 2894 1.0 1.9604e-03 1.0 1.39e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 712 SNESJacobianEval 573 1.0 2.2410e+00 1.0 5.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00 62 11 0 0 0 62 11 0 0 0 2 MatMult MF 2928 1.0 5.7922e-01 1.0 2.83e+06 1.0 0.0e+00 0.0e+00 0.0e+00 16 5 0 0 0 16 5 0 0 0 5 MatMult 2928 1.0 5.7963e-01 1.0 2.83e+06 1.0 0.0e+00 0.0e+00 0.0e+00 16 5 0 0 0 16 5 0 0 0 5 MatSolve 2894 1.0 7.3171e-03 1.0 1.76e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 33 0 0 0 0 33 0 0 0 2405 
MatLUFactorNum 573 1.0 1.4733e-02 1.0 1.71e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 32 0 0 0 0 32 0 0 0 1158 MatILUFactorSym 1 1.0 2.4543e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 1147 1.0 7.9204e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1147 1.0 2.5825e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 2 1.0 6.3280e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.5754e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 573 1.0 5.7120e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatFDColorCreate 1 1.0 2.0541e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatFDColorSetUp 1 1.0 1.6103e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatFDColorApply 573 1.0 2.2386e+00 1.0 5.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00 62 11 0 0 0 62 11 0 0 0 2 MatFDColorFunc 11460 1.0 2.2264e+00 1.0 1.85e+06 1.0 0.0e+00 0.0e+00 0.0e+00 62 3 0 0 0 62 3 0 0 0 1 MatColoringApply 1 1.0 3.9990e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 2321 1.0 2.9685e-03 1.0 5.75e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 11 0 0 0 0 11 0 0 0 1935 KSPSetUp 573 1.0 2.3291e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 573 1.0 5.0164e-01 1.0 4.50e+07 1.0 0.0e+00 0.0e+00 0.0e+00 14 85 0 0 0 14 85 0 0 0 90 PCSetUp 573 1.0 1.5172e-02 1.0 1.71e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 32 0 0 0 0 32 0 0 0 1124 PCApply 2894 1.0 7.8614e-03 1.0 1.76e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 33 0 0 0 0 33 0 0 0 2239 ------------------------------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Fri May 19 14:03:08 2017 From: hongzhang at anl.gov (Zhang, Hong) Date: Fri, 19 May 2017 19:03:08 +0000 Subject: [petsc-users] Understanding log summary In-Reply-To: <633AA7C7-6B11-4DAE-B855-90873FD8B330@anl.gov> References: <633AA7C7-6B11-4DAE-B855-90873FD8B330@anl.gov> Message-ID: On May 19, 2017, at 1:53 PM, Zhang, Hong > wrote: On May 19, 2017, at 9:25 AM, Zou, Ling > wrote: Hi All, In terms of code performance, sometimes people would ask for info about total non-linear iteration numbers, total linear iteration numbers, etc. I suppose all these could be found in the log summary. For the attached log summary, can I say? total non-linear iteration number = 573 total linear iteration number = 2321 Usually SNESSolve corresponds to the number of nonlinear iterations in the summary. It seems that you are solving some linear systems with GMRES. And there are 573 linear solves with 2321 GMRES iterations in total. Correction: SNESSolve gives the number of nonlinear solves. My mistake. I just realize you are probably using your own nonlinear solver (not PETSc SNES). Then you can say 573 non-linear iterations and 2321 linear iterations. Hong (Mr.) Hong (Mr.) Thank you. 
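As an aside, the counts discussed above can also be queried directly in code after a solve instead of being read off the log summary; a sketch assuming a SNES-based solve (with a hand-rolled Newton loop one would instead accumulate KSPGetIterationNumber over the individual linear solves):

  PetscInt newton_its, linear_its;
  ierr = SNESGetIterationNumber(snes, &newton_its);CHKERRQ(ierr);       /* nonlinear iterations of the last SNESSolve */
  ierr = SNESGetLinearSolveIterations(snes, &linear_its);CHKERRQ(ierr); /* linear iterations accumulated during it */
  ierr = PetscPrintf(PETSC_COMM_WORLD, "nonlinear its %D, linear its %D\n", newton_its, linear_its);CHKERRQ(ierr);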
Ling ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 607 1.0 1.8729e-04 1.0 1.95e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1040 VecMDot 2321 1.0 1.1075e-03 1.0 2.87e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 2590 VecNorm 5422 1.0 1.2229e-03 1.0 1.74e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 1423 VecScale 5822 1.0 1.2764e-03 1.0 9.37e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 734 VecCopy 14334 1.0 1.8302e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 1231 1.0 3.0700e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 14961 1.0 3.3679e-03 1.0 4.82e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 1430 VecWAXPY 20842 1.0 5.5537e-03 1.0 4.00e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 721 VecMAXPY 2894 1.0 1.4292e-03 1.0 3.62e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 7 0 0 0 0 7 0 0 0 2536 VecSetRandom 34 1.0 1.0322e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecReduceArith 1146 1.0 2.9907e-04 1.0 3.68e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1230 VecReduceComm 573 1.0 9.2384e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 2894 1.0 1.9604e-03 1.0 1.39e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 712 SNESJacobianEval 573 1.0 2.2410e+00 1.0 5.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00 62 11 0 0 0 62 11 0 0 0 2 MatMult MF 2928 1.0 5.7922e-01 1.0 2.83e+06 1.0 0.0e+00 0.0e+00 0.0e+00 16 5 0 0 0 16 5 0 0 0 5 MatMult 2928 1.0 5.7963e-01 1.0 2.83e+06 1.0 0.0e+00 0.0e+00 0.0e+00 16 5 0 0 0 16 5 0 0 0 5 MatSolve 2894 1.0 7.3171e-03 1.0 1.76e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 33 0 0 0 0 33 0 0 0 2405 MatLUFactorNum 573 1.0 1.4733e-02 1.0 1.71e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 32 0 0 0 0 32 0 0 0 1158 MatILUFactorSym 1 1.0 2.4543e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 1147 1.0 7.9204e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyEnd 1147 1.0 2.5825e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetRowIJ 2 1.0 6.3280e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.5754e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 573 1.0 5.7120e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatFDColorCreate 1 1.0 2.0541e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatFDColorSetUp 1 1.0 1.6103e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatFDColorApply 573 1.0 2.2386e+00 1.0 5.60e+06 1.0 0.0e+00 0.0e+00 0.0e+00 62 11 0 0 0 62 11 0 0 0 2 MatFDColorFunc 11460 1.0 2.2264e+00 1.0 1.85e+06 1.0 0.0e+00 0.0e+00 0.0e+00 62 3 0 0 0 62 3 0 0 0 1 MatColoringApply 1 1.0 3.9990e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 2321 1.0 2.9685e-03 1.0 5.75e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 11 0 0 0 0 11 0 0 0 1935 KSPSetUp 573 1.0 2.3291e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 573 1.0 5.0164e-01 1.0 4.50e+07 1.0 0.0e+00 0.0e+00 0.0e+00 14 85 0 0 0 14 85 0 0 0 90 PCSetUp 573 1.0 1.5172e-02 1.0 1.71e+07 1.0 
0.0e+00 0.0e+00 0.0e+00 0 32 0 0 0 0 32 0 0 0 1124 PCApply 2894 1.0 7.8614e-03 1.0 1.76e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 33 0 0 0 0 33 0 0 0 2239 ------------------------------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ling.zou at inl.gov Fri May 19 14:28:28 2017 From: ling.zou at inl.gov (Zou, Ling) Date: Fri, 19 May 2017 13:28:28 -0600 Subject: [petsc-users] Understanding log summary In-Reply-To: References: <633AA7C7-6B11-4DAE-B855-90873FD8B330@anl.gov> Message-ID: No problem :) Thanks as well. Ling On Fri, May 19, 2017 at 1:03 PM, Zhang, Hong wrote: > > On May 19, 2017, at 1:53 PM, Zhang, Hong wrote: > > > On May 19, 2017, at 9:25 AM, Zou, Ling wrote: > > Hi All, > > In terms of code performance, sometimes people would ask for info about > total non-linear iteration numbers, total linear iteration numbers, etc. I > suppose all these could be found in the log summary. For the attached log > summary, can I say? > total non-linear iteration number = 573 > total linear iteration number = 2321 > > > Usually SNESSolve corresponds to the number of nonlinear iterations in the > summary. > > It seems that you are solving some linear systems with GMRES. And there > are 573 linear solves with 2321 GMRES iterations in total. > > > Correction: SNESSolve gives the number of nonlinear solves. > > My mistake. I just realize you are probably using your own nonlinear > solver (not PETSc SNES). Then you can say 573 non-linear iterations and > 2321 linear iterations. > > Hong (Mr.) > > Hong (Mr.) > > Thank you. > > Ling > > > > ------------------------------------------------------------ > ------------------------------------------------------------ > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------ > ------------------------------------------------------------ > > --- Event Stage 0: Main Stage > > VecDot 607 1.0 1.8729e-04 1.0 1.95e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 1040 > VecMDot 2321 1.0 1.1075e-03 1.0 2.87e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 5 0 0 0 0 5 0 0 0 2590 > VecNorm 5422 1.0 1.2229e-03 1.0 1.74e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 1423 > VecScale 5822 1.0 1.2764e-03 1.0 9.37e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 2 0 0 0 0 2 0 0 0 734 > VecCopy 14334 1.0 1.8302e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecSet 1231 1.0 3.0700e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecAXPY 14961 1.0 3.3679e-03 1.0 4.82e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 9 0 0 0 0 9 0 0 0 1430 > VecWAXPY 20842 1.0 5.5537e-03 1.0 4.00e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 8 0 0 0 0 8 0 0 0 721 > VecMAXPY 2894 1.0 1.4292e-03 1.0 3.62e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 7 0 0 0 0 7 0 0 0 2536 > VecSetRandom 34 1.0 1.0322e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecReduceArith 1146 1.0 2.9907e-04 1.0 3.68e+05 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 1 0 0 0 0 1 0 0 0 1230 > VecReduceComm 573 1.0 9.2384e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > VecNormalize 2894 1.0 1.9604e-03 1.0 1.39e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 3 0 0 0 0 3 0 0 0 712 > SNESJacobianEval 573 1.0 2.2410e+00 1.0 5.60e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 62 11 0 0 0 62 11 0 0 0 2 > MatMult MF 2928 1.0 5.7922e-01 1.0 2.83e+06 
1.0 0.0e+00 0.0e+00 > 0.0e+00 16 5 0 0 0 16 5 0 0 0 5 > MatMult 2928 1.0 5.7963e-01 1.0 2.83e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 16 5 0 0 0 16 5 0 0 0 5 > MatSolve 2894 1.0 7.3171e-03 1.0 1.76e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 33 0 0 0 0 33 0 0 0 2405 > MatLUFactorNum 573 1.0 1.4733e-02 1.0 1.71e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 32 0 0 0 0 32 0 0 0 1158 > MatILUFactorSym 1 1.0 2.4543e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyBegin 1147 1.0 7.9204e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatAssemblyEnd 1147 1.0 2.5825e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetRowIJ 2 1.0 6.3280e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatGetOrdering 1 1.0 1.5754e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatZeroEntries 573 1.0 5.7120e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatFDColorCreate 1 1.0 2.0541e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatFDColorSetUp 1 1.0 1.6103e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > MatFDColorApply 573 1.0 2.2386e+00 1.0 5.60e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 62 11 0 0 0 62 11 0 0 0 2 > MatFDColorFunc 11460 1.0 2.2264e+00 1.0 1.85e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 62 3 0 0 0 62 3 0 0 0 1 > MatColoringApply 1 1.0 3.9990e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPGMRESOrthog 2321 1.0 2.9685e-03 1.0 5.75e+06 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 11 0 0 0 0 11 0 0 0 1935 > KSPSetUp 573 1.0 2.3291e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > KSPSolve 573 1.0 5.0164e-01 1.0 4.50e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 14 85 0 0 0 14 85 0 0 0 90 > PCSetUp 573 1.0 1.5172e-02 1.0 1.71e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 32 0 0 0 0 32 0 0 0 1124 > PCApply 2894 1.0 7.8614e-03 1.0 1.76e+07 1.0 0.0e+00 0.0e+00 > 0.0e+00 0 33 0 0 0 0 33 0 0 0 2239 > ------------------------------------------------------------ > ------------------------------------------------------------ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat May 20 15:09:55 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 20 May 2017 15:09:55 -0500 Subject: [petsc-users] Problems with PetscObjectViewFromOptions in Fortran In-Reply-To: <8603540a-38de-ead0-8690-3a9e5d063e7f@physik.uni-muenchen.de> References: <8603540a-38de-ead0-8690-3a9e5d063e7f@physik.uni-muenchen.de> Message-ID: <13EF3C0A-9E70-4CC3-9F84-1B6BA62104C2@mcs.anl.gov> > On May 18, 2017, at 5:11 AM, Fabian.Jakub wrote: > > Dear Petsc Team, > > I have a problem with object viewing through PetscObjectViewFromOptions > > The C Version works fine, e.g. 
> > static char help[] = "Testing multiple PetscObjectViewFromOptions"; > #include > > int main(int argc,char **argv) { > DM dmA, dmB; > PetscInitialize(&argc,&argv,(char*)0,help); > > PetscErrorCode ierr; > > ierr = DMPlexCreate(PETSC_COMM_WORLD, &dmA); CHKERRQ(ierr); > ierr = PetscObjectSetName((PetscObject) dmA, "DMPlex_A"); CHKERRQ(ierr); > > ierr = DMPlexCreate(PETSC_COMM_WORLD, &dmB); CHKERRQ(ierr); > ierr = PetscObjectSetName((PetscObject) dmB, "DMPlex_B"); CHKERRQ(ierr); > > PetscObjectViewFromOptions((PetscObject) dmA, NULL, "-dmA"); > PetscObjectViewFromOptions((PetscObject) dmB, NULL, "-dmB"); > > ierr = DMDestroy(&dmA); CHKERRQ(ierr); > ierr = DMDestroy(&dmB); CHKERRQ(ierr); > > PetscFinalize(); > } > > and running it with -help, correctly produces the options and views as: > > -dmA > > -dmB > > > but the equivalent in Fortran, e.g.: > > program main > #include "petsc/finclude/petsc.h" > use petsc > implicit none > > PetscErrorCode :: ierr > > DM :: dmA, dmB > > call PetscInitialize(PETSC_NULL_CHARACTER, ierr); CHKERRQ(ierr) > > call DMPlexCreate(PETSC_COMM_WORLD, dmA, ierr);CHKERRQ(ierr) > call PetscObjectSetName(dmA, 'DMPlex_A', ierr);CHKERRQ(ierr) > > call DMPlexCreate(PETSC_COMM_WORLD, dmB, ierr);CHKERRQ(ierr) > call PetscObjectSetName(dmB, 'DMPlex_B', ierr);CHKERRQ(ierr) > > call PetscObjectViewFromOptions(dmA, PETSC_NULL_CHARACTER, "-dmA", > ierr); CHKERRQ(ierr) > call PetscObjectViewFromOptions(dmB, PETSC_NULL_CHARACTER, "-dmB", The second argument is a PETScObject, not a character string. This is what is causing the error. You should replace the PETSC_NULL_CHARACTER with PETSC_NULL_OBJECT in PETSc 3.7.x or earlier or with PETSC_NULL_VEC with the master branch development version of PETSc. I have added more error checking to the branch barry/errorcheck-fortran-petscobjectviewfromoptions that will detect this error in the future. Thanks for reporting the problem, Barry > ierr); CHKERRQ(ierr) > > call DMDestroy(dmA, ierr);CHKERRQ(ierr) > call DMDestroy(dmB, ierr);CHKERRQ(ierr) > > call PetscFinalize(ierr) > > end program > > produces the options to be: > > -dmA-dmB > > -dmB > > > > While this works as expected when running with: > ./example -dmA-dmB -dmB > > This is not intuitive. > > Is the hickup on my side or is it somewhere in the Fortran stubs? > > Please, let me know if you need more details on the build or if you > cannot reproduce this. 
> > Many thanks, > > Fabian > > > > Petsc Development GIT revision: v3.7.6-3910-gd04c6f6 GIT Date: > 2017-05-15 17:09:20 -0500 > ./configure \ > --with-cc=$(which mpicc) \ > --with-fc=$(which mpif90) \ > --with-cxx=$(which mpicxx) \ > --with-fortran \ > --with-fortran-interfaces \ > --with-shared-libraries=1 \ > --download-hdf5 \ > --download-triangle \ > --download-ctetgen \ > --with-cmake=$(which cmake) \ > --with-debugging=1 \ > COPTFLAGS='-O2 ' \ > FOPTFLAGS='-O2 ' \ > \ > && make all test > > GNU Fortran (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 > (Open MPI) 1.10.2 > > > Complete output of -help -info (Fortran Version): > [0] petscinitialize_internal(): (Fortran):PETSc successfully started: > procs 1 > [0] PetscGetHostName(): Rejecting domainname, likely is NIS > met-ws-740m19.(none) > [0] petscinitialize_internal(): Running on machine: met-ws-740m19 > ------Additional PETSc component options-------- > -log_exclude: > -info_exclude: > ----------------------------------------------- > [0] PetscCommDuplicate(): Duplicating a communicator 47693199447680 > 11260976 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47693199447680 11260976 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47693199447680 11260976 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47693199447680 11260976 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47693199447680 11260976 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47693199447680 11260976 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47693199447680 11260976 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47693199447680 11260976 > > -dmA-dmB ascii[:[filename][:[format][:append]]]: Prints object to > stdout or ASCII file (PetscOptionsGetViewer) > -dmA-dmB binary[:[filename][:[format][:append]]]: Saves object to a > binary file (PetscOptionsGetViewer) > -dmA-dmB draw[:drawtype[:filename]] Draws object (PetscOptionsGetViewer) > -dmA-dmB socket[:port]: Pushes object to a Unix socket > (PetscOptionsGetViewer) > -dmA-dmB saws[:communicatorname]: Publishes object to SAWs > (PetscOptionsGetViewer) > > DM Object: DMPlex_A 1 MPI processes > type: plex > [0] PetscCommDuplicate(): Duplicating a communicator 47693199449728 > 13799472 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47693199449728 13799472 > DMPlex_A in 0 dimensions: > 0-cells: 0 > > -dmB ascii[:[filename][:[format][:append]]]: Prints object to stdout > or ASCII file (PetscOptionsGetViewer) > -dmB binary[:[filename][:[format][:append]]]: Saves object to a > binary file (PetscOptionsGetViewer) > -dmB draw[:drawtype[:filename]] Draws object (PetscOptionsGetViewer) > -dmB socket[:port]: Pushes object to a Unix socket > (PetscOptionsGetViewer) > -dmB saws[:communicatorname]: Publishes object to SAWs > (PetscOptionsGetViewer) > > DM Object: DMPlex_B 1 MPI processes > type: plex > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47693199449728 13799472 > [0] PetscCommDuplicate(): Using internal PETSc communicator > 47693199449728 13799472 > DMPlex_B in 0 dimensions: > 0-cells: 0 > [0] Petsc_DelComm_Inner(): Removing reference to PETSc communicator > embedded in a user MPI_Comm 13799472 > [0] Petsc_DelComm_Outer(): User MPI_Comm 47693199449728 is being freed > after removing reference from inner PETSc comm to this outer comm > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 13799472 > [0] Petsc_DelCounter(): Deleting counter 
data in an MPI_Comm 13799472 > [0] PetscFinalize(): PetscFinalize() called > [0] PetscGetHostName(): Rejecting domainname, likely is NIS > met-ws-740m19.(none) > [0] PetscFOpen(): Opening file Log.0 > [0] Petsc_DelViewer(): Removing viewer data attribute in an MPI_Comm > 11260976 > [0] Petsc_DelComm_Inner(): Removing reference to PETSc communicator > embedded in a user MPI_Comm 11260976 > [0] Petsc_DelComm_Outer(): User MPI_Comm 47693199447680 is being freed > after removing reference from inner PETSc comm to this outer comm > [0] PetscCommDestroy(): Deleting PETSc MPI_Comm 11260976 > [0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm 11260976 > From franck.houssen at inria.fr Sun May 21 11:11:57 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Sun, 21 May 2017 18:11:57 +0200 (CEST) Subject: [petsc-users] Is xout always already zero'ed when being called back on PCShellSetApply ? In-Reply-To: <2135243060.6755933.1495382556901.JavaMail.zimbra@inria.fr> Message-ID: <453436828.6756843.1495383117808.JavaMail.zimbra@inria.fr> When using http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCShellSetApply.html, do I have the guaranty that xout from " PetscErrorCode apply ( PC pc, Vec xin, Vec xout)" has always been previously filled with zeros ? I may have to fill xout by blocks (I would += several possibly overlapping blocks: I need to make sure xout is first filled with zero to get the correct result). Franck -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Sun May 21 11:23:00 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Sun, 21 May 2017 18:23:00 +0200 (CEST) Subject: [petsc-users] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? In-Reply-To: <867421313.6757137.1495383596545.JavaMail.zimbra@inria.fr> Message-ID: <1253777447.6757298.1495383780337.JavaMail.zimbra@inria.fr> I have a 3x3 global matrix is built (diag: 1, 2, 1): it's made of 2 overlapping 2x2 local matrix (diag: 1, 1). Getting non assembled local matrix is OK with MatISGetLocalMat. How to get assembled local matrix (initial local matrix + neigbhor contributions on the borders) ? (expected result is diag: 2, 1) Franck -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matISLocalMat.cpp Type: text/x-c++src Size: 2354 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matISLocalMat.log Type: text/x-log Size: 2285 bytes Desc: not available URL: From franck.houssen at inria.fr Sun May 21 11:26:14 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Sun, 21 May 2017 18:26:14 +0200 (CEST) Subject: [petsc-users] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> Message-ID: <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? Example is attached : I don't get what I expect that is a vector such that proc0 = [1, 2] and proc1 = [2, 1] Franck -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: matISProdMatVec.cpp Type: text/x-c++src Size: 2104 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matISProdMatVec.log Type: text/x-log Size: 441 bytes Desc: not available URL: From knepley at gmail.com Sun May 21 11:41:10 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 21 May 2017 11:41:10 -0500 Subject: [petsc-users] Is xout always already zero'ed when being called back on PCShellSetApply ? In-Reply-To: <453436828.6756843.1495383117808.JavaMail.zimbra@inria.fr> References: <2135243060.6755933.1495382556901.JavaMail.zimbra@inria.fr> <453436828.6756843.1495383117808.JavaMail.zimbra@inria.fr> Message-ID: On Sun, May 21, 2017 at 11:11 AM, Franck Houssen wrote: > When using http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/ > PCShellSetApply.html, do I have the guaranty that xout from " > PetscErrorCode > > apply (PC > > pc,Vec > > xin,Vec > > xout)" has always been previously filled with zeros ? > I may have to fill xout by blocks (I would += several possibly overlapping > blocks: I need to make sure xout is first filled with zero to get the > correct result). > No, we do not initialize the output vector. Yo ucan call VecSet(xout, 0.0); Thanks, Matt > Franck > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun May 21 11:42:59 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 21 May 2017 11:42:59 -0500 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? In-Reply-To: <1253777447.6757298.1495383780337.JavaMail.zimbra@inria.fr> References: <867421313.6757137.1495383596545.JavaMail.zimbra@inria.fr> <1253777447.6757298.1495383780337.JavaMail.zimbra@inria.fr> Message-ID: On Sun, May 21, 2017 at 11:23 AM, Franck Houssen wrote: > I have a 3x3 global matrix is built (diag: 1, 2, 1): it's made of 2 > overlapping 2x2 local matrix (diag: 1, 1). > Getting non assembled local matrix is OK with MatISGetLocalMat. > How to get assembled local matrix (initial local matrix + neigbhor > contributions on the borders) ? (expected result is diag: 2, 1) > You can always use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrix.html http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html to get copies, but if you just want to build things, you can use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html Thanks, Matt > Franck > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun May 21 11:47:10 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 21 May 2017 11:47:10 -0500 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? 
In-Reply-To: <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> Message-ID: On Sun, May 21, 2017 at 11:26 AM, Franck Houssen wrote: > Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? > Example is attached : I don't get what I expect that is a vector such that > proc0 = [1, 2] and proc1 = [2, 1] > 1) I think the global size of your matrix is wrong. You seem to want 3, not 4 2) Global vectors have a non-overlapping row partition. You might be thinking of local vectors Thanks, Matt > Franck > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Sun May 21 15:51:34 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Sun, 21 May 2017 23:51:34 +0300 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? In-Reply-To: References: <867421313.6757137.1495383596545.JavaMail.zimbra@inria.fr> <1253777447.6757298.1495383780337.JavaMail.zimbra@inria.fr> Message-ID: To assemble the operator in aij format, use MatISGetMPIXAIJ http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatISGetMPIXAIJ.html Il 21 Mag 2017 18:43, "Matthew Knepley" ha scritto: > On Sun, May 21, 2017 at 11:23 AM, Franck Houssen > wrote: > >> I have a 3x3 global matrix is built (diag: 1, 2, 1): it's made of 2 >> overlapping 2x2 local matrix (diag: 1, 1). >> Getting non assembled local matrix is OK with MatISGetLocalMat. >> How to get assembled local matrix (initial local matrix + neigbhor >> contributions on the borders) ? (expected result is diag: 2, 1) >> > > You can always use > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/ > MatGetSubMatrix.html > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/ > MatGetSubMatrices.html > > to get copies, but if you just want to build things, you can use > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/ > MatGetLocalSubMatrix.html > > Thanks, > > Matt > > >> Franck >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Sun May 21 16:02:37 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Sun, 21 May 2017 23:02:37 +0200 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> Message-ID: <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> Franck, PETSc takes care of doing the matrix-vector multiplication properly using MatIS. As Matt said, the layout of the vectors is the usual parallel layout. The local sizes of the MatIS matrix (i.e. the local size of the left and right vectors used in MatMult) are not the sizes of the local subdomain matrices in MatIS. 
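To make the layout point concrete, here is a minimal editorial sketch (not taken from the thread) for an already assembled MATIS matrix A, assuming the same Fortran preamble as the example earlier in this thread (petsc/finclude/petsc.h plus "use petsc"). The vectors passed to MatMult() are created from A and therefore carry its usual parallel layout; the subdomain matrix returned by MatISGetLocalMat() has the overlapping subdomain sizes, which in general differ from those local sizes and should not be used to size the vectors:

  Mat            :: Aloc
  Vec            :: x, y
  PetscInt       :: mloc, nloc, msub, nsub
  PetscScalar    :: one
  PetscErrorCode :: ierr

  call MatCreateVecs(A, x, y, ierr); CHKERRQ(ierr)           ! x, y get the parallel layout of A
  call MatGetLocalSize(A, mloc, nloc, ierr); CHKERRQ(ierr)   ! non-overlapping local sizes
  call MatISGetLocalMat(A, Aloc, ierr); CHKERRQ(ierr)
  call MatGetSize(Aloc, msub, nsub, ierr); CHKERRQ(ierr)     ! overlapping subdomain sizes
  one = 1.0
  call VecSet(x, one, ierr); CHKERRQ(ierr)
  call MatMult(A, x, y, ierr); CHKERRQ(ierr)                 ! PETSc handles the MatIS scatters internally

To compare the product against a conventionally assembled operator, MatISGetMPIXAIJ() (mentioned above) can first be used to build the equivalent MPIAIJ matrix.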
> On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen > wrote: > Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? Example is attached : I don't get what I expect that is a vector such that proc0 = [1, 2] and proc1 = [2, 1] > > 1) I think the global size of your matrix is wrong. You seem to want 3, not 4 > > 2) Global vectors have a non-overlapping row partition. You might be thinking of local vectors > > Thanks, > > Matt > > Franck > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Mon May 22 02:23:52 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Mon, 22 May 2017 09:23:52 +0200 (CEST) Subject: [petsc-users] Is xout always already zero'ed when being called back on PCShellSetApply ? In-Reply-To: References: <2135243060.6755933.1495382556901.JavaMail.zimbra@inria.fr> <453436828.6756843.1495383117808.JavaMail.zimbra@inria.fr> Message-ID: <492707441.6837624.1495437832703.JavaMail.zimbra@inria.fr> OK, thanks. Franck ----- Mail original ----- > De: "Matthew Knepley" > ?: "Franck Houssen" > Cc: "PETSc" , "PETSc" > Envoy?: Dimanche 21 Mai 2017 18:41:10 > Objet: Re: [petsc-users] Is xout always already zero'ed when being called > back on PCShellSetApply ? > On Sun, May 21, 2017 at 11:11 AM, Franck Houssen < franck.houssen at inria.fr > > wrote: > > When using > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCShellSetApply.html, > > do I have the guaranty that xout from " PetscErrorCode apply ( PC pc, Vec > > xin, Vec xout)" has always been previously filled with zeros ? > > > I may have to fill xout by blocks (I would += several possibly overlapping > > blocks: I need to make sure xout is first filled with zero to get the > > correct result). > > No, we do not initialize the output vector. Yo ucan call VecSet(xout, 0.0); > Thanks, > Matt > > Franck > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pvsang002 at gmail.com Mon May 22 11:25:09 2017 From: pvsang002 at gmail.com (Pham Pham) Date: Tue, 23 May 2017 00:25:09 +0800 Subject: [petsc-users] Installation question In-Reply-To: References: Message-ID: Hi Matt, For the machine I have, Is it a good idea if I mix MPI and OpenMP: MPI for cores with Rank%12==0 and OpenMP for the others ? Thank you, PVS. On Thu, May 11, 2017 at 8:27 PM, Matthew Knepley wrote: > On Thu, May 11, 2017 at 7:08 AM, Pham Pham wrote: > >> Hi Matt, >> >> Thank you for the reply. >> >> I am using University HPC which has multiple nodes, and should be good >> for parallel computing. The bad performance might be due to the way I >> install and run PETSc... >> >> Looking at the output when running streams, I can see that the Processor >> names were the same. >> Does that mean only one processor involved in computing, did it cause the >> bad performance? >> > > Yes. From the data, it appears that the kind of processor you have has 12 > cores, but only enough memory bandwidth to support 1.5 cores. 
> Try running the STREAMS with only 1 process per node. This is a setting in > your submission script, but it is different for every cluster. Thus > I would ask the local sysdamin for this machine to help you do that. You > should see almost perfect scaling with that configuration. You might > also try 2 processes per node to compare. > > Thanks, > > Matt > > >> Thank you very much. >> >> Ph. >> >> Below is testing output: >> >> [mpepvs at atlas5-c01 petsc-3.7.5]$ make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> PETSC_ARCH=arch-linux-cxx-opt streams >> >> >> >> >> cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory >> PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> PETSC_ARCH=arch-linux-cxx-opt streams >> /app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/bin/mpicxx -o >> MPIVersion.o c -wd1572 -g -O3 -fPIC -I/home/svu/mpepvs/petsc/petsc-3.7.5/include >> -I/hom >> >> >> e/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include >> -I/app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/include >> `pwd`/MPIVersion.c >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> +++++++++++++++++++++++++++++++ >> The version of PETSc you are using is out-of-date, we recommend updating >> to the new release >> Available Version: 3.7.6 Installed Version: 3.7.5 >> http://www.mcs.anl.gov/petsc/download/index.html >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> +++++++++++++++++++++++++++++++ >> Running streams with 'mpiexec.hydra ' using 'NPMAX=12' >> Number of MPI processes 1 Processor names atlas5-c01 >> Triad: 11026.7604 Rate (MB/s) >> Number of MPI processes 2 Processor names atlas5-c01 atlas5-c01 >> Triad: 14669.6730 Rate (MB/s) >> Number of MPI processes 3 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 >> Triad: 12848.2644 Rate (MB/s) >> Number of MPI processes 4 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 >> Triad: 15033.7687 Rate (MB/s) >> Number of MPI processes 5 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 >> Triad: 13299.3830 Rate (MB/s) >> Number of MPI processes 6 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> Triad: 14382.2116 Rate (MB/s) >> Number of MPI processes 7 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> Triad: 13194.2573 Rate (MB/s) >> Number of MPI processes 8 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> Triad: 14199.7255 Rate (MB/s) >> Number of MPI processes 9 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> Triad: 13045.8946 Rate (MB/s) >> Number of MPI processes 10 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 >> Triad: 13058.3283 Rate (MB/s) >> Number of MPI processes 11 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 >> Triad: 13037.3334 Rate (MB/s) >> Number of MPI processes 12 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> Triad: 12526.6096 Rate (MB/s) >> ------------------------------------------------ >> np speedup >> 1 1.0 >> 2 1.33 >> 3 1.17 >> 4 1.36 >> 5 1.21 >> 6 1.3 >> 7 1.2 >> 8 1.29 >> 9 1.18 >> 10 1.18 >> 11 1.18 >> 12 1.14 >> Estimation of possible 
speedup of MPI programs based on Streams benchmark. >> It appears you have 1 node(s) >> See graph in the file src/benchmarks/streams/scaling.png >> >> On Fri, May 5, 2017 at 11:26 PM, Matthew Knepley >> wrote: >> >>> On Fri, May 5, 2017 at 10:18 AM, Pham Pham wrote: >>> >>>> Hi Satish, >>>> >>>> It runs now, and shows a bad speed up: >>>> Please help to improve this. >>>> >>> >>> http://www.mcs.anl.gov/petsc/documentation/faq.html#computers >>> >>> The short answer is: You cannot improve this without buying a different >>> machine. This is >>> a fundamental algorithmic limitation that cannot be helped by threads, >>> or vectorization, or >>> anything else. >>> >>> Matt >>> >>> >>>> Thank you. >>>> >>>> >>>> ? >>>> >>>> On Fri, May 5, 2017 at 10:02 PM, Satish Balay >>>> wrote: >>>> >>>>> With Intel MPI - its best to use mpiexec.hydra [and not mpiexec] >>>>> >>>>> So you can do: >>>>> >>>>> make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>>>> PETSC_ARCH=arch-linux-cxx-opt MPIEXEC=mpiexec.hydra test >>>>> >>>>> >>>>> [you can also specify --with-mpiexec=mpiexec.hydra at configure time] >>>>> >>>>> Satish >>>>> >>>>> >>>>> On Fri, 5 May 2017, Pham Pham wrote: >>>>> >>>>> > *Hi,* >>>>> > *I can configure now, but fail when testing:* >>>>> > >>>>> > [mpepvs at atlas7-c10 petsc-3.7.5]$ make >>>>> > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>>>> PETSC_ARCH=arch-linux-cxx-opt >>>>> > test Running test examples to verify correct installation >>>>> > Using PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 and >>>>> > PETSC_ARCH=arch-linux-cxx-opt >>>>> > Possible error running C/C++ src/snes/examples/tutorials/ex19 with >>>>> 1 MPI >>>>> > process >>>>> > See http://www.mcs.anl.gov/petsc/documentation/faq.html >>>>> > mpiexec_atlas7-c10: cannot connect to local mpd >>>>> (/tmp/mpd2.console_mpepvs); >>>>> > possible causes: >>>>> > 1. no mpd is running on this host >>>>> > 2. an mpd is running but was started without a "console" (-n >>>>> option) >>>>> > Possible error running C/C++ src/snes/examples/tutorials/ex19 with >>>>> 2 MPI >>>>> > processes >>>>> > See http://www.mcs.anl.gov/petsc/documentation/faq.html >>>>> > mpiexec_atlas7-c10: cannot connect to local mpd >>>>> (/tmp/mpd2.console_mpepvs); >>>>> > possible causes: >>>>> > 1. no mpd is running on this host >>>>> > 2. an mpd is running but was started without a "console" (-n >>>>> option) >>>>> > Possible error running Fortran example src/snes/examples/tutorials/ex >>>>> 5f >>>>> > with 1 MPI process >>>>> > See http://www.mcs.anl.gov/petsc/documentation/faq.html >>>>> > mpiexec_atlas7-c10: cannot connect to local mpd >>>>> (/tmp/mpd2.console_mpepvs); >>>>> > possible causes: >>>>> > 1. no mpd is running on this host >>>>> > 2. an mpd is running but was started without a "console" (-n >>>>> option) >>>>> > Completed test examples >>>>> > ========================================= >>>>> > Now to evaluate the computer systems you plan use - do: >>>>> > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>>>> > PETSC_ARCH=arch-linux-cxx-opt streams >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > *Please help on this.* >>>>> > *Many thanks!* >>>>> > >>>>> > >>>>> > On Thu, Apr 20, 2017 at 2:02 AM, Satish Balay >>>>> wrote: >>>>> > >>>>> > > Sorry - should have mentioned: >>>>> > > >>>>> > > do 'rm -rf arch-linux-cxx-opt' and rerun configure again. 
>>>>> > > >>>>> > > The mpich install from previous build [that is currently in >>>>> > > arch-linux-cxx-opt/] >>>>> > > is conflicting with --with-mpi-dir=/app1/centos6.3 >>>>> /gnu/mvapich2-1.9/ >>>>> > > >>>>> > > Satish >>>>> > > >>>>> > > >>>>> > > On Wed, 19 Apr 2017, Pham Pham wrote: >>>>> > > >>>>> > > > I reconfigured PETSs with installed MPI, however, I got serous >>>>> error: >>>>> > > > >>>>> > > > **************************ERROR***************************** >>>>> ******** >>>>> > > > Error during compile, check arch-linux-cxx-opt/lib/petsc/c >>>>> onf/make.log >>>>> > > > Send it and arch-linux-cxx-opt/lib/petsc/conf/configure.log to >>>>> > > > petsc-maint at mcs.anl.gov >>>>> > > > ************************************************************ >>>>> ******** >>>>> > > > >>>>> > > > Please explain what is happening? >>>>> > > > >>>>> > > > Thank you very much. >>>>> > > > >>>>> > > > >>>>> > > > >>>>> > > > >>>>> > > > On Wed, Apr 19, 2017 at 11:43 PM, Satish Balay < >>>>> balay at mcs.anl.gov> >>>>> > > wrote: >>>>> > > > >>>>> > > > > Presumably your cluster already has a recommended MPI to use >>>>> [which is >>>>> > > > > already installed. So you should use that - instead of >>>>> > > > > --download-mpich=1 >>>>> > > > > >>>>> > > > > Satish >>>>> > > > > >>>>> > > > > On Wed, 19 Apr 2017, Pham Pham wrote: >>>>> > > > > >>>>> > > > > > Hi, >>>>> > > > > > >>>>> > > > > > I just installed petsc-3.7.5 into my university cluster. When >>>>> > > evaluating >>>>> > > > > > the computer system, PETSc reports "It appears you have 1 >>>>> node(s)", I >>>>> > > > > donot >>>>> > > > > > understand this, since the system is a multinodes system. >>>>> Could you >>>>> > > > > please >>>>> > > > > > explain this to me? >>>>> > > > > > >>>>> > > > > > Thank you very much. >>>>> > > > > > >>>>> > > > > > S. 
>>>>> > > > > > >>>>> > > > > > Output: >>>>> > > > > > ========================================= >>>>> > > > > > Now to evaluate the computer systems you plan use - do: >>>>> > > > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>>>> > > > > > PETSC_ARCH=arch-linux-cxx-opt streams >>>>> > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make >>>>> > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>>>> > > > > PETSC_ARCH=arch-linux-cxx-opt >>>>> > > > > > streams >>>>> > > > > > cd src/benchmarks/streams; /usr/bin/gmake >>>>> --no-print-directory >>>>> > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >>>>> > > > > PETSC_ARCH=arch-linux-cxx-opt >>>>> > > > > > streams >>>>> > > > > > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx >>>>> -o >>>>> > > > > > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing >>>>> > > > > > -Wno-unknown-pragmas -fvisibility=hidden -g -O >>>>> > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include >>>>> > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/incl >>>>> ude >>>>> > > > > > `pwd`/MPIVersion.c >>>>> > > > > > Running streams with >>>>> > > > > > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec >>>>> ' >>>>> > > > > using >>>>> > > > > > 'NPMAX=12' >>>>> > > > > > Number of MPI processes 1 Processor names atlas7-c10 >>>>> > > > > > Triad: 9137.5025 Rate (MB/s) >>>>> > > > > > Number of MPI processes 2 Processor names atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > > Triad: 9707.2815 Rate (MB/s) >>>>> > > > > > Number of MPI processes 3 Processor names atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > atlas7-c10 >>>>> > > > > > Triad: 13559.5275 Rate (MB/s) >>>>> > > > > > Number of MPI processes 4 Processor names atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > atlas7-c10 >>>>> > > > > > atlas7-c10 >>>>> > > > > > Triad: 14193.0597 Rate (MB/s) >>>>> > > > > > Number of MPI processes 5 Processor names atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > atlas7-c10 >>>>> > > > > > atlas7-c10 atlas7-c10 >>>>> > > > > > Triad: 14492.9234 Rate (MB/s) >>>>> > > > > > Number of MPI processes 6 Processor names atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > atlas7-c10 >>>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 >>>>> > > > > > Triad: 15476.5912 Rate (MB/s) >>>>> > > > > > Number of MPI processes 7 Processor names atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > atlas7-c10 >>>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>>> > > > > > Triad: 15148.7388 Rate (MB/s) >>>>> > > > > > Number of MPI processes 8 Processor names atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > atlas7-c10 >>>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>>> > > > > > Triad: 15799.1290 Rate (MB/s) >>>>> > > > > > Number of MPI processes 9 Processor names atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > atlas7-c10 >>>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > > Triad: 15671.3104 Rate (MB/s) >>>>> > > > > > Number of MPI processes 10 Processor names atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > > atlas7-c10 atlas7-c10 >>>>> > > > > > Triad: 15601.4754 Rate (MB/s) >>>>> > > > > > Number of MPI processes 11 Processor names atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 >>>>> > > > > > Triad: 15434.5790 Rate (MB/s) >>>>> > > > > > Number of MPI processes 12 Processor names atlas7-c10 
>>>>> atlas7-c10 >>>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>>> atlas7-c10 >>>>> > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >>>>> > > > > > Triad: 15134.1263 Rate (MB/s) >>>>> > > > > > ------------------------------------------------ >>>>> > > > > > np speedup >>>>> > > > > > 1 1.0 >>>>> > > > > > 2 1.06 >>>>> > > > > > 3 1.48 >>>>> > > > > > 4 1.55 >>>>> > > > > > 5 1.59 >>>>> > > > > > 6 1.69 >>>>> > > > > > 7 1.66 >>>>> > > > > > 8 1.73 >>>>> > > > > > 9 1.72 >>>>> > > > > > 10 1.71 >>>>> > > > > > 11 1.69 >>>>> > > > > > 12 1.66 >>>>> > > > > > Estimation of possible speedup of MPI programs based on >>>>> Streams >>>>> > > > > benchmark. >>>>> > > > > > It appears you have 1 node(s) >>>>> > > > > > Unable to plot speedup to a file >>>>> > > > > > Unable to open matplotlib to plot speedup >>>>> > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ >>>>> > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ >>>>> > > > > > >>>>> > > > > >>>>> > > > > >>>>> > > > >>>>> > > >>>>> > > >>>>> > >>>>> >>>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: scaling.png Type: image/png Size: 46047 bytes Desc: not available URL: From bsmith at mcs.anl.gov Mon May 22 12:58:49 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 22 May 2017 12:58:49 -0500 Subject: [petsc-users] Installation question In-Reply-To: References: Message-ID: <1B05ACB6-6BDF-42C8-89FB-C5ECC5657934@mcs.anl.gov> > On May 22, 2017, at 11:25 AM, Pham Pham wrote: > > Hi Matt, > > For the machine I have, Is it a good idea if I mix MPI and OpenMP: MPI for cores with Rank%12==0 and OpenMP for the others ? > MPI+OpenMP doesn't work this way. Each "rank" is an MPI process, you cannot say some ranks are MPI and some are OpenMP. If you want to use one MPI process per node and have each MPI process have 12 OpenMP threads you need to find out for YOUR systems MPI how you tell it to put one MPI process per node; Barry > Thank you, > > PVS. > > On Thu, May 11, 2017 at 8:27 PM, Matthew Knepley wrote: > On Thu, May 11, 2017 at 7:08 AM, Pham Pham wrote: > Hi Matt, > > Thank you for the reply. > > I am using University HPC which has multiple nodes, and should be good for parallel computing. The bad performance might be due to the way I install and run PETSc... > > Looking at the output when running streams, I can see that the Processor names were the same. > Does that mean only one processor involved in computing, did it cause the bad performance? > > Yes. From the data, it appears that the kind of processor you have has 12 cores, but only enough memory bandwidth to support 1.5 cores. > Try running the STREAMS with only 1 process per node. This is a setting in your submission script, but it is different for every cluster. Thus > I would ask the local sysdamin for this machine to help you do that. You should see almost perfect scaling with that configuration. You might > also try 2 processes per node to compare. > > Thanks, > > Matt > > Thank you very much. > > Ph. 
> > Below is testing output: > > [mpepvs at atlas5-c01 petsc-3.7.5]$ make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt streams > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt streams > /app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/bin/mpicxx -o MPIVersion.o c -wd1572 -g -O3 -fPIC -I/home/svu/mpepvs/petsc/petsc-3.7.5/include -I/hom e/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include -I/app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/include `pwd`/MPIVersion.c > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > The version of PETSc you are using is out-of-date, we recommend updating to the new release > Available Version: 3.7.6 Installed Version: 3.7.5 > http://www.mcs.anl.gov/petsc/download/index.html > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > Running streams with 'mpiexec.hydra ' using 'NPMAX=12' > Number of MPI processes 1 Processor names atlas5-c01 > Triad: 11026.7604 Rate (MB/s) > Number of MPI processes 2 Processor names atlas5-c01 atlas5-c01 > Triad: 14669.6730 Rate (MB/s) > Number of MPI processes 3 Processor names atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 12848.2644 Rate (MB/s) > Number of MPI processes 4 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 15033.7687 Rate (MB/s) > Number of MPI processes 5 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 13299.3830 Rate (MB/s) > Number of MPI processes 6 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 14382.2116 Rate (MB/s) > Number of MPI processes 7 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 13194.2573 Rate (MB/s) > Number of MPI processes 8 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 14199.7255 Rate (MB/s) > Number of MPI processes 9 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 13045.8946 Rate (MB/s) > Number of MPI processes 10 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 13058.3283 Rate (MB/s) > Number of MPI processes 11 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 13037.3334 Rate (MB/s) > Number of MPI processes 12 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > Triad: 12526.6096 Rate (MB/s) > ------------------------------------------------ > np speedup > 1 1.0 > 2 1.33 > 3 1.17 > 4 1.36 > 5 1.21 > 6 1.3 > 7 1.2 > 8 1.29 > 9 1.18 > 10 1.18 > 11 1.18 > 12 1.14 > Estimation of possible speedup of MPI programs based on Streams benchmark. > It appears you have 1 node(s) > See graph in the file src/benchmarks/streams/scaling.png > > On Fri, May 5, 2017 at 11:26 PM, Matthew Knepley wrote: > On Fri, May 5, 2017 at 10:18 AM, Pham Pham wrote: > Hi Satish, > > It runs now, and shows a bad speed up: > Please help to improve this. > > http://www.mcs.anl.gov/petsc/documentation/faq.html#computers > > The short answer is: You cannot improve this without buying a different machine. 
This is > a fundamental algorithmic limitation that cannot be helped by threads, or vectorization, or > anything else. > > Matt > > Thank you. > > > ? > > On Fri, May 5, 2017 at 10:02 PM, Satish Balay wrote: > With Intel MPI - its best to use mpiexec.hydra [and not mpiexec] > > So you can do: > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt MPIEXEC=mpiexec.hydra test > > > [you can also specify --with-mpiexec=mpiexec.hydra at configure time] > > Satish > > > On Fri, 5 May 2017, Pham Pham wrote: > > > *Hi,* > > *I can configure now, but fail when testing:* > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt > > test Running test examples to verify correct installation > > Using PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 and > > PETSC_ARCH=arch-linux-cxx-opt > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI > > process > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > mpiexec_atlas7-c10: cannot connect to local mpd (/tmp/mpd2.console_mpepvs); > > possible causes: > > 1. no mpd is running on this host > > 2. an mpd is running but was started without a "console" (-n option) > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI > > processes > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > mpiexec_atlas7-c10: cannot connect to local mpd (/tmp/mpd2.console_mpepvs); > > possible causes: > > 1. no mpd is running on this host > > 2. an mpd is running but was started without a "console" (-n option) > > Possible error running Fortran example src/snes/examples/tutorials/ex5f > > with 1 MPI process > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > mpiexec_atlas7-c10: cannot connect to local mpd (/tmp/mpd2.console_mpepvs); > > possible causes: > > 1. no mpd is running on this host > > 2. an mpd is running but was started without a "console" (-n option) > > Completed test examples > > ========================================= > > Now to evaluate the computer systems you plan use - do: > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > PETSC_ARCH=arch-linux-cxx-opt streams > > > > > > > > > > *Please help on this.* > > *Many thanks!* > > > > > > On Thu, Apr 20, 2017 at 2:02 AM, Satish Balay wrote: > > > > > Sorry - should have mentioned: > > > > > > do 'rm -rf arch-linux-cxx-opt' and rerun configure again. > > > > > > The mpich install from previous build [that is currently in > > > arch-linux-cxx-opt/] > > > is conflicting with --with-mpi-dir=/app1/centos6.3/gnu/mvapich2-1.9/ > > > > > > Satish > > > > > > > > > On Wed, 19 Apr 2017, Pham Pham wrote: > > > > > > > I reconfigured PETSs with installed MPI, however, I got serous error: > > > > > > > > **************************ERROR************************************* > > > > Error during compile, check arch-linux-cxx-opt/lib/petsc/conf/make.log > > > > Send it and arch-linux-cxx-opt/lib/petsc/conf/configure.log to > > > > petsc-maint at mcs.anl.gov > > > > ******************************************************************** > > > > > > > > Please explain what is happening? > > > > > > > > Thank you very much. > > > > > > > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 11:43 PM, Satish Balay > > > wrote: > > > > > > > > > Presumably your cluster already has a recommended MPI to use [which is > > > > > already installed. 
So you should use that - instead of > > > > > --download-mpich=1 > > > > > > > > > > Satish > > > > > > > > > > On Wed, 19 Apr 2017, Pham Pham wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > I just installed petsc-3.7.5 into my university cluster. When > > > evaluating > > > > > > the computer system, PETSc reports "It appears you have 1 node(s)", I > > > > > donot > > > > > > understand this, since the system is a multinodes system. Could you > > > > > please > > > > > > explain this to me? > > > > > > > > > > > > Thank you very much. > > > > > > > > > > > > S. > > > > > > > > > > > > Output: > > > > > > ========================================= > > > > > > Now to evaluate the computer systems you plan use - do: > > > > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > > > PETSC_ARCH=arch-linux-cxx-opt streams > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > > PETSC_ARCH=arch-linux-cxx-opt > > > > > > streams > > > > > > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory > > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > > PETSC_ARCH=arch-linux-cxx-opt > > > > > > streams > > > > > > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx -o > > > > > > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing > > > > > > -Wno-unknown-pragmas -fvisibility=hidden -g -O > > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include > > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include > > > > > > `pwd`/MPIVersion.c > > > > > > Running streams with > > > > > > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec ' > > > > > using > > > > > > 'NPMAX=12' > > > > > > Number of MPI processes 1 Processor names atlas7-c10 > > > > > > Triad: 9137.5025 Rate (MB/s) > > > > > > Number of MPI processes 2 Processor names atlas7-c10 atlas7-c10 > > > > > > Triad: 9707.2815 Rate (MB/s) > > > > > > Number of MPI processes 3 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > Triad: 13559.5275 Rate (MB/s) > > > > > > Number of MPI processes 4 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > atlas7-c10 > > > > > > Triad: 14193.0597 Rate (MB/s) > > > > > > Number of MPI processes 5 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 > > > > > > Triad: 14492.9234 Rate (MB/s) > > > > > > Number of MPI processes 6 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > Triad: 15476.5912 Rate (MB/s) > > > > > > Number of MPI processes 7 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > Triad: 15148.7388 Rate (MB/s) > > > > > > Number of MPI processes 8 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > Triad: 15799.1290 Rate (MB/s) > > > > > > Number of MPI processes 9 Processor names atlas7-c10 atlas7-c10 > > > > > atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > Triad: 15671.3104 Rate (MB/s) > > > > > > Number of MPI processes 10 Processor names atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 > > > > > > Triad: 15601.4754 Rate (MB/s) > > > > > > Number of MPI processes 11 Processor names atlas7-c10 atlas7-c10 > > > > > > 
atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > Triad: 15434.5790 Rate (MB/s) > > > > > > Number of MPI processes 12 Processor names atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > Triad: 15134.1263 Rate (MB/s) > > > > > > ------------------------------------------------ > > > > > > np speedup > > > > > > 1 1.0 > > > > > > 2 1.06 > > > > > > 3 1.48 > > > > > > 4 1.55 > > > > > > 5 1.59 > > > > > > 6 1.69 > > > > > > 7 1.66 > > > > > > 8 1.73 > > > > > > 9 1.72 > > > > > > 10 1.71 > > > > > > 11 1.69 > > > > > > 12 1.66 > > > > > > Estimation of possible speedup of MPI programs based on Streams > > > > > benchmark. > > > > > > It appears you have 1 node(s) > > > > > > Unable to plot speedup to a file > > > > > > Unable to open matplotlib to plot speedup > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > From pvsang002 at gmail.com Mon May 22 18:00:09 2017 From: pvsang002 at gmail.com (Pham Pham) Date: Tue, 23 May 2017 07:00:09 +0800 Subject: [petsc-users] Installation question In-Reply-To: <1B05ACB6-6BDF-42C8-89FB-C5ECC5657934@mcs.anl.gov> References: <1B05ACB6-6BDF-42C8-89FB-C5ECC5657934@mcs.anl.gov> Message-ID: Hi Barry, My code using DMDA, the mesh is partitioned in x-direction only. Can I have MPI+OpenMP works in the following way: I want to create a new communicator which includes processes with Rank%12==0, PETSc objects will be created with this new sub-set of processes. In each node (which has 12 cores), the first core (Rank%12==0) does MPI communicating job (with Rank%12==0 process of other nodes), then commanded other 11 processes do computation works using openMP? Thank you. On Tue, May 23, 2017 at 1:58 AM, Barry Smith wrote: > > > On May 22, 2017, at 11:25 AM, Pham Pham wrote: > > > > Hi Matt, > > > > For the machine I have, Is it a good idea if I mix MPI and OpenMP: MPI > for cores with Rank%12==0 and OpenMP for the others ? > > > > MPI+OpenMP doesn't work this way. Each "rank" is an MPI process, you > cannot say some ranks are MPI and some are OpenMP. If you want to use one > MPI process per node and have each MPI process have 12 OpenMP threads you > need to find out for YOUR systems MPI how you tell it to put one MPI > process per node; > > Barry > > > Thank you, > > > > PVS. > > > > On Thu, May 11, 2017 at 8:27 PM, Matthew Knepley > wrote: > > On Thu, May 11, 2017 at 7:08 AM, Pham Pham wrote: > > Hi Matt, > > > > Thank you for the reply. > > > > I am using University HPC which has multiple nodes, and should be good > for parallel computing. The bad performance might be due to the way I > install and run PETSc... > > > > Looking at the output when running streams, I can see that the Processor > names were the same. > > Does that mean only one processor involved in computing, did it cause > the bad performance? > > > > Yes. 
From the data, it appears that the kind of processor you have has > 12 cores, but only enough memory bandwidth to support 1.5 cores. > > Try running the STREAMS with only 1 process per node. This is a setting > in your submission script, but it is different for every cluster. Thus > > I would ask the local sysdamin for this machine to help you do that. You > should see almost perfect scaling with that configuration. You might > > also try 2 processes per node to compare. > > > > Thanks, > > > > Matt > > > > Thank you very much. > > > > Ph. > > > > Below is testing output: > > > > [mpepvs at atlas5-c01 petsc-3.7.5]$ make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > PETSC_ARCH=arch-linux-cxx-opt streams > > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > PETSC_ARCH=arch-linux-cxx-opt streams > > /app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/bin/mpicxx -o > MPIVersion.o c -wd1572 -g -O3 -fPIC -I/home/svu/mpepvs/petsc/petsc-3.7.5/include > -I/hom > > e/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include > -I/app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/include > `pwd`/MPIVersion.c > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > +++++++++++++++++++++++++++++++ > > The version of PETSc you are using is out-of-date, we recommend updating > to the new release > > Available Version: 3.7.6 Installed Version: 3.7.5 > > http://www.mcs.anl.gov/petsc/download/index.html > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > +++++++++++++++++++++++++++++++ > > Running streams with 'mpiexec.hydra ' using 'NPMAX=12' > > Number of MPI processes 1 Processor names atlas5-c01 > > Triad: 11026.7604 Rate (MB/s) > > Number of MPI processes 2 Processor names atlas5-c01 atlas5-c01 > > Triad: 14669.6730 Rate (MB/s) > > Number of MPI processes 3 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 > > Triad: 12848.2644 Rate (MB/s) > > Number of MPI processes 4 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 > > Triad: 15033.7687 Rate (MB/s) > > Number of MPI processes 5 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 13299.3830 Rate (MB/s) > > Number of MPI processes 6 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 14382.2116 Rate (MB/s) > > Number of MPI processes 7 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 13194.2573 Rate (MB/s) > > Number of MPI processes 8 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 14199.7255 Rate (MB/s) > > Number of MPI processes 9 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 13045.8946 Rate (MB/s) > > Number of MPI processes 10 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 > > Triad: 13058.3283 Rate (MB/s) > > Number of MPI processes 11 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 13037.3334 Rate (MB/s) > > Number of MPI processes 12 Processor names atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 12526.6096 Rate (MB/s) > > ------------------------------------------------ > > np speedup 
> > 1 1.0 > > 2 1.33 > > 3 1.17 > > 4 1.36 > > 5 1.21 > > 6 1.3 > > 7 1.2 > > 8 1.29 > > 9 1.18 > > 10 1.18 > > 11 1.18 > > 12 1.14 > > Estimation of possible speedup of MPI programs based on Streams > benchmark. > > It appears you have 1 node(s) > > See graph in the file src/benchmarks/streams/scaling.png > > > > On Fri, May 5, 2017 at 11:26 PM, Matthew Knepley > wrote: > > On Fri, May 5, 2017 at 10:18 AM, Pham Pham wrote: > > Hi Satish, > > > > It runs now, and shows a bad speed up: > > Please help to improve this. > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html#computers > > > > The short answer is: You cannot improve this without buying a different > machine. This is > > a fundamental algorithmic limitation that cannot be helped by threads, > or vectorization, or > > anything else. > > > > Matt > > > > Thank you. > > > > > > ? > > > > On Fri, May 5, 2017 at 10:02 PM, Satish Balay wrote: > > With Intel MPI - its best to use mpiexec.hydra [and not mpiexec] > > > > So you can do: > > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > PETSC_ARCH=arch-linux-cxx-opt MPIEXEC=mpiexec.hydra test > > > > > > [you can also specify --with-mpiexec=mpiexec.hydra at configure time] > > > > Satish > > > > > > On Fri, 5 May 2017, Pham Pham wrote: > > > > > *Hi,* > > > *I can configure now, but fail when testing:* > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > PETSC_ARCH=arch-linux-cxx-opt > > > test Running test examples to verify correct installation > > > Using PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 and > > > PETSC_ARCH=arch-linux-cxx-opt > > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 > MPI > > > process > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > mpiexec_atlas7-c10: cannot connect to local mpd > (/tmp/mpd2.console_mpepvs); > > > possible causes: > > > 1. no mpd is running on this host > > > 2. an mpd is running but was started without a "console" (-n option) > > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 > MPI > > > processes > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > mpiexec_atlas7-c10: cannot connect to local mpd > (/tmp/mpd2.console_mpepvs); > > > possible causes: > > > 1. no mpd is running on this host > > > 2. an mpd is running but was started without a "console" (-n option) > > > Possible error running Fortran example src/snes/examples/tutorials/ > ex5f > > > with 1 MPI process > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > mpiexec_atlas7-c10: cannot connect to local mpd > (/tmp/mpd2.console_mpepvs); > > > possible causes: > > > 1. no mpd is running on this host > > > 2. an mpd is running but was started without a "console" (-n option) > > > Completed test examples > > > ========================================= > > > Now to evaluate the computer systems you plan use - do: > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > PETSC_ARCH=arch-linux-cxx-opt streams > > > > > > > > > > > > > > > *Please help on this.* > > > *Many thanks!* > > > > > > > > > On Thu, Apr 20, 2017 at 2:02 AM, Satish Balay > wrote: > > > > > > > Sorry - should have mentioned: > > > > > > > > do 'rm -rf arch-linux-cxx-opt' and rerun configure again. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon May 22 18:37:15 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 22 May 2017 18:37:15 -0500 Subject: [petsc-users] Installation question In-Reply-To: References: <1B05ACB6-6BDF-42C8-89FB-C5ECC5657934@mcs.anl.gov> Message-ID: <9E32967B-800D-4C73-93D6-9B0A3E78FBCB@mcs.anl.gov> > On May 22, 2017, at 6:00 PM, Pham Pham wrote: > > Hi Barry, > > My code using DMDA, the mesh is partitioned in x-direction only.
Can I have MPI+OpenMP works in the following way: > > I want to create a new communicator which includes processes with Rank%12==0, PETSc objects will be created with this new sub-set of processes. In each node (which has 12 cores), the first core (Rank%12==0) does MPI communicating job (with Rank%12==0 process of other nodes), then commanded other 11 processes do computation works using openMP? You cannot convert an MPI rank process into an OpenMP thread. You would just assign one MPI rank per node and have that one rank do 12 OpenMP threads. > > Thank you. > > On Tue, May 23, 2017 at 1:58 AM, Barry Smith wrote: > > > On May 22, 2017, at 11:25 AM, Pham Pham wrote: > > > > Hi Matt, > > > > For the machine I have, Is it a good idea if I mix MPI and OpenMP: MPI for cores with Rank%12==0 and OpenMP for the others ? > > > > MPI+OpenMP doesn't work this way. Each "rank" is an MPI process, you cannot say some ranks are MPI and some are OpenMP. If you want to use one MPI process per node and have each MPI process have 12 OpenMP threads you need to find out for YOUR systems MPI how you tell it to put one MPI process per node; > > Barry > > > Thank you, > > > > PVS. > > > > On Thu, May 11, 2017 at 8:27 PM, Matthew Knepley wrote: > > On Thu, May 11, 2017 at 7:08 AM, Pham Pham wrote: > > Hi Matt, > > > > Thank you for the reply. > > > > I am using University HPC which has multiple nodes, and should be good for parallel computing. The bad performance might be due to the way I install and run PETSc... > > > > Looking at the output when running streams, I can see that the Processor names were the same. > > Does that mean only one processor involved in computing, did it cause the bad performance? > > > > Yes. From the data, it appears that the kind of processor you have has 12 cores, but only enough memory bandwidth to support 1.5 cores. > > Try running the STREAMS with only 1 process per node. This is a setting in your submission script, but it is different for every cluster. Thus > > I would ask the local sysdamin for this machine to help you do that. You should see almost perfect scaling with that configuration. You might > > also try 2 processes per node to compare. > > > > Thanks, > > > > Matt > > > > Thank you very much. > > > > Ph. 
> > > > Below is testing output: > > > > [mpepvs at atlas5-c01 petsc-3.7.5]$ make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt streams > > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt streams > > /app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/bin/mpicxx -o MPIVersion.o c -wd1572 -g -O3 -fPIC -I/home/svu/mpepvs/petsc/petsc-3.7.5/include -I/hom e/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include -I/app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/include `pwd`/MPIVersion.c > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > The version of PETSc you are using is out-of-date, we recommend updating to the new release > > Available Version: 3.7.6 Installed Version: 3.7.5 > > http://www.mcs.anl.gov/petsc/download/index.html > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > Running streams with 'mpiexec.hydra ' using 'NPMAX=12' > > Number of MPI processes 1 Processor names atlas5-c01 > > Triad: 11026.7604 Rate (MB/s) > > Number of MPI processes 2 Processor names atlas5-c01 atlas5-c01 > > Triad: 14669.6730 Rate (MB/s) > > Number of MPI processes 3 Processor names atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 12848.2644 Rate (MB/s) > > Number of MPI processes 4 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 15033.7687 Rate (MB/s) > > Number of MPI processes 5 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 13299.3830 Rate (MB/s) > > Number of MPI processes 6 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 14382.2116 Rate (MB/s) > > Number of MPI processes 7 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 13194.2573 Rate (MB/s) > > Number of MPI processes 8 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 14199.7255 Rate (MB/s) > > Number of MPI processes 9 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 13045.8946 Rate (MB/s) > > Number of MPI processes 10 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 13058.3283 Rate (MB/s) > > Number of MPI processes 11 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 13037.3334 Rate (MB/s) > > Number of MPI processes 12 Processor names atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 > > Triad: 12526.6096 Rate (MB/s) > > ------------------------------------------------ > > np speedup > > 1 1.0 > > 2 1.33 > > 3 1.17 > > 4 1.36 > > 5 1.21 > > 6 1.3 > > 7 1.2 > > 8 1.29 > > 9 1.18 > > 10 1.18 > > 11 1.18 > > 12 1.14 > > Estimation of possible speedup of MPI programs based on Streams benchmark. > > It appears you have 1 node(s) > > See graph in the file src/benchmarks/streams/scaling.png > > > > On Fri, May 5, 2017 at 11:26 PM, Matthew Knepley wrote: > > On Fri, May 5, 2017 at 10:18 AM, Pham Pham wrote: > > Hi Satish, > > > > It runs now, and shows a bad speed up: > > Please help to improve this. 
> > > > http://www.mcs.anl.gov/petsc/documentation/faq.html#computers > > > > The short answer is: You cannot improve this without buying a different machine. This is > > a fundamental algorithmic limitation that cannot be helped by threads, or vectorization, or > > anything else. > > > > Matt > > > > Thank you. > > > > > > ? > > > > On Fri, May 5, 2017 at 10:02 PM, Satish Balay wrote: > > With Intel MPI - its best to use mpiexec.hydra [and not mpiexec] > > > > So you can do: > > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt MPIEXEC=mpiexec.hydra test > > > > > > [you can also specify --with-mpiexec=mpiexec.hydra at configure time] > > > > Satish > > > > > > On Fri, 5 May 2017, Pham Pham wrote: > > > > > *Hi,* > > > *I can configure now, but fail when testing:* > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt > > > test Running test examples to verify correct installation > > > Using PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 and > > > PETSC_ARCH=arch-linux-cxx-opt > > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI > > > process > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > mpiexec_atlas7-c10: cannot connect to local mpd (/tmp/mpd2.console_mpepvs); > > > possible causes: > > > 1. no mpd is running on this host > > > 2. an mpd is running but was started without a "console" (-n option) > > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 MPI > > > processes > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > mpiexec_atlas7-c10: cannot connect to local mpd (/tmp/mpd2.console_mpepvs); > > > possible causes: > > > 1. no mpd is running on this host > > > 2. an mpd is running but was started without a "console" (-n option) > > > Possible error running Fortran example src/snes/examples/tutorials/ex5f > > > with 1 MPI process > > > See http://www.mcs.anl.gov/petsc/documentation/faq.html > > > mpiexec_atlas7-c10: cannot connect to local mpd (/tmp/mpd2.console_mpepvs); > > > possible causes: > > > 1. no mpd is running on this host > > > 2. an mpd is running but was started without a "console" (-n option) > > > Completed test examples > > > ========================================= > > > Now to evaluate the computer systems you plan use - do: > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > PETSC_ARCH=arch-linux-cxx-opt streams > > > > > > > > > > > > > > > *Please help on this.* > > > *Many thanks!* > > > > > > > > > On Thu, Apr 20, 2017 at 2:02 AM, Satish Balay wrote: > > > > > > > Sorry - should have mentioned: > > > > > > > > do 'rm -rf arch-linux-cxx-opt' and rerun configure again. > > > > > > > > The mpich install from previous build [that is currently in > > > > arch-linux-cxx-opt/] > > > > is conflicting with --with-mpi-dir=/app1/centos6.3/gnu/mvapich2-1.9/ > > > > > > > > Satish > > > > > > > > > > > > On Wed, 19 Apr 2017, Pham Pham wrote: > > > > > > > > > I reconfigured PETSs with installed MPI, however, I got serous error: > > > > > > > > > > **************************ERROR************************************* > > > > > Error during compile, check arch-linux-cxx-opt/lib/petsc/conf/make.log > > > > > Send it and arch-linux-cxx-opt/lib/petsc/conf/configure.log to > > > > > petsc-maint at mcs.anl.gov > > > > > ******************************************************************** > > > > > > > > > > Please explain what is happening? > > > > > > > > > > Thank you very much. 
> > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 11:43 PM, Satish Balay > > > > wrote: > > > > > > > > > > > Presumably your cluster already has a recommended MPI to use [which is > > > > > > already installed. So you should use that - instead of > > > > > > --download-mpich=1 > > > > > > > > > > > > Satish > > > > > > > > > > > > On Wed, 19 Apr 2017, Pham Pham wrote: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > I just installed petsc-3.7.5 into my university cluster. When > > > > evaluating > > > > > > > the computer system, PETSc reports "It appears you have 1 node(s)", I > > > > > > donot > > > > > > > understand this, since the system is a multinodes system. Could you > > > > > > please > > > > > > > explain this to me? > > > > > > > > > > > > > > Thank you very much. > > > > > > > > > > > > > > S. > > > > > > > > > > > > > > Output: > > > > > > > ========================================= > > > > > > > Now to evaluate the computer systems you plan use - do: > > > > > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > > > > PETSC_ARCH=arch-linux-cxx-opt streams > > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > > > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > > > PETSC_ARCH=arch-linux-cxx-opt > > > > > > > streams > > > > > > > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory > > > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > > > > PETSC_ARCH=arch-linux-cxx-opt > > > > > > > streams > > > > > > > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx -o > > > > > > > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing > > > > > > > -Wno-unknown-pragmas -fvisibility=hidden -g -O > > > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include > > > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include > > > > > > > `pwd`/MPIVersion.c > > > > > > > Running streams with > > > > > > > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec ' > > > > > > using > > > > > > > 'NPMAX=12' > > > > > > > Number of MPI processes 1 Processor names atlas7-c10 > > > > > > > Triad: 9137.5025 Rate (MB/s) > > > > > > > Number of MPI processes 2 Processor names atlas7-c10 atlas7-c10 > > > > > > > Triad: 9707.2815 Rate (MB/s) > > > > > > > Number of MPI processes 3 Processor names atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 > > > > > > > Triad: 13559.5275 Rate (MB/s) > > > > > > > Number of MPI processes 4 Processor names atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 > > > > > > > atlas7-c10 > > > > > > > Triad: 14193.0597 Rate (MB/s) > > > > > > > Number of MPI processes 5 Processor names atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 > > > > > > > atlas7-c10 atlas7-c10 > > > > > > > Triad: 14492.9234 Rate (MB/s) > > > > > > > Number of MPI processes 6 Processor names atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 > > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > > Triad: 15476.5912 Rate (MB/s) > > > > > > > Number of MPI processes 7 Processor names atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 > > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > > Triad: 15148.7388 Rate (MB/s) > > > > > > > Number of MPI processes 8 Processor names atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 > > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > > Triad: 15799.1290 Rate (MB/s) > > > > > > > Number of MPI processes 9 Processor names atlas7-c10 atlas7-c10 > > > > > > atlas7-c10 > > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 
atlas7-c10 atlas7-c10 > > > > > > > Triad: 15671.3104 Rate (MB/s) > > > > > > > Number of MPI processes 10 Processor names atlas7-c10 atlas7-c10 > > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > > atlas7-c10 atlas7-c10 > > > > > > > Triad: 15601.4754 Rate (MB/s) > > > > > > > Number of MPI processes 11 Processor names atlas7-c10 atlas7-c10 > > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > > Triad: 15434.5790 Rate (MB/s) > > > > > > > Number of MPI processes 12 Processor names atlas7-c10 atlas7-c10 > > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > > > > > Triad: 15134.1263 Rate (MB/s) > > > > > > > ------------------------------------------------ > > > > > > > np speedup > > > > > > > 1 1.0 > > > > > > > 2 1.06 > > > > > > > 3 1.48 > > > > > > > 4 1.55 > > > > > > > 5 1.59 > > > > > > > 6 1.69 > > > > > > > 7 1.66 > > > > > > > 8 1.73 > > > > > > > 9 1.72 > > > > > > > 10 1.71 > > > > > > > 11 1.69 > > > > > > > 12 1.66 > > > > > > > Estimation of possible speedup of MPI programs based on Streams > > > > > > benchmark. > > > > > > > It appears you have 1 node(s) > > > > > > > Unable to plot speedup to a file > > > > > > > Unable to open matplotlib to plot speedup > > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > From knepley at gmail.com Mon May 22 18:38:12 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 22 May 2017 18:38:12 -0500 Subject: [petsc-users] Installation question In-Reply-To: References: <1B05ACB6-6BDF-42C8-89FB-C5ECC5657934@mcs.anl.gov> Message-ID: On Mon, May 22, 2017 at 6:00 PM, Pham Pham wrote: > Hi Barry, > > My code using DMDA, the mesh is partitioned in x-direction only. Can I > have MPI+OpenMP works in the following way: > > I want to create a new communicator which includes processes with > Rank%12==0, PETSc objects will be created with this new sub-set of > processes. In each node (which has 12 cores), the first core (Rank%12==0) > does MPI communicating job (with Rank%12==0 process of other nodes), then > commanded other 11 processes do computation works using openMP? > But this is not a sensible thing. The performance here is not dependent on processing, its dependent on memory bandwidth, but you do not increase that with more threads. In addition, the overhead of threads here is just as big or bigger than processes, so you would be better off just running that many MPI processes. Matt > Thank you. > > On Tue, May 23, 2017 at 1:58 AM, Barry Smith wrote: > >> >> > On May 22, 2017, at 11:25 AM, Pham Pham wrote: >> > >> > Hi Matt, >> > >> > For the machine I have, Is it a good idea if I mix MPI and OpenMP: MPI >> for cores with Rank%12==0 and OpenMP for the others ? >> > >> >> MPI+OpenMP doesn't work this way. Each "rank" is an MPI process, you >> cannot say some ranks are MPI and some are OpenMP. 
If you want to use one >> MPI process per node and have each MPI process have 12 OpenMP threads you >> need to find out for YOUR systems MPI how you tell it to put one MPI >> process per node; >> >> Barry >> >> > Thank you, >> > >> > PVS. >> > >> > On Thu, May 11, 2017 at 8:27 PM, Matthew Knepley >> wrote: >> > On Thu, May 11, 2017 at 7:08 AM, Pham Pham wrote: >> > Hi Matt, >> > >> > Thank you for the reply. >> > >> > I am using University HPC which has multiple nodes, and should be good >> for parallel computing. The bad performance might be due to the way I >> install and run PETSc... >> > >> > Looking at the output when running streams, I can see that the >> Processor names were the same. >> > Does that mean only one processor involved in computing, did it cause >> the bad performance? >> > >> > Yes. From the data, it appears that the kind of processor you have has >> 12 cores, but only enough memory bandwidth to support 1.5 cores. >> > Try running the STREAMS with only 1 process per node. This is a setting >> in your submission script, but it is different for every cluster. Thus >> > I would ask the local sysdamin for this machine to help you do that. >> You should see almost perfect scaling with that configuration. You might >> > also try 2 processes per node to compare. >> > >> > Thanks, >> > >> > Matt >> > >> > Thank you very much. >> > >> > Ph. >> > >> > Below is testing output: >> > >> > [mpepvs at atlas5-c01 petsc-3.7.5]$ make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> PETSC_ARCH=arch-linux-cxx-opt streams >> > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory >> PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> PETSC_ARCH=arch-linux-cxx-opt streams >> > /app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/bin/mpicxx -o >> MPIVersion.o c -wd1572 -g -O3 -fPIC -I/home/svu/mpepvs/petsc/petsc-3.7.5/include >> -I/hom >> >> e/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include >> -I/app1/centos6.3/Intel/xe_2015/impi/5.0.3.048/intel64/include >> `pwd`/MPIVersion.c >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> +++++++++++++++++++++++++++++++ >> > The version of PETSc you are using is out-of-date, we recommend >> updating to the new release >> > Available Version: 3.7.6 Installed Version: 3.7.5 >> > http://www.mcs.anl.gov/petsc/download/index.html >> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> +++++++++++++++++++++++++++++++ >> > Running streams with 'mpiexec.hydra ' using 'NPMAX=12' >> > Number of MPI processes 1 Processor names atlas5-c01 >> > Triad: 11026.7604 Rate (MB/s) >> > Number of MPI processes 2 Processor names atlas5-c01 atlas5-c01 >> > Triad: 14669.6730 Rate (MB/s) >> > Number of MPI processes 3 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 >> > Triad: 12848.2644 Rate (MB/s) >> > Number of MPI processes 4 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 >> > Triad: 15033.7687 Rate (MB/s) >> > Number of MPI processes 5 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 >> > Triad: 13299.3830 Rate (MB/s) >> > Number of MPI processes 6 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> > Triad: 14382.2116 Rate (MB/s) >> > Number of MPI processes 7 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> > Triad: 13194.2573 Rate (MB/s) >> > Number of MPI processes 8 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> > Triad: 14199.7255 
Rate (MB/s) >> > Number of MPI processes 9 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> > Triad: 13045.8946 Rate (MB/s) >> > Number of MPI processes 10 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 >> > Triad: 13058.3283 Rate (MB/s) >> > Number of MPI processes 11 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 >> > Triad: 13037.3334 Rate (MB/s) >> > Number of MPI processes 12 Processor names atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> atlas5-c01 atlas5-c01 atlas5-c01 atlas5-c01 >> > Triad: 12526.6096 Rate (MB/s) >> > ------------------------------------------------ >> > np speedup >> > 1 1.0 >> > 2 1.33 >> > 3 1.17 >> > 4 1.36 >> > 5 1.21 >> > 6 1.3 >> > 7 1.2 >> > 8 1.29 >> > 9 1.18 >> > 10 1.18 >> > 11 1.18 >> > 12 1.14 >> > Estimation of possible speedup of MPI programs based on Streams >> benchmark. >> > It appears you have 1 node(s) >> > See graph in the file src/benchmarks/streams/scaling.png >> > >> > On Fri, May 5, 2017 at 11:26 PM, Matthew Knepley >> wrote: >> > On Fri, May 5, 2017 at 10:18 AM, Pham Pham wrote: >> > Hi Satish, >> > >> > It runs now, and shows a bad speed up: >> > Please help to improve this. >> > >> > http://www.mcs.anl.gov/petsc/documentation/faq.html#computers >> > >> > The short answer is: You cannot improve this without buying a different >> machine. This is >> > a fundamental algorithmic limitation that cannot be helped by threads, >> or vectorization, or >> > anything else. >> > >> > Matt >> > >> > Thank you. >> > >> > >> > ? >> > >> > On Fri, May 5, 2017 at 10:02 PM, Satish Balay >> wrote: >> > With Intel MPI - its best to use mpiexec.hydra [and not mpiexec] >> > >> > So you can do: >> > >> > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> PETSC_ARCH=arch-linux-cxx-opt MPIEXEC=mpiexec.hydra test >> > >> > >> > [you can also specify --with-mpiexec=mpiexec.hydra at configure time] >> > >> > Satish >> > >> > >> > On Fri, 5 May 2017, Pham Pham wrote: >> > >> > > *Hi,* >> > > *I can configure now, but fail when testing:* >> > > >> > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make >> > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> PETSC_ARCH=arch-linux-cxx-opt >> > > test Running test examples to verify correct installation >> > > Using PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 and >> > > PETSC_ARCH=arch-linux-cxx-opt >> > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 >> MPI >> > > process >> > > See http://www.mcs.anl.gov/petsc/documentation/faq.html >> > > mpiexec_atlas7-c10: cannot connect to local mpd >> (/tmp/mpd2.console_mpepvs); >> > > possible causes: >> > > 1. no mpd is running on this host >> > > 2. an mpd is running but was started without a "console" (-n option) >> > > Possible error running C/C++ src/snes/examples/tutorials/ex19 with 2 >> MPI >> > > processes >> > > See http://www.mcs.anl.gov/petsc/documentation/faq.html >> > > mpiexec_atlas7-c10: cannot connect to local mpd >> (/tmp/mpd2.console_mpepvs); >> > > possible causes: >> > > 1. no mpd is running on this host >> > > 2. 
an mpd is running but was started without a "console" (-n option) >> > > Possible error running Fortran example src/snes/examples/tutorials/ex >> 5f >> > > with 1 MPI process >> > > See http://www.mcs.anl.gov/petsc/documentation/faq.html >> > > mpiexec_atlas7-c10: cannot connect to local mpd >> (/tmp/mpd2.console_mpepvs); >> > > possible causes: >> > > 1. no mpd is running on this host >> > > 2. an mpd is running but was started without a "console" (-n option) >> > > Completed test examples >> > > ========================================= >> > > Now to evaluate the computer systems you plan use - do: >> > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> > > PETSC_ARCH=arch-linux-cxx-opt streams >> > > >> > > >> > > >> > > >> > > *Please help on this.* >> > > *Many thanks!* >> > > >> > > >> > > On Thu, Apr 20, 2017 at 2:02 AM, Satish Balay >> wrote: >> > > >> > > > Sorry - should have mentioned: >> > > > >> > > > do 'rm -rf arch-linux-cxx-opt' and rerun configure again. >> > > > >> > > > The mpich install from previous build [that is currently in >> > > > arch-linux-cxx-opt/] >> > > > is conflicting with --with-mpi-dir=/app1/centos6.3 >> /gnu/mvapich2-1.9/ >> > > > >> > > > Satish >> > > > >> > > > >> > > > On Wed, 19 Apr 2017, Pham Pham wrote: >> > > > >> > > > > I reconfigured PETSs with installed MPI, however, I got serous >> error: >> > > > > >> > > > > **************************ERROR***************************** >> ******** >> > > > > Error during compile, check arch-linux-cxx-opt/lib/petsc/c >> onf/make.log >> > > > > Send it and arch-linux-cxx-opt/lib/petsc/conf/configure.log to >> > > > > petsc-maint at mcs.anl.gov >> > > > > ************************************************************ >> ******** >> > > > > >> > > > > Please explain what is happening? >> > > > > >> > > > > Thank you very much. >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > On Wed, Apr 19, 2017 at 11:43 PM, Satish Balay > > >> > > > wrote: >> > > > > >> > > > > > Presumably your cluster already has a recommended MPI to use >> [which is >> > > > > > already installed. So you should use that - instead of >> > > > > > --download-mpich=1 >> > > > > > >> > > > > > Satish >> > > > > > >> > > > > > On Wed, 19 Apr 2017, Pham Pham wrote: >> > > > > > >> > > > > > > Hi, >> > > > > > > >> > > > > > > I just installed petsc-3.7.5 into my university cluster. When >> > > > evaluating >> > > > > > > the computer system, PETSc reports "It appears you have 1 >> node(s)", I >> > > > > > donot >> > > > > > > understand this, since the system is a multinodes system. >> Could you >> > > > > > please >> > > > > > > explain this to me? >> > > > > > > >> > > > > > > Thank you very much. >> > > > > > > >> > > > > > > S. 
>> > > > > > > >> > > > > > > Output: >> > > > > > > ========================================= >> > > > > > > Now to evaluate the computer systems you plan use - do: >> > > > > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> > > > > > > PETSC_ARCH=arch-linux-cxx-opt streams >> > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make >> > > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> > > > > > PETSC_ARCH=arch-linux-cxx-opt >> > > > > > > streams >> > > > > > > cd src/benchmarks/streams; /usr/bin/gmake >> --no-print-directory >> > > > > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 >> > > > > > PETSC_ARCH=arch-linux-cxx-opt >> > > > > > > streams >> > > > > > > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx >> -o >> > > > > > > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing >> > > > > > > -Wno-unknown-pragmas -fvisibility=hidden -g -O >> > > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include >> > > > > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/ >> include >> > > > > > > `pwd`/MPIVersion.c >> > > > > > > Running streams with >> > > > > > > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec >> ' >> > > > > > using >> > > > > > > 'NPMAX=12' >> > > > > > > Number of MPI processes 1 Processor names atlas7-c10 >> > > > > > > Triad: 9137.5025 Rate (MB/s) >> > > > > > > Number of MPI processes 2 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > > Triad: 9707.2815 Rate (MB/s) >> > > > > > > Number of MPI processes 3 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 >> > > > > > > Triad: 13559.5275 Rate (MB/s) >> > > > > > > Number of MPI processes 4 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 >> > > > > > > atlas7-c10 >> > > > > > > Triad: 14193.0597 Rate (MB/s) >> > > > > > > Number of MPI processes 5 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 >> > > > > > > atlas7-c10 atlas7-c10 >> > > > > > > Triad: 14492.9234 Rate (MB/s) >> > > > > > > Number of MPI processes 6 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 >> > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 >> > > > > > > Triad: 15476.5912 Rate (MB/s) >> > > > > > > Number of MPI processes 7 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 >> > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> > > > > > > Triad: 15148.7388 Rate (MB/s) >> > > > > > > Number of MPI processes 8 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 >> > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> > > > > > > Triad: 15799.1290 Rate (MB/s) >> > > > > > > Number of MPI processes 9 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > atlas7-c10 >> > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> atlas7-c10 >> > > > > > > Triad: 15671.3104 Rate (MB/s) >> > > > > > > Number of MPI processes 10 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> atlas7-c10 >> > > > > > > atlas7-c10 atlas7-c10 >> > > > > > > Triad: 15601.4754 Rate (MB/s) >> > > > > > > Number of MPI processes 11 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> atlas7-c10 >> > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 >> > > > > > > Triad: 15434.5790 Rate (MB/s) >> > > > > > > Number of MPI processes 12 Processor names atlas7-c10 >> atlas7-c10 >> > > > > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> atlas7-c10 >> > > > > > > 
atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 >> > > > > > > Triad: 15134.1263 Rate (MB/s) >> > > > > > > ------------------------------------------------ >> > > > > > > np speedup >> > > > > > > 1 1.0 >> > > > > > > 2 1.06 >> > > > > > > 3 1.48 >> > > > > > > 4 1.55 >> > > > > > > 5 1.59 >> > > > > > > 6 1.69 >> > > > > > > 7 1.66 >> > > > > > > 8 1.73 >> > > > > > > 9 1.72 >> > > > > > > 10 1.71 >> > > > > > > 11 1.69 >> > > > > > > 12 1.66 >> > > > > > > Estimation of possible speedup of MPI programs based on >> Streams >> > > > > > benchmark. >> > > > > > > It appears you have 1 node(s) >> > > > > > > Unable to plot speedup to a file >> > > > > > > Unable to open matplotlib to plot speedup >> > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ >> > > > > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ >> > > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > >> > > > >> > > >> > >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Tue May 23 04:41:32 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Tue, 23 May 2017 11:41:32 +0200 (CEST) Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? In-Reply-To: References: <867421313.6757137.1495383596545.JavaMail.zimbra@inria.fr> <1253777447.6757298.1495383780337.JavaMail.zimbra@inria.fr> Message-ID: <2033509705.7414108.1495532492501.JavaMail.zimbra@inria.fr> I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= diagonal with 1.). Each local matrix correspond to one domain (each domain is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 domains). This is the simplest possible example: I have two 2x2 (local) diag matrix that overlap so that the global matrix built from them is 1, 2, 1 on the diagonal (local contributions add up in the middle). Now, I need for each MPI proc to get the assembled local matrix (sometimes called the dirichlet matrix) : this is a local matrix (sequential - not distributed with MPI) that accounts for contribution of neighboring domains (MPI proc). How to get the local assembled matrix ? MatGetLocalSubMatrix does not work (throw error - see example attached). MatGetSubMatrix returns a MPI distributed matrix, not a local (sequential) one. 1. My understanding is that MatISGetMPIXAIJ should return a local matrix (sequential AIJ matrix) : the MPI in the name recall that you get the assembled matrix (with contributions from the shared border) from the other MPI processus. Correct ? In my simple example, I replaced MatGetLocalSubMatrix with MatISGetMPIXAIJ : I get a deadlock which was surprising to me... Is MatISGetMPIXAIJ a collective call ? 2. 
Supposing this is a collective call (and that point 1 is not correct), I ride up MatISGetMPIXAIJ before the "if (rank > 0)" : I don't deadlock now, but it seems I get a global matrix which is not the assembled local matrix I am looking for. 3. I am supposed to destroy the matrix returned by MatISGetMPIXAIJ ? (I believe yes - not sure as AFAIU wording should associate Destroy methods to Create methods) Franck The git diff illustrate modifications I tried to add to the initial file attached to this thread: --- a/matISLocalMat.cpp +++ b/matISLocalMat.cpp @@ -31,6 +31,8 @@ int main(int argc,char **argv) { MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); MatView(A, PETSC_VIEWER_STDOUT_WORLD); PetscViewerFlush(PETSC_VIEWER_STDOUT_WORLD); // Diag: 1, 2, 1 + Mat assembledLocalMat; + MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &assembledLocalMat); if (rank > 0) { // Do not pollute stdout: print only 1 proc std::cout << std::endl << "non assembled local matrix:" << std::endl << std::endl; Mat nonAssembledLocalMat; @@ -38,11 +40,10 @@ int main(int argc,char **argv) { MatView(nonAssembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Diag: 1, 1 std::cout << std::endl << "assembled local matrix:" << std::endl << std::endl; - Mat assembledLocalMat; - IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, PETSC_COPY_VALUES, &is); - MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... - MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Would like to get => Diag: 2, 1 + //IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, PETSC_COPY_VALUES, &is); + //MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... } + MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_WORLD); // Would like to get => Diag: 2, 1 ----- Mail original ----- > De: "Stefano Zampini" > ?: "petsc-maint" > Cc: "petsc-dev" , "PETSc users list" > , "Franck Houssen" > Envoy?: Dimanche 21 Mai 2017 22:51:34 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= one > domain) before and after assembly ? > To assemble the operator in aij format, use > MatISGetMPIXAIJ > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatISGetMPIXAIJ.html > Il 21 Mag 2017 18:43, "Matthew Knepley" < knepley at gmail.com > ha scritto: > > On Sun, May 21, 2017 at 11:23 AM, Franck Houssen < franck.houssen at inria.fr > > > > > wrote: > > > > I have a 3x3 global matrix is built (diag: 1, 2, 1): it's made of 2 > > > overlapping 2x2 local matrix (diag: 1, 1). > > > > > > Getting non assembled local matrix is OK with MatISGetLocalMat. > > > > > > How to get assembled local matrix (initial local matrix + neigbhor > > > contributions on the borders) ? (expected result is diag: 2, 1) > > > > > You can always use > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrix.html > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html > > > to get copies, but if you just want to build things, you can use > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html > > > Thanks, > > > Matt > > > > Franck > > > > > -- > > > What most experimenters take for granted before they begin their > > experiments > > is infinitely more interesting than any results to which their experiments > > lead. > > > -- Norbert Wiener > > > http://www.caam.rice.edu/~mk51/ > -------------- next part -------------- An HTML attachment was scrubbed... 
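A minimal sketch of the approach suggested in the replies further down this thread: assemble the MatIS into AIJ format with MatISGetMPIXAIJ, then extract each rank's rows and columns as a sequential matrix; here the PETSc 3.7 routine MatGetSubMatrices (linked in the quoted reply above) is used for the extraction. This is illustrative only, not the attached matISLocalMat.cpp: it assumes the PETSc 3.7 calling sequences, omits error checking, and reuses A, localSize and localIdx from the example above (size 2, global indices {0,1} on rank 0 and {1,2} on rank 1). Both routines are collective, so every rank calls them, each requesting its own subdomain indices:

    Mat Aglobal;                     /* assembled (MPI)AIJ version of the MatIS A             */
    MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &Aglobal);

    IS  is;                          /* this rank's subdomain, in global numbering            */
    Mat *Alocal;                     /* array holding one sequential, assembled submatrix     */
    ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, PETSC_COPY_VALUES, &is);
    MatGetSubMatrices(Aglobal, 1, &is, &is, MAT_INITIAL_MATRIX, &Alocal);

    MatView(Alocal[0], PETSC_VIEWER_STDOUT_SELF);   /* rank 1 would see diag 2, 1             */

    MatDestroyMatrices(1, &Alocal);
    ISDestroy(&is);
    MatDestroy(&Aglobal);            /* the matrix returned by MatISGetMPIXAIJ is caller-owned */

The deadlock reported above is consistent with the collective MatISGetMPIXAIJ being reached only inside the "if (rank > 0)" branch.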
URL: From franck.houssen at inria.fr Tue May 23 04:53:18 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Tue, 23 May 2017 11:53:18 +0200 (CEST) Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> Message-ID: <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> The first thing I did was to put 3, not 4 : I got an error thrown in MatCreateIS (see the git diff + stack below). As the error said I used globalSize = numberOfMPIProcessus * localSize : my understanding is that, when using MatIS, the global size needs to be the sum of all local sizes. Correct ? I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= diagonal with 1.). Each local matrix correspond to one domain (each domain is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 domains). This is the simplest possible example: I have two 2x2 (local) diag matrix that overlap so that the global matrix built from them is 1, 2, 1 on the diagonal (local contributions add up in the middle). I need to MatMult this global matrix with a global vector filled with 1. Franck Git diff : --- a/matISLocalMat.cpp +++ b/matISLocalMat.cpp @@ -16,7 +16,7 @@ int main(int argc,char **argv) { int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) return 1; int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; + PetscInt localSize = 2, globalSize = 3; PetscInt localIdx[2] = {0, 0}; if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} else {localIdx[0] = 1; localIdx[1] = 2;} Stack error: [0]PETSC ERROR: Nonconforming object sizes [0]PETSC ERROR: Sum of local lengths 4 does not equal global length 3, my local length 2 [0]PETSC ERROR: [0] ISG2LMapApply line 17 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/isltog.c [0]PETSC ERROR: [0] MatSetValues_IS line 692 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c [0]PETSC ERROR: [0] MatSetValues line 1157 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c [0]PETSC ERROR: [0] MatISSetPreallocation line 80 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c [0]PETSC ERROR: [0] PetscSplitOwnership line 80 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/sys/utils/psplit.c [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c [0]PETSC ERROR: [0] MatCreateIS line 986 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c ----- Mail original ----- > De: "Stefano Zampini" > ?: "Matthew Knepley" > Cc: "Franck Houssen" , "PETSc" > , "PETSc" > Envoy?: Dimanche 21 Mai 2017 23:02:37 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix > and a global vector ? > Franck, > PETSc takes care of doing the matrix-vector multiplication properly using > MatIS. 
As Matt said, the layout of the vectors is the usual parallel layout. > The local sizes of the MatIS matrix (i.e. the local size of the left and > right vectors used in MatMult) are not the sizes of the local subdomain > matrices in MatIS. > > On May 21, 2017, at 6:47 PM, Matthew Knepley < knepley at gmail.com > wrote: > > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < franck.houssen at inria.fr > > > > > wrote: > > > > Using PETSc MatIS, how to matmult a global IS matrix and a global vector > > > ? > > > Example is attached : I don't get what I expect that is a vector such > > > that > > > proc0 = [1, 2] and proc1 = [2, 1] > > > > > 1) I think the global size of your matrix is wrong. You seem to want 3, not > > 4 > > > 2) Global vectors have a non-overlapping row partition. You might be > > thinking > > of local vectors > > > Thanks, > > > Matt > > > -- > > > What most experimenters take for granted before they begin their > > experiments > > is infinitely more interesting than any results to which their experiments > > lead. > > > -- Norbert Wiener > > > http://www.caam.rice.edu/~mk51/ > ----- Mail original ----- > De: "Stefano Zampini" > ?: "Matthew Knepley" > Cc: "Franck Houssen" , "PETSc" > , "PETSc" > Envoy?: Dimanche 21 Mai 2017 23:02:37 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix > and a global vector ? > Franck, > PETSc takes care of doing the matrix-vector multiplication properly using > MatIS. As Matt said, the layout of the vectors is the usual parallel layout. > The local sizes of the MatIS matrix (i.e. the local size of the left and > right vectors used in MatMult) are not the sizes of the local subdomain > matrices in MatIS. > > On May 21, 2017, at 6:47 PM, Matthew Knepley < knepley at gmail.com > wrote: > > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < franck.houssen at inria.fr > > > > > wrote: > > > > Using PETSc MatIS, how to matmult a global IS matrix and a global vector > > > ? > > > Example is attached : I don't get what I expect that is a vector such > > > that > > > proc0 = [1, 2] and proc1 = [2, 1] > > > > > 1) I think the global size of your matrix is wrong. You seem to want 3, not > > 4 > > > 2) Global vectors have a non-overlapping row partition. You might be > > thinking > > of local vectors > > > Thanks, > > > Matt > > > > Franck > > > > > -- > > > What most experimenters take for granted before they begin their > > experiments > > is infinitely more interesting than any results to which their experiments > > lead. > > > -- Norbert Wiener > > > http://www.caam.rice.edu/~mk51/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Tue May 23 06:16:18 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 23 May 2017 13:16:18 +0200 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? In-Reply-To: <2033509705.7414108.1495532492501.JavaMail.zimbra@inria.fr> References: <867421313.6757137.1495383596545.JavaMail.zimbra@inria.fr> <1253777447.6757298.1495383780337.JavaMail.zimbra@inria.fr> <2033509705.7414108.1495532492501.JavaMail.zimbra@inria.fr> Message-ID: MatISGetMPIXAIJ is collective, as it assembles the global operator. To get the matrices you are looking for, you should call MatCreateSubMatrix on the assembled global operator, with the global indices representing the subdomain problem. 
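A minimal sketch of how the sizes fit together for the 3x3, two-rank example above, following the replies in this thread (PETSc 3.7 calling sequences, error checking omitted; rank is the MPI rank; this is not the attached test). The overlapping subdomain size (2) is carried only by the local-to-global map, while the matrix and vector local sizes follow the usual non-overlapping parallel layout, so PETSC_DECIDE can be passed to MatCreateIS:

    PetscInt idx[2];
    if (rank == 0) { idx[0] = 0; idx[1] = 1; }   /* subdomain 0 -> global rows {0,1} */
    else           { idx[0] = 1; idx[1] = 2; }   /* subdomain 1 -> global rows {1,2} */

    ISLocalToGlobalMapping map;
    ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, idx, PETSC_COPY_VALUES, &map);

    Mat A;   /* 3 global rows and columns; local sizes left to PETSc (2 and 1 here)  */
    MatCreateIS(PETSC_COMM_WORLD, 1, PETSC_DECIDE, PETSC_DECIDE, 3, 3, map, &A);
    MatISSetPreallocation(A, 2, NULL, 2, NULL);  /* generous preallocation of the local 2x2 blocks */

    PetscInt    r[2] = {0, 1};                   /* local (subdomain) indices */
    PetscScalar v[4] = {1.0, 0.0, 0.0, 1.0};     /* local 2x2 identity        */
    MatSetValuesLocal(A, 2, r, 2, r, v, ADD_VALUES);
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    Vec x, y;
    MatCreateVecs(A, &x, &y);   /* usual parallel layout: 3 global entries, no overlap  */
    VecSet(x, 1.0);
    MatMult(A, x, y);           /* y = (1, 2, 1), distributed 2/1 over the two ranks    */

As with an assembled AIJ matrix, the two subdomain contributions are summed on the shared global row 1, which is why the middle entry of y is 2.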
Each process needs to call both functions Stefano Il 23 Mag 2017 11:41, "Franck Houssen" ha scritto: > I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= > diagonal with 1.). Each local matrix correspond to one domain (each domain > is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 > domains). > This is the simplest possible example: I have two 2x2 (local) diag matrix > that overlap so that the global matrix built from them is 1, 2, 1 on the > diagonal (local contributions add up in the middle). > > Now, I need for each MPI proc to get the assembled local matrix (sometimes > called the dirichlet matrix) : this is a local matrix (sequential - not > distributed with MPI) that accounts for contribution of neighboring domains > (MPI proc). > > How to get the local assembled matrix ? MatGetLocalSubMatrix does not work > (throw error - see example attached). MatGetSubMatrix returns a MPI > distributed matrix, not a local (sequential) one. > > 1. My understanding is that MatISGetMPIXAIJ should return a local > matrix (sequential AIJ matrix) : the MPI in the name recall that you get > the assembled matrix (with contributions from the shared border) from the > other MPI processus. Correct ? In my simple example, I replaced > MatGetLocalSubMatrix with MatISGetMPIXAIJ : I get a deadlock which was > surprising to me... Is MatISGetMPIXAIJ a collective call ? > 2. Supposing this is a collective call (and that point 1 is not > correct), I ride up MatISGetMPIXAIJ before the "if (rank > 0)" : I don't > deadlock now, but it seems I get a global matrix which is not the assembled > local matrix I am looking for. > 3. I am supposed to destroy the matrix returned by MatISGetMPIXAIJ ? > (I believe yes - not sure as AFAIU wording should associate Destroy methods > to Create methods) > > Franck > > The git diff illustrate modifications I tried to add to the initial file > attached to this thread: > --- a/matISLocalMat.cpp > +++ b/matISLocalMat.cpp > @@ -31,6 +31,8 @@ int main(int argc,char **argv) { > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, > MAT_FINAL_ASSEMBLY); > MatView(A, PETSC_VIEWER_STDOUT_WORLD); PetscViewerFlush(PETSC_VIEWER_STDOUT_WORLD); > // Diag: 1, 2, 1 > > + Mat assembledLocalMat; > + MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &assembledLocalMat); > if (rank > 0) { // Do not pollute stdout: print only 1 proc > std::cout << std::endl << "non assembled local matrix:" << std::endl > << std::endl; > Mat nonAssembledLocalMat; > @@ -38,11 +40,10 @@ int main(int argc,char **argv) { > MatView(nonAssembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Diag: 1, 1 > > std::cout << std::endl << "assembled local matrix:" << std::endl << > std::endl; > - Mat assembledLocalMat; > - IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, > PETSC_COPY_VALUES, &is); > - MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... > - MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Would like > to get => Diag: 2, 1 > + //IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, > PETSC_COPY_VALUES, &is); > + //MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... 
> } > + MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_WORLD); // Would like to > get => Diag: 2, 1 > > > ------------------------------ > > *De: *"Stefano Zampini" > *?: *"petsc-maint" > *Cc: *"petsc-dev" , "PETSc users list" < > petsc-users at mcs.anl.gov>, "Franck Houssen" > *Envoy?: *Dimanche 21 Mai 2017 22:51:34 > *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= > one domain) before and after assembly ? > > To assemble the operator in aij format, use > MatISGetMPIXAIJ > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/ > MatISGetMPIXAIJ.html > > Il 21 Mag 2017 18:43, "Matthew Knepley" ha scritto: > >> On Sun, May 21, 2017 at 11:23 AM, Franck Houssen > > wrote: >> >>> I have a 3x3 global matrix is built (diag: 1, 2, 1): it's made of 2 >>> overlapping 2x2 local matrix (diag: 1, 1). >>> Getting non assembled local matrix is OK with MatISGetLocalMat. >>> How to get assembled local matrix (initial local matrix + neigbhor >>> contributions on the borders) ? (expected result is diag: 2, 1) >>> >> >> You can always use >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/ >> MatGetSubMatrix.html >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/ >> MatGetSubMatrices.html >> >> to get copies, but if you just want to build things, you can use >> >> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/ >> MatGetLocalSubMatrix.html >> >> Thanks, >> >> Matt >> >> >>> Franck >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 23 06:21:21 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 23 May 2017 06:21:21 -0500 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> Message-ID: On Tue, May 23, 2017 at 4:53 AM, Franck Houssen wrote: > The first thing I did was to put 3, not 4 : I got an error thrown in > MatCreateIS (see the git diff + stack below). As the error said I used > globalSize = numberOfMPIProcessus * localSize : my understanding is that, > when using MatIS, the global size needs to be the sum of all local sizes. > Correct ? > No. MatIS means that the matrix is not assembled. The easiest way (for me) to think of this is that processes do not have to hold full rows. One process can hold part of row i, and another processes can hold another part. However, there are still the same number of global rows. > I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= > diagonal with 1.). Each local matrix correspond to one domain (each domain > is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 > domains). > So the global size is 3. The local size here is not the size of the local IS block, since that is a property only of MatIS. It is the size of the local piece of the vector you multiply. This allows PETSc to understand the parallel layout of the Vec, and how it matched the Mat. 
This is somewhat confusing because FEM people mean something different by "local" than we do here, and in fact we use this other definition of local when assembling operators. Matt > This is the simplest possible example: I have two 2x2 (local) diag matrix > that overlap so that the global matrix built from them is 1, 2, 1 on the > diagonal (local contributions add up in the middle). > I need to MatMult this global matrix with a global vector filled with 1. > > Franck > > Git diff : > > --- a/matISLocalMat.cpp > +++ b/matISLocalMat.cpp > @@ -16,7 +16,7 @@ int main(int argc,char **argv) { > int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) > return 1; > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; > + PetscInt localSize = 2, globalSize = 3; > PetscInt localIdx[2] = {0, 0}; > if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > else {localIdx[0] = 1; localIdx[1] = 2;} > > > > Stack error: > > [0]PETSC ERROR: Nonconforming object sizes > [0]PETSC ERROR: Sum of local lengths 4 does not equal global length 3, my > local length 2 > [0]PETSC ERROR: [0] ISG2LMapApply line 17 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/vec/is/utils/isltog.c > [0]PETSC ERROR: [0] MatSetValues_IS line 692 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] MatSetValues line 1157 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] MatISSetPreallocation line 80 > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] PetscSplitOwnership line 80 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/sys/utils/psplit.c > [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [0]PETSC ERROR: [0] MatCreateIS line 986 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > ------------------------------ > > *De: *"Stefano Zampini" > *?: *"Matthew Knepley" > *Cc: *"Franck Houssen" , "PETSc" < > petsc-users at mcs.anl.gov>, "PETSc" > *Envoy?: *Dimanche 21 Mai 2017 23:02:37 > *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > matrix and a global vector ? > > Franck, > > PETSc takes care of doing the matrix-vector multiplication properly using > MatIS. As Matt said, the layout of the vectors is the usual parallel > layout. > The local sizes of the MatIS matrix (i.e. the local size of the left and > right vectors used in MatMult) are not the sizes of the local subdomain > matrices in MatIS. > > > On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen > wrote: > >> Using PETSc MatIS, how to matmult a global IS matrix and a global vector >> ? Example is attached : I don't get what I expect that is a vector such >> that proc0 = [1, 2] and proc1 = [2, 1] >> > > 1) I think the global size of your matrix is wrong. You seem to want 3, > not 4 > > 2) Global vectors have a non-overlapping row partition. 
You might be > thinking of local vectors > > Thanks, > > Matt > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > ------------------------------ > > *De: *"Stefano Zampini" > *?: *"Matthew Knepley" > *Cc: *"Franck Houssen" , "PETSc" < > petsc-users at mcs.anl.gov>, "PETSc" > *Envoy?: *Dimanche 21 Mai 2017 23:02:37 > *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > matrix and a global vector ? > > Franck, > > PETSc takes care of doing the matrix-vector multiplication properly using > MatIS. As Matt said, the layout of the vectors is the usual parallel > layout. > The local sizes of the MatIS matrix (i.e. the local size of the left and > right vectors used in MatMult) are not the sizes of the local subdomain > matrices in MatIS. > > > On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen > wrote: > >> Using PETSc MatIS, how to matmult a global IS matrix and a global vector >> ? Example is attached : I don't get what I expect that is a vector such >> that proc0 = [1, 2] and proc1 = [2, 1] >> > > 1) I think the global size of your matrix is wrong. You seem to want 3, > not 4 > > 2) Global vectors have a non-overlapping row partition. You might be > thinking of local vectors > > Thanks, > > Matt > > >> Franck >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Tue May 23 06:23:52 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 23 May 2017 13:23:52 +0200 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> Message-ID: As I said, the local sizes of the MATIS matrix are NOT the sizes of the subdomain problem. As in all PETSc code, the local sizes of the matrix correspond to the local size of the non-overlapping right and left vectors used in matmult operations. You can use PETSC_DECIDE in place of localsize in your call to MatCreateIS. The size of the subdomain problem in MATIS is the local size of the l2g map. Il 23 Mag 2017 11:53, "Franck Houssen" ha scritto: > The first thing I did was to put 3, not 4 : I got an error thrown in > MatCreateIS (see the git diff + stack below). As the error said I used > globalSize = numberOfMPIProcessus * localSize : my understanding is that, > when using MatIS, the global size needs to be the sum of all local sizes. > Correct ? > > I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= > diagonal with 1.). 
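One way to see the two notions of "local" size that Stefano distinguishes above is to query them on a MatIS built as in the sketch earlier in this thread (an untested fragment; A and map as created there):

  PetscInt m, n, nl, ml, nloc;
  Mat      Aloc;

  /* Local size of the non-overlapping parallel layout: what the MatMult vectors use. */
  MatGetLocalSize(A, &m, &n);              /* with PETSC_DECIDE and M = 3: 2 on rank 0, 1 on rank 1 */

  /* Size of the subdomain problem: the local size of the l2g map ... */
  ISLocalToGlobalMappingGetSize(map, &nl); /* 2 on both ranks */

  /* ... which is also the size of the unassembled local matrix. */
  MatISGetLocalMat(A, &Aloc);
  MatGetSize(Aloc, &ml, &nloc);            /* 2 x 2 on both ranks */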
Each local matrix correspond to one domain (each domain > is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 > domains). > This is the simplest possible example: I have two 2x2 (local) diag matrix > that overlap so that the global matrix built from them is 1, 2, 1 on the > diagonal (local contributions add up in the middle). > I need to MatMult this global matrix with a global vector filled with 1. > > Franck > > Git diff : > > --- a/matISLocalMat.cpp > +++ b/matISLocalMat.cpp > @@ -16,7 +16,7 @@ int main(int argc,char **argv) { > int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) > return 1; > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; > + PetscInt localSize = 2, globalSize = 3; > PetscInt localIdx[2] = {0, 0}; > if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > else {localIdx[0] = 1; localIdx[1] = 2;} > > > > Stack error: > > [0]PETSC ERROR: Nonconforming object sizes > [0]PETSC ERROR: Sum of local lengths 4 does not equal global length 3, my > local length 2 > [0]PETSC ERROR: [0] ISG2LMapApply line 17 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/vec/is/utils/isltog.c > [0]PETSC ERROR: [0] MatSetValues_IS line 692 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] MatSetValues line 1157 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] MatISSetPreallocation line 80 > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] PetscSplitOwnership line 80 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/sys/utils/psplit.c > [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [0]PETSC ERROR: [0] MatCreateIS line 986 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > ------------------------------ > > *De: *"Stefano Zampini" > *?: *"Matthew Knepley" > *Cc: *"Franck Houssen" , "PETSc" < > petsc-users at mcs.anl.gov>, "PETSc" > *Envoy?: *Dimanche 21 Mai 2017 23:02:37 > *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > matrix and a global vector ? > > Franck, > > PETSc takes care of doing the matrix-vector multiplication properly using > MatIS. As Matt said, the layout of the vectors is the usual parallel > layout. > The local sizes of the MatIS matrix (i.e. the local size of the left and > right vectors used in MatMult) are not the sizes of the local subdomain > matrices in MatIS. > > > On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen > wrote: > >> Using PETSc MatIS, how to matmult a global IS matrix and a global vector >> ? Example is attached : I don't get what I expect that is a vector such >> that proc0 = [1, 2] and proc1 = [2, 1] >> > > 1) I think the global size of your matrix is wrong. You seem to want 3, > not 4 > > 2) Global vectors have a non-overlapping row partition. 
You might be > thinking of local vectors > > Thanks, > > Matt > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > ------------------------------ > > *De: *"Stefano Zampini" > *?: *"Matthew Knepley" > *Cc: *"Franck Houssen" , "PETSc" < > petsc-users at mcs.anl.gov>, "PETSc" > *Envoy?: *Dimanche 21 Mai 2017 23:02:37 > *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > matrix and a global vector ? > > Franck, > > PETSc takes care of doing the matrix-vector multiplication properly using > MatIS. As Matt said, the layout of the vectors is the usual parallel > layout. > The local sizes of the MatIS matrix (i.e. the local size of the left and > right vectors used in MatMult) are not the sizes of the local subdomain > matrices in MatIS. > > > On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen > wrote: > >> Using PETSc MatIS, how to matmult a global IS matrix and a global vector >> ? Example is attached : I don't get what I expect that is a vector such >> that proc0 = [1, 2] and proc1 = [2, 1] >> > > 1) I think the global size of your matrix is wrong. You seem to want 3, > not 4 > > 2) Global vectors have a non-overlapping row partition. You might be > thinking of local vectors > > Thanks, > > Matt > > >> Franck >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 23 06:27:14 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 23 May 2017 06:27:14 -0500 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> Message-ID: On Tue, May 23, 2017 at 6:23 AM, Stefano Zampini wrote: > As I said, the local sizes of the MATIS matrix are NOT the sizes of the > subdomain problem. As in all PETSc code, the local sizes of the matrix > correspond to the local size of the non-overlapping right and left vectors > used in matmult operations. You can use PETSC_DECIDE in place of localsize > in your call to MatCreateIS. The size of the subdomain problem in MATIS is > the local size of the l2g map. > I just want to make sure that MatIS is really what you want, since its a little more complex than other options. MatIS is useful for methods that need unassembled matrices, like domain decomposition methods which use Neumann problems on subdomains. If you are fine with assembled sparse matrices, than the normal AIJ should be easier to handle. Just checking. Thanks, Matt > Il 23 Mag 2017 11:53, "Franck Houssen" ha > scritto: > >> The first thing I did was to put 3, not 4 : I got an error thrown in >> MatCreateIS (see the git diff + stack below). As the error said I used >> globalSize = numberOfMPIProcessus * localSize : my understanding is that, >> when using MatIS, the global size needs to be the sum of all local sizes. >> Correct ? 
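To make the alternative Matt mentions above concrete: the same diag(1, 2, 1) operator can be built as an ordinary assembled AIJ matrix, in which case the contribution to the shared row is summed automatically during assembly. A hedged, untested fragment (localIdx as in the earlier sketch; error checking omitted):

  Mat B;

  MatCreate(PETSC_COMM_WORLD, &B);
  MatSetSizes(B, PETSC_DECIDE, PETSC_DECIDE, 3, 3);
  MatSetType(B, MATAIJ);
  MatSetUp(B);

  /* Each rank adds its subdomain's 2x2 identity with global indices;
     off-process contributions are communicated and summed at assembly time. */
  MatSetValue(B, localIdx[0], localIdx[0], 1.0, ADD_VALUES);
  MatSetValue(B, localIdx[1], localIdx[1], 1.0, ADD_VALUES);
  MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);
  MatView(B, PETSC_VIEWER_STDOUT_WORLD); /* diagonal 1, 2, 1 */

The trade-off is that the unassembled subdomain matrices needed by Neumann-type domain decomposition preconditioners are no longer available, which is what MatIS keeps.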
>> >> I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= >> diagonal with 1.). Each local matrix correspond to one domain (each domain >> is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 >> domains). >> This is the simplest possible example: I have two 2x2 (local) diag matrix >> that overlap so that the global matrix built from them is 1, 2, 1 on the >> diagonal (local contributions add up in the middle). >> I need to MatMult this global matrix with a global vector filled with 1. >> >> Franck >> >> Git diff : >> >> --- a/matISLocalMat.cpp >> +++ b/matISLocalMat.cpp >> @@ -16,7 +16,7 @@ int main(int argc,char **argv) { >> int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) >> return 1; >> int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); >> >> - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; >> + PetscInt localSize = 2, globalSize = 3; >> PetscInt localIdx[2] = {0, 0}; >> if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} >> else {localIdx[0] = 1; localIdx[1] = 2;} >> >> >> >> Stack error: >> >> [0]PETSC ERROR: Nonconforming object sizes >> [0]PETSC ERROR: Sum of local lengths 4 does not equal global length 3, my >> local length 2 >> [0]PETSC ERROR: [0] ISG2LMapApply line 17 /home/fghoussen/Documents/INRI >> A/petsc-3.7.6/src/vec/is/utils/isltog.c >> [0]PETSC ERROR: [0] MatSetValues_IS line 692 >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> [0]PETSC ERROR: [0] MatSetValues line 1157 /home/fghoussen/Documents/INRI >> A/petsc-3.7.6/src/mat/interface/matrix.c >> [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> [0]PETSC ERROR: [0] MatISSetPreallocation line 80 >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> [0]PETSC ERROR: [0] PetscSplitOwnership line 80 >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/sys/utils/psplit.c >> [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c >> [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c >> [0]PETSC ERROR: [0] MatCreateIS line 986 /home/fghoussen/Documents/INRI >> A/petsc-3.7.6/src/mat/impls/is/matis.c >> >> >> >> ------------------------------ >> >> *De: *"Stefano Zampini" >> *?: *"Matthew Knepley" >> *Cc: *"Franck Houssen" , "PETSc" < >> petsc-users at mcs.anl.gov>, "PETSc" >> *Envoy?: *Dimanche 21 Mai 2017 23:02:37 >> *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS >> matrix and a global vector ? >> >> Franck, >> >> PETSc takes care of doing the matrix-vector multiplication properly using >> MatIS. As Matt said, the layout of the vectors is the usual parallel >> layout. >> The local sizes of the MatIS matrix (i.e. the local size of the left and >> right vectors used in MatMult) are not the sizes of the local subdomain >> matrices in MatIS. >> >> >> On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: >> >> On Sun, May 21, 2017 at 11:26 AM, Franck Houssen > > wrote: >> >>> Using PETSc MatIS, how to matmult a global IS matrix and a global vector >>> ? Example is attached : I don't get what I expect that is a vector such >>> that proc0 = [1, 2] and proc1 = [2, 1] >>> >> >> 1) I think the global size of your matrix is wrong. 
You seem to want 3, >> not 4 >> >> 2) Global vectors have a non-overlapping row partition. You might be >> thinking of local vectors >> >> Thanks, >> >> Matt >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> >> >> ------------------------------ >> >> *De: *"Stefano Zampini" >> *?: *"Matthew Knepley" >> *Cc: *"Franck Houssen" , "PETSc" < >> petsc-users at mcs.anl.gov>, "PETSc" >> *Envoy?: *Dimanche 21 Mai 2017 23:02:37 >> *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS >> matrix and a global vector ? >> >> Franck, >> >> PETSc takes care of doing the matrix-vector multiplication properly using >> MatIS. As Matt said, the layout of the vectors is the usual parallel >> layout. >> The local sizes of the MatIS matrix (i.e. the local size of the left and >> right vectors used in MatMult) are not the sizes of the local subdomain >> matrices in MatIS. >> >> >> On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: >> >> On Sun, May 21, 2017 at 11:26 AM, Franck Houssen > > wrote: >> >>> Using PETSc MatIS, how to matmult a global IS matrix and a global vector >>> ? Example is attached : I don't get what I expect that is a vector such >>> that proc0 = [1, 2] and proc1 = [2, 1] >>> >> >> 1) I think the global size of your matrix is wrong. You seem to want 3, >> not 4 >> >> 2) Global vectors have a non-overlapping row partition. You might be >> thinking of local vectors >> >> Thanks, >> >> Matt >> >> >>> Franck >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Tue May 23 11:28:03 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Tue, 23 May 2017 18:28:03 +0200 (CEST) Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> Message-ID: <1784716977.7683031.1495556883254.JavaMail.zimbra@inria.fr> OK, thanks. This is helpfull... But I really think the doc should be more verbose about that: this is really confusing and I didn't find any simple example to begin with which make all this even more confusing (personal opinion). Franck ----- Mail original ----- > De: "Matthew Knepley" > ?: "Franck Houssen" > Cc: "Stefano Zampini" , "PETSc" > , "PETSc" > Envoy?: Mardi 23 Mai 2017 13:21:21 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix > and a global vector ? > On Tue, May 23, 2017 at 4:53 AM, Franck Houssen < franck.houssen at inria.fr > > wrote: > > The first thing I did was to put 3, not 4 : I got an error thrown in > > MatCreateIS (see the git diff + stack below). 
As the error said I used > > globalSize = numberOfMPIProcessus * localSize : my understanding is that, > > when using MatIS, the global size needs to be the sum of all local sizes. > > Correct ? > > No. MatIS means that the matrix is not assembled. The easiest way (for me) to > think of this is that processes do not have > to hold full rows. One process can hold part of row i, and another processes > can hold another part. However, there are still > the same number of global rows. > > I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= > > diagonal with 1.). Each local matrix correspond to one domain (each domain > > is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 > > domains). > > So the global size is 3. The local size here is not the size of the local IS > block, since that is a property only of MatIS. It is the > size of the local piece of the vector you multiply. This allows PETSc to > understand the parallel layout of the Vec, and how it > matched the Mat. > This is somewhat confusing because FEM people mean something different by > "local" than we do here, and in fact we use this > other definition of local when assembling operators. > Matt > > This is the simplest possible example: I have two 2x2 (local) diag matrix > > that overlap so that the global matrix built from them is 1, 2, 1 on the > > diagonal (local contributions add up in the middle). > > > I need to MatMult this global matrix with a global vector filled with 1. > > > Franck > > > Git diff : > > > --- a/matISLocalMat.cpp > > > +++ b/matISLocalMat.cpp > > > @@ -16,7 +16,7 @@ int main(int argc,char **argv) { > > > int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) return > > 1; > > > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > > - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; > > > + PetscInt localSize = 2, globalSize = 3; > > > PetscInt localIdx[2] = {0, 0}; > > > if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > > > else {localIdx[0] = 1; localIdx[1] = 2;} > > > Stack error: > > > [0]PETSC ERROR: Nonconforming object sizes > > > [0]PETSC ERROR: Sum of local lengths 4 does not equal global length 3, my > > local length 2 > > > [0]PETSC ERROR: [0] ISG2LMapApply line 17 > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/isltog.c > > > [0]PETSC ERROR: [0] MatSetValues_IS line 692 > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > [0]PETSC ERROR: [0] MatSetValues line 1157 > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > [0]PETSC ERROR: [0] MatISSetPreallocation line 80 > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > [0]PETSC ERROR: [0] PetscSplitOwnership line 80 > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/sys/utils/psplit.c > > > [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c > > > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > [0]PETSC ERROR: [0] MatCreateIS line 986 > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > 
> > > > > ?: "Matthew Knepley" < knepley at gmail.com > > > > > > > Cc: "Franck Houssen" < franck.houssen at inria.fr >, "PETSc" < > > > petsc-users at mcs.anl.gov >, "PETSc" < petsc-dev at mcs.anl.gov > > > > > > > Envoy?: Dimanche 21 Mai 2017 23:02:37 > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > > > matrix > > > and a global vector ? > > > > > > Franck, > > > > > > PETSc takes care of doing the matrix-vector multiplication properly using > > > MatIS. As Matt said, the layout of the vectors is the usual parallel > > > layout. > > > > > > The local sizes of the MatIS matrix (i.e. the local size of the left and > > > right vectors used in MatMult) are not the sizes of the local subdomain > > > matrices in MatIS. > > > > > > > On May 21, 2017, at 6:47 PM, Matthew Knepley < knepley at gmail.com > > > > > wrote: > > > > > > > > > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < > > > > franck.houssen at inria.fr > > > > > > > > > wrote: > > > > > > > > > > > Using PETSc MatIS, how to matmult a global IS matrix and a global > > > > > vector > > > > > ? > > > > > Example is attached : I don't get what I expect that is a vector such > > > > > that > > > > > proc0 = [1, 2] and proc1 = [2, 1] > > > > > > > > > > > > > > 1) I think the global size of your matrix is wrong. You seem to want 3, > > > > not > > > > 4 > > > > > > > > > > 2) Global vectors have a non-overlapping row partition. You might be > > > > thinking > > > > of local vectors > > > > > > > > > > Thanks, > > > > > > > > > > Matt > > > > > > > > > > -- > > > > > > > > > > What most experimenters take for granted before they begin their > > > > experiments > > > > is infinitely more interesting than any results to which their > > > > experiments > > > > lead. > > > > > > > > > > -- Norbert Wiener > > > > > > > > > > http://www.caam.rice.edu/~mk51/ > > > > > > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > ?: "Matthew Knepley" < knepley at gmail.com > > > > > > > Cc: "Franck Houssen" < franck.houssen at inria.fr >, "PETSc" < > > > petsc-users at mcs.anl.gov >, "PETSc" < petsc-dev at mcs.anl.gov > > > > > > > Envoy?: Dimanche 21 Mai 2017 23:02:37 > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > > > matrix > > > and a global vector ? > > > > > > Franck, > > > > > > PETSc takes care of doing the matrix-vector multiplication properly using > > > MatIS. As Matt said, the layout of the vectors is the usual parallel > > > layout. > > > > > > The local sizes of the MatIS matrix (i.e. the local size of the left and > > > right vectors used in MatMult) are not the sizes of the local subdomain > > > matrices in MatIS. > > > > > > > On May 21, 2017, at 6:47 PM, Matthew Knepley < knepley at gmail.com > > > > > wrote: > > > > > > > > > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < > > > > franck.houssen at inria.fr > > > > > > > > > wrote: > > > > > > > > > > > Using PETSc MatIS, how to matmult a global IS matrix and a global > > > > > vector > > > > > ? > > > > > Example is attached : I don't get what I expect that is a vector such > > > > > that > > > > > proc0 = [1, 2] and proc1 = [2, 1] > > > > > > > > > > > > > > 1) I think the global size of your matrix is wrong. You seem to want 3, > > > > not > > > > 4 > > > > > > > > > > 2) Global vectors have a non-overlapping row partition. 
You might be > > > > thinking > > > > of local vectors > > > > > > > > > > Thanks, > > > > > > > > > > Matt > > > > > > > > > > > Franck > > > > > > > > > > > > > > -- > > > > > > > > > > What most experimenters take for granted before they begin their > > > > experiments > > > > is infinitely more interesting than any results to which their > > > > experiments > > > > lead. > > > > > > > > > > -- Norbert Wiener > > > > > > > > > > http://www.caam.rice.edu/~mk51/ > > > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Tue May 23 11:34:53 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Tue, 23 May 2017 18:34:53 +0200 (CEST) Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? In-Reply-To: References: <867421313.6757137.1495383596545.JavaMail.zimbra@inria.fr> <1253777447.6757298.1495383780337.JavaMail.zimbra@inria.fr> <2033509705.7414108.1495532492501.JavaMail.zimbra@inria.fr> Message-ID: <740691579.7684644.1495557293858.JavaMail.zimbra@inria.fr> OK. I am supposed to destroy the matrix returned by MatISGetMPIXAIJ ? Also, my example still not get the final assembled local matrix (the MatCreateSubMatrix returns an empty matrix) but as far as I understand my (global) index set is OK: what did I miss ? Franck ----- Mail original ----- > De: "Stefano Zampini" > ?: "Franck Houssen" > Cc: "petsc-dev" , "PETSc users list" > , "petsc-maint" > Envoy?: Mardi 23 Mai 2017 13:16:18 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= one > domain) before and after assembly ? > MatISGetMPIXAIJ is collective, as it assembles the global operator. To get > the matrices you are looking for, you should call MatCreateSubMatrix on the > assembled global operator, with the global indices representing the > subdomain problem. Each process needs to call both functions > Stefano > Il 23 Mag 2017 11:41, "Franck Houssen" < franck.houssen at inria.fr > ha > scritto: > > I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= > > diagonal with 1.). Each local matrix correspond to one domain (each domain > > is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 > > domains). > > > This is the simplest possible example: I have two 2x2 (local) diag matrix > > that overlap so that the global matrix built from them is 1, 2, 1 on the > > diagonal (local contributions add up in the middle). > > > Now, I need for each MPI proc to get the assembled local matrix (sometimes > > called the dirichlet matrix) : this is a local matrix (sequential - not > > distributed with MPI) that accounts for contribution of neighboring domains > > (MPI proc). > > > How to get the local assembled matrix ? MatGetLocalSubMatrix does not work > > (throw error - see example attached). MatGetSubMatrix returns a MPI > > distributed matrix, not a local (sequential) one. > > > 1. My understanding is that MatISGetMPIXAIJ should return a local matrix > > (sequential AIJ matrix) : the MPI in the name recall that you get the > > assembled matrix (with contributions from the shared border) from the other > > MPI processus. Correct ? 
In my simple example, I replaced > > MatGetLocalSubMatrix with MatISGetMPIXAIJ : I get a deadlock which was > > surprising to me... Is MatISGetMPIXAIJ a collective call ? > > > 2. Supposing this is a collective call (and that point 1 is not correct), I > > ride up MatISGetMPIXAIJ before the "if (rank > 0)" : I don't deadlock now, > > but it seems I get a global matrix which is not the assembled local matrix > > I > > am looking for. > > > 3. I am supposed to destroy the matrix returned by MatISGetMPIXAIJ ? (I > > believe yes - not sure as AFAIU wording should associate Destroy methods to > > Create methods) > > > Franck > > > The git diff illustrate modifications I tried to add to the initial file > > attached to this thread: > > > --- a/matISLocalMat.cpp > > > +++ b/matISLocalMat.cpp > > > @@ -31,6 +31,8 @@ int main(int argc,char **argv) { > > > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, > > MAT_FINAL_ASSEMBLY); > > > MatView(A, PETSC_VIEWER_STDOUT_WORLD); > > PetscViewerFlush(PETSC_VIEWER_STDOUT_WORLD); // Diag: 1, 2, 1 > > > + Mat assembledLocalMat; > > > + MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &assembledLocalMat); > > > if (rank > 0) { // Do not pollute stdout: print only 1 proc > > > std::cout << std::endl << "non assembled local matrix:" << std::endl << > > std::endl; > > > Mat nonAssembledLocalMat; > > > @@ -38,11 +40,10 @@ int main(int argc,char **argv) { > > > MatView(nonAssembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Diag: 1, 1 > > > std::cout << std::endl << "assembled local matrix:" << std::endl << > > std::endl; > > > - Mat assembledLocalMat; > > > - IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, > > PETSC_COPY_VALUES, &is); > > > - MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... > > > - MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Would like to > > get > > => Diag: 2, 1 > > > + //IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, > > PETSC_COPY_VALUES, &is); > > > + //MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... > > > } > > > + MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_WORLD); // Would like to > > get > > => Diag: 2, 1 > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > ?: "petsc-maint" < knepley at gmail.com > > > > > > > Cc: "petsc-dev" < petsc-dev at mcs.anl.gov >, "PETSc users list" < > > > petsc-users at mcs.anl.gov >, "Franck Houssen" < franck.houssen at inria.fr > > > > > > > Envoy?: Dimanche 21 Mai 2017 22:51:34 > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= one > > > domain) before and after assembly ? > > > > > > To assemble the operator in aij format, use > > > > > > MatISGetMPIXAIJ > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatISGetMPIXAIJ.html > > > > > > Il 21 Mag 2017 18:43, "Matthew Knepley" < knepley at gmail.com > ha scritto: > > > > > > > On Sun, May 21, 2017 at 11:23 AM, Franck Houssen < > > > > franck.houssen at inria.fr > > > > > > > > > wrote: > > > > > > > > > > > I have a 3x3 global matrix is built (diag: 1, 2, 1): it's made of 2 > > > > > overlapping 2x2 local matrix (diag: 1, 1). > > > > > > > > > > > > > > > Getting non assembled local matrix is OK with MatISGetLocalMat. > > > > > > > > > > > > > > > How to get assembled local matrix (initial local matrix + neigbhor > > > > > contributions on the borders) ? 
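A hedged sketch of the pattern Stefano describes above, restricted to the calls available in the 3.7 release used in this thread (MatISGetMPIXAIJ and MatGetSubMatrices, both linked earlier in the thread, are collective, so every rank executes them; A and localIdx as in the earlier sketch, localIdx being the global indices of this rank's overlapping subdomain):

  Mat B, *Bloc;
  IS  is;

  /* Collective: assemble the global operator in AIJ format. */
  MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &B);

  /* Global indices of this rank's overlapping subdomain, as a sequential IS. */
  ISCreateGeneral(PETSC_COMM_SELF, 2, localIdx, PETSC_COPY_VALUES, &is);

  /* Collective: extract one sequential (assembled) submatrix per rank. */
  MatGetSubMatrices(B, 1, &is, &is, MAT_INITIAL_MATRIX, &Bloc);
  MatView(Bloc[0], PETSC_VIEWER_STDOUT_SELF); /* rank 1 should see diag: 2, 1 */

  /* With MAT_INITIAL_MATRIX, B and Bloc are new matrices owned by the caller. */
  MatDestroyMatrices(1, &Bloc);
  MatDestroy(&B);
  ISDestroy(&is);

This is one reading of Stefano's advice; the deadlock reported above is consistent with MatISGetMPIXAIJ being collective while being called only under "if (rank > 0)".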
(expected result is diag: 2, 1) > > > > > > > > > > > > > > You can always use > > > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrix.html > > > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html > > > > > > > > > > to get copies, but if you just want to build things, you can use > > > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html > > > > > > > > > > Thanks, > > > > > > > > > > Matt > > > > > > > > > > > Franck > > > > > > > > > > > > > > -- > > > > > > > > > > What most experimenters take for granted before they begin their > > > > experiments > > > > is infinitely more interesting than any results to which their > > > > experiments > > > > lead. > > > > > > > > > > -- Norbert Wiener > > > > > > > > > > http://www.caam.rice.edu/~mk51/ > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matISLocalMat.cpp Type: text/x-c++src Size: 3685 bytes Desc: not available URL: From franck.houssen at inria.fr Tue May 23 11:40:59 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Tue, 23 May 2017 18:40:59 +0200 (CEST) Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: <1784716977.7683031.1495556883254.JavaMail.zimbra@inria.fr> References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> <1784716977.7683031.1495556883254.JavaMail.zimbra@inria.fr> Message-ID: <241093803.7685734.1495557659717.JavaMail.zimbra@inria.fr> I let this small piece of code to demonstrate the basics.... If you believe (like I do) that this is worth to be added to the available examples: feel free to do it !... Franck ----- Mail original ----- > De: "Franck Houssen" > ?: "Matthew Knepley" > Cc: "PETSc" , "PETSc" > Envoy?: Mardi 23 Mai 2017 18:28:03 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix > and a global vector ? > OK, thanks. This is helpfull... But I really think the doc should be more > verbose about that: this is really confusing and I didn't find any simple > example to begin with which make all this even more confusing (personal > opinion). > Franck > ----- Mail original ----- > > De: "Matthew Knepley" > > > ?: "Franck Houssen" > > > Cc: "Stefano Zampini" , "PETSc" > > , "PETSc" > > > Envoy?: Mardi 23 Mai 2017 13:21:21 > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix > > and a global vector ? > > > On Tue, May 23, 2017 at 4:53 AM, Franck Houssen < franck.houssen at inria.fr > > > wrote: > > > > The first thing I did was to put 3, not 4 : I got an error thrown in > > > MatCreateIS (see the git diff + stack below). As the error said I used > > > globalSize = numberOfMPIProcessus * localSize : my understanding is that, > > > when using MatIS, the global size needs to be the sum of all local sizes. > > > Correct ? > > > > > No. MatIS means that the matrix is not assembled. The easiest way (for me) > > to > > think of this is that processes do not have > > > to hold full rows. One process can hold part of row i, and another > > processes > > can hold another part. However, there are still > > > the same number of global rows. 
> > > > I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= > > > diagonal with 1.). Each local matrix correspond to one domain (each > > > domain > > > is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 > > > domains). > > > > > So the global size is 3. The local size here is not the size of the local > > IS > > block, since that is a property only of MatIS. It is the > > > size of the local piece of the vector you multiply. This allows PETSc to > > understand the parallel layout of the Vec, and how it > > > matched the Mat. > > > This is somewhat confusing because FEM people mean something different by > > "local" than we do here, and in fact we use this > > > other definition of local when assembling operators. > > > Matt > > > > This is the simplest possible example: I have two 2x2 (local) diag matrix > > > that overlap so that the global matrix built from them is 1, 2, 1 on the > > > diagonal (local contributions add up in the middle). > > > > > > I need to MatMult this global matrix with a global vector filled with 1. > > > > > > Franck > > > > > > Git diff : > > > > > > --- a/matISLocalMat.cpp > > > > > > +++ b/matISLocalMat.cpp > > > > > > @@ -16,7 +16,7 @@ int main(int argc,char **argv) { > > > > > > int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) return > > > 1; > > > > > > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > > > > > - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; > > > > > > + PetscInt localSize = 2, globalSize = 3; > > > > > > PetscInt localIdx[2] = {0, 0}; > > > > > > if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > > > > > > else {localIdx[0] = 1; localIdx[1] = 2;} > > > > > > Stack error: > > > > > > [0]PETSC ERROR: Nonconforming object sizes > > > > > > [0]PETSC ERROR: Sum of local lengths 4 does not equal global length 3, my > > > local length 2 > > > > > > [0]PETSC ERROR: [0] ISG2LMapApply line 17 > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/isltog.c > > > > > > [0]PETSC ERROR: [0] MatSetValues_IS line 692 > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > [0]PETSC ERROR: [0] MatSetValues line 1157 > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > [0]PETSC ERROR: [0] MatISSetPreallocation line 80 > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > [0]PETSC ERROR: [0] PetscSplitOwnership line 80 > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/sys/utils/psplit.c > > > > > > [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c > > > > > > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > [0]PETSC ERROR: [0] MatCreateIS line 986 > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > > > > > ?: "Matthew Knepley" < knepley at gmail.com > > > > > > > > > > > Cc: "Franck Houssen" < franck.houssen at inria.fr >, "PETSc" < > > > > petsc-users at mcs.anl.gov >, "PETSc" < petsc-dev at mcs.anl.gov > > > > > > > > > > 
> Envoy?: Dimanche 21 Mai 2017 23:02:37 > > > > > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > > > > matrix > > > > and a global vector ? > > > > > > > > > > Franck, > > > > > > > > > > PETSc takes care of doing the matrix-vector multiplication properly > > > > using > > > > MatIS. As Matt said, the layout of the vectors is the usual parallel > > > > layout. > > > > > > > > > > The local sizes of the MatIS matrix (i.e. the local size of the left > > > > and > > > > right vectors used in MatMult) are not the sizes of the local subdomain > > > > matrices in MatIS. > > > > > > > > > > > On May 21, 2017, at 6:47 PM, Matthew Knepley < knepley at gmail.com > > > > > > wrote: > > > > > > > > > > > > > > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < > > > > > franck.houssen at inria.fr > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > Using PETSc MatIS, how to matmult a global IS matrix and a global > > > > > > vector > > > > > > ? > > > > > > Example is attached : I don't get what I expect that is a vector > > > > > > such > > > > > > that > > > > > > proc0 = [1, 2] and proc1 = [2, 1] > > > > > > > > > > > > > > > > > > > > 1) I think the global size of your matrix is wrong. You seem to want > > > > > 3, > > > > > not > > > > > 4 > > > > > > > > > > > > > > > 2) Global vectors have a non-overlapping row partition. You might be > > > > > thinking > > > > > of local vectors > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > What most experimenters take for granted before they begin their > > > > > experiments > > > > > is infinitely more interesting than any results to which their > > > > > experiments > > > > > lead. > > > > > > > > > > > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > http://www.caam.rice.edu/~mk51/ > > > > > > > > > > > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > > > > > ?: "Matthew Knepley" < knepley at gmail.com > > > > > > > > > > > Cc: "Franck Houssen" < franck.houssen at inria.fr >, "PETSc" < > > > > petsc-users at mcs.anl.gov >, "PETSc" < petsc-dev at mcs.anl.gov > > > > > > > > > > > Envoy?: Dimanche 21 Mai 2017 23:02:37 > > > > > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > > > > matrix > > > > and a global vector ? > > > > > > > > > > Franck, > > > > > > > > > > PETSc takes care of doing the matrix-vector multiplication properly > > > > using > > > > MatIS. As Matt said, the layout of the vectors is the usual parallel > > > > layout. > > > > > > > > > > The local sizes of the MatIS matrix (i.e. the local size of the left > > > > and > > > > right vectors used in MatMult) are not the sizes of the local subdomain > > > > matrices in MatIS. > > > > > > > > > > > On May 21, 2017, at 6:47 PM, Matthew Knepley < knepley at gmail.com > > > > > > wrote: > > > > > > > > > > > > > > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < > > > > > franck.houssen at inria.fr > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > Using PETSc MatIS, how to matmult a global IS matrix and a global > > > > > > vector > > > > > > ? > > > > > > Example is attached : I don't get what I expect that is a vector > > > > > > such > > > > > > that > > > > > > proc0 = [1, 2] and proc1 = [2, 1] > > > > > > > > > > > > > > > > > > > > 1) I think the global size of your matrix is wrong. 
You seem to want > > > > > 3, > > > > > not > > > > > 4 > > > > > > > > > > > > > > > 2) Global vectors have a non-overlapping row partition. You might be > > > > > thinking > > > > > of local vectors > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > > Franck > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > What most experimenters take for granted before they begin their > > > > > experiments > > > > > is infinitely more interesting than any results to which their > > > > > experiments > > > > > lead. > > > > > > > > > > > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > http://www.caam.rice.edu/~mk51/ > > > > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > > experiments > > is infinitely more interesting than any results to which their experiments > > lead. > > > -- Norbert Wiener > > > http://www.caam.rice.edu/~mk51/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matISProdMatVec.cpp Type: text/x-c++src Size: 2563 bytes Desc: not available URL: From knepley at gmail.com Tue May 23 11:46:34 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 23 May 2017 11:46:34 -0500 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: <1784716977.7683031.1495556883254.JavaMail.zimbra@inria.fr> References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> <1784716977.7683031.1495556883254.JavaMail.zimbra@inria.fr> Message-ID: On Tue, May 23, 2017 at 11:28 AM, Franck Houssen wrote: > OK, thanks. This is helpfull... But I really think the doc should be more > verbose about that: this is really confusing and I didn't find any simple > example to begin with which make all this even more confusing (personal > opinion). > Did you respond to my other question (how are you using them)? That would help me understand how to phrase it. Thanks, Matt > Franck > > > ------------------------------ > > *De: *"Matthew Knepley" > *?: *"Franck Houssen" > *Cc: *"Stefano Zampini" , "PETSc" < > petsc-users at mcs.anl.gov>, "PETSc" > *Envoy?: *Mardi 23 Mai 2017 13:21:21 > *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > matrix and a global vector ? > > On Tue, May 23, 2017 at 4:53 AM, Franck Houssen > wrote: > >> The first thing I did was to put 3, not 4 : I got an error thrown in >> MatCreateIS (see the git diff + stack below). As the error said I used >> globalSize = numberOfMPIProcessus * localSize : my understanding is that, >> when using MatIS, the global size needs to be the sum of all local sizes. >> Correct ? >> > > No. MatIS means that the matrix is not assembled. The easiest way (for me) > to think of this is that processes do not have > to hold full rows. One process can hold part of row i, and another > processes can hold another part. However, there are still > the same number of global rows. > > I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= >> diagonal with 1.). Each local matrix correspond to one domain (each domain >> is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 >> domains). >> > > So the global size is 3. 
The local size here is not the size of the local > IS block, since that is a property only of MatIS. It is the > size of the local piece of the vector you multiply. This allows PETSc to > understand the parallel layout of the Vec, and how it > matched the Mat. > > This is somewhat confusing because FEM people mean something different by > "local" than we do here, and in fact we use this > other definition of local when assembling operators. > > Matt > > >> This is the simplest possible example: I have two 2x2 (local) diag matrix >> that overlap so that the global matrix built from them is 1, 2, 1 on the >> diagonal (local contributions add up in the middle). >> I need to MatMult this global matrix with a global vector filled with 1. >> >> Franck >> >> Git diff : >> >> --- a/matISLocalMat.cpp >> +++ b/matISLocalMat.cpp >> @@ -16,7 +16,7 @@ int main(int argc,char **argv) { >> int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) >> return 1; >> int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); >> >> - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; >> + PetscInt localSize = 2, globalSize = 3; >> PetscInt localIdx[2] = {0, 0}; >> if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} >> else {localIdx[0] = 1; localIdx[1] = 2;} >> >> >> >> Stack error: >> >> [0]PETSC ERROR: Nonconforming object sizes >> [0]PETSC ERROR: Sum of local lengths 4 does not equal global length 3, my >> local length 2 >> [0]PETSC ERROR: [0] ISG2LMapApply line 17 /home/fghoussen/Documents/ >> INRIA/petsc-3.7.6/src/vec/is/utils/isltog.c >> [0]PETSC ERROR: [0] MatSetValues_IS line 692 /home/fghoussen/Documents/ >> INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> [0]PETSC ERROR: [0] MatSetValues line 1157 /home/fghoussen/Documents/ >> INRIA/petsc-3.7.6/src/mat/interface/matrix.c >> [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> [0]PETSC ERROR: [0] MatISSetPreallocation line 80 >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> [0]PETSC ERROR: [0] PetscSplitOwnership line 80 /home/fghoussen/Documents/ >> INRIA/petsc-3.7.6/src/sys/utils/psplit.c >> [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 /home/fghoussen/Documents/ >> INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c >> [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 >> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c >> [0]PETSC ERROR: [0] MatCreateIS line 986 /home/fghoussen/Documents/ >> INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >> >> >> >> ------------------------------ >> >> *De: *"Stefano Zampini" >> *?: *"Matthew Knepley" >> *Cc: *"Franck Houssen" , "PETSc" < >> petsc-users at mcs.anl.gov>, "PETSc" >> *Envoy?: *Dimanche 21 Mai 2017 23:02:37 >> *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS >> matrix and a global vector ? >> >> Franck, >> >> PETSc takes care of doing the matrix-vector multiplication properly using >> MatIS. As Matt said, the layout of the vectors is the usual parallel >> layout. >> The local sizes of the MatIS matrix (i.e. the local size of the left and >> right vectors used in MatMult) are not the sizes of the local subdomain >> matrices in MatIS. 
>> >> >> On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: >> >> On Sun, May 21, 2017 at 11:26 AM, Franck Houssen > > wrote: >> >>> Using PETSc MatIS, how to matmult a global IS matrix and a global vector >>> ? Example is attached : I don't get what I expect that is a vector such >>> that proc0 = [1, 2] and proc1 = [2, 1] >>> >> >> 1) I think the global size of your matrix is wrong. You seem to want 3, >> not 4 >> >> 2) Global vectors have a non-overlapping row partition. You might be >> thinking of local vectors >> >> Thanks, >> >> Matt >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> >> >> ------------------------------ >> >> *De: *"Stefano Zampini" >> *?: *"Matthew Knepley" >> *Cc: *"Franck Houssen" , "PETSc" < >> petsc-users at mcs.anl.gov>, "PETSc" >> *Envoy?: *Dimanche 21 Mai 2017 23:02:37 >> *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS >> matrix and a global vector ? >> >> Franck, >> >> PETSc takes care of doing the matrix-vector multiplication properly using >> MatIS. As Matt said, the layout of the vectors is the usual parallel >> layout. >> The local sizes of the MatIS matrix (i.e. the local size of the left and >> right vectors used in MatMult) are not the sizes of the local subdomain >> matrices in MatIS. >> >> >> On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: >> >> On Sun, May 21, 2017 at 11:26 AM, Franck Houssen > > wrote: >> >>> Using PETSc MatIS, how to matmult a global IS matrix and a global vector >>> ? Example is attached : I don't get what I expect that is a vector such >>> that proc0 = [1, 2] and proc1 = [2, 1] >>> >> >> 1) I think the global size of your matrix is wrong. You seem to want 3, >> not 4 >> >> 2) Global vectors have a non-overlapping row partition. You might be >> thinking of local vectors >> >> Thanks, >> >> Matt >> >> >>> Franck >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Tue May 23 11:51:27 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Tue, 23 May 2017 18:51:27 +0200 (CEST) Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> <1784716977.7683031.1495556883254.JavaMail.zimbra@inria.fr> Message-ID: <855172682.7687763.1495558287122.JavaMail.zimbra@inria.fr> Not sure to know what question you're talking about ?!... 
I use MatIS to test some kind of domain decomposition methods. I define my own preconditioner for that: in the apply callback, I need to matmult my (matIS) matrix with the incoming vector. Franck ----- Mail original ----- > De: "Matthew Knepley" > ?: "Franck Houssen" > Cc: "Stefano Zampini" , "PETSc" > , "PETSc" > Envoy?: Mardi 23 Mai 2017 18:46:34 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix > and a global vector ? > On Tue, May 23, 2017 at 11:28 AM, Franck Houssen < franck.houssen at inria.fr > > wrote: > > OK, thanks. This is helpfull... But I really think the doc should be more > > verbose about that: this is really confusing and I didn't find any simple > > example to begin with which make all this even more confusing (personal > > opinion). > > Did you respond to my other question (how are you using them)? That would > help me understand how to phrase it. > Thanks, > Matt > > Franck > > > > De: "Matthew Knepley" < knepley at gmail.com > > > > > > > ?: "Franck Houssen" < franck.houssen at inria.fr > > > > > > > Cc: "Stefano Zampini" < stefano.zampini at gmail.com >, "PETSc" < > > > petsc-users at mcs.anl.gov >, "PETSc" < petsc-dev at mcs.anl.gov > > > > > > > Envoy?: Mardi 23 Mai 2017 13:21:21 > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > > > matrix > > > and a global vector ? > > > > > > On Tue, May 23, 2017 at 4:53 AM, Franck Houssen < franck.houssen at inria.fr > > > > > > > wrote: > > > > > > > The first thing I did was to put 3, not 4 : I got an error thrown in > > > > MatCreateIS (see the git diff + stack below). As the error said I used > > > > globalSize = numberOfMPIProcessus * localSize : my understanding is > > > > that, > > > > when using MatIS, the global size needs to be the sum of all local > > > > sizes. > > > > Correct ? > > > > > > > > > No. MatIS means that the matrix is not assembled. The easiest way (for > > > me) > > > to > > > think of this is that processes do not have > > > > > > to hold full rows. One process can hold part of row i, and another > > > processes > > > can hold another part. However, there are still > > > > > > the same number of global rows. > > > > > > > I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= > > > > diagonal with 1.). Each local matrix correspond to one domain (each > > > > domain > > > > is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 > > > > domains). > > > > > > > > > So the global size is 3. The local size here is not the size of the local > > > IS > > > block, since that is a property only of MatIS. It is the > > > > > > size of the local piece of the vector you multiply. This allows PETSc to > > > understand the parallel layout of the Vec, and how it > > > > > > matched the Mat. > > > > > > This is somewhat confusing because FEM people mean something different by > > > "local" than we do here, and in fact we use this > > > > > > other definition of local when assembling operators. > > > > > > Matt > > > > > > > This is the simplest possible example: I have two 2x2 (local) diag > > > > matrix > > > > that overlap so that the global matrix built from them is 1, 2, 1 on > > > > the > > > > diagonal (local contributions add up in the middle). > > > > > > > > > > I need to MatMult this global matrix with a global vector filled with > > > > 1. 
> > > > > > > > > > Franck > > > > > > > > > > Git diff : > > > > > > > > > > --- a/matISLocalMat.cpp > > > > > > > > > > +++ b/matISLocalMat.cpp > > > > > > > > > > @@ -16,7 +16,7 @@ int main(int argc,char **argv) { > > > > > > > > > > int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) > > > > return > > > > 1; > > > > > > > > > > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > > > > > > > > > - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; > > > > > > > > > > + PetscInt localSize = 2, globalSize = 3; > > > > > > > > > > PetscInt localIdx[2] = {0, 0}; > > > > > > > > > > if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > > > > > > > > > > else {localIdx[0] = 1; localIdx[1] = 2;} > > > > > > > > > > Stack error: > > > > > > > > > > [0]PETSC ERROR: Nonconforming object sizes > > > > > > > > > > [0]PETSC ERROR: Sum of local lengths 4 does not equal global length 3, > > > > my > > > > local length 2 > > > > > > > > > > [0]PETSC ERROR: [0] ISG2LMapApply line 17 > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/isltog.c > > > > > > > > > > [0]PETSC ERROR: [0] MatSetValues_IS line 692 > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > [0]PETSC ERROR: [0] MatSetValues line 1157 > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > > > > > [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > [0]PETSC ERROR: [0] MatISSetPreallocation line 80 > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > [0]PETSC ERROR: [0] PetscSplitOwnership line 80 > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/sys/utils/psplit.c > > > > > > > > > > [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c > > > > > > > > > > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > > > > > [0]PETSC ERROR: [0] MatCreateIS line 986 > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > > > > > > > > > > ?: "Matthew Knepley" < knepley at gmail.com > > > > > > > > > > > > > > > > Cc: "Franck Houssen" < franck.houssen at inria.fr >, "PETSc" < > > > > > petsc-users at mcs.anl.gov >, "PETSc" < petsc-dev at mcs.anl.gov > > > > > > > > > > > > > > > > Envoy?: Dimanche 21 Mai 2017 23:02:37 > > > > > > > > > > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > > > > > matrix > > > > > and a global vector ? > > > > > > > > > > > > > > > Franck, > > > > > > > > > > > > > > > PETSc takes care of doing the matrix-vector multiplication properly > > > > > using > > > > > MatIS. As Matt said, the layout of the vectors is the usual parallel > > > > > layout. > > > > > > > > > > > > > > > The local sizes of the MatIS matrix (i.e. the local size of the left > > > > > and > > > > > right vectors used in MatMult) are not the sizes of the local > > > > > subdomain > > > > > matrices in MatIS. 
> > > > > > > > > > > > > > > > On May 21, 2017, at 6:47 PM, Matthew Knepley < knepley at gmail.com > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < > > > > > > franck.houssen at inria.fr > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > Using PETSc MatIS, how to matmult a global IS matrix and a global > > > > > > > vector > > > > > > > ? > > > > > > > Example is attached : I don't get what I expect that is a vector > > > > > > > such > > > > > > > that > > > > > > > proc0 = [1, 2] and proc1 = [2, 1] > > > > > > > > > > > > > > > > > > > > > > > > > > > 1) I think the global size of your matrix is wrong. You seem to > > > > > > want > > > > > > 3, > > > > > > not > > > > > > 4 > > > > > > > > > > > > > > > > > > > > > 2) Global vectors have a non-overlapping row partition. You might > > > > > > be > > > > > > thinking > > > > > > of local vectors > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > What most experimenters take for granted before they begin their > > > > > > experiments > > > > > > is infinitely more interesting than any results to which their > > > > > > experiments > > > > > > lead. > > > > > > > > > > > > > > > > > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > http://www.caam.rice.edu/~mk51/ > > > > > > > > > > > > > > > > > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > > > > > > > > > > ?: "Matthew Knepley" < knepley at gmail.com > > > > > > > > > > > > > > > > Cc: "Franck Houssen" < franck.houssen at inria.fr >, "PETSc" < > > > > > petsc-users at mcs.anl.gov >, "PETSc" < petsc-dev at mcs.anl.gov > > > > > > > > > > > > > > > > Envoy?: Dimanche 21 Mai 2017 23:02:37 > > > > > > > > > > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > > > > > matrix > > > > > and a global vector ? > > > > > > > > > > > > > > > Franck, > > > > > > > > > > > > > > > PETSc takes care of doing the matrix-vector multiplication properly > > > > > using > > > > > MatIS. As Matt said, the layout of the vectors is the usual parallel > > > > > layout. > > > > > > > > > > > > > > > The local sizes of the MatIS matrix (i.e. the local size of the left > > > > > and > > > > > right vectors used in MatMult) are not the sizes of the local > > > > > subdomain > > > > > matrices in MatIS. > > > > > > > > > > > > > > > > On May 21, 2017, at 6:47 PM, Matthew Knepley < knepley at gmail.com > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < > > > > > > franck.houssen at inria.fr > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > Using PETSc MatIS, how to matmult a global IS matrix and a global > > > > > > > vector > > > > > > > ? > > > > > > > Example is attached : I don't get what I expect that is a vector > > > > > > > such > > > > > > > that > > > > > > > proc0 = [1, 2] and proc1 = [2, 1] > > > > > > > > > > > > > > > > > > > > > > > > > > > 1) I think the global size of your matrix is wrong. You seem to > > > > > > want > > > > > > 3, > > > > > > not > > > > > > 4 > > > > > > > > > > > > > > > > > > > > > 2) Global vectors have a non-overlapping row partition. 
You might > > > > > > be > > > > > > thinking > > > > > > of local vectors > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > > > > > > > > Franck > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > What most experimenters take for granted before they begin their > > > > > > experiments > > > > > > is infinitely more interesting than any results to which their > > > > > > experiments > > > > > > lead. > > > > > > > > > > > > > > > > > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > http://www.caam.rice.edu/~mk51/ > > > > > > > > > > > > > > > > > > -- > > > > > > What most experimenters take for granted before they begin their > > > experiments > > > is infinitely more interesting than any results to which their > > > experiments > > > lead. > > > > > > -- Norbert Wiener > > > > > > http://www.caam.rice.edu/~mk51/ > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 23 12:02:28 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 23 May 2017 12:02:28 -0500 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: <855172682.7687763.1495558287122.JavaMail.zimbra@inria.fr> References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> <1784716977.7683031.1495556883254.JavaMail.zimbra@inria.fr> <855172682.7687763.1495558287122.JavaMail.zimbra@inria.fr> Message-ID: On Tue, May 23, 2017 at 11:51 AM, Franck Houssen wrote: > Not sure to know what question you're talking about ?!... > I use MatIS to test some kind of domain decomposition methods. I define my > own preconditioner for that: in the apply callback, I need to matmult my > (matIS) matrix with the incoming vector. > Okay. I will create an example using your suggestion. Thanks, Matt > Franck > > ------------------------------ > > *De: *"Matthew Knepley" > *?: *"Franck Houssen" > *Cc: *"Stefano Zampini" , "PETSc" < > petsc-users at mcs.anl.gov>, "PETSc" > *Envoy?: *Mardi 23 Mai 2017 18:46:34 > *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > matrix and a global vector ? > > On Tue, May 23, 2017 at 11:28 AM, Franck Houssen > wrote: > >> OK, thanks. This is helpfull... But I really think the doc should be more >> verbose about that: this is really confusing and I didn't find any simple >> example to begin with which make all this even more confusing (personal >> opinion). >> > > Did you respond to my other question (how are you using them)? That would > help me understand how to phrase it. > > Thanks, > > Matt > > >> Franck >> >> >> ------------------------------ >> >> *De: *"Matthew Knepley" >> *?: *"Franck Houssen" >> *Cc: *"Stefano Zampini" , "PETSc" < >> petsc-users at mcs.anl.gov>, "PETSc" >> *Envoy?: *Mardi 23 Mai 2017 13:21:21 >> *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS >> matrix and a global vector ? 
>> >> On Tue, May 23, 2017 at 4:53 AM, Franck Houssen >> wrote: >> >>> The first thing I did was to put 3, not 4 : I got an error thrown in >>> MatCreateIS (see the git diff + stack below). As the error said I used >>> globalSize = numberOfMPIProcessus * localSize : my understanding is that, >>> when using MatIS, the global size needs to be the sum of all local sizes. >>> Correct ? >>> >> >> No. MatIS means that the matrix is not assembled. The easiest way (for >> me) to think of this is that processes do not have >> to hold full rows. One process can hold part of row i, and another >> processes can hold another part. However, there are still >> the same number of global rows. >> >> I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= >>> diagonal with 1.). Each local matrix correspond to one domain (each domain >>> is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 >>> domains). >>> >> >> So the global size is 3. The local size here is not the size of the local >> IS block, since that is a property only of MatIS. It is the >> size of the local piece of the vector you multiply. This allows PETSc to >> understand the parallel layout of the Vec, and how it >> matched the Mat. >> >> This is somewhat confusing because FEM people mean something different by >> "local" than we do here, and in fact we use this >> other definition of local when assembling operators. >> >> Matt >> >> >>> This is the simplest possible example: I have two 2x2 (local) diag >>> matrix that overlap so that the global matrix built from them is 1, 2, 1 on >>> the diagonal (local contributions add up in the middle). >>> I need to MatMult this global matrix with a global vector filled with 1. >>> >>> Franck >>> >>> Git diff : >>> >>> --- a/matISLocalMat.cpp >>> +++ b/matISLocalMat.cpp >>> @@ -16,7 +16,7 @@ int main(int argc,char **argv) { >>> int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) >>> return 1; >>> int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); >>> >>> - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; >>> + PetscInt localSize = 2, globalSize = 3; >>> PetscInt localIdx[2] = {0, 0}; >>> if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} >>> else {localIdx[0] = 1; localIdx[1] = 2;} >>> >>> >>> >>> Stack error: >>> >>> [0]PETSC ERROR: Nonconforming object sizes >>> [0]PETSC ERROR: Sum of local lengths 4 does not equal global length 3, >>> my local length 2 >>> [0]PETSC ERROR: [0] ISG2LMapApply line 17 /home/fghoussen/Documents/ >>> INRIA/petsc-3.7.6/src/vec/is/utils/isltog.c >>> [0]PETSC ERROR: [0] MatSetValues_IS line 692 /home/fghoussen/Documents/ >>> INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>> [0]PETSC ERROR: [0] MatSetValues line 1157 /home/fghoussen/Documents/ >>> INRIA/petsc-3.7.6/src/mat/interface/matrix.c >>> [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 >>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>> [0]PETSC ERROR: [0] MatISSetPreallocation line 80 >>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>> [0]PETSC ERROR: [0] PetscSplitOwnership line 80 >>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/sys/utils/psplit.c >>> [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 /home/fghoussen/Documents/ >>> INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c >>> [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 >>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>> [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 >>> 
/home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c >>> [0]PETSC ERROR: [0] MatCreateIS line 986 /home/fghoussen/Documents/ >>> INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>> >>> >>> >>> ------------------------------ >>> >>> *De: *"Stefano Zampini" >>> *?: *"Matthew Knepley" >>> *Cc: *"Franck Houssen" , "PETSc" < >>> petsc-users at mcs.anl.gov>, "PETSc" >>> *Envoy?: *Dimanche 21 Mai 2017 23:02:37 >>> *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS >>> matrix and a global vector ? >>> >>> Franck, >>> >>> PETSc takes care of doing the matrix-vector multiplication properly >>> using MatIS. As Matt said, the layout of the vectors is the usual parallel >>> layout. >>> The local sizes of the MatIS matrix (i.e. the local size of the left and >>> right vectors used in MatMult) are not the sizes of the local subdomain >>> matrices in MatIS. >>> >>> >>> On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: >>> >>> On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < >>> franck.houssen at inria.fr> wrote: >>> >>>> Using PETSc MatIS, how to matmult a global IS matrix and a global >>>> vector ? Example is attached : I don't get what I expect that is a vector >>>> such that proc0 = [1, 2] and proc1 = [2, 1] >>>> >>> >>> 1) I think the global size of your matrix is wrong. You seem to want 3, >>> not 4 >>> >>> 2) Global vectors have a non-overlapping row partition. You might be >>> thinking of local vectors >>> >>> Thanks, >>> >>> Matt >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >>> >>> ------------------------------ >>> >>> *De: *"Stefano Zampini" >>> *?: *"Matthew Knepley" >>> *Cc: *"Franck Houssen" , "PETSc" < >>> petsc-users at mcs.anl.gov>, "PETSc" >>> *Envoy?: *Dimanche 21 Mai 2017 23:02:37 >>> *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS >>> matrix and a global vector ? >>> >>> Franck, >>> >>> PETSc takes care of doing the matrix-vector multiplication properly >>> using MatIS. As Matt said, the layout of the vectors is the usual parallel >>> layout. >>> The local sizes of the MatIS matrix (i.e. the local size of the left and >>> right vectors used in MatMult) are not the sizes of the local subdomain >>> matrices in MatIS. >>> >>> >>> On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: >>> >>> On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < >>> franck.houssen at inria.fr> wrote: >>> >>>> Using PETSc MatIS, how to matmult a global IS matrix and a global >>>> vector ? Example is attached : I don't get what I expect that is a vector >>>> such that proc0 = [1, 2] and proc1 = [2, 1] >>>> >>> >>> 1) I think the global size of your matrix is wrong. You seem to want 3, >>> not 4 >>> >>> 2) Global vectors have a non-overlapping row partition. You might be >>> thinking of local vectors >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Franck >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From friedenhe at gmail.com Tue May 23 12:09:05 2017 From: friedenhe at gmail.com (Ping He) Date: Tue, 23 May 2017 13:09:05 -0400 Subject: [petsc-users] How to manually set the matrix-free differencing parameter h? Message-ID: <59246CB1.4020000@gmail.com> Hi, I am using PETSc-SNES matrix free approach, and I would like to know how to manually set the differencing parameter h. I tried to use the SNESDefaultMatrixFreeSetParameters2 function but I got an error when compiling: ?SNESDefaultMatrixFreeSetParameters2? was not declared in this scope. Thanks very much in advance. Regards, Ping -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue May 23 13:02:53 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 23 May 2017 13:02:53 -0500 Subject: [petsc-users] How to manually set the matrix-free differencing parameter h? In-Reply-To: <59246CB1.4020000@gmail.com> References: <59246CB1.4020000@gmail.com> Message-ID: <6D3FA436-3969-45A8-ADAD-AAEDE2FFC891@mcs.anl.gov> That's not really the right function; you can use MatMFFDSetFunctionError(), MatMFFDSetType(), MatMFFDSetPeriod() to set the available parameters. Barry > On May 23, 2017, at 12:09 PM, Ping He wrote: > > Hi, > > I am using PETSc-SNES matrix free approach, and I would like to know how to manually set the differencing parameter h. I tried to use the SNESDefaultMatrixFreeSetParameters2 function but I got an error when compiling: ?SNESDefaultMatrixFreeSetParameters2? was not declared in this scope. > > Thanks very much in advance. > > Regards, > Ping From stefano.zampini at gmail.com Tue May 23 13:23:49 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 23 May 2017 20:23:49 +0200 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? In-Reply-To: <740691579.7684644.1495557293858.JavaMail.zimbra@inria.fr> References: <867421313.6757137.1495383596545.JavaMail.zimbra@inria.fr> <1253777447.6757298.1495383780337.JavaMail.zimbra@inria.fr> <2033509705.7414108.1495532492501.JavaMail.zimbra@inria.fr> <740691579.7684644.1495557293858.JavaMail.zimbra@inria.fr> Message-ID: <9EFA5BCF-FDD3-45FA-A41A-6AA304D58C74@gmail.com> > On May 23, 2017, at 6:34 PM, Franck Houssen wrote: > > OK. I am supposed to destroy the matrix returned by MatISGetMPIXAIJ ? Yes > Also, my example still not get the final assembled local matrix (the MatCreateSubMatrix returns an empty matrix) but as far as I understand my (global) index set is OK: what did I miss ? I really doubt you can use the example you have sent. It doesn?t compile, as MatCreateSubMatrix needs an extra argument. Attached a modified version that does what I guess is what you are looking for (sequential Dirichlet problems on the subdomains). 
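(The attachment is scrubbed in this archive. As a rough sketch of the flow described here -- using the petsc-3.7.x names, where MatGetSubMatrices is the spelling of what petsc-dev calls MatCreateSubMatrices, and assuming localIdx/nLocalIdx hold this rank's sorted global subdomain indices:)

    Mat B, *subA;
    IS  rows;
    MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &B);                        /* collective: assembled MPIAIJ operator */
    ISCreateGeneral(PETSC_COMM_SELF, nLocalIdx, localIdx, PETSC_COPY_VALUES, &rows);
    MatGetSubMatrices(B, 1, &rows, &rows, MAT_INITIAL_MATRIX, &subA);  /* collective; subA[0] is this rank's
                                                                          sequential assembled block,
                                                                          e.g. diag 2, 1 on rank 1 */
    /* ... use subA[0] as the local "Dirichlet" matrix ... */
    MatDestroyMatrices(1, &subA);
    ISDestroy(&rows);
    MatDestroy(&B);   /* the matrix returned by MatISGetMPIXAIJ must be destroyed by the caller */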
> > Franck > > > De: "Stefano Zampini" > ?: "Franck Houssen" > Cc: "petsc-dev" , "PETSc users list" , "petsc-maint" > Envoy?: Mardi 23 Mai 2017 13:16:18 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? > > MatISGetMPIXAIJ is collective, as it assembles the global operator. To get the matrices you are looking for, you should call MatCreateSubMatrix on the assembled global operator, with the global indices representing the subdomain problem. Each process needs to call both functions > > Stefano > > Il 23 Mag 2017 11:41, "Franck Houssen" > ha scritto: > I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= diagonal with 1.). Each local matrix correspond to one domain (each domain is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 domains). > This is the simplest possible example: I have two 2x2 (local) diag matrix that overlap so that the global matrix built from them is 1, 2, 1 on the diagonal (local contributions add up in the middle). > > Now, I need for each MPI proc to get the assembled local matrix (sometimes called the dirichlet matrix) : this is a local matrix (sequential - not distributed with MPI) that accounts for contribution of neighboring domains (MPI proc). > > How to get the local assembled matrix ? MatGetLocalSubMatrix does not work (throw error - see example attached). MatGetSubMatrix returns a MPI distributed matrix, not a local (sequential) one. > My understanding is that MatISGetMPIXAIJ should return a local matrix (sequential AIJ matrix) : the MPI in the name recall that you get the assembled matrix (with contributions from the shared border) from the other MPI processus. Correct ? In my simple example, I replaced MatGetLocalSubMatrix with MatISGetMPIXAIJ : I get a deadlock which was surprising to me... Is MatISGetMPIXAIJ a collective call ? > Supposing this is a collective call (and that point 1 is not correct), I ride up MatISGetMPIXAIJ before the "if (rank > 0)" : I don't deadlock now, but it seems I get a global matrix which is not the assembled local matrix I am looking for. > I am supposed to destroy the matrix returned by MatISGetMPIXAIJ ? (I believe yes - not sure as AFAIU wording should associate Destroy methods to Create methods) > Franck > > The git diff illustrate modifications I tried to add to the initial file attached to this thread: > --- a/matISLocalMat.cpp > +++ b/matISLocalMat.cpp > @@ -31,6 +31,8 @@ int main(int argc,char **argv) { > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > MatView(A, PETSC_VIEWER_STDOUT_WORLD); PetscViewerFlush(PETSC_VIEWER_STDOUT_WORLD); // Diag: 1, 2, 1 > > + Mat assembledLocalMat; > + MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &assembledLocalMat); > if (rank > 0) { // Do not pollute stdout: print only 1 proc > std::cout << std::endl << "non assembled local matrix:" << std::endl << std::endl; > Mat nonAssembledLocalMat; > @@ -38,11 +40,10 @@ int main(int argc,char **argv) { > MatView(nonAssembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Diag: 1, 1 > > std::cout << std::endl << "assembled local matrix:" << std::endl << std::endl; > - Mat assembledLocalMat; > - IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, PETSC_COPY_VALUES, &is); > - MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... 
> - MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Would like to get => Diag: 2, 1 > + //IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, PETSC_COPY_VALUES, &is); > + //MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... > } > + MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_WORLD); // Would like to get => Diag: 2, 1 > > > De: "Stefano Zampini" > > ?: "petsc-maint" > > Cc: "petsc-dev" >, "PETSc users list" >, "Franck Houssen" > > Envoy?: Dimanche 21 Mai 2017 22:51:34 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? > > To assemble the operator in aij format, use > MatISGetMPIXAIJ > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatISGetMPIXAIJ.html > > Il 21 Mag 2017 18:43, "Matthew Knepley" > ha scritto: > On Sun, May 21, 2017 at 11:23 AM, Franck Houssen > wrote: > I have a 3x3 global matrix is built (diag: 1, 2, 1): it's made of 2 overlapping 2x2 local matrix (diag: 1, 1). > Getting non assembled local matrix is OK with MatISGetLocalMat. > How to get assembled local matrix (initial local matrix + neigbhor contributions on the borders) ? (expected result is diag: 2, 1) > > You can always use > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrix.html > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html > > to get copies, but if you just want to build things, you can use > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html > > Thanks, > > Matt > > Franck > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matISLocalMat.cpp Type: application/octet-stream Size: 3788 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From luvsharma11 at gmail.com Tue May 23 14:05:08 2017 From: luvsharma11 at gmail.com (Luv Sharma) Date: Tue, 23 May 2017 21:05:08 +0200 Subject: [petsc-users] matshell for spectral methods in fortran Message-ID: Dear PETSc team, I am working on a code which solves mechanical equilibrium using spectral methods. I want to make use of the matshell to get the action J*v. I have been able to successfully implement it using petsc4py. But having difficulties to get it working in a fortran code. I am using petsc-3.7.6. Below is a stripped down version of the existing fortran code (module). Can you please help me in figuring out how the right way to do it a code with following structure? !-------------------------------------------------------------------------------------------------- module spectral_mech_basic implicit none private #include ! *PETSc data here* DM .. SNES .. .. contains !-------------------------------------------------------------------------------------------------- subroutine basicPETSc_init external :: & *petsc functions here* ! 
initialize solver specific parts of PETSc call SNESCreate(PETSC_COMM_WORLD,snes,ierr); CHKERRQ(ierr) call SNESSetOptionsPrefix(snes,'mech_',ierr);CHKERRQ(ierr) call DMDACreate3d(PETSC_COMM_WORLD, & DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, & DMDA_STENCIL_BOX, & grid(1),grid(2),grid(3), & 1 , 1, worldsize, & 9, 0, & grid(1),grid(2),localK, & da,ierr) CHKERRQ(ierr) call SNESSetDM(snes,da,ierr); CHKERRQ(ierr) call DMCreateGlobalVector(da,solution_vec,ierr); CHKERRQ(ierr) call DMDASNESSetFunctionLocal(da,INSERT_VALUES,BasicPETSC_formResidual,PETSC_NULL_OBJECT,ierr) CHKERRQ(ierr) call SNESSetDM(snes,da,ierr); CHKERRQ(ierr) call SNESGetKSP(snes,ksp,ierr); CHKERRQ(ierr) call KSPGetPC(ksp,pc,ierr); CHKERRQ(ierr) call PCSetType(pc,PCNONE,ierr); CHKERRQ(ierr) call SNESSetFromOptions(snes,ierr); CHKERRQ(ierr) ! init fields call DMDAVecGetArrayF90(da,solution_vec,F,ierr); CHKERRQ(ierr) call DMDAVecRestoreArrayF90(da,solution_vec,F,ierr); CHKERRQ(ierr) end subroutine basicPETSc_init !-------------------------------------------------------------------------------------------------- type(tSolutionState) function & basicPETSc_solution(incInfoIn,timeinc,timeinc_old,stress_BC,rotation_BC) implicit none ! PETSc Data PetscErrorCode :: ierr SNESConvergedReason :: reason external :: & SNESSolve, & ! solve BVP call SNESSolve(snes,PETSC_NULL_OBJECT,solution_vec,ierr) CHKERRQ(ierr) end function BasicPETSc_solution !-------------------------------------------------------------------------------------------------- !> @brief forms the basic residual vector !-------------------------------------------------------------------------------------------------- subroutine BasicPETSC_formResidual(in,x_scal,f_scal,dummy,ierr) implicit none DMDALocalInfo, dimension(DMDA_LOCAL_INFO_SIZE) :: & in PetscScalar, dimension(3,3, & XG_RANGE,YG_RANGE,ZG_RANGE), intent(in) :: & x_scal PetscScalar, dimension(3,3, & X_RANGE,Y_RANGE,Z_RANGE), intent(out) :: & f_scal ! constructing residual ?.. ?. f_scal = tensorField_real(1:3,1:3,1:grid(1),1:grid(2),1:grid3) end subroutine BasicPETSc_formResidual !-------------------------------------------------------------------------------------------------- end module spectral_mech_basic !-------------------------------------------------------------------------------------------------- Best regards, Luv > On 3 Nov 2016, at 01:17, Barry Smith wrote: > > > Is anyone away of cases where PETSc has been used with spectral methods? > > Thanks > > Barry > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Tue May 23 14:09:16 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Tue, 23 May 2017 21:09:16 +0200 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: <1784716977.7683031.1495556883254.JavaMail.zimbra@inria.fr> References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <1564257107.6757440.1495383974167.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> <1784716977.7683031.1495556883254.JavaMail.zimbra@inria.fr> Message-ID: Il 23 Mag 2017 6:28 PM, "Franck Houssen" ha scritto: OK, thanks. This is helpfull... But I really think the doc should be more verbose about that: this is really confusing and I didn't find any simple example to begin with which make all this even more confusing (personal opinion). 
The man page of MatCreateIS is clear to me http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Mat/MatCreateIS.html#MatCreateIS Franck ------------------------------ *De: *"Matthew Knepley" *?: *"Franck Houssen" *Cc: *"Stefano Zampini" , "PETSc" < petsc-users at mcs.anl.gov>, "PETSc" *Envoy?: *Mardi 23 Mai 2017 13:21:21 *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? On Tue, May 23, 2017 at 4:53 AM, Franck Houssen wrote: > The first thing I did was to put 3, not 4 : I got an error thrown in > MatCreateIS (see the git diff + stack below). As the error said I used > globalSize = numberOfMPIProcessus * localSize : my understanding is that, > when using MatIS, the global size needs to be the sum of all local sizes. > Correct ? > No. MatIS means that the matrix is not assembled. The easiest way (for me) to think of this is that processes do not have to hold full rows. One process can hold part of row i, and another processes can hold another part. However, there are still the same number of global rows. I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= > diagonal with 1.). Each local matrix correspond to one domain (each domain > is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 > domains). > So the global size is 3. The local size here is not the size of the local IS block, since that is a property only of MatIS. It is the size of the local piece of the vector you multiply. This allows PETSc to understand the parallel layout of the Vec, and how it matched the Mat. This is somewhat confusing because FEM people mean something different by "local" than we do here, and in fact we use this other definition of local when assembling operators. Matt > This is the simplest possible example: I have two 2x2 (local) diag matrix > that overlap so that the global matrix built from them is 1, 2, 1 on the > diagonal (local contributions add up in the middle). > I need to MatMult this global matrix with a global vector filled with 1. 
> > Franck > > Git diff : > > --- a/matISLocalMat.cpp > +++ b/matISLocalMat.cpp > @@ -16,7 +16,7 @@ int main(int argc,char **argv) { > int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) > return 1; > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; > + PetscInt localSize = 2, globalSize = 3; > PetscInt localIdx[2] = {0, 0}; > if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > else {localIdx[0] = 1; localIdx[1] = 2;} > > > > Stack error: > > [0]PETSC ERROR: Nonconforming object sizes > [0]PETSC ERROR: Sum of local lengths 4 does not equal global length 3, my > local length 2 > [0]PETSC ERROR: [0] ISG2LMapApply line 17 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/vec/is/utils/isltog.c > [0]PETSC ERROR: [0] MatSetValues_IS line 692 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] MatSetValues line 1157 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] MatISSetPreallocation line 80 > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] PetscSplitOwnership line 80 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/sys/utils/psplit.c > [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [0]PETSC ERROR: [0] MatCreateIS line 986 /home/fghoussen/Documents/ > INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > ------------------------------ > > *De: *"Stefano Zampini" > *?: *"Matthew Knepley" > *Cc: *"Franck Houssen" , "PETSc" < > petsc-users at mcs.anl.gov>, "PETSc" > *Envoy?: *Dimanche 21 Mai 2017 23:02:37 > *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > matrix and a global vector ? > > Franck, > > PETSc takes care of doing the matrix-vector multiplication properly using > MatIS. As Matt said, the layout of the vectors is the usual parallel > layout. > The local sizes of the MatIS matrix (i.e. the local size of the left and > right vectors used in MatMult) are not the sizes of the local subdomain > matrices in MatIS. > > > On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen > wrote: > >> Using PETSc MatIS, how to matmult a global IS matrix and a global vector >> ? Example is attached : I don't get what I expect that is a vector such >> that proc0 = [1, 2] and proc1 = [2, 1] >> > > 1) I think the global size of your matrix is wrong. You seem to want 3, > not 4 > > 2) Global vectors have a non-overlapping row partition. You might be > thinking of local vectors > > Thanks, > > Matt > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > ------------------------------ > > *De: *"Stefano Zampini" > *?: *"Matthew Knepley" > *Cc: *"Franck Houssen" , "PETSc" < > petsc-users at mcs.anl.gov>, "PETSc" > *Envoy?: *Dimanche 21 Mai 2017 23:02:37 > *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > matrix and a global vector ? > > Franck, > > PETSc takes care of doing the matrix-vector multiplication properly using > MatIS. As Matt said, the layout of the vectors is the usual parallel > layout. > The local sizes of the MatIS matrix (i.e. the local size of the left and > right vectors used in MatMult) are not the sizes of the local subdomain > matrices in MatIS. > > > On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen > wrote: > >> Using PETSc MatIS, how to matmult a global IS matrix and a global vector >> ? Example is attached : I don't get what I expect that is a vector such >> that proc0 = [1, 2] and proc1 = [2, 1] >> > > 1) I think the global size of your matrix is wrong. You seem to want 3, > not 4 > > 2) Global vectors have a non-overlapping row partition. You might be > thinking of local vectors > > Thanks, > > Matt > > >> Franck >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue May 23 14:16:27 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 23 May 2017 14:16:27 -0500 Subject: [petsc-users] matshell for spectral methods in fortran In-Reply-To: References: Message-ID: <72BF559A-E663-4F04-93A2-222D4AD8A94B@mcs.anl.gov> You didn't include any code related to creating or setting the MATSHELL. What goes wrong with your Fortran code. > On May 23, 2017, at 2:05 PM, Luv Sharma wrote: > > Dear PETSc team, > > I am working on a code which solves mechanical equilibrium using spectral methods. > I want to make use of the matshell to get the action J*v. > > I have been able to successfully implement it using petsc4py. But having difficulties to get it working in a fortran code. > I am using petsc-3.7.6. > > Below is a stripped down version of the existing fortran code (module). Can you please help me in figuring out how the right way to do it a code with following structure? > > !-------------------------------------------------------------------------------------------------- > module spectral_mech_basic > > implicit none > private > #include > > ! *PETSc data here* > DM .. > SNES .. > .. > contains > > !-------------------------------------------------------------------------------------------------- > subroutine basicPETSc_init > > external :: & > *petsc functions here* > > ! 
initialize solver specific parts of PETSc > call SNESCreate(PETSC_COMM_WORLD,snes,ierr); CHKERRQ(ierr) > call SNESSetOptionsPrefix(snes,'mech_',ierr);CHKERRQ(ierr) > call DMDACreate3d(PETSC_COMM_WORLD, & > DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, & > DMDA_STENCIL_BOX, & > grid(1),grid(2),grid(3), & > 1 , 1, worldsize, & > 9, 0, & > grid(1),grid(2),localK, & > da,ierr) > CHKERRQ(ierr) > call SNESSetDM(snes,da,ierr); CHKERRQ(ierr) > call DMCreateGlobalVector(da,solution_vec,ierr); CHKERRQ(ierr) > call DMDASNESSetFunctionLocal(da,INSERT_VALUES,BasicPETSC_formResidual,PETSC_NULL_OBJECT,ierr) > CHKERRQ(ierr) > call SNESSetDM(snes,da,ierr); CHKERRQ(ierr) > call SNESGetKSP(snes,ksp,ierr); CHKERRQ(ierr) > call KSPGetPC(ksp,pc,ierr); CHKERRQ(ierr) > call PCSetType(pc,PCNONE,ierr); CHKERRQ(ierr) > call SNESSetFromOptions(snes,ierr); CHKERRQ(ierr) > > ! init fields > call DMDAVecGetArrayF90(da,solution_vec,F,ierr); CHKERRQ(ierr) > call DMDAVecRestoreArrayF90(da,solution_vec,F,ierr); CHKERRQ(ierr) > > end subroutine basicPETSc_init > !-------------------------------------------------------------------------------------------------- > > type(tSolutionState) function & > basicPETSc_solution(incInfoIn,timeinc,timeinc_old,stress_BC,rotation_BC) > implicit none > ! PETSc Data > PetscErrorCode :: ierr > SNESConvergedReason :: reason > external :: & > SNESSolve, & > ! solve BVP > call SNESSolve(snes,PETSC_NULL_OBJECT,solution_vec,ierr) > CHKERRQ(ierr) > end function BasicPETSc_solution > !-------------------------------------------------------------------------------------------------- > !> @brief forms the basic residual vector > !-------------------------------------------------------------------------------------------------- > subroutine BasicPETSC_formResidual(in,x_scal,f_scal,dummy,ierr) > > implicit none > DMDALocalInfo, dimension(DMDA_LOCAL_INFO_SIZE) :: & > in > PetscScalar, dimension(3,3, & > XG_RANGE,YG_RANGE,ZG_RANGE), intent(in) :: & > x_scal > PetscScalar, dimension(3,3, & > X_RANGE,Y_RANGE,Z_RANGE), intent(out) :: & > f_scal > ! constructing residual > ?.. > ?. > > f_scal = tensorField_real(1:3,1:3,1:grid(1),1:grid(2),1:grid3) > > end subroutine BasicPETSc_formResidual > !-------------------------------------------------------------------------------------------------- > > end module spectral_mech_basic > !-------------------------------------------------------------------------------------------------- > > Best regards, > Luv > >> On 3 Nov 2016, at 01:17, Barry Smith wrote: >> >> >> Is anyone away of cases where PETSc has been used with spectral methods? >> >> Thanks >> >> Barry >> From luvsharma11 at gmail.com Tue May 23 14:36:12 2017 From: luvsharma11 at gmail.com (Luv Sharma) Date: Tue, 23 May 2017 21:36:12 +0200 Subject: [petsc-users] matshell for spectral methods in fortran In-Reply-To: <72BF559A-E663-4F04-93A2-222D4AD8A94B@mcs.anl.gov> References: <72BF559A-E663-4F04-93A2-222D4AD8A94B@mcs.anl.gov> Message-ID: <8B5B69FE-F0A1-4442-9B03-3A9DBA447B2E@gmail.com> Dear Barry, Thanks for your quick reply. I have tried following: !-------------------------------------------------------------------------------------------------- module spectral_mech_basic implicit none private #include ! *PETSc data here* DM .. SNES .. .. contains !-------------------------------------------------------------------------------------------------- subroutine basicPETSc_init external :: & *petsc functions here* ! 
initialize solver specific parts of PETSc call SNESCreate(PETSC_COMM_WORLD,snes,ierr); CHKERRQ(ierr) call SNESSetOptionsPrefix(snes,'mech_',ierr);CHKERRQ(ierr) call DMDACreate3d(PETSC_COMM_WORLD, & DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, & DMDA_STENCIL_BOX, & grid(1),grid(2),grid(3), & 1 , 1, worldsize, & 9, 0, & grid(1),grid(2),localK, & da,ierr) CHKERRQ(ierr) call SNESSetDM(snes,da,ierr); CHKERRQ(ierr) call DMCreateGlobalVector(da,solution_vec,ierr); CHKERRQ(ierr) call DMDASNESSetFunctionLocal(da,INSERT_VALUES,BasicPETSC_formResidual,PETSC_NULL_OBJECT,ierr) CHKERRQ(ierr) !call DMCreateMatrix(da,J_shell,ierr) !CHKERRQ(ierr) !call DMSetMatType(da,MATSHELL,ierr) !CHKERRQ(ierr) !call DMSNESSetJacobianLocal(da,SPEC_mech_formJacobian,PETSC_NULL_OBJECT,ierr) !< function to evaluate stiffness matrix !CHKERRQ(ierr) matsize = 9_pInt*grid(1)*grid(2)*grid(3) call MatCreateShell( PETSC_COMM_WORLD, matsize, matsize, matsize, matsize, PETSC_NULL_OBJECT, J_shell, ierr ) CHKERRQ(ierr) call SNESSetJacobian( snes, J_shell, J_shell, SPEC_mech_formJacobian, PETSC_NULL_OBJECT, ierr) CHKERRQ(ierr) call MatShellSetOperation(J_shell, MATOP_MULT, jac_shell, ierr ) CHKERRQ(ierr) call SNESSetDM(snes,da,ierr); CHKERRQ(ierr) call SNESGetKSP(snes,ksp,ierr); CHKERRQ(ierr) call KSPGetPC(ksp,pc,ierr); CHKERRQ(ierr) call PCSetType(pc,PCNONE,ierr); CHKERRQ(ierr) call SNESSetFromOptions(snes,ierr); CHKERRQ(ierr) ! init fields call DMDAVecGetArrayF90(da,solution_vec,F,ierr); CHKERRQ(ierr) call DMDAVecRestoreArrayF90(da,solution_vec,F,ierr); CHKERRQ(ierr) end subroutine basicPETSc_init !-------------------------------------------------------------------------------------------------- type(tSolutionState) function & basicPETSc_solution(incInfoIn,timeinc,timeinc_old,stress_BC,rotation_BC) implicit none ! PETSc Data PetscErrorCode :: ierr SNESConvergedReason :: reason external :: & SNESSolve, & ! solve BVP call SNESSolve(snes,PETSC_NULL_OBJECT,solution_vec,ierr) CHKERRQ(ierr) end function BasicPETSc_solution !-------------------------------------------------------------------------------------------------- !> @brief forms the basic residual vector !-------------------------------------------------------------------------------------------------- subroutine BasicPETSC_formResidual(in,x_scal,f_scal,dummy,ierr) implicit none DMDALocalInfo, dimension(DMDA_LOCAL_INFO_SIZE) :: & in PetscScalar, dimension(3,3, & XG_RANGE,YG_RANGE,ZG_RANGE), intent(in) :: & x_scal PetscScalar, dimension(3,3, & X_RANGE,Y_RANGE,Z_RANGE), intent(out) :: & f_scal ! constructing residual ?.. ?. f_scal = tensorField_real(1:3,1:3,1:grid(1),1:grid(2),1:grid3) end subroutine BasicPETSc_formResidual !????????????????????????????????????????????????? !> @brief matmult routine !-------------------------------------------------------------------------------------------------- ! a shell jacobian; returns the action J*v subroutine jac_shell(Jshell,v_in,v_out) use math, only: & math_rotate_backward33, & math_transpose33, & math_mul3333xx33 use mesh, only: & grid, & grid3 use spectral_utilities, only: & wgt, & tensorField_real, & utilities_FFTtensorForward, & utilities_fourierGammaConvolution, & utilities_FFTtensorBackward, & Utilities_constitutiveResponse, & Utilities_divergenceRMS implicit none ! DMDALocalInfo, dimension(DMDA_LOCAL_INFO_SIZE) :: & ! 
in PetscScalar, dimension(3,3, & 1000,1,1), intent(in) :: & v_in PetscScalar, dimension(3,3, & 1000,1,1), intent(out) :: & v_out Mat :: Jshell PetscErrorCode :: ierr integer(pInt) :: & i,j,k,e e = 0_pInt tensorField_real = 0.0_pReal print*, SHAPE(v_in) print*, SHAPE(v_out) do k = 1_pInt, grid3; do j = 1_pInt, grid(2); do i = 1_pInt, grid(1) e = e + 1_pInt tensorField_real(1:3,1:3,i,j,k) = j*v enddo; enddo; enddo call utilities_FFTtensorForward() call utilities_fourierGammaConvolution() call utilities_FFTtensorBackward() v_out = tensorField_real(1:3,1:3,1:grid(1),1:grid(2),1:grid3) end subroutine jac_shell subroutine SPEC_mech_formJacobian(snes,xx_local,Jac_pre,Jac,dummy,ierr) implicit none SNES :: snes DM :: dm_local Vec :: x_local, xx_local Mat :: Jac_pre, Jac PetscObject :: dummy PetscErrorCode :: ierr end subroutine SPEC_mech_formJacobian end module spectral_mech_basic !-------------------------------------------------------------------------------------------------- Best regards, Luv > On 23 May 2017, at 21:16, Barry Smith wrote: > > > You didn't include any code related to creating or setting the MATSHELL. What goes wrong with your Fortran code. > > >> On May 23, 2017, at 2:05 PM, Luv Sharma wrote: >> >> Dear PETSc team, >> >> I am working on a code which solves mechanical equilibrium using spectral methods. >> I want to make use of the matshell to get the action J*v. >> >> I have been able to successfully implement it using petsc4py. But having difficulties to get it working in a fortran code. >> I am using petsc-3.7.6. >> >> Below is a stripped down version of the existing fortran code (module). Can you please help me in figuring out how the right way to do it a code with following structure? >> >> !-------------------------------------------------------------------------------------------------- >> module spectral_mech_basic >> >> implicit none >> private >> #include >> >> ! *PETSc data here* >> DM .. >> SNES .. >> .. >> contains >> >> !-------------------------------------------------------------------------------------------------- >> subroutine basicPETSc_init >> >> external :: & >> *petsc functions here* >> >> ! initialize solver specific parts of PETSc >> call SNESCreate(PETSC_COMM_WORLD,snes,ierr); CHKERRQ(ierr) >> call SNESSetOptionsPrefix(snes,'mech_',ierr);CHKERRQ(ierr) >> call DMDACreate3d(PETSC_COMM_WORLD, & >> DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, & >> DMDA_STENCIL_BOX, & >> grid(1),grid(2),grid(3), & >> 1 , 1, worldsize, & >> 9, 0, & >> grid(1),grid(2),localK, & >> da,ierr) >> CHKERRQ(ierr) >> call SNESSetDM(snes,da,ierr); CHKERRQ(ierr) >> call DMCreateGlobalVector(da,solution_vec,ierr); CHKERRQ(ierr) >> call DMDASNESSetFunctionLocal(da,INSERT_VALUES,BasicPETSC_formResidual,PETSC_NULL_OBJECT,ierr) >> CHKERRQ(ierr) >> call SNESSetDM(snes,da,ierr); CHKERRQ(ierr) >> call SNESGetKSP(snes,ksp,ierr); CHKERRQ(ierr) >> call KSPGetPC(ksp,pc,ierr); CHKERRQ(ierr) >> call PCSetType(pc,PCNONE,ierr); CHKERRQ(ierr) >> call SNESSetFromOptions(snes,ierr); CHKERRQ(ierr) >> >> ! init fields >> call DMDAVecGetArrayF90(da,solution_vec,F,ierr); CHKERRQ(ierr) >> call DMDAVecRestoreArrayF90(da,solution_vec,F,ierr); CHKERRQ(ierr) >> >> end subroutine basicPETSc_init >> !-------------------------------------------------------------------------------------------------- >> >> type(tSolutionState) function & >> basicPETSc_solution(incInfoIn,timeinc,timeinc_old,stress_BC,rotation_BC) >> implicit none >> ! 
PETSc Data >> PetscErrorCode :: ierr >> SNESConvergedReason :: reason >> external :: & >> SNESSolve, & >> ! solve BVP >> call SNESSolve(snes,PETSC_NULL_OBJECT,solution_vec,ierr) >> CHKERRQ(ierr) >> end function BasicPETSc_solution >> !-------------------------------------------------------------------------------------------------- >> !> @brief forms the basic residual vector >> !-------------------------------------------------------------------------------------------------- >> subroutine BasicPETSC_formResidual(in,x_scal,f_scal,dummy,ierr) >> >> implicit none >> DMDALocalInfo, dimension(DMDA_LOCAL_INFO_SIZE) :: & >> in >> PetscScalar, dimension(3,3, & >> XG_RANGE,YG_RANGE,ZG_RANGE), intent(in) :: & >> x_scal >> PetscScalar, dimension(3,3, & >> X_RANGE,Y_RANGE,Z_RANGE), intent(out) :: & >> f_scal >> ! constructing residual >> ?.. >> ?. >> >> f_scal = tensorField_real(1:3,1:3,1:grid(1),1:grid(2),1:grid3) >> >> end subroutine BasicPETSc_formResidual >> !-------------------------------------------------------------------------------------------------- >> >> end module spectral_mech_basic >> !-------------------------------------------------------------------------------------------------- >> >> Best regards, >> Luv >> >>> On 3 Nov 2016, at 01:17, Barry Smith wrote: >>> >>> >>> Is anyone away of cases where PETSc has been used with spectral methods? >>> >>> Thanks >>> >>> Barry >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue May 23 15:20:11 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 23 May 2017 15:20:11 -0500 Subject: [petsc-users] matshell for spectral methods in fortran In-Reply-To: <8B5B69FE-F0A1-4442-9B03-3A9DBA447B2E@gmail.com> References: <72BF559A-E663-4F04-93A2-222D4AD8A94B@mcs.anl.gov> <8B5B69FE-F0A1-4442-9B03-3A9DBA447B2E@gmail.com> Message-ID: <19D5F486-4121-4954-A8C4-072E78CFF0CA@mcs.anl.gov> I cannot easily see why this would or wouldn't work. If you send me the entire code as an attachment and makefile I can try to run it. Barry > On May 23, 2017, at 2:36 PM, Luv Sharma wrote: > > Dear Barry, > > Thanks for your quick reply. > I have tried following: > > !-------------------------------------------------------------------------------------------------- > module spectral_mech_basic > > implicit none > private > #include > > ! *PETSc data here* > DM .. > SNES .. > .. > contains > > !-------------------------------------------------------------------------------------------------- > subroutine basicPETSc_init > > external :: & > *petsc functions here* > > ! 
initialize solver specific parts of PETSc > call SNESCreate(PETSC_COMM_WORLD,snes,ierr); CHKERRQ(ierr) > call SNESSetOptionsPrefix(snes,'mech_',ierr);CHKERRQ(ierr) > call DMDACreate3d(PETSC_COMM_WORLD, & > DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, & > DMDA_STENCIL_BOX, & > grid(1),grid(2),grid(3), & > 1 , 1, worldsize, & > 9, 0, & > grid(1),grid(2),localK, & > da,ierr) > CHKERRQ(ierr) > call SNESSetDM(snes,da,ierr); CHKERRQ(ierr) > call DMCreateGlobalVector(da,solution_vec,ierr); CHKERRQ(ierr) > call DMDASNESSetFunctionLocal(da,INSERT_VALUES,BasicPETSC_formResidual,PETSC_NULL_OBJECT,ierr) > CHKERRQ(ierr) > > > !call DMCreateMatrix(da,J_shell,ierr) > !CHKERRQ(ierr) > !call DMSetMatType(da,MATSHELL,ierr) > !CHKERRQ(ierr) > !call DMSNESSetJacobianLocal(da,SPEC_mech_formJacobian,PETSC_NULL_OBJECT,ierr) !< function to evaluate stiffness matrix > !CHKERRQ(ierr) > > > matsize = 9_pInt*grid(1)*grid(2)*grid(3) > call MatCreateShell( PETSC_COMM_WORLD, matsize, matsize, matsize, matsize, PETSC_NULL_OBJECT, J_shell, ierr ) > CHKERRQ(ierr) > > call SNESSetJacobian( snes, J_shell, J_shell, SPEC_mech_formJacobian, PETSC_NULL_OBJECT, ierr) > CHKERRQ(ierr) > > call MatShellSetOperation(J_shell, MATOP_MULT, jac_shell, ierr ) > CHKERRQ(ierr) > > call SNESSetDM(snes,da,ierr); CHKERRQ(ierr) > call SNESGetKSP(snes,ksp,ierr); CHKERRQ(ierr) > call KSPGetPC(ksp,pc,ierr); CHKERRQ(ierr) > call PCSetType(pc,PCNONE,ierr); CHKERRQ(ierr) > call SNESSetFromOptions(snes,ierr); CHKERRQ(ierr) > > ! init fields > call DMDAVecGetArrayF90(da,solution_vec,F,ierr); CHKERRQ(ierr) > call DMDAVecRestoreArrayF90(da,solution_vec,F,ierr); CHKERRQ(ierr) > > end subroutine basicPETSc_init > !-------------------------------------------------------------------------------------------------- > > type(tSolutionState) function & > basicPETSc_solution(incInfoIn,timeinc,timeinc_old,stress_BC,rotation_BC) > implicit none > ! PETSc Data > PetscErrorCode :: ierr > SNESConvergedReason :: reason > external :: & > SNESSolve, & > ! solve BVP > call SNESSolve(snes,PETSC_NULL_OBJECT,solution_vec,ierr) > CHKERRQ(ierr) > end function BasicPETSc_solution > !-------------------------------------------------------------------------------------------------- > !> @brief forms the basic residual vector > !-------------------------------------------------------------------------------------------------- > subroutine BasicPETSC_formResidual(in,x_scal,f_scal,dummy,ierr) > > implicit none > DMDALocalInfo, dimension(DMDA_LOCAL_INFO_SIZE) :: & > in > PetscScalar, dimension(3,3, & > XG_RANGE,YG_RANGE,ZG_RANGE), intent(in) :: & > x_scal > PetscScalar, dimension(3,3, & > X_RANGE,Y_RANGE,Z_RANGE), intent(out) :: & > f_scal > ! constructing residual > ?.. > ?. > > f_scal = tensorField_real(1:3,1:3,1:grid(1),1:grid(2),1:grid3) > > end subroutine BasicPETSc_formResidual > !????????????????????????????????????????????????? > !> @brief matmult routine > !-------------------------------------------------------------------------------------------------- > ! a shell jacobian; returns the action J*v > subroutine jac_shell(Jshell,v_in,v_out) > > use math, only: & > math_rotate_backward33, & > math_transpose33, & > math_mul3333xx33 > use mesh, only: & > grid, & > grid3 > use spectral_utilities, only: & > wgt, & > tensorField_real, & > utilities_FFTtensorForward, & > utilities_fourierGammaConvolution, & > utilities_FFTtensorBackward, & > Utilities_constitutiveResponse, & > Utilities_divergenceRMS > implicit none > ! 
DMDALocalInfo, dimension(DMDA_LOCAL_INFO_SIZE) :: & > ! in > PetscScalar, dimension(3,3, & > 1000,1,1), intent(in) :: & > v_in > PetscScalar, dimension(3,3, & > 1000,1,1), intent(out) :: & > v_out > > Mat :: Jshell > PetscErrorCode :: ierr > integer(pInt) :: & > i,j,k,e > > e = 0_pInt > > tensorField_real = 0.0_pReal > print*, SHAPE(v_in) > print*, SHAPE(v_out) > > do k = 1_pInt, grid3; do j = 1_pInt, grid(2); do i = 1_pInt, grid(1) > e = e + 1_pInt > tensorField_real(1:3,1:3,i,j,k) = j*v > enddo; enddo; enddo > call utilities_FFTtensorForward() > call utilities_fourierGammaConvolution() > call utilities_FFTtensorBackward() > v_out = tensorField_real(1:3,1:3,1:grid(1),1:grid(2),1:grid3) > > end subroutine jac_shell > > > subroutine SPEC_mech_formJacobian(snes,xx_local,Jac_pre,Jac,dummy,ierr) > implicit none > SNES :: snes > DM :: dm_local > Vec :: x_local, xx_local > Mat :: Jac_pre, Jac > PetscObject :: dummy > PetscErrorCode :: ierr > > end subroutine SPEC_mech_formJacobian > > > > end module spectral_mech_basic > !-------------------------------------------------------------------------------------------------- > > Best regards, > Luv > > > > >> On 23 May 2017, at 21:16, Barry Smith wrote: >> >> >> You didn't include any code related to creating or setting the MATSHELL. What goes wrong with your Fortran code. >> >> >>> On May 23, 2017, at 2:05 PM, Luv Sharma wrote: >>> >>> Dear PETSc team, >>> >>> I am working on a code which solves mechanical equilibrium using spectral methods. >>> I want to make use of the matshell to get the action J*v. >>> >>> I have been able to successfully implement it using petsc4py. But having difficulties to get it working in a fortran code. >>> I am using petsc-3.7.6. >>> >>> Below is a stripped down version of the existing fortran code (module). Can you please help me in figuring out how the right way to do it a code with following structure? >>> >>> !-------------------------------------------------------------------------------------------------- >>> module spectral_mech_basic >>> >>> implicit none >>> private >>> #include >>> >>> ! *PETSc data here* >>> DM .. >>> SNES .. >>> .. >>> contains >>> >>> !-------------------------------------------------------------------------------------------------- >>> subroutine basicPETSc_init >>> >>> external :: & >>> *petsc functions here* >>> >>> ! initialize solver specific parts of PETSc >>> call SNESCreate(PETSC_COMM_WORLD,snes,ierr); CHKERRQ(ierr) >>> call SNESSetOptionsPrefix(snes,'mech_',ierr);CHKERRQ(ierr) >>> call DMDACreate3d(PETSC_COMM_WORLD, & >>> DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, & >>> DMDA_STENCIL_BOX, & >>> grid(1),grid(2),grid(3), & >>> 1 , 1, worldsize, & >>> 9, 0, & >>> grid(1),grid(2),localK, & >>> da,ierr) >>> CHKERRQ(ierr) >>> call SNESSetDM(snes,da,ierr); CHKERRQ(ierr) >>> call DMCreateGlobalVector(da,solution_vec,ierr); CHKERRQ(ierr) >>> call DMDASNESSetFunctionLocal(da,INSERT_VALUES,BasicPETSC_formResidual,PETSC_NULL_OBJECT,ierr) >>> CHKERRQ(ierr) >>> call SNESSetDM(snes,da,ierr); CHKERRQ(ierr) >>> call SNESGetKSP(snes,ksp,ierr); CHKERRQ(ierr) >>> call KSPGetPC(ksp,pc,ierr); CHKERRQ(ierr) >>> call PCSetType(pc,PCNONE,ierr); CHKERRQ(ierr) >>> call SNESSetFromOptions(snes,ierr); CHKERRQ(ierr) >>> >>> ! 
init fields >>> call DMDAVecGetArrayF90(da,solution_vec,F,ierr); CHKERRQ(ierr) >>> call DMDAVecRestoreArrayF90(da,solution_vec,F,ierr); CHKERRQ(ierr) >>> >>> end subroutine basicPETSc_init >>> !-------------------------------------------------------------------------------------------------- >>> >>> type(tSolutionState) function & >>> basicPETSc_solution(incInfoIn,timeinc,timeinc_old,stress_BC,rotation_BC) >>> implicit none >>> ! PETSc Data >>> PetscErrorCode :: ierr >>> SNESConvergedReason :: reason >>> external :: & >>> SNESSolve, & >>> ! solve BVP >>> call SNESSolve(snes,PETSC_NULL_OBJECT,solution_vec,ierr) >>> CHKERRQ(ierr) >>> end function BasicPETSc_solution >>> !-------------------------------------------------------------------------------------------------- >>> !> @brief forms the basic residual vector >>> !-------------------------------------------------------------------------------------------------- >>> subroutine BasicPETSC_formResidual(in,x_scal,f_scal,dummy,ierr) >>> >>> implicit none >>> DMDALocalInfo, dimension(DMDA_LOCAL_INFO_SIZE) :: & >>> in >>> PetscScalar, dimension(3,3, & >>> XG_RANGE,YG_RANGE,ZG_RANGE), intent(in) :: & >>> x_scal >>> PetscScalar, dimension(3,3, & >>> X_RANGE,Y_RANGE,Z_RANGE), intent(out) :: & >>> f_scal >>> ! constructing residual >>> ?.. >>> ?. >>> >>> f_scal = tensorField_real(1:3,1:3,1:grid(1),1:grid(2),1:grid3) >>> >>> end subroutine BasicPETSc_formResidual >>> !-------------------------------------------------------------------------------------------------- >>> >>> end module spectral_mech_basic >>> !-------------------------------------------------------------------------------------------------- >>> >>> Best regards, >>> Luv >>> >>>> On 3 Nov 2016, at 01:17, Barry Smith wrote: >>>> >>>> >>>> Is anyone away of cases where PETSc has been used with spectral methods? >>>> >>>> Thanks >>>> >>>> Barry >>>> >> > From friedenhe at gmail.com Tue May 23 18:40:44 2017 From: friedenhe at gmail.com (Ping He) Date: Tue, 23 May 2017 19:40:44 -0400 Subject: [petsc-users] How to manually set the matrix-free differencing parameter h? In-Reply-To: <6D3FA436-3969-45A8-ADAD-AAEDE2FFC891@mcs.anl.gov> References: <59246CB1.4020000@gmail.com> <6D3FA436-3969-45A8-ADAD-AAEDE2FFC891@mcs.anl.gov> Message-ID: <5924C87C.90405@gmail.com> Hi Barry, Thanks for your reply. It is working. Regards, Ping On 05/23/2017 02:02 PM, Barry Smith wrote: > That's not really the right function; you can use MatMFFDSetFunctionError(), MatMFFDSetType(), MatMFFDSetPeriod() to set the available parameters. > > > Barry > > >> On May 23, 2017, at 12:09 PM, Ping He wrote: >> >> Hi, >> >> I am using PETSc-SNES matrix free approach, and I would like to know how to manually set the differencing parameter h. I tried to use the SNESDefaultMatrixFreeSetParameters2 function but I got an error when compiling: ?SNESDefaultMatrixFreeSetParameters2? was not declared in this scope. >> >> Thanks very much in advance. >> >> Regards, >> Ping From lirui319 at hnu.edu.cn Wed May 24 00:14:54 2017 From: lirui319 at hnu.edu.cn (=?GBK?B?wO7I8A==?=) Date: Wed, 24 May 2017 13:14:54 +0800 (GMT+08:00) Subject: [petsc-users] Installation Error Message-ID: <15e1cc1.5bb6.15c38e117d7.Coremail.lirui319@hnu.edu.cn> Dear professor or engineer: I meet a problem about installation to petsc. When I type the code "./configure --with-cc=gcc --with-cxx=0 --with-fc=0 --download-f2cblaslapack --download-mpich" on my terminal,the answer reveals the following results. >>>ERROR:root:code for hash md5 was not found. 
Traceback (most recent call last): File "/home/zhuizhuluori/lirui/software/vapor-2.5.0-Linux_x86_64/vapor/vapor-2.5.0/lib/python2.7/hashlib.py", line 139, in Hi, I want to be able to perform matrix operations on several contiguous submatrices of a full matrix, without allocating the memory redundantly for the submatrices (in addition to the memory that is already allocated for the full matrix). I tried using MatGetSubMatrix, but this function appears to allocate the additional memory. The other way I found to do this is to create the smallest submatrices I need first, then use MatCreateNest to combine them into bigger ones (including the full matrix). The documentation of MatCreateNest seems to indicate that it does not allocate additional memory for storing the new matrix. Is this the right approach, or is there a better one? Thanks, Michal Derezinski. -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Wed May 24 02:21:22 2017 From: danyang.su at gmail.com (Danyang Su) Date: Wed, 24 May 2017 00:21:22 -0700 Subject: [petsc-users] Question on incomplete factorization level and fill Message-ID: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Dear All, I use PCFactorSetLevels for ILU and PCFactorSetFill for other preconditioning in my code to help solve the problems that the default option is hard to solve. However, I found the latter one, PCFactorSetFill does not take effect for my problem. The matrices and rhs as well as the solutions are attached from the link below. I obtain the solution using hypre preconditioner and it takes 7 and 38 iterations for matrix 1 and matrix 2. However, if I use other preconditioner, the solver just failed at the first matrix. I have tested this matrix using the native sequential solver (not PETSc) with ILU preconditioning. If I set the incomplete factorization level to 0, this sequential solver will take more than 100 iterations. If I increase the factorization level to 1 or more, it just takes several iterations. This remind me that the PC factor for this matrices should be increased. However, when I tried it in PETSc, it just does not work. Matrix and rhs can be obtained from the link below. https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R Would anyone help to check if you can make this work by increasing the PC factor level or fill? Thanks and regards, Danyang From franck.houssen at inria.fr Wed May 24 04:45:58 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Wed, 24 May 2017 11:45:58 +0200 (CEST) Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> <1784716977.7683031.1495556883254.JavaMail.zimbra@inria.fr> <855172682.7687763.1495558287122.JavaMail.zimbra@inria.fr> Message-ID: <1238048783.7876567.1495619158445.JavaMail.zimbra@inria.fr> Coming from FEM, I believe the very confusing thing is that the local size of the user problem (math, physics point of view - DDM domain size) is not (can not be ?) the local size expected in MatCreateIS. My understanding is that the local size in MatIS is "just" related to backend implementation problems (it's logical that this local size is necessary, but, for another purpose: MPI machinery). 
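To make this concrete, here is how I currently read it for the small 3x3 example with two overlapping 2x2 subdomains discussed below (rough sketch only, not tested, error checking omitted; I assume the petsc-3.7 calling sequence of MatCreateIS with a single local-to-global map, and the 2/1 split of the parallel row layout is my own guess):

#include <petscmat.h>

int main(int argc, char **argv) {
  Mat                    A;
  ISLocalToGlobalMapping map;
  PetscMPIInt            rank;
  PetscInt               i, idx[2];
  PetscInt               m, M = 3;        /* M = global size of the assembled operator */
  PetscScalar            one = 1.0;

  PetscInitialize(&argc, &argv, NULL, NULL);
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

  /* The local-to-global map carries the overlapping subdomain numbering: 2 rows per rank. */
  if (rank == 0) { idx[0] = 0; idx[1] = 1; m = 2; }  /* m = rows of the parallel vector layout, */
  else           { idx[0] = 1; idx[1] = 2; m = 1; }  /* not the subdomain size: 2 + 1 = 3 = M.  */
  ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, 2, idx, PETSC_COPY_VALUES, &map);

  MatCreateIS(PETSC_COMM_WORLD, 1, m, m, M, M, map, &A);
  /* Values are set in the subdomain (local) numbering; for real sizes one would call MatISSetPreallocation here. */
  for (i = 0; i < 2; i++) MatSetValuesLocal(A, 1, &i, 1, &i, &one, ADD_VALUES);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);  /* action of A is diag(1, 2, 1) */

  ISLocalToGlobalMappingDestroy(&map);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}

If that reading is right, the subdomain size only ever enters through the mapping, never through m and n.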
Taking a few steps back, I can not see a case (I may be wrong) when a user does know how to compute or set "by hand" the local size that MatIS will expect: my understanding (once again, not sure) is that in most cases, the user will need local size to be PETSC_DECIDE in MatIS (because he doesn't want to "bother" with that or can not guess / compute it => unfortunatelly, as is, this jam the whole thing). I guess this kind of signature for MatIS would avoid/limit confusion in most cases and for most users : PetscErrorCode MatCreateIS(MPI_Comm comm,PetscInt bs,PetscInt M,PetscInt N,ISLocalToGlobalMapping rmap,ISLocalToGlobalMapping cmap,Mat *A,PetscInt m = PETSC_DECIDE ,PetscInt n = PETSC_DECIDE ) Or even PetscErrorCode MatCreateIS(MPI_Comm comm,PetscInt bs,PetscInt M,PetscInt N,ISLocalToGlobalMapping rmap,ISLocalToGlobalMapping cmap,Mat *A ) // Always use PETSC_DECIDE backstage ? Franck ----- Mail original ----- > De: "Matthew Knepley" > ?: "Franck Houssen" > Cc: "Stefano Zampini" , "PETSc" > , "PETSc" > Envoy?: Mardi 23 Mai 2017 19:02:28 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix > and a global vector ? > On Tue, May 23, 2017 at 11:51 AM, Franck Houssen < franck.houssen at inria.fr > > wrote: > > Not sure to know what question you're talking about ?!... > > > I use MatIS to test some kind of domain decomposition methods. I define my > > own preconditioner for that: in the apply callback, I need to matmult my > > (matIS) matrix with the incoming vector. > > Okay. I will create an example using your suggestion. > Thanks, > Matt > > Franck > > > > De: "Matthew Knepley" < knepley at gmail.com > > > > > > > ?: "Franck Houssen" < franck.houssen at inria.fr > > > > > > > Cc: "Stefano Zampini" < stefano.zampini at gmail.com >, "PETSc" < > > > petsc-users at mcs.anl.gov >, "PETSc" < petsc-dev at mcs.anl.gov > > > > > > > Envoy?: Mardi 23 Mai 2017 18:46:34 > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > > > matrix > > > and a global vector ? > > > > > > On Tue, May 23, 2017 at 11:28 AM, Franck Houssen < > > > franck.houssen at inria.fr > > > > > > > wrote: > > > > > > > OK, thanks. This is helpfull... But I really think the doc should be > > > > more > > > > verbose about that: this is really confusing and I didn't find any > > > > simple > > > > example to begin with which make all this even more confusing (personal > > > > opinion). > > > > > > > > > Did you respond to my other question (how are you using them)? That would > > > help me understand how to phrase it. > > > > > > Thanks, > > > > > > Matt > > > > > > > Franck > > > > > > > > > > > De: "Matthew Knepley" < knepley at gmail.com > > > > > > > > > > > > > > > > ?: "Franck Houssen" < franck.houssen at inria.fr > > > > > > > > > > > > > > > > Cc: "Stefano Zampini" < stefano.zampini at gmail.com >, "PETSc" < > > > > > petsc-users at mcs.anl.gov >, "PETSc" < petsc-dev at mcs.anl.gov > > > > > > > > > > > > > > > > Envoy?: Mardi 23 Mai 2017 13:21:21 > > > > > > > > > > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > > > > > matrix > > > > > and a global vector ? > > > > > > > > > > > > > > > On Tue, May 23, 2017 at 4:53 AM, Franck Houssen < > > > > > franck.houssen at inria.fr > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > The first thing I did was to put 3, not 4 : I got an error thrown > > > > > > in > > > > > > MatCreateIS (see the git diff + stack below). 
As the error said I > > > > > > used > > > > > > globalSize = numberOfMPIProcessus * localSize : my understanding is > > > > > > that, > > > > > > when using MatIS, the global size needs to be the sum of all local > > > > > > sizes. > > > > > > Correct ? > > > > > > > > > > > > > > > > > > > > No. MatIS means that the matrix is not assembled. The easiest way > > > > > (for > > > > > me) > > > > > to > > > > > think of this is that processes do not have > > > > > > > > > > > > > > > to hold full rows. One process can hold part of row i, and another > > > > > processes > > > > > can hold another part. However, there are still > > > > > > > > > > > > > > > the same number of global rows. > > > > > > > > > > > > > > > > I have a 3x3 global matrix made of two overlapping 2x2 local matrix > > > > > > (= > > > > > > diagonal with 1.). Each local matrix correspond to one domain (each > > > > > > domain > > > > > > is delegated to one MPI proc, so, I have 2 MPI procs because I have > > > > > > 2 > > > > > > domains). > > > > > > > > > > > > > > > > > > > > So the global size is 3. The local size here is not the size of the > > > > > local > > > > > IS > > > > > block, since that is a property only of MatIS. It is the > > > > > > > > > > > > > > > size of the local piece of the vector you multiply. This allows PETSc > > > > > to > > > > > understand the parallel layout of the Vec, and how it > > > > > > > > > > > > > > > matched the Mat. > > > > > > > > > > > > > > > This is somewhat confusing because FEM people mean something > > > > > different > > > > > by > > > > > "local" than we do here, and in fact we use this > > > > > > > > > > > > > > > other definition of local when assembling operators. > > > > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > > This is the simplest possible example: I have two 2x2 (local) diag > > > > > > matrix > > > > > > that overlap so that the global matrix built from them is 1, 2, 1 > > > > > > on > > > > > > the > > > > > > diagonal (local contributions add up in the middle). > > > > > > > > > > > > > > > > > > > > > I need to MatMult this global matrix with a global vector filled > > > > > > with > > > > > > 1. 
> > > > > > > > > > > > > > > > > > > > > Franck > > > > > > > > > > > > > > > > > > > > > Git diff : > > > > > > > > > > > > > > > > > > > > > --- a/matISLocalMat.cpp > > > > > > > > > > > > > > > > > > > > > +++ b/matISLocalMat.cpp > > > > > > > > > > > > > > > > > > > > > @@ -16,7 +16,7 @@ int main(int argc,char **argv) { > > > > > > > > > > > > > > > > > > > > > int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) > > > > > > return > > > > > > 1; > > > > > > > > > > > > > > > > > > > > > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > > > > > > > > > > > > > > > > > > > > - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; > > > > > > > > > > > > > > > > > > > > > + PetscInt localSize = 2, globalSize = 3; > > > > > > > > > > > > > > > > > > > > > PetscInt localIdx[2] = {0, 0}; > > > > > > > > > > > > > > > > > > > > > if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > > > > > > > > > > > > > > > > > > > > > else {localIdx[0] = 1; localIdx[1] = 2;} > > > > > > > > > > > > > > > > > > > > > Stack error: > > > > > > > > > > > > > > > > > > > > > [0]PETSC ERROR: Nonconforming object sizes > > > > > > > > > > > > > > > > > > > > > [0]PETSC ERROR: Sum of local lengths 4 does not equal global length > > > > > > 3, > > > > > > my > > > > > > local length 2 > > > > > > > > > > > > > > > > > > > > > [0]PETSC ERROR: [0] ISG2LMapApply line 17 > > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/isltog.c > > > > > > > > > > > > > > > > > > > > > [0]PETSC ERROR: [0] MatSetValues_IS line 692 > > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > > > > > > > > > > > > [0]PETSC ERROR: [0] MatSetValues line 1157 > > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > > > > > > > > > > > > > > > > [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 > > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > > > > > > > > > > > > [0]PETSC ERROR: [0] MatISSetPreallocation line 80 > > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > > > > > > > > > > > > [0]PETSC ERROR: [0] PetscSplitOwnership line 80 > > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/sys/utils/psplit.c > > > > > > > > > > > > > > > > > > > > > [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 > > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c > > > > > > > > > > > > > > > > > > > > > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 > > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > > > > > > > > > > > > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 > > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > > > > > > > > > > > > > > > > > > > > > [0]PETSC ERROR: [0] MatCreateIS line 986 > > > > > > /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > > > > > > > > > > > > > > > > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ?: "Matthew Knepley" < knepley at gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cc: "Franck Houssen" < franck.houssen at inria.fr >, "PETSc" < > > > > > > > petsc-users at mcs.anl.gov >, "PETSc" < petsc-dev at mcs.anl.gov > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Envoy?: Dimanche 21 Mai 2017 23:02:37 > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global > > > > > > > IS > > > > > > > matrix > > > > > > > and a global vector ? > > > > > > > > > > > > > > > > > > > > > > > > > > > > Franck, > > > > > > > > > > > > > > > > > > > > > > > > > > > > PETSc takes care of doing the matrix-vector multiplication > > > > > > > properly > > > > > > > using > > > > > > > MatIS. As Matt said, the layout of the vectors is the usual > > > > > > > parallel > > > > > > > layout. > > > > > > > > > > > > > > > > > > > > > > > > > > > > The local sizes of the MatIS matrix (i.e. the local size of the > > > > > > > left > > > > > > > and > > > > > > > right vectors used in MatMult) are not the sizes of the local > > > > > > > subdomain > > > > > > > matrices in MatIS. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On May 21, 2017, at 6:47 PM, Matthew Knepley < > > > > > > > > knepley at gmail.com > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < > > > > > > > > franck.houssen at inria.fr > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Using PETSc MatIS, how to matmult a global IS matrix and a > > > > > > > > > global > > > > > > > > > vector > > > > > > > > > ? > > > > > > > > > Example is attached : I don't get what I expect that is a > > > > > > > > > vector > > > > > > > > > such > > > > > > > > > that > > > > > > > > > proc0 = [1, 2] and proc1 = [2, 1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1) I think the global size of your matrix is wrong. You seem to > > > > > > > > want > > > > > > > > 3, > > > > > > > > not > > > > > > > > 4 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2) Global vectors have a non-overlapping row partition. You > > > > > > > > might > > > > > > > > be > > > > > > > > thinking > > > > > > > > of local vectors > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What most experimenters take for granted before they begin > > > > > > > > their > > > > > > > > experiments > > > > > > > > is infinitely more interesting than any results to which their > > > > > > > > experiments > > > > > > > > lead. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://www.caam.rice.edu/~mk51/ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ?: "Matthew Knepley" < knepley at gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cc: "Franck Houssen" < franck.houssen at inria.fr >, "PETSc" < > > > > > > > petsc-users at mcs.anl.gov >, "PETSc" < petsc-dev at mcs.anl.gov > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Envoy?: Dimanche 21 Mai 2017 23:02:37 > > > > > > > > > > > > > > > > > > > > > > > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global > > > > > > > IS > > > > > > > matrix > > > > > > > and a global vector ? 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Franck, > > > > > > > > > > > > > > > > > > > > > > > > > > > > PETSc takes care of doing the matrix-vector multiplication > > > > > > > properly > > > > > > > using > > > > > > > MatIS. As Matt said, the layout of the vectors is the usual > > > > > > > parallel > > > > > > > layout. > > > > > > > > > > > > > > > > > > > > > > > > > > > > The local sizes of the MatIS matrix (i.e. the local size of the > > > > > > > left > > > > > > > and > > > > > > > right vectors used in MatMult) are not the sizes of the local > > > > > > > subdomain > > > > > > > matrices in MatIS. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On May 21, 2017, at 6:47 PM, Matthew Knepley < > > > > > > > > knepley at gmail.com > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < > > > > > > > > franck.houssen at inria.fr > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Using PETSc MatIS, how to matmult a global IS matrix and a > > > > > > > > > global > > > > > > > > > vector > > > > > > > > > ? > > > > > > > > > Example is attached : I don't get what I expect that is a > > > > > > > > > vector > > > > > > > > > such > > > > > > > > > that > > > > > > > > > proc0 = [1, 2] and proc1 = [2, 1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1) I think the global size of your matrix is wrong. You seem to > > > > > > > > want > > > > > > > > 3, > > > > > > > > not > > > > > > > > 4 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2) Global vectors have a non-overlapping row partition. You > > > > > > > > might > > > > > > > > be > > > > > > > > thinking > > > > > > > > of local vectors > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Franck > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What most experimenters take for granted before they begin > > > > > > > > their > > > > > > > > experiments > > > > > > > > is infinitely more interesting than any results to which their > > > > > > > > experiments > > > > > > > > lead. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://www.caam.rice.edu/~mk51/ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > What most experimenters take for granted before they begin their > > > > > experiments > > > > > is infinitely more interesting than any results to which their > > > > > experiments > > > > > lead. > > > > > > > > > > > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > http://www.caam.rice.edu/~mk51/ > > > > > > > > > > > > > -- > > > > > > What most experimenters take for granted before they begin their > > > experiments > > > is infinitely more interesting than any results to which their > > > experiments > > > lead. 
> > > > > > -- Norbert Wiener > > > > > > http://www.caam.rice.edu/~mk51/ > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Wed May 24 04:46:01 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Wed, 24 May 2017 11:46:01 +0200 (CEST) Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? In-Reply-To: <9EFA5BCF-FDD3-45FA-A41A-6AA304D58C74@gmail.com> References: <867421313.6757137.1495383596545.JavaMail.zimbra@inria.fr> <1253777447.6757298.1495383780337.JavaMail.zimbra@inria.fr> <2033509705.7414108.1495532492501.JavaMail.zimbra@inria.fr> <740691579.7684644.1495557293858.JavaMail.zimbra@inria.fr> <9EFA5BCF-FDD3-45FA-A41A-6AA304D58C74@gmail.com> Message-ID: <694141704.7876584.1495619161488.JavaMail.zimbra@inria.fr> The code I sent compile and run at my side with petsc-3.7.6 (on debian/testing with gcc-6.3). The code you sent does not compile at my side. Anyway, no big deal. The modification you propose as far as I understand is to replace "ISCreateGeneral(PETSC_COMM_WORLD" with "ISCreateGeneral(PETSC_COMM_SELF" : still not working at my side (empty dirichlet local matrix). I will try to get that with a MPI matrix (that would contain same data that MatIS : that's what I tried to avoid as this doubles allocations - anyway, no big deal). Franck ----- Mail original ----- > De: "Stefano Zampini" > ?: "Franck Houssen" > Cc: "petsc-dev" , "PETSc users list" > , "petsc-maint" > Envoy?: Mardi 23 Mai 2017 20:23:49 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= one > domain) before and after assembly ? > > On May 23, 2017, at 6:34 PM, Franck Houssen < franck.houssen at inria.fr > > > wrote: > > > OK. I am supposed to destroy the matrix returned by MatISGetMPIXAIJ ? > > Yes > > Also, my example still not get the final assembled local matrix (the > > MatCreateSubMatrix returns an empty matrix) but as far as I understand my > > (global) index set is OK: what did I miss ? > > I really doubt you can use the example you have sent. It doesn?t compile, as > MatCreateSubMatrix needs an extra argument. > Attached a modified version that does what I guess is what you are looking > for (sequential Dirichlet problems on the subdomains). > > Franck > > > ----- Mail original ----- > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > ?: "Franck Houssen" < franck.houssen at inria.fr > > > > > > > Cc: "petsc-dev" < petsc-dev at mcs.anl.gov >, "PETSc users list" < > > > petsc-users at mcs.anl.gov >, "petsc-maint" < knepley at gmail.com > > > > > > > Envoy?: Mardi 23 Mai 2017 13:16:18 > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= one > > > domain) before and after assembly ? > > > > > > MatISGetMPIXAIJ is collective, as it assembles the global operator. To > > > get > > > the matrices you are looking for, you should call MatCreateSubMatrix on > > > the > > > assembled global operator, with the global indices representing the > > > subdomain problem. 
Each process needs to call both functions > > > > > > Stefano > > > > > > Il 23 Mag 2017 11:41, "Franck Houssen" < franck.houssen at inria.fr > ha > > > scritto: > > > > > > > I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= > > > > diagonal with 1.). Each local matrix correspond to one domain (each > > > > domain > > > > is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 > > > > domains). > > > > > > > > > > This is the simplest possible example: I have two 2x2 (local) diag > > > > matrix > > > > that overlap so that the global matrix built from them is 1, 2, 1 on > > > > the > > > > diagonal (local contributions add up in the middle). > > > > > > > > > > Now, I need for each MPI proc to get the assembled local matrix > > > > (sometimes > > > > called the dirichlet matrix) : this is a local matrix (sequential - not > > > > distributed with MPI) that accounts for contribution of neighboring > > > > domains > > > > (MPI proc). > > > > > > > > > > How to get the local assembled matrix ? MatGetLocalSubMatrix does not > > > > work > > > > (throw error - see example attached). MatGetSubMatrix returns a MPI > > > > distributed matrix, not a local (sequential) one. > > > > > > > > > > 1. My understanding is that MatISGetMPIXAIJ should return a local > > > > matrix > > > > (sequential AIJ matrix) : the MPI in the name recall that you get the > > > > assembled matrix (with contributions from the shared border) from the > > > > other > > > > MPI processus. Correct ? In my simple example, I replaced > > > > MatGetLocalSubMatrix with MatISGetMPIXAIJ : I get a deadlock which was > > > > surprising to me... Is MatISGetMPIXAIJ a collective call ? > > > > > > > > > > 2. Supposing this is a collective call (and that point 1 is not > > > > correct), > > > > I > > > > ride up MatISGetMPIXAIJ before the "if (rank > 0)" : I don't deadlock > > > > now, > > > > but it seems I get a global matrix which is not the assembled local > > > > matrix > > > > I > > > > am looking for. > > > > > > > > > > 3. I am supposed to destroy the matrix returned by MatISGetMPIXAIJ ? 
(I > > > > believe yes - not sure as AFAIU wording should associate Destroy > > > > methods > > > > to > > > > Create methods) > > > > > > > > > > Franck > > > > > > > > > > The git diff illustrate modifications I tried to add to the initial > > > > file > > > > attached to this thread: > > > > > > > > > > --- a/matISLocalMat.cpp > > > > > > > > > > +++ b/matISLocalMat.cpp > > > > > > > > > > @@ -31,6 +31,8 @@ int main(int argc,char **argv) { > > > > > > > > > > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, > > > > MAT_FINAL_ASSEMBLY); > > > > > > > > > > MatView(A, PETSC_VIEWER_STDOUT_WORLD); > > > > PetscViewerFlush(PETSC_VIEWER_STDOUT_WORLD); // Diag: 1, 2, 1 > > > > > > > > > > + Mat assembledLocalMat; > > > > > > > > > > + MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &assembledLocalMat); > > > > > > > > > > if (rank > 0) { // Do not pollute stdout: print only 1 proc > > > > > > > > > > std::cout << std::endl << "non assembled local matrix:" << std::endl << > > > > std::endl; > > > > > > > > > > Mat nonAssembledLocalMat; > > > > > > > > > > @@ -38,11 +40,10 @@ int main(int argc,char **argv) { > > > > > > > > > > MatView(nonAssembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Diag: 1, 1 > > > > > > > > > > std::cout << std::endl << "assembled local matrix:" << std::endl << > > > > std::endl; > > > > > > > > > > - Mat assembledLocalMat; > > > > > > > > > > - IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, > > > > PETSC_COPY_VALUES, &is); > > > > > > > > > > - MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... > > > > > > > > > > - MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Would like > > > > to > > > > get > > > > => Diag: 2, 1 > > > > > > > > > > + //IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, > > > > PETSC_COPY_VALUES, &is); > > > > > > > > > > + //MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... > > > > > > > > > > } > > > > > > > > > > + MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_WORLD); // Would like > > > > to > > > > get > > > > => Diag: 2, 1 > > > > > > > > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > > > > > > > > > > ?: "petsc-maint" < knepley at gmail.com > > > > > > > > > > > > > > > > Cc: "petsc-dev" < petsc-dev at mcs.anl.gov >, "PETSc users list" < > > > > > petsc-users at mcs.anl.gov >, "Franck Houssen" < franck.houssen at inria.fr > > > > > > > > > > > > > > > > > > > > > Envoy?: Dimanche 21 Mai 2017 22:51:34 > > > > > > > > > > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= > > > > > one > > > > > domain) before and after assembly ? > > > > > > > > > > > > > > > To assemble the operator in aij format, use > > > > > > > > > > > > > > > MatISGetMPIXAIJ > > > > > > > > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatISGetMPIXAIJ.html > > > > > > > > > > > > > > > Il 21 Mag 2017 18:43, "Matthew Knepley" < knepley at gmail.com > ha > > > > > scritto: > > > > > > > > > > > > > > > > On Sun, May 21, 2017 at 11:23 AM, Franck Houssen < > > > > > > franck.houssen at inria.fr > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > I have a 3x3 global matrix is built (diag: 1, 2, 1): it's made of > > > > > > > 2 > > > > > > > overlapping 2x2 local matrix (diag: 1, 1). > > > > > > > > > > > > > > > > > > > > > > > > > > > > Getting non assembled local matrix is OK with MatISGetLocalMat. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > How to get assembled local matrix (initial local matrix + > > > > > > > neigbhor > > > > > > > contributions on the borders) ? (expected result is diag: 2, 1) > > > > > > > > > > > > > > > > > > > > > > > > > > > You can always use > > > > > > > > > > > > > > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrix.html > > > > > > > > > > > > > > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html > > > > > > > > > > > > > > > > > > > > > to get copies, but if you just want to build things, you can use > > > > > > > > > > > > > > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > > > > > > > > Franck > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > What most experimenters take for granted before they begin their > > > > > > experiments > > > > > > is infinitely more interesting than any results to which their > > > > > > experiments > > > > > > lead. > > > > > > > > > > > > > > > > > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > http://www.caam.rice.edu/~mk51/ > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Wed May 24 06:42:10 2017 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Wed, 24 May 2017 13:42:10 +0200 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? In-Reply-To: <694141704.7876584.1495619161488.JavaMail.zimbra@inria.fr> References: <867421313.6757137.1495383596545.JavaMail.zimbra@inria.fr> <1253777447.6757298.1495383780337.JavaMail.zimbra@inria.fr> <2033509705.7414108.1495532492501.JavaMail.zimbra@inria.fr> <740691579.7684644.1495557293858.JavaMail.zimbra@inria.fr> <9EFA5BCF-FDD3-45FA-A41A-6AA304D58C74@gmail.com> <694141704.7876584.1495619161488.JavaMail.zimbra@inria.fr> Message-ID: <2C9AF920-14AF-4BB4-B2E0-D1162FA0A0BB@gmail.com> > On May 24, 2017, at 11:46 AM, Franck Houssen wrote: > > The code I sent compile and run at my side with petsc-3.7.6 (on debian/testing with gcc-6.3). The code you sent does not compile at my side. Anyway, no big deal. > MatGetSubMatrix/MatGetSubMatrices have been renamed to MatCreateSubMatrix/MatCreateSubMatrices in petsc-dev. I thought you were using the master branch and not the latest release. Sorry for the confusion. To compile the code I have sent, just rename MatCreateSubMatrices with MatGetSubMatrices and it should work. http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html#MatGetSubMatrices http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrix.html#MatGetSubMatrix > The modification you propose as far as I understand is to replace "ISCreateGeneral(PETSC_COMM_WORLD" with "ISCreateGeneral(PETSC_COMM_SELF" : still not working at my side (empty dirichlet local matrix). > I will try to get that with a MPI matrix (that would contain same data that MatIS : that's what I tried to avoid as this doubles allocations - anyway, no big deal). > In the code, you are already extracting submatrices from MPIAIJ format, not from MATIS. 
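In outline, what you want is something like the following inside your matISLocalMat example (rough sketch with the petsc-3.7.6 names, error checking omitted; A, localSize and localIdx are the variables you already have, B and locDirichlet are just illustrative names):

  Mat B;              /* globally assembled (MPIAIJ) version of the MATIS operator A */
  Mat *locDirichlet;  /* array of sequential submatrices, one per requested IS       */
  IS  is;

  MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &B);                                    /* collective */
  ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, PETSC_COPY_VALUES, &is); /* global indices of this subdomain */
  MatGetSubMatrices(B, 1, &is, &is, MAT_INITIAL_MATRIX, &locDirichlet);          /* collective: every rank calls it */
  MatView(locDirichlet[0], PETSC_VIEWER_STDOUT_SELF); /* sequential, assembled local ("Dirichlet") matrix */

  MatDestroyMatrices(1, &locDirichlet);
  ISDestroy(&is);
  MatDestroy(&B);  /* you own the matrix returned by MatISGetMPIXAIJ */

Each rank then holds in locDirichlet[0] the local matrix with the neighbor contributions already summed (diag 2, 1 on the printing rank of your toy case).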
Attached a code that compiles and runs with petsc-3.7.6 > Franck > > > De: "Stefano Zampini" > ?: "Franck Houssen" > Cc: "petsc-dev" , "PETSc users list" , "petsc-maint" > Envoy?: Mardi 23 Mai 2017 20:23:49 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? > > > On May 23, 2017, at 6:34 PM, Franck Houssen > wrote: > > OK. I am supposed to destroy the matrix returned by MatISGetMPIXAIJ ? > Yes > Also, my example still not get the final assembled local matrix (the MatCreateSubMatrix returns an empty matrix) but as far as I understand my (global) index set is OK: what did I miss ? > > I really doubt you can use the example you have sent. It doesn?t compile, as MatCreateSubMatrix needs an extra argument. > Attached a modified version that does what I guess is what you are looking for (sequential Dirichlet problems on the subdomains). > > > Franck > > > De: "Stefano Zampini" > > ?: "Franck Houssen" > > Cc: "petsc-dev" >, "PETSc users list" >, "petsc-maint" > > Envoy?: Mardi 23 Mai 2017 13:16:18 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? > > MatISGetMPIXAIJ is collective, as it assembles the global operator. To get the matrices you are looking for, you should call MatCreateSubMatrix on the assembled global operator, with the global indices representing the subdomain problem. Each process needs to call both functions > > Stefano > > Il 23 Mag 2017 11:41, "Franck Houssen" > ha scritto: > I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= diagonal with 1.). Each local matrix correspond to one domain (each domain is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 domains). > This is the simplest possible example: I have two 2x2 (local) diag matrix that overlap so that the global matrix built from them is 1, 2, 1 on the diagonal (local contributions add up in the middle). > > Now, I need for each MPI proc to get the assembled local matrix (sometimes called the dirichlet matrix) : this is a local matrix (sequential - not distributed with MPI) that accounts for contribution of neighboring domains (MPI proc). > > How to get the local assembled matrix ? MatGetLocalSubMatrix does not work (throw error - see example attached). MatGetSubMatrix returns a MPI distributed matrix, not a local (sequential) one. > My understanding is that MatISGetMPIXAIJ should return a local matrix (sequential AIJ matrix) : the MPI in the name recall that you get the assembled matrix (with contributions from the shared border) from the other MPI processus. Correct ? In my simple example, I replaced MatGetLocalSubMatrix with MatISGetMPIXAIJ : I get a deadlock which was surprising to me... Is MatISGetMPIXAIJ a collective call ? > Supposing this is a collective call (and that point 1 is not correct), I ride up MatISGetMPIXAIJ before the "if (rank > 0)" : I don't deadlock now, but it seems I get a global matrix which is not the assembled local matrix I am looking for. > I am supposed to destroy the matrix returned by MatISGetMPIXAIJ ? 
(I believe yes - not sure as AFAIU wording should associate Destroy methods to Create methods) > Franck > > The git diff illustrate modifications I tried to add to the initial file attached to this thread: > --- a/matISLocalMat.cpp > +++ b/matISLocalMat.cpp > @@ -31,6 +31,8 @@ int main(int argc,char **argv) { > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > MatView(A, PETSC_VIEWER_STDOUT_WORLD); PetscViewerFlush(PETSC_VIEWER_STDOUT_WORLD); // Diag: 1, 2, 1 > > + Mat assembledLocalMat; > + MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &assembledLocalMat); > if (rank > 0) { // Do not pollute stdout: print only 1 proc > std::cout << std::endl << "non assembled local matrix:" << std::endl << std::endl; > Mat nonAssembledLocalMat; > @@ -38,11 +40,10 @@ int main(int argc,char **argv) { > MatView(nonAssembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Diag: 1, 1 > > std::cout << std::endl << "assembled local matrix:" << std::endl << std::endl; > - Mat assembledLocalMat; > - IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, PETSC_COPY_VALUES, &is); > - MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... > - MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Would like to get => Diag: 2, 1 > + //IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, PETSC_COPY_VALUES, &is); > + //MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... > } > + MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_WORLD); // Would like to get => Diag: 2, 1 > > > De: "Stefano Zampini" > > ?: "petsc-maint" > > Cc: "petsc-dev" >, "PETSc users list" >, "Franck Houssen" > > Envoy?: Dimanche 21 Mai 2017 22:51:34 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? > > To assemble the operator in aij format, use > MatISGetMPIXAIJ > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatISGetMPIXAIJ.html > > Il 21 Mag 2017 18:43, "Matthew Knepley" > ha scritto: > On Sun, May 21, 2017 at 11:23 AM, Franck Houssen > wrote: > I have a 3x3 global matrix is built (diag: 1, 2, 1): it's made of 2 overlapping 2x2 local matrix (diag: 1, 1). > Getting non assembled local matrix is OK with MatISGetLocalMat. > How to get assembled local matrix (initial local matrix + neigbhor contributions on the borders) ? (expected result is diag: 2, 1) > > You can always use > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrix.html > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html > > to get copies, but if you just want to build things, you can use > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html > > Thanks, > > Matt > > Franck > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: out.log Type: application/octet-stream Size: 929 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: matISLocalMat.cpp Type: application/octet-stream Size: 3889 bytes Desc: not available URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 24 06:54:18 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 May 2017 06:54:18 -0500 Subject: [petsc-users] Installation Error In-Reply-To: <15e1cc1.5bb6.15c38e117d7.Coremail.lirui319@hnu.edu.cn> References: <15e1cc1.5bb6.15c38e117d7.Coremail.lirui319@hnu.edu.cn> Message-ID: On Wed, May 24, 2017 at 12:14 AM, ?? wrote: > > Dear professor or engineer: > I meet a problem about installation to petsc. > When I type the code "./configure --with-cc=gcc --with-cxx=0 > --with-fc=0 --download-f2cblaslapack --download-mpich" on my terminal,the > answer reveals the following results. > > >>>ERROR:root:code for hash md5 was not found. > Traceback (most recent call last): > File "/home/zhuizhuluori/lirui/software/vapor-2.5.0-Linux_ > x86_64/vapor/vapor-2.5.0/lib/python2.7/hashlib.py", line 139, in globals()[__func_name] = __get_hash(__func_name) > File "/home/zhuizhuluori/lirui/software/vapor-2.5.0-Linux_ > x86_64/vapor/vapor-2.5.0/lib/python2.7/hashlib.py", line 91, in > __get_builtin_constructor > raise ValueError('unsupported hash type ' + name) > ValueError: unsupported hash type md5 > ERROR:root:code for hash sha1 was not found ..... > > I have used petsc for a long time,and never see the this problem.my > laptop is installed an old version of petsc and I wanna change it to a new > version.How can I fix it?Thanks for your heartful suggestion! > We are asking Python to do md5 and yours cannot. Is this the entire error message? Can you send configure.log? I think upgrading your Python will fix this. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 24 06:55:29 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 May 2017 06:55:29 -0500 Subject: [petsc-users] Accessing submatrices without additional memory usage In-Reply-To: References: Message-ID: On Wed, May 24, 2017 at 1:09 AM, Michal Derezinski wrote: > Hi, > > I want to be able to perform matrix operations on several contiguous > submatrices of a full matrix, without allocating the memory redundantly for > the submatrices (in addition to the memory that is already allocated for > the full matrix). > I tried using MatGetSubMatrix, but this function appears to allocate the > additional memory. > > The other way I found to do this is to create the smallest submatrices I > need first, then use MatCreateNest to combine them into bigger ones > (including the full matrix). > The documentation of MatCreateNest seems to indicate that it does not > allocate additional memory for storing the new matrix. > Is this the right approach, or is there a better one? > Yes, that is the right approach. Thanks, Matt > Thanks, > Michal Derezinski. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Wed May 24 06:59:48 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 May 2017 06:59:48 -0500 Subject: [petsc-users] Question on incomplete factorization level and fill In-Reply-To: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Message-ID: On Wed, May 24, 2017 at 2:21 AM, Danyang Su wrote: > Dear All, > > I use PCFactorSetLevels for ILU and PCFactorSetFill for other > preconditioning in my code to help solve the problems that the default > option is hard to solve. However, I found the latter one, PCFactorSetFill > does not take effect for my problem. The matrices and rhs as well as the > solutions are attached from the link below. I obtain the solution using > hypre preconditioner and it takes 7 and 38 iterations for matrix 1 and > matrix 2. However, if I use other preconditioner, the solver just failed at > the first matrix. I have tested this matrix using the native sequential > solver (not PETSc) with ILU preconditioning. If I set the incomplete > factorization level to 0, this sequential solver will take more than 100 > iterations. If I increase the factorization level to 1 or more, it just > takes several iterations. This remind me that the PC factor for this > matrices should be increased. However, when I tried it in PETSc, it just > does not work. > > Matrix and rhs can be obtained from the link below. > > https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R > > Would anyone help to check if you can make this work by increasing the PC > factor level or fill? > We have ILU(k) supported in serial. However ILU(dt) which takes a tolerance only works through Hypre http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html I recommend you try SuperLU or MUMPS, which can both be downloaded automatically by configure, and do a full sparse LU. Thanks, Matt > Thanks and regards, > > Danyang > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 24 07:03:49 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 May 2017 07:03:49 -0500 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: <1238048783.7876567.1495619158445.JavaMail.zimbra@inria.fr> References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> <1784716977.7683031.1495556883254.JavaMail.zimbra@inria.fr> <855172682.7687763.1495558287122.JavaMail.zimbra@inria.fr> <1238048783.7876567.1495619158445.JavaMail.zimbra@inria.fr> Message-ID: On Wed, May 24, 2017 at 4:45 AM, Franck Houssen wrote: > Coming from FEM, I believe the very confusing thing is that the local size > of the user problem (math, physics point of view - DDM domain size) is not > (can not be ?) the local size expected in MatCreateIS. > > My understanding is that the local size in MatIS is "just" related to > backend implementation problems (it's logical that this local size is > necessary, but, for another purpose: MPI machinery). 
Taking a few steps > back, I can not see a case (I may be wrong) when a user does know how to > compute or set "by hand" the local size that MatIS will expect: my > understanding (once again, not sure) is that in most cases, the user will > need local size to be PETSC_DECIDE in MatIS (because he doesn't want to > "bother" with that or can not guess / compute it => unfortunatelly, as is, > this jam the whole thing). > > I guess this kind of signature for MatIS would avoid/limit confusion in > most cases and for most users : > PetscErrorCode MatCreateIS(MPI_Comm comm,PetscInt bs,PetscInt M,PetscInt > N,ISLocalToGlobalMapping rmap,ISLocalToGlobalMapping cmap,Mat *A,PetscInt m *= > PETSC_DECIDE*,PetscInt n*= PETSC_DECIDE*) > Or even > PetscErrorCode MatCreateIS(MPI_Comm comm,PetscInt bs,PetscInt M,PetscInt > N,ISLocalToGlobalMapping rmap,ISLocalToGlobalMapping cmap,Mat *A) // > Always use PETSC_DECIDE backstage ? > I have added a MatIS example with the 1D Laplacian. https://bitbucket.org/petsc/petsc/branch/knepley/feature-matis-example Thanks, Matt > Franck > > ------------------------------ > > *De: *"Matthew Knepley" > *?: *"Franck Houssen" > *Cc: *"Stefano Zampini" , "PETSc" < > petsc-users at mcs.anl.gov>, "PETSc" > *Envoy?: *Mardi 23 Mai 2017 19:02:28 > *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS > matrix and a global vector ? > > On Tue, May 23, 2017 at 11:51 AM, Franck Houssen > wrote: > >> Not sure to know what question you're talking about ?!... >> I use MatIS to test some kind of domain decomposition methods. I define >> my own preconditioner for that: in the apply callback, I need to matmult my >> (matIS) matrix with the incoming vector. >> > > Okay. I will create an example using your suggestion. > > Thanks, > > Matt > > >> Franck >> >> ------------------------------ >> >> *De: *"Matthew Knepley" >> *?: *"Franck Houssen" >> *Cc: *"Stefano Zampini" , "PETSc" < >> petsc-users at mcs.anl.gov>, "PETSc" >> *Envoy?: *Mardi 23 Mai 2017 18:46:34 >> *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS >> matrix and a global vector ? >> >> On Tue, May 23, 2017 at 11:28 AM, Franck Houssen > > wrote: >> >>> OK, thanks. This is helpfull... But I really think the doc should be >>> more verbose about that: this is really confusing and I didn't find any >>> simple example to begin with which make all this even more confusing >>> (personal opinion). >>> >> >> Did you respond to my other question (how are you using them)? That would >> help me understand how to phrase it. >> >> Thanks, >> >> Matt >> >> >>> Franck >>> >>> >>> ------------------------------ >>> >>> *De: *"Matthew Knepley" >>> *?: *"Franck Houssen" >>> *Cc: *"Stefano Zampini" , "PETSc" < >>> petsc-users at mcs.anl.gov>, "PETSc" >>> *Envoy?: *Mardi 23 Mai 2017 13:21:21 >>> *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS >>> matrix and a global vector ? >>> >>> On Tue, May 23, 2017 at 4:53 AM, Franck Houssen >> > wrote: >>> >>>> The first thing I did was to put 3, not 4 : I got an error thrown in >>>> MatCreateIS (see the git diff + stack below). As the error said I used >>>> globalSize = numberOfMPIProcessus * localSize : my understanding is that, >>>> when using MatIS, the global size needs to be the sum of all local sizes. >>>> Correct ? >>>> >>> >>> No. MatIS means that the matrix is not assembled. The easiest way (for >>> me) to think of this is that processes do not have >>> to hold full rows. 
One process can hold part of row i, and another >>> processes can hold another part. However, there are still >>> the same number of global rows. >>> >>> I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= >>>> diagonal with 1.). Each local matrix correspond to one domain (each domain >>>> is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 >>>> domains). >>>> >>> >>> So the global size is 3. The local size here is not the size of the >>> local IS block, since that is a property only of MatIS. It is the >>> size of the local piece of the vector you multiply. This allows PETSc to >>> understand the parallel layout of the Vec, and how it >>> matched the Mat. >>> >>> This is somewhat confusing because FEM people mean something different >>> by "local" than we do here, and in fact we use this >>> other definition of local when assembling operators. >>> >>> Matt >>> >>> >>>> This is the simplest possible example: I have two 2x2 (local) diag >>>> matrix that overlap so that the global matrix built from them is 1, 2, 1 on >>>> the diagonal (local contributions add up in the middle). >>>> I need to MatMult this global matrix with a global vector filled with 1. >>>> >>>> Franck >>>> >>>> Git diff : >>>> >>>> --- a/matISLocalMat.cpp >>>> +++ b/matISLocalMat.cpp >>>> @@ -16,7 +16,7 @@ int main(int argc,char **argv) { >>>> int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) >>>> return 1; >>>> int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); >>>> >>>> - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; >>>> + PetscInt localSize = 2, globalSize = 3; >>>> PetscInt localIdx[2] = {0, 0}; >>>> if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} >>>> else {localIdx[0] = 1; localIdx[1] = 2;} >>>> >>>> >>>> >>>> Stack error: >>>> >>>> [0]PETSC ERROR: Nonconforming object sizes >>>> [0]PETSC ERROR: Sum of local lengths 4 does not equal global length 3, >>>> my local length 2 >>>> [0]PETSC ERROR: [0] ISG2LMapApply line 17 /home/fghoussen/Documents/ >>>> INRIA/petsc-3.7.6/src/vec/is/utils/isltog.c >>>> [0]PETSC ERROR: [0] MatSetValues_IS line 692 /home/fghoussen/Documents/ >>>> INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>>> [0]PETSC ERROR: [0] MatSetValues line 1157 /home/fghoussen/Documents/ >>>> INRIA/petsc-3.7.6/src/mat/interface/matrix.c >>>> [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>>> [0]PETSC ERROR: [0] MatISSetPreallocation line 80 >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>>> [0]PETSC ERROR: [0] PetscSplitOwnership line 80 >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/sys/utils/psplit.c >>>> [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 /home/fghoussen/Documents/ >>>> INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c >>>> [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>>> [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 >>>> /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c >>>> [0]PETSC ERROR: [0] MatCreateIS line 986 /home/fghoussen/Documents/ >>>> INRIA/petsc-3.7.6/src/mat/impls/is/matis.c >>>> >>>> >>>> >>>> ------------------------------ >>>> >>>> *De: *"Stefano Zampini" >>>> *?: *"Matthew Knepley" >>>> *Cc: *"Franck Houssen" , "PETSc" < >>>> petsc-users at mcs.anl.gov>, "PETSc" >>>> *Envoy?: *Dimanche 21 Mai 2017 23:02:37 >>>> *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS >>>> 
matrix and a global vector ? >>>> >>>> Franck, >>>> >>>> PETSc takes care of doing the matrix-vector multiplication properly >>>> using MatIS. As Matt said, the layout of the vectors is the usual parallel >>>> layout. >>>> The local sizes of the MatIS matrix (i.e. the local size of the left >>>> and right vectors used in MatMult) are not the sizes of the local subdomain >>>> matrices in MatIS. >>>> >>>> >>>> On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: >>>> >>>> On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < >>>> franck.houssen at inria.fr> wrote: >>>> >>>>> Using PETSc MatIS, how to matmult a global IS matrix and a global >>>>> vector ? Example is attached : I don't get what I expect that is a vector >>>>> such that proc0 = [1, 2] and proc1 = [2, 1] >>>>> >>>> >>>> 1) I think the global size of your matrix is wrong. You seem to want 3, >>>> not 4 >>>> >>>> 2) Global vectors have a non-overlapping row partition. You might be >>>> thinking of local vectors >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> http://www.caam.rice.edu/~mk51/ >>>> >>>> >>>> ------------------------------ >>>> >>>> *De: *"Stefano Zampini" >>>> *?: *"Matthew Knepley" >>>> *Cc: *"Franck Houssen" , "PETSc" < >>>> petsc-users at mcs.anl.gov>, "PETSc" >>>> *Envoy?: *Dimanche 21 Mai 2017 23:02:37 >>>> *Objet: *Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS >>>> matrix and a global vector ? >>>> >>>> Franck, >>>> >>>> PETSc takes care of doing the matrix-vector multiplication properly >>>> using MatIS. As Matt said, the layout of the vectors is the usual parallel >>>> layout. >>>> The local sizes of the MatIS matrix (i.e. the local size of the left >>>> and right vectors used in MatMult) are not the sizes of the local subdomain >>>> matrices in MatIS. >>>> >>>> >>>> On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: >>>> >>>> On Sun, May 21, 2017 at 11:26 AM, Franck Houssen < >>>> franck.houssen at inria.fr> wrote: >>>> >>>>> Using PETSc MatIS, how to matmult a global IS matrix and a global >>>>> vector ? Example is attached : I don't get what I expect that is a vector >>>>> such that proc0 = [1, 2] and proc1 = [2, 1] >>>>> >>>> >>>> 1) I think the global size of your matrix is wrong. You seem to want 3, >>>> not 4 >>>> >>>> 2) Global vectors have a non-overlapping row partition. You might be >>>> thinking of local vectors >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Franck >>>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> http://www.caam.rice.edu/~mk51/ >>>> >>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. 
>> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Wed May 24 07:57:09 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 24 May 2017 07:57:09 -0500 Subject: [petsc-users] Installation Error In-Reply-To: <15e1cc1.5bb6.15c38e117d7.Coremail.lirui319@hnu.edu.cn> References: <15e1cc1.5bb6.15c38e117d7.Coremail.lirui319@hnu.edu.cn> Message-ID: What do you have for: which python echo $PYTHONPATH The following might work.. PYTHONPATH='' /usr/bin/python ./configure --with-cc=gcc --with-cxx=0 --with-fc=0 --download-f2cblaslapack --download-mpich Satish On Wed, 24 May 2017, ?? wrote: > > Dear professor or engineer: > I meet a problem about installation to petsc. > When I type the code "./configure --with-cc=gcc --with-cxx=0 --with-fc=0 --download-f2cblaslapack --download-mpich" on my terminal,the answer reveals the following results. > > >>>ERROR:root:code for hash md5 was not found. > Traceback (most recent call last): > File "/home/zhuizhuluori/lirui/software/vapor-2.5.0-Linux_x86_64/vapor/vapor-2.5.0/lib/python2.7/hashlib.py", line 139, in globals()[__func_name] = __get_hash(__func_name) > File "/home/zhuizhuluori/lirui/software/vapor-2.5.0-Linux_x86_64/vapor/vapor-2.5.0/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor > raise ValueError('unsupported hash type ' + name) > ValueError: unsupported hash type md5 > ERROR:root:code for hash sha1 was not found ..... > > I have used petsc for a long time,and never see the this problem.my laptop is installed an old version of petsc and I wanna change it to a new version.How can I fix it?Thanks for your heartful suggestion! > > > > > > From jchludzinski at gmail.com Wed May 24 08:03:01 2017 From: jchludzinski at gmail.com (John Chludzinski) Date: Wed, 24 May 2017 09:03:01 -0400 Subject: [petsc-users] PETSC OO C guide/standard? Message-ID: Is there a guide for how to write/develop PETSC OO C code? How a "class" is defined/implemented? How you implement inheritance? Memory management? Etc? ---John -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 24 08:11:46 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 May 2017 08:11:46 -0500 Subject: [petsc-users] PETSC OO C guide/standard? In-Reply-To: References: Message-ID: On Wed, May 24, 2017 at 8:03 AM, John Chludzinski wrote: > Is there a guide for how to write/develop PETSC OO C code? How a "class" > is defined/implemented? How you implement inheritance? Memory management? > Etc? > We have a guide: http://www.mcs.anl.gov/petsc/developers/developers.pdf If its not in there, you can mail the list. Thanks, Matt > ---John > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From franck.houssen at inria.fr Wed May 24 08:11:28 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Wed, 24 May 2017 15:11:28 +0200 (CEST) Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to get local matrix (= one domain) before and after assembly ? In-Reply-To: <2C9AF920-14AF-4BB4-B2E0-D1162FA0A0BB@gmail.com> References: <867421313.6757137.1495383596545.JavaMail.zimbra@inria.fr> <2033509705.7414108.1495532492501.JavaMail.zimbra@inria.fr> <740691579.7684644.1495557293858.JavaMail.zimbra@inria.fr> <9EFA5BCF-FDD3-45FA-A41A-6AA304D58C74@gmail.com> <694141704.7876584.1495619161488.JavaMail.zimbra@inria.fr> <2C9AF920-14AF-4BB4-B2E0-D1162FA0A0BB@gmail.com> Message-ID: <1874209065.7990175.1495631488066.JavaMail.zimbra@inria.fr> OK, this is working now ! As the API changed between the latest stable and the master branch, I was actually not using the correct method. Thanks Stefano, Franck ----- Mail original ----- > De: "Stefano Zampini" > ?: "Franck Houssen" > Cc: "petsc-dev" , "PETSc users list" > , "petsc-maint" > Envoy?: Mercredi 24 Mai 2017 13:42:10 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= one > domain) before and after assembly ? > > On May 24, 2017, at 11:46 AM, Franck Houssen < franck.houssen at inria.fr > > > wrote: > > > The code I sent compile and run at my side with petsc-3.7.6 (on > > debian/testing with gcc-6.3). The code you sent does not compile at my > > side. > > Anyway, no big deal. > > MatGetSubMatrix/MatGetSubMatrices have been renamed to > MatCreateSubMatrix/MatCreateSubMatrices in petsc-dev. I thought you were > using the master branch and not the latest release. Sorry for the confusion. > To compile the code I have sent, just rename MatCreateSubMatrices with > MatGetSubMatrices and it should work. > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html#MatGetSubMatrices > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrix.html#MatGetSubMatrix > > The modification you propose as far as I understand is to replace > > "ISCreateGeneral(PETSC_COMM_WORLD" with "ISCreateGeneral(PETSC_COMM_SELF" : > > still not working at my side (empty dirichlet local matrix). > > > I will try to get that with a MPI matrix (that would contain same data that > > MatIS : that's what I tried to avoid as this doubles allocations - anyway, > > no big deal). > > In the code, you are already extracting submatrices from MPIAIJ format, not > from MATIS. Attached a code that compiles and runs with petsc-3.7.6 > > Franck > > > ----- Mail original ----- > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > ?: "Franck Houssen" < franck.houssen at inria.fr > > > > > > > Cc: "petsc-dev" < petsc-dev at mcs.anl.gov >, "PETSc users list" < > > > petsc-users at mcs.anl.gov >, "petsc-maint" < knepley at gmail.com > > > > > > > Envoy?: Mardi 23 Mai 2017 20:23:49 > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= one > > > domain) before and after assembly ? > > > > > > > On May 23, 2017, at 6:34 PM, Franck Houssen < franck.houssen at inria.fr > > > > > wrote: > > > > > > > > > > OK. I am supposed to destroy the matrix returned by MatISGetMPIXAIJ ? > > > > > > > > > Yes > > > > > > > Also, my example still not get the final assembled local matrix (the > > > > MatCreateSubMatrix returns an empty matrix) but as far as I understand > > > > my > > > > (global) index set is OK: what did I miss ? 
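(For concreteness, a minimal sketch of the approach that ends up working in this thread: assemble the MatIS A into a global MPIAIJ matrix, then pull out each rank's assembled "Dirichlet" block as a sequential matrix. This uses petsc-3.7 names -- the master branch calls the extraction routine MatCreateSubMatrices -- and nLocalDofs/globalDofs stand for this rank's subdomain size and its dof indices in the global numbering; error checking omitted.)

    Mat B, *Aloc;
    IS  is;

    MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &B);                   /* collective: assemble MatIS to MPIAIJ   */
    ISCreateGeneral(PETSC_COMM_SELF, nLocalDofs, globalDofs,
                    PETSC_COPY_VALUES, &is);                      /* this rank's subdomain, global numbering */
    MatGetSubMatrices(B, 1, &is, &is, MAT_INITIAL_MATRIX, &Aloc); /* collective: Aloc[0] is a SeqAIJ matrix  */
    MatView(Aloc[0], PETSC_VIEWER_STDOUT_SELF);                   /* e.g. diag (1,2) on rank 0, (2,1) on rank 1 */
    MatDestroyMatrices(1, &Aloc);
    ISDestroy(&is);
    MatDestroy(&B);

Both MatISGetMPIXAIJ and MatGetSubMatrices are collective, so every rank must call them even if it only prints its own block.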
> > > > > > > > > I really doubt you can use the example you have sent. It doesn?t compile, > > > as > > > MatCreateSubMatrix needs an extra argument. > > > > > > Attached a modified version that does what I guess is what you are > > > looking > > > for (sequential Dirichlet problems on the subdomains). > > > > > > > Franck > > > > > > > > > > ----- Mail original ----- > > > > > > > > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > > > > > > > > > > ?: "Franck Houssen" < franck.houssen at inria.fr > > > > > > > > > > > > > > > > Cc: "petsc-dev" < petsc-dev at mcs.anl.gov >, "PETSc users list" < > > > > > petsc-users at mcs.anl.gov >, "petsc-maint" < knepley at gmail.com > > > > > > > > > > > > > > > > Envoy?: Mardi 23 Mai 2017 13:16:18 > > > > > > > > > > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix (= > > > > > one > > > > > domain) before and after assembly ? > > > > > > > > > > > > > > > MatISGetMPIXAIJ is collective, as it assembles the global operator. > > > > > To > > > > > get > > > > > the matrices you are looking for, you should call MatCreateSubMatrix > > > > > on > > > > > the > > > > > assembled global operator, with the global indices representing the > > > > > subdomain problem. Each process needs to call both functions > > > > > > > > > > > > > > > Stefano > > > > > > > > > > > > > > > Il 23 Mag 2017 11:41, "Franck Houssen" < franck.houssen at inria.fr > ha > > > > > scritto: > > > > > > > > > > > > > > > > I have a 3x3 global matrix made of two overlapping 2x2 local matrix > > > > > > (= > > > > > > diagonal with 1.). Each local matrix correspond to one domain (each > > > > > > domain > > > > > > is delegated to one MPI proc, so, I have 2 MPI procs because I have > > > > > > 2 > > > > > > domains). > > > > > > > > > > > > > > > > > > > > > This is the simplest possible example: I have two 2x2 (local) diag > > > > > > matrix > > > > > > that overlap so that the global matrix built from them is 1, 2, 1 > > > > > > on > > > > > > the > > > > > > diagonal (local contributions add up in the middle). > > > > > > > > > > > > > > > > > > > > > Now, I need for each MPI proc to get the assembled local matrix > > > > > > (sometimes > > > > > > called the dirichlet matrix) : this is a local matrix (sequential - > > > > > > not > > > > > > distributed with MPI) that accounts for contribution of neighboring > > > > > > domains > > > > > > (MPI proc). > > > > > > > > > > > > > > > > > > > > > How to get the local assembled matrix ? MatGetLocalSubMatrix does > > > > > > not > > > > > > work > > > > > > (throw error - see example attached). MatGetSubMatrix returns a MPI > > > > > > distributed matrix, not a local (sequential) one. > > > > > > > > > > > > > > > > > > > > > 1. My understanding is that MatISGetMPIXAIJ should return a local > > > > > > matrix > > > > > > (sequential AIJ matrix) : the MPI in the name recall that you get > > > > > > the > > > > > > assembled matrix (with contributions from the shared border) from > > > > > > the > > > > > > other > > > > > > MPI processus. Correct ? In my simple example, I replaced > > > > > > MatGetLocalSubMatrix with MatISGetMPIXAIJ : I get a deadlock which > > > > > > was > > > > > > surprising to me... Is MatISGetMPIXAIJ a collective call ? > > > > > > > > > > > > > > > > > > > > > 2. 
Supposing this is a collective call (and that point 1 is not > > > > > > correct), > > > > > > I > > > > > > ride up MatISGetMPIXAIJ before the "if (rank > 0)" : I don't > > > > > > deadlock > > > > > > now, > > > > > > but it seems I get a global matrix which is not the assembled local > > > > > > matrix > > > > > > I > > > > > > am looking for. > > > > > > > > > > > > > > > > > > > > > 3. I am supposed to destroy the matrix returned by MatISGetMPIXAIJ > > > > > > ? > > > > > > (I > > > > > > believe yes - not sure as AFAIU wording should associate Destroy > > > > > > methods > > > > > > to > > > > > > Create methods) > > > > > > > > > > > > > > > > > > > > > Franck > > > > > > > > > > > > > > > > > > > > > The git diff illustrate modifications I tried to add to the initial > > > > > > file > > > > > > attached to this thread: > > > > > > > > > > > > > > > > > > > > > --- a/matISLocalMat.cpp > > > > > > > > > > > > > > > > > > > > > +++ b/matISLocalMat.cpp > > > > > > > > > > > > > > > > > > > > > @@ -31,6 +31,8 @@ int main(int argc,char **argv) { > > > > > > > > > > > > > > > > > > > > > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, > > > > > > MAT_FINAL_ASSEMBLY); > > > > > > > > > > > > > > > > > > > > > MatView(A, PETSC_VIEWER_STDOUT_WORLD); > > > > > > PetscViewerFlush(PETSC_VIEWER_STDOUT_WORLD); // Diag: 1, 2, 1 > > > > > > > > > > > > > > > > > > > > > + Mat assembledLocalMat; > > > > > > > > > > > > > > > > > > > > > + MatISGetMPIXAIJ(A, MAT_INITIAL_MATRIX, &assembledLocalMat); > > > > > > > > > > > > > > > > > > > > > if (rank > 0) { // Do not pollute stdout: print only 1 proc > > > > > > > > > > > > > > > > > > > > > std::cout << std::endl << "non assembled local matrix:" << > > > > > > std::endl > > > > > > << > > > > > > std::endl; > > > > > > > > > > > > > > > > > > > > > Mat nonAssembledLocalMat; > > > > > > > > > > > > > > > > > > > > > @@ -38,11 +40,10 @@ int main(int argc,char **argv) { > > > > > > > > > > > > > > > > > > > > > MatView(nonAssembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Diag: > > > > > > 1, > > > > > > 1 > > > > > > > > > > > > > > > > > > > > > std::cout << std::endl << "assembled local matrix:" << std::endl << > > > > > > std::endl; > > > > > > > > > > > > > > > > > > > > > - Mat assembledLocalMat; > > > > > > > > > > > > > > > > > > > > > - IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, > > > > > > PETSC_COPY_VALUES, &is); > > > > > > > > > > > > > > > > > > > > > - MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO ?!... > > > > > > > > > > > > > > > > > > > > > - MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_SELF); // Would > > > > > > like > > > > > > to > > > > > > get > > > > > > => Diag: 2, 1 > > > > > > > > > > > > > > > > > > > > > + //IS is; ISCreateGeneral(PETSC_COMM_SELF, localSize, localIdx, > > > > > > PETSC_COPY_VALUES, &is); > > > > > > > > > > > > > > > > > > > > > + //MatGetLocalSubMatrix(A, is, is, &assembledLocalMat); // KO > > > > > > ?!... 
> > > > > > > > > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > + MatView(assembledLocalMat, PETSC_VIEWER_STDOUT_WORLD); // Would > > > > > > like > > > > > > to > > > > > > get > > > > > > => Diag: 2, 1 > > > > > > > > > > > > > > > > > > > > > > De: "Stefano Zampini" < stefano.zampini at gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ?: "petsc-maint" < knepley at gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cc: "petsc-dev" < petsc-dev at mcs.anl.gov >, "PETSc users list" < > > > > > > > petsc-users at mcs.anl.gov >, "Franck Houssen" < > > > > > > > franck.houssen at inria.fr > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Envoy?: Dimanche 21 Mai 2017 22:51:34 > > > > > > > > > > > > > > > > > > > > > > > > > > > > Objet: Re: [petsc-dev] Using PETSc MatIS, how to get local matrix > > > > > > > (= > > > > > > > one > > > > > > > domain) before and after assembly ? > > > > > > > > > > > > > > > > > > > > > > > > > > > > To assemble the operator in aij format, use > > > > > > > > > > > > > > > > > > > > > > > > > > > > MatISGetMPIXAIJ > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatISGetMPIXAIJ.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > Il 21 Mag 2017 18:43, "Matthew Knepley" < knepley at gmail.com > ha > > > > > > > scritto: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, May 21, 2017 at 11:23 AM, Franck Houssen < > > > > > > > > franck.houssen at inria.fr > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have a 3x3 global matrix is built (diag: 1, 2, 1): it's > > > > > > > > > made > > > > > > > > > of > > > > > > > > > 2 > > > > > > > > > overlapping 2x2 local matrix (diag: 1, 1). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Getting non assembled local matrix is OK with > > > > > > > > > MatISGetLocalMat. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > How to get assembled local matrix (initial local matrix + > > > > > > > > > neigbhor > > > > > > > > > contributions on the borders) ? 
(expected result is diag: 2, > > > > > > > > > 1) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > You can always use > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrix.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetSubMatrices.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > to get copies, but if you just want to build things, you can > > > > > > > > use > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Matt > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Franck > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What most experimenters take for granted before they begin > > > > > > > > their > > > > > > > > experiments > > > > > > > > is infinitely more interesting than any results to which their > > > > > > > > experiments > > > > > > > > lead. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- Norbert Wiener > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://www.caam.rice.edu/~mk51/ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jchludzinski at gmail.com Wed May 24 08:50:39 2017 From: jchludzinski at gmail.com (John Chludzinski) Date: Wed, 24 May 2017 09:50:39 -0400 Subject: [petsc-users] PETSC OO C guide/standard? In-Reply-To: References: Message-ID: Considering that the current C++ standard is >1600 pages and counting (still glomming on new "features"), I'm planning to try an OO style of C coding style. The standard's size (number of pages) being the best (and only *practical*) means to measure language complexity. On Wed, May 24, 2017 at 9:11 AM, Matthew Knepley wrote: > On Wed, May 24, 2017 at 8:03 AM, John Chludzinski > wrote: > >> Is there a guide for how to write/develop PETSC OO C code? How a "class" >> is defined/implemented? How you implement inheritance? Memory management? >> Etc? >> > > We have a guide: http://www.mcs.anl.gov/petsc/developers/developers.pdf > > If its not in there, you can mail the list. > > Thanks, > > Matt > > >> ---John >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 24 08:53:35 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 May 2017 08:53:35 -0500 Subject: [petsc-users] PETSC OO C guide/standard? In-Reply-To: References: Message-ID: On Wed, May 24, 2017 at 8:50 AM, John Chludzinski wrote: > Considering that the current C++ standard is >1600 pages and counting > (still glomming on new "features"), I'm planning to try an OO style of C > coding style. 
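(A toy illustration of the OO-in-C pattern the developers guide mentioned above describes -- this is not PETSc's actual header layout, just the general shape: a "class" is a struct whose first member is a table of function pointers, public "methods" dispatch through that table, and each implementation hangs its private state off a void pointer.)

    typedef struct _p_Solver *Solver;

    struct _SolverOps {
      int (*setup)(Solver);
      int (*solve)(Solver, const double *b, double *x);
      int (*destroy)(Solver);
    };

    struct _p_Solver {
      struct _SolverOps ops;   /* "virtual function table"                              */
      int               n;     /* data shared by every implementation ("base class")    */
      void             *data;  /* implementation-specific state ("derived class")       */
    };

    /* The public API forwards through the table, the way MatMult() forwards
       to whatever Mat type was registered: */
    int SolverSolve(Solver s, const double *b, double *x) { return (*s->ops.solve)(s, b, x); }

Data "inheritance" is by containment (the common header comes first), and memory management follows the Create/Destroy pairing used throughout the library.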
> > The standard's size (number of pages) being the best (and only *practical*) > means to measure language complexity. > Here is another thing I wrote talking about OO in PETSc: https://arxiv.org/abs/1209.1711 Matt > On Wed, May 24, 2017 at 9:11 AM, Matthew Knepley > wrote: > >> On Wed, May 24, 2017 at 8:03 AM, John Chludzinski > > wrote: >> >>> Is there a guide for how to write/develop PETSC OO C code? How a "class" >>> is defined/implemented? How you implement inheritance? Memory management? >>> Etc? >>> >> >> We have a guide: http://www.mcs.anl.gov/petsc/developers/developers.pdf >> >> If its not in there, you can mail the list. >> >> Thanks, >> >> Matt >> >> >>> ---John >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed May 24 12:37:24 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 24 May 2017 12:37:24 -0500 Subject: [petsc-users] Question on incomplete factorization level and fill In-Reply-To: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Message-ID: <1BBCC24A-97A1-49CD-A234-09837E24FCA8@mcs.anl.gov> > On May 24, 2017, at 2:21 AM, Danyang Su wrote: > > Dear All, > > I use PCFactorSetLevels for ILU and PCFactorSetFill for other preconditioning in my code to help solve the problems that the default option is hard to solve. However, I found the latter one, PCFactorSetFill does not take effect for my problem. SetFill doesn't affect the numerical answers at all. It is just a prediction you make of how much memory you expect to be used inside the factorization. > The matrices and rhs as well as the solutions are attached from the link below. I obtain the solution using hypre preconditioner and it takes 7 and 38 iterations for matrix 1 and matrix 2. However, if I use other preconditioner, the solver just failed at the first matrix. I have tested this matrix using the native sequential solver (not PETSc) with ILU preconditioning. If I set the incomplete factorization level to 0, this sequential solver will take more than 100 iterations. If I increase the factorization level to 1 or more, it just takes several iterations. This remind me that the PC factor for this matrices should be increased. However, when I tried it in PETSc, it just does not work. > > Matrix and rhs can be obtained from the link below. > > https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R > > Would anyone help to check if you can make this work by increasing the PC factor level or fill? > > Thanks and regards, > > Danyang > > From michal.derezinski at gmail.com Wed May 24 12:37:11 2017 From: michal.derezinski at gmail.com (=?utf-8?Q?Micha=C5=82_Derezi=C5=84ski?=) Date: Wed, 24 May 2017 10:37:11 -0700 Subject: [petsc-users] Accessing submatrices without additional memory usage In-Reply-To: References: Message-ID: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> Great! 
Then I have a follow-up question: My goal is to be able to load the full matrix X from disk, while at the same time in parallel, performing computations on the submatrices that have already been loaded. Essentially, I want to think of X as a block matrix (where the blocks are horizontal, spanning the full width of the matrix), where I?m loading one block at a time, and all the blocks that have already been loaded are combined using MatCreateNest, so that I can make computations on that portion of the matrix. In this scenario, every process needs to be simultaneously loading the next block of X, and perform computations on the previously loaded portion. My strategy is for each MPI process to spawn a thread for data loading (so that the memory between the process and the thread is shared), while the process does computations. My concern is that the data loading thread may be using up computational resources of the processor, even though it is mainly doing IO. Will this be an issue? What is the best way to minimize the cpu time of this parallel data loading scheme? Thanks, Michal. > Wiadomo?? napisana przez Matthew Knepley w dniu 24.05.2017, o godz. 04:55: > > On Wed, May 24, 2017 at 1:09 AM, Michal Derezinski > wrote: > Hi, > > I want to be able to perform matrix operations on several contiguous submatrices of a full matrix, without allocating the memory redundantly for the submatrices (in addition to the memory that is already allocated for the full matrix). > I tried using MatGetSubMatrix, but this function appears to allocate the additional memory. > > The other way I found to do this is to create the smallest submatrices I need first, then use MatCreateNest to combine them into bigger ones (including the full matrix). > The documentation of MatCreateNest seems to indicate that it does not allocate additional memory for storing the new matrix. > Is this the right approach, or is there a better one? > > Yes, that is the right approach. > > Thanks, > > Matt > > Thanks, > Michal Derezinski. > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed May 24 12:43:49 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 24 May 2017 12:43:49 -0500 Subject: [petsc-users] Accessing submatrices without additional memory usage In-Reply-To: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> References: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> Message-ID: <226AA580-DAC1-4455-AB26-A98ECB76A2FA@mcs.anl.gov> How big are the sub matrices, how many MPI processes are you hoping to use, how fast/sophisticated is your file system? All of these things and others will determine whether this approach will buy you anything or not. I recommend NOT doing this first, instead just sequentially read in the matrices and perform the computations and then run profiling to determine where the time is being spent and whether even trying this kind of optimization makes sense. I suspect it does not. Barry > On May 24, 2017, at 12:37 PM, Micha? Derezi?ski wrote: > > Great! Then I have a follow-up question: > > My goal is to be able to load the full matrix X from disk, while at the same time in parallel, performing computations on the submatrices that have already been loaded. 
Essentially, I want to think of X as a block matrix (where the blocks are horizontal, spanning the full width of the matrix), where I?m loading one block at a time, and all the blocks that have already been loaded are combined using MatCreateNest, so that I can make computations on that portion of the matrix. > > In this scenario, every process needs to be simultaneously loading the next block of X, and perform computations on the previously loaded portion. My strategy is for each MPI process to spawn a thread for data loading (so that the memory between the process and the thread is shared), while the process does computations. My concern is that the data loading thread may be using up computational resources of the processor, even though it is mainly doing IO. Will this be an issue? What is the best way to minimize the cpu time of this parallel data loading scheme? > > Thanks, > Michal. > > >> Wiadomo?? napisana przez Matthew Knepley w dniu 24.05.2017, o godz. 04:55: >> >> On Wed, May 24, 2017 at 1:09 AM, Michal Derezinski wrote: >> Hi, >> >> I want to be able to perform matrix operations on several contiguous submatrices of a full matrix, without allocating the memory redundantly for the submatrices (in addition to the memory that is already allocated for the full matrix). >> I tried using MatGetSubMatrix, but this function appears to allocate the additional memory. >> >> The other way I found to do this is to create the smallest submatrices I need first, then use MatCreateNest to combine them into bigger ones (including the full matrix). >> The documentation of MatCreateNest seems to indicate that it does not allocate additional memory for storing the new matrix. >> Is this the right approach, or is there a better one? >> >> Yes, that is the right approach. >> >> Thanks, >> >> Matt >> >> Thanks, >> Michal Derezinski. >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ > From knepley at gmail.com Wed May 24 12:44:47 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 May 2017 12:44:47 -0500 Subject: [petsc-users] Accessing submatrices without additional memory usage In-Reply-To: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> References: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> Message-ID: On Wed, May 24, 2017 at 12:37 PM, Micha? Derezi?ski < michal.derezinski at gmail.com> wrote: > Great! Then I have a follow-up question: > > My goal is to be able to load the full matrix X from disk, while at the > same time in parallel, performing computations on the submatrices that have > already been loaded. Essentially, I want to think of X as a block matrix > (where the blocks are horizontal, spanning the full width of the matrix), > where I?m loading one block at a time, and all the blocks that have already > been loaded are combined using MatCreateNest, so that I can make > computations on that portion of the matrix. > I need to understand better. So 1) You want to load a sparse matrix from disk 2) You are imagining that it is loaded row-wise, since you can do a calculation with some rows before others are loaded. What calculation, a MatMult? How long does that MatMult take compared to loading? 3) If you are talking about a dense matrix, you should be loading in parallel using MPI-I/O. We do this for Vec. 
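(For concreteness, one hedged sketch of the block-wise scheme under discussion, assuming each horizontal block of X sits in its own PETSc binary file -- the file naming and block count here are made up for illustration. MatCreateNest only keeps references to the loaded blocks, so stacking them adds no copy.)

    Mat      blocks[64];       /* assumed upper bound on the number of blocks */
    PetscInt b, nblocks = 8;   /* illustrative                                */

    for (b = 0; b < nblocks; b++) {
      PetscViewer viewer;
      char        fname[PETSC_MAX_PATH_LEN];
      Mat         Xpart;

      PetscSNPrintf(fname, sizeof(fname), "Xblock_%D.dat", b);    /* hypothetical file names  */
      PetscViewerBinaryOpen(PETSC_COMM_WORLD, fname, FILE_MODE_READ, &viewer);
      MatCreate(PETSC_COMM_WORLD, &blocks[b]);
      MatSetType(blocks[b], MATMPIAIJ);
      MatLoad(blocks[b], viewer);                                  /* parallel binary read     */
      PetscViewerDestroy(&viewer);

      /* Stack everything loaded so far; Xpart shares storage with blocks[0..b]. */
      MatCreateNest(PETSC_COMM_WORLD, b + 1, NULL, 1, NULL, blocks, &Xpart);
      /* ... run optimization steps with MatMult/MatMultTranspose on Xpart ... */
      MatDestroy(&Xpart);   /* the individual blocks keep their own references */
    }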
Before you do complicated programming, I would assure myself that the performance gain is worth it. > In this scenario, every process needs to be simultaneously loading the > next block of X, and perform computations on the previously loaded portion. > My strategy is for each MPI process to spawn a thread for data loading (so > that the memory between the process and the thread is shared), while the > process does computations. My concern is that the data loading thread may > be using up computational resources of the processor, even though it is > mainly doing IO. Will this be an issue? What is the best way to minimize > the cpu time of this parallel data loading scheme? > Oh, you want to load each block in parallel, but there are many blocks. I would really caution you against using threads. They are death to clean code. Use non-blocking reads. Thanks, Matt > Thanks, > Michal. > > > Wiadomo?? napisana przez Matthew Knepley w dniu > 24.05.2017, o godz. 04:55: > > On Wed, May 24, 2017 at 1:09 AM, Michal Derezinski > wrote: > >> Hi, >> >> I want to be able to perform matrix operations on several contiguous >> submatrices of a full matrix, without allocating the memory redundantly for >> the submatrices (in addition to the memory that is already allocated for >> the full matrix). >> I tried using MatGetSubMatrix, but this function appears to allocate the >> additional memory. >> >> The other way I found to do this is to create the smallest submatrices I >> need first, then use MatCreateNest to combine them into bigger ones >> (including the full matrix). >> The documentation of MatCreateNest seems to indicate that it does not >> allocate additional memory for storing the new matrix. >> Is this the right approach, or is there a better one? >> > > Yes, that is the right approach. > > Thanks, > > Matt > > >> Thanks, >> Michal Derezinski. >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Wed May 24 12:50:25 2017 From: danyang.su at gmail.com (Danyang Su) Date: Wed, 24 May 2017 10:50:25 -0700 Subject: [petsc-users] Question on incomplete factorization level and fill In-Reply-To: References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Message-ID: Hi Matthew and Barry, Thanks for the quick response. I also tried superlu and mumps, both work but it is about four times slower than ILU(dt) prec through hypre, with 24 processors I have tested. When I look into the convergence information, the method using ILU(dt) still takes 200 to 3000 linear iterations for each newton iteration. One reason is this equation is hard to solve. As for the general cases, the same method works awesome and get very good speedup. I also doubt if I use hypre correctly for this case. Is there anyway to check this problem, or is it possible to increase the factorization level through hypre? 
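(A minimal run-time way to experiment with a higher fill level using PETSc's own ILU(k) on each subdomain block -- a sketch only, with a hypothetical executable name, and no claim that it helps this particular matrix:)

    mpiexec -n 24 ./my_app \
        -ksp_type bcgs -ksp_monitor_true_residual -ksp_converged_reason \
        -pc_type asm -sub_pc_type ilu -sub_pc_factor_levels 2

The same -sub_pc_factor_levels option also works with -pc_type bjacobi; raising the level trades more memory and setup time for a stronger local factorization.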
Thanks, Danyang On 17-05-24 04:59 AM, Matthew Knepley wrote: > On Wed, May 24, 2017 at 2:21 AM, Danyang Su > wrote: > > Dear All, > > I use PCFactorSetLevels for ILU and PCFactorSetFill for other > preconditioning in my code to help solve the problems that the > default option is hard to solve. However, I found the latter one, > PCFactorSetFill does not take effect for my problem. The matrices > and rhs as well as the solutions are attached from the link below. > I obtain the solution using hypre preconditioner and it takes 7 > and 38 iterations for matrix 1 and matrix 2. However, if I use > other preconditioner, the solver just failed at the first matrix. > I have tested this matrix using the native sequential solver (not > PETSc) with ILU preconditioning. If I set the incomplete > factorization level to 0, this sequential solver will take more > than 100 iterations. If I increase the factorization level to 1 or > more, it just takes several iterations. This remind me that the PC > factor for this matrices should be increased. However, when I > tried it in PETSc, it just does not work. > > Matrix and rhs can be obtained from the link below. > > https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R > > > Would anyone help to check if you can make this work by increasing > the PC factor level or fill? > > > We have ILU(k) supported in serial. However ILU(dt) which takes a > tolerance only works through Hypre > > http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html > > I recommend you try SuperLU or MUMPS, which can both be downloaded > automatically by configure, and > do a full sparse LU. > > Thanks, > > Matt > > Thanks and regards, > > Danyang > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 24 13:12:07 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 May 2017 13:12:07 -0500 Subject: [petsc-users] Question on incomplete factorization level and fill In-Reply-To: References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Message-ID: On Wed, May 24, 2017 at 12:50 PM, Danyang Su wrote: > Hi Matthew and Barry, > > Thanks for the quick response. > > I also tried superlu and mumps, both work but it is about four times > slower than ILU(dt) prec through hypre, with 24 processors I have tested. > You mean the total time is 4x? And you are taking hundreds of iterates? That seems hard to believe, unless you are dropping a huge number of elements. > When I look into the convergence information, the method using ILU(dt) > still takes 200 to 3000 linear iterations for each newton iteration. One > reason is this equation is hard to solve. As for the general cases, the > same method works awesome and get very good speedup. > I do not understand what you mean here. > I also doubt if I use hypre correctly for this case. Is there anyway to > check this problem, or is it possible to increase the factorization level > through hypre? > > I don't know. Matt > Thanks, > > Danyang > > On 17-05-24 04:59 AM, Matthew Knepley wrote: > > On Wed, May 24, 2017 at 2:21 AM, Danyang Su wrote: > >> Dear All, >> >> I use PCFactorSetLevels for ILU and PCFactorSetFill for other >> preconditioning in my code to help solve the problems that the default >> option is hard to solve. 
However, I found the latter one, PCFactorSetFill >> does not take effect for my problem. The matrices and rhs as well as the >> solutions are attached from the link below. I obtain the solution using >> hypre preconditioner and it takes 7 and 38 iterations for matrix 1 and >> matrix 2. However, if I use other preconditioner, the solver just failed at >> the first matrix. I have tested this matrix using the native sequential >> solver (not PETSc) with ILU preconditioning. If I set the incomplete >> factorization level to 0, this sequential solver will take more than 100 >> iterations. If I increase the factorization level to 1 or more, it just >> takes several iterations. This remind me that the PC factor for this >> matrices should be increased. However, when I tried it in PETSc, it just >> does not work. >> >> Matrix and rhs can be obtained from the link below. >> >> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R >> >> Would anyone help to check if you can make this work by increasing the PC >> factor level or fill? >> > > We have ILU(k) supported in serial. However ILU(dt) which takes a > tolerance only works through Hypre > > http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html > > I recommend you try SuperLU or MUMPS, which can both be downloaded > automatically by configure, and > do a full sparse LU. > > Thanks, > > Matt > > >> Thanks and regards, >> >> Danyang >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From michal.derezinski at gmail.com Wed May 24 13:13:42 2017 From: michal.derezinski at gmail.com (=?utf-8?Q?Micha=C5=82_Derezi=C5=84ski?=) Date: Wed, 24 May 2017 11:13:42 -0700 Subject: [petsc-users] Accessing submatrices without additional memory usage In-Reply-To: References: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> Message-ID: > Wiadomo?? napisana przez Matthew Knepley w dniu 24.05.2017, o godz. 10:44: > > On Wed, May 24, 2017 at 12:37 PM, Micha? Derezi?ski > wrote: > Great! Then I have a follow-up question: > > My goal is to be able to load the full matrix X from disk, while at the same time in parallel, performing computations on the submatrices that have already been loaded. Essentially, I want to think of X as a block matrix (where the blocks are horizontal, spanning the full width of the matrix), where I?m loading one block at a time, and all the blocks that have already been loaded are combined using MatCreateNest, so that I can make computations on that portion of the matrix. > > I need to understand better. So > > 1) You want to load a sparse matrix from disk > Yes, the matrix is sparse, stored on disk in row-wise chunks (one per process), with total size of around 3TB. > 2) You are imagining that it is loaded row-wise, since you can do a calculation with some rows before others are loaded. > > What calculation, a MatMult? > How long does that MatMult take compared to loading? > Yes, a MatMult. I already have a more straightforward implementation where the matrix is loaded completely at the beginning, and then all of the multiplications are performed. 
Based on the loading time and computation time with the current implementation, it appears that most of the computation time could be subsumed into the loading time. > 3) If you are talking about a dense matrix, you should be loading in parallel using MPI-I/O. We do this for Vec. > > Before you do complicated programming, I would assure myself that the performance gain is worth it. > > In this scenario, every process needs to be simultaneously loading the next block of X, and perform computations on the previously loaded portion. My strategy is for each MPI process to spawn a thread for data loading (so that the memory between the process and the thread is shared), while the process does computations. My concern is that the data loading thread may be using up computational resources of the processor, even though it is mainly doing IO. Will this be an issue? What is the best way to minimize the cpu time of this parallel data loading scheme? > > Oh, you want to load each block in parallel, but there are many blocks. I would really caution you against using threads. They > are death to clean code. Use non-blocking reads. I see. Could you expand on your suggestion regarding non-blocking reads? Are you proposing that each process makes an asynchronous read request in between every, say, MatMult operation? > > Thanks, > > Matt > > Thanks, > Michal. > > >> Wiadomo?? napisana przez Matthew Knepley > w dniu 24.05.2017, o godz. 04:55: >> >> On Wed, May 24, 2017 at 1:09 AM, Michal Derezinski > wrote: >> Hi, >> >> I want to be able to perform matrix operations on several contiguous submatrices of a full matrix, without allocating the memory redundantly for the submatrices (in addition to the memory that is already allocated for the full matrix). >> I tried using MatGetSubMatrix, but this function appears to allocate the additional memory. >> >> The other way I found to do this is to create the smallest submatrices I need first, then use MatCreateNest to combine them into bigger ones (including the full matrix). >> The documentation of MatCreateNest seems to indicate that it does not allocate additional memory for storing the new matrix. >> Is this the right approach, or is there a better one? >> >> Yes, that is the right approach. >> >> Thanks, >> >> Matt >> >> Thanks, >> Michal Derezinski. >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 24 13:19:09 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 24 May 2017 13:19:09 -0500 Subject: [petsc-users] Accessing submatrices without additional memory usage In-Reply-To: References: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> Message-ID: On Wed, May 24, 2017 at 1:13 PM, Micha? Derezi?ski < michal.derezinski at gmail.com> wrote: > > Wiadomo?? napisana przez Matthew Knepley w dniu > 24.05.2017, o godz. 10:44: > > On Wed, May 24, 2017 at 12:37 PM, Micha? Derezi?ski gmail.com> wrote: > >> Great! 
Then I have a follow-up question: >> >> My goal is to be able to load the full matrix X from disk, while at the >> same time in parallel, performing computations on the submatrices that have >> already been loaded. Essentially, I want to think of X as a block matrix >> (where the blocks are horizontal, spanning the full width of the matrix), >> where I?m loading one block at a time, and all the blocks that have already >> been loaded are combined using MatCreateNest, so that I can make >> computations on that portion of the matrix. >> > > I need to understand better. So > > 1) You want to load a sparse matrix from disk > > > Yes, the matrix is sparse, stored on disk in row-wise chunks (one per > process), with total size of around 3TB. > > 2) You are imagining that it is loaded row-wise, since you can do a > calculation with some rows before others are loaded. > > What calculation, a MatMult? > How long does that MatMult take compared to loading? > > > Yes, a MatMult. > I already have a more straightforward implementation where the matrix is > loaded completely at the beginning, and then all of the multiplications are > performed. > Based on the loading time and computation time with the current > implementation, it appears that most of the computation time could be > subsumed into the loading time. > > 3) If you are talking about a dense matrix, you should be loading in > parallel using MPI-I/O. We do this for Vec. > > Before you do complicated programming, I would assure myself that the > performance gain is worth it. > > >> In this scenario, every process needs to be simultaneously loading the >> next block of X, and perform computations on the previously loaded portion. >> My strategy is for each MPI process to spawn a thread for data loading (so >> that the memory between the process and the thread is shared), while the >> process does computations. My concern is that the data loading thread may >> be using up computational resources of the processor, even though it is >> mainly doing IO. Will this be an issue? What is the best way to minimize >> the cpu time of this parallel data loading scheme? >> > > Oh, you want to load each block in parallel, but there are many blocks. I > would really caution you against using threads. They > are death to clean code. Use non-blocking reads. > > > I see. Could you expand on your suggestion regarding non-blocking reads? > Are you proposing that each process makes an asynchronous read request in > between every, say, MatMult operation? > Check this out: http://beige.ucs.indiana.edu/I590/node109.html PETSc does not do this currently, but it sounds like you are handling the load. Thanks, Matt > > Thanks, > > Matt > > >> Thanks, >> Michal. >> >> >> Wiadomo?? napisana przez Matthew Knepley w dniu >> 24.05.2017, o godz. 04:55: >> >> On Wed, May 24, 2017 at 1:09 AM, Michal Derezinski >> wrote: >> >>> Hi, >>> >>> I want to be able to perform matrix operations on several contiguous >>> submatrices of a full matrix, without allocating the memory redundantly for >>> the submatrices (in addition to the memory that is already allocated for >>> the full matrix). >>> I tried using MatGetSubMatrix, but this function appears to allocate the >>> additional memory. >>> >>> The other way I found to do this is to create the smallest submatrices I >>> need first, then use MatCreateNest to combine them into bigger ones >>> (including the full matrix). 
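(A rough sketch of what a non-blocking read of one block could look like with MPI-I/O, along the lines of the link above; the file name, offset, and byte count are placeholders, and the raw bytes would still have to be turned into a Mat afterwards.)

    MPI_File    fh;
    MPI_Request req;
    MPI_Status  status;
    char       *buf = (char*)malloc(block_bytes);  /* block_bytes, my_offset: assumed known from the file layout */

    MPI_File_open(PETSC_COMM_WORLD, "Xblock_3.dat", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    MPI_File_iread_at(fh, my_offset, buf, block_bytes, MPI_BYTE, &req);  /* returns immediately            */
    /* ... keep iterating with MatMult on the blocks already in memory ... */
    MPI_Wait(&req, &status);                                             /* the new block is now in buf    */
    MPI_File_close(&fh);
    /* build the next Mat from buf, then free(buf) */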
>>> The documentation of MatCreateNest seems to indicate that it does not >>> allocate additional memory for storing the new matrix. >>> Is this the right approach, or is there a better one? >>> >> >> Yes, that is the right approach. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Michal Derezinski. >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed May 24 13:32:18 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 24 May 2017 12:32:18 -0600 Subject: [petsc-users] Accessing submatrices without additional memory usage In-Reply-To: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> References: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> Message-ID: <87shju2jil.fsf@jedbrown.org> Micha? Derezi?ski writes: > Great! Then I have a follow-up question: > > My goal is to be able to load the full matrix X from disk, while at > the same time in parallel, performing computations on the submatrices > that have already been loaded. Essentially, I want to think of X as a > block matrix (where the blocks are horizontal, spanning the full width > of the matrix), What would be the distribution of the vector that this non-square submatrix (probably with many empty columns) is applied to? Could you back up and explain what problem you're trying to solve? It sounds like you're about to code yourself into a dungeon. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed May 24 13:38:28 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 24 May 2017 13:38:28 -0500 Subject: [petsc-users] [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? In-Reply-To: <1238048783.7876567.1495619158445.JavaMail.zimbra@inria.fr> References: <2012394521.6757315.1495383841678.JavaMail.zimbra@inria.fr> <264DC59D-B914-42E5-9A89-0746F21A37BF@gmail.com> <1392596904.7422896.1495533198072.JavaMail.zimbra@inria.fr> <1784716977.7683031.1495556883254.JavaMail.zimbra@inria.fr> <855172682.7687763.1495558287122.JavaMail.zimbra@inria.fr> <1238048783.7876567.1495619158445.JavaMail.zimbra@inria.fr> Message-ID: <02FA1C44-2E29-4000-B21E-B9D96C5B14A0@mcs.anl.gov> > On May 24, 2017, at 4:45 AM, Franck Houssen wrote: > > Coming from FEM, I believe the very confusing thing is that the local size of the user problem (math, physics point of view - DDM domain size) is not (can not be ?) the local size expected in MatCreateIS. > > My understanding is that the local size in MatIS is "just" related to backend implementation problems (it's logical that this local size is necessary, but, for another purpose: MPI machinery). 
Taking a few steps back, I can not see a case (I may be wrong) when a user does know how to compute or set "by hand" the local size that MatIS will expect: my understanding (once again, not sure) is that in most cases, the user will need local size to be PETSC_DECIDE in MatIS (because he doesn't want to "bother" with that or can not guess / compute it => unfortunatelly, as is, this jam the whole thing). > > I guess this kind of signature for MatIS would avoid/limit confusion in most cases and for most users : > PetscErrorCode MatCreateIS(MPI_Comm comm,PetscInt bs,PetscInt M,PetscInt N,ISLocalToGlobalMapping rmap,ISLocalToGlobalMapping cmap,Mat *A,PetscInt m = PETSC_DECIDE,PetscInt n= PETSC_DECIDE) > Or even > PetscErrorCode MatCreateIS(MPI_Comm comm,PetscInt bs,PetscInt M,PetscInt N,ISLocalToGlobalMapping rmap,ISLocalToGlobalMapping cmap,Mat *A) // Always use PETSC_DECIDE backstage ? > You are correct that often m and n may be PETSC_DECIDE, however there are also valid reasons for them to be determined by the user and not just set automatically. With finite elements and PETSc one often partitions first the elements and then partitions the degrees of freedom on the elements subservient to the partitioning of the elements; by this I mean any degree of freedom that is on an element interior to a process in the element partitioning (degree of freedom in no way "shared" between processes) would be a assigned to that MPI process while "shared" elements are assigned by some rule to one of the processes that "share" the degree of freedom. In this case if the user computes the correct local m and n they will get exactly the partitioning of degrees of freedom they want (in the global vector) but if they let PETSc decide they won't get neccessarily the same partitioning. The reason the m and n are the 3rd and 4th argument instead of the last arguments is to match the calls for, for example MatCreateAIJ() and MatCreateBAIJ() so that users understand the m and n have the same meaning as that case. Unfortunately this does not seem to have worked since the m and n arguments appear to have been confusing to you. Barry > Franck > > De: "Matthew Knepley" > ?: "Franck Houssen" > Cc: "Stefano Zampini" , "PETSc" , "PETSc" > Envoy?: Mardi 23 Mai 2017 19:02:28 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? > > On Tue, May 23, 2017 at 11:51 AM, Franck Houssen wrote: > Not sure to know what question you're talking about ?!... > I use MatIS to test some kind of domain decomposition methods. I define my own preconditioner for that: in the apply callback, I need to matmult my (matIS) matrix with the incoming vector. > > Okay. I will create an example using your suggestion. > > Thanks, > > Matt > > Franck > > De: "Matthew Knepley" > ?: "Franck Houssen" > Cc: "Stefano Zampini" , "PETSc" , "PETSc" > Envoy?: Mardi 23 Mai 2017 18:46:34 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? > > On Tue, May 23, 2017 at 11:28 AM, Franck Houssen wrote: > OK, thanks. This is helpfull... But I really think the doc should be more verbose about that: this is really confusing and I didn't find any simple example to begin with which make all this even more confusing (personal opinion). > > Did you respond to my other question (how are you using them)? That would help me understand how to phrase it. 
> > Thanks, > > Matt > > Franck > > > De: "Matthew Knepley" > ?: "Franck Houssen" > Cc: "Stefano Zampini" , "PETSc" , "PETSc" > Envoy?: Mardi 23 Mai 2017 13:21:21 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? > > On Tue, May 23, 2017 at 4:53 AM, Franck Houssen wrote: > The first thing I did was to put 3, not 4 : I got an error thrown in MatCreateIS (see the git diff + stack below). As the error said I used globalSize = numberOfMPIProcessus * localSize : my understanding is that, when using MatIS, the global size needs to be the sum of all local sizes. Correct ? > > No. MatIS means that the matrix is not assembled. The easiest way (for me) to think of this is that processes do not have > to hold full rows. One process can hold part of row i, and another processes can hold another part. However, there are still > the same number of global rows. > > I have a 3x3 global matrix made of two overlapping 2x2 local matrix (= diagonal with 1.). Each local matrix correspond to one domain (each domain is delegated to one MPI proc, so, I have 2 MPI procs because I have 2 domains). > > So the global size is 3. The local size here is not the size of the local IS block, since that is a property only of MatIS. It is the > size of the local piece of the vector you multiply. This allows PETSc to understand the parallel layout of the Vec, and how it > matched the Mat. > > This is somewhat confusing because FEM people mean something different by "local" than we do here, and in fact we use this > other definition of local when assembling operators. > > Matt > > This is the simplest possible example: I have two 2x2 (local) diag matrix that overlap so that the global matrix built from them is 1, 2, 1 on the diagonal (local contributions add up in the middle). > I need to MatMult this global matrix with a global vector filled with 1. 
> > Franck > > Git diff : > > --- a/matISLocalMat.cpp > +++ b/matISLocalMat.cpp > @@ -16,7 +16,7 @@ int main(int argc,char **argv) { > int size = 0; MPI_Comm_size(MPI_COMM_WORLD, &size); if (size != 2) return 1; > int rank = 0; MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > - PetscInt localSize = 2, globalSize = localSize*2 /*2 MPI*/; > + PetscInt localSize = 2, globalSize = 3; > PetscInt localIdx[2] = {0, 0}; > if (rank == 0) {localIdx[0] = 0; localIdx[1] = 1;} > else {localIdx[0] = 1; localIdx[1] = 2;} > > > > Stack error: > > [0]PETSC ERROR: Nonconforming object sizes > [0]PETSC ERROR: Sum of local lengths 4 does not equal global length 3, my local length 2 > [0]PETSC ERROR: [0] ISG2LMapApply line 17 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/isltog.c > [0]PETSC ERROR: [0] MatSetValues_IS line 692 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] MatSetValues line 1157 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [0]PETSC ERROR: [0] MatISSetPreallocation_IS line 95 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] MatISSetPreallocation line 80 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] PetscSplitOwnership line 80 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/sys/utils/psplit.c > [0]PETSC ERROR: [0] PetscLayoutSetUp line 129 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/vec/is/utils/pmap.c > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping_IS line 628 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > [0]PETSC ERROR: [0] MatSetLocalToGlobalMapping line 1899 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/interface/matrix.c > [0]PETSC ERROR: [0] MatCreateIS line 986 /home/fghoussen/Documents/INRIA/petsc-3.7.6/src/mat/impls/is/matis.c > > > > De: "Stefano Zampini" > ?: "Matthew Knepley" > Cc: "Franck Houssen" , "PETSc" , "PETSc" > Envoy?: Dimanche 21 Mai 2017 23:02:37 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? > > Franck, > > PETSc takes care of doing the matrix-vector multiplication properly using MatIS. As Matt said, the layout of the vectors is the usual parallel layout. > The local sizes of the MatIS matrix (i.e. the local size of the left and right vectors used in MatMult) are not the sizes of the local subdomain matrices in MatIS. > > > On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen wrote: > Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? Example is attached : I don't get what I expect that is a vector such that proc0 = [1, 2] and proc1 = [2, 1] > > 1) I think the global size of your matrix is wrong. You seem to want 3, not 4 > > 2) Global vectors have a non-overlapping row partition. You might be thinking of local vectors > > Thanks, > > Matt > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > De: "Stefano Zampini" > ?: "Matthew Knepley" > Cc: "Franck Houssen" , "PETSc" , "PETSc" > Envoy?: Dimanche 21 Mai 2017 23:02:37 > Objet: Re: [petsc-dev] Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? > > Franck, > > PETSc takes care of doing the matrix-vector multiplication properly using MatIS. 
As Matt said, the layout of the vectors is the usual parallel layout. > The local sizes of the MatIS matrix (i.e. the local size of the left and right vectors used in MatMult) are not the sizes of the local subdomain matrices in MatIS. > > > On May 21, 2017, at 6:47 PM, Matthew Knepley wrote: > > On Sun, May 21, 2017 at 11:26 AM, Franck Houssen wrote: > Using PETSc MatIS, how to matmult a global IS matrix and a global vector ? Example is attached : I don't get what I expect that is a vector such that proc0 = [1, 2] and proc1 = [2, 1] > > 1) I think the global size of your matrix is wrong. You seem to want 3, not 4 > > 2) Global vectors have a non-overlapping row partition. You might be thinking of local vectors > > Thanks, > > Matt > > Franck > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > From danyang.su at gmail.com Wed May 24 13:49:38 2017 From: danyang.su at gmail.com (Danyang Su) Date: Wed, 24 May 2017 11:49:38 -0700 Subject: [petsc-users] Question on incomplete factorization level and fill In-Reply-To: References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Message-ID: Hi Matt, Yes. The matrix is 450000x450000 sparse. The hypre takes hundreds of iterates, not for all but in most of the timesteps. The matrix is not well conditioned, with nonzero entries range from 1.0e-29 to 1.0e2. I also made double check if there is anything wrong in the parallel version, however, the matrix is the same with sequential version except some round error which is relatively very small. Usually for those not well conditioned matrix, direct solver should be faster than iterative solver, right? But when I use the sequential iterative solver with ILU prec developed almost 20 years go by others, the solver converge fast with appropriate factorization level. In other words, when I use 24 processor using hypre, the speed is almost the same as as the old sequential iterative solver using 1 processor. I use most of the default configuration for the general case with pretty good speedup. And I am not sure if I miss something for this problem. Thanks, Danyang On 17-05-24 11:12 AM, Matthew Knepley wrote: > On Wed, May 24, 2017 at 12:50 PM, Danyang Su > wrote: > > Hi Matthew and Barry, > > Thanks for the quick response. > > I also tried superlu and mumps, both work but it is about four > times slower than ILU(dt) prec through hypre, with 24 processors I > have tested. > > You mean the total time is 4x? And you are taking hundreds of > iterates? That seems hard to believe, unless you are dropping > a huge number of elements. 
> > When I look into the convergence information, the method using > ILU(dt) still takes 200 to 3000 linear iterations for each newton > iteration. One reason is this equation is hard to solve. As for > the general cases, the same method works awesome and get very good > speedup. > > I do not understand what you mean here. > > I also doubt if I use hypre correctly for this case. Is there > anyway to check this problem, or is it possible to increase the > factorization level through hypre? > > I don't know. > > Matt > > Thanks, > > Danyang > > > On 17-05-24 04:59 AM, Matthew Knepley wrote: >> On Wed, May 24, 2017 at 2:21 AM, Danyang Su > > wrote: >> >> Dear All, >> >> I use PCFactorSetLevels for ILU and PCFactorSetFill for other >> preconditioning in my code to help solve the problems that >> the default option is hard to solve. However, I found the >> latter one, PCFactorSetFill does not take effect for my >> problem. The matrices and rhs as well as the solutions are >> attached from the link below. I obtain the solution using >> hypre preconditioner and it takes 7 and 38 iterations for >> matrix 1 and matrix 2. However, if I use other >> preconditioner, the solver just failed at the first matrix. I >> have tested this matrix using the native sequential solver >> (not PETSc) with ILU preconditioning. If I set the incomplete >> factorization level to 0, this sequential solver will take >> more than 100 iterations. If I increase the factorization >> level to 1 or more, it just takes several iterations. This >> remind me that the PC factor for this matrices should be >> increased. However, when I tried it in PETSc, it just does >> not work. >> >> Matrix and rhs can be obtained from the link below. >> >> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R >> >> >> Would anyone help to check if you can make this work by >> increasing the PC factor level or fill? >> >> >> We have ILU(k) supported in serial. However ILU(dt) which takes a >> tolerance only works through Hypre >> >> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html >> >> >> I recommend you try SuperLU or MUMPS, which can both be >> downloaded automatically by configure, and >> do a full sparse LU. >> >> Thanks, >> >> Matt >> >> Thanks and regards, >> >> Danyang >> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From michal.derezinski at gmail.com Wed May 24 13:53:07 2017 From: michal.derezinski at gmail.com (=?utf-8?Q?Micha=C5=82_Derezi=C5=84ski?=) Date: Wed, 24 May 2017 11:53:07 -0700 Subject: [petsc-users] Accessing submatrices without additional memory usage In-Reply-To: <87shju2jil.fsf@jedbrown.org> References: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> <87shju2jil.fsf@jedbrown.org> Message-ID: <5BF044B5-49B3-49CB-A58D-A06B11DD6000@gmail.com> It is an optimization problem minimizing a convex objective for a binary classification task, which I?m solving using a Tao solver. The multiplication operations are performing gradient computation for each step of the optimization. 
So I?m performing both a MatMult and a MatMultTranspose, in both cases the vector may be a dense vector. The crucial part of the implementation is that at the beginning I am not running on the entire dataset (rows of the full matrix). As a consequence I don?t need to have the entire matrix loaded right away. In fact, in some cases I may choose to stop the optimization before the entire matrix has been loaded (I already verified that this scenario may come up as a use case). That is why it is important that I don?t load it at the beginning. Parallel loading is not a necessary part of the implementation. Initially, I intend to alternate between loading a portion of the matrix, then doing computations, then loading more of the matrix, etc. But, given that I observed large loading times for some datasets, parallel loading may make sense, if done efficiently. Thanks, Michal. > Wiadomo?? napisana przez Jed Brown w dniu 24.05.2017, o godz. 11:32: > > Micha? Derezi?ski writes: > >> Great! Then I have a follow-up question: >> >> My goal is to be able to load the full matrix X from disk, while at >> the same time in parallel, performing computations on the submatrices >> that have already been loaded. Essentially, I want to think of X as a >> block matrix (where the blocks are horizontal, spanning the full width >> of the matrix), > > What would be the distribution of the vector that this non-square > submatrix (probably with many empty columns) is applied to? > > Could you back up and explain what problem you're trying to solve? It > sounds like you're about to code yourself into a dungeon. From jed at jedbrown.org Wed May 24 14:06:11 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 24 May 2017 13:06:11 -0600 Subject: [petsc-users] Accessing submatrices without additional memory usage In-Reply-To: <5BF044B5-49B3-49CB-A58D-A06B11DD6000@gmail.com> References: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> <87shju2jil.fsf@jedbrown.org> <5BF044B5-49B3-49CB-A58D-A06B11DD6000@gmail.com> Message-ID: <87poey2hy4.fsf@jedbrown.org> Okay, do you have more parameters than observations? And each segment of the matrix will be fully distributed? Do you have a parallel file system? Is your matrix sparse or dense? Micha? Derezi?ski writes: > It is an optimization problem minimizing a convex objective for a binary classification task, which I?m solving using a Tao solver. > The multiplication operations are performing gradient computation for each step of the optimization. > So I?m performing both a MatMult and a MatMultTranspose, in both cases the vector may be a dense vector. > > The crucial part of the implementation is that at the beginning I am not running on the entire dataset (rows of the full matrix). > As a consequence I don?t need to have the entire matrix loaded right away. In fact, in some cases I may choose to stop the optimization before the entire matrix has been loaded (I already verified that this scenario may come up as a use case). That is why it is important that I don?t load it at the beginning. > > Parallel loading is not a necessary part of the implementation. Initially, I intend to alternate between loading a portion of the matrix, then doing computations, then loading more of the matrix, etc. But, given that I observed large loading times for some datasets, parallel loading may make sense, if done efficiently. > > Thanks, > Michal. > >> Wiadomo?? napisana przez Jed Brown w dniu 24.05.2017, o godz. 11:32: >> >> Micha? Derezi?ski writes: >> >>> Great! 
Then I have a follow-up question: >>> >>> My goal is to be able to load the full matrix X from disk, while at >>> the same time in parallel, performing computations on the submatrices >>> that have already been loaded. Essentially, I want to think of X as a >>> block matrix (where the blocks are horizontal, spanning the full width >>> of the matrix), >> >> What would be the distribution of the vector that this non-square >> submatrix (probably with many empty columns) is applied to? >> >> Could you back up and explain what problem you're trying to solve? It >> sounds like you're about to code yourself into a dungeon. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From michal.derezinski at gmail.com Wed May 24 14:15:21 2017 From: michal.derezinski at gmail.com (=?utf-8?Q?Micha=C5=82_Derezi=C5=84ski?=) Date: Wed, 24 May 2017 12:15:21 -0700 Subject: [petsc-users] Accessing submatrices without additional memory usage In-Reply-To: <87poey2hy4.fsf@jedbrown.org> References: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> <87shju2jil.fsf@jedbrown.org> <5BF044B5-49B3-49CB-A58D-A06B11DD6000@gmail.com> <87poey2hy4.fsf@jedbrown.org> Message-ID: <0D256C07-0556-4D0E-A9D3-D7D3F5D8B2C6@gmail.com> > Wiadomo?? napisana przez Jed Brown w dniu 24.05.2017, o godz. 12:06: > > Okay, do you have more parameters than observations? No (not necessarily). The biggest matrix is 50M observations and 12M parameters. > And each segment > of the matrix will be fully distributed? Yes. > Do you have a parallel file > system? Yes. > Is your matrix sparse or dense? Yes. > > Micha? Derezi?ski writes: > >> It is an optimization problem minimizing a convex objective for a binary classification task, which I?m solving using a Tao solver. >> The multiplication operations are performing gradient computation for each step of the optimization. >> So I?m performing both a MatMult and a MatMultTranspose, in both cases the vector may be a dense vector. >> >> The crucial part of the implementation is that at the beginning I am not running on the entire dataset (rows of the full matrix). >> As a consequence I don?t need to have the entire matrix loaded right away. In fact, in some cases I may choose to stop the optimization before the entire matrix has been loaded (I already verified that this scenario may come up as a use case). That is why it is important that I don?t load it at the beginning. >> >> Parallel loading is not a necessary part of the implementation. Initially, I intend to alternate between loading a portion of the matrix, then doing computations, then loading more of the matrix, etc. But, given that I observed large loading times for some datasets, parallel loading may make sense, if done efficiently. >> >> Thanks, >> Michal. >> >>> Wiadomo?? napisana przez Jed Brown w dniu 24.05.2017, o godz. 11:32: >>> >>> Micha? Derezi?ski writes: >>> >>>> Great! Then I have a follow-up question: >>>> >>>> My goal is to be able to load the full matrix X from disk, while at >>>> the same time in parallel, performing computations on the submatrices >>>> that have already been loaded. Essentially, I want to think of X as a >>>> block matrix (where the blocks are horizontal, spanning the full width >>>> of the matrix), >>> >>> What would be the distribution of the vector that this non-square >>> submatrix (probably with many empty columns) is applied to? 
>>> >>> Could you back up and explain what problem you're trying to solve? It >>> sounds like you're about to code yourself into a dungeon. From hzhang at mcs.anl.gov Wed May 24 14:21:05 2017 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 24 May 2017 14:21:05 -0500 Subject: [petsc-users] Question on incomplete factorization level and fill In-Reply-To: References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Message-ID: Danyang : I tested your data. Your matrices encountered zero pivots, e.g. petsc/src/ksp/ksp/examples/tutorials (master) $ mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin -ksp_monitor -ksp_error_if_not_converged [15]PETSC ERROR: Zero pivot in LU factorization: http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot [15]PETSC ERROR: Zero pivot row 1249 value 2.05808e-14 tolerance 2.22045e-14 ... Adding option '-sub_pc_factor_shift_type nonzero', I got mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view ascii::ascii_info Mat Object: 24 MPI processes type: mpiaij rows=450000, cols=450000 total: nonzeros=6991400, allocated nonzeros=6991400 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines 0 KSP Residual norm 5.849777711755e+01 1 KSP Residual norm 6.824179430230e-01 2 KSP Residual norm 3.994483555787e-02 3 KSP Residual norm 6.085841461433e-03 4 KSP Residual norm 8.876162583511e-04 5 KSP Residual norm 9.407780665278e-05 Number of iterations = 5 Residual norm 0.00542891 Hong > Hi Matt, > > Yes. The matrix is 450000x450000 sparse. The hypre takes hundreds of > iterates, not for all but in most of the timesteps. The matrix is not well > conditioned, with nonzero entries range from 1.0e-29 to 1.0e2. I also made > double check if there is anything wrong in the parallel version, however, > the matrix is the same with sequential version except some round error > which is relatively very small. Usually for those not well conditioned > matrix, direct solver should be faster than iterative solver, right? But > when I use the sequential iterative solver with ILU prec developed almost > 20 years go by others, the solver converge fast with appropriate > factorization level. In other words, when I use 24 processor using hypre, > the speed is almost the same as as the old sequential iterative solver > using 1 processor. > > I use most of the default configuration for the general case with pretty > good speedup. And I am not sure if I miss something for this problem. > > Thanks, > > Danyang > > On 17-05-24 11:12 AM, Matthew Knepley wrote: > > On Wed, May 24, 2017 at 12:50 PM, Danyang Su wrote: > >> Hi Matthew and Barry, >> >> Thanks for the quick response. >> >> I also tried superlu and mumps, both work but it is about four times >> slower than ILU(dt) prec through hypre, with 24 processors I have tested. >> > You mean the total time is 4x? And you are taking hundreds of iterates? > That seems hard to believe, unless you are dropping > a huge number of elements. > >> When I look into the convergence information, the method using ILU(dt) >> still takes 200 to 3000 linear iterations for each newton iteration. One >> reason is this equation is hard to solve. As for the general cases, the >> same method works awesome and get very good speedup. >> > I do not understand what you mean here. > >> I also doubt if I use hypre correctly for this case. 
Is there anyway to >> check this problem, or is it possible to increase the factorization level >> through hypre? >> > I don't know. > > Matt > >> Thanks, >> >> Danyang >> >> On 17-05-24 04:59 AM, Matthew Knepley wrote: >> >> On Wed, May 24, 2017 at 2:21 AM, Danyang Su wrote: >> >>> Dear All, >>> >>> I use PCFactorSetLevels for ILU and PCFactorSetFill for other >>> preconditioning in my code to help solve the problems that the default >>> option is hard to solve. However, I found the latter one, PCFactorSetFill >>> does not take effect for my problem. The matrices and rhs as well as the >>> solutions are attached from the link below. I obtain the solution using >>> hypre preconditioner and it takes 7 and 38 iterations for matrix 1 and >>> matrix 2. However, if I use other preconditioner, the solver just failed at >>> the first matrix. I have tested this matrix using the native sequential >>> solver (not PETSc) with ILU preconditioning. If I set the incomplete >>> factorization level to 0, this sequential solver will take more than 100 >>> iterations. If I increase the factorization level to 1 or more, it just >>> takes several iterations. This remind me that the PC factor for this >>> matrices should be increased. However, when I tried it in PETSc, it just >>> does not work. >>> >>> Matrix and rhs can be obtained from the link below. >>> >>> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R >>> >>> Would anyone help to check if you can make this work by increasing the >>> PC factor level or fill? >>> >> >> We have ILU(k) supported in serial. However ILU(dt) which takes a >> tolerance only works through Hypre >> >> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html >> >> I recommend you try SuperLU or MUMPS, which can both be downloaded >> automatically by configure, and >> do a full sparse LU. >> >> Thanks, >> >> Matt >> >> >>> Thanks and regards, >>> >>> Danyang >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Wed May 24 14:28:51 2017 From: danyang.su at gmail.com (Danyang Su) Date: Wed, 24 May 2017 12:28:51 -0700 Subject: [petsc-users] Question on incomplete factorization level and fill In-Reply-To: References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Message-ID: Hi Hong, Awesome. Thanks for testing the case. I will try your options for the code and get back to you later. Regards, Danyang On 17-05-24 12:21 PM, Hong wrote: > Danyang : > I tested your data. > Your matrices encountered zero pivots, e.g. > petsc/src/ksp/ksp/examples/tutorials (master) > $ mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin > -ksp_monitor -ksp_error_if_not_converged > > [15]PETSC ERROR: Zero pivot in LU factorization: > http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot > [15]PETSC ERROR: Zero pivot row 1249 value 2.05808e-14 tolerance > 2.22045e-14 > ... 
> > Adding option '-sub_pc_factor_shift_type nonzero', I got > mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin > -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type > nonzero -mat_view ascii::ascii_info > > Mat Object: 24 MPI processes > type: mpiaij > rows=450000, cols=450000 > total: nonzeros=6991400, allocated nonzeros=6991400 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > 0 KSP Residual norm 5.849777711755e+01 > 1 KSP Residual norm 6.824179430230e-01 > 2 KSP Residual norm 3.994483555787e-02 > 3 KSP Residual norm 6.085841461433e-03 > 4 KSP Residual norm 8.876162583511e-04 > 5 KSP Residual norm 9.407780665278e-05 > Number of iterations = 5 > Residual norm 0.00542891 > > Hong > > Hi Matt, > > Yes. The matrix is 450000x450000 sparse. The hypre takes hundreds > of iterates, not for all but in most of the timesteps. The matrix > is not well conditioned, with nonzero entries range from 1.0e-29 > to 1.0e2. I also made double check if there is anything wrong in > the parallel version, however, the matrix is the same with > sequential version except some round error which is relatively > very small. Usually for those not well conditioned matrix, direct > solver should be faster than iterative solver, right? But when I > use the sequential iterative solver with ILU prec developed almost > 20 years go by others, the solver converge fast with appropriate > factorization level. In other words, when I use 24 processor using > hypre, the speed is almost the same as as the old sequential > iterative solver using 1 processor. > > I use most of the default configuration for the general case with > pretty good speedup. And I am not sure if I miss something for > this problem. > > Thanks, > > Danyang > > > On 17-05-24 11:12 AM, Matthew Knepley wrote: >> On Wed, May 24, 2017 at 12:50 PM, Danyang Su >> > wrote: >> >> Hi Matthew and Barry, >> >> Thanks for the quick response. >> >> I also tried superlu and mumps, both work but it is about >> four times slower than ILU(dt) prec through hypre, with 24 >> processors I have tested. >> >> You mean the total time is 4x? And you are taking hundreds of >> iterates? That seems hard to believe, unless you are dropping >> a huge number of elements. >> >> When I look into the convergence information, the method >> using ILU(dt) still takes 200 to 3000 linear iterations for >> each newton iteration. One reason is this equation is hard to >> solve. As for the general cases, the same method works >> awesome and get very good speedup. >> >> I do not understand what you mean here. >> >> I also doubt if I use hypre correctly for this case. Is there >> anyway to check this problem, or is it possible to increase >> the factorization level through hypre? >> >> I don't know. >> >> Matt >> >> Thanks, >> >> Danyang >> >> >> On 17-05-24 04:59 AM, Matthew Knepley wrote: >>> On Wed, May 24, 2017 at 2:21 AM, Danyang Su >>> > wrote: >>> >>> Dear All, >>> >>> I use PCFactorSetLevels for ILU and PCFactorSetFill for >>> other preconditioning in my code to help solve the >>> problems that the default option is hard to solve. >>> However, I found the latter one, PCFactorSetFill does >>> not take effect for my problem. The matrices and rhs as >>> well as the solutions are attached from the link below. >>> I obtain the solution using hypre preconditioner and it >>> takes 7 and 38 iterations for matrix 1 and matrix 2. >>> However, if I use other preconditioner, the solver just >>> failed at the first matrix. 
I have tested this matrix >>> using the native sequential solver (not PETSc) with ILU >>> preconditioning. If I set the incomplete factorization >>> level to 0, this sequential solver will take more than >>> 100 iterations. If I increase the factorization level to >>> 1 or more, it just takes several iterations. This remind >>> me that the PC factor for this matrices should be >>> increased. However, when I tried it in PETSc, it just >>> does not work. >>> >>> Matrix and rhs can be obtained from the link below. >>> >>> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R >>> >>> >>> Would anyone help to check if you can make this work by >>> increasing the PC factor level or fill? >>> >>> >>> We have ILU(k) supported in serial. However ILU(dt) which >>> takes a tolerance only works through Hypre >>> >>> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html >>> >>> >>> I recommend you try SuperLU or MUMPS, which can both be >>> downloaded automatically by configure, and >>> do a full sparse LU. >>> >>> Thanks, >>> >>> Matt >>> >>> Thanks and regards, >>> >>> Danyang >>> >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin >>> their experiments is infinitely more interesting than any >>> results to which their experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed May 24 14:28:54 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 24 May 2017 13:28:54 -0600 Subject: [petsc-users] Accessing submatrices without additional memory usage In-Reply-To: <0D256C07-0556-4D0E-A9D3-D7D3F5D8B2C6@gmail.com> References: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> <87shju2jil.fsf@jedbrown.org> <5BF044B5-49B3-49CB-A58D-A06B11DD6000@gmail.com> <87poey2hy4.fsf@jedbrown.org> <0D256C07-0556-4D0E-A9D3-D7D3F5D8B2C6@gmail.com> Message-ID: <87mva22gw9.fsf@jedbrown.org> Micha? Derezi?ski writes: >> Wiadomo?? napisana przez Jed Brown w dniu 24.05.2017, o godz. 12:06: >> >> Okay, do you have more parameters than observations? > > No (not necessarily). The biggest matrix is 50M observations and 12M parameters. > >> And each segment >> of the matrix will be fully distributed? > > Yes. > >> Do you have a parallel file >> system? > > Yes. > >> Is your matrix sparse or dense? > > Yes. By that you mean sparse? You'll need some sort of segmented storage (could be separate files or a file format that allows seeking). (If the matrix is generated by some other process, you'd benefit from skipping the file system entirely, but I understand that may not be possible.) I would use MatNest, creating a new one after each segment is loaded. There isn't currently a MatLoadBegin/End interface, but that could be created if it would be useful. -------------- next part -------------- A non-text attachment was scrubbed... 
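A minimal sketch of the MatNest approach described above, assuming each row segment is stored in its own PETSc binary file; the helper names, the file layout, and the rebuild-per-segment flow are illustrative assumptions, not an existing PETSc interface:

#include <petscmat.h>

/* Load one horizontal segment (a parallel AIJ matrix) from its own binary file. */
static PetscErrorCode LoadSegment(MPI_Comm comm, const char *fname, Mat *seg)
{
  PetscViewer    viewer;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscViewerBinaryOpen(comm, fname, FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = MatCreate(comm, seg);CHKERRQ(ierr);
  ierr = MatSetType(*seg, MATAIJ);CHKERRQ(ierr);
  ierr = MatLoad(*seg, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Stack the nseg segments loaded so far into one logical matrix without copying them.
   The resulting X supports MatMult/MatMultTranspose for the gradient evaluations. */
static PetscErrorCode StackSegments(MPI_Comm comm, PetscInt nseg, Mat segs[], Mat *X)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* NULL index sets: let MatNest take the row and column layouts from the blocks. */
  ierr = MatCreateNest(comm, nseg, NULL, 1, NULL, segs, X);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

After each LoadSegment() call the previous nest would be destroyed and rebuilt with StackSegments(), and the Tao objective/gradient routine would pick up the new Mat through its user context; MatCreateNest() only keeps references to the blocks, so rebuilding the nest does not copy the already loaded data.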
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From danyang.su at gmail.com Wed May 24 15:06:47 2017 From: danyang.su at gmail.com (Danyang Su) Date: Wed, 24 May 2017 13:06:47 -0700 Subject: [petsc-users] Question on incomplete factorization level and fill In-Reply-To: References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Message-ID: Dear Hong, I just tested with different number of processors for the same matrix. It sometimes got "ERROR: Arguments are incompatible" for different number of processors. It works fine using 4, 8, or 24 processors, but failed with "ERROR: Arguments are incompatible" using 16 or 48 processors. The error information is attached. I tested this on my local computer with 6 cores 12 threads. Any suggestion on this? Thanks, Danyang On 17-05-24 12:28 PM, Danyang Su wrote: > > Hi Hong, > > Awesome. Thanks for testing the case. I will try your options for the > code and get back to you later. > > Regards, > > Danyang > > > On 17-05-24 12:21 PM, Hong wrote: >> Danyang : >> I tested your data. >> Your matrices encountered zero pivots, e.g. >> petsc/src/ksp/ksp/examples/tutorials (master) >> $ mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin >> -ksp_monitor -ksp_error_if_not_converged >> >> [15]PETSC ERROR: Zero pivot in LU factorization: >> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >> [15]PETSC ERROR: Zero pivot row 1249 value 2.05808e-14 tolerance >> 2.22045e-14 >> ... >> >> Adding option '-sub_pc_factor_shift_type nonzero', I got >> mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin >> -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type >> nonzero -mat_view ascii::ascii_info >> >> Mat Object: 24 MPI processes >> type: mpiaij >> rows=450000, cols=450000 >> total: nonzeros=6991400, allocated nonzeros=6991400 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node (on process 0) routines >> 0 KSP Residual norm 5.849777711755e+01 >> 1 KSP Residual norm 6.824179430230e-01 >> 2 KSP Residual norm 3.994483555787e-02 >> 3 KSP Residual norm 6.085841461433e-03 >> 4 KSP Residual norm 8.876162583511e-04 >> 5 KSP Residual norm 9.407780665278e-05 >> Number of iterations = 5 >> Residual norm 0.00542891 >> >> Hong >> >> Hi Matt, >> >> Yes. The matrix is 450000x450000 sparse. The hypre takes hundreds >> of iterates, not for all but in most of the timesteps. The matrix >> is not well conditioned, with nonzero entries range from 1.0e-29 >> to 1.0e2. I also made double check if there is anything wrong in >> the parallel version, however, the matrix is the same with >> sequential version except some round error which is relatively >> very small. Usually for those not well conditioned matrix, direct >> solver should be faster than iterative solver, right? But when I >> use the sequential iterative solver with ILU prec developed >> almost 20 years go by others, the solver converge fast with >> appropriate factorization level. In other words, when I use 24 >> processor using hypre, the speed is almost the same as as the old >> sequential iterative solver using 1 processor. >> >> I use most of the default configuration for the general case with >> pretty good speedup. And I am not sure if I miss something for >> this problem. >> >> Thanks, >> >> Danyang >> >> >> On 17-05-24 11:12 AM, Matthew Knepley wrote: >>> On Wed, May 24, 2017 at 12:50 PM, Danyang Su >>> > wrote: >>> >>> Hi Matthew and Barry, >>> >>> Thanks for the quick response. 
>>> >>> I also tried superlu and mumps, both work but it is about >>> four times slower than ILU(dt) prec through hypre, with 24 >>> processors I have tested. >>> >>> You mean the total time is 4x? And you are taking hundreds of >>> iterates? That seems hard to believe, unless you are dropping >>> a huge number of elements. >>> >>> When I look into the convergence information, the method >>> using ILU(dt) still takes 200 to 3000 linear iterations for >>> each newton iteration. One reason is this equation is hard >>> to solve. As for the general cases, the same method works >>> awesome and get very good speedup. >>> >>> I do not understand what you mean here. >>> >>> I also doubt if I use hypre correctly for this case. Is >>> there anyway to check this problem, or is it possible to >>> increase the factorization level through hypre? >>> >>> I don't know. >>> >>> Matt >>> >>> Thanks, >>> >>> Danyang >>> >>> >>> On 17-05-24 04:59 AM, Matthew Knepley wrote: >>>> On Wed, May 24, 2017 at 2:21 AM, Danyang Su >>>> > wrote: >>>> >>>> Dear All, >>>> >>>> I use PCFactorSetLevels for ILU and PCFactorSetFill for >>>> other preconditioning in my code to help solve the >>>> problems that the default option is hard to solve. >>>> However, I found the latter one, PCFactorSetFill does >>>> not take effect for my problem. The matrices and rhs as >>>> well as the solutions are attached from the link below. >>>> I obtain the solution using hypre preconditioner and it >>>> takes 7 and 38 iterations for matrix 1 and matrix 2. >>>> However, if I use other preconditioner, the solver just >>>> failed at the first matrix. I have tested this matrix >>>> using the native sequential solver (not PETSc) with ILU >>>> preconditioning. If I set the incomplete factorization >>>> level to 0, this sequential solver will take more than >>>> 100 iterations. If I increase the factorization level >>>> to 1 or more, it just takes several iterations. This >>>> remind me that the PC factor for this matrices should >>>> be increased. However, when I tried it in PETSc, it >>>> just does not work. >>>> >>>> Matrix and rhs can be obtained from the link below. >>>> >>>> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R >>>> >>>> >>>> Would anyone help to check if you can make this work by >>>> increasing the PC factor level or fill? >>>> >>>> >>>> We have ILU(k) supported in serial. However ILU(dt) which >>>> takes a tolerance only works through Hypre >>>> >>>> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html >>>> >>>> >>>> I recommend you try SuperLU or MUMPS, which can both be >>>> downloaded automatically by configure, and >>>> do a full sparse LU. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> Thanks and regards, >>>> >>>> Danyang >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin >>>> their experiments is infinitely more interesting than any >>>> results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> http://www.caam.rice.edu/~mk51/ >>>> >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to >>> which their experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- Mat Object: 16 MPI processes type: mpiaij rows=450000, cols=450000 total: nonzeros=6991400, allocated nonzeros=6991400 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Arguments are incompatible [0]PETSC ERROR: Incompatible vector local lengths 28130 != 28125 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [0]PETSC ERROR: [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Arguments are incompatible [1]PETSC ERROR: Incompatible vector local lengths 28130 != 28125 [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [1]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [1]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:54:10 2017 [1]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Arguments are incompatible [2]PETSC ERROR: Incompatible vector local lengths 28130 != 28125 [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [2]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:54:10 2017 [2]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [2]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [4]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [4]PETSC ERROR: Arguments are incompatible [4]PETSC ERROR: Incompatible vector local lengths 28130 != 28125 [4]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [4]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [4]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:54:10 2017 [4]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [4]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [6]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [6]PETSC ERROR: Arguments are incompatible [6]PETSC ERROR: Incompatible vector local lengths 28130 != 28125 [6]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[6]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [6]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:54:10 2017 [6]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [6]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [8]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [8]PETSC ERROR: Arguments are incompatible [8]PETSC ERROR: Incompatible vector local lengths 28120 != 28125 [8]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [8]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [8]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:54:10 2017 [8]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [8]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [8]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:54:10 2017 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [0]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [0]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [0]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [1]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [1]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [1]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [1]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [2]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [2]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [2]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [2]PETSC ERROR: #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [3]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [3]PETSC ERROR: Arguments are incompatible [3]PETSC ERROR: Incompatible vector local lengths 28130 != 28125 [3]PETSC 
ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [3]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [3]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:54:10 2017 [3]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [3]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [3]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [3]PETSC ERROR: [4]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [4]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [4]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [4]PETSC ERROR: #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [4]PETSC ERROR: PETSc Option Table entries: [4]PETSC ERROR: [5]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [5]PETSC ERROR: Arguments are incompatible [5]PETSC ERROR: Incompatible vector local lengths 28130 != 28125 [5]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [5]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [5]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:54:10 2017 [5]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [5]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [5]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [5]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [5]PETSC ERROR: [6]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [6]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [6]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [6]PETSC ERROR: #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [6]PETSC ERROR: PETSc Option Table entries: [7]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [7]PETSC ERROR: Arguments are incompatible [7]PETSC ERROR: Incompatible vector local lengths 28130 != 28125 [7]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[7]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [7]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:54:10 2017 [7]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [7]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [7]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [7]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [8]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [8]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [8]PETSC ERROR: #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [8]PETSC ERROR: PETSc Option Table entries: [8]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [8]PETSC ERROR: -ksp_error_if_not_converged [8]PETSC ERROR: [11]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [11]PETSC ERROR: Arguments are incompatible [11]PETSC ERROR: Incompatible vector local lengths 28120 != 28125 [11]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [11]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [11]PETSC ERROR: [12]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [12]PETSC ERROR: Arguments are incompatible [12]PETSC ERROR: Incompatible vector local lengths 28120 != 28125 [12]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[12]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [12]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:54:10 2017 [12]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [12]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [12]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [12]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [12]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [0]PETSC ERROR: -ksp_error_if_not_converged [0]PETSC ERROR: -mat_view ascii::ascii_info [0]PETSC ERROR: -matload_block_size 1 [0]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [0]PETSC ERROR: -skp_monitor [0]PETSC ERROR: -sub_pc_factor_shift_type nonzero [0]PETSC ERROR: -vecload_block_size 10 [1]PETSC ERROR: #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [1]PETSC ERROR: PETSc Option Table entries: [1]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [1]PETSC ERROR: -ksp_error_if_not_converged [1]PETSC ERROR: -mat_view ascii::ascii_info [1]PETSC ERROR: -matload_block_size 1 [1]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [1]PETSC ERROR: -skp_monitor [1]PETSC ERROR: -sub_pc_factor_shift_type nonzero [1]PETSC ERROR: -vecload_block_size 10 [2]PETSC ERROR: PETSc Option Table entries: [2]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [2]PETSC ERROR: -ksp_error_if_not_converged [2]PETSC ERROR: -mat_view ascii::ascii_info [2]PETSC ERROR: -matload_block_size 1 [2]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [2]PETSC ERROR: -skp_monitor [2]PETSC ERROR: -sub_pc_factor_shift_type nonzero [2]PETSC ERROR: -vecload_block_size 10 [2]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [3]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [3]PETSC ERROR: #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [3]PETSC ERROR: PETSc Option Table entries: [3]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [3]PETSC ERROR: -ksp_error_if_not_converged [3]PETSC ERROR: -mat_view ascii::ascii_info [3]PETSC ERROR: -matload_block_size 1 [3]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [3]PETSC ERROR: -skp_monitor -f0 ./mat_rhs/a_react_in_2.bin [4]PETSC ERROR: -ksp_error_if_not_converged [4]PETSC ERROR: -mat_view ascii::ascii_info [4]PETSC ERROR: -matload_block_size 1 [4]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [4]PETSC ERROR: -skp_monitor [4]PETSC ERROR: -sub_pc_factor_shift_type nonzero [4]PETSC ERROR: -vecload_block_size 10 [4]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [5]PETSC ERROR: #5 main() line 330 in 
/home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [5]PETSC ERROR: PETSc Option Table entries: [5]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [5]PETSC ERROR: -ksp_error_if_not_converged [5]PETSC ERROR: -mat_view ascii::ascii_info [5]PETSC ERROR: -matload_block_size 1 [5]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [5]PETSC ERROR: -skp_monitor [5]PETSC ERROR: -sub_pc_factor_shift_type nonzero [6]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [6]PETSC ERROR: -ksp_error_if_not_converged [6]PETSC ERROR: -mat_view ascii::ascii_info [6]PETSC ERROR: -matload_block_size 1 [6]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [6]PETSC ERROR: -skp_monitor [6]PETSC ERROR: -sub_pc_factor_shift_type nonzero [6]PETSC ERROR: -vecload_block_size 10 [6]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- [7]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [7]PETSC ERROR: #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [7]PETSC ERROR: PETSc Option Table entries: [7]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [7]PETSC ERROR: -ksp_error_if_not_converged [7]PETSC ERROR: -mat_view ascii::ascii_info [7]PETSC ERROR: -matload_block_size 1 [7]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [7]PETSC ERROR: -skp_monitor -mat_view ascii::ascii_info [8]PETSC ERROR: -matload_block_size 1 [8]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [8]PETSC ERROR: -skp_monitor [8]PETSC ERROR: -sub_pc_factor_shift_type nonzero [8]PETSC ERROR: -vecload_block_size 10 [8]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:54:10 2017 [11]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [11]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [11]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [11]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [11]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [11]PETSC ERROR: [12]PETSC ERROR: #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [12]PETSC ERROR: PETSc Option Table entries: [12]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [12]PETSC ERROR: -ksp_error_if_not_converged [12]PETSC ERROR: -mat_view ascii::ascii_info [12]PETSC ERROR: -matload_block_size 1 [12]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [12]PETSC ERROR: -skp_monitor [12]PETSC ERROR: -sub_pc_factor_shift_type nonzero [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 75) - process 2 [cli_2]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 2 [3]PETSC ERROR: -sub_pc_factor_shift_type nonzero [3]PETSC ERROR: -vecload_block_size 10 [3]PETSC ERROR: ----------------End of Error 
Message -------send entire error message to petsc-maint at mcs.anl.gov---------- [5]PETSC ERROR: -vecload_block_size 10 [5]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- [7]PETSC ERROR: -sub_pc_factor_shift_type nonzero [7]PETSC ERROR: -vecload_block_size 10 [7]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [11]PETSC ERROR: PETSc Option Table entries: [11]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [11]PETSC ERROR: -ksp_error_if_not_converged [11]PETSC ERROR: -mat_view ascii::ascii_info [11]PETSC ERROR: -matload_block_size 1 [11]PETSC ERROR: [12]PETSC ERROR: -vecload_block_size 10 [12]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 75) - process 1 [cli_1]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 1 application called MPI_Abort(MPI_COMM_WORLD, 75) - process 3 [cli_3]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 3 application called MPI_Abort(MPI_COMM_WORLD, 75) - process 5 [cli_5]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 5 application called MPI_Abort(MPI_COMM_WORLD, 75) - process 6 [cli_6]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 6 application called MPI_Abort(MPI_COMM_WORLD, 75) - process 7 [cli_7]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 7 application called MPI_Abort(MPI_COMM_WORLD, 75) - process 8 [cli_8]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 8 -rhs ./mat_rhs/b_react_in_2.bin [11]PETSC ERROR: -skp_monitor [11]PETSC ERROR: -sub_pc_factor_shift_type nonzero [11]PETSC ERROR: -vecload_block_size 10 [11]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 75) - process 12 [cli_12]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 12 application called MPI_Abort(MPI_COMM_WORLD, 75) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 0 application called MPI_Abort(MPI_COMM_WORLD, 75) - process 11 [cli_11]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 11 application called MPI_Abort(MPI_COMM_WORLD, 75) - process 4 [cli_4]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 4 [9]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [9]PETSC ERROR: Arguments are incompatible [9]PETSC ERROR: Incompatible vector local lengths 28120 != 28125 [9]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[9]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [9]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:54:10 2017 [9]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [9]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [9]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [9]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [9]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [9]PETSC ERROR: #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [9]PETSC ERROR: PETSc Option Table entries: [9]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [9]PETSC ERROR: -ksp_error_if_not_converged [9]PETSC ERROR: -mat_view ascii::ascii_info [9]PETSC ERROR: -matload_block_size 1 [9]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [9]PETSC ERROR: -skp_monitor [9]PETSC ERROR: -sub_pc_factor_shift_type nonzero [9]PETSC ERROR: -vecload_block_size 10 [9]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 75) - process 9 [cli_9]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 9 =================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = EXIT CODE: 75 = CLEANING UP REMAINING PROCESSES = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES =================================================================================== -------------- next part -------------- Mat Object: 48 MPI processes type: mpiaij rows=450000, cols=450000 total: nonzeros=6991400, allocated nonzeros=6991400 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Arguments are incompatible [0]PETSC ERROR: Incompatible vector local lengths 9380 != 9375 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [0]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:57:30 2017 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [0]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [0]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [0]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [0]PETSC ERROR: -ksp_error_if_not_converged [0]PETSC ERROR: -mat_view ascii::ascii_info [0]PETSC ERROR: -matload_block_size 1 [0]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [0]PETSC ERROR: -skp_monitor [0]PETSC ERROR: -sub_pc_factor_shift_type nonzero [0]PETSC ERROR: -vecload_block_size 10 [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 75) - process 0 [cli_0]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 75) - process 0 [1]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [1]PETSC ERROR: Arguments are incompatible [1]PETSC ERROR: Incompatible vector local lengths 9380 != 9375 [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[The same "Arguments are incompatible / Incompatible vector local lengths" error report and MPI_Abort(MPI_COMM_WORLD, 75) message were repeated by ranks 1, 2, 4, 5, 6, 8, 10, and 16 (local lengths 9380 != 9375) and by ranks 24, 28, 32, 33, 34, 40, 42, and 44 (local lengths 9370 != 9375); the duplicate reports are omitted here. The remainder of rank 44's report follows.]
[44]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 [44]PETSC ERROR: ./ex10 on a linux-gnu-dbg named nwmop by dsu Wed May 24 12:57:30 2017 [44]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --download-fblaslapack --download-mpich --download-hypre --download-superlu_dist --download-hdf5=yes [44]PETSC ERROR: #1 VecCopy() line 1639 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/vec/vec/interface/vector.c [44]PETSC ERROR: #2 KSPInitialResidual() line 65 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itres.c [44]PETSC ERROR: #3 KSPSolve_GMRES() line 239 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/impls/gmres/gmres.c [44]PETSC ERROR: #4 KSPSolve() line 656 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/interface/itfunc.c [44]PETSC ERROR: #5 main() line 330 in /home/dsu/Soft/PETSc/petsc-3.7.5/src/ksp/ksp/examples/tutorials/ex10.c [44]PETSC ERROR: PETSc Option Table entries: [44]PETSC ERROR: -f0 ./mat_rhs/a_react_in_2.bin [44]PETSC ERROR: -ksp_error_if_not_converged [44]PETSC ERROR: -mat_view ascii::ascii_info [44]PETSC ERROR: -matload_block_size 1 [44]PETSC ERROR: -rhs ./mat_rhs/b_react_in_2.bin [44]PETSC ERROR: -skp_monitor [44]PETSC ERROR: -sub_pc_factor_shift_type nonzero =================================================================================== = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES = EXIT CODE: 75 = CLEANING UP REMAINING PROCESSES = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES =================================================================================== -------------- next part -------------- Mat Object: 8 MPI processes type: mpiaij rows=450000, cols=450000 total: nonzeros=6991400, allocated nonzeros=6991400 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Number of iterations = 5 Residual norm 0.00542965 WARNING! There are options you set that were not used! WARNING! could be spelling mistake, etc! Option left: name:-skp_monitor (no value) From bsmith at mcs.anl.gov Wed May 24 15:18:09 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 24 May 2017 15:18:09 -0500 Subject: [petsc-users] Question on incomplete factorization level and fill In-Reply-To: References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Message-ID: <2596E6EB-1F05-46F1-8B44-A36825EF9AEC@mcs.anl.gov> I don't think this has anything to do with the specific solver but is because you are loading both a vector and matrix from a file and when it uses the default parallel layout for each, because you have -matload_block_size 1 and -vecload_block_size 10 they do not get the same layout. Remove the -matload_block_size 1 and -vecload_block_size 10 they don't mean anything here anyways. Does this resolve the problem? Barry > On May 24, 2017, at 3:06 PM, Danyang Su wrote: > > Dear Hong, > > I just tested with different number of processors for the same matrix. It sometimes got "ERROR: Arguments are incompatible" for different number of processors. It works fine using 4, 8, or 24 processors, but failed with "ERROR: Arguments are incompatible" using 16 or 48 processors. The error information is attached. I tested this on my local computer with 6 cores 12 threads. Any suggestion on this? > > Thanks, > Danyang > > On 17-05-24 12:28 PM, Danyang Su wrote: >> Hi Hong, >> >> Awesome. Thanks for testing the case. I will try your options for the code and get back to you later. 
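As an illustration of Barry's point above about the matrix and the right-hand side picking up different default parallel layouts when they are loaded with mismatched block sizes, here is a minimal sketch of one way to tie the two layouts together when loading from binary files. This is not the ex10 source; it assumes the PETSc 3.7-era C API, and the file names are simply taken from the option table in the log above. The right-hand side is created from the matrix's own row layout before VecLoad() fills it, so the two objects cannot end up with incompatible local sizes regardless of any block-size options.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            b;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  /* Load the matrix; its parallel row distribution is fixed here. */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "./mat_rhs/a_react_in_2.bin",
                               FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = MatLoad(A, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  /* Create b in the row (range) space of A, then load into it, so the
     vector's distribution follows the matrix rather than a file default. */
  ierr = MatCreateVecs(A, NULL, &b);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "./mat_rhs/b_react_in_2.bin",
                               FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = VecLoad(b, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  /* ... create a KSP, KSPSetOperators(ksp, A, A), and solve as usual ... */

  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

With the layouts tied together like this, the -matload_block_size and -vecload_block_size options have nothing left to do, which is consistent with Barry's suggestion to drop them.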
>> >> Regards, >> >> Danyang >> >> On 17-05-24 12:21 PM, Hong wrote: >>> Danyang : >>> I tested your data. >>> Your matrices encountered zero pivots, e.g. >>> petsc/src/ksp/ksp/examples/tutorials (master) >>> $ mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin -ksp_monitor -ksp_error_if_not_converged >>> >>> [15]PETSC ERROR: Zero pivot in LU factorization: http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>> [15]PETSC ERROR: Zero pivot row 1249 value 2.05808e-14 tolerance 2.22045e-14 >>> ... >>> >>> Adding option '-sub_pc_factor_shift_type nonzero', I got >>> mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero -mat_view ascii::ascii_info >>> >>> Mat Object: 24 MPI processes >>> type: mpiaij >>> rows=450000, cols=450000 >>> total: nonzeros=6991400, allocated nonzeros=6991400 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node (on process 0) routines >>> 0 KSP Residual norm 5.849777711755e+01 >>> 1 KSP Residual norm 6.824179430230e-01 >>> 2 KSP Residual norm 3.994483555787e-02 >>> 3 KSP Residual norm 6.085841461433e-03 >>> 4 KSP Residual norm 8.876162583511e-04 >>> 5 KSP Residual norm 9.407780665278e-05 >>> Number of iterations = 5 >>> Residual norm 0.00542891 >>> >>> Hong >>> Hi Matt, >>> >>> Yes. The matrix is 450000x450000 sparse. The hypre takes hundreds of iterates, not for all but in most of the timesteps. The matrix is not well conditioned, with nonzero entries range from 1.0e-29 to 1.0e2. I also made double check if there is anything wrong in the parallel version, however, the matrix is the same with sequential version except some round error which is relatively very small. Usually for those not well conditioned matrix, direct solver should be faster than iterative solver, right? But when I use the sequential iterative solver with ILU prec developed almost 20 years go by others, the solver converge fast with appropriate factorization level. In other words, when I use 24 processor using hypre, the speed is almost the same as as the old sequential iterative solver using 1 processor. >>> >>> I use most of the default configuration for the general case with pretty good speedup. And I am not sure if I miss something for this problem. >>> >>> Thanks, >>> >>> Danyang >>> >>> On 17-05-24 11:12 AM, Matthew Knepley wrote: >>>> On Wed, May 24, 2017 at 12:50 PM, Danyang Su wrote: >>>> Hi Matthew and Barry, >>>> >>>> Thanks for the quick response. >>>> I also tried superlu and mumps, both work but it is about four times slower than ILU(dt) prec through hypre, with 24 processors I have tested. >>>> >>>> You mean the total time is 4x? And you are taking hundreds of iterates? That seems hard to believe, unless you are dropping >>>> a huge number of elements. >>>> When I look into the convergence information, the method using ILU(dt) still takes 200 to 3000 linear iterations for each newton iteration. One reason is this equation is hard to solve. As for the general cases, the same method works awesome and get very good speedup. >>>> >>>> I do not understand what you mean here. >>>> I also doubt if I use hypre correctly for this case. Is there anyway to check this problem, or is it possible to increase the factorization level through hypre? >>>> >>>> I don't know. 
>>>> >>>> Matt >>>> Thanks, >>>> >>>> Danyang >>>> >>>> On 17-05-24 04:59 AM, Matthew Knepley wrote: >>>>> On Wed, May 24, 2017 at 2:21 AM, Danyang Su wrote: >>>>> Dear All, >>>>> >>>>> I use PCFactorSetLevels for ILU and PCFactorSetFill for other preconditioning in my code to help solve the problems that the default option is hard to solve. However, I found the latter one, PCFactorSetFill does not take effect for my problem. The matrices and rhs as well as the solutions are attached from the link below. I obtain the solution using hypre preconditioner and it takes 7 and 38 iterations for matrix 1 and matrix 2. However, if I use other preconditioner, the solver just failed at the first matrix. I have tested this matrix using the native sequential solver (not PETSc) with ILU preconditioning. If I set the incomplete factorization level to 0, this sequential solver will take more than 100 iterations. If I increase the factorization level to 1 or more, it just takes several iterations. This remind me that the PC factor for this matrices should be increased. However, when I tried it in PETSc, it just does not work. >>>>> >>>>> Matrix and rhs can be obtained from the link below. >>>>> >>>>> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R >>>>> >>>>> Would anyone help to check if you can make this work by increasing the PC factor level or fill? >>>>> >>>>> We have ILU(k) supported in serial. However ILU(dt) which takes a tolerance only works through Hypre >>>>> >>>>> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html >>>>> >>>>> I recommend you try SuperLU or MUMPS, which can both be downloaded automatically by configure, and >>>>> do a full sparse LU. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> Thanks and regards, >>>>> >>>>> Danyang >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> http://www.caam.rice.edu/~mk51/ >>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> http://www.caam.rice.edu/~mk51/ >>> >>> >> > > From michal.derezinski at gmail.com Wed May 24 15:20:56 2017 From: michal.derezinski at gmail.com (=?utf-8?Q?Micha=C5=82_Derezi=C5=84ski?=) Date: Wed, 24 May 2017 13:20:56 -0700 Subject: [petsc-users] Accessing submatrices without additional memory usage In-Reply-To: <87mva22gw9.fsf@jedbrown.org> References: <0B69CBD4-9524-429A-8478-0BBF0C236F94@gmail.com> <87shju2jil.fsf@jedbrown.org> <5BF044B5-49B3-49CB-A58D-A06B11DD6000@gmail.com> <87poey2hy4.fsf@jedbrown.org> <0D256C07-0556-4D0E-A9D3-D7D3F5D8B2C6@gmail.com> <87mva22gw9.fsf@jedbrown.org> Message-ID: > Wiadomo?? napisana przez Jed Brown w dniu 24.05.2017, o godz. 12:28: > > Micha? Derezi?ski > writes: > >>> Wiadomo?? napisana przez Jed Brown w dniu 24.05.2017, o godz. 12:06: >>> >>> Okay, do you have more parameters than observations? >> >> No (not necessarily). The biggest matrix is 50M observations and 12M parameters. >> >>> And each segment >>> of the matrix will be fully distributed? >> >> Yes. >> >>> Do you have a parallel file >>> system? >> >> Yes. >> >>> Is your matrix sparse or dense? >> >> Yes. > > By that you mean sparse? > Yes, sorry, that?s what I meant. 
> You'll need some sort of segmented storage (could be separate files or a > file format that allows seeking). (If the matrix is generated by some > other process, you'd benefit from skipping the file system entirely, but > I understand that may not be possible.) > I have the segmented storage in place. > I would use MatNest, creating a new one after each segment is loaded. > There isn't currently a MatLoadBegin/End interface, but that could be > created if it would be useful. Ok, yeah, that was my plan with MatNest. As far as loading in parallel with computation goes, the feedback that I?m hearing so far is: 1. Don?t do it, unless you really have to; 2. If you?re going to do it, instead of spawning a separate thread, use asynchronous read, eg MPI_File_iread. Does this make sense? Thanks, Michal. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed May 24 20:32:40 2017 From: hzhang at mcs.anl.gov (Hong) Date: Wed, 24 May 2017 20:32:40 -0500 Subject: [petsc-users] Question on incomplete factorization level and fill In-Reply-To: References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Message-ID: Remove your option '-vecload_block_size 10'. Hong On Wed, May 24, 2017 at 3:06 PM, Danyang Su wrote: > Dear Hong, > > I just tested with different number of processors for the same matrix. It > sometimes got "ERROR: Arguments are incompatible" for different number of > processors. It works fine using 4, 8, or 24 processors, but failed with > "ERROR: Arguments are incompatible" using 16 or 48 processors. The error > information is attached. I tested this on my local computer with 6 cores 12 > threads. Any suggestion on this? > > Thanks, > > Danyang > > On 17-05-24 12:28 PM, Danyang Su wrote: > > Hi Hong, > > Awesome. Thanks for testing the case. I will try your options for the code > and get back to you later. > > Regards, > > Danyang > > On 17-05-24 12:21 PM, Hong wrote: > > Danyang : > I tested your data. > Your matrices encountered zero pivots, e.g. > petsc/src/ksp/ksp/examples/tutorials (master) > $ mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin > -ksp_monitor -ksp_error_if_not_converged > > [15]PETSC ERROR: Zero pivot in LU factorization: > http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot > [15]PETSC ERROR: Zero pivot row 1249 value 2.05808e-14 tolerance > 2.22045e-14 > ... > > Adding option '-sub_pc_factor_shift_type nonzero', I got > mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin > -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero > -mat_view ascii::ascii_info > > Mat Object: 24 MPI processes > type: mpiaij > rows=450000, cols=450000 > total: nonzeros=6991400, allocated nonzeros=6991400 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > 0 KSP Residual norm 5.849777711755e+01 > 1 KSP Residual norm 6.824179430230e-01 > 2 KSP Residual norm 3.994483555787e-02 > 3 KSP Residual norm 6.085841461433e-03 > 4 KSP Residual norm 8.876162583511e-04 > 5 KSP Residual norm 9.407780665278e-05 > Number of iterations = 5 > Residual norm 0.00542891 > > Hong > >> Hi Matt, >> >> Yes. The matrix is 450000x450000 sparse. The hypre takes hundreds of >> iterates, not for all but in most of the timesteps. The matrix is not well >> conditioned, with nonzero entries range from 1.0e-29 to 1.0e2. 
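Going back to Michał's loading plan above (segments exposed through MatNest, with the next segment read asynchronously rather than on a separate thread), a rough sketch of the two pieces is below. It is only a sketch under stated assumptions: each already-loaded segment is a fully distributed Mat sharing one column layout, the segment file name and read buffer are hypothetical placeholders, and turning the raw bytes into the next segment Mat is not shown.

#include <petscmat.h>
#include <mpi.h>

/* Rebuild an nseg x 1 MatNest over the row segments loaded so far. MatNest
   only references its blocks, so redoing this after each segment is cheap. */
static PetscErrorCode RebuildNest(MPI_Comm comm, Mat segs[], PetscInt nseg, Mat *nest)
{
  PetscErrorCode ierr;
  if (*nest) { ierr = MatDestroy(nest);CHKERRQ(ierr); }
  /* NULL index sets let PETSc derive the row/column layouts of the blocks. */
  ierr = MatCreateNest(comm, nseg, NULL, 1, NULL, segs, nest);CHKERRQ(ierr);
  return 0;
}

/* Start a nonblocking read of the next segment's bytes so the file I/O can
   overlap computation on the current nest; completed later with MPI_Wait(). */
static int StartSegmentRead(MPI_Comm comm, const char *fname, void *buf,
                            int nbytes, MPI_File *fh, MPI_Request *req)
{
  MPI_File_open(comm, fname, MPI_MODE_RDONLY, MPI_INFO_NULL, fh);
  return MPI_File_iread(*fh, buf, nbytes, MPI_BYTE, req);
}

The nonblocking read is finished with MPI_Wait() (and the file closed with MPI_File_close()) before the buffered bytes are assembled into the next segment and RebuildNest() is called again.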
I also made >> double check if there is anything wrong in the parallel version, however, >> the matrix is the same with sequential version except some round error >> which is relatively very small. Usually for those not well conditioned >> matrix, direct solver should be faster than iterative solver, right? But >> when I use the sequential iterative solver with ILU prec developed almost >> 20 years go by others, the solver converge fast with appropriate >> factorization level. In other words, when I use 24 processor using hypre, >> the speed is almost the same as as the old sequential iterative solver >> using 1 processor. >> >> I use most of the default configuration for the general case with pretty >> good speedup. And I am not sure if I miss something for this problem. >> >> Thanks, >> >> Danyang >> >> On 17-05-24 11:12 AM, Matthew Knepley wrote: >> >> On Wed, May 24, 2017 at 12:50 PM, Danyang Su >> wrote: >> >>> Hi Matthew and Barry, >>> >>> Thanks for the quick response. >>> >>> I also tried superlu and mumps, both work but it is about four times >>> slower than ILU(dt) prec through hypre, with 24 processors I have tested. >>> >> You mean the total time is 4x? And you are taking hundreds of iterates? >> That seems hard to believe, unless you are dropping >> a huge number of elements. >> >>> When I look into the convergence information, the method using ILU(dt) >>> still takes 200 to 3000 linear iterations for each newton iteration. One >>> reason is this equation is hard to solve. As for the general cases, the >>> same method works awesome and get very good speedup. >>> >> I do not understand what you mean here. >> >>> I also doubt if I use hypre correctly for this case. Is there anyway to >>> check this problem, or is it possible to increase the factorization level >>> through hypre? >>> >> I don't know. >> >> Matt >> >>> Thanks, >>> >>> Danyang >>> >>> On 17-05-24 04:59 AM, Matthew Knepley wrote: >>> >>> On Wed, May 24, 2017 at 2:21 AM, Danyang Su >>> wrote: >>> >>>> Dear All, >>>> >>>> I use PCFactorSetLevels for ILU and PCFactorSetFill for other >>>> preconditioning in my code to help solve the problems that the default >>>> option is hard to solve. However, I found the latter one, PCFactorSetFill >>>> does not take effect for my problem. The matrices and rhs as well as the >>>> solutions are attached from the link below. I obtain the solution using >>>> hypre preconditioner and it takes 7 and 38 iterations for matrix 1 and >>>> matrix 2. However, if I use other preconditioner, the solver just failed at >>>> the first matrix. I have tested this matrix using the native sequential >>>> solver (not PETSc) with ILU preconditioning. If I set the incomplete >>>> factorization level to 0, this sequential solver will take more than 100 >>>> iterations. If I increase the factorization level to 1 or more, it just >>>> takes several iterations. This remind me that the PC factor for this >>>> matrices should be increased. However, when I tried it in PETSc, it just >>>> does not work. >>>> >>>> Matrix and rhs can be obtained from the link below. >>>> >>>> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R >>>> >>>> Would anyone help to check if you can make this work by increasing the >>>> PC factor level or fill? >>>> >>> >>> We have ILU(k) supported in serial. 
However ILU(dt) which takes a >>> tolerance only works through Hypre >>> >>> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html >>> >>> I recommend you try SuperLU or MUMPS, which can both be downloaded >>> automatically by configure, and >>> do a full sparse LU. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks and regards, >>>> >>>> Danyang >>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ >> >> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Wed May 24 22:05:27 2017 From: danyang.su at gmail.com (Danyang Su) Date: Wed, 24 May 2017 20:05:27 -0700 Subject: [petsc-users] Question on incomplete factorization level and fill In-Reply-To: References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Message-ID: <04d53b3d-a949-c389-2501-69845c5416d3@gmail.com> Hi All, I just delete the .info file and it works without problem now. Thanks, Danyang On 17-05-24 06:32 PM, Hong wrote: > Remove your option '-vecload_block_size 10'. > Hong > > On Wed, May 24, 2017 at 3:06 PM, Danyang Su > wrote: > > Dear Hong, > > I just tested with different number of processors for the same > matrix. It sometimes got "ERROR: Arguments are incompatible" for > different number of processors. It works fine using 4, 8, or 24 > processors, but failed with "ERROR: Arguments are incompatible" > using 16 or 48 processors. The error information is attached. I > tested this on my local computer with 6 cores 12 threads. Any > suggestion on this? > > Thanks, > > Danyang > > > On 17-05-24 12:28 PM, Danyang Su wrote: >> >> Hi Hong, >> >> Awesome. Thanks for testing the case. I will try your options for >> the code and get back to you later. >> >> Regards, >> >> Danyang >> >> >> On 17-05-24 12:21 PM, Hong wrote: >>> Danyang : >>> I tested your data. >>> Your matrices encountered zero pivots, e.g. >>> petsc/src/ksp/ksp/examples/tutorials (master) >>> $ mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs >>> b_react_in_2.bin -ksp_monitor -ksp_error_if_not_converged >>> >>> [15]PETSC ERROR: Zero pivot in LU factorization: >>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>> >>> [15]PETSC ERROR: Zero pivot row 1249 value 2.05808e-14 tolerance >>> 2.22045e-14 >>> ... 
>>> >>> Adding option '-sub_pc_factor_shift_type nonzero', I got >>> mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin >>> -ksp_monitor -ksp_error_if_not_converged >>> -sub_pc_factor_shift_type nonzero -mat_view ascii::ascii_info >>> >>> Mat Object: 24 MPI processes >>> type: mpiaij >>> rows=450000, cols=450000 >>> total: nonzeros=6991400, allocated nonzeros=6991400 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node (on process 0) routines >>> 0 KSP Residual norm 5.849777711755e+01 >>> 1 KSP Residual norm 6.824179430230e-01 >>> 2 KSP Residual norm 3.994483555787e-02 >>> 3 KSP Residual norm 6.085841461433e-03 >>> 4 KSP Residual norm 8.876162583511e-04 >>> 5 KSP Residual norm 9.407780665278e-05 >>> Number of iterations = 5 >>> Residual norm 0.00542891 >>> >>> Hong >>> >>> Hi Matt, >>> >>> Yes. The matrix is 450000x450000 sparse. The hypre takes >>> hundreds of iterates, not for all but in most of the >>> timesteps. The matrix is not well conditioned, with nonzero >>> entries range from 1.0e-29 to 1.0e2. I also made double >>> check if there is anything wrong in the parallel version, >>> however, the matrix is the same with sequential version >>> except some round error which is relatively very small. >>> Usually for those not well conditioned matrix, direct solver >>> should be faster than iterative solver, right? But when I >>> use the sequential iterative solver with ILU prec developed >>> almost 20 years go by others, the solver converge fast with >>> appropriate factorization level. In other words, when I use >>> 24 processor using hypre, the speed is almost the same as as >>> the old sequential iterative solver using 1 processor. >>> >>> I use most of the default configuration for the general case >>> with pretty good speedup. And I am not sure if I miss >>> something for this problem. >>> >>> Thanks, >>> >>> Danyang >>> >>> >>> On 17-05-24 11:12 AM, Matthew Knepley wrote: >>>> On Wed, May 24, 2017 at 12:50 PM, Danyang Su >>>> > wrote: >>>> >>>> Hi Matthew and Barry, >>>> >>>> Thanks for the quick response. >>>> >>>> I also tried superlu and mumps, both work but it is >>>> about four times slower than ILU(dt) prec through >>>> hypre, with 24 processors I have tested. >>>> >>>> You mean the total time is 4x? And you are taking hundreds >>>> of iterates? That seems hard to believe, unless you are >>>> dropping >>>> a huge number of elements. >>>> >>>> When I look into the convergence information, the >>>> method using ILU(dt) still takes 200 to 3000 linear >>>> iterations for each newton iteration. One reason is >>>> this equation is hard to solve. As for the general >>>> cases, the same method works awesome and get very good >>>> speedup. >>>> >>>> I do not understand what you mean here. >>>> >>>> I also doubt if I use hypre correctly for this case. Is >>>> there anyway to check this problem, or is it possible >>>> to increase the factorization level through hypre? >>>> >>>> I don't know. >>>> >>>> Matt >>>> >>>> Thanks, >>>> >>>> Danyang >>>> >>>> >>>> On 17-05-24 04:59 AM, Matthew Knepley wrote: >>>>> On Wed, May 24, 2017 at 2:21 AM, Danyang Su >>>>> > >>>>> wrote: >>>>> >>>>> Dear All, >>>>> >>>>> I use PCFactorSetLevels for ILU and >>>>> PCFactorSetFill for other preconditioning in my >>>>> code to help solve the problems that the default >>>>> option is hard to solve. However, I found the >>>>> latter one, PCFactorSetFill does not take effect >>>>> for my problem. 
The matrices and rhs as well as >>>>> the solutions are attached from the link below. I >>>>> obtain the solution using hypre preconditioner and >>>>> it takes 7 and 38 iterations for matrix 1 and >>>>> matrix 2. However, if I use other preconditioner, >>>>> the solver just failed at the first matrix. I have >>>>> tested this matrix using the native sequential >>>>> solver (not PETSc) with ILU preconditioning. If I >>>>> set the incomplete factorization level to 0, this >>>>> sequential solver will take more than 100 >>>>> iterations. If I increase the factorization level >>>>> to 1 or more, it just takes several iterations. >>>>> This remind me that the PC factor for this >>>>> matrices should be increased. However, when I >>>>> tried it in PETSc, it just does not work. >>>>> >>>>> Matrix and rhs can be obtained from the link below. >>>>> >>>>> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R >>>>> >>>>> >>>>> Would anyone help to check if you can make this >>>>> work by increasing the PC factor level or fill? >>>>> >>>>> >>>>> We have ILU(k) supported in serial. However ILU(dt) >>>>> which takes a tolerance only works through Hypre >>>>> >>>>> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html >>>>> >>>>> >>>>> I recommend you try SuperLU or MUMPS, which can both >>>>> be downloaded automatically by configure, and >>>>> do a full sparse LU. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> Thanks and regards, >>>>> >>>>> Danyang >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they >>>>> begin their experiments is infinitely more interesting >>>>> than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> http://www.caam.rice.edu/~mk51/ >>>>> >>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin >>>> their experiments is infinitely more interesting than any >>>> results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> http://www.caam.rice.edu/~mk51/ >>>> >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Thu May 25 02:26:16 2017 From: danyang.su at gmail.com (Danyang Su) Date: Thu, 25 May 2017 00:26:16 -0700 Subject: [petsc-users] PCFactorSetShiftType does not work in code but -pc_factor_set_shift_type works In-Reply-To: References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> Message-ID: <8634589f-d1a5-bf4f-b158-3ddb5a18026b@gmail.com> Dear Hong and Barry, I have implemented this option in the code, as we also need to use configuration from file for convenience. When I run the code using options, it works fine, however, when I run the code using configuration file, it does not work. The code has two set of equations, flow and reactive, with prefix been set to "flow_" and "react_". When I run the code using mpiexec -n 4 ../executable -flow_sub_pc_factor_shift_type nonzero -react_sub_pc_factor_shift_type nonzero it works. However, if I run using mpiexec -n 4 ../executable and let the executable file read the options from file, it just does not work at "call PCFactorSetShiftType(pc_flow,MAT_SHIFT_NONZERO, ierr) or none, positive_definite ...". Do I miss something here? Below is the pseudo code I have used for flow equations, similar for reactive equations. 
call MatCreateAIJ(Petsc_Comm_World,nndof,nndof,nngbldof,        &
                  nngbldof,d_nz,PETSC_NULL_INTEGER,o_nz,        &
                  PETSC_NULL_INTEGER,a_flow,ierr)
CHKERRQ(ierr)

call MatSetFromOptions(a_flow,ierr)
CHKERRQ(ierr)

call KSPCreate(Petsc_Comm_World, ksp_flow, ierr)
CHKERRQ(ierr)

call KSPAppendOptionsPrefix(ksp_flow,"flow_",ierr)
CHKERRQ(ierr)

call KSPSetInitialGuessNonzero(ksp_flow,                        &
                               b_initial_guess_nonzero_flow, ierr)
CHKERRQ(ierr)

call KSPSetDM(ksp_flow,dmda_flow%da,ierr)
CHKERRQ(ierr)

call KSPSetDMActive(ksp_flow,PETSC_FALSE,ierr)
CHKERRQ(ierr)

!!!!*********CHECK IF READ OPTION FROM FILE*********!!!!
if (read_option_from_file) then

  call KSPSetType(ksp_flow, KSPGMRES, ierr)      ! or KSPBCGS or others...
  CHKERRQ(ierr)

  call KSPGetPC(ksp_flow, pc_flow, ierr)
  CHKERRQ(ierr)

  call PCSetType(pc_flow, PCBJACOBI, ierr)       ! or PCILU or PCJACOBI or PCHYPRE ...
  CHKERRQ(ierr)

  call PCFactorSetShiftType(pc_flow, MAT_SHIFT_NONZERO, ierr)   ! or MAT_SHIFT_NONE, MAT_SHIFT_POSITIVE_DEFINITE ...
  CHKERRQ(ierr)

end if

call PCFactorGetMatSolverPackage(pc_flow,solver_pkg_flow,ierr)
CHKERRQ(ierr)

call compute_jacobian(rank,dmda_flow%da,                        &
                      a_flow,a_in,ia_in,ja_in,nngl_in,          &
                      row_idx_l2pg,col_idx_l2pg,                &
                      b_non_interlaced)

call KSPSetFromOptions(ksp_flow,ierr)
CHKERRQ(ierr)

call KSPSetUp(ksp_flow,ierr)
CHKERRQ(ierr)

call KSPSetUpOnBlocks(ksp_flow,ierr)
CHKERRQ(ierr)

call KSPSolve(ksp_flow,b_flow,x_flow,ierr)
CHKERRQ(ierr)

Thanks and Regards,

Danyang

On 17-05-24 06:32 PM, Hong wrote: > Remove your option '-vecload_block_size 10'. > Hong > > On Wed, May 24, 2017 at 3:06 PM, Danyang Su > wrote: > > Dear Hong, > > I just tested with different number of processors for the same > matrix. It sometimes got "ERROR: Arguments are incompatible" for > different number of processors. It works fine using 4, 8, or 24 > processors, but failed with "ERROR: Arguments are incompatible" > using 16 or 48 processors. The error information is attached. I > tested this on my local computer with 6 cores 12 threads. Any > suggestion on this? > > Thanks, > > Danyang > > > On 17-05-24 12:28 PM, Danyang Su wrote: >> >> Hi Hong, >> >> Awesome. Thanks for testing the case. I will try your options for >> the code and get back to you later. >> >> Regards, >> >> Danyang >> >> >> On 17-05-24 12:21 PM, Hong wrote: >>> Danyang : >>> I tested your data. >>> Your matrices encountered zero pivots, e.g. >>> petsc/src/ksp/ksp/examples/tutorials (master) >>> $ mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs >>> b_react_in_2.bin -ksp_monitor -ksp_error_if_not_converged >>> >>> [15]PETSC ERROR: Zero pivot in LU factorization: >>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>> >>> [15]PETSC ERROR: Zero pivot row 1249 value 2.05808e-14 tolerance >>> 2.22045e-14 >>> ...
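One detail about the pseudocode above: PCFactorSetShiftType() configures factorization preconditioners (ILU, ICC, LU, Cholesky); on other PC types, such as the PCBJACOBI set just before it, the call is silently ignored, whereas the command-line form -flow_sub_pc_factor_shift_type nonzero reaches the factorizations inside the blocks through the "sub_" prefix. Below is a hedged sketch of applying the shift to the block sub-PCs directly, shown in C; ksp_flow and pc_flow are assumed to be the objects from the snippet above, and the corresponding Fortran calls mirror these. The sub-KSPs exist only after KSPSetUp(), so this belongs after KSPSetUp() and before KSPSetUpOnBlocks()/KSPSolve().

  KSP            *subksp;
  PC             subpc;
  PetscInt       i, nlocal;
  PetscErrorCode ierr;

  /* The block sub-solvers are created during KSPSetUp(). */
  ierr = KSPSetUp(ksp_flow);CHKERRQ(ierr);
  ierr = PCBJacobiGetSubKSP(pc_flow, &nlocal, NULL, &subksp);CHKERRQ(ierr);
  for (i = 0; i < nlocal; i++) {
    ierr = KSPGetPC(subksp[i], &subpc);CHKERRQ(ierr);
    /* Same effect as -flow_sub_pc_factor_shift_type nonzero */
    ierr = PCFactorSetShiftType(subpc, MAT_SHIFT_NONZERO);CHKERRQ(ierr);
  }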
>>> >>> Adding option '-sub_pc_factor_shift_type nonzero', I got >>> mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin >>> -ksp_monitor -ksp_error_if_not_converged >>> -sub_pc_factor_shift_type nonzero -mat_view ascii::ascii_info >>> >>> Mat Object: 24 MPI processes >>> type: mpiaij >>> rows=450000, cols=450000 >>> total: nonzeros=6991400, allocated nonzeros=6991400 >>> total number of mallocs used during MatSetValues calls =0 >>> not using I-node (on process 0) routines >>> 0 KSP Residual norm 5.849777711755e+01 >>> 1 KSP Residual norm 6.824179430230e-01 >>> 2 KSP Residual norm 3.994483555787e-02 >>> 3 KSP Residual norm 6.085841461433e-03 >>> 4 KSP Residual norm 8.876162583511e-04 >>> 5 KSP Residual norm 9.407780665278e-05 >>> Number of iterations = 5 >>> Residual norm 0.00542891 >>> >>> Hong >>> >>> Hi Matt, >>> >>> Yes. The matrix is 450000x450000 sparse. The hypre takes >>> hundreds of iterates, not for all but in most of the >>> timesteps. The matrix is not well conditioned, with nonzero >>> entries range from 1.0e-29 to 1.0e2. I also made double >>> check if there is anything wrong in the parallel version, >>> however, the matrix is the same with sequential version >>> except some round error which is relatively very small. >>> Usually for those not well conditioned matrix, direct solver >>> should be faster than iterative solver, right? But when I >>> use the sequential iterative solver with ILU prec developed >>> almost 20 years go by others, the solver converge fast with >>> appropriate factorization level. In other words, when I use >>> 24 processor using hypre, the speed is almost the same as as >>> the old sequential iterative solver using 1 processor. >>> >>> I use most of the default configuration for the general case >>> with pretty good speedup. And I am not sure if I miss >>> something for this problem. >>> >>> Thanks, >>> >>> Danyang >>> >>> >>> On 17-05-24 11:12 AM, Matthew Knepley wrote: >>>> On Wed, May 24, 2017 at 12:50 PM, Danyang Su >>>> > wrote: >>>> >>>> Hi Matthew and Barry, >>>> >>>> Thanks for the quick response. >>>> >>>> I also tried superlu and mumps, both work but it is >>>> about four times slower than ILU(dt) prec through >>>> hypre, with 24 processors I have tested. >>>> >>>> You mean the total time is 4x? And you are taking hundreds >>>> of iterates? That seems hard to believe, unless you are >>>> dropping >>>> a huge number of elements. >>>> >>>> When I look into the convergence information, the >>>> method using ILU(dt) still takes 200 to 3000 linear >>>> iterations for each newton iteration. One reason is >>>> this equation is hard to solve. As for the general >>>> cases, the same method works awesome and get very good >>>> speedup. >>>> >>>> I do not understand what you mean here. >>>> >>>> I also doubt if I use hypre correctly for this case. Is >>>> there anyway to check this problem, or is it possible >>>> to increase the factorization level through hypre? >>>> >>>> I don't know. >>>> >>>> Matt >>>> >>>> Thanks, >>>> >>>> Danyang >>>> >>>> >>>> On 17-05-24 04:59 AM, Matthew Knepley wrote: >>>>> On Wed, May 24, 2017 at 2:21 AM, Danyang Su >>>>> > >>>>> wrote: >>>>> >>>>> Dear All, >>>>> >>>>> I use PCFactorSetLevels for ILU and >>>>> PCFactorSetFill for other preconditioning in my >>>>> code to help solve the problems that the default >>>>> option is hard to solve. However, I found the >>>>> latter one, PCFactorSetFill does not take effect >>>>> for my problem. 
The matrices and rhs as well as >>>>> the solutions are attached from the link below. I >>>>> obtain the solution using hypre preconditioner and >>>>> it takes 7 and 38 iterations for matrix 1 and >>>>> matrix 2. However, if I use other preconditioner, >>>>> the solver just failed at the first matrix. I have >>>>> tested this matrix using the native sequential >>>>> solver (not PETSc) with ILU preconditioning. If I >>>>> set the incomplete factorization level to 0, this >>>>> sequential solver will take more than 100 >>>>> iterations. If I increase the factorization level >>>>> to 1 or more, it just takes several iterations. >>>>> This remind me that the PC factor for this >>>>> matrices should be increased. However, when I >>>>> tried it in PETSc, it just does not work. >>>>> >>>>> Matrix and rhs can be obtained from the link below. >>>>> >>>>> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R >>>>> >>>>> >>>>> Would anyone help to check if you can make this >>>>> work by increasing the PC factor level or fill? >>>>> >>>>> >>>>> We have ILU(k) supported in serial. However ILU(dt) >>>>> which takes a tolerance only works through Hypre >>>>> >>>>> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html >>>>> >>>>> >>>>> I recommend you try SuperLU or MUMPS, which can both >>>>> be downloaded automatically by configure, and >>>>> do a full sparse LU. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> Thanks and regards, >>>>> >>>>> Danyang >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they >>>>> begin their experiments is infinitely more interesting >>>>> than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> http://www.caam.rice.edu/~mk51/ >>>>> >>>> >>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin >>>> their experiments is infinitely more interesting than any >>>> results to which their experiments lead. >>>> -- Norbert Wiener >>>> >>>> http://www.caam.rice.edu/~mk51/ >>>> >>> >>> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.afanasiev at erdw.ethz.ch Thu May 25 03:43:27 2017 From: michael.afanasiev at erdw.ethz.ch (Michael Afanasiev) Date: Thu, 25 May 2017 10:43:27 +0200 Subject: [petsc-users] Postdoc Position in the Computational Seismology Group at ETH Zurich. Message-ID: <8EADEF87-4EEA-4174-892C-1AD17FA556EE@erdw.ethz.ch> Hi everyone, The computational seismology group at ETH Zurich is looking for a postdoc to work with us on Salvus (www.salvus.io ) - a spectral-element software package for full-waveform modelling and inversion. The exact focus of the job is tied to the applicant's strengths and interests, and ranges from HPC engineering to tackling large-scale frequency domain (Helmholtz) applications. The code is currently integrated with PETSc, and utilizes DMPLEX for unstructured mesh management. Please find more details below. Cheers, Mike. _____ Postdoctoral research position: Full-waveform modeling and inversion across the scales The Computational Seismology Group at ETH Z?rich is seeking to appoint a postdoctoral researcher to work on Salvus, an open-source framework for full-waveform modeling and inversion (http://salvus.io ). The position is full-time (100%) for a duration of 24 months, with possibility for extension. Earliest starting date is 1 June 2017. 
Background: Salvus is a modular open-source code package for large-scale waveform modelling and inversion built on the basis of modern programming principles. This project will enable Salvus to (1) harness the large homogeneous and various heterogeneous HPC architectures that are available today, and (2) adapt easily to future architectures, requiring minimal code modifications. The project is intended to position Salvus as a top wavefield modelling and inversion package in the exascale era.

To ensure performance of Salvus on today's and tomorrow's supercomputing platforms, work will focus on cross-architecture developments, code and I/O optimisation, and systematic testing and validation. This will be complemented by actions to increase and broaden the usability and impact of Salvus, including workflow developments, the implementation of frequency-domain solvers, and extensions of the physics that can be modelled.

The successful candidate will be embedded into the team of Salvus developers and users, covering a wide range of fields including Computational Science, Applied Mathematics, Seismology, Exploration and Environmental Geophysics, Geothermal Energy, and Geofluids. She or he will have access to Piz Daint, currently Europe's fastest supercomputer, located at the Swiss National Supercomputing Center (CSCS, www.cscs.ch). Apart from the core responsibilities listed below, the successful candidate will have considerable freedom of research in order to develop an independent scientific career. Topics of interest to the group include, but are not limited to, real-world waveform modelling and inversion applications, the development of methods for uncertainty analysis, and the transfer of Salvus to new domains outside traditional seismology.

Core responsibilities:

- Cross-architecture developments, leveraging Salvus' mixin-based design to implement hardware-specific versions of compute-intensive code segments while leaving most of the code unchanged.
- General code optimisations to achieve maximal performance from single nodes to full-machine runs.
- I/O optimisation to handle the enormous data volumes needed in adjoint simulations. Sub-tasks include the incorporation and extension of a previously developed wavefield compression library, and interfacing to modern parallel seismic data formats.
- Workflow developments to facilitate the solution of large-scale inverse problems, including the automatic orchestration of a large number of HPC jobs.

Expected qualifications: The ideal candidate should have the following attributes:

- PhD degree in geophysics, computer science, physics, applied mathematics or a related field,
- strong programming skills in C or C++,
- experience developing software that exploits large-scale HPC platforms, with a strong knowledge of MPI and experience with at least one other parallel paradigm (OpenMP, CUDA, OpenCL),
- experience with collaborative software development (e.g. continuous integration services),
- experience with finite element methods, numerical wave propagation, and/or inverse problems,
- experience with Krylov methods and preconditioners, specifically domain-decomposition and/or multigrid methods (geometric, algebraic).

Furthermore, the successful candidate is expected to have excellent organizational, communication and interpersonal skills that allow her or him to work in a highly collaborative and interdisciplinary environment.
Application: To apply for this position, please send your full resume, cover letter and the names of three references to Prof. Andreas Fichtner (andreas.fichtner at erdw.ethz.ch ). If possible, please also attach a link to one or more software packages you have been involved with (GitHub, GitLab, Bitbucket, ?). The position will remain open until filled. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jchludzinski at gmail.com Thu May 25 07:54:29 2017 From: jchludzinski at gmail.com (John Chludzinski) Date: Thu, 25 May 2017 08:54:29 -0400 Subject: [petsc-users] PETSC OO C guide/standard? In-Reply-To: References: Message-ID: Thanks. C++ has now become the apotheosis of "no value-added complexity". Even Bjarne Stroustrup admits to understanding only a small fraction of the whole. On Wed, May 24, 2017 at 9:53 AM, Matthew Knepley wrote: > On Wed, May 24, 2017 at 8:50 AM, John Chludzinski > wrote: > >> Considering that the current C++ standard is >1600 pages and counting >> (still glomming on new "features"), I'm planning to try an OO style of C >> coding style. >> >> The standard's size (number of pages) being the best (and only >> *practical*) means to measure language complexity. >> > > Here is another thing I wrote talking about OO in PETSc: > > https://arxiv.org/abs/1209.1711 > > Matt > > >> On Wed, May 24, 2017 at 9:11 AM, Matthew Knepley >> wrote: >> >>> On Wed, May 24, 2017 at 8:03 AM, John Chludzinski < >>> jchludzinski at gmail.com> wrote: >>> >>>> Is there a guide for how to write/develop PETSC OO C code? How a >>>> "class" is defined/implemented? How you implement inheritance? Memory >>>> management? Etc? >>>> >>> >>> We have a guide: http://www.mcs.anl.gov/petsc/developers/developers.pdf >>> >>> If its not in there, you can mail the list. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> ---John >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lawrence.mitchell at imperial.ac.uk Thu May 25 09:27:39 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Thu, 25 May 2017 15:27:39 +0100 Subject: [petsc-users] DMPlex distribution with FVM adjacency Message-ID: Dear petsc-users, I am trying to distribute a triangle mesh with a cell halo defined by FVM adjacency (i.e. if I have a facet in the initial (0-overlap) distribution, I want the cell on the other side of it). Reading the documentation, I think I do: DMPlexSetAdjacencyUseCone(PETSC_TRUE) DMPlexSetAdjacencyUseClosure(PETSC_FALSE) and then DMPlexDistribute(..., ovelap=1) If I do this for a simple mesh and then try and do anything on it, I run into all sorts of problems because I have a plex where I have some facets, but not even one cell in the support of the facet. Is this to be expected? 
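Spelled out in C, that setup is roughly the following. This is a minimal sketch only: it assumes dm is an already-created, interpolated DMPlex, and dmDist/sf are names introduced here just for illustration.

  DM      dmDist = NULL;
  PetscSF sf     = NULL;

  /* FVM-style adjacency: go through the cone of a point, but not its full closure */
  ierr = DMPlexSetAdjacencyUseCone(dm, PETSC_TRUE);CHKERRQ(ierr);
  ierr = DMPlexSetAdjacencyUseClosure(dm, PETSC_FALSE);CHKERRQ(ierr);

  /* redistribute with one layer of overlap cells */
  ierr = DMPlexDistribute(dm, 1, &sf, &dmDist);CHKERRQ(ierr);
  if (dmDist) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist;}

DMPlexDistribute returns a NULL parallel DM when nothing was actually moved, hence the guard before swapping in dmDist.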
For example the following petsc4py code breaks when run on 3 processes: $ mpiexec -n 3 python bork.py [1] DMPlexGetOrdering() line 133 in /data/lmitche1/src/deps/petsc/src/dm/impls/plex/plexreorder.c [1] DMPlexCreateOrderingClosure_Static() line 41 in /data/lmitche1/src/deps/petsc/src/dm/impls/plex/plexreorder.c [1] Petsc has generated inconsistent data [1] Number of depth 2 faces 34 does not match permuted nubmer 29 : error code 77 [2] DMPlexGetOrdering() line 133 in /data/lmitche1/src/deps/petsc/src/dm/impls/plex/plexreorder.c [2] DMPlexCreateOrderingClosure_Static() line 41 in /data/lmitche1/src/deps/petsc/src/dm/impls/plex/plexreorder.c [2] Petsc has generated inconsistent data [2] Number of depth 2 faces 33 does not match permuted nubmer 28 : error code 77 [0] DMPlexGetOrdering() line 133 in /data/lmitche1/src/deps/petsc/src/dm/impls/plex/plexreorder.c [0] DMPlexCreateOrderingClosure_Static() line 41 in /data/lmitche1/src/deps/petsc/src/dm/impls/plex/plexreorder.c [0] Petsc has generated inconsistent data [0] Number of depth 2 faces 33 does not match permuted nubmer 31 $ cat > bork.py<<\EOF from petsc4py import PETSc import numpy as np Lx = Ly = 1 nx = ny = 4 xcoords = np.linspace(0.0, Lx, nx + 1, dtype=PETSc.RealType) ycoords = np.linspace(0.0, Ly, ny + 1, dtype=PETSc.RealType) coords = np.asarray(np.meshgrid(xcoords, ycoords)).swapaxes(0, 2).reshape(-1, 2) # cell vertices i, j = np.meshgrid(np.arange(nx, dtype=PETSc.IntType), np.arange(ny, dtype=PETSc.IntType)) cells = [i*(ny+1) + j, i*(ny+1) + j+1, (i+1)*(ny+1) + j+1, (i+1)*(ny+1) + j] cells = np.asarray(cells, dtype=PETSc.IntType).swapaxes(0, 2).reshape(-1, 4) idx = [0, 1, 3, 1, 2, 3] cells = cells[:, idx].reshape(-1, 3) comm = PETSc.COMM_WORLD if comm.rank == 0: dm = PETSc.DMPlex().createFromCellList(2, cells, coords, comm=comm) else: dm = PETSc.DMPlex().createFromCellList(2, np.zeros((0, 4), dtype=PETSc.IntType), np.zeros((0, 2), dtype=PETSc.RealType), comm=comm) dm.setAdjacencyUseClosure(False) dm.setAdjacencyUseCone(True) dm.distribute(overlap=1) dm.getOrdering(PETSc.Mat.OrderingType.RCM) dm.view() EOF Am I doing something wrong? Is this not expected to work? Cheers, Lawrence -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 473 bytes Desc: OpenPGP digital signature URL: From hzhang at mcs.anl.gov Thu May 25 09:49:59 2017 From: hzhang at mcs.anl.gov (Hong) Date: Thu, 25 May 2017 09:49:59 -0500 Subject: [petsc-users] PCFactorSetShiftType does not work in code but -pc_factor_set_shift_type works In-Reply-To: <8634589f-d1a5-bf4f-b158-3ddb5a18026b@gmail.com> References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> <8634589f-d1a5-bf4f-b158-3ddb5a18026b@gmail.com> Message-ID: Danyang: You must access inner pc, then set shift. See petsc/src/ksp/ksp/examples/tutorials/ex7.c For example, I add following to petsc/src/ksp/ksp/examples/tutorials/ex2.c, line 191: PetscBool isbjacobi; PC pc; ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); ierr = PetscObjectTypeCompare((PetscObject)pc,PCBJACOBI,&isbjacobi);CHKERRQ(ierr); if (isbjacobi) { PetscInt nlocal; KSP *subksp; PC subpc; ierr = KSPSetUp(ksp);CHKERRQ(ierr); ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); /* Extract the array of KSP contexts for the local blocks */ ierr = PCBJacobiGetSubKSP(pc,&nlocal,NULL,&subksp);CHKERRQ(ierr); printf("isbjacobi, nlocal %D, set option to subpc...\n",nlocal); for (i=0; i > I have implemented this option in the code, as we also need to use > configuration from file for convenience. 
When I run the code using options, > it works fine, however, when I run the code using configuration file, it > does not work. The code has two set of equations, flow and reactive, with > prefix been set to "flow_" and "react_". When I run the code using > > mpiexec -n 4 ../executable -flow_sub_pc_factor_shift_type nonzero > -react_sub_pc_factor_shift_type nonzero > > it works. However, if I run using > > mpiexec -n 4 ../executable > > and let the executable file read the options from file, it just does not > work at "call PCFactorSetShiftType(pc_flow,MAT_SHIFT_NONZERO, ierr) or > none, positive_definite ...". Do I miss something here? > > Below is the pseudo code I have used for flow equations, similar for > reactive equations. > > call MatCreateAIJ(Petsc_Comm_World,nndof,nndof,nngbldof, & > nngbldof,d_nz,PETSC_NULL_INTEGER,o_nz, & > PETSC_NULL_INTEGER,a_flow,ierr) > CHKERRQ(ierr) > > call MatSetFromOptions(a_flow,ierr) > CHKERRQ(ierr) > > call KSPCreate(Petsc_Comm_World, ksp_flow, ierr) > CHKERRQ(ierr) > > call KSPAppendOptionsPrefix(ksp_flow,"flow_",ierr) > CHKERRQ(ierr) > > call KSPSetInitialGuessNonzero(ksp_flow, & > b_initial_guess_nonzero_flow, ierr) > CHKERRQ(ierr) > > call KSPSetInitialGuessNonzero(ksp_flow, & > b_initial_guess_nonzero_flow, ierr) > CHKERRQ(ierr) > > call KSPSetDM(ksp_flow,dmda_flow%da,ierr) > CHKERRQ(ierr) > call KSPSetDMActive(ksp_flow,PETSC_FALSE,ierr) > CHKERRQ(ierr) > > !!!!*********CHECK IF READ OPTION FROM FILE*********!!!! > if (read_option_from_file) then > > call KSPSetType(ksp_flow, KSPGMRES, ierr) !or KSPBCGS or > others... > CHKERRQ(ierr) > > call KSPGetPC(ksp_flow, pc_flow, ierr) > CHKERRQ(ierr) > > call PCSetType(pc_flow,PCBJACOBI, ierr) !or PCILU or > PCJACOBI or PCHYPRE ... > CHKERRQ(ierr) > > call PCFactorSetShiftType(pc_flow,MAT_SHIFT_NONZERO, ierr) or > none, positive_definite ... > CHKERRQ(ierr) > > end if > > call PCFactorGetMatSolverPackage(pc_flow,solver_pkg_flow,ierr) > CHKERRQ(ierr) > > call compute_jacobian(rank,dmda_flow%da, & > a_flow,a_in,ia_in,ja_in,nngl_in, & > row_idx_l2pg,col_idx_l2pg, & > b_non_interlaced) > call KSPSetFromOptions(ksp_flow,ierr) > CHKERRQ(ierr) > > call KSPSetUp(ksp_flow,ierr) > CHKERRQ(ierr) > > call KSPSetUpOnBlocks(ksp_flow,ierr) > CHKERRQ(ierr) > > call KSPSolve(ksp_flow,b_flow,x_flow,ierr) > CHKERRQ(ierr) > > > Thanks and Regards, > > Danyang > On 17-05-24 06:32 PM, Hong wrote: > > Remove your option '-vecload_block_size 10'. > Hong > > On Wed, May 24, 2017 at 3:06 PM, Danyang Su wrote: > >> Dear Hong, >> >> I just tested with different number of processors for the same matrix. It >> sometimes got "ERROR: Arguments are incompatible" for different number of >> processors. It works fine using 4, 8, or 24 processors, but failed with >> "ERROR: Arguments are incompatible" using 16 or 48 processors. The error >> information is attached. I tested this on my local computer with 6 cores 12 >> threads. Any suggestion on this? >> >> Thanks, >> >> Danyang >> >> On 17-05-24 12:28 PM, Danyang Su wrote: >> >> Hi Hong, >> >> Awesome. Thanks for testing the case. I will try your options for the >> code and get back to you later. >> >> Regards, >> >> Danyang >> >> On 17-05-24 12:21 PM, Hong wrote: >> >> Danyang : >> I tested your data. >> Your matrices encountered zero pivots, e.g. 
>> petsc/src/ksp/ksp/examples/tutorials (master) >> $ mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin >> -ksp_monitor -ksp_error_if_not_converged >> >> [15]PETSC ERROR: Zero pivot in LU factorization: >> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >> [15]PETSC ERROR: Zero pivot row 1249 value 2.05808e-14 tolerance >> 2.22045e-14 >> ... >> >> Adding option '-sub_pc_factor_shift_type nonzero', I got >> mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs b_react_in_2.bin >> -ksp_monitor -ksp_error_if_not_converged -sub_pc_factor_shift_type nonzero >> -mat_view ascii::ascii_info >> >> Mat Object: 24 MPI processes >> type: mpiaij >> rows=450000, cols=450000 >> total: nonzeros=6991400, allocated nonzeros=6991400 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node (on process 0) routines >> 0 KSP Residual norm 5.849777711755e+01 >> 1 KSP Residual norm 6.824179430230e-01 >> 2 KSP Residual norm 3.994483555787e-02 >> 3 KSP Residual norm 6.085841461433e-03 >> 4 KSP Residual norm 8.876162583511e-04 >> 5 KSP Residual norm 9.407780665278e-05 >> Number of iterations = 5 >> Residual norm 0.00542891 >> >> Hong >> >>> Hi Matt, >>> >>> Yes. The matrix is 450000x450000 sparse. The hypre takes hundreds of >>> iterates, not for all but in most of the timesteps. The matrix is not well >>> conditioned, with nonzero entries range from 1.0e-29 to 1.0e2. I also made >>> double check if there is anything wrong in the parallel version, however, >>> the matrix is the same with sequential version except some round error >>> which is relatively very small. Usually for those not well conditioned >>> matrix, direct solver should be faster than iterative solver, right? But >>> when I use the sequential iterative solver with ILU prec developed almost >>> 20 years go by others, the solver converge fast with appropriate >>> factorization level. In other words, when I use 24 processor using hypre, >>> the speed is almost the same as as the old sequential iterative solver >>> using 1 processor. >>> >>> I use most of the default configuration for the general case with pretty >>> good speedup. And I am not sure if I miss something for this problem. >>> >>> Thanks, >>> >>> Danyang >>> >>> On 17-05-24 11:12 AM, Matthew Knepley wrote: >>> >>> On Wed, May 24, 2017 at 12:50 PM, Danyang Su >>> wrote: >>> >>>> Hi Matthew and Barry, >>>> >>>> Thanks for the quick response. >>>> >>>> I also tried superlu and mumps, both work but it is about four times >>>> slower than ILU(dt) prec through hypre, with 24 processors I have tested. >>>> >>> You mean the total time is 4x? And you are taking hundreds of iterates? >>> That seems hard to believe, unless you are dropping >>> a huge number of elements. >>> >>>> When I look into the convergence information, the method using ILU(dt) >>>> still takes 200 to 3000 linear iterations for each newton iteration. One >>>> reason is this equation is hard to solve. As for the general cases, the >>>> same method works awesome and get very good speedup. >>>> >>> I do not understand what you mean here. >>> >>>> I also doubt if I use hypre correctly for this case. Is there anyway to >>>> check this problem, or is it possible to increase the factorization level >>>> through hypre? >>>> >>> I don't know. 
>>> >>> Matt >>> >>>> Thanks, >>>> >>>> Danyang >>>> >>>> On 17-05-24 04:59 AM, Matthew Knepley wrote: >>>> >>>> On Wed, May 24, 2017 at 2:21 AM, Danyang Su >>>> wrote: >>>> >>>>> Dear All, >>>>> >>>>> I use PCFactorSetLevels for ILU and PCFactorSetFill for other >>>>> preconditioning in my code to help solve the problems that the default >>>>> option is hard to solve. However, I found the latter one, PCFactorSetFill >>>>> does not take effect for my problem. The matrices and rhs as well as the >>>>> solutions are attached from the link below. I obtain the solution using >>>>> hypre preconditioner and it takes 7 and 38 iterations for matrix 1 and >>>>> matrix 2. However, if I use other preconditioner, the solver just failed at >>>>> the first matrix. I have tested this matrix using the native sequential >>>>> solver (not PETSc) with ILU preconditioning. If I set the incomplete >>>>> factorization level to 0, this sequential solver will take more than 100 >>>>> iterations. If I increase the factorization level to 1 or more, it just >>>>> takes several iterations. This remind me that the PC factor for this >>>>> matrices should be increased. However, when I tried it in PETSc, it just >>>>> does not work. >>>>> >>>>> Matrix and rhs can be obtained from the link below. >>>>> >>>>> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R >>>>> >>>>> Would anyone help to check if you can make this work by increasing the >>>>> PC factor level or fill? >>>>> >>>> >>>> We have ILU(k) supported in serial. However ILU(dt) which takes a >>>> tolerance only works through Hypre >>>> >>>> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html >>>> >>>> I recommend you try SuperLU or MUMPS, which can both be downloaded >>>> automatically by configure, and >>>> do a full sparse LU. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks and regards, >>>>> >>>>> Danyang >>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> http://www.caam.rice.edu/~mk51/ >>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> http://www.caam.rice.edu/~mk51/ >>> >>> >>> >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 25 10:25:30 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 25 May 2017 10:25:30 -0500 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: References: Message-ID: On Thu, May 25, 2017 at 9:27 AM, Lawrence Mitchell < lawrence.mitchell at imperial.ac.uk> wrote: > Dear petsc-users, > > I am trying to distribute a triangle mesh with a cell halo defined by > FVM adjacency (i.e. if I have a facet in the initial (0-overlap) > distribution, I want the cell on the other side of it). > > Reading the documentation, I think I do: > > DMPlexSetAdjacencyUseCone(PETSC_TRUE) > DMPlexSetAdjacencyUseClosure(PETSC_FALSE) > > and then > DMPlexDistribute(..., ovelap=1) > > If I do this for a simple mesh and then try and do anything on it, I > run into all sorts of problems because I have a plex where I have some > facets, but not even one cell in the support of the facet. Is this to > be expected? > Hmm. I don't think so. 
You should have at least one cell in the support of every facet. TS ex11 works exactly this way. When using that adjacency, the overlap cells you get will not have anything but the facet connecting them to that partition. Although, if you have adjacent cells in that overlap layer, you can get ghost faces between those. With the code below, do you get an interpolated mesh when you create it there. That call in C has another argument http://www.mcs.anl.gov/petsc/petsc-master/docs/manualpages/DMPLEX/DMPlexCreateFromCellList.html If its just cells and vertices, you could get some bizarre things like you see. Matt > For example the following petsc4py code breaks when run on 3 processes: > > $ mpiexec -n 3 python bork.py > [1] DMPlexGetOrdering() line 133 in > /data/lmitche1/src/deps/petsc/src/dm/impls/plex/plexreorder.c > [1] DMPlexCreateOrderingClosure_Static() line 41 in > /data/lmitche1/src/deps/petsc/src/dm/impls/plex/plexreorder.c > [1] Petsc has generated inconsistent data > [1] Number of depth 2 faces 34 does not match permuted nubmer 29 > : error code 77 > [2] DMPlexGetOrdering() line 133 in > /data/lmitche1/src/deps/petsc/src/dm/impls/plex/plexreorder.c > [2] DMPlexCreateOrderingClosure_Static() line 41 in > /data/lmitche1/src/deps/petsc/src/dm/impls/plex/plexreorder.c > [2] Petsc has generated inconsistent data > [2] Number of depth 2 faces 33 does not match permuted nubmer 28 > : error code 77 > [0] DMPlexGetOrdering() line 133 in > /data/lmitche1/src/deps/petsc/src/dm/impls/plex/plexreorder.c > [0] DMPlexCreateOrderingClosure_Static() line 41 in > /data/lmitche1/src/deps/petsc/src/dm/impls/plex/plexreorder.c > [0] Petsc has generated inconsistent data > [0] Number of depth 2 faces 33 does not match permuted nubmer 31 > > $ cat > bork.py<<\EOF > from petsc4py import PETSc > import numpy as np > Lx = Ly = 1 > nx = ny = 4 > > xcoords = np.linspace(0.0, Lx, nx + 1, dtype=PETSc.RealType) > ycoords = np.linspace(0.0, Ly, ny + 1, dtype=PETSc.RealType) > coords = np.asarray(np.meshgrid(xcoords, ycoords)).swapaxes(0, > 2).reshape(-1, 2) > > # cell vertices > i, j = np.meshgrid(np.arange(nx, dtype=PETSc.IntType), np.arange(ny, > dtype=PETSc.IntType)) > cells = [i*(ny+1) + j, i*(ny+1) + j+1, (i+1)*(ny+1) + j+1, > (i+1)*(ny+1) + j] > cells = np.asarray(cells, dtype=PETSc.IntType).swapaxes(0, > 2).reshape(-1, 4) > idx = [0, 1, 3, 1, 2, 3] > cells = cells[:, idx].reshape(-1, 3) > > comm = PETSc.COMM_WORLD > if comm.rank == 0: > dm = PETSc.DMPlex().createFromCellList(2, cells, coords, comm=comm) > else: > dm = PETSc.DMPlex().createFromCellList(2, np.zeros((0, 4), > dtype=PETSc.IntType), > np.zeros((0, 2), > dtype=PETSc.RealType), > comm=comm) > > dm.setAdjacencyUseClosure(False) > dm.setAdjacencyUseCone(True) > > dm.distribute(overlap=1) > > dm.getOrdering(PETSc.Mat.OrderingType.RCM) > > dm.view() > EOF > > Am I doing something wrong? Is this not expected to work? > > Cheers, > > Lawrence > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lawrence.mitchell at imperial.ac.uk Thu May 25 11:27:23 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Thu, 25 May 2017 17:27:23 +0100 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: References: Message-ID: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> On 25/05/17 16:25, Matthew Knepley wrote: > On Thu, May 25, 2017 at 9:27 AM, Lawrence Mitchell > > wrote: > > Dear petsc-users, > > I am trying to distribute a triangle mesh with a cell halo defined by > FVM adjacency (i.e. if I have a facet in the initial (0-overlap) > distribution, I want the cell on the other side of it). > > Reading the documentation, I think I do: > > DMPlexSetAdjacencyUseCone(PETSC_TRUE) > DMPlexSetAdjacencyUseClosure(PETSC_FALSE) > > and then > DMPlexDistribute(..., ovelap=1) > > If I do this for a simple mesh and then try and do anything on it, I > run into all sorts of problems because I have a plex where I have some > facets, but not even one cell in the support of the facet. Is this to > be expected? > > > Hmm. I don't think so. You should have at least one cell in the > support of every facet. > TS ex11 works exactly this way. > > When using that adjacency, the overlap cells you get will not have > anything but the > facet connecting them to that partition. Although, if you have > adjacent cells in that overlap layer, > you can get ghost faces between those. > > With the code below, do you get an interpolated mesh when you create > it there. That call in C > has another argument > > http://www.mcs.anl.gov/petsc/petsc-master/docs/manualpages/DMPLEX/DMPlexCreateFromCellList.html The mesh is interpolated. OK, so let's see if I can understand what the different adjacency relations are: usecone=False, useclosure=False: adj(p) => cone(p) + cone(support(p)) usecone=True, useclosure=False: adj(p) => support(p) + support(cone(p)) usecone=False, useclosure=True adj(p) => closure(star(p)) usecone=True, useclosure=True adj(p) => star(closure(p)) So let's imagine I have a facet f, the adjacent points are the support(cone(f)) so the support of the vertices in 2D, so those are some new facets. So now, following https://arxiv.org/pdf/1506.06194.pdf, I need to complete this new mesh, so I ask for the closure of these new facets. But that might mean I won't ask for cells, right? So I think I would end up with some facets that don't have any support. And empirically I observe that: e.g. the code attached: $ mpiexec -n 3 python bar.py [0] 7 [0] [0] 8 [0] [0] 9 [0 1] [0] 10 [1] [0] 11 [1] [0] 12 [] [1] 10 [0 2] [1] 11 [0 1] [1] 12 [0] [1] 13 [1] [1] 14 [2] [1] 15 [2] [1] 16 [1 3] [1] 17 [3] [1] 18 [3] [2] 7 [0 1] [2] 8 [0] [2] 9 [0] [2] 10 [1] [2] 11 [] [2] 12 [1] What I would like (although I'm not sure if this is supported right now), is the overlap to contain closure(support(facet)) for all shared facets. I think that's equivalent to closure(support(p)) \forall p. That way on any shared facets, I have both cells and their closure. Is that easy to do? 
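As a cross-check, the same inspection the attached script performs (looking at the support of every facet after distribution) reads roughly like this in C; a sketch only, with dm assumed to be the distributed, interpolated DMPlex:

  PetscInt f, fStart, fEnd, suppSize;

  ierr = DMPlexGetHeightStratum(dm, 1, &fStart, &fEnd);CHKERRQ(ierr); /* codimension-1 points, i.e. facets */
  for (f = fStart; f < fEnd; ++f) {
    ierr = DMPlexGetSupportSize(dm, f, &suppSize);CHKERRQ(ierr);
    if (!suppSize) {
      ierr = PetscPrintf(PETSC_COMM_SELF, "facet %D has an empty support\n", f);CHKERRQ(ierr);
    }
  }

Any facet reported by this loop is one of the orphaned facets described above.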
Lawrence import sys, petsc4py petsc4py.init(sys.argv) from petsc4py import PETSc import numpy as np Lx = Ly = 1 nx = 1 ny = 2 xcoords = np.linspace(0.0, Lx, nx + 1, dtype=PETSc.RealType) ycoords = np.linspace(0.0, Ly, ny + 1, dtype=PETSc.RealType) coords = np.asarray(np.meshgrid(xcoords, ycoords)).swapaxes(0, 2).reshape(-1, 2) # cell vertices i, j = np.meshgrid(np.arange(nx, dtype=PETSc.IntType), np.arange(ny, dtype=PETSc.IntType)) cells = [i*(ny+1) + j, i*(ny+1) + j+1, (i+1)*(ny+1) + j+1, (i+1)*(ny+1) + j] cells = np.asarray(cells, dtype=PETSc.IntType).swapaxes(0, 2).reshape(-1, 4) idx = [0, 1, 3, 1, 2, 3] cells = cells[:, idx].reshape(-1, 3) comm = PETSc.COMM_WORLD if comm.rank == 0: dm = PETSc.DMPlex().createFromCellList(2, cells, coords, interpolate=True, comm=comm) else: dm = PETSc.DMPlex().createFromCellList(2, np.zeros((0, cells.shape[1]), dtype=PETSc.IntType), np.zeros((0, 2), dtype=PETSc.RealType), interpolate=True, comm=comm) dm.setAdjacencyUseClosure(False) dm.setAdjacencyUseCone(True) dm.distribute(overlap=1) sf = dm.getPointSF() for p in range(*dm.getDepthStratum(dm.getDepth()-1)): PETSc.Sys.syncPrint("[%d] %d %s" % (comm.rank, p, dm.getSupport(p))) PETSc.Sys.syncFlush() -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 473 bytes Desc: OpenPGP digital signature URL: From knepley at gmail.com Thu May 25 12:05:53 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 25 May 2017 12:05:53 -0500 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> Message-ID: On Thu, May 25, 2017 at 11:27 AM, Lawrence Mitchell < lawrence.mitchell at imperial.ac.uk> wrote: > On 25/05/17 16:25, Matthew Knepley wrote: > > On Thu, May 25, 2017 at 9:27 AM, Lawrence Mitchell > > > > wrote: > > > > Dear petsc-users, > > > > I am trying to distribute a triangle mesh with a cell halo defined by > > FVM adjacency (i.e. if I have a facet in the initial (0-overlap) > > distribution, I want the cell on the other side of it). > > > > Reading the documentation, I think I do: > > > > DMPlexSetAdjacencyUseCone(PETSC_TRUE) > > DMPlexSetAdjacencyUseClosure(PETSC_FALSE) > > > > and then > > DMPlexDistribute(..., ovelap=1) > > > > If I do this for a simple mesh and then try and do anything on it, I > > run into all sorts of problems because I have a plex where I have > some > > facets, but not even one cell in the support of the facet. Is this > to > > be expected? > > > > > > Hmm. I don't think so. You should have at least one cell in the > > support of every facet. > > TS ex11 works exactly this way. > > > > When using that adjacency, the overlap cells you get will not have > > anything but the > > facet connecting them to that partition. Although, if you have > > adjacent cells in that overlap layer, > > you can get ghost faces between those. > > > > With the code below, do you get an interpolated mesh when you create > > it there. That call in C > > has another argument > > > > http://www.mcs.anl.gov/petsc/petsc-master/docs/manualpages/DMPLEX/ > DMPlexCreateFromCellList.html > > The mesh is interpolated. 
> > > OK, so let's see if I can understand what the different adjacency > relations are: > > usecone=False, useclosure=False: > > adj(p) => cone(p) + cone(support(p)) > > usecone=True, useclosure=False: > > adj(p) => support(p) + support(cone(p)) > > usecone=False, useclosure=True > > adj(p) => closure(star(p)) > > usecone=True, useclosure=True > > adj(p) => star(closure(p)) > > So let's imagine I have a facet f, the adjacent points are the > support(cone(f)) so the support of the vertices in 2D, so those are > some new facets. > If you want that, is there a reason you cannot use the FEM style FALSE+TRUE? If you already want the closure, usually the star is not really adding anything new. Matt > So now, following https://arxiv.org/pdf/1506.06194.pdf, I need to > complete this new mesh, so I ask for the closure of these new facets. > But that might mean I won't ask for cells, right? So I think I would > end up with some facets that don't have any support. And empirically > I observe that: > > e.g. the code attached: > > $ mpiexec -n 3 python bar.py > [0] 7 [0] > [0] 8 [0] > [0] 9 [0 1] > [0] 10 [1] > [0] 11 [1] > [0] 12 [] > [1] 10 [0 2] > [1] 11 [0 1] > [1] 12 [0] > [1] 13 [1] > [1] 14 [2] > [1] 15 [2] > [1] 16 [1 3] > [1] 17 [3] > [1] 18 [3] > [2] 7 [0 1] > [2] 8 [0] > [2] 9 [0] > [2] 10 [1] > [2] 11 [] > [2] 12 [1] > > > What I would like (although I'm not sure if this is supported right > now), is the overlap to contain closure(support(facet)) for all shared > facets. I think that's equivalent to closure(support(p)) \forall p. > > That way on any shared facets, I have both cells and their closure. > > Is that easy to do? > > Lawrence > > import sys, petsc4py > petsc4py.init(sys.argv) > from petsc4py import PETSc > import numpy as np > Lx = Ly = 1 > nx = 1 > ny = 2 > > xcoords = np.linspace(0.0, Lx, nx + 1, dtype=PETSc.RealType) > ycoords = np.linspace(0.0, Ly, ny + 1, dtype=PETSc.RealType) > coords = np.asarray(np.meshgrid(xcoords, ycoords)).swapaxes(0, > 2).reshape(-1, 2) > > # cell vertices > i, j = np.meshgrid(np.arange(nx, dtype=PETSc.IntType), np.arange(ny, > dtype=PETSc.IntType)) > cells = [i*(ny+1) + j, i*(ny+1) + j+1, (i+1)*(ny+1) + j+1, > (i+1)*(ny+1) + j] > cells = np.asarray(cells, dtype=PETSc.IntType).swapaxes(0, > 2).reshape(-1, 4) > idx = [0, 1, 3, 1, 2, 3] > cells = cells[:, idx].reshape(-1, 3) > > comm = PETSc.COMM_WORLD > if comm.rank == 0: > dm = PETSc.DMPlex().createFromCellList(2, cells, coords, > interpolate=True, comm=comm) > else: > dm = PETSc.DMPlex().createFromCellList(2, np.zeros((0, > cells.shape[1]), dtype=PETSc.IntType), > np.zeros((0, 2), > dtype=PETSc.RealType), > interpolate=True, > comm=comm) > > dm.setAdjacencyUseClosure(False) > dm.setAdjacencyUseCone(True) > > dm.distribute(overlap=1) > sf = dm.getPointSF() > > for p in range(*dm.getDepthStratum(dm.getDepth()-1)): > PETSc.Sys.syncPrint("[%d] %d %s" % (comm.rank, p, dm.getSupport(p))) > > PETSc.Sys.syncFlush() > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lawrence.mitchell at imperial.ac.uk Thu May 25 13:10:59 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Thu, 25 May 2017 19:10:59 +0100 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> Message-ID: <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> > On 25 May 2017, at 18:05, Matthew Knepley wrote: > > If you want that, is there a reason you cannot use the FEM style FALSE+TRUE? > If you already want the closure, usually the star is not really adding anything new. Ok, let me clarify. Given shared facets, I'd like closure(support(facet)) this is a subset of the fem adjacency. "Add in the cell and its closure from the remote rank". This doesn't include remote cells I can only see through vertices. Without sending data evaluated at facet quad points, I think this is the adjacency I need to compute facet integrals: all the dofs in closure(support(facet)). I thought this was what the fv adjacency was, but I think I was mistaken. That is support(cone(p)) for all p that I have. Now I do a rendezvous to gather everything in the closure of these new points. But I think that means I still don't have some cells? Make sense? Lawrence -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 25 13:23:23 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 25 May 2017 13:23:23 -0500 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> Message-ID: On Thu, May 25, 2017 at 1:10 PM, Lawrence Mitchell < lawrence.mitchell at imperial.ac.uk> wrote: > > > On 25 May 2017, at 18:05, Matthew Knepley wrote: > > If you want that, is there a reason you cannot use the FEM style > FALSE+TRUE? > If you already want the closure, usually the star is not really adding > anything new. > > > Ok, let me clarify. > > Given shared facets, I'd like closure(support(facet)) this is a subset of > the fem adjacency. "Add in the cell and its closure from the remote rank". > This doesn't include remote cells I can only see through vertices. Without > sending data evaluated at facet quad points, I think this is the adjacency > I need to compute facet integrals: all the dofs in closure(support(facet)). > This seems incoherent to me. For FV, dofs reside in the cells, so you should only need the cell for adjacency. If you need dofs defined at vertices, then you should also need cells which are only attached by vertices. How could this scheme be consistent without this? Thanks, Matt > I thought this was what the fv adjacency was, but I think I was mistaken. > That is support(cone(p)) for all p that I have. > Now I do a rendezvous to gather everything in the closure of these new > points. But I think that means I still don't have some cells? > > Make sense? > > Lawrence > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lawrence.mitchell at imperial.ac.uk Thu May 25 13:38:51 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Thu, 25 May 2017 19:38:51 +0100 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> Message-ID: <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> > On 25 May 2017, at 19:23, Matthew Knepley wrote: > > Ok, let me clarify. > > Given shared facets, I'd like closure(support(facet)) this is a subset of the fem adjacency. "Add in the cell and its closure from the remote rank". This doesn't include remote cells I can only see through vertices. Without sending data evaluated at facet quad points, I think this is the adjacency I need to compute facet integrals: all the dofs in closure(support(facet)). > > This seems incoherent to me. For FV, dofs reside in the cells, so you should only need the cell for adjacency. If you > need dofs defined at vertices, then you should also need cells which are only attached by vertices. How could this > scheme be consistent without this? OK, so what I think is this: I need to compute integrals over cells and facets. So I do: GlobalToLocal(INSERT_VALUES) ComputeIntegralsOnOwnedEntities LocalToGlobal(ADD_VALUES) That way, an integration is performed on every entity exactly once, and LocalToGlobal ensures that I get a consistent assembled Vec. OK, so if I only compute cell integrals, then the zero overlap distribution with all the points in the closure of the cell (including some remote points) is sufficient. If I compute facet integrals, I need both cells (and their closure) in the support of the facet. Again, each facet is only integrated by one process, and the LocalToGlobal adds in contributions to remote dofs. This is the same as cell integrals, just I need a bit more data, no? The other option is to notice that what I actually need when I compute a facet integral is the test function and/or any coefficients evaluated at quadrature points on the facet. So if I don't want the extra overlapped halo, then what I need to do is for the remote process to evaluate any coefficients at the quad points, then send the evaluated data to the facet owner. Now the facet owner can compute the integral, and again LocalToGlobal adds in contributions to remote dofs. Lawrence From knepley at gmail.com Thu May 25 13:46:01 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 25 May 2017 13:46:01 -0500 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> Message-ID: On Thu, May 25, 2017 at 1:38 PM, Lawrence Mitchell < lawrence.mitchell at imperial.ac.uk> wrote: > > > On 25 May 2017, at 19:23, Matthew Knepley wrote: > > > > Ok, let me clarify. > > > > Given shared facets, I'd like closure(support(facet)) this is a subset > of the fem adjacency. "Add in the cell and its closure from the remote > rank". This doesn't include remote cells I can only see through vertices. > Without sending data evaluated at facet quad points, I think this is the > adjacency I need to compute facet integrals: all the dofs in > closure(support(facet)). > > > > This seems incoherent to me. For FV, dofs reside in the cells, so you > should only need the cell for adjacency. 
If you > > need dofs defined at vertices, then you should also need cells which are > only attached by vertices. How could this > > scheme be consistent without this? > > OK, so what I think is this: > > I need to compute integrals over cells and facets. > Sounds like DG. I will get out my dead chicken for the incantation. > So I do: > > GlobalToLocal(INSERT_VALUES) > ComputeIntegralsOnOwnedEntities > LocalToGlobal(ADD_VALUES) > > That way, an integration is performed on every entity exactly once, and > LocalToGlobal ensures that I get a consistent assembled Vec. > > OK, so if I only compute cell integrals, then the zero overlap > distribution with all the points in the closure of the cell (including some > remote points) is sufficient. > Yep. > If I compute facet integrals, I need both cells (and their closure) in the > support of the facet. Again, each facet is only integrated by one process, > and the LocalToGlobal adds in contributions to remote dofs. This is the > same as cell integrals, just I need a bit more data, no? > > The other option is to notice that what I actually need when I compute a > facet integral is the test function and/or any coefficients evaluated at > quadrature points on the facet. So if I don't want the extra overlapped > halo, then what I need to do is for the remote process to evaluate any > coefficients at the quad points, then send the evaluated data to the facet > owner. Now the facet owner can compute the integral, and again > LocalToGlobal adds in contributions to remote dofs. That seems baroque. So this is just another adjacency pattern. You should be able to easily define it, or if you are a patient person, wait for me to do it. Its here https://bitbucket.org/petsc/petsc/src/01c3230e040078628f5e559992965c1c4b6f473d/src/dm/impls/plex/plexdistribute.c?at=master&fileviewer=file-view-default#plexdistribute.c-239 I am more than willing to make this overridable by the user through function composition or another mechanism. Thanks, Matt > > Lawrence -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lawrence.mitchell at imperial.ac.uk Thu May 25 13:58:02 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Thu, 25 May 2017 19:58:02 +0100 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> Message-ID: > On 25 May 2017, at 19:46, Matthew Knepley wrote: > > Sounds like DG. I will get out my dead chicken for the incantation Actually no! Mixed H(div)-L2 for Stokes. Which has facet integrals for partially discontinuous fields. If you do redundant compute for such terms, you need a depth-2 FEM adjacency, which is just grim. Equally we have some strange users who have jump terms in CG formulations. 
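For concreteness, the owned-entity assembly loop sketched earlier in this thread (GlobalToLocal with INSERT_VALUES, integrate owned cells and facets, LocalToGlobal with ADD_VALUES) maps onto the DM interface roughly as follows. A sketch only: Xglobal and Fglobal are assumed global vectors, and ComputeIntegralsOnOwnedEntities stands for the user's own integration routine from the pseudocode above.

  Vec Xlocal, Flocal;

  ierr = DMGetLocalVector(dm, &Xlocal);CHKERRQ(ierr);
  ierr = DMGetLocalVector(dm, &Flocal);CHKERRQ(ierr);
  ierr = DMGlobalToLocalBegin(dm, Xglobal, INSERT_VALUES, Xlocal);CHKERRQ(ierr);
  ierr = DMGlobalToLocalEnd(dm, Xglobal, INSERT_VALUES, Xlocal);CHKERRQ(ierr);
  ierr = VecSet(Flocal, 0.0);CHKERRQ(ierr);

  /* integrate over owned cells and owned facets only, accumulating into Flocal */
  ierr = ComputeIntegralsOnOwnedEntities(dm, Xlocal, Flocal);CHKERRQ(ierr);

  ierr = VecSet(Fglobal, 0.0);CHKERRQ(ierr);
  ierr = DMLocalToGlobalBegin(dm, Flocal, ADD_VALUES, Fglobal);CHKERRQ(ierr);
  ierr = DMLocalToGlobalEnd(dm, Flocal, ADD_VALUES, Fglobal);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(dm, &Xlocal);CHKERRQ(ierr);
  ierr = DMRestoreLocalVector(dm, &Flocal);CHKERRQ(ierr);

The ADD_VALUES in the LocalToGlobal step is what lets each entity be integrated exactly once while remote contributions still land on the owning process.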
Lawrence From knepley at gmail.com Thu May 25 14:03:29 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 25 May 2017 14:03:29 -0500 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> Message-ID: On Thu, May 25, 2017 at 1:58 PM, Lawrence Mitchell < lawrence.mitchell at imperial.ac.uk> wrote: > > > On 25 May 2017, at 19:46, Matthew Knepley wrote: > > > > Sounds like DG. I will get out my dead chicken for the incantation > > Actually no! Mixed H(div)-L2 for Stokes. Which has facet integrals for > partially discontinuous fields. If you do redundant compute for such > terms, you need a depth-2 FEM adjacency, which is just grim. Equally we > have some strange users who have jump terms in CG formulations. Hmm, I thought I made adjacency per field. I have to look. That way, no problem with the Stokes example. DG is still weird. Matt > > Lawrence -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lawrence.mitchell at imperial.ac.uk Thu May 25 14:22:15 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Thu, 25 May 2017 20:22:15 +0100 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> Message-ID: <6C66D04E-72AD-445B-9DE6-BB0961B9F622@imperial.ac.uk> > On 25 May 2017, at 20:03, Matthew Knepley wrote: > > > Hmm, I thought I made adjacency per field. I have to look. That way, no problem with the Stokes example. DG is still weird. You might, we don't right now. We just make the topological adjacency that is "large enough", and then make fields on that. > > That seems baroque. So this is just another adjacency pattern. You should be able to easily define it, or if you are a patient person, > wait for me to do it. Its here > > https://bitbucket.org/petsc/petsc/src/01c3230e040078628f5e559992965c1c4b6f473d/src/dm/impls/plex/plexdistribute.c?at=master&fileviewer=file-view-default#plexdistribute.c-239 > > I am more than willing to make this overridable by the user through function composition or another mechanism. Hmm, that naive thing of just modifying the XXX_Support_Internal to compute with DMPlexGetTransitiveClosure rather than DMPlexGetCone didn't do what I expected, but I don't understand the way this bootstrapping is done very well. 
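For the adjacency being asked for, closure(support(f)) for a shared facet f, the gather can be written directly against the plex API. This is only an illustration of the desired relation, not the plexdistribute.c internals; dm and the facet f are assumed.

  const PetscInt *supp;
  PetscInt        suppSize, s, c, clSize, *closure = NULL;

  ierr = DMPlexGetSupportSize(dm, f, &suppSize);CHKERRQ(ierr);
  ierr = DMPlexGetSupport(dm, f, &supp);CHKERRQ(ierr);
  for (s = 0; s < suppSize; ++s) {                 /* the cell(s) on either side of the facet */
    ierr = DMPlexGetTransitiveClosure(dm, supp[s], PETSC_TRUE, &clSize, &closure);CHKERRQ(ierr);
    for (c = 0; c < clSize*2; c += 2) {
      /* closure[] holds (point, orientation) pairs; closure[c] is a point adjacent to f */
      /* ... add closure[c] to the adjacency set for f ... */
    }
    ierr = DMPlexRestoreTransitiveClosure(dm, supp[s], PETSC_TRUE, &clSize, &closure);CHKERRQ(ierr);
  }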
Cheers, Lawrence From knepley at gmail.com Thu May 25 15:00:19 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 25 May 2017 15:00:19 -0500 Subject: [petsc-users] DMPlex distribution with FVM adjacency In-Reply-To: <6C66D04E-72AD-445B-9DE6-BB0961B9F622@imperial.ac.uk> References: <15e465f7-dea1-39c5-7c43-ba447a7a8c09@imperial.ac.uk> <54529998-4688-4774-845B-1FDF67A8C20B@imperial.ac.uk> <0BEB36D4-C35B-48E4-8F66-8EE8D38E08B6@imperial.ac.uk> <6C66D04E-72AD-445B-9DE6-BB0961B9F622@imperial.ac.uk> Message-ID: On Thu, May 25, 2017 at 2:22 PM, Lawrence Mitchell < lawrence.mitchell at imperial.ac.uk> wrote: > > > On 25 May 2017, at 20:03, Matthew Knepley wrote: > > > > > > Hmm, I thought I made adjacency per field. I have to look. That way, no > problem with the Stokes example. DG is still weird. > > You might, we don't right now. We just make the topological adjacency > that is "large enough", and then make fields on that. > > > > > That seems baroque. So this is just another adjacency pattern. You > should be able to easily define it, or if you are a patient person, > > wait for me to do it. Its here > > > > https://bitbucket.org/petsc/petsc/src/01c3230e040078628f5e559992965c > 1c4b6f473d/src/dm/impls/plex/plexdistribute.c?at=master& > fileviewer=file-view-default#plexdistribute.c-239 > > > > I am more than willing to make this overridable by the user through > function composition or another mechanism. > > Hmm, that naive thing of just modifying the XXX_Support_Internal to > compute with DMPlexGetTransitiveClosure rather than DMPlexGetCone didn't do > what I expected, but I don't understand the way this bootstrapping is done > very well. > It should do the right thing. Notice that you have to be careful about the arrays that you use since I reuse them for efficiency here. What is going wrong? Matt > Cheers, > > Lawrence > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From danyang.su at gmail.com Thu May 25 15:18:39 2017 From: danyang.su at gmail.com (Danyang Su) Date: Thu, 25 May 2017 13:18:39 -0700 Subject: [petsc-users] PCFactorSetShiftType does not work in code but -pc_factor_set_shift_type works In-Reply-To: References: <93217794-9c63-fd52-ab36-4174de8cb9c8@gmail.com> <8634589f-d1a5-bf4f-b158-3ddb5a18026b@gmail.com> Message-ID: <02bb196f-6243-0a7f-ec1b-5ebb202e4539@gmail.com> Hi Hong, It works like a charm. I really appreciate your help. Regards, Danyang On 17-05-25 07:49 AM, Hong wrote: > Danyang: > You must access inner pc, then set shift. 
See > petsc/src/ksp/ksp/examples/tutorials/ex7.c > > For example, I add following to > petsc/src/ksp/ksp/examples/tutorials/ex2.c, line 191: > PetscBool isbjacobi; > PC pc; > ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); > ierr = > PetscObjectTypeCompare((PetscObject)pc,PCBJACOBI,&isbjacobi);CHKERRQ(ierr); > if (isbjacobi) { > PetscInt nlocal; > KSP *subksp; > PC subpc; > > ierr = KSPSetUp(ksp);CHKERRQ(ierr); > ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr); > > /* Extract the array of KSP contexts for the local blocks */ > ierr = PCBJacobiGetSubKSP(pc,&nlocal,NULL,&subksp);CHKERRQ(ierr); > printf("isbjacobi, nlocal %D, set option to subpc...\n",nlocal); > for (i=0; i ierr = KSPGetPC(subksp[i],&subpc);CHKERRQ(ierr); > ierr = PCFactorSetShiftType(subpc,MAT_SHIFT_NONZERO);CHKERRQ(ierr); > } > } > > > Dear Hong and Barry, > > I have implemented this option in the code, as we also need to use > configuration from file for convenience. When I run the code using > options, it works fine, however, when I run the code using > configuration file, it does not work. The code has two set of > equations, flow and reactive, with prefix been set to "flow_" and > "react_". When I run the code using > > mpiexec -n 4 ../executable -flow_sub_pc_factor_shift_type nonzero > -react_sub_pc_factor_shift_type nonzero > > it works. However, if I run using > > mpiexec -n 4 ../executable > > and let the executable file read the options from file, it just > does not work at "call > PCFactorSetShiftType(pc_flow,MAT_SHIFT_NONZERO, ierr) or none, > positive_definite ...". Do I miss something here? > > Below is the pseudo code I have used for flow equations, similar > for reactive equations. > > call MatCreateAIJ(Petsc_Comm_World,nndof,nndof,nngbldof, & > nngbldof,d_nz,PETSC_NULL_INTEGER,o_nz, & > PETSC_NULL_INTEGER,a_flow,ierr) > CHKERRQ(ierr) > > call MatSetFromOptions(a_flow,ierr) > CHKERRQ(ierr) > > call KSPCreate(Petsc_Comm_World, ksp_flow, ierr) > CHKERRQ(ierr) > > call KSPAppendOptionsPrefix(ksp_flow,"flow_",ierr) > CHKERRQ(ierr) > > call KSPSetInitialGuessNonzero(ksp_flow, & > b_initial_guess_nonzero_flow, ierr) > CHKERRQ(ierr) > > call KSPSetInitialGuessNonzero(ksp_flow, & > b_initial_guess_nonzero_flow, ierr) > CHKERRQ(ierr) > > call KSPSetDM(ksp_flow,dmda_flow%da,ierr) > CHKERRQ(ierr) > call KSPSetDMActive(ksp_flow,PETSC_FALSE,ierr) > CHKERRQ(ierr) > > !!!!*********CHECK IF READ OPTION FROM FILE*********!!!! > if (read_option_from_file) then > > call KSPSetType(ksp_flow, KSPGMRES, ierr) !or KSPBCGS or > others... > CHKERRQ(ierr) > > call KSPGetPC(ksp_flow, pc_flow, ierr) > CHKERRQ(ierr) > > call PCSetType(pc_flow,PCBJACOBI, ierr) !or PCILU > or PCJACOBI or PCHYPRE ... > CHKERRQ(ierr) > > call PCFactorSetShiftType(pc_flow,MAT_SHIFT_NONZERO, > ierr) or none, positive_definite ... > CHKERRQ(ierr) > > end if > > call > PCFactorGetMatSolverPackage(pc_flow,solver_pkg_flow,ierr) > CHKERRQ(ierr) > > call compute_jacobian(rank,dmda_flow%da, & > a_flow,a_in,ia_in,ja_in,nngl_in, & > row_idx_l2pg,col_idx_l2pg, & > b_non_interlaced) > call KSPSetFromOptions(ksp_flow,ierr) > CHKERRQ(ierr) > > call KSPSetUp(ksp_flow,ierr) > CHKERRQ(ierr) > > call KSPSetUpOnBlocks(ksp_flow,ierr) > CHKERRQ(ierr) > > call KSPSolve(ksp_flow,b_flow,x_flow,ierr) > CHKERRQ(ierr) > > > Thanks and Regards, > > Danyang > > On 17-05-24 06:32 PM, Hong wrote: >> Remove your option '-vecload_block_size 10'. >> Hong >> >> On Wed, May 24, 2017 at 3:06 PM, Danyang Su > > wrote: >> >> Dear Hong, >> >> I just tested with different number of processors for the >> same matrix. 
It sometimes got "ERROR: Arguments are >> incompatible" for different number of processors. It works >> fine using 4, 8, or 24 processors, but failed with "ERROR: >> Arguments are incompatible" using 16 or 48 processors. The >> error information is attached. I tested this on my local >> computer with 6 cores 12 threads. Any suggestion on this? >> >> Thanks, >> >> Danyang >> >> >> On 17-05-24 12:28 PM, Danyang Su wrote: >>> >>> Hi Hong, >>> >>> Awesome. Thanks for testing the case. I will try your >>> options for the code and get back to you later. >>> >>> Regards, >>> >>> Danyang >>> >>> >>> On 17-05-24 12:21 PM, Hong wrote: >>>> Danyang : >>>> I tested your data. >>>> Your matrices encountered zero pivots, e.g. >>>> petsc/src/ksp/ksp/examples/tutorials (master) >>>> $ mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs >>>> b_react_in_2.bin -ksp_monitor -ksp_error_if_not_converged >>>> >>>> [15]PETSC ERROR: Zero pivot in LU factorization: >>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot >>>> >>>> [15]PETSC ERROR: Zero pivot row 1249 value 2.05808e-14 >>>> tolerance 2.22045e-14 >>>> ... >>>> >>>> Adding option '-sub_pc_factor_shift_type nonzero', I got >>>> mpiexec -n 24 ./ex10 -f0 a_react_in_2.bin -rhs >>>> b_react_in_2.bin -ksp_monitor -ksp_error_if_not_converged >>>> -sub_pc_factor_shift_type nonzero -mat_view ascii::ascii_info >>>> >>>> Mat Object: 24 MPI processes >>>> type: mpiaij >>>> rows=450000, cols=450000 >>>> total: nonzeros=6991400, allocated nonzeros=6991400 >>>> total number of mallocs used during MatSetValues calls =0 >>>> not using I-node (on process 0) routines >>>> 0 KSP Residual norm 5.849777711755e+01 >>>> 1 KSP Residual norm 6.824179430230e-01 >>>> 2 KSP Residual norm 3.994483555787e-02 >>>> 3 KSP Residual norm 6.085841461433e-03 >>>> 4 KSP Residual norm 8.876162583511e-04 >>>> 5 KSP Residual norm 9.407780665278e-05 >>>> Number of iterations = 5 >>>> Residual norm 0.00542891 >>>> >>>> Hong >>>> >>>> Hi Matt, >>>> >>>> Yes. The matrix is 450000x450000 sparse. The hypre >>>> takes hundreds of iterates, not for all but in most of >>>> the timesteps. The matrix is not well conditioned, with >>>> nonzero entries range from 1.0e-29 to 1.0e2. I also >>>> made double check if there is anything wrong in the >>>> parallel version, however, the matrix is the same with >>>> sequential version except some round error which is >>>> relatively very small. Usually for those not well >>>> conditioned matrix, direct solver should be faster than >>>> iterative solver, right? But when I use the sequential >>>> iterative solver with ILU prec developed almost 20 >>>> years go by others, the solver converge fast with >>>> appropriate factorization level. In other words, when I >>>> use 24 processor using hypre, the speed is almost the >>>> same as as the old sequential iterative solver using 1 >>>> processor. >>>> >>>> I use most of the default configuration for the general >>>> case with pretty good speedup. And I am not sure if I >>>> miss something for this problem. >>>> >>>> Thanks, >>>> >>>> Danyang >>>> >>>> >>>> On 17-05-24 11:12 AM, Matthew Knepley wrote: >>>>> On Wed, May 24, 2017 at 12:50 PM, Danyang Su >>>>> > >>>>> wrote: >>>>> >>>>> Hi Matthew and Barry, >>>>> >>>>> Thanks for the quick response. >>>>> >>>>> I also tried superlu and mumps, both work but it >>>>> is about four times slower than ILU(dt) prec >>>>> through hypre, with 24 processors I have tested. >>>>> >>>>> You mean the total time is 4x? And you are taking >>>>> hundreds of iterates? 
That seems hard to believe, >>>>> unless you are dropping >>>>> a huge number of elements. >>>>> >>>>> When I look into the convergence information, the >>>>> method using ILU(dt) still takes 200 to 3000 >>>>> linear iterations for each newton iteration. One >>>>> reason is this equation is hard to solve. As for >>>>> the general cases, the same method works awesome >>>>> and get very good speedup. >>>>> >>>>> I do not understand what you mean here. >>>>> >>>>> I also doubt if I use hypre correctly for this >>>>> case. Is there anyway to check this problem, or is >>>>> it possible to increase the factorization level >>>>> through hypre? >>>>> >>>>> I don't know. >>>>> >>>>> Matt >>>>> >>>>> Thanks, >>>>> >>>>> Danyang >>>>> >>>>> >>>>> On 17-05-24 04:59 AM, Matthew Knepley wrote: >>>>>> On Wed, May 24, 2017 at 2:21 AM, Danyang Su >>>>>> >>>>> > wrote: >>>>>> >>>>>> Dear All, >>>>>> >>>>>> I use PCFactorSetLevels for ILU and >>>>>> PCFactorSetFill for other preconditioning in >>>>>> my code to help solve the problems that the >>>>>> default option is hard to solve. However, I >>>>>> found the latter one, PCFactorSetFill does >>>>>> not take effect for my problem. The matrices >>>>>> and rhs as well as the solutions are attached >>>>>> from the link below. I obtain the solution >>>>>> using hypre preconditioner and it takes 7 and >>>>>> 38 iterations for matrix 1 and matrix 2. >>>>>> However, if I use other preconditioner, the >>>>>> solver just failed at the first matrix. I >>>>>> have tested this matrix using the native >>>>>> sequential solver (not PETSc) with ILU >>>>>> preconditioning. If I set the incomplete >>>>>> factorization level to 0, this sequential >>>>>> solver will take more than 100 iterations. If >>>>>> I increase the factorization level to 1 or >>>>>> more, it just takes several iterations. This >>>>>> remind me that the PC factor for this >>>>>> matrices should be increased. However, when I >>>>>> tried it in PETSc, it just does not work. >>>>>> >>>>>> Matrix and rhs can be obtained from the link >>>>>> below. >>>>>> >>>>>> https://eilinator.eos.ubc.ca:8443/index.php/s/CalUcq9CMeblk4R >>>>>> >>>>>> >>>>>> Would anyone help to check if you can make >>>>>> this work by increasing the PC factor level >>>>>> or fill? >>>>>> >>>>>> >>>>>> We have ILU(k) supported in serial. However >>>>>> ILU(dt) which takes a tolerance only works >>>>>> through Hypre >>>>>> >>>>>> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html >>>>>> >>>>>> >>>>>> I recommend you try SuperLU or MUMPS, which can >>>>>> both be downloaded automatically by configure, and >>>>>> do a full sparse LU. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> Thanks and regards, >>>>>> >>>>>> Danyang >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before >>>>>> they begin their experiments is infinitely more >>>>>> interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> http://www.caam.rice.edu/~mk51/ >>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they >>>>> begin their experiments is infinitely more interesting >>>>> than any results to which their experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> http://www.caam.rice.edu/~mk51/ >>>>> >>>> >>>> >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
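For readers hitting the same zero-pivot problem, a minimal C sketch of the fix discussed in this thread -- raising the incomplete-factorization level and adding a shift on each block Jacobi sub-PC, roughly the programmatic analogue of -sub_pc_factor_levels 1 -sub_pc_factor_shift_type nonzero -- could look as follows. The function name and the level value 1 are illustrative, and the outer KSP is assumed to already have its operators set.

#include <petscksp.h>

PetscErrorCode SetSubFactorOptions(KSP ksp)
{
  PC             pc,subpc;
  PetscBool      isbjacobi;
  PetscInt       i,nlocal;
  KSP            *subksp;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);            /* sub-KSPs exist only after setup */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PetscObjectTypeCompare((PetscObject)pc,PCBJACOBI,&isbjacobi);CHKERRQ(ierr);
  if (isbjacobi) {
    ierr = PCBJacobiGetSubKSP(pc,&nlocal,NULL,&subksp);CHKERRQ(ierr);
    for (i=0; i<nlocal; i++) {                   /* each local block uses ILU by default */
      ierr = KSPGetPC(subksp[i],&subpc);CHKERRQ(ierr);
      ierr = PCFactorSetLevels(subpc,1);CHKERRQ(ierr);
      ierr = PCFactorSetShiftType(subpc,MAT_SHIFT_NONZERO);CHKERRQ(ierr);
    }
  }
  PetscFunctionReturn(0);
}

The same settings should also be reachable purely through the options database as -sub_pc_factor_levels 1 -sub_pc_factor_shift_type nonzero, with the flow_/react_ prefixes prepended when an options prefix is in use.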
URL: From jed at jedbrown.org Thu May 25 23:26:46 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 25 May 2017 22:26:46 -0600 Subject: [petsc-users] PETSc User Meeting 2017, June 14-16 in Boulder, Colorado In-Reply-To: <87y3wbtk1i.fsf@jedbrown.org> References: <87y3wbtk1i.fsf@jedbrown.org> Message-ID: <87shjsxmyh.fsf@jedbrown.org> The program is up on the website: https://www.mcs.anl.gov/petsc/meetings/2017/ If you haven't registered yet, we can still accommodate you, but please register soon. If you haven't booked lodging, please do that soon -- the on-campus lodging option will close on *Tuesday, May 30*. https://confreg.colorado.edu/CSM2017 We are looking forward to seeing you in Boulder! Jed Brown writes: > We'd like to invite you to join us at the 2017 PETSc User Meeting held > at the University of Colorado Boulder on June 14-16, 2017. > > http://www.mcs.anl.gov/petsc/meetings/2017/ > > The first day consists of tutorials on various aspects and features of > PETSc. The second and third days will be devoted to exchange, > discussions, and a refinement of strategies for the future with our > users. We encourage you to present work illustrating your own use of > PETSc, for example in applications or in libraries built on top of > PETSc. > > Registration for the PETSc User Meeting 2017 is free for students and > $75 for non-students. We can host a maximum of 150 participants, so > register soon (and by May 15). > > http://www.eventzilla.net/web/e/petsc-user-meeting-2017-2138890185 > > We are also offering low-cost lodging on campus. A lodging registration > site will be available soon and announced here and on the website. > > Thanks to the generosity of Intel, we will be able to offer a limited > number of student travel grants. We are also soliciting additional > sponsors -- please contact us if you are interested. > > > We are looking forward to seeing you in Boulder! > > Please contact us at petsc2017 at mcs.anl.gov if you have any questions or > comments. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From knepley at gmail.com Thu May 25 23:34:21 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 25 May 2017 23:34:21 -0500 Subject: [petsc-users] PETSc User Meeting 2017, June 14-16 in Boulder, Colorado In-Reply-To: <87shjsxmyh.fsf@jedbrown.org> References: <87y3wbtk1i.fsf@jedbrown.org> <87shjsxmyh.fsf@jedbrown.org> Message-ID: On Thu, May 25, 2017 at 11:26 PM, Jed Brown wrote: > The program is up on the website: > > https://www.mcs.anl.gov/petsc/meetings/2017/ Put Toby on the oanel. Matt > > If you haven't registered yet, we can still accommodate you, but please > register soon. If you haven't booked lodging, please do that soon -- > the on-campus lodging option will close on *Tuesday, May 30*. > > https://confreg.colorado.edu/CSM2017 > > We are looking forward to seeing you in Boulder! > > Jed Brown writes: > > > We'd like to invite you to join us at the 2017 PETSc User Meeting held > > at the University of Colorado Boulder on June 14-16, 2017. > > > > http://www.mcs.anl.gov/petsc/meetings/2017/ > > > > The first day consists of tutorials on various aspects and features of > > PETSc. The second and third days will be devoted to exchange, > > discussions, and a refinement of strategies for the future with our > > users. 
We encourage you to present work illustrating your own use of > > PETSc, for example in applications or in libraries built on top of > > PETSc. > > > > Registration for the PETSc User Meeting 2017 is free for students and > > $75 for non-students. We can host a maximum of 150 participants, so > > register soon (and by May 15). > > > > http://www.eventzilla.net/web/e/petsc-user-meeting-2017-2138890185 > > > > We are also offering low-cost lodging on campus. A lodging registration > > site will be available soon and announced here and on the website. > > > > Thanks to the generosity of Intel, we will be able to offer a limited > > number of student travel grants. We are also soliciting additional > > sponsors -- please contact us if you are interested. > > > > > > We are looking forward to seeing you in Boulder! > > > > Please contact us at petsc2017 at mcs.anl.gov if you have any questions or > > comments. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Fri May 26 04:52:12 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Fri, 26 May 2017 11:52:12 +0200 (CEST) Subject: [petsc-users] How to VecView with a formatted precision (%10.8f) ? In-Reply-To: <6348799.8316454.1495792325379.JavaMail.zimbra@inria.fr> Message-ID: <1559132119.8316480.1495792332227.JavaMail.zimbra@inria.fr> How to VecView with a formatted precision (%10.8f) ? Not possible ? Franck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Fabian.Jakub at physik.uni-muenchen.de Fri May 26 12:27:25 2017 From: Fabian.Jakub at physik.uni-muenchen.de (Fabian.Jakub) Date: Fri, 26 May 2017 19:27:25 +0200 Subject: [petsc-users] DMPlex export to hdf5/vtk for triangle/prism mesh Message-ID: <28b9b347-6d83-7789-8f13-0409f312db34@physik.uni-muenchen.de> Dear Petsc Team, I am playing around with DMPlex, using it to generate the Mesh for the ICON weather model(http://doi.org/10.1002/2015MS000431), which employs a triangle mesh horizontally and columns, vertically. This results in a grid, looking like prisms, where top and bottom faces are triangles and side faces are rectangles. I was delighted to see that I could export the triangle DMPlex (2d Mesh) to hdf5 and use petsc_gen_xdmf.py to then visualize the mesh in visit/paraview. This is especially nice when exporting petscsections/vectors directly to VTK. I then tried the same approach for the prism grid in 3D. I attached the code for one single cell, as well as the output in hdf5. 
However, trying to convert the hdf5 output, it fails with: make prism.xmf $PETSC_DIR/bin/petsc_gen_xdmf.py prism.h5 Traceback (most recent call last): File "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..//bin/petsc_gen_xdmf.py", line 241, in generateXdmf(f) File "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..//bin/petsc_gen_xdmf.py", line 235, in generateXdmf Xdmf(xdmfFilename).write(hdfFilename, topoPath, numCells, numCorners, cellDim, geomPath, numVertices, spaceDim, time, vfields, cfields) File "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..//bin/petsc_gen_xdmf.py", line 193, in write self.writeSpaceGridHeader(fp, numCells, numCorners, cellDim, spaceDim) File "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..//bin/petsc_gen_xdmf.py", line 75, in writeSpaceGridHeader ''' % (self.cellMap[cellDim][numCorners], numCells, "XYZ" if spaceDim > 2 else "XY")) KeyError: 6 Also, if I try to export a vector directly to vtk, visit and paraview fail to open it. My question is: Is this a general limitation of these output formats, that I can not mix faces with 3 and 4 vertices or is it a limitation of the petsc_gen_xdmf.py or the VTK Viewer. I'd also welcome any thoughts on the prism mesh in general. Is it that uncommon to use and do you foresee other complications with it? I fear I cannot change the discretization of the host model but maybe it makes sense to use a different grid for my radiative transfer code? Many thanks, Fabian -------------- next part -------------- include ${PETSC_DIR}/lib/petsc/conf/variables include ${PETSC_DIR}/lib/petsc/conf/rules prism.xmf:: prism.h5 ${PETSC_DIR}/bin/petsc_gen_xdmf.py prism.h5 prism.h5:: plex_prism ./plex_prism -show_plex ::ascii_info_detail ./plex_prism -show_plex hdf5:prism.h5 plex_prism:: plex_prism.F90 ${PETSC_FCOMPILE} -c plex_prism.F90 ${FLINKER} plex_prism.o -o plex_prism ${PETSC_LIB} clean:: rm -rf *.o prism.h5 prism.xmf plex_prism -------------- next part -------------- A non-text attachment was scrubbed... Name: plex_prism.F90 Type: text/x-fortran Size: 5515 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: prism.h5 Type: application/x-hdf Size: 25400 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From jed at jedbrown.org Fri May 26 12:27:42 2017 From: jed at jedbrown.org (Jed Brown) Date: Fri, 26 May 2017 11:27:42 -0600 Subject: [petsc-users] How to VecView with a formatted precision (%10.8f) ? In-Reply-To: <1559132119.8316480.1495792332227.JavaMail.zimbra@inria.fr> References: <1559132119.8316480.1495792332227.JavaMail.zimbra@inria.fr> Message-ID: <87shjrwmsx.fsf@jedbrown.org> No, but this could be added to the ASCII viewer. Why do you want it? Franck Houssen writes: > How to VecView with a formatted precision (%10.8f) ? Not possible ? > > Franck -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From lvella at gmail.com Fri May 26 16:20:08 2017 From: lvella at gmail.com (Lucas Clemente Vella) Date: Fri, 26 May 2017 18:20:08 -0300 Subject: [petsc-users] How to replace the default global database? 
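The replies that follow suggest options prefixes plus PetscOptionsSetValue() rather than swapping the global database; a rough sketch of hardcoding a solver configuration that way might look like the following. The "schur_" prefix and the particular options are only illustrative, and a real field-split solve would still need its splits defined elsewhere (e.g. with PCFieldSplitSetIS()).

#include <petscksp.h>

PetscErrorCode CreateHardcodedSolver(MPI_Comm comm,Mat A,KSP *newksp)
{
  KSP            ksp;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPCreate(comm,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOptionsPrefix(ksp,"schur_");CHKERRQ(ierr);  /* only this object sees schur_ options */
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  /* These values land in the global database, but no object without the
     "schur_" prefix will ever look them up. */
  ierr = PetscOptionsSetValue(NULL,"-schur_ksp_type","fgmres");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-schur_pc_type","fieldsplit");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-schur_pc_fieldsplit_type","schur");CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  *newksp = ksp;
  PetscFunctionReturn(0);
}

Note that PetscOptionsSetValue() replaces any existing entry for the same key, including one given on the command line, which is usually what a hardcoded configuration wants.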
Message-ID: Here is what I want to do: - Take the global PetscOptions and store it somewhere; - Create my own PetscOptions; - Populate it with my options; - Set my new PetscOptions as the global default; - Create some PETSc objects; - Restore old PetscOptions as default global; - Destroy the PetscOptions I created. I could not find a function to replace global PetscOptions, or to copy one PetscOptions to another. Is it possible to do what I want to do? How? -- Lucas Clemente Vella lvella at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri May 26 17:55:41 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 26 May 2017 17:55:41 -0500 Subject: [petsc-users] How to replace the default global database? In-Reply-To: References: Message-ID: <1DEA680E-6BF8-459E-8AD2-3628E3B56BAF@mcs.anl.gov> I do not think you want to do this. The standard way we handle what it seems you need is to use PetscObjectSetOptionsPrefix() for the different PETSc objects giving them different prefixes and then appending the prefix for the options when you provide them to the options database. For example if you have a KSP for a flow solver and a KSP for a pressure solver you might do KSPCreate(PETSC_COMM_WORLD,&flow); KSPSetOptionsPrefix(flow,"u"); KSPCreate(PETSC_COMM_WORLD,&pressure); KSPSetOptionsPrefix(pressure,"p"); and set options like -u_pc_type jacobi -p_pc_type gamg Will this do what you need? Barry Because the options data base can be accessed by any object at any time (not just when it is created), it doesn't make sense to change the default options database ever because it would be uncertain what objects the change affected or did not affect. > On May 26, 2017, at 4:20 PM, Lucas Clemente Vella wrote: > > Here is what I want to do: > - Take the global PetscOptions and store it somewhere; > - Create my own PetscOptions; > - Populate it with my options; > - Set my new PetscOptions as the global default; > - Create some PETSc objects; > - Restore old PetscOptions as default global; > - Destroy the PetscOptions I created. > > I could not find a function to replace global PetscOptions, or to copy one PetscOptions to another. Is it possible to do what I want to do? How? > > -- > Lucas Clemente Vella > lvella at gmail.com From knepley at gmail.com Fri May 26 22:40:40 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 26 May 2017 22:40:40 -0500 Subject: [petsc-users] DMPlex export to hdf5/vtk for triangle/prism mesh In-Reply-To: <28b9b347-6d83-7789-8f13-0409f312db34@physik.uni-muenchen.de> References: <28b9b347-6d83-7789-8f13-0409f312db34@physik.uni-muenchen.de> Message-ID: On Fri, May 26, 2017 at 12:27 PM, Fabian.Jakub < Fabian.Jakub at physik.uni-muenchen.de> wrote: > Dear Petsc Team, > > I am playing around with DMPlex, using it to generate the Mesh for the > ICON weather model(http://doi.org/10.1002/2015MS000431), which employs a > triangle mesh horizontally and columns, vertically. > > This results in a grid, looking like prisms, where top and bottom faces > are triangles and side faces are rectangles. > > I was delighted to see that I could export the triangle DMPlex (2d Mesh) > to hdf5 and use petsc_gen_xdmf.py to then visualize the mesh in > visit/paraview. > This is especially nice when exporting petscsections/vectors directly to > VTK. > Great. > I then tried the same approach for the prism grid in 3D. > I attached the code for one single cell, as well as the output in hdf5. 
> > However, trying to convert the hdf5 output, it fails with: > > make prism.xmf > > $PETSC_DIR/bin/petsc_gen_xdmf.py prism.h5 > Traceback (most recent call last): > File > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// > bin/petsc_gen_xdmf.py", > line 241, in > generateXdmf(f) > File > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// > bin/petsc_gen_xdmf.py", > line 235, in generateXdmf > Xdmf(xdmfFilename).write(hdfFilename, topoPath, numCells, > numCorners, cellDim, geomPath, numVertices, spaceDim, time, vfields, > cfields) > File > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// > bin/petsc_gen_xdmf.py", > line 193, in write > self.writeSpaceGridHeader(fp, numCells, numCorners, cellDim, spaceDim) > File > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// > bin/petsc_gen_xdmf.py", > line 75, in writeSpaceGridHeader > ''' % (self.cellMap[cellDim][numCorners], numCells, "XYZ" if > spaceDim > 2 else "XY")) > KeyError: 6 > > > Also, if I try to export a vector directly to vtk, visit and paraview > fail to open it. > > My question is: > Is this a general limitation of these output formats, that I can not mix > faces with 3 and 4 vertices or is it a limitation of the > petsc_gen_xdmf.py or the VTK Viewer. > petsc_gen_xdmf. Take a look here https://bitbucket.org/petsc/petsc/src/1731673c3fe570066779d46b51a4aee7a45775ed/bin/petsc_gen_xdmf.py?at=master&fileviewer=file-view-default#petsc_gen_xdmf.py-9 This is what fails. You need to add something like 6: "Wedge" in the dictionary. See http://www.xdmf.org/index.php/XDMF_Model_and_Format > I'd also welcome any thoughts on the prism mesh in general. > Is it that uncommon to use and do you foresee other complications with it? > You need an element that works with prisms, but it seems you already have one. I know there is good work from here: https://arxiv.org/abs/1411.2940 > I fear I cannot change the discretization of the host model but maybe it > makes sense to use a different grid for my radiative transfer code? > I do not really do RT, but would be happy to try and think about it. Thanks, Matt > Many thanks, > > > Fabian > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From leejearl at 126.com Sun May 28 01:31:48 2017 From: leejearl at 126.com (leejearl) Date: Sun, 28 May 2017 14:31:48 +0800 Subject: [petsc-users] a question about PetscSectionCreate Message-ID: <7a1aa1ce-fc8a-7c03-9219-995cea4f74b2@126.com> Hi, PETSc developer: I need to create a PetscSection with a struct. The struct is defined as follow, typedef struct { PetscReal x; PetscInt id; } testStruct; When I run the program, I got a wrong output as follow, Vec Object: 1 MPI processes type: seq 2. 4.94066e-324 2. 4.94066e-324 2. 4.94066e-324 2. 4.94066e-324 2. 4.94066e-324 2. 4.94066e-324 2. 4.94066e-324 2. 4.94066e-324 But when I defined the struct as typedef struct { PetscReal x; PetscReal id; } testStruct; The output is ok. It seems that there is some wrong with the memories when I define the "id" as a PetscInt type. I can not find out the reasons, and any one can help me with it? The source file "test.c" is attached. Thanks, leejearl -------------- next part -------------- A non-text attachment was scrubbed... 
Name: test.c Type: text/x-csrc Size: 2198 bytes Desc: not available URL: From dave.mayhem23 at gmail.com Sun May 28 01:49:25 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Sun, 28 May 2017 06:49:25 +0000 Subject: [petsc-users] a question about PetscSectionCreate In-Reply-To: <7a1aa1ce-fc8a-7c03-9219-995cea4f74b2@126.com> References: <7a1aa1ce-fc8a-7c03-9219-995cea4f74b2@126.com> Message-ID: On Sun, 28 May 2017 at 08:31, leejearl wrote: > Hi, PETSc developer: > > I need to create a PetscSection with a struct. The struct is > defined as follow, > > typedef struct > { > PetscReal x; > PetscInt id; > } testStruct; > > When I run the program, I got a wrong output as follow, > > Vec Object: 1 MPI processes > type: seq > 2. > 4.94066e-324 > 2. > 4.94066e-324 > 2. > 4.94066e-324 > 2. > 4.94066e-324 > 2. > 4.94066e-324 > 2. > 4.94066e-324 > 2. > 4.94066e-324 > 2. > 4.94066e-324 > > But when I defined the struct as > > typedef struct > { > PetscReal x; > PetscReal id; > } testStruct; > > The output is ok. It seems that there is some wrong with the memories > when I define the "id" as a PetscInt type. Yep. > > I can not find out the reasons, and any one can help me with it? The Vec object can only store quantities of type PetscScalar. It cannot store PetscInt's and it definitely cannot represent a mixture of PetscReal's and PetscInt's. Thanks, Dave The > source file "test.c" is attached. > > > Thanks, > > leejearl > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From leejearl at mail.nwpu.edu.cn Sun May 28 02:30:22 2017 From: leejearl at mail.nwpu.edu.cn (leejearl) Date: Sun, 28 May 2017 15:30:22 +0800 Subject: [petsc-users] a question about PetscSectionCreate Message-ID: <19d62bf5-8c56-0e99-b8c7-0bee39ad01d4@mail.nwpu.edu.cn> Hi, Dave: Thank you for your kind reply. If I want to store a mixture of PetscReal and PetscInt, how can I do it? Thanks, leejearl From dave.mayhem23 at gmail.com Sun May 28 02:44:31 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Sun, 28 May 2017 07:44:31 +0000 Subject: [petsc-users] a question about PetscSectionCreate In-Reply-To: <19d62bf5-8c56-0e99-b8c7-0bee39ad01d4@mail.nwpu.edu.cn> References: <19d62bf5-8c56-0e99-b8c7-0bee39ad01d4@mail.nwpu.edu.cn> Message-ID: On Sun, 28 May 2017 at 09:30, leejearl wrote: > Hi, Dave: > Thank you for your kind reply. If I want to store a mixture of > PetscReal and PetscInt, how can I do it? What operations do you need to perform with your struct? > > Thanks, > leejearl > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From leejearl at 126.com Sun May 28 03:16:36 2017 From: leejearl at 126.com (leejearl) Date: Sun, 28 May 2017 16:16:36 +0800 Subject: [petsc-users] a question about PetscSectionCreate Message-ID: Hi, Dave: I want to store a PetscInt tag for every cell of the dmplex with the struct. Thanks, leejearl >>/Hi, Dave: />/ > Thank you for your kind reply. If I want to store a mixture of />/PetscReal and PetscInt, how can I do it? / >What operations do you need to perform with your struct? >>//>/ > Thanks, />/ > leejearl />>//>> -------------- next part -------------- An HTML attachment was scrubbed... 
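The replies further down suggest keeping the struct data in an ordinary, user-managed array and using the PetscSection only as an index into it; a rough sketch of that idea, assuming a DMPlex whose cells form the height-0 stratum (the function name and field values are illustrative):

#include <petscdmplex.h>

typedef struct { PetscReal x; PetscInt id; } testStruct;

PetscErrorCode CreateCellData(DM dm,PetscSection *sec,testStruct **data)
{
  PetscInt       c,cStart,cEnd,n,off;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = DMPlexGetHeightStratum(dm,0,&cStart,&cEnd);CHKERRQ(ierr);   /* cells */
  ierr = PetscSectionCreate(PetscObjectComm((PetscObject)dm),sec);CHKERRQ(ierr);
  ierr = PetscSectionSetChart(*sec,cStart,cEnd);CHKERRQ(ierr);
  for (c=cStart; c<cEnd; c++) {ierr = PetscSectionSetDof(*sec,c,1);CHKERRQ(ierr);}
  ierr = PetscSectionSetUp(*sec);CHKERRQ(ierr);
  ierr = PetscSectionGetStorageSize(*sec,&n);CHKERRQ(ierr);
  ierr = PetscMalloc1(n,data);CHKERRQ(ierr);                         /* user-managed storage */
  for (c=cStart; c<cEnd; c++) {
    ierr = PetscSectionGetOffset(*sec,c,&off);CHKERRQ(ierr);
    (*data)[off].x  = 0.0;
    (*data)[off].id = c;                                             /* e.g. tag a cell with its index */
  }
  PetscFunctionReturn(0);
}

Because the payload never goes into a Vec, mixing PetscReal and PetscInt fields is no longer a problem.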
URL: From lawrence.mitchell at imperial.ac.uk Sun May 28 06:02:16 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Sun, 28 May 2017 12:02:16 +0100 Subject: [petsc-users] a question about PetscSectionCreate In-Reply-To: References: Message-ID: > On 28 May 2017, at 09:16, leejearl wrote: > > Hi, Dave: I want to store a PetscInt tag for every cell of the dmplex with the struct. Thanks, You probably want to use a DMLabel to store these ids. Unless you have a different I'd for every cell. Lawrence From knepley at gmail.com Sun May 28 06:32:09 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 28 May 2017 06:32:09 -0500 Subject: [petsc-users] a question about PetscSectionCreate In-Reply-To: References: <7a1aa1ce-fc8a-7c03-9219-995cea4f74b2@126.com> Message-ID: On Sun, May 28, 2017 at 1:49 AM, Dave May wrote: > > On Sun, 28 May 2017 at 08:31, leejearl wrote: > >> Hi, PETSc developer: >> >> I need to create a PetscSection with a struct. The struct is >> defined as follow, >> >> typedef struct >> { >> PetscReal x; >> PetscInt id; >> } testStruct; >> >> When I run the program, I got a wrong output as follow, >> >> Vec Object: 1 MPI processes >> type: seq >> 2. >> 4.94066e-324 >> 2. >> 4.94066e-324 >> 2. >> 4.94066e-324 >> 2. >> 4.94066e-324 >> 2. >> 4.94066e-324 >> 2. >> 4.94066e-324 >> 2. >> 4.94066e-324 >> 2. >> 4.94066e-324 >> >> But when I defined the struct as >> >> typedef struct >> { >> PetscReal x; >> PetscReal id; >> } testStruct; >> >> The output is ok. It seems that there is some wrong with the memories >> when I define the "id" as a PetscInt type. > > > Yep. > > >> >> I can not find out the reasons, and any one can help me with it? > > > The Vec object can only store quantities of type PetscScalar. It cannot > store PetscInt's and it definitely cannot represent a mixture of > PetscReal's and PetscInt's. > Dave is correct. However this usage completely misses the point of Section. Section is a device for storing indices into ANY storage, not just Vec and IS. I would manage an array of the structs that I allocate, and use the Section to index into. Matt > > Thanks, > Dave > > The >> source file "test.c" is attached. >> >> >> Thanks, >> >> leejearl >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun May 28 06:35:11 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 28 May 2017 06:35:11 -0500 Subject: [petsc-users] a question about PetscSectionCreate In-Reply-To: References: Message-ID: On Sun, May 28, 2017 at 6:02 AM, Lawrence Mitchell < lawrence.mitchell at imperial.ac.uk> wrote: > > > > On 28 May 2017, at 09:16, leejearl wrote: > > > > Hi, Dave: I want to store a PetscInt tag for every cell of the dmplex > with the struct. Thanks, > > You probably want to use a DMLabel to store these ids. Unless you have a > different I'd for every cell. Several things to think about: 1) If you want to store a tag for EVERY cell, then just use an IS. Cell numberings are guaranteed to be contiguous and start from 0. 2) If you want to tag only SOME cells, then use a DMLabel as Lawrence suggests. This uses hash tables for fast construction, and sorted lists for fast search and retrieval. 
3) If you want to store a VARIABLE number of data items per cell, then use a Section and an array that you allocate. Matt > > Lawrence > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From leejearl at 126.com Sun May 28 21:57:05 2017 From: leejearl at 126.com (leejearl) Date: Mon, 29 May 2017 10:57:05 +0800 Subject: [petsc-users] a question about PetscSectionCreate In-Reply-To: References: <7a1aa1ce-fc8a-7c03-9219-995cea4f74b2@126.com> Message-ID: <2a54add1-7e17-dd50-4a2b-58d905449bc5@126.com> Thanks for your kind replies. I will give a result after the test. On 2017?05?28? 19:32, Matthew Knepley wrote: > On Sun, May 28, 2017 at 1:49 AM, Dave May > wrote: > > > On Sun, 28 May 2017 at 08:31, leejearl > wrote: > > Hi, PETSc developer: > > I need to create a PetscSection with a struct. The struct is > defined as follow, > > typedef struct > { > PetscReal x; > PetscInt id; > } testStruct; > > When I run the program, I got a wrong output as follow, > > Vec Object: 1 MPI processes > type: seq > 2. > 4.94066e-324 > 2. > 4.94066e-324 > 2. > 4.94066e-324 > 2. > 4.94066e-324 > 2. > 4.94066e-324 > 2. > 4.94066e-324 > 2. > 4.94066e-324 > 2. > 4.94066e-324 > > But when I defined the struct as > > typedef struct > { > PetscReal x; > PetscReal id; > } testStruct; > > The output is ok. It seems that there is some wrong with the > memories > when I define the "id" as a PetscInt type. > > > Yep. > > > > I can not find out the reasons, and any one can help me with it? > > > The Vec object can only store quantities of type PetscScalar. It > cannot store PetscInt's and it definitely cannot represent a > mixture of PetscReal's and PetscInt's. > > > Dave is correct. However this usage completely misses the point of > Section. Section is a device for storing indices into > ANY storage, not just Vec and IS. I would manage an array of the > structs that I allocate, and use the Section to index into. > > Matt > > > Thanks, > Dave > > The > source file "test.c" is attached. > > > Thanks, > > leejearl > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ -- ?? ??????????????? Phone: 17792092487 QQ: 188524324 -------------- next part -------------- An HTML attachment was scrubbed... URL: From leejearl at 126.com Mon May 29 01:39:26 2017 From: leejearl at 126.com (leejearl) Date: Mon, 29 May 2017 14:39:26 +0800 Subject: [petsc-users] a question about PetscSectionCreate In-Reply-To: References: Message-ID: <9715fa58-bf80-7aca-d01a-c74cdcde5701@126.com> Hi, all: I have create a IS for every cell in dmplex by the following steps: 1. Creating a integer array which size is matched to the number of cells. 2. Use the routine "ISCreateGeneral" to create a corresponding IS. Is there any routine which can create a IS for every cell in the dmplex directly?, and what is the difference between ISCopy() and ISDuplicate()? Thanks, leejearl On 2017?05?28? 19:35, Matthew Knepley wrote: > On Sun, May 28, 2017 at 6:02 AM, Lawrence Mitchell > > wrote: > > > > > On 28 May 2017, at 09:16, leejearl > wrote: > > > > Hi, Dave: I want to store a PetscInt tag for every cell of the > dmplex with the struct. 
Thanks, > > You probably want to use a DMLabel to store these ids. Unless you > have a different I'd for every cell. > > > Several things to think about: > > 1) If you want to store a tag for EVERY cell, then just use an IS. > Cell numberings are guaranteed to be > contiguous and start from 0. > > 2) If you want to tag only SOME cells, then use a DMLabel as Lawrence > suggests. This uses hash tables > for fast construction, and sorted lists for fast search and retrieval. > > 3) If you want to store a VARIABLE number of data items per cell, then > use a Section and an array that you allocate. > > Matt > > > Lawrence > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ -- ?? ??????????????? Phone: 17792092487 QQ: 188524324 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Mon May 29 02:47:56 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Mon, 29 May 2017 07:47:56 +0000 Subject: [petsc-users] a question about PetscSectionCreate In-Reply-To: <9715fa58-bf80-7aca-d01a-c74cdcde5701@126.com> References: <9715fa58-bf80-7aca-d01a-c74cdcde5701@126.com> Message-ID: On Mon, 29 May 2017 at 08:39, leejearl wrote: > Hi, all: > I have create a IS for every cell in dmplex by the following steps: > 1. Creating a integer array which size is matched to the number of cells. > 2. Use the routine "ISCreateGeneral" to create a corresponding IS. > > Is there any routine which can create a IS for every cell in the dmplex > directly?, > I don't think so as Plex would have to somehow know what geom quantity to use to define the size of IS (e.g. vertex, cell, face, edge) and what is the difference between ISCopy() and ISDuplicate()? > ISDuplicate allocates memory for a new with the same comm and layout as the original IS AND copies values from the original IS into the new one. (Note that this is slightly different from other duplicate functions like VecDuplicate which only allocate memory and does not copy values from the orig vec.) ISCopy does not allocate memory for the IS (passed as the second arg), it only performs the copy of values. Thanks Dave > > Thanks, > leejearl > > > On 2017?05?28? 19:35, Matthew Knepley wrote: > > On Sun, May 28, 2017 at 6:02 AM, Lawrence Mitchell < > lawrence.mitchell at imperial.ac.uk> wrote: > >> >> >> > On 28 May 2017, at 09:16, leejearl wrote: >> > >> > Hi, Dave: I want to store a PetscInt tag for every cell of the dmplex >> with the struct. Thanks, >> >> You probably want to use a DMLabel to store these ids. Unless you have a >> different I'd for every cell. > > > Several things to think about: > > 1) If you want to store a tag for EVERY cell, then just use an IS. Cell > numberings are guaranteed to be > contiguous and start from 0. > > 2) If you want to tag only SOME cells, then use a DMLabel as Lawrence > suggests. This uses hash tables > for fast construction, and sorted lists for fast search and retrieval. > > 3) If you want to store a VARIABLE number of data items per cell, then use > a Section and an array that you allocate. > > Matt > > >> >> Lawrence >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -- > ?? > ??????????????? 
> Phone: 17792092487 > QQ: 188524324 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dnolte at dim.uchile.cl Mon May 29 11:17:30 2017 From: dnolte at dim.uchile.cl (David Nolte) Date: Mon, 29 May 2017 12:17:30 -0400 Subject: [petsc-users] petsc4py and python's logging module Message-ID: <921c7e29-707c-eb57-1f8e-0b12a45aa7e9@dim.uchile.cl> Dear all, is it possible to use python's logging module (https://docs.python.org/2/howto/logging.html) to handle PETSc output in python, such as the residuals during a KSP/SNES solve? I log my solver's activity to a file using the logging module, it would be great to include the PETSc output aswell. Regards, David From xinzhe.wu1990 at gmail.com Mon May 29 11:19:20 2017 From: xinzhe.wu1990 at gmail.com (Xinzhe Wu) Date: Mon, 29 May 2017 18:19:20 +0200 Subject: [petsc-users] Errors about PETSc MPI+GPU Message-ID: Dear all, We have developed the codes with PETSc + SLEPc which works well on CPU version. Now we want to try these codes with GPU + MPI, but get some weird errors shown as below. I have found someone talked about this problem here http://lists.mcs.anl.gov/pipermail/petsc-dev/2016-March/018836.html , but I can hardly understand it. Can anyone help me with these issues? Thank you in advance! [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Error in external library [0]PETSC ERROR: CUBLAS error 1 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [2]PETSC ERROR: Error in external library [2]PETSC ERROR: CUBLAS error 1 [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [2]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3965-gf375733 GIT Date: 2017-05-28 10:32:02 -0500 [2]PETSC ERROR: ./hyperh on a arch-linux2-c-debug named romeo44 by xinzhewu Mon May 29 18:03:58 2017 [2]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack --with-visibility=0 --with-shared-libraries=0 --with-cuda=1 --with-thrust=1 --with-precision=double --with-clanguage=c --with-pestc-arch=linux-c-no-debug-complex --with-scalar-type=complex [2]PETSC ERROR: #1 PetscInitialize() line 906 in /home/xinzhewu/Petsc-GPUs/petsc/src/sys/objects/pinit.c [2]PETSC ERROR: #2 SlepcInitialize() line 259 in /home/xinzhewu/Petsc-GPUs/slepc/src/sys/slepcinit.c -- Xinzhe WU Ph.D Student of Computer Science Maison de la Simulation, CNRS USR3441 Building 565, CEA Saclay 91191, Gif-sur-Yvette, France Tel: +33 (0) 1 69 08 59 93 -------------- next part -------------- An HTML attachment was scrubbed... URL: From lvella at gmail.com Mon May 29 13:20:33 2017 From: lvella at gmail.com (Lucas Clemente Vella) Date: Mon, 29 May 2017 15:20:33 -0300 Subject: [petsc-users] How to replace the default global database? In-Reply-To: <1DEA680E-6BF8-459E-8AD2-3628E3B56BAF@mcs.anl.gov> References: <1DEA680E-6BF8-459E-8AD2-3628E3B56BAF@mcs.anl.gov> Message-ID: Hi. Not really what I need. Every time I run my program, I need to pass the non-trivial solver setup that works as a command line argument (I am using Schur complement with BCGS and Hypre as internal KSP and PC). I want to hardcode the complex solver setup so that I can use it depending on a runtime switch. 
Like this: if(use_schur) { // change global PETSc options to the settings I know to work. } my_solver_struct *s = create_petsc_solver(); if(use_schur) { // restore original PETSc options. } 2017-05-26 19:55 GMT-03:00 Barry Smith : > > I do not think you want to do this. The standard way we handle what it > seems you need is to use PetscObjectSetOptionsPrefix() for the different > PETSc objects giving them different prefixes and then appending the prefix > for the options when you provide them to the options database. For example > if you have a KSP for a flow solver and a KSP for a pressure solver you > might do > > KSPCreate(PETSC_COMM_WORLD,&flow); > KSPSetOptionsPrefix(flow,"u"); > > KSPCreate(PETSC_COMM_WORLD,&pressure); > KSPSetOptionsPrefix(pressure,"p"); > > and set options like > > -u_pc_type jacobi > > -p_pc_type gamg > > Will this do what you need? > > Barry > > Because the options data base can be accessed by any object at any > time (not just when it is created), it doesn't make sense to change the > default options database ever because it would be uncertain what objects > the change affected or did not affect. > > > > > > > > On May 26, 2017, at 4:20 PM, Lucas Clemente Vella > wrote: > > > > Here is what I want to do: > > - Take the global PetscOptions and store it somewhere; > > - Create my own PetscOptions; > > - Populate it with my options; > > - Set my new PetscOptions as the global default; > > - Create some PETSc objects; > > - Restore old PetscOptions as default global; > > - Destroy the PetscOptions I created. > > > > I could not find a function to replace global PetscOptions, or to copy > one PetscOptions to another. Is it possible to do what I want to do? How? > > > > -- > > Lucas Clemente Vella > > lvella at gmail.com > > -- Lucas Clemente Vella lvella at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon May 29 13:31:13 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 29 May 2017 13:31:13 -0500 Subject: [petsc-users] How to replace the default global database? In-Reply-To: References: <1DEA680E-6BF8-459E-8AD2-3628E3B56BAF@mcs.anl.gov> Message-ID: <131E7428-B018-4429-8834-BB08ADDCF54C@mcs.anl.gov> > On May 29, 2017, at 1:20 PM, Lucas Clemente Vella wrote: > > Hi. Not really what I need. Every time I run my program, I need to pass the non-trivial solver setup that works as a command line argument (I am using Schur complement with BCGS and Hypre as internal KSP and PC). I want to hardcode the complex solver setup so that I can use it depending on a runtime switch. Like this: > > if(use_schur) { > // change global PETSc options to the settings I know to work. This is ok. You can use PetscOptionsSetValue() or PetscOptionsInsert() to put the values in. > } > > my_solver_struct *s = create_petsc_solver(); > > if(use_schur) { > // restore original PETSc options. > } Why do you need to "restore original PETSc options" at this point? What are the options used for that they need to be reset? If they control other solvers, for example, then just give them a different prefix. > > 2017-05-26 19:55 GMT-03:00 Barry Smith : > > I do not think you want to do this. The standard way we handle what it seems you need is to use PetscObjectSetOptionsPrefix() for the different PETSc objects giving them different prefixes and then appending the prefix for the options when you provide them to the options database. 
For example if you have a KSP for a flow solver and a KSP for a pressure solver you might do > > KSPCreate(PETSC_COMM_WORLD,&flow); > KSPSetOptionsPrefix(flow,"u"); > > KSPCreate(PETSC_COMM_WORLD,&pressure); > KSPSetOptionsPrefix(pressure,"p"); > > and set options like > > -u_pc_type jacobi > > -p_pc_type gamg > > Will this do what you need? > > Barry > > Because the options data base can be accessed by any object at any time (not just when it is created), it doesn't make sense to change the default options database ever because it would be uncertain what objects the change affected or did not affect. > > > > > > > > On May 26, 2017, at 4:20 PM, Lucas Clemente Vella wrote: > > > > Here is what I want to do: > > - Take the global PetscOptions and store it somewhere; > > - Create my own PetscOptions; > > - Populate it with my options; > > - Set my new PetscOptions as the global default; > > - Create some PETSc objects; > > - Restore old PetscOptions as default global; > > - Destroy the PetscOptions I created. > > > > I could not find a function to replace global PetscOptions, or to copy one PetscOptions to another. Is it possible to do what I want to do? How? > > > > -- > > Lucas Clemente Vella > > lvella at gmail.com > > > > > -- > Lucas Clemente Vella > lvella at gmail.com From knepley at gmail.com Mon May 29 14:06:14 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 29 May 2017 14:06:14 -0500 Subject: [petsc-users] Errors about PETSc MPI+GPU In-Reply-To: References: Message-ID: On Mon, May 29, 2017 at 11:19 AM, Xinzhe Wu wrote: > Dear all, > > We have developed the codes with PETSc + SLEPc which works well on CPU > version. Now we want to try these codes with GPU + MPI, but get some weird > errors shown as below. > > I have found someone talked about this problem here > http://lists.mcs.anl.gov/pipermail/petsc-dev/2016-March/018836.html , but > I can hardly understand it. Can anyone help me with these issues? > The answer is here: >>>>* I think the error messages you get is pretty descriptive regarding the root cause. You are probably running out of GPU memory. Since you are running on a GTX 285 you can't use MPS [1] therefore each MPI process has its own context on the GPU. Each context needs to initialize some data on the GPU (used for local variables and so on). The required amount needed for this depends on the size of the GPUs (essentially correlates with the maximum number of concurrently active threads). This can easily be 50-100MB. So with only 1GB of GPU memory you are probably using all GPUs memory for context data and nothing is available for your application. Unfortunately there is no good way to debug this with GeForce. On Tesla nvidia-smi does show you all processes that have a context on a GPU together with their memory consumption.* It appears that you are running out of GPU memory. This can happen if you use too many MPI processes for a single GPU. Thanks, Matt > Thank you in advance! > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: CUBLAS error 1 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: [2]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [2]PETSC ERROR: Error in external library > [2]PETSC ERROR: CUBLAS error 1 > [2]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [2]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3965-gf375733 GIT > Date: 2017-05-28 10:32:02 -0500 > [2]PETSC ERROR: ./hyperh on a arch-linux2-c-debug named romeo44 by > xinzhewu Mon May 29 18:03:58 2017 > [2]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-mpich --download-fblaslapack > --with-visibility=0 --with-shared-libraries=0 --with-cuda=1 --with-thrust=1 > --with-precision=double --with-clanguage=c --with-pestc-arch=linux-c-no-debug-complex > --with-scalar-type=complex > [2]PETSC ERROR: #1 PetscInitialize() line 906 in /home/xinzhewu/Petsc-GPUs/ > petsc/src/sys/objects/pinit.c > [2]PETSC ERROR: #2 SlepcInitialize() line 259 in /home/xinzhewu/Petsc-GPUs/ > slepc/src/sys/slepcinit.c > > > -- > Xinzhe WU > Ph.D Student of Computer Science > Maison de la Simulation, CNRS USR3441 > Building 565, CEA Saclay > 91191, Gif-sur-Yvette, France > Tel: +33 (0) 1 69 08 59 93 <+33%201%2069%2008%2059%2093> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 29 14:30:39 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 29 May 2017 14:30:39 -0500 Subject: [petsc-users] petsc4py and python's logging module In-Reply-To: <921c7e29-707c-eb57-1f8e-0b12a45aa7e9@dim.uchile.cl> References: <921c7e29-707c-eb57-1f8e-0b12a45aa7e9@dim.uchile.cl> Message-ID: On Mon, May 29, 2017 at 11:17 AM, David Nolte wrote: > Dear all, > > is it possible to use python's logging module > (https://docs.python.org/2/howto/logging.html) to handle PETSc output in > python, such as the residuals during a KSP/SNES solve? > I log my solver's activity to a file using the logging module, it would > be great to include the PETSc output aswell. > I think the best way to do this is the following: 1) Create a PetscViewer implementation, say PyASCII, that logs to the Python descriptor. This might be as easy as just augmenting the ASCII viewer to grab this descriptor on creation 2) Then you can hook this viewer to the monitor using options -ksp_monitor pyascii Thanks, Matt > Regards, > David > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Mon May 29 14:55:52 2017 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Mon, 29 May 2017 22:55:52 +0300 Subject: [petsc-users] petsc4py and python's logging module In-Reply-To: <921c7e29-707c-eb57-1f8e-0b12a45aa7e9@dim.uchile.cl> References: <921c7e29-707c-eb57-1f8e-0b12a45aa7e9@dim.uchile.cl> Message-ID: On 29 May 2017 at 19:17, David Nolte wrote: > Dear all, > > is it possible to use python's logging module > (https://docs.python.org/2/howto/logging.html) to handle PETSc output in > python, such as the residuals during a KSP/SNES solve? 
> I log my solver's activity to a file using the logging module, it would > be great to include the PETSc output aswell. > Not sure if this is what you really want, but you could... 1) Use {ksp|snes}.setConvergenceHistory() before solve, then {ksp|snes}.getConvergenceHistory() after solve, you will get arrays with the residual history, then you can do whatever you want with them. 2) Implement a KSP/SNES monitor in a Python function and call {ksp|snes}.setMonitor(), then you can use python's logging inside your monitor function. -- Lisandro Dalcin ============ Research Scientist Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ 4700 King Abdullah University of Science and Technology al-Khawarizmi Bldg (Bldg 1), Office # 0109 Thuwal 23955-6900, Kingdom of Saudi Arabia http://www.kaust.edu.sa Office Phone: +966 12 808-0459 From bsmith at mcs.anl.gov Mon May 29 15:10:39 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 29 May 2017 15:10:39 -0500 Subject: [petsc-users] petsc4py and python's logging module In-Reply-To: <921c7e29-707c-eb57-1f8e-0b12a45aa7e9@dim.uchile.cl> References: <921c7e29-707c-eb57-1f8e-0b12a45aa7e9@dim.uchile.cl> Message-ID: <01274A21-4416-4FA4-AAD5-FE117E9424C6@mcs.anl.gov> If you want to log all PETSc ASCII output you can use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscVFPrintf.html Just don't use the FILE *fd initial argument, instead pass the output into a Python function that calls the logger as desired. Barry > On May 29, 2017, at 11:17 AM, David Nolte wrote: > > Dear all, > > is it possible to use python's logging module > (https://docs.python.org/2/howto/logging.html) to handle PETSc output in > python, such as the residuals during a KSP/SNES solve? > I log my solver's activity to a file using the logging module, it would > be great to include the PETSc output aswell. > > Regards, > David > From lvella at gmail.com Mon May 29 15:24:03 2017 From: lvella at gmail.com (Lucas Clemente Vella) Date: Mon, 29 May 2017 17:24:03 -0300 Subject: [petsc-users] Can't retrieve inner KSP from Schur complement Message-ID: I want to set a custom convergence test for the inner KSPs of Schur complement method, so I am using PCFieldSplitGetSubKSP() to get the inner KSPs: int n_subksp; KSP *subksp = NULL; PCFieldSplitGetSubKSP(s->pc, &n_subksp, &subksp); assert(n_subksp == 2); But I get a segmentation fault on MatSchurComplementGetKSP(). From file src/ksp/ksp/utils/schurm.c (line 320): PetscErrorCode MatSchurComplementGetKSP(Mat S, KSP *ksp) { Mat_SchurComplement *Na; PetscFunctionBegin; PetscValidHeaderSpecific(S,MAT_CLASSID,1); PetscValidPointer(ksp,2); Na = (Mat_SchurComplement*) S->data; *ksp = Na->ksp; // <<<<< segfaults on this line, 'Na' is an invalid pointer... PetscFunctionReturn(0); } This is the stack trace given by valgrind: ==13559== Invalid read of size 8 ==13559== at 0x56B8780: MatSchurComplementGetKSP (schurm.c:320) ==13559== by 0x55F5B08: PCFieldSplitGetSubKSP_FieldSplit_Schur(_p_PC*, int*, _p_KSP***) (fieldsplit.c:1367) ==13559== by 0x5605187: PCFieldSplitGetSubKSP (fieldsplit.c:1869) ==13559== by 0x166305: set_singular_convergence_test (solver-petsc.c:293) ### irrelevant calls, from inside my program ==13559== Address 0x6c0 is not stack'd, malloc'd or (recently) free'd ==13559== I tried doing this operation both before and after MatAssembly*() calls, and with both I get the same result. 
Petsc version is 3.7.5, installed from Ubuntu repository. Is this a bug? Or I am doing it wrong? -- Lucas Clemente Vella lvella at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon May 29 15:32:22 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 29 May 2017 15:32:22 -0500 Subject: [petsc-users] Can't retrieve inner KSP from Schur complement In-Reply-To: References: Message-ID: <7F5B0AFF-558C-4E06-887F-3838456A3BB1@mcs.anl.gov> Likely the problem is that the inner objects do not yet exist when you are trying to set the options. It is kind of tricky to handle the construction of these multiple nested objects and when inner objects actually get created. Make sure you call KSPSetUp() on the outer KSP before you call this. But this may still not be enough to insure that this inner object has yet been created. Let us know. Barry I will add any error check for Na being null so it prints a useful error message instead of crashing. > On May 29, 2017, at 3:24 PM, Lucas Clemente Vella wrote: > > I want to set a custom convergence test for the inner KSPs of Schur complement method, so I am using PCFieldSplitGetSubKSP() to get the inner KSPs: > > int n_subksp; > KSP *subksp = NULL; > > PCFieldSplitGetSubKSP(s->pc, &n_subksp, &subksp); > assert(n_subksp == 2); > > But I get a segmentation fault on MatSchurComplementGetKSP(). From file src/ksp/ksp/utils/schurm.c (line 320): > > PetscErrorCode MatSchurComplementGetKSP(Mat S, KSP *ksp) > { > Mat_SchurComplement *Na; > > PetscFunctionBegin; > PetscValidHeaderSpecific(S,MAT_CLASSID,1); > PetscValidPointer(ksp,2); > Na = (Mat_SchurComplement*) S->data; > *ksp = Na->ksp; // <<<<< segfaults on this line, 'Na' is an invalid pointer... > PetscFunctionReturn(0); > } > > This is the stack trace given by valgrind: > > ==13559== Invalid read of size 8 > ==13559== at 0x56B8780: MatSchurComplementGetKSP (schurm.c:320) > ==13559== by 0x55F5B08: PCFieldSplitGetSubKSP_FieldSplit_Schur(_p_PC*, int*, _p_KSP***) (fieldsplit.c:1367) > ==13559== by 0x5605187: PCFieldSplitGetSubKSP (fieldsplit.c:1869) > ==13559== by 0x166305: set_singular_convergence_test (solver-petsc.c:293) > ### irrelevant calls, from inside my program > ==13559== Address 0x6c0 is not stack'd, malloc'd or (recently) free'd > ==13559== > > I tried doing this operation both before and after MatAssembly*() calls, and with both I get the same result. Petsc version is 3.7.5, installed from Ubuntu repository. Is this a bug? Or I am doing it wrong? > > -- > Lucas Clemente Vella > lvella at gmail.com From a.croucher at auckland.ac.nz Mon May 29 17:13:31 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Tue, 30 May 2017 10:13:31 +1200 Subject: [petsc-users] DMPlex export to hdf5/vtk for triangle/prism mesh In-Reply-To: References: Message-ID: <819c5072-372d-5f2d-49ef-ca654e96f749@auckland.ac.nz> hi, I was asking about support for exactly these 6-node wedge elements in DMPlex back in January. At the time, there was no support for them. Has there been some progress since then? We are going to need them before we can release our software, which we're aiming to do by the end of the year. 
Cheers, Adrian > Message: 4 > Date: Fri, 26 May 2017 22:40:40 -0500 > From: Matthew Knepley > To: "Fabian.Jakub" > Cc: PETSc > Subject: Re: [petsc-users] DMPlex export to hdf5/vtk for > triangle/prism mesh > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > On Fri, May 26, 2017 at 12:27 PM, Fabian.Jakub < > Fabian.Jakub at physik.uni-muenchen.de> wrote: > > > Dear Petsc Team, > > > > I am playing around with DMPlex, using it to generate the Mesh for the > > ICON weather model(http://doi.org/10.1002/2015MS000431), which employs a > > triangle mesh horizontally and columns, vertically. > > > > This results in a grid, looking like prisms, where top and bottom faces > > are triangles and side faces are rectangles. > > > > I was delighted to see that I could export the triangle DMPlex (2d Mesh) > > to hdf5 and use petsc_gen_xdmf.py to then visualize the mesh in > > visit/paraview. > > This is especially nice when exporting petscsections/vectors directly to > > VTK. > > > > Great. > > > > I then tried the same approach for the prism grid in 3D. > > I attached the code for one single cell, as well as the output in hdf5. > > > > However, trying to convert the hdf5 output, it fails with: > > > > make prism.xmf > > > > $PETSC_DIR/bin/petsc_gen_xdmf.py prism.h5 > > Traceback (most recent call last): > > File > > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// > > bin/petsc_gen_xdmf.py", > > line 241, in > > generateXdmf(f) > > File > > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// > > bin/petsc_gen_xdmf.py", > > line 235, in generateXdmf > > Xdmf(xdmfFilename).write(hdfFilename, topoPath, numCells, > > numCorners, cellDim, geomPath, numVertices, spaceDim, time, vfields, > > cfields) > > File > > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// > > bin/petsc_gen_xdmf.py", > > line 193, in write > > self.writeSpaceGridHeader(fp, numCells, numCorners, cellDim, spaceDim) > > File > > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// > > bin/petsc_gen_xdmf.py", > > line 75, in writeSpaceGridHeader > > ''' % (self.cellMap[cellDim][numCorners], numCells, "XYZ" if > > spaceDim > 2 else "XY")) > > KeyError: 6 > > > > > > Also, if I try to export a vector directly to vtk, visit and paraview > > fail to open it. > > > > My question is: > > Is this a general limitation of these output formats, that I can not mix > > faces with 3 and 4 vertices or is it a limitation of the > > petsc_gen_xdmf.py or the VTK Viewer. > > > > petsc_gen_xdmf. Take a look here > > > https://bitbucket.org/petsc/petsc/src/1731673c3fe570066779d46b51a4aee7a45775ed/bin/petsc_gen_xdmf.py?at=master&fileviewer=file-view-default#petsc_gen_xdmf.py-9 > > This is what fails. You need to add something like > > 6: "Wedge" > > in the dictionary. See http://www.xdmf.org/index.php/XDMF_Model_and_Format > > > > I'd also welcome any thoughts on the prism mesh in general. > > Is it that uncommon to use and do you foresee other complications with it? > > > > You need an element that works with prisms, but it seems you already have > one. I know > there is good work from here: https://arxiv.org/abs/1411.2940 > > > > I fear I cannot change the discretization of the host model but maybe it > > makes sense to use a different grid for my radiative transfer code? > > > > I do not really do RT, but would be happy to try and think about it. 
> > Thanks, > > Matt > > > > Many thanks, > > > > > > Fabian > > > -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 From Fabian.Jakub at physik.uni-muenchen.de Mon May 29 17:43:11 2017 From: Fabian.Jakub at physik.uni-muenchen.de (Fabian Jakub) Date: Tue, 30 May 2017 00:43:11 +0200 Subject: [petsc-users] DMPlex export to hdf5/vtk for triangle/prism mesh In-Reply-To: <819c5072-372d-5f2d-49ef-ca654e96f749@auckland.ac.nz> References: <819c5072-372d-5f2d-49ef-ca654e96f749@auckland.ac.nz> Message-ID: Hi, I did just as Matt suggested which works nicely... thanks by the way! Inserted in the petsc_gen_xdmf.py the "6: 'Wedge' " entry . Calling the example with: -show_plex hdf5:output.h5 -show_vector hdf5:output.h5::append exports the mesh and a vector to hdf5. Then calling $PETSC_DIR/bin/petsc_gen_xdmf.py output.h5 correctly creates the descriptor file and just loads to visit. Many thanks again to you, Matt :) Fab On 30.05.2017 00:13, Adrian Croucher wrote: > hi, > > I was asking about support for exactly these 6-node wedge elements in > DMPlex back in January. > > At the time, there was no support for them. Has there been some > progress since then? > > We are going to need them before we can release our software, which > we're aiming to do by the end of the year. > > Cheers, Adrian > >> Message: 4 >> Date: Fri, 26 May 2017 22:40:40 -0500 >> From: Matthew Knepley >> To: "Fabian.Jakub" >> Cc: PETSc >> Subject: Re: [petsc-users] DMPlex export to hdf5/vtk for >> triangle/prism mesh >> Message-ID: >> >> Content-Type: text/plain; charset="utf-8" >> >> On Fri, May 26, 2017 at 12:27 PM, Fabian.Jakub < >> Fabian.Jakub at physik.uni-muenchen.de> wrote: >> >> > Dear Petsc Team, >> > >> > I am playing around with DMPlex, using it to generate the Mesh for the >> > ICON weather model(http://doi.org/10.1002/2015MS000431), which >> employs a >> > triangle mesh horizontally and columns, vertically. >> > >> > This results in a grid, looking like prisms, where top and bottom >> faces >> > are triangles and side faces are rectangles. >> > >> > I was delighted to see that I could export the triangle DMPlex (2d >> Mesh) >> > to hdf5 and use petsc_gen_xdmf.py to then visualize the mesh in >> > visit/paraview. >> > This is especially nice when exporting petscsections/vectors >> directly to >> > VTK. >> > >> >> Great. >> >> >> > I then tried the same approach for the prism grid in 3D. >> > I attached the code for one single cell, as well as the output in >> hdf5. 
>> > >> > However, trying to convert the hdf5 output, it fails with: >> > >> > make prism.xmf >> > >> > $PETSC_DIR/bin/petsc_gen_xdmf.py prism.h5 >> > Traceback (most recent call last): >> > File >> > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// >> > bin/petsc_gen_xdmf.py", >> > line 241, in >> > generateXdmf(f) >> > File >> > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// >> > bin/petsc_gen_xdmf.py", >> > line 235, in generateXdmf >> > Xdmf(xdmfFilename).write(hdfFilename, topoPath, numCells, >> > numCorners, cellDim, geomPath, numVertices, spaceDim, time, vfields, >> > cfields) >> > File >> > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// >> > bin/petsc_gen_xdmf.py", >> > line 193, in write >> > self.writeSpaceGridHeader(fp, numCells, numCorners, cellDim, >> spaceDim) >> > File >> > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// >> > bin/petsc_gen_xdmf.py", >> > line 75, in writeSpaceGridHeader >> > ''' % (self.cellMap[cellDim][numCorners], numCells, "XYZ" if >> > spaceDim > 2 else "XY")) >> > KeyError: 6 >> > >> > >> > Also, if I try to export a vector directly to vtk, visit and paraview >> > fail to open it. >> > >> > My question is: >> > Is this a general limitation of these output formats, that I can >> not mix >> > faces with 3 and 4 vertices or is it a limitation of the >> > petsc_gen_xdmf.py or the VTK Viewer. >> > >> >> petsc_gen_xdmf. Take a look here >> >> >> https://bitbucket.org/petsc/petsc/src/1731673c3fe570066779d46b51a4aee7a45775ed/bin/petsc_gen_xdmf.py?at=master&fileviewer=file-view-default#petsc_gen_xdmf.py-9 >> >> >> This is what fails. You need to add something like >> >> 6: "Wedge" >> >> in the dictionary. See >> http://www.xdmf.org/index.php/XDMF_Model_and_Format >> >> >> > I'd also welcome any thoughts on the prism mesh in general. >> > Is it that uncommon to use and do you foresee other complications >> with it? >> > >> >> You need an element that works with prisms, but it seems you already >> have >> one. I know >> there is good work from here: https://arxiv.org/abs/1411.2940 >> >> >> > I fear I cannot change the discretization of the host model but >> maybe it >> > makes sense to use a different grid for my radiative transfer code? >> > >> >> I do not really do RT, but would be happy to try and think about it. >> >> Thanks, >> >> Matt >> >> >> > Many thanks, >> > >> > >> > Fabian >> > >> > From knepley at gmail.com Mon May 29 19:25:59 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 29 May 2017 19:25:59 -0500 Subject: [petsc-users] DMPlex export to hdf5/vtk for triangle/prism mesh In-Reply-To: <819c5072-372d-5f2d-49ef-ca654e96f749@auckland.ac.nz> References: <819c5072-372d-5f2d-49ef-ca654e96f749@auckland.ac.nz> Message-ID: On Mon, May 29, 2017 at 5:13 PM, Adrian Croucher wrote: > hi, > > I was asking about support for exactly these 6-node wedge elements in > DMPlex back in January. > > At the time, there was no support for them. Has there been some progress > since then? > > We are going to need them before we can release our software, which we're > aiming to do by the end of the year. > Sorry about not keeping up to date on that. I had not really thought about it working until Fabian suggested it. So, it looks like XDMF output works. I am making a test now. However, other stuff will not, like refinement, interpolation, cell geometry, and other discretization stuff. What do you need working? 
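For reference, a minimal sketch of that export path, assuming a PETSc build with HDF5, a DMPlex dm built from the wedge cells, a Vec v holding the field, and a petsc_gen_xdmf.py that already has the 6: 'Wedge' entry (names here are only illustrative):

  PetscViewer viewer;

  ierr = PetscObjectSetName((PetscObject) v, "solution");CHKERRQ(ierr); /* dataset name inside the HDF5 file */
  ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "output.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = DMView(dm, viewer);CHKERRQ(ierr);  /* writes the mesh topology and coordinates */
  ierr = VecView(v, viewer);CHKERRQ(ierr);  /* appends the field data to the same file */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

Running $PETSC_DIR/bin/petsc_gen_xdmf.py output.h5 afterwards should then produce the .xmf descriptor for ParaView/VisIt, as Fabian reported.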
Thanks, Matt > Cheers, Adrian > > Message: 4 >> Date: Fri, 26 May 2017 22:40:40 -0500 >> From: Matthew Knepley >> To: "Fabian.Jakub" >> Cc: PETSc >> Subject: Re: [petsc-users] DMPlex export to hdf5/vtk for >> triangle/prism mesh >> Message-ID: >> > gmail.com> >> Content-Type: text/plain; charset="utf-8" >> >> >> On Fri, May 26, 2017 at 12:27 PM, Fabian.Jakub < >> Fabian.Jakub at physik.uni-muenchen.de> wrote: >> >> > Dear Petsc Team, >> > >> > I am playing around with DMPlex, using it to generate the Mesh for the >> > ICON weather model(http://doi.org/10.1002/2015MS000431), which employs >> a >> > triangle mesh horizontally and columns, vertically. >> > >> > This results in a grid, looking like prisms, where top and bottom faces >> > are triangles and side faces are rectangles. >> > >> > I was delighted to see that I could export the triangle DMPlex (2d Mesh) >> > to hdf5 and use petsc_gen_xdmf.py to then visualize the mesh in >> > visit/paraview. >> > This is especially nice when exporting petscsections/vectors directly to >> > VTK. >> > >> >> Great. >> >> >> > I then tried the same approach for the prism grid in 3D. >> > I attached the code for one single cell, as well as the output in hdf5. >> > >> > However, trying to convert the hdf5 output, it fails with: >> > >> > make prism.xmf >> > >> > $PETSC_DIR/bin/petsc_gen_xdmf.py prism.h5 >> > Traceback (most recent call last): >> > File >> > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// >> > bin/petsc_gen_xdmf.py", >> > line 241, in >> > generateXdmf(f) >> > File >> > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// >> > bin/petsc_gen_xdmf.py", >> > line 235, in generateXdmf >> > Xdmf(xdmfFilename).write(hdfFilename, topoPath, numCells, >> > numCorners, cellDim, geomPath, numVertices, spaceDim, time, vfields, >> > cfields) >> > File >> > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// >> > bin/petsc_gen_xdmf.py", >> > line 193, in write >> > self.writeSpaceGridHeader(fp, numCells, numCorners, cellDim, >> spaceDim) >> > File >> > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// >> > bin/petsc_gen_xdmf.py", >> > line 75, in writeSpaceGridHeader >> > ''' % (self.cellMap[cellDim][numCorners], numCells, "XYZ" if >> > spaceDim > 2 else "XY")) >> > KeyError: 6 >> > >> > >> > Also, if I try to export a vector directly to vtk, visit and paraview >> > fail to open it. >> > >> > My question is: >> > Is this a general limitation of these output formats, that I can not mix >> > faces with 3 and 4 vertices or is it a limitation of the >> > petsc_gen_xdmf.py or the VTK Viewer. >> > >> >> petsc_gen_xdmf. Take a look here >> >> >> https://bitbucket.org/petsc/petsc/src/1731673c3fe570066779d4 >> 6b51a4aee7a45775ed/bin/petsc_gen_xdmf.py?at=master& >> fileviewer=file-view-default#petsc_gen_xdmf.py-9 >> >> This is what fails. You need to add something like >> >> 6: "Wedge" >> >> in the dictionary. See http://www.xdmf.org/index.php/ >> XDMF_Model_and_Format >> >> >> > I'd also welcome any thoughts on the prism mesh in general. >> > Is it that uncommon to use and do you foresee other complications with >> it? >> > >> >> You need an element that works with prisms, but it seems you already have >> one. I know >> there is good work from here: https://arxiv.org/abs/1411.2940 >> >> >> > I fear I cannot change the discretization of the host model but maybe it >> > makes sense to use a different grid for my radiative transfer code? >> > >> >> I do not really do RT, but would be happy to try and think about it. 
>> >> Thanks, >> >> Matt >> >> >> > Many thanks, >> > >> > >> > Fabian >> > >> >> > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email: a.croucher at auckland.ac.nz > tel: +64 (0)9 923 4611 > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 29 19:27:49 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 29 May 2017 19:27:49 -0500 Subject: [petsc-users] DMPlex export to hdf5/vtk for triangle/prism mesh In-Reply-To: References: <819c5072-372d-5f2d-49ef-ca654e96f749@auckland.ac.nz> Message-ID: On Mon, May 29, 2017 at 5:43 PM, Fabian Jakub < Fabian.Jakub at physik.uni-muenchen.de> wrote: > Hi, > > I did just as Matt suggested which works nicely... thanks by the way! > > Inserted in the petsc_gen_xdmf.py the "6: 'Wedge' " entry . > > Calling the example with: > > -show_plex hdf5:output.h5 -show_vector > hdf5:output.h5::append > > exports the mesh and a vector to hdf5. > > Then calling > > $PETSC_DIR/bin/petsc_gen_xdmf.py output.h5 > > correctly creates the descriptor file and just loads to visit. > > > Many thanks again to you, Matt :) > Great! I will make a test and push it soon. I'll put you on the ChangeSet. Thanks, Matt > Fab > > > > On 30.05.2017 00:13, Adrian Croucher wrote: > >> hi, >> >> I was asking about support for exactly these 6-node wedge elements in >> DMPlex back in January. >> >> At the time, there was no support for them. Has there been some progress >> since then? >> >> We are going to need them before we can release our software, which we're >> aiming to do by the end of the year. >> >> Cheers, Adrian >> >> Message: 4 >>> Date: Fri, 26 May 2017 22:40:40 -0500 >>> From: Matthew Knepley >>> To: "Fabian.Jakub" >>> Cc: PETSc >>> Subject: Re: [petsc-users] DMPlex export to hdf5/vtk for >>> triangle/prism mesh >>> Message-ID: >>> >>> Content-Type: text/plain; charset="utf-8" >>> >>> On Fri, May 26, 2017 at 12:27 PM, Fabian.Jakub < >>> Fabian.Jakub at physik.uni-muenchen.de> wrote: >>> >>> > Dear Petsc Team, >>> > >>> > I am playing around with DMPlex, using it to generate the Mesh for the >>> > ICON weather model(http://doi.org/10.1002/2015MS000431), which >>> employs a >>> > triangle mesh horizontally and columns, vertically. >>> > >>> > This results in a grid, looking like prisms, where top and bottom faces >>> > are triangles and side faces are rectangles. >>> > >>> > I was delighted to see that I could export the triangle DMPlex (2d >>> Mesh) >>> > to hdf5 and use petsc_gen_xdmf.py to then visualize the mesh in >>> > visit/paraview. >>> > This is especially nice when exporting petscsections/vectors directly >>> to >>> > VTK. >>> > >>> >>> Great. >>> >>> >>> > I then tried the same approach for the prism grid in 3D. >>> > I attached the code for one single cell, as well as the output in hdf5. 
>>> > >>> > However, trying to convert the hdf5 output, it fails with: >>> > >>> > make prism.xmf >>> > >>> > $PETSC_DIR/bin/petsc_gen_xdmf.py prism.h5 >>> > Traceback (most recent call last): >>> > File >>> > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// >>> > bin/petsc_gen_xdmf.py", >>> > line 241, in >>> > generateXdmf(f) >>> > File >>> > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// >>> > bin/petsc_gen_xdmf.py", >>> > line 235, in generateXdmf >>> > Xdmf(xdmfFilename).write(hdfFilename, topoPath, numCells, >>> > numCorners, cellDim, geomPath, numVertices, spaceDim, time, vfields, >>> > cfields) >>> > File >>> > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// >>> > bin/petsc_gen_xdmf.py", >>> > line 193, in write >>> > self.writeSpaceGridHeader(fp, numCells, numCorners, cellDim, >>> spaceDim) >>> > File >>> > "/software/meteo/xenial/x86_64/petsc/master/debug_gcc/..// >>> > bin/petsc_gen_xdmf.py", >>> > line 75, in writeSpaceGridHeader >>> > ''' % (self.cellMap[cellDim][numCorners], numCells, "XYZ" if >>> > spaceDim > 2 else "XY")) >>> > KeyError: 6 >>> > >>> > >>> > Also, if I try to export a vector directly to vtk, visit and paraview >>> > fail to open it. >>> > >>> > My question is: >>> > Is this a general limitation of these output formats, that I can not >>> mix >>> > faces with 3 and 4 vertices or is it a limitation of the >>> > petsc_gen_xdmf.py or the VTK Viewer. >>> > >>> >>> petsc_gen_xdmf. Take a look here >>> >>> >>> https://bitbucket.org/petsc/petsc/src/1731673c3fe570066779d4 >>> 6b51a4aee7a45775ed/bin/petsc_gen_xdmf.py?at=master& >>> fileviewer=file-view-default#petsc_gen_xdmf.py-9 >>> >>> This is what fails. You need to add something like >>> >>> 6: "Wedge" >>> >>> in the dictionary. See http://www.xdmf.org/index.php/ >>> XDMF_Model_and_Format >>> >>> >>> > I'd also welcome any thoughts on the prism mesh in general. >>> > Is it that uncommon to use and do you foresee other complications with >>> it? >>> > >>> >>> You need an element that works with prisms, but it seems you already have >>> one. I know >>> there is good work from here: https://arxiv.org/abs/1411.2940 >>> >>> >>> > I fear I cannot change the discretization of the host model but maybe >>> it >>> > makes sense to use a different grid for my radiative transfer code? >>> > >>> >>> I do not really do RT, but would be happy to try and think about it. >>> >>> Thanks, >>> >>> Matt >>> >>> >>> > Many thanks, >>> > >>> > >>> > Fabian >>> > >>> >>> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Mon May 29 20:58:20 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Tue, 30 May 2017 13:58:20 +1200 Subject: [petsc-users] DMPlex export to hdf5/vtk for triangle/prism mesh In-Reply-To: References: <819c5072-372d-5f2d-49ef-ca654e96f749@auckland.ac.nz> Message-ID: <472c6367-4a84-897e-ab9f-7444d52dbe7d@auckland.ac.nz> On 30/05/17 12:25, Matthew Knepley wrote: > > Sorry about not keeping up to date on that. I had not really thought > about it working until Fabian suggested it. > So, it looks like XDMF output works. I am making a test now. > > However, other stuff will not, like refinement, interpolation, cell > geometry, and other discretization stuff. > > What do you need working? 
We'll definitely need interpolation and cell geometry, but that might be about it. We won't need refinement. - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 29 21:45:50 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 29 May 2017 21:45:50 -0500 Subject: [petsc-users] DMPlex export to hdf5/vtk for triangle/prism mesh In-Reply-To: <472c6367-4a84-897e-ab9f-7444d52dbe7d@auckland.ac.nz> References: <819c5072-372d-5f2d-49ef-ca654e96f749@auckland.ac.nz> <472c6367-4a84-897e-ab9f-7444d52dbe7d@auckland.ac.nz> Message-ID: On Mon, May 29, 2017 at 8:58 PM, Adrian Croucher wrote: > On 30/05/17 12:25, Matthew Knepley wrote: > > > Sorry about not keeping up to date on that. I had not really thought about > it working until Fabian suggested it. > So, it looks like XDMF output works. I am making a test now. > > However, other stuff will not, like refinement, interpolation, cell > geometry, and other discretization stuff. > > What do you need working? > > > We'll definitely need interpolation and cell geometry, but that might be > about it. We won't need refinement. > What kind of basis are you expecting? A tensor product? Thanks, Matt > > - Adrian > > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email: a.croucher at auckland.ac.nz > tel: +64 (0)9 923 4611 <+64%209-923%204611> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Mon May 29 21:52:32 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Tue, 30 May 2017 14:52:32 +1200 Subject: [petsc-users] DMPlex export to hdf5/vtk for triangle/prism mesh In-Reply-To: References: <819c5072-372d-5f2d-49ef-ca654e96f749@auckland.ac.nz> <472c6367-4a84-897e-ab9f-7444d52dbe7d@auckland.ac.nz> Message-ID: <7b5464bd-9517-f162-cf6c-4821589c19ca@auckland.ac.nz> On 30/05/17 14:45, Matthew Knepley wrote: > > > What kind of basis are you expecting? A tensor product? At present we don't even need basis functions, because we're just doing flow simulation and it's all finite volume. However further down the track we will also be doing rock mechanics on the same mesh, using finite elements. For that, tensor product basis would be fine. - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 -------------- next part -------------- An HTML attachment was scrubbed... 
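A minimal sketch of pulling per-cell centroids and volumes for the finite volume part, assuming dm is the wedge DMPlex and that DMPlexComputeCellGeometryFVM() ends up supporting these cells (variable names below are only for illustration):

  PetscInt  cStart, cEnd, c;
  PetscReal vol, centroid[3], normal[3];

  ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr); /* height 0 = cells */
  for (c = cStart; c < cEnd; ++c) {
    ierr = DMPlexComputeCellGeometryFVM(dm, c, &vol, centroid, normal);CHKERRQ(ierr);
    /* vol and centroid feed the finite volume discretization */
  }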
URL: From knepley at gmail.com Mon May 29 21:55:44 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 29 May 2017 21:55:44 -0500 Subject: [petsc-users] DMPlex export to hdf5/vtk for triangle/prism mesh In-Reply-To: <7b5464bd-9517-f162-cf6c-4821589c19ca@auckland.ac.nz> References: <819c5072-372d-5f2d-49ef-ca654e96f749@auckland.ac.nz> <472c6367-4a84-897e-ab9f-7444d52dbe7d@auckland.ac.nz> <7b5464bd-9517-f162-cf6c-4821589c19ca@auckland.ac.nz> Message-ID: On Mon, May 29, 2017 at 9:52 PM, Adrian Croucher wrote: > On 30/05/17 14:45, Matthew Knepley wrote: > > > > What kind of basis are you expecting? A tensor product? > > > At present we don't even need basis functions, because we're just doing > flow simulation and it's all finite volume. > Okay good. Now for cell geometry. What kind of deformation do you allow in the wedge? and what do you want to know? For FV, we are providing the centroid and volume. If that is enough, we could be done quickly. Thanks, Matt > However further down the track we will also be doing rock mechanics on the > same mesh, using finite elements. For that, tensor product basis would be > fine. > > - Adrian > > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email: a.croucher at auckland.ac.nz > tel: +64 (0)9 923 4611 <+64%209-923%204611> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From leejearl at 126.com Mon May 29 21:59:27 2017 From: leejearl at 126.com (leejearl) Date: Tue, 30 May 2017 10:59:27 +0800 Subject: [petsc-users] a question about PetscSectionCreate In-Reply-To: References: <9715fa58-bf80-7aca-d01a-c74cdcde5701@126.com> Message-ID: <224c1cd2-175f-9912-b692-7264a2dabb7b@126.com> Thanks for your kind reply. It helps me very much. leejearl On 2017?05?29? 15:47, Dave May wrote: > > On Mon, 29 May 2017 at 08:39, leejearl > wrote: > > Hi, all: > I have create a IS for every cell in dmplex by the following steps: > 1. Creating a integer array which size is matched to the number of > cells. > 2. Use the routine "ISCreateGeneral" to create a corresponding IS. > > Is there any routine which can create a IS for every cell in the > dmplex directly?, > > > I don't think so as Plex would have to somehow know what geom quantity > to use to define the size of IS (e.g. vertex, cell, face, edge) > > and what is the difference between ISCopy() and ISDuplicate()? > > > ISDuplicate allocates memory for a new with the same comm and layout > as the original IS AND copies values from the original IS into the new > one. (Note that this is slightly different from other duplicate > functions like VecDuplicate which only allocate memory and does not > copy values from the orig vec.) > > ISCopy does not allocate memory for the IS (passed as the second arg), > it only performs the copy of values. > > Thanks > Dave > > > > Thanks, > leejearl > > > On 2017?05?28? 19:35, Matthew Knepley wrote: >> On Sun, May 28, 2017 at 6:02 AM, Lawrence Mitchell >> > > wrote: >> >> >> >> > On 28 May 2017, at 09:16, leejearl > > wrote: >> > >> > Hi, Dave: I want to store a PetscInt tag for every cell of >> the dmplex with the struct. Thanks, >> >> You probably want to use a DMLabel to store these ids. Unless >> you have a different I'd for every cell. 
>> >> >> Several things to think about: >> >> 1) If you want to store a tag for EVERY cell, then just use an >> IS. Cell numberings are guaranteed to be >> contiguous and start from 0. >> >> 2) If you want to tag only SOME cells, then use a DMLabel as >> Lawrence suggests. This uses hash tables >> for fast construction, and sorted lists for fast search and >> retrieval. >> >> 3) If you want to store a VARIABLE number of data items per cell, >> then use a Section and an array that you allocate. >> >> Matt >> >> >> Lawrence >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to >> which their experiments lead. >> -- Norbert Wiener >> >> http://www.caam.rice.edu/~mk51/ > > -- > ?? > ??????????????? > Phone: 17792092487 > QQ: 188524324 > -- ?? ??????????????? Phone: 17792092487 QQ: 188524324 -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Mon May 29 22:06:28 2017 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Tue, 30 May 2017 15:06:28 +1200 Subject: [petsc-users] DMPlex export to hdf5/vtk for triangle/prism mesh In-Reply-To: References: <819c5072-372d-5f2d-49ef-ca654e96f749@auckland.ac.nz> <472c6367-4a84-897e-ab9f-7444d52dbe7d@auckland.ac.nz> <7b5464bd-9517-f162-cf6c-4821589c19ca@auckland.ac.nz> Message-ID: <85f0dc24-40c7-738b-c220-d94c8a14a32e@auckland.ac.nz> On 30/05/17 14:55, Matthew Knepley wrote: > On Mon, May 29, 2017 at 9:52 PM, Adrian Croucher > > wrote: > > On 30/05/17 14:45, Matthew Knepley wrote: > >> >> >> What kind of basis are you expecting? A tensor product? > > At present we don't even need basis functions, because we're just > doing flow simulation and it's all finite volume. > > > Okay good. Now for cell geometry. What kind of deformation do you > allow in the wedge? As in Fabian's application, these elements arise from meshes which have a simple layered structure in the vertical, but are unstructured in the horizontal (can be mixtures of quads and triangles in our case- in fact the triangles usually only occur where there is local refinement). So for us these wedges are just horizontal triangles projected downwards in the vertical- not really deformed at all. > and what do you want > to know? For FV, we are providing the centroid and volume. If that is > enough, we could be done quickly. Yes, just centroid and volume would be enough. - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Tue May 30 02:14:58 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Tue, 30 May 2017 09:14:58 +0200 (CEST) Subject: [petsc-users] How to VecView with a formatted precision (%10.8f) ? In-Reply-To: <87shjrwmsx.fsf@jedbrown.org> References: <1559132119.8316480.1495792332227.JavaMail.zimbra@inria.fr> <87shjrwmsx.fsf@jedbrown.org> Message-ID: <1788087205.403799.1496128498132.JavaMail.zimbra@inria.fr> Mainly for debugging purposes: controlling format/precision could be convenient ! Franck ~> mpirun -n 5 ./vecViewPrecision.exe Vec Object: 5 MPI processes type: mpi Process [0] 0. 0. 
Process [1] 1.23457e+06 -8.1e-07 Process [2] 2.46914e-06 -1.62e+06 Process [3] 3.7037e+06 -2.43e-06 Process [4] 4.93827e-06 -3.24e+06 ----- Mail original ----- > De: "Jed Brown" > ?: "Franck Houssen" , "PETSc users list" > Envoy?: Vendredi 26 Mai 2017 19:27:42 > Objet: Re: [petsc-users] How to VecView with a formatted precision (%10.8f) ? > > No, but this could be added to the ASCII viewer. Why do you want it? > > Franck Houssen writes: > > > How to VecView with a formatted precision (%10.8f) ? Not possible ? > > > > Franck > -------------- next part -------------- A non-text attachment was scrubbed... Name: vecViewPrecision.cpp Type: text/x-c++src Size: 772 bytes Desc: not available URL: From franck.houssen at inria.fr Tue May 30 02:21:47 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Tue, 30 May 2017 09:21:47 +0200 (CEST) Subject: [petsc-users] Must I destroy the local matrix I have (created and) set with MatISSetLocalMat ? In-Reply-To: <872046060.405248.1496128843952.JavaMail.zimbra@inria.fr> Message-ID: <1901003584.405720.1496128907192.JavaMail.zimbra@inria.fr> Must I destroy the local matrix I have (created and) set with MatISSetLocalMat ? Franck -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue May 30 06:10:43 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 30 May 2017 06:10:43 -0500 Subject: [petsc-users] Must I destroy the local matrix I have (created and) set with MatISSetLocalMat ? In-Reply-To: <1901003584.405720.1496128907192.JavaMail.zimbra@inria.fr> References: <872046060.405248.1496128843952.JavaMail.zimbra@inria.fr> <1901003584.405720.1496128907192.JavaMail.zimbra@inria.fr> Message-ID: On Tue, May 30, 2017 at 2:21 AM, Franck Houssen wrote: > Must I destroy the local matrix I have (created and) set with > MatISSetLocalMat ? > Yes. Matt > Franck > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue May 30 10:36:14 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 30 May 2017 09:36:14 -0600 Subject: [petsc-users] Must I destroy the local matrix I have (created and) set with MatISSetLocalMat ? In-Reply-To: <1901003584.405720.1496128907192.JavaMail.zimbra@inria.fr> References: <1901003584.405720.1496128907192.JavaMail.zimbra@inria.fr> Message-ID: <87r2z6qrv5.fsf@jedbrown.org> Franck Houssen writes: > Must I destroy the local matrix I have (created and) set with MatISSetLocalMat ? The implementation references the local matrix so you need to destroy your copy. This pattern is always used when setting sub-objects like this. static PetscErrorCode MatISSetLocalMat_IS(Mat mat,Mat local) { Mat_IS *is = (Mat_IS*)mat->data; PetscInt nrows,ncols,orows,ocols; PetscErrorCode ierr; PetscFunctionBegin; if (is->A) { ierr = MatGetSize(is->A,&orows,&ocols);CHKERRQ(ierr); ierr = MatGetSize(local,&nrows,&ncols);CHKERRQ(ierr); if (orows != nrows || ocols != ncols) SETERRQ4(PETSC_COMM_SELF,PETSC_ERR_ARG_SIZ,"Local MATIS matrix should be of size %Dx%D (you passed a %Dx%D matrix)",orows,ocols,nrows,ncols); } ierr = PetscObjectReference((PetscObject)local);CHKERRQ(ierr); ierr = MatDestroy(&is->A);CHKERRQ(ierr); is->A = local; PetscFunctionReturn(0); } -------------- next part -------------- A non-text attachment was scrubbed... 
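A minimal sketch of that pattern from the caller's side, assuming A is a MATIS created with MatCreateIS() and that nloc/nz stand in for the local size and preallocation:

  Mat Aloc;

  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF, nloc, nloc, nz, NULL, &Aloc);CHKERRQ(ierr);
  /* ... MatSetValues() on Aloc, then MatAssemblyBegin/End(Aloc, MAT_FINAL_ASSEMBLY) ... */
  ierr = MatISSetLocalMat(A, Aloc);CHKERRQ(ierr); /* the MATIS takes its own reference */
  ierr = MatDestroy(&Aloc);CHKERRQ(ierr);         /* drops only our reference; A keeps the local matrix alive */

Because the setter bumps the reference count, destroying the caller's handle does not free the local matrix; it is released when the MATIS itself is destroyed.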
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue May 30 13:22:16 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 30 May 2017 13:22:16 -0500 Subject: [petsc-users] How to VecView with a formatted precision (%10.8f) ? In-Reply-To: <1788087205.403799.1496128498132.JavaMail.zimbra@inria.fr> References: <1559132119.8316480.1495792332227.JavaMail.zimbra@inria.fr> <87shjrwmsx.fsf@jedbrown.org> <1788087205.403799.1496128498132.JavaMail.zimbra@inria.fr> Message-ID: <1124CE1F-F1DC-4772-8ACB-932244B3518E@mcs.anl.gov> When I want "full precision" for debugging purposes I use PetscViewerPushFormat(viewer,PETSC_VIEWER_ASCII_MATLAB); > On May 30, 2017, at 2:14 AM, Franck Houssen wrote: > > Mainly for debugging purposes: controlling format/precision could be convenient ! > > Franck > > ~> mpirun -n 5 ./vecViewPrecision.exe > Vec Object: 5 MPI processes > type: mpi > Process [0] > 0. > 0. > Process [1] > 1.23457e+06 > -8.1e-07 > Process [2] > 2.46914e-06 > -1.62e+06 > Process [3] > 3.7037e+06 > -2.43e-06 > Process [4] > 4.93827e-06 > -3.24e+06 > > > ----- Mail original ----- >> De: "Jed Brown" >> ?: "Franck Houssen" , "PETSc users list" >> Envoy?: Vendredi 26 Mai 2017 19:27:42 >> Objet: Re: [petsc-users] How to VecView with a formatted precision (%10.8f) ? >> >> No, but this could be added to the ASCII viewer. Why do you want it? >> >> Franck Houssen writes: >> >>> How to VecView with a formatted precision (%10.8f) ? Not possible ? >>> >>> Franck >> > From j.pogacnik at auckland.ac.nz Tue May 30 22:19:34 2017 From: j.pogacnik at auckland.ac.nz (Justin Pogacnik) Date: Wed, 31 May 2017 03:19:34 +0000 Subject: [petsc-users] PetscFECreateDefault in Fortran Message-ID: <1496200773990.42892@auckland.ac.nz> Hello, I'm developing a finite element code in fortran 90. I recently updated my PETSc and am now getting the following error during compile/linking on an existing application: Undefined symbols for architecture x86_64: "_petscfecreatedefault_", referenced from: _MAIN__ in fe_test.o ld: symbol(s) not found for architecture x86_64 collect2: error: ld returned 1 exit status make: *** [dist/fe_test] Error 1 I'm running Mac OS X Yosemite (10.10.5). I've created a "minimum working example" (attached) that re-creates the problem. It's basically just dm/impls/plex/examples/tutorials/ex3f90, but tries to create a PetscFE object. Everything goes fine and the DM looks like what is expected if PetscFECreateDefault is commented out. Any idea what am I missing? Many thanks! Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: fe_test.F90 Type: application/octet-stream Size: 1679 bytes Desc: fe_test.F90 URL: From lirui319 at hnu.edu.cn Wed May 31 03:29:55 2017 From: lirui319 at hnu.edu.cn (=?GBK?B?wO7I8A==?=) Date: Wed, 31 May 2017 16:29:55 +0800 (GMT+08:00) Subject: [petsc-users] Installation Error In-Reply-To: References: <15e1cc1.5bb6.15c38e117d7.Coremail.lirui319@hnu.edu.cn> Message-ID: <1ee9c6f.7d88.15c5da025b7.Coremail.lirui319@hnu.edu.cn> this problem was already approached.Thank you for your help! :) ?2017-05-24 20:57:09,????? > What do you have for: > > which python > echo $PYTHONPATH > > > The following might work.. > > PYTHONPATH='' /usr/bin/python ./configure --with-cc=gcc --with-cxx=0 --with-fc=0 --download-f2cblaslapack --download-mpich > > Satish > > > On Wed, 24 May 2017, ?? 
wrote: > > > > > Dear professor or engineer: > > I meet a problem about installation to petsc. > > When I type the code "./configure --with-cc=gcc --with-cxx=0 --with-fc=0 --download-f2cblaslapack --download-mpich" on my terminal,the answer reveals the following results. > > > > >>>ERROR:root:code for hash md5 was not found. > > Traceback (most recent call last): > > File "/home/zhuizhuluori/lirui/software/vapor-2.5.0-Linux_x86_64/vapor/vapor-2.5.0/lib/python2.7/hashlib.py", line 139, in > globals()[__func_name] = __get_hash(__func_name) > > File "/home/zhuizhuluori/lirui/software/vapor-2.5.0-Linux_x86_64/vapor/vapor-2.5.0/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor > > raise ValueError('unsupported hash type ' + name) > > ValueError: unsupported hash type md5 > > ERROR:root:code for hash sha1 was not found ..... > > > > I have used petsc for a long time,and never see the this problem.my laptop is installed an old version of petsc and I wanna change it to a new version.How can I fix it?Thanks for your heartful suggestion! > > > > > > > > > > > > From knepley at gmail.com Wed May 31 07:53:16 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 31 May 2017 07:53:16 -0500 Subject: [petsc-users] PetscFECreateDefault in Fortran In-Reply-To: <1496200773990.42892@auckland.ac.nz> References: <1496200773990.42892@auckland.ac.nz> Message-ID: On Tue, May 30, 2017 at 10:19 PM, Justin Pogacnik wrote: > Hello, > > I'm developing a finite element code in fortran 90. I recently updated my > PETSc and am now getting the following error during compile/linking on an > existing application: > > Undefined symbols for architecture x86_64: > > "_petscfecreatedefault_", referenced from: > > _MAIN__ in fe_test.o > > ld: symbol(s) not found for architecture x86_64 > > collect2: error: ld returned 1 exit status > > make: *** [dist/fe_test] Error 1 > > > I'm running Mac OS X Yosemite (10.10.5). I've created a "minimum working > example" (attached) that re-creates the problem. It's basically > just dm/impls/plex/examples/tutorials/ex3f90, but tries to create a > PetscFE object. Everything goes fine and the DM looks like what is expected > if PetscFECreateDefault is commented out. Any idea what am I missing? > Yes, I had not made a Fortran binding for this function. I will do it now. Thanks, Matt > Many thanks! > > Justin > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 31 08:34:22 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 31 May 2017 08:34:22 -0500 Subject: [petsc-users] PetscFECreateDefault in Fortran In-Reply-To: References: <1496200773990.42892@auckland.ac.nz> Message-ID: On Wed, May 31, 2017 at 7:53 AM, Matthew Knepley wrote: > On Tue, May 30, 2017 at 10:19 PM, Justin Pogacnik < > j.pogacnik at auckland.ac.nz> wrote: > >> Hello, >> >> I'm developing a finite element code in fortran 90. 
I recently updated my >> PETSc and am now getting the following error during compile/linking on an >> existing application: >> >> Undefined symbols for architecture x86_64: >> >> "_petscfecreatedefault_", referenced from: >> >> _MAIN__ in fe_test.o >> >> ld: symbol(s) not found for architecture x86_64 >> >> collect2: error: ld returned 1 exit status >> >> make: *** [dist/fe_test] Error 1 >> >> >> I'm running Mac OS X Yosemite (10.10.5). I've created a "minimum working >> example" (attached) that re-creates the problem. It's basically >> just dm/impls/plex/examples/tutorials/ex3f90, but tries to create a >> PetscFE object. Everything goes fine and the DM looks like what is expected >> if PetscFECreateDefault is commented out. Any idea what am I missing? >> > Yes, I had not made a Fortran binding for this function. I will do it now. > I have merged it to the 'next' branch, and it will be in 'master' soon. Thanks, Matt > Thanks, > > Matt > > >> Many thanks! >> >> Justin >> >> >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Wed May 31 10:59:53 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Wed, 31 May 2017 17:59:53 +0200 (CEST) Subject: [petsc-users] If I use MatISSetLocalMat with a MatCreateSeqAIJ local matrix, do I need to use MatISSetPreallocation for the global matrix ? In-Reply-To: <2045825082.1144082.1496246089403.JavaMail.zimbra@inria.fr> Message-ID: <1636134414.1146088.1496246393744.JavaMail.zimbra@inria.fr> If I use MatISSetLocalMat with a preallocated MatCreateSeqAIJ local matrix, do I need to use MatISSetPreallocation for the global matrix ? Here is the pseudo-code: MatCreateIS(PETSC_COMM_WORLD, ..., &globalMat) MatISSetPreallocation(globalMatrix, ...) // Is this necessary ? MatCreateSeqAIJ(PETSC_COMM_SELF, ..., &localMatrix) // Prealloc done on the fly MatSetValues(localMatrix, ...) MatISSetLocalMat(globalMatrix, localMatrix) Is it necessary to call MatISSetPreallocation for globalMatrix ? (prealloc should have been done locally for each local matrix, no ?) Franck -------------- next part -------------- An HTML attachment was scrubbed... URL: From franck.houssen at inria.fr Wed May 31 11:22:00 2017 From: franck.houssen at inria.fr (Franck Houssen) Date: Wed, 31 May 2017 18:22:00 +0200 (CEST) Subject: [petsc-users] When using MatIS, do I need to call MatAssemblyBegin/End between MatISSetLocalMat (local set) and MatISGetMPIXAIJ (get global assembly) ? In-Reply-To: <1834116955.1151067.1496247616349.JavaMail.zimbra@inria.fr> Message-ID: <1040288737.1151290.1496247720613.JavaMail.zimbra@inria.fr> When using MatIS, do I need to call MatAssemblyBegin/End between MatISSetLocalMat (local set) and MatISGetMPIXAIJ (get global assembly) ? Franck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kannanr at ornl.gov Wed May 31 14:46:18 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Wed, 31 May 2017 19:46:18 +0000 Subject: [petsc-users] slepc on 1D row distributed matrix In-Reply-To: References: Message-ID: <628DF9C9-8C85-4B0E-AE88-CCD2432008C7@ornl.gov> Hello, I have got a sparse 1D row distributed matrix in which every MPI process owns an m/p x n of the global matrix mxn. I am running NHEP with krylovschur on it. It is throwing me some wrong error. For your reference, I have attached the modified ex5.c in which I SetSizes on the matrix to emulate the 1D row distribution and the log file with the error. In the unmodified ex5.c, for m=5, N=15, the local_m and the local_n is 3x3. How is the global 15x15 matrix distributed locally as 3x3 matrices? When I print the global matrix, it doesn?t appear to be diagonal as well. If slepc doesn?t support sparse 1D row distributed matrix, how do I need to redistribute it such that I can run NHEP on this. -- Regards, Ramki -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex5.c Type: application/octet-stream Size: 7780 bytes Desc: ex5.c URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: slepc.o607511 Type: application/octet-stream Size: 26570 bytes Desc: slepc.o607511 URL: From jed at jedbrown.org Wed May 31 15:06:11 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 31 May 2017 17:36:11 -0230 Subject: [petsc-users] PETSc User Meeting 2017, June 14-16 in Boulder, Colorado In-Reply-To: <87shjsxmyh.fsf@jedbrown.org> References: <87y3wbtk1i.fsf@jedbrown.org> <87shjsxmyh.fsf@jedbrown.org> Message-ID: <87zidsokp8.fsf@jedbrown.org> Correction: it is still possible to book lodging today (closes at midnight Mountain Time). See you in two short weeks. Thanks! Jed Brown writes: > The program is up on the website: > > https://www.mcs.anl.gov/petsc/meetings/2017/ > > If you haven't registered yet, we can still accommodate you, but please > register soon. If you haven't booked lodging, please do that soon -- > the on-campus lodging option will close on *Tuesday, May 30*. > > https://confreg.colorado.edu/CSM2017 > > We are looking forward to seeing you in Boulder! > > Jed Brown writes: > >> We'd like to invite you to join us at the 2017 PETSc User Meeting held >> at the University of Colorado Boulder on June 14-16, 2017. >> >> http://www.mcs.anl.gov/petsc/meetings/2017/ >> >> The first day consists of tutorials on various aspects and features of >> PETSc. The second and third days will be devoted to exchange, >> discussions, and a refinement of strategies for the future with our >> users. We encourage you to present work illustrating your own use of >> PETSc, for example in applications or in libraries built on top of >> PETSc. >> >> Registration for the PETSc User Meeting 2017 is free for students and >> $75 for non-students. We can host a maximum of 150 participants, so >> register soon (and by May 15). >> >> http://www.eventzilla.net/web/e/petsc-user-meeting-2017-2138890185 >> >> We are also offering low-cost lodging on campus. A lodging registration >> site will be available soon and announced here and on the website. >> >> Thanks to the generosity of Intel, we will be able to offer a limited >> number of student travel grants. We are also soliciting additional >> sponsors -- please contact us if you are interested. >> >> >> We are looking forward to seeing you in Boulder! 
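For comparison, a minimal sketch of the same quadrature query from C, which may help narrow down whether the trouble is in the Fortran interface itself; this assumes the current C signature of PetscQuadratureGetData() (dimension, number of components, number of points, then the two borrowed arrays), an existing PetscFE fe, and the usual ierr/CHKERRQ error handling:

  PetscQuadrature quad;
  PetscInt        qdim, qNc, qnpoints;
  const PetscReal *qpoints, *qweights;

  ierr = PetscFEGetQuadrature(fe, &quad);CHKERRQ(ierr);
  ierr = PetscQuadratureGetData(quad, &qdim, &qNc, &qnpoints, &qpoints, &qweights);CHKERRQ(ierr);
  /* qpoints has qnpoints*qdim entries; the arrays belong to the quadrature object and must not be freed */

From Fortran the array arguments come back as pointers, so the argument ordering and the pointer handling in the binding are the likely places for a mismatch.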
>> >> Please contact us at petsc2017 at mcs.anl.gov if you have any questions or >> comments. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jroman at dsic.upv.es Wed May 31 15:26:40 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Wed, 31 May 2017 22:26:40 +0200 Subject: [petsc-users] slepc on 1D row distributed matrix In-Reply-To: <628DF9C9-8C85-4B0E-AE88-CCD2432008C7@ornl.gov> References: <628DF9C9-8C85-4B0E-AE88-CCD2432008C7@ornl.gov> Message-ID: <57665F1B-33B8-4448-A6C5-BFA3D14AA99C@dsic.upv.es> > El 31 may 2017, a las 21:46, Kannan, Ramakrishnan escribi?: > > Hello, > > I have got a sparse 1D row distributed matrix in which every MPI process owns an m/p x n of the global matrix mxn. I am running NHEP with krylovschur on it. It is throwing me some wrong error. For your reference, I have attached the modified ex5.c in which I SetSizes on the matrix to emulate the 1D row distribution and the log file with the error. > > In the unmodified ex5.c, for m=5, N=15, the local_m and the local_n is 3x3. How is the global 15x15 matrix distributed locally as 3x3 matrices? When I print the global matrix, it doesn?t appear to be diagonal as well. > > If slepc doesn?t support sparse 1D row distributed matrix, how do I need to redistribute it such that I can run NHEP on this. > -- > Regards, > Ramki > > As explained in the manpage, the local columns size n must match the local size of the x vector, so it must also be N/mpisize http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetSizes.html But be warned that your code will not work when N is not divisible by mpisize. In that case, global and local dimensions won't match. Setting local sizes is not necessary in your case, since by default PETSc is already doing a 1D block-row distribution. Jose From kannanr at ornl.gov Wed May 31 16:14:55 2017 From: kannanr at ornl.gov (Kannan, Ramakrishnan) Date: Wed, 31 May 2017 21:14:55 +0000 Subject: [petsc-users] slepc on 1D row distributed matrix In-Reply-To: <57665F1B-33B8-4448-A6C5-BFA3D14AA99C@dsic.upv.es> References: <628DF9C9-8C85-4B0E-AE88-CCD2432008C7@ornl.gov> <57665F1B-33B8-4448-A6C5-BFA3D14AA99C@dsic.upv.es> Message-ID: <3A050906-27B2-4C2D-B101-A16CC1EB78CA@ornl.gov> Jose, Thank you for the quick reply. In this specific example, there are 5 mpi processes and each process owns an 1D row distributed matrix of size 3x15. According to the MatSetSizes, I should set local rows, local cols, global rows, global cols which in this case are 3,15,15,15 respectively. Instead why would I set 3,3,15,15. Also in our program, I use global_row_idx, global_col_idx for MatSetValues. If I set 3,3,15,15 instead of 3,15,15,15, my MatSetValues fails with the error ?nnz cannot be greater than row length:?. Also to test the 3,15,15,15 in MatSetSizes to be right, we called a MatCreateVec and MatMult of petsc which seemed to work alright too. Appreciate your kind help. -- Regards, Ramki On 5/31/17, 4:26 PM, "Jose E. Roman" wrote: > El 31 may 2017, a las 21:46, Kannan, Ramakrishnan escribi?: > > Hello, > > I have got a sparse 1D row distributed matrix in which every MPI process owns an m/p x n of the global matrix mxn. I am running NHEP with krylovschur on it. It is throwing me some wrong error. For your reference, I have attached the modified ex5.c in which I SetSizes on the matrix to emulate the 1D row distribution and the log file with the error. 
> > In the unmodified ex5.c, for m=5, N=15, the local_m and the local_n is 3x3. How is the global 15x15 matrix distributed locally as 3x3 matrices? When I print the global matrix, it doesn?t appear to be diagonal as well. > > If slepc doesn?t support sparse 1D row distributed matrix, how do I need to redistribute it such that I can run NHEP on this. > -- > Regards, > Ramki > > As explained in the manpage, the local columns size n must match the local size of the x vector, so it must also be N/mpisize http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetSizes.html But be warned that your code will not work when N is not divisible by mpisize. In that case, global and local dimensions won't match. Setting local sizes is not necessary in your case, since by default PETSc is already doing a 1D block-row distribution. Jose From bsmith at mcs.anl.gov Wed May 31 19:13:17 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 31 May 2017 19:13:17 -0500 Subject: [petsc-users] slepc on 1D row distributed matrix In-Reply-To: <3A050906-27B2-4C2D-B101-A16CC1EB78CA@ornl.gov> References: <628DF9C9-8C85-4B0E-AE88-CCD2432008C7@ornl.gov> <57665F1B-33B8-4448-A6C5-BFA3D14AA99C@dsic.upv.es> <3A050906-27B2-4C2D-B101-A16CC1EB78CA@ornl.gov> Message-ID: <8A10B5D7-DDF6-4F69-8A42-290E70CA2596@mcs.anl.gov> > On May 31, 2017, at 4:14 PM, Kannan, Ramakrishnan wrote: > > Jose, > > Thank you for the quick reply. > > In this specific example, there are 5 mpi processes and each process owns an 1D row distributed matrix of size 3x15. According to the MatSetSizes, I should set local rows, local cols, global rows, global cols which in this case are 3,15,15,15 respectively. Instead why would I set 3,3,15,15. You have not read carefully the definition of "local size" for matrices in PETSc. http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetSizes.html > > Also in our program, I use global_row_idx, global_col_idx for MatSetValues. If I set 3,3,15,15 instead of 3,15,15,15, my MatSetValues fails with the error ?nnz cannot be greater than row length:?. This is a different problem that may need to be tracked down. > Also to test the 3,15,15,15 in MatSetSizes to be right, we called a MatCreateVec and MatMult of petsc which seemed to work alright too. This will not work under normal circumstances so something else must be different as well. Barry > > Appreciate your kind help. > -- > Regards, > Ramki > > > On 5/31/17, 4:26 PM, "Jose E. Roman" wrote: > > >> El 31 may 2017, a las 21:46, Kannan, Ramakrishnan escribi?: >> >> Hello, >> >> I have got a sparse 1D row distributed matrix in which every MPI process owns an m/p x n of the global matrix mxn. I am running NHEP with krylovschur on it. It is throwing me some wrong error. For your reference, I have attached the modified ex5.c in which I SetSizes on the matrix to emulate the 1D row distribution and the log file with the error. >> >> In the unmodified ex5.c, for m=5, N=15, the local_m and the local_n is 3x3. How is the global 15x15 matrix distributed locally as 3x3 matrices? When I print the global matrix, it doesn?t appear to be diagonal as well. >> >> If slepc doesn?t support sparse 1D row distributed matrix, how do I need to redistribute it such that I can run NHEP on this. 
>> -- >> Regards, >> Ramki >> >> > > As explained in the manpage, the local columns size n must match the local size of the x vector, so it must also be N/mpisize > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetSizes.html > > But be warned that your code will not work when N is not divisible by mpisize. In that case, global and local dimensions won't match. > > Setting local sizes is not necessary in your case, since by default PETSc is already doing a 1D block-row distribution. > > Jose > > > > From j.pogacnik at auckland.ac.nz Wed May 31 22:00:09 2017 From: j.pogacnik at auckland.ac.nz (Justin Pogacnik) Date: Thu, 1 Jun 2017 03:00:09 +0000 Subject: [petsc-users] PetscFECreateDefault in Fortran In-Reply-To: References: <1496200773990.42892@auckland.ac.nz> , Message-ID: <1496286009918.20206@auckland.ac.nz> Thanks Matt! That works perfectly now. I have another question regarding accessing the quadrature information. When I use PetscFEGetQuadrature(), then PetscQuadratureView(), I see what I expect regarding point locations, weights. However, when I try to use PetscQuadratureGetData() the pointers seem to point to random memory locations. The exact line from my test problem is: call PetscQuadratureGetData(quad,q_nc,q_dim,q_num,pq_points,pq_weights,ierr); where the pq_* are the pointers giving strange output. The q_nc, q_dim, and q_num are all giving what I would expect to see. Happy to send along the file if that helps. Thanks again, Justin ________________________________ From: Matthew Knepley Sent: Thursday, June 1, 2017 1:34 AM To: Justin Pogacnik Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] PetscFECreateDefault in Fortran On Wed, May 31, 2017 at 7:53 AM, Matthew Knepley > wrote: On Tue, May 30, 2017 at 10:19 PM, Justin Pogacnik > wrote: Hello, I'm developing a finite element code in fortran 90. I recently updated my PETSc and am now getting the following error during compile/linking on an existing application: Undefined symbols for architecture x86_64: "_petscfecreatedefault_", referenced from: _MAIN__ in fe_test.o ld: symbol(s) not found for architecture x86_64 collect2: error: ld returned 1 exit status make: *** [dist/fe_test] Error 1 I'm running Mac OS X Yosemite (10.10.5). I've created a "minimum working example" (attached) that re-creates the problem. It's basically just dm/impls/plex/examples/tutorials/ex3f90, but tries to create a PetscFE object. Everything goes fine and the DM looks like what is expected if PetscFECreateDefault is commented out. Any idea what am I missing? Yes, I had not made a Fortran binding for this function. I will do it now. I have merged it to the 'next' branch, and it will be in 'master' soon. Thanks, Matt Thanks, Matt Many thanks! Justin -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/ -------------- next part -------------- An HTML attachment was scrubbed... URL: