From sajidsyed2021 at u.northwestern.edu Wed May 1 12:02:02 2019 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Wed, 1 May 2019 12:02:02 -0500 Subject: [petsc-users] Quick question about ISCreateGeneral In-Reply-To: References: Message-ID: Hi Barry, I've written a simple program that does a scatter and reverses the order of data between two vectors with locally generate index sets and it works. While I'd have expected that I would need to concatenate the index sets before calling vecscatter, the program works without doing so (hopefully making it more efficient). Does calling vecscatter on each rank with the local index set take care of the necessary communication behind the scenes then? Thank You, Sajid Ali Applied Physics Northwestern University -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex_modify.c Type: application/octet-stream Size: 4181 bytes Desc: not available URL: From bsmith at mcs.anl.gov Wed May 1 13:20:23 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 1 May 2019 18:20:23 +0000 Subject: [petsc-users] Quick question about ISCreateGeneral In-Reply-To: References: Message-ID: <1A55587E-21A1-482B-BE4A-F8296FC3D9AE@mcs.anl.gov> > On May 1, 2019, at 12:02 PM, Sajid Ali wrote: > > Hi Barry, > > I've written a simple program that does a scatter and reverses the order of data between two vectors with locally generate index sets and it works. While I'd have expected that I would need to concatenate the index sets before calling vecscatter, the program works without doing so (hopefully making it more efficient). I am not sure what you mean by concatenating the index sets. When using a parallel vector associated with the IS the values are "virtually" concatenated in the sense that they are treated as just one huge array of indices but they need not be physically concatenated. Each process just keeps its part. > Does calling vecscatter on each rank with the local index set take care of the necessary communication behind the scenes then? Yes, that is what the VecScatter does, it figures out based on what each process provides what communication needs to take place. Barry > > Thank You, > Sajid Ali > Applied Physics > Northwestern University > From zakaryah at gmail.com Wed May 1 16:42:46 2019 From: zakaryah at gmail.com (zakaryah) Date: Wed, 1 May 2019 17:42:46 -0400 Subject: [petsc-users] Vector layout in PetscBinaryRead.m Message-ID: I'm using PETSc to solve some equations, outputting the results using PetscViewerBinaryOpen and VecView, then loading the vector files into Matlab using PetscBinaryRead.m. The vectors are global vectors created from a 3D DMDA. Is there a way to extract the layout from the binary file, so that I can visualize the vectors on a 3D grid? -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 1 16:51:34 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 1 May 2019 17:51:34 -0400 Subject: [petsc-users] Vector layout in PetscBinaryRead.m In-Reply-To: References: Message-ID: On Wed, May 1, 2019 at 5:44 PM zakaryah via petsc-users < petsc-users at mcs.anl.gov> wrote: > I'm using PETSc to solve some equations, outputting the results using > PetscViewerBinaryOpen and VecView, then loading the vector files into > Matlab using PetscBinaryRead.m. The vectors are global vectors created > from a 3D DMDA. 
Is there a way to extract the layout from the binary file, > so that I can visualize the vectors on a 3D grid? > No, we do not preserve that information in the PETSc binary format. We have richer output that does, like HDF5. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Wed May 1 16:55:48 2019 From: zakaryah at gmail.com (zakaryah) Date: Wed, 1 May 2019 17:55:48 -0400 Subject: [petsc-users] Vector layout in PetscBinaryRead.m In-Reply-To: References: Message-ID: Thanks Matt. I have had problems getting HDF5 to work when I run my solver in parallel. Can you link me to an example which writes a vector as HDF5 that will work in parallel? To be clear - I don't care if the I/O is in serial, I just need the VecView to not crash with multiple processors. On Wed, May 1, 2019 at 5:51 PM Matthew Knepley wrote: > On Wed, May 1, 2019 at 5:44 PM zakaryah via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> I'm using PETSc to solve some equations, outputting the results using >> PetscViewerBinaryOpen and VecView, then loading the vector files into >> Matlab using PetscBinaryRead.m. The vectors are global vectors created >> from a 3D DMDA. Is there a way to extract the layout from the binary file, >> so that I can visualize the vectors on a 3D grid? >> > > No, we do not preserve that information in the PETSc binary format. We > have richer output that does, like HDF5. > > Thanks, > > Matt > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 1 17:03:10 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 1 May 2019 18:03:10 -0400 Subject: [petsc-users] Vector layout in PetscBinaryRead.m In-Reply-To: References: Message-ID: On Wed, May 1, 2019 at 5:57 PM zakaryah via petsc-users < petsc-users at mcs.anl.gov> wrote: > Thanks Matt. I have had problems getting HDF5 to work when I run my > solver in parallel. Can you link me to an example which writes a vector as > HDF5 that will work in parallel? To be clear - I don't care if the I/O is > in serial, I just need the VecView to not crash with multiple processors. > Any crash is a bug. Here is generic code: ierr = VecViewFromOptions(v, NULL, "-my_vec_view");CHKERRQ(ierr); and then run with ./myprog -my_vec_view hdf5:v.h5 Thanks, Matt > On Wed, May 1, 2019 at 5:51 PM Matthew Knepley wrote: > >> On Wed, May 1, 2019 at 5:44 PM zakaryah via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> I'm using PETSc to solve some equations, outputting the results using >>> PetscViewerBinaryOpen and VecView, then loading the vector files into >>> Matlab using PetscBinaryRead.m. The vectors are global vectors created >>> from a 3D DMDA. Is there a way to extract the layout from the binary file, >>> so that I can visualize the vectors on a 3D grid? >>> >> >> No, we do not preserve that information in the PETSc binary format. We >> have richer output that does, like HDF5. 
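(A minimal sketch of the HDF5 route mentioned here, writing the same vector with an explicitly created viewer instead of the options database. This is an illustration, not code from the thread: it assumes PETSc was configured with HDF5 support, the vector v, the file name, and the dataset name are placeholders, and error checking is abbreviated.)

    #include <petscviewerhdf5.h>

    PetscViewer viewer;
    ierr = PetscObjectSetName((PetscObject)v, "solution");CHKERRQ(ierr);          /* becomes the HDF5 dataset name */
    ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "v.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
    ierr = VecView(v, viewer);CHKERRQ(ierr);                                      /* collective; works the same on one or many ranks */
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

This should produce essentially the same file as running with -my_vec_view hdf5:v.h5 as in the snippet above.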
>> >> Thanks, >> >> Matt >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu May 2 02:08:50 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 2 May 2019 07:08:50 +0000 Subject: [petsc-users] Vector layout in PetscBinaryRead.m In-Reply-To: References: Message-ID: <02933E93-9058-4105-824D-B2930D077F9B@anl.gov> Actually there is magic code that can be generated to tell Matlab the "shape" of the result when coming from DMDA. You need to first push onto the viewer the format PETSC_VIEWER_BINARY_MATLAB. This will cause the VecView to save additional information about the dimensions (not in the binary file) but in the file with the same name but ending in .info. Once the vec saves are done you just look at the .info file and it tells you what matlab commands to run to load the vector into matlab and how to reshape it for 3d. Good luck, Barry > On May 1, 2019, at 4:42 PM, zakaryah via petsc-users wrote: > > I'm using PETSc to solve some equations, outputting the results using PetscViewerBinaryOpen and VecView, then loading the vector files into Matlab using PetscBinaryRead.m. The vectors are global vectors created from a 3D DMDA. Is there a way to extract the layout from the binary file, so that I can visualize the vectors on a 3D grid? From D.Liu-4 at tudelft.nl Thu May 2 06:45:57 2019 From: D.Liu-4 at tudelft.nl (Dongyu Liu) Date: Thu, 2 May 2019 13:45:57 +0200 Subject: [petsc-users] [PETSc-Users] The MatSetValues takes too much time Message-ID: <0d0e9cb1-e8d3-f49f-f1f8-038cb71c18f1@tudelft.nl> Hi, I am using PETSc sparse matrix ('aij') for a FEM program, when the matrix is really big, let's say 1million by 1million, we use setValues based on the index vector I and J, but the whole process takes around 2 hours. I don't know why it is like that, or is there any option that I should set to make this faster? Best, Dongyu From mfadams at lbl.gov Thu May 2 06:57:10 2019 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 2 May 2019 07:57:10 -0400 Subject: [petsc-users] [PETSc-Users] The MatSetValues takes too much time In-Reply-To: <0d0e9cb1-e8d3-f49f-f1f8-038cb71c18f1@tudelft.nl> References: <0d0e9cb1-e8d3-f49f-f1f8-038cb71c18f1@tudelft.nl> Message-ID: You need to set the preallocation for the matrix. https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html On Thu, May 2, 2019 at 7:46 AM Dongyu Liu via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I am using PETSc sparse matrix ('aij') for a FEM program, when the > matrix is really big, let's say 1million by 1million, we use setValues > based on the index vector I and J, but the whole process takes around 2 > hours. > > I don't know why it is like that, or is there any option that I should > set to make this faster? > > > Best, > > Dongyu > > -------------- next part -------------- An HTML attachment was scrubbed... 
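(A minimal sketch of the preallocation step Mark points to, for an AIJ matrix assembled with MatSetValues in a FEM code. It is illustrative rather than taken from the thread: the per-row counts d_nnz/o_nnz are placeholders that in practice come from the element connectivity, and error checking is abbreviated. Getting these counts right is what removes the repeated reallocations that can make a 1-million-by-1-million assembly take hours.)

    Mat       A;
    PetscInt  mlocal;            /* number of locally owned rows                          */
    PetscInt  *d_nnz, *o_nnz;    /* per-row nonzeros in the diagonal / off-diagonal block */

    /* ... compute mlocal and fill d_nnz[0..mlocal-1], o_nnz[0..mlocal-1] from the mesh ... */
    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, mlocal, mlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
    ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);
    ierr = MatSeqAIJSetPreallocation(A, 0, d_nnz);CHKERRQ(ierr);               /* used when run on one process    */
    ierr = MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr);     /* used when run on many processes */
    /* ... element loop calling MatSetValues(A, ...) ... */
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

The preallocation call that does not match the actual matrix type is ignored, so both can be kept. The number of mallocs that still occur during assembly can be checked with -info or MatGetInfo(); with accurate preallocation it should be zero.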
URL: From knepley at gmail.com Thu May 2 06:59:32 2019 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 2 May 2019 07:59:32 -0400 Subject: [petsc-users] [PETSc-Users] The MatSetValues takes too much time In-Reply-To: References: <0d0e9cb1-e8d3-f49f-f1f8-038cb71c18f1@tudelft.nl> Message-ID: There is also a manual section on doing this. Thanks, Matt On Thu, May 2, 2019 at 7:57 AM Mark Adams via petsc-users < petsc-users at mcs.anl.gov> wrote: > You need to set the preallocation for the matrix. > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatMPIAIJSetPreallocation.html > > On Thu, May 2, 2019 at 7:46 AM Dongyu Liu via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi, >> >> I am using PETSc sparse matrix ('aij') for a FEM program, when the >> matrix is really big, let's say 1million by 1million, we use setValues >> based on the index vector I and J, but the whole process takes around 2 >> hours. >> >> I don't know why it is like that, or is there any option that I should >> set to make this faster? >> >> >> Best, >> >> Dongyu >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Thu May 2 14:11:12 2019 From: zakaryah at gmail.com (zakaryah) Date: Thu, 2 May 2019 15:11:12 -0400 Subject: [petsc-users] Vector layout in PetscBinaryRead.m In-Reply-To: References: Message-ID: I get a segfault on VecViewFromOptions, with the command line you suggested. Likewise, if I execute ex10 (Tests I/O of vectors for different data formats (binary,HDF5) and illustrates the use of user-defined event logging), even with -n 1, I get a segfault. Is VecViewFromOptions documented somewhere? Google only turns up the version from 3.2, which took two arguments. From the source, it looks like a wrapper for PetscObjectViewFromOptions, but I still don't see all the format specifications. I'd like to test whether the same code crashes for other formats, like binary, matlab, etc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 2 14:29:57 2019 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 2 May 2019 15:29:57 -0400 Subject: [petsc-users] Vector layout in PetscBinaryRead.m In-Reply-To: References: Message-ID: On Thu, May 2, 2019 at 3:13 PM zakaryah via petsc-users < petsc-users at mcs.anl.gov> wrote: > I get a segfault on VecViewFromOptions, with the command line you > suggested. Likewise, if I execute ex10 (Tests I/O of vectors for > different data formats (binary,HDF5) and illustrates the use of > user-defined event logging), even with -n 1, I get a segfault. > > Is VecViewFromOptions documented somewhere? Google only turns up the > version from 3.2, which took two arguments. From the source, it looks like > a wrapper for PetscObjectViewFromOptions, but I still don't see all the > format specifications. I'd like to test whether the same code crashes for > other formats, like binary, matlab, etc. > I think it must be you call. Lets try and example first: cd $PETSC_DIR/src/snes/examples/tutorials make ex5 ./ex5 -mms 1 -sol_view hdf5:sol.h5 Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From zakaryah at gmail.com Thu May 2 14:32:06 2019 From: zakaryah at gmail.com (zakaryah) Date: Thu, 2 May 2019 15:32:06 -0400 Subject: [petsc-users] Vector layout in PetscBinaryRead.m In-Reply-To: References: Message-ID: Thanks Matt - that snippet ran fine. On Thu, May 2, 2019 at 3:30 PM Matthew Knepley wrote: > On Thu, May 2, 2019 at 3:13 PM zakaryah via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> I get a segfault on VecViewFromOptions, with the command line you >> suggested. Likewise, if I execute ex10 (Tests I/O of vectors for >> different data formats (binary,HDF5) and illustrates the use of >> user-defined event logging), even with -n 1, I get a segfault. >> >> Is VecViewFromOptions documented somewhere? Google only turns up the >> version from 3.2, which took two arguments. From the source, it looks like >> a wrapper for PetscObjectViewFromOptions, but I still don't see all the >> format specifications. I'd like to test whether the same code crashes for >> other formats, like binary, matlab, etc. >> > > I think it must be you call. Lets try and example first: > > cd $PETSC_DIR/src/snes/examples/tutorials > make ex5 > ./ex5 -mms 1 -sol_view hdf5:sol.h5 > > Thanks, > > Matt > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu May 2 16:07:28 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 2 May 2019 21:07:28 +0000 Subject: [petsc-users] Vector layout in PetscBinaryRead.m In-Reply-To: References: Message-ID: > On May 2, 2019, at 2:11 PM, zakaryah via petsc-users wrote: > > I get a segfault on VecViewFromOptions, with the command line you suggested. Likewise, if I execute ex10 (Tests I/O of vectors for different data formats (binary,HDF5) and illustrates the use of user-defined event logging), even with -n 1, I get a segfault. Please send the command line options you use and all the output with the error message. We would like for it to never crash with a segfault but rather for it to return a very useful error message if your syntax is wrong. Barry > > Is VecViewFromOptions documented somewhere? Google only turns up the version from 3.2, which took two arguments. From the source, it looks like a wrapper for PetscObjectViewFromOptions, but I still don't see all the format specifications. I'd like to test whether the same code crashes for other formats, like binary, matlab, etc. > > From cpraveen at gmail.com Fri May 3 01:42:09 2019 From: cpraveen at gmail.com (Praveen C) Date: Fri, 3 May 2019 08:42:09 +0200 Subject: [petsc-users] How to get Gmsh boundary edge/face tags Message-ID: <0880D591-CC96-476D-9119-12191DC39F30@gmail.com> Dear all In a 2d mesh I save some tags like this Physical Line(100) = {1,2}; Physical Line(200) = {3,4}; so that I can identify boundary portions on which bc is to be applied. How can I access this tag (100,200 in above example) after reading the msh file into a DMPlex ? 
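(A hedged sketch of the label lookup spelled out in the reply that follows, assuming the .msh file has already been read into a DM named dm. "Face Sets" is the label name the Gmsh reader normally assigns to Physical Line/Surface tags, as the reply confirms, and 100 is the tag from the question; error checking is abbreviated.)

    DMLabel        label;
    IS             bndIS;
    PetscInt       n;
    const PetscInt *points;

    ierr = DMGetLabel(dm, "Face Sets", &label);CHKERRQ(ierr);        /* run with -dm_view to list the label names */
    ierr = DMLabelGetStratumIS(label, 100, &bndIS);CHKERRQ(ierr);    /* all mesh points carrying tag 100          */
    if (bndIS) {
      ierr = ISGetLocalSize(bndIS, &n);CHKERRQ(ierr);
      ierr = ISGetIndices(bndIS, &points);CHKERRQ(ierr);
      /* apply the boundary condition on points[0..n-1] */
      ierr = ISRestoreIndices(bndIS, &points);CHKERRQ(ierr);
      ierr = ISDestroy(&bndIS);CHKERRQ(ierr);
    }

Going the other way, DMLabelGetValue(label, point, &value) returns the tag carried by a given mesh point, or -1 if it has none.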
Thanks praveen From knepley at gmail.com Fri May 3 05:11:32 2019 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 3 May 2019 06:11:32 -0400 Subject: [petsc-users] How to get Gmsh boundary edge/face tags In-Reply-To: <0880D591-CC96-476D-9119-12191DC39F30@gmail.com> References: <0880D591-CC96-476D-9119-12191DC39F30@gmail.com> Message-ID: On Fri, May 3, 2019 at 2:42 AM Praveen C via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear all > > In a 2d mesh I save some tags like this > > Physical Line(100) = {1,2}; > Physical Line(200) = {3,4}; > > so that I can identify boundary portions on which bc is to be applied. > > How can I access this tag (100,200 in above example) after reading the msh > file into a DMPlex ? > Plex uses DMLabel objects to store tags like this. You would first retrieve the label. We store them by name, determined from the input file. For GMsh I think it should be "Face Sets", DMGetLabel(dm, "Face Sets", &label); You can always check the names using -dm_view. Next you can get all the point with a given label values using DMLabelGetStratumIS(label, 3, &pointIS); or check the label of a particular point, DMLabelGetValue(label, point, &val); Thanks, Matt > Thanks > praveen -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From myriam.peyrounette at idris.fr Fri May 3 09:11:18 2019 From: myriam.peyrounette at idris.fr (Myriam Peyrounette) Date: Fri, 3 May 2019 16:11:18 +0200 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: References: <00dfa074-cc41-6a3b-257d-28a089e90617@idris.fr> <877ecaeo63.fsf@jedbrown.org> <9629bc17-89d4-fdc9-1b44-7c4a25f62861@idris.fr> <3ad6ca66-d665-739a-874e-7599d5270797@idris.fr> <877ec17g52.fsf@jedbrown.org> <87zhox5gxi.fsf@jedbrown.org> <93a32d83-0b81-8bf3-d654-6711d9b0138f@idris.fr> <26b73c92-6a23-cf03-9e7f-1a24893ee512@idris.fr> <31946231-4948-fc6a-093a-7ed8f00f3579@idris.fr> Message-ID: <13df6685-2825-3629-0c79-cb66f4deae22@idris.fr> Hi, I plotted new scalings (memory and time) using the new algorithms. I used the options /-options_left true /to make sure that the options are effectively used. They are. I don't have access to the platform I used to run my computations on, so I ran them on a different one. In particular, I can't reach problem size = 1e8 and the values might be different from the previous scalings I sent you. But the comparison of the PETSc versions and options is still relevant. I plotted the scalings of reference: the "good" one (PETSc 3.6.4) in green, the "bad" one (PETSc 3.10.2) in blue. I used the commit d330a26 (3.11.1) for all the other scalings, adding different sets of options: /Light blue/ -> -matptap_via allatonce??-mat_freeintermediatedatastructures 1 /Orange/ -> -matptap_via allatonce_*merged*?-mat_freeintermediatedatastructures 1 /Purple/ -> -matptap_via allatonce??-mat_freeintermediatedatastructures 1 *-inner_diag_matmatmult_via scalable -inner_offdiag_matmatmult_via scalable* /Yellow/: -matptap_via allatonce_*merged*?-mat_freeintermediatedatastructures 1 *-inner_diag_matmatmult_via scalable -inner_offdiag_matmatmult_via scalable* Conclusion: with regard to memory, the two algorithms imply a similarly good improvement of the scaling. The use of the -inner_(off)diag_matmatmult_via options is also very interesting. 
The scaling is still not as good as 3.6.4 though. With regard to time, I noted a real improvement in time execution! I used to spend 200-300s on these executions. Now they take 10-15s. Beside that, the "_merged" versions are more efficient. And the -inner_(off)diaf_matmatmult_via options are slightly expensive but it is not critical. What do you think? Is it possible to match again the scaling of PETSc 3.6.4? Is it worthy keeping investigating? Myriam Le 04/30/19 ? 17:00, Fande Kong a ?crit?: > HI Myriam, > > We are interesting how the new algorithms perform. So there are two > new algorithms you could try. > > Algorithm 1: > > -matptap_via allatonce??-mat_freeintermediatedatastructures 1 > > Algorithm 2: > > -matptap_via allatonce_merged?-mat_freeintermediatedatastructures 1 > > > Note that you need to use the current petsc-master, and also please > put "-snes_view" in your script so that we can confirm these options > are actually get set. > > Thanks, > > Fande, > > > On Tue, Apr 30, 2019 at 2:26 AM Myriam Peyrounette via petsc-users > > wrote: > > Hi, > > that's really good news for us, thanks! I will plot again the > memory scaling using these new options and let you know. Next week > I hope. > > Before that, I just need to clarify the situation. Throughout our > discussions, we mentionned a number of options concerning the > scalability: > > -matptatp_via scalable > -inner_diag_matmatmult_via scalable > -inner_diag_matmatmult_via scalable > -mat_freeintermediatedatastructures > -matptap_via allatonce > -matptap_via allatonce_merged > > Which ones of them are compatible? Should I use all of them at the > same time? Is there redundancy? > > Thanks, > > Myriam > > > Le 04/25/19 ? 21:47, Zhang, Hong a ?crit?: >> Myriam: >> Checking MatPtAP() in petsc-3.6.4, I realized that it uses >> different algorithm than petsc-10 and later versions. petsc-3.6 >> uses out-product for C=P^T * AP, while petsc-3.10 uses local >> transpose of P. petsc-3.10 accelerates data accessing, but >> doubles the memory of P.? >> >> Fande added two new implementations for MatPtAP() to petsc-master >> which use much smaller and scalable memories with slightly higher >> computing time (faster than hypre though). You may use these new >> implementations if you have concern on memory scalability. The >> option for these new implementation are:? >> -matptap_via allatonce >> -matptap_via allatonce_merged >> >> Hong >> >> On Mon, Apr 15, 2019 at 12:10 PM hzhang at mcs.anl.gov >> > > wrote: >> >> Myriam: >> Thank you very much for providing these results! >> I have put effort to accelerate execution time and avoid >> using global sizes in PtAP, for which the algorithm of >> transpose of P_local and P_other likely doubles the memory >> usage. I'll try to investigate why it becomes unscalable. >> Hong >> >> Hi, >> >> you'll find the new scaling attached (green line). I used >> the version 3.11 and the four scalability options : >> -matptap_via scalable >> -inner_diag_matmatmult_via scalable >> -inner_offdiag_matmatmult_via scalable >> -mat_freeintermediatedatastructures >> >> The scaling is much better! The code even uses less >> memory for the smallest cases. There is still an increase >> for the larger one. >> >> With regard to the time scaling, I used KSPView and >> LogView on the two previous scalings (blue and yellow >> lines) but not on the last one (green line). So we can't >> really compare them, am I right? However, we can see that >> the new time scaling looks quite good. It slightly >> increases from ~8s to ~27s. 
>> >> Unfortunately, the computations are expensive so I would >> like to avoid re-run them if possible. How relevant would >> be a proper time scaling for you?? >> >> Myriam >> >> >> Le 04/12/19 ? 18:18, Zhang, Hong a ?crit?: >>> Myriam : >>> Thanks for your effort. It will help us improve PETSc. >>> Hong >>> >>> Hi all, >>> >>> I used the wrong script, that's why it diverged... >>> Sorry about that.? >>> I tried again with the right script applied on a >>> tiny problem (~200 >>> elements). I can see a small difference in memory >>> usage (gain ~ 1mB). >>> when adding the -mat_freeintermediatestructures >>> option. I still have to >>> execute larger cases to plot the scaling. The >>> supercomputer I am used to >>> run my jobs on is really busy at the moment so it >>> takes a while. I hope >>> I'll send you the results on Monday. >>> >>> Thanks everyone, >>> >>> Myriam >>> >>> >>> Le 04/11/19 ? 06:01, Jed Brown a ?crit?: >>> > "Zhang, Hong" >> > writes: >>> > >>> >> Jed: >>> >>>> Myriam, >>> >>>> Thanks for the plot. >>> '-mat_freeintermediatedatastructures' should not >>> affect solution. It releases almost half of memory >>> in C=PtAP if C is not reused. >>> >>> And yet if turning it on causes divergence, that >>> would imply a bug. >>> >>> Hong, are you able to reproduce the experiment >>> to see the memory >>> >>> scaling? >>> >> I like to test his code using an alcf machine, >>> but my hands are full now. I'll try it as soon as I >>> find time, hopefully next week. >>> > I have now compiled and run her code locally. >>> > >>> > Myriam, thanks for your last mail adding >>> configuration and removing the >>> > MemManager.h dependency.? I ran with and without >>> > -mat_freeintermediatedatastructures and don't see >>> a difference in >>> > convergence.? What commands did you run to observe >>> that difference? >>> >>> -- >>> Myriam Peyrounette >>> CNRS/IDRIS - HLST >>> -- >>> >>> >> >> -- >> Myriam Peyrounette >> CNRS/IDRIS - HLST >> -- >> > > -- > Myriam Peyrounette > CNRS/IDRIS - HLST > -- > -- Myriam Peyrounette CNRS/IDRIS - HLST -- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2975 bytes Desc: Signature cryptographique S/MIME URL: From myriam.peyrounette at idris.fr Fri May 3 09:14:23 2019 From: myriam.peyrounette at idris.fr (Myriam Peyrounette) Date: Fri, 3 May 2019 16:14:23 +0200 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: <13df6685-2825-3629-0c79-cb66f4deae22@idris.fr> References: <877ecaeo63.fsf@jedbrown.org> <9629bc17-89d4-fdc9-1b44-7c4a25f62861@idris.fr> <3ad6ca66-d665-739a-874e-7599d5270797@idris.fr> <877ec17g52.fsf@jedbrown.org> <87zhox5gxi.fsf@jedbrown.org> <93a32d83-0b81-8bf3-d654-6711d9b0138f@idris.fr> <26b73c92-6a23-cf03-9e7f-1a24893ee512@idris.fr> <31946231-4948-fc6a-093a-7ed8f00f3579@idris.fr> <13df6685-2825-3629-0c79-cb66f4deae22@idris.fr> Message-ID: And the attached files... Sorry Le 05/03/19 ? 16:11, Myriam Peyrounette a ?crit?: > > Hi, > > I plotted new scalings (memory and time) using the new algorithms. I > used the options /-options_left true /to make sure that the options > are effectively used. They are. > > I don't have access to the platform I used to run my computations on, > so I ran them on a different one. In particular, I can't reach problem > size = 1e8 and the values might be different from the previous > scalings I sent you. 
But the comparison of the PETSc versions and > options is still relevant. > > I plotted the scalings of reference: the "good" one (PETSc 3.6.4) in > green, the "bad" one (PETSc 3.10.2) in blue. > > I used the commit d330a26 (3.11.1) for all the other scalings, adding > different sets of options: > > /Light blue/ -> -matptap_via > allatonce??-mat_freeintermediatedatastructures 1 > /Orange/ -> -matptap_via > allatonce_*merged*?-mat_freeintermediatedatastructures 1 > /Purple/ -> -matptap_via > allatonce??-mat_freeintermediatedatastructures 1 > *-inner_diag_matmatmult_via scalable -inner_offdiag_matmatmult_via > scalable* > /Yellow/: -matptap_via > allatonce_*merged*?-mat_freeintermediatedatastructures 1 > *-inner_diag_matmatmult_via scalable -inner_offdiag_matmatmult_via > scalable* > > Conclusion: with regard to memory, the two algorithms imply a > similarly good improvement of the scaling. The use of the > -inner_(off)diag_matmatmult_via options is also very interesting. The > scaling is still not as good as 3.6.4 though. > With regard to time, I noted a real improvement in time execution! I > used to spend 200-300s on these executions. Now they take 10-15s. > Beside that, the "_merged" versions are more efficient. And the > -inner_(off)diaf_matmatmult_via options are slightly expensive but it > is not critical. > > What do you think? Is it possible to match again the scaling of PETSc > 3.6.4? Is it worthy keeping investigating? > > Myriam > > > Le 04/30/19 ? 17:00, Fande Kong a ?crit?: >> HI Myriam, >> >> We are interesting how the new algorithms perform. So there are two >> new algorithms you could try. >> >> Algorithm 1: >> >> -matptap_via allatonce??-mat_freeintermediatedatastructures 1 >> >> Algorithm 2: >> >> -matptap_via allatonce_merged?-mat_freeintermediatedatastructures 1 >> >> >> Note that you need to use the current petsc-master, and also please >> put "-snes_view" in your script so that we can confirm these options >> are actually get set. >> >> Thanks, >> >> Fande, >> >> >> On Tue, Apr 30, 2019 at 2:26 AM Myriam Peyrounette via petsc-users >> > wrote: >> >> Hi, >> >> that's really good news for us, thanks! I will plot again the >> memory scaling using these new options and let you know. Next >> week I hope. >> >> Before that, I just need to clarify the situation. Throughout our >> discussions, we mentionned a number of options concerning the >> scalability: >> >> -matptatp_via scalable >> -inner_diag_matmatmult_via scalable >> -inner_diag_matmatmult_via scalable >> -mat_freeintermediatedatastructures >> -matptap_via allatonce >> -matptap_via allatonce_merged >> >> Which ones of them are compatible? Should I use all of them at >> the same time? Is there redundancy? >> >> Thanks, >> >> Myriam >> >> >> Le 04/25/19 ? 21:47, Zhang, Hong a ?crit?: >>> Myriam: >>> Checking MatPtAP() in petsc-3.6.4, I realized that it uses >>> different algorithm than petsc-10 and later versions. petsc-3.6 >>> uses out-product for C=P^T * AP, while petsc-3.10 uses local >>> transpose of P. petsc-3.10 accelerates data accessing, but >>> doubles the memory of P.? >>> >>> Fande added two new implementations for MatPtAP() to >>> petsc-master which use much smaller and scalable memories with >>> slightly higher computing time (faster than hypre though). You >>> may use these new implementations if you have concern on memory >>> scalability. The option for these new implementation are:? 
>>> -matptap_via allatonce >>> -matptap_via allatonce_merged >>> >>> Hong >>> >>> On Mon, Apr 15, 2019 at 12:10 PM hzhang at mcs.anl.gov >>> >> > wrote: >>> >>> Myriam: >>> Thank you very much for providing these results! >>> I have put effort to accelerate execution time and avoid >>> using global sizes in PtAP, for which the algorithm of >>> transpose of P_local and P_other likely doubles the memory >>> usage. I'll try to investigate why it becomes unscalable. >>> Hong >>> >>> Hi, >>> >>> you'll find the new scaling attached (green line). I >>> used the version 3.11 and the four scalability options : >>> -matptap_via scalable >>> -inner_diag_matmatmult_via scalable >>> -inner_offdiag_matmatmult_via scalable >>> -mat_freeintermediatedatastructures >>> >>> The scaling is much better! The code even uses less >>> memory for the smallest cases. There is still an >>> increase for the larger one. >>> >>> With regard to the time scaling, I used KSPView and >>> LogView on the two previous scalings (blue and yellow >>> lines) but not on the last one (green line). So we can't >>> really compare them, am I right? However, we can see >>> that the new time scaling looks quite good. It slightly >>> increases from ~8s to ~27s. >>> >>> Unfortunately, the computations are expensive so I would >>> like to avoid re-run them if possible. How relevant >>> would be a proper time scaling for you?? >>> >>> Myriam >>> >>> >>> Le 04/12/19 ? 18:18, Zhang, Hong a ?crit?: >>>> Myriam : >>>> Thanks for your effort. It will help us improve PETSc. >>>> Hong >>>> >>>> Hi all, >>>> >>>> I used the wrong script, that's why it diverged... >>>> Sorry about that.? >>>> I tried again with the right script applied on a >>>> tiny problem (~200 >>>> elements). I can see a small difference in memory >>>> usage (gain ~ 1mB). >>>> when adding the -mat_freeintermediatestructures >>>> option. I still have to >>>> execute larger cases to plot the scaling. The >>>> supercomputer I am used to >>>> run my jobs on is really busy at the moment so it >>>> takes a while. I hope >>>> I'll send you the results on Monday. >>>> >>>> Thanks everyone, >>>> >>>> Myriam >>>> >>>> >>>> Le 04/11/19 ? 06:01, Jed Brown a ?crit?: >>>> > "Zhang, Hong" >>> > writes: >>>> > >>>> >> Jed: >>>> >>>> Myriam, >>>> >>>> Thanks for the plot. >>>> '-mat_freeintermediatedatastructures' should not >>>> affect solution. It releases almost half of memory >>>> in C=PtAP if C is not reused. >>>> >>> And yet if turning it on causes divergence, >>>> that would imply a bug. >>>> >>> Hong, are you able to reproduce the experiment >>>> to see the memory >>>> >>> scaling? >>>> >> I like to test his code using an alcf machine, >>>> but my hands are full now. I'll try it as soon as I >>>> find time, hopefully next week. >>>> > I have now compiled and run her code locally. >>>> > >>>> > Myriam, thanks for your last mail adding >>>> configuration and removing the >>>> > MemManager.h dependency.? I ran with and without >>>> > -mat_freeintermediatedatastructures and don't see >>>> a difference in >>>> > convergence.? What commands did you run to >>>> observe that difference? >>>> >>>> -- >>>> Myriam Peyrounette >>>> CNRS/IDRIS - HLST >>>> -- >>>> >>>> >>> >>> -- >>> Myriam Peyrounette >>> CNRS/IDRIS - HLST >>> -- >>> >> >> -- >> Myriam Peyrounette >> CNRS/IDRIS - HLST >> -- >> > > -- > Myriam Peyrounette > CNRS/IDRIS - HLST > -- -- Myriam Peyrounette CNRS/IDRIS - HLST -- -------------- next part -------------- An HTML attachment was scrubbed... 
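(For anyone reproducing this comparison: a typical run line, with the executable name and process count as placeholders, combines one of the two option sets discussed above with PETSc's own reporting flags; -memory_view prints a memory summary, -log_view the timing breakdown, and -options_left confirms nothing was silently ignored.)

    mpiexec -n 64 ./my_app \
        -matptap_via allatonce -mat_freeintermediatedatastructures 1 \
        -log_view -memory_view -options_left

    mpiexec -n 64 ./my_app \
        -matptap_via allatonce_merged -mat_freeintermediatedatastructures 1 \
        -log_view -memory_view -options_left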
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex42_mem_scaling_ada.png Type: image/png Size: 48984 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex42_time_scaling_ada.png Type: image/png Size: 36796 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2975 bytes Desc: Signature cryptographique S/MIME URL: From hzhang at mcs.anl.gov Fri May 3 09:34:58 2019 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Fri, 3 May 2019 14:34:58 +0000 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: References: <877ecaeo63.fsf@jedbrown.org> <9629bc17-89d4-fdc9-1b44-7c4a25f62861@idris.fr> <3ad6ca66-d665-739a-874e-7599d5270797@idris.fr> <877ec17g52.fsf@jedbrown.org> <87zhox5gxi.fsf@jedbrown.org> <93a32d83-0b81-8bf3-d654-6711d9b0138f@idris.fr> <26b73c92-6a23-cf03-9e7f-1a24893ee512@idris.fr> <31946231-4948-fc6a-093a-7ed8f00f3579@idris.fr> <13df6685-2825-3629-0c79-cb66f4deae22@idris.fr> Message-ID: Myriam: Very interesting results. Do you have time for petsc-3.10 (blue) and 3.6 (green)? I do not understand why all algorithms gives non-scalable memory performance except petsc-3.6. We can easily resume petsc-3.6's MatPtAP though. Hong And the attached files... Sorry Le 05/03/19 ? 16:11, Myriam Peyrounette a ?crit : Hi, I plotted new scalings (memory and time) using the new algorithms. I used the options -options_left true to make sure that the options are effectively used. They are. I don't have access to the platform I used to run my computations on, so I ran them on a different one. In particular, I can't reach problem size = 1e8 and the values might be different from the previous scalings I sent you. But the comparison of the PETSc versions and options is still relevant. I plotted the scalings of reference: the "good" one (PETSc 3.6.4) in green, the "bad" one (PETSc 3.10.2) in blue. I used the commit d330a26 (3.11.1) for all the other scalings, adding different sets of options: Light blue -> -matptap_via allatonce -mat_freeintermediatedatastructures 1 Orange -> -matptap_via allatonce_merged -mat_freeintermediatedatastructures 1 Purple -> -matptap_via allatonce -mat_freeintermediatedatastructures 1 -inner_diag_matmatmult_via scalable -inner_offdiag_matmatmult_via scalable Yellow: -matptap_via allatonce_merged -mat_freeintermediatedatastructures 1 -inner_diag_matmatmult_via scalable -inner_offdiag_matmatmult_via scalable Conclusion: with regard to memory, the two algorithms imply a similarly good improvement of the scaling. The use of the -inner_(off)diag_matmatmult_via options is also very interesting. The scaling is still not as good as 3.6.4 though. With regard to time, I noted a real improvement in time execution! I used to spend 200-300s on these executions. Now they take 10-15s. Beside that, the "_merged" versions are more efficient. And the -inner_(off)diaf_matmatmult_via options are slightly expensive but it is not critical. What do you think? Is it possible to match again the scaling of PETSc 3.6.4? Is it worthy keeping investigating? Myriam Le 04/30/19 ? 17:00, Fande Kong a ?crit : HI Myriam, We are interesting how the new algorithms perform. So there are two new algorithms you could try. 
Algorithm 1: -matptap_via allatonce -mat_freeintermediatedatastructures 1 Algorithm 2: -matptap_via allatonce_merged -mat_freeintermediatedatastructures 1 Note that you need to use the current petsc-master, and also please put "-snes_view" in your script so that we can confirm these options are actually get set. Thanks, Fande, On Tue, Apr 30, 2019 at 2:26 AM Myriam Peyrounette via petsc-users > wrote: Hi, that's really good news for us, thanks! I will plot again the memory scaling using these new options and let you know. Next week I hope. Before that, I just need to clarify the situation. Throughout our discussions, we mentionned a number of options concerning the scalability: -matptatp_via scalable -inner_diag_matmatmult_via scalable -inner_diag_matmatmult_via scalable -mat_freeintermediatedatastructures -matptap_via allatonce -matptap_via allatonce_merged Which ones of them are compatible? Should I use all of them at the same time? Is there redundancy? Thanks, Myriam Le 04/25/19 ? 21:47, Zhang, Hong a ?crit : Myriam: Checking MatPtAP() in petsc-3.6.4, I realized that it uses different algorithm than petsc-10 and later versions. petsc-3.6 uses out-product for C=P^T * AP, while petsc-3.10 uses local transpose of P. petsc-3.10 accelerates data accessing, but doubles the memory of P. Fande added two new implementations for MatPtAP() to petsc-master which use much smaller and scalable memories with slightly higher computing time (faster than hypre though). You may use these new implementations if you have concern on memory scalability. The option for these new implementation are: -matptap_via allatonce -matptap_via allatonce_merged Hong On Mon, Apr 15, 2019 at 12:10 PM hzhang at mcs.anl.gov > wrote: Myriam: Thank you very much for providing these results! I have put effort to accelerate execution time and avoid using global sizes in PtAP, for which the algorithm of transpose of P_local and P_other likely doubles the memory usage. I'll try to investigate why it becomes unscalable. Hong Hi, you'll find the new scaling attached (green line). I used the version 3.11 and the four scalability options : -matptap_via scalable -inner_diag_matmatmult_via scalable -inner_offdiag_matmatmult_via scalable -mat_freeintermediatedatastructures The scaling is much better! The code even uses less memory for the smallest cases. There is still an increase for the larger one. With regard to the time scaling, I used KSPView and LogView on the two previous scalings (blue and yellow lines) but not on the last one (green line). So we can't really compare them, am I right? However, we can see that the new time scaling looks quite good. It slightly increases from ~8s to ~27s. Unfortunately, the computations are expensive so I would like to avoid re-run them if possible. How relevant would be a proper time scaling for you? Myriam Le 04/12/19 ? 18:18, Zhang, Hong a ?crit : Myriam : Thanks for your effort. It will help us improve PETSc. Hong Hi all, I used the wrong script, that's why it diverged... Sorry about that. I tried again with the right script applied on a tiny problem (~200 elements). I can see a small difference in memory usage (gain ~ 1mB). when adding the -mat_freeintermediatestructures option. I still have to execute larger cases to plot the scaling. The supercomputer I am used to run my jobs on is really busy at the moment so it takes a while. I hope I'll send you the results on Monday. Thanks everyone, Myriam Le 04/11/19 ? 
06:01, Jed Brown a ?crit : > "Zhang, Hong" > writes: > >> Jed: >>>> Myriam, >>>> Thanks for the plot. '-mat_freeintermediatedatastructures' should not affect solution. It releases almost half of memory in C=PtAP if C is not reused. >>> And yet if turning it on causes divergence, that would imply a bug. >>> Hong, are you able to reproduce the experiment to see the memory >>> scaling? >> I like to test his code using an alcf machine, but my hands are full now. I'll try it as soon as I find time, hopefully next week. > I have now compiled and run her code locally. > > Myriam, thanks for your last mail adding configuration and removing the > MemManager.h dependency. I ran with and without > -mat_freeintermediatedatastructures and don't see a difference in > convergence. What commands did you run to observe that difference? -- Myriam Peyrounette CNRS/IDRIS - HLST -- -- Myriam Peyrounette CNRS/IDRIS - HLST -- -- Myriam Peyrounette CNRS/IDRIS - HLST -- -- Myriam Peyrounette CNRS/IDRIS - HLST -- -- Myriam Peyrounette CNRS/IDRIS - HLST -- -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Fri May 3 09:45:20 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 3 May 2019 08:45:20 -0600 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: <13df6685-2825-3629-0c79-cb66f4deae22@idris.fr> References: <00dfa074-cc41-6a3b-257d-28a089e90617@idris.fr> <877ecaeo63.fsf@jedbrown.org> <9629bc17-89d4-fdc9-1b44-7c4a25f62861@idris.fr> <3ad6ca66-d665-739a-874e-7599d5270797@idris.fr> <877ec17g52.fsf@jedbrown.org> <87zhox5gxi.fsf@jedbrown.org> <93a32d83-0b81-8bf3-d654-6711d9b0138f@idris.fr> <26b73c92-6a23-cf03-9e7f-1a24893ee512@idris.fr> <31946231-4948-fc6a-093a-7ed8f00f3579@idris.fr> <13df6685-2825-3629-0c79-cb66f4deae22@idris.fr> Message-ID: On Fri, May 3, 2019 at 8:11 AM Myriam Peyrounette < myriam.peyrounette at idris.fr> wrote: > Hi, > > I plotted new scalings (memory and time) using the new algorithms. I used > the options *-options_left true *to make sure that the options are > effectively used. They are. > > I don't have access to the platform I used to run my computations on, so I > ran them on a different one. In particular, I can't reach problem size = > 1e8 and the values might be different from the previous scalings I sent > you. But the comparison of the PETSc versions and options is still > relevant. > > I plotted the scalings of reference: the "good" one (PETSc 3.6.4) in > green, the "bad" one (PETSc 3.10.2) in blue. > > I used the commit d330a26 (3.11.1) for all the other scalings, adding > different sets of options: > > *Light blue* -> -matptap_via > allatonce -mat_freeintermediatedatastructures 1 > *Orange* -> -matptap_via allatonce_*merged* -mat_freeintermediatedatastructures > 1 > As said earlier, you should use these two combinations only. > *Purple* -> -matptap_via allatonce -mat_freeintermediatedatastructures 1 *-inner_diag_matmatmult_via > scalable -inner_offdiag_matmatmult_via scalable* > Do not use it since it does not make any sense. The new algorithm does not need *-inner_diag_matmatmult_via scalable -inner_offdiag_matmatmult_via scalable* > *Yellow*: -matptap_via allatonce_*merged* -mat_freeintermediatedatastructures > 1 *-inner_diag_matmatmult_via scalable -inner_offdiag_matmatmult_via > scalable* > Do not use these. > Conclusion: with regard to memory, the two algorithms imply a similarly > good improvement of the scaling. The use of the > -inner_(off)diag_matmatmult_via options is also very interesting. 
> The use of the -inner_(off)diag_matmatmult_via should not change anything since I do not need these options at all in ``allatonce" and ``allatonce_merged". Thanks, Fande, The scaling is still not as good as 3.6.4 though. > With regard to time, I noted a real improvement in time execution! I used > to spend 200-300s on these executions > Now they take 10-15s. > It is interesting. I observed this similar behavior when the problem size is small for mat/ex96.c. Their performance will be very close when the problem is large. Thanks, Fande, Beside that, the "_merged" versions are more efficient. And the > -inner_(off)diaf_matmatmult_via options are slightly expensive but it is > not critical. > > What do you think? Is it possible to match again the scaling of PETSc > 3.6.4? Is it worthy keeping investigating? > > Myriam > > > Le 04/30/19 ? 17:00, Fande Kong a ?crit : > > HI Myriam, > > We are interesting how the new algorithms perform. So there are two new > algorithms you could try. > > Algorithm 1: > > -matptap_via allatonce -mat_freeintermediatedatastructures 1 > > Algorithm 2: > > -matptap_via allatonce_merged -mat_freeintermediatedatastructures 1 > > > Note that you need to use the current petsc-master, and also please put > "-snes_view" in your script so that we can confirm these options are > actually get set. > > Thanks, > > Fande, > > > On Tue, Apr 30, 2019 at 2:26 AM Myriam Peyrounette via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi, >> >> that's really good news for us, thanks! I will plot again the memory >> scaling using these new options and let you know. Next week I hope. >> >> Before that, I just need to clarify the situation. Throughout our >> discussions, we mentionned a number of options concerning the scalability: >> >> -matptatp_via scalable >> -inner_diag_matmatmult_via scalable >> -inner_diag_matmatmult_via scalable >> -mat_freeintermediatedatastructures >> -matptap_via allatonce >> -matptap_via allatonce_merged >> >> Which ones of them are compatible? Should I use all of them at the same >> time? Is there redundancy? >> >> Thanks, >> >> Myriam >> >> Le 04/25/19 ? 21:47, Zhang, Hong a ?crit : >> >> Myriam: >> Checking MatPtAP() in petsc-3.6.4, I realized that it uses different >> algorithm than petsc-10 and later versions. petsc-3.6 uses out-product for >> C=P^T * AP, while petsc-3.10 uses local transpose of P. petsc-3.10 >> accelerates data accessing, but doubles the memory of P. >> >> Fande added two new implementations for MatPtAP() to petsc-master which >> use much smaller and scalable memories with slightly higher computing time >> (faster than hypre though). You may use these new implementations if you >> have concern on memory scalability. The option for these new implementation >> are: >> -matptap_via allatonce >> -matptap_via allatonce_merged >> >> Hong >> >> On Mon, Apr 15, 2019 at 12:10 PM hzhang at mcs.anl.gov >> wrote: >> >>> Myriam: >>> Thank you very much for providing these results! >>> I have put effort to accelerate execution time and avoid using global >>> sizes in PtAP, for which the algorithm of transpose of P_local and P_other >>> likely doubles the memory usage. I'll try to investigate why it becomes >>> unscalable. >>> Hong >>> >>>> Hi, >>>> >>>> you'll find the new scaling attached (green line). 
I used the version >>>> 3.11 and the four scalability options : >>>> -matptap_via scalable >>>> -inner_diag_matmatmult_via scalable >>>> -inner_offdiag_matmatmult_via scalable >>>> -mat_freeintermediatedatastructures >>>> >>>> The scaling is much better! The code even uses less memory for the >>>> smallest cases. There is still an increase for the larger one. >>>> >>>> With regard to the time scaling, I used KSPView and LogView on the two >>>> previous scalings (blue and yellow lines) but not on the last one (green >>>> line). So we can't really compare them, am I right? However, we can see >>>> that the new time scaling looks quite good. It slightly increases from ~8s >>>> to ~27s. >>>> >>>> Unfortunately, the computations are expensive so I would like to avoid >>>> re-run them if possible. How relevant would be a proper time scaling for >>>> you? >>>> >>>> Myriam >>>> >>>> Le 04/12/19 ? 18:18, Zhang, Hong a ?crit : >>>> >>>> Myriam : >>>> Thanks for your effort. It will help us improve PETSc. >>>> Hong >>>> >>>> Hi all, >>>>> >>>>> I used the wrong script, that's why it diverged... Sorry about that. >>>>> I tried again with the right script applied on a tiny problem (~200 >>>>> elements). I can see a small difference in memory usage (gain ~ 1mB). >>>>> when adding the -mat_freeintermediatestructures option. I still have to >>>>> execute larger cases to plot the scaling. The supercomputer I am used >>>>> to >>>>> run my jobs on is really busy at the moment so it takes a while. I hope >>>>> I'll send you the results on Monday. >>>>> >>>>> Thanks everyone, >>>>> >>>>> Myriam >>>>> >>>>> >>>>> Le 04/11/19 ? 06:01, Jed Brown a ?crit : >>>>> > "Zhang, Hong" writes: >>>>> > >>>>> >> Jed: >>>>> >>>> Myriam, >>>>> >>>> Thanks for the plot. '-mat_freeintermediatedatastructures' should >>>>> not affect solution. It releases almost half of memory in C=PtAP if C is >>>>> not reused. >>>>> >>> And yet if turning it on causes divergence, that would imply a bug. >>>>> >>> Hong, are you able to reproduce the experiment to see the memory >>>>> >>> scaling? >>>>> >> I like to test his code using an alcf machine, but my hands are >>>>> full now. I'll try it as soon as I find time, hopefully next week. >>>>> > I have now compiled and run her code locally. >>>>> > >>>>> > Myriam, thanks for your last mail adding configuration and removing >>>>> the >>>>> > MemManager.h dependency. I ran with and without >>>>> > -mat_freeintermediatedatastructures and don't see a difference in >>>>> > convergence. What commands did you run to observe that difference? >>>>> >>>>> -- >>>>> Myriam Peyrounette >>>>> CNRS/IDRIS - HLST >>>>> -- >>>>> >>>>> >>>>> >>>> -- >>>> Myriam Peyrounette >>>> CNRS/IDRIS - HLST >>>> -- >>>> >>>> >> -- >> Myriam Peyrounette >> CNRS/IDRIS - HLST >> -- >> >> > -- > Myriam Peyrounette > CNRS/IDRIS - HLST > -- > > -------------- next part -------------- An HTML attachment was scrubbed... 
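(The next message asks how the memory is measured and whether the PtAP product can be isolated. A hedged sketch of one way to bracket an explicit MatPtAP call; the routines are standard PETSc, the fill estimate of 2.0 and the matrix names are placeholders, and inside PCGAMG the same product is triggered internally rather than called like this.)

    PetscLogDouble mem_before, mem_after;
    Mat            A, P, C;

    ierr = PetscMemoryGetCurrentUsage(&mem_before);CHKERRQ(ierr);
    ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, &C);CHKERRQ(ierr);
    ierr = PetscMemoryGetCurrentUsage(&mem_after);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_SELF, "[PtAP] resident-set growth on this rank: %g bytes\n",
                       (double)(mem_after - mem_before));CHKERRQ(ierr);

The numbers are per rank, so a reduction (or simply -memory_view at the end of the run) is needed for a global picture; wrapping the call in its own log stage with PetscLogStageRegister/PetscLogStagePush also makes -log_view attribute the objects created there to that stage.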
URL: From fdkong.jd at gmail.com Fri May 3 10:26:50 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 3 May 2019 09:26:50 -0600 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: References: <877ecaeo63.fsf@jedbrown.org> <9629bc17-89d4-fdc9-1b44-7c4a25f62861@idris.fr> <3ad6ca66-d665-739a-874e-7599d5270797@idris.fr> <877ec17g52.fsf@jedbrown.org> <87zhox5gxi.fsf@jedbrown.org> <93a32d83-0b81-8bf3-d654-6711d9b0138f@idris.fr> <26b73c92-6a23-cf03-9e7f-1a24893ee512@idris.fr> <31946231-4948-fc6a-093a-7ed8f00f3579@idris.fr> <13df6685-2825-3629-0c79-cb66f4deae22@idris.fr> Message-ID: Thanks for your plots. The new algorithms should be scalable in terms of the memory usage. I am puzzled by these plots since the memory usage increases exponentially. It may come from somewhere else? How do you measure the memory? The memory is for the entire simulation or just PtAP? Could you measure the memory for PtAP only? Maybe several factors affect the memory usage not only PtAP. I will grab some data from my own simulations. Are you running ex43? Fande, On Fri, May 3, 2019 at 8:14 AM Myriam Peyrounette < myriam.peyrounette at idris.fr> wrote: > And the attached files... Sorry > > Le 05/03/19 ? 16:11, Myriam Peyrounette a ?crit : > > Hi, > > I plotted new scalings (memory and time) using the new algorithms. I used > the options *-options_left true *to make sure that the options are > effectively used. They are. > > I don't have access to the platform I used to run my computations on, so I > ran them on a different one. In particular, I can't reach problem size = > 1e8 and the values might be different from the previous scalings I sent > you. But the comparison of the PETSc versions and options is still > relevant. > > I plotted the scalings of reference: the "good" one (PETSc 3.6.4) in > green, the "bad" one (PETSc 3.10.2) in blue. > > I used the commit d330a26 (3.11.1) for all the other scalings, adding > different sets of options: > > *Light blue* -> -matptap_via > allatonce -mat_freeintermediatedatastructures 1 > *Orange* -> -matptap_via allatonce_*merged* -mat_freeintermediatedatastructures > 1 > *Purple* -> -matptap_via allatonce -mat_freeintermediatedatastructures 1 *-inner_diag_matmatmult_via > scalable -inner_offdiag_matmatmult_via scalable* > *Yellow*: -matptap_via allatonce_*merged* -mat_freeintermediatedatastructures > 1 *-inner_diag_matmatmult_via scalable -inner_offdiag_matmatmult_via > scalable* > > Conclusion: with regard to memory, the two algorithms imply a similarly > good improvement of the scaling. The use of the > -inner_(off)diag_matmatmult_via options is also very interesting. The > scaling is still not as good as 3.6.4 though. > With regard to time, I noted a real improvement in time execution! I used > to spend 200-300s on these executions. Now they take 10-15s. Beside that, > the "_merged" versions are more efficient. And the > -inner_(off)diaf_matmatmult_via options are slightly expensive but it is > not critical. > > What do you think? Is it possible to match again the scaling of PETSc > 3.6.4? Is it worthy keeping investigating? > > Myriam > > > Le 04/30/19 ? 17:00, Fande Kong a ?crit : > > HI Myriam, > > We are interesting how the new algorithms perform. So there are two new > algorithms you could try. 
> > Algorithm 1: > > -matptap_via allatonce -mat_freeintermediatedatastructures 1 > > Algorithm 2: > > -matptap_via allatonce_merged -mat_freeintermediatedatastructures 1 > > > Note that you need to use the current petsc-master, and also please put > "-snes_view" in your script so that we can confirm these options are > actually get set. > > Thanks, > > Fande, > > > On Tue, Apr 30, 2019 at 2:26 AM Myriam Peyrounette via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi, >> >> that's really good news for us, thanks! I will plot again the memory >> scaling using these new options and let you know. Next week I hope. >> >> Before that, I just need to clarify the situation. Throughout our >> discussions, we mentionned a number of options concerning the scalability: >> >> -matptatp_via scalable >> -inner_diag_matmatmult_via scalable >> -inner_diag_matmatmult_via scalable >> -mat_freeintermediatedatastructures >> -matptap_via allatonce >> -matptap_via allatonce_merged >> >> Which ones of them are compatible? Should I use all of them at the same >> time? Is there redundancy? >> >> Thanks, >> >> Myriam >> >> Le 04/25/19 ? 21:47, Zhang, Hong a ?crit : >> >> Myriam: >> Checking MatPtAP() in petsc-3.6.4, I realized that it uses different >> algorithm than petsc-10 and later versions. petsc-3.6 uses out-product for >> C=P^T * AP, while petsc-3.10 uses local transpose of P. petsc-3.10 >> accelerates data accessing, but doubles the memory of P. >> >> Fande added two new implementations for MatPtAP() to petsc-master which >> use much smaller and scalable memories with slightly higher computing time >> (faster than hypre though). You may use these new implementations if you >> have concern on memory scalability. The option for these new implementation >> are: >> -matptap_via allatonce >> -matptap_via allatonce_merged >> >> Hong >> >> On Mon, Apr 15, 2019 at 12:10 PM hzhang at mcs.anl.gov >> wrote: >> >>> Myriam: >>> Thank you very much for providing these results! >>> I have put effort to accelerate execution time and avoid using global >>> sizes in PtAP, for which the algorithm of transpose of P_local and P_other >>> likely doubles the memory usage. I'll try to investigate why it becomes >>> unscalable. >>> Hong >>> >>>> Hi, >>>> >>>> you'll find the new scaling attached (green line). I used the version >>>> 3.11 and the four scalability options : >>>> -matptap_via scalable >>>> -inner_diag_matmatmult_via scalable >>>> -inner_offdiag_matmatmult_via scalable >>>> -mat_freeintermediatedatastructures >>>> >>>> The scaling is much better! The code even uses less memory for the >>>> smallest cases. There is still an increase for the larger one. >>>> >>>> With regard to the time scaling, I used KSPView and LogView on the two >>>> previous scalings (blue and yellow lines) but not on the last one (green >>>> line). So we can't really compare them, am I right? However, we can see >>>> that the new time scaling looks quite good. It slightly increases from ~8s >>>> to ~27s. >>>> >>>> Unfortunately, the computations are expensive so I would like to avoid >>>> re-run them if possible. How relevant would be a proper time scaling for >>>> you? >>>> >>>> Myriam >>>> >>>> Le 04/12/19 ? 18:18, Zhang, Hong a ?crit : >>>> >>>> Myriam : >>>> Thanks for your effort. It will help us improve PETSc. >>>> Hong >>>> >>>> Hi all, >>>>> >>>>> I used the wrong script, that's why it diverged... Sorry about that. >>>>> I tried again with the right script applied on a tiny problem (~200 >>>>> elements). 
I can see a small difference in memory usage (gain ~ 1mB). >>>>> when adding the -mat_freeintermediatestructures option. I still have to >>>>> execute larger cases to plot the scaling. The supercomputer I am used >>>>> to >>>>> run my jobs on is really busy at the moment so it takes a while. I hope >>>>> I'll send you the results on Monday. >>>>> >>>>> Thanks everyone, >>>>> >>>>> Myriam >>>>> >>>>> >>>>> Le 04/11/19 ? 06:01, Jed Brown a ?crit : >>>>> > "Zhang, Hong" writes: >>>>> > >>>>> >> Jed: >>>>> >>>> Myriam, >>>>> >>>> Thanks for the plot. '-mat_freeintermediatedatastructures' should >>>>> not affect solution. It releases almost half of memory in C=PtAP if C is >>>>> not reused. >>>>> >>> And yet if turning it on causes divergence, that would imply a bug. >>>>> >>> Hong, are you able to reproduce the experiment to see the memory >>>>> >>> scaling? >>>>> >> I like to test his code using an alcf machine, but my hands are >>>>> full now. I'll try it as soon as I find time, hopefully next week. >>>>> > I have now compiled and run her code locally. >>>>> > >>>>> > Myriam, thanks for your last mail adding configuration and removing >>>>> the >>>>> > MemManager.h dependency. I ran with and without >>>>> > -mat_freeintermediatedatastructures and don't see a difference in >>>>> > convergence. What commands did you run to observe that difference? >>>>> >>>>> -- >>>>> Myriam Peyrounette >>>>> CNRS/IDRIS - HLST >>>>> -- >>>>> >>>>> >>>>> >>>> -- >>>> Myriam Peyrounette >>>> CNRS/IDRIS - HLST >>>> -- >>>> >>>> >> -- >> Myriam Peyrounette >> CNRS/IDRIS - HLST >> -- >> >> > -- > Myriam Peyrounette > CNRS/IDRIS - HLST > -- > > > -- > Myriam Peyrounette > CNRS/IDRIS - HLST > -- > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Fri May 3 11:21:59 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 3 May 2019 10:21:59 -0600 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: References: <877ecaeo63.fsf@jedbrown.org> <9629bc17-89d4-fdc9-1b44-7c4a25f62861@idris.fr> <3ad6ca66-d665-739a-874e-7599d5270797@idris.fr> <877ec17g52.fsf@jedbrown.org> <87zhox5gxi.fsf@jedbrown.org> <93a32d83-0b81-8bf3-d654-6711d9b0138f@idris.fr> <26b73c92-6a23-cf03-9e7f-1a24893ee512@idris.fr> <31946231-4948-fc6a-093a-7ed8f00f3579@idris.fr> <13df6685-2825-3629-0c79-cb66f4deae22@idris.fr> Message-ID: I have some data from my own simulations. The results do not look bad. The following are results (strong scaling) of "-matptap_via allatonce -mat_freeintermediatedatastructures 1" Problem 1 has 2,482,224,480 unknowns, and use 4000, 6000, 10000, and 12000 processor cores. 4000 processor cores: 587M 6000 processor cores: 270M 10000 processor cores: 251M 12000 processor cores: 136M Problem 2 has 7,446,673,440 unknowns, and use 6000, 10000, and 12000 process cores: 6000 processor cores: 975M 10000 processor cores: 599M 12000 processor cores: 415M The memory is used for PtAP only, and I do not include the memory from the other part of the simulation. I am sorry we did not resolve the issue for you so far. I will try to run your example you attached earlier to if we can reproduce it. If we can reproduce the problem, I will use a memory profiling tool to check where the memory comes from. Thanks again for your report, Fande, On Fri, May 3, 2019 at 9:26 AM Fande Kong wrote: > Thanks for your plots. > > The new algorithms should be scalable in terms of the memory usage. I am > puzzled by these plots since the memory usage increases exponentially. 
It > may come from somewhere else? How do you measure the memory? The memory is > for the entire simulation or just PtAP? Could you measure the memory for > PtAP only? Maybe several factors affect the memory usage not only PtAP. > > I will grab some data from my own simulations. > > Are you running ex43? > > Fande, > > > > On Fri, May 3, 2019 at 8:14 AM Myriam Peyrounette < > myriam.peyrounette at idris.fr> wrote: > >> And the attached files... Sorry >> >> Le 05/03/19 ? 16:11, Myriam Peyrounette a ?crit : >> >> Hi, >> >> I plotted new scalings (memory and time) using the new algorithms. I used >> the options *-options_left true *to make sure that the options are >> effectively used. They are. >> >> I don't have access to the platform I used to run my computations on, so >> I ran them on a different one. In particular, I can't reach problem size = >> 1e8 and the values might be different from the previous scalings I sent >> you. But the comparison of the PETSc versions and options is still >> relevant. >> >> I plotted the scalings of reference: the "good" one (PETSc 3.6.4) in >> green, the "bad" one (PETSc 3.10.2) in blue. >> >> I used the commit d330a26 (3.11.1) for all the other scalings, adding >> different sets of options: >> >> *Light blue* -> -matptap_via >> allatonce -mat_freeintermediatedatastructures 1 >> *Orange* -> -matptap_via allatonce_*merged* -mat_freeintermediatedatastructures >> 1 >> *Purple* -> -matptap_via allatonce -mat_freeintermediatedatastructures >> 1 *-inner_diag_matmatmult_via scalable -inner_offdiag_matmatmult_via >> scalable* >> *Yellow*: -matptap_via allatonce_*merged* -mat_freeintermediatedatastructures >> 1 *-inner_diag_matmatmult_via scalable -inner_offdiag_matmatmult_via >> scalable* >> >> Conclusion: with regard to memory, the two algorithms imply a similarly >> good improvement of the scaling. The use of the >> -inner_(off)diag_matmatmult_via options is also very interesting. The >> scaling is still not as good as 3.6.4 though. >> With regard to time, I noted a real improvement in time execution! I used >> to spend 200-300s on these executions. Now they take 10-15s. Beside that, >> the "_merged" versions are more efficient. And the >> -inner_(off)diaf_matmatmult_via options are slightly expensive but it is >> not critical. >> >> What do you think? Is it possible to match again the scaling of PETSc >> 3.6.4? Is it worthy keeping investigating? >> >> Myriam >> >> >> Le 04/30/19 ? 17:00, Fande Kong a ?crit : >> >> HI Myriam, >> >> We are interesting how the new algorithms perform. So there are two new >> algorithms you could try. >> >> Algorithm 1: >> >> -matptap_via allatonce -mat_freeintermediatedatastructures 1 >> >> Algorithm 2: >> >> -matptap_via allatonce_merged -mat_freeintermediatedatastructures 1 >> >> >> Note that you need to use the current petsc-master, and also please put >> "-snes_view" in your script so that we can confirm these options are >> actually get set. >> >> Thanks, >> >> Fande, >> >> >> On Tue, Apr 30, 2019 at 2:26 AM Myriam Peyrounette via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> Hi, >>> >>> that's really good news for us, thanks! I will plot again the memory >>> scaling using these new options and let you know. Next week I hope. >>> >>> Before that, I just need to clarify the situation. 
Throughout our >>> discussions, we mentionned a number of options concerning the scalability: >>> >>> -matptatp_via scalable >>> -inner_diag_matmatmult_via scalable >>> -inner_diag_matmatmult_via scalable >>> -mat_freeintermediatedatastructures >>> -matptap_via allatonce >>> -matptap_via allatonce_merged >>> >>> Which ones of them are compatible? Should I use all of them at the same >>> time? Is there redundancy? >>> >>> Thanks, >>> >>> Myriam >>> >>> Le 04/25/19 ? 21:47, Zhang, Hong a ?crit : >>> >>> Myriam: >>> Checking MatPtAP() in petsc-3.6.4, I realized that it uses different >>> algorithm than petsc-10 and later versions. petsc-3.6 uses out-product for >>> C=P^T * AP, while petsc-3.10 uses local transpose of P. petsc-3.10 >>> accelerates data accessing, but doubles the memory of P. >>> >>> Fande added two new implementations for MatPtAP() to petsc-master which >>> use much smaller and scalable memories with slightly higher computing time >>> (faster than hypre though). You may use these new implementations if you >>> have concern on memory scalability. The option for these new implementation >>> are: >>> -matptap_via allatonce >>> -matptap_via allatonce_merged >>> >>> Hong >>> >>> On Mon, Apr 15, 2019 at 12:10 PM hzhang at mcs.anl.gov >>> wrote: >>> >>>> Myriam: >>>> Thank you very much for providing these results! >>>> I have put effort to accelerate execution time and avoid using global >>>> sizes in PtAP, for which the algorithm of transpose of P_local and P_other >>>> likely doubles the memory usage. I'll try to investigate why it becomes >>>> unscalable. >>>> Hong >>>> >>>>> Hi, >>>>> >>>>> you'll find the new scaling attached (green line). I used the version >>>>> 3.11 and the four scalability options : >>>>> -matptap_via scalable >>>>> -inner_diag_matmatmult_via scalable >>>>> -inner_offdiag_matmatmult_via scalable >>>>> -mat_freeintermediatedatastructures >>>>> >>>>> The scaling is much better! The code even uses less memory for the >>>>> smallest cases. There is still an increase for the larger one. >>>>> >>>>> With regard to the time scaling, I used KSPView and LogView on the two >>>>> previous scalings (blue and yellow lines) but not on the last one (green >>>>> line). So we can't really compare them, am I right? However, we can see >>>>> that the new time scaling looks quite good. It slightly increases from ~8s >>>>> to ~27s. >>>>> >>>>> Unfortunately, the computations are expensive so I would like to avoid >>>>> re-run them if possible. How relevant would be a proper time scaling for >>>>> you? >>>>> >>>>> Myriam >>>>> >>>>> Le 04/12/19 ? 18:18, Zhang, Hong a ?crit : >>>>> >>>>> Myriam : >>>>> Thanks for your effort. It will help us improve PETSc. >>>>> Hong >>>>> >>>>> Hi all, >>>>>> >>>>>> I used the wrong script, that's why it diverged... Sorry about that. >>>>>> I tried again with the right script applied on a tiny problem (~200 >>>>>> elements). I can see a small difference in memory usage (gain ~ 1mB). >>>>>> when adding the -mat_freeintermediatestructures option. I still have >>>>>> to >>>>>> execute larger cases to plot the scaling. The supercomputer I am used >>>>>> to >>>>>> run my jobs on is really busy at the moment so it takes a while. I >>>>>> hope >>>>>> I'll send you the results on Monday. >>>>>> >>>>>> Thanks everyone, >>>>>> >>>>>> Myriam >>>>>> >>>>>> >>>>>> Le 04/11/19 ? 06:01, Jed Brown a ?crit : >>>>>> > "Zhang, Hong" writes: >>>>>> > >>>>>> >> Jed: >>>>>> >>>> Myriam, >>>>>> >>>> Thanks for the plot. 
'-mat_freeintermediatedatastructures' >>>>>> should not affect solution. It releases almost half of memory in C=PtAP if >>>>>> C is not reused. >>>>>> >>> And yet if turning it on causes divergence, that would imply a >>>>>> bug. >>>>>> >>> Hong, are you able to reproduce the experiment to see the memory >>>>>> >>> scaling? >>>>>> >> I like to test his code using an alcf machine, but my hands are >>>>>> full now. I'll try it as soon as I find time, hopefully next week. >>>>>> > I have now compiled and run her code locally. >>>>>> > >>>>>> > Myriam, thanks for your last mail adding configuration and removing >>>>>> the >>>>>> > MemManager.h dependency. I ran with and without >>>>>> > -mat_freeintermediatedatastructures and don't see a difference in >>>>>> > convergence. What commands did you run to observe that difference? >>>>>> >>>>>> -- >>>>>> Myriam Peyrounette >>>>>> CNRS/IDRIS - HLST >>>>>> -- >>>>>> >>>>>> >>>>>> >>>>> -- >>>>> Myriam Peyrounette >>>>> CNRS/IDRIS - HLST >>>>> -- >>>>> >>>>> >>> -- >>> Myriam Peyrounette >>> CNRS/IDRIS - HLST >>> -- >>> >>> >> -- >> Myriam Peyrounette >> CNRS/IDRIS - HLST >> -- >> >> >> -- >> Myriam Peyrounette >> CNRS/IDRIS - HLST >> -- >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Fri May 3 20:00:36 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 3 May 2019 19:00:36 -0600 Subject: [petsc-users] ``--with-clanguage=c++" turns on "PETSC_HAVE_COMPLEX"? Message-ID: Hi All, Comping PETSc with ``--with-clanguage=c" works fine. But I could not compile PETSc with "--with-clanguage=c++" since the flag "PETSC_HAVE_COMPLEX" was wrongly set on by this option. */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:21: error: expected parameter declarator* * PetscComplex ic(0.0,1.0);* * ^* */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:21: error: expected ')'* */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:20: note: to match this '('* * PetscComplex ic(0.0,1.0);* * ^* */Users/kongf/projects/petsc/src/sys/objects/pinit.c:914:13: error: assigning to 'PetscComplex' (aka '_Complex double') from incompatible type 'PetscComplex ()' (aka '_Complex double ()')* * PETSC_i = ic;* * ^ ~~* *3 errors generated.* *make[2]: *** [arch-linux2-c-opt-memory/obj/sys/objects/pinit.o] Error 1* *make[2]: *** Waiting for unfinished jobs....* *make[2]: Leaving directory `/Users/kongf/projects/petsc'* *make[1]: *** [gnumake] Error 2* *make[1]: Leaving directory `/Users/kongf/projects/petsc'* ***************************ERROR************************************** * Error during compile, check arch-linux2-c-opt-memory/lib/petsc/conf/make.log* * Send it and arch-linux2-c-opt-memory/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov * ********************************************************************** The make and configure logs are attached. Fande, -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log.zip Type: application/zip Size: 4668 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log.zip Type: application/zip Size: 161471 bytes Desc: not available URL: From balay at mcs.anl.gov Fri May 3 20:29:08 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Sat, 4 May 2019 01:29:08 +0000 Subject: [petsc-users] ``--with-clanguage=c++" turns on "PETSC_HAVE_COMPLEX"? 
In-Reply-To: References: Message-ID: >>>>>>>> Executing: mpicxx -show stdout: clang -I/Users/kongf/projects/openmpi-2.1.1_installed/include -L/Users/kongf/projects/openmpi-2.1.1_installed/lib -lmpi <<<< Hm - I think this [specifying a C compiler as c++] is the trigger of this problem. configure checks if the c++ compiler supports complex. This test was successful [as it was done with .cxx file - and presumably clang switches to c++ mocd for a .cxx file. However PETSc sources a .c - so its likely compiling PETSc as C - so things are now inconsistant - and broken.. Note: PETSC_HAVE_COMPLEX => compilers support complex - so define a complex datatype PETSC_USE_COMPLEX => build PETSc with PetscScalar=complex Satish On Fri, 3 May 2019, Fande Kong via petsc-users wrote: > Hi All, > > Comping PETSc with ``--with-clanguage=c" works fine. But I could not > compile PETSc with "--with-clanguage=c++" since the flag > "PETSC_HAVE_COMPLEX" was wrongly set on by this option. > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:21: error: > expected parameter declarator* > * PetscComplex ic(0.0,1.0);* > * ^* > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:21: error: > expected ')'* > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:20: note: to match > this '('* > * PetscComplex ic(0.0,1.0);* > * ^* > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:914:13: error: > assigning to 'PetscComplex' (aka '_Complex double') from incompatible type > 'PetscComplex ()' (aka '_Complex double ()')* > * PETSC_i = ic;* > * ^ ~~* > *3 errors generated.* > *make[2]: *** [arch-linux2-c-opt-memory/obj/sys/objects/pinit.o] Error 1* > *make[2]: *** Waiting for unfinished jobs....* > *make[2]: Leaving directory `/Users/kongf/projects/petsc'* > *make[1]: *** [gnumake] Error 2* > *make[1]: Leaving directory `/Users/kongf/projects/petsc'* > ***************************ERROR************************************** > * Error during compile, check > arch-linux2-c-opt-memory/lib/petsc/conf/make.log* > * Send it and arch-linux2-c-opt-memory/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov * > ********************************************************************** > > The make and configure logs are attached. > > Fande, > From fdkong.jd at gmail.com Fri May 3 20:53:45 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 3 May 2019 19:53:45 -0600 Subject: [petsc-users] ``--with-clanguage=c++" turns on "PETSC_HAVE_COMPLEX"? In-Reply-To: References: Message-ID: It looks like mpicxx from openmpi does not handle this correctly. I switched to mpich, and it works now. However there is till some warnings: *clang-6.0: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]* * CXX arch-linux2-c-opt-memory/obj/dm/impls/plex/glexg.o* * CXX arch-linux2-c-opt-memory/obj/dm/impls/plex/petscpartmatpart.o* *clang-6.0: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is dep* Fande, On Fri, May 3, 2019 at 7:29 PM Balay, Satish wrote: > >>>>>>>> > Executing: mpicxx -show > > stdout: clang > -I/Users/kongf/projects/openmpi-2.1.1_installed/include > -L/Users/kongf/projects/openmpi-2.1.1_installed/lib -lmpi > <<<< > > Hm - I think this [specifying a C compiler as c++] is the trigger of this > problem. > > configure checks if the c++ compiler supports complex. This test was > successful [as it was done with .cxx file - and presumably clang switches > to c++ mocd for a .cxx file. 
> > However PETSc sources a .c - so its likely compiling PETSc as C - so > things are now inconsistant - and broken.. > > Note: > PETSC_HAVE_COMPLEX => compilers support complex - so define a complex > datatype > PETSC_USE_COMPLEX => build PETSc with PetscScalar=complex > > > Satish > > On Fri, 3 May 2019, Fande Kong via petsc-users wrote: > > > Hi All, > > > > Comping PETSc with ``--with-clanguage=c" works fine. But I could not > > compile PETSc with "--with-clanguage=c++" since the flag > > "PETSC_HAVE_COMPLEX" was wrongly set on by this option. > > > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:21: error: > > expected parameter declarator* > > * PetscComplex ic(0.0,1.0);* > > * ^* > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:21: error: > > expected ')'* > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:20: note: to > match > > this '('* > > * PetscComplex ic(0.0,1.0);* > > * ^* > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:914:13: error: > > assigning to 'PetscComplex' (aka '_Complex double') from incompatible > type > > 'PetscComplex ()' (aka '_Complex double ()')* > > * PETSC_i = ic;* > > * ^ ~~* > > *3 errors generated.* > > *make[2]: *** [arch-linux2-c-opt-memory/obj/sys/objects/pinit.o] Error 1* > > *make[2]: *** Waiting for unfinished jobs....* > > *make[2]: Leaving directory `/Users/kongf/projects/petsc'* > > *make[1]: *** [gnumake] Error 2* > > *make[1]: Leaving directory `/Users/kongf/projects/petsc'* > > ***************************ERROR************************************** > > * Error during compile, check > > arch-linux2-c-opt-memory/lib/petsc/conf/make.log* > > * Send it and arch-linux2-c-opt-memory/lib/petsc/conf/configure.log to > > petsc-maint at mcs.anl.gov * > > ********************************************************************** > > > > The make and configure logs are attached. > > > > Fande, > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri May 3 21:02:56 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Sat, 4 May 2019 02:02:56 +0000 Subject: [petsc-users] ``--with-clanguage=c++" turns on "PETSC_HAVE_COMPLEX"? In-Reply-To: References: Message-ID: On Fri, 3 May 2019, Fande Kong via petsc-users wrote: > It looks like mpicxx from openmpi does not handle this correctly. Perhaps my earlier messages was not clear. The problem is not with OpenMPI - but your build of it. Its installed with 'clang' as the C++ compiler - it should be built with 'clang++' as the c++ compiler. > I switched to mpich, and it works now. > > However there is till some warnings: > > *clang-6.0: warning: treating 'c' input as 'c++' when in C++ mode, this > behavior is deprecated [-Wdeprecated]* > * CXX arch-linux2-c-opt-memory/obj/dm/impls/plex/glexg.o* > * CXX arch-linux2-c-opt-memory/obj/dm/impls/plex/petscpartmatpart.o* > *clang-6.0: warning: treating 'c' input as 'c++' when in C++ mode, this > behavior is dep* Yes - because PETSc sources are in C - and you are building with --with-clanguage=cxx - and this compiler thinks one should not compile .c sources as c++. Satish > > Fande, > > > On Fri, May 3, 2019 at 7:29 PM Balay, Satish wrote: > > > >>>>>>>> > > Executing: mpicxx -show > > > > stdout: clang > > -I/Users/kongf/projects/openmpi-2.1.1_installed/include > > -L/Users/kongf/projects/openmpi-2.1.1_installed/lib -lmpi > > <<<< > > > > Hm - I think this [specifying a C compiler as c++] is the trigger of this > > problem. 
> > > > configure checks if the c++ compiler supports complex. This test was > > successful [as it was done with .cxx file - and presumably clang switches > > to c++ mocd for a .cxx file. > > > > However PETSc sources a .c - so its likely compiling PETSc as C - so > > things are now inconsistant - and broken.. > > > > Note: > > PETSC_HAVE_COMPLEX => compilers support complex - so define a complex > > datatype > > PETSC_USE_COMPLEX => build PETSc with PetscScalar=complex > > > > > > Satish > > > > On Fri, 3 May 2019, Fande Kong via petsc-users wrote: > > > > > Hi All, > > > > > > Comping PETSc with ``--with-clanguage=c" works fine. But I could not > > > compile PETSc with "--with-clanguage=c++" since the flag > > > "PETSC_HAVE_COMPLEX" was wrongly set on by this option. > > > > > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:21: error: > > > expected parameter declarator* > > > * PetscComplex ic(0.0,1.0);* > > > * ^* > > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:21: error: > > > expected ')'* > > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:20: note: to > > match > > > this '('* > > > * PetscComplex ic(0.0,1.0);* > > > * ^* > > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:914:13: error: > > > assigning to 'PetscComplex' (aka '_Complex double') from incompatible > > type > > > 'PetscComplex ()' (aka '_Complex double ()')* > > > * PETSC_i = ic;* > > > * ^ ~~* > > > *3 errors generated.* > > > *make[2]: *** [arch-linux2-c-opt-memory/obj/sys/objects/pinit.o] Error 1* > > > *make[2]: *** Waiting for unfinished jobs....* > > > *make[2]: Leaving directory `/Users/kongf/projects/petsc'* > > > *make[1]: *** [gnumake] Error 2* > > > *make[1]: Leaving directory `/Users/kongf/projects/petsc'* > > > ***************************ERROR************************************** > > > * Error during compile, check > > > arch-linux2-c-opt-memory/lib/petsc/conf/make.log* > > > * Send it and arch-linux2-c-opt-memory/lib/petsc/conf/configure.log to > > > petsc-maint at mcs.anl.gov * > > > ********************************************************************** > > > > > > The make and configure logs are attached. > > > > > > Fande, > > > > > > > > From jed at jedbrown.org Fri May 3 21:04:35 2019 From: jed at jedbrown.org (Jed Brown) Date: Fri, 03 May 2019 20:04:35 -0600 Subject: [petsc-users] ``--with-clanguage=c++" turns on "PETSC_HAVE_COMPLEX"? In-Reply-To: References: Message-ID: <87a7g37z64.fsf@jedbrown.org> Fande Kong via petsc-users writes: > It looks like mpicxx from openmpi does not handle this correctly. I > switched to mpich, and it works now. > > However there is till some warnings: > > *clang-6.0: warning: treating 'c' input as 'c++' when in C++ mode, this > behavior is deprecated [-Wdeprecated]* > * CXX arch-linux2-c-opt-memory/obj/dm/impls/plex/glexg.o* > * CXX arch-linux2-c-opt-memory/obj/dm/impls/plex/petscpartmatpart.o* > *clang-6.0: warning: treating 'c' input as 'c++' when in C++ mode, this > behavior is dep* Clang has always done this. I don't know a clean way to work around the warning, but we don't recommend with-clanguage=c++ unless you're on a platform where a sane C compiler doesn't exist. From fdkong.jd at gmail.com Fri May 3 21:09:38 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 3 May 2019 20:09:38 -0600 Subject: [petsc-users] ``--with-clanguage=c++" turns on "PETSC_HAVE_COMPLEX"? 
In-Reply-To: References: Message-ID: On Fri, May 3, 2019 at 8:02 PM Balay, Satish wrote: > On Fri, 3 May 2019, Fande Kong via petsc-users wrote: > > > It looks like mpicxx from openmpi does not handle this correctly. > > Perhaps my earlier messages was not clear. The problem is not with > OpenMPI - but your build of it. Its installed with 'clang' as the C++ > compiler - it should be built with 'clang++' as the c++ compiler. > Oh, I see. Thanks. Fande, > > > I switched to mpich, and it works now. > > > > However there is till some warnings: > > > > *clang-6.0: warning: treating 'c' input as 'c++' when in C++ mode, this > > behavior is deprecated [-Wdeprecated]* > > * CXX arch-linux2-c-opt-memory/obj/dm/impls/plex/glexg.o* > > * CXX > arch-linux2-c-opt-memory/obj/dm/impls/plex/petscpartmatpart.o* > > *clang-6.0: warning: treating 'c' input as 'c++' when in C++ mode, this > > behavior is dep* > > Yes - because PETSc sources are in C - and you are building with > --with-clanguage=cxx - and this compiler thinks one should not compile > .c sources as c++. > > Satish > > > > > Fande, > > > > > > On Fri, May 3, 2019 at 7:29 PM Balay, Satish wrote: > > > > > >>>>>>>> > > > Executing: mpicxx -show > > > > > > stdout: clang > > > -I/Users/kongf/projects/openmpi-2.1.1_installed/include > > > -L/Users/kongf/projects/openmpi-2.1.1_installed/lib -lmpi > > > <<<< > > > > > > Hm - I think this [specifying a C compiler as c++] is the trigger of > this > > > problem. > > > > > > configure checks if the c++ compiler supports complex. This test was > > > successful [as it was done with .cxx file - and presumably clang > switches > > > to c++ mocd for a .cxx file. > > > > > > However PETSc sources a .c - so its likely compiling PETSc as C - so > > > things are now inconsistant - and broken.. > > > > > > Note: > > > PETSC_HAVE_COMPLEX => compilers support complex - so define a complex > > > datatype > > > PETSC_USE_COMPLEX => build PETSc with PetscScalar=complex > > > > > > > > > Satish > > > > > > On Fri, 3 May 2019, Fande Kong via petsc-users wrote: > > > > > > > Hi All, > > > > > > > > Comping PETSc with ``--with-clanguage=c" works fine. But I could not > > > > compile PETSc with "--with-clanguage=c++" since the flag > > > > "PETSC_HAVE_COMPLEX" was wrongly set on by this option. 
> > > > > > > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:21: error: > > > > expected parameter declarator* > > > > * PetscComplex ic(0.0,1.0);* > > > > * ^* > > > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:21: error: > > > > expected ')'* > > > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:913:20: note: to > > > match > > > > this '('* > > > > * PetscComplex ic(0.0,1.0);* > > > > * ^* > > > > */Users/kongf/projects/petsc/src/sys/objects/pinit.c:914:13: error: > > > > assigning to 'PetscComplex' (aka '_Complex double') from incompatible > > > type > > > > 'PetscComplex ()' (aka '_Complex double ()')* > > > > * PETSC_i = ic;* > > > > * ^ ~~* > > > > *3 errors generated.* > > > > *make[2]: *** [arch-linux2-c-opt-memory/obj/sys/objects/pinit.o] > Error 1* > > > > *make[2]: *** Waiting for unfinished jobs....* > > > > *make[2]: Leaving directory `/Users/kongf/projects/petsc'* > > > > *make[1]: *** [gnumake] Error 2* > > > > *make[1]: Leaving directory `/Users/kongf/projects/petsc'* > > > > > ***************************ERROR************************************** > > > > * Error during compile, check > > > > arch-linux2-c-opt-memory/lib/petsc/conf/make.log* > > > > * Send it and arch-linux2-c-opt-memory/lib/petsc/conf/configure.log > to > > > > petsc-maint at mcs.anl.gov * > > > > > ********************************************************************** > > > > > > > > The make and configure logs are attached. > > > > > > > > Fande, > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri May 3 21:43:11 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Sat, 4 May 2019 02:43:11 +0000 Subject: [petsc-users] ``--with-clanguage=c++" turns on "PETSC_HAVE_COMPLEX"? In-Reply-To: References: Message-ID: On Fri, 3 May 2019, Fande Kong via petsc-users wrote: > On Fri, May 3, 2019 at 8:02 PM Balay, Satish wrote: > > > On Fri, 3 May 2019, Fande Kong via petsc-users wrote: > > > > > It looks like mpicxx from openmpi does not handle this correctly. > > > > Perhaps my earlier messages was not clear. The problem is not with > > OpenMPI - but your build of it. Its installed with 'clang' as the C++ > > compiler - it should be built with 'clang++' as the c++ compiler. > > > > Oh, I see. Thanks. I've added a check to configure https://bitbucket.org/petsc/petsc/pull-requests/1622/configure-when-with-clanguage-cxx-is-used/diff Satish From fdkong.jd at gmail.com Fri May 3 23:20:32 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Fri, 3 May 2019 22:20:32 -0600 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: References: <877ecaeo63.fsf@jedbrown.org> <9629bc17-89d4-fdc9-1b44-7c4a25f62861@idris.fr> <3ad6ca66-d665-739a-874e-7599d5270797@idris.fr> <877ec17g52.fsf@jedbrown.org> <87zhox5gxi.fsf@jedbrown.org> <93a32d83-0b81-8bf3-d654-6711d9b0138f@idris.fr> <26b73c92-6a23-cf03-9e7f-1a24893ee512@idris.fr> <31946231-4948-fc6a-093a-7ed8f00f3579@idris.fr> <13df6685-2825-3629-0c79-cb66f4deae22@idris.fr> Message-ID: Hi Myriam, I run the example you attached earlier with "-mx 48 -my 48 -mz 48 -levels 3 -ksp_view -matptap_via allatonce -log_view ". There are six PtAPs. Two of them are sill using the nonscalable version of the algorithm (this might explain why the memory still exponentially increases) even though we have asked PETSc to use the ``allatonce" algorithm. This is happening because MATMAIJ does not honor the petsc option, instead, it uses the default setting of MPIAIJ. 
I have a fix at https://bitbucket.org/petsc/petsc/pull-requests/1623/choose-algorithms-in/diff. The PR should fix the issue. Thanks again for your report, Fande, -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri May 3 23:54:52 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sat, 4 May 2019 04:54:52 +0000 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: References: <877ecaeo63.fsf@jedbrown.org> <9629bc17-89d4-fdc9-1b44-7c4a25f62861@idris.fr> <3ad6ca66-d665-739a-874e-7599d5270797@idris.fr> <877ec17g52.fsf@jedbrown.org> <87zhox5gxi.fsf@jedbrown.org> <93a32d83-0b81-8bf3-d654-6711d9b0138f@idris.fr> <26b73c92-6a23-cf03-9e7f-1a24893ee512@idris.fr> <31946231-4948-fc6a-093a-7ed8f00f3579@idris.fr> <13df6685-2825-3629-0c79-cb66f4deae22@idris.fr> Message-ID: <313C29E3-5860-4904-A343-1FF68D9C1809@anl.gov> Hmm, I had already fixed this, I think, https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff but unfortunately our backlog of pull requests kept it out of master. We are (well Satish and Jed) working on a new CI infrastructure that will hopefully be more stable than the current CI that we are using. Fande, Sorry you had to spend time on this. Barry > On May 3, 2019, at 11:20 PM, Fande Kong via petsc-users wrote: > > Hi Myriam, > > I run the example you attached earlier with "-mx 48 -my 48 -mz 48 -levels 3 -ksp_view -matptap_via allatonce -log_view ". > > There are six PtAPs. Two of them are sill using the nonscalable version of the algorithm (this might explain why the memory still exponentially increases) even though we have asked PETSc to use the ``allatonce" algorithm. This is happening because MATMAIJ does not honor the petsc option, instead, it uses the default setting of MPIAIJ. I have a fix at https://bitbucket.org/petsc/petsc/pull-requests/1623/choose-algorithms-in/diff. The PR should fix the issue. > > Thanks again for your report, > > Fande, > > From thw1021 at outlook.com Sun May 5 08:59:26 2019 From: thw1021 at outlook.com (tang hongwei) Date: Sun, 5 May 2019 13:59:26 +0000 Subject: [petsc-users] Block Number of Grid; Domain Decomposit ion References: Message-ID: Dear Developers, Thanks for your work about PETSc. I am using PETSc (version 3.1) for CFD, but I get confused by some problems, hoping you can help me: 1. Could the block number of grid be more than 1 ? I use pointwise (http://www.pointwise.com) to draw grid. For some cases, the block number of grid may be more than 1. 2. How PETSc decompose the domain ? In the user manual, I notice that PETSc can decompose the domain automatically. However, I don't understand how PETSc distributes processors for each sub-domain (How xs, ys, zs, xm, ym, zm get values?) Best Regards, Hongwei ________________________________ Sent from YoMail -------------- next part -------------- An HTML attachment was scrubbed... URL: From qiyuelu1 at gmail.com Sun May 5 10:05:39 2019 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Sun, 5 May 2019 10:05:39 -0500 Subject: [petsc-users] SLEPc: logpcg works but lanczos fails Message-ID: Hello, I am solving a general eigenvalue problem Ax=lamda*Bx A is the stiffness matrix, B the mass matrix. The DOF of these matrices are 24,000 around. Both of them are symmetric and stored in SEQSBAIJ format. Also, I downloaded mumps during the configuration and installation of PETSc. In SLETc, this eigenvalue system can be solved by *-eps_tpye logpcg* while requesting *EPS_SMALLEST_REAL*. 
However, *-eps_type lanczos* doesn't work, which yields a very huge number. It seems mumps is called indeed. The output of *-eps_view* is attached. The command line I am using is: *mpirun -np 40 ./test -fA matrixA -fB matrixB -eps_type lanczos -eps_view* Did I miss any substantial configurations for lanczos solver? Thanks, Qiyue Lu -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test10B_bash.out Type: application/octet-stream Size: 14490 bytes Desc: not available URL: From qiyuelu1 at gmail.com Sun May 5 10:22:12 2019 From: qiyuelu1 at gmail.com (Qiyue Lu) Date: Sun, 5 May 2019 10:22:12 -0500 Subject: [petsc-users] LINEAR SOLVER In-Reply-To: References: Message-ID: Hello, Savneet: For the --with-debugging=0 issue, maybe you can try add --COPTFLAGS="-O" --CXXOPTFLAGS="-O" --FOPTFLAGS="-O", because in version of 3.10.3, the default flags are "-g -O". If -g is there, --with-debugging=0 won't work. I tried this in my installation and it works. Thanks, Qiyue Lu On Fri, Aug 24, 2018 at 4:48 AM Savneet Kaur wrote: > Hello, > > I am having a small trouble with PETSC. > > I am trying to use linear solver methods: LU and CHOLESKY but unable to > run the code. The error message which I am getting is "See > http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for > possible LU and Cholesky solvers > [0]PETSC ERROR: Could not locate a solver package. Perhaps you must > ./configure with --download- > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting." > > But while installing the PETSC, I am using this command line for the > configuration *"./configure --with-cc=mpicc --with-fc=mpif90 > --with-cxx=mpicxx --download-scalapack --download-mumps --download-hypre > --download-parmetis --download-metis" .. *which should eventually install > LU and cholesky packages as i am calling them to configure it using MUMPS > package. > > > And my second question would be, I want to install the non debug version > of the PETSC and SLEPC. If you can help me in that? > > Infact, I even tried configruing with *"--with-debugging=0". *But it did > not changed anything. > > Looking forward for your response. > > > Thanks in advance. > > Regards, > > Savneet kaur > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun May 5 10:55:02 2019 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 5 May 2019 11:55:02 -0400 Subject: [petsc-users] Block Number of Grid; Domain Decomposit ion In-Reply-To: References: Message-ID: On Sun, May 5, 2019 at 9:59 AM tang hongwei via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear Developers, > Thanks for your work about PETSc. I am using PETSc (version 3.1) for > CFD, but I get confused by some problems, hoping you can help me: > > 1. Could the block number of grid be more than 1 ? I use pointwise ( > http://www.pointwise.com) to draw grid. For some cases, the block number > of grid may be more than 1. > Are you talking about DMDA? It can only handle purely Cartesian meshes. We now have: - DMStag: Cartesian meshes with staggered discretizations - DMComposite: Using multiple DMs at a time (this might be what you want for multiblock) - DMForest: Using p4est for structured, adaptive grids (this can also handle multiblock) - DMPlex: Arbitrary meshes (this can also handle multiblock) > 2. How PETSc decompose the domain ? > Into blocks. 
> In the user manual, I notice that PETSc can decompose the domain > automatically. However, I don't understand how PETSc distributes > processors for each sub-domain (How xs, ys, zs, xm, ym, zm get values?) > It divides each direction into pieces, and then the partitions are the tensor products of those pieces. Thanks, Matt > Best Regards, > Hongwei > > ------------------------------ > Sent from YoMail > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Sun May 5 14:24:36 2019 From: jroman at dsic.upv.es (Jose E. Roman) Date: Sun, 5 May 2019 21:24:36 +0200 Subject: [petsc-users] SLEPc: logpcg works but lanczos fails In-Reply-To: References: Message-ID: Is your B-matrix singular? Then the solver might be approximating an infinite eigenvalue even if you ask for smallest real eigenvalues. For Lanczos-type solvers, it is safer to run with -st_type sinvert -eps_target 0 if you know that eigenvalues are positive. Or instead of 0 use a target value that bounds the eigenvalues from below. Also, it is better to use Krylov-Schur instead of Lanczos, for symmetric problems it will run Lanczos with implicit restart - the 'lanczos' solver is Lanczos with explicit restart (usually worse). Jose > El 5 may 2019, a las 17:05, Qiyue Lu via petsc-users escribi?: > > Hello, > I am solving a general eigenvalue problem > Ax=lamda*Bx > A is the stiffness matrix, B the mass matrix. The DOF of these matrices are 24,000 around. Both of them are symmetric and stored in SEQSBAIJ format. Also, I downloaded mumps during the configuration and installation of PETSc. > In SLETc, this eigenvalue system can be solved by -eps_tpye logpcg while requesting EPS_SMALLEST_REAL. However, -eps_type lanczos doesn't work, which yields a very huge number. It seems mumps is called indeed. The output of -eps_view is attached. The command line I am using is: > mpirun -np 40 ./test -fA matrixA -fB matrixB -eps_type lanczos -eps_view > > Did I miss any substantial configurations for lanczos solver? > > Thanks, > > Qiyue Lu > From knepley at gmail.com Sun May 5 15:51:45 2019 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 5 May 2019 16:51:45 -0400 Subject: [petsc-users] LINEAR SOLVER In-Reply-To: References: Message-ID: On Sun, May 5, 2019 at 11:23 AM Qiyue Lu via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hello, Savneet: > For the --with-debugging=0 issue, maybe you can try add --COPTFLAGS="-O" > --CXXOPTFLAGS="-O" --FOPTFLAGS="-O", because in version of 3.10.3, the > default flags are "-g -O". If -g is there, --with-debugging=0 won't work. > 1) This is not correct. If you have a problem with --with-debugging=0, then send in the configure.log. 2) LU is supported by MUMPS. Send your configure.log and complete error message when trying to use it. Thanks, Matt > I tried this in my installation and it works. > > Thanks, > > Qiyue Lu > > On Fri, Aug 24, 2018 at 4:48 AM Savneet Kaur wrote: > >> Hello, >> >> I am having a small trouble with PETSC. >> >> I am trying to use linear solver methods: LU and CHOLESKY but unable to >> run the code. The error message which I am getting is "See >> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for >> possible LU and Cholesky solvers >> [0]PETSC ERROR: Could not locate a solver package. 
Perhaps you must >> ./configure with --download- >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting." >> >> But while installing the PETSC, I am using this command line for the >> configuration *"./configure --with-cc=mpicc --with-fc=mpif90 >> --with-cxx=mpicxx --download-scalapack --download-mumps --download-hypre >> --download-parmetis --download-metis" .. *which should eventually >> install LU and cholesky packages as i am calling them to configure it using >> MUMPS package. >> >> >> And my second question would be, I want to install the non debug version >> of the PETSC and SLEPC. If you can help me in that? >> >> Infact, I even tried configruing with *"--with-debugging=0". *But it did >> not changed anything. >> >> Looking forward for your response. >> >> >> Thanks in advance. >> >> Regards, >> >> Savneet kaur >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlmackie862 at gmail.com Sun May 5 17:21:23 2019 From: rlmackie862 at gmail.com (Randall Mackie) Date: Sun, 5 May 2019 15:21:23 -0700 Subject: [petsc-users] strange error using fgmres Message-ID: In solving a nonlinear optimization problem, I was recently experimenting with fgmres using the following options: -nlcg_ksp_type fgmres \ -nlcg_pc_type ksp \ -nlcg_ksp_ksp_type bcgs \ -nlcg_ksp_pc_type jacobi \ -nlcg_ksp_rtol 1e-6 \ -nlcg_ksp_ksp_max_it 300 \ -nlcg_ksp_max_it 200 \ -nlcg_ksp_converged_reason \ -nlcg_ksp_monitor_true_residual \ I sometimes randomly will get an error like the following: Residual norms for nlcg_ solve. 0 KSP unpreconditioned resid norm 3.371606868500e+04 true resid norm 3.371606868500e+04 ||r(i)||/||b|| 1.000000000000e+00 1 KSP unpreconditioned resid norm 2.322590778002e+02 true resid norm 2.322590778002e+02 ||r(i)||/||b|| 6.888676137487e-03 2 KSP unpreconditioned resid norm 8.262440884758e+01 true resid norm 8.262440884758e+01 ||r(i)||/||b|| 2.450594392232e-03 3 KSP unpreconditioned resid norm 3.660428333809e+01 true resid norm 3.660428333809e+01 ||r(i)||/||b|| 1.085662853522e-03 3 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm -nan ||r(i)||/||b|| -nan Linear nlcg_ solve did not converge due to DIVERGED_PC_FAILED iterations 3 PC_FAILED due to SUBPC_ERROR This usually happens after a few nonlinear optimization iterations, meaning that it?s worked perfectly fine until this point. How can using jacobi pc all of a sudden cause a NaN, if it?s worked perfectly fine before? Some other errors in the output log file are as follows, although I have no idea if they result from the above error or not: [13]PETSC ERROR: Object is in wrong state [13]PETSC ERROR: Clearing DM of global vectors that has a global vector obtained with DMGetGlobalVector() [13]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[13]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 [27]PETSC ERROR: #1 DMClearGlobalVectors() line 196 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dmget.c [27]PETSC ERROR: Configure options --with-clean=1 --with-scalar-type=complex --with-debugging=0 --with-fortran=1 --with-blaslapack-dir=/state/std2/intel_2018/m kl --with-mkl_pardiso-dir=/state/std2/intel_2018/mkl --with-mkl_cpardiso-dir=/state/std2/intel_2018/mkl --download-mumps=../external/mumps_v5.1.2-p1.tar.gz --d ownload-scalapack=../external/scalapack-2.0.2.tgz --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicc --FOPTFLAGS="-O3 -xHost" --COPTFLAGS="-O3 -xHost" --CXX OPTFLAGS="-O3 -xHost" #2 DMDestroy() line 752 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dm.c [72]PETSC ERROR: #3 PetscObjectDereference() line 624 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c [72]PETSC ERROR: #4 PetscObjectListDestroy() line 156 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/olist.c [72]PETSC ERROR: #5 PetscHeaderDestroy_Private() line 122 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c [72]PETSC ERROR: #6 VecDestroy() line 412 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/vec/vec/interface/vector.c This is a large run taking many hours to get to this problem. I will try to run in debug mode, but given that this seems to be randomly happening (this has happened maybe 30% of the time I have used the fgmres option), there is no guarantee that will show anything useful. Valgrind is obviously out of the question for a large run, and I have yet to reproduce this on a smaller run. Anyone have any ideas as to what?s causing this? Thanks in advance, Randy M. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun May 5 18:01:51 2019 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 5 May 2019 19:01:51 -0400 Subject: [petsc-users] strange error using fgmres In-Reply-To: References: Message-ID: On Sun, May 5, 2019 at 6:22 PM Randall Mackie via petsc-users < petsc-users at mcs.anl.gov> wrote: > In solving a nonlinear optimization problem, I was recently experimenting > with fgmres using the following options: > > -nlcg_ksp_type fgmres \ > -nlcg_pc_type ksp \ > -nlcg_ksp_ksp_type bcgs \ > -nlcg_ksp_pc_type jacobi \ > -nlcg_ksp_rtol 1e-6 \ > -nlcg_ksp_ksp_max_it 300 \ > -nlcg_ksp_max_it 200 \ > -nlcg_ksp_converged_reason \ > -nlcg_ksp_monitor_true_residual \ > > I sometimes randomly will get an error like the following: > > Residual norms for nlcg_ solve. > 0 KSP unpreconditioned resid norm 3.371606868500e+04 true resid norm > 3.371606868500e+04 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP unpreconditioned resid norm 2.322590778002e+02 true resid norm > 2.322590778002e+02 ||r(i)||/||b|| 6.888676137487e-03 > 2 KSP unpreconditioned resid norm 8.262440884758e+01 true resid norm > 8.262440884758e+01 ||r(i)||/||b|| 2.450594392232e-03 > 3 KSP unpreconditioned resid norm 3.660428333809e+01 true resid norm > 3.660428333809e+01 ||r(i)||/||b|| 1.085662853522e-03 > 3 KSP unpreconditioned resid norm 0.000000000000e+00 true resid > norm -nan ||r(i)||/||b|| -nan > Linear nlcg_ solve did not converge due to DIVERGED_PC_FAILED iterations 3 > PC_FAILED due to SUBPC_ERROR > > This usually happens after a few nonlinear optimization iterations, > meaning that it?s worked perfectly fine until this point. > How can using jacobi pc all of a sudden cause a NaN, if it?s worked > perfectly fine before? 
> This is not Jacobi breaking down, this is BCGS I believe. In order to see this, I think you can just replace bcgs with gmres. Thanks, Matt > Some other errors in the output log file are as follows, although I have > no idea if they result from the above error or not: > > [13]PETSC ERROR: Object is in wrong state > [13]PETSC ERROR: Clearing DM of global vectors that has a global vector > obtained with DMGetGlobalVector() > [13]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. > [13]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 > > > [27]PETSC ERROR: #1 DMClearGlobalVectors() line 196 in > /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dmget.c > [27]PETSC ERROR: Configure options --with-clean=1 > --with-scalar-type=complex --with-debugging=0 --with-fortran=1 > --with-blaslapack-dir=/state/std2/intel_2018/m > kl --with-mkl_pardiso-dir=/state/std2/intel_2018/mkl > --with-mkl_cpardiso-dir=/state/std2/intel_2018/mkl > --download-mumps=../external/mumps_v5.1.2-p1.tar.gz --d > ownload-scalapack=../external/scalapack-2.0.2.tgz --with-cc=mpiicc > --with-fc=mpiifort --with-cxx=mpiicc --FOPTFLAGS="-O3 -xHost" > --COPTFLAGS="-O3 -xHost" --CXX > OPTFLAGS="-O3 -xHost" > > > #2 DMDestroy() line 752 in > /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dm.c > [72]PETSC ERROR: #3 PetscObjectDereference() line 624 in > /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c > [72]PETSC ERROR: #4 PetscObjectListDestroy() line 156 in > /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/olist.c > [72]PETSC ERROR: #5 PetscHeaderDestroy_Private() line 122 in > /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c > [72]PETSC ERROR: #6 VecDestroy() line 412 in > /state/std2/FEMI/PETSc/petsc-3.11.1/src/vec/vec/interface/vector.c > > > > This is a large run taking many hours to get to this problem. I will try > to run in debug mode, but given that this seems to be randomly happening > (this has happened maybe 30% of the time I have used the fgmres option), > there is no guarantee that will show anything useful. Valgrind is obviously > out of the question for a large run, and I have yet to reproduce this on a > smaller run. > > Anyone have any ideas as to what?s causing this? > > Thanks in advance, > > Randy M. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun May 5 19:18:12 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 6 May 2019 00:18:12 +0000 Subject: [petsc-users] strange error using fgmres In-Reply-To: References: Message-ID: <67121265-3FF7-4355-B2E3-D42A02D46D10@anl.gov> Even if you don't get failures on the smaller version of a code it can still be worth running with valgrind (when you can't run valgrind on the massive problem) because often the problem is still there on the smaller problem, just less directly visible but valgrind can still find it. > [13]PETSC ERROR: Object is in wrong state > [13]PETSC ERROR: Clearing DM of global vectors that has a global vector obtained with DMGetGlobalVector() You probably have a work vector obtained with DMGetGlobalVector() that you forgot to return with DMRestoreGlobalVector(). Though I would expect that this would reproduce on any size problem. 
Barry > On May 5, 2019, at 5:21 PM, Randall Mackie via petsc-users wrote: > > In solving a nonlinear optimization problem, I was recently experimenting with fgmres using the following options: > > -nlcg_ksp_type fgmres \ > -nlcg_pc_type ksp \ > -nlcg_ksp_ksp_type bcgs \ > -nlcg_ksp_pc_type jacobi \ > -nlcg_ksp_rtol 1e-6 \ > -nlcg_ksp_ksp_max_it 300 \ > -nlcg_ksp_max_it 200 \ > -nlcg_ksp_converged_reason \ > -nlcg_ksp_monitor_true_residual \ > > I sometimes randomly will get an error like the following: > > Residual norms for nlcg_ solve. > 0 KSP unpreconditioned resid norm 3.371606868500e+04 true resid norm 3.371606868500e+04 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP unpreconditioned resid norm 2.322590778002e+02 true resid norm 2.322590778002e+02 ||r(i)||/||b|| 6.888676137487e-03 > 2 KSP unpreconditioned resid norm 8.262440884758e+01 true resid norm 8.262440884758e+01 ||r(i)||/||b|| 2.450594392232e-03 > 3 KSP unpreconditioned resid norm 3.660428333809e+01 true resid norm 3.660428333809e+01 ||r(i)||/||b|| 1.085662853522e-03 > 3 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm -nan ||r(i)||/||b|| -nan > Linear nlcg_ solve did not converge due to DIVERGED_PC_FAILED iterations 3 > PC_FAILED due to SUBPC_ERROR > > This usually happens after a few nonlinear optimization iterations, meaning that it?s worked perfectly fine until this point. > How can using jacobi pc all of a sudden cause a NaN, if it?s worked perfectly fine before? > > Some other errors in the output log file are as follows, although I have no idea if they result from the above error or not: > > [13]PETSC ERROR: Object is in wrong state > [13]PETSC ERROR: Clearing DM of global vectors that has a global vector obtained with DMGetGlobalVector() > [13]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [13]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 > > > [27]PETSC ERROR: #1 DMClearGlobalVectors() line 196 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dmget.c > [27]PETSC ERROR: Configure options --with-clean=1 --with-scalar-type=complex --with-debugging=0 --with-fortran=1 --with-blaslapack-dir=/state/std2/intel_2018/m > kl --with-mkl_pardiso-dir=/state/std2/intel_2018/mkl --with-mkl_cpardiso-dir=/state/std2/intel_2018/mkl --download-mumps=../external/mumps_v5.1.2-p1.tar.gz --d > ownload-scalapack=../external/scalapack-2.0.2.tgz --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicc --FOPTFLAGS="-O3 -xHost" --COPTFLAGS="-O3 -xHost" --CXX > OPTFLAGS="-O3 -xHost" > > > #2 DMDestroy() line 752 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dm.c > [72]PETSC ERROR: #3 PetscObjectDereference() line 624 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c > [72]PETSC ERROR: #4 PetscObjectListDestroy() line 156 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/olist.c > [72]PETSC ERROR: #5 PetscHeaderDestroy_Private() line 122 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c > [72]PETSC ERROR: #6 VecDestroy() line 412 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/vec/vec/interface/vector.c > > > > This is a large run taking many hours to get to this problem. I will try to run in debug mode, but given that this seems to be randomly happening (this has happened maybe 30% of the time I have used the fgmres option), there is no guarantee that will show anything useful. Valgrind is obviously out of the question for a large run, and I have yet to reproduce this on a smaller run. 
> > Anyone have any ideas as to what?s causing this? > > Thanks in advance, > > Randy M. From bsmith at mcs.anl.gov Sun May 5 20:03:54 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 6 May 2019 01:03:54 +0000 Subject: [petsc-users] strange error using fgmres In-Reply-To: References: Message-ID: <3D5F0B56-2D6D-4874-9273-2232BAC68158@anl.gov> Run with -ksp_error_if_not_converged -info this will provide more detail at locating the exact location the error occurred. Barry > On May 5, 2019, at 5:21 PM, Randall Mackie via petsc-users wrote: > > In solving a nonlinear optimization problem, I was recently experimenting with fgmres using the following options: > > -nlcg_ksp_type fgmres \ > -nlcg_pc_type ksp \ > -nlcg_ksp_ksp_type bcgs \ > -nlcg_ksp_pc_type jacobi \ > -nlcg_ksp_rtol 1e-6 \ > -nlcg_ksp_ksp_max_it 300 \ > -nlcg_ksp_max_it 200 \ > -nlcg_ksp_converged_reason \ > -nlcg_ksp_monitor_true_residual \ > > I sometimes randomly will get an error like the following: > > Residual norms for nlcg_ solve. > 0 KSP unpreconditioned resid norm 3.371606868500e+04 true resid norm 3.371606868500e+04 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP unpreconditioned resid norm 2.322590778002e+02 true resid norm 2.322590778002e+02 ||r(i)||/||b|| 6.888676137487e-03 > 2 KSP unpreconditioned resid norm 8.262440884758e+01 true resid norm 8.262440884758e+01 ||r(i)||/||b|| 2.450594392232e-03 > 3 KSP unpreconditioned resid norm 3.660428333809e+01 true resid norm 3.660428333809e+01 ||r(i)||/||b|| 1.085662853522e-03 > 3 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm -nan ||r(i)||/||b|| -nan > Linear nlcg_ solve did not converge due to DIVERGED_PC_FAILED iterations 3 > PC_FAILED due to SUBPC_ERROR > > This usually happens after a few nonlinear optimization iterations, meaning that it?s worked perfectly fine until this point. > How can using jacobi pc all of a sudden cause a NaN, if it?s worked perfectly fine before? > > Some other errors in the output log file are as follows, although I have no idea if they result from the above error or not: > > [13]PETSC ERROR: Object is in wrong state > [13]PETSC ERROR: Clearing DM of global vectors that has a global vector obtained with DMGetGlobalVector() > [13]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [13]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 > > > [27]PETSC ERROR: #1 DMClearGlobalVectors() line 196 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dmget.c > [27]PETSC ERROR: Configure options --with-clean=1 --with-scalar-type=complex --with-debugging=0 --with-fortran=1 --with-blaslapack-dir=/state/std2/intel_2018/m > kl --with-mkl_pardiso-dir=/state/std2/intel_2018/mkl --with-mkl_cpardiso-dir=/state/std2/intel_2018/mkl --download-mumps=../external/mumps_v5.1.2-p1.tar.gz --d > ownload-scalapack=../external/scalapack-2.0.2.tgz --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicc --FOPTFLAGS="-O3 -xHost" --COPTFLAGS="-O3 -xHost" --CXX > OPTFLAGS="-O3 -xHost" > > > #2 DMDestroy() line 752 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dm.c > [72]PETSC ERROR: #3 PetscObjectDereference() line 624 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c > [72]PETSC ERROR: #4 PetscObjectListDestroy() line 156 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/olist.c > [72]PETSC ERROR: #5 PetscHeaderDestroy_Private() line 122 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c > [72]PETSC ERROR: #6 VecDestroy() line 412 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/vec/vec/interface/vector.c > > > > This is a large run taking many hours to get to this problem. I will try to run in debug mode, but given that this seems to be randomly happening (this has happened maybe 30% of the time I have used the fgmres option), there is no guarantee that will show anything useful. Valgrind is obviously out of the question for a large run, and I have yet to reproduce this on a smaller run. > > Anyone have any ideas as to what?s causing this? > > Thanks in advance, > > Randy M. From dave.mayhem23 at gmail.com Mon May 6 00:34:34 2019 From: dave.mayhem23 at gmail.com (Dave May) Date: Mon, 6 May 2019 07:34:34 +0200 Subject: [petsc-users] strange error using fgmres In-Reply-To: <67121265-3FF7-4355-B2E3-D42A02D46D10@anl.gov> References: <67121265-3FF7-4355-B2E3-D42A02D46D10@anl.gov> Message-ID: On Mon, 6 May 2019 at 02:18, Smith, Barry F. via petsc-users < petsc-users at mcs.anl.gov> wrote: > > > Even if you don't get failures on the smaller version of a code it can > still be worth running with valgrind (when you can't run valgrind on the > massive problem) because often the problem is still there on the smaller > problem, just less directly visible but valgrind can still find it. > > > > [13]PETSC ERROR: Object is in wrong state > > [13]PETSC ERROR: Clearing DM of global vectors that has a global vector > obtained with DMGetGlobalVector() > > You probably have a work vector obtained with DMGetGlobalVector() that > you forgot to return with DMRestoreGlobalVector(). Though I would expect > that this would reproduce on any size problem. I'd fix the DM issue first before addressing the solver problem. I suspect the DM error could cause the solver error. Yep - something is wrong with your management of vectors associated with one of your DM's. You can figure out if this is the case by running with -log_view. Make sure the summary of the objects reported shows that the number of Vecs created and destroyed matches. At the very least, if there is a mismatch, make sure this difference does not increase as you do additional optimization solvers (or time steps). As Barry says, you don't need to run a large scale job to detect this, nor do you need to run through many optimization solves - the problem exists and is detectable and thus fixable for all job sizes. 
> > Barry > > > > On May 5, 2019, at 5:21 PM, Randall Mackie via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > In solving a nonlinear optimization problem, I was recently > experimenting with fgmres using the following options: > > > > -nlcg_ksp_type fgmres \ > > -nlcg_pc_type ksp \ > > -nlcg_ksp_ksp_type bcgs \ > > -nlcg_ksp_pc_type jacobi \ > > -nlcg_ksp_rtol 1e-6 \ > > -nlcg_ksp_ksp_max_it 300 \ > > -nlcg_ksp_max_it 200 \ > > -nlcg_ksp_converged_reason \ > > -nlcg_ksp_monitor_true_residual \ > > > > I sometimes randomly will get an error like the following: > > > > Residual norms for nlcg_ solve. > > 0 KSP unpreconditioned resid norm 3.371606868500e+04 true resid norm > 3.371606868500e+04 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP unpreconditioned resid norm 2.322590778002e+02 true resid norm > 2.322590778002e+02 ||r(i)||/||b|| 6.888676137487e-03 > > 2 KSP unpreconditioned resid norm 8.262440884758e+01 true resid norm > 8.262440884758e+01 ||r(i)||/||b|| 2.450594392232e-03 > > 3 KSP unpreconditioned resid norm 3.660428333809e+01 true resid norm > 3.660428333809e+01 ||r(i)||/||b|| 1.085662853522e-03 > > 3 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm > -nan ||r(i)||/||b|| -nan > > Linear nlcg_ solve did not converge due to DIVERGED_PC_FAILED iterations > 3 > > PC_FAILED due to SUBPC_ERROR > > > > This usually happens after a few nonlinear optimization iterations, > meaning that it?s worked perfectly fine until this point. > > How can using jacobi pc all of a sudden cause a NaN, if it?s worked > perfectly fine before? > > > > Some other errors in the output log file are as follows, although I have > no idea if they result from the above error or not: > > > > [13]PETSC ERROR: Object is in wrong state > > [13]PETSC ERROR: Clearing DM of global vectors that has a global vector > obtained with DMGetGlobalVector() > > [13]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [13]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019 > > > > > > [27]PETSC ERROR: #1 DMClearGlobalVectors() line 196 in > /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dmget.c > > [27]PETSC ERROR: Configure options --with-clean=1 > --with-scalar-type=complex --with-debugging=0 --with-fortran=1 > --with-blaslapack-dir=/state/std2/intel_2018/m > > kl --with-mkl_pardiso-dir=/state/std2/intel_2018/mkl > --with-mkl_cpardiso-dir=/state/std2/intel_2018/mkl > --download-mumps=../external/mumps_v5.1.2-p1.tar.gz --d > > ownload-scalapack=../external/scalapack-2.0.2.tgz --with-cc=mpiicc > --with-fc=mpiifort --with-cxx=mpiicc --FOPTFLAGS="-O3 -xHost" > --COPTFLAGS="-O3 -xHost" --CXX > > OPTFLAGS="-O3 -xHost" > > > > > > #2 DMDestroy() line 752 in > /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dm.c > > [72]PETSC ERROR: #3 PetscObjectDereference() line 624 in > /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c > > [72]PETSC ERROR: #4 PetscObjectListDestroy() line 156 in > /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/olist.c > > [72]PETSC ERROR: #5 PetscHeaderDestroy_Private() line 122 in > /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c > > [72]PETSC ERROR: #6 VecDestroy() line 412 in > /state/std2/FEMI/PETSc/petsc-3.11.1/src/vec/vec/interface/vector.c > > > > > > > > This is a large run taking many hours to get to this problem. 
I will try > to run in debug mode, but given that this seems to be randomly happening > (this has happened maybe 30% of the time I have used the fgmres option), > there is no guarantee that will show anything useful. Valgrind is obviously > out of the question for a large run, and I have yet to reproduce this on a > smaller run. > > > > Anyone have any ideas as to what?s causing this? > > > > Thanks in advance, > > > > Randy M. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From raphaelegan at ucsb.edu Mon May 6 18:41:39 2019 From: raphaelegan at ucsb.edu (Raphael Egan) Date: Mon, 6 May 2019 16:41:39 -0700 Subject: [petsc-users] Strong scaling issue cg solver with HYPRE preconditioner (BoomerAMG preconditioning) Message-ID: Dear Petsc developer(s), I am assessing the strong scalability of our incompressible Navier-Stokes solver on distributed Octree grids developed at the University of California, Santa Barbara. I intend to show satisfactory strong scaling behavior, even on very large numbers of cores for computational problems that involve a mesh that is big enough. My scaling test case involves 10 time steps of our solver, applied to the three-dimensional flow past a sphere on an extremely fine (but non-uniform) mesh made of about 270,000,000 computational cells. We use the p4est library for grid management and Petsc solvers for all linear algebra problems. I am working on Stampede 2 supercomputer's KNL nodes, using 64 cores per node. The most elementary usage of the solver can be summarized as follows: - "viscosity step": velocity components are decoupled from each other and solved for successively, leading to three successive Poisson problems with an added positive diagonal term; - "projection step": a projection step is solved to enforce the velocity field to be divergence-free. The solution of the "projection step" updates the appropriate boundary conditions for the "viscosity step" and the process may thus be repeated until a desired convergence threshold is reached for the current time step. Given the discretization that we use for the "viscosity step", the three corresponding linear systems are slightly nonsymmetric but have a dominant diagonal. We use BiCGStab with PCSOR as a preconditioner invariably for those problems (only a few iterations are required for those problems in this case). On the contrary, the linear system to be solved for the "projection step" is symmetric and positive definite, so we use a conjugate gradient solver. PCSOR or PCHYPRE can be chosen by the user as a preconditioner for that task. If PCHYPRE is chosen, "-pc_hypre_boomeramg_strong_threshold" is set to "0.5", "-pc_hypre_boomeramg_coarsen_type" is set to "Falgout" and "-pc_hypre_boomeramg_truncfactor" is set to "0.1". Tests on my local machine (on a much smaller problem) revealed that PCHYPRE outperforms PCSOR (both in terms of execution time and number of iterations) and I expected the conclusion to hold for (much) bigger problems on Stampede 2, as well. However that was wrong: PCSOR actually outperforms PCHYPRE quite significantly when using large numbers of cores for my scaling test problem. Worse than that, strong scaling of the projection step is actually very poor when using PCHYPRE as preconditioner: the time spent on that part of the problem grows with the number of cores. 
See the results here below:

NB: the scaling test that is considered here below consists of 2 time steps of the solver starting from an initial state read from disk, of around 270,000,000 computational cells. Each time step involves two sub-iterations, *i.e.*, every time step solves a first viscosity step (3 calls to KSPSolve of type bcgs with zero initial guesses), then a first projection step (1 call to KSPSolve of type cg with zero initial guess) then a second viscosity step (3 more calls to KSPSolve of type bcgs with nonzero initial guess), and a second projection step (1 call to KSPSolve of type cg with nonzero initial guess). This makes a total of 16 calls to KSPSolve, 12 of them are for the viscosity step (not under investigation here) and 4 others are for the projection steps.

*Mean execution time for the projection step (only):*
Number of cores: 4096 --> mean execution time for projection step using PCSOR: 125.7 s |||||| mean execution time for projection step using PCHYPRE: 192.7 s
Number of cores: 8192 --> mean execution time for projection step using PCSOR: 81.58 s |||||| mean execution time for projection step using PCHYPRE: 281.8 s
Number of cores: 16384 --> mean execution time for projection step using PCSOR: 48.2 s |||||| mean execution time for projection step using PCHYPRE: 311.5 s

The tests were all run with "-ksp_monitor", "-ksp_view" and "-log_view" for gathering more information about the processes (see files attached). While it is clear that the number of iterations is dramatically reduced when using PCHYPRE, it looks like a *very* long time is spent in PCSetUp and PCApply when using PCHYPRE for the projection step. When using PCHYPRE, the time spent in PCSetUp, PCApply and KSPSolve grows with the number of cores that are used. In particular, the time spent in PCSetUp alone when using PCHYPRE is larger than the total KSPSolve time when using PCSOR (although thousands of iterations are required in that case)...

I am very surprised by these results and I don't quite understand them. I used PCHYPRE with similar options and option values in other elliptic solver contexts previously with very satisfactory results, even on large problems. I can't figure out why we lose strong scaling in this case? Would you have some advice about a better preconditioner to use and/or a better set of options and/or option values to set in order to recover strong scaling when using PCHYPRE?

-- Raphael M. Egan Department of Mechanical Engineering Engineering II Building, Office 2317 University of California, Santa Barbara Santa Barbara, CA 93106 Email: raphaelegan at ucsb.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc_test_flow_past_sphere_Re_500_scaling_test_16384_cg_hypre Type: application/octet-stream Size: 43617 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc_test_flow_past_sphere_Re_500_scaling_test_8192_cg_hypre Type: application/octet-stream Size: 43440 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc_test_flow_past_sphere_Re_500_scaling_test_4096_cg_hypre Type: application/octet-stream Size: 43270 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: petsc_test_flow_past_sphere_Re_500_scaling_test_4096_cg_sor Type: application/octet-stream Size: 428768 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc_test_flow_past_sphere_Re_500_scaling_test_8192_cg_sor Type: application/octet-stream Size: 432853 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: petsc_test_flow_past_sphere_Re_500_scaling_test_16384_cg_sor Type: application/octet-stream Size: 439527 bytes Desc: not available URL: From mfadams at lbl.gov Mon May 6 19:50:23 2019 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 6 May 2019 20:50:23 -0400 Subject: [petsc-users] Strong scaling issue cg solver with HYPRE preconditioner (BoomerAMG preconditioning) In-Reply-To: References: Message-ID: On Mon, May 6, 2019 at 7:53 PM Raphael Egan via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear Petsc developer(s), > > I am assessing the strong scalability of our incompressible Navier-Stokes > solver on distributed Octree grids developed at the University of > California, Santa Barbara. > > I intend to show satisfactory strong scaling behavior, even on very large > numbers of cores for computational problems that involve a mesh that is big > enough. My scaling test case involves 10 time steps of our solver, applied > to the three-dimensional flow past a sphere on an extremely fine (but > non-uniform) mesh made of about 270,000,000 computational cells. We use the > p4est library for grid management and Petsc solvers for all linear algebra > problems. I am working on Stampede 2 supercomputer's KNL nodes, using 64 > cores per node. > > The most elementary usage of the solver can be summarized as follows: > - "viscosity step": velocity components are decoupled from each other and > solved for successively, leading to three successive Poisson problems with > an added positive diagonal term; > - "projection step": a projection step is solved to enforce the velocity > field to be divergence-free. > The solution of the "projection step" updates the appropriate boundary > conditions for the "viscosity step" and the process may thus be repeated > until a desired convergence threshold is reached for the current time step. > > Given the discretization that we use for the "viscosity step", the three > corresponding linear systems are slightly nonsymmetric but have a dominant > diagonal. We use BiCGStab with PCSOR as a preconditioner invariably for > those problems (only a few iterations are required for those problems in > this case). On the contrary, the linear system to be solved for the > "projection step" is symmetric and positive definite, so we use a conjugate > gradient solver. PCSOR or PCHYPRE can be chosen by the user as a > preconditioner for that task. If PCHYPRE is chosen, > "-pc_hypre_boomeramg_strong_threshold" is set to "0.5", > "-pc_hypre_boomeramg_coarsen_type" is set to "Falgout" and > "-pc_hypre_boomeramg_truncfactor" is set to "0.1". Tests on my local > machine (on a much smaller problem) revealed that PCHYPRE outperforms PCSOR > (both in terms of execution time and number of iterations) and I expected > the conclusion to hold for (much) bigger problems on Stampede 2, as well. > > However that was wrong: PCSOR actually outperforms PCHYPRE quite > significantly when using large numbers of cores for my scaling test > problem. 
Worse than that, strong scaling of the projection step is actually > very poor when using PCHYPRE as preconditioner: the time spent on that part > of the problem grows with the number of cores. See the results here below: > > NB: the scaling test that is considered here below consists of 2 time > steps of the solver starting from an initial state read from disk, of > around 270,000,000 computational cells. Each time step involves two > sub-iterations,* i.e.*, every time step solves a first viscosity step (3 > calls to KSPSolve of type bcgs with zero initial guesses), then a first > projection step (1 call to KSPSolve of type cg with zero initial guess) > then a second viscosity step (3 more calls to KSPSolve of type bcgs with > nonzero initial guess), and a second projection step (1 call to KSPSolve of > type cg with nonzero initial guess). This makes a total of 16 calls to > KSPSolves, 12 of them are for the viscosity step (not under investigation > here) and 4 others are for the projection steps > > *Mean execution time for the projection step (only):* > Number of cores: 4096 --> mean execution time for projection step > using PCSOR: 125.7 s |||||| mean exectution time for projection step > using PCHYPRE: 192.7 s > Number of cores: 8192 --> mean execution time for projection step > using PCSOR: 81.58 s |||||| mean exectution time for projection step > using PCHYPRE: 281.8 s > Number of cores: 16384 --> mean execution time for projection step using > PCSOR: 48.2 s |||||| mean exectution time for projection step using > PCHYPRE: 311.5 s > > The tests were all run with "-ksp_monitor", "-ksp_view" and "-log_view" > for gathering more information about the processes (see files attached). > While it is clear that the number of iterations is dramatically reduced > when using PCHYPRE, it looks like a *very* long time is spent in PCSetUp > and PCApply when using PCHYPRE for the projection step. When using PCHYPRE, > the time spent in PCSetUp, PCApply and KSPSolve grows with the number of > cores that are used. In particular, the time spent in PCSetUp alone when > using PCHYPRE is larger than the total KSPSolve time when using PCSOR > (although thousands of iterations are required in that case)... > > I am very surprised by these results and I don't quite understand them, I > used PCHYPRE with similar options and option values in other elliptic > solver contexts previously with very satisfactory results, even on large > problems. > I assume you mean different hardware. The number of equations that you are using per core (MPI process presumably) are not crazy and using 64 cores on a KNL (ie, not 256 hardware threads) is also reasonable. > I can't figure out why we loose strong scaling in this case? > > Would you have some advice about a better preconditioner to use and or a > better set of options and/or option values to set in order to recover > strong scaling when using PCHYPRE? > We can not performance debug hypre, We just provide and interface to hypre. So you would have to talk to them. PETSc has a native AMG solver that we can performance debug (-pc_type gamg). You could try that if you like. If you do, run it with -ksp_view and send the output. The next step in debugging this, if you are not happy, is run with -info and grep on GAMG and send that output. KNL is a challenging architecture and it is not surprising to run into problems in going from say haswell to KNL. Also, you should start with smaller tests. 
4K nodes is not trivial and we don't know if this is a network (inter-node) or a local (intra-node/socket) problem. KNL is different on the node. The network is similar. I would start with looking at one socket and scaling up to 64 cores (MPI processes) and debug that. You should see good scaling from 1 to 32 processes if the processes get put on each tile, which I have found it does by default. You should see some less than perfect strong scaling to 64 cores because you are now sharing an L2, you can then go to 128 processes (sharing L1), etc. if you like. Once you understand what is going on on a socket you can start scaling onto the network. Mark > > -- > Raphael M. Egan > > Department of Mechanical Engineering > Engineering II Building, Office 2317 > University of California, Santa Barbara > Santa Barbara, CA 93106 > Email: raphaelegan at ucsb.edu > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon May 6 22:02:51 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 7 May 2019 03:02:51 +0000 Subject: [petsc-users] Strong scaling issue cg solver with HYPRE preconditioner (BoomerAMG preconditioning) In-Reply-To: References: Message-ID: <4F27AC64-6AA3-4D8C-92E5-42D1D0993F2D@anl.gov> Before you make any more timing runs you should register three stages with PETSC_EXTERN PetscErrorCode PetscLogStageRegister(const char[],PetscLogStage*); and then put PETSC_EXTERN PetscErrorCode PetscLogStagePush(PetscLogStage); PETSC_EXTERN PetscErrorCode PetscLogStagePop(void); around the advection solve. Put the second set around the call to KSPSetUp() on the projection step and the third set around the projection solves. This will cause the -log_view to display separately the information about each type of solve and the setup so it is a bit easier to understand. To make test runs with GAMG as Mark suggestions you must use the master branch of the PETSc repository. We have been aggressively optimizing parts of GAMG recently. The set up time of AMG is largely determined by the performance of the sparse matrix triple product P^T A P, PETSc currently has four versions of this product which make tradeoffs in memory and time through different algorithmic approaches. They can be accessed with -matptap_via scalable or nonscalable or allatonce or allatonce_merged or hypre the final one uses the matrix triple product of hypre (but not the rest of hypre BoomerAMG.) Since your project problem remains the same for all time-steps (I assume) you can add the option -mat_freeintermediatedatastructures to save memory usage. We'd be very interested in hearing about your performance results Barry > On May 6, 2019, at 6:41 PM, Raphael Egan via petsc-users wrote: > > Dear Petsc developer(s), > > I am assessing the strong scalability of our incompressible Navier-Stokes solver on distributed Octree grids developed at the University of California, Santa Barbara. > > I intend to show satisfactory strong scaling behavior, even on very large numbers of cores for computational problems that involve a mesh that is big enough. My scaling test case involves 10 time steps of our solver, applied to the three-dimensional flow past a sphere on an extremely fine (but non-uniform) mesh made of about 270,000,000 computational cells. We use the p4est library for grid management and Petsc solvers for all linear algebra problems. I am working on Stampede 2 supercomputer's KNL nodes, using 64 cores per node. 
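A minimal sketch of the staged logging Barry suggests above is given here for reference; the stage names, and the ksp, b, x and ierr variables, are illustrative only and not taken from the actual application code:

   PetscLogStage stageVisc, stageProjSetup, stageProjSolve;

   ierr = PetscLogStageRegister("Viscosity solves",&stageVisc);CHKERRQ(ierr);
   ierr = PetscLogStageRegister("Projection setup",&stageProjSetup);CHKERRQ(ierr);
   ierr = PetscLogStageRegister("Projection solves",&stageProjSolve);CHKERRQ(ierr);

   ierr = PetscLogStagePush(stageVisc);CHKERRQ(ierr);
   /* ... the three BiCGStab viscosity solves ... */
   ierr = PetscLogStagePop();CHKERRQ(ierr);

   ierr = PetscLogStagePush(stageProjSetup);CHKERRQ(ierr);
   ierr = KSPSetUp(ksp);CHKERRQ(ierr);                    /* setup of the projection-step KSP */
   ierr = PetscLogStagePop();CHKERRQ(ierr);

   ierr = PetscLogStagePush(stageProjSolve);CHKERRQ(ierr);
   ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);                /* projection-step solve */
   ierr = PetscLogStagePop();CHKERRQ(ierr);

With these stages registered, -log_view reports the projection-step setup and solve times separately from the viscosity solves, which makes it much easier to see where the preconditioner time goes. The same instrumented run can then be repeated with the alternatives mentioned above (for instance -pc_type gamg, or one of the -matptap_via variants) for comparison.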
> > The most elementary usage of the solver can be summarized as follows: > - "viscosity step": velocity components are decoupled from each other and solved for successively, leading to three successive Poisson problems with an added positive diagonal term; > - "projection step": a projection step is solved to enforce the velocity field to be divergence-free. > The solution of the "projection step" updates the appropriate boundary conditions for the "viscosity step" and the process may thus be repeated until a desired convergence threshold is reached for the current time step. > > Given the discretization that we use for the "viscosity step", the three corresponding linear systems are slightly nonsymmetric but have a dominant diagonal. We use BiCGStab with PCSOR as a preconditioner invariably for those problems (only a few iterations are required for those problems in this case). On the contrary, the linear system to be solved for the "projection step" is symmetric and positive definite, so we use a conjugate gradient solver. PCSOR or PCHYPRE can be chosen by the user as a preconditioner for that task. If PCHYPRE is chosen, "-pc_hypre_boomeramg_strong_threshold" is set to "0.5", "-pc_hypre_boomeramg_coarsen_type" is set to "Falgout" and "-pc_hypre_boomeramg_truncfactor" is set to "0.1". Tests on my local machine (on a much smaller problem) revealed that PCHYPRE outperforms PCSOR (both in terms of execution time and number of iterations) and I expected the conclusion to hold for (much) bigger problems on Stampede 2, as well. > > However that was wrong: PCSOR actually outperforms PCHYPRE quite significantly when using large numbers of cores for my scaling test problem. Worse than that, strong scaling of the projection step is actually very poor when using PCHYPRE as preconditioner: the time spent on that part of the problem grows with the number of cores. See the results here below: > > NB: the scaling test that is considered here below consists of 2 time steps of the solver starting from an initial state read from disk, of around 270,000,000 computational cells. Each time step involves two sub-iterations, i.e., every time step solves a first viscosity step (3 calls to KSPSolve of type bcgs with zero initial guesses), then a first projection step (1 call to KSPSolve of type cg with zero initial guess) then a second viscosity step (3 more calls to KSPSolve of type bcgs with nonzero initial guess), and a second projection step (1 call to KSPSolve of type cg with nonzero initial guess). This makes a total of 16 calls to KSPSolves, 12 of them are for the viscosity step (not under investigation here) and 4 others are for the projection steps > > Mean execution time for the projection step (only): > Number of cores: 4096 --> mean execution time for projection step using PCSOR: 125.7 s |||||| mean exectution time for projection step using PCHYPRE: 192.7 s > Number of cores: 8192 --> mean execution time for projection step using PCSOR: 81.58 s |||||| mean exectution time for projection step using PCHYPRE: 281.8 s > Number of cores: 16384 --> mean execution time for projection step using PCSOR: 48.2 s |||||| mean exectution time for projection step using PCHYPRE: 311.5 s > > The tests were all run with "-ksp_monitor", "-ksp_view" and "-log_view" for gathering more information about the processes (see files attached). 
While it is clear that the number of iterations is dramatically reduced when using PCHYPRE, it looks like a very long time is spent in PCSetUp and PCApply when using PCHYPRE for the projection step. When using PCHYPRE, the time spent in PCSetUp, PCApply and KSPSolve grows with the number of cores that are used. In particular, the time spent in PCSetUp alone when using PCHYPRE is larger than the total KSPSolve time when using PCSOR (although thousands of iterations are required in that case)... > > I am very surprised by these results and I don't quite understand them, I used PCHYPRE with similar options and option values in other elliptic solver contexts previously with very satisfactory results, even on large problems. I can't figure out why we loose strong scaling in this case? > > Would you have some advice about a better preconditioner to use and or a better set of options and/or option values to set in order to recover strong scaling when using PCHYPRE? > > -- > Raphael M. Egan > > Department of Mechanical Engineering > Engineering II Building, Office 2317 > University of California, Santa Barbara > Santa Barbara, CA 93106 > Email: raphaelegan at ucsb.edu > From jean-christophe.giret at irt-saintexupery.com Tue May 7 10:15:05 2019 From: jean-christophe.giret at irt-saintexupery.com (GIRET Jean-Christophe) Date: Tue, 7 May 2019 15:15:05 +0000 Subject: [petsc-users] Question about parallel Vectors and communicators Message-ID: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> Dear PETSc users, I would like to use Petsc4Py for a project extension, which consists mainly of: - Storing data and matrices on several rank/nodes which could not fit on a single node. - Performing some linear algebra in a parallel fashion (solving sparse linear system for instance) - Exchanging those data structures (parallel vectors) between non-overlapping MPI communicators, created for instance by splitting MPI_COMM_WORLD. While the two first items seems to be well addressed by PETSc, I am wondering about the last one. Is it possible to access the data of a vector, defined on a communicator from another, non-overlapping communicator? From what I have seen from the documentation and the several threads on the user mailing-list, I would say no. But maybe I am missing something? If not, is it possible to transfer a vector defined on a given communicator on a communicator which is a subset of the previous one? Best regards, Jean-Christophe -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue May 7 11:43:15 2019 From: jed at jedbrown.org (Jed Brown) Date: Tue, 07 May 2019 10:43:15 -0600 Subject: [petsc-users] Question about parallel Vectors and communicators In-Reply-To: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> References: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> Message-ID: <871s1az04c.fsf@jedbrown.org> The standard approach would be to communicate via the parent comm. So you split comm world into part0 and part1 and use a VecScatter with vecs on world (which can have zero entries on part1 and part0 respectively) to exchange your data. You can use VecPlaceArray or VecCreate*WithArray to avoid an extra copy. GIRET Jean-Christophe via petsc-users writes: > Dear PETSc users, > > I would like to use Petsc4Py for a project extension, which consists mainly of: > > - Storing data and matrices on several rank/nodes which could not fit on a single node. 
> > - Performing some linear algebra in a parallel fashion (solving sparse linear system for instance) > > - Exchanging those data structures (parallel vectors) between non-overlapping MPI communicators, created for instance by splitting MPI_COMM_WORLD. > > While the two first items seems to be well addressed by PETSc, I am wondering about the last one. > > Is it possible to access the data of a vector, defined on a communicator from another, non-overlapping communicator? From what I have seen from the documentation and the several threads on the user mailing-list, I would say no. But maybe I am missing something? If not, is it possible to transfer a vector defined on a given communicator on a communicator which is a subset of the previous one? > > Best regards, > Jean-Christophe From mfadams at lbl.gov Tue May 7 14:39:27 2019 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 7 May 2019 15:39:27 -0400 Subject: [petsc-users] Question about parallel Vectors and communicators In-Reply-To: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> References: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> Message-ID: On Tue, May 7, 2019 at 11:38 AM GIRET Jean-Christophe via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear PETSc users, > > > > I would like to use Petsc4Py for a project extension, which consists > mainly of: > > - Storing data and matrices on several rank/nodes which could > not fit on a single node. > > - Performing some linear algebra in a parallel fashion (solving > sparse linear system for instance) > > - Exchanging those data structures (parallel vectors) between > non-overlapping MPI communicators, created for instance by splitting > MPI_COMM_WORLD. > > > > While the two first items seems to be well addressed by PETSc, I am > wondering about the last one. > > > > Is it possible to access the data of a vector, defined on a communicator > from another, non-overlapping communicator? From what I have seen from the > documentation and the several threads on the user mailing-list, I would say > no. But maybe I am missing something? If not, is it possible to transfer a > vector defined on a given communicator on a communicator which is a subset > of the previous one? > If you are sending to a subset of processes then VecGetSubVec + Jed's tricks might work. https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html > > > Best regards, > > Jean-Christophe > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Tue May 7 18:55:35 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Tue, 7 May 2019 16:55:35 -0700 Subject: [petsc-users] Command line option -memory_info Message-ID: I was trying to clean up some old scripts we have for running our codes which include the command line option -memory_info. I went digging in the manuals to try and figure out what this used to do and what has replaced its functionality but I wasn't able to figure it out.? Does anyone recall the earlier functionality for this option? and/or know its "replacement"? 
-sanjay From jczhang at mcs.anl.gov Tue May 7 20:32:52 2019 From: jczhang at mcs.anl.gov (Zhang, Junchao) Date: Wed, 8 May 2019 01:32:52 +0000 Subject: [petsc-users] Command line option -memory_info In-Reply-To: References: Message-ID: https://www.mcs.anl.gov/petsc/documentation/changes/37.html has PetscMemoryShowUsage() and -memory_info changed to PetscMemoryView() and -memory_view --Junchao Zhang On Tue, May 7, 2019 at 6:56 PM Sanjay Govindjee via petsc-users > wrote: I was trying to clean up some old scripts we have for running our codes which include the command line option -memory_info. I went digging in the manuals to try and figure out what this used to do and what has replaced its functionality but I wasn't able to figure it out. Does anyone recall the earlier functionality for this option? and/or know its "replacement"? -sanjay -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Tue May 7 22:26:44 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Tue, 7 May 2019 20:26:44 -0700 Subject: [petsc-users] Command line option -memory_info In-Reply-To: References: Message-ID: <3806b548-71ff-3fe6-e0b7-75d49ae882d5@berkeley.edu> Thanks!? I had tried searching the petsc site for -memory_info, not sure why this did not come up. Notwithstanding, thanks again. -sanjay On 5/7/19 6:35 PM, Zhang, Junchao wrote: > https://www.mcs.anl.gov/petsc/documentation/changes/37.html?has > ? PetscMemoryShowUsage() and -memory_info changed to PetscMemoryView() > and -memory_view > > --Junchao Zhang > > > On Tue, May 7, 2019 at 6:56 PM Sanjay Govindjee via petsc-users > > wrote: > > I was trying to clean up some old scripts we have for running our > codes > which include the command line option -memory_info. > I went digging in the manuals to try and figure out what this used > to do > and what has replaced its functionality but I wasn't able > to figure it out.? Does anyone recall the earlier functionality > for this > option? and/or know its "replacement"? > -sanjay > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Wed May 8 03:44:15 2019 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 8 May 2019 02:44:15 -0600 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly Message-ID: Hi guys, I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? 
I see three potential workarounds: 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize. 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this? Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 8 07:44:21 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 8 May 2019 08:44:21 -0400 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: Message-ID: On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi guys, > > I have a fully working distribution system solver written using DMNetwork, > The idea is that each electrical bus can have up to three phase nodes, and > each phase node has two unknowns: voltage magnitude and angle. In a > completely balanced system, each bus has three nodes, but in an unbalanced > system some of the buses can be either single phase or two-phase. > > The working DMNetwork code I developed, loosely based on the SNES > network/power.c, essentially represents each vertex as a bus. > DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to > each vertex. If every single bus had the same number of variables, the mat > block size = 2, 4, or 6, and my code is both fast and scalable. However, if > the unknowns per DMNetwork vertex unknowns are not the same across, then my > SNESFormJacobian function becomes extremely extremely slow. Specifically, > the MatSetValues() calls when the col/row global indices contain an offset > value that points to a neighboring bus vertex. > I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? Thanks, MAtt > Why is that? Is it because I no longer have a uniform block structure and > lose the speed/optimization benefits of iterating through an AIJ matrix? I > see three potential workarounds: > > 1) Treat every vertex as a three phase bus and "zero out" all the unused > phase node dofs and put a 1 in the diagonal. The problem I see with this is > that I will have unnecessary degrees of freedom (aka non-zeros in the > matrix). From the distribution systems I've seen, it's possible that > anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I > may have nearly twice the amount of dofs than necessary if I wanted to > preserve the block size = 6 for the AU mat. > > 2) Treat every phase node as a vertex aka solve a single-phase power flow > solver. 
That way I guarantee to have a block size = 2, this is what > Domenico's former student did in his thesis work. The problem I see with > this is that I have a larger graph, which can take more time to setup and > parallelize. > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one for > buses with all three phases, another for buses with only two phases, one > for single-phase buses. This way each block/fieldsplit will have a > consistent block size. I am not sure if this will solve the MatSetValues() > issues, but it's, but can anyone give pointers on how to go about achieving > this? > > Thanks, > Justin > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed May 8 10:29:42 2019 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Wed, 8 May 2019 15:29:42 +0000 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: Message-ID: Justin, How large the number of variables for a vertex can be, 8? By default, DMNetwork uses Jacobian created by DMPlex which treats a vertex/edge as a dense block. Adding the couplings between the vertices, your Jacobian could be quite dense and result in slow SNESFormJacobian. How do you compute the Jacobian? You can plot the sparse structure of your Jacobian to see how dense your matrix is. This is why we enable user-provided sparse Jacobian matrix blocks. See Sec. 2.4 of the attached manuscript and petsc/src/ts/examples/tutorials/network/wash/ex1.c Hong On Wed, May 8, 2019 at 7:45 AM Matthew Knepley via petsc-users > wrote: On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users > wrote: Hi guys, I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? Thanks, MAtt Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds: 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). 
From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize. 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this? Thanks, Justin -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: wash-paper.pdf Type: application/pdf Size: 1415457 bytes Desc: wash-paper.pdf URL: From shrirang.abhyankar at pnnl.gov Wed May 8 10:30:23 2019 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Wed, 8 May 2019 15:30:23 +0000 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: Message-ID: <1686AD01-09A8-457F-806F-BA1CE4DDFD68@pnnl.gov> From: petsc-users on behalf of Matthew Knepley via petsc-users Reply-To: Matthew Knepley Date: Wednesday, May 8, 2019 at 7:46 AM To: Justin Chang Cc: petsc-users Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users > wrote: Hi guys, I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? Thanks, Matt I have written power grid codes using DMNetwork where the vertex dofs range from 2 to 20. I have not yet observed the slow-down you report. My guess, as Matt points, is something to do with the preallocation. In power.c example, the DM creates the Jacobian matrix (which sets the Jacobian nonzero structure and does the allocation). Do you have the following lines in your code? 
ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds: 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize. 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this? Thanks, Justin -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed May 8 13:10:23 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 8 May 2019 18:10:23 +0000 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: Message-ID: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Justin, Are you providing matrix entries that connect directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges. Everyone, I second Matt's reply. How is the DMNetwork preallocating for the Jacobian? Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling. Or assume full coupling. If it assumes no coupling and the user has a good amount of coupling it will be very slow. There would need to be a way for the user provide the coupling information between neighboring vertices/edges if it assumes no coupling. Barry > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users wrote: > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users wrote: > Hi guys, > > I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. > > The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. 
If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. > > I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? > > Thanks, > > MAtt > > Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds: > > 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. > > 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize. > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this? > > Thanks, > Justin > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From jychang48 at gmail.com Wed May 8 13:29:40 2019 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 8 May 2019 12:29:40 -0600 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: Hi everyone, Yes I have these lines in my code: ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); I tried -info and here's my output: [0] PetscInitialize(): PETSc successfully started: number of processors = 1 [0] PetscInitialize(): Running on machine: jchang31606s.domain [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000 **** Power flow dist case **** Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999 [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 [0] DMGetDMSNES(): Creating new DMSNES [0] DMGetDMKSP(): Creating new DMKSP [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 0 SNES Function norm 1155.45 nothing else -info related shows up as I'm iterating through the vertex loop. I'll have a MWE for you guys to play with shortly. Thanks, Justin On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. wrote: > > Justin, > > Are you providing matrix entries that connect directly one vertex to > another vertex ACROSS an edge? 
I don't think that is supported by the > DMNetwork model. The assumption is that edges are only connected to > vertices and vertices are only connected to neighboring edges. > > Everyone, > > I second Matt's reply. > > How is the DMNetwork preallocating for the Jacobian? Does it take into > account coupling between neighboring vertices/edges? Or does it assume no > coupling. Or assume full coupling. If it assumes no coupling and the user > has a good amount of coupling it will be very slow. > > There would need to be a way for the user provide the coupling > information between neighboring vertices/edges if it assumes no coupling. > > Barry > > > > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi guys, > > > > I have a fully working distribution system solver written using > DMNetwork, The idea is that each electrical bus can have up to three phase > nodes, and each phase node has two unknowns: voltage magnitude and angle. > In a completely balanced system, each bus has three nodes, but in an > unbalanced system some of the buses can be either single phase or two-phase. > > > > The working DMNetwork code I developed, loosely based on the SNES > network/power.c, essentially represents each vertex as a bus. > DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to > each vertex. If every single bus had the same number of variables, the mat > block size = 2, 4, or 6, and my code is both fast and scalable. However, if > the unknowns per DMNetwork vertex unknowns are not the same across, then my > SNESFormJacobian function becomes extremely extremely slow. Specifically, > the MatSetValues() calls when the col/row global indices contain an offset > value that points to a neighboring bus vertex. > > > > I have never seen MatSetValues() be slow unless it is allocating. Did > you confirm that you are not allocating, with -info? > > > > Thanks, > > > > MAtt > > > > Why is that? Is it because I no longer have a uniform block structure > and lose the speed/optimization benefits of iterating through an AIJ > matrix? I see three potential workarounds: > > > > 1) Treat every vertex as a three phase bus and "zero out" all the unused > phase node dofs and put a 1 in the diagonal. The problem I see with this is > that I will have unnecessary degrees of freedom (aka non-zeros in the > matrix). From the distribution systems I've seen, it's possible that > anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I > may have nearly twice the amount of dofs than necessary if I wanted to > preserve the block size = 6 for the AU mat. > > > > 2) Treat every phase node as a vertex aka solve a single-phase power > flow solver. That way I guarantee to have a block size = 2, this is what > Domenico's former student did in his thesis work. The problem I see with > this is that I have a larger graph, which can take more time to setup and > parallelize. > > > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one > for buses with all three phases, another for buses with only two phases, > one for single-phase buses. This way each block/fieldsplit will have a > consistent block size. I am not sure if this will solve the MatSetValues() > issues, but it's, but can anyone give pointers on how to go about achieving > this? 
> > > > Thanks, > > Justin > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 8 13:36:43 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 8 May 2019 14:36:43 -0400 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: On Wed, May 8, 2019 at 2:30 PM Justin Chang wrote: > Hi everyone, > > Yes I have these lines in my code: > > ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); > ierr = > MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); > Okay, its not allocation. So maybe Hong is right that its setting great big element matrices. We will see with the example. Thanks, Matt > I tried -info and here's my output: > > [0] PetscInitialize(): PETSc successfully started: number of processors = 1 > [0] PetscInitialize(): Running on machine: jchang31606s.domain > [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 > 140550815662944 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 > 140550815662944 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 > 140550815662944 > Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, > numdl = 5000, numlbr = 109999, numtbr = 5000 > > **** Power flow dist case **** > > Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = > 5000, nbranch = 114999 > [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 > 140550815683104 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: > 0 unneeded,10799928 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. 
> Using Inode routines > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 > 140550815662944 > [0] DMGetDMSNES(): Creating new DMSNES > [0] DMGetDMKSP(): Creating new DMKSP > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > 0 SNES Function norm 1155.45 > > nothing else -info related shows up as I'm iterating through the vertex > loop. > > I'll have a MWE for you guys to play with shortly. > > Thanks, > Justin > > On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. > wrote: > >> >> Justin, >> >> Are you providing matrix entries that connect directly one vertex >> to another vertex ACROSS an edge? I don't think that is supported by the >> DMNetwork model. The assumption is that edges are only connected to >> vertices and vertices are only connected to neighboring edges. >> >> Everyone, >> >> I second Matt's reply. >> >> How is the DMNetwork preallocating for the Jacobian? Does it take into >> account coupling between neighboring vertices/edges? Or does it assume no >> coupling. Or assume full coupling. If it assumes no coupling and the user >> has a good amount of coupling it will be very slow. >> >> There would need to be a way for the user provide the coupling >> information between neighboring vertices/edges if it assumes no coupling. >> >> Barry >> >> >> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > >> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > Hi guys, >> > >> > I have a fully working distribution system solver written using >> DMNetwork, The idea is that each electrical bus can have up to three phase >> nodes, and each phase node has two unknowns: voltage magnitude and angle. >> In a completely balanced system, each bus has three nodes, but in an >> unbalanced system some of the buses can be either single phase or two-phase. >> > >> > The working DMNetwork code I developed, loosely based on the SNES >> network/power.c, essentially represents each vertex as a bus. >> DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to >> each vertex. If every single bus had the same number of variables, the mat >> block size = 2, 4, or 6, and my code is both fast and scalable. However, if >> the unknowns per DMNetwork vertex unknowns are not the same across, then my >> SNESFormJacobian function becomes extremely extremely slow. Specifically, >> the MatSetValues() calls when the col/row global indices contain an offset >> value that points to a neighboring bus vertex. >> > >> > I have never seen MatSetValues() be slow unless it is allocating. Did >> you confirm that you are not allocating, with -info? >> > >> > Thanks, >> > >> > MAtt >> > >> > Why is that? Is it because I no longer have a uniform block structure >> and lose the speed/optimization benefits of iterating through an AIJ >> matrix? I see three potential workarounds: >> > >> > 1) Treat every vertex as a three phase bus and "zero out" all the >> unused phase node dofs and put a 1 in the diagonal. The problem I see with >> this is that I will have unnecessary degrees of freedom (aka non-zeros in >> the matrix). 
From the distribution systems I've seen, it's possible that >> anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I >> may have nearly twice the amount of dofs than necessary if I wanted to >> preserve the block size = 6 for the AU mat. >> > >> > 2) Treat every phase node as a vertex aka solve a single-phase power >> flow solver. That way I guarantee to have a block size = 2, this is what >> Domenico's former student did in his thesis work. The problem I see with >> this is that I have a larger graph, which can take more time to setup and >> parallelize. >> > >> > 3) Create a "fieldsplit" where I essentially have three "blocks" - one >> for buses with all three phases, another for buses with only two phases, >> one for single-phase buses. This way each block/fieldsplit will have a >> consistent block size. I am not sure if this will solve the MatSetValues() >> issues, but it's, but can anyone give pointers on how to go about achieving >> this? >> > >> > Thanks, >> > Justin >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> > https://www.cse.buffalo.edu/~knepley/ >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Wed May 8 14:32:43 2019 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 8 May 2019 13:32:43 -0600 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: So here's the branch/repo to the working example I have: https://github.com/jychang48/petsc-dss/tree/single-bus-vertex Type 'make' to compile the dss, it should work with the latest petsc-dev To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes where N = 100, 500, and 1000. Those test files are listed as: input/test_100.m input/test_500.m input/test_1000.m I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash ar the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as: input/test2_100.m input/test2_500.m input/test2_1000.m The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m but they have a bs=6. To run these tests, type the following: ./dpflow -input input/test_100.m I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases. Turns out that only the first call to the SNESComputeJacobian() is slow, all the subsequent calls are fast as I expect. This makes me think it still has something to do with matrix allocation. Thanks for the help everyone, Justin On Wed, May 8, 2019 at 12:36 PM Matthew Knepley wrote: > On Wed, May 8, 2019 at 2:30 PM Justin Chang wrote: > >> Hi everyone, >> >> Yes I have these lines in my code: >> >> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); >> ierr = >> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); >> > > Okay, its not allocation. 
So maybe Hong is right that its setting great > big element matrices. We will see with the example. > > Thanks, > > Matt > > >> I tried -info and here's my output: >> >> [0] PetscInitialize(): PETSc successfully started: number of processors = >> 1 >> [0] PetscInitialize(): Running on machine: jchang31606s.domain >> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 >> 140550815662944 max tags = 2147483647 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 >> 140550815662944 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 >> 140550815662944 >> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, >> numdl = 5000, numlbr = 109999, numtbr = 5000 >> >> **** Power flow dist case **** >> >> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = >> 5000, nbranch = 114999 >> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 >> 140550815683104 max tags = 2147483647 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: >> 0 unneeded,10799928 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >> 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. >> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. >> Using Inode routines >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 >> 140550815662944 >> [0] DMGetDMSNES(): Creating new DMSNES >> [0] DMGetDMKSP(): Creating new DMKSP >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >> 140550815683104 >> 0 SNES Function norm 1155.45 >> >> nothing else -info related shows up as I'm iterating through the vertex >> loop. >> >> I'll have a MWE for you guys to play with shortly. >> >> Thanks, >> Justin >> >> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. >> wrote: >> >>> >>> Justin, >>> >>> Are you providing matrix entries that connect directly one vertex >>> to another vertex ACROSS an edge? I don't think that is supported by the >>> DMNetwork model. The assumption is that edges are only connected to >>> vertices and vertices are only connected to neighboring edges. 
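In code, that rule means the row indices for a vertex come from its own global offset, and the only legal column offsets are its own and those of the vertices reached through its supporting edges. A rough sketch of that lookup, assuming a DMNetwork called networkdm and a vertex point v supplied by the application (the MatSetValues() call itself is left out):

    PetscInt       rowoff, nedges, e;
    const PetscInt *edges, *cone;

    /* rows owned by vertex v start at its global offset */
    ierr = DMNetworkGetVariableGlobalOffset(networkdm,v,&rowoff);CHKERRQ(ierr);
    /* columns may come from v and from the far ends of its supporting edges,
       but not from a vertex two or more edges away */
    ierr = DMNetworkGetSupportingEdges(networkdm,v,&nedges,&edges);CHKERRQ(ierr);
    for (e = 0; e < nedges; e++) {
      ierr = DMNetworkGetConnectedVertices(networkdm,edges[e],&cone);CHKERRQ(ierr);
      /* cone[0] and cone[1] are the end vertices of edges[e]; their global
         offsets give the column blocks that are preallocated for the rows at rowoff */
    }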
>>> >>> Everyone, >>> >>> I second Matt's reply. >>> >>> How is the DMNetwork preallocating for the Jacobian? Does it take into >>> account coupling between neighboring vertices/edges? Or does it assume no >>> coupling. Or assume full coupling. If it assumes no coupling and the user >>> has a good amount of coupling it will be very slow. >>> >>> There would need to be a way for the user provide the coupling >>> information between neighboring vertices/edges if it assumes no coupling. >>> >>> Barry >>> >>> >>> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> > >>> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> > Hi guys, >>> > >>> > I have a fully working distribution system solver written using >>> DMNetwork, The idea is that each electrical bus can have up to three phase >>> nodes, and each phase node has two unknowns: voltage magnitude and angle. >>> In a completely balanced system, each bus has three nodes, but in an >>> unbalanced system some of the buses can be either single phase or two-phase. >>> > >>> > The working DMNetwork code I developed, loosely based on the SNES >>> network/power.c, essentially represents each vertex as a bus. >>> DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to >>> each vertex. If every single bus had the same number of variables, the mat >>> block size = 2, 4, or 6, and my code is both fast and scalable. However, if >>> the unknowns per DMNetwork vertex unknowns are not the same across, then my >>> SNESFormJacobian function becomes extremely extremely slow. Specifically, >>> the MatSetValues() calls when the col/row global indices contain an offset >>> value that points to a neighboring bus vertex. >>> > >>> > I have never seen MatSetValues() be slow unless it is allocating. Did >>> you confirm that you are not allocating, with -info? >>> > >>> > Thanks, >>> > >>> > MAtt >>> > >>> > Why is that? Is it because I no longer have a uniform block structure >>> and lose the speed/optimization benefits of iterating through an AIJ >>> matrix? I see three potential workarounds: >>> > >>> > 1) Treat every vertex as a three phase bus and "zero out" all the >>> unused phase node dofs and put a 1 in the diagonal. The problem I see with >>> this is that I will have unnecessary degrees of freedom (aka non-zeros in >>> the matrix). From the distribution systems I've seen, it's possible that >>> anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I >>> may have nearly twice the amount of dofs than necessary if I wanted to >>> preserve the block size = 6 for the AU mat. >>> > >>> > 2) Treat every phase node as a vertex aka solve a single-phase power >>> flow solver. That way I guarantee to have a block size = 2, this is what >>> Domenico's former student did in his thesis work. The problem I see with >>> this is that I have a larger graph, which can take more time to setup and >>> parallelize. >>> > >>> > 3) Create a "fieldsplit" where I essentially have three "blocks" - one >>> for buses with all three phases, another for buses with only two phases, >>> one for single-phase buses. This way each block/fieldsplit will have a >>> consistent block size. I am not sure if this will solve the MatSetValues() >>> issues, but it's, but can anyone give pointers on how to go about achieving >>> this? 
>>> > >>> > Thanks, >>> > Justin >>> > >>> > >>> > -- >>> > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > -- Norbert Wiener >>> > >>> > https://www.cse.buffalo.edu/~knepley/ >>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_100.out Type: application/octet-stream Size: 16187 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_1000.out Type: application/octet-stream Size: 16224 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test2_100.out Type: application/octet-stream Size: 14901 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_500.out Type: application/octet-stream Size: 16200 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test2_500.out Type: application/octet-stream Size: 14913 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test2_1000.out Type: application/octet-stream Size: 14914 bytes Desc: not available URL: From hzhang at mcs.anl.gov Wed May 8 15:55:28 2019 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Wed, 8 May 2019 20:55:28 +0000 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: Justin: So here's the branch/repo to the working example I have: https://github.com/jychang48/petsc-dss/tree/single-bus-vertex Type 'make' to compile the dss, it should work with the latest petsc-dev With trivial update to the latest petsc-dev (API of DMNetworkSetSizes() is changed slightly), I built dpflow. However, I only see following tests in my clone: $ ls input distCase_4Dyn1.m distCase_4YNd1.m distCase_4YNyn0.m distCase_test13nodes.m distributionCaseFormat.m To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes where N = 100, 500, and 1000. Those test files are listed as: input/test_100.m input/test_500.m input/test_1000.m I see these files in https://github.com/jychang48/petsc-dss/tree/single-bus-vertex/input Why I cannot get them into my local repository with 'git pull'? Hong I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash ar the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as: input/test2_100.m input/test2_500.m input/test2_1000.m The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m but they have a bs=6. To run these tests, type the following: ./dpflow -input input/test_100.m I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases. 
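As an aside on those logs: the malloc count they report can also be queried directly in the code right after assembly. A minimal sketch, assuming J is the assembled Jacobian:

    MatInfo info;

    ierr = MatGetInfo(J,MAT_LOCAL,&info);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_SELF,"MatSetValues mallocs %g, unneeded nonzeros %g\n",
                       info.mallocs,info.nz_unneeded);CHKERRQ(ierr);

A nonzero malloc count on the first assembly is the usual sign that entries are being inserted outside the preallocated nonzero pattern.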
Turns out that only the first call to the SNESComputeJacobian() is slow, all the subsequent calls are fast as I expect. This makes me think it still has something to do with matrix allocation. Thanks for the help everyone, Justin On Wed, May 8, 2019 at 12:36 PM Matthew Knepley > wrote: On Wed, May 8, 2019 at 2:30 PM Justin Chang > wrote: Hi everyone, Yes I have these lines in my code: ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); Okay, its not allocation. So maybe Hong is right that its setting great big element matrices. We will see with the example. Thanks, Matt I tried -info and here's my output: [0] PetscInitialize(): PETSc successfully started: number of processors = 1 [0] PetscInitialize(): Running on machine: jchang31606s.domain [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000 **** Power flow dist case **** Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999 [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 [0] DMGetDMSNES(): Creating new DMSNES [0] DMGetDMKSP(): Creating new DMKSP [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 0 SNES Function norm 1155.45 nothing else -info related shows up as I'm iterating through the vertex loop. I'll have a MWE for you guys to play with shortly. Thanks, Justin On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. 
> wrote: Justin, Are you providing matrix entries that connect directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges. Everyone, I second Matt's reply. How is the DMNetwork preallocating for the Jacobian? Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling. Or assume full coupling. If it assumes no coupling and the user has a good amount of coupling it will be very slow. There would need to be a way for the user provide the coupling information between neighboring vertices/edges if it assumes no coupling. Barry > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users > wrote: > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users > wrote: > Hi guys, > > I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. > > The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. > > I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? > > Thanks, > > MAtt > > Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds: > > 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. > > 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize. > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this? 
> > Thanks, > Justin > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Wed May 8 17:00:40 2019 From: dave.mayhem23 at gmail.com (Dave May) Date: Wed, 8 May 2019 23:00:40 +0100 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users < petsc-users at mcs.anl.gov> wrote: > So here's the branch/repo to the working example I have: > > https://github.com/jychang48/petsc-dss/tree/single-bus-vertex > > Type 'make' to compile the dss, it should work with the latest petsc-dev > > To test the performance, I've taken an existing IEEE 13-bus and duplicated > it N times to create a long radial-like network. I have three sizes where N > = 100, 500, and 1000. Those test files are listed as: > > input/test_100.m > input/test_500.m > input/test_1000.m > > I also created another set of examples where the IEEE 13-bus is fully > balanced (but the program will crash ar the solve step because I used some > unrealistic parameters for the Y-bus matrices and probably have some zeros > somewhere). They are listed as: > > input/test2_100.m > input/test2_500.m > input/test2_1000.m > > The dof count and matrices for the test2_*.m files are slightly larger > than their respective test_*.m but they have a bs=6. > > To run these tests, type the following: > > ./dpflow -input input/test_100.m > > I have a timer that shows how long it takes to compute the Jacobian. > Attached are the log outputs I have for each of the six cases. > > Turns out that only the first call to the SNESComputeJacobian() is slow, > all the subsequent calls are fast as I expect. This makes me think it still > has something to do with matrix allocation. > I think it is a preallocation issue. Looking to some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView total number of mallocs used during MatSetValues calls =10000 > > Thanks for the help everyone, > > Justin > > On Wed, May 8, 2019 at 12:36 PM Matthew Knepley wrote: > >> On Wed, May 8, 2019 at 2:30 PM Justin Chang wrote: >> >>> Hi everyone, >>> >>> Yes I have these lines in my code: >>> >>> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); >>> ierr = >>> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); >>> >> >> Okay, its not allocation. So maybe Hong is right that its setting great >> big element matrices. We will see with the example. 
>> >> Thanks, >> >> Matt >> >> >>> I tried -info and here's my output: >>> >>> [0] PetscInitialize(): PETSc successfully started: number of processors >>> = 1 >>> [0] PetscInitialize(): Running on machine: jchang31606s.domain >>> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 >>> 140550815662944 max tags = 2147483647 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 >>> 140550815662944 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 >>> 140550815662944 >>> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, >>> numdl = 5000, numlbr = 109999, numtbr = 5000 >>> >>> **** Power flow dist case **** >>> >>> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta >>> = 5000, nbranch = 114999 >>> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 >>> 140550815683104 max tags = 2147483647 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage >>> space: 0 unneeded,10799928 used >>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 >>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 >>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >>> 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. >>> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. >>> Using Inode routines >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 >>> 140550815662944 >>> [0] DMGetDMSNES(): Creating new DMSNES >>> [0] DMGetDMKSP(): Creating new DMKSP >>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>> 140550815683104 >>> 0 SNES Function norm 1155.45 >>> >>> nothing else -info related shows up as I'm iterating through the vertex >>> loop. >>> >>> I'll have a MWE for you guys to play with shortly. >>> >>> Thanks, >>> Justin >>> >>> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. >>> wrote: >>> >>>> >>>> Justin, >>>> >>>> Are you providing matrix entries that connect directly one vertex >>>> to another vertex ACROSS an edge? I don't think that is supported by the >>>> DMNetwork model. The assumption is that edges are only connected to >>>> vertices and vertices are only connected to neighboring edges. >>>> >>>> Everyone, >>>> >>>> I second Matt's reply. 
>>>> >>>> How is the DMNetwork preallocating for the Jacobian? Does it take >>>> into account coupling between neighboring vertices/edges? Or does it assume >>>> no coupling. Or assume full coupling. If it assumes no coupling and the >>>> user has a good amount of coupling it will be very slow. >>>> >>>> There would need to be a way for the user provide the coupling >>>> information between neighboring vertices/edges if it assumes no coupling. >>>> >>>> Barry >>>> >>>> >>>> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users < >>>> petsc-users at mcs.anl.gov> wrote: >>>> > >>>> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users < >>>> petsc-users at mcs.anl.gov> wrote: >>>> > Hi guys, >>>> > >>>> > I have a fully working distribution system solver written using >>>> DMNetwork, The idea is that each electrical bus can have up to three phase >>>> nodes, and each phase node has two unknowns: voltage magnitude and angle. >>>> In a completely balanced system, each bus has three nodes, but in an >>>> unbalanced system some of the buses can be either single phase or two-phase. >>>> > >>>> > The working DMNetwork code I developed, loosely based on the SNES >>>> network/power.c, essentially represents each vertex as a bus. >>>> DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to >>>> each vertex. If every single bus had the same number of variables, the mat >>>> block size = 2, 4, or 6, and my code is both fast and scalable. However, if >>>> the unknowns per DMNetwork vertex unknowns are not the same across, then my >>>> SNESFormJacobian function becomes extremely extremely slow. Specifically, >>>> the MatSetValues() calls when the col/row global indices contain an offset >>>> value that points to a neighboring bus vertex. >>>> > >>>> > I have never seen MatSetValues() be slow unless it is allocating. Did >>>> you confirm that you are not allocating, with -info? >>>> > >>>> > Thanks, >>>> > >>>> > MAtt >>>> > >>>> > Why is that? Is it because I no longer have a uniform block structure >>>> and lose the speed/optimization benefits of iterating through an AIJ >>>> matrix? I see three potential workarounds: >>>> > >>>> > 1) Treat every vertex as a three phase bus and "zero out" all the >>>> unused phase node dofs and put a 1 in the diagonal. The problem I see with >>>> this is that I will have unnecessary degrees of freedom (aka non-zeros in >>>> the matrix). From the distribution systems I've seen, it's possible that >>>> anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I >>>> may have nearly twice the amount of dofs than necessary if I wanted to >>>> preserve the block size = 6 for the AU mat. >>>> > >>>> > 2) Treat every phase node as a vertex aka solve a single-phase power >>>> flow solver. That way I guarantee to have a block size = 2, this is what >>>> Domenico's former student did in his thesis work. The problem I see with >>>> this is that I have a larger graph, which can take more time to setup and >>>> parallelize. >>>> > >>>> > 3) Create a "fieldsplit" where I essentially have three "blocks" - >>>> one for buses with all three phases, another for buses with only two >>>> phases, one for single-phase buses. This way each block/fieldsplit will >>>> have a consistent block size. I am not sure if this will solve the >>>> MatSetValues() issues, but it's, but can anyone give pointers on how to go >>>> about achieving this? 
>>>> > >>>> > Thanks, >>>> > Justin >>>> > >>>> > >>>> > -- >>>> > What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> > -- Norbert Wiener >>>> > >>>> > https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Wed May 8 17:10:11 2019 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 8 May 2019 16:10:11 -0600 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: Yeah I noticed that too. So could that possibly imply that the row/col indices used in my MatSetValues() extend beyond what was originally allocated in the DMCreateMatrix() function? On Wed, May 8, 2019 at 4:00 PM Dave May wrote: > > > On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> So here's the branch/repo to the working example I have: >> >> https://github.com/jychang48/petsc-dss/tree/single-bus-vertex >> >> Type 'make' to compile the dss, it should work with the latest petsc-dev >> >> To test the performance, I've taken an existing IEEE 13-bus and >> duplicated it N times to create a long radial-like network. I have three >> sizes where N = 100, 500, and 1000. Those test files are listed as: >> >> input/test_100.m >> input/test_500.m >> input/test_1000.m >> >> I also created another set of examples where the IEEE 13-bus is fully >> balanced (but the program will crash ar the solve step because I used some >> unrealistic parameters for the Y-bus matrices and probably have some zeros >> somewhere). They are listed as: >> >> input/test2_100.m >> input/test2_500.m >> input/test2_1000.m >> >> The dof count and matrices for the test2_*.m files are slightly larger >> than their respective test_*.m but they have a bs=6. >> >> To run these tests, type the following: >> >> ./dpflow -input input/test_100.m >> >> I have a timer that shows how long it takes to compute the Jacobian. >> Attached are the log outputs I have for each of the six cases. >> >> Turns out that only the first call to the SNESComputeJacobian() is slow, >> all the subsequent calls are fast as I expect. This makes me think it still >> has something to do with matrix allocation. >> > > I think it is a preallocation issue. > Looking to some of the output files (test_1000.out, test_100.out), under > Mat Object I see this in the KSPView > > total number of mallocs used during MatSetValues calls =10000 > > > > > >> >> Thanks for the help everyone, >> >> Justin >> >> On Wed, May 8, 2019 at 12:36 PM Matthew Knepley >> wrote: >> >>> On Wed, May 8, 2019 at 2:30 PM Justin Chang wrote: >>> >>>> Hi everyone, >>>> >>>> Yes I have these lines in my code: >>>> >>>> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); >>>> ierr = >>>> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); >>>> >>> >>> Okay, its not allocation. So maybe Hong is right that its setting great >>> big element matrices. We will see with the example. 
>>> >>> Thanks, >>> >>> Matt >>> >>> >>>> I tried -info and here's my output: >>>> >>>> [0] PetscInitialize(): PETSc successfully started: number of processors >>>> = 1 >>>> [0] PetscInitialize(): Running on machine: jchang31606s.domain >>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 >>>> 140550815662944 max tags = 2147483647 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 >>>> 140550815662944 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 >>>> 140550815662944 >>>> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, >>>> numdl = 5000, numlbr = 109999, numtbr = 5000 >>>> >>>> **** Power flow dist case **** >>>> >>>> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta >>>> = 5000, nbranch = 114999 >>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 >>>> 140550815683104 max tags = 2147483647 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage >>>> space: 0 unneeded,10799928 used >>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is >>>> 0 >>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 >>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >>>> 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. >>>> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. >>>> Using Inode routines >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 >>>> 140550815662944 >>>> [0] DMGetDMSNES(): Creating new DMSNES >>>> [0] DMGetDMKSP(): Creating new DMKSP >>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>> 140550815683104 >>>> 0 SNES Function norm 1155.45 >>>> >>>> nothing else -info related shows up as I'm iterating through the vertex >>>> loop. >>>> >>>> I'll have a MWE for you guys to play with shortly. >>>> >>>> Thanks, >>>> Justin >>>> >>>> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. >>>> wrote: >>>> >>>>> >>>>> Justin, >>>>> >>>>> Are you providing matrix entries that connect directly one >>>>> vertex to another vertex ACROSS an edge? I don't think that is supported by >>>>> the DMNetwork model. 
The assumption is that edges are only connected to >>>>> vertices and vertices are only connected to neighboring edges. >>>>> >>>>> Everyone, >>>>> >>>>> I second Matt's reply. >>>>> >>>>> How is the DMNetwork preallocating for the Jacobian? Does it take >>>>> into account coupling between neighboring vertices/edges? Or does it assume >>>>> no coupling. Or assume full coupling. If it assumes no coupling and the >>>>> user has a good amount of coupling it will be very slow. >>>>> >>>>> There would need to be a way for the user provide the coupling >>>>> information between neighboring vertices/edges if it assumes no coupling. >>>>> >>>>> Barry >>>>> >>>>> >>>>> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users < >>>>> petsc-users at mcs.anl.gov> wrote: >>>>> > >>>>> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users < >>>>> petsc-users at mcs.anl.gov> wrote: >>>>> > Hi guys, >>>>> > >>>>> > I have a fully working distribution system solver written using >>>>> DMNetwork, The idea is that each electrical bus can have up to three phase >>>>> nodes, and each phase node has two unknowns: voltage magnitude and angle. >>>>> In a completely balanced system, each bus has three nodes, but in an >>>>> unbalanced system some of the buses can be either single phase or two-phase. >>>>> > >>>>> > The working DMNetwork code I developed, loosely based on the SNES >>>>> network/power.c, essentially represents each vertex as a bus. >>>>> DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to >>>>> each vertex. If every single bus had the same number of variables, the mat >>>>> block size = 2, 4, or 6, and my code is both fast and scalable. However, if >>>>> the unknowns per DMNetwork vertex unknowns are not the same across, then my >>>>> SNESFormJacobian function becomes extremely extremely slow. Specifically, >>>>> the MatSetValues() calls when the col/row global indices contain an offset >>>>> value that points to a neighboring bus vertex. >>>>> > >>>>> > I have never seen MatSetValues() be slow unless it is allocating. >>>>> Did you confirm that you are not allocating, with -info? >>>>> > >>>>> > Thanks, >>>>> > >>>>> > MAtt >>>>> > >>>>> > Why is that? Is it because I no longer have a uniform block >>>>> structure and lose the speed/optimization benefits of iterating through an >>>>> AIJ matrix? I see three potential workarounds: >>>>> > >>>>> > 1) Treat every vertex as a three phase bus and "zero out" all the >>>>> unused phase node dofs and put a 1 in the diagonal. The problem I see with >>>>> this is that I will have unnecessary degrees of freedom (aka non-zeros in >>>>> the matrix). From the distribution systems I've seen, it's possible that >>>>> anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I >>>>> may have nearly twice the amount of dofs than necessary if I wanted to >>>>> preserve the block size = 6 for the AU mat. >>>>> > >>>>> > 2) Treat every phase node as a vertex aka solve a single-phase power >>>>> flow solver. That way I guarantee to have a block size = 2, this is what >>>>> Domenico's former student did in his thesis work. The problem I see with >>>>> this is that I have a larger graph, which can take more time to setup and >>>>> parallelize. >>>>> > >>>>> > 3) Create a "fieldsplit" where I essentially have three "blocks" - >>>>> one for buses with all three phases, another for buses with only two >>>>> phases, one for single-phase buses. This way each block/fieldsplit will >>>>> have a consistent block size. 
I am not sure if this will solve the >>>>> MatSetValues() issues, but it's, but can anyone give pointers on how to go >>>>> about achieving this? >>>>> > >>>>> > Thanks, >>>>> > Justin >>>>> > >>>>> > >>>>> > -- >>>>> > What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> > -- Norbert Wiener >>>>> > >>>>> > https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 8 17:16:00 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 8 May 2019 18:16:00 -0400 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: On Wed, May 8, 2019 at 6:10 PM Justin Chang wrote: > Yeah I noticed that too. So could that possibly imply that the row/col > indices used in my MatSetValues() extend beyond what was originally > allocated in the DMCreateMatrix() function? > It definitely means that. We usually prevent this with a structured SetValues API. For example, DMDA uses MatSetValuesStencil() which cannot write outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one for DMNetwork. The allocation is just like Plex (I believe) where you allocate closure(star(p)), which would mean that setting values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices. Is that right for DMNetwork? Matt > On Wed, May 8, 2019 at 4:00 PM Dave May wrote: > >> >> >> On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> So here's the branch/repo to the working example I have: >>> >>> https://github.com/jychang48/petsc-dss/tree/single-bus-vertex >>> >>> Type 'make' to compile the dss, it should work with the latest petsc-dev >>> >>> To test the performance, I've taken an existing IEEE 13-bus and >>> duplicated it N times to create a long radial-like network. I have three >>> sizes where N = 100, 500, and 1000. Those test files are listed as: >>> >>> input/test_100.m >>> input/test_500.m >>> input/test_1000.m >>> >>> I also created another set of examples where the IEEE 13-bus is fully >>> balanced (but the program will crash ar the solve step because I used some >>> unrealistic parameters for the Y-bus matrices and probably have some zeros >>> somewhere). They are listed as: >>> >>> input/test2_100.m >>> input/test2_500.m >>> input/test2_1000.m >>> >>> The dof count and matrices for the test2_*.m files are slightly larger >>> than their respective test_*.m but they have a bs=6. >>> >>> To run these tests, type the following: >>> >>> ./dpflow -input input/test_100.m >>> >>> I have a timer that shows how long it takes to compute the Jacobian. >>> Attached are the log outputs I have for each of the six cases. >>> >>> Turns out that only the first call to the SNESComputeJacobian() is slow, >>> all the subsequent calls are fast as I expect. This makes me think it still >>> has something to do with matrix allocation. >>> >> >> I think it is a preallocation issue. 
>> Looking to some of the output files (test_1000.out, test_100.out), under >> Mat Object I see this in the KSPView >> >> total number of mallocs used during MatSetValues calls =10000 >> >> >> >> >> >>> >>> Thanks for the help everyone, >>> >>> Justin >>> >>> On Wed, May 8, 2019 at 12:36 PM Matthew Knepley >>> wrote: >>> >>>> On Wed, May 8, 2019 at 2:30 PM Justin Chang >>>> wrote: >>>> >>>>> Hi everyone, >>>>> >>>>> Yes I have these lines in my code: >>>>> >>>>> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); >>>>> ierr = >>>>> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); >>>>> >>>> >>>> Okay, its not allocation. So maybe Hong is right that its setting great >>>> big element matrices. We will see with the example. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> I tried -info and here's my output: >>>>> >>>>> [0] PetscInitialize(): PETSc successfully started: number of >>>>> processors = 1 >>>>> [0] PetscInitialize(): Running on machine: jchang31606s.domain >>>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 >>>>> 140550815662944 max tags = 2147483647 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 >>>>> 140550815662944 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 >>>>> 140550815662944 >>>>> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, >>>>> numdl = 5000, numlbr = 109999, numtbr = 5000 >>>>> >>>>> **** Power flow dist case **** >>>>> >>>>> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, >>>>> ndelta = 5000, nbranch = 114999 >>>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 >>>>> 140550815683104 max tags = 2147483647 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage >>>>> space: 0 unneeded,10799928 used >>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() >>>>> is 0 >>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 >>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >>>>> 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. >>>>> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: >>>>> 5. 
Using Inode routines >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 >>>>> 140550815662944 >>>>> [0] DMGetDMSNES(): Creating new DMSNES >>>>> [0] DMGetDMKSP(): Creating new DMKSP >>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 >>>>> 140550815683104 >>>>> 0 SNES Function norm 1155.45 >>>>> >>>>> nothing else -info related shows up as I'm iterating through the >>>>> vertex loop. >>>>> >>>>> I'll have a MWE for you guys to play with shortly. >>>>> >>>>> Thanks, >>>>> Justin >>>>> >>>>> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. >>>>> wrote: >>>>> >>>>>> >>>>>> Justin, >>>>>> >>>>>> Are you providing matrix entries that connect directly one >>>>>> vertex to another vertex ACROSS an edge? I don't think that is supported by >>>>>> the DMNetwork model. The assumption is that edges are only connected to >>>>>> vertices and vertices are only connected to neighboring edges. >>>>>> >>>>>> Everyone, >>>>>> >>>>>> I second Matt's reply. >>>>>> >>>>>> How is the DMNetwork preallocating for the Jacobian? Does it take >>>>>> into account coupling between neighboring vertices/edges? Or does it assume >>>>>> no coupling. Or assume full coupling. If it assumes no coupling and the >>>>>> user has a good amount of coupling it will be very slow. >>>>>> >>>>>> There would need to be a way for the user provide the coupling >>>>>> information between neighboring vertices/edges if it assumes no coupling. >>>>>> >>>>>> Barry >>>>>> >>>>>> >>>>>> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users < >>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>> > >>>>>> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users < >>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>> > Hi guys, >>>>>> > >>>>>> > I have a fully working distribution system solver written using >>>>>> DMNetwork, The idea is that each electrical bus can have up to three phase >>>>>> nodes, and each phase node has two unknowns: voltage magnitude and angle. >>>>>> In a completely balanced system, each bus has three nodes, but in an >>>>>> unbalanced system some of the buses can be either single phase or two-phase. >>>>>> > >>>>>> > The working DMNetwork code I developed, loosely based on the SNES >>>>>> network/power.c, essentially represents each vertex as a bus. >>>>>> DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to >>>>>> each vertex. If every single bus had the same number of variables, the mat >>>>>> block size = 2, 4, or 6, and my code is both fast and scalable. However, if >>>>>> the unknowns per DMNetwork vertex unknowns are not the same across, then my >>>>>> SNESFormJacobian function becomes extremely extremely slow. Specifically, >>>>>> the MatSetValues() calls when the col/row global indices contain an offset >>>>>> value that points to a neighboring bus vertex. >>>>>> > >>>>>> > I have never seen MatSetValues() be slow unless it is allocating. >>>>>> Did you confirm that you are not allocating, with -info? >>>>>> > >>>>>> > Thanks, >>>>>> > >>>>>> > MAtt >>>>>> > >>>>>> > Why is that? Is it because I no longer have a uniform block >>>>>> structure and lose the speed/optimization benefits of iterating through an >>>>>> AIJ matrix? I see three potential workarounds: >>>>>> > >>>>>> > 1) Treat every vertex as a three phase bus and "zero out" all the >>>>>> unused phase node dofs and put a 1 in the diagonal. 
The problem I see with >>>>>> this is that I will have unnecessary degrees of freedom (aka non-zeros in >>>>>> the matrix). From the distribution systems I've seen, it's possible that >>>>>> anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I >>>>>> may have nearly twice the amount of dofs than necessary if I wanted to >>>>>> preserve the block size = 6 for the AU mat. >>>>>> > >>>>>> > 2) Treat every phase node as a vertex aka solve a single-phase >>>>>> power flow solver. That way I guarantee to have a block size = 2, this is >>>>>> what Domenico's former student did in his thesis work. The problem I see >>>>>> with this is that I have a larger graph, which can take more time to setup >>>>>> and parallelize. >>>>>> > >>>>>> > 3) Create a "fieldsplit" where I essentially have three "blocks" - >>>>>> one for buses with all three phases, another for buses with only two >>>>>> phases, one for single-phase buses. This way each block/fieldsplit will >>>>>> have a consistent block size. I am not sure if this will solve the >>>>>> MatSetValues() issues, but it's, but can anyone give pointers on how to go >>>>>> about achieving this? >>>>>> > >>>>>> > Thanks, >>>>>> > Justin >>>>>> > >>>>>> > >>>>>> > -- >>>>>> > What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> > -- Norbert Wiener >>>>>> > >>>>>> > https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Wed May 8 17:53:29 2019 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 8 May 2019 16:53:29 -0600 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: Hi everyone, Ah I figured out what was wrong. It was an error on my part. My code was unintentionally calling MatSetValues when it shouldn't have. I fixed the problem and the speed of my code is as expected. In case anyone's interested to know the grand number of lines of codes needed to fix this error: https://github.com/jychang48/petsc-dss/commit/d93c49c679af79c36613bc0456a79842060ac2f4 Thanks everyone for your help! Justin On Wed, May 8, 2019 at 4:16 PM Matthew Knepley wrote: > On Wed, May 8, 2019 at 6:10 PM Justin Chang wrote: > >> Yeah I noticed that too. So could that possibly imply that the row/col >> indices used in my MatSetValues() extend beyond what was originally >> allocated in the DMCreateMatrix() function? >> > > It definitely means that. > > We usually prevent this with a structured SetValues API. For example, DMDA > uses MatSetValuesStencil() which cannot write > outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is > guaranteed to be allocated. We should write one > for DMNetwork. 
The allocation is just like Plex (I believe) where you > allocate closure(star(p)), which would mean that setting > values for a vertex gets the neighboring edges and their vertices, and > setting values for an edge gets the covering vertices. > Is that right for DMNetwork? > > Matt > > >> On Wed, May 8, 2019 at 4:00 PM Dave May wrote: >> >>> >>> >>> On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> >>>> So here's the branch/repo to the working example I have: >>>> >>>> https://github.com/jychang48/petsc-dss/tree/single-bus-vertex >>>> >>>> Type 'make' to compile the dss, it should work with the latest petsc-dev >>>> >>>> To test the performance, I've taken an existing IEEE 13-bus and >>>> duplicated it N times to create a long radial-like network. I have three >>>> sizes where N = 100, 500, and 1000. Those test files are listed as: >>>> >>>> input/test_100.m >>>> input/test_500.m >>>> input/test_1000.m >>>> >>>> I also created another set of examples where the IEEE 13-bus is fully >>>> balanced (but the program will crash ar the solve step because I used some >>>> unrealistic parameters for the Y-bus matrices and probably have some zeros >>>> somewhere). They are listed as: >>>> >>>> input/test2_100.m >>>> input/test2_500.m >>>> input/test2_1000.m >>>> >>>> The dof count and matrices for the test2_*.m files are slightly larger >>>> than their respective test_*.m but they have a bs=6. >>>> >>>> To run these tests, type the following: >>>> >>>> ./dpflow -input input/test_100.m >>>> >>>> I have a timer that shows how long it takes to compute the Jacobian. >>>> Attached are the log outputs I have for each of the six cases. >>>> >>>> Turns out that only the first call to the SNESComputeJacobian() is >>>> slow, all the subsequent calls are fast as I expect. This makes me think it >>>> still has something to do with matrix allocation. >>>> >>> >>> I think it is a preallocation issue. >>> Looking to some of the output files (test_1000.out, test_100.out), under >>> Mat Object I see this in the KSPView >>> >>> total number of mallocs used during MatSetValues calls =10000 >>> >>> >>> >>> >>> >>>> >>>> Thanks for the help everyone, >>>> >>>> Justin >>>> >>>> On Wed, May 8, 2019 at 12:36 PM Matthew Knepley >>>> wrote: >>>> >>>>> On Wed, May 8, 2019 at 2:30 PM Justin Chang >>>>> wrote: >>>>> >>>>>> Hi everyone, >>>>>> >>>>>> Yes I have these lines in my code: >>>>>> >>>>>> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); >>>>>> ierr = >>>>>> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); >>>>>> >>>>> >>>>> Okay, its not allocation. So maybe Hong is right that its setting >>>>> great big element matrices. We will see with the example. 
>>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> I tried -info and here's my output: >>>>>> >>>>>> [0] PetscInitialize(): PETSc successfully started: number of >>>>>> processors = 1 >>>>>> [0] PetscInitialize(): Running on machine: jchang31606s.domain >>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 >>>>>> 140550815662944 max tags = 2147483647 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436504608 140550815662944 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436504608 140550815662944 >>>>>> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, >>>>>> numdl = 5000, numlbr = 109999, numtbr = 5000 >>>>>> >>>>>> **** Power flow dist case **** >>>>>> >>>>>> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, >>>>>> ndelta = 5000, nbranch = 114999 >>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 >>>>>> 140550815683104 max tags = 2147483647 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage >>>>>> space: 0 unneeded,10799928 used >>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() >>>>>> is 0 >>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 >>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >>>>>> 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. >>>>>> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: >>>>>> 5. Using Inode routines >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436504608 140550815662944 >>>>>> [0] DMGetDMSNES(): Creating new DMSNES >>>>>> [0] DMGetDMKSP(): Creating new DMKSP >>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>> 4436505120 140550815683104 >>>>>> 0 SNES Function norm 1155.45 >>>>>> >>>>>> nothing else -info related shows up as I'm iterating through the >>>>>> vertex loop. >>>>>> >>>>>> I'll have a MWE for you guys to play with shortly. >>>>>> >>>>>> Thanks, >>>>>> Justin >>>>>> >>>>>> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. 
>>>>>> wrote: >>>>>> >>>>>>> >>>>>>> Justin, >>>>>>> >>>>>>> Are you providing matrix entries that connect directly one >>>>>>> vertex to another vertex ACROSS an edge? I don't think that is supported by >>>>>>> the DMNetwork model. The assumption is that edges are only connected to >>>>>>> vertices and vertices are only connected to neighboring edges. >>>>>>> >>>>>>> Everyone, >>>>>>> >>>>>>> I second Matt's reply. >>>>>>> >>>>>>> How is the DMNetwork preallocating for the Jacobian? Does it take >>>>>>> into account coupling between neighboring vertices/edges? Or does it assume >>>>>>> no coupling. Or assume full coupling. If it assumes no coupling and the >>>>>>> user has a good amount of coupling it will be very slow. >>>>>>> >>>>>>> There would need to be a way for the user provide the coupling >>>>>>> information between neighboring vertices/edges if it assumes no coupling. >>>>>>> >>>>>>> Barry >>>>>>> >>>>>>> >>>>>>> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users < >>>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>>> > >>>>>>> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users < >>>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>>> > Hi guys, >>>>>>> > >>>>>>> > I have a fully working distribution system solver written using >>>>>>> DMNetwork, The idea is that each electrical bus can have up to three phase >>>>>>> nodes, and each phase node has two unknowns: voltage magnitude and angle. >>>>>>> In a completely balanced system, each bus has three nodes, but in an >>>>>>> unbalanced system some of the buses can be either single phase or two-phase. >>>>>>> > >>>>>>> > The working DMNetwork code I developed, loosely based on the SNES >>>>>>> network/power.c, essentially represents each vertex as a bus. >>>>>>> DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to >>>>>>> each vertex. If every single bus had the same number of variables, the mat >>>>>>> block size = 2, 4, or 6, and my code is both fast and scalable. However, if >>>>>>> the unknowns per DMNetwork vertex unknowns are not the same across, then my >>>>>>> SNESFormJacobian function becomes extremely extremely slow. Specifically, >>>>>>> the MatSetValues() calls when the col/row global indices contain an offset >>>>>>> value that points to a neighboring bus vertex. >>>>>>> > >>>>>>> > I have never seen MatSetValues() be slow unless it is allocating. >>>>>>> Did you confirm that you are not allocating, with -info? >>>>>>> > >>>>>>> > Thanks, >>>>>>> > >>>>>>> > MAtt >>>>>>> > >>>>>>> > Why is that? Is it because I no longer have a uniform block >>>>>>> structure and lose the speed/optimization benefits of iterating through an >>>>>>> AIJ matrix? I see three potential workarounds: >>>>>>> > >>>>>>> > 1) Treat every vertex as a three phase bus and "zero out" all the >>>>>>> unused phase node dofs and put a 1 in the diagonal. The problem I see with >>>>>>> this is that I will have unnecessary degrees of freedom (aka non-zeros in >>>>>>> the matrix). From the distribution systems I've seen, it's possible that >>>>>>> anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I >>>>>>> may have nearly twice the amount of dofs than necessary if I wanted to >>>>>>> preserve the block size = 6 for the AU mat. >>>>>>> > >>>>>>> > 2) Treat every phase node as a vertex aka solve a single-phase >>>>>>> power flow solver. That way I guarantee to have a block size = 2, this is >>>>>>> what Domenico's former student did in his thesis work. 
The problem I see >>>>>>> with this is that I have a larger graph, which can take more time to setup >>>>>>> and parallelize. >>>>>>> > >>>>>>> > 3) Create a "fieldsplit" where I essentially have three "blocks" - >>>>>>> one for buses with all three phases, another for buses with only two >>>>>>> phases, one for single-phase buses. This way each block/fieldsplit will >>>>>>> have a consistent block size. I am not sure if this will solve the >>>>>>> MatSetValues() issues, but it's, but can anyone give pointers on how to go >>>>>>> about achieving this? >>>>>>> > >>>>>>> > Thanks, >>>>>>> > Justin >>>>>>> > >>>>>>> > >>>>>>> > -- >>>>>>> > What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> > -- Norbert Wiener >>>>>>> > >>>>>>> > https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Wed May 8 20:00:48 2019 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Thu, 9 May 2019 01:00:48 +0000 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: Justin: Great, the issue is resolved. Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not raise an error? Matt, We usually prevent this with a structured SetValues API. For example, DMDA uses MatSetValuesStencil() which cannot write outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one for DMNetwork. The allocation is just like Plex (I believe) where you allocate closure(star(p)), which would mean that setting values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices. Is that right for DMNetwork? Yes, DMNetwork behaves in this fashion. I cannot find MatSetValuesClosure() in petsc-master. Can you provide detailed instruction on how to implement MatSetValuesClosure() for DMNetwork? Note, dmnetwork is a subclass of DMPlex. Hong On Wed, May 8, 2019 at 4:00 PM Dave May > wrote: On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users > wrote: So here's the branch/repo to the working example I have: https://github.com/jychang48/petsc-dss/tree/single-bus-vertex Type 'make' to compile the dss, it should work with the latest petsc-dev To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes where N = 100, 500, and 1000. Those test files are listed as: input/test_100.m input/test_500.m input/test_1000.m I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash ar the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). 
They are listed as: input/test2_100.m input/test2_500.m input/test2_1000.m The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m but they have a bs=6. To run these tests, type the following: ./dpflow -input input/test_100.m I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases. Turns out that only the first call to the SNESComputeJacobian() is slow, all the subsequent calls are fast as I expect. This makes me think it still has something to do with matrix allocation. I think it is a preallocation issue. Looking to some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView total number of mallocs used during MatSetValues calls =10000 Thanks for the help everyone, Justin On Wed, May 8, 2019 at 12:36 PM Matthew Knepley > wrote: On Wed, May 8, 2019 at 2:30 PM Justin Chang > wrote: Hi everyone, Yes I have these lines in my code: ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); Okay, its not allocation. So maybe Hong is right that its setting great big element matrices. We will see with the example. Thanks, Matt I tried -info and here's my output: [0] PetscInitialize(): PETSc successfully started: number of processors = 1 [0] PetscInitialize(): Running on machine: jchang31606s.domain [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000 **** Power flow dist case **** Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999 [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. 
Using Inode routines [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 [0] DMGetDMSNES(): Creating new DMSNES [0] DMGetDMKSP(): Creating new DMKSP [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 0 SNES Function norm 1155.45 nothing else -info related shows up as I'm iterating through the vertex loop. I'll have a MWE for you guys to play with shortly. Thanks, Justin On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. > wrote: Justin, Are you providing matrix entries that connect directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges. Everyone, I second Matt's reply. How is the DMNetwork preallocating for the Jacobian? Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling. Or assume full coupling. If it assumes no coupling and the user has a good amount of coupling it will be very slow. There would need to be a way for the user provide the coupling information between neighboring vertices/edges if it assumes no coupling. Barry > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users > wrote: > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users > wrote: > Hi guys, > > I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. > > The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. > > I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? > > Thanks, > > MAtt > > Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds: > > 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. > > 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. 
The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize. > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this? > > Thanks, > Justin > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 8 20:54:08 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 8 May 2019 21:54:08 -0400 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: On Wed, May 8, 2019 at 9:00 PM Zhang, Hong wrote: > Justin: > Great, the issue is resolved. > Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not > raise an error? > Because it has PETSC_FALSE. > Matt, > >> >>> We usually prevent this with a structured SetValues API. For example, >>> DMDA uses MatSetValuesStencil() which cannot write >>> outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is >>> guaranteed to be allocated. We should write one >>> for DMNetwork. The allocation is just like Plex (I believe) where you >>> allocate closure(star(p)), which would mean that setting >>> values for a vertex gets the neighboring edges and their vertices, and >>> setting values for an edge gets the covering vertices. >>> Is that right for DMNetwork? >>> >> Yes, DMNetwork behaves in this fashion. > I cannot find MatSetValuesClosure() in petsc-master. > I mean https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexMatSetClosure.html > Can you provide detailed instruction on how to implement > MatSetValuesClosure() for DMNetwork? > It will just work as is for edges, but not for vertices since you want to set the star, not the closure. You would just need to reverse exactly what is in that function. Thanks, Matt > Note, dmnetwork is a subclass of DMPlex. > > Hong > >> >>> >>> >>>> On Wed, May 8, 2019 at 4:00 PM Dave May >>>> wrote: >>>> >>>>> >>>>> >>>>> On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users < >>>>> petsc-users at mcs.anl.gov> wrote: >>>>> >>>>>> So here's the branch/repo to the working example I have: >>>>>> >>>>>> https://github.com/jychang48/petsc-dss/tree/single-bus-vertex >>>>>> >>>>>> Type 'make' to compile the dss, it should work with the latest >>>>>> petsc-dev >>>>>> >>>>>> To test the performance, I've taken an existing IEEE 13-bus and >>>>>> duplicated it N times to create a long radial-like network. I have three >>>>>> sizes where N = 100, 500, and 1000. 
Those test files are listed as: >>>>>> >>>>>> input/test_100.m >>>>>> input/test_500.m >>>>>> input/test_1000.m >>>>>> >>>>>> I also created another set of examples where the IEEE 13-bus is fully >>>>>> balanced (but the program will crash ar the solve step because I used some >>>>>> unrealistic parameters for the Y-bus matrices and probably have some zeros >>>>>> somewhere). They are listed as: >>>>>> >>>>>> input/test2_100.m >>>>>> input/test2_500.m >>>>>> input/test2_1000.m >>>>>> >>>>>> The dof count and matrices for the test2_*.m files are slightly >>>>>> larger than their respective test_*.m but they have a bs=6. >>>>>> >>>>>> To run these tests, type the following: >>>>>> >>>>>> ./dpflow -input input/test_100.m >>>>>> >>>>>> I have a timer that shows how long it takes to compute the Jacobian. >>>>>> Attached are the log outputs I have for each of the six cases. >>>>>> >>>>>> Turns out that only the first call to the SNESComputeJacobian() is >>>>>> slow, all the subsequent calls are fast as I expect. This makes me think it >>>>>> still has something to do with matrix allocation. >>>>>> >>>>> >>>>> I think it is a preallocation issue. >>>>> Looking to some of the output files (test_1000.out, test_100.out), >>>>> under Mat Object I see this in the KSPView >>>>> >>>>> total number of mallocs used during MatSetValues calls =10000 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> Thanks for the help everyone, >>>>>> >>>>>> Justin >>>>>> >>>>>> On Wed, May 8, 2019 at 12:36 PM Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Wed, May 8, 2019 at 2:30 PM Justin Chang >>>>>>> wrote: >>>>>>> >>>>>>>> Hi everyone, >>>>>>>> >>>>>>>> Yes I have these lines in my code: >>>>>>>> >>>>>>>> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); >>>>>>>> ierr = >>>>>>>> MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); >>>>>>>> >>>>>>> >>>>>>> Okay, its not allocation. So maybe Hong is right that its setting >>>>>>> great big element matrices. We will see with the example. 
>>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> I tried -info and here's my output: >>>>>>>> >>>>>>>> [0] PetscInitialize(): PETSc successfully started: number of >>>>>>>> processors = 1 >>>>>>>> [0] PetscInitialize(): Running on machine: jchang31606s.domain >>>>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 >>>>>>>> 140550815662944 max tags = 2147483647 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436504608 140550815662944 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436504608 140550815662944 >>>>>>>> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = >>>>>>>> 75000, numdl = 5000, numlbr = 109999, numtbr = 5000 >>>>>>>> >>>>>>>> **** Power flow dist case **** >>>>>>>> >>>>>>>> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, >>>>>>>> ndelta = 5000, nbranch = 114999 >>>>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 >>>>>>>> 140550815683104 max tags = 2147483647 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage >>>>>>>> space: 0 unneeded,10799928 used >>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during >>>>>>>> MatSetValues() is 0 >>>>>>>> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 >>>>>>>> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows >>>>>>>> 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. >>>>>>>> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit >>>>>>>> used: 5. Using Inode routines >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436504608 140550815662944 >>>>>>>> [0] DMGetDMSNES(): Creating new DMSNES >>>>>>>> [0] DMGetDMKSP(): Creating new DMKSP >>>>>>>> [0] PetscCommDuplicate(): Using internal PETSc communicator >>>>>>>> 4436505120 140550815683104 >>>>>>>> 0 SNES Function norm 1155.45 >>>>>>>> >>>>>>>> nothing else -info related shows up as I'm iterating through the >>>>>>>> vertex loop. >>>>>>>> >>>>>>>> I'll have a MWE for you guys to play with shortly. 
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> Justin >>>>>>>> >>>>>>>> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> Justin, >>>>>>>>> >>>>>>>>> Are you providing matrix entries that connect directly one >>>>>>>>> vertex to another vertex ACROSS an edge? I don't think that is supported by >>>>>>>>> the DMNetwork model. The assumption is that edges are only connected to >>>>>>>>> vertices and vertices are only connected to neighboring edges. >>>>>>>>> >>>>>>>>> Everyone, >>>>>>>>> >>>>>>>>> I second Matt's reply. >>>>>>>>> >>>>>>>>> How is the DMNetwork preallocating for the Jacobian? Does it >>>>>>>>> take into account coupling between neighboring vertices/edges? Or does it >>>>>>>>> assume no coupling. Or assume full coupling. If it assumes no coupling and >>>>>>>>> the user has a good amount of coupling it will be very slow. >>>>>>>>> >>>>>>>>> There would need to be a way for the user provide the coupling >>>>>>>>> information between neighboring vertices/edges if it assumes no coupling. >>>>>>>>> >>>>>>>>> Barry >>>>>>>>> >>>>>>>>> >>>>>>>>> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users < >>>>>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>>>>> > >>>>>>>>> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users < >>>>>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>>>>> > Hi guys, >>>>>>>>> > >>>>>>>>> > I have a fully working distribution system solver written using >>>>>>>>> DMNetwork, The idea is that each electrical bus can have up to three phase >>>>>>>>> nodes, and each phase node has two unknowns: voltage magnitude and angle. >>>>>>>>> In a completely balanced system, each bus has three nodes, but in an >>>>>>>>> unbalanced system some of the buses can be either single phase or two-phase. >>>>>>>>> > >>>>>>>>> > The working DMNetwork code I developed, loosely based on the >>>>>>>>> SNES network/power.c, essentially represents each vertex as a bus. >>>>>>>>> DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to >>>>>>>>> each vertex. If every single bus had the same number of variables, the mat >>>>>>>>> block size = 2, 4, or 6, and my code is both fast and scalable. However, if >>>>>>>>> the unknowns per DMNetwork vertex unknowns are not the same across, then my >>>>>>>>> SNESFormJacobian function becomes extremely extremely slow. Specifically, >>>>>>>>> the MatSetValues() calls when the col/row global indices contain an offset >>>>>>>>> value that points to a neighboring bus vertex. >>>>>>>>> > >>>>>>>>> > I have never seen MatSetValues() be slow unless it is >>>>>>>>> allocating. Did you confirm that you are not allocating, with -info? >>>>>>>>> > >>>>>>>>> > Thanks, >>>>>>>>> > >>>>>>>>> > MAtt >>>>>>>>> > >>>>>>>>> > Why is that? Is it because I no longer have a uniform block >>>>>>>>> structure and lose the speed/optimization benefits of iterating through an >>>>>>>>> AIJ matrix? I see three potential workarounds: >>>>>>>>> > >>>>>>>>> > 1) Treat every vertex as a three phase bus and "zero out" all >>>>>>>>> the unused phase node dofs and put a 1 in the diagonal. The problem I see >>>>>>>>> with this is that I will have unnecessary degrees of freedom (aka non-zeros >>>>>>>>> in the matrix). From the distribution systems I've seen, it's possible >>>>>>>>> that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, >>>>>>>>> meaning I may have nearly twice the amount of dofs than necessary if I >>>>>>>>> wanted to preserve the block size = 6 for the AU mat. 
>>>>>>>>> > >>>>>>>>> > 2) Treat every phase node as a vertex aka solve a single-phase >>>>>>>>> power flow solver. That way I guarantee to have a block size = 2, this is >>>>>>>>> what Domenico's former student did in his thesis work. The problem I see >>>>>>>>> with this is that I have a larger graph, which can take more time to setup >>>>>>>>> and parallelize. >>>>>>>>> > >>>>>>>>> > 3) Create a "fieldsplit" where I essentially have three "blocks" >>>>>>>>> - one for buses with all three phases, another for buses with only two >>>>>>>>> phases, one for single-phase buses. This way each block/fieldsplit will >>>>>>>>> have a consistent block size. I am not sure if this will solve the >>>>>>>>> MatSetValues() issues, but it's, but can anyone give pointers on how to go >>>>>>>>> about achieving this? >>>>>>>>> > >>>>>>>>> > Thanks, >>>>>>>>> > Justin >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > -- >>>>>>>>> > What most experimenters take for granted before they begin their >>>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>>> experiments lead. >>>>>>>>> > -- Norbert Wiener >>>>>>>>> > >>>>>>>>> > https://www.cse.buffalo.edu/~knepley/ >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shrirang.abhyankar at pnnl.gov Wed May 8 21:17:00 2019 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Thu, 9 May 2019 02:17:00 +0000 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: From: petsc-users on behalf of "Zhang, Hong via petsc-users" Reply-To: "Zhang, Hong" Date: Wednesday, May 8, 2019 at 8:01 PM To: Justin Chang Cc: petsc-users Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly Justin: Great, the issue is resolved. Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not raise an error? I copy-pasted the above line from the power.c example. MatSetOption should use PETSC_TRUE to activate the MAT_NEW_NONZERO_ALLOCATION_ERR option. Matt, We usually prevent this with a structured SetValues API. For example, DMDA uses MatSetValuesStencil() which cannot write outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one for DMNetwork. The allocation is just like Plex (I believe) where you allocate closure(star(p)), which would mean that setting values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices. Is that right for DMNetwork? Yes, DMNetwork behaves in this fashion. I cannot find MatSetValuesClosure() in petsc-master. 
Can you provide detailed instruction on how to implement MatSetValuesClosure() for DMNetwork? Note, dmnetwork is a subclass of DMPlex. DMNetwork does not do any matrix creation by itself. It calls Plex to create the matrix. Hong On Wed, May 8, 2019 at 4:00 PM Dave May > wrote: On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users > wrote: So here's the branch/repo to the working example I have: https://github.com/jychang48/petsc-dss/tree/single-bus-vertex Type 'make' to compile the dss, it should work with the latest petsc-dev To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes where N = 100, 500, and 1000. Those test files are listed as: input/test_100.m input/test_500.m input/test_1000.m I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash ar the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as: input/test2_100.m input/test2_500.m input/test2_1000.m The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m but they have a bs=6. To run these tests, type the following: ./dpflow -input input/test_100.m I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases. Turns out that only the first call to the SNESComputeJacobian() is slow, all the subsequent calls are fast as I expect. This makes me think it still has something to do with matrix allocation. I think it is a preallocation issue. Looking to some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView total number of mallocs used during MatSetValues calls =10000 Thanks for the help everyone, Justin On Wed, May 8, 2019 at 12:36 PM Matthew Knepley > wrote: On Wed, May 8, 2019 at 2:30 PM Justin Chang > wrote: Hi everyone, Yes I have these lines in my code: ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); Okay, its not allocation. So maybe Hong is right that its setting great big element matrices. We will see with the example. 
Thanks, Matt I tried -info and here's my output: [0] PetscInitialize(): PETSc successfully started: number of processors = 1 [0] PetscInitialize(): Running on machine: jchang31606s.domain [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000 **** Power flow dist case **** Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999 [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 [0] DMGetDMSNES(): Creating new DMSNES [0] DMGetDMKSP(): Creating new DMKSP [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 0 SNES Function norm 1155.45 nothing else -info related shows up as I'm iterating through the vertex loop. I'll have a MWE for you guys to play with shortly. Thanks, Justin On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. > wrote: Justin, Are you providing matrix entries that connect directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges. Everyone, I second Matt's reply. How is the DMNetwork preallocating for the Jacobian? Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling. Or assume full coupling. If it assumes no coupling and the user has a good amount of coupling it will be very slow. 
There would need to be a way for the user provide the coupling information between neighboring vertices/edges if it assumes no coupling. Barry > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users > wrote: > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users > wrote: > Hi guys, > > I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. > > The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. > > I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? > > Thanks, > > MAtt > > Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds: > > 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. > > 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize. > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this? > > Thanks, > Justin > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
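A minimal sketch of the option flip discussed above, i.e. passing PETSC_TRUE rather than the PETSC_FALSE copied from power.c, so that writing outside the preallocated pattern fails loudly instead of silently malloc'ing. Here networkdm, snes, FormJacobian and user are placeholders standing in for the application's own objects, not code taken from this thread:

Mat            J;
PetscErrorCode ierr;

ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
/* With PETSC_TRUE, any MatSetValues() entry that falls outside the
   preallocated nonzero pattern raises an error immediately instead of
   triggering a malloc and slowing down the first Jacobian assembly. */
ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE);CHKERRQ(ierr);
ierr = SNESSetJacobian(snes,J,J,FormJacobian,&user);CHKERRQ(ierr);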
URL: From knepley at gmail.com Wed May 8 21:28:12 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 8 May 2019 22:28:12 -0400 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: On Wed, May 8, 2019 at 10:17 PM Abhyankar, Shrirang G via petsc-users < petsc-users at mcs.anl.gov> wrote: > > > > > *From: *petsc-users on behalf of > "Zhang, Hong via petsc-users" > *Reply-To: *"Zhang, Hong" > *Date: *Wednesday, May 8, 2019 at 8:01 PM > *To: *Justin Chang > *Cc: *petsc-users > *Subject: *Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly > > > > Justin: > > Great, the issue is resolved. > > Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not > raise an error? > > > > I copy-pasted the above line from the power.c example. MatSetOption should > use PETSC_TRUE to activate the MAT_NEW_NONZERO_ALLOCATION_ERR option. > > > > Matt, > > > > We usually prevent this with a structured SetValues API. For example, DMDA > uses MatSetValuesStencil() which cannot write > > outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is > guaranteed to be allocated. We should write one > > for DMNetwork. The allocation is just like Plex (I believe) where you > allocate closure(star(p)), which would mean that setting > > values for a vertex gets the neighboring edges and their vertices, and > setting values for an edge gets the covering vertices. > > Is that right for DMNetwork? > > Yes, DMNetwork behaves in this fashion. > > I cannot find MatSetValuesClosure() in petsc-master. > > Can you provide detailed instruction on how to implement > MatSetValuesClosure() for DMNetwork? > > Note, dmnetwork is a subclass of DMPlex. > > > > DMNetwork does not do any matrix creation by itself. It calls Plex to > create the matrix. > Right. However, the only operation I put in was MatSetClosure() since that is appropriate for FEM. I think you would need a MatSetStar() for DMNetwork as well. Matt > Hong > > > > > > On Wed, May 8, 2019 at 4:00 PM Dave May wrote: > > > > > > On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > So here's the branch/repo to the working example I have: > > > > https://github.com/jychang48/petsc-dss/tree/single-bus-vertex > > > > Type 'make' to compile the dss, it should work with the latest petsc-dev > > > > To test the performance, I've taken an existing IEEE 13-bus and duplicated > it N times to create a long radial-like network. I have three sizes where N > = 100, 500, and 1000. Those test files are listed as: > > > > input/test_100.m > > input/test_500.m > > input/test_1000.m > > > > I also created another set of examples where the IEEE 13-bus is fully > balanced (but the program will crash ar the solve step because I used some > unrealistic parameters for the Y-bus matrices and probably have some zeros > somewhere). They are listed as: > > > > input/test2_100.m > > input/test2_500.m > > input/test2_1000.m > > > > The dof count and matrices for the test2_*.m files are slightly larger > than their respective test_*.m but they have a bs=6. > > > > To run these tests, type the following: > > > > ./dpflow -input input/test_100.m > > > > I have a timer that shows how long it takes to compute the Jacobian. > Attached are the log outputs I have for each of the six cases. > > > > Turns out that only the first call to the SNESComputeJacobian() is slow, > all the subsequent calls are fast as I expect. 
This makes me think it still > has something to do with matrix allocation. > > > > I think it is a preallocation issue. > > Looking to some of the output files (test_1000.out, test_100.out), under > Mat Object I see this in the KSPView > > > > total number of mallocs used during MatSetValues calls =10000 > > > > > > > > > > > > Thanks for the help everyone, > > > > Justin > > > > On Wed, May 8, 2019 at 12:36 PM Matthew Knepley wrote: > > On Wed, May 8, 2019 at 2:30 PM Justin Chang wrote: > > Hi everyone, > > > > Yes I have these lines in my code: > > ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); > ierr = > MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); > > > > Okay, its not allocation. So maybe Hong is right that its setting great > big element matrices. We will see with the example. > > > > Thanks, > > > > Matt > > > > I tried -info and here's my output: > > > > [0] PetscInitialize(): PETSc successfully started: number of processors = 1 > > [0] PetscInitialize(): Running on machine: jchang31606s.domain > > [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 > 140550815662944 max tags = 2147483647 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 > 140550815662944 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 > 140550815662944 > > Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, > numdl = 5000, numlbr = 109999, numtbr = 5000 > > > > **** Power flow dist case **** > > > > Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = > 5000, nbranch = 114999 > > [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 > 140550815683104 max tags = 2147483647 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: > 0 unneeded,10799928 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. > > [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. 
> Using Inode routines > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 > 140550815662944 > > [0] DMGetDMSNES(): Creating new DMSNES > > [0] DMGetDMKSP(): Creating new DMKSP > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > 0 SNES Function norm 1155.45 > > > > nothing else -info related shows up as I'm iterating through the vertex > loop. > > > > I'll have a MWE for you guys to play with shortly. > > > > Thanks, > > Justin > > > > On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. > wrote: > > > Justin, > > Are you providing matrix entries that connect directly one vertex to > another vertex ACROSS an edge? I don't think that is supported by the > DMNetwork model. The assumption is that edges are only connected to > vertices and vertices are only connected to neighboring edges. > > Everyone, > > I second Matt's reply. > > How is the DMNetwork preallocating for the Jacobian? Does it take into > account coupling between neighboring vertices/edges? Or does it assume no > coupling. Or assume full coupling. If it assumes no coupling and the user > has a good amount of coupling it will be very slow. > > There would need to be a way for the user provide the coupling > information between neighboring vertices/edges if it assumes no coupling. > > Barry > > > > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi guys, > > > > I have a fully working distribution system solver written using > DMNetwork, The idea is that each electrical bus can have up to three phase > nodes, and each phase node has two unknowns: voltage magnitude and angle. > In a completely balanced system, each bus has three nodes, but in an > unbalanced system some of the buses can be either single phase or two-phase. > > > > The working DMNetwork code I developed, loosely based on the SNES > network/power.c, essentially represents each vertex as a bus. > DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to > each vertex. If every single bus had the same number of variables, the mat > block size = 2, 4, or 6, and my code is both fast and scalable. However, if > the unknowns per DMNetwork vertex unknowns are not the same across, then my > SNESFormJacobian function becomes extremely extremely slow. Specifically, > the MatSetValues() calls when the col/row global indices contain an offset > value that points to a neighboring bus vertex. > > > > I have never seen MatSetValues() be slow unless it is allocating. Did > you confirm that you are not allocating, with -info? > > > > Thanks, > > > > MAtt > > > > Why is that? Is it because I no longer have a uniform block structure > and lose the speed/optimization benefits of iterating through an AIJ > matrix? I see three potential workarounds: > > > > 1) Treat every vertex as a three phase bus and "zero out" all the unused > phase node dofs and put a 1 in the diagonal. The problem I see with this is > that I will have unnecessary degrees of freedom (aka non-zeros in the > matrix). From the distribution systems I've seen, it's possible that > anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I > may have nearly twice the amount of dofs than necessary if I wanted to > preserve the block size = 6 for the AU mat. 
> > > > 2) Treat every phase node as a vertex aka solve a single-phase power > flow solver. That way I guarantee to have a block size = 2, this is what > Domenico's former student did in his thesis work. The problem I see with > this is that I have a larger graph, which can take more time to setup and > parallelize. > > > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one > for buses with all three phases, another for buses with only two phases, > one for single-phase buses. This way each block/fieldsplit will have a > consistent block size. I am not sure if this will solve the MatSetValues() > issues, but it's, but can anyone give pointers on how to go about achieving > this? > > > > Thanks, > > Justin > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shrirang.abhyankar at pnnl.gov Wed May 8 21:53:12 2019 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Thu, 9 May 2019 02:53:12 +0000 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: <6F20629B-EFF2-4C0F-9385-15B1F2FE9073@pnnl.gov> From: Matthew Knepley Date: Wednesday, May 8, 2019 at 9:29 PM To: "Abhyankar, Shrirang G" Cc: "Zhang, Hong" , Justin Chang , petsc-users Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly On Wed, May 8, 2019 at 10:17 PM Abhyankar, Shrirang G via petsc-users > wrote: From: petsc-users > on behalf of "Zhang, Hong via petsc-users" > Reply-To: "Zhang, Hong" > Date: Wednesday, May 8, 2019 at 8:01 PM To: Justin Chang > Cc: petsc-users > Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly Justin: Great, the issue is resolved. Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not raise an error? I copy-pasted the above line from the power.c example. MatSetOption should use PETSC_TRUE to activate the MAT_NEW_NONZERO_ALLOCATION_ERR option. Matt, We usually prevent this with a structured SetValues API. For example, DMDA uses MatSetValuesStencil() which cannot write outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one for DMNetwork. The allocation is just like Plex (I believe) where you allocate closure(star(p)), which would mean that setting values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices. Is that right for DMNetwork? Yes, DMNetwork behaves in this fashion. I cannot find MatSetValuesClosure() in petsc-master. 
Can you provide detailed instruction on how to implement MatSetValuesClosure() for DMNetwork? Note, dmnetwork is a subclass of DMPlex. DMNetwork does not do any matrix creation by itself. It calls Plex to create the matrix. Right. However, the only operation I put in was MatSetClosure() since that is appropriate for FEM. I think you would need a MatSetStar() for DMNetwork as well. I see that Hong has implemented has some additional code in DMCreateMatrix_Network that does not use Plex for creating the matrix. I think it covers what you describe above. Hong: DMNetwork matrix creation code is not used unless the user wants to set special sparsity pattern for the blocks. Shouldn?t this code be used by default instead of having Plex create the matrix? Shri Matt Hong On Wed, May 8, 2019 at 4:00 PM Dave May > wrote: On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users > wrote: So here's the branch/repo to the working example I have: https://github.com/jychang48/petsc-dss/tree/single-bus-vertex Type 'make' to compile the dss, it should work with the latest petsc-dev To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes where N = 100, 500, and 1000. Those test files are listed as: input/test_100.m input/test_500.m input/test_1000.m I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash ar the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as: input/test2_100.m input/test2_500.m input/test2_1000.m The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m but they have a bs=6. To run these tests, type the following: ./dpflow -input input/test_100.m I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases. Turns out that only the first call to the SNESComputeJacobian() is slow, all the subsequent calls are fast as I expect. This makes me think it still has something to do with matrix allocation. I think it is a preallocation issue. Looking to some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView total number of mallocs used during MatSetValues calls =10000 Thanks for the help everyone, Justin On Wed, May 8, 2019 at 12:36 PM Matthew Knepley > wrote: On Wed, May 8, 2019 at 2:30 PM Justin Chang > wrote: Hi everyone, Yes I have these lines in my code: ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); Okay, its not allocation. So maybe Hong is right that its setting great big element matrices. We will see with the example. 
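One way to confirm whether assembly is allocating, besides running with -info, is to query the malloc count directly after MatAssemblyEnd(). A minimal sketch (assuming J is the assembled Jacobian from the snippet above, and ierr the usual error code):

    MatInfo info;
    ierr = MatGetInfo(J,MAT_LOCAL,&info);CHKERRQ(ierr);
    /* info.mallocs counts mallocs triggered during MatSetValues(); it should be 0 when preallocation is correct */
    ierr = PetscPrintf(PETSC_COMM_SELF,"mallocs during MatSetValues(): %g\n",info.mallocs);CHKERRQ(ierr);

A nonzero count here points to a preallocation problem rather than to the cost of the individual MatSetValues() calls themselves.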
Thanks, Matt I tried -info and here's my output: [0] PetscInitialize(): PETSc successfully started: number of processors = 1 [0] PetscInitialize(): Running on machine: jchang31606s.domain [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000 **** Power flow dist case **** Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999 [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 [0] DMGetDMSNES(): Creating new DMSNES [0] DMGetDMKSP(): Creating new DMKSP [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 0 SNES Function norm 1155.45 nothing else -info related shows up as I'm iterating through the vertex loop. I'll have a MWE for you guys to play with shortly. Thanks, Justin On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. > wrote: Justin, Are you providing matrix entries that connect directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges. Everyone, I second Matt's reply. How is the DMNetwork preallocating for the Jacobian? Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling. Or assume full coupling. If it assumes no coupling and the user has a good amount of coupling it will be very slow. 
There would need to be a way for the user provide the coupling information between neighboring vertices/edges if it assumes no coupling. Barry > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users > wrote: > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users > wrote: > Hi guys, > > I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. > > The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. > > I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? > > Thanks, > > MAtt > > Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds: > > 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. > > 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize. > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this? > > Thanks, > Justin > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
-- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 8 21:58:49 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 8 May 2019 22:58:49 -0400 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: <6F20629B-EFF2-4C0F-9385-15B1F2FE9073@pnnl.gov> References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> <6F20629B-EFF2-4C0F-9385-15B1F2FE9073@pnnl.gov> Message-ID: On Wed, May 8, 2019 at 10:53 PM Abhyankar, Shrirang G < shrirang.abhyankar at pnnl.gov> wrote: > > > > > *From: *Matthew Knepley > *Date: *Wednesday, May 8, 2019 at 9:29 PM > *To: *"Abhyankar, Shrirang G" > *Cc: *"Zhang, Hong" , Justin Chang < > jychang48 at gmail.com>, petsc-users > *Subject: *Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly > > > > On Wed, May 8, 2019 at 10:17 PM Abhyankar, Shrirang G via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > > > *From: *petsc-users on behalf of > "Zhang, Hong via petsc-users" > *Reply-To: *"Zhang, Hong" > *Date: *Wednesday, May 8, 2019 at 8:01 PM > *To: *Justin Chang > *Cc: *petsc-users > *Subject: *Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly > > > > Justin: > > Great, the issue is resolved. > > Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not > raise an error? > > > > I copy-pasted the above line from the power.c example. MatSetOption should > use PETSC_TRUE to activate the MAT_NEW_NONZERO_ALLOCATION_ERR option. > > > > Matt, > > > > We usually prevent this with a structured SetValues API. For example, DMDA > uses MatSetValuesStencil() which cannot write > > outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is > guaranteed to be allocated. We should write one > > for DMNetwork. The allocation is just like Plex (I believe) where you > allocate closure(star(p)), which would mean that setting > > values for a vertex gets the neighboring edges and their vertices, and > setting values for an edge gets the covering vertices. > > Is that right for DMNetwork? > > Yes, DMNetwork behaves in this fashion. > > I cannot find MatSetValuesClosure() in petsc-master. > > Can you provide detailed instruction on how to implement > MatSetValuesClosure() for DMNetwork? > > Note, dmnetwork is a subclass of DMPlex. > > > > DMNetwork does not do any matrix creation by itself. It calls Plex to > create the matrix. > > > > Right. However, the only operation I put in was MatSetClosure() since that > is appropriate for FEM. I think you would need > > a MatSetStar() for DMNetwork as well. > > > > I see that Hong has implemented has some additional code in > DMCreateMatrix_Network that does not use Plex for creating the matrix. I > think it covers what you describe above. > No. That is not what I am saying. It has nothing to do with matrix creation, It is about setting values in the matrix. Matt > Hong: DMNetwork matrix creation code is not used unless the user wants to > set special sparsity pattern for the blocks. Shouldn?t this code be used by > default instead of having Plex create the matrix? 
> > > > Shri > > > > Matt > > > > Hong > > > > > > On Wed, May 8, 2019 at 4:00 PM Dave May wrote: > > > > > > On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > So here's the branch/repo to the working example I have: > > > > https://github.com/jychang48/petsc-dss/tree/single-bus-vertex > > > > Type 'make' to compile the dss, it should work with the latest petsc-dev > > > > To test the performance, I've taken an existing IEEE 13-bus and duplicated > it N times to create a long radial-like network. I have three sizes where N > = 100, 500, and 1000. Those test files are listed as: > > > > input/test_100.m > > input/test_500.m > > input/test_1000.m > > > > I also created another set of examples where the IEEE 13-bus is fully > balanced (but the program will crash ar the solve step because I used some > unrealistic parameters for the Y-bus matrices and probably have some zeros > somewhere). They are listed as: > > > > input/test2_100.m > > input/test2_500.m > > input/test2_1000.m > > > > The dof count and matrices for the test2_*.m files are slightly larger > than their respective test_*.m but they have a bs=6. > > > > To run these tests, type the following: > > > > ./dpflow -input input/test_100.m > > > > I have a timer that shows how long it takes to compute the Jacobian. > Attached are the log outputs I have for each of the six cases. > > > > Turns out that only the first call to the SNESComputeJacobian() is slow, > all the subsequent calls are fast as I expect. This makes me think it still > has something to do with matrix allocation. > > > > I think it is a preallocation issue. > > Looking to some of the output files (test_1000.out, test_100.out), under > Mat Object I see this in the KSPView > > > > total number of mallocs used during MatSetValues calls =10000 > > > > > > > > > > > > Thanks for the help everyone, > > > > Justin > > > > On Wed, May 8, 2019 at 12:36 PM Matthew Knepley wrote: > > On Wed, May 8, 2019 at 2:30 PM Justin Chang wrote: > > Hi everyone, > > > > Yes I have these lines in my code: > > ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); > ierr = > MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); > > > > Okay, its not allocation. So maybe Hong is right that its setting great > big element matrices. We will see with the example. 
> > > > Thanks, > > > > Matt > > > > I tried -info and here's my output: > > > > [0] PetscInitialize(): PETSc successfully started: number of processors = 1 > > [0] PetscInitialize(): Running on machine: jchang31606s.domain > > [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 > 140550815662944 max tags = 2147483647 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 > 140550815662944 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 > 140550815662944 > > Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, > numdl = 5000, numlbr = 109999, numtbr = 5000 > > > > **** Power flow dist case **** > > > > Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = > 5000, nbranch = 114999 > > [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 > 140550815683104 max tags = 2147483647 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: > 0 unneeded,10799928 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. > > [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. > Using Inode routines > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 > 140550815662944 > > [0] DMGetDMSNES(): Creating new DMSNES > > [0] DMGetDMKSP(): Creating new DMKSP > > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 > 140550815683104 > > 0 SNES Function norm 1155.45 > > > > nothing else -info related shows up as I'm iterating through the vertex > loop. > > > > I'll have a MWE for you guys to play with shortly. > > > > Thanks, > > Justin > > > > On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. > wrote: > > > Justin, > > Are you providing matrix entries that connect directly one vertex to > another vertex ACROSS an edge? I don't think that is supported by the > DMNetwork model. The assumption is that edges are only connected to > vertices and vertices are only connected to neighboring edges. > > Everyone, > > I second Matt's reply. > > How is the DMNetwork preallocating for the Jacobian? 
Does it take into > account coupling between neighboring vertices/edges? Or does it assume no > coupling. Or assume full coupling. If it assumes no coupling and the user > has a good amount of coupling it will be very slow. > > There would need to be a way for the user provide the coupling > information between neighboring vertices/edges if it assumes no coupling. > > Barry > > > > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi guys, > > > > I have a fully working distribution system solver written using > DMNetwork, The idea is that each electrical bus can have up to three phase > nodes, and each phase node has two unknowns: voltage magnitude and angle. > In a completely balanced system, each bus has three nodes, but in an > unbalanced system some of the buses can be either single phase or two-phase. > > > > The working DMNetwork code I developed, loosely based on the SNES > network/power.c, essentially represents each vertex as a bus. > DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to > each vertex. If every single bus had the same number of variables, the mat > block size = 2, 4, or 6, and my code is both fast and scalable. However, if > the unknowns per DMNetwork vertex unknowns are not the same across, then my > SNESFormJacobian function becomes extremely extremely slow. Specifically, > the MatSetValues() calls when the col/row global indices contain an offset > value that points to a neighboring bus vertex. > > > > I have never seen MatSetValues() be slow unless it is allocating. Did > you confirm that you are not allocating, with -info? > > > > Thanks, > > > > MAtt > > > > Why is that? Is it because I no longer have a uniform block structure > and lose the speed/optimization benefits of iterating through an AIJ > matrix? I see three potential workarounds: > > > > 1) Treat every vertex as a three phase bus and "zero out" all the unused > phase node dofs and put a 1 in the diagonal. The problem I see with this is > that I will have unnecessary degrees of freedom (aka non-zeros in the > matrix). From the distribution systems I've seen, it's possible that > anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I > may have nearly twice the amount of dofs than necessary if I wanted to > preserve the block size = 6 for the AU mat. > > > > 2) Treat every phase node as a vertex aka solve a single-phase power > flow solver. That way I guarantee to have a block size = 2, this is what > Domenico's former student did in his thesis work. The problem I see with > this is that I have a larger graph, which can take more time to setup and > parallelize. > > > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one > for buses with all three phases, another for buses with only two phases, > one for single-phase buses. This way each block/fieldsplit will have a > consistent block size. I am not sure if this will solve the MatSetValues() > issues, but it's, but can anyone give pointers on how to go about achieving > this? > > > > Thanks, > > Justin > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu May 9 00:19:36 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 9 May 2019 05:19:36 +0000 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> Message-ID: <102202D1-2FA3-4C52-ACCE-8AFD14AA4FF6@anl.gov> > On May 8, 2019, at 8:54 PM, Matthew Knepley via petsc-users wrote: > > On Wed, May 8, 2019 at 9:00 PM Zhang, Hong wrote: > Justin: > Great, the issue is resolved. > Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not raise an error? > > Because it has PETSC_FALSE. Why would it EVER be set to false? Does DMNetwork preallocate appropriately? If not, why? Barry > > Matt, > > We usually prevent this with a structured SetValues API. For example, DMDA uses MatSetValuesStencil() which cannot write > outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one > for DMNetwork. The allocation is just like Plex (I believe) where you allocate closure(star(p)), which would mean that setting > values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices. > Is that right for DMNetwork? > Yes, DMNetwork behaves in this fashion. > I cannot find MatSetValuesClosure() in petsc-master. > > I mean https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexMatSetClosure.html > > Can you provide detailed instruction on how to implement MatSetValuesClosure() for DMNetwork? > > It will just work as is for edges, but not for vertices since you want to set the star, not the closure. You would > just need to reverse exactly what is in that function. > > Thanks, > > Matt > > Note, dmnetwork is a subclass of DMPlex. > > Hong > > > On Wed, May 8, 2019 at 4:00 PM Dave May wrote: > > > On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users wrote: > So here's the branch/repo to the working example I have: > > https://github.com/jychang48/petsc-dss/tree/single-bus-vertex > > Type 'make' to compile the dss, it should work with the latest petsc-dev > > To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes where N = 100, 500, and 1000. 
Those test files are listed as: > > input/test_100.m > input/test_500.m > input/test_1000.m > > I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash ar the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as: > > input/test2_100.m > input/test2_500.m > input/test2_1000.m > > The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m but they have a bs=6. > > To run these tests, type the following: > > ./dpflow -input input/test_100.m > > I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases. > > Turns out that only the first call to the SNESComputeJacobian() is slow, all the subsequent calls are fast as I expect. This makes me think it still has something to do with matrix allocation. > > I think it is a preallocation issue. > Looking to some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView > > total number of mallocs used during MatSetValues calls =10000 > > > > > > Thanks for the help everyone, > > Justin > > On Wed, May 8, 2019 at 12:36 PM Matthew Knepley wrote: > On Wed, May 8, 2019 at 2:30 PM Justin Chang wrote: > Hi everyone, > > Yes I have these lines in my code: > > ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); > ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); > > Okay, its not allocation. So maybe Hong is right that its setting great big element matrices. We will see with the example. > > Thanks, > > Matt > > I tried -info and here's my output: > > [0] PetscInitialize(): PETSc successfully started: number of processors = 1 > [0] PetscInitialize(): Running on machine: jchang31606s.domain > [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 > Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000 > > **** Power flow dist case **** > > Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999 > [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 
140550815683104 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 > [0] DMGetDMSNES(): Creating new DMSNES > [0] DMGetDMKSP(): Creating new DMKSP > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > 0 SNES Function norm 1155.45 > > nothing else -info related shows up as I'm iterating through the vertex loop. > > I'll have a MWE for you guys to play with shortly. > > Thanks, > Justin > > On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. wrote: > > Justin, > > Are you providing matrix entries that connect directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges. > > Everyone, > > I second Matt's reply. > > How is the DMNetwork preallocating for the Jacobian? Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling. Or assume full coupling. If it assumes no coupling and the user has a good amount of coupling it will be very slow. > > There would need to be a way for the user provide the coupling information between neighboring vertices/edges if it assumes no coupling. > > Barry > > > > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users wrote: > > > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users wrote: > > Hi guys, > > > > I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. > > > > The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. > > > > I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? > > > > Thanks, > > > > MAtt > > > > Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds: > > > > 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. 
The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. > > > > 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize. > > > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this? > > > > Thanks, > > Justin > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From bsmith at mcs.anl.gov Thu May 9 00:23:15 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 9 May 2019 05:23:15 +0000 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: <6F20629B-EFF2-4C0F-9385-15B1F2FE9073@pnnl.gov> References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> <6F20629B-EFF2-4C0F-9385-15B1F2FE9073@pnnl.gov> Message-ID: <54D32962-6EA8-4103-A168-68E92219EED4@anl.gov> > On May 8, 2019, at 9:53 PM, Abhyankar, Shrirang G via petsc-users wrote: > > > > From: Matthew Knepley > Date: Wednesday, May 8, 2019 at 9:29 PM > To: "Abhyankar, Shrirang G" > Cc: "Zhang, Hong" , Justin Chang , petsc-users > Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly > > On Wed, May 8, 2019 at 10:17 PM Abhyankar, Shrirang G via petsc-users wrote: > > > From: petsc-users on behalf of "Zhang, Hong via petsc-users" > Reply-To: "Zhang, Hong" > Date: Wednesday, May 8, 2019 at 8:01 PM > To: Justin Chang > Cc: petsc-users > Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly > > Justin: > Great, the issue is resolved. > Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not raise an error? > > I copy-pasted the above line from the power.c example. MatSetOption should use PETSC_TRUE to activate the MAT_NEW_NONZERO_ALLOCATION_ERR option. > > Matt, > > We usually prevent this with a structured SetValues API. For example, DMDA uses MatSetValuesStencil() which cannot write > outside the stencil you set. 
DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one > for DMNetwork. The allocation is just like Plex (I believe) where you allocate closure(star(p)), which would mean that setting > values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices. > Is that right for DMNetwork? > Yes, DMNetwork behaves in this fashion. > I cannot find MatSetValuesClosure() in petsc-master. > Can you provide detailed instruction on how to implement MatSetValuesClosure() for DMNetwork? > Note, dmnetwork is a subclass of DMPlex. > > DMNetwork does not do any matrix creation by itself. It calls Plex to create the matrix. > > Right. However, the only operation I put in was MatSetClosure() since that is appropriate for FEM. I think you would need > a MatSetStar() for DMNetwork as well. > > I see that Hong has implemented has some additional code in DMCreateMatrix_Network that does not use Plex for creating the matrix. I think it covers what you describe above. > > Hong: DMNetwork matrix creation code is not used unless the user wants to set special sparsity pattern for the blocks. Shouldn?t this code What code? The special sparsity pattern for blocks? > be used by default instead of having Plex create the matrix? What if the user has dense blocks or don'ts know or care about the sparsity pattern? By default it should allocate something that every users code will fit in the matrix and then user callable options for making parts of the matrices sparse. Barry > > Shri > > Matt > > Hong > > > On Wed, May 8, 2019 at 4:00 PM Dave May wrote: > > > On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users wrote: > So here's the branch/repo to the working example I have: > > https://github.com/jychang48/petsc-dss/tree/single-bus-vertex > > Type 'make' to compile the dss, it should work with the latest petsc-dev > > To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes where N = 100, 500, and 1000. Those test files are listed as: > > input/test_100.m > input/test_500.m > input/test_1000.m > > I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash ar the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as: > > input/test2_100.m > input/test2_500.m > input/test2_1000.m > > The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m but they have a bs=6. > > To run these tests, type the following: > > ./dpflow -input input/test_100.m > > I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases. > > Turns out that only the first call to the SNESComputeJacobian() is slow, all the subsequent calls are fast as I expect. This makes me think it still has something to do with matrix allocation. > > I think it is a preallocation issue. 
> Looking to some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView > > total number of mallocs used during MatSetValues calls =10000 > > > > > > Thanks for the help everyone, > > Justin > > On Wed, May 8, 2019 at 12:36 PM Matthew Knepley wrote: > On Wed, May 8, 2019 at 2:30 PM Justin Chang wrote: > Hi everyone, > > Yes I have these lines in my code: > > ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); > ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); > > Okay, its not allocation. So maybe Hong is right that its setting great big element matrices. We will see with the example. > > Thanks, > > Matt > > I tried -info and here's my output: > > [0] PetscInitialize(): PETSc successfully started: number of processors = 1 > [0] PetscInitialize(): Running on machine: jchang31606s.domain > [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 > Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000 > > **** Power flow dist case **** > > Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999 > [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 > [0] DMGetDMSNES(): Creating new DMSNES > [0] DMGetDMKSP(): Creating new DMKSP > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > 0 SNES Function norm 1155.45 > > nothing else -info related shows up as I'm iterating through the vertex loop. 
> > I'll have a MWE for you guys to play with shortly. > > Thanks, > Justin > > On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. wrote: > > Justin, > > Are you providing matrix entries that connect directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges. > > Everyone, > > I second Matt's reply. > > How is the DMNetwork preallocating for the Jacobian? Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling. Or assume full coupling. If it assumes no coupling and the user has a good amount of coupling it will be very slow. > > There would need to be a way for the user provide the coupling information between neighboring vertices/edges if it assumes no coupling. > > Barry > > > > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users wrote: > > > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users wrote: > > Hi guys, > > > > I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. > > > > The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. > > > > I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? > > > > Thanks, > > > > MAtt > > > > Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds: > > > > 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. > > > > 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize. > > > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. 
I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this? > > > > Thanks, > > Justin > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From bsmith at mcs.anl.gov Thu May 9 00:53:46 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 9 May 2019 05:53:46 +0000 Subject: [petsc-users] Memory Corruption Error in MatPartitioningApply In-Reply-To: References: Message-ID: <9752861F-38A1-4527-B811-8AE51D188BAA@anl.gov> Did you ever make progress on this issue? > On Apr 22, 2019, at 8:47 AM, Smith, Barry F. wrote: > > > Are you able to run under valgrind? It is a bit better than the PETSc malloc to find each instance of memory corruption and the sooner you find it the easier it is to find the bug. https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > >> On Apr 22, 2019, at 7:31 AM, Eda Oktay via petsc-users wrote: >> >> Hello, >> >> I am trying to partition an odd-numbered sized (for example 4253*4253), square permutation matrix by using 2 processors with ParMETIS. The permutation matrix is obtained by permuting the matrix by an index set "is" (MatPermute(A,is,is,&PL)). I checked the index set, it gives a permutation and it is correct. >> >> When I look at the local size of the matrix, it is given by 2127 and 2127 on each processor, so in order the local sizes of matrix and index sets to be same, I defined the index sets' sizes as 2127 and 2127. >> >> When I do that, I get memory corruption error in MatPartiitioningApply function. The error is as follows: >> >> [0]PETSC ERROR: PetscMallocValidate: error detected at MatPartitioningApply_Parmetis_Private() line 141 in /home/edaoktay/petsc-3.10.3/src/mat/partition/impls/pmetis/pmetis.c >> [0]PETSC ERROR: Memory [id=0(8512)] at address 0x19e6870 is corrupted (probably write past end of array) >> [0]PETSC ERROR: Memory originally allocated in main() line 310 in /home/edaoktay/petsc-3.10.3/arch-linux2-c-debug/share/slepc/examples/src/eda/TEK_SAYI_SON_YENI_DENEME_TEMIZ_ENYENI_FINAL.c >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Memory corruption: http://www.mcs.anl.gov/petsc/documentation/installation.html#valgrind >> [0]PETSC ERROR: >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>> [0]PETSC ERROR: Petsc Release Version 3.10.3, Dec, 18, 2018 >> [0]PETSC ERROR: ./TEK_SAYI_SON_YENI_DENEME_TEMIZ_ENYENI_FINAL on a arch-linux2-c-debug named 13ed.wls.metu.edu.tr by edaoktay Mon Apr 22 14:58:52 2019 >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --with-cxx-dialect=C++11 --download-openblas --download-metis --download-parmetis --download-superlu_dist --download-slepc --download-mpich >> [0]PETSC ERROR: #1 PetscMallocValidate() line 146 in /home/edaoktay/petsc-3.10.3/src/sys/memory/mtr.c >> [0]PETSC ERROR: #2 MatPartitioningApply_Parmetis_Private() line 141 in /home/edaoktay/petsc-3.10.3/src/mat/partition/impls/pmetis/pmetis.c >> [0]PETSC ERROR: #3 MatPartitioningApply_Parmetis() line 215 in /home/edaoktay/petsc-3.10.3/src/mat/partition/impls/pmetis/pmetis.c >> [0]PETSC ERROR: #4 MatPartitioningApply() line 340 in /home/edaoktay/petsc-3.10.3/src/mat/partition/partition.c >> [0]PETSC ERROR: #5 main() line 374 in /home/edaoktay/petsc-3.10.3/arch-linux2-c-debug/share/slepc/examples/src/eda/TEK_SAYI_SON_YENI_DENEME_TEMIZ_ENYENI_FINAL.c >> [0]PETSC ERROR: PETSc Option Table entries: >> [0]PETSC ERROR: -f /home/edaoktay/petsc-3.10.3/share/petsc/datafiles/matrices/binary_files/airfoil1_binary >> [0]PETSC ERROR: -mat_partitioning_type parmetis >> [0]PETSC ERROR: -unweighted >> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- >> >> >> The line 310 is PetscMalloc1(ss,&idxx). The part of my program is written as below: >> >> if (mod != 0){ >> ss = (siz+1)/size;//(siz+size-mod)/size; >> } else{ >> ss = siz/size; >> } >> >> PetscMalloc1(ss,&idxx); // LINE 310 >> >> if (rank != size-1) { >> j =0; >> for (i=rank*ss; i<(rank+1)*ss; i++) { >> idxx[j] = idx[i]; >> j++; >> } >> >> } else { >> >> j =0; >> for (i=rank*ss; i> idxx[j] = idx[i]; >> j++; >> } >> >> } >> >> if (mod != 0){ >> if (rank> idxx[ss+1] = idx[ss*size+rank+1]; >> } >> } >> >> /*Permute matrix L (spy(A(p1,p1))*/ >> >> if (mod != 0){ >> if (rank> ierr = ISCreateGeneral(PETSC_COMM_WORLD,ss+1,idxx,PETSC_COPY_VALUES,&is);CHKERRQ(ierr); >> } else{ >> ierr = ISCreateGeneral(PETSC_COMM_WORLD,ss,idxx,PETSC_COPY_VALUES,&is);CHKERRQ(ierr); >> } >> >> }else { >> ierr = ISCreateGeneral(PETSC_COMM_WORLD,ss,idxx,PETSC_COPY_VALUES,&is);CHKERRQ(ierr); >> } >> >> ierr = ISSetPermutation(is);CHKERRQ(ierr); >> ierr = MatPermute(A,is,is,&PL);CHKERRQ(ierr); >> >> /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> Create Partitioning >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ >> >> ierr = MatConvert(PL,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); >> ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); >> ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); >> ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); >> ierr = MatPartitioningApply(part,&partitioning);CHKERRQ(ierr); >> >> I understood that I cannot change the local size of the matrix since it is read from a file. But as you can see above, when I defined index sets' sizes as 2127 and 2127, memory corruption occurs. I tried several things but at the end I got error in MatPermute or here. >> >> By the way, idx is from 0 to 4252 but the global size of is is 4253. If I change idx to 0:4253 then I think it will be incorrect since actually there is no 4254th element. >> >> How can I solve this problem? 
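One way to avoid this kind of size mismatch entirely is to take the local chunk size from the matrix itself rather than recomputing it by hand. A minimal sketch (assuming, as in the code above, that idx holds the full global permutation on every rank):

    PetscInt rstart,rend,nlocal,i,j;
    PetscInt *idxx;

    ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
    nlocal = rend - rstart;   /* local row count of A, e.g. 2127 and 2126 for 4253 rows on 2 ranks */
    ierr = PetscMalloc1(nlocal,&idxx);CHKERRQ(ierr);
    for (i=rstart, j=0; i<rend; i++, j++) idxx[j] = idx[i];
    ierr = ISCreateGeneral(PETSC_COMM_WORLD,nlocal,idxx,PETSC_COPY_VALUES,&is);CHKERRQ(ierr);
    ierr = PetscFree(idxx);CHKERRQ(ierr);

Because the IS local sizes now match A's row layout by construction, ISSetPermutation() and MatPermute() see consistent sizes and nothing is written past the end of idxx.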
>> >> Thank you, >> >> Eda > From eda.oktay at metu.edu.tr Thu May 9 01:46:18 2019 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Thu, 9 May 2019 09:46:18 +0300 Subject: [petsc-users] Memory Corruption Error in MatPartitioningApply In-Reply-To: <9752861F-38A1-4527-B811-8AE51D188BAA@anl.gov> References: <9752861F-38A1-4527-B811-8AE51D188BAA@anl.gov> Message-ID: I misread local sizes of the matrix. Without using valgrind, I was able to fix the problem by using small sized matrix. It turned out that there is an indexing mistake. Thank you! Eda On Thu, May 9, 2019, 8:53 AM Smith, Barry F. wrote: > > Did you ever make progress on this issue? > > > On Apr 22, 2019, at 8:47 AM, Smith, Barry F. wrote: > > > > > > Are you able to run under valgrind? It is a bit better than the PETSc > malloc to find each instance of memory corruption and the sooner you find > it the easier it is to find the bug. > https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > > > > > >> On Apr 22, 2019, at 7:31 AM, Eda Oktay via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> > >> Hello, > >> > >> I am trying to partition an odd-numbered sized (for example 4253*4253), > square permutation matrix by using 2 processors with ParMETIS. The > permutation matrix is obtained by permuting the matrix by an index set "is" > (MatPermute(A,is,is,&PL)). I checked the index set, it gives a permutation > and it is correct. > >> > >> When I look at the local size of the matrix, it is given by 2127 and > 2127 on each processor, so in order the local sizes of matrix and index > sets to be same, I defined the index sets' sizes as 2127 and 2127. > >> > >> When I do that, I get memory corruption error in MatPartiitioningApply > function. The error is as follows: > >> > >> [0]PETSC ERROR: PetscMallocValidate: error detected at > MatPartitioningApply_Parmetis_Private() line 141 in > /home/edaoktay/petsc-3.10.3/src/mat/partition/impls/pmetis/pmetis.c > >> [0]PETSC ERROR: Memory [id=0(8512)] at address 0x19e6870 is corrupted > (probably write past end of array) > >> [0]PETSC ERROR: Memory originally allocated in main() line 310 in > /home/edaoktay/petsc-3.10.3/arch-linux2-c-debug/share/slepc/examples/src/eda/TEK_SAYI_SON_YENI_DENEME_TEMIZ_ENYENI_FINAL.c > >> [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > >> [0]PETSC ERROR: Memory corruption: > http://www.mcs.anl.gov/petsc/documentation/installation.html#valgrind > >> [0]PETSC ERROR: > >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> >> [0]PETSC ERROR: Petsc Release Version 3.10.3, Dec, 18, 2018 > >> [0]PETSC ERROR: ./TEK_SAYI_SON_YENI_DENEME_TEMIZ_ENYENI_FINAL on a > arch-linux2-c-debug named 13ed.wls.metu.edu.tr by edaoktay Mon Apr 22 > 14:58:52 2019 > >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --with-cxx-dialect=C++11 --download-openblas > --download-metis --download-parmetis --download-superlu_dist > --download-slepc --download-mpich > >> [0]PETSC ERROR: #1 PetscMallocValidate() line 146 in > /home/edaoktay/petsc-3.10.3/src/sys/memory/mtr.c > >> [0]PETSC ERROR: #2 MatPartitioningApply_Parmetis_Private() line 141 in > /home/edaoktay/petsc-3.10.3/src/mat/partition/impls/pmetis/pmetis.c > >> [0]PETSC ERROR: #3 MatPartitioningApply_Parmetis() line 215 in > /home/edaoktay/petsc-3.10.3/src/mat/partition/impls/pmetis/pmetis.c > >> [0]PETSC ERROR: #4 MatPartitioningApply() line 340 in > /home/edaoktay/petsc-3.10.3/src/mat/partition/partition.c > >> [0]PETSC ERROR: #5 main() line 374 in > /home/edaoktay/petsc-3.10.3/arch-linux2-c-debug/share/slepc/examples/src/eda/TEK_SAYI_SON_YENI_DENEME_TEMIZ_ENYENI_FINAL.c > >> [0]PETSC ERROR: PETSc Option Table entries: > >> [0]PETSC ERROR: -f > /home/edaoktay/petsc-3.10.3/share/petsc/datafiles/matrices/binary_files/airfoil1_binary > >> [0]PETSC ERROR: -mat_partitioning_type parmetis > >> [0]PETSC ERROR: -unweighted > >> [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > >> > >> > >> The line 310 is PetscMalloc1(ss,&idxx). The part of my program is > written as below: > >> > >> if (mod != 0){ > >> ss = (siz+1)/size;//(siz+size-mod)/size; > >> } else{ > >> ss = siz/size; > >> } > >> > >> PetscMalloc1(ss,&idxx); // LINE > 310 > >> > >> if (rank != size-1) { > >> j =0; > >> for (i=rank*ss; i<(rank+1)*ss; i++) { > >> idxx[j] = idx[i]; > >> j++; > >> } > >> > >> } else { > >> > >> j =0; > >> for (i=rank*ss; i >> idxx[j] = idx[i]; > >> j++; > >> } > >> > >> } > >> > >> if (mod != 0){ > >> if (rank >> idxx[ss+1] = idx[ss*size+rank+1]; > >> } > >> } > >> > >> /*Permute matrix L (spy(A(p1,p1))*/ > >> > >> if (mod != 0){ > >> if (rank >> ierr = > ISCreateGeneral(PETSC_COMM_WORLD,ss+1,idxx,PETSC_COPY_VALUES,&is);CHKERRQ(ierr); > >> } else{ > >> ierr = > ISCreateGeneral(PETSC_COMM_WORLD,ss,idxx,PETSC_COPY_VALUES,&is);CHKERRQ(ierr); > >> } > >> > >> }else { > >> ierr = > ISCreateGeneral(PETSC_COMM_WORLD,ss,idxx,PETSC_COPY_VALUES,&is);CHKERRQ(ierr); > >> } > >> > >> ierr = ISSetPermutation(is);CHKERRQ(ierr); > >> ierr = MatPermute(A,is,is,&PL);CHKERRQ(ierr); > > >> > >> /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - > >> Create Partitioning > >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > */ > >> > >> ierr = MatConvert(PL,MATMPIADJ,MAT_INITIAL_MATRIX,&AL);CHKERRQ(ierr); > > >> ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > >> ierr = MatPartitioningSetAdjacency(part,AL);CHKERRQ(ierr); > > >> ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > >> ierr = MatPartitioningApply(part,&partitioning);CHKERRQ(ierr); > >> > >> I understood that I cannot change the local size of the matrix since it > is read from a file. But as you can see above, when I defined index sets' > sizes as 2127 and 2127, memory corruption occurs. I tried several things > but at the end I got error in MatPermute or here. > >> > >> By the way, idx is from 0 to 4252 but the global size of is is 4253. 
If > I change idx to 0:4253 then I think it will be incorrect since actually > there is no 4254th element. > >> > >> How can I solve this problem? > >> > >> Thank you, > >> > >> Eda > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davelee2804 at gmail.com Thu May 9 03:39:24 2019 From: davelee2804 at gmail.com (Dave Lee) Date: Thu, 9 May 2019 18:39:24 +1000 Subject: [petsc-users] trust region/hook step equivalence Message-ID: Hi PETSc, I'm using the SNES trust region to solve a matrix free Newton problem. I can't see a lot of description of the trust region algorithm in the manual (section 5.2.2), and have also found it difficult to find documentation on the MINPACK project from which it is apparently derived. I have a couple of questions about this: 1) Is the PETSc SNES trust region algorithm the same as the "hook step" algorithm detailed in Section 6.4.1 of Dennis and Schnabel (1996) "Numerical methods for Unconstrained Optimization and Nonlinear Equations"? 2) Is there anywhere I can find specific documentation on the trust region control parameters as defined in: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESNEWTONTR.html#SNESNEWTONTR 3) My solve returns before it is sufficiently converged. On the last few Newton iterations the KSP converges due to: CONVERGED_STEP_LENGTH after only a couple of KSP iterations. What is the default for this parameter?, and how can I change it? Should I change it? Cheers, Dave. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shrirang.abhyankar at pnnl.gov Thu May 9 09:24:39 2019 From: shrirang.abhyankar at pnnl.gov (Abhyankar, Shrirang G) Date: Thu, 9 May 2019 14:24:39 +0000 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: <54D32962-6EA8-4103-A168-68E92219EED4@anl.gov> References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> <6F20629B-EFF2-4C0F-9385-15B1F2FE9073@pnnl.gov> <54D32962-6EA8-4103-A168-68E92219EED4@anl.gov> Message-ID: > What code? The special sparsity pattern for blocks? Yes. See here https://bitbucket.org/petsc/petsc/src/master/src/dm/impls/network/network.c#lines-1712 > What if the user has dense blocks or don't know or care about the sparsity pattern? By default it should allocate something that every users code will fit in the matrix and then user callable options for making parts of the matrices sparse. That's what it is doing I believe. If the user does not set the sparse matrix block for the point then it uses a dense block. Shri ?On 5/9/19, 12:24 AM, "Smith, Barry F." wrote: > On May 8, 2019, at 9:53 PM, Abhyankar, Shrirang G via petsc-users wrote: > > > > From: Matthew Knepley > Date: Wednesday, May 8, 2019 at 9:29 PM > To: "Abhyankar, Shrirang G" > Cc: "Zhang, Hong" , Justin Chang , petsc-users > Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly > > On Wed, May 8, 2019 at 10:17 PM Abhyankar, Shrirang G via petsc-users wrote: > > > From: petsc-users on behalf of "Zhang, Hong via petsc-users" > Reply-To: "Zhang, Hong" > Date: Wednesday, May 8, 2019 at 8:01 PM > To: Justin Chang > Cc: petsc-users > Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly > > Justin: > Great, the issue is resolved. > Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not raise an error? > > I copy-pasted the above line from the power.c example. MatSetOption should use PETSC_TRUE to activate the MAT_NEW_NONZERO_ALLOCATION_ERR option. 
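For reference, with that flag flipped the two lines from the power.c-style code being discussed become (networkdm and J named as in that discussion):

  ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
  /* PETSC_TRUE makes any MatSetValues() entry that was not preallocated raise an
     error immediately, instead of being silently (and slowly) allocated */
  ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE);CHKERRQ(ierr);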
> > Matt, > > We usually prevent this with a structured SetValues API. For example, DMDA uses MatSetValuesStencil() which cannot write > outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one > for DMNetwork. The allocation is just like Plex (I believe) where you allocate closure(star(p)), which would mean that setting > values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices. > Is that right for DMNetwork? > Yes, DMNetwork behaves in this fashion. > I cannot find MatSetValuesClosure() in petsc-master. > Can you provide detailed instruction on how to implement MatSetValuesClosure() for DMNetwork? > Note, dmnetwork is a subclass of DMPlex. > > DMNetwork does not do any matrix creation by itself. It calls Plex to create the matrix. > > Right. However, the only operation I put in was MatSetClosure() since that is appropriate for FEM. I think you would need > a MatSetStar() for DMNetwork as well. > > I see that Hong has implemented has some additional code in DMCreateMatrix_Network that does not use Plex for creating the matrix. I think it covers what you describe above. > > Hong: DMNetwork matrix creation code is not used unless the user wants to set special sparsity pattern for the blocks. Shouldn?t this code What code? The special sparsity pattern for blocks? > be used by default instead of having Plex create the matrix? What if the user has dense blocks or don'ts know or care about the sparsity pattern? By default it should allocate something that every users code will fit in the matrix and then user callable options for making parts of the matrices sparse. Barry > > Shri > > Matt > > Hong > > > On Wed, May 8, 2019 at 4:00 PM Dave May wrote: > > > On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users wrote: > So here's the branch/repo to the working example I have: > > https://github.com/jychang48/petsc-dss/tree/single-bus-vertex > > Type 'make' to compile the dss, it should work with the latest petsc-dev > > To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes where N = 100, 500, and 1000. Those test files are listed as: > > input/test_100.m > input/test_500.m > input/test_1000.m > > I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash ar the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as: > > input/test2_100.m > input/test2_500.m > input/test2_1000.m > > The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m but they have a bs=6. > > To run these tests, type the following: > > ./dpflow -input input/test_100.m > > I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases. > > Turns out that only the first call to the SNESComputeJacobian() is slow, all the subsequent calls are fast as I expect. This makes me think it still has something to do with matrix allocation. > > I think it is a preallocation issue. 
> Looking to some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView > > total number of mallocs used during MatSetValues calls =10000 > > > > > > Thanks for the help everyone, > > Justin > > On Wed, May 8, 2019 at 12:36 PM Matthew Knepley wrote: > On Wed, May 8, 2019 at 2:30 PM Justin Chang wrote: > Hi everyone, > > Yes I have these lines in my code: > > ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); > ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); > > Okay, its not allocation. So maybe Hong is right that its setting great big element matrices. We will see with the example. > > Thanks, > > Matt > > I tried -info and here's my output: > > [0] PetscInitialize(): PETSc successfully started: number of processors = 1 > [0] PetscInitialize(): Running on machine: jchang31606s.domain > [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 > Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000 > > **** Power flow dist case **** > > Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999 > [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 > [0] DMGetDMSNES(): Creating new DMSNES > [0] DMGetDMKSP(): Creating new DMKSP > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > 0 SNES Function norm 1155.45 > > nothing else -info related shows up as I'm iterating through the vertex loop. 
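A small check that does not depend on reading the viewer output (a sketch, assuming the assembled Jacobian is J) is to ask the matrix itself how many mallocs MatSetValues() triggered; a correctly preallocated matrix reports 0:

  MatInfo info;
  ierr = MatGetInfo(J,MAT_LOCAL,&info);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF,"mallocs during MatSetValues(): %g\n",(double)info.mallocs);CHKERRQ(ierr);

This is the same counter that shows up as "Number of mallocs during MatSetValues()" in the -info output above and as "total number of mallocs used during MatSetValues calls" in the KSPView.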
> > I'll have a MWE for you guys to play with shortly. > > Thanks, > Justin > > On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. wrote: > > Justin, > > Are you providing matrix entries that connect directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges. > > Everyone, > > I second Matt's reply. > > How is the DMNetwork preallocating for the Jacobian? Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling. Or assume full coupling. If it assumes no coupling and the user has a good amount of coupling it will be very slow. > > There would need to be a way for the user provide the coupling information between neighboring vertices/edges if it assumes no coupling. > > Barry > > > > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users wrote: > > > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users wrote: > > Hi guys, > > > > I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. > > > > The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. > > > > I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? > > > > Thanks, > > > > MAtt > > > > Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds: > > > > 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. > > > > 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize. > > > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. 
I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this? > > > > Thanks, > > Justin > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From hzhang at mcs.anl.gov Thu May 9 09:42:02 2019 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Thu, 9 May 2019 14:42:02 +0000 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> <6F20629B-EFF2-4C0F-9385-15B1F2FE9073@pnnl.gov> <54D32962-6EA8-4103-A168-68E92219EED4@anl.gov> Message-ID: On Thu, May 9, 2019 at 9:24 AM Abhyankar, Shrirang G via petsc-users > wrote: > What code? The special sparsity pattern for blocks? Yes. See here https://bitbucket.org/petsc/petsc/src/master/src/dm/impls/network/network.c#lines-1712 > What if the user has dense blocks or don't know or care about the sparsity pattern? By default it should allocate something that every users code will fit in the matrix and then user callable options for making parts of the matrices sparse. That's what it is doing I believe. If the user does not set the sparse matrix block for the point then it uses a dense block. This is what we do in DMNetwork (see Sec. 2.4 of our wash-paper). Hong ?On 5/9/19, 12:24 AM, "Smith, Barry F." > wrote: > On May 8, 2019, at 9:53 PM, Abhyankar, Shrirang G via petsc-users > wrote: > > > > From: Matthew Knepley > > Date: Wednesday, May 8, 2019 at 9:29 PM > To: "Abhyankar, Shrirang G" > > Cc: "Zhang, Hong" >, Justin Chang >, petsc-users > > Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly > > On Wed, May 8, 2019 at 10:17 PM Abhyankar, Shrirang G via petsc-users > wrote: > > > From: petsc-users > on behalf of "Zhang, Hong via petsc-users" > > Reply-To: "Zhang, Hong" > > Date: Wednesday, May 8, 2019 at 8:01 PM > To: Justin Chang > > Cc: petsc-users > > Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly > > Justin: > Great, the issue is resolved. > Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not raise an error? > > I copy-pasted the above line from the power.c example. MatSetOption should use PETSC_TRUE to activate the MAT_NEW_NONZERO_ALLOCATION_ERR option. > > Matt, > > We usually prevent this with a structured SetValues API. For example, DMDA uses MatSetValuesStencil() which cannot write > outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one > for DMNetwork. 
The allocation is just like Plex (I believe) where you allocate closure(star(p)), which would mean that setting > values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices. > Is that right for DMNetwork? > Yes, DMNetwork behaves in this fashion. > I cannot find MatSetValuesClosure() in petsc-master. > Can you provide detailed instruction on how to implement MatSetValuesClosure() for DMNetwork? > Note, dmnetwork is a subclass of DMPlex. > > DMNetwork does not do any matrix creation by itself. It calls Plex to create the matrix. > > Right. However, the only operation I put in was MatSetClosure() since that is appropriate for FEM. I think you would need > a MatSetStar() for DMNetwork as well. > > I see that Hong has implemented has some additional code in DMCreateMatrix_Network that does not use Plex for creating the matrix. I think it covers what you describe above. > > Hong: DMNetwork matrix creation code is not used unless the user wants to set special sparsity pattern for the blocks. Shouldn?t this code What code? The special sparsity pattern for blocks? > be used by default instead of having Plex create the matrix? What if the user has dense blocks or don'ts know or care about the sparsity pattern? By default it should allocate something that every users code will fit in the matrix and then user callable options for making parts of the matrices sparse. Barry > > Shri > > Matt > > Hong > > > On Wed, May 8, 2019 at 4:00 PM Dave May > wrote: > > > On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users > wrote: > So here's the branch/repo to the working example I have: > > https://github.com/jychang48/petsc-dss/tree/single-bus-vertex > > Type 'make' to compile the dss, it should work with the latest petsc-dev > > To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes where N = 100, 500, and 1000. Those test files are listed as: > > input/test_100.m > input/test_500.m > input/test_1000.m > > I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash ar the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as: > > input/test2_100.m > input/test2_500.m > input/test2_1000.m > > The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m but they have a bs=6. > > To run these tests, type the following: > > ./dpflow -input input/test_100.m > > I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases. > > Turns out that only the first call to the SNESComputeJacobian() is slow, all the subsequent calls are fast as I expect. This makes me think it still has something to do with matrix allocation. > > I think it is a preallocation issue. > Looking to some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView > > total number of mallocs used during MatSetValues calls =10000 > > > > > > Thanks for the help everyone, > > Justin > > On Wed, May 8, 2019 at 12:36 PM Matthew Knepley > wrote: > On Wed, May 8, 2019 at 2:30 PM Justin Chang > wrote: > Hi everyone, > > Yes I have these lines in my code: > > ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); > ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); > > Okay, its not allocation. 
So maybe Hong is right that its setting great big element matrices. We will see with the example. > > Thanks, > > Matt > > I tried -info and here's my output: > > [0] PetscInitialize(): PETSc successfully started: number of processors = 1 > [0] PetscInitialize(): Running on machine: jchang31606s.domain > [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 > Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000 > > **** Power flow dist case **** > > Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999 > [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. > [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. Using Inode routines > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 > [0] DMGetDMSNES(): Creating new DMSNES > [0] DMGetDMKSP(): Creating new DMKSP > [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 > 0 SNES Function norm 1155.45 > > nothing else -info related shows up as I'm iterating through the vertex loop. > > I'll have a MWE for you guys to play with shortly. > > Thanks, > Justin > > On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. > wrote: > > Justin, > > Are you providing matrix entries that connect directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges. > > Everyone, > > I second Matt's reply. > > How is the DMNetwork preallocating for the Jacobian? Does it take into account coupling between neighboring vertices/edges? 
Or does it assume no coupling. Or assume full coupling. If it assumes no coupling and the user has a good amount of coupling it will be very slow. > > There would need to be a way for the user provide the coupling information between neighboring vertices/edges if it assumes no coupling. > > Barry > > > > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users > wrote: > > > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users > wrote: > > Hi guys, > > > > I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. > > > > The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. > > > > I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? > > > > Thanks, > > > > MAtt > > > > Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds: > > > > 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. > > > > 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to setup and parallelize. > > > > 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but it's, but can anyone give pointers on how to go about achieving this? > > > > Thanks, > > Justin > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. 
> -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 9 09:42:18 2019 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 9 May 2019 10:42:18 -0400 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> <6F20629B-EFF2-4C0F-9385-15B1F2FE9073@pnnl.gov> <54D32962-6EA8-4103-A168-68E92219EED4@anl.gov> Message-ID: On Thu, May 9, 2019 at 10:24 AM Abhyankar, Shrirang G < shrirang.abhyankar at pnnl.gov> wrote: > > What code? The special sparsity pattern for blocks? > > Yes. See here > https://bitbucket.org/petsc/petsc/src/master/src/dm/impls/network/network.c#lines-1712 > > > What if the user has dense blocks or don't know or care about the > sparsity pattern? By default it should allocate something that every users > code will fit in the matrix and then user callable options for making parts > of the matrices sparse. > > That's what it is doing I believe. If the user does not set the sparse > matrix block for the point then it uses a dense block. > So that code should have been put directly into the Plex preallocation code. It should still be moved over. It is an easy change since we do all the sparsity computation on points, and then at the end translate points to dofs in dense blocks. You would just need to change 5 lines or so. Matt > Shri > > ?On 5/9/19, 12:24 AM, "Smith, Barry F." wrote: > > > > > On May 8, 2019, at 9:53 PM, Abhyankar, Shrirang G via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > > > > > From: Matthew Knepley > > Date: Wednesday, May 8, 2019 at 9:29 PM > > To: "Abhyankar, Shrirang G" > > Cc: "Zhang, Hong" , Justin Chang < > jychang48 at gmail.com>, petsc-users > > Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly > > > > On Wed, May 8, 2019 at 10:17 PM Abhyankar, Shrirang G via > petsc-users wrote: > > > > > > From: petsc-users on behalf of > "Zhang, Hong via petsc-users" > > Reply-To: "Zhang, Hong" > > Date: Wednesday, May 8, 2019 at 8:01 PM > > To: Justin Chang > > Cc: petsc-users > > Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly > > > > Justin: > > Great, the issue is resolved. > > Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does > not raise an error? > > > > I copy-pasted the above line from the power.c example. MatSetOption > should use PETSC_TRUE to activate the MAT_NEW_NONZERO_ALLOCATION_ERR option. > > > > Matt, > > > > We usually prevent this with a structured SetValues API. For > example, DMDA uses MatSetValuesStencil() which cannot write > > outside the stencil you set. DMPlex uses MatSetValuesClosure(), > which is guaranteed to be allocated. We should write one > > for DMNetwork. The allocation is just like Plex (I believe) where > you allocate closure(star(p)), which would mean that setting > > values for a vertex gets the neighboring edges and their vertices, > and setting values for an edge gets the covering vertices. 
> > Is that right for DMNetwork? > > Yes, DMNetwork behaves in this fashion. > > I cannot find MatSetValuesClosure() in petsc-master. > > Can you provide detailed instruction on how to implement > MatSetValuesClosure() for DMNetwork? > > Note, dmnetwork is a subclass of DMPlex. > > > > DMNetwork does not do any matrix creation by itself. It calls Plex > to create the matrix. > > > > Right. However, the only operation I put in was MatSetClosure() > since that is appropriate for FEM. I think you would need > > a MatSetStar() for DMNetwork as well. > > > > I see that Hong has implemented has some additional code in > DMCreateMatrix_Network that does not use Plex for creating the matrix. I > think it covers what you describe above. > > > > Hong: DMNetwork matrix creation code is not used unless the user > wants to set special sparsity pattern for the blocks. Shouldn?t this code > > What code? The special sparsity pattern for blocks? > > > be used by default instead of having Plex create the matrix? > > What if the user has dense blocks or don'ts know or care about the > sparsity pattern? By default it should allocate something that every users > code will fit in the matrix and then user callable options for making parts > of the matrices sparse. > > Barry > > > > > Shri > > > > Matt > > > > Hong > > > > > > On Wed, May 8, 2019 at 4:00 PM Dave May > wrote: > > > > > > On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > So here's the branch/repo to the working example I have: > > > > https://github.com/jychang48/petsc-dss/tree/single-bus-vertex > > > > Type 'make' to compile the dss, it should work with the latest > petsc-dev > > > > To test the performance, I've taken an existing IEEE 13-bus and > duplicated it N times to create a long radial-like network. I have three > sizes where N = 100, 500, and 1000. Those test files are listed as: > > > > input/test_100.m > > input/test_500.m > > input/test_1000.m > > > > I also created another set of examples where the IEEE 13-bus is > fully balanced (but the program will crash ar the solve step because I used > some unrealistic parameters for the Y-bus matrices and probably have some > zeros somewhere). They are listed as: > > > > input/test2_100.m > > input/test2_500.m > > input/test2_1000.m > > > > The dof count and matrices for the test2_*.m files are slightly > larger than their respective test_*.m but they have a bs=6. > > > > To run these tests, type the following: > > > > ./dpflow -input input/test_100.m > > > > I have a timer that shows how long it takes to compute the Jacobian. > Attached are the log outputs I have for each of the six cases. > > > > Turns out that only the first call to the SNESComputeJacobian() is > slow, all the subsequent calls are fast as I expect. This makes me think it > still has something to do with matrix allocation. > > > > I think it is a preallocation issue. > > Looking to some of the output files (test_1000.out, test_100.out), > under Mat Object I see this in the KSPView > > > > total number of mallocs used during MatSetValues calls =10000 > > > > > > > > > > > > Thanks for the help everyone, > > > > Justin > > > > On Wed, May 8, 2019 at 12:36 PM Matthew Knepley > wrote: > > On Wed, May 8, 2019 at 2:30 PM Justin Chang > wrote: > > Hi everyone, > > > > Yes I have these lines in my code: > > > > ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); > > ierr = > MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); > > > > Okay, its not allocation. 
So maybe Hong is right that its setting > great big element matrices. We will see with the example. > > > > Thanks, > > > > Matt > > > > I tried -info and here's my output: > > > > [0] PetscInitialize(): PETSc successfully started: number of > processors = 1 > > [0] PetscInitialize(): Running on machine: jchang31606s.domain > > [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 > 140550815662944 max tags = 2147483647 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436504608 140550815662944 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436504608 140550815662944 > > Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = > 75000, numdl = 5000, numlbr = 109999, numtbr = 5000 > > > > **** Power flow dist case **** > > > > Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, > ndelta = 5000, nbranch = 114999 > > [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 > 140550815683104 max tags = 2147483647 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage > space: 0 unneeded,10799928 used > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() > is 0 > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows > 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. > > [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: > 5. Using Inode routines > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436504608 140550815662944 > > [0] DMGetDMSNES(): Creating new DMSNES > > [0] DMGetDMKSP(): Creating new DMKSP > > [0] PetscCommDuplicate(): Using internal PETSc communicator > 4436505120 140550815683104 > > 0 SNES Function norm 1155.45 > > > > nothing else -info related shows up as I'm iterating through the > vertex loop. > > > > I'll have a MWE for you guys to play with shortly. > > > > Thanks, > > Justin > > > > On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. > wrote: > > > > Justin, > > > > Are you providing matrix entries that connect directly one > vertex to another vertex ACROSS an edge? I don't think that is supported by > the DMNetwork model. The assumption is that edges are only connected to > vertices and vertices are only connected to neighboring edges. 
> > > > Everyone, > > > > I second Matt's reply. > > > > How is the DMNetwork preallocating for the Jacobian? Does it take > into account coupling between neighboring vertices/edges? Or does it assume > no coupling. Or assume full coupling. If it assumes no coupling and the > user has a good amount of coupling it will be very slow. > > > > There would need to be a way for the user provide the coupling > information between neighboring vertices/edges if it assumes no coupling. > > > > Barry > > > > > > > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > > > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > Hi guys, > > > > > > I have a fully working distribution system solver written using > DMNetwork, The idea is that each electrical bus can have up to three phase > nodes, and each phase node has two unknowns: voltage magnitude and angle. > In a completely balanced system, each bus has three nodes, but in an > unbalanced system some of the buses can be either single phase or two-phase. > > > > > > The working DMNetwork code I developed, loosely based on the SNES > network/power.c, essentially represents each vertex as a bus. > DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to > each vertex. If every single bus had the same number of variables, the mat > block size = 2, 4, or 6, and my code is both fast and scalable. However, if > the unknowns per DMNetwork vertex unknowns are not the same across, then my > SNESFormJacobian function becomes extremely extremely slow. Specifically, > the MatSetValues() calls when the col/row global indices contain an offset > value that points to a neighboring bus vertex. > > > > > > I have never seen MatSetValues() be slow unless it is allocating. > Did you confirm that you are not allocating, with -info? > > > > > > Thanks, > > > > > > MAtt > > > > > > Why is that? Is it because I no longer have a uniform block > structure and lose the speed/optimization benefits of iterating through an > AIJ matrix? I see three potential workarounds: > > > > > > 1) Treat every vertex as a three phase bus and "zero out" all the > unused phase node dofs and put a 1 in the diagonal. The problem I see with > this is that I will have unnecessary degrees of freedom (aka non-zeros in > the matrix). From the distribution systems I've seen, it's possible that > anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I > may have nearly twice the amount of dofs than necessary if I wanted to > preserve the block size = 6 for the AU mat. > > > > > > 2) Treat every phase node as a vertex aka solve a single-phase > power flow solver. That way I guarantee to have a block size = 2, this is > what Domenico's former student did in his thesis work. The problem I see > with this is that I have a larger graph, which can take more time to setup > and parallelize. > > > > > > 3) Create a "fieldsplit" where I essentially have three "blocks" - > one for buses with all three phases, another for buses with only two > phases, one for single-phase buses. This way each block/fieldsplit will > have a consistent block size. I am not sure if this will solve the > MatSetValues() issues, but it's, but can anyone give pointers on how to go > about achieving this? 
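On the fieldsplit idea, one possible starting point (a sketch only, not from this thread; it only sets up the splits and says nothing about the MatSetValues() speed question) is to build one IS of global dofs per bus class and register each with PCFIELDSPLIT:

  KSP ksp;
  PC  pc;
  IS  isThree, isTwo, isOne;  /* global dof indices of the 3-, 2- and 1-phase buses, built by the application */

  ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCFIELDSPLIT);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc,"threephase",isThree);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc,"twophase",isTwo);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(pc,"onephase",isOne);CHKERRQ(ierr);

Each split can then be given its own sub-KSP/PC through the -fieldsplit_<name>_ option prefix on the command line.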
> > > > > > Thanks, > > > Justin > > > > > > > > > -- > > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > > -- Norbert Wiener > > > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu May 9 10:58:20 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 9 May 2019 15:58:20 +0000 Subject: [petsc-users] Extremely slow DMNetwork Jacobian assembly In-Reply-To: References: <4D37A5FF-992F-4959-A469-1C7CDAFC813C@anl.gov> <6F20629B-EFF2-4C0F-9385-15B1F2FE9073@pnnl.gov> <54D32962-6EA8-4103-A168-68E92219EED4@anl.gov> Message-ID: > On May 9, 2019, at 9:24 AM, Abhyankar, Shrirang G wrote: > >> What code? The special sparsity pattern for blocks? > > Yes. See here https://bitbucket.org/petsc/petsc/src/master/src/dm/impls/network/network.c#lines-1712 > >> What if the user has dense blocks or don't know or care about the sparsity pattern? By default it should allocate something that every users code will fit in the matrix and then user callable options for making parts of the matrices sparse. > > That's what it is doing I believe. If the user does not set the sparse matrix block for the point then it uses a dense block. And this is perfectly fine. It may waste a lot of memory but it absolutely would not cause a massive slow down in MatSetValues(). > > Shri > > > > ?On 5/9/19, 12:24 AM, "Smith, Barry F." wrote: > > > >> On May 8, 2019, at 9:53 PM, Abhyankar, Shrirang G via petsc-users wrote: >> >> >> >> From: Matthew Knepley >> Date: Wednesday, May 8, 2019 at 9:29 PM >> To: "Abhyankar, Shrirang G" >> Cc: "Zhang, Hong" , Justin Chang , petsc-users >> Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly >> >> On Wed, May 8, 2019 at 10:17 PM Abhyankar, Shrirang G via petsc-users wrote: >> >> >> From: petsc-users on behalf of "Zhang, Hong via petsc-users" >> Reply-To: "Zhang, Hong" >> Date: Wednesday, May 8, 2019 at 8:01 PM >> To: Justin Chang >> Cc: petsc-users >> Subject: Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly >> >> Justin: >> Great, the issue is resolved. >> Why MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE) does not raise an error? >> >> I copy-pasted the above line from the power.c example. MatSetOption should use PETSC_TRUE to activate the MAT_NEW_NONZERO_ALLOCATION_ERR option. >> >> Matt, >> >> We usually prevent this with a structured SetValues API. 
For example, DMDA uses MatSetValuesStencil() which cannot write >> outside the stencil you set. DMPlex uses MatSetValuesClosure(), which is guaranteed to be allocated. We should write one >> for DMNetwork. The allocation is just like Plex (I believe) where you allocate closure(star(p)), which would mean that setting >> values for a vertex gets the neighboring edges and their vertices, and setting values for an edge gets the covering vertices. >> Is that right for DMNetwork? >> Yes, DMNetwork behaves in this fashion. >> I cannot find MatSetValuesClosure() in petsc-master. >> Can you provide detailed instruction on how to implement MatSetValuesClosure() for DMNetwork? >> Note, dmnetwork is a subclass of DMPlex. >> >> DMNetwork does not do any matrix creation by itself. It calls Plex to create the matrix. >> >> Right. However, the only operation I put in was MatSetClosure() since that is appropriate for FEM. I think you would need >> a MatSetStar() for DMNetwork as well. >> >> I see that Hong has implemented has some additional code in DMCreateMatrix_Network that does not use Plex for creating the matrix. I think it covers what you describe above. >> >> Hong: DMNetwork matrix creation code is not used unless the user wants to set special sparsity pattern for the blocks. Shouldn?t this code > > What code? The special sparsity pattern for blocks? > >> be used by default instead of having Plex create the matrix? > > What if the user has dense blocks or don'ts know or care about the sparsity pattern? By default it should allocate something that every users code will fit in the matrix and then user callable options for making parts of the matrices sparse. > > Barry > >> >> Shri >> >> Matt >> >> Hong >> >> >> On Wed, May 8, 2019 at 4:00 PM Dave May wrote: >> >> >> On Wed, 8 May 2019 at 20:34, Justin Chang via petsc-users wrote: >> So here's the branch/repo to the working example I have: >> >> https://github.com/jychang48/petsc-dss/tree/single-bus-vertex >> >> Type 'make' to compile the dss, it should work with the latest petsc-dev >> >> To test the performance, I've taken an existing IEEE 13-bus and duplicated it N times to create a long radial-like network. I have three sizes where N = 100, 500, and 1000. Those test files are listed as: >> >> input/test_100.m >> input/test_500.m >> input/test_1000.m >> >> I also created another set of examples where the IEEE 13-bus is fully balanced (but the program will crash ar the solve step because I used some unrealistic parameters for the Y-bus matrices and probably have some zeros somewhere). They are listed as: >> >> input/test2_100.m >> input/test2_500.m >> input/test2_1000.m >> >> The dof count and matrices for the test2_*.m files are slightly larger than their respective test_*.m but they have a bs=6. >> >> To run these tests, type the following: >> >> ./dpflow -input input/test_100.m >> >> I have a timer that shows how long it takes to compute the Jacobian. Attached are the log outputs I have for each of the six cases. >> >> Turns out that only the first call to the SNESComputeJacobian() is slow, all the subsequent calls are fast as I expect. This makes me think it still has something to do with matrix allocation. >> >> I think it is a preallocation issue. 
>> Looking to some of the output files (test_1000.out, test_100.out), under Mat Object I see this in the KSPView >> >> total number of mallocs used during MatSetValues calls =10000 >> >> >> >> >> >> Thanks for the help everyone, >> >> Justin >> >> On Wed, May 8, 2019 at 12:36 PM Matthew Knepley wrote: >> On Wed, May 8, 2019 at 2:30 PM Justin Chang wrote: >> Hi everyone, >> >> Yes I have these lines in my code: >> >> ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr); >> ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); >> >> Okay, its not allocation. So maybe Hong is right that its setting great big element matrices. We will see with the example. >> >> Thanks, >> >> Matt >> >> I tried -info and here's my output: >> >> [0] PetscInitialize(): PETSc successfully started: number of processors = 1 >> [0] PetscInitialize(): Running on machine: jchang31606s.domain >> [0] PetscCommDuplicate(): Duplicating a communicator 4436504608 140550815662944 max tags = 2147483647 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 >> Base power = 0.166667, numbus = 115000, numgen = 5000, numyl = 75000, numdl = 5000, numlbr = 109999, numtbr = 5000 >> >> **** Power flow dist case **** >> >> Base power = 0.166667, nbus = 115000, ngen = 5000, nwye = 75000, ndelta = 5000, nbranch = 114999 >> [0] PetscCommDuplicate(): Duplicating a communicator 4436505120 140550815683104 max tags = 2147483647 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 620000 X 620000; storage space: 0 unneeded,10799928 used >> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 >> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28 >> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 620000) < 0.6. Do not use CompressedRow routines. >> [0] MatSeqAIJCheckInode(): Found 205000 nodes of 620000. Limit used: 5. 
Using Inode routines >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608 140550815662944 >> [0] DMGetDMSNES(): Creating new DMSNES >> [0] DMGetDMKSP(): Creating new DMKSP >> [0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120 140550815683104 >> 0 SNES Function norm 1155.45 >> >> nothing else -info related shows up as I'm iterating through the vertex loop. >> >> I'll have a MWE for you guys to play with shortly. >> >> Thanks, >> Justin >> >> On Wed, May 8, 2019 at 12:10 PM Smith, Barry F. wrote: >> >> Justin, >> >> Are you providing matrix entries that connect directly one vertex to another vertex ACROSS an edge? I don't think that is supported by the DMNetwork model. The assumption is that edges are only connected to vertices and vertices are only connected to neighboring edges. >> >> Everyone, >> >> I second Matt's reply. >> >> How is the DMNetwork preallocating for the Jacobian? Does it take into account coupling between neighboring vertices/edges? Or does it assume no coupling. Or assume full coupling. If it assumes no coupling and the user has a good amount of coupling it will be very slow. >> >> There would need to be a way for the user provide the coupling information between neighboring vertices/edges if it assumes no coupling. >> >> Barry >> >> >>> On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users wrote: >>> >>> On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users wrote: >>> Hi guys, >>> >>> I have a fully working distribution system solver written using DMNetwork, The idea is that each electrical bus can have up to three phase nodes, and each phase node has two unknowns: voltage magnitude and angle. In a completely balanced system, each bus has three nodes, but in an unbalanced system some of the buses can be either single phase or two-phase. >>> >>> The working DMNetwork code I developed, loosely based on the SNES network/power.c, essentially represents each vertex as a bus. DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to each vertex. If every single bus had the same number of variables, the mat block size = 2, 4, or 6, and my code is both fast and scalable. However, if the unknowns per DMNetwork vertex unknowns are not the same across, then my SNESFormJacobian function becomes extremely extremely slow. Specifically, the MatSetValues() calls when the col/row global indices contain an offset value that points to a neighboring bus vertex. >>> >>> I have never seen MatSetValues() be slow unless it is allocating. Did you confirm that you are not allocating, with -info? >>> >>> Thanks, >>> >>> MAtt >>> >>> Why is that? Is it because I no longer have a uniform block structure and lose the speed/optimization benefits of iterating through an AIJ matrix? I see three potential workarounds: >>> >>> 1) Treat every vertex as a three phase bus and "zero out" all the unused phase node dofs and put a 1 in the diagonal. The problem I see with this is that I will have unnecessary degrees of freedom (aka non-zeros in the matrix). From the distribution systems I've seen, it's possible that anywhere from 1/3 to 1/2 of the buses will be two-phase or less, meaning I may have nearly twice the amount of dofs than necessary if I wanted to preserve the block size = 6 for the AU mat. >>> >>> 2) Treat every phase node as a vertex aka solve a single-phase power flow solver. 
That way I guarantee to have a block size = 2, this is what Domenico's former student did in his thesis work. The problem I see with this is that I have a larger graph, which can take more time to set up and parallelize. >>> >>> 3) Create a "fieldsplit" where I essentially have three "blocks" - one for buses with all three phases, another for buses with only two phases, one for single-phase buses. This way each block/fieldsplit will have a consistent block size. I am not sure if this will solve the MatSetValues() issues, but can anyone give pointers on how to go about achieving this? >>> >>> Thanks, >>> Justin >>> >>> >>> -- >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > From jean-christophe.giret at irt-saintexupery.com Thu May 9 11:34:39 2019 From: jean-christophe.giret at irt-saintexupery.com (GIRET Jean-Christophe) Date: Thu, 9 May 2019 16:34:39 +0000 Subject: [petsc-users] Question about parallel Vectors and communicators In-Reply-To: References: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> Message-ID: Hello, Thanks Mark and Jed for your quick answers. So the idea is to define all the Vecs on the world communicator, and perform the communications using traditional scatter objects? The data would still be accessible on the two sub-communicators as they are both subsets of the MPI_COMM_WORLD communicator, but they would be used while creating the Vecs or the IS for the scatter. Is that right? I'm currently trying, without success, to perform a Scatter from an MPI Vec defined on a subcomm to another Vec defined on the world comm, and vice-versa. But I don't know if it's possible. I can imagine that trying to do that seems a bit strange. However, I'm dealing with code coupling (and linear algebra for the main part of the code), and my idea was to try to use the Vec data structures to perform data exchange between some parts of the software which would have their own communicator. It would eliminate the need to re-implement an ad-hoc solution. An option would be to stick to the world communicator for all the PETSc part, but I could face some situations where my Vecs could be small while I would have to run the whole simulation on a large number of cores for the coupled part. I imagine that it may not really serve the linear system solving part in terms of performance. Another option would be to perform all the PETSc operations on a sub-communicator and use "raw" MPI communications between the communicators to perform the data exchange for the coupling part. Thanks again for your support, Best regards, Jean-Christophe De : Mark Adams [mailto:mfadams at lbl.gov] Envoyé 
: mardi 7 mai 2019 21:39 ? : GIRET Jean-Christophe Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Question about parallel Vectors and communicators On Tue, May 7, 2019 at 11:38 AM GIRET Jean-Christophe via petsc-users > wrote: Dear PETSc users, I would like to use Petsc4Py for a project extension, which consists mainly of: - Storing data and matrices on several rank/nodes which could not fit on a single node. - Performing some linear algebra in a parallel fashion (solving sparse linear system for instance) - Exchanging those data structures (parallel vectors) between non-overlapping MPI communicators, created for instance by splitting MPI_COMM_WORLD. While the two first items seems to be well addressed by PETSc, I am wondering about the last one. Is it possible to access the data of a vector, defined on a communicator from another, non-overlapping communicator? From what I have seen from the documentation and the several threads on the user mailing-list, I would say no. But maybe I am missing something? If not, is it possible to transfer a vector defined on a given communicator on a communicator which is a subset of the previous one? If you are sending to a subset of processes then VecGetSubVec + Jed's tricks might work. https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html Best regards, Jean-Christophe -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu May 9 12:18:24 2019 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 9 May 2019 13:18:24 -0400 Subject: [petsc-users] Question about parallel Vectors and communicators In-Reply-To: References: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> Message-ID: On Thu, May 9, 2019 at 12:34 PM GIRET Jean-Christophe via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hello, > > > > Thanks Mark and Jed for your quick answers. > > > > So the idea is to define all the Vecs on the world communicator, and > perform the communications using traditional scatter objects? The data > would still be accessible on the two sub-communicators as they are both > subsets of the MPI_COMM_WORLD communicator, but they would be used while > creating the Vecs or the IS for the scatter. Is that right? > > > > I?m currently trying, without success, to perform a Scatter from a MPI Vec > defined on a subcomm to another Vec defined on the world comm, and > vice-versa. But I don?t know if it?s possible. > You cannot do that. What you want to do is: 1) Create two Vecs on COMM_WORLD. Make the second vec have all 0 sizes on processes not in the subcomm. 2) Create a scatter between the two Vec 3) Scatter data 4) Use VecGetArray() to get the pointer to the data on the second vec, and use VecCreateWithArray() ONLY on the subcomm, or if you do not mind copies, just use a copy. Thanks, Matt > > I can imagine that trying doing that seems a bit strange. However, I?m > dealing with code coupling (and linear algebra for the main part of the > code), and my idea was trying to use the Vec data structures to perform > data exchange between some parts of the software which would have their own > communicator. It would eliminate the need to re-implement an ad-hoc > solution. > > > > An option would be to stick on the world communicator for all the PETSc > part, but I could face some situations where my Vecs could be small while I > would have to run the whole simulation on an important number of core for > the coupled part. 
I imagine that It may not really serve the linear system > solving part in terms of performance. Another one would be perform all the > PETSc operations on a sub-communicator and use ?raw? MPI communications > between the communicators to perform the data exchange for the coupling > part. > > > > Thanks again for your support, > > Best regards, > > Jean-Christophe > > > > *De :* Mark Adams [mailto:mfadams at lbl.gov] > *Envoy? :* mardi 7 mai 2019 21:39 > *? :* GIRET Jean-Christophe > *Cc :* petsc-users at mcs.anl.gov > *Objet :* Re: [petsc-users] Question about parallel Vectors and > communicators > > > > > > > > On Tue, May 7, 2019 at 11:38 AM GIRET Jean-Christophe via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Dear PETSc users, > > > > I would like to use Petsc4Py for a project extension, which consists > mainly of: > > - Storing data and matrices on several rank/nodes which could > not fit on a single node. > > - Performing some linear algebra in a parallel fashion (solving > sparse linear system for instance) > > - Exchanging those data structures (parallel vectors) between > non-overlapping MPI communicators, created for instance by splitting > MPI_COMM_WORLD. > > > > While the two first items seems to be well addressed by PETSc, I am > wondering about the last one. > > > > Is it possible to access the data of a vector, defined on a communicator > from another, non-overlapping communicator? From what I have seen from the > documentation and the several threads on the user mailing-list, I would say > no. But maybe I am missing something? If not, is it possible to transfer a > vector defined on a given communicator on a communicator which is a subset > of the previous one? > > > > If you are sending to a subset of processes then VecGetSubVec + Jed's > tricks might work. > > > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html > > > > > > Best regards, > > Jean-Christophe > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu May 9 12:22:32 2019 From: jed at jedbrown.org (Jed Brown) Date: Thu, 09 May 2019 11:22:32 -0600 Subject: [petsc-users] Question about parallel Vectors and communicators In-Reply-To: References: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> Message-ID: <87o94br19j.fsf@jedbrown.org> GIRET Jean-Christophe via petsc-users writes: > Hello, > > Thanks Mark and Jed for your quick answers. > > So the idea is to define all the Vecs on the world communicator, and perform the communications using traditional scatter objects? The data would still be accessible on the two sub-communicators as they are both subsets of the MPI_COMM_WORLD communicator, but they would be used while creating the Vecs or the IS for the scatter. Is that right? I would use two global Vecs (on COMM_WORLD) to perform the scatter. Those global Vec memory can be aliased to Vecs on subcomms with the same local sizes. > I?m currently trying, without success, to perform a Scatter from a MPI Vec defined on a subcomm to another Vec defined on the world comm, and vice-versa. But I don?t know if it?s possible. > > I can imagine that trying doing that seems a bit strange. 
However, I?m dealing with code coupling (and linear algebra for the main part of the code), and my idea was trying to use the Vec data structures to perform data exchange between some parts of the software which would have their own communicator. It would eliminate the need to re-implement an ad-hoc solution. > > An option would be to stick on the world communicator for all the PETSc part, but I could face some situations where my Vecs could be small while I would have to run the whole simulation on an important number of core for the coupled part. I imagine that It may not really serve the linear system solving part in terms of performance. Another one would be perform all the PETSc operations on a sub-communicator and use ?raw? MPI communications between the communicators to perform the data exchange for the coupling part. > > Thanks again for your support, > Best regards, > Jean-Christophe > > De : Mark Adams [mailto:mfadams at lbl.gov] > Envoy? : mardi 7 mai 2019 21:39 > ? : GIRET Jean-Christophe > Cc : petsc-users at mcs.anl.gov > Objet : Re: [petsc-users] Question about parallel Vectors and communicators > > > > On Tue, May 7, 2019 at 11:38 AM GIRET Jean-Christophe via petsc-users > wrote: > Dear PETSc users, > > I would like to use Petsc4Py for a project extension, which consists mainly of: > > - Storing data and matrices on several rank/nodes which could not fit on a single node. > > - Performing some linear algebra in a parallel fashion (solving sparse linear system for instance) > > - Exchanging those data structures (parallel vectors) between non-overlapping MPI communicators, created for instance by splitting MPI_COMM_WORLD. > > While the two first items seems to be well addressed by PETSc, I am wondering about the last one. > > Is it possible to access the data of a vector, defined on a communicator from another, non-overlapping communicator? From what I have seen from the documentation and the several threads on the user mailing-list, I would say no. But maybe I am missing something? If not, is it possible to transfer a vector defined on a given communicator on a communicator which is a subset of the previous one? > > If you are sending to a subset of processes then VecGetSubVec + Jed's tricks might work. > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html > > > Best regards, > Jean-Christophe From dave.mayhem23 at gmail.com Thu May 9 12:34:44 2019 From: dave.mayhem23 at gmail.com (Dave May) Date: Thu, 9 May 2019 18:34:44 +0100 Subject: [petsc-users] Question about parallel Vectors and communicators In-Reply-To: References: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> Message-ID: On Thu, 9 May 2019 at 18:19, Matthew Knepley via petsc-users < petsc-users at mcs.anl.gov> wrote: > On Thu, May 9, 2019 at 12:34 PM GIRET Jean-Christophe via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hello, >> >> >> >> Thanks Mark and Jed for your quick answers. >> >> >> >> So the idea is to define all the Vecs on the world communicator, and >> perform the communications using traditional scatter objects? The data >> would still be accessible on the two sub-communicators as they are both >> subsets of the MPI_COMM_WORLD communicator, but they would be used while >> creating the Vecs or the IS for the scatter. Is that right? >> >> >> >> I?m currently trying, without success, to perform a Scatter from a MPI >> Vec defined on a subcomm to another Vec defined on the world comm, and >> vice-versa. But I don?t know if it?s possible. 
>> > > You cannot do that. What you want to do is: > > 1) Create two Vecs on COMM_WORLD. Make the second vec have all 0 sizes on > processes not in the subcomm. > > 2) Create a scatter between the two Vec > > 3) Scatter data > > 4) Use VecGetArray() to get the pointer to the data on the second vec, and > use VecCreateWithArray() ONLY on the subcomm, > or if you do not mind copies, just use a copy. > You can find a concrete example of doing exactly what Matt described above within PCTELESCOPE. See src/ksp/pc/impls/telescope/telescope.c or go here https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/pc/impls/telescope/telescope.c.html#PCTELESCOPE Specifically you want to examine the functions PCTelescopeSetUp_default() PCApply_Telescope() To explain in more detail, within PCTelescopeSetUp_default() * I create two vectors, xtmp (living on comm_1 with some ranks owning zero entries) and xred (defined on a sub comm of comm_1, say sub_comm_1). * I create the scatter between some input x and xtmp. The points implement Matt's steps 1 & 2 In PCApply_Telescope(), I perform the scatter between the input arg x and the vector xtmp (both defined on comm_1) [Matt's step 3]. Then you'll see this if (xred) { PetscScalar *LA_xred; VecGetOwnershipRange(xred,&st,&ed); VecGetArray(xred,&LA_xred); for (i=0; i > Thanks, > > Matt > > >> >> > I can imagine that trying doing that seems a bit strange. However, I?m >> dealing with code coupling (and linear algebra for the main part of the >> code), and my idea was trying to use the Vec data structures to perform >> data exchange between some parts of the software which would have their own >> communicator. It would eliminate the need to re-implement an ad-hoc >> solution. >> >> >> >> An option would be to stick on the world communicator for all the PETSc >> part, but I could face some situations where my Vecs could be small while I >> would have to run the whole simulation on an important number of core for >> the coupled part. I imagine that It may not really serve the linear system >> solving part in terms of performance. Another one would be perform all the >> PETSc operations on a sub-communicator and use ?raw? MPI communications >> between the communicators to perform the data exchange for the coupling >> part. >> >> >> >> Thanks again for your support, >> >> Best regards, >> >> Jean-Christophe >> >> >> >> *De :* Mark Adams [mailto:mfadams at lbl.gov] >> *Envoy? :* mardi 7 mai 2019 21:39 >> *? :* GIRET Jean-Christophe >> *Cc :* petsc-users at mcs.anl.gov >> *Objet :* Re: [petsc-users] Question about parallel Vectors and >> communicators >> >> >> >> >> >> >> >> On Tue, May 7, 2019 at 11:38 AM GIRET Jean-Christophe via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >> Dear PETSc users, >> >> >> >> I would like to use Petsc4Py for a project extension, which consists >> mainly of: >> >> - Storing data and matrices on several rank/nodes which could >> not fit on a single node. >> >> - Performing some linear algebra in a parallel fashion (solving >> sparse linear system for instance) >> >> - Exchanging those data structures (parallel vectors) between >> non-overlapping MPI communicators, created for instance by splitting >> MPI_COMM_WORLD. >> >> >> >> While the two first items seems to be well addressed by PETSc, I am >> wondering about the last one. >> >> >> >> Is it possible to access the data of a vector, defined on a communicator >> from another, non-overlapping communicator? 
From what I have seen from the >> documentation and the several threads on the user mailing-list, I would say >> no. But maybe I am missing something? If not, is it possible to transfer a >> vector defined on a given communicator on a communicator which is a subset >> of the previous one? >> >> >> >> If you are sending to a subset of processes then VecGetSubVec + Jed's >> tricks might work. >> >> >> >> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html >> >> >> >> >> >> Best regards, >> >> Jean-Christophe >> >> >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu May 9 12:56:19 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 9 May 2019 17:56:19 +0000 Subject: [petsc-users] trust region/hook step equivalence In-Reply-To: References: Message-ID: <9813BB0B-113E-4084-9184-2A3A78593263@anl.gov> > On May 9, 2019, at 3:39 AM, Dave Lee via petsc-users wrote: > > Hi PETSc, > > I'm using the SNES trust region to solve a matrix free Newton problem. I can't see a lot of description of the trust region algorithm in the manual (section 5.2.2), and have also found it difficult to find documentation on the MINPACK project from which it is apparently derived. I have a couple of questions about this: > > 1) Is the PETSc SNES trust region algorithm the same as the "hook step" algorithm detailed in Section 6.4.1 of Dennis and Schnabel (1996) "Numerical methods for Unconstrained Optimization and Nonlinear Equations"? No. It is more naive than that. If the trust region is detected to be too big it does a simple backtracking until it gets a sufficient decrease in the function norm. The "true" trust region algorithms do something more clever than just back tracking along the Newton direction. > > 2) Is there anywhere I can find specific documentation on the trust region control parameters as defined in: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESNEWTONTR.html#SNESNEWTONTR You need to look at the code. It is in src/snes/impls/tr/tr.c It is very simple. > > 3) My solve returns before it is sufficiently converged. Define sufficiently converged? The whole point of trust regions is that the nonlinear solver/optimization algorithm decides when to stop the linear solver, not your measure of the residual of the linear system. > On the last few Newton iterations the KSP converges due to: > CONVERGED_STEP_LENGTH > after only a couple of KSP iterations. What is the default for this parameter?, and how can I change it? Should I change it? The name is slightly confusing. This means the solver has reached the size of the trust region. To change this value means to change the size of the trust region. The initial size of the trust region is given by delta0*norm2(x) (or delta0 if x == 0). See SNESNEWTONTR. You can control delta0 with -snes_tr_delta0 delta0. After you start the algorithm it automatically adjusts the size of the trust region making it bigger or smaller based on how well Newton is working. Normally as Newton's method starts to converge well the trust region gets bigger and bigger (and hence the linear solver is solved more and more accurately). If the trust region doesn't grow it usually means something has gone wrong. 
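For illustration only (nothing below is taken from this thread except the -snes_tr_delta0 option itself): with a hypothetical executable, the trust-region options described above could be passed on the command line roughly as

  ./myapp -snes_type newtontr -snes_tr_delta0 0.1 -snes_monitor -snes_converged_reason

where the executable name and the value 0.1 are invented for the sketch.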
Note you can run with -info to see with more detail what decisions the trust region algorithm is making.. I'm not sure I recommend you spend a lot of time on the trust region approach. The various line searches in PETSc are more robust and mature and if they fail you the trust region code is unlikely to save you. Barry > > Cheers, Dave. From cpraveen at gmail.com Thu May 9 13:07:40 2019 From: cpraveen at gmail.com (Praveen C) Date: Thu, 9 May 2019 20:07:40 +0200 Subject: [petsc-users] Questions on DMPlexDistribute Message-ID: <35D56FE2-C1DD-4979-9795-02D38E1CEDDD@gmail.com> Dear all I am trying to understand partitioning using DMPlexDistribute. I see an option ?overlap?. Does this determine how many ghost cells are added ? Are these determined based on face neighboring cells or vertex neighboring cells ? In https://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex11.c.html I see a call to DMPlexConstructGhostCells after DMPlexDistribute. Can you explain the purpose of this and how it is related to overlap ? After DMPlexDistribute, how can I identify locally owned cells and ghost cells ? Thanks praveen From knepley at gmail.com Thu May 9 14:17:48 2019 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 9 May 2019 15:17:48 -0400 Subject: [petsc-users] Questions on DMPlexDistribute In-Reply-To: <35D56FE2-C1DD-4979-9795-02D38E1CEDDD@gmail.com> References: <35D56FE2-C1DD-4979-9795-02D38E1CEDDD@gmail.com> Message-ID: On Thu, May 9, 2019 at 2:07 PM Praveen C via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear all > > I am trying to understand partitioning using DMPlexDistribute. > > I see an option ?overlap?. Does this determine how many ghost cells are > added ? Are these determined based on face neighboring cells or vertex > neighboring cells ? > Its user settable: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMSetAdjacency.html > In > > > https://www.mcs.anl.gov/petsc/petsc-current/src/ts/examples/tutorials/ex11.c.html > > I see a call to DMPlexConstructGhostCells after DMPlexDistribute. Can you > explain the purpose of this and how it is related to overlap ? > This is a different thing. It is a way to enforce boundary conditions in finite volume methods. You make a fake cell on the other side of every boundary face so that you can put a state there which creates the right face flux. > After DMPlexDistribute, how can I identify locally owned cells and ghost > cells ? > 1) Ghost cells are special, not shared with other processors, and marked with a label 2) Overlap cells are regular cells, and shared with other processes. You can determine if you own the cell by checking whether it is present in the point PetscSF: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMGetPointSF.html Thanks, Matt > Thanks > praveen > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From davelee2804 at gmail.com Thu May 9 22:41:56 2019 From: davelee2804 at gmail.com (Dave Lee) Date: Fri, 10 May 2019 13:41:56 +1000 Subject: [petsc-users] trust region/hook step equivalence In-Reply-To: <9813BB0B-113E-4084-9184-2A3A78593263@anl.gov> References: <9813BB0B-113E-4084-9184-2A3A78593263@anl.gov> Message-ID: Thanks Barry, I will try reducing the default -snes_tr_delta0 from 0.2 to 0.1, and also take a look at using the line search method as well. The reason I wanted to use the trust region solver instead is that previous studies of my problem have used the "hook step" method, which I gather is more in line with the trust region method in that it first chooses a step size and then determines a direction for which convergence is ensured w.r.t. the step size, and not vice versa, as I gather is the case for the line search algorithm. Thanks again, Dave. On Fri, May 10, 2019 at 3:56 AM Smith, Barry F. wrote: > > > > On May 9, 2019, at 3:39 AM, Dave Lee via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Hi PETSc, > > > > I'm using the SNES trust region to solve a matrix free Newton problem. I > can't see a lot of description of the trust region algorithm in the manual > (section 5.2.2), and have also found it difficult to find documentation on > the MINPACK project from which it is apparently derived. I have a couple of > questions about this: > > > > 1) Is the PETSc SNES trust region algorithm the same as the "hook step" > algorithm detailed in Section 6.4.1 of Dennis and Schnabel (1996) > "Numerical methods for Unconstrained Optimization and Nonlinear Equations"? > > No. It is more naive than that. If the trust region is detected to be > too big it does a simple backtracking until it gets a sufficient decrease > in the function norm. The "true" trust region algorithms do something more > clever than just back tracking along the Newton direction. > > > > > 2) Is there anywhere I can find specific documentation on the trust > region control parameters as defined in: > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESNEWTONTR.html#SNESNEWTONTR > > You need to look at the code. It is in src/snes/impls/tr/tr.c It is very > simple. > > > > > > 3) My solve returns before it is sufficiently converged. > > Define sufficiently converged? The whole point of trust regions is that > the nonlinear solver/optimization algorithm decides when to stop the linear > solver, not your measure of the residual of the linear system. > > > On the last few Newton iterations the KSP converges due to: > > CONVERGED_STEP_LENGTH > > after only a couple of KSP iterations. What is the default for this > parameter?, and how can I change it? Should I change it? > > The name is slightly confusing. This means the solver has reached the > size of the trust region. To change this value means to change the size of > the trust region. The initial size of the trust region is given by > delta0*norm2(x) (or delta0 if x == 0). See SNESNEWTONTR. You can control > delta0 with -snes_tr_delta0 delta0. After you start the algorithm it > automatically adjusts the size of the trust region making it bigger or > smaller based on how well Newton is working. > > Normally as Newton's method starts to converge well the trust region > gets bigger and bigger (and hence the linear solver is solved more and more > accurately). If the trust region doesn't grow it usually means something > has gone wrong. 
> > Note you can run with -info to see with more detail what decisions the > trust region algorithm is making.. > > I'm not sure I recommend you spend a lot of time on the trust region > approach. The various line searches in PETSc are more robust and mature and > if they fail you the trust region code is unlikely to save you. > > Barry > > > > > > > Cheers, Dave. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu May 9 23:03:50 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 10 May 2019 04:03:50 +0000 Subject: [petsc-users] trust region/hook step equivalence In-Reply-To: References: <9813BB0B-113E-4084-9184-2A3A78593263@anl.gov> Message-ID: <84392E26-E396-46FA-88AE-7CEEB083F832@mcs.anl.gov> Dave, You could explore adding the "hook step" feature to the current TR code. We've never done it due to lack of resources and low priority for us but it is a completely reasonable thing to support. Barry > On May 9, 2019, at 10:41 PM, Dave Lee wrote: > > Thanks Barry, > > I will try reducing the default -snes_tr_delta0 from 0.2 to 0.1, and also take a look at using the line search method as well. > > The reason I wanted to use the trust region solver instead is that previous studies of my problem have used the "hook step" method, which I gather is more in line with the trust region method in that it first chooses a step size and then determines a direction for which convergence is ensured w.r.t. the step size, and not vice versa, as I gather is the case for the line search algorithm. > > Thanks again, Dave. > > > > On Fri, May 10, 2019 at 3:56 AM Smith, Barry F. wrote: > > > > On May 9, 2019, at 3:39 AM, Dave Lee via petsc-users wrote: > > > > Hi PETSc, > > > > I'm using the SNES trust region to solve a matrix free Newton problem. I can't see a lot of description of the trust region algorithm in the manual (section 5.2.2), and have also found it difficult to find documentation on the MINPACK project from which it is apparently derived. I have a couple of questions about this: > > > > 1) Is the PETSc SNES trust region algorithm the same as the "hook step" algorithm detailed in Section 6.4.1 of Dennis and Schnabel (1996) "Numerical methods for Unconstrained Optimization and Nonlinear Equations"? > > No. It is more naive than that. If the trust region is detected to be too big it does a simple backtracking until it gets a sufficient decrease in the function norm. The "true" trust region algorithms do something more clever than just back tracking along the Newton direction. > > > > > 2) Is there anywhere I can find specific documentation on the trust region control parameters as defined in: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESNEWTONTR.html#SNESNEWTONTR > > You need to look at the code. It is in src/snes/impls/tr/tr.c It is very simple. > > > > > > 3) My solve returns before it is sufficiently converged. > > Define sufficiently converged? The whole point of trust regions is that the nonlinear solver/optimization algorithm decides when to stop the linear solver, not your measure of the residual of the linear system. > > > On the last few Newton iterations the KSP converges due to: > > CONVERGED_STEP_LENGTH > > after only a couple of KSP iterations. What is the default for this parameter?, and how can I change it? Should I change it? > > The name is slightly confusing. This means the solver has reached the size of the trust region. To change this value means to change the size of the trust region. 
The initial size of the trust region is given by delta0*norm2(x) (or delta0 if x == 0). See SNESNEWTONTR. You can control delta0 with -snes_tr_delta0 delta0. After you start the algorithm it automatically adjusts the size of the trust region making it bigger or smaller based on how well Newton is working. > > Normally as Newton's method starts to converge well the trust region gets bigger and bigger (and hence the linear solver is solved more and more accurately). If the trust region doesn't grow it usually means something has gone wrong. > > Note you can run with -info to see with more detail what decisions the trust region algorithm is making.. > > I'm not sure I recommend you spend a lot of time on the trust region approach. The various line searches in PETSc are more robust and mature and if they fail you the trust region code is unlikely to save you. > > Barry > > > > > > > Cheers, Dave. > From davelee2804 at gmail.com Thu May 9 23:17:28 2019 From: davelee2804 at gmail.com (Dave Lee) Date: Fri, 10 May 2019 14:17:28 +1000 Subject: [petsc-users] trust region/hook step equivalence In-Reply-To: <84392E26-E396-46FA-88AE-7CEEB083F832@mcs.anl.gov> References: <9813BB0B-113E-4084-9184-2A3A78593263@anl.gov> <84392E26-E396-46FA-88AE-7CEEB083F832@mcs.anl.gov> Message-ID: Hi Barry, Depending on the success of the line search method for my slightly exotic problem I might just do that. Cheers, Dave. On Fri, May 10, 2019 at 2:03 PM Smith, Barry F. wrote: > > Dave, > > You could explore adding the "hook step" feature to the current TR > code. We've never done it due to lack of resources and low priority for us > but it is a completely reasonable thing to support. > > Barry > > > > On May 9, 2019, at 10:41 PM, Dave Lee wrote: > > > > Thanks Barry, > > > > I will try reducing the default -snes_tr_delta0 from 0.2 to 0.1, and > also take a look at using the line search method as well. > > > > The reason I wanted to use the trust region solver instead is that > previous studies of my problem have used the "hook step" method, which I > gather is more in line with the trust region method in that it first > chooses a step size and then determines a direction for which convergence > is ensured w.r.t. the step size, and not vice versa, as I gather is the > case for the line search algorithm. > > > > Thanks again, Dave. > > > > > > > > On Fri, May 10, 2019 at 3:56 AM Smith, Barry F. > wrote: > > > > > > > On May 9, 2019, at 3:39 AM, Dave Lee via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > > > Hi PETSc, > > > > > > I'm using the SNES trust region to solve a matrix free Newton problem. > I can't see a lot of description of the trust region algorithm in the > manual (section 5.2.2), and have also found it difficult to find > documentation on the MINPACK project from which it is apparently derived. I > have a couple of questions about this: > > > > > > 1) Is the PETSc SNES trust region algorithm the same as the "hook > step" algorithm detailed in Section 6.4.1 of Dennis and Schnabel (1996) > "Numerical methods for Unconstrained Optimization and Nonlinear Equations"? > > > > No. It is more naive than that. If the trust region is detected to be > too big it does a simple backtracking until it gets a sufficient decrease > in the function norm. The "true" trust region algorithms do something more > clever than just back tracking along the Newton direction. 
> > > > > > > > 2) Is there anywhere I can find specific documentation on the trust > region control parameters as defined in: > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESNEWTONTR.html#SNESNEWTONTR > > > > You need to look at the code. It is in src/snes/impls/tr/tr.c It is > very simple. > > > > > > > > > > 3) My solve returns before it is sufficiently converged. > > > > Define sufficiently converged? The whole point of trust regions is > that the nonlinear solver/optimization algorithm decides when to stop the > linear solver, not your measure of the residual of the linear system. > > > > > On the last few Newton iterations the KSP converges due to: > > > CONVERGED_STEP_LENGTH > > > after only a couple of KSP iterations. What is the default for this > parameter?, and how can I change it? Should I change it? > > > > The name is slightly confusing. This means the solver has reached the > size of the trust region. To change this value means to change the size of > the trust region. The initial size of the trust region is given by > delta0*norm2(x) (or delta0 if x == 0). See SNESNEWTONTR. You can control > delta0 with -snes_tr_delta0 delta0. After you start the algorithm it > automatically adjusts the size of the trust region making it bigger or > smaller based on how well Newton is working. > > > > Normally as Newton's method starts to converge well the trust region > gets bigger and bigger (and hence the linear solver is solved more and more > accurately). If the trust region doesn't grow it usually means something > has gone wrong. > > > > Note you can run with -info to see with more detail what decisions > the trust region algorithm is making.. > > > > I'm not sure I recommend you spend a lot of time on the trust region > approach. The various line searches in PETSc are more robust and mature and > if they fail you the trust region code is unlikely to save you. > > > > Barry > > > > > > > > > > > > Cheers, Dave. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Fri May 10 15:01:14 2019 From: jczhang at mcs.anl.gov (Zhang, Junchao) Date: Fri, 10 May 2019 20:01:14 +0000 Subject: [petsc-users] Question about parallel Vectors and communicators In-Reply-To: References: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> Message-ID: Jean-Christophe, I added a petsc example at https://bitbucket.org/petsc/petsc/pull-requests/1652/add-an-example-to-show-transfer-vectors/diff#chg-src/vec/vscat/examples/ex9.c It shows how to transfer vectors from a parent communicator to vectors on a child communicator. It also shows how to transfer vectors from a subcomm to vectors on another subcomm. The two subcomms are not required to cover all processes in PETSC_COMM_WORLD. Hope it helps you better understand Vec and VecScatter. --Junchao Zhang On Thu, May 9, 2019 at 11:34 AM GIRET Jean-Christophe via petsc-users > wrote: Hello, Thanks Mark and Jed for your quick answers. So the idea is to define all the Vecs on the world communicator, and perform the communications using traditional scatter objects? The data would still be accessible on the two sub-communicators as they are both subsets of the MPI_COMM_WORLD communicator, but they would be used while creating the Vecs or the IS for the scatter. Is that right? I?m currently trying, without success, to perform a Scatter from a MPI Vec defined on a subcomm to another Vec defined on the world comm, and vice-versa. But I don?t know if it?s possible. 
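For reference, a rough sketch of the approach recommended earlier in this thread: create both Vecs on the parent communicator (the second with zero local length off the sub-communicator), build the scatter there, and then wrap the received local array in a Vec on the sub-communicator. Every name, size and index choice below is invented for illustration, error checking (ierr/CHKERRQ) is omitted, and this is not code from the thread or from ex9.c:

  Vec          xworld, xbridge, xsub;
  VecScatter   scat;
  IS           ix, iy;
  PetscScalar *a;
  MPI_Comm     sub_comm;     /* assumed created earlier, e.g. with MPI_Comm_split() */
  PetscBool    on_subcomm;   /* assumed set: is this rank in sub_comm?              */
  PetscInt     nloc;         /* entries wanted by this rank; only used on sub_comm  */
  PetscInt     rstart, rend, n;

  /* 1) Both vectors live on the parent communicator; xbridge has zero local
        length on ranks outside the sub-communicator. */
  VecCreateMPI(PETSC_COMM_WORLD, 100, PETSC_DECIDE, &xworld);
  VecCreateMPI(PETSC_COMM_WORLD, on_subcomm ? nloc : 0, PETSC_DECIDE, &xbridge);
  VecGetOwnershipRange(xbridge, &rstart, &rend);
  n = rend - rstart;                  /* nloc on subcomm ranks, 0 elsewhere */

  /* 2) Sequential index sets holding global indices of the parallel vectors:
        ix = entries of xworld to take, iy = where they land in xbridge. The
        same global indices are used on both sides purely to keep the sketch
        short (this assumes xworld is at least as long as xbridge). */
  ISCreateStride(PETSC_COMM_SELF, n, rstart, 1, &ix);
  ISCreateStride(PETSC_COMM_SELF, n, rstart, 1, &iy);
  VecScatterCreate(xworld, ix, xbridge, iy, &scat);

  /* 3) Move the data; every rank of the parent communicator participates. */
  VecScatterBegin(scat, xworld, xbridge, INSERT_VALUES, SCATTER_FORWARD);
  VecScatterEnd(scat, xworld, xbridge, INSERT_VALUES, SCATTER_FORWARD);

  /* 4) On the sub-communicator only, alias the received data in a subcomm Vec
        (or copy it if aliasing is not wanted). */
  if (on_subcomm) {
    VecGetArray(xbridge, &a);
    VecCreateMPIWithArray(sub_comm, 1, nloc, PETSC_DECIDE, a, &xsub);
    /* ... use xsub with solvers or other operations defined on sub_comm ... */
    VecDestroy(&xsub);
    VecRestoreArray(xbridge, &a);
  }
  /* ISDestroy/VecScatterDestroy/VecDestroy cleanup omitted for brevity. */

In a real coupling code, ix would list whichever global entries of the world vector each sub-communicator rank actually needs rather than a simple stride.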
I can imagine that trying doing that seems a bit strange. However, I?m dealing with code coupling (and linear algebra for the main part of the code), and my idea was trying to use the Vec data structures to perform data exchange between some parts of the software which would have their own communicator. It would eliminate the need to re-implement an ad-hoc solution. An option would be to stick on the world communicator for all the PETSc part, but I could face some situations where my Vecs could be small while I would have to run the whole simulation on an important number of core for the coupled part. I imagine that It may not really serve the linear system solving part in terms of performance. Another one would be perform all the PETSc operations on a sub-communicator and use ?raw? MPI communications between the communicators to perform the data exchange for the coupling part. Thanks again for your support, Best regards, Jean-Christophe De : Mark Adams [mailto:mfadams at lbl.gov] Envoy? : mardi 7 mai 2019 21:39 ? : GIRET Jean-Christophe Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Question about parallel Vectors and communicators On Tue, May 7, 2019 at 11:38 AM GIRET Jean-Christophe via petsc-users > wrote: Dear PETSc users, I would like to use Petsc4Py for a project extension, which consists mainly of: - Storing data and matrices on several rank/nodes which could not fit on a single node. - Performing some linear algebra in a parallel fashion (solving sparse linear system for instance) - Exchanging those data structures (parallel vectors) between non-overlapping MPI communicators, created for instance by splitting MPI_COMM_WORLD. While the two first items seems to be well addressed by PETSc, I am wondering about the last one. Is it possible to access the data of a vector, defined on a communicator from another, non-overlapping communicator? From what I have seen from the documentation and the several threads on the user mailing-list, I would say no. But maybe I am missing something? If not, is it possible to transfer a vector defined on a given communicator on a communicator which is a subset of the previous one? If you are sending to a subset of processes then VecGetSubVec + Jed's tricks might work. https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html Best regards, Jean-Christophe -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Sat May 11 17:58:28 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Sat, 11 May 2019 16:58:28 -0600 Subject: [petsc-users] Processor's coarse DMDA must lie over fine DMDA ( i_start 1375 i_c 687 i_start_ghost_c 689) Message-ID: Hi All, I was running src/mat/examples/tests/ex96.c with "-Mx 1000 -My 1000 -Mz 1000" with 8192 MPI ranks, and got the message. If I changed the mesh size a little bit (such as -Mx 400 -My 400 -Mz 400), then the code ran fine. 
The relationship between the coarse mesh and the fine mesh is defined through the following code * user.ratio = 2;* * user.coarse.mx = 20; user.coarse.my = 20; user.coarse.mz = 20;* * ierr = PetscOptionsGetInt(NULL,NULL,"-Mx",&user.coarse.mx ,NULL);CHKERRQ(ierr);* * ierr = PetscOptionsGetInt(NULL,NULL,"-My",&user.coarse.my ,NULL);CHKERRQ(ierr);* * ierr = PetscOptionsGetInt(NULL,NULL,"-Mz",&user.coarse.mz ,NULL);CHKERRQ(ierr);* * ierr = PetscOptionsGetInt(NULL,NULL,"-ratio",&user.ratio,NULL);CHKERRQ(ierr);* * if (user.coarse.mz ) Test_3D = PETSC_TRUE;* * user.fine.mx = user.ratio*(user.coarse.mx-1)+1;* * user.fine.my = user.ratio*(user.coarse.my-1)+1;* * user.fine.mz = user.ratio*(user.coarse.mz-1)+1;* I was wondering what is the rule to determine what sizes I could pass in? Thanks, Fande, -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sat May 11 20:24:22 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sun, 12 May 2019 01:24:22 +0000 Subject: [petsc-users] Processor's coarse DMDA must lie over fine DMDA ( i_start 1375 i_c 687 i_start_ghost_c 689) In-Reply-To: References: Message-ID: <12E67F8A-4FC9-4B5B-9B77-EC03C98C0AD1@anl.gov> Check the source code for exact details. > On May 11, 2019, at 5:58 PM, Fande Kong via petsc-users wrote: > > Hi All, > > I was running src/mat/examples/tests/ex96.c with "-Mx 1000 -My 1000 -Mz 1000" with 8192 MPI ranks, and got the message. If I changed the mesh size a little bit (such as -Mx 400 -My 400 -Mz 400), then the code ran fine. > > The relationship between the coarse mesh and the fine mesh is defined through the following code > > > user.ratio = 2; > user.coarse.mx = 20; user.coarse.my = 20; user.coarse.mz = 20; > > ierr = PetscOptionsGetInt(NULL,NULL,"-Mx",&user.coarse.mx,NULL);CHKERRQ(ierr); > ierr = PetscOptionsGetInt(NULL,NULL,"-My",&user.coarse.my,NULL);CHKERRQ(ierr); > ierr = PetscOptionsGetInt(NULL,NULL,"-Mz",&user.coarse.mz,NULL);CHKERRQ(ierr); > ierr = PetscOptionsGetInt(NULL,NULL,"-ratio",&user.ratio,NULL);CHKERRQ(ierr); > > if (user.coarse.mz) Test_3D = PETSC_TRUE; > > user.fine.mx = user.ratio*(user.coarse.mx-1)+1; > user.fine.my = user.ratio*(user.coarse.my-1)+1; > user.fine.mz = user.ratio*(user.coarse.mz-1)+1; > > > I was wondering what is the rule to determine what sizes I could pass in? > > Thanks, > > Fande, From fdkong.jd at gmail.com Sat May 11 20:31:03 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Sat, 11 May 2019 19:31:03 -0600 Subject: [petsc-users] Processor's coarse DMDA must lie over fine DMDA ( i_start 1375 i_c 687 i_start_ghost_c 689) In-Reply-To: <12E67F8A-4FC9-4B5B-9B77-EC03C98C0AD1@anl.gov> References: <12E67F8A-4FC9-4B5B-9B77-EC03C98C0AD1@anl.gov> Message-ID: OK, I figured it out. It was caused by the different partition of the coarse DA from the fine DA. I will fix ex96.c. Thanks, Fande, On Sat, May 11, 2019 at 7:24 PM Smith, Barry F. wrote: > > Check the source code for exact details. > > > > On May 11, 2019, at 5:58 PM, Fande Kong via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Hi All, > > > > I was running src/mat/examples/tests/ex96.c with "-Mx 1000 -My 1000 -Mz > 1000" with 8192 MPI ranks, and got the message. If I changed the mesh > size a little bit (such as -Mx 400 -My 400 -Mz 400), then the code ran > fine. 
> > > > The relationship between the coarse mesh and the fine mesh is defined > through the following code > > > > > > user.ratio = 2; > > user.coarse.mx = 20; user.coarse.my = 20; user.coarse.mz = 20; > > > > ierr = PetscOptionsGetInt(NULL,NULL,"-Mx",&user.coarse.mx > ,NULL);CHKERRQ(ierr); > > ierr = PetscOptionsGetInt(NULL,NULL,"-My",&user.coarse.my > ,NULL);CHKERRQ(ierr); > > ierr = PetscOptionsGetInt(NULL,NULL,"-Mz",&user.coarse.mz > ,NULL);CHKERRQ(ierr); > > ierr = > PetscOptionsGetInt(NULL,NULL,"-ratio",&user.ratio,NULL);CHKERRQ(ierr); > > > > if (user.coarse.mz) Test_3D = PETSC_TRUE; > > > > user.fine.mx = user.ratio*(user.coarse.mx-1)+1; > > user.fine.my = user.ratio*(user.coarse.my-1)+1; > > user.fine.mz = user.ratio*(user.coarse.mz-1)+1; > > > > > > I was wondering what is the rule to determine what sizes I could pass > in? > > > > Thanks, > > > > Fande, > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From myriam.peyrounette at idris.fr Mon May 13 03:01:34 2019 From: myriam.peyrounette at idris.fr (Myriam Peyrounette) Date: Mon, 13 May 2019 10:01:34 +0200 Subject: [petsc-users] Fwd: Bad memory scaling with PETSc 3.10 In-Reply-To: <9bb4ddb6-b99e-7a1b-16e1-f226f8fd0d0b@idris.fr> References: <9bb4ddb6-b99e-7a1b-16e1-f226f8fd0d0b@idris.fr> Message-ID: Hi all, I tried with 3.11.1 version and Barry's fix. The good scaling is back! See the green curve in the plot attached. It is even better than PETSc 3.6! And it runs faster (10-15s instead of 200-300s with 3.6). So you were right. It seems that not all the PtAPs used the scalable version. I was a bit confused about the options to set... I used the options: -matptap_via scalable and -mat_freeintermediatedatastructures 1. Do you think it would be even better with allatonce? It is unfortunate that this fix can't be merged with the master branch. But the patch works well and I can consider the issue as solved now. Thanks a lot for your time! Myriam Le 05/04/19 ? 06:54, Smith, Barry F. a ?crit?: > Hmm, I had already fixed this, I think, > > https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff > > but unfortunately our backlog of pull requests kept it out of master. We are (well Satish and Jed) working on a new CI infrastructure that will hopefully be more stable than the current CI that we are using. > > Fande, > Sorry you had to spend time on this. > > > Barry > > > >> On May 3, 2019, at 11:20 PM, Fande Kong via petsc-users wrote: >> >> Hi Myriam, >> >> I run the example you attached earlier with "-mx 48 -my 48 -mz 48 -levels 3 -ksp_view -matptap_via allatonce -log_view ". >> >> There are six PtAPs. Two of them are sill using the nonscalable version of the algorithm (this might explain why the memory still exponentially increases) even though we have asked PETSc to use the ``allatonce" algorithm. This is happening because MATMAIJ does not honor the petsc option, instead, it uses the default setting of MPIAIJ. I have a fix at https://bitbucket.org/petsc/petsc/pull-requests/1623/choose-algorithms-in/diff. The PR should fix the issue. >> >> Thanks again for your report, >> >> Fande, >> >> -- Myriam Peyrounette CNRS/IDRIS - HLST -- -------------- next part -------------- A non-text attachment was scrubbed... Name: ex42_mem_scaling_ada_patch.png Type: image/png Size: 23928 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2975 bytes Desc: Signature cryptographique S/MIME URL: From jean-christophe.giret at irt-saintexupery.com Mon May 13 09:07:42 2019 From: jean-christophe.giret at irt-saintexupery.com (GIRET Jean-Christophe) Date: Mon, 13 May 2019 14:07:42 +0000 Subject: [petsc-users] Question about parallel Vectors and communicators In-Reply-To: References: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> Message-ID: <7a46710c39964cebbabe92ef19d55f0b@IRT00V020.IRT-AESE.local> Hello, Thank you all for your answers and examples, it's now very clear: the trick is to alias a Vec on a subcomm with a Vec on the parent comm, and to do the communication through a Scatter on the parent comm. I have also been able to implement it with petsc4py. Junchao, thank you for your example. It is indeed very clear. Although I understand how the exchanges are made through the Vecs defined on the parent comms, I am wondering why ISCreateStride is defined on the communicator PETSC_COMM_SELF and not on the parent communicator spanning the Vecs used for the Scatter operations. When I read the documentation, I see: "The communicator, comm, should consist of all processes that will be using the IS." I would say in that case that it is the same communicator used for the "exchange" vectors. I am surely misunderstanding something here, but I didn't find any answer while googling. Any hint on that? Again, thank you all for your great support, Best, JC De : Zhang, Junchao [mailto:jczhang at mcs.anl.gov] Envoyé : vendredi 10 mai 2019 22:01 À : GIRET Jean-Christophe Cc : Mark Adams; petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Question about parallel Vectors and communicators Jean-Christophe, I added a petsc example at https://bitbucket.org/petsc/petsc/pull-requests/1652/add-an-example-to-show-transfer-vectors/diff#chg-src/vec/vscat/examples/ex9.c It shows how to transfer vectors from a parent communicator to vectors on a child communicator. It also shows how to transfer vectors from a subcomm to vectors on another subcomm. The two subcomms are not required to cover all processes in PETSC_COMM_WORLD. Hope it helps you better understand Vec and VecScatter. --Junchao Zhang On Thu, May 9, 2019 at 11:34 AM GIRET Jean-Christophe via petsc-users > wrote: Hello, Thanks Mark and Jed for your quick answers. So the idea is to define all the Vecs on the world communicator, and perform the communications using traditional scatter objects? The data would still be accessible on the two sub-communicators as they are both subsets of the MPI_COMM_WORLD communicator, but they would be used while creating the Vecs or the IS for the scatter. Is that right? I'm currently trying, without success, to perform a Scatter from a MPI Vec defined on a subcomm to another Vec defined on the world comm, and vice-versa. But I don't know if it's possible. I can imagine that trying doing that seems a bit strange. However, I'm dealing with code coupling (and linear algebra for the main part of the code), and my idea was trying to use the Vec data structures to perform data exchange between some parts of the software which would have their own communicator. It would eliminate the need to re-implement an ad-hoc solution. An option would be to stick on the world communicator for all the PETSc part, but I could face some situations where my Vecs could be small while I would have to run the whole simulation on an important number of core for the coupled part. 
I imagine that it may not really serve the linear system solving part in terms of performance. Another one would be perform all the PETSc operations on a sub-communicator and use "raw" MPI communications between the communicators to perform the data exchange for the coupling part. Thanks again for your support, Best regards, Jean-Christophe De : Mark Adams [mailto:mfadams at lbl.gov] Envoyé : mardi 7 mai 2019 21:39 À : GIRET Jean-Christophe Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Question about parallel Vectors and communicators On Tue, May 7, 2019 at 11:38 AM GIRET Jean-Christophe via petsc-users > wrote: Dear PETSc users, I would like to use Petsc4Py for a project extension, which consists mainly of: - Storing data and matrices on several rank/nodes which could not fit on a single node. - Performing some linear algebra in a parallel fashion (solving sparse linear system for instance) - Exchanging those data structures (parallel vectors) between non-overlapping MPI communicators, created for instance by splitting MPI_COMM_WORLD. While the two first items seems to be well addressed by PETSc, I am wondering about the last one. Is it possible to access the data of a vector, defined on a communicator from another, non-overlapping communicator? From what I have seen from the documentation and the several threads on the user mailing-list, I would say no. But maybe I am missing something? If not, is it possible to transfer a vector defined on a given communicator on a communicator which is a subset of the previous one? If you are sending to a subset of processes then VecGetSubVec + Jed's tricks might work. https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html Best regards, Jean-Christophe -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Mon May 13 10:14:20 2019 From: jczhang at mcs.anl.gov (Zhang, Junchao) Date: Mon, 13 May 2019 15:14:20 +0000 Subject: [petsc-users] Question about parallel Vectors and communicators In-Reply-To: <7a46710c39964cebbabe92ef19d55f0b@IRT00V020.IRT-AESE.local> References: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> <7a46710c39964cebbabe92ef19d55f0b@IRT00V020.IRT-AESE.local> Message-ID: The index sets provide possible i, j in scatter "y[j] = x[i]". Each process provides a portion of the i and j of the whole scatter. The only requirement of VecScatterCreate is that on each process, local sizes of ix and iy must be equal (a process can provide empty ix and iy). A process's i and j can point to anyplace in their vector (not constrained to the vector's local part). The interpretation of ix and iy does not depend on their communicator; instead, it depends on their associated vector. Let P and S stand for parallel and sequential vectors respectively; there are four combinations of vecscatters: PtoP, PtoS, StoP and StoS. The assumption is: if x is parallel, then ix contains global indices of x. If x is sequential, ix contains local indices of x. Similarly for y and iy. So, index sets created with PETSC_COMM_SELF can perfectly include global indices. That is why I always use PETSC_COMM_SELF to create index sets for VecScatter. It makes things easier to understand. The quote you gave is also confusing to me. If you use PETSC_COMM_SELF, it means only the current process uses the IS. That sounds OK since other processes cannot get a reference to this IS. Maybe other petsc developers can explain when parallel communicators are useful for index sets. 
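A minimal illustration of the convention just described, with invented names (x and y are assumed to be parallel Vecs with compatible layouts, and n entries are moved per process starting at global index rstart):

  ISCreateStride(PETSC_COMM_SELF, n, rstart, 1, &ix);  /* global indices into the parallel x */
  ISCreateStride(PETSC_COMM_SELF, n, rstart, 1, &iy);  /* global indices into the parallel y */
  VecScatterCreate(x, ix, y, iy, &scat);

The communicator of ix and iy plays no role here; only their contents and the vectors passed to VecScatterCreate matter.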
My feeling is that they are useless at least for VecScatter. --Junchao Zhang On Mon, May 13, 2019 at 9:07 AM GIRET Jean-Christophe > wrote: Hello, Thank you all for you answers and examples, it?s now very clear: the trick is to alias a Vec on a subcomm with a Vec on the parent comm, and to make the comm through Scatter on the parent comm. I have also been able to implement it with petsc4py. Junchao, thank you for your example. It is indeed very clear. Although I understand how the exchanges are made through the Vecs defined on the parent comms, I am wondering why ISCreateStride is defined on the communicator PETSC_COMM_SELF and not on the parent communicator spanning the Vecs used for the Scatter operations. When I read the documentation, I see: ?The communicator, comm, should consist of all processes that will be using the IS.? I would say in that case that it is the same communicator used for the ?exchange? vectors. I am surely misunderstanding something here, but I didn?t find any answer while googling. Any hint on that? Again, thank you all for your great support, Best, JC De : Zhang, Junchao [mailto:jczhang at mcs.anl.gov] Envoy? : vendredi 10 mai 2019 22:01 ? : GIRET Jean-Christophe Cc : Mark Adams; petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Question about parallel Vectors and communicators Jean-Christophe, I added a petsc example at https://bitbucket.org/petsc/petsc/pull-requests/1652/add-an-example-to-show-transfer-vectors/diff#chg-src/vec/vscat/examples/ex9.c It shows how to transfer vectors from a parent communicator to vectors on a child communicator. It also shows how to transfer vectors from a subcomm to vectors on another subcomm. The two subcomms are not required to cover all processes in PETSC_COMM_WORLD. Hope it helps you better understand Vec and VecScatter. --Junchao Zhang On Thu, May 9, 2019 at 11:34 AM GIRET Jean-Christophe via petsc-users > wrote: Hello, Thanks Mark and Jed for your quick answers. So the idea is to define all the Vecs on the world communicator, and perform the communications using traditional scatter objects? The data would still be accessible on the two sub-communicators as they are both subsets of the MPI_COMM_WORLD communicator, but they would be used while creating the Vecs or the IS for the scatter. Is that right? I?m currently trying, without success, to perform a Scatter from a MPI Vec defined on a subcomm to another Vec defined on the world comm, and vice-versa. But I don?t know if it?s possible. I can imagine that trying doing that seems a bit strange. However, I?m dealing with code coupling (and linear algebra for the main part of the code), and my idea was trying to use the Vec data structures to perform data exchange between some parts of the software which would have their own communicator. It would eliminate the need to re-implement an ad-hoc solution. An option would be to stick on the world communicator for all the PETSc part, but I could face some situations where my Vecs could be small while I would have to run the whole simulation on an important number of core for the coupled part. I imagine that It may not really serve the linear system solving part in terms of performance. Another one would be perform all the PETSc operations on a sub-communicator and use ?raw? MPI communications between the communicators to perform the data exchange for the coupling part. Thanks again for your support, Best regards, Jean-Christophe De : Mark Adams [mailto:mfadams at lbl.gov] Envoy? : mardi 7 mai 2019 21:39 ? 
: GIRET Jean-Christophe Cc : petsc-users at mcs.anl.gov Objet : Re: [petsc-users] Question about parallel Vectors and communicators On Tue, May 7, 2019 at 11:38 AM GIRET Jean-Christophe via petsc-users > wrote: Dear PETSc users, I would like to use Petsc4Py for a project extension, which consists mainly of: - Storing data and matrices on several rank/nodes which could not fit on a single node. - Performing some linear algebra in a parallel fashion (solving sparse linear system for instance) - Exchanging those data structures (parallel vectors) between non-overlapping MPI communicators, created for instance by splitting MPI_COMM_WORLD. While the two first items seems to be well addressed by PETSc, I am wondering about the last one. Is it possible to access the data of a vector, defined on a communicator from another, non-overlapping communicator? From what I have seen from the documentation and the several threads on the user mailing-list, I would say no. But maybe I am missing something? If not, is it possible to transfer a vector defined on a given communicator on a communicator which is a subset of the previous one? If you are sending to a subset of processes then VecGetSubVec + Jed's tricks might work. https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html Best regards, Jean-Christophe -------------- next part -------------- An HTML attachment was scrubbed... URL: From fdkong.jd at gmail.com Mon May 13 10:20:07 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Mon, 13 May 2019 09:20:07 -0600 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: References: <9bb4ddb6-b99e-7a1b-16e1-f226f8fd0d0b@idris.fr> Message-ID: Hi Myriam, Thanks for your report back. On Mon, May 13, 2019 at 2:01 AM Myriam Peyrounette < myriam.peyrounette at idris.fr> wrote: > Hi all, > > I tried with 3.11.1 version and Barry's fix. The good scaling is back! > See the green curve in the plot attached. It is even better than PETSc > 3.6! And it runs faster (10-15s instead of 200-300s with 3.6). > We are glad your issue was resolved here. > > So you were right. It seems that not all the PtAPs used the scalable > version. > > I was a bit confused about the options to set... I used the options: > -matptap_via scalable and -mat_freeintermediatedatastructures 1. Do you > think it would be even better with allatonce? > "scalable" and "allatonce" correspond to different algorithms respectively. ``allatonce" should be using less memory than "scalable". The "allatonce" algorithm would be a good alternative if your application is memory sensitive and the problem size is large. We are definitely curious about the memory usage of ``allatonce" in your test cases but don't feel obligated to do these tests since your concern were resolved now. In case you are also interested in how our new algorithms perform, I post petsc options here that are used to choose these algorithms: algorithm 1: ``allatonce" -matptap_via allatonce -mat_freeintermediatedatastructures 1 algorithm 2: ``allatonce_merged" -matptap_via allatonce_merged -mat_freeintermediatedatastructures 1 Again, thanks for your report that help us improve PETSc. Fande, > > It is unfortunate that this fix can't be merged with the master branch. > But the patch works well and I can consider the issue as solved now. > > Thanks a lot for your time! > > Myriam > > > Le 05/04/19 ? 06:54, Smith, Barry F. 
a ?crit : > > Hmm, I had already fixed this, I think, > > > > > https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff > > > > but unfortunately our backlog of pull requests kept it out of master. > We are (well Satish and Jed) working on a new CI infrastructure that will > hopefully be more stable than the current CI that we are using. > > > > Fande, > > Sorry you had to spend time on this. > > > > > > Barry > > > > > > > >> On May 3, 2019, at 11:20 PM, Fande Kong via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> > >> Hi Myriam, > >> > >> I run the example you attached earlier with "-mx 48 -my 48 -mz 48 > -levels 3 -ksp_view -matptap_via allatonce -log_view ". > >> > >> There are six PtAPs. Two of them are sill using the nonscalable version > of the algorithm (this might explain why the memory still exponentially > increases) even though we have asked PETSc to use the ``allatonce" > algorithm. This is happening because MATMAIJ does not honor the petsc > option, instead, it uses the default setting of MPIAIJ. I have a fix at > https://bitbucket.org/petsc/petsc/pull-requests/1623/choose-algorithms-in/diff. > The PR should fix the issue. > >> > >> Thanks again for your report, > >> > >> Fande, > >> > >> > > -- > Myriam Peyrounette > CNRS/IDRIS - HLST > -- > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 13 10:21:54 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 13 May 2019 11:21:54 -0400 Subject: [petsc-users] Question about parallel Vectors and communicators In-Reply-To: <7a46710c39964cebbabe92ef19d55f0b@IRT00V020.IRT-AESE.local> References: <25edff62fdda412e8a5db92c18e9dbc0@IRT00V020.IRT-AESE.local> <7a46710c39964cebbabe92ef19d55f0b@IRT00V020.IRT-AESE.local> Message-ID: On Mon, May 13, 2019 at 10:07 AM GIRET Jean-Christophe via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hello, > > > > Thank you all for you answers and examples, it?s now very clear: the trick > is to alias a Vec on a subcomm with a Vec on the parent comm, and to make > the comm through Scatter on the parent comm. I have also been able to > implement it with petsc4py. > > > > Junchao, thank you for your example. It is indeed very clear. Although I > understand how the exchanges are made through the Vecs defined on the > parent comms, I am wondering why *ISCreateStride* is defined on the > communicator PETSC_COMM_SELF and not on the parent communicator spanning > the Vecs used for the Scatter operations. > > > > When I read the documentation, I see*: ?The communicator, comm, should > consist of all processes that will be using the IS.?* I would say in that > case that it is the same communicator used for the ?exchange? vectors. > > > > I am surely misunderstanding something here, but I didn?t find any answer > while googling. Any hint on that? > It would work the same if you gave COMM_WORLD, however its unnecessary. As Junchao says, IS is only used locally. If, however, you wanted to concatenate a bunch of indices from different processes, you would need to set it to COMM_WORLD. Thanks, Matt > Again, thank you all for your great support, > > Best, > > JC > > > > > > > > *De :* Zhang, Junchao [mailto:jczhang at mcs.anl.gov] > *Envoy? :* vendredi 10 mai 2019 22:01 > *? 
:* GIRET Jean-Christophe > *Cc :* Mark Adams; petsc-users at mcs.anl.gov > *Objet :* Re: [petsc-users] Question about parallel Vectors and > communicators > > > > Jean-Christophe, > > I added a petsc example at > https://bitbucket.org/petsc/petsc/pull-requests/1652/add-an-example-to-show-transfer-vectors/diff#chg-src/vec/vscat/examples/ex9.c > > It shows how to transfer vectors from a parent communicator to vectors > on a child communicator. It also shows how to transfer vectors from a > subcomm to vectors on another subcomm. The two subcomms are not required to > cover all processes in PETSC_COMM_WORLD. > > Hope it helps you better understand Vec and VecScatter. > > --Junchao Zhang > > > > > > On Thu, May 9, 2019 at 11:34 AM GIRET Jean-Christophe via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hello, > > > > Thanks Mark and Jed for your quick answers. > > > > So the idea is to define all the Vecs on the world communicator, and > perform the communications using traditional scatter objects? The data > would still be accessible on the two sub-communicators as they are both > subsets of the MPI_COMM_WORLD communicator, but they would be used while > creating the Vecs or the IS for the scatter. Is that right? > > > > I?m currently trying, without success, to perform a Scatter from a MPI Vec > defined on a subcomm to another Vec defined on the world comm, and > vice-versa. But I don?t know if it?s possible. > > > > I can imagine that trying doing that seems a bit strange. However, I?m > dealing with code coupling (and linear algebra for the main part of the > code), and my idea was trying to use the Vec data structures to perform > data exchange between some parts of the software which would have their own > communicator. It would eliminate the need to re-implement an ad-hoc > solution. > > > > An option would be to stick on the world communicator for all the PETSc > part, but I could face some situations where my Vecs could be small while I > would have to run the whole simulation on an important number of core for > the coupled part. I imagine that It may not really serve the linear system > solving part in terms of performance. Another one would be perform all the > PETSc operations on a sub-communicator and use ?raw? MPI communications > between the communicators to perform the data exchange for the coupling > part. > > > > Thanks again for your support, > > Best regards, > > Jean-Christophe > > > > *De :* Mark Adams [mailto:mfadams at lbl.gov] > *Envoy? :* mardi 7 mai 2019 21:39 > *? :* GIRET Jean-Christophe > *Cc :* petsc-users at mcs.anl.gov > *Objet :* Re: [petsc-users] Question about parallel Vectors and > communicators > > > > > > > > On Tue, May 7, 2019 at 11:38 AM GIRET Jean-Christophe via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Dear PETSc users, > > > > I would like to use Petsc4Py for a project extension, which consists > mainly of: > > - Storing data and matrices on several rank/nodes which could > not fit on a single node. > > - Performing some linear algebra in a parallel fashion (solving > sparse linear system for instance) > > - Exchanging those data structures (parallel vectors) between > non-overlapping MPI communicators, created for instance by splitting > MPI_COMM_WORLD. > > > > While the two first items seems to be well addressed by PETSc, I am > wondering about the last one. > > > > Is it possible to access the data of a vector, defined on a communicator > from another, non-overlapping communicator? 
From what I have seen from the > documentation and the several threads on the user mailing-list, I would say > no. But maybe I am missing something? If not, is it possible to transfer a > vector defined on a given communicator on a communicator which is a subset > of the previous one? > > > > If you are sending to a subset of processes then VecGetSubVec + Jed's > tricks might work. > > > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html > > > > > > Best regards, > > Jean-Christophe > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Tue May 14 18:14:24 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Tue, 14 May 2019 16:14:24 -0700 Subject: [petsc-users] Precision of MatView Message-ID: <684b3a35-b912-4e8c-b76e-91d989b5cdf4@berkeley.edu> I am using the following bit of code to debug a matrix.? What is the expected precision of the numbers that I will find in my ASCII file? As far as I can tell it is not the full double precision that I was expecting. ??????????? call PetscViewerASCIIOpen(PETSC_COMM_WORLD, tangview,K_view, ierr) ??????????? call PetscViewerSetFormat(K_view, PETSC_VIEWER_ASCII_MATLAB, ierr) ??????????? call MatView?????????? ? ? ? ? ? (Kmat, K_view, ierr) -sanjay From mfadams at lbl.gov Tue May 14 19:22:31 2019 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 14 May 2019 20:22:31 -0400 Subject: [petsc-users] Precision of MatView In-Reply-To: <684b3a35-b912-4e8c-b76e-91d989b5cdf4@berkeley.edu> References: <684b3a35-b912-4e8c-b76e-91d989b5cdf4@berkeley.edu> Message-ID: I would hope you get full precision. How many digits are you seeing? On Tue, May 14, 2019 at 7:15 PM Sanjay Govindjee via petsc-users < petsc-users at mcs.anl.gov> wrote: > I am using the following bit of code to debug a matrix. What is the > expected precision of the numbers that I will find in my ASCII file? > As far as I can tell it is not the full double precision that I was > expecting. > > call PetscViewerASCIIOpen(PETSC_COMM_WORLD, > tangview,K_view, ierr) > call PetscViewerSetFormat(K_view, > PETSC_VIEWER_ASCII_MATLAB, ierr) > call MatView (Kmat, K_view, ierr) > > -sanjay > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Tue May 14 19:34:31 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Tue, 14 May 2019 17:34:31 -0700 Subject: [petsc-users] Precision of MatView In-Reply-To: References: <684b3a35-b912-4e8c-b76e-91d989b5cdf4@berkeley.edu> Message-ID: <87b9e321-056f-1ac7-ae53-d52bf7beb891@berkeley.edu> I'm seeing half precision on at least 10 to 20% of the entries :( Knowing I should see full precision, I will dig deeper. -sanjay On 5/14/19 5:22 PM, Mark Adams wrote: > I would hope you get full precision. How many digits are you seeing? > > On Tue, May 14, 2019 at 7:15 PM Sanjay Govindjee via petsc-users > > wrote: > > I am using the following bit of code to debug a matrix.? What is the > expected precision of the numbers that I will find in my ASCII file? > As far as I can tell it is not the full double precision that I was > expecting. > > ???????????? call PetscViewerASCIIOpen(PETSC_COMM_WORLD, > tangview,K_view, ierr) > ???????????? 
call PetscViewerSetFormat(K_view, > PETSC_VIEWER_ASCII_MATLAB, ierr) > ???????????? call MatView?????????? ? ? ? ? ? (Kmat, K_view, ierr) > > -sanjay > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eda.oktay at metu.edu.tr Wed May 15 06:34:57 2019 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Wed, 15 May 2019 14:34:57 +0300 Subject: [petsc-users] Matrix Decomposition Message-ID: Hello, I am trying to divide a matrix into unequal sized parts into different processors (for example I want to divide 10*10 matrix into 4*4 and 6*6 submatrix in two processors). When my program reads a matrix from file, it automatically divides it into equal parts and then I can't change local sizes. How can I decompose a matrix that is read from a file? Thanks, Eda -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 15 06:51:23 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 15 May 2019 07:51:23 -0400 Subject: [petsc-users] Matrix Decomposition In-Reply-To: References: Message-ID: On Wed, May 15, 2019 at 7:35 AM Eda Oktay via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hello, > > I am trying to divide a matrix into unequal sized parts into different > processors (for example I want to divide 10*10 matrix into 4*4 and 6*6 > submatrix in two processors). When my program reads a matrix from file, it > automatically divides it into equal parts and then I can't change local > sizes. > > How can I decompose a matrix that is read from a file? > MatLoad() takes a matrix argument. I believe you can use MatSetSizes() before loading to get the distribution you want. Matt > Thanks, > > Eda > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From eda.oktay at metu.edu.tr Wed May 15 08:06:02 2019 From: eda.oktay at metu.edu.tr (Eda Oktay) Date: Wed, 15 May 2019 16:06:02 +0300 Subject: [petsc-users] Matrix Decomposition In-Reply-To: References: Message-ID: Dear Matt, I am trying to distribute the matrix after loading it. So I tried something like this: Mat B; PetscInt bm,bn; MatSetSizes(B,kk,kk,PETSC_DETERMINE,PETSC_DETERMINE); MatDuplicate(A,MAT_COPY_VALUES,&B); MatGetLocalSize(B,&bm,&bn); where A is the original matrix (10*10) and kk is one the local sizes of A (kk=4 so I want to divide A into 4*4 and 6*6). However, I get error in MatSetSizes part and when I printed bm and bn, I get 5. In other words, B is divided equally even though I tried to divide it unequally. Am I using MatSetSizes wrong? Thanks, Eda Matthew Knepley , 15 May 2019 ?ar, 14:51 tarihinde ?unu yazd?: > On Wed, May 15, 2019 at 7:35 AM Eda Oktay via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hello, >> >> I am trying to divide a matrix into unequal sized parts into different >> processors (for example I want to divide 10*10 matrix into 4*4 and 6*6 >> submatrix in two processors). When my program reads a matrix from file, it >> automatically divides it into equal parts and then I can't change local >> sizes. >> >> How can I decompose a matrix that is read from a file? >> > > MatLoad() takes a matrix argument. I believe you can use MatSetSizes() > before loading to get the distribution you want. 
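A minimal sketch of that MatSetSizes-before-MatLoad suggestion (the file name, the AIJ type, and the 4/6 split over exactly two ranks are illustrative assumptions, not from this thread):

```
/* Sketch: fix the local layout BEFORE MatLoad() so the file is distributed
 * with that layout; assumes exactly 2 ranks so 4 + 6 matches 10 global rows. */
#include <petscmat.h>

PetscErrorCode LoadWithCustomLayout(MPI_Comm comm, const char *file, Mat *A)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  PetscInt       nlocal;
  PetscViewer    viewer;

  PetscFunctionBeginUser;
  ierr   = MPI_Comm_rank(comm, &rank);CHKERRQ(ierr);
  nlocal = (rank == 0) ? 4 : 6;              /* desired unequal local sizes */
  ierr = MatCreate(comm, A);CHKERRQ(ierr);
  ierr = MatSetSizes(*A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetType(*A, MATAIJ);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(comm, file, FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = MatLoad(*A, viewer);CHKERRQ(ierr);  /* uses the sizes set above */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
```

This avoids holding two copies of the matrix in memory, since the desired layout is imposed while reading rather than by redistributing afterwards.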
> > Matt > > >> Thanks, >> >> Eda >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed May 15 08:34:21 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 15 May 2019 13:34:21 +0000 Subject: [petsc-users] Matrix Decomposition In-Reply-To: References: Message-ID: <416E3627-3B9F-459A-8BEA-80BF1F918E17@anl.gov> You cannot change the sizes after the matrix is assembled, nor can you duplicate a matrix with a different layout from the original matrix. If you want to partition a matrix that already exists use MatCreateSubMatrix(). You can use ISCreateStride() to generate the IS that define the new layout you want. Barry But when possible I would set the sizes of the matrix before loading it since then you don't need two copies of the matrix in memory at the same time. > On May 15, 2019, at 8:06 AM, Eda Oktay via petsc-users wrote: > > Dear Matt, > > I am trying to distribute the matrix after loading it. So I tried something like this: > > Mat B; > PetscInt bm,bn; > MatSetSizes(B,kk,kk,PETSC_DETERMINE,PETSC_DETERMINE); > MatDuplicate(A,MAT_COPY_VALUES,&B); > MatGetLocalSize(B,&bm,&bn); > > where A is the original matrix (10*10) and kk is one the local sizes of A (kk=4 so I want to divide A into 4*4 and 6*6). However, I get error in MatSetSizes part and when I printed bm and bn, I get 5. In other words, B is divided equally even though I tried to divide it unequally. Am I using MatSetSizes wrong? > > Thanks, > > Eda > > Matthew Knepley , 15 May 2019 ?ar, 14:51 tarihinde ?unu yazd?: > On Wed, May 15, 2019 at 7:35 AM Eda Oktay via petsc-users wrote: > Hello, > > I am trying to divide a matrix into unequal sized parts into different processors (for example I want to divide 10*10 matrix into 4*4 and 6*6 submatrix in two processors). When my program reads a matrix from file, it automatically divides it into equal parts and then I can't change local sizes. > > How can I decompose a matrix that is read from a file? > > MatLoad() takes a matrix argument. I believe you can use MatSetSizes() before loading to get the distribution you want. > > Matt > > Thanks, > > Eda > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From mfadams at lbl.gov Wed May 15 08:59:43 2019 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 15 May 2019 09:59:43 -0400 Subject: [petsc-users] Precision of MatView In-Reply-To: <87b9e321-056f-1ac7-ae53-d52bf7beb891@berkeley.edu> References: <684b3a35-b912-4e8c-b76e-91d989b5cdf4@berkeley.edu> <87b9e321-056f-1ac7-ae53-d52bf7beb891@berkeley.edu> Message-ID: You are seeing half precision (like 7 digits) in 10-20% of the entries and full in the rest. Someone will probably chime in who knows about this but I can see where a serial matrix is printed in ASCII Matlab in MatView_SeqAIJ_ASCII in src/mat/impls/aij/seq/aij.c. I think this line is operative and is should clearly work: ierr = PetscViewerASCIIPrintf(viewer,"%D %D %18.16e\n",i+1,a->j[j]+1,(double)a->a[j]);CHKERRQ(ierr); Could you run in serial (this code could very well be used for MPI Mats also) with Matlab/ASCII to verify that you have this problem. 
And you could modify this print statement and remake PETSc, if that's easy, to verify that this code is operative. I think %18.16e should print 16 digits even if they are 0s ... On Tue, May 14, 2019 at 8:34 PM Sanjay Govindjee wrote: > I'm seeing half precision on at least 10 to 20% of the entries :( > Knowing I should see full precision, I will dig deeper. > > -sanjay > > On 5/14/19 5:22 PM, Mark Adams wrote: > > I would hope you get full precision. How many digits are you seeing? > > On Tue, May 14, 2019 at 7:15 PM Sanjay Govindjee via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> I am using the following bit of code to debug a matrix. What is the >> expected precision of the numbers that I will find in my ASCII file? >> As far as I can tell it is not the full double precision that I was >> expecting. >> >> call PetscViewerASCIIOpen(PETSC_COMM_WORLD, >> tangview,K_view, ierr) >> call PetscViewerSetFormat(K_view, >> PETSC_VIEWER_ASCII_MATLAB, ierr) >> call MatView (Kmat, K_view, ierr) >> >> -sanjay >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed May 15 09:06:09 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 15 May 2019 14:06:09 +0000 Subject: [petsc-users] Precision of MatView In-Reply-To: References: <684b3a35-b912-4e8c-b76e-91d989b5cdf4@berkeley.edu> <87b9e321-056f-1ac7-ae53-d52bf7beb891@berkeley.edu> Message-ID: <618B6424-ED63-4F4D-AC02-873B46F0E3EC@anl.gov> The 10-20% in seven digits are presumably printed accurately; it is presumably simply the case that the rest of the digits would be zero and hence are not printed. > On May 15, 2019, at 8:59 AM, Mark Adams via petsc-users wrote: > > You are seeing half precision (like 7 digits) in 10-20% of the entries and full in the rest. > > Someone will probably chime in who knows about this but I can see where a serial matrix is printed in ASCII Matlab in MatView_SeqAIJ_ASCII in src/mat/impls/aij/seq/aij.c. > > I think this line is operative and is should clearly work: > > ierr = PetscViewerASCIIPrintf(viewer,"%D %D %18.16e\n",i+1,a->j[j]+1,(double)a->a[j]);CHKERRQ(ierr); > > Could you run in serial (this code could very well be used for MPI Mats also) with Matlab/ASCII to verify that you have this problem. And you could modify this print statement and remake PETSc, if that's easy, to verify that this code is operative. > > I think %18.16e should print 16 digits even if they are 0s ... > > > On Tue, May 14, 2019 at 8:34 PM Sanjay Govindjee wrote: > I'm seeing half precision on at least 10 to 20% of the entries :( > Knowing I should see full precision, I will dig deeper. > > -sanjay > On 5/14/19 5:22 PM, Mark Adams wrote: >> I would hope you get full precision. How many digits are you seeing? >> >> On Tue, May 14, 2019 at 7:15 PM Sanjay Govindjee via petsc-users wrote: >> I am using the following bit of code to debug a matrix. What is the >> expected precision of the numbers that I will find in my ASCII file? >> As far as I can tell it is not the full double precision that I was >> expecting. 
>> >> call PetscViewerASCIIOpen(PETSC_COMM_WORLD, >> tangview,K_view, ierr) >> call PetscViewerSetFormat(K_view, >> PETSC_VIEWER_ASCII_MATLAB, ierr) >> call MatView (Kmat, K_view, ierr) >> >> -sanjay >> > From mfadams at lbl.gov Wed May 15 09:25:46 2019 From: mfadams at lbl.gov (Mark Adams) Date: Wed, 15 May 2019 10:25:46 -0400 Subject: [petsc-users] Precision of MatView In-Reply-To: <618B6424-ED63-4F4D-AC02-873B46F0E3EC@anl.gov> References: <684b3a35-b912-4e8c-b76e-91d989b5cdf4@berkeley.edu> <87b9e321-056f-1ac7-ae53-d52bf7beb891@berkeley.edu> <618B6424-ED63-4F4D-AC02-873B46F0E3EC@anl.gov> Message-ID: This thread suggests that I was at least not wrong in assuming that trailing 0s should be printed by printf (although I did not trace the code down to printf) https://stackoverflow.com/questions/277772/avoid-trailing-zeroes-in-printf Maybe Sanjay's machines printf cuts off trailing 0s. On Wed, May 15, 2019 at 10:06 AM Smith, Barry F. wrote: > > The 10-20% in seven digits are presumably printed accurately; it is > presumably simply the case that the rest of the digits would be zero and > hence are not printed. > > > On May 15, 2019, at 8:59 AM, Mark Adams via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > You are seeing half precision (like 7 digits) in 10-20% of the entries > and full in the rest. > > > > Someone will probably chime in who knows about this but I can see where > a serial matrix is printed in ASCII Matlab in MatView_SeqAIJ_ASCII in > src/mat/impls/aij/seq/aij.c. > > > > I think this line is operative and is should clearly work: > > > > ierr = PetscViewerASCIIPrintf(viewer,"%D %D > %18.16e\n",i+1,a->j[j]+1,(double)a->a[j]);CHKERRQ(ierr); > > > > Could you run in serial (this code could very well be used for MPI Mats > also) with Matlab/ASCII to verify that you have this problem. And you could > modify this print statement and remake PETSc, if that's easy, to verify > that this code is operative. > > > > I think %18.16e should print 16 digits even if they are 0s ... > > > > > > On Tue, May 14, 2019 at 8:34 PM Sanjay Govindjee > wrote: > > I'm seeing half precision on at least 10 to 20% of the entries :( > > Knowing I should see full precision, I will dig deeper. > > > > -sanjay > > On 5/14/19 5:22 PM, Mark Adams wrote: > >> I would hope you get full precision. How many digits are you seeing? > >> > >> On Tue, May 14, 2019 at 7:15 PM Sanjay Govindjee via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> I am using the following bit of code to debug a matrix. What is the > >> expected precision of the numbers that I will find in my ASCII file? > >> As far as I can tell it is not the full double precision that I was > >> expecting. > >> > >> call PetscViewerASCIIOpen(PETSC_COMM_WORLD, > >> tangview,K_view, ierr) > >> call PetscViewerSetFormat(K_view, > >> PETSC_VIEWER_ASCII_MATLAB, ierr) > >> call MatView (Kmat, K_view, ierr) > >> > >> -sanjay > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Wed May 15 10:44:37 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 15 May 2019 08:44:37 -0700 Subject: [petsc-users] Precision of MatView In-Reply-To: References: <684b3a35-b912-4e8c-b76e-91d989b5cdf4@berkeley.edu> <87b9e321-056f-1ac7-ae53-d52bf7beb891@berkeley.edu> <618B6424-ED63-4F4D-AC02-873B46F0E3EC@anl.gov> Message-ID: The issue is not with trailing zeros. The last ~7 digits are incorrect (in comparison to a write(*,*) from my serial code). 
I?m going to track down the issue today and will report back ? I?m guessing the problem is somewhere in my code. Sent from my iPhone > On May 15, 2019, at 7:25 AM, Mark Adams wrote: > > This thread suggests that I was at least not wrong in assuming that trailing 0s should be printed by printf (although I did not trace the code down to printf) > > https://stackoverflow.com/questions/277772/avoid-trailing-zeroes-in-printf > > Maybe Sanjay's machines printf cuts off trailing 0s. > >> On Wed, May 15, 2019 at 10:06 AM Smith, Barry F. wrote: >> >> The 10-20% in seven digits are presumably printed accurately; it is presumably simply the case that the rest of the digits would be zero and hence are not printed. >> >> > On May 15, 2019, at 8:59 AM, Mark Adams via petsc-users wrote: >> > >> > You are seeing half precision (like 7 digits) in 10-20% of the entries and full in the rest. >> > >> > Someone will probably chime in who knows about this but I can see where a serial matrix is printed in ASCII Matlab in MatView_SeqAIJ_ASCII in src/mat/impls/aij/seq/aij.c. >> > >> > I think this line is operative and is should clearly work: >> > >> > ierr = PetscViewerASCIIPrintf(viewer,"%D %D %18.16e\n",i+1,a->j[j]+1,(double)a->a[j]);CHKERRQ(ierr); >> > >> > Could you run in serial (this code could very well be used for MPI Mats also) with Matlab/ASCII to verify that you have this problem. And you could modify this print statement and remake PETSc, if that's easy, to verify that this code is operative. >> > >> > I think %18.16e should print 16 digits even if they are 0s ... >> > >> > >> > On Tue, May 14, 2019 at 8:34 PM Sanjay Govindjee wrote: >> > I'm seeing half precision on at least 10 to 20% of the entries :( >> > Knowing I should see full precision, I will dig deeper. >> > >> > -sanjay >> > On 5/14/19 5:22 PM, Mark Adams wrote: >> >> I would hope you get full precision. How many digits are you seeing? >> >> >> >> On Tue, May 14, 2019 at 7:15 PM Sanjay Govindjee via petsc-users wrote: >> >> I am using the following bit of code to debug a matrix. What is the >> >> expected precision of the numbers that I will find in my ASCII file? >> >> As far as I can tell it is not the full double precision that I was >> >> expecting. >> >> >> >> call PetscViewerASCIIOpen(PETSC_COMM_WORLD, >> >> tangview,K_view, ierr) >> >> call PetscViewerSetFormat(K_view, >> >> PETSC_VIEWER_ASCII_MATLAB, ierr) >> >> call MatView (Kmat, K_view, ierr) >> >> >> >> -sanjay >> >> >> > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Wed May 15 14:30:16 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 15 May 2019 12:30:16 -0700 Subject: [petsc-users] Precision of MatView In-Reply-To: References: <684b3a35-b912-4e8c-b76e-91d989b5cdf4@berkeley.edu> <87b9e321-056f-1ac7-ae53-d52bf7beb891@berkeley.edu> <618B6424-ED63-4F4D-AC02-873B46F0E3EC@anl.gov> Message-ID: <23f427b8-ce88-2332-5e2f-19abf0b1ef08@berkeley.edu> Problem resolved. As I suspected the problem was mine.? My parallel runs were being performed using input files that has a lower precision in the input data than those being used in my serial runs. -sanjay On 5/15/19 7:25 AM, Mark Adams wrote: > This thread suggests that I was at least not wrong in assuming that > trailing 0s should be printed by printf (although I did not trace the > code down to printf) > > https://stackoverflow.com/questions/277772/avoid-trailing-zeroes-in-printf > > Maybe Sanjay's machines printf cuts off trailing 0s. 
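For reference, a quick standalone check of that %18.16e question (illustrative values only):

```
/* %18.16e always emits 16 digits after the decimal point, trailing zeros included. */
#include <stdio.h>

int main(void)
{
  printf("%18.16e\n", 0.5);      /* prints 5.0000000000000000e-01 */
  printf("%18.16e\n", 1.0/3.0);  /* prints 3.3333333333333331e-01 */
  return 0;
}
```

Both lines print all 16 digits after the decimal point, so trailing zeros are not dropped by the format itself.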
> > On Wed, May 15, 2019 at 10:06 AM Smith, Barry F. > wrote: > > > ? ?The 10-20% in seven digits are presumably printed accurately; > it is presumably simply the case that the rest of the digits would > be zero and hence are not printed. > > > On May 15, 2019, at 8:59 AM, Mark Adams via petsc-users > > wrote: > > > > You are seeing half precision (like 7 digits) in 10-20% of the > entries and full in the rest. > > > > Someone will probably chime in who knows about this but I can > see where a serial matrix is printed in ASCII Matlab in > MatView_SeqAIJ_ASCII in? src/mat/impls/aij/seq/aij.c. > > > > I think this line is operative and is should clearly work: > > > >? ? ? ? ?ierr = PetscViewerASCIIPrintf(viewer,"%D %D > %18.16e\n",i+1,a->j[j]+1,(double)a->a[j]);CHKERRQ(ierr); > > > > Could you run in serial (this code could very well be used for > MPI Mats also) with Matlab/ASCII to verify that you have this > problem. And you could modify this print statement and remake > PETSc, if that's easy, to verify that this code is operative. > > > > I think %18.16e should print 16 digits even if they are 0s ... > > > > > > On Tue, May 14, 2019 at 8:34 PM Sanjay Govindjee > > wrote: > > I'm seeing half precision on at least 10 to 20% of the entries :( > > Knowing I should see full precision, I will dig deeper. > > > > -sanjay > > On 5/14/19 5:22 PM, Mark Adams wrote: > >> I would hope you get full precision. How many digits are you > seeing? > >> > >> On Tue, May 14, 2019 at 7:15 PM Sanjay Govindjee via > petsc-users > wrote: > >> I am using the following bit of code to debug a matrix.? What > is the > >> expected precision of the numbers that I will find in my ASCII > file? > >> As far as I can tell it is not the full double precision that I > was > >> expecting. > >> > >>? ? ? ? ? ? ? call PetscViewerASCIIOpen(PETSC_COMM_WORLD, > >> tangview,K_view, ierr) > >>? ? ? ? ? ? ? call PetscViewerSetFormat(K_view, > >> PETSC_VIEWER_ASCII_MATLAB, ierr) > >>? ? ? ? ? ? ? call MatView? ? ? ? ? ? ? ? ? ? ?(Kmat, K_view, ierr) > >> > >> -sanjay > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Thu May 16 15:06:09 2019 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Thu, 16 May 2019 15:06:09 -0500 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant Message-ID: Hi PETSc developers, I have a question about TSComputeRHSJacobianConstant. If I create a TS (of type linear) for a problem where the jacobian does not change with time (set with the aforementioned option) and run it for different number of time steps, why does the time it takes to evaluate the jacobian change (as indicated by TSJacobianEval) ? To clarify, I run with the example with different TSSetTimeStep, but the same jacobian matrix. I see that the time spent in KSPSolve increases with increasing number of steps (which is as expected as this is a KSPOnly SNES solver). But surprisingly, the time spent in TSJacobianEval also increases with decreasing time-step (or increasing number of steps). For reference, I attach the log files for two cases which were run with different time steps and the source code. Thank You, Sajid Ali Applied Physics Northwestern University -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ex_dmda.c Type: application/octet-stream Size: 13848 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: out_50 Type: application/octet-stream Size: 31472 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: out_100 Type: application/octet-stream Size: 32865 bytes Desc: not available URL: From William.Coirier at kratosdefense.com Thu May 16 15:44:47 2019 From: William.Coirier at kratosdefense.com (William Coirier) Date: Thu, 16 May 2019 20:44:47 +0000 Subject: [petsc-users] MatCreateBAIJ, SNES, Preallocation... Message-ID: Folks: I'm developing an application using the SNES, and overall it's working great, as many of our other PETSc-based projects. But, I'm having a problem related to (presumably) pre-allocation, block matrices and SNES. Without going into details about the actual problem we are solving, here are the symptoms/characteristics/behavior. * For the SNES Jacobian, I'm using MatCreateBAIJ for a block size=3, and letting "PETSC_DECIDE" the partitioning. Actual call is: * ierr = MatCreateBAIJ(PETSC_COMM_WORLD, bs, PETSC_DECIDE, PETSC_DECIDE, (int)3 * numNodesSAM, (int)3 * numNodesSAM, PETSC_DEFAULT, NULL, PETSC_DEFAULT, NULL, &J); * When registering the SNES jacobian function, I set the B and J matrices to be the same. * ierr = SNESSetJacobian(snes, J, J, SAMformSNESJ, (void *)this); CHKERRQ(ierr); * I can either let PETSc figure out the allocation structure: * ierr = MatMPIBAIJSetPreallocation(J, bs, PETSC_DEFAULT, NULL,PETSC_DEFAULT, NULL); * or, do it myself, since I know the fill pattern, * ierr = MatMPIBAIJSetPreallocation(J, bs, d_nz_dum,&d_nnz[0],o_nz_dum,&o_nnz[0]); The symptoms/problems are as follows: * Whether I do preallocation or not, the "setup" time is pretty long. It might take 2 minutes before SNES starts doing its thing. After this setup, convergence and speed is great. But this first phase takes a long time. I'm assuming this has to be related to some poor preallocation setup so it's doing tons of mallocs where it's not needed. * If I don't call my Jacobian formulation before calling SNESSolve, I get a segmentation violation in a PETSc routine. (If I DO call my Jacobian first, things work great, although slow for the setup phase.) 
Here's a snippet of the traceback: 0 0x00000000009649fc in MatMultAdd_SeqBAIJ_3 (A=, xx=0x3a525b0, yy=0x3a531b0, zz=0x3a531b0) at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/seq/baij2.c:1424 #1 0x00000000006444cb in MatMult_MPIBAIJ (A=0x15da340, xx=0x3a542a0, yy=0x3a531b0) at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/mpi/mpibaij.c:1380 #2 0x00000000005b2c0f in MatMult (mat=0x15da340, x=x at entry=0x3a542a0, y=y at entry=0x3a531b0) at /home/jstutts/Downloads/petsc-3.11.1/src/mat/interface/matrix.c:2396 #3 0x0000000000c61f2e in PCApplyBAorAB (pc=0x1ce78c0, side=PC_LEFT, x=0x3a542a0, y=y at entry=0x3a548a0, work=0x3a531b0) at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/pc/interface/precon.c:690 #4 0x0000000000ccb36b in KSP_PCApplyBAorAB (w=, y=0x3a548a0, x=, ksp=0x1d44d50) at /home/jstutts/Downloads/petsc-3.11.1/include/petsc/private/kspimpl.h:309 #5 KSPGMRESCycle (itcount=itcount at entry=0x7fffffffc02c, ksp=ksp at entry=0x1d44d50) at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c:152 #6 0x0000000000ccbf6f in KSPSolve_GMRES (ksp=0x1d44d50) at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c:237 #7 0x00000000007dc193 in KSPSolve (ksp=0x1d44d50, b=b at entry=0x1d41c70, x=x at entry=0x1cebf40) I apologize if I've missed something in the documentation or examples, but I can't seem to figure this one out. The "setup" seems to take too long, and from my previous experiences with PETSc, this is due to a poor preallocation strategy. Any and all help is appreciated! ----------------------------------------------------------------------- William J. Coirier, Ph.D. Director, Aerosciences and Engineering Analysis Branch Advanced Concepts Development and Test Division Kratos Defense and Rocket Support Services 4904 Research Drive Huntsville, AL 35805 256-327-8170 256-327-8120 (fax) -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu May 16 15:50:36 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 16 May 2019 20:50:36 +0000 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant In-Reply-To: References: Message-ID: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> Sajid, This is a huge embarrassing performance bug in PETSc https://bitbucket.org/petsc/petsc/issues/293/refactoring-of-ts-handling-of-reuse-of It is using 74 percent of the time to perform MatAXPY() on two large sparse matrices, not knowing they have identical nonzero patterns and one of which has all zeros off of the diagonal. This despite the fact that a few lines higher in the code is special purpose code for exactly the case you have that only stores one matrix and only ever shifts the diagonal of the matrix. 
Please edit TSSetUp() and remove the lines if (ts->rhsjacobian.reuse && rhsjac == TSComputeRHSJacobianConstant) { Mat Amat,Pmat; SNES snes; ierr = TSGetSNES(ts,&snes);CHKERRQ(ierr); ierr = SNESGetJacobian(snes,&Amat,&Pmat,NULL,NULL);CHKERRQ(ierr); /* Matching matrices implies that an IJacobian is NOT set, because if it had been set, the IJacobian's matrix would * have displaced the RHS matrix */ if (Amat && Amat == ts->Arhs) { /* we need to copy the values of the matrix because for the constant Jacobian case the user will never set the numerical values in this new location */ ierr = MatDuplicate(ts->Arhs,MAT_COPY_VALUES,&Amat);CHKERRQ(ierr); ierr = SNESSetJacobian(snes,Amat,NULL,NULL,NULL);CHKERRQ(ierr); ierr = MatDestroy(&Amat);CHKERRQ(ierr); } if (Pmat && Pmat == ts->Brhs) { ierr = MatDuplicate(ts->Brhs,MAT_COPY_VALUES,&Pmat);CHKERRQ(ierr); ierr = SNESSetJacobian(snes,NULL,Pmat,NULL,NULL);CHKERRQ(ierr); ierr = MatDestroy(&Pmat);CHKERRQ(ierr); } } You will be stunned by the improvement in time. > On May 16, 2019, at 3:06 PM, Sajid Ali via petsc-users wrote: > > Hi PETSc developers, > > I have a question about TSComputeRHSJacobianConstant. If I create a TS (of type linear) for a problem where the jacobian does not change with time (set with the aforementioned option) and run it for different number of time steps, why does the time it takes to evaluate the jacobian change (as indicated by TSJacobianEval) ? > > To clarify, I run with the example with different TSSetTimeStep, but the same jacobian matrix. I see that the time spent in KSPSolve increases with increasing number of steps (which is as expected as this is a KSPOnly SNES solver). But surprisingly, the time spent in TSJacobianEval also increases with decreasing time-step (or increasing number of steps). > > For reference, I attach the log files for two cases which were run with different time steps and the source code. > > Thank You, > Sajid Ali > Applied Physics > Northwestern University > From bsmith at mcs.anl.gov Thu May 16 16:07:10 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 16 May 2019 21:07:10 +0000 Subject: [petsc-users] MatCreateBAIJ, SNES, Preallocation... In-Reply-To: References: Message-ID: <1372FD84-8585-402C-84EA-38BEC1AC36B7@anl.gov> > On May 16, 2019, at 3:44 PM, William Coirier via petsc-users wrote: > > Folks: > > I'm developing an application using the SNES, and overall it's working great, as many of our other PETSc-based projects. But, I'm having a problem related to (presumably) pre-allocation, block matrices and SNES. > > Without going into details about the actual problem we are solving, here are the symptoms/characteristics/behavior. > ? For the SNES Jacobian, I'm using MatCreateBAIJ for a block size=3, and letting "PETSC_DECIDE" the partitioning. Actual call is: > ? ierr = MatCreateBAIJ(PETSC_COMM_WORLD, bs, PETSC_DECIDE, PETSC_DECIDE, (int)3 * numNodesSAM, (int)3 * numNodesSAM, PETSC_DEFAULT, NULL, PETSC_DEFAULT, NULL, &J); > ? When registering the SNES jacobian function, I set the B and J matrices to be the same. > ? ierr = SNESSetJacobian(snes, J, J, SAMformSNESJ, (void *)this); CHKERRQ(ierr); > ? I can either let PETSc figure out the allocation structure: > ? ierr = MatMPIBAIJSetPreallocation(J, bs, PETSC_DEFAULT, NULL,PETSC_DEFAULT, NULL); > ? or, do it myself, since I know the fill pattern, > ? ierr = MatMPIBAIJSetPreallocation(J, bs, d_nz_dum,&d_nnz[0],o_nz_dum,&o_nnz[0]); > The symptoms/problems are as follows: > ? Whether I do preallocation or not, the "setup" time is pretty long. 
It might take 2 minutes before SNES starts doing its thing. After this setup, convergence and speed is great. But this first phase takes a long time. I'm assuming this has to be related to some poor preallocation setup so it's doing tons of mallocs where it?s not needed. You should definitely get much better performance with proper preallocation then with none (unless the default is enough for your matrix). Run with -info and grep for "malloc" this will tell you exactly how many, if any mallocs are taking place inside the MatSetValues() due to improper preallocation. > ? If I don't call my Jacobian formulation before calling SNESSolve, I get a segmentation violation in a PETSc routine. Not sure what you mean by Jacobian formation but I'm guessing filling up the Jacobian with numerical values? Something is wrong because you should not need to fill up the values before calling SNES solve, and regardless it should never ever crash with a segmentation violation. You can run with valgrind https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind to make sure that it is not a memory corruption issue. You can also run in the debugger (perhaps the PETSc command line option -start_in_debugger) to get more details on why it is crashing. When you have it running satisfactory you can send us the output from running with -log_view and we can let you know how it seems to be performing efficiency wise. Barry > (If I DO call my Jacobian first, things work great, although slow for the setup phase.) Here's a snippet of the traceback: > 0 0x00000000009649fc in MatMultAdd_SeqBAIJ_3 (A=, > xx=0x3a525b0, yy=0x3a531b0, zz=0x3a531b0) > at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/seq/baij2.c:1424 > #1 0x00000000006444cb in MatMult_MPIBAIJ (A=0x15da340, xx=0x3a542a0, > yy=0x3a531b0) > at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/mpi/mpibaij.c:1380 > #2 0x00000000005b2c0f in MatMult (mat=0x15da340, x=x at entry=0x3a542a0, > y=y at entry=0x3a531b0) > at /home/jstutts/Downloads/petsc-3.11.1/src/mat/interface/matrix.c:2396 > #3 0x0000000000c61f2e in PCApplyBAorAB (pc=0x1ce78c0, side=PC_LEFT, > x=0x3a542a0, y=y at entry=0x3a548a0, work=0x3a531b0) > at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/pc/interface/precon.c:690 > #4 0x0000000000ccb36b in KSP_PCApplyBAorAB (w=, y=0x3a548a0, > x=, ksp=0x1d44d50) > at /home/jstutts/Downloads/petsc-3.11.1/include/petsc/private/kspimpl.h:309 > #5 KSPGMRESCycle (itcount=itcount at entry=0x7fffffffc02c, > ksp=ksp at entry=0x1d44d50) > at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c:152 > #6 0x0000000000ccbf6f in KSPSolve_GMRES (ksp=0x1d44d50) > at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c:237 > #7 0x00000000007dc193 in KSPSolve (ksp=0x1d44d50, b=b at entry=0x1d41c70, > x=x at entry=0x1cebf40) > > > > I apologize if I?ve missed something in the documentation or examples, but I can?t seem to figure this one out. The ?setup? seems to take too long, and from my previous experiences with PETSc, this is due to a poor preallocation strategy. > > Any and all help is appreciated! > > ----------------------------------------------------------------------- > William J. Coirier, Ph.D. 
> Director, Aerosciences and Engineering Analysis Branch > Advanced Concepts Development and Test Division > Kratos Defense and Rocket Support Services > 4904 Research Drive > Huntsville, AL 35805 > 256-327-8170 > 256-327-8120 (fax) From William.Coirier at kratosdefense.com Thu May 16 16:50:40 2019 From: William.Coirier at kratosdefense.com (William Coirier) Date: Thu, 16 May 2019 21:50:40 +0000 Subject: [petsc-users] MatCreateBAIJ, SNES, Preallocation... In-Reply-To: <1372FD84-8585-402C-84EA-38BEC1AC36B7@anl.gov> References: , <1372FD84-8585-402C-84EA-38BEC1AC36B7@anl.gov> Message-ID: Barry: Thanks for the quick response! Running with -info gives nearly the same # of mallocs whether I "prealloc" or not. I'll bet I'm doing something wrong with the preallocation. I must know the matrix structure since convergence is really good with SNES. I should have 9232128 total non zeros, and when i do a -info -mat_view ::ascii_info i see that in the diagnostic output, but I also see a lot of allocated non-zeros: Mat Object: SNES_Jacobian 8 MPI processes type: mpibaij rows=453195, cols=453195, bs=3 total: nonzeros=9232128, allocated nonzeros=203660352 total number of mallocs used during MatSetValues calls =146300 block size is 3 grepping for malloc in the output shows this initially (8 processors) and then zeros afterwards. Makes sense. [0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18884 [3] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18883 [7] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 14122 [4] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18883 [5] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18881 [2] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18882 [1] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18882 [6] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18883 ________________________________________ From: Smith, Barry F. [bsmith at mcs.anl.gov] Sent: Thursday, May 16, 2019 4:07 PM To: William Coirier Cc: petsc-users at mcs.anl.gov; Michael Robinson; Andrew Holm Subject: Re: [petsc-users] MatCreateBAIJ, SNES, Preallocation... > On May 16, 2019, at 3:44 PM, William Coirier via petsc-users wrote: > > Folks: > > I'm developing an application using the SNES, and overall it's working great, as many of our other PETSc-based projects. But, I'm having a problem related to (presumably) pre-allocation, block matrices and SNES. > > Without going into details about the actual problem we are solving, here are the symptoms/characteristics/behavior. > ? For the SNES Jacobian, I'm using MatCreateBAIJ for a block size=3, and letting "PETSC_DECIDE" the partitioning. Actual call is: > ? ierr = MatCreateBAIJ(PETSC_COMM_WORLD, bs, PETSC_DECIDE, PETSC_DECIDE, (int)3 * numNodesSAM, (int)3 * numNodesSAM, PETSC_DEFAULT, NULL, PETSC_DEFAULT, NULL, &J); > ? When registering the SNES jacobian function, I set the B and J matrices to be the same. > ? ierr = SNESSetJacobian(snes, J, J, SAMformSNESJ, (void *)this); CHKERRQ(ierr); > ? I can either let PETSc figure out the allocation structure: > ? ierr = MatMPIBAIJSetPreallocation(J, bs, PETSC_DEFAULT, NULL,PETSC_DEFAULT, NULL); > ? or, do it myself, since I know the fill pattern, > ? ierr = MatMPIBAIJSetPreallocation(J, bs, d_nz_dum,&d_nnz[0],o_nz_dum,&o_nnz[0]); > The symptoms/problems are as follows: > ? Whether I do preallocation or not, the "setup" time is pretty long. 
It might take 2 minutes before SNES starts doing its thing. After this setup, convergence and speed is great. But this first phase takes a long time. I'm assuming this has to be related to some poor preallocation setup so it's doing tons of mallocs where it?s not needed. You should definitely get much better performance with proper preallocation then with none (unless the default is enough for your matrix). Run with -info and grep for "malloc" this will tell you exactly how many, if any mallocs are taking place inside the MatSetValues() due to improper preallocation. > ? If I don't call my Jacobian formulation before calling SNESSolve, I get a segmentation violation in a PETSc routine. Not sure what you mean by Jacobian formation but I'm guessing filling up the Jacobian with numerical values? Something is wrong because you should not need to fill up the values before calling SNES solve, and regardless it should never ever crash with a segmentation violation. You can run with valgrind https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mcs.anl.gov_petsc_documentation_faq.html-23valgrind&d=DwIGaQ&c=zeCCs5WLaN-HWPHrpXwbFoOqeS0G3NH2_2IQ_bzV13g&r=q_3hswOPAFb0l_4-IAZZi5DgTpzDUIpk984njq2YnggBd-vCWTgNlbk27KjHXmKK&m=IREQmkCt5PXK-SnLqJZXz3Du7h3mFP24xtI0jHGgGUY&s=UiGHkQ2Zr_nYYQ-GYg1HEYtbqZutYSgv9F1A86sfNKI&e= to make sure that it is not a memory corruption issue. You can also run in the debugger (perhaps the PETSc command line option -start_in_debugger) to get more details on why it is crashing. When you have it running satisfactory you can send us the output from running with -log_view and we can let you know how it seems to be performing efficiency wise. Barry > (If I DO call my Jacobian first, things work great, although slow for the setup phase.) Here's a snippet of the traceback: > 0 0x00000000009649fc in MatMultAdd_SeqBAIJ_3 (A=, > xx=0x3a525b0, yy=0x3a531b0, zz=0x3a531b0) > at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/seq/baij2.c:1424 > #1 0x00000000006444cb in MatMult_MPIBAIJ (A=0x15da340, xx=0x3a542a0, > yy=0x3a531b0) > at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/mpi/mpibaij.c:1380 > #2 0x00000000005b2c0f in MatMult (mat=0x15da340, x=x at entry=0x3a542a0, > y=y at entry=0x3a531b0) > at /home/jstutts/Downloads/petsc-3.11.1/src/mat/interface/matrix.c:2396 > #3 0x0000000000c61f2e in PCApplyBAorAB (pc=0x1ce78c0, side=PC_LEFT, > x=0x3a542a0, y=y at entry=0x3a548a0, work=0x3a531b0) > at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/pc/interface/precon.c:690 > #4 0x0000000000ccb36b in KSP_PCApplyBAorAB (w=, y=0x3a548a0, > x=, ksp=0x1d44d50) > at /home/jstutts/Downloads/petsc-3.11.1/include/petsc/private/kspimpl.h:309 > #5 KSPGMRESCycle (itcount=itcount at entry=0x7fffffffc02c, > ksp=ksp at entry=0x1d44d50) > at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c:152 > #6 0x0000000000ccbf6f in KSPSolve_GMRES (ksp=0x1d44d50) > at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c:237 > #7 0x00000000007dc193 in KSPSolve (ksp=0x1d44d50, b=b at entry=0x1d41c70, > x=x at entry=0x1cebf40) > > > > I apologize if I?ve missed something in the documentation or examples, but I can?t seem to figure this one out. The ?setup? seems to take too long, and from my previous experiences with PETSc, this is due to a poor preallocation strategy. > > Any and all help is appreciated! > > ----------------------------------------------------------------------- > William J. Coirier, Ph.D. 
> Director, Aerosciences and Engineering Analysis Branch > Advanced Concepts Development and Test Division > Kratos Defense and Rocket Support Services > 4904 Research Drive > Huntsville, AL 35805 > 256-327-8170 > 256-327-8120 (fax) From William.Coirier at kratosdefense.com Thu May 16 17:28:16 2019 From: William.Coirier at kratosdefense.com (William Coirier) Date: Thu, 16 May 2019 22:28:16 +0000 Subject: [petsc-users] MatCreateBAIJ, SNES, Preallocation... In-Reply-To: <1372FD84-8585-402C-84EA-38BEC1AC36B7@anl.gov> References: , <1372FD84-8585-402C-84EA-38BEC1AC36B7@anl.gov> Message-ID: Ok, got it. My misinterpretation was how to fill the d_nnz and o_nnz arrays. Thank you for your help! Might I make a suggestion related to the documentation? Perhaps I have not fully read the page on the MatMPIBAIJSetPreallocation so you can simply disregard and I'm ok with that! The documentation has for the d_nnz: d_nnz - array containing the number of block nonzeros in the various block rows of the in diagonal portion of the local (possibly different for each block row) or NULL. If you plan to factor the matrix you must leave room for the diagonal entry and set it even if it is zero. Am I correct in that this array should be of size numRows, where numRows is found from calling MatGetOwnershipRange(J,&iLow,&iHigh) so numRows=iHigh-iLow. I think my error was allocating this only to be numRows/bs since I thought it's a block size thing. When I fixed this, things worked really fast! Maybe it's obvious it should be that size. :-) Regardless, thanks for your help. PETSc is great! I'm a believer and user. ________________________________________ From: Smith, Barry F. [bsmith at mcs.anl.gov] Sent: Thursday, May 16, 2019 4:07 PM To: William Coirier Cc: petsc-users at mcs.anl.gov; Michael Robinson; Andrew Holm Subject: Re: [petsc-users] MatCreateBAIJ, SNES, Preallocation... > On May 16, 2019, at 3:44 PM, William Coirier via petsc-users wrote: > > Folks: > > I'm developing an application using the SNES, and overall it's working great, as many of our other PETSc-based projects. But, I'm having a problem related to (presumably) pre-allocation, block matrices and SNES. > > Without going into details about the actual problem we are solving, here are the symptoms/characteristics/behavior. > ? For the SNES Jacobian, I'm using MatCreateBAIJ for a block size=3, and letting "PETSC_DECIDE" the partitioning. Actual call is: > ? ierr = MatCreateBAIJ(PETSC_COMM_WORLD, bs, PETSC_DECIDE, PETSC_DECIDE, (int)3 * numNodesSAM, (int)3 * numNodesSAM, PETSC_DEFAULT, NULL, PETSC_DEFAULT, NULL, &J); > ? When registering the SNES jacobian function, I set the B and J matrices to be the same. > ? ierr = SNESSetJacobian(snes, J, J, SAMformSNESJ, (void *)this); CHKERRQ(ierr); > ? I can either let PETSc figure out the allocation structure: > ? ierr = MatMPIBAIJSetPreallocation(J, bs, PETSC_DEFAULT, NULL,PETSC_DEFAULT, NULL); > ? or, do it myself, since I know the fill pattern, > ? ierr = MatMPIBAIJSetPreallocation(J, bs, d_nz_dum,&d_nnz[0],o_nz_dum,&o_nnz[0]); > The symptoms/problems are as follows: > ? Whether I do preallocation or not, the "setup" time is pretty long. It might take 2 minutes before SNES starts doing its thing. After this setup, convergence and speed is great. But this first phase takes a long time. I'm assuming this has to be related to some poor preallocation setup so it's doing tons of mallocs where it?s not needed. 
You should definitely get much better performance with proper preallocation then with none (unless the default is enough for your matrix). Run with -info and grep for "malloc" this will tell you exactly how many, if any mallocs are taking place inside the MatSetValues() due to improper preallocation. > ? If I don't call my Jacobian formulation before calling SNESSolve, I get a segmentation violation in a PETSc routine. Not sure what you mean by Jacobian formation but I'm guessing filling up the Jacobian with numerical values? Something is wrong because you should not need to fill up the values before calling SNES solve, and regardless it should never ever crash with a segmentation violation. You can run with valgrind https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mcs.anl.gov_petsc_documentation_faq.html-23valgrind&d=DwIGaQ&c=zeCCs5WLaN-HWPHrpXwbFoOqeS0G3NH2_2IQ_bzV13g&r=q_3hswOPAFb0l_4-IAZZi5DgTpzDUIpk984njq2YnggBd-vCWTgNlbk27KjHXmKK&m=IREQmkCt5PXK-SnLqJZXz3Du7h3mFP24xtI0jHGgGUY&s=UiGHkQ2Zr_nYYQ-GYg1HEYtbqZutYSgv9F1A86sfNKI&e= to make sure that it is not a memory corruption issue. You can also run in the debugger (perhaps the PETSc command line option -start_in_debugger) to get more details on why it is crashing. When you have it running satisfactory you can send us the output from running with -log_view and we can let you know how it seems to be performing efficiency wise. Barry > (If I DO call my Jacobian first, things work great, although slow for the setup phase.) Here's a snippet of the traceback: > 0 0x00000000009649fc in MatMultAdd_SeqBAIJ_3 (A=, > xx=0x3a525b0, yy=0x3a531b0, zz=0x3a531b0) > at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/seq/baij2.c:1424 > #1 0x00000000006444cb in MatMult_MPIBAIJ (A=0x15da340, xx=0x3a542a0, > yy=0x3a531b0) > at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/mpi/mpibaij.c:1380 > #2 0x00000000005b2c0f in MatMult (mat=0x15da340, x=x at entry=0x3a542a0, > y=y at entry=0x3a531b0) > at /home/jstutts/Downloads/petsc-3.11.1/src/mat/interface/matrix.c:2396 > #3 0x0000000000c61f2e in PCApplyBAorAB (pc=0x1ce78c0, side=PC_LEFT, > x=0x3a542a0, y=y at entry=0x3a548a0, work=0x3a531b0) > at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/pc/interface/precon.c:690 > #4 0x0000000000ccb36b in KSP_PCApplyBAorAB (w=, y=0x3a548a0, > x=, ksp=0x1d44d50) > at /home/jstutts/Downloads/petsc-3.11.1/include/petsc/private/kspimpl.h:309 > #5 KSPGMRESCycle (itcount=itcount at entry=0x7fffffffc02c, > ksp=ksp at entry=0x1d44d50) > at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c:152 > #6 0x0000000000ccbf6f in KSPSolve_GMRES (ksp=0x1d44d50) > at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c:237 > #7 0x00000000007dc193 in KSPSolve (ksp=0x1d44d50, b=b at entry=0x1d41c70, > x=x at entry=0x1cebf40) > > > > I apologize if I?ve missed something in the documentation or examples, but I can?t seem to figure this one out. The ?setup? seems to take too long, and from my previous experiences with PETSc, this is due to a poor preallocation strategy. > > Any and all help is appreciated! > > ----------------------------------------------------------------------- > William J. Coirier, Ph.D. 
> Director, Aerosciences and Engineering Analysis Branch > Advanced Concepts Development and Test Division > Kratos Defense and Rocket Support Services > 4904 Research Drive > Huntsville, AL 35805 > 256-327-8170 > 256-327-8120 (fax) From sajidsyed2021 at u.northwestern.edu Thu May 16 18:33:37 2019 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Thu, 16 May 2019 18:33:37 -0500 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant In-Reply-To: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> References: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> Message-ID: Hi Barry, Thanks a lot for pointing this out. I'm seeing ~3X speedup in time ! Attached are the new log files. Does everything look right ? Thank You, Sajid Ali Applied Physics Northwestern University -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: out_50 Type: application/octet-stream Size: 28409 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: out_100 Type: application/octet-stream Size: 29801 bytes Desc: not available URL: From sajidsyed2021 at u.northwestern.edu Thu May 16 20:04:03 2019 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Thu, 16 May 2019 20:04:03 -0500 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant In-Reply-To: References: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> Message-ID: While there is a ~3.5X speedup, deleting the aforementioned 20 lines also leads the new version of petsc to give the wrong solution (off by orders of magnitude for the same program). I tried switching over the the IFunction/IJacobian interface as per the manual (page 146) which the following lines : ``` TSSetProblemType(ts,TSLINEAR); TSSetRHSFunction(ts,NULL,TSComputeRHSFunctionLinear,NULL); TSSetRHSJacobian(ts,A,A,TSComputeRHSJacobianConstant,NULL); ``` are equivalent to : ``` TSSetProblemType(ts,TSLINEAR); TSSetIFunction(ts,NULL,TSComputeIFunctionLinear,NULL); TSSetIJacobian(ts,A,A,TSComputeIJacobianConstant,NULL); ``` But the example at src/ts/examples/tutorials/ex3.c employs a strategy of setting a shift flag to prevent re-computation for time-independent problems. Moreover, the docs say "using this function (TSComputeIFunctionLinear) is NOT equivalent to using TSComputeRHSFunctionLinear()" and now I'm even more confused. PS : Doing the simple switch is as slow as the original code and the answer is wrong as well. Thank You, Sajid Ali Applied Physics Northwestern University -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Thu May 16 22:33:47 2019 From: hongzhang at anl.gov (Zhang, Hong) Date: Fri, 17 May 2019 03:33:47 +0000 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant In-Reply-To: References: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> Message-ID: Hi Sajid, Can you please try this branch hongzh/fix-computejacobian quickly and see if it makes a difference? Thanks, Hong (Mr.) On May 16, 2019, at 8:04 PM, Sajid Ali via petsc-users > wrote: While there is a ~3.5X speedup, deleting the aforementioned 20 lines also leads the new version of petsc to give the wrong solution (off by orders of magnitude for the same program). 
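For context on the two call sequences quoted above and below, here is a minimal sketch of the RHS form for a linear, time-independent problem du/dt = A u. A and u are assumed to be an already-assembled matrix and initial-condition vector, ierr a declared PetscErrorCode, and the time-stepping options are left to the command line; whether the constant Jacobian is actually reused across steps is exactly the behaviour being debugged in this thread:

```
/* Hedged sketch: du/dt = A u with a constant A, using the built-in
   TSComputeRHSFunctionLinear / TSComputeRHSJacobianConstant callbacks. */
TS ts;
ierr = TSCreate(PETSC_COMM_WORLD, &ts);CHKERRQ(ierr);
ierr = TSSetProblemType(ts, TS_LINEAR);CHKERRQ(ierr);
ierr = TSSetRHSFunction(ts, NULL, TSComputeRHSFunctionLinear, NULL);CHKERRQ(ierr);
ierr = TSSetRHSJacobian(ts, A, A, TSComputeRHSJacobianConstant, NULL);CHKERRQ(ierr);
ierr = TSSetSolution(ts, u);CHKERRQ(ierr);     /* u holds the initial condition */
ierr = TSSetFromOptions(ts);CHKERRQ(ierr);     /* e.g. -ts_type cn -ts_dt 1e-3 -ts_max_time 1.0 */
ierr = TSSolve(ts, u);CHKERRQ(ierr);
ierr = TSDestroy(&ts);CHKERRQ(ierr);
```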
I tried switching over the the IFunction/IJacobian interface as per the manual (page 146) which the following lines : ``` TSSetProblemType(ts,TSLINEAR); TSSetRHSFunction(ts,NULL,TSComputeRHSFunctionLinear,NULL); TSSetRHSJacobian(ts,A,A,TSComputeRHSJacobianConstant,NULL); ``` are equivalent to : ``` TSSetProblemType(ts,TSLINEAR); TSSetIFunction(ts,NULL,TSComputeIFunctionLinear,NULL); TSSetIJacobian(ts,A,A,TSComputeIJacobianConstant,NULL); ``` But the example at src/ts/examples/tutorials/ex3.c employs a strategy of setting a shift flag to prevent re-computation for time-independent problems. Moreover, the docs say "using this function (TSComputeIFunctionLinear) is NOT equivalent to using TSComputeRHSFunctionLinear()" and now I'm even more confused. PS : Doing the simple switch is as slow as the original code and the answer is wrong as well. Thank You, Sajid Ali Applied Physics Northwestern University -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu May 16 22:42:48 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 17 May 2019 03:42:48 +0000 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant In-Reply-To: References: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> Message-ID: > On May 16, 2019, at 8:04 PM, Sajid Ali wrote: > > While there is a ~3.5X speedup, deleting the aforementioned 20 lines also leads the new version of petsc to give the wrong solution (off by orders of magnitude for the same program). Ok, sorry about this. Unfortunately this stuff has been giving us headaches for years and we are struggling to get it right. > > I tried switching over the the IFunction/IJacobian interface as per the manual (page 146) which the following lines : It is probably better to not switch to the IFunction/IJacobian, we are more likely to get the TS version working properly. > ``` > TSSetProblemType(ts,TSLINEAR); > TSSetRHSFunction(ts,NULL,TSComputeRHSFunctionLinear,NULL); > TSSetRHSJacobian(ts,A,A,TSComputeRHSJacobianConstant,NULL); > ``` > are equivalent to : > ``` > TSSetProblemType(ts,TSLINEAR); > TSSetIFunction(ts,NULL,TSComputeIFunctionLinear,NULL); > TSSetIJacobian(ts,A,A,TSComputeIJacobianConstant,NULL); > ``` > But the example at src/ts/examples/tutorials/ex3.c employs a strategy of setting a shift flag to prevent re-computation for time-independent problems. Moreover, the docs say "using this function (TSComputeIFunctionLinear) is NOT equivalent to using TSComputeRHSFunctionLinear()" and now I'm even more confused. > > PS : Doing the simple switch is as slow as the original code and the answer is wrong as well. > > Thank You, > Sajid Ali > Applied Physics > Northwestern University From bsmith at mcs.anl.gov Fri May 17 02:12:13 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 17 May 2019 07:12:13 +0000 Subject: [petsc-users] MatCreateBAIJ, SNES, Preallocation... In-Reply-To: References: <1372FD84-8585-402C-84EA-38BEC1AC36B7@anl.gov> Message-ID: <8B0E8F7D-80E6-4755-8BDD-C67053544B8D@mcs.anl.gov> > On May 16, 2019, at 5:28 PM, William Coirier wrote: > > Ok, got it. My misinterpretation was how to fill the d_nnz and o_nnz arrays. > > Thank you for your help! > > Might I make a suggestion related to the documentation? Perhaps I have not fully read the page on the MatMPIBAIJSetPreallocation so you can simply disregard and I'm ok with that! 
The documentation has for the d_nnz: > > d_nnz - array containing the number of block nonzeros in the various block rows of the in diagonal portion of the local (possibly different for each block row) or NULL. If you plan to factor the matrix you must leave room for the diagonal entry and set it even if it is zero. > > Am I correct in that this array should be of size numRows, where numRows is found from calling MatGetOwnershipRange(J,&iLow,&iHigh) so numRows=iHigh-iLow. > > I think my error was allocating this only to be numRows/bs since I thought it's a block size thing. You should no need set any more than that. So something still seems a bit odd. It is a block size thing. Are you sure he bs in you code matches here matches the block sie > > When I fixed this, things worked really fast! Maybe it's obvious it should be that size. :-) > > Regardless, thanks for your help. PETSc is great! I'm a believer and user. > ________________________________________ > From: Smith, Barry F. [bsmith at mcs.anl.gov] > Sent: Thursday, May 16, 2019 4:07 PM > To: William Coirier > Cc: petsc-users at mcs.anl.gov; Michael Robinson; Andrew Holm > Subject: Re: [petsc-users] MatCreateBAIJ, SNES, Preallocation... > >> On May 16, 2019, at 3:44 PM, William Coirier via petsc-users wrote: >> >> Folks: >> >> I'm developing an application using the SNES, and overall it's working great, as many of our other PETSc-based projects. But, I'm having a problem related to (presumably) pre-allocation, block matrices and SNES. >> >> Without going into details about the actual problem we are solving, here are the symptoms/characteristics/behavior. >> ? For the SNES Jacobian, I'm using MatCreateBAIJ for a block size=3, and letting "PETSC_DECIDE" the partitioning. Actual call is: >> ? ierr = MatCreateBAIJ(PETSC_COMM_WORLD, bs, PETSC_DECIDE, PETSC_DECIDE, (int)3 * numNodesSAM, (int)3 * numNodesSAM, PETSC_DEFAULT, NULL, PETSC_DEFAULT, NULL, &J); >> ? When registering the SNES jacobian function, I set the B and J matrices to be the same. >> ? ierr = SNESSetJacobian(snes, J, J, SAMformSNESJ, (void *)this); CHKERRQ(ierr); >> ? I can either let PETSc figure out the allocation structure: >> ? ierr = MatMPIBAIJSetPreallocation(J, bs, PETSC_DEFAULT, NULL,PETSC_DEFAULT, NULL); >> ? or, do it myself, since I know the fill pattern, >> ? ierr = MatMPIBAIJSetPreallocation(J, bs, d_nz_dum,&d_nnz[0],o_nz_dum,&o_nnz[0]); >> The symptoms/problems are as follows: >> ? Whether I do preallocation or not, the "setup" time is pretty long. It might take 2 minutes before SNES starts doing its thing. After this setup, convergence and speed is great. But this first phase takes a long time. I'm assuming this has to be related to some poor preallocation setup so it's doing tons of mallocs where it?s not needed. > > You should definitely get much better performance with proper preallocation then with none (unless the default is enough for your matrix). Run with -info and grep for "malloc" this will tell you exactly how many, if any mallocs are taking place inside the MatSetValues() due to improper preallocation. > >> ? If I don't call my Jacobian formulation before calling SNESSolve, I get a segmentation violation in a PETSc routine. > > Not sure what you mean by Jacobian formation but I'm guessing filling up the Jacobian with numerical values? > > Something is wrong because you should not need to fill up the values before calling SNES solve, and regardless it should never ever crash with > a segmentation violation. 
You can run with valgrind https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mcs.anl.gov_petsc_documentation_faq.html-23valgrind&d=DwIGaQ&c=zeCCs5WLaN-HWPHrpXwbFoOqeS0G3NH2_2IQ_bzV13g&r=q_3hswOPAFb0l_4-IAZZi5DgTpzDUIpk984njq2YnggBd-vCWTgNlbk27KjHXmKK&m=IREQmkCt5PXK-SnLqJZXz3Du7h3mFP24xtI0jHGgGUY&s=UiGHkQ2Zr_nYYQ-GYg1HEYtbqZutYSgv9F1A86sfNKI&e= to make sure that it is not a memory corruption issue. You can also run in the debugger (perhaps the PETSc command line option -start_in_debugger) to get more details on why it is crashing. > > When you have it running satisfactory you can send us the output from running with -log_view and we can let you know how it seems to be performing efficiency wise. > > Barry > > > >> (If I DO call my Jacobian first, things work great, although slow for the setup phase.) Here's a snippet of the traceback: >> 0 0x00000000009649fc in MatMultAdd_SeqBAIJ_3 (A=, >> xx=0x3a525b0, yy=0x3a531b0, zz=0x3a531b0) >> at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/seq/baij2.c:1424 >> #1 0x00000000006444cb in MatMult_MPIBAIJ (A=0x15da340, xx=0x3a542a0, >> yy=0x3a531b0) >> at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/mpi/mpibaij.c:1380 >> #2 0x00000000005b2c0f in MatMult (mat=0x15da340, x=x at entry=0x3a542a0, >> y=y at entry=0x3a531b0) >> at /home/jstutts/Downloads/petsc-3.11.1/src/mat/interface/matrix.c:2396 >> #3 0x0000000000c61f2e in PCApplyBAorAB (pc=0x1ce78c0, side=PC_LEFT, >> x=0x3a542a0, y=y at entry=0x3a548a0, work=0x3a531b0) >> at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/pc/interface/precon.c:690 >> #4 0x0000000000ccb36b in KSP_PCApplyBAorAB (w=, y=0x3a548a0, >> x=, ksp=0x1d44d50) >> at /home/jstutts/Downloads/petsc-3.11.1/include/petsc/private/kspimpl.h:309 >> #5 KSPGMRESCycle (itcount=itcount at entry=0x7fffffffc02c, >> ksp=ksp at entry=0x1d44d50) >> at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c:152 >> #6 0x0000000000ccbf6f in KSPSolve_GMRES (ksp=0x1d44d50) >> at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c:237 >> #7 0x00000000007dc193 in KSPSolve (ksp=0x1d44d50, b=b at entry=0x1d41c70, >> x=x at entry=0x1cebf40) >> >> >> >> I apologize if I?ve missed something in the documentation or examples, but I can?t seem to figure this one out. The ?setup? seems to take too long, and from my previous experiences with PETSc, this is due to a poor preallocation strategy. >> >> Any and all help is appreciated! >> >> ----------------------------------------------------------------------- >> William J. Coirier, Ph.D. >> Director, Aerosciences and Engineering Analysis Branch >> Advanced Concepts Development and Test Division >> Kratos Defense and Rocket Support Services >> 4904 Research Drive >> Huntsville, AL 35805 >> 256-327-8170 >> 256-327-8120 (fax) > From tempohoper at gmail.com Fri May 17 03:40:37 2019 From: tempohoper at gmail.com (Sal Am) Date: Fri, 17 May 2019 09:40:37 +0100 Subject: [petsc-users] Suggestions for solver and pc Message-ID: Hello, So I am trying to solve this problem in the frequency domain by solving a Ax=b. It is apparently very ill conditioned. RF wave plasma interaction simulation. 
There are 6M finite elements the matrix is of the size: nnx: 1257303210 (1B non-zero elements) n: 20347817 (size of matrix so 20M x 20M) *What I have tried so far: * -ksp_type bcgs -pc_type gamg -ksp_type gmres -ksp_gmres_restart 150 -pc_type bjacobi -sub_pc_type ilu -sub_pc_factor_levels 5 -sub_ksp_type bcgs -mattransposematmult_via scalable -build_twosided allreduce -ksp_type bcgs -pc_type gamg -mattransposematmult_via scalable -build_twosided allreduce -ksp_type bcgs -pc_type asm -sub_pc_type lu None of the above has really shown any convergence after 2 days of running the simulation. The farthest I got was using gmres + bjacobi which gave me a ||r||/||b|| of the order 1e-2 but it got stuck between 1e-2 and 1e-1 after a week of having left it running. *What I have available in terms of computational resources:* select=25:ncpus=16:mpiprocs=16:mem=230GB So 25 nodes with around 6TB of total memory. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri May 17 03:45:56 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 17 May 2019 08:45:56 +0000 Subject: [petsc-users] Suggestions for solver and pc In-Reply-To: References: Message-ID: <53AAD52E-AF52-436E-9B5C-BCA75938065E@anl.gov> -ksp_type gmres -ksp_gmres_restart 200 -pc_type asm -sub_pc_type lu -pc_asm_overlap 3 -ksp_monitor It will run like molasses but may converge Good luck > On May 17, 2019, at 3:40 AM, Sal Am via petsc-users wrote: > > Hello, > > So I am trying to solve this problem in the frequency domain by solving a Ax=b. It is apparently very ill conditioned. RF wave plasma interaction simulation. There are 6M finite elements the matrix is of the size: > > nnx: 1257303210 (1B non-zero elements) > n: 20347817 (size of matrix so 20M x 20M) > > What I have tried so far: > > -ksp_type bcgs -pc_type gamg > > -ksp_type gmres -ksp_gmres_restart 150 -pc_type bjacobi -sub_pc_type ilu -sub_pc_factor_levels 5 -sub_ksp_type bcgs -mattransposematmult_via scalable -build_twosided allreduce > > -ksp_type bcgs -pc_type gamg -mattransposematmult_via scalable -build_twosided allreduce > > -ksp_type bcgs -pc_type asm -sub_pc_type lu > > None of the above has really shown any convergence after 2 days of running the simulation. The farthest I got was using gmres + bjacobi which gave me a ||r||/||b|| of the order 1e-2 but it got stuck between 1e-2 and 1e-1 after a week of having left it running. > > What I have available in terms of computational resources: > select=25:ncpus=16:mpiprocs=16:mem=230GB > > So 25 nodes with around 6TB of total memory. > > Thank you. > From tempohoper at gmail.com Fri May 17 05:32:08 2019 From: tempohoper at gmail.com (Sal Am) Date: Fri, 17 May 2019 11:32:08 +0100 Subject: [petsc-users] Suggestions for solver and pc In-Reply-To: <53AAD52E-AF52-436E-9B5C-BCA75938065E@anl.gov> References: <53AAD52E-AF52-436E-9B5C-BCA75938065E@anl.gov> Message-ID: Thank you Barry for quick response, I tried that, but I get several errors of which the first one is: [230]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [230]PETSC ERROR: Out of memory. This could be due to allocating [230]PETSC ERROR: too large an object or bleeding by not properly [230]PETSC ERROR: destroying unneeded objects. [230]PETSC ERROR: Memory allocated 0 Memory used by process 5116485632 [230]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info. 
[230]PETSC ERROR: Memory requested 186645063808 [230]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [230]PETSC ERROR: Petsc Release Version 3.10.2, unknown [230]PETSC ERROR: ./solveCSys on a linux-cumulus-x64 named r03n04 by vef002 Fri May 17 04:23:38 2019 [230]PETSC ERROR: Configure options PETSC_ARCH=linux-cumulus-x64 --with-cc=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpicc --with-fc=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpifort --with-cxx=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpicxx --download-parmetis --download-metis --download-ptscotch --download-superlu_dist --with-64-bit-indices --with-scalar-type=complex --with-debugging=no --download-scalapack --download-fblaslapack=1 --download-cmake [230]PETSC ERROR: #1 PetscFreeSpaceGet() line 11 in /lustre/home/vef002/petsc/src/mat/utils/freespace.c [230]PETSC ERROR: #2 PetscMallocA() line 390 in /lustre/home/vef002/petsc/src/sys/memory/mal.c [230]PETSC ERROR: #3 PetscFreeSpaceGet() line 11 in /lustre/home/vef002/petsc/src/mat/utils/freespace.c [230]PETSC ERROR: #4 MatLUFactorSymbolic_SeqAIJ() line 349 in /lustre/home/vef002/petsc/src/mat/impls/aij/seq/aijfact.c [230]PETSC ERROR: #5 MatLUFactorSymbolic() line 3015 in /lustre/home/vef002/petsc/src/mat/interface/matrix.c [230]PETSC ERROR: #6 PCSetUp_LU() line 95 in /lustre/home/vef002/petsc/src/ksp/pc/impls/factor/lu/lu.c [230]PETSC ERROR: #7 PCSetUp() line 932 in /lustre/home/vef002/petsc/src/ksp/pc/interface/precon.c [230]PETSC ERROR: #8 KSPSetUp() line 391 in /lustre/home/vef002/petsc/src/ksp/ksp/interface/itfunc.c [230]PETSC ERROR: #9 PCSetUpOnBlocks_ASM() line 450 in /lustre/home/vef002/petsc/src/ksp/pc/impls/asm/asm.c [230]PETSC ERROR: #10 PCSetUpOnBlocks() line 963 in /lustre/home/vef002/petsc/src/ksp/pc/interface/precon.c [230]PETSC ERROR: #11 KSPSetUpOnBlocks() line 223 in /lustre/home/vef002/petsc/src/ksp/ksp/interface/itfunc.c [230]PETSC ERROR: #12 KSPSolve() line 724 in /lustre/home/vef002/petsc/src/ksp/ksp/interface/itfunc.c So apparently ~6TB is not enough for the suggested routine... On Fri, May 17, 2019 at 9:45 AM Smith, Barry F. wrote: > > -ksp_type gmres -ksp_gmres_restart 200 -pc_type asm -sub_pc_type lu > -pc_asm_overlap 3 -ksp_monitor > > It will run like molasses but may converge > > Good luck > > > > On May 17, 2019, at 3:40 AM, Sal Am via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Hello, > > > > So I am trying to solve this problem in the frequency domain by solving > a Ax=b. It is apparently very ill conditioned. RF wave plasma interaction > simulation. There are 6M finite elements the matrix is of the size: > > > > nnx: 1257303210 (1B non-zero elements) > > n: 20347817 (size of matrix so 20M x 20M) > > > > What I have tried so far: > > > > -ksp_type bcgs -pc_type gamg > > > > -ksp_type gmres -ksp_gmres_restart 150 -pc_type bjacobi -sub_pc_type > ilu -sub_pc_factor_levels 5 -sub_ksp_type bcgs -mattransposematmult_via > scalable -build_twosided allreduce > > > > -ksp_type bcgs -pc_type gamg -mattransposematmult_via scalable > -build_twosided allreduce > > > > -ksp_type bcgs -pc_type asm -sub_pc_type lu > > > > None of the above has really shown any convergence after 2 days of > running the simulation. The farthest I got was using gmres + bjacobi which > gave me a ||r||/||b|| of the order 1e-2 but it got stuck between 1e-2 and > 1e-1 after a week of having left it running. 
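An aside for readers reproducing these experiments: every option combination mentioned in this thread can be switched at run time if the solver is driven from the options database. A minimal sketch, assuming A and b are the assembled operator and right-hand side and ierr is a declared PetscErrorCode:

```
/* Hedged sketch: an options-driven solve, so -ksp_type, -pc_type,
   -sub_pc_type, -pc_asm_overlap, ... can be changed without recompiling. */
KSP ksp;
Vec x;
ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* picks up -ksp_type, -pc_type, -ksp_monitor, ... */
ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
```

A run would then look something like `mpiexec -n 400 ./your_app -ksp_type gmres -ksp_gmres_restart 200 -pc_type asm -pc_asm_overlap 2 -sub_pc_type ilu -ksp_monitor_true_residual`, where the executable name and option values are illustrative, echoing the suggestions made elsewhere in this thread.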
> > > > What I have available in terms of computational resources: > > select=25:ncpus=16:mpiprocs=16:mem=230GB > > > > So 25 nodes with around 6TB of total memory. > > > > Thank you. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri May 17 06:16:05 2019 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 17 May 2019 07:16:05 -0400 Subject: [petsc-users] MatCreateBAIJ, SNES, Preallocation... In-Reply-To: References: <1372FD84-8585-402C-84EA-38BEC1AC36B7@anl.gov> Message-ID: On Thu, May 16, 2019 at 6:28 PM William Coirier via petsc-users < petsc-users at mcs.anl.gov> wrote: > Ok, got it. My misinterpretation was how to fill the d_nnz and o_nnz > arrays. > > Thank you for your help! > > Might I make a suggestion related to the documentation? Perhaps I have not > fully read the page on the MatMPIBAIJSetPreallocation so you can simply > disregard and I'm ok with that! The documentation has for the d_nnz: > > d_nnz - array containing the number of block nonzeros in the various > block rows of the in diagonal portion of the local (possibly different for > each block row) or NULL. If you plan to factor the matrix you must leave > room for the diagonal entry and set it even if it is zero. > > Am I correct in that this array should be of size numRows, where numRows > is found from calling MatGetOwnershipRange(J,&iLow,&iHigh) so > numRows=iHigh-iLow. > yes, this interface does not change if you set the block size or not. It is at the equation level. > > I think my error was allocating this only to be numRows/bs since I thought > it's a block size thing. > > This documentation looks wrong to me, and at least confusing. "number of block nonzeros" reads wrong to me. We now have: d_nz- number of block nonzeros per block row in diagonal portion of local submatrix (same for all local rows) d_nnz- array containing the number of block nonzeros in the various block rows of the in diagonal portion of the local (possibly different for each block row) or NULL. If you plan to factor the matrix you must leave room for the diagonal entry and set it even if it is zero. o_nz- number of block nonzeros per block row in the off-diagonal portion of local submatrix (same for all local rows). o_nnz- array containing the number of nonzeros in the various block rows of the off-diagonal portion of the local submatrix (possibly different for each block row) or NULL. I can suggest: d_nz- number of nonzeros per row in diagonal portion of local submatrix (same for all local rows) d_nnz- array containing the number of nonzeros in each row of the diagonal portion of the local matrix (the same for each row within a block) or NULL. You must have a diagonal entry and set it even if it is zero if you plan to factor the matrix. o_nz- number of nonzeros per row in the off-diagonal portion of local submatrix (same for all local rows). o_nnz- array containing the number of nonzeros in each row of the off-diagonal portion of the local submatrix (the same for each row within a block) or NULL. I can change this if this is acceptable. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri May 17 06:42:02 2019 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 17 May 2019 07:42:02 -0400 Subject: [petsc-users] Suggestions for solver and pc In-Reply-To: References: <53AAD52E-AF52-436E-9B5C-BCA75938065E@anl.gov> Message-ID: Are you shifting into high frequency? If not you can try tricks in this paper. 
Using a large coarse grid in AMG that captures the frequency response of interest and using a parallel direct solver is a good option if it is doable (not too deep a shift). @Article{Adams-03a, author = {Adams M.F.}, title = {Algebraic multigrid techniques for strongly indefinite linear systems from direct frequency response analysis in solid mechanics}, journal = {Computational Mechanics}, year = {2007}, volume = {39}, number = {4}, pages = {497-507} } On Fri, May 17, 2019 at 6:33 AM Sal Am via petsc-users < petsc-users at mcs.anl.gov> wrote: > Thank you Barry for quick response, I tried that, but I get several errors > of which the first one is: > > [230]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [230]PETSC ERROR: Out of memory. This could be due to allocating > [230]PETSC ERROR: too large an object or bleeding by not properly > [230]PETSC ERROR: destroying unneeded objects. > [230]PETSC ERROR: Memory allocated 0 Memory used by process 5116485632 > [230]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info. > [230]PETSC ERROR: Memory requested 186645063808 > [230]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [230]PETSC ERROR: Petsc Release Version 3.10.2, unknown > [230]PETSC ERROR: ./solveCSys on a linux-cumulus-x64 named r03n04 by > vef002 Fri May 17 04:23:38 2019 > [230]PETSC ERROR: Configure options PETSC_ARCH=linux-cumulus-x64 > --with-cc=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpicc > --with-fc=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpifort > --with-cxx=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpicxx > --download-parmetis --download-metis --download-ptscotch > --download-superlu_dist --with-64-bit-indices --with-scalar-type=complex > --with-debugging=no --download-scalapack --download-fblaslapack=1 > --download-cmake > [230]PETSC ERROR: #1 PetscFreeSpaceGet() line 11 in > /lustre/home/vef002/petsc/src/mat/utils/freespace.c > [230]PETSC ERROR: #2 PetscMallocA() line 390 in > /lustre/home/vef002/petsc/src/sys/memory/mal.c > [230]PETSC ERROR: #3 PetscFreeSpaceGet() line 11 in > /lustre/home/vef002/petsc/src/mat/utils/freespace.c > [230]PETSC ERROR: #4 MatLUFactorSymbolic_SeqAIJ() line 349 in > /lustre/home/vef002/petsc/src/mat/impls/aij/seq/aijfact.c > [230]PETSC ERROR: #5 MatLUFactorSymbolic() line 3015 in > /lustre/home/vef002/petsc/src/mat/interface/matrix.c > [230]PETSC ERROR: #6 PCSetUp_LU() line 95 in > /lustre/home/vef002/petsc/src/ksp/pc/impls/factor/lu/lu.c > [230]PETSC ERROR: #7 PCSetUp() line 932 in > /lustre/home/vef002/petsc/src/ksp/pc/interface/precon.c > [230]PETSC ERROR: #8 KSPSetUp() line 391 in > /lustre/home/vef002/petsc/src/ksp/ksp/interface/itfunc.c > [230]PETSC ERROR: #9 PCSetUpOnBlocks_ASM() line 450 in > /lustre/home/vef002/petsc/src/ksp/pc/impls/asm/asm.c > [230]PETSC ERROR: #10 PCSetUpOnBlocks() line 963 in > /lustre/home/vef002/petsc/src/ksp/pc/interface/precon.c > [230]PETSC ERROR: #11 KSPSetUpOnBlocks() line 223 in > /lustre/home/vef002/petsc/src/ksp/ksp/interface/itfunc.c > [230]PETSC ERROR: #12 KSPSolve() line 724 in > /lustre/home/vef002/petsc/src/ksp/ksp/interface/itfunc.c > > So apparently ~6TB is not enough for the suggested routine... > > On Fri, May 17, 2019 at 9:45 AM Smith, Barry F. 
> wrote: > >> >> -ksp_type gmres -ksp_gmres_restart 200 -pc_type asm -sub_pc_type lu >> -pc_asm_overlap 3 -ksp_monitor >> >> It will run like molasses but may converge >> >> Good luck >> >> >> > On May 17, 2019, at 3:40 AM, Sal Am via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > >> > Hello, >> > >> > So I am trying to solve this problem in the frequency domain by solving >> a Ax=b. It is apparently very ill conditioned. RF wave plasma interaction >> simulation. There are 6M finite elements the matrix is of the size: >> > >> > nnx: 1257303210 (1B non-zero elements) >> > n: 20347817 (size of matrix so 20M x 20M) >> > >> > What I have tried so far: >> > >> > -ksp_type bcgs -pc_type gamg >> > >> > -ksp_type gmres -ksp_gmres_restart 150 -pc_type bjacobi -sub_pc_type >> ilu -sub_pc_factor_levels 5 -sub_ksp_type bcgs -mattransposematmult_via >> scalable -build_twosided allreduce >> > >> > -ksp_type bcgs -pc_type gamg -mattransposematmult_via scalable >> -build_twosided allreduce >> > >> > -ksp_type bcgs -pc_type asm -sub_pc_type lu >> > >> > None of the above has really shown any convergence after 2 days of >> running the simulation. The farthest I got was using gmres + bjacobi which >> gave me a ||r||/||b|| of the order 1e-2 but it got stuck between 1e-2 and >> 1e-1 after a week of having left it running. >> > >> > What I have available in terms of computational resources: >> > select=25:ncpus=16:mpiprocs=16:mem=230GB >> > >> > So 25 nodes with around 6TB of total memory. >> > >> > Thank you. >> > >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Fri May 17 09:13:28 2019 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Fri, 17 May 2019 09:13:28 -0500 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant In-Reply-To: References: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> Message-ID: Hi Hong, The solution has the right characteristics but it's off by many orders of magnitude. It is ~3.5x faster as before. Am I supposed to keep the TSRHSJacobianSetReuse function or not? Thank You, Sajid Ali Applied Physics Northwestern University -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri May 17 09:15:52 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 17 May 2019 14:15:52 +0000 Subject: [petsc-users] MatCreateBAIJ, SNES, Preallocation... In-Reply-To: References: <1372FD84-8585-402C-84EA-38BEC1AC36B7@anl.gov> Message-ID: <98A29B48-3299-404C-BD27-BE5D0D809A11@mcs.anl.gov> I don't understand. For BAIJ and SBAIJ it is the number of block nonzeros in the two parts of the matrix. It is not the number of nonzeros per row. For Mat[MPI]AIJSetPreallocation() it is the number of nonzeros (even if a block size has been set) because for AIJ matrices the block size does not affect the storage of the matrix and it is merely informative. For MatXAIJSetPreallocation() however (which can be used with AIJ, BAIJ, and SBAIJ) the nonzero blocks are for all three matrix formats. MatXAIJSetPreallocation is a convenience function that allows the user to avoid needing to allocate three sets of arrays and fill them up separately for each of the AIJ, BAIJ, and SBAIJ formats. Is everything now clear, if not please feel free to ask more specific questions to clarify. Barry > On May 17, 2019, at 6:16 AM, Mark Adams wrote: > > > > On Thu, May 16, 2019 at 6:28 PM William Coirier via petsc-users wrote: > Ok, got it. 
My misinterpretation was how to fill the d_nnz and o_nnz arrays. > > Thank you for your help! > > Might I make a suggestion related to the documentation? Perhaps I have not fully read the page on the MatMPIBAIJSetPreallocation so you can simply disregard and I'm ok with that! The documentation has for the d_nnz: > > d_nnz - array containing the number of block nonzeros in the various block rows of the in diagonal portion of the local (possibly different for each block row) or NULL. If you plan to factor the matrix you must leave room for the diagonal entry and set it even if it is zero. > > Am I correct in that this array should be of size numRows, where numRows is found from calling MatGetOwnershipRange(J,&iLow,&iHigh) so numRows=iHigh-iLow. > > yes, this interface does not change if you set the block size or not. It is at the equation level. > > > I think my error was allocating this only to be numRows/bs since I thought it's a block size thing. > > > This documentation looks wrong to me, and at least confusing. "number of block nonzeros" reads wrong to me. We now have: > > d_nz- number of block nonzeros per block row in diagonal portion of local submatrix (same for all local rows) > d_nnz- array containing the number of block nonzeros in the various block rows of the in diagonal portion of the local (possibly different for each block row) or NULL. If you plan to factor the matrix you must leave room for the diagonal entry and set it even if it is zero. > o_nz- number of block nonzeros per block row in the off-diagonal portion of local submatrix (same for all local rows). > o_nnz- array containing the number of nonzeros in the various block rows of the off-diagonal portion of the local submatrix (possibly different for each block row) or NULL. > > I can suggest: > > d_nz- number of nonzeros per row in diagonal portion of local submatrix (same for all local rows) > d_nnz- array containing the number of nonzeros in each row of the diagonal portion of the local matrix (the same for each row within a block) or NULL. You must have a diagonal entry and set it even if it is zero if you plan to factor the matrix. > o_nz- number of nonzeros per row in the off-diagonal portion of local submatrix (same for all local rows). > o_nnz- array containing the number of nonzeros in each row of the off-diagonal portion of the local submatrix (the same for each row within a block) or NULL. > > I can change this if this is acceptable. From bsmith at mcs.anl.gov Fri May 17 09:20:31 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 17 May 2019 14:20:31 +0000 Subject: [petsc-users] Suggestions for solver and pc In-Reply-To: References: <53AAD52E-AF52-436E-9B5C-BCA75938065E@anl.gov> Message-ID: As well as trying Mark's suggestion, you can try with just an overlap of 2 for the ASM and use an ILU preconditioner on each domain -sub_pc_type ilu (in combination or one at a time) There is no doubt that when using ASM for difficult problems it does require a great deal of memory, you could try smaller problems to determine how an effective solver it is or not; it is is the most effective solver then at least you know this, and if needed, purse use of larger machines. Barry > On May 17, 2019, at 5:32 AM, Sal Am wrote: > > Thank you Barry for quick response, I tried that, but I get several errors of which the first one is: > > [230]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [230]PETSC ERROR: Out of memory. 
This could be due to allocating > [230]PETSC ERROR: too large an object or bleeding by not properly > [230]PETSC ERROR: destroying unneeded objects. > [230]PETSC ERROR: Memory allocated 0 Memory used by process 5116485632 > [230]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info. > [230]PETSC ERROR: Memory requested 186645063808 > [230]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [230]PETSC ERROR: Petsc Release Version 3.10.2, unknown > [230]PETSC ERROR: ./solveCSys on a linux-cumulus-x64 named r03n04 by vef002 Fri May 17 04:23:38 2019 > [230]PETSC ERROR: Configure options PETSC_ARCH=linux-cumulus-x64 --with-cc=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpicc --with-fc=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpifort --with-cxx=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpicxx --download-parmetis --download-metis --download-ptscotch --download-superlu_dist --with-64-bit-indices --with-scalar-type=complex --with-debugging=no --download-scalapack --download-fblaslapack=1 --download-cmake > [230]PETSC ERROR: #1 PetscFreeSpaceGet() line 11 in /lustre/home/vef002/petsc/src/mat/utils/freespace.c > [230]PETSC ERROR: #2 PetscMallocA() line 390 in /lustre/home/vef002/petsc/src/sys/memory/mal.c > [230]PETSC ERROR: #3 PetscFreeSpaceGet() line 11 in /lustre/home/vef002/petsc/src/mat/utils/freespace.c > [230]PETSC ERROR: #4 MatLUFactorSymbolic_SeqAIJ() line 349 in /lustre/home/vef002/petsc/src/mat/impls/aij/seq/aijfact.c > [230]PETSC ERROR: #5 MatLUFactorSymbolic() line 3015 in /lustre/home/vef002/petsc/src/mat/interface/matrix.c > [230]PETSC ERROR: #6 PCSetUp_LU() line 95 in /lustre/home/vef002/petsc/src/ksp/pc/impls/factor/lu/lu.c > [230]PETSC ERROR: #7 PCSetUp() line 932 in /lustre/home/vef002/petsc/src/ksp/pc/interface/precon.c > [230]PETSC ERROR: #8 KSPSetUp() line 391 in /lustre/home/vef002/petsc/src/ksp/ksp/interface/itfunc.c > [230]PETSC ERROR: #9 PCSetUpOnBlocks_ASM() line 450 in /lustre/home/vef002/petsc/src/ksp/pc/impls/asm/asm.c > [230]PETSC ERROR: #10 PCSetUpOnBlocks() line 963 in /lustre/home/vef002/petsc/src/ksp/pc/interface/precon.c > [230]PETSC ERROR: #11 KSPSetUpOnBlocks() line 223 in /lustre/home/vef002/petsc/src/ksp/ksp/interface/itfunc.c > [230]PETSC ERROR: #12 KSPSolve() line 724 in /lustre/home/vef002/petsc/src/ksp/ksp/interface/itfunc.c > > So apparently ~6TB is not enough for the suggested routine... > > On Fri, May 17, 2019 at 9:45 AM Smith, Barry F. wrote: > > -ksp_type gmres -ksp_gmres_restart 200 -pc_type asm -sub_pc_type lu -pc_asm_overlap 3 -ksp_monitor > > It will run like molasses but may converge > > Good luck > > > > On May 17, 2019, at 3:40 AM, Sal Am via petsc-users wrote: > > > > Hello, > > > > So I am trying to solve this problem in the frequency domain by solving a Ax=b. It is apparently very ill conditioned. RF wave plasma interaction simulation. There are 6M finite elements the matrix is of the size: > > > > nnx: 1257303210 (1B non-zero elements) > > n: 20347817 (size of matrix so 20M x 20M) > > > > What I have tried so far: > > > > -ksp_type bcgs -pc_type gamg > > > > -ksp_type gmres -ksp_gmres_restart 150 -pc_type bjacobi -sub_pc_type ilu -sub_pc_factor_levels 5 -sub_ksp_type bcgs -mattransposematmult_via scalable -build_twosided allreduce > > > > -ksp_type bcgs -pc_type gamg -mattransposematmult_via scalable -build_twosided allreduce > > > > -ksp_type bcgs -pc_type asm -sub_pc_type lu > > > > None of the above has really shown any convergence after 2 days of running the simulation. 
The farthest I got was using gmres + bjacobi which gave me a ||r||/||b|| of the order 1e-2 but it got stuck between 1e-2 and 1e-1 after a week of having left it running. > > > > What I have available in terms of computational resources: > > select=25:ncpus=16:mpiprocs=16:mem=230GB > > > > So 25 nodes with around 6TB of total memory. > > > > Thank you. > > > From William.Coirier at kratosdefense.com Fri May 17 09:55:44 2019 From: William.Coirier at kratosdefense.com (William Coirier) Date: Fri, 17 May 2019 14:55:44 +0000 Subject: [petsc-users] MatCreateBAIJ, SNES, Preallocation... In-Reply-To: <98A29B48-3299-404C-BD27-BE5D0D809A11@mcs.anl.gov> References: <1372FD84-8585-402C-84EA-38BEC1AC36B7@anl.gov> , <98A29B48-3299-404C-BD27-BE5D0D809A11@mcs.anl.gov> Message-ID: Mark: Should the size of d_nnz and o_nnz be the number of rows accessed on this processor or the number of block rows? ________________________________________ From: Smith, Barry F. [bsmith at mcs.anl.gov] Sent: Friday, May 17, 2019 9:15 AM To: Mark Adams Cc: William Coirier; petsc-users at mcs.anl.gov; Michael Robinson; Andrew Holm Subject: Re: [petsc-users] MatCreateBAIJ, SNES, Preallocation... I don't understand. For BAIJ and SBAIJ it is the number of block nonzeros in the two parts of the matrix. It is not the number of nonzeros per row. For Mat[MPI]AIJSetPreallocation() it is the number of nonzeros (even if a block size has been set) because for AIJ matrices the block size does not affect the storage of the matrix and it is merely informative. For MatXAIJSetPreallocation() however (which can be used with AIJ, BAIJ, and SBAIJ) the nonzero blocks are for all three matrix formats. MatXAIJSetPreallocation is a convenience function that allows the user to avoid needing to allocate three sets of arrays and fill them up separately for each of the AIJ, BAIJ, and SBAIJ formats. Is everything now clear, if not please feel free to ask more specific questions to clarify. Barry > On May 17, 2019, at 6:16 AM, Mark Adams wrote: > > > > On Thu, May 16, 2019 at 6:28 PM William Coirier via petsc-users wrote: > Ok, got it. My misinterpretation was how to fill the d_nnz and o_nnz arrays. > > Thank you for your help! > > Might I make a suggestion related to the documentation? Perhaps I have not fully read the page on the MatMPIBAIJSetPreallocation so you can simply disregard and I'm ok with that! The documentation has for the d_nnz: > > d_nnz - array containing the number of block nonzeros in the various block rows of the in diagonal portion of the local (possibly different for each block row) or NULL. If you plan to factor the matrix you must leave room for the diagonal entry and set it even if it is zero. > > Am I correct in that this array should be of size numRows, where numRows is found from calling MatGetOwnershipRange(J,&iLow,&iHigh) so numRows=iHigh-iLow. > > yes, this interface does not change if you set the block size or not. It is at the equation level. > > > I think my error was allocating this only to be numRows/bs since I thought it's a block size thing. > > > This documentation looks wrong to me, and at least confusing. "number of block nonzeros" reads wrong to me. We now have: > > d_nz- number of block nonzeros per block row in diagonal portion of local submatrix (same for all local rows) > d_nnz- array containing the number of block nonzeros in the various block rows of the in diagonal portion of the local (possibly different for each block row) or NULL. 
If you plan to factor the matrix you must leave room for the diagonal entry and set it even if it is zero. > o_nz- number of block nonzeros per block row in the off-diagonal portion of local submatrix (same for all local rows). > o_nnz- array containing the number of nonzeros in the various block rows of the off-diagonal portion of the local submatrix (possibly different for each block row) or NULL. > > I can suggest: > > d_nz- number of nonzeros per row in diagonal portion of local submatrix (same for all local rows) > d_nnz- array containing the number of nonzeros in each row of the diagonal portion of the local matrix (the same for each row within a block) or NULL. You must have a diagonal entry and set it even if it is zero if you plan to factor the matrix. > o_nz- number of nonzeros per row in the off-diagonal portion of local submatrix (same for all local rows). > o_nnz- array containing the number of nonzeros in each row of the off-diagonal portion of the local submatrix (the same for each row within a block) or NULL. > > I can change this if this is acceptable. From bsmith at mcs.anl.gov Fri May 17 10:00:55 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 17 May 2019 15:00:55 +0000 Subject: [petsc-users] MatCreateBAIJ, SNES, Preallocation... In-Reply-To: References: <1372FD84-8585-402C-84EA-38BEC1AC36B7@anl.gov> <98A29B48-3299-404C-BD27-BE5D0D809A11@mcs.anl.gov> Message-ID: <43ABBF55-05FC-49AA-94D7-FE5EEF2318EF@mcs.anl.gov> BAIJ and SBAIJ it is the number of block rows, so the row sizes divided by the bs. For AIJ the number of rows. > On May 17, 2019, at 9:55 AM, William Coirier wrote: > > Mark: > > Should the size of d_nnz and o_nnz be the number of rows accessed on this processor or the number of block rows? > ________________________________________ > From: Smith, Barry F. [bsmith at mcs.anl.gov] > Sent: Friday, May 17, 2019 9:15 AM > To: Mark Adams > Cc: William Coirier; petsc-users at mcs.anl.gov; Michael Robinson; Andrew Holm > Subject: Re: [petsc-users] MatCreateBAIJ, SNES, Preallocation... > > I don't understand. For BAIJ and SBAIJ it is the number of block nonzeros in the two parts of the matrix. It is not the number of nonzeros per row. > For Mat[MPI]AIJSetPreallocation() it is the number of nonzeros (even if a block size has been set) because for AIJ matrices the block size does not affect the storage of the matrix and it is merely informative. > > For MatXAIJSetPreallocation() however (which can be used with AIJ, BAIJ, and SBAIJ) the nonzero blocks are for all three matrix formats. MatXAIJSetPreallocation is a convenience function that allows the user to avoid needing to allocate three sets of arrays and fill them up separately for each of the AIJ, BAIJ, and SBAIJ formats. > > Is everything now clear, if not please feel free to ask more specific questions to clarify. > > Barry > >> On May 17, 2019, at 6:16 AM, Mark Adams wrote: >> >> >> >> On Thu, May 16, 2019 at 6:28 PM William Coirier via petsc-users wrote: >> Ok, got it. My misinterpretation was how to fill the d_nnz and o_nnz arrays. >> >> Thank you for your help! >> >> Might I make a suggestion related to the documentation? Perhaps I have not fully read the page on the MatMPIBAIJSetPreallocation so you can simply disregard and I'm ok with that! The documentation has for the d_nnz: >> >> d_nnz - array containing the number of block nonzeros in the various block rows of the in diagonal portion of the local (possibly different for each block row) or NULL. 
If you plan to factor the matrix you must leave room for the diagonal entry and set it even if it is zero. >> >> Am I correct in that this array should be of size numRows, where numRows is found from calling MatGetOwnershipRange(J,&iLow,&iHigh) so numRows=iHigh-iLow. >> >> yes, this interface does not change if you set the block size or not. It is at the equation level. >> >> >> I think my error was allocating this only to be numRows/bs since I thought it's a block size thing. >> >> >> This documentation looks wrong to me, and at least confusing. "number of block nonzeros" reads wrong to me. We now have: >> >> d_nz- number of block nonzeros per block row in diagonal portion of local submatrix (same for all local rows) >> d_nnz- array containing the number of block nonzeros in the various block rows of the in diagonal portion of the local (possibly different for each block row) or NULL. If you plan to factor the matrix you must leave room for the diagonal entry and set it even if it is zero. >> o_nz- number of block nonzeros per block row in the off-diagonal portion of local submatrix (same for all local rows). >> o_nnz- array containing the number of nonzeros in the various block rows of the off-diagonal portion of the local submatrix (possibly different for each block row) or NULL. >> >> I can suggest: >> >> d_nz- number of nonzeros per row in diagonal portion of local submatrix (same for all local rows) >> d_nnz- array containing the number of nonzeros in each row of the diagonal portion of the local matrix (the same for each row within a block) or NULL. You must have a diagonal entry and set it even if it is zero if you plan to factor the matrix. >> o_nz- number of nonzeros per row in the off-diagonal portion of local submatrix (same for all local rows). >> o_nnz- array containing the number of nonzeros in each row of the off-diagonal portion of the local submatrix (the same for each row within a block) or NULL. >> >> I can change this if this is acceptable. > From juaneah at gmail.com Fri May 17 10:19:55 2019 From: juaneah at gmail.com (Emmanuel Ayala) Date: Fri, 17 May 2019 17:19:55 +0200 Subject: [petsc-users] PETSc Matrix to MatLab Message-ID: Hello, I am a newby with PETSc. I want to check some matrices generated in PETSc, using MatLab. I created a matrix A (MATMPIAIJ), the partition is defined by PETSc and I defined the global size. Then, I used the next code to save in binary format the matrix from PETSc: PetscViewer viewer; PetscViewerBinaryOpen(PETSC_COMM_WORLD, "matrix", FILE_MODE_WRITE, &viewer); After that (I think) MatView writes in the viewer: MatView(A,viewer); Then I laod the matrix in MatLab with PetscBinaryRead(). For one process everything is Ok, I can see the full pattern of the sparse matrix (spy(A)). But when I generate the matrix A with more than one process, the resultant matrix only contains the data from the process 0. What is the mistake in my procedure? 1. I just want to export the PETSc matrix to MatLab, for any number of process. Regards. Thanks in advance, for your time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Fri May 17 10:24:31 2019 From: mfadams at lbl.gov (Mark Adams) Date: Fri, 17 May 2019 11:24:31 -0400 Subject: [petsc-users] MatCreateBAIJ, SNES, Preallocation... 
In-Reply-To: <43ABBF55-05FC-49AA-94D7-FE5EEF2318EF@mcs.anl.gov> References: <1372FD84-8585-402C-84EA-38BEC1AC36B7@anl.gov> <98A29B48-3299-404C-BD27-BE5D0D809A11@mcs.anl.gov> <43ABBF55-05FC-49AA-94D7-FE5EEF2318EF@mcs.anl.gov> Message-ID: Oh, this is BAIJ. Sure. On Fri, May 17, 2019 at 11:01 AM Smith, Barry F. wrote: > > BAIJ and SBAIJ it is the number of block rows, so the row sizes divided > by the bs. For AIJ the number of rows. > > > On May 17, 2019, at 9:55 AM, William Coirier < > William.Coirier at kratosdefense.com> wrote: > > > > Mark: > > > > Should the size of d_nnz and o_nnz be the number of rows accessed on > this processor or the number of block rows? > > ________________________________________ > > From: Smith, Barry F. [bsmith at mcs.anl.gov] > > Sent: Friday, May 17, 2019 9:15 AM > > To: Mark Adams > > Cc: William Coirier; petsc-users at mcs.anl.gov; Michael Robinson; Andrew > Holm > > Subject: Re: [petsc-users] MatCreateBAIJ, SNES, Preallocation... > > > > I don't understand. For BAIJ and SBAIJ it is the number of block > nonzeros in the two parts of the matrix. It is not the number of nonzeros > per row. > > For Mat[MPI]AIJSetPreallocation() it is the number of nonzeros (even if > a block size has been set) because for AIJ matrices the block size does not > affect the storage of the matrix and it is merely informative. > > > > For MatXAIJSetPreallocation() however (which can be used with AIJ, > BAIJ, and SBAIJ) the nonzero blocks are for all three matrix formats. > MatXAIJSetPreallocation is a convenience function that allows the user to > avoid needing to allocate three sets of arrays and fill them up separately > for each of the AIJ, BAIJ, and SBAIJ formats. > > > > Is everything now clear, if not please feel free to ask more specific > questions to clarify. > > > > Barry > > > >> On May 17, 2019, at 6:16 AM, Mark Adams wrote: > >> > >> > >> > >> On Thu, May 16, 2019 at 6:28 PM William Coirier via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Ok, got it. My misinterpretation was how to fill the d_nnz and o_nnz > arrays. > >> > >> Thank you for your help! > >> > >> Might I make a suggestion related to the documentation? Perhaps I have > not fully read the page on the MatMPIBAIJSetPreallocation so you can simply > disregard and I'm ok with that! The documentation has for the d_nnz: > >> > >> d_nnz - array containing the number of block nonzeros in the various > block rows of the in diagonal portion of the local (possibly different for > each block row) or NULL. If you plan to factor the matrix you must leave > room for the diagonal entry and set it even if it is zero. > >> > >> Am I correct in that this array should be of size numRows, where > numRows is found from calling MatGetOwnershipRange(J,&iLow,&iHigh) so > numRows=iHigh-iLow. > >> > >> yes, this interface does not change if you set the block size or not. > It is at the equation level. > >> > >> > >> I think my error was allocating this only to be numRows/bs since I > thought it's a block size thing. > >> > >> > >> This documentation looks wrong to me, and at least confusing. "number > of block nonzeros" reads wrong to me. We now have: > >> > >> d_nz- number of block nonzeros per block row in diagonal portion of > local submatrix (same for all local rows) > >> d_nnz- array containing the number of block nonzeros in the various > block rows of the in diagonal portion of the local (possibly different for > each block row) or NULL. 
If you plan to factor the matrix you must leave > room for the diagonal entry and set it even if it is zero. > >> o_nz- number of block nonzeros per block row in the off-diagonal > portion of local submatrix (same for all local rows). > >> o_nnz- array containing the number of nonzeros in the various block > rows of the off-diagonal portion of the local submatrix (possibly different > for each block row) or NULL. > >> > >> I can suggest: > >> > >> d_nz- number of nonzeros per row in diagonal portion of local submatrix > (same for all local rows) > >> d_nnz- array containing the number of nonzeros in each row of the > diagonal portion of the local matrix (the same for each row within a block) > or NULL. You must have a diagonal entry and set it even if it is zero if > you plan to factor the matrix. > >> o_nz- number of nonzeros per row in the off-diagonal portion of local > submatrix (same for all local rows). > >> o_nnz- array containing the number of nonzeros in each row of the > off-diagonal portion of the local submatrix (the same for each row within a > block) or NULL. > >> > >> I can change this if this is acceptable. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hzhang at mcs.anl.gov Fri May 17 10:49:22 2019 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Fri, 17 May 2019 15:49:22 +0000 Subject: [petsc-users] PETSc Matrix to MatLab In-Reply-To: References: Message-ID: Check your petsc matrix before dumping the data by adding MatView(A,PETSC_VIEWER_STDOUT_WORLD); immediately after calling MatAssemblyEnd(). Do you see a correct parallel matrix? Hong On Fri, May 17, 2019 at 10:24 AM Emmanuel Ayala via petsc-users > wrote: Hello, I am a newby with PETSc. I want to check some matrices generated in PETSc, using MatLab. I created a matrix A (MATMPIAIJ), the partition is defined by PETSc and I defined the global size. Then, I used the next code to save in binary format the matrix from PETSc: PetscViewer viewer; PetscViewerBinaryOpen(PETSC_COMM_WORLD, "matrix", FILE_MODE_WRITE, &viewer); After that (I think) MatView writes in the viewer: MatView(A,viewer); Then I laod the matrix in MatLab with PetscBinaryRead(). For one process everything is Ok, I can see the full pattern of the sparse matrix (spy(A)). But when I generate the matrix A with more than one process, the resultant matrix only contains the data from the process 0. What is the mistake in my procedure? 1. I just want to export the PETSc matrix to MatLab, for any number of process. Regards. Thanks in advance, for your time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ysjosh.lo at gmail.com Fri May 17 18:58:05 2019 From: ysjosh.lo at gmail.com (Josh L) Date: Fri, 17 May 2019 18:58:05 -0500 Subject: [petsc-users] DMPlex assembly global stiffness matrix Message-ID: Hi, I have a DM that has 2 fields , and field #1 has 2 dofs and field #2 has 1 dof. I only have dofs on vertex. Can I use the following to assemble global stiffness matrix instead of using MatSetClosure(I am not integrating 2 field separately) DMGetGlobalSection(dm,GlobalSection) For cells calculate element stiffness matrix eleMat For vertex in cells PetscSectionGetOffset(GlobalSection, vertex, offset) loc=[offset_v1, offset_v1+1, offset_v1+2, offset_v2, offset_v2+1.......] End MatSetValues(GlobalMat, n,loc,n,loc, eleMat, ADD_VALUES) End AssemblyBegin and End. Basically use the offset from global section to have the global dof number. 
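For reference, a minimal sketch of the closure-based assembly that the question above mentions as the alternative. The section handles follow the question, dm and ierr are assumed to exist, the element-matrix size is only illustrative (three dofs per vertex, as described), and whether the hand-computed-offset route works is addressed in the reply that follows:

```
/* Hedged sketch: per-cell assembly through DMPlexMatSetClosure, letting
   PETSc map closure dofs to global matrix rows and columns. */
Mat          K;
PetscSection section, globalSection;
PetscInt     cStart, cEnd, c;

ierr = DMGetSection(dm, &section);CHKERRQ(ierr);
ierr = DMGetGlobalSection(dm, &globalSection);CHKERRQ(ierr);
ierr = DMCreateMatrix(dm, &K);CHKERRQ(ierr);                         /* preallocated from the section layout */
ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr);  /* cells */
for (c = cStart; c < cEnd; ++c) {
  PetscScalar elemMat[9*9];   /* illustrative: 3 vertices x 3 dofs, ordered consistently with the closure */
  /* ... fill elemMat for cell c ... */
  ierr = DMPlexMatSetClosure(dm, section, globalSection, K, c, elemMat, ADD_VALUES);CHKERRQ(ierr);
}
ierr = MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
```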
Thanks, Josh -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Sat May 18 11:27:26 2019 From: mfadams at lbl.gov (Mark Adams) Date: Sat, 18 May 2019 12:27:26 -0400 Subject: [petsc-users] DMPlex assembly global stiffness matrix In-Reply-To: References: Message-ID: I don't think that will work. Offsets refer to the graph storage. On Fri, May 17, 2019 at 7:59 PM Josh L via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I have a DM that has 2 fields , and field #1 has 2 dofs and field #2 has 1 > dof. > I only have dofs on vertex. > > Can I use the following to assemble global stiffness matrix instead of > using MatSetClosure(I am not integrating 2 field separately) > > DMGetGlobalSection(dm,GlobalSection) > For cells > calculate element stiffness matrix eleMat > For vertex in cells > PetscSectionGetOffset(GlobalSection, vertex, offset) > loc=[offset_v1, offset_v1+1, offset_v1+2, offset_v2, > offset_v2+1.......] > End > MatSetValues(GlobalMat, n,loc,n,loc, eleMat, ADD_VALUES) > End > AssemblyBegin and End. > > Basically use the offset from global section to have the global dof number. > > Thanks, > Josh > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Sat May 18 21:44:23 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Sun, 19 May 2019 02:44:23 +0000 Subject: [petsc-users] petsc-3.11.2.tar.gz now available Message-ID: Dear PETSc users, The patch release petsc-3.11.2 is now available for download, with change list at 'PETSc-3.11 Changelog' http://www.mcs.anl.gov/petsc/download/index.html Satish From simon7412369 at gmail.com Sun May 19 08:21:23 2019 From: simon7412369 at gmail.com (=?UTF-8?B?6Zmz6bO06Kut?=) Date: Sun, 19 May 2019 21:21:23 +0800 Subject: [petsc-users] problem with generating simplicies mesh Message-ID: I have problem with generating simplicies mesh. I do as the description in DMPlexCreateBoxmesh says, but still meet error. The following is the error message: [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Argument out of range [0]PETSC ERROR: No grid generator of dimension 1 registered [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.11.1-723-g96d64d1 GIT Date: 2019-05-15 13:23:17 +0000 [0]PETSC ERROR: ./membrane on a arch-linux2-c-debug named simon-System-Product-Name by simon Sun May 19 20:54:54 2019 [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack [0]PETSC ERROR: #1 DMPlexGenerate() line 181 in /home/simon/petsc/src/dm/impls/plex/plexgenerate.c [0]PETSC ERROR: #2 DMPlexCreateBoxMesh_Simplex_Internal() line 536 in /home/simon/petsc/src/dm/impls/plex/plexcreate.c [0]PETSC ERROR: #3 DMPlexCreateBoxMesh() line 1071 in /home/simon/petsc/src/dm/impls/plex/plexcreate.c [0]PETSC ERROR: #4 main() line 54 in /home/simon/Downloads/membrane.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -dm_view [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=63 : system msg for write_line failure : Bad file descriptor I need some help about this, please. 
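For anyone hitting the same error, a minimal sketch of a 2D simplex box mesh as of the PETSc version used in this thread; the cell counts and domain bounds are illustrative, dm cleanup is omitted, and ierr is assumed to be declared. As the replies that follow point out, simplex generation is delegated to an external generator such as Triangle or TetGen, so PETSc must be configured with one:

```
/* Hedged sketch: a 2D simplex box mesh; requires a configured mesh generator
   (e.g. --download-triangle), as noted in the replies below. */
DM             dm;
PetscInt       faces[2] = {10, 10};              /* illustrative; NULL gives the default */
PetscReal      lower[2] = {0.0, 0.0};
PetscReal      upper[2] = {1.0, 1.0};
DMBoundaryType bd[2]    = {DM_BOUNDARY_NONE, DM_BOUNDARY_NONE};

ierr = DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 2, PETSC_TRUE /* simplex */, faces,
                           lower, upper, bd, PETSC_TRUE /* interpolate */, &dm);CHKERRQ(ierr);
ierr = DMSetFromOptions(dm);CHKERRQ(ierr);
ierr = DMViewFromOptions(dm, NULL, "-dm_view");CHKERRQ(ierr);
```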
Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: error_message.png Type: image/png Size: 536030 bytes Desc: not available URL: From mfadams at lbl.gov Sun May 19 11:31:31 2019 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 19 May 2019 12:31:31 -0400 Subject: [petsc-users] problem with generating simplicies mesh In-Reply-To: References: Message-ID: I would guess that you want 2 faces in each direction (the default so use NULL instead of faces). On Sun, May 19, 2019 at 9:23 AM ??? via petsc-users wrote: > I have problem with generating simplicies mesh. > I do as the description in DMPlexCreateBoxmesh says, but still meet error. > > The following is the error message: > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: No grid generator of dimension 1 registered > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.11.1-723-g96d64d1 GIT > Date: 2019-05-15 13:23:17 +0000 > [0]PETSC ERROR: ./membrane on a arch-linux2-c-debug named > simon-System-Product-Name by simon Sun May 19 20:54:54 2019 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-mpich --download-fblaslapack > [0]PETSC ERROR: #1 DMPlexGenerate() line 181 in > /home/simon/petsc/src/dm/impls/plex/plexgenerate.c > [0]PETSC ERROR: #2 DMPlexCreateBoxMesh_Simplex_Internal() line 536 in > /home/simon/petsc/src/dm/impls/plex/plexcreate.c > [0]PETSC ERROR: #3 DMPlexCreateBoxMesh() line 1071 in > /home/simon/petsc/src/dm/impls/plex/plexcreate.c > [0]PETSC ERROR: #4 main() line 54 in /home/simon/Downloads/membrane.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -dm_view > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=63 > : > system msg for write_line failure : Bad file descriptor > > > > I need some help about this, please. > > Simon > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Sun May 19 11:33:06 2019 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Sun, 19 May 2019 18:33:06 +0200 Subject: [petsc-users] problem with generating simplicies mesh In-Reply-To: References: Message-ID: You need a grid generator like triangle or tetgen. Reconfigure petsc adding -- download-triangle --download-ctetgen Il Dom 19 Mag 2019, 18:30 Mark Adams via petsc-users < petsc-users at mcs.anl.gov> ha scritto: > I would guess that you want 2 faces in each direction (the default so use > NULL instead of faces). > > On Sun, May 19, 2019 at 9:23 AM ??? via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> I have problem with generating simplicies mesh. >> I do as the description in DMPlexCreateBoxmesh says, but still meet error. 
>> >> The following is the error message: >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Argument out of range >> [0]PETSC ERROR: No grid generator of dimension 1 registered >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> [0]PETSC ERROR: Petsc Development GIT revision: v3.11.1-723-g96d64d1 GIT >> Date: 2019-05-15 13:23:17 +0000 >> [0]PETSC ERROR: ./membrane on a arch-linux2-c-debug named >> simon-System-Product-Name by simon Sun May 19 20:54:54 2019 >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ >> --with-fc=gfortran --download-mpich --download-fblaslapack >> [0]PETSC ERROR: #1 DMPlexGenerate() line 181 in >> /home/simon/petsc/src/dm/impls/plex/plexgenerate.c >> [0]PETSC ERROR: #2 DMPlexCreateBoxMesh_Simplex_Internal() line 536 in >> /home/simon/petsc/src/dm/impls/plex/plexcreate.c >> [0]PETSC ERROR: #3 DMPlexCreateBoxMesh() line 1071 in >> /home/simon/petsc/src/dm/impls/plex/plexcreate.c >> [0]PETSC ERROR: #4 main() line 54 in /home/simon/Downloads/membrane.c >> [0]PETSC ERROR: PETSc Option Table entries: >> [0]PETSC ERROR: -dm_view >> [0]PETSC ERROR: ----------------End of Error Message -------send entire >> error message to petsc-maint at mcs.anl.gov---------- >> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=63 >> : >> system msg for write_line failure : Bad file descriptor >> >> >> >> I need some help about this, please. >> >> Simon >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swarnava89 at gmail.com Sun May 19 18:34:41 2019 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Sun, 19 May 2019 16:34:41 -0700 Subject: [petsc-users] Creating a DMNetwork from a DMPlex Message-ID: Hi Petsc users and developers, I am trying to find a way of creating a DMNetwork from a DMPlex. I have read the DMPlex from a gmesh file and have it distributed. Thanks, SG -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun May 19 18:50:29 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Sun, 19 May 2019 23:50:29 +0000 Subject: [petsc-users] Creating a DMNetwork from a DMPlex In-Reply-To: References: Message-ID: This use case never occurred to us. Is the gmesh file containing a graph/network (as opposed to a mesh)? There seem two choices 1) if the gmesh file contains a graph/network one could write a gmesh reader for that case that reads directly for and constructs a DMNetwork or 2) write a converter for a DMPlex to DMNetwork. I lean toward the first Either way you need to understand the documentation for DMNetwork and how to build one up. Barry > On May 19, 2019, at 6:34 PM, Swarnava Ghosh via petsc-users wrote: > > Hi Petsc users and developers, > > I am trying to find a way of creating a DMNetwork from a DMPlex. I have read the DMPlex from a gmesh file and have it distributed. > > Thanks, > SG From bsmith at mcs.anl.gov Sun May 19 20:54:14 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 20 May 2019 01:54:14 +0000 Subject: [petsc-users] Creating a DMNetwork from a DMPlex In-Reply-To: References: Message-ID: <9DAFD49B-AB7F-435F-BB27-16EF946E1241@mcs.anl.gov> I am not sure you want DMNetwork, DMNetwork has no geometry; it only has vertices and edges. Vertices are connected to other vertices through the edges. 
For example I can't see how one would do vertex centered finite volume methods with DMNetwork.

   Maybe if you said something more about your planned discretization we could figure something out.

> On May 19, 2019, at 8:32 PM, Swarnava Ghosh wrote:
>
> Hi Barry,
>
> No, the gmesh file contains a mesh and not a graph/network.
> In that case, is it possible to create a DMNetwork first from the DMPlex and then distribute the DMNetwork?
>
> I have this case because I want a vertex partitioning of my mesh. Domain decomposition of DMPlex gives me cell partitioning. Essentially what I want is that no two processes can share a vertex BUT they can share an edge. Similar to how a DMDA is distributed.
>
> Thanks,
> Swarnava
>
> On Sun, May 19, 2019 at 4:50 PM Smith, Barry F. wrote:
>
> This use case never occurred to us. Is the gmesh file containing a graph/network (as opposed to a mesh)? There seem two choices
>
> 1) if the gmesh file contains a graph/network one could write a gmesh reader for that case that reads directly for and constructs a DMNetwork or
>
> 2) write a converter for a DMPlex to DMNetwork.
>
> I lean toward the first
>
> Either way you need to understand the documentation for DMNetwork and how to build one up.
>
>
> Barry
>
>
> > On May 19, 2019, at 6:34 PM, Swarnava Ghosh via petsc-users wrote:
> >
> > Hi Petsc users and developers,
> >
> > I am trying to find a way of creating a DMNetwork from a DMPlex. I have read the DMPlex from a gmesh file and have it distributed.
> >
> > Thanks,
> > SG
>

From davelee2804 at gmail.com  Mon May 20 01:11:22 2019
From: davelee2804 at gmail.com (Dave Lee)
Date: Mon, 20 May 2019 16:11:22 +1000
Subject: [petsc-users] Calling LAPACK routines from PETSc
Message-ID:

Hi PETSc,

I'm attempting to implement a "hookstep" for the SNES trust region solver.
Essentially what I'm trying to do is replace the solution of the least
squares problem at the end of each GMRES solve with a modified solution
whose norm is constrained to be within the size of the trust region.

In order to do this I need to perform an SVD on the Hessenberg matrix,
which, copying the function KSPComputeExtremeSingularValues(), I'm trying to
do by accessing the LAPACK function dgesvd() via the PetscStackCallBLAS()
machinery. One thing I'm confused about, however, is the ordering of the 2D
arrays into and out of this function, given that C and FORTRAN arrays
use reverse indexing, ie: C[j+1][i+1] = F[i,j].

Given that the Hessenberg matrix has k+1 rows and k columns, should I
still be initializing this as H[row][col] and passing this into
PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...))
or should I be transposing this before passing it in?

Also for the left and right singular vector matrices that are returned by
this function, should I be transposing these before I interpret them as C
arrays?

I've attached my modified version of gmres.c in case this is helpful. If
you grep for DRL (my initials) then you'll see my changes to the code.

Cheers, Dave.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gmres.c Type: application/octet-stream Size: 38311 bytes Desc: not available URL: From jed at jedbrown.org Mon May 20 01:24:31 2019 From: jed at jedbrown.org (Jed Brown) Date: Mon, 20 May 2019 00:24:31 -0600 Subject: [petsc-users] Calling LAPACK routines from PETSc In-Reply-To: References: Message-ID: <8736l9abj4.fsf@jedbrown.org> Dave Lee via petsc-users writes: > Hi Petsc, > > I'm attempting to implement a "hookstep" for the SNES trust region solver. > Essentially what I'm trying to do is replace the solution of the least > squares problem at the end of each GMRES solve with a modified solution > with a norm that is constrained to be within the size of the trust region. > > In order to do this I need to perform an SVD on the Hessenberg matrix, > which copying the function KSPComputeExtremeSingularValues(), I'm trying to > do by accessing the LAPACK function dgesvd() via the PetscStackCallBLAS() > machinery. One thing I'm confused about however is the ordering of the 2D > arrays into and out of this function, given that that C and FORTRAN arrays > use reverse indexing, ie: C[j+1][i+1] = F[i,j]. > > Given that the Hessenberg matrix has k+1 rows and k columns, should I be > still be initializing this as H[row][col] and passing this into > PetscStackCallBLAS("LAPACKgesvd",LAPACKgrsvd_(...)) > or should I be transposing this before passing it in? LAPACK terminology is with respect to Fortran ordering. There is a "leading dimension" parameter so that you can operate on non-contiguous blocks. See KSPComputeExtremeSingularValues_GMRES for an example. > Also for the left and right singular vector matrices that are returned by > this function, should I be transposing these before I interpret them as C > arrays? > > I've attached my modified version of gmres.c in case this is helpful. If > you grep for DRL (my initials) then you'll see my changes to the code. > > Cheers, Dave. > > /* > This file implements GMRES (a Generalized Minimal Residual) method. > Reference: Saad and Schultz, 1986. > > > Some comments on left vs. right preconditioning, and restarts. > Left and right preconditioning. > If right preconditioning is chosen, then the problem being solved > by gmres is actually > My = AB^-1 y = f > so the initial residual is > r = f - Mx > Note that B^-1 y = x or y = B x, and if x is non-zero, the initial > residual is > r = f - A x > The final solution is then > x = B^-1 y > > If left preconditioning is chosen, then the problem being solved is > My = B^-1 A x = B^-1 f, > and the initial residual is > r = B^-1(f - Ax) > > Restarts: Restarts are basically solves with x0 not equal to zero. > Note that we can eliminate an extra application of B^-1 between > restarts as long as we don't require that the solution at the end > of an unsuccessful gmres iteration always be the solution x. 
> */ > > #include <../src/ksp/ksp/impls/gmres/gmresimpl.h> /*I "petscksp.h" I*/ > #include // DRL > #define GMRES_DELTA_DIRECTIONS 10 > #define GMRES_DEFAULT_MAXK 30 > static PetscErrorCode KSPGMRESUpdateHessenberg(KSP,PetscInt,PetscBool,PetscReal*); > static PetscErrorCode KSPGMRESBuildSoln(PetscScalar*,Vec,Vec,KSP,PetscInt); > > PetscErrorCode KSPSetUp_GMRES(KSP ksp) > { > PetscInt hh,hes,rs,cc; > PetscErrorCode ierr; > PetscInt max_k,k; > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > PetscFunctionBegin; > max_k = gmres->max_k; /* restart size */ > hh = (max_k + 2) * (max_k + 1); > hes = (max_k + 1) * (max_k + 1); > rs = (max_k + 2); > cc = (max_k + 1); > > ierr = PetscCalloc5(hh,&gmres->hh_origin,hes,&gmres->hes_origin,rs,&gmres->rs_origin,cc,&gmres->cc_origin,cc,&gmres->ss_origin);CHKERRQ(ierr); > ierr = PetscLogObjectMemory((PetscObject)ksp,(hh + hes + rs + 2*cc)*sizeof(PetscScalar));CHKERRQ(ierr); > > if (ksp->calc_sings) { > /* Allocate workspace to hold Hessenberg matrix needed by lapack */ > ierr = PetscMalloc1((max_k + 3)*(max_k + 9),&gmres->Rsvd);CHKERRQ(ierr); > ierr = PetscLogObjectMemory((PetscObject)ksp,(max_k + 3)*(max_k + 9)*sizeof(PetscScalar));CHKERRQ(ierr); > ierr = PetscMalloc1(6*(max_k+2),&gmres->Dsvd);CHKERRQ(ierr); > ierr = PetscLogObjectMemory((PetscObject)ksp,6*(max_k+2)*sizeof(PetscReal));CHKERRQ(ierr); > } > > /* Allocate array to hold pointers to user vectors. Note that we need > 4 + max_k + 1 (since we need it+1 vectors, and it <= max_k) */ > gmres->vecs_allocated = VEC_OFFSET + 2 + max_k + gmres->nextra_vecs; > > ierr = PetscMalloc1(gmres->vecs_allocated,&gmres->vecs);CHKERRQ(ierr); > ierr = PetscMalloc1(VEC_OFFSET+2+max_k,&gmres->user_work);CHKERRQ(ierr); > ierr = PetscMalloc1(VEC_OFFSET+2+max_k,&gmres->mwork_alloc);CHKERRQ(ierr); > ierr = PetscLogObjectMemory((PetscObject)ksp,(VEC_OFFSET+2+max_k)*(sizeof(Vec*)+sizeof(PetscInt)) + gmres->vecs_allocated*sizeof(Vec));CHKERRQ(ierr); > > if (gmres->q_preallocate) { > gmres->vv_allocated = VEC_OFFSET + 2 + max_k; > > ierr = KSPCreateVecs(ksp,gmres->vv_allocated,&gmres->user_work[0],0,NULL);CHKERRQ(ierr); > ierr = PetscLogObjectParents(ksp,gmres->vv_allocated,gmres->user_work[0]);CHKERRQ(ierr); > > gmres->mwork_alloc[0] = gmres->vv_allocated; > gmres->nwork_alloc = 1; > for (k=0; kvv_allocated; k++) { > gmres->vecs[k] = gmres->user_work[0][k]; > } > } else { > gmres->vv_allocated = 5; > > ierr = KSPCreateVecs(ksp,5,&gmres->user_work[0],0,NULL);CHKERRQ(ierr); > ierr = PetscLogObjectParents(ksp,5,gmres->user_work[0]);CHKERRQ(ierr); > > gmres->mwork_alloc[0] = 5; > gmres->nwork_alloc = 1; > for (k=0; kvv_allocated; k++) { > gmres->vecs[k] = gmres->user_work[0][k]; > } > } > PetscFunctionReturn(0); > } > > /* > Run gmres, possibly with restart. Return residual history if requested. > input parameters: > > . gmres - structure containing parameters and work areas > > output parameters: > . nres - residuals (from preconditioned system) at each step. > If restarting, consider passing nres+it. If null, > ignored > . itcount - number of iterations used. nres[0] to nres[itcount] > are defined. If null, ignored. > > Notes: > On entry, the value in vector VEC_VV(0) should be the initial residual > (this allows shortcuts where the initial preconditioned residual is 0). 
> */ > PetscErrorCode KSPGMRESCycle(PetscInt *itcount,KSP ksp) > { > KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); > PetscReal res_norm,res,hapbnd,tt; > PetscErrorCode ierr; > PetscInt it = 0, max_k = gmres->max_k; > PetscBool hapend = PETSC_FALSE; > > PetscFunctionBegin; > if (itcount) *itcount = 0; > ierr = VecNormalize(VEC_VV(0),&res_norm);CHKERRQ(ierr); > KSPCheckNorm(ksp,res_norm); > res = res_norm; > *GRS(0) = res_norm; > > /* check for the convergence */ > ierr = PetscObjectSAWsTakeAccess((PetscObject)ksp);CHKERRQ(ierr); > ksp->rnorm = res; > ierr = PetscObjectSAWsGrantAccess((PetscObject)ksp);CHKERRQ(ierr); > gmres->it = (it - 1); > ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); > ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); > if (!res) { > ksp->reason = KSP_CONVERGED_ATOL; > ierr = PetscInfo(ksp,"Converged due to zero residual norm on entry\n");CHKERRQ(ierr); > PetscFunctionReturn(0); > } > > ierr = (*ksp->converged)(ksp,ksp->its,res,&ksp->reason,ksp->cnvP);CHKERRQ(ierr); > while (!ksp->reason && it < max_k && ksp->its < ksp->max_it) { > if (it) { > ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); > ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); > } > gmres->it = (it - 1); > if (gmres->vv_allocated <= it + VEC_OFFSET + 1) { > ierr = KSPGMRESGetNewVectors(ksp,it+1);CHKERRQ(ierr); > } > ierr = KSP_PCApplyBAorAB(ksp,VEC_VV(it),VEC_VV(1+it),VEC_TEMP_MATOP);CHKERRQ(ierr); > > /* update hessenberg matrix and do Gram-Schmidt */ > ierr = (*gmres->orthog)(ksp,it);CHKERRQ(ierr); > if (ksp->reason) break; > > /* vv(i+1) . vv(i+1) */ > ierr = VecNormalize(VEC_VV(it+1),&tt);CHKERRQ(ierr); > > /* save the magnitude */ > *HH(it+1,it) = tt; > *HES(it+1,it) = tt; > > /* check for the happy breakdown */ > hapbnd = PetscAbsScalar(tt / *GRS(it)); > if (hapbnd > gmres->haptol) hapbnd = gmres->haptol; > if (tt < hapbnd) { > ierr = PetscInfo2(ksp,"Detected happy breakdown, current hapbnd = %14.12e tt = %14.12e\n",(double)hapbnd,(double)tt);CHKERRQ(ierr); > hapend = PETSC_TRUE; > } > ierr = KSPGMRESUpdateHessenberg(ksp,it,hapend,&res);CHKERRQ(ierr); > > it++; > gmres->it = (it-1); /* For converged */ > ksp->its++; > ksp->rnorm = res; > if (ksp->reason) break; > > ierr = (*ksp->converged)(ksp,ksp->its,res,&ksp->reason,ksp->cnvP);CHKERRQ(ierr); > > /* Catch error in happy breakdown and signal convergence and break from loop */ > if (hapend) { > if (!ksp->reason) { > if (ksp->errorifnotconverged) SETERRQ1(PetscObjectComm((PetscObject)ksp),PETSC_ERR_NOT_CONVERGED,"You reached the happy break down, but convergence was not indicated. 
Residual norm = %g",(double)res); > else { > ksp->reason = KSP_DIVERGED_BREAKDOWN; > break; > } > } > } > } > > /* Monitor if we know that we will not return for a restart */ > if (it && (ksp->reason || ksp->its >= ksp->max_it)) { > ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); > ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); > } > > if (itcount) *itcount = it; > > > /* > Down here we have to solve for the "best" coefficients of the Krylov > columns, add the solution values together, and possibly unwind the > preconditioning from the solution > */ > /* Form the solution (or the solution so far) */ > ierr = KSPGMRESBuildSoln(GRS(0),ksp->vec_sol,ksp->vec_sol,ksp,it-1);CHKERRQ(ierr); > PetscFunctionReturn(0); > } > > PetscErrorCode KSPSolve_GMRES(KSP ksp) > { > PetscErrorCode ierr; > PetscInt its,itcount,i; > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > PetscBool guess_zero = ksp->guess_zero; > PetscInt N = gmres->max_k + 1; > PetscBLASInt bN; > > PetscFunctionBegin; > if (ksp->calc_sings && !gmres->Rsvd) SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ORDER,"Must call KSPSetComputeSingularValues() before KSPSetUp() is called"); > > ierr = PetscObjectSAWsTakeAccess((PetscObject)ksp);CHKERRQ(ierr); > ksp->its = 0; > ierr = PetscObjectSAWsGrantAccess((PetscObject)ksp);CHKERRQ(ierr); > > itcount = 0; > gmres->fullcycle = 0; > ksp->reason = KSP_CONVERGED_ITERATING; > while (!ksp->reason) { > ierr = KSPInitialResidual(ksp,ksp->vec_sol,VEC_TEMP,VEC_TEMP_MATOP,VEC_VV(0),ksp->vec_rhs);CHKERRQ(ierr); > ierr = KSPGMRESCycle(&its,ksp);CHKERRQ(ierr); > /* Store the Hessenberg matrix and the basis vectors of the Krylov subspace > if the cycle is complete for the computation of the Ritz pairs */ > if (its == gmres->max_k) { > gmres->fullcycle++; > if (ksp->calc_ritz) { > if (!gmres->hes_ritz) { > ierr = PetscMalloc1(N*N,&gmres->hes_ritz);CHKERRQ(ierr); > ierr = PetscLogObjectMemory((PetscObject)ksp,N*N*sizeof(PetscScalar));CHKERRQ(ierr); > ierr = VecDuplicateVecs(VEC_VV(0),N,&gmres->vecb);CHKERRQ(ierr); > } > ierr = PetscBLASIntCast(N,&bN);CHKERRQ(ierr); > ierr = PetscMemcpy(gmres->hes_ritz,gmres->hes_origin,bN*bN*sizeof(PetscReal));CHKERRQ(ierr); > for (i=0; imax_k+1; i++) { > ierr = VecCopy(VEC_VV(i),gmres->vecb[i]);CHKERRQ(ierr); > } > } > } > itcount += its; > if (itcount >= ksp->max_it) { > if (!ksp->reason) ksp->reason = KSP_DIVERGED_ITS; > break; > } > ksp->guess_zero = PETSC_FALSE; /* every future call to KSPInitialResidual() will have nonzero guess */ > } > ksp->guess_zero = guess_zero; /* restore if user provided nonzero initial guess */ > PetscFunctionReturn(0); > } > > PetscErrorCode KSPReset_GMRES(KSP ksp) > { > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > PetscErrorCode ierr; > PetscInt i; > > PetscFunctionBegin; > /* Free the Hessenberg matrices */ > ierr = PetscFree6(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin,gmres->hes_ritz);CHKERRQ(ierr); > > /* free work vectors */ > ierr = PetscFree(gmres->vecs);CHKERRQ(ierr); > for (i=0; inwork_alloc; i++) { > ierr = VecDestroyVecs(gmres->mwork_alloc[i],&gmres->user_work[i]);CHKERRQ(ierr); > } > gmres->nwork_alloc = 0; > if (gmres->vecb) { > ierr = VecDestroyVecs(gmres->max_k+1,&gmres->vecb);CHKERRQ(ierr); > } > > ierr = PetscFree(gmres->user_work);CHKERRQ(ierr); > ierr = PetscFree(gmres->mwork_alloc);CHKERRQ(ierr); > ierr = PetscFree(gmres->nrs);CHKERRQ(ierr); > ierr = VecDestroy(&gmres->sol_temp);CHKERRQ(ierr); > ierr = PetscFree(gmres->Rsvd);CHKERRQ(ierr); > ierr = 
PetscFree(gmres->Dsvd);CHKERRQ(ierr); > ierr = PetscFree(gmres->orthogwork);CHKERRQ(ierr); > > gmres->sol_temp = 0; > gmres->vv_allocated = 0; > gmres->vecs_allocated = 0; > gmres->sol_temp = 0; > PetscFunctionReturn(0); > } > > PetscErrorCode KSPDestroy_GMRES(KSP ksp) > { > PetscErrorCode ierr; > > PetscFunctionBegin; > ierr = KSPReset_GMRES(ksp);CHKERRQ(ierr); > ierr = PetscFree(ksp->data);CHKERRQ(ierr); > /* clear composed functions */ > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetPreAllocateVectors_C",NULL);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetOrthogonalization_C",NULL);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetOrthogonalization_C",NULL);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetRestart_C",NULL);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetRestart_C",NULL);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetHapTol_C",NULL);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetCGSRefinementType_C",NULL);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetCGSRefinementType_C",NULL);CHKERRQ(ierr); > PetscFunctionReturn(0); > } > /* > KSPGMRESBuildSoln - create the solution from the starting vector and the > current iterates. > > Input parameters: > nrs - work area of size it + 1. > vs - index of initial guess > vdest - index of result. Note that vs may == vdest (replace > guess with the solution). > > This is an internal routine that knows about the GMRES internals. > */ > static PetscErrorCode KSPGMRESBuildSoln(PetscScalar *nrs,Vec vs,Vec vdest,KSP ksp,PetscInt it) > { > PetscScalar tt; > PetscErrorCode ierr; > PetscInt ii,k,j; > KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); > > PetscFunctionBegin; > /* Solve for solution vector that minimizes the residual */ > > /* If it is < 0, no gmres steps have been performed */ > if (it < 0) { > ierr = VecCopy(vs,vdest);CHKERRQ(ierr); /* VecCopy() is smart, exists immediately if vguess == vdest */ > PetscFunctionReturn(0); > } > if (*HH(it,it) != 0.0) { > nrs[it] = *GRS(it) / *HH(it,it); > } else { > ksp->reason = KSP_DIVERGED_BREAKDOWN; > > ierr = PetscInfo2(ksp,"Likely your matrix or preconditioner is singular. HH(it,it) is identically zero; it = %D GRS(it) = %g\n",it,(double)PetscAbsScalar(*GRS(it)));CHKERRQ(ierr); > PetscFunctionReturn(0); > } > for (ii=1; ii<=it; ii++) { > k = it - ii; > tt = *GRS(k); > for (j=k+1; j<=it; j++) tt = tt - *HH(k,j) * nrs[j]; > if (*HH(k,k) == 0.0) { > ksp->reason = KSP_DIVERGED_BREAKDOWN; > > ierr = PetscInfo1(ksp,"Likely your matrix or preconditioner is singular. 
HH(k,k) is identically zero; k = %D\n",k);CHKERRQ(ierr); > PetscFunctionReturn(0); > } > nrs[k] = tt / *HH(k,k); > } > > /* Perform the hookstep correction - DRL */ > if(gmres->delta > 0.0 && gmres->it > 0) { // Apply the hookstep to correct the GMRES solution (if required) > printf("\t\tapplying hookstep: initial delta: %lf", gmres->delta); > PetscInt N = gmres->max_k+2, ii, jj, j0; > PetscBLASInt nRows, nCols, lwork, lierr; > PetscScalar *R, *work; > PetscReal* S; > PetscScalar *U, *VT, *p, *q, *y; > PetscScalar bnorm, mu, qMag, qMag2, delta2; > > ierr = PetscMalloc1((gmres->max_k + 3)*(gmres->max_k + 9),&R);CHKERRQ(ierr); > work = R + N*N; > ierr = PetscMalloc1(6*(gmres->max_k+2),&S);CHKERRQ(ierr); > > ierr = PetscBLASIntCast(gmres->it+1,&nRows);CHKERRQ(ierr); > ierr = PetscBLASIntCast(gmres->it+0,&nCols);CHKERRQ(ierr); > ierr = PetscBLASIntCast(5*N,&lwork);CHKERRQ(ierr); > //ierr = PetscMemcpy(R,gmres->hes_origin,(gmres->max_k+2)*(gmres->max_k+1)*sizeof(PetscScalar));CHKERRQ(ierr); > ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); > for (ii = 0; ii < nRows; ii++) { > for (jj = 0; jj < nCols; jj++) { > R[ii*nCols+jj] = *HH(ii,jj); > // Ensure Hessenberg structure > //if (ii > jj+1) R[ii*nCols+jj] = 0.0; > } > } > > ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); > ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); > ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); > ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); > ierr = PetscMalloc1(nRows,&y);CHKERRQ(ierr); > > printf("\n\n");for(ii=0;ii > // Perform an SVD on the Hessenberg matrix > ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); > PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows,&nCols,R,&nRows,S,U,&nRows,VT,&nCols,work,&lwork,&lierr)); > if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD Lapack routine %d",(int)lierr); > ierr = PetscFPTrapPop();CHKERRQ(ierr); > > // Compute p = ||b|| U^T e_1 > ierr = VecNorm(ksp->vec_rhs,NORM_2,&bnorm);CHKERRQ(ierr); > for (ii=0; ii p[ii] = bnorm*U[ii*nRows]; > } > > // Solve the root finding problem for \mu such that ||q|| < \delta (where \delta is the radius of the trust region) > // This step is largely copied from Ashley Willis' openpipeflow: doi.org/10.1016/j.softx.2017.05.003 > mu = S[nCols-1]*S[nCols-1]*1.0e-6; > if (mu < 1.0e-99) mu = 1.0e-99; > qMag = 1.0e+99; > > while (qMag > gmres->delta) { > mu *= 1.1; > qMag2 = 0.0; > for (ii=0; ii q[ii] = p[ii]*S[ii]/(mu + S[ii]*S[ii]); > qMag2 += q[ii]*q[ii]; > } > qMag = PetscSqrtScalar(qMag2); > } > > // Expand y in terms of the right singular vectors as y = V q > for (ii=0; ii y[ii] = 0.0; > for (jj=0; jj y[ii] += VT[jj*nCols+ii]*q[jj]; // transpose of the transpose > } > } > > // Recompute the size of the trust region, \delta > delta2 = 0.0; > for (ii=0; ii j0 = (ii < 2) ? 
0 : ii - 1; > p[ii] = 0.0; > for (jj=j0; jj p[ii] -= R[ii*nCols+jj]*y[jj]; > } > if (ii == 0) { > p[ii] += bnorm; > } > delta2 += p[ii]*p[ii]; > } > gmres->delta = PetscSqrtScalar(delta2); > printf("\t\t...final delta: %lf.\n", gmres->delta); > > // Pass the orthnomalized Krylov vector weights back out > for (ii=0; ii nrs[ii] = y[ii]; > } > > ierr = PetscFree(R);CHKERRQ(ierr); > ierr = PetscFree(S);CHKERRQ(ierr); > ierr = PetscFree(U);CHKERRQ(ierr); > ierr = PetscFree(VT);CHKERRQ(ierr); > ierr = PetscFree(p);CHKERRQ(ierr); > ierr = PetscFree(q);CHKERRQ(ierr); > ierr = PetscFree(y);CHKERRQ(ierr); > } > /*** DRL ***/ > > /* Accumulate the correction to the solution of the preconditioned problem in TEMP */ > ierr = VecSet(VEC_TEMP,0.0);CHKERRQ(ierr); > if (gmres->delta > 0.0) { > ierr = VecMAXPY(VEC_TEMP,it,nrs,&VEC_VV(0));CHKERRQ(ierr); // DRL > } else { > ierr = VecMAXPY(VEC_TEMP,it+1,nrs,&VEC_VV(0));CHKERRQ(ierr); > } > > ierr = KSPUnwindPreconditioner(ksp,VEC_TEMP,VEC_TEMP_MATOP);CHKERRQ(ierr); > /* add solution to previous solution */ > if (vdest != vs) { > ierr = VecCopy(vs,vdest);CHKERRQ(ierr); > } > ierr = VecAXPY(vdest,1.0,VEC_TEMP);CHKERRQ(ierr); > PetscFunctionReturn(0); > } > /* > Do the scalar work for the orthogonalization. Return new residual norm. > */ > static PetscErrorCode KSPGMRESUpdateHessenberg(KSP ksp,PetscInt it,PetscBool hapend,PetscReal *res) > { > PetscScalar *hh,*cc,*ss,tt; > PetscInt j; > KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); > > PetscFunctionBegin; > hh = HH(0,it); > cc = CC(0); > ss = SS(0); > > /* Apply all the previously computed plane rotations to the new column > of the Hessenberg matrix */ > for (j=1; j<=it; j++) { > tt = *hh; > *hh = PetscConj(*cc) * tt + *ss * *(hh+1); > hh++; > *hh = *cc++ * *hh - (*ss++ * tt); > } > > /* > compute the new plane rotation, and apply it to: > 1) the right-hand-side of the Hessenberg system > 2) the new column of the Hessenberg matrix > thus obtaining the updated value of the residual > */ > if (!hapend) { > tt = PetscSqrtScalar(PetscConj(*hh) * *hh + PetscConj(*(hh+1)) * *(hh+1)); > if (tt == 0.0) { > ksp->reason = KSP_DIVERGED_NULL; > PetscFunctionReturn(0); > } > *cc = *hh / tt; > *ss = *(hh+1) / tt; > *GRS(it+1) = -(*ss * *GRS(it)); > *GRS(it) = PetscConj(*cc) * *GRS(it); > *hh = PetscConj(*cc) * *hh + *ss * *(hh+1); > *res = PetscAbsScalar(*GRS(it+1)); > } else { > /* happy breakdown: HH(it+1, it) = 0, therfore we don't need to apply > another rotation matrix (so RH doesn't change). The new residual is > always the new sine term times the residual from last time (GRS(it)), > but now the new sine rotation would be zero...so the residual should > be zero...so we will multiply "zero" by the last residual. This might > not be exactly what we want to do here -could just return "zero". */ > > *res = 0.0; > } > PetscFunctionReturn(0); > } > /* > This routine allocates more work vectors, starting from VEC_VV(it). 
> */ > PetscErrorCode KSPGMRESGetNewVectors(KSP ksp,PetscInt it) > { > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > PetscErrorCode ierr; > PetscInt nwork = gmres->nwork_alloc,k,nalloc; > > PetscFunctionBegin; > nalloc = PetscMin(ksp->max_it,gmres->delta_allocate); > /* Adjust the number to allocate to make sure that we don't exceed the > number of available slots */ > if (it + VEC_OFFSET + nalloc >= gmres->vecs_allocated) { > nalloc = gmres->vecs_allocated - it - VEC_OFFSET; > } > if (!nalloc) PetscFunctionReturn(0); > > gmres->vv_allocated += nalloc; > > ierr = KSPCreateVecs(ksp,nalloc,&gmres->user_work[nwork],0,NULL);CHKERRQ(ierr); > ierr = PetscLogObjectParents(ksp,nalloc,gmres->user_work[nwork]);CHKERRQ(ierr); > > gmres->mwork_alloc[nwork] = nalloc; > for (k=0; k gmres->vecs[it+VEC_OFFSET+k] = gmres->user_work[nwork][k]; > } > gmres->nwork_alloc++; > PetscFunctionReturn(0); > } > > PetscErrorCode KSPBuildSolution_GMRES(KSP ksp,Vec ptr,Vec *result) > { > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > PetscErrorCode ierr; > > PetscFunctionBegin; > if (!ptr) { > if (!gmres->sol_temp) { > ierr = VecDuplicate(ksp->vec_sol,&gmres->sol_temp);CHKERRQ(ierr); > ierr = PetscLogObjectParent((PetscObject)ksp,(PetscObject)gmres->sol_temp);CHKERRQ(ierr); > } > ptr = gmres->sol_temp; > } > if (!gmres->nrs) { > /* allocate the work area */ > ierr = PetscMalloc1(gmres->max_k,&gmres->nrs);CHKERRQ(ierr); > ierr = PetscLogObjectMemory((PetscObject)ksp,gmres->max_k*sizeof(PetscScalar));CHKERRQ(ierr); > } > > ierr = KSPGMRESBuildSoln(gmres->nrs,ksp->vec_sol,ptr,ksp,gmres->it);CHKERRQ(ierr); > if (result) *result = ptr; > PetscFunctionReturn(0); > } > > PetscErrorCode KSPView_GMRES(KSP ksp,PetscViewer viewer) > { > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > const char *cstr; > PetscErrorCode ierr; > PetscBool iascii,isstring; > > PetscFunctionBegin; > ierr = PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERASCII,&iascii);CHKERRQ(ierr); > ierr = PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERSTRING,&isstring);CHKERRQ(ierr); > if (gmres->orthog == KSPGMRESClassicalGramSchmidtOrthogonalization) { > switch (gmres->cgstype) { > case (KSP_GMRES_CGS_REFINE_NEVER): > cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement"; > break; > case (KSP_GMRES_CGS_REFINE_ALWAYS): > cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization with one step of iterative refinement"; > break; > case (KSP_GMRES_CGS_REFINE_IFNEEDED): > cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization with one step of iterative refinement when needed"; > break; > default: > SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Unknown orthogonalization"); > } > } else if (gmres->orthog == KSPGMRESModifiedGramSchmidtOrthogonalization) { > cstr = "Modified Gram-Schmidt Orthogonalization"; > } else { > cstr = "unknown orthogonalization"; > } > if (iascii) { > ierr = PetscViewerASCIIPrintf(viewer," restart=%D, using %s\n",gmres->max_k,cstr);CHKERRQ(ierr); > ierr = PetscViewerASCIIPrintf(viewer," happy breakdown tolerance %g\n",(double)gmres->haptol);CHKERRQ(ierr); > } else if (isstring) { > ierr = PetscViewerStringSPrintf(viewer,"%s restart %D",cstr,gmres->max_k);CHKERRQ(ierr); > } > PetscFunctionReturn(0); > } > > /*@C > KSPGMRESMonitorKrylov - Calls VecView() for each new direction in the GMRES accumulated Krylov space. > > Collective on KSP > > Input Parameters: > + ksp - the KSP context > . its - iteration number > . 
fgnorm - 2-norm of residual (or gradient) > - dummy - an collection of viewers created with KSPViewerCreate() > > Options Database Keys: > . -ksp_gmres_kyrlov_monitor > > Notes: A new PETSCVIEWERDRAW is created for each Krylov vector so they can all be simultaneously viewed > Level: intermediate > > .keywords: KSP, nonlinear, vector, monitor, view, Krylov space > > .seealso: KSPMonitorSet(), KSPMonitorDefault(), VecView(), KSPViewersCreate(), KSPViewersDestroy() > @*/ > PetscErrorCode KSPGMRESMonitorKrylov(KSP ksp,PetscInt its,PetscReal fgnorm,void *dummy) > { > PetscViewers viewers = (PetscViewers)dummy; > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > PetscErrorCode ierr; > Vec x; > PetscViewer viewer; > PetscBool flg; > > PetscFunctionBegin; > ierr = PetscViewersGetViewer(viewers,gmres->it+1,&viewer);CHKERRQ(ierr); > ierr = PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERDRAW,&flg);CHKERRQ(ierr); > if (!flg) { > ierr = PetscViewerSetType(viewer,PETSCVIEWERDRAW);CHKERRQ(ierr); > ierr = PetscViewerDrawSetInfo(viewer,NULL,"Krylov GMRES Monitor",PETSC_DECIDE,PETSC_DECIDE,300,300);CHKERRQ(ierr); > } > x = VEC_VV(gmres->it+1); > ierr = VecView(x,viewer);CHKERRQ(ierr); > PetscFunctionReturn(0); > } > > PetscErrorCode KSPSetFromOptions_GMRES(PetscOptionItems *PetscOptionsObject,KSP ksp) > { > PetscErrorCode ierr; > PetscInt restart; > PetscReal haptol; > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > PetscBool flg; > > PetscFunctionBegin; > ierr = PetscOptionsHead(PetscOptionsObject,"KSP GMRES Options");CHKERRQ(ierr); > ierr = PetscOptionsInt("-ksp_gmres_restart","Number of Krylov search directions","KSPGMRESSetRestart",gmres->max_k,&restart,&flg);CHKERRQ(ierr); > if (flg) { ierr = KSPGMRESSetRestart(ksp,restart);CHKERRQ(ierr); } > ierr = PetscOptionsReal("-ksp_gmres_haptol","Tolerance for exact convergence (happy ending)","KSPGMRESSetHapTol",gmres->haptol,&haptol,&flg);CHKERRQ(ierr); > if (flg) { ierr = KSPGMRESSetHapTol(ksp,haptol);CHKERRQ(ierr); } > flg = PETSC_FALSE; > ierr = PetscOptionsBool("-ksp_gmres_preallocate","Preallocate Krylov vectors","KSPGMRESSetPreAllocateVectors",flg,&flg,NULL);CHKERRQ(ierr); > if (flg) {ierr = KSPGMRESSetPreAllocateVectors(ksp);CHKERRQ(ierr);} > ierr = PetscOptionsBoolGroupBegin("-ksp_gmres_classicalgramschmidt","Classical (unmodified) Gram-Schmidt (fast)","KSPGMRESSetOrthogonalization",&flg);CHKERRQ(ierr); > if (flg) {ierr = KSPGMRESSetOrthogonalization(ksp,KSPGMRESClassicalGramSchmidtOrthogonalization);CHKERRQ(ierr);} > ierr = PetscOptionsBoolGroupEnd("-ksp_gmres_modifiedgramschmidt","Modified Gram-Schmidt (slow,more stable)","KSPGMRESSetOrthogonalization",&flg);CHKERRQ(ierr); > if (flg) {ierr = KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization);CHKERRQ(ierr);} > ierr = PetscOptionsEnum("-ksp_gmres_cgs_refinement_type","Type of iterative refinement for classical (unmodified) Gram-Schmidt","KSPGMRESSetCGSRefinementType", > KSPGMRESCGSRefinementTypes,(PetscEnum)gmres->cgstype,(PetscEnum*)&gmres->cgstype,&flg);CHKERRQ(ierr); > flg = PETSC_FALSE; > ierr = PetscOptionsBool("-ksp_gmres_krylov_monitor","Plot the Krylov directions","KSPMonitorSet",flg,&flg,NULL);CHKERRQ(ierr); > if (flg) { > PetscViewers viewers; > ierr = PetscViewersCreate(PetscObjectComm((PetscObject)ksp),&viewers);CHKERRQ(ierr); > ierr = KSPMonitorSet(ksp,KSPGMRESMonitorKrylov,viewers,(PetscErrorCode (*)(void**))PetscViewersDestroy);CHKERRQ(ierr); > } > ierr = PetscOptionsTail();CHKERRQ(ierr); > PetscFunctionReturn(0); > } > > PetscErrorCode 
KSPGMRESSetHapTol_GMRES(KSP ksp,PetscReal tol) > { > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > PetscFunctionBegin; > if (tol < 0.0) SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Tolerance must be non-negative"); > gmres->haptol = tol; > PetscFunctionReturn(0); > } > > PetscErrorCode KSPGMRESGetRestart_GMRES(KSP ksp,PetscInt *max_k) > { > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > PetscFunctionBegin; > *max_k = gmres->max_k; > PetscFunctionReturn(0); > } > > PetscErrorCode KSPGMRESSetRestart_GMRES(KSP ksp,PetscInt max_k) > { > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > PetscErrorCode ierr; > > PetscFunctionBegin; > if (max_k < 1) SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Restart must be positive"); > if (!ksp->setupstage) { > gmres->max_k = max_k; > } else if (gmres->max_k != max_k) { > gmres->max_k = max_k; > ksp->setupstage = KSP_SETUP_NEW; > /* free the data structures, then create them again */ > ierr = KSPReset_GMRES(ksp);CHKERRQ(ierr); > } > PetscFunctionReturn(0); > } > > PetscErrorCode KSPGMRESSetOrthogonalization_GMRES(KSP ksp,FCN fcn) > { > PetscFunctionBegin; > ((KSP_GMRES*)ksp->data)->orthog = fcn; > PetscFunctionReturn(0); > } > > PetscErrorCode KSPGMRESGetOrthogonalization_GMRES(KSP ksp,FCN *fcn) > { > PetscFunctionBegin; > *fcn = ((KSP_GMRES*)ksp->data)->orthog; > PetscFunctionReturn(0); > } > > PetscErrorCode KSPGMRESSetPreAllocateVectors_GMRES(KSP ksp) > { > KSP_GMRES *gmres; > > PetscFunctionBegin; > gmres = (KSP_GMRES*)ksp->data; > gmres->q_preallocate = 1; > PetscFunctionReturn(0); > } > > PetscErrorCode KSPGMRESSetCGSRefinementType_GMRES(KSP ksp,KSPGMRESCGSRefinementType type) > { > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > PetscFunctionBegin; > gmres->cgstype = type; > PetscFunctionReturn(0); > } > > PetscErrorCode KSPGMRESGetCGSRefinementType_GMRES(KSP ksp,KSPGMRESCGSRefinementType *type) > { > KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > PetscFunctionBegin; > *type = gmres->cgstype; > PetscFunctionReturn(0); > } > > /*@ > KSPGMRESSetCGSRefinementType - Sets the type of iterative refinement to use > in the classical Gram Schmidt orthogonalization. > > Logically Collective on KSP > > Input Parameters: > + ksp - the Krylov space context > - type - the type of refinement > > Options Database: > . -ksp_gmres_cgs_refinement_type > > Level: intermediate > > .keywords: KSP, GMRES, iterative refinement > > .seealso: KSPGMRESSetOrthogonalization(), KSPGMRESCGSRefinementType, KSPGMRESClassicalGramSchmidtOrthogonalization(), KSPGMRESGetCGSRefinementType(), > KSPGMRESGetOrthogonalization() > @*/ > PetscErrorCode KSPGMRESSetCGSRefinementType(KSP ksp,KSPGMRESCGSRefinementType type) > { > PetscErrorCode ierr; > > PetscFunctionBegin; > PetscValidHeaderSpecific(ksp,KSP_CLASSID,1); > PetscValidLogicalCollectiveEnum(ksp,type,2); > ierr = PetscTryMethod(ksp,"KSPGMRESSetCGSRefinementType_C",(KSP,KSPGMRESCGSRefinementType),(ksp,type));CHKERRQ(ierr); > PetscFunctionReturn(0); > } > > /*@ > KSPGMRESGetCGSRefinementType - Gets the type of iterative refinement to use > in the classical Gram Schmidt orthogonalization. > > Not Collective > > Input Parameter: > . ksp - the Krylov space context > > Output Parameter: > . type - the type of refinement > > Options Database: > . 
-ksp_gmres_cgs_refinement_type > > Level: intermediate > > .keywords: KSP, GMRES, iterative refinement > > .seealso: KSPGMRESSetOrthogonalization(), KSPGMRESCGSRefinementType, KSPGMRESClassicalGramSchmidtOrthogonalization(), KSPGMRESSetCGSRefinementType(), > KSPGMRESGetOrthogonalization() > @*/ > PetscErrorCode KSPGMRESGetCGSRefinementType(KSP ksp,KSPGMRESCGSRefinementType *type) > { > PetscErrorCode ierr; > > PetscFunctionBegin; > PetscValidHeaderSpecific(ksp,KSP_CLASSID,1); > ierr = PetscUseMethod(ksp,"KSPGMRESGetCGSRefinementType_C",(KSP,KSPGMRESCGSRefinementType*),(ksp,type));CHKERRQ(ierr); > PetscFunctionReturn(0); > } > > > /*@ > KSPGMRESSetRestart - Sets number of iterations at which GMRES, FGMRES and LGMRES restarts. > > Logically Collective on KSP > > Input Parameters: > + ksp - the Krylov space context > - restart - integer restart value > > Options Database: > . -ksp_gmres_restart > > Note: The default value is 30. > > Level: intermediate > > .keywords: KSP, GMRES, restart, iterations > > .seealso: KSPSetTolerances(), KSPGMRESSetOrthogonalization(), KSPGMRESSetPreAllocateVectors(), KSPGMRESGetRestart() > @*/ > PetscErrorCode KSPGMRESSetRestart(KSP ksp, PetscInt restart) > { > PetscErrorCode ierr; > > PetscFunctionBegin; > PetscValidLogicalCollectiveInt(ksp,restart,2); > > ierr = PetscTryMethod(ksp,"KSPGMRESSetRestart_C",(KSP,PetscInt),(ksp,restart));CHKERRQ(ierr); > PetscFunctionReturn(0); > } > > /*@ > KSPGMRESGetRestart - Gets number of iterations at which GMRES, FGMRES and LGMRES restarts. > > Not Collective > > Input Parameter: > . ksp - the Krylov space context > > Output Parameter: > . restart - integer restart value > > Note: The default value is 30. > > Level: intermediate > > .keywords: KSP, GMRES, restart, iterations > > .seealso: KSPSetTolerances(), KSPGMRESSetOrthogonalization(), KSPGMRESSetPreAllocateVectors(), KSPGMRESSetRestart() > @*/ > PetscErrorCode KSPGMRESGetRestart(KSP ksp, PetscInt *restart) > { > PetscErrorCode ierr; > > PetscFunctionBegin; > ierr = PetscUseMethod(ksp,"KSPGMRESGetRestart_C",(KSP,PetscInt*),(ksp,restart));CHKERRQ(ierr); > PetscFunctionReturn(0); > } > > /*@ > KSPGMRESSetHapTol - Sets tolerance for determining happy breakdown in GMRES, FGMRES and LGMRES. > > Logically Collective on KSP > > Input Parameters: > + ksp - the Krylov space context > - tol - the tolerance > > Options Database: > . -ksp_gmres_haptol > > Note: Happy breakdown is the rare case in GMRES where an 'exact' solution is obtained after > a certain number of iterations. If you attempt more iterations after this point unstable > things can happen hence very occasionally you may need to set this value to detect this condition > > Level: intermediate > > .keywords: KSP, GMRES, tolerance > > .seealso: KSPSetTolerances() > @*/ > PetscErrorCode KSPGMRESSetHapTol(KSP ksp,PetscReal tol) > { > PetscErrorCode ierr; > > PetscFunctionBegin; > PetscValidLogicalCollectiveReal(ksp,tol,2); > ierr = PetscTryMethod((ksp),"KSPGMRESSetHapTol_C",(KSP,PetscReal),((ksp),(tol)));CHKERRQ(ierr); > PetscFunctionReturn(0); > } > > /*MC > KSPGMRES - Implements the Generalized Minimal Residual method. > (Saad and Schultz, 1986) with restart > > > Options Database Keys: > + -ksp_gmres_restart - the number of Krylov directions to orthogonalize against > . -ksp_gmres_haptol - sets the tolerance for "happy ending" (exact convergence) > . -ksp_gmres_preallocate - preallocate all the Krylov search directions initially (otherwise groups of > vectors are allocated as needed) > . 
-ksp_gmres_classicalgramschmidt - use classical (unmodified) Gram-Schmidt to orthogonalize against the Krylov space (fast) (the default) > . -ksp_gmres_modifiedgramschmidt - use modified Gram-Schmidt in the orthogonalization (more stable, but slower) > . -ksp_gmres_cgs_refinement_type - determine if iterative refinement is used to increase the > stability of the classical Gram-Schmidt orthogonalization. > - -ksp_gmres_krylov_monitor - plot the Krylov space generated > > Level: beginner > > Notes: Left and right preconditioning are supported, but not symmetric preconditioning. > > References: > . 1. - YOUCEF SAAD AND MARTIN H. SCHULTZ, GMRES: A GENERALIZED MINIMAL RESIDUAL ALGORITHM FOR SOLVING NONSYMMETRIC LINEAR SYSTEMS. > SIAM J. ScI. STAT. COMPUT. Vo|. 7, No. 3, July 1986. > > .seealso: KSPCreate(), KSPSetType(), KSPType (for list of available types), KSP, KSPFGMRES, KSPLGMRES, > KSPGMRESSetRestart(), KSPGMRESSetHapTol(), KSPGMRESSetPreAllocateVectors(), KSPGMRESSetOrthogonalization(), KSPGMRESGetOrthogonalization(), > KSPGMRESClassicalGramSchmidtOrthogonalization(), KSPGMRESModifiedGramSchmidtOrthogonalization(), > KSPGMRESCGSRefinementType, KSPGMRESSetCGSRefinementType(), KSPGMRESGetCGSRefinementType(), KSPGMRESMonitorKrylov(), KSPSetPCSide() > > M*/ > > PETSC_EXTERN PetscErrorCode KSPCreate_GMRES(KSP ksp) > { > KSP_GMRES *gmres; > PetscErrorCode ierr; > > PetscFunctionBegin; > ierr = PetscNewLog(ksp,&gmres);CHKERRQ(ierr); > ksp->data = (void*)gmres; > > ierr = KSPSetSupportedNorm(ksp,KSP_NORM_PRECONDITIONED,PC_LEFT,4);CHKERRQ(ierr); > ierr = KSPSetSupportedNorm(ksp,KSP_NORM_UNPRECONDITIONED,PC_RIGHT,3);CHKERRQ(ierr); > ierr = KSPSetSupportedNorm(ksp,KSP_NORM_PRECONDITIONED,PC_SYMMETRIC,2);CHKERRQ(ierr); > ierr = KSPSetSupportedNorm(ksp,KSP_NORM_NONE,PC_RIGHT,1);CHKERRQ(ierr); > ierr = KSPSetSupportedNorm(ksp,KSP_NORM_NONE,PC_LEFT,1);CHKERRQ(ierr); > > ksp->ops->buildsolution = KSPBuildSolution_GMRES; > ksp->ops->setup = KSPSetUp_GMRES; > ksp->ops->solve = KSPSolve_GMRES; > ksp->ops->reset = KSPReset_GMRES; > ksp->ops->destroy = KSPDestroy_GMRES; > ksp->ops->view = KSPView_GMRES; > ksp->ops->setfromoptions = KSPSetFromOptions_GMRES; > ksp->ops->computeextremesingularvalues = KSPComputeExtremeSingularValues_GMRES; > ksp->ops->computeeigenvalues = KSPComputeEigenvalues_GMRES; > #if !defined(PETSC_USE_COMPLEX) && !defined(PETSC_HAVE_ESSL) > ksp->ops->computeritz = KSPComputeRitz_GMRES; > #endif > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetPreAllocateVectors_C",KSPGMRESSetPreAllocateVectors_GMRES);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetOrthogonalization_C",KSPGMRESSetOrthogonalization_GMRES);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetOrthogonalization_C",KSPGMRESGetOrthogonalization_GMRES);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetRestart_C",KSPGMRESSetRestart_GMRES);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetRestart_C",KSPGMRESGetRestart_GMRES);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetHapTol_C",KSPGMRESSetHapTol_GMRES);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetCGSRefinementType_C",KSPGMRESSetCGSRefinementType_GMRES);CHKERRQ(ierr); > ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetCGSRefinementType_C",KSPGMRESGetCGSRefinementType_GMRES);CHKERRQ(ierr); > > gmres->haptol = 1.0e-30; > gmres->q_preallocate = 0; > 
gmres->delta_allocate = GMRES_DELTA_DIRECTIONS; > gmres->orthog = KSPGMRESClassicalGramSchmidtOrthogonalization; > gmres->nrs = 0; > gmres->sol_temp = 0; > gmres->max_k = GMRES_DEFAULT_MAXK; > gmres->Rsvd = 0; > gmres->cgstype = KSP_GMRES_CGS_REFINE_NEVER; > gmres->orthogwork = 0; > gmres->delta = -1.0; // DRL > PetscFunctionReturn(0); > } From bsmith at mcs.anl.gov Mon May 20 01:34:04 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 20 May 2019 06:34:04 +0000 Subject: [petsc-users] Calling LAPACK routines from PETSc In-Reply-To: <8736l9abj4.fsf@jedbrown.org> References: <8736l9abj4.fsf@jedbrown.org> Message-ID: <8D37944A-6C37-48D0-B238-4E9806D93B7E@anl.gov> The little work arrays in GMRES tend to be stored in Fortran ordering; there is no C style p[][] indexing into such arrays. Thus the arrays can safely be sent to LAPACK. The only trick is knowing the two dimensions and as Jed say the "leading dimension parameter. He gave you a place to look > On May 20, 2019, at 1:24 AM, Jed Brown via petsc-users wrote: > > Dave Lee via petsc-users writes: > >> Hi Petsc, >> >> I'm attempting to implement a "hookstep" for the SNES trust region solver. >> Essentially what I'm trying to do is replace the solution of the least >> squares problem at the end of each GMRES solve with a modified solution >> with a norm that is constrained to be within the size of the trust region. >> >> In order to do this I need to perform an SVD on the Hessenberg matrix, >> which copying the function KSPComputeExtremeSingularValues(), I'm trying to >> do by accessing the LAPACK function dgesvd() via the PetscStackCallBLAS() >> machinery. One thing I'm confused about however is the ordering of the 2D >> arrays into and out of this function, given that that C and FORTRAN arrays >> use reverse indexing, ie: C[j+1][i+1] = F[i,j]. >> >> Given that the Hessenberg matrix has k+1 rows and k columns, should I be >> still be initializing this as H[row][col] and passing this into >> PetscStackCallBLAS("LAPACKgesvd",LAPACKgrsvd_(...)) >> or should I be transposing this before passing it in? > > LAPACK terminology is with respect to Fortran ordering. There is a > "leading dimension" parameter so that you can operate on non-contiguous > blocks. See KSPComputeExtremeSingularValues_GMRES for an example. > >> Also for the left and right singular vector matrices that are returned by >> this function, should I be transposing these before I interpret them as C >> arrays? >> >> I've attached my modified version of gmres.c in case this is helpful. If >> you grep for DRL (my initials) then you'll see my changes to the code. >> >> Cheers, Dave. >> >> /* >> This file implements GMRES (a Generalized Minimal Residual) method. >> Reference: Saad and Schultz, 1986. >> >> >> Some comments on left vs. right preconditioning, and restarts. >> Left and right preconditioning. >> If right preconditioning is chosen, then the problem being solved >> by gmres is actually >> My = AB^-1 y = f >> so the initial residual is >> r = f - Mx >> Note that B^-1 y = x or y = B x, and if x is non-zero, the initial >> residual is >> r = f - A x >> The final solution is then >> x = B^-1 y >> >> If left preconditioning is chosen, then the problem being solved is >> My = B^-1 A x = B^-1 f, >> and the initial residual is >> r = B^-1(f - Ax) >> >> Restarts: Restarts are basically solves with x0 not equal to zero. 
>> Note that we can eliminate an extra application of B^-1 between >> restarts as long as we don't require that the solution at the end >> of an unsuccessful gmres iteration always be the solution x. >> */ >> >> #include <../src/ksp/ksp/impls/gmres/gmresimpl.h> /*I "petscksp.h" I*/ >> #include // DRL >> #define GMRES_DELTA_DIRECTIONS 10 >> #define GMRES_DEFAULT_MAXK 30 >> static PetscErrorCode KSPGMRESUpdateHessenberg(KSP,PetscInt,PetscBool,PetscReal*); >> static PetscErrorCode KSPGMRESBuildSoln(PetscScalar*,Vec,Vec,KSP,PetscInt); >> >> PetscErrorCode KSPSetUp_GMRES(KSP ksp) >> { >> PetscInt hh,hes,rs,cc; >> PetscErrorCode ierr; >> PetscInt max_k,k; >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> >> PetscFunctionBegin; >> max_k = gmres->max_k; /* restart size */ >> hh = (max_k + 2) * (max_k + 1); >> hes = (max_k + 1) * (max_k + 1); >> rs = (max_k + 2); >> cc = (max_k + 1); >> >> ierr = PetscCalloc5(hh,&gmres->hh_origin,hes,&gmres->hes_origin,rs,&gmres->rs_origin,cc,&gmres->cc_origin,cc,&gmres->ss_origin);CHKERRQ(ierr); >> ierr = PetscLogObjectMemory((PetscObject)ksp,(hh + hes + rs + 2*cc)*sizeof(PetscScalar));CHKERRQ(ierr); >> >> if (ksp->calc_sings) { >> /* Allocate workspace to hold Hessenberg matrix needed by lapack */ >> ierr = PetscMalloc1((max_k + 3)*(max_k + 9),&gmres->Rsvd);CHKERRQ(ierr); >> ierr = PetscLogObjectMemory((PetscObject)ksp,(max_k + 3)*(max_k + 9)*sizeof(PetscScalar));CHKERRQ(ierr); >> ierr = PetscMalloc1(6*(max_k+2),&gmres->Dsvd);CHKERRQ(ierr); >> ierr = PetscLogObjectMemory((PetscObject)ksp,6*(max_k+2)*sizeof(PetscReal));CHKERRQ(ierr); >> } >> >> /* Allocate array to hold pointers to user vectors. Note that we need >> 4 + max_k + 1 (since we need it+1 vectors, and it <= max_k) */ >> gmres->vecs_allocated = VEC_OFFSET + 2 + max_k + gmres->nextra_vecs; >> >> ierr = PetscMalloc1(gmres->vecs_allocated,&gmres->vecs);CHKERRQ(ierr); >> ierr = PetscMalloc1(VEC_OFFSET+2+max_k,&gmres->user_work);CHKERRQ(ierr); >> ierr = PetscMalloc1(VEC_OFFSET+2+max_k,&gmres->mwork_alloc);CHKERRQ(ierr); >> ierr = PetscLogObjectMemory((PetscObject)ksp,(VEC_OFFSET+2+max_k)*(sizeof(Vec*)+sizeof(PetscInt)) + gmres->vecs_allocated*sizeof(Vec));CHKERRQ(ierr); >> >> if (gmres->q_preallocate) { >> gmres->vv_allocated = VEC_OFFSET + 2 + max_k; >> >> ierr = KSPCreateVecs(ksp,gmres->vv_allocated,&gmres->user_work[0],0,NULL);CHKERRQ(ierr); >> ierr = PetscLogObjectParents(ksp,gmres->vv_allocated,gmres->user_work[0]);CHKERRQ(ierr); >> >> gmres->mwork_alloc[0] = gmres->vv_allocated; >> gmres->nwork_alloc = 1; >> for (k=0; kvv_allocated; k++) { >> gmres->vecs[k] = gmres->user_work[0][k]; >> } >> } else { >> gmres->vv_allocated = 5; >> >> ierr = KSPCreateVecs(ksp,5,&gmres->user_work[0],0,NULL);CHKERRQ(ierr); >> ierr = PetscLogObjectParents(ksp,5,gmres->user_work[0]);CHKERRQ(ierr); >> >> gmres->mwork_alloc[0] = 5; >> gmres->nwork_alloc = 1; >> for (k=0; kvv_allocated; k++) { >> gmres->vecs[k] = gmres->user_work[0][k]; >> } >> } >> PetscFunctionReturn(0); >> } >> >> /* >> Run gmres, possibly with restart. Return residual history if requested. >> input parameters: >> >> . gmres - structure containing parameters and work areas >> >> output parameters: >> . nres - residuals (from preconditioned system) at each step. >> If restarting, consider passing nres+it. If null, >> ignored >> . itcount - number of iterations used. nres[0] to nres[itcount] >> are defined. If null, ignored. 
>> >> Notes: >> On entry, the value in vector VEC_VV(0) should be the initial residual >> (this allows shortcuts where the initial preconditioned residual is 0). >> */ >> PetscErrorCode KSPGMRESCycle(PetscInt *itcount,KSP ksp) >> { >> KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); >> PetscReal res_norm,res,hapbnd,tt; >> PetscErrorCode ierr; >> PetscInt it = 0, max_k = gmres->max_k; >> PetscBool hapend = PETSC_FALSE; >> >> PetscFunctionBegin; >> if (itcount) *itcount = 0; >> ierr = VecNormalize(VEC_VV(0),&res_norm);CHKERRQ(ierr); >> KSPCheckNorm(ksp,res_norm); >> res = res_norm; >> *GRS(0) = res_norm; >> >> /* check for the convergence */ >> ierr = PetscObjectSAWsTakeAccess((PetscObject)ksp);CHKERRQ(ierr); >> ksp->rnorm = res; >> ierr = PetscObjectSAWsGrantAccess((PetscObject)ksp);CHKERRQ(ierr); >> gmres->it = (it - 1); >> ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); >> ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); >> if (!res) { >> ksp->reason = KSP_CONVERGED_ATOL; >> ierr = PetscInfo(ksp,"Converged due to zero residual norm on entry\n");CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> >> ierr = (*ksp->converged)(ksp,ksp->its,res,&ksp->reason,ksp->cnvP);CHKERRQ(ierr); >> while (!ksp->reason && it < max_k && ksp->its < ksp->max_it) { >> if (it) { >> ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); >> ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); >> } >> gmres->it = (it - 1); >> if (gmres->vv_allocated <= it + VEC_OFFSET + 1) { >> ierr = KSPGMRESGetNewVectors(ksp,it+1);CHKERRQ(ierr); >> } >> ierr = KSP_PCApplyBAorAB(ksp,VEC_VV(it),VEC_VV(1+it),VEC_TEMP_MATOP);CHKERRQ(ierr); >> >> /* update hessenberg matrix and do Gram-Schmidt */ >> ierr = (*gmres->orthog)(ksp,it);CHKERRQ(ierr); >> if (ksp->reason) break; >> >> /* vv(i+1) . vv(i+1) */ >> ierr = VecNormalize(VEC_VV(it+1),&tt);CHKERRQ(ierr); >> >> /* save the magnitude */ >> *HH(it+1,it) = tt; >> *HES(it+1,it) = tt; >> >> /* check for the happy breakdown */ >> hapbnd = PetscAbsScalar(tt / *GRS(it)); >> if (hapbnd > gmres->haptol) hapbnd = gmres->haptol; >> if (tt < hapbnd) { >> ierr = PetscInfo2(ksp,"Detected happy breakdown, current hapbnd = %14.12e tt = %14.12e\n",(double)hapbnd,(double)tt);CHKERRQ(ierr); >> hapend = PETSC_TRUE; >> } >> ierr = KSPGMRESUpdateHessenberg(ksp,it,hapend,&res);CHKERRQ(ierr); >> >> it++; >> gmres->it = (it-1); /* For converged */ >> ksp->its++; >> ksp->rnorm = res; >> if (ksp->reason) break; >> >> ierr = (*ksp->converged)(ksp,ksp->its,res,&ksp->reason,ksp->cnvP);CHKERRQ(ierr); >> >> /* Catch error in happy breakdown and signal convergence and break from loop */ >> if (hapend) { >> if (!ksp->reason) { >> if (ksp->errorifnotconverged) SETERRQ1(PetscObjectComm((PetscObject)ksp),PETSC_ERR_NOT_CONVERGED,"You reached the happy break down, but convergence was not indicated. 
Residual norm = %g",(double)res); >> else { >> ksp->reason = KSP_DIVERGED_BREAKDOWN; >> break; >> } >> } >> } >> } >> >> /* Monitor if we know that we will not return for a restart */ >> if (it && (ksp->reason || ksp->its >= ksp->max_it)) { >> ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); >> ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); >> } >> >> if (itcount) *itcount = it; >> >> >> /* >> Down here we have to solve for the "best" coefficients of the Krylov >> columns, add the solution values together, and possibly unwind the >> preconditioning from the solution >> */ >> /* Form the solution (or the solution so far) */ >> ierr = KSPGMRESBuildSoln(GRS(0),ksp->vec_sol,ksp->vec_sol,ksp,it-1);CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPSolve_GMRES(KSP ksp) >> { >> PetscErrorCode ierr; >> PetscInt its,itcount,i; >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> PetscBool guess_zero = ksp->guess_zero; >> PetscInt N = gmres->max_k + 1; >> PetscBLASInt bN; >> >> PetscFunctionBegin; >> if (ksp->calc_sings && !gmres->Rsvd) SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ORDER,"Must call KSPSetComputeSingularValues() before KSPSetUp() is called"); >> >> ierr = PetscObjectSAWsTakeAccess((PetscObject)ksp);CHKERRQ(ierr); >> ksp->its = 0; >> ierr = PetscObjectSAWsGrantAccess((PetscObject)ksp);CHKERRQ(ierr); >> >> itcount = 0; >> gmres->fullcycle = 0; >> ksp->reason = KSP_CONVERGED_ITERATING; >> while (!ksp->reason) { >> ierr = KSPInitialResidual(ksp,ksp->vec_sol,VEC_TEMP,VEC_TEMP_MATOP,VEC_VV(0),ksp->vec_rhs);CHKERRQ(ierr); >> ierr = KSPGMRESCycle(&its,ksp);CHKERRQ(ierr); >> /* Store the Hessenberg matrix and the basis vectors of the Krylov subspace >> if the cycle is complete for the computation of the Ritz pairs */ >> if (its == gmres->max_k) { >> gmres->fullcycle++; >> if (ksp->calc_ritz) { >> if (!gmres->hes_ritz) { >> ierr = PetscMalloc1(N*N,&gmres->hes_ritz);CHKERRQ(ierr); >> ierr = PetscLogObjectMemory((PetscObject)ksp,N*N*sizeof(PetscScalar));CHKERRQ(ierr); >> ierr = VecDuplicateVecs(VEC_VV(0),N,&gmres->vecb);CHKERRQ(ierr); >> } >> ierr = PetscBLASIntCast(N,&bN);CHKERRQ(ierr); >> ierr = PetscMemcpy(gmres->hes_ritz,gmres->hes_origin,bN*bN*sizeof(PetscReal));CHKERRQ(ierr); >> for (i=0; imax_k+1; i++) { >> ierr = VecCopy(VEC_VV(i),gmres->vecb[i]);CHKERRQ(ierr); >> } >> } >> } >> itcount += its; >> if (itcount >= ksp->max_it) { >> if (!ksp->reason) ksp->reason = KSP_DIVERGED_ITS; >> break; >> } >> ksp->guess_zero = PETSC_FALSE; /* every future call to KSPInitialResidual() will have nonzero guess */ >> } >> ksp->guess_zero = guess_zero; /* restore if user provided nonzero initial guess */ >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPReset_GMRES(KSP ksp) >> { >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> PetscErrorCode ierr; >> PetscInt i; >> >> PetscFunctionBegin; >> /* Free the Hessenberg matrices */ >> ierr = PetscFree6(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin,gmres->hes_ritz);CHKERRQ(ierr); >> >> /* free work vectors */ >> ierr = PetscFree(gmres->vecs);CHKERRQ(ierr); >> for (i=0; inwork_alloc; i++) { >> ierr = VecDestroyVecs(gmres->mwork_alloc[i],&gmres->user_work[i]);CHKERRQ(ierr); >> } >> gmres->nwork_alloc = 0; >> if (gmres->vecb) { >> ierr = VecDestroyVecs(gmres->max_k+1,&gmres->vecb);CHKERRQ(ierr); >> } >> >> ierr = PetscFree(gmres->user_work);CHKERRQ(ierr); >> ierr = PetscFree(gmres->mwork_alloc);CHKERRQ(ierr); >> ierr = PetscFree(gmres->nrs);CHKERRQ(ierr); >> ierr = 
VecDestroy(&gmres->sol_temp);CHKERRQ(ierr); >> ierr = PetscFree(gmres->Rsvd);CHKERRQ(ierr); >> ierr = PetscFree(gmres->Dsvd);CHKERRQ(ierr); >> ierr = PetscFree(gmres->orthogwork);CHKERRQ(ierr); >> >> gmres->sol_temp = 0; >> gmres->vv_allocated = 0; >> gmres->vecs_allocated = 0; >> gmres->sol_temp = 0; >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPDestroy_GMRES(KSP ksp) >> { >> PetscErrorCode ierr; >> >> PetscFunctionBegin; >> ierr = KSPReset_GMRES(ksp);CHKERRQ(ierr); >> ierr = PetscFree(ksp->data);CHKERRQ(ierr); >> /* clear composed functions */ >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetPreAllocateVectors_C",NULL);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetOrthogonalization_C",NULL);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetOrthogonalization_C",NULL);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetRestart_C",NULL);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetRestart_C",NULL);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetHapTol_C",NULL);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetCGSRefinementType_C",NULL);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetCGSRefinementType_C",NULL);CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> /* >> KSPGMRESBuildSoln - create the solution from the starting vector and the >> current iterates. >> >> Input parameters: >> nrs - work area of size it + 1. >> vs - index of initial guess >> vdest - index of result. Note that vs may == vdest (replace >> guess with the solution). >> >> This is an internal routine that knows about the GMRES internals. >> */ >> static PetscErrorCode KSPGMRESBuildSoln(PetscScalar *nrs,Vec vs,Vec vdest,KSP ksp,PetscInt it) >> { >> PetscScalar tt; >> PetscErrorCode ierr; >> PetscInt ii,k,j; >> KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); >> >> PetscFunctionBegin; >> /* Solve for solution vector that minimizes the residual */ >> >> /* If it is < 0, no gmres steps have been performed */ >> if (it < 0) { >> ierr = VecCopy(vs,vdest);CHKERRQ(ierr); /* VecCopy() is smart, exists immediately if vguess == vdest */ >> PetscFunctionReturn(0); >> } >> if (*HH(it,it) != 0.0) { >> nrs[it] = *GRS(it) / *HH(it,it); >> } else { >> ksp->reason = KSP_DIVERGED_BREAKDOWN; >> >> ierr = PetscInfo2(ksp,"Likely your matrix or preconditioner is singular. HH(it,it) is identically zero; it = %D GRS(it) = %g\n",it,(double)PetscAbsScalar(*GRS(it)));CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> for (ii=1; ii<=it; ii++) { >> k = it - ii; >> tt = *GRS(k); >> for (j=k+1; j<=it; j++) tt = tt - *HH(k,j) * nrs[j]; >> if (*HH(k,k) == 0.0) { >> ksp->reason = KSP_DIVERGED_BREAKDOWN; >> >> ierr = PetscInfo1(ksp,"Likely your matrix or preconditioner is singular. 
HH(k,k) is identically zero; k = %D\n",k);CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> nrs[k] = tt / *HH(k,k); >> } >> >> /* Perform the hookstep correction - DRL */ >> if(gmres->delta > 0.0 && gmres->it > 0) { // Apply the hookstep to correct the GMRES solution (if required) >> printf("\t\tapplying hookstep: initial delta: %lf", gmres->delta); >> PetscInt N = gmres->max_k+2, ii, jj, j0; >> PetscBLASInt nRows, nCols, lwork, lierr; >> PetscScalar *R, *work; >> PetscReal* S; >> PetscScalar *U, *VT, *p, *q, *y; >> PetscScalar bnorm, mu, qMag, qMag2, delta2; >> >> ierr = PetscMalloc1((gmres->max_k + 3)*(gmres->max_k + 9),&R);CHKERRQ(ierr); >> work = R + N*N; >> ierr = PetscMalloc1(6*(gmres->max_k+2),&S);CHKERRQ(ierr); >> >> ierr = PetscBLASIntCast(gmres->it+1,&nRows);CHKERRQ(ierr); >> ierr = PetscBLASIntCast(gmres->it+0,&nCols);CHKERRQ(ierr); >> ierr = PetscBLASIntCast(5*N,&lwork);CHKERRQ(ierr); >> //ierr = PetscMemcpy(R,gmres->hes_origin,(gmres->max_k+2)*(gmres->max_k+1)*sizeof(PetscScalar));CHKERRQ(ierr); >> ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); >> for (ii = 0; ii < nRows; ii++) { >> for (jj = 0; jj < nCols; jj++) { >> R[ii*nCols+jj] = *HH(ii,jj); >> // Ensure Hessenberg structure >> //if (ii > jj+1) R[ii*nCols+jj] = 0.0; >> } >> } >> >> ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); >> ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); >> ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); >> ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); >> ierr = PetscMalloc1(nRows,&y);CHKERRQ(ierr); >> >> printf("\n\n");for(ii=0;ii> >> // Perform an SVD on the Hessenberg matrix >> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); >> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows,&nCols,R,&nRows,S,U,&nRows,VT,&nCols,work,&lwork,&lierr)); >> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD Lapack routine %d",(int)lierr); >> ierr = PetscFPTrapPop();CHKERRQ(ierr); >> >> // Compute p = ||b|| U^T e_1 >> ierr = VecNorm(ksp->vec_rhs,NORM_2,&bnorm);CHKERRQ(ierr); >> for (ii=0; ii> p[ii] = bnorm*U[ii*nRows]; >> } >> >> // Solve the root finding problem for \mu such that ||q|| < \delta (where \delta is the radius of the trust region) >> // This step is largely copied from Ashley Willis' openpipeflow: doi.org/10.1016/j.softx.2017.05.003 >> mu = S[nCols-1]*S[nCols-1]*1.0e-6; >> if (mu < 1.0e-99) mu = 1.0e-99; >> qMag = 1.0e+99; >> >> while (qMag > gmres->delta) { >> mu *= 1.1; >> qMag2 = 0.0; >> for (ii=0; ii> q[ii] = p[ii]*S[ii]/(mu + S[ii]*S[ii]); >> qMag2 += q[ii]*q[ii]; >> } >> qMag = PetscSqrtScalar(qMag2); >> } >> >> // Expand y in terms of the right singular vectors as y = V q >> for (ii=0; ii> y[ii] = 0.0; >> for (jj=0; jj> y[ii] += VT[jj*nCols+ii]*q[jj]; // transpose of the transpose >> } >> } >> >> // Recompute the size of the trust region, \delta >> delta2 = 0.0; >> for (ii=0; ii> j0 = (ii < 2) ? 
0 : ii - 1; >> p[ii] = 0.0; >> for (jj=j0; jj> p[ii] -= R[ii*nCols+jj]*y[jj]; >> } >> if (ii == 0) { >> p[ii] += bnorm; >> } >> delta2 += p[ii]*p[ii]; >> } >> gmres->delta = PetscSqrtScalar(delta2); >> printf("\t\t...final delta: %lf.\n", gmres->delta); >> >> // Pass the orthnomalized Krylov vector weights back out >> for (ii=0; ii> nrs[ii] = y[ii]; >> } >> >> ierr = PetscFree(R);CHKERRQ(ierr); >> ierr = PetscFree(S);CHKERRQ(ierr); >> ierr = PetscFree(U);CHKERRQ(ierr); >> ierr = PetscFree(VT);CHKERRQ(ierr); >> ierr = PetscFree(p);CHKERRQ(ierr); >> ierr = PetscFree(q);CHKERRQ(ierr); >> ierr = PetscFree(y);CHKERRQ(ierr); >> } >> /*** DRL ***/ >> >> /* Accumulate the correction to the solution of the preconditioned problem in TEMP */ >> ierr = VecSet(VEC_TEMP,0.0);CHKERRQ(ierr); >> if (gmres->delta > 0.0) { >> ierr = VecMAXPY(VEC_TEMP,it,nrs,&VEC_VV(0));CHKERRQ(ierr); // DRL >> } else { >> ierr = VecMAXPY(VEC_TEMP,it+1,nrs,&VEC_VV(0));CHKERRQ(ierr); >> } >> >> ierr = KSPUnwindPreconditioner(ksp,VEC_TEMP,VEC_TEMP_MATOP);CHKERRQ(ierr); >> /* add solution to previous solution */ >> if (vdest != vs) { >> ierr = VecCopy(vs,vdest);CHKERRQ(ierr); >> } >> ierr = VecAXPY(vdest,1.0,VEC_TEMP);CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> /* >> Do the scalar work for the orthogonalization. Return new residual norm. >> */ >> static PetscErrorCode KSPGMRESUpdateHessenberg(KSP ksp,PetscInt it,PetscBool hapend,PetscReal *res) >> { >> PetscScalar *hh,*cc,*ss,tt; >> PetscInt j; >> KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); >> >> PetscFunctionBegin; >> hh = HH(0,it); >> cc = CC(0); >> ss = SS(0); >> >> /* Apply all the previously computed plane rotations to the new column >> of the Hessenberg matrix */ >> for (j=1; j<=it; j++) { >> tt = *hh; >> *hh = PetscConj(*cc) * tt + *ss * *(hh+1); >> hh++; >> *hh = *cc++ * *hh - (*ss++ * tt); >> } >> >> /* >> compute the new plane rotation, and apply it to: >> 1) the right-hand-side of the Hessenberg system >> 2) the new column of the Hessenberg matrix >> thus obtaining the updated value of the residual >> */ >> if (!hapend) { >> tt = PetscSqrtScalar(PetscConj(*hh) * *hh + PetscConj(*(hh+1)) * *(hh+1)); >> if (tt == 0.0) { >> ksp->reason = KSP_DIVERGED_NULL; >> PetscFunctionReturn(0); >> } >> *cc = *hh / tt; >> *ss = *(hh+1) / tt; >> *GRS(it+1) = -(*ss * *GRS(it)); >> *GRS(it) = PetscConj(*cc) * *GRS(it); >> *hh = PetscConj(*cc) * *hh + *ss * *(hh+1); >> *res = PetscAbsScalar(*GRS(it+1)); >> } else { >> /* happy breakdown: HH(it+1, it) = 0, therfore we don't need to apply >> another rotation matrix (so RH doesn't change). The new residual is >> always the new sine term times the residual from last time (GRS(it)), >> but now the new sine rotation would be zero...so the residual should >> be zero...so we will multiply "zero" by the last residual. This might >> not be exactly what we want to do here -could just return "zero". */ >> >> *res = 0.0; >> } >> PetscFunctionReturn(0); >> } >> /* >> This routine allocates more work vectors, starting from VEC_VV(it). 
>> */ >> PetscErrorCode KSPGMRESGetNewVectors(KSP ksp,PetscInt it) >> { >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> PetscErrorCode ierr; >> PetscInt nwork = gmres->nwork_alloc,k,nalloc; >> >> PetscFunctionBegin; >> nalloc = PetscMin(ksp->max_it,gmres->delta_allocate); >> /* Adjust the number to allocate to make sure that we don't exceed the >> number of available slots */ >> if (it + VEC_OFFSET + nalloc >= gmres->vecs_allocated) { >> nalloc = gmres->vecs_allocated - it - VEC_OFFSET; >> } >> if (!nalloc) PetscFunctionReturn(0); >> >> gmres->vv_allocated += nalloc; >> >> ierr = KSPCreateVecs(ksp,nalloc,&gmres->user_work[nwork],0,NULL);CHKERRQ(ierr); >> ierr = PetscLogObjectParents(ksp,nalloc,gmres->user_work[nwork]);CHKERRQ(ierr); >> >> gmres->mwork_alloc[nwork] = nalloc; >> for (k=0; k> gmres->vecs[it+VEC_OFFSET+k] = gmres->user_work[nwork][k]; >> } >> gmres->nwork_alloc++; >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPBuildSolution_GMRES(KSP ksp,Vec ptr,Vec *result) >> { >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> PetscErrorCode ierr; >> >> PetscFunctionBegin; >> if (!ptr) { >> if (!gmres->sol_temp) { >> ierr = VecDuplicate(ksp->vec_sol,&gmres->sol_temp);CHKERRQ(ierr); >> ierr = PetscLogObjectParent((PetscObject)ksp,(PetscObject)gmres->sol_temp);CHKERRQ(ierr); >> } >> ptr = gmres->sol_temp; >> } >> if (!gmres->nrs) { >> /* allocate the work area */ >> ierr = PetscMalloc1(gmres->max_k,&gmres->nrs);CHKERRQ(ierr); >> ierr = PetscLogObjectMemory((PetscObject)ksp,gmres->max_k*sizeof(PetscScalar));CHKERRQ(ierr); >> } >> >> ierr = KSPGMRESBuildSoln(gmres->nrs,ksp->vec_sol,ptr,ksp,gmres->it);CHKERRQ(ierr); >> if (result) *result = ptr; >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPView_GMRES(KSP ksp,PetscViewer viewer) >> { >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> const char *cstr; >> PetscErrorCode ierr; >> PetscBool iascii,isstring; >> >> PetscFunctionBegin; >> ierr = PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERASCII,&iascii);CHKERRQ(ierr); >> ierr = PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERSTRING,&isstring);CHKERRQ(ierr); >> if (gmres->orthog == KSPGMRESClassicalGramSchmidtOrthogonalization) { >> switch (gmres->cgstype) { >> case (KSP_GMRES_CGS_REFINE_NEVER): >> cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement"; >> break; >> case (KSP_GMRES_CGS_REFINE_ALWAYS): >> cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization with one step of iterative refinement"; >> break; >> case (KSP_GMRES_CGS_REFINE_IFNEEDED): >> cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization with one step of iterative refinement when needed"; >> break; >> default: >> SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Unknown orthogonalization"); >> } >> } else if (gmres->orthog == KSPGMRESModifiedGramSchmidtOrthogonalization) { >> cstr = "Modified Gram-Schmidt Orthogonalization"; >> } else { >> cstr = "unknown orthogonalization"; >> } >> if (iascii) { >> ierr = PetscViewerASCIIPrintf(viewer," restart=%D, using %s\n",gmres->max_k,cstr);CHKERRQ(ierr); >> ierr = PetscViewerASCIIPrintf(viewer," happy breakdown tolerance %g\n",(double)gmres->haptol);CHKERRQ(ierr); >> } else if (isstring) { >> ierr = PetscViewerStringSPrintf(viewer,"%s restart %D",cstr,gmres->max_k);CHKERRQ(ierr); >> } >> PetscFunctionReturn(0); >> } >> >> /*@C >> KSPGMRESMonitorKrylov - Calls VecView() for each new direction in the GMRES accumulated Krylov space. 
>> >> Collective on KSP >> >> Input Parameters: >> + ksp - the KSP context >> . its - iteration number >> . fgnorm - 2-norm of residual (or gradient) >> - dummy - an collection of viewers created with KSPViewerCreate() >> >> Options Database Keys: >> . -ksp_gmres_kyrlov_monitor >> >> Notes: A new PETSCVIEWERDRAW is created for each Krylov vector so they can all be simultaneously viewed >> Level: intermediate >> >> .keywords: KSP, nonlinear, vector, monitor, view, Krylov space >> >> .seealso: KSPMonitorSet(), KSPMonitorDefault(), VecView(), KSPViewersCreate(), KSPViewersDestroy() >> @*/ >> PetscErrorCode KSPGMRESMonitorKrylov(KSP ksp,PetscInt its,PetscReal fgnorm,void *dummy) >> { >> PetscViewers viewers = (PetscViewers)dummy; >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> PetscErrorCode ierr; >> Vec x; >> PetscViewer viewer; >> PetscBool flg; >> >> PetscFunctionBegin; >> ierr = PetscViewersGetViewer(viewers,gmres->it+1,&viewer);CHKERRQ(ierr); >> ierr = PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERDRAW,&flg);CHKERRQ(ierr); >> if (!flg) { >> ierr = PetscViewerSetType(viewer,PETSCVIEWERDRAW);CHKERRQ(ierr); >> ierr = PetscViewerDrawSetInfo(viewer,NULL,"Krylov GMRES Monitor",PETSC_DECIDE,PETSC_DECIDE,300,300);CHKERRQ(ierr); >> } >> x = VEC_VV(gmres->it+1); >> ierr = VecView(x,viewer);CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPSetFromOptions_GMRES(PetscOptionItems *PetscOptionsObject,KSP ksp) >> { >> PetscErrorCode ierr; >> PetscInt restart; >> PetscReal haptol; >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> PetscBool flg; >> >> PetscFunctionBegin; >> ierr = PetscOptionsHead(PetscOptionsObject,"KSP GMRES Options");CHKERRQ(ierr); >> ierr = PetscOptionsInt("-ksp_gmres_restart","Number of Krylov search directions","KSPGMRESSetRestart",gmres->max_k,&restart,&flg);CHKERRQ(ierr); >> if (flg) { ierr = KSPGMRESSetRestart(ksp,restart);CHKERRQ(ierr); } >> ierr = PetscOptionsReal("-ksp_gmres_haptol","Tolerance for exact convergence (happy ending)","KSPGMRESSetHapTol",gmres->haptol,&haptol,&flg);CHKERRQ(ierr); >> if (flg) { ierr = KSPGMRESSetHapTol(ksp,haptol);CHKERRQ(ierr); } >> flg = PETSC_FALSE; >> ierr = PetscOptionsBool("-ksp_gmres_preallocate","Preallocate Krylov vectors","KSPGMRESSetPreAllocateVectors",flg,&flg,NULL);CHKERRQ(ierr); >> if (flg) {ierr = KSPGMRESSetPreAllocateVectors(ksp);CHKERRQ(ierr);} >> ierr = PetscOptionsBoolGroupBegin("-ksp_gmres_classicalgramschmidt","Classical (unmodified) Gram-Schmidt (fast)","KSPGMRESSetOrthogonalization",&flg);CHKERRQ(ierr); >> if (flg) {ierr = KSPGMRESSetOrthogonalization(ksp,KSPGMRESClassicalGramSchmidtOrthogonalization);CHKERRQ(ierr);} >> ierr = PetscOptionsBoolGroupEnd("-ksp_gmres_modifiedgramschmidt","Modified Gram-Schmidt (slow,more stable)","KSPGMRESSetOrthogonalization",&flg);CHKERRQ(ierr); >> if (flg) {ierr = KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization);CHKERRQ(ierr);} >> ierr = PetscOptionsEnum("-ksp_gmres_cgs_refinement_type","Type of iterative refinement for classical (unmodified) Gram-Schmidt","KSPGMRESSetCGSRefinementType", >> KSPGMRESCGSRefinementTypes,(PetscEnum)gmres->cgstype,(PetscEnum*)&gmres->cgstype,&flg);CHKERRQ(ierr); >> flg = PETSC_FALSE; >> ierr = PetscOptionsBool("-ksp_gmres_krylov_monitor","Plot the Krylov directions","KSPMonitorSet",flg,&flg,NULL);CHKERRQ(ierr); >> if (flg) { >> PetscViewers viewers; >> ierr = PetscViewersCreate(PetscObjectComm((PetscObject)ksp),&viewers);CHKERRQ(ierr); >> ierr = KSPMonitorSet(ksp,KSPGMRESMonitorKrylov,viewers,(PetscErrorCode 
(*)(void**))PetscViewersDestroy);CHKERRQ(ierr); >> } >> ierr = PetscOptionsTail();CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPGMRESSetHapTol_GMRES(KSP ksp,PetscReal tol) >> { >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> >> PetscFunctionBegin; >> if (tol < 0.0) SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Tolerance must be non-negative"); >> gmres->haptol = tol; >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPGMRESGetRestart_GMRES(KSP ksp,PetscInt *max_k) >> { >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> >> PetscFunctionBegin; >> *max_k = gmres->max_k; >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPGMRESSetRestart_GMRES(KSP ksp,PetscInt max_k) >> { >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> PetscErrorCode ierr; >> >> PetscFunctionBegin; >> if (max_k < 1) SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Restart must be positive"); >> if (!ksp->setupstage) { >> gmres->max_k = max_k; >> } else if (gmres->max_k != max_k) { >> gmres->max_k = max_k; >> ksp->setupstage = KSP_SETUP_NEW; >> /* free the data structures, then create them again */ >> ierr = KSPReset_GMRES(ksp);CHKERRQ(ierr); >> } >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPGMRESSetOrthogonalization_GMRES(KSP ksp,FCN fcn) >> { >> PetscFunctionBegin; >> ((KSP_GMRES*)ksp->data)->orthog = fcn; >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPGMRESGetOrthogonalization_GMRES(KSP ksp,FCN *fcn) >> { >> PetscFunctionBegin; >> *fcn = ((KSP_GMRES*)ksp->data)->orthog; >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPGMRESSetPreAllocateVectors_GMRES(KSP ksp) >> { >> KSP_GMRES *gmres; >> >> PetscFunctionBegin; >> gmres = (KSP_GMRES*)ksp->data; >> gmres->q_preallocate = 1; >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPGMRESSetCGSRefinementType_GMRES(KSP ksp,KSPGMRESCGSRefinementType type) >> { >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> >> PetscFunctionBegin; >> gmres->cgstype = type; >> PetscFunctionReturn(0); >> } >> >> PetscErrorCode KSPGMRESGetCGSRefinementType_GMRES(KSP ksp,KSPGMRESCGSRefinementType *type) >> { >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; >> >> PetscFunctionBegin; >> *type = gmres->cgstype; >> PetscFunctionReturn(0); >> } >> >> /*@ >> KSPGMRESSetCGSRefinementType - Sets the type of iterative refinement to use >> in the classical Gram Schmidt orthogonalization. >> >> Logically Collective on KSP >> >> Input Parameters: >> + ksp - the Krylov space context >> - type - the type of refinement >> >> Options Database: >> . -ksp_gmres_cgs_refinement_type >> >> Level: intermediate >> >> .keywords: KSP, GMRES, iterative refinement >> >> .seealso: KSPGMRESSetOrthogonalization(), KSPGMRESCGSRefinementType, KSPGMRESClassicalGramSchmidtOrthogonalization(), KSPGMRESGetCGSRefinementType(), >> KSPGMRESGetOrthogonalization() >> @*/ >> PetscErrorCode KSPGMRESSetCGSRefinementType(KSP ksp,KSPGMRESCGSRefinementType type) >> { >> PetscErrorCode ierr; >> >> PetscFunctionBegin; >> PetscValidHeaderSpecific(ksp,KSP_CLASSID,1); >> PetscValidLogicalCollectiveEnum(ksp,type,2); >> ierr = PetscTryMethod(ksp,"KSPGMRESSetCGSRefinementType_C",(KSP,KSPGMRESCGSRefinementType),(ksp,type));CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> >> /*@ >> KSPGMRESGetCGSRefinementType - Gets the type of iterative refinement to use >> in the classical Gram Schmidt orthogonalization. >> >> Not Collective >> >> Input Parameter: >> . ksp - the Krylov space context >> >> Output Parameter: >> . 
type - the type of refinement >> >> Options Database: >> . -ksp_gmres_cgs_refinement_type >> >> Level: intermediate >> >> .keywords: KSP, GMRES, iterative refinement >> >> .seealso: KSPGMRESSetOrthogonalization(), KSPGMRESCGSRefinementType, KSPGMRESClassicalGramSchmidtOrthogonalization(), KSPGMRESSetCGSRefinementType(), >> KSPGMRESGetOrthogonalization() >> @*/ >> PetscErrorCode KSPGMRESGetCGSRefinementType(KSP ksp,KSPGMRESCGSRefinementType *type) >> { >> PetscErrorCode ierr; >> >> PetscFunctionBegin; >> PetscValidHeaderSpecific(ksp,KSP_CLASSID,1); >> ierr = PetscUseMethod(ksp,"KSPGMRESGetCGSRefinementType_C",(KSP,KSPGMRESCGSRefinementType*),(ksp,type));CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> >> >> /*@ >> KSPGMRESSetRestart - Sets number of iterations at which GMRES, FGMRES and LGMRES restarts. >> >> Logically Collective on KSP >> >> Input Parameters: >> + ksp - the Krylov space context >> - restart - integer restart value >> >> Options Database: >> . -ksp_gmres_restart >> >> Note: The default value is 30. >> >> Level: intermediate >> >> .keywords: KSP, GMRES, restart, iterations >> >> .seealso: KSPSetTolerances(), KSPGMRESSetOrthogonalization(), KSPGMRESSetPreAllocateVectors(), KSPGMRESGetRestart() >> @*/ >> PetscErrorCode KSPGMRESSetRestart(KSP ksp, PetscInt restart) >> { >> PetscErrorCode ierr; >> >> PetscFunctionBegin; >> PetscValidLogicalCollectiveInt(ksp,restart,2); >> >> ierr = PetscTryMethod(ksp,"KSPGMRESSetRestart_C",(KSP,PetscInt),(ksp,restart));CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> >> /*@ >> KSPGMRESGetRestart - Gets number of iterations at which GMRES, FGMRES and LGMRES restarts. >> >> Not Collective >> >> Input Parameter: >> . ksp - the Krylov space context >> >> Output Parameter: >> . restart - integer restart value >> >> Note: The default value is 30. >> >> Level: intermediate >> >> .keywords: KSP, GMRES, restart, iterations >> >> .seealso: KSPSetTolerances(), KSPGMRESSetOrthogonalization(), KSPGMRESSetPreAllocateVectors(), KSPGMRESSetRestart() >> @*/ >> PetscErrorCode KSPGMRESGetRestart(KSP ksp, PetscInt *restart) >> { >> PetscErrorCode ierr; >> >> PetscFunctionBegin; >> ierr = PetscUseMethod(ksp,"KSPGMRESGetRestart_C",(KSP,PetscInt*),(ksp,restart));CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> >> /*@ >> KSPGMRESSetHapTol - Sets tolerance for determining happy breakdown in GMRES, FGMRES and LGMRES. >> >> Logically Collective on KSP >> >> Input Parameters: >> + ksp - the Krylov space context >> - tol - the tolerance >> >> Options Database: >> . -ksp_gmres_haptol >> >> Note: Happy breakdown is the rare case in GMRES where an 'exact' solution is obtained after >> a certain number of iterations. If you attempt more iterations after this point unstable >> things can happen hence very occasionally you may need to set this value to detect this condition >> >> Level: intermediate >> >> .keywords: KSP, GMRES, tolerance >> >> .seealso: KSPSetTolerances() >> @*/ >> PetscErrorCode KSPGMRESSetHapTol(KSP ksp,PetscReal tol) >> { >> PetscErrorCode ierr; >> >> PetscFunctionBegin; >> PetscValidLogicalCollectiveReal(ksp,tol,2); >> ierr = PetscTryMethod((ksp),"KSPGMRESSetHapTol_C",(KSP,PetscReal),((ksp),(tol)));CHKERRQ(ierr); >> PetscFunctionReturn(0); >> } >> >> /*MC >> KSPGMRES - Implements the Generalized Minimal Residual method. >> (Saad and Schultz, 1986) with restart >> >> >> Options Database Keys: >> + -ksp_gmres_restart - the number of Krylov directions to orthogonalize against >> . 
-ksp_gmres_haptol - sets the tolerance for "happy ending" (exact convergence) >> . -ksp_gmres_preallocate - preallocate all the Krylov search directions initially (otherwise groups of >> vectors are allocated as needed) >> . -ksp_gmres_classicalgramschmidt - use classical (unmodified) Gram-Schmidt to orthogonalize against the Krylov space (fast) (the default) >> . -ksp_gmres_modifiedgramschmidt - use modified Gram-Schmidt in the orthogonalization (more stable, but slower) >> . -ksp_gmres_cgs_refinement_type - determine if iterative refinement is used to increase the >> stability of the classical Gram-Schmidt orthogonalization. >> - -ksp_gmres_krylov_monitor - plot the Krylov space generated >> >> Level: beginner >> >> Notes: Left and right preconditioning are supported, but not symmetric preconditioning. >> >> References: >> . 1. - YOUCEF SAAD AND MARTIN H. SCHULTZ, GMRES: A GENERALIZED MINIMAL RESIDUAL ALGORITHM FOR SOLVING NONSYMMETRIC LINEAR SYSTEMS. >> SIAM J. ScI. STAT. COMPUT. Vo|. 7, No. 3, July 1986. >> >> .seealso: KSPCreate(), KSPSetType(), KSPType (for list of available types), KSP, KSPFGMRES, KSPLGMRES, >> KSPGMRESSetRestart(), KSPGMRESSetHapTol(), KSPGMRESSetPreAllocateVectors(), KSPGMRESSetOrthogonalization(), KSPGMRESGetOrthogonalization(), >> KSPGMRESClassicalGramSchmidtOrthogonalization(), KSPGMRESModifiedGramSchmidtOrthogonalization(), >> KSPGMRESCGSRefinementType, KSPGMRESSetCGSRefinementType(), KSPGMRESGetCGSRefinementType(), KSPGMRESMonitorKrylov(), KSPSetPCSide() >> >> M*/ >> >> PETSC_EXTERN PetscErrorCode KSPCreate_GMRES(KSP ksp) >> { >> KSP_GMRES *gmres; >> PetscErrorCode ierr; >> >> PetscFunctionBegin; >> ierr = PetscNewLog(ksp,&gmres);CHKERRQ(ierr); >> ksp->data = (void*)gmres; >> >> ierr = KSPSetSupportedNorm(ksp,KSP_NORM_PRECONDITIONED,PC_LEFT,4);CHKERRQ(ierr); >> ierr = KSPSetSupportedNorm(ksp,KSP_NORM_UNPRECONDITIONED,PC_RIGHT,3);CHKERRQ(ierr); >> ierr = KSPSetSupportedNorm(ksp,KSP_NORM_PRECONDITIONED,PC_SYMMETRIC,2);CHKERRQ(ierr); >> ierr = KSPSetSupportedNorm(ksp,KSP_NORM_NONE,PC_RIGHT,1);CHKERRQ(ierr); >> ierr = KSPSetSupportedNorm(ksp,KSP_NORM_NONE,PC_LEFT,1);CHKERRQ(ierr); >> >> ksp->ops->buildsolution = KSPBuildSolution_GMRES; >> ksp->ops->setup = KSPSetUp_GMRES; >> ksp->ops->solve = KSPSolve_GMRES; >> ksp->ops->reset = KSPReset_GMRES; >> ksp->ops->destroy = KSPDestroy_GMRES; >> ksp->ops->view = KSPView_GMRES; >> ksp->ops->setfromoptions = KSPSetFromOptions_GMRES; >> ksp->ops->computeextremesingularvalues = KSPComputeExtremeSingularValues_GMRES; >> ksp->ops->computeeigenvalues = KSPComputeEigenvalues_GMRES; >> #if !defined(PETSC_USE_COMPLEX) && !defined(PETSC_HAVE_ESSL) >> ksp->ops->computeritz = KSPComputeRitz_GMRES; >> #endif >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetPreAllocateVectors_C",KSPGMRESSetPreAllocateVectors_GMRES);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetOrthogonalization_C",KSPGMRESSetOrthogonalization_GMRES);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetOrthogonalization_C",KSPGMRESGetOrthogonalization_GMRES);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetRestart_C",KSPGMRESSetRestart_GMRES);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetRestart_C",KSPGMRESGetRestart_GMRES);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetHapTol_C",KSPGMRESSetHapTol_GMRES);CHKERRQ(ierr); >> ierr = 
PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetCGSRefinementType_C",KSPGMRESSetCGSRefinementType_GMRES);CHKERRQ(ierr); >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetCGSRefinementType_C",KSPGMRESGetCGSRefinementType_GMRES);CHKERRQ(ierr); >> >> gmres->haptol = 1.0e-30; >> gmres->q_preallocate = 0; >> gmres->delta_allocate = GMRES_DELTA_DIRECTIONS; >> gmres->orthog = KSPGMRESClassicalGramSchmidtOrthogonalization; >> gmres->nrs = 0; >> gmres->sol_temp = 0; >> gmres->max_k = GMRES_DEFAULT_MAXK; >> gmres->Rsvd = 0; >> gmres->cgstype = KSP_GMRES_CGS_REFINE_NEVER; >> gmres->orthogwork = 0; >> gmres->delta = -1.0; // DRL >> PetscFunctionReturn(0); >> } From swarnava89 at gmail.com Mon May 20 02:04:19 2019 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Mon, 20 May 2019 00:04:19 -0700 Subject: [petsc-users] Creating a DMNetwork from a DMPlex In-Reply-To: <9DAFD49B-AB7F-435F-BB27-16EF946E1241@mcs.anl.gov> References: <9DAFD49B-AB7F-435F-BB27-16EF946E1241@mcs.anl.gov> Message-ID: Hi Barry, Thank you for your email. My planned discretization is based on the fact that I need a distributed unstructured mesh, where at each vertex point I perform local calculations. For these calculations, I do NOT need to assemble any global matrix. I will have fields defined at the vertices, and using linear interpolation, I am planning to find the values of these fields at some spatial points that are within a ball around each vertex. Once the values of these fields are known within the compact support around each vertex, I do local computations to calculate my unknown field. My reason for having a mesh is essentially to 1) define fields at the vertices and 2) perform linear interpolation (using finite elements) at some spatial points. Also, the local computations around each vertex are computationally the most expensive step. In that case, having a cell partitioning will result in vertices being shared among processes, which will result in redundant computations. My idea is therefore to have DMNetwork distribute vertices across processes and use finite elements for the linear interpolation part. Thanks, SG On Sun, May 19, 2019 at 6:54 PM Smith, Barry F. wrote: > > I am not sure you want DMNetwork: DMNetwork has no geometry; it only has > vertices and edges. Vertices are connected to other vertices through the > edges. For example, I can't see how one would do vertex-centered finite > volume methods with DMNetwork. Maybe if you said something more about your > planned discretization we could figure something out. > > > On May 19, 2019, at 8:32 PM, Swarnava Ghosh > wrote: > > > > Hi Barry, > > > > No, the gmesh file contains a mesh and not a graph/network. > > In that case, is it possible to create a DMNetwork first from the DMPlex > and then distribute the DMNetwork? > > > > I have this case because I want a vertex partitioning of my mesh. > Domain decomposition of DMPlex gives me cell partitioning. Essentially what > I want is that no two processes can share a vertex BUT they can share an > edge. Similar to how a DMDA is distributed. > > > > Thanks, > > Swarnava > > > > On Sun, May 19, 2019 at 4:50 PM Smith, Barry F. > wrote: > > > > This use case never occurred to us. Does the gmesh file contain a > graph/network (as opposed to a mesh)? There seem to be two choices > > > > 1) if the gmesh file contains a graph/network one could write a gmesh > reader for that case that reads it directly and constructs a DMNetwork, or > > > > 2) write a converter for a DMPlex to DMNetwork.
> > > > I lean toward the first. > > > > Either way you need to understand the documentation for DMNetwork and > how to build one up. > > > > > > Barry > > > > > > > On May 19, 2019, at 6:34 PM, Swarnava Ghosh via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > > > Hi Petsc users and developers, > > > > > > I am trying to find a way of creating a DMNetwork from a DMPlex. I > have read the DMPlex from a gmesh file and have it distributed. > > > > > > Thanks, > > > SG > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davelee2804 at gmail.com Mon May 20 02:28:36 2019 From: davelee2804 at gmail.com (Dave Lee) Date: Mon, 20 May 2019 17:28:36 +1000 Subject: [petsc-users] Calling LAPACK routines from PETSc In-Reply-To: <8D37944A-6C37-48D0-B238-4E9806D93B7E@anl.gov> References: <8736l9abj4.fsf@jedbrown.org> <8D37944A-6C37-48D0-B238-4E9806D93B7E@anl.gov> Message-ID: Thanks Jed and Barry, So, just to confirm, -- From the KSP_GMRES structure, if I call *HH(a,b), that will return the row a, column b entry of the Hessenberg matrix (while the backing array *hh_origin is ordered using the Fortran convention) -- Matrices are passed into and returned from PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_() using Fortran indexing, and need to be transposed to get back to C ordering. Are both of these statements correct? Cheers, Dave. On Mon, May 20, 2019 at 4:34 PM Smith, Barry F. wrote: > > The little work arrays in GMRES tend to be stored in Fortran ordering; > there is no C style p[][] indexing into such arrays. Thus the arrays can > safely be sent to LAPACK. The only trick is knowing the two dimensions and, > as Jed says, the "leading dimension" parameter. He gave you a place to look. > > > On May 20, 2019, at 1:24 AM, Jed Brown via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > Dave Lee via petsc-users writes: > > > >> Hi Petsc, > >> > >> I'm attempting to implement a "hookstep" for the SNES trust region > solver. > >> Essentially what I'm trying to do is replace the solution of the least > >> squares problem at the end of each GMRES solve with a modified solution > >> with a norm that is constrained to be within the size of the trust > region. > >> > >> In order to do this I need to perform an SVD on the Hessenberg matrix, > >> which, copying the function KSPComputeExtremeSingularValues(), I'm > trying to > >> do by accessing the LAPACK function dgesvd() via the > PetscStackCallBLAS() > >> machinery. One thing I'm confused about however is the ordering of the > 2D > >> arrays into and out of this function, given that C and FORTRAN > arrays > >> use reverse indexing, ie: C[j+1][i+1] = F[i,j]. > >> > >> Given that the Hessenberg matrix has k+1 rows and k columns, should I > >> still be initializing this as H[row][col] and passing this into > >> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...)) > >> or should I be transposing this before passing it in? > > > > LAPACK terminology is with respect to Fortran ordering. There is a > > "leading dimension" parameter so that you can operate on non-contiguous > > blocks. See KSPComputeExtremeSingularValues_GMRES for an example. > > > >> Also for the left and right singular vector matrices that are returned > by > >> this function, should I be transposing these before I interpret them as > C > >> arrays? > >> > >> I've attached my modified version of gmres.c in case this is helpful. If > >> you grep for DRL (my initials) then you'll see my changes to the code. > >> > >> Cheers, Dave.
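To make the convention Jed and Barry describe concrete, here is a minimal sketch (it is not taken from the attached gmres.c): it assumes a real-scalar PETSc build and that the fragment sits inside KSPGMRESBuildSoln(), where ierr, it and the HH(row,col) macro are in scope; the names nrows, ncols, lda, A, U, VT, S and work exist only for this illustration.

  /* Sketch only: SVD of the leading (it+1) x it block of the GMRES Hessenberg
     matrix (real-scalar build assumed).  hh_origin is already stored
     column-major with leading dimension max_k+2, which is what HH(row,col)
     indexes into, so the copy below just repacks that block contiguously --
     there is no transpose anywhere.                                          */
  PetscScalar  *A,*U,*VT,*work;
  PetscReal    *S;
  PetscBLASInt nrows,ncols,lda,lwork,info;
  PetscInt     i,j;

  ierr  = PetscBLASIntCast(it+1,&nrows);CHKERRQ(ierr);
  ierr  = PetscBLASIntCast(it,&ncols);CHKERRQ(ierr);
  lda   = nrows;               /* leading dimension = allocated row count of A */
  lwork = 5*(nrows+ncols);     /* comfortably above the dgesvd minimum         */
  ierr  = PetscMalloc1(lda*ncols,&A);CHKERRQ(ierr);
  ierr  = PetscMalloc1(nrows*nrows,&U);CHKERRQ(ierr);
  ierr  = PetscMalloc1(ncols*ncols,&VT);CHKERRQ(ierr);
  ierr  = PetscMalloc1(ncols,&S);CHKERRQ(ierr);
  ierr  = PetscMalloc1(lwork,&work);CHKERRQ(ierr);

  for (j=0; j<ncols; j++) {    /* column index */
    for (i=0; i<nrows; i++) {  /* row index    */
      A[j*lda+i] = *HH(i,j);   /* entry (i,j) of a column-major array is at j*lda+i */
    }
  }

  ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr);
  PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nrows,&ncols,A,&lda,S,U,&nrows,VT,&ncols,work,&lwork,&info));
  if (info) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD Lapack routine %d",(int)info);
  ierr = PetscFPTrapPop();CHKERRQ(ierr);

  /* The outputs are column-major as well: U is nrows x nrows with U_(i,k) at
     U[k*nrows+i] (column k = k-th left singular vector), VT is ncols x ncols
     with (V^T)_(k,j) at VT[j*ncols+k] (row k = k-th right singular vector),
     and S holds the singular values in descending order.                     */
  ierr = PetscFree(A);CHKERRQ(ierr);
  ierr = PetscFree(U);CHKERRQ(ierr);
  ierr = PetscFree(VT);CHKERRQ(ierr);
  ierr = PetscFree(S);CHKERRQ(ierr);
  ierr = PetscFree(work);CHKERRQ(ierr);

So, on the two questions above: given how HH() is defined in gmresimpl.h, HH(a,b) does address row a, column b of a column-major buffer, and nothing returned by gesvd has to be transposed as long as U and VT are read with that same column-major indexing; interpreting them as C row-major 2-D arrays is what would implicitly transpose them. The "leading dimension" is simply the allocated row count of the buffer (lda here, max_k+2 for hh_origin itself), which is what lets LAPACK operate on a sub-block of a larger array.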
> >> > >> /* > >> This file implements GMRES (a Generalized Minimal Residual) method. > >> Reference: Saad and Schultz, 1986. > >> > >> > >> Some comments on left vs. right preconditioning, and restarts. > >> Left and right preconditioning. > >> If right preconditioning is chosen, then the problem being solved > >> by gmres is actually > >> My = AB^-1 y = f > >> so the initial residual is > >> r = f - Mx > >> Note that B^-1 y = x or y = B x, and if x is non-zero, the initial > >> residual is > >> r = f - A x > >> The final solution is then > >> x = B^-1 y > >> > >> If left preconditioning is chosen, then the problem being solved is > >> My = B^-1 A x = B^-1 f, > >> and the initial residual is > >> r = B^-1(f - Ax) > >> > >> Restarts: Restarts are basically solves with x0 not equal to zero. > >> Note that we can eliminate an extra application of B^-1 between > >> restarts as long as we don't require that the solution at the end > >> of an unsuccessful gmres iteration always be the solution x. > >> */ > >> > >> #include <../src/ksp/ksp/impls/gmres/gmresimpl.h> /*I > "petscksp.h" I*/ > >> #include // DRL > >> #define GMRES_DELTA_DIRECTIONS 10 > >> #define GMRES_DEFAULT_MAXK 30 > >> static PetscErrorCode > KSPGMRESUpdateHessenberg(KSP,PetscInt,PetscBool,PetscReal*); > >> static PetscErrorCode > KSPGMRESBuildSoln(PetscScalar*,Vec,Vec,KSP,PetscInt); > >> > >> PetscErrorCode KSPSetUp_GMRES(KSP ksp) > >> { > >> PetscInt hh,hes,rs,cc; > >> PetscErrorCode ierr; > >> PetscInt max_k,k; > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> > >> PetscFunctionBegin; > >> max_k = gmres->max_k; /* restart size */ > >> hh = (max_k + 2) * (max_k + 1); > >> hes = (max_k + 1) * (max_k + 1); > >> rs = (max_k + 2); > >> cc = (max_k + 1); > >> > >> ierr = > PetscCalloc5(hh,&gmres->hh_origin,hes,&gmres->hes_origin,rs,&gmres->rs_origin,cc,&gmres->cc_origin,cc,&gmres->ss_origin);CHKERRQ(ierr); > >> ierr = PetscLogObjectMemory((PetscObject)ksp,(hh + hes + rs + > 2*cc)*sizeof(PetscScalar));CHKERRQ(ierr); > >> > >> if (ksp->calc_sings) { > >> /* Allocate workspace to hold Hessenberg matrix needed by lapack */ > >> ierr = PetscMalloc1((max_k + 3)*(max_k + > 9),&gmres->Rsvd);CHKERRQ(ierr); > >> ierr = PetscLogObjectMemory((PetscObject)ksp,(max_k + 3)*(max_k + > 9)*sizeof(PetscScalar));CHKERRQ(ierr); > >> ierr = PetscMalloc1(6*(max_k+2),&gmres->Dsvd);CHKERRQ(ierr); > >> ierr = > PetscLogObjectMemory((PetscObject)ksp,6*(max_k+2)*sizeof(PetscReal));CHKERRQ(ierr); > >> } > >> > >> /* Allocate array to hold pointers to user vectors. 
Note that we need > >> 4 + max_k + 1 (since we need it+1 vectors, and it <= max_k) */ > >> gmres->vecs_allocated = VEC_OFFSET + 2 + max_k + gmres->nextra_vecs; > >> > >> ierr = PetscMalloc1(gmres->vecs_allocated,&gmres->vecs);CHKERRQ(ierr); > >> ierr = > PetscMalloc1(VEC_OFFSET+2+max_k,&gmres->user_work);CHKERRQ(ierr); > >> ierr = > PetscMalloc1(VEC_OFFSET+2+max_k,&gmres->mwork_alloc);CHKERRQ(ierr); > >> ierr = > PetscLogObjectMemory((PetscObject)ksp,(VEC_OFFSET+2+max_k)*(sizeof(Vec*)+sizeof(PetscInt)) > + gmres->vecs_allocated*sizeof(Vec));CHKERRQ(ierr); > >> > >> if (gmres->q_preallocate) { > >> gmres->vv_allocated = VEC_OFFSET + 2 + max_k; > >> > >> ierr = > KSPCreateVecs(ksp,gmres->vv_allocated,&gmres->user_work[0],0,NULL);CHKERRQ(ierr); > >> ierr = > PetscLogObjectParents(ksp,gmres->vv_allocated,gmres->user_work[0]);CHKERRQ(ierr); > >> > >> gmres->mwork_alloc[0] = gmres->vv_allocated; > >> gmres->nwork_alloc = 1; > >> for (k=0; kvv_allocated; k++) { > >> gmres->vecs[k] = gmres->user_work[0][k]; > >> } > >> } else { > >> gmres->vv_allocated = 5; > >> > >> ierr = > KSPCreateVecs(ksp,5,&gmres->user_work[0],0,NULL);CHKERRQ(ierr); > >> ierr = > PetscLogObjectParents(ksp,5,gmres->user_work[0]);CHKERRQ(ierr); > >> > >> gmres->mwork_alloc[0] = 5; > >> gmres->nwork_alloc = 1; > >> for (k=0; kvv_allocated; k++) { > >> gmres->vecs[k] = gmres->user_work[0][k]; > >> } > >> } > >> PetscFunctionReturn(0); > >> } > >> > >> /* > >> Run gmres, possibly with restart. Return residual history if > requested. > >> input parameters: > >> > >> . gmres - structure containing parameters and work areas > >> > >> output parameters: > >> . nres - residuals (from preconditioned system) at each step. > >> If restarting, consider passing nres+it. If null, > >> ignored > >> . itcount - number of iterations used. nres[0] to nres[itcount] > >> are defined. If null, ignored. > >> > >> Notes: > >> On entry, the value in vector VEC_VV(0) should be the initial > residual > >> (this allows shortcuts where the initial preconditioned residual is > 0). 
> >> */ > >> PetscErrorCode KSPGMRESCycle(PetscInt *itcount,KSP ksp) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); > >> PetscReal res_norm,res,hapbnd,tt; > >> PetscErrorCode ierr; > >> PetscInt it = 0, max_k = gmres->max_k; > >> PetscBool hapend = PETSC_FALSE; > >> > >> PetscFunctionBegin; > >> if (itcount) *itcount = 0; > >> ierr = VecNormalize(VEC_VV(0),&res_norm);CHKERRQ(ierr); > >> KSPCheckNorm(ksp,res_norm); > >> res = res_norm; > >> *GRS(0) = res_norm; > >> > >> /* check for the convergence */ > >> ierr = PetscObjectSAWsTakeAccess((PetscObject)ksp);CHKERRQ(ierr); > >> ksp->rnorm = res; > >> ierr = > PetscObjectSAWsGrantAccess((PetscObject)ksp);CHKERRQ(ierr); > >> gmres->it = (it - 1); > >> ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); > >> ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); > >> if (!res) { > >> ksp->reason = KSP_CONVERGED_ATOL; > >> ierr = PetscInfo(ksp,"Converged due to zero residual norm on > entry\n");CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> ierr = > (*ksp->converged)(ksp,ksp->its,res,&ksp->reason,ksp->cnvP);CHKERRQ(ierr); > >> while (!ksp->reason && it < max_k && ksp->its < ksp->max_it) { > >> if (it) { > >> ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); > >> ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); > >> } > >> gmres->it = (it - 1); > >> if (gmres->vv_allocated <= it + VEC_OFFSET + 1) { > >> ierr = KSPGMRESGetNewVectors(ksp,it+1);CHKERRQ(ierr); > >> } > >> ierr = > KSP_PCApplyBAorAB(ksp,VEC_VV(it),VEC_VV(1+it),VEC_TEMP_MATOP);CHKERRQ(ierr); > >> > >> /* update hessenberg matrix and do Gram-Schmidt */ > >> ierr = (*gmres->orthog)(ksp,it);CHKERRQ(ierr); > >> if (ksp->reason) break; > >> > >> /* vv(i+1) . vv(i+1) */ > >> ierr = VecNormalize(VEC_VV(it+1),&tt);CHKERRQ(ierr); > >> > >> /* save the magnitude */ > >> *HH(it+1,it) = tt; > >> *HES(it+1,it) = tt; > >> > >> /* check for the happy breakdown */ > >> hapbnd = PetscAbsScalar(tt / *GRS(it)); > >> if (hapbnd > gmres->haptol) hapbnd = gmres->haptol; > >> if (tt < hapbnd) { > >> ierr = PetscInfo2(ksp,"Detected happy breakdown, current hapbnd > = %14.12e tt = %14.12e\n",(double)hapbnd,(double)tt);CHKERRQ(ierr); > >> hapend = PETSC_TRUE; > >> } > >> ierr = KSPGMRESUpdateHessenberg(ksp,it,hapend,&res);CHKERRQ(ierr); > >> > >> it++; > >> gmres->it = (it-1); /* For converged */ > >> ksp->its++; > >> ksp->rnorm = res; > >> if (ksp->reason) break; > >> > >> ierr = > (*ksp->converged)(ksp,ksp->its,res,&ksp->reason,ksp->cnvP);CHKERRQ(ierr); > >> > >> /* Catch error in happy breakdown and signal convergence and break > from loop */ > >> if (hapend) { > >> if (!ksp->reason) { > >> if (ksp->errorifnotconverged) > SETERRQ1(PetscObjectComm((PetscObject)ksp),PETSC_ERR_NOT_CONVERGED,"You > reached the happy break down, but convergence was not indicated. 
Residual > norm = %g",(double)res); > >> else { > >> ksp->reason = KSP_DIVERGED_BREAKDOWN; > >> break; > >> } > >> } > >> } > >> } > >> > >> /* Monitor if we know that we will not return for a restart */ > >> if (it && (ksp->reason || ksp->its >= ksp->max_it)) { > >> ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); > >> ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); > >> } > >> > >> if (itcount) *itcount = it; > >> > >> > >> /* > >> Down here we have to solve for the "best" coefficients of the Krylov > >> columns, add the solution values together, and possibly unwind the > >> preconditioning from the solution > >> */ > >> /* Form the solution (or the solution so far) */ > >> ierr = > KSPGMRESBuildSoln(GRS(0),ksp->vec_sol,ksp->vec_sol,ksp,it-1);CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPSolve_GMRES(KSP ksp) > >> { > >> PetscErrorCode ierr; > >> PetscInt its,itcount,i; > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscBool guess_zero = ksp->guess_zero; > >> PetscInt N = gmres->max_k + 1; > >> PetscBLASInt bN; > >> > >> PetscFunctionBegin; > >> if (ksp->calc_sings && !gmres->Rsvd) > SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ORDER,"Must call > KSPSetComputeSingularValues() before KSPSetUp() is called"); > >> > >> ierr = PetscObjectSAWsTakeAccess((PetscObject)ksp);CHKERRQ(ierr); > >> ksp->its = 0; > >> ierr = PetscObjectSAWsGrantAccess((PetscObject)ksp);CHKERRQ(ierr); > >> > >> itcount = 0; > >> gmres->fullcycle = 0; > >> ksp->reason = KSP_CONVERGED_ITERATING; > >> while (!ksp->reason) { > >> ierr = > KSPInitialResidual(ksp,ksp->vec_sol,VEC_TEMP,VEC_TEMP_MATOP,VEC_VV(0),ksp->vec_rhs);CHKERRQ(ierr); > >> ierr = KSPGMRESCycle(&its,ksp);CHKERRQ(ierr); > >> /* Store the Hessenberg matrix and the basis vectors of the Krylov > subspace > >> if the cycle is complete for the computation of the Ritz pairs */ > >> if (its == gmres->max_k) { > >> gmres->fullcycle++; > >> if (ksp->calc_ritz) { > >> if (!gmres->hes_ritz) { > >> ierr = PetscMalloc1(N*N,&gmres->hes_ritz);CHKERRQ(ierr); > >> ierr = > PetscLogObjectMemory((PetscObject)ksp,N*N*sizeof(PetscScalar));CHKERRQ(ierr); > >> ierr = > VecDuplicateVecs(VEC_VV(0),N,&gmres->vecb);CHKERRQ(ierr); > >> } > >> ierr = PetscBLASIntCast(N,&bN);CHKERRQ(ierr); > >> ierr = > PetscMemcpy(gmres->hes_ritz,gmres->hes_origin,bN*bN*sizeof(PetscReal));CHKERRQ(ierr); > >> for (i=0; imax_k+1; i++) { > >> ierr = VecCopy(VEC_VV(i),gmres->vecb[i]);CHKERRQ(ierr); > >> } > >> } > >> } > >> itcount += its; > >> if (itcount >= ksp->max_it) { > >> if (!ksp->reason) ksp->reason = KSP_DIVERGED_ITS; > >> break; > >> } > >> ksp->guess_zero = PETSC_FALSE; /* every future call to > KSPInitialResidual() will have nonzero guess */ > >> } > >> ksp->guess_zero = guess_zero; /* restore if user provided nonzero > initial guess */ > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPReset_GMRES(KSP ksp) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscErrorCode ierr; > >> PetscInt i; > >> > >> PetscFunctionBegin; > >> /* Free the Hessenberg matrices */ > >> ierr = > PetscFree6(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin,gmres->hes_ritz);CHKERRQ(ierr); > >> > >> /* free work vectors */ > >> ierr = PetscFree(gmres->vecs);CHKERRQ(ierr); > >> for (i=0; inwork_alloc; i++) { > >> ierr = > VecDestroyVecs(gmres->mwork_alloc[i],&gmres->user_work[i]);CHKERRQ(ierr); > >> } > >> gmres->nwork_alloc = 0; > >> if (gmres->vecb) { > >> ierr = 
VecDestroyVecs(gmres->max_k+1,&gmres->vecb);CHKERRQ(ierr); > >> } > >> > >> ierr = PetscFree(gmres->user_work);CHKERRQ(ierr); > >> ierr = PetscFree(gmres->mwork_alloc);CHKERRQ(ierr); > >> ierr = PetscFree(gmres->nrs);CHKERRQ(ierr); > >> ierr = VecDestroy(&gmres->sol_temp);CHKERRQ(ierr); > >> ierr = PetscFree(gmres->Rsvd);CHKERRQ(ierr); > >> ierr = PetscFree(gmres->Dsvd);CHKERRQ(ierr); > >> ierr = PetscFree(gmres->orthogwork);CHKERRQ(ierr); > >> > >> gmres->sol_temp = 0; > >> gmres->vv_allocated = 0; > >> gmres->vecs_allocated = 0; > >> gmres->sol_temp = 0; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPDestroy_GMRES(KSP ksp) > >> { > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> ierr = KSPReset_GMRES(ksp);CHKERRQ(ierr); > >> ierr = PetscFree(ksp->data);CHKERRQ(ierr); > >> /* clear composed functions */ > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetPreAllocateVectors_C",NULL);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetOrthogonalization_C",NULL);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetOrthogonalization_C",NULL);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetRestart_C",NULL);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetRestart_C",NULL);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetHapTol_C",NULL);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetCGSRefinementType_C",NULL);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetCGSRefinementType_C",NULL);CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> /* > >> KSPGMRESBuildSoln - create the solution from the starting vector and > the > >> current iterates. > >> > >> Input parameters: > >> nrs - work area of size it + 1. > >> vs - index of initial guess > >> vdest - index of result. Note that vs may == vdest (replace > >> guess with the solution). > >> > >> This is an internal routine that knows about the GMRES internals. > >> */ > >> static PetscErrorCode KSPGMRESBuildSoln(PetscScalar *nrs,Vec vs,Vec > vdest,KSP ksp,PetscInt it) > >> { > >> PetscScalar tt; > >> PetscErrorCode ierr; > >> PetscInt ii,k,j; > >> KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); > >> > >> PetscFunctionBegin; > >> /* Solve for solution vector that minimizes the residual */ > >> > >> /* If it is < 0, no gmres steps have been performed */ > >> if (it < 0) { > >> ierr = VecCopy(vs,vdest);CHKERRQ(ierr); /* VecCopy() is smart, > exists immediately if vguess == vdest */ > >> PetscFunctionReturn(0); > >> } > >> if (*HH(it,it) != 0.0) { > >> nrs[it] = *GRS(it) / *HH(it,it); > >> } else { > >> ksp->reason = KSP_DIVERGED_BREAKDOWN; > >> > >> ierr = PetscInfo2(ksp,"Likely your matrix or preconditioner is > singular. HH(it,it) is identically zero; it = %D GRS(it) = > %g\n",it,(double)PetscAbsScalar(*GRS(it)));CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> for (ii=1; ii<=it; ii++) { > >> k = it - ii; > >> tt = *GRS(k); > >> for (j=k+1; j<=it; j++) tt = tt - *HH(k,j) * nrs[j]; > >> if (*HH(k,k) == 0.0) { > >> ksp->reason = KSP_DIVERGED_BREAKDOWN; > >> > >> ierr = PetscInfo1(ksp,"Likely your matrix or preconditioner is > singular. 
HH(k,k) is identically zero; k = %D\n",k);CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> nrs[k] = tt / *HH(k,k); > >> } > >> > >> /* Perform the hookstep correction - DRL */ > >> if(gmres->delta > 0.0 && gmres->it > 0) { // Apply the hookstep to > correct the GMRES solution (if required) > >> printf("\t\tapplying hookstep: initial delta: %lf", gmres->delta); > >> PetscInt N = gmres->max_k+2, ii, jj, j0; > >> PetscBLASInt nRows, nCols, lwork, lierr; > >> PetscScalar *R, *work; > >> PetscReal* S; > >> PetscScalar *U, *VT, *p, *q, *y; > >> PetscScalar bnorm, mu, qMag, qMag2, delta2; > >> > >> ierr = PetscMalloc1((gmres->max_k + 3)*(gmres->max_k + > 9),&R);CHKERRQ(ierr); > >> work = R + N*N; > >> ierr = PetscMalloc1(6*(gmres->max_k+2),&S);CHKERRQ(ierr); > >> > >> ierr = PetscBLASIntCast(gmres->it+1,&nRows);CHKERRQ(ierr); > >> ierr = PetscBLASIntCast(gmres->it+0,&nCols);CHKERRQ(ierr); > >> ierr = PetscBLASIntCast(5*N,&lwork);CHKERRQ(ierr); > >> //ierr = > PetscMemcpy(R,gmres->hes_origin,(gmres->max_k+2)*(gmres->max_k+1)*sizeof(PetscScalar));CHKERRQ(ierr); > >> ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); > >> for (ii = 0; ii < nRows; ii++) { > >> for (jj = 0; jj < nCols; jj++) { > >> R[ii*nCols+jj] = *HH(ii,jj); > >> // Ensure Hessenberg structure > >> //if (ii > jj+1) R[ii*nCols+jj] = 0.0; > >> } > >> } > >> > >> ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); > >> ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); > >> ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); > >> ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); > >> ierr = PetscMalloc1(nRows,&y);CHKERRQ(ierr); > >> > >> > printf("\n\n");for(ii=0;ii >> > >> // Perform an SVD on the Hessenberg matrix > >> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); > >> > PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows,&nCols,R,&nRows,S,U,&nRows,VT,&nCols,work,&lwork,&lierr)); > >> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD > Lapack routine %d",(int)lierr); > >> ierr = PetscFPTrapPop();CHKERRQ(ierr); > >> > >> // Compute p = ||b|| U^T e_1 > >> ierr = VecNorm(ksp->vec_rhs,NORM_2,&bnorm);CHKERRQ(ierr); > >> for (ii=0; ii >> p[ii] = bnorm*U[ii*nRows]; > >> } > >> > >> // Solve the root finding problem for \mu such that ||q|| < \delta > (where \delta is the radius of the trust region) > >> // This step is largely copied from Ashley Willis' openpipeflow: > doi.org/10.1016/j.softx.2017.05.003 > >> mu = S[nCols-1]*S[nCols-1]*1.0e-6; > >> if (mu < 1.0e-99) mu = 1.0e-99; > >> qMag = 1.0e+99; > >> > >> while (qMag > gmres->delta) { > >> mu *= 1.1; > >> qMag2 = 0.0; > >> for (ii=0; ii >> q[ii] = p[ii]*S[ii]/(mu + S[ii]*S[ii]); > >> qMag2 += q[ii]*q[ii]; > >> } > >> qMag = PetscSqrtScalar(qMag2); > >> } > >> > >> // Expand y in terms of the right singular vectors as y = V q > >> for (ii=0; ii >> y[ii] = 0.0; > >> for (jj=0; jj >> y[ii] += VT[jj*nCols+ii]*q[jj]; // transpose of the transpose > >> } > >> } > >> > >> // Recompute the size of the trust region, \delta > >> delta2 = 0.0; > >> for (ii=0; ii >> j0 = (ii < 2) ? 
0 : ii - 1; > >> p[ii] = 0.0; > >> for (jj=j0; jj >> p[ii] -= R[ii*nCols+jj]*y[jj]; > >> } > >> if (ii == 0) { > >> p[ii] += bnorm; > >> } > >> delta2 += p[ii]*p[ii]; > >> } > >> gmres->delta = PetscSqrtScalar(delta2); > >> printf("\t\t...final delta: %lf.\n", gmres->delta); > >> > >> // Pass the orthnomalized Krylov vector weights back out > >> for (ii=0; ii >> nrs[ii] = y[ii]; > >> } > >> > >> ierr = PetscFree(R);CHKERRQ(ierr); > >> ierr = PetscFree(S);CHKERRQ(ierr); > >> ierr = PetscFree(U);CHKERRQ(ierr); > >> ierr = PetscFree(VT);CHKERRQ(ierr); > >> ierr = PetscFree(p);CHKERRQ(ierr); > >> ierr = PetscFree(q);CHKERRQ(ierr); > >> ierr = PetscFree(y);CHKERRQ(ierr); > >> } > >> /*** DRL ***/ > >> > >> /* Accumulate the correction to the solution of the preconditioned > problem in TEMP */ > >> ierr = VecSet(VEC_TEMP,0.0);CHKERRQ(ierr); > >> if (gmres->delta > 0.0) { > >> ierr = VecMAXPY(VEC_TEMP,it,nrs,&VEC_VV(0));CHKERRQ(ierr); // DRL > >> } else { > >> ierr = VecMAXPY(VEC_TEMP,it+1,nrs,&VEC_VV(0));CHKERRQ(ierr); > >> } > >> > >> ierr = > KSPUnwindPreconditioner(ksp,VEC_TEMP,VEC_TEMP_MATOP);CHKERRQ(ierr); > >> /* add solution to previous solution */ > >> if (vdest != vs) { > >> ierr = VecCopy(vs,vdest);CHKERRQ(ierr); > >> } > >> ierr = VecAXPY(vdest,1.0,VEC_TEMP);CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> /* > >> Do the scalar work for the orthogonalization. Return new residual > norm. > >> */ > >> static PetscErrorCode KSPGMRESUpdateHessenberg(KSP ksp,PetscInt > it,PetscBool hapend,PetscReal *res) > >> { > >> PetscScalar *hh,*cc,*ss,tt; > >> PetscInt j; > >> KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); > >> > >> PetscFunctionBegin; > >> hh = HH(0,it); > >> cc = CC(0); > >> ss = SS(0); > >> > >> /* Apply all the previously computed plane rotations to the new column > >> of the Hessenberg matrix */ > >> for (j=1; j<=it; j++) { > >> tt = *hh; > >> *hh = PetscConj(*cc) * tt + *ss * *(hh+1); > >> hh++; > >> *hh = *cc++ * *hh - (*ss++ * tt); > >> } > >> > >> /* > >> compute the new plane rotation, and apply it to: > >> 1) the right-hand-side of the Hessenberg system > >> 2) the new column of the Hessenberg matrix > >> thus obtaining the updated value of the residual > >> */ > >> if (!hapend) { > >> tt = PetscSqrtScalar(PetscConj(*hh) * *hh + PetscConj(*(hh+1)) * > *(hh+1)); > >> if (tt == 0.0) { > >> ksp->reason = KSP_DIVERGED_NULL; > >> PetscFunctionReturn(0); > >> } > >> *cc = *hh / tt; > >> *ss = *(hh+1) / tt; > >> *GRS(it+1) = -(*ss * *GRS(it)); > >> *GRS(it) = PetscConj(*cc) * *GRS(it); > >> *hh = PetscConj(*cc) * *hh + *ss * *(hh+1); > >> *res = PetscAbsScalar(*GRS(it+1)); > >> } else { > >> /* happy breakdown: HH(it+1, it) = 0, therfore we don't need to apply > >> another rotation matrix (so RH doesn't change). The new > residual is > >> always the new sine term times the residual from last time > (GRS(it)), > >> but now the new sine rotation would be zero...so the > residual should > >> be zero...so we will multiply "zero" by the last residual. > This might > >> not be exactly what we want to do here -could just return > "zero". */ > >> > >> *res = 0.0; > >> } > >> PetscFunctionReturn(0); > >> } > >> /* > >> This routine allocates more work vectors, starting from VEC_VV(it). 
> >> */ > >> PetscErrorCode KSPGMRESGetNewVectors(KSP ksp,PetscInt it) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscErrorCode ierr; > >> PetscInt nwork = gmres->nwork_alloc,k,nalloc; > >> > >> PetscFunctionBegin; > >> nalloc = PetscMin(ksp->max_it,gmres->delta_allocate); > >> /* Adjust the number to allocate to make sure that we don't exceed the > >> number of available slots */ > >> if (it + VEC_OFFSET + nalloc >= gmres->vecs_allocated) { > >> nalloc = gmres->vecs_allocated - it - VEC_OFFSET; > >> } > >> if (!nalloc) PetscFunctionReturn(0); > >> > >> gmres->vv_allocated += nalloc; > >> > >> ierr = > KSPCreateVecs(ksp,nalloc,&gmres->user_work[nwork],0,NULL);CHKERRQ(ierr); > >> ierr = > PetscLogObjectParents(ksp,nalloc,gmres->user_work[nwork]);CHKERRQ(ierr); > >> > >> gmres->mwork_alloc[nwork] = nalloc; > >> for (k=0; k >> gmres->vecs[it+VEC_OFFSET+k] = gmres->user_work[nwork][k]; > >> } > >> gmres->nwork_alloc++; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPBuildSolution_GMRES(KSP ksp,Vec ptr,Vec *result) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> if (!ptr) { > >> if (!gmres->sol_temp) { > >> ierr = VecDuplicate(ksp->vec_sol,&gmres->sol_temp);CHKERRQ(ierr); > >> ierr = > PetscLogObjectParent((PetscObject)ksp,(PetscObject)gmres->sol_temp);CHKERRQ(ierr); > >> } > >> ptr = gmres->sol_temp; > >> } > >> if (!gmres->nrs) { > >> /* allocate the work area */ > >> ierr = PetscMalloc1(gmres->max_k,&gmres->nrs);CHKERRQ(ierr); > >> ierr = > PetscLogObjectMemory((PetscObject)ksp,gmres->max_k*sizeof(PetscScalar));CHKERRQ(ierr); > >> } > >> > >> ierr = > KSPGMRESBuildSoln(gmres->nrs,ksp->vec_sol,ptr,ksp,gmres->it);CHKERRQ(ierr); > >> if (result) *result = ptr; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPView_GMRES(KSP ksp,PetscViewer viewer) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> const char *cstr; > >> PetscErrorCode ierr; > >> PetscBool iascii,isstring; > >> > >> PetscFunctionBegin; > >> ierr = > PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERASCII,&iascii);CHKERRQ(ierr); > >> ierr = > PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERSTRING,&isstring);CHKERRQ(ierr); > >> if (gmres->orthog == KSPGMRESClassicalGramSchmidtOrthogonalization) { > >> switch (gmres->cgstype) { > >> case (KSP_GMRES_CGS_REFINE_NEVER): > >> cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization with > no iterative refinement"; > >> break; > >> case (KSP_GMRES_CGS_REFINE_ALWAYS): > >> cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization with > one step of iterative refinement"; > >> break; > >> case (KSP_GMRES_CGS_REFINE_IFNEEDED): > >> cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization with > one step of iterative refinement when needed"; > >> break; > >> default: > >> > SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Unknown > orthogonalization"); > >> } > >> } else if (gmres->orthog == > KSPGMRESModifiedGramSchmidtOrthogonalization) { > >> cstr = "Modified Gram-Schmidt Orthogonalization"; > >> } else { > >> cstr = "unknown orthogonalization"; > >> } > >> if (iascii) { > >> ierr = PetscViewerASCIIPrintf(viewer," restart=%D, using > %s\n",gmres->max_k,cstr);CHKERRQ(ierr); > >> ierr = PetscViewerASCIIPrintf(viewer," happy breakdown tolerance > %g\n",(double)gmres->haptol);CHKERRQ(ierr); > >> } else if (isstring) { > >> ierr = PetscViewerStringSPrintf(viewer,"%s restart > %D",cstr,gmres->max_k);CHKERRQ(ierr); > >> } > >> 
PetscFunctionReturn(0); > >> } > >> > >> /*@C > >> KSPGMRESMonitorKrylov - Calls VecView() for each new direction in the > GMRES accumulated Krylov space. > >> > >> Collective on KSP > >> > >> Input Parameters: > >> + ksp - the KSP context > >> . its - iteration number > >> . fgnorm - 2-norm of residual (or gradient) > >> - dummy - an collection of viewers created with KSPViewerCreate() > >> > >> Options Database Keys: > >> . -ksp_gmres_kyrlov_monitor > >> > >> Notes: A new PETSCVIEWERDRAW is created for each Krylov vector so > they can all be simultaneously viewed > >> Level: intermediate > >> > >> .keywords: KSP, nonlinear, vector, monitor, view, Krylov space > >> > >> .seealso: KSPMonitorSet(), KSPMonitorDefault(), VecView(), > KSPViewersCreate(), KSPViewersDestroy() > >> @*/ > >> PetscErrorCode KSPGMRESMonitorKrylov(KSP ksp,PetscInt its,PetscReal > fgnorm,void *dummy) > >> { > >> PetscViewers viewers = (PetscViewers)dummy; > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscErrorCode ierr; > >> Vec x; > >> PetscViewer viewer; > >> PetscBool flg; > >> > >> PetscFunctionBegin; > >> ierr = > PetscViewersGetViewer(viewers,gmres->it+1,&viewer);CHKERRQ(ierr); > >> ierr = > PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERDRAW,&flg);CHKERRQ(ierr); > >> if (!flg) { > >> ierr = PetscViewerSetType(viewer,PETSCVIEWERDRAW);CHKERRQ(ierr); > >> ierr = PetscViewerDrawSetInfo(viewer,NULL,"Krylov GMRES > Monitor",PETSC_DECIDE,PETSC_DECIDE,300,300);CHKERRQ(ierr); > >> } > >> x = VEC_VV(gmres->it+1); > >> ierr = VecView(x,viewer);CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPSetFromOptions_GMRES(PetscOptionItems > *PetscOptionsObject,KSP ksp) > >> { > >> PetscErrorCode ierr; > >> PetscInt restart; > >> PetscReal haptol; > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscBool flg; > >> > >> PetscFunctionBegin; > >> ierr = PetscOptionsHead(PetscOptionsObject,"KSP GMRES > Options");CHKERRQ(ierr); > >> ierr = PetscOptionsInt("-ksp_gmres_restart","Number of Krylov search > directions","KSPGMRESSetRestart",gmres->max_k,&restart,&flg);CHKERRQ(ierr); > >> if (flg) { ierr = KSPGMRESSetRestart(ksp,restart);CHKERRQ(ierr); } > >> ierr = PetscOptionsReal("-ksp_gmres_haptol","Tolerance for exact > convergence (happy > ending)","KSPGMRESSetHapTol",gmres->haptol,&haptol,&flg);CHKERRQ(ierr); > >> if (flg) { ierr = KSPGMRESSetHapTol(ksp,haptol);CHKERRQ(ierr); } > >> flg = PETSC_FALSE; > >> ierr = PetscOptionsBool("-ksp_gmres_preallocate","Preallocate Krylov > vectors","KSPGMRESSetPreAllocateVectors",flg,&flg,NULL);CHKERRQ(ierr); > >> if (flg) {ierr = KSPGMRESSetPreAllocateVectors(ksp);CHKERRQ(ierr);} > >> ierr = > PetscOptionsBoolGroupBegin("-ksp_gmres_classicalgramschmidt","Classical > (unmodified) Gram-Schmidt > (fast)","KSPGMRESSetOrthogonalization",&flg);CHKERRQ(ierr); > >> if (flg) {ierr = > KSPGMRESSetOrthogonalization(ksp,KSPGMRESClassicalGramSchmidtOrthogonalization);CHKERRQ(ierr);} > >> ierr = > PetscOptionsBoolGroupEnd("-ksp_gmres_modifiedgramschmidt","Modified > Gram-Schmidt (slow,more > stable)","KSPGMRESSetOrthogonalization",&flg);CHKERRQ(ierr); > >> if (flg) {ierr = > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization);CHKERRQ(ierr);} > >> ierr = PetscOptionsEnum("-ksp_gmres_cgs_refinement_type","Type of > iterative refinement for classical (unmodified) > Gram-Schmidt","KSPGMRESSetCGSRefinementType", > >> > KSPGMRESCGSRefinementTypes,(PetscEnum)gmres->cgstype,(PetscEnum*)&gmres->cgstype,&flg);CHKERRQ(ierr); > >> flg = PETSC_FALSE; > 
>> ierr = PetscOptionsBool("-ksp_gmres_krylov_monitor","Plot the Krylov > directions","KSPMonitorSet",flg,&flg,NULL);CHKERRQ(ierr); > >> if (flg) { > >> PetscViewers viewers; > >> ierr = > PetscViewersCreate(PetscObjectComm((PetscObject)ksp),&viewers);CHKERRQ(ierr); > >> ierr = > KSPMonitorSet(ksp,KSPGMRESMonitorKrylov,viewers,(PetscErrorCode > (*)(void**))PetscViewersDestroy);CHKERRQ(ierr); > >> } > >> ierr = PetscOptionsTail();CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESSetHapTol_GMRES(KSP ksp,PetscReal tol) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> > >> PetscFunctionBegin; > >> if (tol < 0.0) > SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Tolerance > must be non-negative"); > >> gmres->haptol = tol; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESGetRestart_GMRES(KSP ksp,PetscInt *max_k) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> > >> PetscFunctionBegin; > >> *max_k = gmres->max_k; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESSetRestart_GMRES(KSP ksp,PetscInt max_k) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> if (max_k < 1) > SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Restart > must be positive"); > >> if (!ksp->setupstage) { > >> gmres->max_k = max_k; > >> } else if (gmres->max_k != max_k) { > >> gmres->max_k = max_k; > >> ksp->setupstage = KSP_SETUP_NEW; > >> /* free the data structures, then create them again */ > >> ierr = KSPReset_GMRES(ksp);CHKERRQ(ierr); > >> } > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESSetOrthogonalization_GMRES(KSP ksp,FCN fcn) > >> { > >> PetscFunctionBegin; > >> ((KSP_GMRES*)ksp->data)->orthog = fcn; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESGetOrthogonalization_GMRES(KSP ksp,FCN *fcn) > >> { > >> PetscFunctionBegin; > >> *fcn = ((KSP_GMRES*)ksp->data)->orthog; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESSetPreAllocateVectors_GMRES(KSP ksp) > >> { > >> KSP_GMRES *gmres; > >> > >> PetscFunctionBegin; > >> gmres = (KSP_GMRES*)ksp->data; > >> gmres->q_preallocate = 1; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESSetCGSRefinementType_GMRES(KSP > ksp,KSPGMRESCGSRefinementType type) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> > >> PetscFunctionBegin; > >> gmres->cgstype = type; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESGetCGSRefinementType_GMRES(KSP > ksp,KSPGMRESCGSRefinementType *type) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> > >> PetscFunctionBegin; > >> *type = gmres->cgstype; > >> PetscFunctionReturn(0); > >> } > >> > >> /*@ > >> KSPGMRESSetCGSRefinementType - Sets the type of iterative refinement > to use > >> in the classical Gram Schmidt orthogonalization. > >> > >> Logically Collective on KSP > >> > >> Input Parameters: > >> + ksp - the Krylov space context > >> - type - the type of refinement > >> > >> Options Database: > >> . 
-ksp_gmres_cgs_refinement_type > > >> > >> Level: intermediate > >> > >> .keywords: KSP, GMRES, iterative refinement > >> > >> .seealso: KSPGMRESSetOrthogonalization(), KSPGMRESCGSRefinementType, > KSPGMRESClassicalGramSchmidtOrthogonalization(), > KSPGMRESGetCGSRefinementType(), > >> KSPGMRESGetOrthogonalization() > >> @*/ > >> PetscErrorCode KSPGMRESSetCGSRefinementType(KSP > ksp,KSPGMRESCGSRefinementType type) > >> { > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> PetscValidHeaderSpecific(ksp,KSP_CLASSID,1); > >> PetscValidLogicalCollectiveEnum(ksp,type,2); > >> ierr = > PetscTryMethod(ksp,"KSPGMRESSetCGSRefinementType_C",(KSP,KSPGMRESCGSRefinementType),(ksp,type));CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> /*@ > >> KSPGMRESGetCGSRefinementType - Gets the type of iterative refinement > to use > >> in the classical Gram Schmidt orthogonalization. > >> > >> Not Collective > >> > >> Input Parameter: > >> . ksp - the Krylov space context > >> > >> Output Parameter: > >> . type - the type of refinement > >> > >> Options Database: > >> . -ksp_gmres_cgs_refinement_type > >> > >> Level: intermediate > >> > >> .keywords: KSP, GMRES, iterative refinement > >> > >> .seealso: KSPGMRESSetOrthogonalization(), KSPGMRESCGSRefinementType, > KSPGMRESClassicalGramSchmidtOrthogonalization(), > KSPGMRESSetCGSRefinementType(), > >> KSPGMRESGetOrthogonalization() > >> @*/ > >> PetscErrorCode KSPGMRESGetCGSRefinementType(KSP > ksp,KSPGMRESCGSRefinementType *type) > >> { > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> PetscValidHeaderSpecific(ksp,KSP_CLASSID,1); > >> ierr = > PetscUseMethod(ksp,"KSPGMRESGetCGSRefinementType_C",(KSP,KSPGMRESCGSRefinementType*),(ksp,type));CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> > >> /*@ > >> KSPGMRESSetRestart - Sets number of iterations at which GMRES, FGMRES > and LGMRES restarts. > >> > >> Logically Collective on KSP > >> > >> Input Parameters: > >> + ksp - the Krylov space context > >> - restart - integer restart value > >> > >> Options Database: > >> . -ksp_gmres_restart > >> > >> Note: The default value is 30. > >> > >> Level: intermediate > >> > >> .keywords: KSP, GMRES, restart, iterations > >> > >> .seealso: KSPSetTolerances(), KSPGMRESSetOrthogonalization(), > KSPGMRESSetPreAllocateVectors(), KSPGMRESGetRestart() > >> @*/ > >> PetscErrorCode KSPGMRESSetRestart(KSP ksp, PetscInt restart) > >> { > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> PetscValidLogicalCollectiveInt(ksp,restart,2); > >> > >> ierr = > PetscTryMethod(ksp,"KSPGMRESSetRestart_C",(KSP,PetscInt),(ksp,restart));CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> /*@ > >> KSPGMRESGetRestart - Gets number of iterations at which GMRES, FGMRES > and LGMRES restarts. > >> > >> Not Collective > >> > >> Input Parameter: > >> . ksp - the Krylov space context > >> > >> Output Parameter: > >> . restart - integer restart value > >> > >> Note: The default value is 30. 
> >> > >> Level: intermediate > >> > >> .keywords: KSP, GMRES, restart, iterations > >> > >> .seealso: KSPSetTolerances(), KSPGMRESSetOrthogonalization(), > KSPGMRESSetPreAllocateVectors(), KSPGMRESSetRestart() > >> @*/ > >> PetscErrorCode KSPGMRESGetRestart(KSP ksp, PetscInt *restart) > >> { > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> ierr = > PetscUseMethod(ksp,"KSPGMRESGetRestart_C",(KSP,PetscInt*),(ksp,restart));CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> /*@ > >> KSPGMRESSetHapTol - Sets tolerance for determining happy breakdown in > GMRES, FGMRES and LGMRES. > >> > >> Logically Collective on KSP > >> > >> Input Parameters: > >> + ksp - the Krylov space context > >> - tol - the tolerance > >> > >> Options Database: > >> . -ksp_gmres_haptol > >> > >> Note: Happy breakdown is the rare case in GMRES where an 'exact' > solution is obtained after > >> a certain number of iterations. If you attempt more iterations > after this point unstable > >> things can happen hence very occasionally you may need to set > this value to detect this condition > >> > >> Level: intermediate > >> > >> .keywords: KSP, GMRES, tolerance > >> > >> .seealso: KSPSetTolerances() > >> @*/ > >> PetscErrorCode KSPGMRESSetHapTol(KSP ksp,PetscReal tol) > >> { > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> PetscValidLogicalCollectiveReal(ksp,tol,2); > >> ierr = > PetscTryMethod((ksp),"KSPGMRESSetHapTol_C",(KSP,PetscReal),((ksp),(tol)));CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> /*MC > >> KSPGMRES - Implements the Generalized Minimal Residual method. > >> (Saad and Schultz, 1986) with restart > >> > >> > >> Options Database Keys: > >> + -ksp_gmres_restart - the number of Krylov directions to > orthogonalize against > >> . -ksp_gmres_haptol - sets the tolerance for "happy ending" > (exact convergence) > >> . -ksp_gmres_preallocate - preallocate all the Krylov search > directions initially (otherwise groups of > >> vectors are allocated as needed) > >> . -ksp_gmres_classicalgramschmidt - use classical (unmodified) > Gram-Schmidt to orthogonalize against the Krylov space (fast) (the default) > >> . -ksp_gmres_modifiedgramschmidt - use modified Gram-Schmidt in the > orthogonalization (more stable, but slower) > >> . -ksp_gmres_cgs_refinement_type - determine > if iterative refinement is used to increase the > >> stability of the classical > Gram-Schmidt orthogonalization. > >> - -ksp_gmres_krylov_monitor - plot the Krylov space generated > >> > >> Level: beginner > >> > >> Notes: Left and right preconditioning are supported, but not > symmetric preconditioning. > >> > >> References: > >> . 1. - YOUCEF SAAD AND MARTIN H. SCHULTZ, GMRES: A GENERALIZED > MINIMAL RESIDUAL ALGORITHM FOR SOLVING NONSYMMETRIC LINEAR SYSTEMS. > >> SIAM J. ScI. STAT. COMPUT. Vo|. 7, No. 3, July 1986. 
> >> > >> .seealso: KSPCreate(), KSPSetType(), KSPType (for list of available > types), KSP, KSPFGMRES, KSPLGMRES, > >> KSPGMRESSetRestart(), KSPGMRESSetHapTol(), > KSPGMRESSetPreAllocateVectors(), KSPGMRESSetOrthogonalization(), > KSPGMRESGetOrthogonalization(), > >> KSPGMRESClassicalGramSchmidtOrthogonalization(), > KSPGMRESModifiedGramSchmidtOrthogonalization(), > >> KSPGMRESCGSRefinementType, KSPGMRESSetCGSRefinementType(), > KSPGMRESGetCGSRefinementType(), KSPGMRESMonitorKrylov(), KSPSetPCSide() > >> > >> M*/ > >> > >> PETSC_EXTERN PetscErrorCode KSPCreate_GMRES(KSP ksp) > >> { > >> KSP_GMRES *gmres; > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> ierr = PetscNewLog(ksp,&gmres);CHKERRQ(ierr); > >> ksp->data = (void*)gmres; > >> > >> ierr = > KSPSetSupportedNorm(ksp,KSP_NORM_PRECONDITIONED,PC_LEFT,4);CHKERRQ(ierr); > >> ierr = > KSPSetSupportedNorm(ksp,KSP_NORM_UNPRECONDITIONED,PC_RIGHT,3);CHKERRQ(ierr); > >> ierr = > KSPSetSupportedNorm(ksp,KSP_NORM_PRECONDITIONED,PC_SYMMETRIC,2);CHKERRQ(ierr); > >> ierr = KSPSetSupportedNorm(ksp,KSP_NORM_NONE,PC_RIGHT,1);CHKERRQ(ierr); > >> ierr = KSPSetSupportedNorm(ksp,KSP_NORM_NONE,PC_LEFT,1);CHKERRQ(ierr); > >> > >> ksp->ops->buildsolution = KSPBuildSolution_GMRES; > >> ksp->ops->setup = KSPSetUp_GMRES; > >> ksp->ops->solve = KSPSolve_GMRES; > >> ksp->ops->reset = KSPReset_GMRES; > >> ksp->ops->destroy = KSPDestroy_GMRES; > >> ksp->ops->view = KSPView_GMRES; > >> ksp->ops->setfromoptions = KSPSetFromOptions_GMRES; > >> ksp->ops->computeextremesingularvalues = > KSPComputeExtremeSingularValues_GMRES; > >> ksp->ops->computeeigenvalues = KSPComputeEigenvalues_GMRES; > >> #if !defined(PETSC_USE_COMPLEX) && !defined(PETSC_HAVE_ESSL) > >> ksp->ops->computeritz = KSPComputeRitz_GMRES; > >> #endif > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetPreAllocateVectors_C",KSPGMRESSetPreAllocateVectors_GMRES);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetOrthogonalization_C",KSPGMRESSetOrthogonalization_GMRES);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetOrthogonalization_C",KSPGMRESGetOrthogonalization_GMRES);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetRestart_C",KSPGMRESSetRestart_GMRES);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetRestart_C",KSPGMRESGetRestart_GMRES);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetHapTol_C",KSPGMRESSetHapTol_GMRES);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetCGSRefinementType_C",KSPGMRESSetCGSRefinementType_GMRES);CHKERRQ(ierr); > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetCGSRefinementType_C",KSPGMRESGetCGSRefinementType_GMRES);CHKERRQ(ierr); > >> > >> gmres->haptol = 1.0e-30; > >> gmres->q_preallocate = 0; > >> gmres->delta_allocate = GMRES_DELTA_DIRECTIONS; > >> gmres->orthog = KSPGMRESClassicalGramSchmidtOrthogonalization; > >> gmres->nrs = 0; > >> gmres->sol_temp = 0; > >> gmres->max_k = GMRES_DEFAULT_MAXK; > >> gmres->Rsvd = 0; > >> gmres->cgstype = KSP_GMRES_CGS_REFINE_NEVER; > >> gmres->orthogwork = 0; > >> gmres->delta = -1.0; // DRL > >> PetscFunctionReturn(0); > >> } > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon May 20 03:03:03 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) 
Date: Mon, 20 May 2019 08:03:03 +0000 Subject: [petsc-users] Creating a DMNetwork from a DMPlex In-Reply-To: References: <9DAFD49B-AB7F-435F-BB27-16EF946E1241@mcs.anl.gov> Message-ID:

   Maybe try building a DMNetwork by hand, using a hand-drawn mesh with just a few vertices and edges, and see if what you want to do makes sense.

> On May 20, 2019, at 2:04 AM, Swarnava Ghosh wrote: > > Hi Barry, > > Thank you for your email. My planned discretization is based on the fact that I need a distributed unstructured mesh, where at each vertex point I perform local calculations. For these calculations, I do NOT need to assemble any global matrix. I will have fields defined at the vertices, and using linear interpolation, I am planning to find the values of these fields at some spatial points which are within a ball around each vertex. Once the values of these fields are known within the compact support around each vertex, I do local computations to calculate my unknown field. My reason for having the mesh is essentially to 1) define fields at the vertices and 2) perform linear interpolation (using finite elements) at some spatial points. Also, the local computations around each vertex are computationally the most expensive step. In that case, having a cell partitioning will result in vertices being shared among processes, which will result in redundant computations. > > My idea is therefore to have DMNetwork distribute the vertices across processes and use finite elements for the linear interpolation part. > > Thanks, > SG > >
> > On Sun, May 19, 2019 at 6:54 PM Smith, Barry F. wrote: > > I am not sure you want DMNetwork; DMNetwork has no geometry, it only has vertices and edges. Vertices are connected to other vertices through the edges. For example, I can't see how one would do vertex-centered finite volume methods with DMNetwork. Maybe if you said something more about your planned discretization we could figure something out.
> > > On May 19, 2019, at 8:32 PM, Swarnava Ghosh wrote: > > > > Hi Barry, > > > > No, the gmesh file contains a mesh and not a graph/network. > > In that case, is it possible to create a DMNetwork first from the DMPlex and then distribute the DMNetwork? > > > > I have this case because I want a vertex partitioning of my mesh. Domain decomposition of a DMPlex gives me a cell partitioning. Essentially what I want is that no two processes can share a vertex BUT they can share an edge. Similar to how a DMDA is distributed. > > > > Thanks, > > Swarnava > >
> > On Sun, May 19, 2019 at 4:50 PM Smith, Barry F. wrote: > > > > This use case never occurred to us. Does the gmesh file contain a graph/network (as opposed to a mesh)? There seem to be two choices: > > > > 1) if the gmesh file contains a graph/network, one could write a gmesh reader for that case that reads the file directly and constructs a DMNetwork, or > > > > 2) write a converter from a DMPlex to a DMNetwork. > > > > I lean toward the first. > > > > Either way you need to understand the documentation for DMNetwork and how to build one up. > > > > > > Barry > > >
> > > > On May 19, 2019, at 6:34 PM, Swarnava Ghosh via petsc-users wrote: > > > > > > Hi PETSc users and developers, > > > > > > I am trying to find a way of creating a DMNetwork from a DMPlex. I have read the DMPlex from a gmesh file and have it distributed. > > > > > > Thanks, > > > SG > > >
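As a concrete illustration of the "Fortran ordering" point made in the surrounding "Calling LAPACK routines from PETSc" thread, the sketch below shows how an m-by-n matrix stored column-major in a flat C array is indexed. It is only an illustrative, self-contained snippet (the names m, n, lda, and A are not taken from gmres.c), but it mirrors the layout that lets such a buffer be handed to LAPACK with a leading dimension and no transposition.

    #include <stdio.h>

    /* Column-major ("Fortran order") storage of an m-by-n matrix in a flat
       array: entry (i,j) lives at A[i + j*lda], where lda >= m is the
       leading dimension.  LAPACK routines interpret buffers this way, so no
       transpose is needed as long as every access uses the same convention. */
    int main(void)
    {
      int    m = 3, n = 2;     /* rows and columns of the matrix          */
      int    lda = m;          /* leading dimension (here: tight packing) */
      double A[3 * 2];
      int    i, j;

      for (j = 0; j < n; j++)              /* columns are contiguous      */
        for (i = 0; i < m; i++)
          A[i + j * lda] = 10.0 * i + j;   /* store "row i, column j"     */

      for (i = 0; i < m; i++) {            /* print in row-by-row view    */
        for (j = 0; j < n; j++) printf("%7.1f", A[i + j * lda]);
        printf("\n");
      }
      return 0;
    }

The GMRES work arrays discussed below appear to follow the same convention, which is why the HH(a,b) macro can index hh_origin directly; check gmresimpl.h for the exact leading dimension used there.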
From bsmith at mcs.anl.gov Mon May 20 03:06:53 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.)
Date: Mon, 20 May 2019 08:06:53 +0000 Subject: [petsc-users] Calling LAPACK routines from PETSc In-Reply-To: References: <8736l9abj4.fsf@jedbrown.org> <8D37944A-6C37-48D0-B238-4E9806D93B7E@anl.gov> Message-ID: <92E6EB6E-A0C0-4E1B-94B3-3DA34C87F4F3@mcs.anl.gov>

> On May 20, 2019, at 2:28 AM, Dave Lee wrote: > > Thanks Jed and Barry, > > So, just to confirm, > > -- From the KSP_GMRES structure, if I call *HH(a,b), that will return the row a, column b entry of the Hessenberg matrix (while the backing *hh_origin array is ordered using the Fortran convention) > > -- Matrices are passed into and returned from PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_() using Fortran indexing, and need to be transposed to get back to C ordering

   In general, I guess depending on what you want to do with them you don't need to transpose them. Why would you want to? Just leave them as little column-oriented blobs and do with them what you need directly. Just do stuff and you'll find it works out.

> > Are both of these statements correct? > > Cheers, Dave. > >
> On Mon, May 20, 2019 at 4:34 PM Smith, Barry F. wrote: > > The little work arrays in GMRES tend to be stored in Fortran ordering; there is no C style p[][] indexing into such arrays. Thus the arrays can safely be sent to LAPACK. The only trick is knowing the two dimensions and, as Jed says, the "leading dimension" parameter. He gave you a place to look.
> > > On May 20, 2019, at 1:24 AM, Jed Brown via petsc-users wrote: > > > Dave Lee via petsc-users writes: > > >> Hi PETSc, > >> > >> I'm attempting to implement a "hookstep" for the SNES trust region solver. > >> Essentially what I'm trying to do is replace the solution of the least > >> squares problem at the end of each GMRES solve with a modified solution > >> with a norm that is constrained to be within the size of the trust region. > >> > >> In order to do this I need to perform an SVD on the Hessenberg matrix, > >> which, copying the function KSPComputeExtremeSingularValues(), I'm trying to > >> do by accessing the LAPACK function dgesvd() via the PetscStackCallBLAS() > >> machinery. One thing I'm confused about however is the ordering of the 2D > >> arrays into and out of this function, given that C and FORTRAN arrays > >> use reverse indexing, i.e. C[j+1][i+1] = F[i,j]. > >> > >> Given that the Hessenberg matrix has k+1 rows and k columns, should I > >> still be initializing this as H[row][col] and passing this into > >> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...)) > >> or should I be transposing this before passing it in? > > > > LAPACK terminology is with respect to Fortran ordering. There is a > > "leading dimension" parameter so that you can operate on non-contiguous > > blocks. See KSPComputeExtremeSingularValues_GMRES for an example. > > > >> Also for the left and right singular vector matrices that are returned by > >> this function, should I be transposing these before I interpret them as C > >> arrays? > >> > >> I've attached my modified version of gmres.c in case this is helpful. If > >> you grep for DRL (my initials) then you'll see my changes to the code. > >> > >> Cheers, Dave. > >> > >> /* > >> This file implements GMRES (a Generalized Minimal Residual) method. > >> Reference: Saad and Schultz, 1986. > >> > >> > >> Some comments on left vs. right preconditioning, and restarts. > >> Left and right preconditioning.
> >> If right preconditioning is chosen, then the problem being solved > >> by gmres is actually > >> My = AB^-1 y = f > >> so the initial residual is > >> r = f - Mx > >> Note that B^-1 y = x or y = B x, and if x is non-zero, the initial > >> residual is > >> r = f - A x > >> The final solution is then > >> x = B^-1 y > >> > >> If left preconditioning is chosen, then the problem being solved is > >> My = B^-1 A x = B^-1 f, > >> and the initial residual is > >> r = B^-1(f - Ax) > >> > >> Restarts: Restarts are basically solves with x0 not equal to zero. > >> Note that we can eliminate an extra application of B^-1 between > >> restarts as long as we don't require that the solution at the end > >> of an unsuccessful gmres iteration always be the solution x. > >> */ > >> > >> #include <../src/ksp/ksp/impls/gmres/gmresimpl.h> /*I "petscksp.h" I*/ > >> #include // DRL > >> #define GMRES_DELTA_DIRECTIONS 10 > >> #define GMRES_DEFAULT_MAXK 30 > >> static PetscErrorCode KSPGMRESUpdateHessenberg(KSP,PetscInt,PetscBool,PetscReal*); > >> static PetscErrorCode KSPGMRESBuildSoln(PetscScalar*,Vec,Vec,KSP,PetscInt); > >> > >> PetscErrorCode KSPSetUp_GMRES(KSP ksp) > >> { > >> PetscInt hh,hes,rs,cc; > >> PetscErrorCode ierr; > >> PetscInt max_k,k; > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> > >> PetscFunctionBegin; > >> max_k = gmres->max_k; /* restart size */ > >> hh = (max_k + 2) * (max_k + 1); > >> hes = (max_k + 1) * (max_k + 1); > >> rs = (max_k + 2); > >> cc = (max_k + 1); > >> > >> ierr = PetscCalloc5(hh,&gmres->hh_origin,hes,&gmres->hes_origin,rs,&gmres->rs_origin,cc,&gmres->cc_origin,cc,&gmres->ss_origin);CHKERRQ(ierr); > >> ierr = PetscLogObjectMemory((PetscObject)ksp,(hh + hes + rs + 2*cc)*sizeof(PetscScalar));CHKERRQ(ierr); > >> > >> if (ksp->calc_sings) { > >> /* Allocate workspace to hold Hessenberg matrix needed by lapack */ > >> ierr = PetscMalloc1((max_k + 3)*(max_k + 9),&gmres->Rsvd);CHKERRQ(ierr); > >> ierr = PetscLogObjectMemory((PetscObject)ksp,(max_k + 3)*(max_k + 9)*sizeof(PetscScalar));CHKERRQ(ierr); > >> ierr = PetscMalloc1(6*(max_k+2),&gmres->Dsvd);CHKERRQ(ierr); > >> ierr = PetscLogObjectMemory((PetscObject)ksp,6*(max_k+2)*sizeof(PetscReal));CHKERRQ(ierr); > >> } > >> > >> /* Allocate array to hold pointers to user vectors. 
Note that we need > >> 4 + max_k + 1 (since we need it+1 vectors, and it <= max_k) */ > >> gmres->vecs_allocated = VEC_OFFSET + 2 + max_k + gmres->nextra_vecs; > >> > >> ierr = PetscMalloc1(gmres->vecs_allocated,&gmres->vecs);CHKERRQ(ierr); > >> ierr = PetscMalloc1(VEC_OFFSET+2+max_k,&gmres->user_work);CHKERRQ(ierr); > >> ierr = PetscMalloc1(VEC_OFFSET+2+max_k,&gmres->mwork_alloc);CHKERRQ(ierr); > >> ierr = PetscLogObjectMemory((PetscObject)ksp,(VEC_OFFSET+2+max_k)*(sizeof(Vec*)+sizeof(PetscInt)) + gmres->vecs_allocated*sizeof(Vec));CHKERRQ(ierr); > >> > >> if (gmres->q_preallocate) { > >> gmres->vv_allocated = VEC_OFFSET + 2 + max_k; > >> > >> ierr = KSPCreateVecs(ksp,gmres->vv_allocated,&gmres->user_work[0],0,NULL);CHKERRQ(ierr); > >> ierr = PetscLogObjectParents(ksp,gmres->vv_allocated,gmres->user_work[0]);CHKERRQ(ierr); > >> > >> gmres->mwork_alloc[0] = gmres->vv_allocated; > >> gmres->nwork_alloc = 1; > >> for (k=0; kvv_allocated; k++) { > >> gmres->vecs[k] = gmres->user_work[0][k]; > >> } > >> } else { > >> gmres->vv_allocated = 5; > >> > >> ierr = KSPCreateVecs(ksp,5,&gmres->user_work[0],0,NULL);CHKERRQ(ierr); > >> ierr = PetscLogObjectParents(ksp,5,gmres->user_work[0]);CHKERRQ(ierr); > >> > >> gmres->mwork_alloc[0] = 5; > >> gmres->nwork_alloc = 1; > >> for (k=0; kvv_allocated; k++) { > >> gmres->vecs[k] = gmres->user_work[0][k]; > >> } > >> } > >> PetscFunctionReturn(0); > >> } > >> > >> /* > >> Run gmres, possibly with restart. Return residual history if requested. > >> input parameters: > >> > >> . gmres - structure containing parameters and work areas > >> > >> output parameters: > >> . nres - residuals (from preconditioned system) at each step. > >> If restarting, consider passing nres+it. If null, > >> ignored > >> . itcount - number of iterations used. nres[0] to nres[itcount] > >> are defined. If null, ignored. > >> > >> Notes: > >> On entry, the value in vector VEC_VV(0) should be the initial residual > >> (this allows shortcuts where the initial preconditioned residual is 0). 
> >> */ > >> PetscErrorCode KSPGMRESCycle(PetscInt *itcount,KSP ksp) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); > >> PetscReal res_norm,res,hapbnd,tt; > >> PetscErrorCode ierr; > >> PetscInt it = 0, max_k = gmres->max_k; > >> PetscBool hapend = PETSC_FALSE; > >> > >> PetscFunctionBegin; > >> if (itcount) *itcount = 0; > >> ierr = VecNormalize(VEC_VV(0),&res_norm);CHKERRQ(ierr); > >> KSPCheckNorm(ksp,res_norm); > >> res = res_norm; > >> *GRS(0) = res_norm; > >> > >> /* check for the convergence */ > >> ierr = PetscObjectSAWsTakeAccess((PetscObject)ksp);CHKERRQ(ierr); > >> ksp->rnorm = res; > >> ierr = PetscObjectSAWsGrantAccess((PetscObject)ksp);CHKERRQ(ierr); > >> gmres->it = (it - 1); > >> ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); > >> ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); > >> if (!res) { > >> ksp->reason = KSP_CONVERGED_ATOL; > >> ierr = PetscInfo(ksp,"Converged due to zero residual norm on entry\n");CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> ierr = (*ksp->converged)(ksp,ksp->its,res,&ksp->reason,ksp->cnvP);CHKERRQ(ierr); > >> while (!ksp->reason && it < max_k && ksp->its < ksp->max_it) { > >> if (it) { > >> ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); > >> ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); > >> } > >> gmres->it = (it - 1); > >> if (gmres->vv_allocated <= it + VEC_OFFSET + 1) { > >> ierr = KSPGMRESGetNewVectors(ksp,it+1);CHKERRQ(ierr); > >> } > >> ierr = KSP_PCApplyBAorAB(ksp,VEC_VV(it),VEC_VV(1+it),VEC_TEMP_MATOP);CHKERRQ(ierr); > >> > >> /* update hessenberg matrix and do Gram-Schmidt */ > >> ierr = (*gmres->orthog)(ksp,it);CHKERRQ(ierr); > >> if (ksp->reason) break; > >> > >> /* vv(i+1) . vv(i+1) */ > >> ierr = VecNormalize(VEC_VV(it+1),&tt);CHKERRQ(ierr); > >> > >> /* save the magnitude */ > >> *HH(it+1,it) = tt; > >> *HES(it+1,it) = tt; > >> > >> /* check for the happy breakdown */ > >> hapbnd = PetscAbsScalar(tt / *GRS(it)); > >> if (hapbnd > gmres->haptol) hapbnd = gmres->haptol; > >> if (tt < hapbnd) { > >> ierr = PetscInfo2(ksp,"Detected happy breakdown, current hapbnd = %14.12e tt = %14.12e\n",(double)hapbnd,(double)tt);CHKERRQ(ierr); > >> hapend = PETSC_TRUE; > >> } > >> ierr = KSPGMRESUpdateHessenberg(ksp,it,hapend,&res);CHKERRQ(ierr); > >> > >> it++; > >> gmres->it = (it-1); /* For converged */ > >> ksp->its++; > >> ksp->rnorm = res; > >> if (ksp->reason) break; > >> > >> ierr = (*ksp->converged)(ksp,ksp->its,res,&ksp->reason,ksp->cnvP);CHKERRQ(ierr); > >> > >> /* Catch error in happy breakdown and signal convergence and break from loop */ > >> if (hapend) { > >> if (!ksp->reason) { > >> if (ksp->errorifnotconverged) SETERRQ1(PetscObjectComm((PetscObject)ksp),PETSC_ERR_NOT_CONVERGED,"You reached the happy break down, but convergence was not indicated. 
Residual norm = %g",(double)res); > >> else { > >> ksp->reason = KSP_DIVERGED_BREAKDOWN; > >> break; > >> } > >> } > >> } > >> } > >> > >> /* Monitor if we know that we will not return for a restart */ > >> if (it && (ksp->reason || ksp->its >= ksp->max_it)) { > >> ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); > >> ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); > >> } > >> > >> if (itcount) *itcount = it; > >> > >> > >> /* > >> Down here we have to solve for the "best" coefficients of the Krylov > >> columns, add the solution values together, and possibly unwind the > >> preconditioning from the solution > >> */ > >> /* Form the solution (or the solution so far) */ > >> ierr = KSPGMRESBuildSoln(GRS(0),ksp->vec_sol,ksp->vec_sol,ksp,it-1);CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPSolve_GMRES(KSP ksp) > >> { > >> PetscErrorCode ierr; > >> PetscInt its,itcount,i; > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscBool guess_zero = ksp->guess_zero; > >> PetscInt N = gmres->max_k + 1; > >> PetscBLASInt bN; > >> > >> PetscFunctionBegin; > >> if (ksp->calc_sings && !gmres->Rsvd) SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ORDER,"Must call KSPSetComputeSingularValues() before KSPSetUp() is called"); > >> > >> ierr = PetscObjectSAWsTakeAccess((PetscObject)ksp);CHKERRQ(ierr); > >> ksp->its = 0; > >> ierr = PetscObjectSAWsGrantAccess((PetscObject)ksp);CHKERRQ(ierr); > >> > >> itcount = 0; > >> gmres->fullcycle = 0; > >> ksp->reason = KSP_CONVERGED_ITERATING; > >> while (!ksp->reason) { > >> ierr = KSPInitialResidual(ksp,ksp->vec_sol,VEC_TEMP,VEC_TEMP_MATOP,VEC_VV(0),ksp->vec_rhs);CHKERRQ(ierr); > >> ierr = KSPGMRESCycle(&its,ksp);CHKERRQ(ierr); > >> /* Store the Hessenberg matrix and the basis vectors of the Krylov subspace > >> if the cycle is complete for the computation of the Ritz pairs */ > >> if (its == gmres->max_k) { > >> gmres->fullcycle++; > >> if (ksp->calc_ritz) { > >> if (!gmres->hes_ritz) { > >> ierr = PetscMalloc1(N*N,&gmres->hes_ritz);CHKERRQ(ierr); > >> ierr = PetscLogObjectMemory((PetscObject)ksp,N*N*sizeof(PetscScalar));CHKERRQ(ierr); > >> ierr = VecDuplicateVecs(VEC_VV(0),N,&gmres->vecb);CHKERRQ(ierr); > >> } > >> ierr = PetscBLASIntCast(N,&bN);CHKERRQ(ierr); > >> ierr = PetscMemcpy(gmres->hes_ritz,gmres->hes_origin,bN*bN*sizeof(PetscReal));CHKERRQ(ierr); > >> for (i=0; imax_k+1; i++) { > >> ierr = VecCopy(VEC_VV(i),gmres->vecb[i]);CHKERRQ(ierr); > >> } > >> } > >> } > >> itcount += its; > >> if (itcount >= ksp->max_it) { > >> if (!ksp->reason) ksp->reason = KSP_DIVERGED_ITS; > >> break; > >> } > >> ksp->guess_zero = PETSC_FALSE; /* every future call to KSPInitialResidual() will have nonzero guess */ > >> } > >> ksp->guess_zero = guess_zero; /* restore if user provided nonzero initial guess */ > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPReset_GMRES(KSP ksp) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscErrorCode ierr; > >> PetscInt i; > >> > >> PetscFunctionBegin; > >> /* Free the Hessenberg matrices */ > >> ierr = PetscFree6(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin,gmres->hes_ritz);CHKERRQ(ierr); > >> > >> /* free work vectors */ > >> ierr = PetscFree(gmres->vecs);CHKERRQ(ierr); > >> for (i=0; inwork_alloc; i++) { > >> ierr = VecDestroyVecs(gmres->mwork_alloc[i],&gmres->user_work[i]);CHKERRQ(ierr); > >> } > >> gmres->nwork_alloc = 0; > >> if (gmres->vecb) { > >> ierr = VecDestroyVecs(gmres->max_k+1,&gmres->vecb);CHKERRQ(ierr); > >> } > >> 
> >> ierr = PetscFree(gmres->user_work);CHKERRQ(ierr); > >> ierr = PetscFree(gmres->mwork_alloc);CHKERRQ(ierr); > >> ierr = PetscFree(gmres->nrs);CHKERRQ(ierr); > >> ierr = VecDestroy(&gmres->sol_temp);CHKERRQ(ierr); > >> ierr = PetscFree(gmres->Rsvd);CHKERRQ(ierr); > >> ierr = PetscFree(gmres->Dsvd);CHKERRQ(ierr); > >> ierr = PetscFree(gmres->orthogwork);CHKERRQ(ierr); > >> > >> gmres->sol_temp = 0; > >> gmres->vv_allocated = 0; > >> gmres->vecs_allocated = 0; > >> gmres->sol_temp = 0; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPDestroy_GMRES(KSP ksp) > >> { > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> ierr = KSPReset_GMRES(ksp);CHKERRQ(ierr); > >> ierr = PetscFree(ksp->data);CHKERRQ(ierr); > >> /* clear composed functions */ > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetPreAllocateVectors_C",NULL);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetOrthogonalization_C",NULL);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetOrthogonalization_C",NULL);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetRestart_C",NULL);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetRestart_C",NULL);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetHapTol_C",NULL);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetCGSRefinementType_C",NULL);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetCGSRefinementType_C",NULL);CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> /* > >> KSPGMRESBuildSoln - create the solution from the starting vector and the > >> current iterates. > >> > >> Input parameters: > >> nrs - work area of size it + 1. > >> vs - index of initial guess > >> vdest - index of result. Note that vs may == vdest (replace > >> guess with the solution). > >> > >> This is an internal routine that knows about the GMRES internals. > >> */ > >> static PetscErrorCode KSPGMRESBuildSoln(PetscScalar *nrs,Vec vs,Vec vdest,KSP ksp,PetscInt it) > >> { > >> PetscScalar tt; > >> PetscErrorCode ierr; > >> PetscInt ii,k,j; > >> KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); > >> > >> PetscFunctionBegin; > >> /* Solve for solution vector that minimizes the residual */ > >> > >> /* If it is < 0, no gmres steps have been performed */ > >> if (it < 0) { > >> ierr = VecCopy(vs,vdest);CHKERRQ(ierr); /* VecCopy() is smart, exists immediately if vguess == vdest */ > >> PetscFunctionReturn(0); > >> } > >> if (*HH(it,it) != 0.0) { > >> nrs[it] = *GRS(it) / *HH(it,it); > >> } else { > >> ksp->reason = KSP_DIVERGED_BREAKDOWN; > >> > >> ierr = PetscInfo2(ksp,"Likely your matrix or preconditioner is singular. HH(it,it) is identically zero; it = %D GRS(it) = %g\n",it,(double)PetscAbsScalar(*GRS(it)));CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> for (ii=1; ii<=it; ii++) { > >> k = it - ii; > >> tt = *GRS(k); > >> for (j=k+1; j<=it; j++) tt = tt - *HH(k,j) * nrs[j]; > >> if (*HH(k,k) == 0.0) { > >> ksp->reason = KSP_DIVERGED_BREAKDOWN; > >> > >> ierr = PetscInfo1(ksp,"Likely your matrix or preconditioner is singular. 
HH(k,k) is identically zero; k = %D\n",k);CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> nrs[k] = tt / *HH(k,k); > >> } > >> > >> /* Perform the hookstep correction - DRL */ > >> if(gmres->delta > 0.0 && gmres->it > 0) { // Apply the hookstep to correct the GMRES solution (if required) > >> printf("\t\tapplying hookstep: initial delta: %lf", gmres->delta); > >> PetscInt N = gmres->max_k+2, ii, jj, j0; > >> PetscBLASInt nRows, nCols, lwork, lierr; > >> PetscScalar *R, *work; > >> PetscReal* S; > >> PetscScalar *U, *VT, *p, *q, *y; > >> PetscScalar bnorm, mu, qMag, qMag2, delta2; > >> > >> ierr = PetscMalloc1((gmres->max_k + 3)*(gmres->max_k + 9),&R);CHKERRQ(ierr); > >> work = R + N*N; > >> ierr = PetscMalloc1(6*(gmres->max_k+2),&S);CHKERRQ(ierr); > >> > >> ierr = PetscBLASIntCast(gmres->it+1,&nRows);CHKERRQ(ierr); > >> ierr = PetscBLASIntCast(gmres->it+0,&nCols);CHKERRQ(ierr); > >> ierr = PetscBLASIntCast(5*N,&lwork);CHKERRQ(ierr); > >> //ierr = PetscMemcpy(R,gmres->hes_origin,(gmres->max_k+2)*(gmres->max_k+1)*sizeof(PetscScalar));CHKERRQ(ierr); > >> ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); > >> for (ii = 0; ii < nRows; ii++) { > >> for (jj = 0; jj < nCols; jj++) { > >> R[ii*nCols+jj] = *HH(ii,jj); > >> // Ensure Hessenberg structure > >> //if (ii > jj+1) R[ii*nCols+jj] = 0.0; > >> } > >> } > >> > >> ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); > >> ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); > >> ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); > >> ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); > >> ierr = PetscMalloc1(nRows,&y);CHKERRQ(ierr); > >> > >> printf("\n\n");for(ii=0;ii >> > >> // Perform an SVD on the Hessenberg matrix > >> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); > >> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows,&nCols,R,&nRows,S,U,&nRows,VT,&nCols,work,&lwork,&lierr)); > >> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD Lapack routine %d",(int)lierr); > >> ierr = PetscFPTrapPop();CHKERRQ(ierr); > >> > >> // Compute p = ||b|| U^T e_1 > >> ierr = VecNorm(ksp->vec_rhs,NORM_2,&bnorm);CHKERRQ(ierr); > >> for (ii=0; ii >> p[ii] = bnorm*U[ii*nRows]; > >> } > >> > >> // Solve the root finding problem for \mu such that ||q|| < \delta (where \delta is the radius of the trust region) > >> // This step is largely copied from Ashley Willis' openpipeflow: doi.org/10.1016/j.softx.2017.05.003 > >> mu = S[nCols-1]*S[nCols-1]*1.0e-6; > >> if (mu < 1.0e-99) mu = 1.0e-99; > >> qMag = 1.0e+99; > >> > >> while (qMag > gmres->delta) { > >> mu *= 1.1; > >> qMag2 = 0.0; > >> for (ii=0; ii >> q[ii] = p[ii]*S[ii]/(mu + S[ii]*S[ii]); > >> qMag2 += q[ii]*q[ii]; > >> } > >> qMag = PetscSqrtScalar(qMag2); > >> } > >> > >> // Expand y in terms of the right singular vectors as y = V q > >> for (ii=0; ii >> y[ii] = 0.0; > >> for (jj=0; jj >> y[ii] += VT[jj*nCols+ii]*q[jj]; // transpose of the transpose > >> } > >> } > >> > >> // Recompute the size of the trust region, \delta > >> delta2 = 0.0; > >> for (ii=0; ii >> j0 = (ii < 2) ? 
0 : ii - 1; > >> p[ii] = 0.0; > >> for (jj=j0; jj >> p[ii] -= R[ii*nCols+jj]*y[jj]; > >> } > >> if (ii == 0) { > >> p[ii] += bnorm; > >> } > >> delta2 += p[ii]*p[ii]; > >> } > >> gmres->delta = PetscSqrtScalar(delta2); > >> printf("\t\t...final delta: %lf.\n", gmres->delta); > >> > >> // Pass the orthnomalized Krylov vector weights back out > >> for (ii=0; ii >> nrs[ii] = y[ii]; > >> } > >> > >> ierr = PetscFree(R);CHKERRQ(ierr); > >> ierr = PetscFree(S);CHKERRQ(ierr); > >> ierr = PetscFree(U);CHKERRQ(ierr); > >> ierr = PetscFree(VT);CHKERRQ(ierr); > >> ierr = PetscFree(p);CHKERRQ(ierr); > >> ierr = PetscFree(q);CHKERRQ(ierr); > >> ierr = PetscFree(y);CHKERRQ(ierr); > >> } > >> /*** DRL ***/ > >> > >> /* Accumulate the correction to the solution of the preconditioned problem in TEMP */ > >> ierr = VecSet(VEC_TEMP,0.0);CHKERRQ(ierr); > >> if (gmres->delta > 0.0) { > >> ierr = VecMAXPY(VEC_TEMP,it,nrs,&VEC_VV(0));CHKERRQ(ierr); // DRL > >> } else { > >> ierr = VecMAXPY(VEC_TEMP,it+1,nrs,&VEC_VV(0));CHKERRQ(ierr); > >> } > >> > >> ierr = KSPUnwindPreconditioner(ksp,VEC_TEMP,VEC_TEMP_MATOP);CHKERRQ(ierr); > >> /* add solution to previous solution */ > >> if (vdest != vs) { > >> ierr = VecCopy(vs,vdest);CHKERRQ(ierr); > >> } > >> ierr = VecAXPY(vdest,1.0,VEC_TEMP);CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> /* > >> Do the scalar work for the orthogonalization. Return new residual norm. > >> */ > >> static PetscErrorCode KSPGMRESUpdateHessenberg(KSP ksp,PetscInt it,PetscBool hapend,PetscReal *res) > >> { > >> PetscScalar *hh,*cc,*ss,tt; > >> PetscInt j; > >> KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); > >> > >> PetscFunctionBegin; > >> hh = HH(0,it); > >> cc = CC(0); > >> ss = SS(0); > >> > >> /* Apply all the previously computed plane rotations to the new column > >> of the Hessenberg matrix */ > >> for (j=1; j<=it; j++) { > >> tt = *hh; > >> *hh = PetscConj(*cc) * tt + *ss * *(hh+1); > >> hh++; > >> *hh = *cc++ * *hh - (*ss++ * tt); > >> } > >> > >> /* > >> compute the new plane rotation, and apply it to: > >> 1) the right-hand-side of the Hessenberg system > >> 2) the new column of the Hessenberg matrix > >> thus obtaining the updated value of the residual > >> */ > >> if (!hapend) { > >> tt = PetscSqrtScalar(PetscConj(*hh) * *hh + PetscConj(*(hh+1)) * *(hh+1)); > >> if (tt == 0.0) { > >> ksp->reason = KSP_DIVERGED_NULL; > >> PetscFunctionReturn(0); > >> } > >> *cc = *hh / tt; > >> *ss = *(hh+1) / tt; > >> *GRS(it+1) = -(*ss * *GRS(it)); > >> *GRS(it) = PetscConj(*cc) * *GRS(it); > >> *hh = PetscConj(*cc) * *hh + *ss * *(hh+1); > >> *res = PetscAbsScalar(*GRS(it+1)); > >> } else { > >> /* happy breakdown: HH(it+1, it) = 0, therfore we don't need to apply > >> another rotation matrix (so RH doesn't change). The new residual is > >> always the new sine term times the residual from last time (GRS(it)), > >> but now the new sine rotation would be zero...so the residual should > >> be zero...so we will multiply "zero" by the last residual. This might > >> not be exactly what we want to do here -could just return "zero". */ > >> > >> *res = 0.0; > >> } > >> PetscFunctionReturn(0); > >> } > >> /* > >> This routine allocates more work vectors, starting from VEC_VV(it). 
> >> */ > >> PetscErrorCode KSPGMRESGetNewVectors(KSP ksp,PetscInt it) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscErrorCode ierr; > >> PetscInt nwork = gmres->nwork_alloc,k,nalloc; > >> > >> PetscFunctionBegin; > >> nalloc = PetscMin(ksp->max_it,gmres->delta_allocate); > >> /* Adjust the number to allocate to make sure that we don't exceed the > >> number of available slots */ > >> if (it + VEC_OFFSET + nalloc >= gmres->vecs_allocated) { > >> nalloc = gmres->vecs_allocated - it - VEC_OFFSET; > >> } > >> if (!nalloc) PetscFunctionReturn(0); > >> > >> gmres->vv_allocated += nalloc; > >> > >> ierr = KSPCreateVecs(ksp,nalloc,&gmres->user_work[nwork],0,NULL);CHKERRQ(ierr); > >> ierr = PetscLogObjectParents(ksp,nalloc,gmres->user_work[nwork]);CHKERRQ(ierr); > >> > >> gmres->mwork_alloc[nwork] = nalloc; > >> for (k=0; k >> gmres->vecs[it+VEC_OFFSET+k] = gmres->user_work[nwork][k]; > >> } > >> gmres->nwork_alloc++; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPBuildSolution_GMRES(KSP ksp,Vec ptr,Vec *result) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> if (!ptr) { > >> if (!gmres->sol_temp) { > >> ierr = VecDuplicate(ksp->vec_sol,&gmres->sol_temp);CHKERRQ(ierr); > >> ierr = PetscLogObjectParent((PetscObject)ksp,(PetscObject)gmres->sol_temp);CHKERRQ(ierr); > >> } > >> ptr = gmres->sol_temp; > >> } > >> if (!gmres->nrs) { > >> /* allocate the work area */ > >> ierr = PetscMalloc1(gmres->max_k,&gmres->nrs);CHKERRQ(ierr); > >> ierr = PetscLogObjectMemory((PetscObject)ksp,gmres->max_k*sizeof(PetscScalar));CHKERRQ(ierr); > >> } > >> > >> ierr = KSPGMRESBuildSoln(gmres->nrs,ksp->vec_sol,ptr,ksp,gmres->it);CHKERRQ(ierr); > >> if (result) *result = ptr; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPView_GMRES(KSP ksp,PetscViewer viewer) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> const char *cstr; > >> PetscErrorCode ierr; > >> PetscBool iascii,isstring; > >> > >> PetscFunctionBegin; > >> ierr = PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERASCII,&iascii);CHKERRQ(ierr); > >> ierr = PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERSTRING,&isstring);CHKERRQ(ierr); > >> if (gmres->orthog == KSPGMRESClassicalGramSchmidtOrthogonalization) { > >> switch (gmres->cgstype) { > >> case (KSP_GMRES_CGS_REFINE_NEVER): > >> cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement"; > >> break; > >> case (KSP_GMRES_CGS_REFINE_ALWAYS): > >> cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization with one step of iterative refinement"; > >> break; > >> case (KSP_GMRES_CGS_REFINE_IFNEEDED): > >> cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization with one step of iterative refinement when needed"; > >> break; > >> default: > >> SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Unknown orthogonalization"); > >> } > >> } else if (gmres->orthog == KSPGMRESModifiedGramSchmidtOrthogonalization) { > >> cstr = "Modified Gram-Schmidt Orthogonalization"; > >> } else { > >> cstr = "unknown orthogonalization"; > >> } > >> if (iascii) { > >> ierr = PetscViewerASCIIPrintf(viewer," restart=%D, using %s\n",gmres->max_k,cstr);CHKERRQ(ierr); > >> ierr = PetscViewerASCIIPrintf(viewer," happy breakdown tolerance %g\n",(double)gmres->haptol);CHKERRQ(ierr); > >> } else if (isstring) { > >> ierr = PetscViewerStringSPrintf(viewer,"%s restart %D",cstr,gmres->max_k);CHKERRQ(ierr); > >> } > >> PetscFunctionReturn(0); > >> } > >> > 
>> /*@C > >> KSPGMRESMonitorKrylov - Calls VecView() for each new direction in the GMRES accumulated Krylov space. > >> > >> Collective on KSP > >> > >> Input Parameters: > >> + ksp - the KSP context > >> . its - iteration number > >> . fgnorm - 2-norm of residual (or gradient) > >> - dummy - an collection of viewers created with KSPViewerCreate() > >> > >> Options Database Keys: > >> . -ksp_gmres_kyrlov_monitor > >> > >> Notes: A new PETSCVIEWERDRAW is created for each Krylov vector so they can all be simultaneously viewed > >> Level: intermediate > >> > >> .keywords: KSP, nonlinear, vector, monitor, view, Krylov space > >> > >> .seealso: KSPMonitorSet(), KSPMonitorDefault(), VecView(), KSPViewersCreate(), KSPViewersDestroy() > >> @*/ > >> PetscErrorCode KSPGMRESMonitorKrylov(KSP ksp,PetscInt its,PetscReal fgnorm,void *dummy) > >> { > >> PetscViewers viewers = (PetscViewers)dummy; > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscErrorCode ierr; > >> Vec x; > >> PetscViewer viewer; > >> PetscBool flg; > >> > >> PetscFunctionBegin; > >> ierr = PetscViewersGetViewer(viewers,gmres->it+1,&viewer);CHKERRQ(ierr); > >> ierr = PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERDRAW,&flg);CHKERRQ(ierr); > >> if (!flg) { > >> ierr = PetscViewerSetType(viewer,PETSCVIEWERDRAW);CHKERRQ(ierr); > >> ierr = PetscViewerDrawSetInfo(viewer,NULL,"Krylov GMRES Monitor",PETSC_DECIDE,PETSC_DECIDE,300,300);CHKERRQ(ierr); > >> } > >> x = VEC_VV(gmres->it+1); > >> ierr = VecView(x,viewer);CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPSetFromOptions_GMRES(PetscOptionItems *PetscOptionsObject,KSP ksp) > >> { > >> PetscErrorCode ierr; > >> PetscInt restart; > >> PetscReal haptol; > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscBool flg; > >> > >> PetscFunctionBegin; > >> ierr = PetscOptionsHead(PetscOptionsObject,"KSP GMRES Options");CHKERRQ(ierr); > >> ierr = PetscOptionsInt("-ksp_gmres_restart","Number of Krylov search directions","KSPGMRESSetRestart",gmres->max_k,&restart,&flg);CHKERRQ(ierr); > >> if (flg) { ierr = KSPGMRESSetRestart(ksp,restart);CHKERRQ(ierr); } > >> ierr = PetscOptionsReal("-ksp_gmres_haptol","Tolerance for exact convergence (happy ending)","KSPGMRESSetHapTol",gmres->haptol,&haptol,&flg);CHKERRQ(ierr); > >> if (flg) { ierr = KSPGMRESSetHapTol(ksp,haptol);CHKERRQ(ierr); } > >> flg = PETSC_FALSE; > >> ierr = PetscOptionsBool("-ksp_gmres_preallocate","Preallocate Krylov vectors","KSPGMRESSetPreAllocateVectors",flg,&flg,NULL);CHKERRQ(ierr); > >> if (flg) {ierr = KSPGMRESSetPreAllocateVectors(ksp);CHKERRQ(ierr);} > >> ierr = PetscOptionsBoolGroupBegin("-ksp_gmres_classicalgramschmidt","Classical (unmodified) Gram-Schmidt (fast)","KSPGMRESSetOrthogonalization",&flg);CHKERRQ(ierr); > >> if (flg) {ierr = KSPGMRESSetOrthogonalization(ksp,KSPGMRESClassicalGramSchmidtOrthogonalization);CHKERRQ(ierr);} > >> ierr = PetscOptionsBoolGroupEnd("-ksp_gmres_modifiedgramschmidt","Modified Gram-Schmidt (slow,more stable)","KSPGMRESSetOrthogonalization",&flg);CHKERRQ(ierr); > >> if (flg) {ierr = KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization);CHKERRQ(ierr);} > >> ierr = PetscOptionsEnum("-ksp_gmres_cgs_refinement_type","Type of iterative refinement for classical (unmodified) Gram-Schmidt","KSPGMRESSetCGSRefinementType", > >> KSPGMRESCGSRefinementTypes,(PetscEnum)gmres->cgstype,(PetscEnum*)&gmres->cgstype,&flg);CHKERRQ(ierr); > >> flg = PETSC_FALSE; > >> ierr = PetscOptionsBool("-ksp_gmres_krylov_monitor","Plot the Krylov 
directions","KSPMonitorSet",flg,&flg,NULL);CHKERRQ(ierr); > >> if (flg) { > >> PetscViewers viewers; > >> ierr = PetscViewersCreate(PetscObjectComm((PetscObject)ksp),&viewers);CHKERRQ(ierr); > >> ierr = KSPMonitorSet(ksp,KSPGMRESMonitorKrylov,viewers,(PetscErrorCode (*)(void**))PetscViewersDestroy);CHKERRQ(ierr); > >> } > >> ierr = PetscOptionsTail();CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESSetHapTol_GMRES(KSP ksp,PetscReal tol) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> > >> PetscFunctionBegin; > >> if (tol < 0.0) SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Tolerance must be non-negative"); > >> gmres->haptol = tol; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESGetRestart_GMRES(KSP ksp,PetscInt *max_k) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> > >> PetscFunctionBegin; > >> *max_k = gmres->max_k; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESSetRestart_GMRES(KSP ksp,PetscInt max_k) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> if (max_k < 1) SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Restart must be positive"); > >> if (!ksp->setupstage) { > >> gmres->max_k = max_k; > >> } else if (gmres->max_k != max_k) { > >> gmres->max_k = max_k; > >> ksp->setupstage = KSP_SETUP_NEW; > >> /* free the data structures, then create them again */ > >> ierr = KSPReset_GMRES(ksp);CHKERRQ(ierr); > >> } > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESSetOrthogonalization_GMRES(KSP ksp,FCN fcn) > >> { > >> PetscFunctionBegin; > >> ((KSP_GMRES*)ksp->data)->orthog = fcn; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESGetOrthogonalization_GMRES(KSP ksp,FCN *fcn) > >> { > >> PetscFunctionBegin; > >> *fcn = ((KSP_GMRES*)ksp->data)->orthog; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESSetPreAllocateVectors_GMRES(KSP ksp) > >> { > >> KSP_GMRES *gmres; > >> > >> PetscFunctionBegin; > >> gmres = (KSP_GMRES*)ksp->data; > >> gmres->q_preallocate = 1; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESSetCGSRefinementType_GMRES(KSP ksp,KSPGMRESCGSRefinementType type) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> > >> PetscFunctionBegin; > >> gmres->cgstype = type; > >> PetscFunctionReturn(0); > >> } > >> > >> PetscErrorCode KSPGMRESGetCGSRefinementType_GMRES(KSP ksp,KSPGMRESCGSRefinementType *type) > >> { > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > >> > >> PetscFunctionBegin; > >> *type = gmres->cgstype; > >> PetscFunctionReturn(0); > >> } > >> > >> /*@ > >> KSPGMRESSetCGSRefinementType - Sets the type of iterative refinement to use > >> in the classical Gram Schmidt orthogonalization. > >> > >> Logically Collective on KSP > >> > >> Input Parameters: > >> + ksp - the Krylov space context > >> - type - the type of refinement > >> > >> Options Database: > >> . 
-ksp_gmres_cgs_refinement_type > >> > >> Level: intermediate > >> > >> .keywords: KSP, GMRES, iterative refinement > >> > >> .seealso: KSPGMRESSetOrthogonalization(), KSPGMRESCGSRefinementType, KSPGMRESClassicalGramSchmidtOrthogonalization(), KSPGMRESGetCGSRefinementType(), > >> KSPGMRESGetOrthogonalization() > >> @*/ > >> PetscErrorCode KSPGMRESSetCGSRefinementType(KSP ksp,KSPGMRESCGSRefinementType type) > >> { > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> PetscValidHeaderSpecific(ksp,KSP_CLASSID,1); > >> PetscValidLogicalCollectiveEnum(ksp,type,2); > >> ierr = PetscTryMethod(ksp,"KSPGMRESSetCGSRefinementType_C",(KSP,KSPGMRESCGSRefinementType),(ksp,type));CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> /*@ > >> KSPGMRESGetCGSRefinementType - Gets the type of iterative refinement to use > >> in the classical Gram Schmidt orthogonalization. > >> > >> Not Collective > >> > >> Input Parameter: > >> . ksp - the Krylov space context > >> > >> Output Parameter: > >> . type - the type of refinement > >> > >> Options Database: > >> . -ksp_gmres_cgs_refinement_type > >> > >> Level: intermediate > >> > >> .keywords: KSP, GMRES, iterative refinement > >> > >> .seealso: KSPGMRESSetOrthogonalization(), KSPGMRESCGSRefinementType, KSPGMRESClassicalGramSchmidtOrthogonalization(), KSPGMRESSetCGSRefinementType(), > >> KSPGMRESGetOrthogonalization() > >> @*/ > >> PetscErrorCode KSPGMRESGetCGSRefinementType(KSP ksp,KSPGMRESCGSRefinementType *type) > >> { > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> PetscValidHeaderSpecific(ksp,KSP_CLASSID,1); > >> ierr = PetscUseMethod(ksp,"KSPGMRESGetCGSRefinementType_C",(KSP,KSPGMRESCGSRefinementType*),(ksp,type));CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> > >> /*@ > >> KSPGMRESSetRestart - Sets number of iterations at which GMRES, FGMRES and LGMRES restarts. > >> > >> Logically Collective on KSP > >> > >> Input Parameters: > >> + ksp - the Krylov space context > >> - restart - integer restart value > >> > >> Options Database: > >> . -ksp_gmres_restart > >> > >> Note: The default value is 30. > >> > >> Level: intermediate > >> > >> .keywords: KSP, GMRES, restart, iterations > >> > >> .seealso: KSPSetTolerances(), KSPGMRESSetOrthogonalization(), KSPGMRESSetPreAllocateVectors(), KSPGMRESGetRestart() > >> @*/ > >> PetscErrorCode KSPGMRESSetRestart(KSP ksp, PetscInt restart) > >> { > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> PetscValidLogicalCollectiveInt(ksp,restart,2); > >> > >> ierr = PetscTryMethod(ksp,"KSPGMRESSetRestart_C",(KSP,PetscInt),(ksp,restart));CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> /*@ > >> KSPGMRESGetRestart - Gets number of iterations at which GMRES, FGMRES and LGMRES restarts. > >> > >> Not Collective > >> > >> Input Parameter: > >> . ksp - the Krylov space context > >> > >> Output Parameter: > >> . restart - integer restart value > >> > >> Note: The default value is 30. > >> > >> Level: intermediate > >> > >> .keywords: KSP, GMRES, restart, iterations > >> > >> .seealso: KSPSetTolerances(), KSPGMRESSetOrthogonalization(), KSPGMRESSetPreAllocateVectors(), KSPGMRESSetRestart() > >> @*/ > >> PetscErrorCode KSPGMRESGetRestart(KSP ksp, PetscInt *restart) > >> { > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> ierr = PetscUseMethod(ksp,"KSPGMRESGetRestart_C",(KSP,PetscInt*),(ksp,restart));CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> /*@ > >> KSPGMRESSetHapTol - Sets tolerance for determining happy breakdown in GMRES, FGMRES and LGMRES. 
> >> > >> Logically Collective on KSP > >> > >> Input Parameters: > >> + ksp - the Krylov space context > >> - tol - the tolerance > >> > >> Options Database: > >> . -ksp_gmres_haptol > >> > >> Note: Happy breakdown is the rare case in GMRES where an 'exact' solution is obtained after > >> a certain number of iterations. If you attempt more iterations after this point unstable > >> things can happen hence very occasionally you may need to set this value to detect this condition > >> > >> Level: intermediate > >> > >> .keywords: KSP, GMRES, tolerance > >> > >> .seealso: KSPSetTolerances() > >> @*/ > >> PetscErrorCode KSPGMRESSetHapTol(KSP ksp,PetscReal tol) > >> { > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> PetscValidLogicalCollectiveReal(ksp,tol,2); > >> ierr = PetscTryMethod((ksp),"KSPGMRESSetHapTol_C",(KSP,PetscReal),((ksp),(tol)));CHKERRQ(ierr); > >> PetscFunctionReturn(0); > >> } > >> > >> /*MC > >> KSPGMRES - Implements the Generalized Minimal Residual method. > >> (Saad and Schultz, 1986) with restart > >> > >> > >> Options Database Keys: > >> + -ksp_gmres_restart - the number of Krylov directions to orthogonalize against > >> . -ksp_gmres_haptol - sets the tolerance for "happy ending" (exact convergence) > >> . -ksp_gmres_preallocate - preallocate all the Krylov search directions initially (otherwise groups of > >> vectors are allocated as needed) > >> . -ksp_gmres_classicalgramschmidt - use classical (unmodified) Gram-Schmidt to orthogonalize against the Krylov space (fast) (the default) > >> . -ksp_gmres_modifiedgramschmidt - use modified Gram-Schmidt in the orthogonalization (more stable, but slower) > >> . -ksp_gmres_cgs_refinement_type - determine if iterative refinement is used to increase the > >> stability of the classical Gram-Schmidt orthogonalization. > >> - -ksp_gmres_krylov_monitor - plot the Krylov space generated > >> > >> Level: beginner > >> > >> Notes: Left and right preconditioning are supported, but not symmetric preconditioning. > >> > >> References: > >> . 1. - YOUCEF SAAD AND MARTIN H. SCHULTZ, GMRES: A GENERALIZED MINIMAL RESIDUAL ALGORITHM FOR SOLVING NONSYMMETRIC LINEAR SYSTEMS. > >> SIAM J. ScI. STAT. COMPUT. Vo|. 7, No. 3, July 1986. 
> >> > >> .seealso: KSPCreate(), KSPSetType(), KSPType (for list of available types), KSP, KSPFGMRES, KSPLGMRES, > >> KSPGMRESSetRestart(), KSPGMRESSetHapTol(), KSPGMRESSetPreAllocateVectors(), KSPGMRESSetOrthogonalization(), KSPGMRESGetOrthogonalization(), > >> KSPGMRESClassicalGramSchmidtOrthogonalization(), KSPGMRESModifiedGramSchmidtOrthogonalization(), > >> KSPGMRESCGSRefinementType, KSPGMRESSetCGSRefinementType(), KSPGMRESGetCGSRefinementType(), KSPGMRESMonitorKrylov(), KSPSetPCSide() > >> > >> M*/ > >> > >> PETSC_EXTERN PetscErrorCode KSPCreate_GMRES(KSP ksp) > >> { > >> KSP_GMRES *gmres; > >> PetscErrorCode ierr; > >> > >> PetscFunctionBegin; > >> ierr = PetscNewLog(ksp,&gmres);CHKERRQ(ierr); > >> ksp->data = (void*)gmres; > >> > >> ierr = KSPSetSupportedNorm(ksp,KSP_NORM_PRECONDITIONED,PC_LEFT,4);CHKERRQ(ierr); > >> ierr = KSPSetSupportedNorm(ksp,KSP_NORM_UNPRECONDITIONED,PC_RIGHT,3);CHKERRQ(ierr); > >> ierr = KSPSetSupportedNorm(ksp,KSP_NORM_PRECONDITIONED,PC_SYMMETRIC,2);CHKERRQ(ierr); > >> ierr = KSPSetSupportedNorm(ksp,KSP_NORM_NONE,PC_RIGHT,1);CHKERRQ(ierr); > >> ierr = KSPSetSupportedNorm(ksp,KSP_NORM_NONE,PC_LEFT,1);CHKERRQ(ierr); > >> > >> ksp->ops->buildsolution = KSPBuildSolution_GMRES; > >> ksp->ops->setup = KSPSetUp_GMRES; > >> ksp->ops->solve = KSPSolve_GMRES; > >> ksp->ops->reset = KSPReset_GMRES; > >> ksp->ops->destroy = KSPDestroy_GMRES; > >> ksp->ops->view = KSPView_GMRES; > >> ksp->ops->setfromoptions = KSPSetFromOptions_GMRES; > >> ksp->ops->computeextremesingularvalues = KSPComputeExtremeSingularValues_GMRES; > >> ksp->ops->computeeigenvalues = KSPComputeEigenvalues_GMRES; > >> #if !defined(PETSC_USE_COMPLEX) && !defined(PETSC_HAVE_ESSL) > >> ksp->ops->computeritz = KSPComputeRitz_GMRES; > >> #endif > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetPreAllocateVectors_C",KSPGMRESSetPreAllocateVectors_GMRES);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetOrthogonalization_C",KSPGMRESSetOrthogonalization_GMRES);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetOrthogonalization_C",KSPGMRESGetOrthogonalization_GMRES);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetRestart_C",KSPGMRESSetRestart_GMRES);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetRestart_C",KSPGMRESGetRestart_GMRES);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetHapTol_C",KSPGMRESSetHapTol_GMRES);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetCGSRefinementType_C",KSPGMRESSetCGSRefinementType_GMRES);CHKERRQ(ierr); > >> ierr = PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetCGSRefinementType_C",KSPGMRESGetCGSRefinementType_GMRES);CHKERRQ(ierr); > >> > >> gmres->haptol = 1.0e-30; > >> gmres->q_preallocate = 0; > >> gmres->delta_allocate = GMRES_DELTA_DIRECTIONS; > >> gmres->orthog = KSPGMRESClassicalGramSchmidtOrthogonalization; > >> gmres->nrs = 0; > >> gmres->sol_temp = 0; > >> gmres->max_k = GMRES_DEFAULT_MAXK; > >> gmres->Rsvd = 0; > >> gmres->cgstype = KSP_GMRES_CGS_REFINE_NEVER; > >> gmres->orthogwork = 0; > >> gmres->delta = -1.0; // DRL > >> PetscFunctionReturn(0); > >> } > From davelee2804 at gmail.com Mon May 20 03:31:28 2019 From: davelee2804 at gmail.com (Dave Lee) Date: Mon, 20 May 2019 18:31:28 +1000 Subject: [petsc-users] Calling LAPACK routines from PETSc In-Reply-To: <92E6EB6E-A0C0-4E1B-94B3-3DA34C87F4F3@mcs.anl.gov> References: 
<8736l9abj4.fsf@jedbrown.org> <8D37944A-6C37-48D0-B238-4E9806D93B7E@anl.gov> <92E6EB6E-A0C0-4E1B-94B3-3DA34C87F4F3@mcs.anl.gov> Message-ID: Thanks Barry, I found some helpful examples on the intel lapack site - moral of the story: using C ordering for input matrix, but transposed output matrices leads to a consistent solution. Cheers, Dave. On Mon, May 20, 2019 at 6:07 PM Smith, Barry F. wrote: > > > > On May 20, 2019, at 2:28 AM, Dave Lee wrote: > > > > Thanks Jed and Barry, > > > > So, just to confirm, > > > > -- From the KSP_GMRES structure, if I call *HH(a,b), that will return > the row a, column b entry of the Hessenberg matrix (while the back end > array *hh_origin array is ordering using the Fortran convention) > > > > -- Matrices are passed into and returned from > PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_() using Fortran indexing, and > need to be transposed to get back to C ordering > > In general, I guess depending on what you want to do with them you don' > need to transpose them. Why would you want to? Just leave them as little > column oriented blogs and with them what you need directly. > > Just do stuff and you'll find it works out. > > > > > Are both of these statements correct? > > > > Cheers, Dave. > > > > On Mon, May 20, 2019 at 4:34 PM Smith, Barry F. > wrote: > > > > The little work arrays in GMRES tend to be stored in Fortran > ordering; there is no C style p[][] indexing into such arrays. Thus the > arrays can safely be sent to LAPACK. The only trick is knowing the two > dimensions and as Jed say the "leading dimension parameter. He gave you a > place to look > > > > > On May 20, 2019, at 1:24 AM, Jed Brown via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > > > Dave Lee via petsc-users writes: > > > > > >> Hi Petsc, > > >> > > >> I'm attempting to implement a "hookstep" for the SNES trust region > solver. > > >> Essentially what I'm trying to do is replace the solution of the least > > >> squares problem at the end of each GMRES solve with a modified > solution > > >> with a norm that is constrained to be within the size of the trust > region. > > >> > > >> In order to do this I need to perform an SVD on the Hessenberg matrix, > > >> which copying the function KSPComputeExtremeSingularValues(), I'm > trying to > > >> do by accessing the LAPACK function dgesvd() via the > PetscStackCallBLAS() > > >> machinery. One thing I'm confused about however is the ordering of > the 2D > > >> arrays into and out of this function, given that that C and FORTRAN > arrays > > >> use reverse indexing, ie: C[j+1][i+1] = F[i,j]. > > >> > > >> Given that the Hessenberg matrix has k+1 rows and k columns, should I > be > > >> still be initializing this as H[row][col] and passing this into > > >> PetscStackCallBLAS("LAPACKgesvd",LAPACKgrsvd_(...)) > > >> or should I be transposing this before passing it in? > > > > > > LAPACK terminology is with respect to Fortran ordering. There is a > > > "leading dimension" parameter so that you can operate on non-contiguous > > > blocks. See KSPComputeExtremeSingularValues_GMRES for an example. > > > > > >> Also for the left and right singular vector matrices that are > returned by > > >> this function, should I be transposing these before I interpret them > as C > > >> arrays? > > >> > > >> I've attached my modified version of gmres.c in case this is helpful. > If > > >> you grep for DRL (my initials) then you'll see my changes to the code. > > >> > > >> Cheers, Dave. 
> > >> > > >> /* > > >> This file implements GMRES (a Generalized Minimal Residual) method. > > >> Reference: Saad and Schultz, 1986. > > >> > > >> > > >> Some comments on left vs. right preconditioning, and restarts. > > >> Left and right preconditioning. > > >> If right preconditioning is chosen, then the problem being solved > > >> by gmres is actually > > >> My = AB^-1 y = f > > >> so the initial residual is > > >> r = f - Mx > > >> Note that B^-1 y = x or y = B x, and if x is non-zero, the initial > > >> residual is > > >> r = f - A x > > >> The final solution is then > > >> x = B^-1 y > > >> > > >> If left preconditioning is chosen, then the problem being solved is > > >> My = B^-1 A x = B^-1 f, > > >> and the initial residual is > > >> r = B^-1(f - Ax) > > >> > > >> Restarts: Restarts are basically solves with x0 not equal to zero. > > >> Note that we can eliminate an extra application of B^-1 between > > >> restarts as long as we don't require that the solution at the end > > >> of an unsuccessful gmres iteration always be the solution x. > > >> */ > > >> > > >> #include <../src/ksp/ksp/impls/gmres/gmresimpl.h> /*I > "petscksp.h" I*/ > > >> #include // DRL > > >> #define GMRES_DELTA_DIRECTIONS 10 > > >> #define GMRES_DEFAULT_MAXK 30 > > >> static PetscErrorCode > KSPGMRESUpdateHessenberg(KSP,PetscInt,PetscBool,PetscReal*); > > >> static PetscErrorCode > KSPGMRESBuildSoln(PetscScalar*,Vec,Vec,KSP,PetscInt); > > >> > > >> PetscErrorCode KSPSetUp_GMRES(KSP ksp) > > >> { > > >> PetscInt hh,hes,rs,cc; > > >> PetscErrorCode ierr; > > >> PetscInt max_k,k; > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> > > >> PetscFunctionBegin; > > >> max_k = gmres->max_k; /* restart size */ > > >> hh = (max_k + 2) * (max_k + 1); > > >> hes = (max_k + 1) * (max_k + 1); > > >> rs = (max_k + 2); > > >> cc = (max_k + 1); > > >> > > >> ierr = > PetscCalloc5(hh,&gmres->hh_origin,hes,&gmres->hes_origin,rs,&gmres->rs_origin,cc,&gmres->cc_origin,cc,&gmres->ss_origin);CHKERRQ(ierr); > > >> ierr = PetscLogObjectMemory((PetscObject)ksp,(hh + hes + rs + > 2*cc)*sizeof(PetscScalar));CHKERRQ(ierr); > > >> > > >> if (ksp->calc_sings) { > > >> /* Allocate workspace to hold Hessenberg matrix needed by lapack */ > > >> ierr = PetscMalloc1((max_k + 3)*(max_k + > 9),&gmres->Rsvd);CHKERRQ(ierr); > > >> ierr = PetscLogObjectMemory((PetscObject)ksp,(max_k + 3)*(max_k + > 9)*sizeof(PetscScalar));CHKERRQ(ierr); > > >> ierr = PetscMalloc1(6*(max_k+2),&gmres->Dsvd);CHKERRQ(ierr); > > >> ierr = > PetscLogObjectMemory((PetscObject)ksp,6*(max_k+2)*sizeof(PetscReal));CHKERRQ(ierr); > > >> } > > >> > > >> /* Allocate array to hold pointers to user vectors. 
Note that we > need > > >> 4 + max_k + 1 (since we need it+1 vectors, and it <= max_k) */ > > >> gmres->vecs_allocated = VEC_OFFSET + 2 + max_k + gmres->nextra_vecs; > > >> > > >> ierr = > PetscMalloc1(gmres->vecs_allocated,&gmres->vecs);CHKERRQ(ierr); > > >> ierr = > PetscMalloc1(VEC_OFFSET+2+max_k,&gmres->user_work);CHKERRQ(ierr); > > >> ierr = > PetscMalloc1(VEC_OFFSET+2+max_k,&gmres->mwork_alloc);CHKERRQ(ierr); > > >> ierr = > PetscLogObjectMemory((PetscObject)ksp,(VEC_OFFSET+2+max_k)*(sizeof(Vec*)+sizeof(PetscInt)) > + gmres->vecs_allocated*sizeof(Vec));CHKERRQ(ierr); > > >> > > >> if (gmres->q_preallocate) { > > >> gmres->vv_allocated = VEC_OFFSET + 2 + max_k; > > >> > > >> ierr = > KSPCreateVecs(ksp,gmres->vv_allocated,&gmres->user_work[0],0,NULL);CHKERRQ(ierr); > > >> ierr = > PetscLogObjectParents(ksp,gmres->vv_allocated,gmres->user_work[0]);CHKERRQ(ierr); > > >> > > >> gmres->mwork_alloc[0] = gmres->vv_allocated; > > >> gmres->nwork_alloc = 1; > > >> for (k=0; kvv_allocated; k++) { > > >> gmres->vecs[k] = gmres->user_work[0][k]; > > >> } > > >> } else { > > >> gmres->vv_allocated = 5; > > >> > > >> ierr = > KSPCreateVecs(ksp,5,&gmres->user_work[0],0,NULL);CHKERRQ(ierr); > > >> ierr = > PetscLogObjectParents(ksp,5,gmres->user_work[0]);CHKERRQ(ierr); > > >> > > >> gmres->mwork_alloc[0] = 5; > > >> gmres->nwork_alloc = 1; > > >> for (k=0; kvv_allocated; k++) { > > >> gmres->vecs[k] = gmres->user_work[0][k]; > > >> } > > >> } > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> /* > > >> Run gmres, possibly with restart. Return residual history if > requested. > > >> input parameters: > > >> > > >> . gmres - structure containing parameters and work areas > > >> > > >> output parameters: > > >> . nres - residuals (from preconditioned system) at each > step. > > >> If restarting, consider passing nres+it. If null, > > >> ignored > > >> . itcount - number of iterations used. nres[0] to > nres[itcount] > > >> are defined. If null, ignored. > > >> > > >> Notes: > > >> On entry, the value in vector VEC_VV(0) should be the initial > residual > > >> (this allows shortcuts where the initial preconditioned residual > is 0). 
> > >> */ > > >> PetscErrorCode KSPGMRESCycle(PetscInt *itcount,KSP ksp) > > >> { > > >> KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); > > >> PetscReal res_norm,res,hapbnd,tt; > > >> PetscErrorCode ierr; > > >> PetscInt it = 0, max_k = gmres->max_k; > > >> PetscBool hapend = PETSC_FALSE; > > >> > > >> PetscFunctionBegin; > > >> if (itcount) *itcount = 0; > > >> ierr = VecNormalize(VEC_VV(0),&res_norm);CHKERRQ(ierr); > > >> KSPCheckNorm(ksp,res_norm); > > >> res = res_norm; > > >> *GRS(0) = res_norm; > > >> > > >> /* check for the convergence */ > > >> ierr = > PetscObjectSAWsTakeAccess((PetscObject)ksp);CHKERRQ(ierr); > > >> ksp->rnorm = res; > > >> ierr = > PetscObjectSAWsGrantAccess((PetscObject)ksp);CHKERRQ(ierr); > > >> gmres->it = (it - 1); > > >> ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); > > >> ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); > > >> if (!res) { > > >> ksp->reason = KSP_CONVERGED_ATOL; > > >> ierr = PetscInfo(ksp,"Converged due to zero residual norm > on entry\n");CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> ierr = > (*ksp->converged)(ksp,ksp->its,res,&ksp->reason,ksp->cnvP);CHKERRQ(ierr); > > >> while (!ksp->reason && it < max_k && ksp->its < ksp->max_it) { > > >> if (it) { > > >> ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); > > >> ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); > > >> } > > >> gmres->it = (it - 1); > > >> if (gmres->vv_allocated <= it + VEC_OFFSET + 1) { > > >> ierr = KSPGMRESGetNewVectors(ksp,it+1);CHKERRQ(ierr); > > >> } > > >> ierr = > KSP_PCApplyBAorAB(ksp,VEC_VV(it),VEC_VV(1+it),VEC_TEMP_MATOP);CHKERRQ(ierr); > > >> > > >> /* update hessenberg matrix and do Gram-Schmidt */ > > >> ierr = (*gmres->orthog)(ksp,it);CHKERRQ(ierr); > > >> if (ksp->reason) break; > > >> > > >> /* vv(i+1) . vv(i+1) */ > > >> ierr = VecNormalize(VEC_VV(it+1),&tt);CHKERRQ(ierr); > > >> > > >> /* save the magnitude */ > > >> *HH(it+1,it) = tt; > > >> *HES(it+1,it) = tt; > > >> > > >> /* check for the happy breakdown */ > > >> hapbnd = PetscAbsScalar(tt / *GRS(it)); > > >> if (hapbnd > gmres->haptol) hapbnd = gmres->haptol; > > >> if (tt < hapbnd) { > > >> ierr = PetscInfo2(ksp,"Detected happy breakdown, current > hapbnd = %14.12e tt = %14.12e\n",(double)hapbnd,(double)tt);CHKERRQ(ierr); > > >> hapend = PETSC_TRUE; > > >> } > > >> ierr = KSPGMRESUpdateHessenberg(ksp,it,hapend,&res);CHKERRQ(ierr); > > >> > > >> it++; > > >> gmres->it = (it-1); /* For converged */ > > >> ksp->its++; > > >> ksp->rnorm = res; > > >> if (ksp->reason) break; > > >> > > >> ierr = > (*ksp->converged)(ksp,ksp->its,res,&ksp->reason,ksp->cnvP);CHKERRQ(ierr); > > >> > > >> /* Catch error in happy breakdown and signal convergence and break > from loop */ > > >> if (hapend) { > > >> if (!ksp->reason) { > > >> if (ksp->errorifnotconverged) > SETERRQ1(PetscObjectComm((PetscObject)ksp),PETSC_ERR_NOT_CONVERGED,"You > reached the happy break down, but convergence was not indicated. 
Residual > norm = %g",(double)res); > > >> else { > > >> ksp->reason = KSP_DIVERGED_BREAKDOWN; > > >> break; > > >> } > > >> } > > >> } > > >> } > > >> > > >> /* Monitor if we know that we will not return for a restart */ > > >> if (it && (ksp->reason || ksp->its >= ksp->max_it)) { > > >> ierr = KSPLogResidualHistory(ksp,res);CHKERRQ(ierr); > > >> ierr = KSPMonitor(ksp,ksp->its,res);CHKERRQ(ierr); > > >> } > > >> > > >> if (itcount) *itcount = it; > > >> > > >> > > >> /* > > >> Down here we have to solve for the "best" coefficients of the > Krylov > > >> columns, add the solution values together, and possibly unwind the > > >> preconditioning from the solution > > >> */ > > >> /* Form the solution (or the solution so far) */ > > >> ierr = > KSPGMRESBuildSoln(GRS(0),ksp->vec_sol,ksp->vec_sol,ksp,it-1);CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPSolve_GMRES(KSP ksp) > > >> { > > >> PetscErrorCode ierr; > > >> PetscInt its,itcount,i; > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> PetscBool guess_zero = ksp->guess_zero; > > >> PetscInt N = gmres->max_k + 1; > > >> PetscBLASInt bN; > > >> > > >> PetscFunctionBegin; > > >> if (ksp->calc_sings && !gmres->Rsvd) > SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ORDER,"Must call > KSPSetComputeSingularValues() before KSPSetUp() is called"); > > >> > > >> ierr = PetscObjectSAWsTakeAccess((PetscObject)ksp);CHKERRQ(ierr); > > >> ksp->its = 0; > > >> ierr = > PetscObjectSAWsGrantAccess((PetscObject)ksp);CHKERRQ(ierr); > > >> > > >> itcount = 0; > > >> gmres->fullcycle = 0; > > >> ksp->reason = KSP_CONVERGED_ITERATING; > > >> while (!ksp->reason) { > > >> ierr = > KSPInitialResidual(ksp,ksp->vec_sol,VEC_TEMP,VEC_TEMP_MATOP,VEC_VV(0),ksp->vec_rhs);CHKERRQ(ierr); > > >> ierr = KSPGMRESCycle(&its,ksp);CHKERRQ(ierr); > > >> /* Store the Hessenberg matrix and the basis vectors of the Krylov > subspace > > >> if the cycle is complete for the computation of the Ritz pairs */ > > >> if (its == gmres->max_k) { > > >> gmres->fullcycle++; > > >> if (ksp->calc_ritz) { > > >> if (!gmres->hes_ritz) { > > >> ierr = PetscMalloc1(N*N,&gmres->hes_ritz);CHKERRQ(ierr); > > >> ierr = > PetscLogObjectMemory((PetscObject)ksp,N*N*sizeof(PetscScalar));CHKERRQ(ierr); > > >> ierr = > VecDuplicateVecs(VEC_VV(0),N,&gmres->vecb);CHKERRQ(ierr); > > >> } > > >> ierr = PetscBLASIntCast(N,&bN);CHKERRQ(ierr); > > >> ierr = > PetscMemcpy(gmres->hes_ritz,gmres->hes_origin,bN*bN*sizeof(PetscReal));CHKERRQ(ierr); > > >> for (i=0; imax_k+1; i++) { > > >> ierr = VecCopy(VEC_VV(i),gmres->vecb[i]);CHKERRQ(ierr); > > >> } > > >> } > > >> } > > >> itcount += its; > > >> if (itcount >= ksp->max_it) { > > >> if (!ksp->reason) ksp->reason = KSP_DIVERGED_ITS; > > >> break; > > >> } > > >> ksp->guess_zero = PETSC_FALSE; /* every future call to > KSPInitialResidual() will have nonzero guess */ > > >> } > > >> ksp->guess_zero = guess_zero; /* restore if user provided nonzero > initial guess */ > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPReset_GMRES(KSP ksp) > > >> { > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> PetscErrorCode ierr; > > >> PetscInt i; > > >> > > >> PetscFunctionBegin; > > >> /* Free the Hessenberg matrices */ > > >> ierr = > PetscFree6(gmres->hh_origin,gmres->hes_origin,gmres->rs_origin,gmres->cc_origin,gmres->ss_origin,gmres->hes_ritz);CHKERRQ(ierr); > > >> > > >> /* free work vectors */ > > >> ierr = PetscFree(gmres->vecs);CHKERRQ(ierr); > > >> for (i=0; inwork_alloc; i++) { > > >> ierr = > 
VecDestroyVecs(gmres->mwork_alloc[i],&gmres->user_work[i]);CHKERRQ(ierr); > > >> } > > >> gmres->nwork_alloc = 0; > > >> if (gmres->vecb) { > > >> ierr = VecDestroyVecs(gmres->max_k+1,&gmres->vecb);CHKERRQ(ierr); > > >> } > > >> > > >> ierr = PetscFree(gmres->user_work);CHKERRQ(ierr); > > >> ierr = PetscFree(gmres->mwork_alloc);CHKERRQ(ierr); > > >> ierr = PetscFree(gmres->nrs);CHKERRQ(ierr); > > >> ierr = VecDestroy(&gmres->sol_temp);CHKERRQ(ierr); > > >> ierr = PetscFree(gmres->Rsvd);CHKERRQ(ierr); > > >> ierr = PetscFree(gmres->Dsvd);CHKERRQ(ierr); > > >> ierr = PetscFree(gmres->orthogwork);CHKERRQ(ierr); > > >> > > >> gmres->sol_temp = 0; > > >> gmres->vv_allocated = 0; > > >> gmres->vecs_allocated = 0; > > >> gmres->sol_temp = 0; > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPDestroy_GMRES(KSP ksp) > > >> { > > >> PetscErrorCode ierr; > > >> > > >> PetscFunctionBegin; > > >> ierr = KSPReset_GMRES(ksp);CHKERRQ(ierr); > > >> ierr = PetscFree(ksp->data);CHKERRQ(ierr); > > >> /* clear composed functions */ > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetPreAllocateVectors_C",NULL);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetOrthogonalization_C",NULL);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetOrthogonalization_C",NULL);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetRestart_C",NULL);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetRestart_C",NULL);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetHapTol_C",NULL);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetCGSRefinementType_C",NULL);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetCGSRefinementType_C",NULL);CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> /* > > >> KSPGMRESBuildSoln - create the solution from the starting vector > and the > > >> current iterates. > > >> > > >> Input parameters: > > >> nrs - work area of size it + 1. > > >> vs - index of initial guess > > >> vdest - index of result. Note that vs may == vdest (replace > > >> guess with the solution). > > >> > > >> This is an internal routine that knows about the GMRES internals. > > >> */ > > >> static PetscErrorCode KSPGMRESBuildSoln(PetscScalar *nrs,Vec vs,Vec > vdest,KSP ksp,PetscInt it) > > >> { > > >> PetscScalar tt; > > >> PetscErrorCode ierr; > > >> PetscInt ii,k,j; > > >> KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); > > >> > > >> PetscFunctionBegin; > > >> /* Solve for solution vector that minimizes the residual */ > > >> > > >> /* If it is < 0, no gmres steps have been performed */ > > >> if (it < 0) { > > >> ierr = VecCopy(vs,vdest);CHKERRQ(ierr); /* VecCopy() is smart, > exists immediately if vguess == vdest */ > > >> PetscFunctionReturn(0); > > >> } > > >> if (*HH(it,it) != 0.0) { > > >> nrs[it] = *GRS(it) / *HH(it,it); > > >> } else { > > >> ksp->reason = KSP_DIVERGED_BREAKDOWN; > > >> > > >> ierr = PetscInfo2(ksp,"Likely your matrix or preconditioner is > singular. 
HH(it,it) is identically zero; it = %D GRS(it) = > %g\n",it,(double)PetscAbsScalar(*GRS(it)));CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> for (ii=1; ii<=it; ii++) { > > >> k = it - ii; > > >> tt = *GRS(k); > > >> for (j=k+1; j<=it; j++) tt = tt - *HH(k,j) * nrs[j]; > > >> if (*HH(k,k) == 0.0) { > > >> ksp->reason = KSP_DIVERGED_BREAKDOWN; > > >> > > >> ierr = PetscInfo1(ksp,"Likely your matrix or preconditioner is > singular. HH(k,k) is identically zero; k = %D\n",k);CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> nrs[k] = tt / *HH(k,k); > > >> } > > >> > > >> /* Perform the hookstep correction - DRL */ > > >> if(gmres->delta > 0.0 && gmres->it > 0) { // Apply the hookstep to > correct the GMRES solution (if required) > > >> printf("\t\tapplying hookstep: initial delta: %lf", gmres->delta); > > >> PetscInt N = gmres->max_k+2, ii, jj, j0; > > >> PetscBLASInt nRows, nCols, lwork, lierr; > > >> PetscScalar *R, *work; > > >> PetscReal* S; > > >> PetscScalar *U, *VT, *p, *q, *y; > > >> PetscScalar bnorm, mu, qMag, qMag2, delta2; > > >> > > >> ierr = PetscMalloc1((gmres->max_k + 3)*(gmres->max_k + > 9),&R);CHKERRQ(ierr); > > >> work = R + N*N; > > >> ierr = PetscMalloc1(6*(gmres->max_k+2),&S);CHKERRQ(ierr); > > >> > > >> ierr = PetscBLASIntCast(gmres->it+1,&nRows);CHKERRQ(ierr); > > >> ierr = PetscBLASIntCast(gmres->it+0,&nCols);CHKERRQ(ierr); > > >> ierr = PetscBLASIntCast(5*N,&lwork);CHKERRQ(ierr); > > >> //ierr = > PetscMemcpy(R,gmres->hes_origin,(gmres->max_k+2)*(gmres->max_k+1)*sizeof(PetscScalar));CHKERRQ(ierr); > > >> ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); > > >> for (ii = 0; ii < nRows; ii++) { > > >> for (jj = 0; jj < nCols; jj++) { > > >> R[ii*nCols+jj] = *HH(ii,jj); > > >> // Ensure Hessenberg structure > > >> //if (ii > jj+1) R[ii*nCols+jj] = 0.0; > > >> } > > >> } > > >> > > >> ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); > > >> ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); > > >> ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); > > >> ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); > > >> ierr = PetscMalloc1(nRows,&y);CHKERRQ(ierr); > > >> > > >> > printf("\n\n");for(ii=0;ii > >> > > >> // Perform an SVD on the Hessenberg matrix > > >> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); > > >> > PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows,&nCols,R,&nRows,S,U,&nRows,VT,&nCols,work,&lwork,&lierr)); > > >> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD > Lapack routine %d",(int)lierr); > > >> ierr = PetscFPTrapPop();CHKERRQ(ierr); > > >> > > >> // Compute p = ||b|| U^T e_1 > > >> ierr = VecNorm(ksp->vec_rhs,NORM_2,&bnorm);CHKERRQ(ierr); > > >> for (ii=0; ii > >> p[ii] = bnorm*U[ii*nRows]; > > >> } > > >> > > >> // Solve the root finding problem for \mu such that ||q|| < \delta > (where \delta is the radius of the trust region) > > >> // This step is largely copied from Ashley Willis' openpipeflow: > doi.org/10.1016/j.softx.2017.05.003 > > >> mu = S[nCols-1]*S[nCols-1]*1.0e-6; > > >> if (mu < 1.0e-99) mu = 1.0e-99; > > >> qMag = 1.0e+99; > > >> > > >> while (qMag > gmres->delta) { > > >> mu *= 1.1; > > >> qMag2 = 0.0; > > >> for (ii=0; ii > >> q[ii] = p[ii]*S[ii]/(mu + S[ii]*S[ii]); > > >> qMag2 += q[ii]*q[ii]; > > >> } > > >> qMag = PetscSqrtScalar(qMag2); > > >> } > > >> > > >> // Expand y in terms of the right singular vectors as y = V q > > >> for (ii=0; ii > >> y[ii] = 0.0; > > >> for (jj=0; jj > >> y[ii] += VT[jj*nCols+ii]*q[jj]; // transpose of the transpose > > >> } > > >> } > > >> > > >> // 
Recompute the size of the trust region, \delta > > >> delta2 = 0.0; > > >> for (ii=0; ii > >> j0 = (ii < 2) ? 0 : ii - 1; > > >> p[ii] = 0.0; > > >> for (jj=j0; jj > >> p[ii] -= R[ii*nCols+jj]*y[jj]; > > >> } > > >> if (ii == 0) { > > >> p[ii] += bnorm; > > >> } > > >> delta2 += p[ii]*p[ii]; > > >> } > > >> gmres->delta = PetscSqrtScalar(delta2); > > >> printf("\t\t...final delta: %lf.\n", gmres->delta); > > >> > > >> // Pass the orthnomalized Krylov vector weights back out > > >> for (ii=0; ii > >> nrs[ii] = y[ii]; > > >> } > > >> > > >> ierr = PetscFree(R);CHKERRQ(ierr); > > >> ierr = PetscFree(S);CHKERRQ(ierr); > > >> ierr = PetscFree(U);CHKERRQ(ierr); > > >> ierr = PetscFree(VT);CHKERRQ(ierr); > > >> ierr = PetscFree(p);CHKERRQ(ierr); > > >> ierr = PetscFree(q);CHKERRQ(ierr); > > >> ierr = PetscFree(y);CHKERRQ(ierr); > > >> } > > >> /*** DRL ***/ > > >> > > >> /* Accumulate the correction to the solution of the preconditioned > problem in TEMP */ > > >> ierr = VecSet(VEC_TEMP,0.0);CHKERRQ(ierr); > > >> if (gmres->delta > 0.0) { > > >> ierr = VecMAXPY(VEC_TEMP,it,nrs,&VEC_VV(0));CHKERRQ(ierr); // DRL > > >> } else { > > >> ierr = VecMAXPY(VEC_TEMP,it+1,nrs,&VEC_VV(0));CHKERRQ(ierr); > > >> } > > >> > > >> ierr = > KSPUnwindPreconditioner(ksp,VEC_TEMP,VEC_TEMP_MATOP);CHKERRQ(ierr); > > >> /* add solution to previous solution */ > > >> if (vdest != vs) { > > >> ierr = VecCopy(vs,vdest);CHKERRQ(ierr); > > >> } > > >> ierr = VecAXPY(vdest,1.0,VEC_TEMP);CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> /* > > >> Do the scalar work for the orthogonalization. Return new residual > norm. > > >> */ > > >> static PetscErrorCode KSPGMRESUpdateHessenberg(KSP ksp,PetscInt > it,PetscBool hapend,PetscReal *res) > > >> { > > >> PetscScalar *hh,*cc,*ss,tt; > > >> PetscInt j; > > >> KSP_GMRES *gmres = (KSP_GMRES*)(ksp->data); > > >> > > >> PetscFunctionBegin; > > >> hh = HH(0,it); > > >> cc = CC(0); > > >> ss = SS(0); > > >> > > >> /* Apply all the previously computed plane rotations to the new > column > > >> of the Hessenberg matrix */ > > >> for (j=1; j<=it; j++) { > > >> tt = *hh; > > >> *hh = PetscConj(*cc) * tt + *ss * *(hh+1); > > >> hh++; > > >> *hh = *cc++ * *hh - (*ss++ * tt); > > >> } > > >> > > >> /* > > >> compute the new plane rotation, and apply it to: > > >> 1) the right-hand-side of the Hessenberg system > > >> 2) the new column of the Hessenberg matrix > > >> thus obtaining the updated value of the residual > > >> */ > > >> if (!hapend) { > > >> tt = PetscSqrtScalar(PetscConj(*hh) * *hh + PetscConj(*(hh+1)) * > *(hh+1)); > > >> if (tt == 0.0) { > > >> ksp->reason = KSP_DIVERGED_NULL; > > >> PetscFunctionReturn(0); > > >> } > > >> *cc = *hh / tt; > > >> *ss = *(hh+1) / tt; > > >> *GRS(it+1) = -(*ss * *GRS(it)); > > >> *GRS(it) = PetscConj(*cc) * *GRS(it); > > >> *hh = PetscConj(*cc) * *hh + *ss * *(hh+1); > > >> *res = PetscAbsScalar(*GRS(it+1)); > > >> } else { > > >> /* happy breakdown: HH(it+1, it) = 0, therfore we don't need to > apply > > >> another rotation matrix (so RH doesn't change). The new > residual is > > >> always the new sine term times the residual from last time > (GRS(it)), > > >> but now the new sine rotation would be zero...so the > residual should > > >> be zero...so we will multiply "zero" by the last > residual. This might > > >> not be exactly what we want to do here -could just return > "zero". 
*/ > > >> > > >> *res = 0.0; > > >> } > > >> PetscFunctionReturn(0); > > >> } > > >> /* > > >> This routine allocates more work vectors, starting from VEC_VV(it). > > >> */ > > >> PetscErrorCode KSPGMRESGetNewVectors(KSP ksp,PetscInt it) > > >> { > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> PetscErrorCode ierr; > > >> PetscInt nwork = gmres->nwork_alloc,k,nalloc; > > >> > > >> PetscFunctionBegin; > > >> nalloc = PetscMin(ksp->max_it,gmres->delta_allocate); > > >> /* Adjust the number to allocate to make sure that we don't exceed > the > > >> number of available slots */ > > >> if (it + VEC_OFFSET + nalloc >= gmres->vecs_allocated) { > > >> nalloc = gmres->vecs_allocated - it - VEC_OFFSET; > > >> } > > >> if (!nalloc) PetscFunctionReturn(0); > > >> > > >> gmres->vv_allocated += nalloc; > > >> > > >> ierr = > KSPCreateVecs(ksp,nalloc,&gmres->user_work[nwork],0,NULL);CHKERRQ(ierr); > > >> ierr = > PetscLogObjectParents(ksp,nalloc,gmres->user_work[nwork]);CHKERRQ(ierr); > > >> > > >> gmres->mwork_alloc[nwork] = nalloc; > > >> for (k=0; k > >> gmres->vecs[it+VEC_OFFSET+k] = gmres->user_work[nwork][k]; > > >> } > > >> gmres->nwork_alloc++; > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPBuildSolution_GMRES(KSP ksp,Vec ptr,Vec *result) > > >> { > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> PetscErrorCode ierr; > > >> > > >> PetscFunctionBegin; > > >> if (!ptr) { > > >> if (!gmres->sol_temp) { > > >> ierr = VecDuplicate(ksp->vec_sol,&gmres->sol_temp);CHKERRQ(ierr); > > >> ierr = > PetscLogObjectParent((PetscObject)ksp,(PetscObject)gmres->sol_temp);CHKERRQ(ierr); > > >> } > > >> ptr = gmres->sol_temp; > > >> } > > >> if (!gmres->nrs) { > > >> /* allocate the work area */ > > >> ierr = PetscMalloc1(gmres->max_k,&gmres->nrs);CHKERRQ(ierr); > > >> ierr = > PetscLogObjectMemory((PetscObject)ksp,gmres->max_k*sizeof(PetscScalar));CHKERRQ(ierr); > > >> } > > >> > > >> ierr = > KSPGMRESBuildSoln(gmres->nrs,ksp->vec_sol,ptr,ksp,gmres->it);CHKERRQ(ierr); > > >> if (result) *result = ptr; > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPView_GMRES(KSP ksp,PetscViewer viewer) > > >> { > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> const char *cstr; > > >> PetscErrorCode ierr; > > >> PetscBool iascii,isstring; > > >> > > >> PetscFunctionBegin; > > >> ierr = > PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERASCII,&iascii);CHKERRQ(ierr); > > >> ierr = > PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERSTRING,&isstring);CHKERRQ(ierr); > > >> if (gmres->orthog == KSPGMRESClassicalGramSchmidtOrthogonalization) { > > >> switch (gmres->cgstype) { > > >> case (KSP_GMRES_CGS_REFINE_NEVER): > > >> cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization > with no iterative refinement"; > > >> break; > > >> case (KSP_GMRES_CGS_REFINE_ALWAYS): > > >> cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization > with one step of iterative refinement"; > > >> break; > > >> case (KSP_GMRES_CGS_REFINE_IFNEEDED): > > >> cstr = "Classical (unmodified) Gram-Schmidt Orthogonalization > with one step of iterative refinement when needed"; > > >> break; > > >> default: > > >> > SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Unknown > orthogonalization"); > > >> } > > >> } else if (gmres->orthog == > KSPGMRESModifiedGramSchmidtOrthogonalization) { > > >> cstr = "Modified Gram-Schmidt Orthogonalization"; > > >> } else { > > >> cstr = "unknown orthogonalization"; > > >> } > > >> if (iascii) { > > >> ierr = 
PetscViewerASCIIPrintf(viewer," restart=%D, using > %s\n",gmres->max_k,cstr);CHKERRQ(ierr); > > >> ierr = PetscViewerASCIIPrintf(viewer," happy breakdown tolerance > %g\n",(double)gmres->haptol);CHKERRQ(ierr); > > >> } else if (isstring) { > > >> ierr = PetscViewerStringSPrintf(viewer,"%s restart > %D",cstr,gmres->max_k);CHKERRQ(ierr); > > >> } > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> /*@C > > >> KSPGMRESMonitorKrylov - Calls VecView() for each new direction in > the GMRES accumulated Krylov space. > > >> > > >> Collective on KSP > > >> > > >> Input Parameters: > > >> + ksp - the KSP context > > >> . its - iteration number > > >> . fgnorm - 2-norm of residual (or gradient) > > >> - dummy - an collection of viewers created with KSPViewerCreate() > > >> > > >> Options Database Keys: > > >> . -ksp_gmres_kyrlov_monitor > > >> > > >> Notes: A new PETSCVIEWERDRAW is created for each Krylov vector so > they can all be simultaneously viewed > > >> Level: intermediate > > >> > > >> .keywords: KSP, nonlinear, vector, monitor, view, Krylov space > > >> > > >> .seealso: KSPMonitorSet(), KSPMonitorDefault(), VecView(), > KSPViewersCreate(), KSPViewersDestroy() > > >> @*/ > > >> PetscErrorCode KSPGMRESMonitorKrylov(KSP ksp,PetscInt its,PetscReal > fgnorm,void *dummy) > > >> { > > >> PetscViewers viewers = (PetscViewers)dummy; > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> PetscErrorCode ierr; > > >> Vec x; > > >> PetscViewer viewer; > > >> PetscBool flg; > > >> > > >> PetscFunctionBegin; > > >> ierr = > PetscViewersGetViewer(viewers,gmres->it+1,&viewer);CHKERRQ(ierr); > > >> ierr = > PetscObjectTypeCompare((PetscObject)viewer,PETSCVIEWERDRAW,&flg);CHKERRQ(ierr); > > >> if (!flg) { > > >> ierr = PetscViewerSetType(viewer,PETSCVIEWERDRAW);CHKERRQ(ierr); > > >> ierr = PetscViewerDrawSetInfo(viewer,NULL,"Krylov GMRES > Monitor",PETSC_DECIDE,PETSC_DECIDE,300,300);CHKERRQ(ierr); > > >> } > > >> x = VEC_VV(gmres->it+1); > > >> ierr = VecView(x,viewer);CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPSetFromOptions_GMRES(PetscOptionItems > *PetscOptionsObject,KSP ksp) > > >> { > > >> PetscErrorCode ierr; > > >> PetscInt restart; > > >> PetscReal haptol; > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> PetscBool flg; > > >> > > >> PetscFunctionBegin; > > >> ierr = PetscOptionsHead(PetscOptionsObject,"KSP GMRES > Options");CHKERRQ(ierr); > > >> ierr = PetscOptionsInt("-ksp_gmres_restart","Number of Krylov search > directions","KSPGMRESSetRestart",gmres->max_k,&restart,&flg);CHKERRQ(ierr); > > >> if (flg) { ierr = KSPGMRESSetRestart(ksp,restart);CHKERRQ(ierr); } > > >> ierr = PetscOptionsReal("-ksp_gmres_haptol","Tolerance for exact > convergence (happy > ending)","KSPGMRESSetHapTol",gmres->haptol,&haptol,&flg);CHKERRQ(ierr); > > >> if (flg) { ierr = KSPGMRESSetHapTol(ksp,haptol);CHKERRQ(ierr); } > > >> flg = PETSC_FALSE; > > >> ierr = PetscOptionsBool("-ksp_gmres_preallocate","Preallocate Krylov > vectors","KSPGMRESSetPreAllocateVectors",flg,&flg,NULL);CHKERRQ(ierr); > > >> if (flg) {ierr = KSPGMRESSetPreAllocateVectors(ksp);CHKERRQ(ierr);} > > >> ierr = > PetscOptionsBoolGroupBegin("-ksp_gmres_classicalgramschmidt","Classical > (unmodified) Gram-Schmidt > (fast)","KSPGMRESSetOrthogonalization",&flg);CHKERRQ(ierr); > > >> if (flg) {ierr = > KSPGMRESSetOrthogonalization(ksp,KSPGMRESClassicalGramSchmidtOrthogonalization);CHKERRQ(ierr);} > > >> ierr = > PetscOptionsBoolGroupEnd("-ksp_gmres_modifiedgramschmidt","Modified > Gram-Schmidt (slow,more > 
stable)","KSPGMRESSetOrthogonalization",&flg);CHKERRQ(ierr); > > >> if (flg) {ierr = > KSPGMRESSetOrthogonalization(ksp,KSPGMRESModifiedGramSchmidtOrthogonalization);CHKERRQ(ierr);} > > >> ierr = PetscOptionsEnum("-ksp_gmres_cgs_refinement_type","Type of > iterative refinement for classical (unmodified) > Gram-Schmidt","KSPGMRESSetCGSRefinementType", > > >> > KSPGMRESCGSRefinementTypes,(PetscEnum)gmres->cgstype,(PetscEnum*)&gmres->cgstype,&flg);CHKERRQ(ierr); > > >> flg = PETSC_FALSE; > > >> ierr = PetscOptionsBool("-ksp_gmres_krylov_monitor","Plot the Krylov > directions","KSPMonitorSet",flg,&flg,NULL);CHKERRQ(ierr); > > >> if (flg) { > > >> PetscViewers viewers; > > >> ierr = > PetscViewersCreate(PetscObjectComm((PetscObject)ksp),&viewers);CHKERRQ(ierr); > > >> ierr = > KSPMonitorSet(ksp,KSPGMRESMonitorKrylov,viewers,(PetscErrorCode > (*)(void**))PetscViewersDestroy);CHKERRQ(ierr); > > >> } > > >> ierr = PetscOptionsTail();CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPGMRESSetHapTol_GMRES(KSP ksp,PetscReal tol) > > >> { > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> > > >> PetscFunctionBegin; > > >> if (tol < 0.0) > SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Tolerance > must be non-negative"); > > >> gmres->haptol = tol; > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPGMRESGetRestart_GMRES(KSP ksp,PetscInt *max_k) > > >> { > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> > > >> PetscFunctionBegin; > > >> *max_k = gmres->max_k; > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPGMRESSetRestart_GMRES(KSP ksp,PetscInt max_k) > > >> { > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> PetscErrorCode ierr; > > >> > > >> PetscFunctionBegin; > > >> if (max_k < 1) > SETERRQ(PetscObjectComm((PetscObject)ksp),PETSC_ERR_ARG_OUTOFRANGE,"Restart > must be positive"); > > >> if (!ksp->setupstage) { > > >> gmres->max_k = max_k; > > >> } else if (gmres->max_k != max_k) { > > >> gmres->max_k = max_k; > > >> ksp->setupstage = KSP_SETUP_NEW; > > >> /* free the data structures, then create them again */ > > >> ierr = KSPReset_GMRES(ksp);CHKERRQ(ierr); > > >> } > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPGMRESSetOrthogonalization_GMRES(KSP ksp,FCN fcn) > > >> { > > >> PetscFunctionBegin; > > >> ((KSP_GMRES*)ksp->data)->orthog = fcn; > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPGMRESGetOrthogonalization_GMRES(KSP ksp,FCN *fcn) > > >> { > > >> PetscFunctionBegin; > > >> *fcn = ((KSP_GMRES*)ksp->data)->orthog; > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPGMRESSetPreAllocateVectors_GMRES(KSP ksp) > > >> { > > >> KSP_GMRES *gmres; > > >> > > >> PetscFunctionBegin; > > >> gmres = (KSP_GMRES*)ksp->data; > > >> gmres->q_preallocate = 1; > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPGMRESSetCGSRefinementType_GMRES(KSP > ksp,KSPGMRESCGSRefinementType type) > > >> { > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> > > >> PetscFunctionBegin; > > >> gmres->cgstype = type; > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> PetscErrorCode KSPGMRESGetCGSRefinementType_GMRES(KSP > ksp,KSPGMRESCGSRefinementType *type) > > >> { > > >> KSP_GMRES *gmres = (KSP_GMRES*)ksp->data; > > >> > > >> PetscFunctionBegin; > > >> *type = gmres->cgstype; > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> /*@ > > >> KSPGMRESSetCGSRefinementType - Sets the type of iterative > refinement to use > 
> >> in the classical Gram Schmidt orthogonalization. > > >> > > >> Logically Collective on KSP > > >> > > >> Input Parameters: > > >> + ksp - the Krylov space context > > >> - type - the type of refinement > > >> > > >> Options Database: > > >> . -ksp_gmres_cgs_refinement_type > > > >> > > >> Level: intermediate > > >> > > >> .keywords: KSP, GMRES, iterative refinement > > >> > > >> .seealso: KSPGMRESSetOrthogonalization(), KSPGMRESCGSRefinementType, > KSPGMRESClassicalGramSchmidtOrthogonalization(), > KSPGMRESGetCGSRefinementType(), > > >> KSPGMRESGetOrthogonalization() > > >> @*/ > > >> PetscErrorCode KSPGMRESSetCGSRefinementType(KSP > ksp,KSPGMRESCGSRefinementType type) > > >> { > > >> PetscErrorCode ierr; > > >> > > >> PetscFunctionBegin; > > >> PetscValidHeaderSpecific(ksp,KSP_CLASSID,1); > > >> PetscValidLogicalCollectiveEnum(ksp,type,2); > > >> ierr = > PetscTryMethod(ksp,"KSPGMRESSetCGSRefinementType_C",(KSP,KSPGMRESCGSRefinementType),(ksp,type));CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> /*@ > > >> KSPGMRESGetCGSRefinementType - Gets the type of iterative > refinement to use > > >> in the classical Gram Schmidt orthogonalization. > > >> > > >> Not Collective > > >> > > >> Input Parameter: > > >> . ksp - the Krylov space context > > >> > > >> Output Parameter: > > >> . type - the type of refinement > > >> > > >> Options Database: > > >> . -ksp_gmres_cgs_refinement_type > > >> > > >> Level: intermediate > > >> > > >> .keywords: KSP, GMRES, iterative refinement > > >> > > >> .seealso: KSPGMRESSetOrthogonalization(), KSPGMRESCGSRefinementType, > KSPGMRESClassicalGramSchmidtOrthogonalization(), > KSPGMRESSetCGSRefinementType(), > > >> KSPGMRESGetOrthogonalization() > > >> @*/ > > >> PetscErrorCode KSPGMRESGetCGSRefinementType(KSP > ksp,KSPGMRESCGSRefinementType *type) > > >> { > > >> PetscErrorCode ierr; > > >> > > >> PetscFunctionBegin; > > >> PetscValidHeaderSpecific(ksp,KSP_CLASSID,1); > > >> ierr = > PetscUseMethod(ksp,"KSPGMRESGetCGSRefinementType_C",(KSP,KSPGMRESCGSRefinementType*),(ksp,type));CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> > > >> /*@ > > >> KSPGMRESSetRestart - Sets number of iterations at which GMRES, > FGMRES and LGMRES restarts. > > >> > > >> Logically Collective on KSP > > >> > > >> Input Parameters: > > >> + ksp - the Krylov space context > > >> - restart - integer restart value > > >> > > >> Options Database: > > >> . -ksp_gmres_restart > > >> > > >> Note: The default value is 30. > > >> > > >> Level: intermediate > > >> > > >> .keywords: KSP, GMRES, restart, iterations > > >> > > >> .seealso: KSPSetTolerances(), KSPGMRESSetOrthogonalization(), > KSPGMRESSetPreAllocateVectors(), KSPGMRESGetRestart() > > >> @*/ > > >> PetscErrorCode KSPGMRESSetRestart(KSP ksp, PetscInt restart) > > >> { > > >> PetscErrorCode ierr; > > >> > > >> PetscFunctionBegin; > > >> PetscValidLogicalCollectiveInt(ksp,restart,2); > > >> > > >> ierr = > PetscTryMethod(ksp,"KSPGMRESSetRestart_C",(KSP,PetscInt),(ksp,restart));CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> /*@ > > >> KSPGMRESGetRestart - Gets number of iterations at which GMRES, > FGMRES and LGMRES restarts. > > >> > > >> Not Collective > > >> > > >> Input Parameter: > > >> . ksp - the Krylov space context > > >> > > >> Output Parameter: > > >> . restart - integer restart value > > >> > > >> Note: The default value is 30. 
> > >> > > >> Level: intermediate > > >> > > >> .keywords: KSP, GMRES, restart, iterations > > >> > > >> .seealso: KSPSetTolerances(), KSPGMRESSetOrthogonalization(), > KSPGMRESSetPreAllocateVectors(), KSPGMRESSetRestart() > > >> @*/ > > >> PetscErrorCode KSPGMRESGetRestart(KSP ksp, PetscInt *restart) > > >> { > > >> PetscErrorCode ierr; > > >> > > >> PetscFunctionBegin; > > >> ierr = > PetscUseMethod(ksp,"KSPGMRESGetRestart_C",(KSP,PetscInt*),(ksp,restart));CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> /*@ > > >> KSPGMRESSetHapTol - Sets tolerance for determining happy breakdown > in GMRES, FGMRES and LGMRES. > > >> > > >> Logically Collective on KSP > > >> > > >> Input Parameters: > > >> + ksp - the Krylov space context > > >> - tol - the tolerance > > >> > > >> Options Database: > > >> . -ksp_gmres_haptol > > >> > > >> Note: Happy breakdown is the rare case in GMRES where an 'exact' > solution is obtained after > > >> a certain number of iterations. If you attempt more > iterations after this point unstable > > >> things can happen hence very occasionally you may need to set > this value to detect this condition > > >> > > >> Level: intermediate > > >> > > >> .keywords: KSP, GMRES, tolerance > > >> > > >> .seealso: KSPSetTolerances() > > >> @*/ > > >> PetscErrorCode KSPGMRESSetHapTol(KSP ksp,PetscReal tol) > > >> { > > >> PetscErrorCode ierr; > > >> > > >> PetscFunctionBegin; > > >> PetscValidLogicalCollectiveReal(ksp,tol,2); > > >> ierr = > PetscTryMethod((ksp),"KSPGMRESSetHapTol_C",(KSP,PetscReal),((ksp),(tol)));CHKERRQ(ierr); > > >> PetscFunctionReturn(0); > > >> } > > >> > > >> /*MC > > >> KSPGMRES - Implements the Generalized Minimal Residual method. > > >> (Saad and Schultz, 1986) with restart > > >> > > >> > > >> Options Database Keys: > > >> + -ksp_gmres_restart - the number of Krylov directions to > orthogonalize against > > >> . -ksp_gmres_haptol - sets the tolerance for "happy ending" > (exact convergence) > > >> . -ksp_gmres_preallocate - preallocate all the Krylov search > directions initially (otherwise groups of > > >> vectors are allocated as needed) > > >> . -ksp_gmres_classicalgramschmidt - use classical (unmodified) > Gram-Schmidt to orthogonalize against the Krylov space (fast) (the default) > > >> . -ksp_gmres_modifiedgramschmidt - use modified Gram-Schmidt in the > orthogonalization (more stable, but slower) > > >> . -ksp_gmres_cgs_refinement_type - > determine if iterative refinement is used to increase the > > >> stability of the classical > Gram-Schmidt orthogonalization. > > >> - -ksp_gmres_krylov_monitor - plot the Krylov space generated > > >> > > >> Level: beginner > > >> > > >> Notes: Left and right preconditioning are supported, but not > symmetric preconditioning. > > >> > > >> References: > > >> . 1. - YOUCEF SAAD AND MARTIN H. SCHULTZ, GMRES: A GENERALIZED > MINIMAL RESIDUAL ALGORITHM FOR SOLVING NONSYMMETRIC LINEAR SYSTEMS. > > >> SIAM J. ScI. STAT. COMPUT. Vo|. 7, No. 3, July 1986. 
> > >> > > >> .seealso: KSPCreate(), KSPSetType(), KSPType (for list of available > types), KSP, KSPFGMRES, KSPLGMRES, > > >> KSPGMRESSetRestart(), KSPGMRESSetHapTol(), > KSPGMRESSetPreAllocateVectors(), KSPGMRESSetOrthogonalization(), > KSPGMRESGetOrthogonalization(), > > >> KSPGMRESClassicalGramSchmidtOrthogonalization(), > KSPGMRESModifiedGramSchmidtOrthogonalization(), > > >> KSPGMRESCGSRefinementType, KSPGMRESSetCGSRefinementType(), > KSPGMRESGetCGSRefinementType(), KSPGMRESMonitorKrylov(), KSPSetPCSide() > > >> > > >> M*/ > > >> > > >> PETSC_EXTERN PetscErrorCode KSPCreate_GMRES(KSP ksp) > > >> { > > >> KSP_GMRES *gmres; > > >> PetscErrorCode ierr; > > >> > > >> PetscFunctionBegin; > > >> ierr = PetscNewLog(ksp,&gmres);CHKERRQ(ierr); > > >> ksp->data = (void*)gmres; > > >> > > >> ierr = > KSPSetSupportedNorm(ksp,KSP_NORM_PRECONDITIONED,PC_LEFT,4);CHKERRQ(ierr); > > >> ierr = > KSPSetSupportedNorm(ksp,KSP_NORM_UNPRECONDITIONED,PC_RIGHT,3);CHKERRQ(ierr); > > >> ierr = > KSPSetSupportedNorm(ksp,KSP_NORM_PRECONDITIONED,PC_SYMMETRIC,2);CHKERRQ(ierr); > > >> ierr = > KSPSetSupportedNorm(ksp,KSP_NORM_NONE,PC_RIGHT,1);CHKERRQ(ierr); > > >> ierr = > KSPSetSupportedNorm(ksp,KSP_NORM_NONE,PC_LEFT,1);CHKERRQ(ierr); > > >> > > >> ksp->ops->buildsolution = KSPBuildSolution_GMRES; > > >> ksp->ops->setup = KSPSetUp_GMRES; > > >> ksp->ops->solve = KSPSolve_GMRES; > > >> ksp->ops->reset = KSPReset_GMRES; > > >> ksp->ops->destroy = KSPDestroy_GMRES; > > >> ksp->ops->view = KSPView_GMRES; > > >> ksp->ops->setfromoptions = KSPSetFromOptions_GMRES; > > >> ksp->ops->computeextremesingularvalues = > KSPComputeExtremeSingularValues_GMRES; > > >> ksp->ops->computeeigenvalues = KSPComputeEigenvalues_GMRES; > > >> #if !defined(PETSC_USE_COMPLEX) && !defined(PETSC_HAVE_ESSL) > > >> ksp->ops->computeritz = KSPComputeRitz_GMRES; > > >> #endif > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetPreAllocateVectors_C",KSPGMRESSetPreAllocateVectors_GMRES);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetOrthogonalization_C",KSPGMRESSetOrthogonalization_GMRES);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetOrthogonalization_C",KSPGMRESGetOrthogonalization_GMRES);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetRestart_C",KSPGMRESSetRestart_GMRES);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetRestart_C",KSPGMRESGetRestart_GMRES);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetHapTol_C",KSPGMRESSetHapTol_GMRES);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESSetCGSRefinementType_C",KSPGMRESSetCGSRefinementType_GMRES);CHKERRQ(ierr); > > >> ierr = > PetscObjectComposeFunction((PetscObject)ksp,"KSPGMRESGetCGSRefinementType_C",KSPGMRESGetCGSRefinementType_GMRES);CHKERRQ(ierr); > > >> > > >> gmres->haptol = 1.0e-30; > > >> gmres->q_preallocate = 0; > > >> gmres->delta_allocate = GMRES_DELTA_DIRECTIONS; > > >> gmres->orthog = > KSPGMRESClassicalGramSchmidtOrthogonalization; > > >> gmres->nrs = 0; > > >> gmres->sol_temp = 0; > > >> gmres->max_k = GMRES_DEFAULT_MAXK; > > >> gmres->Rsvd = 0; > > >> gmres->cgstype = KSP_GMRES_CGS_REFINE_NEVER; > > >> gmres->orthogwork = 0; > > >> gmres->delta = -1.0; // DRL > > >> PetscFunctionReturn(0); > > >> } > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
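To make the ordering point from the exchange above concrete: LAPACK expects its input stored column by column with an explicit leading dimension, and it hands U and VT back in that same column-major layout, so nothing needs to be transposed as long as entry (i,j) is read as array[i + j*ld]. Below is a minimal, self-contained sketch of such a gesvd call, assuming real PetscScalar (the complex interface takes an extra rwork argument); the routine name SmallDenseSVD and the workspace size are illustrative and are not taken from the attachment above.

#include <petscsys.h>
#include <petscblaslapack.h>

/* Sketch only: SVD of an m x n array A stored column major, A(i,j) = A[i + j*m].
   Caller provides S with min(m,n) entries, U with m*m entries, VT with n*n entries. */
static PetscErrorCode SmallDenseSVD(PetscBLASInt m,PetscBLASInt n,PetscScalar *A,PetscReal *S,PetscScalar *U,PetscScalar *VT)
{
  PetscScalar    *work;
  PetscBLASInt   lwork,lierr;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  lwork = 5*(m > n ? m : n);                          /* generous workspace for gesvd */
  ierr  = PetscMalloc1(lwork,&work);CHKERRQ(ierr);
  /* lda = m because each column of A is contiguous; ldu = m, ldvt = n likewise */
  ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr);
  PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&m,&n,A,&m,S,U,&m,VT,&n,work,&lwork,&lierr));
  ierr = PetscFPTrapPop();CHKERRQ(ierr);
  if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD Lapack routine %d",(int)lierr);
  ierr = PetscFree(work);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

With this layout, column j of U (the j-th left singular vector) is U[j*m] through U[j*m + m - 1], and row j of VT (the j-th right singular vector) is VT[j + k*n] for k = 0..n-1; indexing this way is what makes an explicit transpose unnecessary.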
URL: From knepley at gmail.com Mon May 20 05:52:04 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 May 2019 06:52:04 -0400 Subject: [petsc-users] DMPlex assembly global stiffness matrix In-Reply-To: References: Message-ID: On Fri, May 17, 2019 at 7:59 PM Josh L via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I have a DM that has 2 fields , and field #1 has 2 dofs and field #2 has 1 > dof. > I only have dofs on vertex. > > Can I use the following to assemble global stiffness matrix instead of > using MatSetClosure(I am not integrating 2 field separately) > > DMGetGlobalSection(dm,GlobalSection) > For cells > calculate element stiffness matrix eleMat > For vertex in cells > PetscSectionGetOffset(GlobalSection, vertex, offset) > loc=[offset_v1, offset_v1+1, offset_v1+2, offset_v2, > offset_v2+1.......] > End > MatSetValues(GlobalMat, n,loc,n,loc, eleMat, ADD_VALUES) > End > AssemblyBegin and End. > > Basically use the offset from global section to have the global dof number. > Yes, that is exactly what happens in MatSetClosure(). However, you have to be careful if you use constraints (like Dirichlet conditions) in the Section to filter them out. I use negative indices to do that since they are ignored by MatSetValues(). Are you doing this because you want to set one field at a time? If so, just call DMCreateSubDM() for that field, and everything should work correctly with MatSetClosure(). Thanks, Matt > Thanks, > Josh > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 20 05:54:10 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 May 2019 06:54:10 -0400 Subject: [petsc-users] problem with generating simplicies mesh In-Reply-To: References: Message-ID: On Sun, May 19, 2019 at 9:22 AM ??? via petsc-users wrote: > I have problem with generating simplicies mesh. > I do as the description in DMPlexCreateBoxmesh says, but still meet error. > Stefano is right that you will need a mesh generator for a simplex mesh. However, you are asking for a 1D mesh for which there are no generators. Since in 1D simplces and tensor cells are the same, just change it to tensor and it will work. Thanks, Matt > The following is the error message: > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: No grid generator of dimension 1 registered > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.11.1-723-g96d64d1 GIT > Date: 2019-05-15 13:23:17 +0000 > [0]PETSC ERROR: ./membrane on a arch-linux2-c-debug named > simon-System-Product-Name by simon Sun May 19 20:54:54 2019 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ > --with-fc=gfortran --download-mpich --download-fblaslapack > [0]PETSC ERROR: #1 DMPlexGenerate() line 181 in > /home/simon/petsc/src/dm/impls/plex/plexgenerate.c > [0]PETSC ERROR: #2 DMPlexCreateBoxMesh_Simplex_Internal() line 536 in > /home/simon/petsc/src/dm/impls/plex/plexcreate.c > [0]PETSC ERROR: #3 DMPlexCreateBoxMesh() line 1071 in > /home/simon/petsc/src/dm/impls/plex/plexcreate.c > [0]PETSC ERROR: #4 main() line 54 in /home/simon/Downloads/membrane.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -dm_view > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=63 > : > system msg for write_line failure : Bad file descriptor > > > > I need some help about this, please. > > Simon > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 20 06:02:22 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 May 2019 07:02:22 -0400 Subject: [petsc-users] Creating a DMNetwork from a DMPlex In-Reply-To: References: <9DAFD49B-AB7F-435F-BB27-16EF946E1241@mcs.anl.gov> Message-ID: On Mon, May 20, 2019 at 3:05 AM Swarnava Ghosh via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi Barry, > > Thank you for your email. My planned discretization is based on the fact > that I need a distributed unstructured mesh, where at each vertex point I > perform local calculations. For these calculations, I do NOT need need to > assemble any global matrix. I will have fields defined at the vertices, and > using linear interpolation, I am planing to find the values of these fields > at some spatial points with are within a ball around each vertex. Once the > values of these fields are known within the compact support around each > vertex, I do local computations to calculate my unknown field. My reason > for having the a mesh is essentially to 1) define fields at the vertices > and 2) perform linear interpolation (using finite elements) at some spatial > points. Also the local computations around at each vertex is > computationally the most expensive step. In that case, having a cell > partitioning will result in vertices being shared among processes, which > will result in redundant computations. > > My idea is therefore to have DMNetwork to distribute vertices across > processes and use finite elements for the linear interpolation part. > I think DMNetwork is not buying you anything here. It seems to make more sense to do it directly in Plex. You can easily lay down a P1 element for each field so that you can interpolate wherever you want. I would start from a clean example, such as SNES ex17. That solves elasticity, so it has multiple fields and FEM. 
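A minimal sketch of just the data-layout step being described (P1 fields laid down on a Plex, with no residual or Jacobian set up) looks roughly as follows. It assumes the API of the PETSc version current in this thread; the mesh file name "mesh.msh" and the single scalar field are placeholders for the real fields.

#include <petsc.h>

int main(int argc,char **argv)
{
  DM             dm;
  PetscFE        fe;
  Vec            u;
  PetscInt       dim;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  /* "mesh.msh" is a placeholder for whatever gmsh file is being read */
  ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD,"mesh.msh",PETSC_TRUE,&dm);CHKERRQ(ierr);
  ierr = DMGetDimension(dm,&dim);CHKERRQ(ierr);
  /* One scalar P1 (vertex-based) field; call DMSetField() again with f = 1,2,... for more fields */
  ierr = PetscFECreateDefault(PETSC_COMM_WORLD,dim,1,PETSC_TRUE,NULL,PETSC_DETERMINE,&fe);CHKERRQ(ierr);
  ierr = DMSetField(dm,0,NULL,(PetscObject)fe);CHKERRQ(ierr);
  ierr = PetscFEDestroy(&fe);CHKERRQ(ierr);
  ierr = DMCreateDS(dm);CHKERRQ(ierr);
  /* The layout now has one dof per vertex per field; DMGetGlobalSection() gives the offsets
     needed by the purely local per-vertex computation */
  ierr = DMCreateGlobalVector(dm,&u);CHKERRQ(ierr);
  ierr = VecDestroy(&u);CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  return PetscFinalize();
}

Only the layout and discretization calls are kept here; everything to do with assembly is left out on purpose.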
The change is that you don't want to use any of the assembly functions, so you keep the code that does data layout and FEM discretization, but it ignore the residual/Jacobian stuff. Feel free to ask about using the lower-level interpolation stuff which is not as documented. Thanks, Matt > Thanks, > SG > > > > On Sun, May 19, 2019 at 6:54 PM Smith, Barry F. > wrote: > >> >> I am not sure you want DMNetwork, DMNetwork has no geometry; it only >> has vertices and edges. Vertices are connected to other vertices through >> the edges. For example I can't see how one would do vertex centered finite >> volume methods with DMNetwork. Maybe if you said something more about your >> planned discretization we could figure something out. >> >> > On May 19, 2019, at 8:32 PM, Swarnava Ghosh >> wrote: >> > >> > Hi Barry, >> > >> > No, the gmesh file contains a mesh and not a graph/network. >> > In that case, is it possible to create a DMNetwork first from the >> DMPlex and then distribute the DMNetwork. >> > >> > I have this case, because I want a vertex partitioning of my mesh. >> Domain decomposition of DMPlex gives me cell partitioning. Essentially what >> I want is that no two processes can share a vertex BUT that can share an >> edge. Similar to how a DMDA is distributed. >> > >> > Thanks, >> > Swarnava >> > >> > On Sun, May 19, 2019 at 4:50 PM Smith, Barry F. >> wrote: >> > >> > This use case never occurred to us. Is the gmesh file containing a >> graph/network (as opposed to a mesh)? There seem two choices >> > >> > 1) if the gmesh file contains a graph/network one could write a gmesh >> reader for that case that reads directly for and constructs a DMNetwork or >> > >> > 2) write a converter for a DMPlex to DMNetwork. >> > >> > I lean toward the first >> > >> > Either way you need to understand the documentation for DMNetwork >> and how to build one up. >> > >> > >> > Barry >> > >> > >> > > On May 19, 2019, at 6:34 PM, Swarnava Ghosh via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> > > >> > > Hi Petsc users and developers, >> > > >> > > I am trying to find a way of creating a DMNetwork from a DMPlex. I >> have read the DMPlex from a gmesh file and have it distributed. >> > > >> > > Thanks, >> > > SG >> > >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Mon May 20 07:34:35 2019 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 20 May 2019 14:34:35 +0200 Subject: [petsc-users] problem with generating simplicies mesh In-Reply-To: References: Message-ID: Matt, The code is actually for 2d. > On May 20, 2019, at 12:54 PM, Matthew Knepley via petsc-users wrote: > > On Sun, May 19, 2019 at 9:22 AM ??? via petsc-users > wrote: > I have problem with generating simplicies mesh. > I do as the description in DMPlexCreateBoxmesh says, but still meet error. > > Stefano is right that you will need a mesh generator for a simplex mesh. However, you are asking for > a 1D mesh for which there are no generators. Since in 1D simplces and tensor cells are the same, > just change it to tensor and it will work. 
> > Thanks, > > Matt > > The following is the error message: > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Argument out of range > [0]PETSC ERROR: No grid generator of dimension 1 registered > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.11.1-723-g96d64d1 GIT Date: 2019-05-15 13:23:17 +0000 > [0]PETSC ERROR: ./membrane on a arch-linux2-c-debug named simon-System-Product-Name by simon Sun May 19 20:54:54 2019 > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack > [0]PETSC ERROR: #1 DMPlexGenerate() line 181 in /home/simon/petsc/src/dm/impls/plex/plexgenerate.c > [0]PETSC ERROR: #2 DMPlexCreateBoxMesh_Simplex_Internal() line 536 in /home/simon/petsc/src/dm/impls/plex/plexcreate.c > [0]PETSC ERROR: #3 DMPlexCreateBoxMesh() line 1071 in /home/simon/petsc/src/dm/impls/plex/plexcreate.c > [0]PETSC ERROR: #4 main() line 54 in /home/simon/Downloads/membrane.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -dm_view > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 > [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=63 > : > system msg for write_line failure : Bad file descriptor > > > > I need some help about this, please. > > Simon > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 20 08:36:38 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 May 2019 09:36:38 -0400 Subject: [petsc-users] problem with generating simplicies mesh In-Reply-To: References: Message-ID: On Mon, May 20, 2019 at 8:34 AM Stefano Zampini wrote: > Matt, > > The code is actually for 2d. > "No grid generator of dimension 1 registered" Matt > On May 20, 2019, at 12:54 PM, Matthew Knepley via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > On Sun, May 19, 2019 at 9:22 AM ??? via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> I have problem with generating simplicies mesh. >> I do as the description in DMPlexCreateBoxmesh says, but still meet error. >> > > Stefano is right that you will need a mesh generator for a simplex mesh. > However, you are asking for > a 1D mesh for which there are no generators. Since in 1D simplces and > tensor cells are the same, > just change it to tensor and it will work. > > Thanks, > > Matt > > >> The following is the error message: >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: Argument out of range >> [0]PETSC ERROR: No grid generator of dimension 1 registered >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for >> trouble shooting. 
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.11.1-723-g96d64d1 GIT >> Date: 2019-05-15 13:23:17 +0000 >> [0]PETSC ERROR: ./membrane on a arch-linux2-c-debug named >> simon-System-Product-Name by simon Sun May 19 20:54:54 2019 >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ >> --with-fc=gfortran --download-mpich --download-fblaslapack >> [0]PETSC ERROR: #1 DMPlexGenerate() line 181 in >> /home/simon/petsc/src/dm/impls/plex/plexgenerate.c >> [0]PETSC ERROR: #2 DMPlexCreateBoxMesh_Simplex_Internal() line 536 in >> /home/simon/petsc/src/dm/impls/plex/plexcreate.c >> [0]PETSC ERROR: #3 DMPlexCreateBoxMesh() line 1071 in >> /home/simon/petsc/src/dm/impls/plex/plexcreate.c >> [0]PETSC ERROR: #4 main() line 54 in /home/simon/Downloads/membrane.c >> [0]PETSC ERROR: PETSc Option Table entries: >> [0]PETSC ERROR: -dm_view >> [0]PETSC ERROR: ----------------End of Error Message -------send entire >> error message to petsc-maint at mcs.anl.gov---------- >> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=63 >> : >> system msg for write_line failure : Bad file descriptor >> >> >> >> I need some help about this, please. >> >> Simon >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Mon May 20 08:38:05 2019 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Mon, 20 May 2019 15:38:05 +0200 Subject: [petsc-users] problem with generating simplicies mesh In-Reply-To: References: Message-ID: Matt, You coded it. This is trying to mesh the boundary?. > On May 20, 2019, at 3:36 PM, Matthew Knepley wrote: > > On Mon, May 20, 2019 at 8:34 AM Stefano Zampini > wrote: > Matt, > > The code is actually for 2d. > > "No grid generator of dimension 1 registered" > > Matt > >> On May 20, 2019, at 12:54 PM, Matthew Knepley via petsc-users > wrote: >> >> On Sun, May 19, 2019 at 9:22 AM ??? via petsc-users > wrote: >> I have problem with generating simplicies mesh. >> I do as the description in DMPlexCreateBoxmesh says, but still meet error. >> >> Stefano is right that you will need a mesh generator for a simplex mesh. However, you are asking for >> a 1D mesh for which there are no generators. Since in 1D simplces and tensor cells are the same, >> just change it to tensor and it will work. >> >> Thanks, >> >> Matt >> >> The following is the error message: >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Argument out of range >> [0]PETSC ERROR: No grid generator of dimension 1 registered >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.11.1-723-g96d64d1 GIT Date: 2019-05-15 13:23:17 +0000 >> [0]PETSC ERROR: ./membrane on a arch-linux2-c-debug named simon-System-Product-Name by simon Sun May 19 20:54:54 2019 >> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-mpich --download-fblaslapack >> [0]PETSC ERROR: #1 DMPlexGenerate() line 181 in /home/simon/petsc/src/dm/impls/plex/plexgenerate.c >> [0]PETSC ERROR: #2 DMPlexCreateBoxMesh_Simplex_Internal() line 536 in /home/simon/petsc/src/dm/impls/plex/plexcreate.c >> [0]PETSC ERROR: #3 DMPlexCreateBoxMesh() line 1071 in /home/simon/petsc/src/dm/impls/plex/plexcreate.c >> [0]PETSC ERROR: #4 main() line 54 in /home/simon/Downloads/membrane.c >> [0]PETSC ERROR: PETSc Option Table entries: >> [0]PETSC ERROR: -dm_view >> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov ---------- >> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=63 >> : >> system msg for write_line failure : Bad file descriptor >> >> >> >> I need some help about this, please. >> >> Simon >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 20 08:42:33 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 May 2019 09:42:33 -0400 Subject: [petsc-users] problem with generating simplicies mesh In-Reply-To: References: Message-ID: On Mon, May 20, 2019 at 9:38 AM Stefano Zampini wrote: > Matt, > > You coded it. This is trying to mesh the boundary?. > Maybe. I was not the last one to touch it :) https://bitbucket.org/petsc/petsc/commits/367003a68d4e38a62ba2a0620cd4e6f42aa373fd#chg-src/dm/impls/plex/plexgenerate.c I will fix the error message. Thanks, Matt > On May 20, 2019, at 3:36 PM, Matthew Knepley wrote: > > On Mon, May 20, 2019 at 8:34 AM Stefano Zampini > wrote: > >> Matt, >> >> The code is actually for 2d. >> > > "No grid generator of dimension 1 registered" > > Matt > > >> On May 20, 2019, at 12:54 PM, Matthew Knepley via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >> On Sun, May 19, 2019 at 9:22 AM ??? via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> I have problem with generating simplicies mesh. >>> I do as the description in DMPlexCreateBoxmesh says, but still meet >>> error. >>> >> >> Stefano is right that you will need a mesh generator for a simplex mesh. >> However, you are asking for >> a 1D mesh for which there are no generators. Since in 1D simplces and >> tensor cells are the same, >> just change it to tensor and it will work. 
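A minimal sketch of the "just change it to tensor" suggestion for the 1D case the error message reports, assuming the DMPlexCreateBoxMesh() calling sequence of this development version; the 10-cell size is an arbitrary placeholder:

#include <petscdmplex.h>

int main(int argc, char **argv)
{
  DM             dm;
  PetscInt       faces[1] = {10};                  /* number of cells, placeholder value */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* simplex = PETSC_FALSE requests tensor cells; in 1D these coincide with
     simplices, so no external mesh generator is needed */
  ierr = DMPlexCreateBoxMesh(PETSC_COMM_WORLD, 1, PETSC_FALSE, faces,
                             NULL, NULL, NULL, PETSC_TRUE, &dm);CHKERRQ(ierr);
  ierr = DMViewFromOptions(dm, NULL, "-dm_view");CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}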
>> >> Thanks, >> >> Matt >> >> >>> The following is the error message: >>> [0]PETSC ERROR: --------------------- Error Message >>> -------------------------------------------------------------- >>> [0]PETSC ERROR: Argument out of range >>> [0]PETSC ERROR: No grid generator of dimension 1 registered >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for >>> trouble shooting. >>> [0]PETSC ERROR: Petsc Development GIT revision: v3.11.1-723-g96d64d1 >>> GIT Date: 2019-05-15 13:23:17 +0000 >>> [0]PETSC ERROR: ./membrane on a arch-linux2-c-debug named >>> simon-System-Product-Name by simon Sun May 19 20:54:54 2019 >>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ >>> --with-fc=gfortran --download-mpich --download-fblaslapack >>> [0]PETSC ERROR: #1 DMPlexGenerate() line 181 in >>> /home/simon/petsc/src/dm/impls/plex/plexgenerate.c >>> [0]PETSC ERROR: #2 DMPlexCreateBoxMesh_Simplex_Internal() line 536 in >>> /home/simon/petsc/src/dm/impls/plex/plexcreate.c >>> [0]PETSC ERROR: #3 DMPlexCreateBoxMesh() line 1071 in >>> /home/simon/petsc/src/dm/impls/plex/plexcreate.c >>> [0]PETSC ERROR: #4 main() line 54 in /home/simon/Downloads/membrane.c >>> [0]PETSC ERROR: PETSc Option Table entries: >>> [0]PETSC ERROR: -dm_view >>> [0]PETSC ERROR: ----------------End of Error Message -------send entire >>> error message to petsc-maint at mcs.anl.gov---------- >>> application called MPI_Abort(MPI_COMM_WORLD, 63) - process 0 >>> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=63 >>> : >>> system msg for write_line failure : Bad file descriptor >>> >>> >>> >>> I need some help about this, please. >>> >>> Simon >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From juaneah at gmail.com Mon May 20 09:40:55 2019 From: juaneah at gmail.com (Emmanuel Ayala) Date: Mon, 20 May 2019 16:40:55 +0200 Subject: [petsc-users] PETSc Matrix to MatLab In-Reply-To: References: Message-ID: Thanks a lot for the advice, I was filling the matrix only with one process. Best regards. El vie., 17 de may. de 2019 a la(s) 17:49, Zhang, Hong (hzhang at mcs.anl.gov) escribi?: > Check your petsc matrix before dumping the data by adding > MatView(A,PETSC_VIEWER_STDOUT_WORLD); > immediately after calling MatAssemblyEnd(). > Do you see a correct parallel matrix? > Hong > > > On Fri, May 17, 2019 at 10:24 AM Emmanuel Ayala via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hello, >> >> I am a newby with PETSc. I want to check some matrices generated in >> PETSc, using MatLab. >> >> I created a matrix A (MATMPIAIJ), the partition is defined by PETSc and I >> defined the global size. 
>> >> Then, I used the next code to save in binary format the matrix from PETSc: >> >> PetscViewer viewer; >> PetscViewerBinaryOpen(PETSC_COMM_WORLD, "matrix", FILE_MODE_WRITE, >> &viewer); >> >> After that (I think) MatView writes in the viewer: >> >> MatView(A,viewer); >> >> Then I laod the matrix in MatLab with PetscBinaryRead(). >> >> For one process everything is Ok, I can see the full pattern of the >> sparse matrix (spy(A)). But when I generate the matrix A with more than one >> process, the resultant matrix only contains the data from the process 0. >> >> What is the mistake in my procedure? >> >> 1. I just want to export the PETSc matrix to MatLab, for any number of >> process. >> >> Regards. >> Thanks in advance, for your time. >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ysjosh.lo at gmail.com Mon May 20 11:15:29 2019 From: ysjosh.lo at gmail.com (Josh L) Date: Mon, 20 May 2019 11:15:29 -0500 Subject: [petsc-users] DMPlex assembly global stiffness matrix In-Reply-To: References: Message-ID: Hi, I get it done by adding one more field that contains all the dofs i need, and use DMGetSubDM and MatSetClosure with this subDM VecSetClosure takes in data partitioned by field and rearrange them into sieve ordering. I think MatSetClosure does the same thing, but my stiffness matrix is not partitioned by field, so I only want to have one field with all my dofs on it. Thanks, Josh Matthew Knepley ? 2019?5?20? ?? ??5:52??? > On Fri, May 17, 2019 at 7:59 PM Josh L via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi, >> >> I have a DM that has 2 fields , and field #1 has 2 dofs and field #2 has >> 1 dof. >> I only have dofs on vertex. >> >> Can I use the following to assemble global stiffness matrix instead of >> using MatSetClosure(I am not integrating 2 field separately) >> >> DMGetGlobalSection(dm,GlobalSection) >> For cells >> calculate element stiffness matrix eleMat >> For vertex in cells >> PetscSectionGetOffset(GlobalSection, vertex, offset) >> loc=[offset_v1, offset_v1+1, offset_v1+2, offset_v2, >> offset_v2+1.......] >> End >> MatSetValues(GlobalMat, n,loc,n,loc, eleMat, ADD_VALUES) >> End >> AssemblyBegin and End. >> >> Basically use the offset from global section to have the global dof >> number. >> > > Yes, that is exactly what happens in MatSetClosure(). However, you have to > be careful if you use > constraints (like Dirichlet conditions) in the Section to filter them out. > I use negative indices to do that > since they are ignored by MatSetValues(). > > Are you doing this because you want to set one field at a time? If so, > just call DMCreateSubDM() for that > field, and everything should work correctly with MatSetClosure(). > > Thanks, > > Matt > > >> Thanks, >> Josh >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
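For the two-field layout described at the start of this stiffness-matrix thread (field 0 with 2 dofs and field 1 with 1 dof, both only on vertices), a rough sketch of the Section setup, assuming DMSetSection() is the attach call in this version of the library; the global Section used below for assembly is then derived from it:

#include <petscdmplex.h>

PetscErrorCode SetupTwoFieldSection(DM dm)
{
  PetscSection   s;
  PetscInt       pStart, pEnd, vStart, vEnd, v;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = DMPlexGetChart(dm, &pStart, &pEnd);CHKERRQ(ierr);              /* all mesh points */
  ierr = DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd);CHKERRQ(ierr);    /* vertices only   */
  ierr = PetscSectionCreate(PetscObjectComm((PetscObject)dm), &s);CHKERRQ(ierr);
  ierr = PetscSectionSetNumFields(s, 2);CHKERRQ(ierr);
  ierr = PetscSectionSetChart(s, pStart, pEnd);CHKERRQ(ierr);
  for (v = vStart; v < vEnd; ++v) {
    ierr = PetscSectionSetDof(s, v, 3);CHKERRQ(ierr);         /* total dofs per vertex */
    ierr = PetscSectionSetFieldDof(s, v, 0, 2);CHKERRQ(ierr); /* field 0: 2 dofs */
    ierr = PetscSectionSetFieldDof(s, v, 1, 1);CHKERRQ(ierr); /* field 1: 1 dof  */
  }
  ierr = PetscSectionSetUp(s);CHKERRQ(ierr);
  ierr = DMSetSection(dm, s);CHKERRQ(ierr);
  ierr = PetscSectionDestroy(&s);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}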
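To make the assembly recipe sketched earlier in this thread concrete, here is a minimal sketch in the same spirit, assuming dofs live only on vertices; ComputeElementMatrix() and the fixed array sizes are placeholders, and the ghost-point and constrained-dof handling that DMPlexMatSetClosure() does for you is only indicated in a comment (MatSetValues() ignores negative row/column indices, which is how Dirichlet dofs get filtered out):

#include <petscdmplex.h>

/* Hypothetical user routine: fills an nloc-by-nloc dense block for cell c,
   ordered consistently with the vertex order of the cell closure below. */
extern PetscErrorCode ComputeElementMatrix(DM dm, PetscInt c, PetscScalar eleMat[]);

PetscErrorCode AssembleStiffness(DM dm, Mat K)
{
  PetscSection   gs;
  PetscInt       cStart, cEnd, vStart, vEnd, c;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = DMGetGlobalSection(dm, &gs);CHKERRQ(ierr);
  ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr);   /* cells    */
  ierr = DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd);CHKERRQ(ierr);    /* vertices */
  for (c = cStart; c < cEnd; ++c) {
    PetscInt    *closure = NULL, Ncl, cl, nloc = 0;
    PetscInt     loc[12];               /* assumes at most 4 vertices x 3 dofs */
    PetscScalar  eleMat[12*12];

    ierr = ComputeElementMatrix(dm, c, eleMat);CHKERRQ(ierr);
    ierr = DMPlexGetTransitiveClosure(dm, c, PETSC_TRUE, &Ncl, &closure);CHKERRQ(ierr);
    for (cl = 0; cl < 2*Ncl; cl += 2) {
      PetscInt v = closure[cl], dof, off, d;

      if (v < vStart || v >= vEnd) continue;             /* keep only vertices */
      ierr = PetscSectionGetDof(gs, v, &dof);CHKERRQ(ierr);
      ierr = PetscSectionGetOffset(gs, v, &off);CHKERRQ(ierr);
      /* Points owned by other ranks and constrained dofs are encoded with
         negative values in the global Section; a full implementation maps
         them to the negative indices that MatSetValues() skips. Glossed
         over here for brevity. */
      for (d = 0; d < dof; ++d) loc[nloc++] = off + d;
    }
    ierr = DMPlexRestoreTransitiveClosure(dm, c, PETSC_TRUE, &Ncl, &closure);CHKERRQ(ierr);
    ierr = MatSetValues(K, nloc, loc, nloc, loc, eleMat, ADD_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}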
URL: From swarnava89 at gmail.com Mon May 20 11:51:19 2019 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Mon, 20 May 2019 09:51:19 -0700 Subject: [petsc-users] Creating a DMNetwork from a DMPlex In-Reply-To: References: <9DAFD49B-AB7F-435F-BB27-16EF946E1241@mcs.anl.gov> Message-ID: Hi Barry and Matt, Maybe try building by hand in a DMNetwork using a handrawn mesh with just a few vertices and endless and see if what you want to do makes sense > Okay, will try to do that. Do you have any DMNetwork example which I could follow. I think DMNetwork is not buying you anything here. It seems to make more sense to do it directly in Plex. You can easily lay down a P1 element for each field so that you can interpolate wherever you want. > Okay, then will it be possible to do vertex partitioning with plex? Essentially two processes can share an element but not vertex. I would start from a clean example, such as SNES ex17. That solves elasticity, so it has multiple fields and FEM. The change is that you don't want to use any of the assembly functions, so you keep the code that does data layout and FEM discretization, but it ignore the residual/Jacobian stuff. Feel free to ask about using the lower-level interpolation stuff which is not as documented. >Thanks for pointing out the reference. Could you please share the functions for interpolation? Sincerely, SG On Mon, May 20, 2019 at 4:02 AM Matthew Knepley wrote: > On Mon, May 20, 2019 at 3:05 AM Swarnava Ghosh via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi Barry, >> >> Thank you for your email. My planned discretization is based on the fact >> that I need a distributed unstructured mesh, where at each vertex point I >> perform local calculations. For these calculations, I do NOT need need to >> assemble any global matrix. I will have fields defined at the vertices, and >> using linear interpolation, I am planing to find the values of these fields >> at some spatial points with are within a ball around each vertex. Once the >> values of these fields are known within the compact support around each >> vertex, I do local computations to calculate my unknown field. My reason >> for having the a mesh is essentially to 1) define fields at the vertices >> and 2) perform linear interpolation (using finite elements) at some spatial >> points. Also the local computations around at each vertex is >> computationally the most expensive step. In that case, having a cell >> partitioning will result in vertices being shared among processes, which >> will result in redundant computations. >> >> My idea is therefore to have DMNetwork to distribute vertices across >> processes and use finite elements for the linear interpolation part. >> > > I think DMNetwork is not buying you anything here. It seems to make more > sense to do it directly in Plex. > You can easily lay down a P1 element for each field so that you can > interpolate wherever you want. > > I would start from a clean example, such as SNES ex17. That solves > elasticity, so it has multiple fields and FEM. > The change is that you don't want to use any of the assembly functions, so > you keep the code that does data layout > and FEM discretization, but it ignore the residual/Jacobian stuff. Feel > free to ask about using the lower-level > interpolation stuff which is not as documented. > > Thanks, > > Matt > > >> Thanks, >> SG >> >> >> >> On Sun, May 19, 2019 at 6:54 PM Smith, Barry F. 
>> wrote: >> >>> >>> I am not sure you want DMNetwork, DMNetwork has no geometry; it only >>> has vertices and edges. Vertices are connected to other vertices through >>> the edges. For example I can't see how one would do vertex centered finite >>> volume methods with DMNetwork. Maybe if you said something more about your >>> planned discretization we could figure something out. >>> >>> > On May 19, 2019, at 8:32 PM, Swarnava Ghosh >>> wrote: >>> > >>> > Hi Barry, >>> > >>> > No, the gmesh file contains a mesh and not a graph/network. >>> > In that case, is it possible to create a DMNetwork first from the >>> DMPlex and then distribute the DMNetwork. >>> > >>> > I have this case, because I want a vertex partitioning of my mesh. >>> Domain decomposition of DMPlex gives me cell partitioning. Essentially what >>> I want is that no two processes can share a vertex BUT that can share an >>> edge. Similar to how a DMDA is distributed. >>> > >>> > Thanks, >>> > Swarnava >>> > >>> > On Sun, May 19, 2019 at 4:50 PM Smith, Barry F. >>> wrote: >>> > >>> > This use case never occurred to us. Is the gmesh file containing a >>> graph/network (as opposed to a mesh)? There seem two choices >>> > >>> > 1) if the gmesh file contains a graph/network one could write a gmesh >>> reader for that case that reads directly for and constructs a DMNetwork or >>> > >>> > 2) write a converter for a DMPlex to DMNetwork. >>> > >>> > I lean toward the first >>> > >>> > Either way you need to understand the documentation for DMNetwork >>> and how to build one up. >>> > >>> > >>> > Barry >>> > >>> > >>> > > On May 19, 2019, at 6:34 PM, Swarnava Ghosh via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> > > >>> > > Hi Petsc users and developers, >>> > > >>> > > I am trying to find a way of creating a DMNetwork from a DMPlex. I >>> have read the DMPlex from a gmesh file and have it distributed. >>> > > >>> > > Thanks, >>> > > SG >>> > >>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Mon May 20 13:48:11 2019 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 20 May 2019 12:48:11 -0600 Subject: [petsc-users] DMNetwork in petsc4py Message-ID: Hi all, Is there any current effort or plan to make the DMNetwork calls available in petsc4py? I don't see anything DMNetwork related in the master branch. Because we (NREL) have lots of existing network-like applications written in Python and Julia, none of which use PETSc, and instead use unscalable direct methods, so I think instead of rewriting everything into a C-based PETSc code it would be much more feasible to integrate their application code into petsc4py with DMNetwork. If not I was planning on writing these wrappers myself but wasn't sure if someone here is already doing it. Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Mon May 20 14:48:18 2019 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 20 May 2019 15:48:18 -0400 Subject: [petsc-users] With-batch (new) flags Message-ID: We are getting this failure. This a bit frustrating in that the first error message "Must give a default value for known-mpi-shared-libraries.." OK, I google it and find that =0 is suggested. That seemed to work. 
Then we got a similar error about -known-64-bit-blas-indices. It was clear from the documentation what to use so we tried =0 and that failed (attached). This is little frustrating having to use try and error for each of these 'known" things. Dylan is trying --known-64-bit-blas-indices=1 now. I trust that will work, but I think the error are not very informative. All this known stuff is new to me. Perhaps put an FAQ for this and list all of the "known"s that we need to add in batch. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: configure (2).log Type: application/octet-stream Size: 1630960 bytes Desc: not available URL: From balay at mcs.anl.gov Mon May 20 14:55:34 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Mon, 20 May 2019 19:55:34 +0000 Subject: [petsc-users] With-batch (new) flags In-Reply-To: References: Message-ID: for ex: ilp version of mkl is --known-64-bit-blas-indices=1 while lp mkl is --known-64-bit-blas-indices=0 Default blas we normally use is --known-64-bit-blas-indices=0 [they don't use 64bit indices] Satish On Mon, 20 May 2019, Mark Adams via petsc-users wrote: > We are getting this failure. This a bit frustrating in that the first error > message "Must give a default value for known-mpi-shared-libraries.." OK, I > google it and find that =0 is suggested. That seemed to work. Then we got a > similar error about -known-64-bit-blas-indices. It was clear from the > documentation what to use so we tried =0 and that failed (attached). This > is little frustrating having to use try and error for each of these 'known" > things. > > Dylan is trying --known-64-bit-blas-indices=1 now. I trust that will work, > but I think the error are not very informative. All this known stuff is new > to me. Perhaps put an FAQ for this and list all of the "known"s that we > need to add in batch. > > Thanks, > Mark > From mfadams at lbl.gov Mon May 20 15:10:45 2019 From: mfadams at lbl.gov (Mark Adams) Date: Mon, 20 May 2019 16:10:45 -0400 Subject: [petsc-users] With-batch (new) flags In-Reply-To: References: Message-ID: On Mon, May 20, 2019 at 3:55 PM Balay, Satish wrote: > for ex: ilp version of mkl is --known-64-bit-blas-indices=1 while lp mkl > is --known-64-bit-blas-indices=0 > > Default blas we normally use is --known-64-bit-blas-indices=0 [they don't > use 64bit indices] > Humm, that is what Dylan (in the log that I sent). He is downloading blas and has --known-64-bit-blas-indices=0. Should this be correct? > > Satish > > On Mon, 20 May 2019, Mark Adams via petsc-users wrote: > > > We are getting this failure. This a bit frustrating in that the first > error > > message "Must give a default value for known-mpi-shared-libraries.." OK, > I > > google it and find that =0 is suggested. That seemed to work. Then we > got a > > similar error about -known-64-bit-blas-indices. It was clear from the > > documentation what to use so we tried =0 and that failed (attached). This > > is little frustrating having to use try and error for each of these > 'known" > > things. > > > > Dylan is trying --known-64-bit-blas-indices=1 now. I trust that will > work, > > but I think the error are not very informative. All this known stuff is > new > > to me. Perhaps put an FAQ for this and list all of the "known"s that we > > need to add in batch. > > > > Thanks, > > Mark > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Mon May 20 15:50:31 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Mon, 20 May 2019 20:50:31 +0000 Subject: [petsc-users] With-batch (new) flags In-Reply-To: References: Message-ID: I'm not yet sure what the correct fix is - but the following change should get this going.. diff --git a/config/BuildSystem/config/packages/BlasLapack.py b/config/BuildSystem/config/packages/BlasLapack.py index e0310da4b0..7355f1a369 100644 --- a/config/BuildSystem/config/packages/BlasLapack.py +++ b/config/BuildSystem/config/packages/BlasLapack.py @@ -42,7 +42,7 @@ class Configure(config.package.Package): help.addArgument('BLAS/LAPACK', '-with-lapack-lib=',nargs.ArgLibrary(None, None, 'Indicate the library(s) containing LAPACK')) help.addArgument('BLAS/LAPACK', '-with-blaslapack-suffix=',nargs.ArgLibrary(None, None, 'Indicate a suffix for BLAS/LAPACK subroutine names.')) help.addArgument('BLAS/LAPACK', '-with-64-bit-blas-indices', nargs.ArgBool(None, 0, 'Try to use 64 bit integers for BLAS/LAPACK; will error if not available')) -# help.addArgument('BLAS/LAPACK', '-known-64-bit-blas-indices=', nargs.ArgBool(None, 0, 'Indicate if using 64 bit integer BLAS')) + help.addArgument('BLAS/LAPACK', '-known-64-bit-blas-indices=', nargs.ArgBool(None, 0, 'Indicate if using 64 bit integer BLAS')) return def getPrefix(self): Satish On Mon, 20 May 2019, Mark Adams via petsc-users wrote: > On Mon, May 20, 2019 at 3:55 PM Balay, Satish wrote: > > > for ex: ilp version of mkl is --known-64-bit-blas-indices=1 while lp mkl > > is --known-64-bit-blas-indices=0 > > > > Default blas we normally use is --known-64-bit-blas-indices=0 [they don't > > use 64bit indices] > > > > Humm, that is what Dylan (in the log that I sent). He is downloading blas > and has --known-64-bit-blas-indices=0. Should this be correct? > > > > > > Satish > > > > On Mon, 20 May 2019, Mark Adams via petsc-users wrote: > > > > > We are getting this failure. This a bit frustrating in that the first > > error > > > message "Must give a default value for known-mpi-shared-libraries.." OK, > > I > > > google it and find that =0 is suggested. That seemed to work. Then we > > got a > > > similar error about -known-64-bit-blas-indices. It was clear from the > > > documentation what to use so we tried =0 and that failed (attached). This > > > is little frustrating having to use try and error for each of these > > 'known" > > > things. > > > > > > Dylan is trying --known-64-bit-blas-indices=1 now. I trust that will > > work, > > > but I think the error are not very informative. All this known stuff is > > new > > > to me. Perhaps put an FAQ for this and list all of the "known"s that we > > > need to add in batch. > > > > > > Thanks, > > > Mark > > > > > > > > From knepley at gmail.com Mon May 20 16:00:40 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 20 May 2019 17:00:40 -0400 Subject: [petsc-users] Creating a DMNetwork from a DMPlex In-Reply-To: References: <9DAFD49B-AB7F-435F-BB27-16EF946E1241@mcs.anl.gov> Message-ID: On Mon, May 20, 2019 at 12:51 PM Swarnava Ghosh wrote: > Hi Barry and Matt, > > Maybe try building by hand in a DMNetwork using a handrawn mesh with just > a few vertices and endless and see if what you want to do makes sense > > Okay, will try to do that. Do you have any DMNetwork example which I > could follow. > > I think DMNetwork is not buying you anything here. It seems to make more > sense to do it directly in Plex. 
> You can easily lay down a P1 element for each field so that you can > interpolate wherever you want. > > Okay, then will it be possible to do vertex partitioning with plex? > Essentially two processes can share an element but not vertex. > We can worry about that when everything works. Some things are easy but that breaks a lot of the model, so it unclear what all would have to change in order to do what you want. The partitioning of vertices, however, is trivial. > I would start from a clean example, such as SNES ex17. That solves > elasticity, so it has multiple fields and FEM. > The change is that you don't want to use any of the assembly functions, so > you keep the code that does data layout > and FEM discretization, but it ignore the residual/Jacobian stuff. Feel > free to ask about using the lower-level > interpolation stuff which is not as documented. > >Thanks for pointing out the reference. Could you please share the > functions for interpolation? > There are at least two ways to do it: 1) Locally: DMPlexEvaluateFieldJets_Internal() or DMFieldEvaluate() which is newer 2) Globally: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/DMInterpolationEvaluate.html Thanks, Matt > Sincerely, > SG > > > On Mon, May 20, 2019 at 4:02 AM Matthew Knepley wrote: > >> On Mon, May 20, 2019 at 3:05 AM Swarnava Ghosh via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> Hi Barry, >>> >>> Thank you for your email. My planned discretization is based on the fact >>> that I need a distributed unstructured mesh, where at each vertex point I >>> perform local calculations. For these calculations, I do NOT need need to >>> assemble any global matrix. I will have fields defined at the vertices, and >>> using linear interpolation, I am planing to find the values of these fields >>> at some spatial points with are within a ball around each vertex. Once the >>> values of these fields are known within the compact support around each >>> vertex, I do local computations to calculate my unknown field. My reason >>> for having the a mesh is essentially to 1) define fields at the vertices >>> and 2) perform linear interpolation (using finite elements) at some spatial >>> points. Also the local computations around at each vertex is >>> computationally the most expensive step. In that case, having a cell >>> partitioning will result in vertices being shared among processes, which >>> will result in redundant computations. >>> >>> My idea is therefore to have DMNetwork to distribute vertices across >>> processes and use finite elements for the linear interpolation part. >>> >> >> I think DMNetwork is not buying you anything here. It seems to make more >> sense to do it directly in Plex. >> You can easily lay down a P1 element for each field so that you can >> interpolate wherever you want. >> >> I would start from a clean example, such as SNES ex17. That solves >> elasticity, so it has multiple fields and FEM. >> The change is that you don't want to use any of the assembly functions, >> so you keep the code that does data layout >> and FEM discretization, but it ignore the residual/Jacobian stuff. Feel >> free to ask about using the lower-level >> interpolation stuff which is not as documented. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> SG >>> >>> >>> >>> On Sun, May 19, 2019 at 6:54 PM Smith, Barry F. >>> wrote: >>> >>>> >>>> I am not sure you want DMNetwork, DMNetwork has no geometry; it only >>>> has vertices and edges. Vertices are connected to other vertices through >>>> the edges. 
For example I can't see how one would do vertex centered finite >>>> volume methods with DMNetwork. Maybe if you said something more about your >>>> planned discretization we could figure something out. >>>> >>>> > On May 19, 2019, at 8:32 PM, Swarnava Ghosh >>>> wrote: >>>> > >>>> > Hi Barry, >>>> > >>>> > No, the gmesh file contains a mesh and not a graph/network. >>>> > In that case, is it possible to create a DMNetwork first from the >>>> DMPlex and then distribute the DMNetwork. >>>> > >>>> > I have this case, because I want a vertex partitioning of my mesh. >>>> Domain decomposition of DMPlex gives me cell partitioning. Essentially what >>>> I want is that no two processes can share a vertex BUT that can share an >>>> edge. Similar to how a DMDA is distributed. >>>> > >>>> > Thanks, >>>> > Swarnava >>>> > >>>> > On Sun, May 19, 2019 at 4:50 PM Smith, Barry F. >>>> wrote: >>>> > >>>> > This use case never occurred to us. Is the gmesh file containing a >>>> graph/network (as opposed to a mesh)? There seem two choices >>>> > >>>> > 1) if the gmesh file contains a graph/network one could write a gmesh >>>> reader for that case that reads directly for and constructs a DMNetwork or >>>> > >>>> > 2) write a converter for a DMPlex to DMNetwork. >>>> > >>>> > I lean toward the first >>>> > >>>> > Either way you need to understand the documentation for DMNetwork >>>> and how to build one up. >>>> > >>>> > >>>> > Barry >>>> > >>>> > >>>> > > On May 19, 2019, at 6:34 PM, Swarnava Ghosh via petsc-users < >>>> petsc-users at mcs.anl.gov> wrote: >>>> > > >>>> > > Hi Petsc users and developers, >>>> > > >>>> > > I am trying to find a way of creating a DMNetwork from a DMPlex. I >>>> have read the DMPlex from a gmesh file and have it distributed. >>>> > > >>>> > > Thanks, >>>> > > SG >>>> > >>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon May 20 16:38:38 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Mon, 20 May 2019 21:38:38 +0000 Subject: [petsc-users] DMNetwork in petsc4py In-Reply-To: References: Message-ID: Justin, That would be great. No one is working on it that I know of. Barry > On May 20, 2019, at 1:48 PM, Justin Chang via petsc-users wrote: > > Hi all, > > Is there any current effort or plan to make the DMNetwork calls available in petsc4py? I don't see anything DMNetwork related in the master branch. Because we (NREL) have lots of existing network-like applications written in Python and Julia, none of which use PETSc, and instead use unscalable direct methods, so I think instead of rewriting everything into a C-based PETSc code it would be much more feasible to integrate their application code into petsc4py with DMNetwork. > > If not I was planning on writing these wrappers myself but wasn't sure if someone here is already doing it. 
> > Thanks, > Justin From hongzhang at anl.gov Mon May 20 17:20:21 2019 From: hongzhang at anl.gov (Zhang, Hong) Date: Mon, 20 May 2019 22:20:21 +0000 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant In-Reply-To: References: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> <400504D5-9319-4A96-B0C0-C871284EB989@anl.gov> <1A99BD32-723F-4A76-98A4-2AFFA790802B@anl.gov> <6280A5E9-9DA5-485D-96F0-12FB944ACC4C@anl.gov> Message-ID: Sajid, I have also rested the simpler problem you provided. The branch hongzh/fix-computejacobian gives exactly the same numerical results as the master branch does, but runs much faster. So the solver seems to work correctly. To rule out the possible compiler issues, you might want to try a different compiler or different optimization flags. Also you might want to try smaller stepsizes. Hong On May 20, 2019, at 4:47 PM, Sajid Ali > wrote: Hi Hong, I tried running a simpler problem that solves the equation ` u_t = A*u_xx + A*u_yy; ` and the fix-computejacobian branch works for this on a coarse grid. The code for the same is here : https://github.com/s-sajid-ali/xwp_petsc/blob/master/2d/FD/free_space/ex_dmda.c and it requires no input file. It writes at all times steps and comparing the output at the last time step, everything looks fine. I want to eliminate a possible source of error which could be the fact that I installed both versions (3.11.2 and fix-computejacobian) with intel compilers and O3. Could floating point errors occur due to this ? I didn't specify -fp-model strict but since the results I got were reasonable I never bothered to run a test suite. Thank You, Sajid Ali Applied Physics Northwestern University -------------- next part -------------- An HTML attachment was scrubbed... URL: From swarnava89 at gmail.com Mon May 20 22:34:12 2019 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Mon, 20 May 2019 20:34:12 -0700 Subject: [petsc-users] Creating a DMNetwork from a DMPlex In-Reply-To: References: <9DAFD49B-AB7F-435F-BB27-16EF946E1241@mcs.anl.gov> Message-ID: Hi Matt, I am trying to code my interpolation in parallel using DMInterpolationEvaluate. Do you have an example which I could refer to? Thanks, Swarnava On Mon, May 20, 2019 at 2:00 PM Matthew Knepley wrote: > On Mon, May 20, 2019 at 12:51 PM Swarnava Ghosh > wrote: > >> Hi Barry and Matt, >> >> Maybe try building by hand in a DMNetwork using a handrawn mesh with just >> a few vertices and endless and see if what you want to do makes sense >> > Okay, will try to do that. Do you have any DMNetwork example which I >> could follow. >> >> I think DMNetwork is not buying you anything here. It seems to make more >> sense to do it directly in Plex. >> You can easily lay down a P1 element for each field so that you can >> interpolate wherever you want. >> > Okay, then will it be possible to do vertex partitioning with plex? >> Essentially two processes can share an element but not vertex. >> > > We can worry about that when everything works. Some things are easy but > that breaks a lot of the model, so it > unclear what all would have to change in order to do what you want. The > partitioning of vertices, however, is trivial. > > >> I would start from a clean example, such as SNES ex17. That solves >> elasticity, so it has multiple fields and FEM. >> The change is that you don't want to use any of the assembly functions, >> so you keep the code that does data layout >> and FEM discretization, but it ignore the residual/Jacobian stuff. 
Feel >> free to ask about using the lower-level >> interpolation stuff which is not as documented. >> >Thanks for pointing out the reference. Could you please share the >> functions for interpolation? >> > > There are at least two ways to do it: > > 1) Locally: DMPlexEvaluateFieldJets_Internal() or DMFieldEvaluate() > which is newer > > 2) Globally: > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/DMInterpolationEvaluate.html > > Thanks, > > Matt > > >> Sincerely, >> SG >> >> >> On Mon, May 20, 2019 at 4:02 AM Matthew Knepley >> wrote: >> >>> On Mon, May 20, 2019 at 3:05 AM Swarnava Ghosh via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> >>>> Hi Barry, >>>> >>>> Thank you for your email. My planned discretization is based on the >>>> fact that I need a distributed unstructured mesh, where at each vertex >>>> point I perform local calculations. For these calculations, I do NOT need >>>> need to assemble any global matrix. I will have fields defined at the >>>> vertices, and using linear interpolation, I am planing to find the values >>>> of these fields at some spatial points with are within a ball around each >>>> vertex. Once the values of these fields are known within the compact >>>> support around each vertex, I do local computations to calculate my unknown >>>> field. My reason for having the a mesh is essentially to 1) define fields >>>> at the vertices and 2) perform linear interpolation (using finite elements) >>>> at some spatial points. Also the local computations around at each vertex >>>> is computationally the most expensive step. In that case, having a cell >>>> partitioning will result in vertices being shared among processes, which >>>> will result in redundant computations. >>>> >>>> My idea is therefore to have DMNetwork to distribute vertices across >>>> processes and use finite elements for the linear interpolation part. >>>> >>> >>> I think DMNetwork is not buying you anything here. It seems to make more >>> sense to do it directly in Plex. >>> You can easily lay down a P1 element for each field so that you can >>> interpolate wherever you want. >>> >>> I would start from a clean example, such as SNES ex17. That solves >>> elasticity, so it has multiple fields and FEM. >>> The change is that you don't want to use any of the assembly functions, >>> so you keep the code that does data layout >>> and FEM discretization, but it ignore the residual/Jacobian stuff. Feel >>> free to ask about using the lower-level >>> interpolation stuff which is not as documented. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks, >>>> SG >>>> >>>> >>>> >>>> On Sun, May 19, 2019 at 6:54 PM Smith, Barry F. >>>> wrote: >>>> >>>>> >>>>> I am not sure you want DMNetwork, DMNetwork has no geometry; it only >>>>> has vertices and edges. Vertices are connected to other vertices through >>>>> the edges. For example I can't see how one would do vertex centered finite >>>>> volume methods with DMNetwork. Maybe if you said something more about your >>>>> planned discretization we could figure something out. >>>>> >>>>> > On May 19, 2019, at 8:32 PM, Swarnava Ghosh >>>>> wrote: >>>>> > >>>>> > Hi Barry, >>>>> > >>>>> > No, the gmesh file contains a mesh and not a graph/network. >>>>> > In that case, is it possible to create a DMNetwork first from the >>>>> DMPlex and then distribute the DMNetwork. >>>>> > >>>>> > I have this case, because I want a vertex partitioning of my mesh. >>>>> Domain decomposition of DMPlex gives me cell partitioning. 
Essentially what >>>>> I want is that no two processes can share a vertex BUT that can share an >>>>> edge. Similar to how a DMDA is distributed. >>>>> > >>>>> > Thanks, >>>>> > Swarnava >>>>> > >>>>> > On Sun, May 19, 2019 at 4:50 PM Smith, Barry F. >>>>> wrote: >>>>> > >>>>> > This use case never occurred to us. Is the gmesh file containing >>>>> a graph/network (as opposed to a mesh)? There seem two choices >>>>> > >>>>> > 1) if the gmesh file contains a graph/network one could write a >>>>> gmesh reader for that case that reads directly for and constructs a >>>>> DMNetwork or >>>>> > >>>>> > 2) write a converter for a DMPlex to DMNetwork. >>>>> > >>>>> > I lean toward the first >>>>> > >>>>> > Either way you need to understand the documentation for DMNetwork >>>>> and how to build one up. >>>>> > >>>>> > >>>>> > Barry >>>>> > >>>>> > >>>>> > > On May 19, 2019, at 6:34 PM, Swarnava Ghosh via petsc-users < >>>>> petsc-users at mcs.anl.gov> wrote: >>>>> > > >>>>> > > Hi Petsc users and developers, >>>>> > > >>>>> > > I am trying to find a way of creating a DMNetwork from a DMPlex. I >>>>> have read the DMPlex from a gmesh file and have it distributed. >>>>> > > >>>>> > > Thanks, >>>>> > > SG >>>>> > >>>>> >>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon May 20 23:15:57 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 21 May 2019 04:15:57 +0000 Subject: [petsc-users] With-batch (new) flags In-Reply-To: References: Message-ID: <861F3371-902A-4AD2-A4E3-19C96DFDAEF8@anl.gov> Yes, this is totally my fault. By removing the help message it made configure treat the argument as a string hence '0' was true and you got the error message. For fblaslapack one should use -known-64-bit-blas-indices=0 just as you did, I have pushed a fix to master What kind of system is sunfire09.pppl.gov ? Surely a system that has a batch system provides its own good BLAS/LAPACK. You should use the ones on the machine, not fblaslapack. Using fblaslapack in this situation is like going to a fancy sit-down dinner but bringing your dessert from McDonalds. It may be possible to remove many (but not all) of the cases where -known-64-bit-blas-indices is needed (for example when MKL, fblaslapack, f2blaslapack or --download-openblas is used we we know if the library is 64 bit indices and should set that without a need for a test or command line option. I'll look at it. Barry > On May 20, 2019, at 3:50 PM, Balay, Satish via petsc-users wrote: > > I'm not yet sure what the correct fix is - but the following change should get this going.. 
> > diff --git a/config/BuildSystem/config/packages/BlasLapack.py b/config/BuildSystem/config/packages/BlasLapack.py > index e0310da4b0..7355f1a369 100644 > --- a/config/BuildSystem/config/packages/BlasLapack.py > +++ b/config/BuildSystem/config/packages/BlasLapack.py > @@ -42,7 +42,7 @@ class Configure(config.package.Package): > help.addArgument('BLAS/LAPACK', '-with-lapack-lib=',nargs.ArgLibrary(None, None, 'Indicate the library(s) containing LAPACK')) > help.addArgument('BLAS/LAPACK', '-with-blaslapack-suffix=',nargs.ArgLibrary(None, None, 'Indicate a suffix for BLAS/LAPACK subroutine names.')) > help.addArgument('BLAS/LAPACK', '-with-64-bit-blas-indices', nargs.ArgBool(None, 0, 'Try to use 64 bit integers for BLAS/LAPACK; will error if not available')) > -# help.addArgument('BLAS/LAPACK', '-known-64-bit-blas-indices=', nargs.ArgBool(None, 0, 'Indicate if using 64 bit integer BLAS')) > + help.addArgument('BLAS/LAPACK', '-known-64-bit-blas-indices=', nargs.ArgBool(None, 0, 'Indicate if using 64 bit integer BLAS')) > return > > def getPrefix(self): > > Satish > > On Mon, 20 May 2019, Mark Adams via petsc-users wrote: > >> On Mon, May 20, 2019 at 3:55 PM Balay, Satish wrote: >> >>> for ex: ilp version of mkl is --known-64-bit-blas-indices=1 while lp mkl >>> is --known-64-bit-blas-indices=0 >>> >>> Default blas we normally use is --known-64-bit-blas-indices=0 [they don't >>> use 64bit indices] >>> >> >> Humm, that is what Dylan (in the log that I sent). He is downloading blas >> and has --known-64-bit-blas-indices=0. Should this be correct? >> >> >>> >>> Satish >>> >>> On Mon, 20 May 2019, Mark Adams via petsc-users wrote: >>> >>>> We are getting this failure. This a bit frustrating in that the first >>> error >>>> message "Must give a default value for known-mpi-shared-libraries.." OK, >>> I >>>> google it and find that =0 is suggested. That seemed to work. Then we >>> got a >>>> similar error about -known-64-bit-blas-indices. It was clear from the >>>> documentation what to use so we tried =0 and that failed (attached). This >>>> is little frustrating having to use try and error for each of these >>> 'known" >>>> things. >>>> >>>> Dylan is trying --known-64-bit-blas-indices=1 now. I trust that will >>> work, >>>> but I think the error are not very informative. All this known stuff is >>> new >>>> to me. Perhaps put an FAQ for this and list all of the "known"s that we >>>> need to add in batch. >>>> >>>> Thanks, >>>> Mark >>>> >>> >>> >> > From knepley at gmail.com Tue May 21 07:13:20 2019 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 21 May 2019 08:13:20 -0400 Subject: [petsc-users] Creating a DMNetwork from a DMPlex In-Reply-To: References: <9DAFD49B-AB7F-435F-BB27-16EF946E1241@mcs.anl.gov> Message-ID: On Mon, May 20, 2019 at 11:34 PM Swarnava Ghosh wrote: > Hi Matt, > > I am trying to code my interpolation in parallel using > DMInterpolationEvaluate. Do you have an example which I could refer to? > https://bitbucket.org/petsc/petsc/src/master/src/snes/examples/tests/ex2.c Matt > Thanks, > Swarnava > > On Mon, May 20, 2019 at 2:00 PM Matthew Knepley wrote: > >> On Mon, May 20, 2019 at 12:51 PM Swarnava Ghosh >> wrote: >> >>> Hi Barry and Matt, >>> >>> Maybe try building by hand in a DMNetwork using a handrawn mesh with >>> just a few vertices and endless and see if what you want to do makes sense >>> > Okay, will try to do that. Do you have any DMNetwork example which I >>> could follow. >>> >>> I think DMNetwork is not buying you anything here. 
It seems to make more >>> sense to do it directly in Plex. >>> You can easily lay down a P1 element for each field so that you can >>> interpolate wherever you want. >>> > Okay, then will it be possible to do vertex partitioning with plex? >>> Essentially two processes can share an element but not vertex. >>> >> >> We can worry about that when everything works. Some things are easy but >> that breaks a lot of the model, so it >> unclear what all would have to change in order to do what you want. The >> partitioning of vertices, however, is trivial. >> >> >>> I would start from a clean example, such as SNES ex17. That solves >>> elasticity, so it has multiple fields and FEM. >>> The change is that you don't want to use any of the assembly functions, >>> so you keep the code that does data layout >>> and FEM discretization, but it ignore the residual/Jacobian stuff. Feel >>> free to ask about using the lower-level >>> interpolation stuff which is not as documented. >>> >Thanks for pointing out the reference. Could you please share the >>> functions for interpolation? >>> >> >> There are at least two ways to do it: >> >> 1) Locally: DMPlexEvaluateFieldJets_Internal() or DMFieldEvaluate() >> which is newer >> >> 2) Globally: >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/DMInterpolationEvaluate.html >> >> Thanks, >> >> Matt >> >> >>> Sincerely, >>> SG >>> >>> >>> On Mon, May 20, 2019 at 4:02 AM Matthew Knepley >>> wrote: >>> >>>> On Mon, May 20, 2019 at 3:05 AM Swarnava Ghosh via petsc-users < >>>> petsc-users at mcs.anl.gov> wrote: >>>> >>>>> Hi Barry, >>>>> >>>>> Thank you for your email. My planned discretization is based on the >>>>> fact that I need a distributed unstructured mesh, where at each vertex >>>>> point I perform local calculations. For these calculations, I do NOT need >>>>> need to assemble any global matrix. I will have fields defined at the >>>>> vertices, and using linear interpolation, I am planing to find the values >>>>> of these fields at some spatial points with are within a ball around each >>>>> vertex. Once the values of these fields are known within the compact >>>>> support around each vertex, I do local computations to calculate my unknown >>>>> field. My reason for having the a mesh is essentially to 1) define fields >>>>> at the vertices and 2) perform linear interpolation (using finite elements) >>>>> at some spatial points. Also the local computations around at each vertex >>>>> is computationally the most expensive step. In that case, having a cell >>>>> partitioning will result in vertices being shared among processes, which >>>>> will result in redundant computations. >>>>> >>>>> My idea is therefore to have DMNetwork to distribute vertices across >>>>> processes and use finite elements for the linear interpolation part. >>>>> >>>> >>>> I think DMNetwork is not buying you anything here. It seems to make >>>> more sense to do it directly in Plex. >>>> You can easily lay down a P1 element for each field so that you can >>>> interpolate wherever you want. >>>> >>>> I would start from a clean example, such as SNES ex17. That solves >>>> elasticity, so it has multiple fields and FEM. >>>> The change is that you don't want to use any of the assembly functions, >>>> so you keep the code that does data layout >>>> and FEM discretization, but it ignore the residual/Jacobian stuff. Feel >>>> free to ask about using the lower-level >>>> interpolation stuff which is not as documented. 
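Since the lower-level interpolation interface keeps coming up in this thread, here is a rough sketch of the DMInterpolationEvaluate() sequence, loosely following the test Matt points to (src/snes/examples/tests/ex2.c); the point coordinates, the single-component field, and the petscsnes.h header (where these routines are documented at this point) are assumptions for illustration:

#include <petscsnes.h>

PetscErrorCode InterpolateAtPoints(DM dm, Vec u)
{
  DMInterpolationInfo interp;
  PetscReal           pts[6] = {0.1, 0.2, 0.3,  0.4, 0.5, 0.6}; /* two 3D points, assumed */
  Vec                 vals;
  PetscErrorCode      ierr;

  PetscFunctionBeginUser;
  ierr = DMInterpolationCreate(PETSC_COMM_WORLD, &interp);CHKERRQ(ierr);
  ierr = DMInterpolationSetDim(interp, 3);CHKERRQ(ierr);     /* spatial dimension      */
  ierr = DMInterpolationSetDof(interp, 1);CHKERRQ(ierr);     /* components per point   */
  ierr = DMInterpolationAddPoints(interp, 2, pts);CHKERRQ(ierr);
  /* Locate the points in the (possibly distributed) mesh */
  ierr = DMInterpolationSetUp(interp, dm, PETSC_FALSE);CHKERRQ(ierr);
  ierr = DMInterpolationGetVector(interp, &vals);CHKERRQ(ierr);
  /* Evaluate the field u (a global vector laid out by dm) at the points */
  ierr = DMInterpolationEvaluate(interp, dm, u, vals);CHKERRQ(ierr);
  ierr = VecView(vals, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  ierr = DMInterpolationRestoreVector(interp, &vals);CHKERRQ(ierr);
  ierr = DMInterpolationDestroy(&interp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}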
>>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks, >>>>> SG >>>>> >>>>> >>>>> >>>>> On Sun, May 19, 2019 at 6:54 PM Smith, Barry F. >>>>> wrote: >>>>> >>>>>> >>>>>> I am not sure you want DMNetwork, DMNetwork has no geometry; it >>>>>> only has vertices and edges. Vertices are connected to other vertices >>>>>> through the edges. For example I can't see how one would do vertex centered >>>>>> finite volume methods with DMNetwork. Maybe if you said something more >>>>>> about your planned discretization we could figure something out. >>>>>> >>>>>> > On May 19, 2019, at 8:32 PM, Swarnava Ghosh >>>>>> wrote: >>>>>> > >>>>>> > Hi Barry, >>>>>> > >>>>>> > No, the gmesh file contains a mesh and not a graph/network. >>>>>> > In that case, is it possible to create a DMNetwork first from the >>>>>> DMPlex and then distribute the DMNetwork. >>>>>> > >>>>>> > I have this case, because I want a vertex partitioning of my mesh. >>>>>> Domain decomposition of DMPlex gives me cell partitioning. Essentially what >>>>>> I want is that no two processes can share a vertex BUT that can share an >>>>>> edge. Similar to how a DMDA is distributed. >>>>>> > >>>>>> > Thanks, >>>>>> > Swarnava >>>>>> > >>>>>> > On Sun, May 19, 2019 at 4:50 PM Smith, Barry F. >>>>>> wrote: >>>>>> > >>>>>> > This use case never occurred to us. Is the gmesh file containing >>>>>> a graph/network (as opposed to a mesh)? There seem two choices >>>>>> > >>>>>> > 1) if the gmesh file contains a graph/network one could write a >>>>>> gmesh reader for that case that reads directly for and constructs a >>>>>> DMNetwork or >>>>>> > >>>>>> > 2) write a converter for a DMPlex to DMNetwork. >>>>>> > >>>>>> > I lean toward the first >>>>>> > >>>>>> > Either way you need to understand the documentation for >>>>>> DMNetwork and how to build one up. >>>>>> > >>>>>> > >>>>>> > Barry >>>>>> > >>>>>> > >>>>>> > > On May 19, 2019, at 6:34 PM, Swarnava Ghosh via petsc-users < >>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>> > > >>>>>> > > Hi Petsc users and developers, >>>>>> > > >>>>>> > > I am trying to find a way of creating a DMNetwork from a DMPlex. >>>>>> I have read the DMPlex from a gmesh file and have it distributed. >>>>>> > > >>>>>> > > Thanks, >>>>>> > > SG >>>>>> > >>>>>> >>>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Tue May 21 07:30:37 2019 From: mfadams at lbl.gov (Mark Adams) Date: Tue, 21 May 2019 08:30:37 -0400 Subject: [petsc-users] With-batch (new) flags In-Reply-To: <861F3371-902A-4AD2-A4E3-19C96DFDAEF8@anl.gov> References: <861F3371-902A-4AD2-A4E3-19C96DFDAEF8@anl.gov> Message-ID: On Tue, May 21, 2019 at 12:16 AM Smith, Barry F. wrote: > > Yes, this is totally my fault. 
By removing the help message it made > configure treat the argument as a string hence '0' was true and you got the > error message. For fblaslapack one should use -known-64-bit-blas-indices=0 > just as you did, I have pushed a fix to master > > What kind of system is sunfire09.pppl.gov ? Surely a system that has a > batch system provides its own good BLAS/LAPACK. You should use the ones on > the machine, not fblaslapack. Using fblaslapack in this situation is like > going to a fancy sit-down dinner but bringing your dessert from McDonalds. > We just want stuff to work. This machine is not well supported and we are using MUMPS, so we just want PETSc to build and be correct. Thanks > > It may be possible to remove many (but not all) of the cases where > -known-64-bit-blas-indices is needed (for example when MKL, fblaslapack, > f2blaslapack or --download-openblas is used we we know if the library is 64 > bit indices and should set that without a need for a test or command line > option. I'll look at it. > > Barry > > > > On May 20, 2019, at 3:50 PM, Balay, Satish via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > > > I'm not yet sure what the correct fix is - but the following change > should get this going.. > > > > diff --git a/config/BuildSystem/config/packages/BlasLapack.py > b/config/BuildSystem/config/packages/BlasLapack.py > > index e0310da4b0..7355f1a369 100644 > > --- a/config/BuildSystem/config/packages/BlasLapack.py > > +++ b/config/BuildSystem/config/packages/BlasLapack.py > > @@ -42,7 +42,7 @@ class Configure(config.package.Package): > > help.addArgument('BLAS/LAPACK', '-with-lapack-lib= [/Users/..../liblapack.a,...]>',nargs.ArgLibrary(None, None, 'Indicate the > library(s) containing LAPACK')) > > help.addArgument('BLAS/LAPACK', > '-with-blaslapack-suffix=',nargs.ArgLibrary(None, None, 'Indicate a > suffix for BLAS/LAPACK subroutine names.')) > > help.addArgument('BLAS/LAPACK', '-with-64-bit-blas-indices', > nargs.ArgBool(None, 0, 'Try to use 64 bit integers for BLAS/LAPACK; will > error if not available')) > > -# help.addArgument('BLAS/LAPACK', > '-known-64-bit-blas-indices=', nargs.ArgBool(None, 0, 'Indicate if > using 64 bit integer BLAS')) > > + help.addArgument('BLAS/LAPACK', > '-known-64-bit-blas-indices=', nargs.ArgBool(None, 0, 'Indicate if > using 64 bit integer BLAS')) > > return > > > > def getPrefix(self): > > > > Satish > > > > On Mon, 20 May 2019, Mark Adams via petsc-users wrote: > > > >> On Mon, May 20, 2019 at 3:55 PM Balay, Satish > wrote: > >> > >>> for ex: ilp version of mkl is --known-64-bit-blas-indices=1 while lp > mkl > >>> is --known-64-bit-blas-indices=0 > >>> > >>> Default blas we normally use is --known-64-bit-blas-indices=0 [they > don't > >>> use 64bit indices] > >>> > >> > >> Humm, that is what Dylan (in the log that I sent). He is downloading > blas > >> and has --known-64-bit-blas-indices=0. Should this be correct? > >> > >> > >>> > >>> Satish > >>> > >>> On Mon, 20 May 2019, Mark Adams via petsc-users wrote: > >>> > >>>> We are getting this failure. This a bit frustrating in that the first > >>> error > >>>> message "Must give a default value for known-mpi-shared-libraries.." > OK, > >>> I > >>>> google it and find that =0 is suggested. That seemed to work. Then we > >>> got a > >>>> similar error about -known-64-bit-blas-indices. It was clear from the > >>>> documentation what to use so we tried =0 and that failed (attached). > This > >>>> is little frustrating having to use try and error for each of these > >>> 'known" > >>>> things. 
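For anyone who lands on this thread with the same sequence of errors, a batch-system configure line that supplies these "known" values up front might look like the one below. This is only illustrative (compilers, MPI settings and the package list depend on the machine), but the two flags shown are the ones discussed here, and =0 is the right value for an ordinary 32-bit-integer BLAS such as fblaslapack:

    ./configure --with-batch \
      --known-mpi-shared-libraries=0 \
      --known-64-bit-blas-indices=0 \
      --download-fblaslapack --download-scalapack --download-mumps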
> >>>> > >>>> Dylan is trying --known-64-bit-blas-indices=1 now. I trust that will > >>> work, > >>>> but I think the error are not very informative. All this known stuff > is > >>> new > >>>> to me. Perhaps put an FAQ for this and list all of the "known"s that > we > >>>> need to add in batch. > >>>> > >>>> Thanks, > >>>> Mark > >>>> > >>> > >>> > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue May 21 09:57:21 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 21 May 2019 14:57:21 +0000 Subject: [petsc-users] With-batch (new) flags In-Reply-To: References: <861F3371-902A-4AD2-A4E3-19C96DFDAEF8@anl.gov> Message-ID: <0D1E0A10-AECE-4D62-BD14-A0CCA528F244@mcs.anl.gov> I have posted a pull request that will greatly reduce the need to use the -known-64-bit-blas-indices flag on batch systems. https://bitbucket.org/petsc/petsc/pull-requests/1689/reduce-the-need-to-automatically-detect-or/diff This includes your use case. Thanks for the complaint, it resulted in easier installs for some people in the future, Barry > On May 21, 2019, at 7:30 AM, Mark Adams wrote: > > > > On Tue, May 21, 2019 at 12:16 AM Smith, Barry F. wrote: > > Yes, this is totally my fault. By removing the help message it made configure treat the argument as a string hence '0' was true and you got the error message. For fblaslapack one should use -known-64-bit-blas-indices=0 just as you did, I have pushed a fix to master > > What kind of system is sunfire09.pppl.gov ? Surely a system that has a batch system provides its own good BLAS/LAPACK. You should use the ones on the machine, not fblaslapack. Using fblaslapack in this situation is like going to a fancy sit-down dinner but bringing your dessert from McDonalds. > > We just want stuff to work. This machine is not well supported and we are using MUMPS, so we just want PETSc to build and be correct. > > Thanks > > > It may be possible to remove many (but not all) of the cases where -known-64-bit-blas-indices is needed (for example when MKL, fblaslapack, f2blaslapack or --download-openblas is used we we know if the library is 64 bit indices and should set that without a need for a test or command line option. I'll look at it. > > Barry > > > > On May 20, 2019, at 3:50 PM, Balay, Satish via petsc-users wrote: > > > > I'm not yet sure what the correct fix is - but the following change should get this going.. 
> > > > diff --git a/config/BuildSystem/config/packages/BlasLapack.py b/config/BuildSystem/config/packages/BlasLapack.py > > index e0310da4b0..7355f1a369 100644 > > --- a/config/BuildSystem/config/packages/BlasLapack.py > > +++ b/config/BuildSystem/config/packages/BlasLapack.py > > @@ -42,7 +42,7 @@ class Configure(config.package.Package): > > help.addArgument('BLAS/LAPACK', '-with-lapack-lib=',nargs.ArgLibrary(None, None, 'Indicate the library(s) containing LAPACK')) > > help.addArgument('BLAS/LAPACK', '-with-blaslapack-suffix=',nargs.ArgLibrary(None, None, 'Indicate a suffix for BLAS/LAPACK subroutine names.')) > > help.addArgument('BLAS/LAPACK', '-with-64-bit-blas-indices', nargs.ArgBool(None, 0, 'Try to use 64 bit integers for BLAS/LAPACK; will error if not available')) > > -# help.addArgument('BLAS/LAPACK', '-known-64-bit-blas-indices=', nargs.ArgBool(None, 0, 'Indicate if using 64 bit integer BLAS')) > > + help.addArgument('BLAS/LAPACK', '-known-64-bit-blas-indices=', nargs.ArgBool(None, 0, 'Indicate if using 64 bit integer BLAS')) > > return > > > > def getPrefix(self): > > > > Satish > > > > On Mon, 20 May 2019, Mark Adams via petsc-users wrote: > > > >> On Mon, May 20, 2019 at 3:55 PM Balay, Satish wrote: > >> > >>> for ex: ilp version of mkl is --known-64-bit-blas-indices=1 while lp mkl > >>> is --known-64-bit-blas-indices=0 > >>> > >>> Default blas we normally use is --known-64-bit-blas-indices=0 [they don't > >>> use 64bit indices] > >>> > >> > >> Humm, that is what Dylan (in the log that I sent). He is downloading blas > >> and has --known-64-bit-blas-indices=0. Should this be correct? > >> > >> > >>> > >>> Satish > >>> > >>> On Mon, 20 May 2019, Mark Adams via petsc-users wrote: > >>> > >>>> We are getting this failure. This a bit frustrating in that the first > >>> error > >>>> message "Must give a default value for known-mpi-shared-libraries.." OK, > >>> I > >>>> google it and find that =0 is suggested. That seemed to work. Then we > >>> got a > >>>> similar error about -known-64-bit-blas-indices. It was clear from the > >>>> documentation what to use so we tried =0 and that failed (attached). This > >>>> is little frustrating having to use try and error for each of these > >>> 'known" > >>>> things. > >>>> > >>>> Dylan is trying --known-64-bit-blas-indices=1 now. I trust that will > >>> work, > >>>> but I think the error are not very informative. All this known stuff is > >>> new > >>>> to me. Perhaps put an FAQ for this and list all of the "known"s that we > >>>> need to add in batch. > >>>> > >>>> Thanks, > >>>> Mark > >>>> > >>> > >>> > >> > > > From sajidsyed2021 at u.northwestern.edu Wed May 22 14:26:17 2019 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Wed, 22 May 2019 14:26:17 -0500 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant In-Reply-To: References: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> <400504D5-9319-4A96-B0C0-C871284EB989@anl.gov> <1A99BD32-723F-4A76-98A4-2AFFA790802B@anl.gov> <6280A5E9-9DA5-485D-96F0-12FB944ACC4C@anl.gov> Message-ID: Hi Hong, Looks like this is my fault since I'm using -ksp_type preonly -pc_type gamg. If I use the default ksp (GMRES) then everything works fine on a smaller problem. Just to confirm, -ksp_type preonly is to be used only with direct-solve preconditioners like LU,Cholesky, right ? Thank You, Sajid Ali Applied Physics Northwestern University -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Wed May 22 14:30:36 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 22 May 2019 15:30:36 -0400 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant In-Reply-To: References: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> <400504D5-9319-4A96-B0C0-C871284EB989@anl.gov> <1A99BD32-723F-4A76-98A4-2AFFA790802B@anl.gov> <6280A5E9-9DA5-485D-96F0-12FB944ACC4C@anl.gov> Message-ID: On Wed, May 22, 2019 at 3:28 PM Sajid Ali via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi Hong, > > Looks like this is my fault since I'm using -ksp_type preonly -pc_type > gamg. If I use the default ksp (GMRES) then everything works fine on a > smaller problem. > > Just to confirm, -ksp_type preonly is to be used only with direct-solve > preconditioners like LU,Cholesky, right ? > It depends what you want. preconly just applies the PC once. It could be that you just want MG applied once, but you have to know that you might not get residual reduction you want unless your system has certain characteristics. Matt > Thank You, > Sajid Ali > Applied Physics > Northwestern University > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sajidsyed2021 at u.northwestern.edu Wed May 22 14:45:56 2019 From: sajidsyed2021 at u.northwestern.edu (Sajid Ali) Date: Wed, 22 May 2019 14:45:56 -0500 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant In-Reply-To: References: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> <400504D5-9319-4A96-B0C0-C871284EB989@anl.gov> <1A99BD32-723F-4A76-98A4-2AFFA790802B@anl.gov> <6280A5E9-9DA5-485D-96F0-12FB944ACC4C@anl.gov> Message-ID: Hi Matt, Thanks for the explanation. That makes sense since I'd get reasonably close convergence with preonly sometimes and not at other times which was confusing. Anyway, since there's no pc_tol (analogous to ksp_rtol/ksp_atol, etc), I'd have to more carefully set the gamg preconditioner options to ensure that it converges in one run, but since there's no guarantee that what works for one problem might not work for another (or the same problem at a different grid size), I'll stick with GMRES+gamg for now. Thank You, Sajid Ali Applied Physics Northwestern University -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed May 22 15:02:51 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 22 May 2019 20:02:51 +0000 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant In-Reply-To: References: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> <400504D5-9319-4A96-B0C0-C871284EB989@anl.gov> <1A99BD32-723F-4A76-98A4-2AFFA790802B@anl.gov> <6280A5E9-9DA5-485D-96F0-12FB944ACC4C@anl.gov> Message-ID: > On May 22, 2019, at 2:26 PM, Sajid Ali via petsc-users wrote: > > Hi Hong, > > Looks like this is my fault since I'm using -ksp_type preonly -pc_type gamg. If I use the default ksp (GMRES) then everything works fine on a smaller problem. > > Just to confirm, -ksp_type preonly is to be used only with direct-solve preconditioners like LU,Cholesky, right ? You can use it any time you like but it only applies the preconditioner; thus unless your preconditioner is a really good approximation to the operator it won't give you much information. 
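To make the distinction concrete, the following option sets (illustrative, not taken from this thread) behave quite differently:

    -ksp_type preonly    -pc_type lu                       # one exact solve per application
    -ksp_type preonly    -pc_type gamg                     # a single AMG application, no tolerance enforced
    -ksp_type gmres      -pc_type gamg -ksp_rtol 1e-8      # AMG-preconditioned Krylov solve to a tolerance
    -ksp_type richardson -pc_type gamg -ksp_rtol 1e-8      # AMG applied repeatedly until the tolerance is met

Adding -ksp_monitor or -ksp_converged_reason is the quickest way to see which of these is actually happening.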
> > Thank You, > Sajid Ali > Applied Physics > Northwestern University From bsmith at mcs.anl.gov Wed May 22 16:14:23 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Wed, 22 May 2019 21:14:23 +0000 Subject: [petsc-users] Question about TSComputeRHSJacobianConstant In-Reply-To: References: <61B21078-9146-4FE2-8967-95D64DB583C6@anl.gov> <400504D5-9319-4A96-B0C0-C871284EB989@anl.gov> <1A99BD32-723F-4A76-98A4-2AFFA790802B@anl.gov> <6280A5E9-9DA5-485D-96F0-12FB944ACC4C@anl.gov> Message-ID: <0D532C0E-1D9A-41A8-9B37-E286DF08B22B@anl.gov> There is no harm in having the GMRES there even if you use a direct solver (for testing) so just leave the GMRES. Changing to preonly every time you try LU is prone to error if you forget to change back. Barry > On May 22, 2019, at 2:45 PM, Sajid Ali via petsc-users wrote: > > Hi Matt, > > Thanks for the explanation. That makes sense since I'd get reasonably close convergence with preonly sometimes and not at other times which was confusing. > > Anyway, since there's no pc_tol (analogous to ksp_rtol/ksp_atol, etc), I'd have to more carefully set the gamg preconditioner options to ensure that it converges in one run, but since there's no guarantee that what works for one problem might not work for another (or the same problem at a different grid size), I'll stick with GMRES+gamg for now. > > Thank You, > Sajid Ali > Applied Physics > Northwestern University From davelee2804 at gmail.com Thu May 23 04:08:33 2019 From: davelee2804 at gmail.com (Dave Lee) Date: Thu, 23 May 2019 19:08:33 +1000 Subject: [petsc-users] Singlar values of the GMRES Hessenberg matrix Message-ID: Hi PETSc, I'm trying to add a "hook step" to the SNES trust region solver (at the end of the function: KSPGMRESBuildSoln()) I'm testing this using the (linear) example: src/ksp/ksp/examples/tutorials/ex1.c as gdb --args ./test -snes_mf -snes_type newtontr -ksp_rtol 1.0e-12 -snes_stol 1.0e-12 -ksp_converged_reason -snes_converged_reason -ksp_monitor -snes_monitor (Ignore the SNES stuff, this is for when I test nonlinear examples). When I call the LAPACK SVD routine via PETSc as PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...)) I get the following singular values: 0 KSP Residual norm 7.071067811865e-01 1 KSP Residual norm 3.162277660168e-01 2 KSP Residual norm 1.889822365046e-01 3 KSP Residual norm 1.290994448736e-01 4 KSP Residual norm 9.534625892456e-02 5 KSP Residual norm 8.082545620881e-16 1 0.5 -7.85046e-16 1.17757e-15 0.5 1 0.5 1.7271e-15 0 0.5 1 0.5 0 0 0.5 1 0 0 0 0.5 singular values: 2.36264 0.409816 1.97794e-15 6.67632e-16 Linear solve converged due to CONVERGED_RTOL iterations 5 Where the lines above the singular values are the Hessenberg matrix that I'm doing the SVD on. When I build the solution in terms of the leading two right singular vectors (and subsequently the first two orthonormal basis vectors in VECS_VV I get an error norm as: Norm of error 3.16228, Iterations 5 My suspicion is that I'm creating the Hessenberg incorrectly, as I would have thought that this problem should have more than two non-zero leading singular values. 
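For reference, the small problem being manipulated here, written in standard GMRES notation rather than in terms of the attached code, is the least-squares system

    \min_{y \in \mathbb{R}^{k}} \left\| \beta e_1 - \bar{H}_k \, y \right\|_2 ,
    \qquad \bar{H}_k \in \mathbb{R}^{(k+1)\times k} , \quad \beta = \|r_0\|_2 .

With the SVD \bar{H}_k = U \Sigma V^T the unconstrained minimizer is y = V \Sigma^{-1} U^T (\beta e_1) (the \mu = 0 case in the snippet that follows); the usual hook-step modification replaces 1/\sigma_i by \sigma_i/(\sigma_i^2 + \mu) to enforce \|y\| \le \Delta, and the iterate is then x = x_0 + Z_k y, with the columns of Z_k being the Krylov basis vectors (VEC_VV in PETSc). Note that an upper Hessenberg matrix whose subdiagonal entries are all nonzero, like the one printed above, has full column rank, so all four singular values should indeed be nonzero.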
Within my modified version of the GMRES build solution function (attached)
I'm creating this (and passing it to LAPACK as):

    nRows = gmres->it+1;
    nCols = nRows-1;

    ierr = PetscBLASIntCast(nRows,&nRows_blas);CHKERRQ(ierr);
    ierr = PetscBLASIntCast(nCols,&nCols_blas);CHKERRQ(ierr);
    ierr = PetscBLASIntCast(5*nRows,&lwork);CHKERRQ(ierr);
    ierr = PetscMalloc1(5*nRows,&work);CHKERRQ(ierr);
    ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr);
    ierr = PetscMalloc1(nRows*nCols,&H);CHKERRQ(ierr);
    for (jj = 0; jj < nRows; jj++) {
      for (ii = 0; ii < nCols; ii++) {
        R[jj*nCols+ii] = *HES(jj,ii);
      }
    }
    // Duplicate the Hessenberg matrix as the one passed to the SVD solver is destroyed
    for (ii = 0; ii < nRows*nCols; ii++) H[ii] = R[ii];

    ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr);
    ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr);
    ierr = PetscMalloc1(nRows*nRows,&UT);CHKERRQ(ierr);
    ierr = PetscMalloc1(nCols*nCols,&V);CHKERRQ(ierr);
    ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr);
    ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr);
    ierr = PetscMalloc1(nCols,&y);CHKERRQ(ierr);

    // Perform an SVD on the Hessenberg matrix - Note: this call destroys the input Hessenberg
    ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr);
    PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows_blas,&nCols_blas,R,&nRows_blas,S,UT,&nRows_blas,V,&nCols_blas,work,&lwork,&lierr));
    if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD Lapack routine %d",(int)lierr);
    ierr = PetscFPTrapPop();CHKERRQ(ierr);

    // Find the number of non-zero singular values
    for (nnz = 0; nnz < nCols; nnz++) {
      if (fabs(S[nnz]) < 1.0e-8) break;
    }
    printf("number of nonzero singular values: %d\n",nnz);

    trans(nRows,nRows,UT,U);
    trans(nCols,nCols,V,VT);

    // Compute p = ||r_0|| U^T e_1
    beta = gmres->res_beta;
    for (ii = 0; ii < nCols; ii++) {
      p[ii] = beta*UT[ii*nRows];
    }
    p[nCols] = 0.0;

    // Original GMRES solution (\mu = 0)
    for (ii = 0; ii < nCols; ii++) {
      q[ii] = p[ii]/S[ii];
    }

    // Expand y in terms of the right singular vectors as y = V q
    for (jj = 0; jj < nCols; jj++) {
      y[jj] = 0.0;
      for (ii = 0; ii < nCols; ii++) {
        y[jj] += V[jj*nCols+ii]*q[ii]; // transpose of the transpose
      }
    }

    // Pass the orthonormalized Krylov vector weights back out
    for (ii = 0; ii < nCols; ii++) {
      nrs[ii] = y[ii];
    }

I just wanted to check that this is the correct way to extract the
Hessenberg from the KSP_GMRES structure, and to pass it to LAPACK, and if
so, should I really be expecting only two non-zero singular values in
return for this problem?

Cheers, Dave.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gmres.c
Type: application/octet-stream
Size: 40806 bytes
Desc: not available
URL:

From knepley at gmail.com Thu May 23 05:20:19 2019
From: knepley at gmail.com (Matthew Knepley)
Date: Thu, 23 May 2019 06:20:19 -0400
Subject: [petsc-users] Singlar values of the GMRES Hessenberg matrix
In-Reply-To:
References:
Message-ID:

On Thu, May 23, 2019 at 5:09 AM Dave Lee via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hi PETSc,
>
> I'm trying to add a "hook step" to the SNES trust region solver (at the
> end of the function: KSPGMRESBuildSoln())
>
> I'm testing this using the (linear) example:
> src/ksp/ksp/examples/tutorials/ex1.c
> as
> gdb --args ./test -snes_mf -snes_type newtontr -ksp_rtol 1.0e-12
> -snes_stol 1.0e-12 -ksp_converged_reason -snes_converged_reason
> -ksp_monitor -snes_monitor
> (Ignore the SNES stuff, this is for when I test nonlinear examples).
>
> When I call the LAPACK SVD routine via PETSc as
> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...))
> I get the following singular values:
>
> 0 KSP Residual norm 7.071067811865e-01
> 1 KSP Residual norm 3.162277660168e-01
> 2 KSP Residual norm 1.889822365046e-01
> 3 KSP Residual norm 1.290994448736e-01
> 4 KSP Residual norm 9.534625892456e-02
> 5 KSP Residual norm 8.082545620881e-16
>
> 1 0.5 -7.85046e-16 1.17757e-15
> 0.5 1 0.5 1.7271e-15
> 0 0.5 1 0.5
> 0 0 0.5 1
> 0 0 0 0.5
>
> singular values: 2.36264 0.409816 1.97794e-15 6.67632e-16
>
> Linear solve converged due to CONVERGED_RTOL iterations 5
>
> Where the lines above the singular values are the Hessenberg matrix that
> I'm doing the SVD on.
>

First, write out all the SVD matrices you get and make sure that they
reconstruct the input matrix (that
you do not have something transposed somewhere).
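A concrete way to carry out this check, keeping the array names from the snippet above (a sketch only; it assumes U and VT have already been transposed back to row-major as in that code, that S holds nCols singular values, and that an extra PetscInt index kk is available):

    PetscReal maxErr = 0.0;
    PetscInt  kk;
    for (jj = 0; jj < nRows; jj++) {
      for (ii = 0; ii < nCols; ii++) {
        PetscReal hij = 0.0;
        for (kk = 0; kk < nCols; kk++) hij += U[jj*nRows+kk]*S[kk]*VT[kk*nCols+ii];  /* (U S V^T)_{ji}       */
        maxErr = PetscMax(maxErr, PetscAbsReal(hij - *HES(jj,ii)));                  /* compare to the input */
      }
    }
    ierr = PetscPrintf(PETSC_COMM_SELF,"max |H - U S V^T| = %g\n",(double)maxErr);CHKERRQ(ierr);

One thing worth keeping in mind while staring at the output: LAPACK's gesvd assumes column-major storage, so handing it a row-major array without accounting for that is effectively factoring the transpose.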
> > Within my modified version of the GMRES build solution function (attached) > I'm creating this (and passing it to LAPACK as): > > nRows = gmres->it+1; > nCols = nRows-1; > > ierr = PetscBLASIntCast(nRows,&nRows_blas);CHKERRQ(ierr); > ierr = PetscBLASIntCast(nCols,&nCols_blas);CHKERRQ(ierr); > ierr = PetscBLASIntCast(5*nRows,&lwork);CHKERRQ(ierr); > ierr = PetscMalloc1(5*nRows,&work);CHKERRQ(ierr); > ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); > ierr = PetscMalloc1(nRows*nCols,&H);CHKERRQ(ierr); > for (jj = 0; jj < nRows; jj++) { > for (ii = 0; ii < nCols; ii++) { > R[jj*nCols+ii] = *HES(jj,ii); > } > } > // Duplicate the Hessenberg matrix as the one passed to the SVD solver > is destroyed > for (ii=0; ii > ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); > ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); > ierr = PetscMalloc1(nRows*nRows,&UT);CHKERRQ(ierr); > ierr = PetscMalloc1(nCols*nCols,&V);CHKERRQ(ierr); > ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); > ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); > ierr = PetscMalloc1(nCols,&y);CHKERRQ(ierr); > > // Perform an SVD on the Hessenberg matrix - Note: this call destroys > the input Hessenberg > ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); > > PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows_blas,&nCols_blas,R,&nRows_blas,S,UT,&nRows_blas,V,&nCols_blas,work,&lwork,&lierr)); > if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD Lapack > routine %d",(int)lierr); > ierr = PetscFPTrapPop();CHKERRQ(ierr); > > // Find the number of non-zero singular values > for(nnz=0; nnz if(fabs(S[nnz]) < 1.0e-8) break; > } > printf("number of nonzero singular values: %d\n",nnz); > > trans(nRows,nRows,UT,U); > trans(nCols,nCols,V,VT); > > // Compute p = ||r_0|| U^T e_1 > beta = gmres->res_beta; > for (ii=0; ii p[ii] = beta*UT[ii*nRows]; > } > p[nCols] = 0.0; > > // Original GMRES solution (\mu = 0) > for (ii=0; ii q[ii] = p[ii]/S[ii]; > } > > // Expand y in terms of the right singular vectors as y = V q > for (jj=0; jj y[jj] = 0.0; > for (ii=0; ii y[jj] += V[jj*nCols+ii]*q[ii]; // transpose of the transpose > } > } > > // Pass the orthnomalized Krylov vector weights back out > for (ii=0; ii nrs[ii] = y[ii]; > } > > I just wanted to check that this is the correct way to extract the > Hessenberg from the KSP_GMRES structure, and to pass it to LAPACK, and if > so, should I really be expecting only two non-zero singular values in > return for this problem? > > Cheers, Dave. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From swarnava89 at gmail.com Thu May 23 14:08:54 2019 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Thu, 23 May 2019 12:08:54 -0700 Subject: [petsc-users] Interpolation using DMInterpolationEvaluate Message-ID: Hi PETSc developers and users, I am trying to setup a test case of DMInterpolationEvaluate. I am referring to the example provided in ( https://bitbucket.org/petsc/petsc/src/master/src/snes/examples/tests/ex2.c) to code it up. In my case. I have read a mesh from a file created using Gmesh, and I am trying to interpolate a constant field of ones using DMInterpolationEvaluate. Since the field is constant, then the interpolated values should also be the same. 
However, I am getting other values: Attached is the code which I wrote to test it: static char help[] = "Test interpolation\n\ using a parallel unstructured mesh (DMPLEX) to discretize it.\n"; #include #include #include #include #include #include #undef __FUNCT__ #define __FUNCT__ "main" PetscErrorCode SolveProblem(int i) { PetscBool interpolate=0; char filename[]="mesh2.msh"; //char filename[]="mesh1"; PetscErrorCode ierr; DM dm; DM distributedMesh = NULL; PetscViewer viewer; DMLabel label; PetscBool simplex=PETSC_TRUE; PetscFE fe; PetscQuadrature q; PetscDS prob=NULL; PetscInt id = 1; PetscInt order; PetscPartitioner part; SNES snes; /* nonlinear solver */ int rank; Vec fieldvec; MPI_Comm_rank(MPI_COMM_WORLD,&rank); // read mesh from file ierr=DMPlexCreateGmshFromFile(PETSC_COMM_WORLD,filename,interpolate,&dm);CHKERRQ(ierr); // Distribute mesh over processes ierr = DMPlexGetPartitioner(dm,&part);CHKERRQ(ierr); ierr = PetscPartitionerSetFromOptions(part);CHKERRQ(ierr); ierr = DMPlexDistribute(dm, 0, NULL, &distributedMesh);CHKERRQ(ierr); if (distributedMesh) { ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = distributedMesh; } Vec vcoord; DMGetCoordinatesLocal(dm,&vcoord); VecView(vcoord,PETSC_VIEWER_STDOUT_SELF); // interpolation PetscInt spaceDim, c, Np=8, p; DMInterpolationInfo interpolator; PetscReal pcoords[16]; PetscBool pointsAllProcs=PETSC_TRUE; Vec lu, fieldVals; pcoords[0]=0.0; pcoords[1]=0; pcoords[2]=1; pcoords[3]=0; pcoords[4]=1; pcoords[5]=1; pcoords[6]=0; pcoords[7]=1; pcoords[8]=0.5; pcoords[9]=0.5; pcoords[10]=2; pcoords[11]=0; pcoords[12]=2; pcoords[13]=1; pcoords[14]=1.5; pcoords[15]=0.5; ierr = DMGetCoordinateDim(dm, &spaceDim);CHKERRQ(ierr); /* Create interpolator */ ierr = DMInterpolationCreate(PETSC_COMM_WORLD, &interpolator);CHKERRQ(ierr); ierr = DMInterpolationSetDim(interpolator, spaceDim);CHKERRQ(ierr); ierr = DMInterpolationAddPoints(interpolator, Np, pcoords);CHKERRQ(ierr); // //VecGetArray(vcoord,&interpolator->points); ierr = DMInterpolationSetUp(interpolator, dm, pointsAllProcs);CHKERRQ(ierr); /* Check locations */ for (c = 0; c < interpolator->n; ++c) { ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[%d]Point %D is in Cell %D\n", rank, c, interpolator->cells[c]);CHKERRQ(ierr); } ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD, NULL);CHKERRQ(ierr); ierr = VecView(interpolator->coords, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); /* Setup Discretization */ ierr = PetscFECreateDefault(dm, 2, 1, PETSC_TRUE, NULL, -1, &fe);CHKERRQ(ierr); ierr = DMSetField(dm, 0, (PetscObject)fe);CHKERRQ(ierr); // ierr = DMSetDS(dm);CHKERRQ(ierr); ierr = PetscFEDestroy(&fe);CHKERRQ(ierr); /* Create function */ ierr = DMGetLocalVector(dm, &lu);CHKERRQ(ierr); VecSet(lu,1.0); /* Check interpolant */ ierr = VecCreateSeq(PETSC_COMM_SELF, interpolator->n, &fieldVals);CHKERRQ(ierr); VecZeroEntries(fieldVals); ierr = DMInterpolationSetDof(interpolator, 1);CHKERRQ(ierr); ierr = DMInterpolationEvaluate(interpolator, dm, lu, fieldVals);CHKERRQ(ierr); ierr = VecView(fieldVals, PETSC_VIEWER_STDOUT_SELF);CHKERRQ(ierr); return ierr; } int main(int argc, char **argv) { PetscInitialize(&argc, &argv, (char *) 0, help); SolveProblem(1); printf("done \n"); PetscFinalize(); return 0; } The interpolated field at the 8 points defined by pcoords should be all 1s, and instead I get: Vec Object: 1 MPI processes type: seq 1. 0. 0. 0.5 1. Vec Object: 1 MPI processes type: seq -1.5 -1. 2. 
Also including the test gmesh file: $MeshFormat 2.2 0 8 $EndMeshFormat $PhysicalNames 2 2 1 "left" 2 2 "right" $EndPhysicalNames $Nodes 8 1 0 0 0 2 1 0 0 3 1 1 0 4 0 1 0 5 2 0 0 6 2 1 0 7 0.5 0.5 0 8 1.5 0.5 0 $EndNodes $Elements 8 1 2 2 1 1 1 2 7 2 2 2 1 1 1 7 4 3 2 2 1 1 2 3 7 4 2 2 1 1 3 4 7 5 2 2 2 2 2 8 3 6 2 2 2 2 2 5 8 7 2 2 2 2 3 8 6 8 2 2 2 2 5 6 8 $EndElements $Periodic 7 0 1 4 1 1 4 0 2 6 1 2 6 0 5 4 1 5 4 0 6 1 1 6 1 1 1 3 0 1 5 7 0 1 6 4 0 $EndPeriodic Thanks, Swarnava -------------- next part -------------- An HTML attachment was scrubbed... URL: From vu.doquochust at gmail.com Thu May 23 21:39:25 2019 From: vu.doquochust at gmail.com (Vu Q. Do) Date: Fri, 24 May 2019 09:39:25 +0700 Subject: [petsc-users] Problem coupling Petsc into OpenFOAM In-Reply-To: <12E2C6D2-8152-42E8-9286-C190FD30AC5D@mcs.anl.gov> References: <56797615-51EA-4C56-AEB4-F6FEC1949348@anl.gov> <673e9f0f-499c-07e3-280f-63ec02f08810@esi-group.com> <12E2C6D2-8152-42E8-9286-C190FD30AC5D@mcs.anl.gov> Message-ID: Hi all, Thanks for your previous suggestion, I have been able to successfully link Petsc to OpenFOAM. I have written a simple interface and it works quite well in serial mode, but cannot run in parallel. I have been thinking about this problem for weeks but couldn't solve it. So I think maybe you could give me some idea. I describe my problem below. My interface is just a class named "petscSolver*"*, which is used to convert an openfoam's matrix or blocked matrix to Petsc Mat, then solve the matrix using Petsc's solver. To use Petsc, an Openfoam's solver need to be recompiled after adding the following lines to make file: EXE_INC = \ ... -I$(LIB_SRC)/petscSolver \ -I$(PETSC_DIR)/include \ -I$(PETSC_DIR)/$(PETSC_ARCH)/include EXE_LIBS = \ ... -L$(PETSC_DIR)/$(PETSC_ARCH)/lib -lpetsc To run an openfoam's case in parallel, first I need to discretize the domain into subdomains (e.g. 2 subdomains ), then use the following command: mpirun -np 2 mySolver -parallel (where mpirun is literally mpiexec) The simulation crashed even before doing anything and the error message is as in the attached image. I have tested and realized that the solver can run in parallel as normal by removing the two lines: -I$(PETSC_DIR)/include \ -I$(PETSC_DIR)/$(PETSC_ARCH)/include But then it is clearly no longer linked to Petsc. I would appreciate any suggestion. [image: Screenshot from 2019-05-24 09-22-17.png] On Thu, Apr 11, 2019 at 1:37 PM Smith, Barry F. wrote: > > Mark, > > Thanks for the clarifying email. My google searches didn't locate the > rheoTool you mention nor "a PRACE project running via CINECA (Bologna)". > > It would be nice if someday OpenFOAM had (either directly or somehow > with the modules directory) an interface to the PETSc solvers. This would > allow the use of a variety of other solvers including hypre BoomerAMG, > SuperLU_Dist, MUMPS, and even the use of PETSc/ViennaCL GPU based solvers > automatically from OpenFOAM. Unfortunately the PETSc group doesn't have the > resources or expertise to develop and support such an interface ourselves. > We would, of course, try to answer emails about PETSc usage and bugs for > such an interface. > > Barry > > If OpenFOAM did have such an interface one thing we could provide is the > CI infrastructure for tracking changes to PETSc that may effect OpenFOAM. > For example we could automatically build OpenFOAM each day with the latest > master of PETSc thus immediately detecting changes that effect the > interface. 
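Since linking comes up again below, one small refinement of the EXE_LIBS line above, echoing the rpath suggestion in the April thread quoted here, avoids having to manage LD_LIBRARY_PATH at run time (illustrative, adjust to the local wmake setup):

    EXE_LIBS = \
        ... \
        -Wl,-rpath,$(PETSC_DIR)/$(PETSC_ARCH)/lib \
        -L$(PETSC_DIR)/$(PETSC_ARCH)/lib -lpetsc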
> > > > > > On Apr 10, 2019, at 4:55 PM, Mark Olesen > wrote: > > > > The paper that Barry mentioned gives some generalities, but probably > > won't help much. There are some PETSc/OpenFOAM interfaces in rheoTool > > that are probably much more helpful. > > > > As Barry also rightly noted, there are some config files in the OpenFOAM > > tree that were put in some time ago for helping with setting up PETSc > > and OpenFOAM. Assuming that you have set the appropriate values in the > > etc/config.sh/petsc file you will be able to use those when using > wmake. > > For running you will still need to ensure that the LD_LIBARY_PATH is set > > correctly. For example, what some build scripts exhibit: > > > > wmake(petsc) : > > ==> Before running, verify that PETSc libraries can be found > > > > Enable in the OpenFOAM etc/bashrc, define manually or try with the > > following (POSIX shell): > > > > eval $(foamEtcFile -sh -config petsc -- -force) > > > > == > > > > > > There is currently a PRACE project running via CINECA (Bologna) with > > binding in PETSc as a runtime selectable linear solver in OpenFOAM. This > > is still at the stage of early testing and performance benchmarking. > > > > Cheers, > > /mark > > > > On 4/10/19 6:37 PM, Smith, Barry F. via petsc-users wrote: > >> > >> We don't know much about OpenFoam but > >> > >> 1) if I do a > >> > >> git grep -i petsc > >> > >> in the https://develop.openfoam.com/Development/OpenFOAM-plus.git > repository I see various configuration files specifically for PETSc. > >> > >> etc/config.csh/petsc etc/config.sh/petsc wmake/scripts/have_petsc > >> > >> so it appears that OpenFOAM has the tools to be linked against > PETSc (to me the documentation on how to use them is rather terse). Are > >> you using these? If you have trouble with them perhaps you can ask > the OpenFOAM user community how to use them. > >> > >> > >> 2) if you are editing the Make/options file directly you can try > changing > >> > >> -L$(PETSC_DIR)/$(PETSC_ARCH)/lib -lpetsc > >> > >> to > >> > >> -Wl,-rpath,$(PETSC_DIR)/$(PETSC_ARCH)/lib > -L$(PETSC_DIR)/$(PETSC_ARCH)/lib -lpetsc > >> > >> > >> > >> Note also that simply including petsc.h into the OpenFoam source > code and linking against -lpetsc will not immediately allow calling the > PETSc solvers from OpenFOAM. One needs to write all the interface code that > sets up and calls the PETSc solvers from OpenFOAM. There is a paper > https://www.researchgate.net/publication/319045499_Insertion_of_PETSc_in_the_OpenFOAM_Framework > that describes at an abstract level how they wrote code that calls the > PETSc solvers from OpenFOAM but the source code that actually does the work > does not appear to be available. > >> > >> Note that PETSc is now at version 3.11 we recommend working with > that version (unless you already have a lot of code that calls PETSc > written with a previous version of PETSc, for that we recommend first > upgrading to petsc 3.11 and then continuing to add code). > >> > >> Barry > >> > >> > >> > >> > >> > >>> On Apr 10, 2019, at 8:23 AM, Balay, Satish via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >>> > >>> Runtime error? You might have to add the path to $PETSC_ARCH/lib in > LD_LIBRARY_PATH env variable > >>> or - to your link command. If linux/gcc - the linker option is > -Wl,-rpath,$PETSC_ARCH/lib > >>> > >>> If not - send detail logs. 
> >>> > >>> Satish > >>> > >>> On Wed, 10 Apr 2019, Vu Do Quoc via petsc-users wrote: > >>> > >>>> Hi all, > >>>> > >>>> I am trying to insert Petsc to OpenFOAM opensource software. > >>>> I have been successfully compiling Petsc with an available solver in > >>>> OpenFOAM by linking it with the shared library libpetsc.so. However, > when I > >>>> call the solver to run a test case, I got an error saying that: > >>>> "libpetsc.so cannot be found", even though the library still exists > in the > >>>> $PETSC_ARCH/lib folder. > >>>> > >>>> I have been struggling for weeks but still, have not been able to > figure it > >>>> out. Therefore I would be very grateful for any suggestion to solve > this > >>>> problem. > >>>> > >>>> Thanks in advance for your time, > >>>> > >>>> Best regards, > >>>> > >>>> Vu Do > >>>> > > -- *Vu Q. Do*------------------------------------------------ *Student of Aeronautical Engineering* Programme de Formation d'Ing?nieurs d'Excellence au Vietnam *- PFIEV* School of Transportation Engineering Hanoi University of Science and Technology 01 Dai Co Viet Avenue, Hanoi, Vietnam E-mail: vu.doquochust at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2019-05-24 09-22-17.png Type: image/png Size: 18904 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu May 23 22:19:07 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 24 May 2019 03:19:07 +0000 Subject: [petsc-users] Problem coupling Petsc into OpenFOAM In-Reply-To: References: <56797615-51EA-4C56-AEB4-F6FEC1949348@anl.gov> <673e9f0f-499c-07e3-280f-63ec02f08810@esi-group.com> <12E2C6D2-8152-42E8-9286-C190FD30AC5D@mcs.anl.gov> Message-ID: <93DB37F3-E8DB-4974-A7F5-B2905E4A23C4@mcs.anl.gov> > On May 23, 2019, at 9:39 PM, Vu Q. Do wrote: > > Hi all, > > Thanks for your previous suggestion, I have been able to successfully link Petsc to OpenFOAM. I have written a simple interface and it works quite well in serial mode, but cannot run in parallel. I have been thinking about this problem for weeks but couldn't solve it. So I think maybe you could give me some idea. I describe my problem below. > > My interface is just a class named "petscSolver", which is used to convert an openfoam's matrix or blocked matrix to Petsc Mat, then solve the matrix using Petsc's solver. There must be a problem with the converter in parallel. You need to run with two processes (if it crashes with 2) in the debugger and determine where/why it is crashing and fix the problem. If you don't have access to a parallel debugger like Totalview or DDT you can use the PETSc option -start_in_debugger and it will open two xterms with the debugger running on each process in its own xterm. You can use regular debugger options in each xterm. In particular use cont in each to continue running in the debugger, when it crashes type bt in both xterms to see where it is crashing. Barry > To use Petsc, an Openfoam's solver need to be recompiled after adding the following lines to make file: > > EXE_INC = \ > ... > -I$(LIB_SRC)/petscSolver \ > -I$(PETSC_DIR)/include \ > -I$(PETSC_DIR)/$(PETSC_ARCH)/include > EXE_LIBS = \ > ... > -L$(PETSC_DIR)/$(PETSC_ARCH)/lib -lpetsc > > To run an openfoam's case in parallel, first I need to discretize the domain into subdomains (e.g. 
2 subdomains ), then use the following command: > > mpirun -np 2 mySolver -parallel > > (where mpirun is literally mpiexec) The simulation crashed even before doing anything and the error message is as in the attached image. > I have tested and realized that the solver can run in parallel as normal by removing the two lines: > -I$(PETSC_DIR)/include \ > -I$(PETSC_DIR)/$(PETSC_ARCH)/include > But then it is clearly no longer linked to Petsc. > > I would appreciate any suggestion. > > > > > On Thu, Apr 11, 2019 at 1:37 PM Smith, Barry F. wrote: > > Mark, > > Thanks for the clarifying email. My google searches didn't locate the rheoTool you mention nor "a PRACE project running via CINECA (Bologna)". > > It would be nice if someday OpenFOAM had (either directly or somehow with the modules directory) an interface to the PETSc solvers. This would allow the use of a variety of other solvers including hypre BoomerAMG, SuperLU_Dist, MUMPS, and even the use of PETSc/ViennaCL GPU based solvers automatically from OpenFOAM. Unfortunately the PETSc group doesn't have the resources or expertise to develop and support such an interface ourselves. We would, of course, try to answer emails about PETSc usage and bugs for such an interface. > > Barry > > If OpenFOAM did have such an interface one thing we could provide is the CI infrastructure for tracking changes to PETSc that may effect OpenFOAM. For example we could automatically build OpenFOAM each day with the latest master of PETSc thus immediately detecting changes that effect the interface. > > > > > > On Apr 10, 2019, at 4:55 PM, Mark Olesen wrote: > > > > The paper that Barry mentioned gives some generalities, but probably > > won't help much. There are some PETSc/OpenFOAM interfaces in rheoTool > > that are probably much more helpful. > > > > As Barry also rightly noted, there are some config files in the OpenFOAM > > tree that were put in some time ago for helping with setting up PETSc > > and OpenFOAM. Assuming that you have set the appropriate values in the > > etc/config.sh/petsc file you will be able to use those when using wmake. > > For running you will still need to ensure that the LD_LIBARY_PATH is set > > correctly. For example, what some build scripts exhibit: > > > > wmake(petsc) : > > ==> Before running, verify that PETSc libraries can be found > > > > Enable in the OpenFOAM etc/bashrc, define manually or try with the > > following (POSIX shell): > > > > eval $(foamEtcFile -sh -config petsc -- -force) > > > > == > > > > > > There is currently a PRACE project running via CINECA (Bologna) with > > binding in PETSc as a runtime selectable linear solver in OpenFOAM. This > > is still at the stage of early testing and performance benchmarking. > > > > Cheers, > > /mark > > > > On 4/10/19 6:37 PM, Smith, Barry F. via petsc-users wrote: > >> > >> We don't know much about OpenFoam but > >> > >> 1) if I do a > >> > >> git grep -i petsc > >> > >> in the https://develop.openfoam.com/Development/OpenFOAM-plus.git repository I see various configuration files specifically for PETSc. > >> > >> etc/config.csh/petsc etc/config.sh/petsc wmake/scripts/have_petsc > >> > >> so it appears that OpenFOAM has the tools to be linked against PETSc (to me the documentation on how to use them is rather terse). Are > >> you using these? If you have trouble with them perhaps you can ask the OpenFOAM user community how to use them. 
> >> > >> > >> 2) if you are editing the Make/options file directly you can try changing > >> > >> -L$(PETSC_DIR)/$(PETSC_ARCH)/lib -lpetsc > >> > >> to > >> > >> -Wl,-rpath,$(PETSC_DIR)/$(PETSC_ARCH)/lib -L$(PETSC_DIR)/$(PETSC_ARCH)/lib -lpetsc > >> > >> > >> > >> Note also that simply including petsc.h into the OpenFoam source code and linking against -lpetsc will not immediately allow calling the PETSc solvers from OpenFOAM. One needs to write all the interface code that sets up and calls the PETSc solvers from OpenFOAM. There is a paper https://www.researchgate.net/publication/319045499_Insertion_of_PETSc_in_the_OpenFOAM_Framework that describes at an abstract level how they wrote code that calls the PETSc solvers from OpenFOAM but the source code that actually does the work does not appear to be available. > >> > >> Note that PETSc is now at version 3.11 we recommend working with that version (unless you already have a lot of code that calls PETSc written with a previous version of PETSc, for that we recommend first upgrading to petsc 3.11 and then continuing to add code). > >> > >> Barry > >> > >> > >> > >> > >> > >>> On Apr 10, 2019, at 8:23 AM, Balay, Satish via petsc-users wrote: > >>> > >>> Runtime error? You might have to add the path to $PETSC_ARCH/lib in LD_LIBRARY_PATH env variable > >>> or - to your link command. If linux/gcc - the linker option is -Wl,-rpath,$PETSC_ARCH/lib > >>> > >>> If not - send detail logs. > >>> > >>> Satish > >>> > >>> On Wed, 10 Apr 2019, Vu Do Quoc via petsc-users wrote: > >>> > >>>> Hi all, > >>>> > >>>> I am trying to insert Petsc to OpenFOAM opensource software. > >>>> I have been successfully compiling Petsc with an available solver in > >>>> OpenFOAM by linking it with the shared library libpetsc.so. However, when I > >>>> call the solver to run a test case, I got an error saying that: > >>>> "libpetsc.so cannot be found", even though the library still exists in the > >>>> $PETSC_ARCH/lib folder. > >>>> > >>>> I have been struggling for weeks but still, have not been able to figure it > >>>> out. Therefore I would be very grateful for any suggestion to solve this > >>>> problem. > >>>> > >>>> Thanks in advance for your time, > >>>> > >>>> Best regards, > >>>> > >>>> Vu Do > >>>> > > > > -- > Vu Q. Do > ------------------------------------------------ > Student of Aeronautical Engineering > Programme de Formation d'Ing?nieurs d'Excellence au Vietnam - PFIEV > School of Transportation Engineering > Hanoi University of Science and Technology > 01 Dai Co Viet Avenue, Hanoi, Vietnam > E-mail: vu.doquochust at gmail.com From knepley at gmail.com Fri May 24 05:06:25 2019 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 24 May 2019 06:06:25 -0400 Subject: [petsc-users] Problem coupling Petsc into OpenFOAM In-Reply-To: References: <56797615-51EA-4C56-AEB4-F6FEC1949348@anl.gov> <673e9f0f-499c-07e3-280f-63ec02f08810@esi-group.com> <12E2C6D2-8152-42E8-9286-C190FD30AC5D@mcs.anl.gov> Message-ID: On Thu, May 23, 2019 at 10:41 PM Vu Q. Do via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi all, > > Thanks for your previous suggestion, I have been able to successfully link > Petsc to OpenFOAM. I have written a simple interface and it works quite > well in serial mode, but cannot run in parallel. I have been thinking about > this problem for weeks but couldn't solve it. So I think maybe you could > give me some idea. I describe my problem below. 
> > My interface is just a class named "petscSolver*"*, which is used to > convert an openfoam's matrix or blocked matrix to Petsc Mat, then solve > the matrix using Petsc's solver. > To use Petsc, an Openfoam's solver need to be recompiled after adding the > following lines to make file: > > EXE_INC = \ > ... > -I$(LIB_SRC)/petscSolver \ > -I$(PETSC_DIR)/include \ > -I$(PETSC_DIR)/$(PETSC_ARCH)/include > EXE_LIBS = \ > ... > -L$(PETSC_DIR)/$(PETSC_ARCH)/lib -lpetsc > > To run an openfoam's case in parallel, first I need to discretize the > domain into subdomains (e.g. 2 subdomains ), then use the following command: > > mpirun -np 2 mySolver -parallel > > (where mpirun is literally mpiexec) The simulation crashed even before > doing anything and the error message is as in the attached image. > Run in the debugger and see where it is crashing. Its possible to screw up the MPI linking here, so that you link OPENFOAM with one MPI and PETSc with another, or you call MPIInit() after you call PetscInitialize(), etc. Thanks, Matt > I have tested and realized that the solver can run in parallel as normal > by removing the two lines: > -I$(PETSC_DIR)/include \ > -I$(PETSC_DIR)/$(PETSC_ARCH)/include > But then it is clearly no longer linked to Petsc. > > I would appreciate any suggestion. > > [image: Screenshot from 2019-05-24 09-22-17.png] > > > On Thu, Apr 11, 2019 at 1:37 PM Smith, Barry F. > wrote: > >> >> Mark, >> >> Thanks for the clarifying email. My google searches didn't locate the >> rheoTool you mention nor "a PRACE project running via CINECA (Bologna)". >> >> It would be nice if someday OpenFOAM had (either directly or somehow >> with the modules directory) an interface to the PETSc solvers. This would >> allow the use of a variety of other solvers including hypre BoomerAMG, >> SuperLU_Dist, MUMPS, and even the use of PETSc/ViennaCL GPU based solvers >> automatically from OpenFOAM. Unfortunately the PETSc group doesn't have the >> resources or expertise to develop and support such an interface ourselves. >> We would, of course, try to answer emails about PETSc usage and bugs for >> such an interface. >> >> Barry >> >> If OpenFOAM did have such an interface one thing we could provide is >> the CI infrastructure for tracking changes to PETSc that may effect >> OpenFOAM. For example we could automatically build OpenFOAM each day with >> the latest master of PETSc thus immediately detecting changes that effect >> the interface. >> >> >> >> >> > On Apr 10, 2019, at 4:55 PM, Mark Olesen >> wrote: >> > >> > The paper that Barry mentioned gives some generalities, but probably >> > won't help much. There are some PETSc/OpenFOAM interfaces in rheoTool >> > that are probably much more helpful. >> > >> > As Barry also rightly noted, there are some config files in the >> OpenFOAM >> > tree that were put in some time ago for helping with setting up PETSc >> > and OpenFOAM. Assuming that you have set the appropriate values in the >> > etc/config.sh/petsc file you will be able to use those when using >> wmake. >> > For running you will still need to ensure that the LD_LIBARY_PATH is >> set >> > correctly. 
For example, what some build scripts exhibit: >> > >> > wmake(petsc) : >> > ==> Before running, verify that PETSc libraries can be found >> > >> > Enable in the OpenFOAM etc/bashrc, define manually or try with the >> > following (POSIX shell): >> > >> > eval $(foamEtcFile -sh -config petsc -- -force) >> > >> > == >> > >> > >> > There is currently a PRACE project running via CINECA (Bologna) with >> > binding in PETSc as a runtime selectable linear solver in OpenFOAM. >> This >> > is still at the stage of early testing and performance benchmarking. >> > >> > Cheers, >> > /mark >> > >> > On 4/10/19 6:37 PM, Smith, Barry F. via petsc-users wrote: >> >> >> >> We don't know much about OpenFoam but >> >> >> >> 1) if I do a >> >> >> >> git grep -i petsc >> >> >> >> in the https://develop.openfoam.com/Development/OpenFOAM-plus.git >> repository I see various configuration files specifically for PETSc. >> >> >> >> etc/config.csh/petsc etc/config.sh/petsc wmake/scripts/have_petsc >> >> >> >> so it appears that OpenFOAM has the tools to be linked against >> PETSc (to me the documentation on how to use them is rather terse). Are >> >> you using these? If you have trouble with them perhaps you can >> ask the OpenFOAM user community how to use them. >> >> >> >> >> >> 2) if you are editing the Make/options file directly you can try >> changing >> >> >> >> -L$(PETSC_DIR)/$(PETSC_ARCH)/lib -lpetsc >> >> >> >> to >> >> >> >> -Wl,-rpath,$(PETSC_DIR)/$(PETSC_ARCH)/lib >> -L$(PETSC_DIR)/$(PETSC_ARCH)/lib -lpetsc >> >> >> >> >> >> >> >> Note also that simply including petsc.h into the OpenFoam source >> code and linking against -lpetsc will not immediately allow calling the >> PETSc solvers from OpenFOAM. One needs to write all the interface code that >> sets up and calls the PETSc solvers from OpenFOAM. There is a paper >> https://www.researchgate.net/publication/319045499_Insertion_of_PETSc_in_the_OpenFOAM_Framework >> that describes at an abstract level how they wrote code that calls the >> PETSc solvers from OpenFOAM but the source code that actually does the work >> does not appear to be available. >> >> >> >> Note that PETSc is now at version 3.11 we recommend working with >> that version (unless you already have a lot of code that calls PETSc >> written with a previous version of PETSc, for that we recommend first >> upgrading to petsc 3.11 and then continuing to add code). >> >> >> >> Barry >> >> >> >> >> >> >> >> >> >> >> >>> On Apr 10, 2019, at 8:23 AM, Balay, Satish via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> >> >>> Runtime error? You might have to add the path to $PETSC_ARCH/lib in >> LD_LIBRARY_PATH env variable >> >>> or - to your link command. If linux/gcc - the linker option is >> -Wl,-rpath,$PETSC_ARCH/lib >> >>> >> >>> If not - send detail logs. >> >>> >> >>> Satish >> >>> >> >>> On Wed, 10 Apr 2019, Vu Do Quoc via petsc-users wrote: >> >>> >> >>>> Hi all, >> >>>> >> >>>> I am trying to insert Petsc to OpenFOAM opensource software. >> >>>> I have been successfully compiling Petsc with an available solver in >> >>>> OpenFOAM by linking it with the shared library libpetsc.so. However, >> when I >> >>>> call the solver to run a test case, I got an error saying that: >> >>>> "libpetsc.so cannot be found", even though the library still exists >> in the >> >>>> $PETSC_ARCH/lib folder. >> >>>> >> >>>> I have been struggling for weeks but still, have not been able to >> figure it >> >>>> out. Therefore I would be very grateful for any suggestion to solve >> this >> >>>> problem. 
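Pulling together the two suggestions above (Barry's -start_in_debugger and Matt's point about MPI linking and initialization order), a debugging run and an initialization guard might look roughly as follows. The petscSolver-style constructor shown is hypothetical; it only illustrates initializing PETSc once, after OpenFOAM's MPI startup, on the communicator already in use:

    mpirun -np 2 mySolver -parallel -start_in_debugger

    /* Hypothetical initialization guard for a petscSolver-style wrapper. */
    PetscBool petscUp = PETSC_FALSE;
    PetscInitialized(&petscUp);
    if (!petscUp) {
      int mpiUp = 0;
      MPI_Initialized(&mpiUp);                       /* OpenFOAM's Pstream normally calls MPI_Init first */
      if (mpiUp) PETSC_COMM_WORLD = MPI_COMM_WORLD;  /* adopt the existing communicator                  */
      PetscInitializeNoArguments();                  /* or PetscInitialize(&argc,&argv,NULL,NULL)        */
    }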
>> >>>> >> >>>> Thanks in advance for your time, >> >>>> >> >>>> Best regards, >> >>>> >> >>>> Vu Do >> >>>> >> >> > > -- > > *Vu Q. Do*------------------------------------------------ > *Student of Aeronautical Engineering* > Programme de Formation d'Ing?nieurs d'Excellence au Vietnam *- PFIEV* > School of Transportation Engineering > Hanoi University of Science and Technology > 01 Dai Co Viet Avenue, Hanoi, Vietnam > E-mail: vu.doquochust at gmail.com > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2019-05-24 09-22-17.png Type: image/png Size: 18904 bytes Desc: not available URL: From davelee2804 at gmail.com Fri May 24 07:38:26 2019 From: davelee2804 at gmail.com (Dave Lee) Date: Fri, 24 May 2019 22:38:26 +1000 Subject: [petsc-users] Singlar values of the GMRES Hessenberg matrix In-Reply-To: References: Message-ID: Thanks Matt, great suggestion. I did indeed find a transpose error this way. The SVD as reconstructed via U S V^T now matches the input Hessenberg matrix as derived via the *HES(row,col) macro, and all the singular values are non-zero. However the solution to example src/ksp/ksp/examples/tutorials/ex1.c as determined via the expansion over the singular vectors is still not correct. I suspect I'm doing something wrong with regards to the expansion over the vec array VEC_VV(), which I assume are the orthonormal vectors of the Q_k matrix in the Arnoldi iteration.... Thanks again for your advice, I'll keep digging. Cheers, Dave. On Thu, May 23, 2019 at 8:20 PM Matthew Knepley wrote: > On Thu, May 23, 2019 at 5:09 AM Dave Lee via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Hi PETSc, >> >> I'm trying to add a "hook step" to the SNES trust region solver (at the >> end of the function: KSPGMRESBuildSoln()) >> >> I'm testing this using the (linear) example: >> src/ksp/ksp/examples/tutorials/ex1.c >> as >> gdb --args ./test -snes_mf -snes_type newtontr -ksp_rtol 1.0e-12 >> -snes_stol 1.0e-12 -ksp_converged_reason -snes_converged_reason >> -ksp_monitor -snes_monitor >> (Ignore the SNES stuff, this is for when I test nonlinear examples). >> >> When I call the LAPACK SVD routine via PETSc as >> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...)) >> I get the following singular values: >> >> 0 KSP Residual norm 7.071067811865e-01 >> 1 KSP Residual norm 3.162277660168e-01 >> 2 KSP Residual norm 1.889822365046e-01 >> 3 KSP Residual norm 1.290994448736e-01 >> 4 KSP Residual norm 9.534625892456e-02 >> 5 KSP Residual norm 8.082545620881e-16 >> >> 1 0.5 -7.85046e-16 1.17757e-15 >> 0.5 1 0.5 1.7271e-15 >> 0 0.5 1 0.5 >> 0 0 0.5 1 >> 0 0 0 0.5 >> >> singular values: 2.36264 0.409816 1.97794e-15 6.67632e-16 >> >> Linear solve converged due to CONVERGED_RTOL iterations 5 >> >> Where the lines above the singular values are the Hessenberg matrix that >> I'm doing the SVD on. >> > > First, write out all the SVD matrices you get and make sure that they > reconstruct the input matrix (that > you do not have something transposed somewhere). 
> > Matt > > >> When I build the solution in terms of the leading two right singular >> vectors (and subsequently the first two orthonormal basis vectors in >> VECS_VV I get an error norm as: >> Norm of error 3.16228, Iterations 5 >> >> My suspicion is that I'm creating the Hessenberg incorrectly, as I would >> have thought that this problem should have more than two non-zero leading >> singular values. >> >> Within my modified version of the GMRES build solution function >> (attached) I'm creating this (and passing it to LAPACK as): >> >> nRows = gmres->it+1; >> nCols = nRows-1; >> >> ierr = PetscBLASIntCast(nRows,&nRows_blas);CHKERRQ(ierr); >> ierr = PetscBLASIntCast(nCols,&nCols_blas);CHKERRQ(ierr); >> ierr = PetscBLASIntCast(5*nRows,&lwork);CHKERRQ(ierr); >> ierr = PetscMalloc1(5*nRows,&work);CHKERRQ(ierr); >> ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); >> ierr = PetscMalloc1(nRows*nCols,&H);CHKERRQ(ierr); >> for (jj = 0; jj < nRows; jj++) { >> for (ii = 0; ii < nCols; ii++) { >> R[jj*nCols+ii] = *HES(jj,ii); >> } >> } >> // Duplicate the Hessenberg matrix as the one passed to the SVD >> solver is destroyed >> for (ii=0; ii> >> ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); >> ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); >> ierr = PetscMalloc1(nRows*nRows,&UT);CHKERRQ(ierr); >> ierr = PetscMalloc1(nCols*nCols,&V);CHKERRQ(ierr); >> ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); >> ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); >> ierr = PetscMalloc1(nCols,&y);CHKERRQ(ierr); >> >> // Perform an SVD on the Hessenberg matrix - Note: this call destroys >> the input Hessenberg >> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); >> >> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows_blas,&nCols_blas,R,&nRows_blas,S,UT,&nRows_blas,V,&nCols_blas,work,&lwork,&lierr)); >> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD >> Lapack routine %d",(int)lierr); >> ierr = PetscFPTrapPop();CHKERRQ(ierr); >> >> // Find the number of non-zero singular values >> for(nnz=0; nnz> if(fabs(S[nnz]) < 1.0e-8) break; >> } >> printf("number of nonzero singular values: %d\n",nnz); >> >> trans(nRows,nRows,UT,U); >> trans(nCols,nCols,V,VT); >> >> // Compute p = ||r_0|| U^T e_1 >> beta = gmres->res_beta; >> for (ii=0; ii> p[ii] = beta*UT[ii*nRows]; >> } >> p[nCols] = 0.0; >> >> // Original GMRES solution (\mu = 0) >> for (ii=0; ii> q[ii] = p[ii]/S[ii]; >> } >> >> // Expand y in terms of the right singular vectors as y = V q >> for (jj=0; jj> y[jj] = 0.0; >> for (ii=0; ii> y[jj] += V[jj*nCols+ii]*q[ii]; // transpose of the transpose >> } >> } >> >> // Pass the orthnomalized Krylov vector weights back out >> for (ii=0; ii> nrs[ii] = y[ii]; >> } >> >> I just wanted to check that this is the correct way to extract the >> Hessenberg from the KSP_GMRES structure, and to pass it to LAPACK, and if >> so, should I really be expecting only two non-zero singular values in >> return for this problem? >> >> Cheers, Dave. >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri May 24 07:48:51 2019 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 24 May 2019 08:48:51 -0400 Subject: [petsc-users] Singlar values of the GMRES Hessenberg matrix In-Reply-To: References: Message-ID: On Fri, May 24, 2019 at 8:38 AM Dave Lee wrote: > Thanks Matt, great suggestion. > > I did indeed find a transpose error this way. The SVD as reconstructed via > U S V^T now matches the input Hessenberg matrix as derived via the > *HES(row,col) macro, and all the singular values are non-zero. However > the solution to example src/ksp/ksp/examples/tutorials/ex1.c as > determined via the expansion over the singular vectors is still not > correct. I suspect I'm doing something wrong with regards to the expansion > over the vec array VEC_VV(), which I assume are the orthonormal vectors > of the Q_k matrix in the Arnoldi iteration.... > Here we are building the solution: https://bitbucket.org/petsc/petsc/src/7c23e6aa64ffbff85a2457e1aa154ec3d7f238e3/src/ksp/ksp/impls/gmres/gmres.c#lines-331 There are some subtleties if you have a nonzero initial guess or a preconditioner. Thanks, Matt > Thanks again for your advice, I'll keep digging. > > Cheers, Dave. > > On Thu, May 23, 2019 at 8:20 PM Matthew Knepley wrote: > >> On Thu, May 23, 2019 at 5:09 AM Dave Lee via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >>> Hi PETSc, >>> >>> I'm trying to add a "hook step" to the SNES trust region solver (at the >>> end of the function: KSPGMRESBuildSoln()) >>> >>> I'm testing this using the (linear) example: >>> src/ksp/ksp/examples/tutorials/ex1.c >>> as >>> gdb --args ./test -snes_mf -snes_type newtontr -ksp_rtol 1.0e-12 >>> -snes_stol 1.0e-12 -ksp_converged_reason -snes_converged_reason >>> -ksp_monitor -snes_monitor >>> (Ignore the SNES stuff, this is for when I test nonlinear examples). >>> >>> When I call the LAPACK SVD routine via PETSc as >>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...)) >>> I get the following singular values: >>> >>> 0 KSP Residual norm 7.071067811865e-01 >>> 1 KSP Residual norm 3.162277660168e-01 >>> 2 KSP Residual norm 1.889822365046e-01 >>> 3 KSP Residual norm 1.290994448736e-01 >>> 4 KSP Residual norm 9.534625892456e-02 >>> 5 KSP Residual norm 8.082545620881e-16 >>> >>> 1 0.5 -7.85046e-16 1.17757e-15 >>> 0.5 1 0.5 1.7271e-15 >>> 0 0.5 1 0.5 >>> 0 0 0.5 1 >>> 0 0 0 0.5 >>> >>> singular values: 2.36264 0.409816 1.97794e-15 6.67632e-16 >>> >>> Linear solve converged due to CONVERGED_RTOL iterations 5 >>> >>> Where the lines above the singular values are the Hessenberg matrix that >>> I'm doing the SVD on. >>> >> >> First, write out all the SVD matrices you get and make sure that they >> reconstruct the input matrix (that >> you do not have something transposed somewhere). >> >> Matt >> >> >>> When I build the solution in terms of the leading two right singular >>> vectors (and subsequently the first two orthonormal basis vectors in >>> VECS_VV I get an error norm as: >>> Norm of error 3.16228, Iterations 5 >>> >>> My suspicion is that I'm creating the Hessenberg incorrectly, as I would >>> have thought that this problem should have more than two non-zero leading >>> singular values. 
>>> >>> Within my modified version of the GMRES build solution function >>> (attached) I'm creating this (and passing it to LAPACK as): >>> >>> nRows = gmres->it+1; >>> nCols = nRows-1; >>> >>> ierr = PetscBLASIntCast(nRows,&nRows_blas);CHKERRQ(ierr); >>> ierr = PetscBLASIntCast(nCols,&nCols_blas);CHKERRQ(ierr); >>> ierr = PetscBLASIntCast(5*nRows,&lwork);CHKERRQ(ierr); >>> ierr = PetscMalloc1(5*nRows,&work);CHKERRQ(ierr); >>> ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); >>> ierr = PetscMalloc1(nRows*nCols,&H);CHKERRQ(ierr); >>> for (jj = 0; jj < nRows; jj++) { >>> for (ii = 0; ii < nCols; ii++) { >>> R[jj*nCols+ii] = *HES(jj,ii); >>> } >>> } >>> // Duplicate the Hessenberg matrix as the one passed to the SVD >>> solver is destroyed >>> for (ii=0; ii>> >>> ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); >>> ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); >>> ierr = PetscMalloc1(nRows*nRows,&UT);CHKERRQ(ierr); >>> ierr = PetscMalloc1(nCols*nCols,&V);CHKERRQ(ierr); >>> ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); >>> ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); >>> ierr = PetscMalloc1(nCols,&y);CHKERRQ(ierr); >>> >>> // Perform an SVD on the Hessenberg matrix - Note: this call >>> destroys the input Hessenberg >>> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); >>> >>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows_blas,&nCols_blas,R,&nRows_blas,S,UT,&nRows_blas,V,&nCols_blas,work,&lwork,&lierr)); >>> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD >>> Lapack routine %d",(int)lierr); >>> ierr = PetscFPTrapPop();CHKERRQ(ierr); >>> >>> // Find the number of non-zero singular values >>> for(nnz=0; nnz>> if(fabs(S[nnz]) < 1.0e-8) break; >>> } >>> printf("number of nonzero singular values: %d\n",nnz); >>> >>> trans(nRows,nRows,UT,U); >>> trans(nCols,nCols,V,VT); >>> >>> // Compute p = ||r_0|| U^T e_1 >>> beta = gmres->res_beta; >>> for (ii=0; ii>> p[ii] = beta*UT[ii*nRows]; >>> } >>> p[nCols] = 0.0; >>> >>> // Original GMRES solution (\mu = 0) >>> for (ii=0; ii>> q[ii] = p[ii]/S[ii]; >>> } >>> >>> // Expand y in terms of the right singular vectors as y = V q >>> for (jj=0; jj>> y[jj] = 0.0; >>> for (ii=0; ii>> y[jj] += V[jj*nCols+ii]*q[ii]; // transpose of the transpose >>> } >>> } >>> >>> // Pass the orthnomalized Krylov vector weights back out >>> for (ii=0; ii>> nrs[ii] = y[ii]; >>> } >>> >>> I just wanted to check that this is the correct way to extract the >>> Hessenberg from the KSP_GMRES structure, and to pass it to LAPACK, and if >>> so, should I really be expecting only two non-zero singular values in >>> return for this problem? >>> >>> Cheers, Dave. >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From davelee2804 at gmail.com Sat May 25 03:18:50 2019 From: davelee2804 at gmail.com (Dave Lee) Date: Sat, 25 May 2019 18:18:50 +1000 Subject: [petsc-users] Singlar values of the GMRES Hessenberg matrix In-Reply-To: References: Message-ID: Thanks Matt, this is where I'm adding in my hookstep code. Cheers, Dave. 
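For context on the hook step being added here, a minimal sketch of the damped update it would substitute for the plain least-squares solve, reusing the names ii, p, q, S, V, y, nrs and nCols from the snippet quoted above; mu (a damping parameter) and delta (the trust-region radius) are illustrative names that do not appear in the thread:

    /* hedged sketch of a hook-step update; mu = 0 recovers the ordinary
       GMRES solution q[ii] = p[ii]/S[ii] computed in the quoted code */
    PetscReal mu = 0.0;   /* assumed: increased until ||q|| <= delta */
    for (ii = 0; ii < nCols; ii++) {
      q[ii] = S[ii]*p[ii]/(S[ii]*S[ii] + mu);
    }
    /* then y = V q and nrs[ii] = y[ii], exactly as in the quoted code,
       so the corrected step is expanded over the same basis vectors */

Because the columns of V are orthonormal, ||y|| = ||q||, so mu can be chosen by a scalar root find on ||q(mu)|| - delta.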
On Fri, May 24, 2019 at 10:49 PM Matthew Knepley wrote: > On Fri, May 24, 2019 at 8:38 AM Dave Lee wrote: > >> Thanks Matt, great suggestion. >> >> I did indeed find a transpose error this way. The SVD as reconstructed >> via U S V^T now matches the input Hessenberg matrix as derived via the >> *HES(row,col) macro, and all the singular values are non-zero. However >> the solution to example src/ksp/ksp/examples/tutorials/ex1.c as >> determined via the expansion over the singular vectors is still not >> correct. I suspect I'm doing something wrong with regards to the expansion >> over the vec array VEC_VV(), which I assume are the orthonormal vectors >> of the Q_k matrix in the Arnoldi iteration.... >> > > Here we are building the solution: > > > https://bitbucket.org/petsc/petsc/src/7c23e6aa64ffbff85a2457e1aa154ec3d7f238e3/src/ksp/ksp/impls/gmres/gmres.c#lines-331 > > There are some subtleties if you have a nonzero initial guess or a > preconditioner. > > Thanks, > > Matt > > >> Thanks again for your advice, I'll keep digging. >> >> Cheers, Dave. >> >> On Thu, May 23, 2019 at 8:20 PM Matthew Knepley >> wrote: >> >>> On Thu, May 23, 2019 at 5:09 AM Dave Lee via petsc-users < >>> petsc-users at mcs.anl.gov> wrote: >>> >>>> Hi PETSc, >>>> >>>> I'm trying to add a "hook step" to the SNES trust region solver (at the >>>> end of the function: KSPGMRESBuildSoln()) >>>> >>>> I'm testing this using the (linear) example: >>>> src/ksp/ksp/examples/tutorials/ex1.c >>>> as >>>> gdb --args ./test -snes_mf -snes_type newtontr -ksp_rtol 1.0e-12 >>>> -snes_stol 1.0e-12 -ksp_converged_reason -snes_converged_reason >>>> -ksp_monitor -snes_monitor >>>> (Ignore the SNES stuff, this is for when I test nonlinear examples). >>>> >>>> When I call the LAPACK SVD routine via PETSc as >>>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...)) >>>> I get the following singular values: >>>> >>>> 0 KSP Residual norm 7.071067811865e-01 >>>> 1 KSP Residual norm 3.162277660168e-01 >>>> 2 KSP Residual norm 1.889822365046e-01 >>>> 3 KSP Residual norm 1.290994448736e-01 >>>> 4 KSP Residual norm 9.534625892456e-02 >>>> 5 KSP Residual norm 8.082545620881e-16 >>>> >>>> 1 0.5 -7.85046e-16 1.17757e-15 >>>> 0.5 1 0.5 1.7271e-15 >>>> 0 0.5 1 0.5 >>>> 0 0 0.5 1 >>>> 0 0 0 0.5 >>>> >>>> singular values: 2.36264 0.409816 1.97794e-15 6.67632e-16 >>>> >>>> Linear solve converged due to CONVERGED_RTOL iterations 5 >>>> >>>> Where the lines above the singular values are the Hessenberg matrix >>>> that I'm doing the SVD on. >>>> >>> >>> First, write out all the SVD matrices you get and make sure that they >>> reconstruct the input matrix (that >>> you do not have something transposed somewhere). >>> >>> Matt >>> >>> >>>> When I build the solution in terms of the leading two right singular >>>> vectors (and subsequently the first two orthonormal basis vectors in >>>> VECS_VV I get an error norm as: >>>> Norm of error 3.16228, Iterations 5 >>>> >>>> My suspicion is that I'm creating the Hessenberg incorrectly, as I >>>> would have thought that this problem should have more than two non-zero >>>> leading singular values. 
>>>> >>>> Within my modified version of the GMRES build solution function >>>> (attached) I'm creating this (and passing it to LAPACK as): >>>> >>>> nRows = gmres->it+1; >>>> nCols = nRows-1; >>>> >>>> ierr = PetscBLASIntCast(nRows,&nRows_blas);CHKERRQ(ierr); >>>> ierr = PetscBLASIntCast(nCols,&nCols_blas);CHKERRQ(ierr); >>>> ierr = PetscBLASIntCast(5*nRows,&lwork);CHKERRQ(ierr); >>>> ierr = PetscMalloc1(5*nRows,&work);CHKERRQ(ierr); >>>> ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); >>>> ierr = PetscMalloc1(nRows*nCols,&H);CHKERRQ(ierr); >>>> for (jj = 0; jj < nRows; jj++) { >>>> for (ii = 0; ii < nCols; ii++) { >>>> R[jj*nCols+ii] = *HES(jj,ii); >>>> } >>>> } >>>> // Duplicate the Hessenberg matrix as the one passed to the SVD >>>> solver is destroyed >>>> for (ii=0; ii>>> >>>> ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); >>>> ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); >>>> ierr = PetscMalloc1(nRows*nRows,&UT);CHKERRQ(ierr); >>>> ierr = PetscMalloc1(nCols*nCols,&V);CHKERRQ(ierr); >>>> ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); >>>> ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); >>>> ierr = PetscMalloc1(nCols,&y);CHKERRQ(ierr); >>>> >>>> // Perform an SVD on the Hessenberg matrix - Note: this call >>>> destroys the input Hessenberg >>>> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); >>>> >>>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows_blas,&nCols_blas,R,&nRows_blas,S,UT,&nRows_blas,V,&nCols_blas,work,&lwork,&lierr)); >>>> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD >>>> Lapack routine %d",(int)lierr); >>>> ierr = PetscFPTrapPop();CHKERRQ(ierr); >>>> >>>> // Find the number of non-zero singular values >>>> for(nnz=0; nnz>>> if(fabs(S[nnz]) < 1.0e-8) break; >>>> } >>>> printf("number of nonzero singular values: %d\n",nnz); >>>> >>>> trans(nRows,nRows,UT,U); >>>> trans(nCols,nCols,V,VT); >>>> >>>> // Compute p = ||r_0|| U^T e_1 >>>> beta = gmres->res_beta; >>>> for (ii=0; ii>>> p[ii] = beta*UT[ii*nRows]; >>>> } >>>> p[nCols] = 0.0; >>>> >>>> // Original GMRES solution (\mu = 0) >>>> for (ii=0; ii>>> q[ii] = p[ii]/S[ii]; >>>> } >>>> >>>> // Expand y in terms of the right singular vectors as y = V q >>>> for (jj=0; jj>>> y[jj] = 0.0; >>>> for (ii=0; ii>>> y[jj] += V[jj*nCols+ii]*q[ii]; // transpose of the transpose >>>> } >>>> } >>>> >>>> // Pass the orthnomalized Krylov vector weights back out >>>> for (ii=0; ii>>> nrs[ii] = y[ii]; >>>> } >>>> >>>> I just wanted to check that this is the correct way to extract the >>>> Hessenberg from the KSP_GMRES structure, and to pass it to LAPACK, and if >>>> so, should I really be expecting only two non-zero singular values in >>>> return for this problem? >>>> >>>> Cheers, Dave. >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From afrah.nacib at gmail.com Sat May 25 10:09:32 2019 From: afrah.nacib at gmail.com (Afrah Najib) Date: Sat, 25 May 2019 18:09:32 +0300 Subject: [petsc-users] About finding the eigenvalues of of large sparse matrices using SLEPc package Message-ID: Hi, I want to use SLEPc to find the eigenvlaues of matrices from FLorida sparse matrix collection (https://sparse.tamu.edu/). The available matrix formats are mtx, matlab and Rutherford Boeing. I installed the slepc-3.11.1 successfully with petsc-lite-3.11.2 as a dependency How can I use the examples of this library with such matrix formats, Many thanks for any kind of help, -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat May 25 10:24:57 2019 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 25 May 2019 11:24:57 -0400 Subject: [petsc-users] About finding the eigenvalues of of large sparse matrices using SLEPc package In-Reply-To: References: Message-ID: On Sat, May 25, 2019 at 11:10 AM Afrah Najib via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi, > > I want to use SLEPc to find the eigenvlaues of matrices from FLorida > sparse matrix collection (https://sparse.tamu.edu/). The available matrix > formats are mtx, matlab and Rutherford Boeing. > > I installed the slepc-3.11.1 successfully with petsc-lite-3.11.2 as a > dependency > How can I use the examples of this library with such matrix formats, > You can use https://bitbucket.org/petsc/petsc/src/master/src/mat/examples/tests/ex72.c to read it in and then write it as a PETSc binary. Thanks, Matt > Many thanks for any kind of help, > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From davelee2804 at gmail.com Mon May 27 02:55:33 2019 From: davelee2804 at gmail.com (Dave Lee) Date: Mon, 27 May 2019 17:55:33 +1000 Subject: [petsc-users] Singlar values of the GMRES Hessenberg matrix In-Reply-To: References: Message-ID: Hi Matt and PETSc. Thanks again for the advice. So I think I know what my problem might be. Looking at the comments above the function KSPInitialResidual() in src/ksp/ksp/interface/itres.c I see that the initial residual, as passed into VEC_VV(0) is the residual of the *preconditioned* system (and that the original residual goes temporarily to gmres->vecs[1]). So I'm wondering, is the Hessenberg, as derived via the *HES(row,col) macro the Hessenberg for the original Krylov subspace, or the preconditioned subspace? Secondly, do the vecs within the KSP_GMRES structure, as accessed via VEC_VV() correspond to the (preconditioned) Krylov subspace or the orthonormalized vectors that make up the matrix Q_k in the Arnoldi iteration? This isn't clear to me, and I need to access the vectors in Q_k in order to expand the corrected hookstep solution. Thanks again, Dave. On Sat, May 25, 2019 at 6:18 PM Dave Lee wrote: > Thanks Matt, this is where I'm adding in my hookstep code. > > Cheers, Dave. > > On Fri, May 24, 2019 at 10:49 PM Matthew Knepley > wrote: > >> On Fri, May 24, 2019 at 8:38 AM Dave Lee wrote: >> >>> Thanks Matt, great suggestion. >>> >>> I did indeed find a transpose error this way. The SVD as reconstructed >>> via U S V^T now matches the input Hessenberg matrix as derived via the >>> *HES(row,col) macro, and all the singular values are non-zero. 
However >>> the solution to example src/ksp/ksp/examples/tutorials/ex1.c as >>> determined via the expansion over the singular vectors is still not >>> correct. I suspect I'm doing something wrong with regards to the expansion >>> over the vec array VEC_VV(), which I assume are the orthonormal vectors >>> of the Q_k matrix in the Arnoldi iteration.... >>> >> >> Here we are building the solution: >> >> >> https://bitbucket.org/petsc/petsc/src/7c23e6aa64ffbff85a2457e1aa154ec3d7f238e3/src/ksp/ksp/impls/gmres/gmres.c#lines-331 >> >> There are some subtleties if you have a nonzero initial guess or a >> preconditioner. >> >> Thanks, >> >> Matt >> >> >>> Thanks again for your advice, I'll keep digging. >>> >>> Cheers, Dave. >>> >>> On Thu, May 23, 2019 at 8:20 PM Matthew Knepley >>> wrote: >>> >>>> On Thu, May 23, 2019 at 5:09 AM Dave Lee via petsc-users < >>>> petsc-users at mcs.anl.gov> wrote: >>>> >>>>> Hi PETSc, >>>>> >>>>> I'm trying to add a "hook step" to the SNES trust region solver (at >>>>> the end of the function: KSPGMRESBuildSoln()) >>>>> >>>>> I'm testing this using the (linear) example: >>>>> src/ksp/ksp/examples/tutorials/ex1.c >>>>> as >>>>> gdb --args ./test -snes_mf -snes_type newtontr -ksp_rtol 1.0e-12 >>>>> -snes_stol 1.0e-12 -ksp_converged_reason -snes_converged_reason >>>>> -ksp_monitor -snes_monitor >>>>> (Ignore the SNES stuff, this is for when I test nonlinear examples). >>>>> >>>>> When I call the LAPACK SVD routine via PETSc as >>>>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...)) >>>>> I get the following singular values: >>>>> >>>>> 0 KSP Residual norm 7.071067811865e-01 >>>>> 1 KSP Residual norm 3.162277660168e-01 >>>>> 2 KSP Residual norm 1.889822365046e-01 >>>>> 3 KSP Residual norm 1.290994448736e-01 >>>>> 4 KSP Residual norm 9.534625892456e-02 >>>>> 5 KSP Residual norm 8.082545620881e-16 >>>>> >>>>> 1 0.5 -7.85046e-16 1.17757e-15 >>>>> 0.5 1 0.5 1.7271e-15 >>>>> 0 0.5 1 0.5 >>>>> 0 0 0.5 1 >>>>> 0 0 0 0.5 >>>>> >>>>> singular values: 2.36264 0.409816 1.97794e-15 6.67632e-16 >>>>> >>>>> Linear solve converged due to CONVERGED_RTOL iterations 5 >>>>> >>>>> Where the lines above the singular values are the Hessenberg matrix >>>>> that I'm doing the SVD on. >>>>> >>>> >>>> First, write out all the SVD matrices you get and make sure that they >>>> reconstruct the input matrix (that >>>> you do not have something transposed somewhere). >>>> >>>> Matt >>>> >>>> >>>>> When I build the solution in terms of the leading two right singular >>>>> vectors (and subsequently the first two orthonormal basis vectors in >>>>> VECS_VV I get an error norm as: >>>>> Norm of error 3.16228, Iterations 5 >>>>> >>>>> My suspicion is that I'm creating the Hessenberg incorrectly, as I >>>>> would have thought that this problem should have more than two non-zero >>>>> leading singular values. 
>>>>> >>>>> Within my modified version of the GMRES build solution function >>>>> (attached) I'm creating this (and passing it to LAPACK as): >>>>> >>>>> nRows = gmres->it+1; >>>>> nCols = nRows-1; >>>>> >>>>> ierr = PetscBLASIntCast(nRows,&nRows_blas);CHKERRQ(ierr); >>>>> ierr = PetscBLASIntCast(nCols,&nCols_blas);CHKERRQ(ierr); >>>>> ierr = PetscBLASIntCast(5*nRows,&lwork);CHKERRQ(ierr); >>>>> ierr = PetscMalloc1(5*nRows,&work);CHKERRQ(ierr); >>>>> ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); >>>>> ierr = PetscMalloc1(nRows*nCols,&H);CHKERRQ(ierr); >>>>> for (jj = 0; jj < nRows; jj++) { >>>>> for (ii = 0; ii < nCols; ii++) { >>>>> R[jj*nCols+ii] = *HES(jj,ii); >>>>> } >>>>> } >>>>> // Duplicate the Hessenberg matrix as the one passed to the SVD >>>>> solver is destroyed >>>>> for (ii=0; ii>>>> >>>>> ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); >>>>> ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); >>>>> ierr = PetscMalloc1(nRows*nRows,&UT);CHKERRQ(ierr); >>>>> ierr = PetscMalloc1(nCols*nCols,&V);CHKERRQ(ierr); >>>>> ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); >>>>> ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); >>>>> ierr = PetscMalloc1(nCols,&y);CHKERRQ(ierr); >>>>> >>>>> // Perform an SVD on the Hessenberg matrix - Note: this call >>>>> destroys the input Hessenberg >>>>> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); >>>>> >>>>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows_blas,&nCols_blas,R,&nRows_blas,S,UT,&nRows_blas,V,&nCols_blas,work,&lwork,&lierr)); >>>>> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD >>>>> Lapack routine %d",(int)lierr); >>>>> ierr = PetscFPTrapPop();CHKERRQ(ierr); >>>>> >>>>> // Find the number of non-zero singular values >>>>> for(nnz=0; nnz>>>> if(fabs(S[nnz]) < 1.0e-8) break; >>>>> } >>>>> printf("number of nonzero singular values: %d\n",nnz); >>>>> >>>>> trans(nRows,nRows,UT,U); >>>>> trans(nCols,nCols,V,VT); >>>>> >>>>> // Compute p = ||r_0|| U^T e_1 >>>>> beta = gmres->res_beta; >>>>> for (ii=0; ii>>>> p[ii] = beta*UT[ii*nRows]; >>>>> } >>>>> p[nCols] = 0.0; >>>>> >>>>> // Original GMRES solution (\mu = 0) >>>>> for (ii=0; ii>>>> q[ii] = p[ii]/S[ii]; >>>>> } >>>>> >>>>> // Expand y in terms of the right singular vectors as y = V q >>>>> for (jj=0; jj>>>> y[jj] = 0.0; >>>>> for (ii=0; ii>>>> y[jj] += V[jj*nCols+ii]*q[ii]; // transpose of the transpose >>>>> } >>>>> } >>>>> >>>>> // Pass the orthnomalized Krylov vector weights back out >>>>> for (ii=0; ii>>>> nrs[ii] = y[ii]; >>>>> } >>>>> >>>>> I just wanted to check that this is the correct way to extract the >>>>> Hessenberg from the KSP_GMRES structure, and to pass it to LAPACK, and if >>>>> so, should I really be expecting only two non-zero singular values in >>>>> return for this problem? >>>>> >>>>> Cheers, Dave. >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From afrah.nacib at gmail.com Mon May 27 04:36:16 2019 From: afrah.nacib at gmail.com (Afrah Najib) Date: Mon, 27 May 2019 12:36:16 +0300 Subject: [petsc-users] Getting n largest and m smallest eigenvalues from SLEPc at the same time Message-ID: Hi, I have very large matrices and limited CPU hours and I want to get the first n largest and m smallest eigenvalues from these matrices using SLEPc without calculating all of them. I tried with this command : mpirun -n 2 ./ex4 -file bcsstk17.bin -eps_max_it 10000 -eps_ncv 24 -eps_tol 1.0e-8 -eps_type gd -eps_nev 6 -eps_largest_real -eps_smallest_real but it returns the first smallest eigenvalues only. Any idea, -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon May 27 04:43:17 2019 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 27 May 2019 11:43:17 +0200 Subject: [petsc-users] Getting n largest and m smallest eigenvalues from SLEPc at the same time In-Reply-To: References: Message-ID: The options -eps_largest_real -eps_smallest_real are exclusive. The second one overrides the first one, because both of them call http://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetWhichEigenpairs.html You should run the example twice, or create a program that calls EPSSolve() twice. Jose > El 27 may 2019, a las 11:36, Afrah Najib via petsc-users escribió: > > Hi, > > I have very large matrices and limited CPU hours and I want to get the first n largest and m smallest eigenvalues from these matrices using SLEPc without calculating all of them. > > I tried with this command : > > mpirun -n 2 ./ex4 -file bcsstk17.bin -eps_max_it 10000 -eps_ncv 24 -eps_tol 1.0e-8 -eps_type gd -eps_nev 6 -eps_largest_real -eps_smallest_real > > but it returns the first smallest eigenvalues only. > > Any idea, From knepley at gmail.com Mon May 27 05:50:25 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 May 2019 06:50:25 -0400 Subject: [petsc-users] Singlar values of the GMRES Hessenberg matrix In-Reply-To: References: Message-ID: On Mon, May 27, 2019 at 3:55 AM Dave Lee wrote: > Hi Matt and PETSc. > > Thanks again for the advice. > > So I think I know what my problem might be. Looking at the comments above > the function > KSPInitialResidual() > in > src/ksp/ksp/interface/itres.c > I see that the initial residual, as passed into VEC_VV(0) is the residual > of the *preconditioned* system (and that the original residual goes > temporarily to gmres->vecs[1]). > > So I'm wondering, is the Hessenberg, as derived via the *HES(row,col) macro > the Hessenberg for the original Krylov subspace, or the preconditioned > subspace? > Left-preconditioning changes the operator, so you get the Arnoldi subspace for the transformed operator, starting with a transformed rhs. Thanks, Matt > Secondly, do the vecs within the KSP_GMRES structure, as accessed via > VEC_VV() correspond to the (preconditioned) Krylov subspace or the > orthonormalized vectors that make up the matrix Q_k in the Arnoldi > iteration? This isn't clear to me, and I need to access the vectors in Q_k in > order to expand the corrected hookstep solution. > > Thanks again, Dave. > > On Sat, May 25, 2019 at 6:18 PM Dave Lee wrote: > >> Thanks Matt, this is where I'm adding in my hookstep code. >> >> Cheers, Dave. >> >> On Fri, May 24, 2019 at 10:49 PM Matthew Knepley >> wrote: >> >>> On Fri, May 24, 2019 at 8:38 AM Dave Lee wrote: >>> >>>> Thanks Matt, great suggestion. >>>> >>>> I did indeed find a transpose error this way. 
The SVD as reconstructed >>>> via U S V^T now matches the input Hessenberg matrix as derived via the >>>> *HES(row,col) macro, and all the singular values are non-zero. However >>>> the solution to example src/ksp/ksp/examples/tutorials/ex1.c as >>>> determined via the expansion over the singular vectors is still not >>>> correct. I suspect I'm doing something wrong with regards to the expansion >>>> over the vec array VEC_VV(), which I assume are the orthonormal >>>> vectors of the Q_k matrix in the Arnoldi iteration.... >>>> >>> >>> Here we are building the solution: >>> >>> >>> https://bitbucket.org/petsc/petsc/src/7c23e6aa64ffbff85a2457e1aa154ec3d7f238e3/src/ksp/ksp/impls/gmres/gmres.c#lines-331 >>> >>> There are some subtleties if you have a nonzero initial guess or a >>> preconditioner. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks again for your advice, I'll keep digging. >>>> >>>> Cheers, Dave. >>>> >>>> On Thu, May 23, 2019 at 8:20 PM Matthew Knepley >>>> wrote: >>>> >>>>> On Thu, May 23, 2019 at 5:09 AM Dave Lee via petsc-users < >>>>> petsc-users at mcs.anl.gov> wrote: >>>>> >>>>>> Hi PETSc, >>>>>> >>>>>> I'm trying to add a "hook step" to the SNES trust region solver (at >>>>>> the end of the function: KSPGMRESBuildSoln()) >>>>>> >>>>>> I'm testing this using the (linear) example: >>>>>> src/ksp/ksp/examples/tutorials/ex1.c >>>>>> as >>>>>> gdb --args ./test -snes_mf -snes_type newtontr -ksp_rtol 1.0e-12 >>>>>> -snes_stol 1.0e-12 -ksp_converged_reason -snes_converged_reason >>>>>> -ksp_monitor -snes_monitor >>>>>> (Ignore the SNES stuff, this is for when I test nonlinear examples). >>>>>> >>>>>> When I call the LAPACK SVD routine via PETSc as >>>>>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...)) >>>>>> I get the following singular values: >>>>>> >>>>>> 0 KSP Residual norm 7.071067811865e-01 >>>>>> 1 KSP Residual norm 3.162277660168e-01 >>>>>> 2 KSP Residual norm 1.889822365046e-01 >>>>>> 3 KSP Residual norm 1.290994448736e-01 >>>>>> 4 KSP Residual norm 9.534625892456e-02 >>>>>> 5 KSP Residual norm 8.082545620881e-16 >>>>>> >>>>>> 1 0.5 -7.85046e-16 1.17757e-15 >>>>>> 0.5 1 0.5 1.7271e-15 >>>>>> 0 0.5 1 0.5 >>>>>> 0 0 0.5 1 >>>>>> 0 0 0 0.5 >>>>>> >>>>>> singular values: 2.36264 0.409816 1.97794e-15 6.67632e-16 >>>>>> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 5 >>>>>> >>>>>> Where the lines above the singular values are the Hessenberg matrix >>>>>> that I'm doing the SVD on. >>>>>> >>>>> >>>>> First, write out all the SVD matrices you get and make sure that they >>>>> reconstruct the input matrix (that >>>>> you do not have something transposed somewhere). >>>>> >>>>> Matt >>>>> >>>>> >>>>>> When I build the solution in terms of the leading two right singular >>>>>> vectors (and subsequently the first two orthonormal basis vectors in >>>>>> VECS_VV I get an error norm as: >>>>>> Norm of error 3.16228, Iterations 5 >>>>>> >>>>>> My suspicion is that I'm creating the Hessenberg incorrectly, as I >>>>>> would have thought that this problem should have more than two non-zero >>>>>> leading singular values. 
>>>>>> >>>>>> Within my modified version of the GMRES build solution function >>>>>> (attached) I'm creating this (and passing it to LAPACK as): >>>>>> >>>>>> nRows = gmres->it+1; >>>>>> nCols = nRows-1; >>>>>> >>>>>> ierr = PetscBLASIntCast(nRows,&nRows_blas);CHKERRQ(ierr); >>>>>> ierr = PetscBLASIntCast(nCols,&nCols_blas);CHKERRQ(ierr); >>>>>> ierr = PetscBLASIntCast(5*nRows,&lwork);CHKERRQ(ierr); >>>>>> ierr = PetscMalloc1(5*nRows,&work);CHKERRQ(ierr); >>>>>> ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); >>>>>> ierr = PetscMalloc1(nRows*nCols,&H);CHKERRQ(ierr); >>>>>> for (jj = 0; jj < nRows; jj++) { >>>>>> for (ii = 0; ii < nCols; ii++) { >>>>>> R[jj*nCols+ii] = *HES(jj,ii); >>>>>> } >>>>>> } >>>>>> // Duplicate the Hessenberg matrix as the one passed to the SVD >>>>>> solver is destroyed >>>>>> for (ii=0; ii>>>>> >>>>>> ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); >>>>>> ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); >>>>>> ierr = PetscMalloc1(nRows*nRows,&UT);CHKERRQ(ierr); >>>>>> ierr = PetscMalloc1(nCols*nCols,&V);CHKERRQ(ierr); >>>>>> ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); >>>>>> ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); >>>>>> ierr = PetscMalloc1(nCols,&y);CHKERRQ(ierr); >>>>>> >>>>>> // Perform an SVD on the Hessenberg matrix - Note: this call >>>>>> destroys the input Hessenberg >>>>>> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); >>>>>> >>>>>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows_blas,&nCols_blas,R,&nRows_blas,S,UT,&nRows_blas,V,&nCols_blas,work,&lwork,&lierr)); >>>>>> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD >>>>>> Lapack routine %d",(int)lierr); >>>>>> ierr = PetscFPTrapPop();CHKERRQ(ierr); >>>>>> >>>>>> // Find the number of non-zero singular values >>>>>> for(nnz=0; nnz>>>>> if(fabs(S[nnz]) < 1.0e-8) break; >>>>>> } >>>>>> printf("number of nonzero singular values: %d\n",nnz); >>>>>> >>>>>> trans(nRows,nRows,UT,U); >>>>>> trans(nCols,nCols,V,VT); >>>>>> >>>>>> // Compute p = ||r_0|| U^T e_1 >>>>>> beta = gmres->res_beta; >>>>>> for (ii=0; ii>>>>> p[ii] = beta*UT[ii*nRows]; >>>>>> } >>>>>> p[nCols] = 0.0; >>>>>> >>>>>> // Original GMRES solution (\mu = 0) >>>>>> for (ii=0; ii>>>>> q[ii] = p[ii]/S[ii]; >>>>>> } >>>>>> >>>>>> // Expand y in terms of the right singular vectors as y = V q >>>>>> for (jj=0; jj>>>>> y[jj] = 0.0; >>>>>> for (ii=0; ii>>>>> y[jj] += V[jj*nCols+ii]*q[ii]; // transpose of the transpose >>>>>> } >>>>>> } >>>>>> >>>>>> // Pass the orthnomalized Krylov vector weights back out >>>>>> for (ii=0; ii>>>>> nrs[ii] = y[ii]; >>>>>> } >>>>>> >>>>>> I just wanted to check that this is the correct way to extract the >>>>>> Hessenberg from the KSP_GMRES structure, and to pass it to LAPACK, and if >>>>>> so, should I really be expecting only two non-zero singular values in >>>>>> return for this problem? >>>>>> >>>>>> Cheers, Dave. >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. 
>>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From davelee2804 at gmail.com Mon May 27 06:46:01 2019 From: davelee2804 at gmail.com (Dave Lee) Date: Mon, 27 May 2019 21:46:01 +1000 Subject: [petsc-users] Singlar values of the GMRES Hessenberg matrix In-Reply-To: References: Message-ID: Thanks Matt, so just to clarify: -- VEC_VV() contains the Krylov subspace vectors (left preconditioned if PC_LEFT) and not the orthonomalized vectors that make up Q_k? - if so, is it possible to obtain Q_k? -- HES(row,col) contains the entries of the Hessenberg matrix corresponding to the Arnoldi iteration for the preconditioned Krylov vectors (if PC_LEFT)? Cheers, Dave. On Mon, May 27, 2019 at 8:50 PM Matthew Knepley wrote: > On Mon, May 27, 2019 at 3:55 AM Dave Lee wrote: > >> Hi Matt and PETSc. >> >> Thanks again for the advice. >> >> So I think I know what my problem might be. Looking at the comments above >> the function >> KSPInitialResidual() >> in >> src/ksp/ksp/interface/itres.c >> I see that the initial residual, as passed into VEC_VV(0) is the >> residual of the *preconditioned* system (and that the original residual >> goes temporarily to gmres->vecs[1]). >> >> So I'm wondering, is the Hessenberg, as derived via the *HES(row,col) macro >> the Hessenberg for the original Krylov subspace, or the preconditioned >> subspace? >> > > Left-preconditioning changes the operator, so you get he Arnoldi subspace > for the transforned operator, starting with a transformed rhs. > > Thanks, > > Matt > > >> Secondly, do the vecs within the KSP_GMRES structure, as accessed via >> VEC_VV() correspond to the (preconditioned) Krylov subspace or the >> orthonormalized vectors that make up the matrix Q_k in the Arnoldi >> iteration? This isn't clear to me, and I need to access the vectors in >> Q_k in order to expand the corrected hookstep solution. >> >> Thanks again, Dave. >> >> On Sat, May 25, 2019 at 6:18 PM Dave Lee wrote: >> >>> Thanks Matt, this is where I'm adding in my hookstep code. >>> >>> Cheers, Dave. >>> >>> On Fri, May 24, 2019 at 10:49 PM Matthew Knepley >>> wrote: >>> >>>> On Fri, May 24, 2019 at 8:38 AM Dave Lee wrote: >>>> >>>>> Thanks Matt, great suggestion. >>>>> >>>>> I did indeed find a transpose error this way. The SVD as reconstructed >>>>> via U S V^T now matches the input Hessenberg matrix as derived via the >>>>> *HES(row,col) macro, and all the singular values are non-zero. >>>>> However the solution to example src/ksp/ksp/examples/tutorials/ex1.c as >>>>> determined via the expansion over the singular vectors is still not >>>>> correct. I suspect I'm doing something wrong with regards to the expansion >>>>> over the vec array VEC_VV(), which I assume are the orthonormal >>>>> vectors of the Q_k matrix in the Arnoldi iteration.... >>>>> >>>> >>>> Here we are building the solution: >>>> >>>> >>>> https://bitbucket.org/petsc/petsc/src/7c23e6aa64ffbff85a2457e1aa154ec3d7f238e3/src/ksp/ksp/impls/gmres/gmres.c#lines-331 >>>> >>>> There are some subtleties if you have a nonzero initial guess or a >>>> preconditioner. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks again for your advice, I'll keep digging. >>>>> >>>>> Cheers, Dave. 
>>>>> >>>>> On Thu, May 23, 2019 at 8:20 PM Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Thu, May 23, 2019 at 5:09 AM Dave Lee via petsc-users < >>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>> >>>>>>> Hi PETSc, >>>>>>> >>>>>>> I'm trying to add a "hook step" to the SNES trust region solver (at >>>>>>> the end of the function: KSPGMRESBuildSoln()) >>>>>>> >>>>>>> I'm testing this using the (linear) example: >>>>>>> src/ksp/ksp/examples/tutorials/ex1.c >>>>>>> as >>>>>>> gdb --args ./test -snes_mf -snes_type newtontr -ksp_rtol 1.0e-12 >>>>>>> -snes_stol 1.0e-12 -ksp_converged_reason -snes_converged_reason >>>>>>> -ksp_monitor -snes_monitor >>>>>>> (Ignore the SNES stuff, this is for when I test nonlinear examples). >>>>>>> >>>>>>> When I call the LAPACK SVD routine via PETSc as >>>>>>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...)) >>>>>>> I get the following singular values: >>>>>>> >>>>>>> 0 KSP Residual norm 7.071067811865e-01 >>>>>>> 1 KSP Residual norm 3.162277660168e-01 >>>>>>> 2 KSP Residual norm 1.889822365046e-01 >>>>>>> 3 KSP Residual norm 1.290994448736e-01 >>>>>>> 4 KSP Residual norm 9.534625892456e-02 >>>>>>> 5 KSP Residual norm 8.082545620881e-16 >>>>>>> >>>>>>> 1 0.5 -7.85046e-16 1.17757e-15 >>>>>>> 0.5 1 0.5 1.7271e-15 >>>>>>> 0 0.5 1 0.5 >>>>>>> 0 0 0.5 1 >>>>>>> 0 0 0 0.5 >>>>>>> >>>>>>> singular values: 2.36264 0.409816 1.97794e-15 6.67632e-16 >>>>>>> >>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 5 >>>>>>> >>>>>>> Where the lines above the singular values are the Hessenberg matrix >>>>>>> that I'm doing the SVD on. >>>>>>> >>>>>> >>>>>> First, write out all the SVD matrices you get and make sure that they >>>>>> reconstruct the input matrix (that >>>>>> you do not have something transposed somewhere). >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> When I build the solution in terms of the leading two right singular >>>>>>> vectors (and subsequently the first two orthonormal basis vectors in >>>>>>> VECS_VV I get an error norm as: >>>>>>> Norm of error 3.16228, Iterations 5 >>>>>>> >>>>>>> My suspicion is that I'm creating the Hessenberg incorrectly, as I >>>>>>> would have thought that this problem should have more than two non-zero >>>>>>> leading singular values. 
>>>>>>> >>>>>>> Within my modified version of the GMRES build solution function >>>>>>> (attached) I'm creating this (and passing it to LAPACK as): >>>>>>> >>>>>>> nRows = gmres->it+1; >>>>>>> nCols = nRows-1; >>>>>>> >>>>>>> ierr = PetscBLASIntCast(nRows,&nRows_blas);CHKERRQ(ierr); >>>>>>> ierr = PetscBLASIntCast(nCols,&nCols_blas);CHKERRQ(ierr); >>>>>>> ierr = PetscBLASIntCast(5*nRows,&lwork);CHKERRQ(ierr); >>>>>>> ierr = PetscMalloc1(5*nRows,&work);CHKERRQ(ierr); >>>>>>> ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); >>>>>>> ierr = PetscMalloc1(nRows*nCols,&H);CHKERRQ(ierr); >>>>>>> for (jj = 0; jj < nRows; jj++) { >>>>>>> for (ii = 0; ii < nCols; ii++) { >>>>>>> R[jj*nCols+ii] = *HES(jj,ii); >>>>>>> } >>>>>>> } >>>>>>> // Duplicate the Hessenberg matrix as the one passed to the SVD >>>>>>> solver is destroyed >>>>>>> for (ii=0; ii>>>>>> >>>>>>> ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); >>>>>>> ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); >>>>>>> ierr = PetscMalloc1(nRows*nRows,&UT);CHKERRQ(ierr); >>>>>>> ierr = PetscMalloc1(nCols*nCols,&V);CHKERRQ(ierr); >>>>>>> ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); >>>>>>> ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); >>>>>>> ierr = PetscMalloc1(nCols,&y);CHKERRQ(ierr); >>>>>>> >>>>>>> // Perform an SVD on the Hessenberg matrix - Note: this call >>>>>>> destroys the input Hessenberg >>>>>>> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); >>>>>>> >>>>>>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows_blas,&nCols_blas,R,&nRows_blas,S,UT,&nRows_blas,V,&nCols_blas,work,&lwork,&lierr)); >>>>>>> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD >>>>>>> Lapack routine %d",(int)lierr); >>>>>>> ierr = PetscFPTrapPop();CHKERRQ(ierr); >>>>>>> >>>>>>> // Find the number of non-zero singular values >>>>>>> for(nnz=0; nnz>>>>>> if(fabs(S[nnz]) < 1.0e-8) break; >>>>>>> } >>>>>>> printf("number of nonzero singular values: %d\n",nnz); >>>>>>> >>>>>>> trans(nRows,nRows,UT,U); >>>>>>> trans(nCols,nCols,V,VT); >>>>>>> >>>>>>> // Compute p = ||r_0|| U^T e_1 >>>>>>> beta = gmres->res_beta; >>>>>>> for (ii=0; ii>>>>>> p[ii] = beta*UT[ii*nRows]; >>>>>>> } >>>>>>> p[nCols] = 0.0; >>>>>>> >>>>>>> // Original GMRES solution (\mu = 0) >>>>>>> for (ii=0; ii>>>>>> q[ii] = p[ii]/S[ii]; >>>>>>> } >>>>>>> >>>>>>> // Expand y in terms of the right singular vectors as y = V q >>>>>>> for (jj=0; jj>>>>>> y[jj] = 0.0; >>>>>>> for (ii=0; ii>>>>>> y[jj] += V[jj*nCols+ii]*q[ii]; // transpose of the transpose >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> // Pass the orthnomalized Krylov vector weights back out >>>>>>> for (ii=0; ii>>>>>> nrs[ii] = y[ii]; >>>>>>> } >>>>>>> >>>>>>> I just wanted to check that this is the correct way to extract the >>>>>>> Hessenberg from the KSP_GMRES structure, and to pass it to LAPACK, and if >>>>>>> so, should I really be expecting only two non-zero singular values in >>>>>>> return for this problem? >>>>>>> >>>>>>> Cheers, Dave. >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. 
>>>> -- Norbert Wiener >>>> >>>> https://www.cse.buffalo.edu/~knepley/ >>>> >>>> >>> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Mon May 27 06:52:31 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 May 2019 07:52:31 -0400 Subject: [petsc-users] Singlar values of the GMRES Hessenberg matrix In-Reply-To: References: Message-ID: On Mon, May 27, 2019 at 7:46 AM Dave Lee wrote: > Thanks Matt, so just to clarify: > > -- VEC_VV() contains the Krylov subspace vectors (left preconditioned if > PC_LEFT) and not the orthonomalized vectors that make up Q_k? > - if so, is it possible to obtain Q_k? > > -- HES(row,col) contains the entries of the Hessenberg matrix > corresponding to the Arnoldi iteration for the preconditioned Krylov > vectors (if PC_LEFT)? > Everything in GMRES refers to the preconditioned operator. That is how preconditioned GMRES works. If you need the unpreconditioned space, you would have to use FGMRES, since it pays the storage overhead. Thanks, Matt > Cheers, Dave. > > On Mon, May 27, 2019 at 8:50 PM Matthew Knepley wrote: > >> On Mon, May 27, 2019 at 3:55 AM Dave Lee wrote: >> >>> Hi Matt and PETSc. >>> >>> Thanks again for the advice. >>> >>> So I think I know what my problem might be. Looking at the comments >>> above the function >>> KSPInitialResidual() >>> in >>> src/ksp/ksp/interface/itres.c >>> I see that the initial residual, as passed into VEC_VV(0) is the >>> residual of the *preconditioned* system (and that the original residual >>> goes temporarily to gmres->vecs[1]). >>> >>> So I'm wondering, is the Hessenberg, as derived via the *HES(row,col) macro >>> the Hessenberg for the original Krylov subspace, or the preconditioned >>> subspace? >>> >> >> Left-preconditioning changes the operator, so you get he Arnoldi subspace >> for the transforned operator, starting with a transformed rhs. >> >> Thanks, >> >> Matt >> >> >>> Secondly, do the vecs within the KSP_GMRES structure, as accessed via >>> VEC_VV() correspond to the (preconditioned) Krylov subspace or the >>> orthonormalized vectors that make up the matrix Q_k in the Arnoldi >>> iteration? This isn't clear to me, and I need to access the vectors in >>> Q_k in order to expand the corrected hookstep solution. >>> >>> Thanks again, Dave. >>> >>> On Sat, May 25, 2019 at 6:18 PM Dave Lee wrote: >>> >>>> Thanks Matt, this is where I'm adding in my hookstep code. >>>> >>>> Cheers, Dave. >>>> >>>> On Fri, May 24, 2019 at 10:49 PM Matthew Knepley >>>> wrote: >>>> >>>>> On Fri, May 24, 2019 at 8:38 AM Dave Lee >>>>> wrote: >>>>> >>>>>> Thanks Matt, great suggestion. >>>>>> >>>>>> I did indeed find a transpose error this way. The SVD as >>>>>> reconstructed via U S V^T now matches the input Hessenberg matrix as >>>>>> derived via the *HES(row,col) macro, and all the singular values are >>>>>> non-zero. However the solution to example src/ksp/ksp/examples/tutorials/ex1.c >>>>>> as determined via the expansion over the singular vectors is still >>>>>> not correct. I suspect I'm doing something wrong with regards to the >>>>>> expansion over the vec array VEC_VV(), which I assume are the >>>>>> orthonormal vectors of the Q_k matrix in the Arnoldi iteration.... 
>>>>>> >>>>> >>>>> Here we are building the solution: >>>>> >>>>> >>>>> https://bitbucket.org/petsc/petsc/src/7c23e6aa64ffbff85a2457e1aa154ec3d7f238e3/src/ksp/ksp/impls/gmres/gmres.c#lines-331 >>>>> >>>>> There are some subtleties if you have a nonzero initial guess or a >>>>> preconditioner. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thanks again for your advice, I'll keep digging. >>>>>> >>>>>> Cheers, Dave. >>>>>> >>>>>> On Thu, May 23, 2019 at 8:20 PM Matthew Knepley >>>>>> wrote: >>>>>> >>>>>>> On Thu, May 23, 2019 at 5:09 AM Dave Lee via petsc-users < >>>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>>> >>>>>>>> Hi PETSc, >>>>>>>> >>>>>>>> I'm trying to add a "hook step" to the SNES trust region solver (at >>>>>>>> the end of the function: KSPGMRESBuildSoln()) >>>>>>>> >>>>>>>> I'm testing this using the (linear) example: >>>>>>>> src/ksp/ksp/examples/tutorials/ex1.c >>>>>>>> as >>>>>>>> gdb --args ./test -snes_mf -snes_type newtontr -ksp_rtol 1.0e-12 >>>>>>>> -snes_stol 1.0e-12 -ksp_converged_reason -snes_converged_reason >>>>>>>> -ksp_monitor -snes_monitor >>>>>>>> (Ignore the SNES stuff, this is for when I test nonlinear examples). >>>>>>>> >>>>>>>> When I call the LAPACK SVD routine via PETSc as >>>>>>>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...)) >>>>>>>> I get the following singular values: >>>>>>>> >>>>>>>> 0 KSP Residual norm 7.071067811865e-01 >>>>>>>> 1 KSP Residual norm 3.162277660168e-01 >>>>>>>> 2 KSP Residual norm 1.889822365046e-01 >>>>>>>> 3 KSP Residual norm 1.290994448736e-01 >>>>>>>> 4 KSP Residual norm 9.534625892456e-02 >>>>>>>> 5 KSP Residual norm 8.082545620881e-16 >>>>>>>> >>>>>>>> 1 0.5 -7.85046e-16 1.17757e-15 >>>>>>>> 0.5 1 0.5 1.7271e-15 >>>>>>>> 0 0.5 1 0.5 >>>>>>>> 0 0 0.5 1 >>>>>>>> 0 0 0 0.5 >>>>>>>> >>>>>>>> singular values: 2.36264 0.409816 1.97794e-15 6.67632e-16 >>>>>>>> >>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 5 >>>>>>>> >>>>>>>> Where the lines above the singular values are the Hessenberg matrix >>>>>>>> that I'm doing the SVD on. >>>>>>>> >>>>>>> >>>>>>> First, write out all the SVD matrices you get and make sure that >>>>>>> they reconstruct the input matrix (that >>>>>>> you do not have something transposed somewhere). >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> When I build the solution in terms of the leading two right >>>>>>>> singular vectors (and subsequently the first two orthonormal basis vectors >>>>>>>> in VECS_VV I get an error norm as: >>>>>>>> Norm of error 3.16228, Iterations 5 >>>>>>>> >>>>>>>> My suspicion is that I'm creating the Hessenberg incorrectly, as I >>>>>>>> would have thought that this problem should have more than two non-zero >>>>>>>> leading singular values. 
>>>>>>>> >>>>>>>> Within my modified version of the GMRES build solution function >>>>>>>> (attached) I'm creating this (and passing it to LAPACK as): >>>>>>>> >>>>>>>> nRows = gmres->it+1; >>>>>>>> nCols = nRows-1; >>>>>>>> >>>>>>>> ierr = PetscBLASIntCast(nRows,&nRows_blas);CHKERRQ(ierr); >>>>>>>> ierr = PetscBLASIntCast(nCols,&nCols_blas);CHKERRQ(ierr); >>>>>>>> ierr = PetscBLASIntCast(5*nRows,&lwork);CHKERRQ(ierr); >>>>>>>> ierr = PetscMalloc1(5*nRows,&work);CHKERRQ(ierr); >>>>>>>> ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); >>>>>>>> ierr = PetscMalloc1(nRows*nCols,&H);CHKERRQ(ierr); >>>>>>>> for (jj = 0; jj < nRows; jj++) { >>>>>>>> for (ii = 0; ii < nCols; ii++) { >>>>>>>> R[jj*nCols+ii] = *HES(jj,ii); >>>>>>>> } >>>>>>>> } >>>>>>>> // Duplicate the Hessenberg matrix as the one passed to the SVD >>>>>>>> solver is destroyed >>>>>>>> for (ii=0; ii>>>>>>> >>>>>>>> ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); >>>>>>>> ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); >>>>>>>> ierr = PetscMalloc1(nRows*nRows,&UT);CHKERRQ(ierr); >>>>>>>> ierr = PetscMalloc1(nCols*nCols,&V);CHKERRQ(ierr); >>>>>>>> ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); >>>>>>>> ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); >>>>>>>> ierr = PetscMalloc1(nCols,&y);CHKERRQ(ierr); >>>>>>>> >>>>>>>> // Perform an SVD on the Hessenberg matrix - Note: this call >>>>>>>> destroys the input Hessenberg >>>>>>>> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); >>>>>>>> >>>>>>>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows_blas,&nCols_blas,R,&nRows_blas,S,UT,&nRows_blas,V,&nCols_blas,work,&lwork,&lierr)); >>>>>>>> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in SVD >>>>>>>> Lapack routine %d",(int)lierr); >>>>>>>> ierr = PetscFPTrapPop();CHKERRQ(ierr); >>>>>>>> >>>>>>>> // Find the number of non-zero singular values >>>>>>>> for(nnz=0; nnz>>>>>>> if(fabs(S[nnz]) < 1.0e-8) break; >>>>>>>> } >>>>>>>> printf("number of nonzero singular values: %d\n",nnz); >>>>>>>> >>>>>>>> trans(nRows,nRows,UT,U); >>>>>>>> trans(nCols,nCols,V,VT); >>>>>>>> >>>>>>>> // Compute p = ||r_0|| U^T e_1 >>>>>>>> beta = gmres->res_beta; >>>>>>>> for (ii=0; ii>>>>>>> p[ii] = beta*UT[ii*nRows]; >>>>>>>> } >>>>>>>> p[nCols] = 0.0; >>>>>>>> >>>>>>>> // Original GMRES solution (\mu = 0) >>>>>>>> for (ii=0; ii>>>>>>> q[ii] = p[ii]/S[ii]; >>>>>>>> } >>>>>>>> >>>>>>>> // Expand y in terms of the right singular vectors as y = V q >>>>>>>> for (jj=0; jj>>>>>>> y[jj] = 0.0; >>>>>>>> for (ii=0; ii>>>>>>> y[jj] += V[jj*nCols+ii]*q[ii]; // transpose of the transpose >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> // Pass the orthnomalized Krylov vector weights back out >>>>>>>> for (ii=0; ii>>>>>>> nrs[ii] = y[ii]; >>>>>>>> } >>>>>>>> >>>>>>>> I just wanted to check that this is the correct way to extract the >>>>>>>> Hessenberg from the KSP_GMRES structure, and to pass it to LAPACK, and if >>>>>>>> so, should I really be expecting only two non-zero singular values in >>>>>>>> return for this problem? >>>>>>>> >>>>>>>> Cheers, Dave. >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>> experiments lead. 
>>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> >>>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From davelee2804 at gmail.com Mon May 27 06:57:02 2019 From: davelee2804 at gmail.com (Dave Lee) Date: Mon, 27 May 2019 21:57:02 +1000 Subject: [petsc-users] Singlar values of the GMRES Hessenberg matrix In-Reply-To: References: Message-ID: Thanks for the tip Matt, I'll look into it. Cheers, Dave. On Mon, May 27, 2019 at 9:53 PM Matthew Knepley wrote: > On Mon, May 27, 2019 at 7:46 AM Dave Lee wrote: > >> Thanks Matt, so just to clarify: >> >> -- VEC_VV() contains the Krylov subspace vectors (left preconditioned if >> PC_LEFT) and not the orthonomalized vectors that make up Q_k? >> - if so, is it possible to obtain Q_k? >> >> -- HES(row,col) contains the entries of the Hessenberg matrix >> corresponding to the Arnoldi iteration for the preconditioned Krylov >> vectors (if PC_LEFT)? >> > > Everything in GMRES refers to the preconditioned operator. That is how > preconditioned GMRES works. > > If you need the unpreconditioned space, you would have to use FGMRES, > since it pays the storage overhead. > > Thanks, > > Matt > > >> Cheers, Dave. >> >> On Mon, May 27, 2019 at 8:50 PM Matthew Knepley >> wrote: >> >>> On Mon, May 27, 2019 at 3:55 AM Dave Lee wrote: >>> >>>> Hi Matt and PETSc. >>>> >>>> Thanks again for the advice. >>>> >>>> So I think I know what my problem might be. Looking at the comments >>>> above the function >>>> KSPInitialResidual() >>>> in >>>> src/ksp/ksp/interface/itres.c >>>> I see that the initial residual, as passed into VEC_VV(0) is the >>>> residual of the *preconditioned* system (and that the original >>>> residual goes temporarily to gmres->vecs[1]). >>>> >>>> So I'm wondering, is the Hessenberg, as derived via the *HES(row,col) macro >>>> the Hessenberg for the original Krylov subspace, or the preconditioned >>>> subspace? >>>> >>> >>> Left-preconditioning changes the operator, so you get he Arnoldi >>> subspace for the transforned operator, starting with a transformed rhs. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Secondly, do the vecs within the KSP_GMRES structure, as accessed via >>>> VEC_VV() correspond to the (preconditioned) Krylov subspace or the >>>> orthonormalized vectors that make up the matrix Q_k in the Arnoldi >>>> iteration? This isn't clear to me, and I need to access the vectors in >>>> Q_k in order to expand the corrected hookstep solution. >>>> >>>> Thanks again, Dave. >>>> >>>> On Sat, May 25, 2019 at 6:18 PM Dave Lee wrote: >>>> >>>>> Thanks Matt, this is where I'm adding in my hookstep code. >>>>> >>>>> Cheers, Dave. 
>>>>> >>>>> On Fri, May 24, 2019 at 10:49 PM Matthew Knepley >>>>> wrote: >>>>> >>>>>> On Fri, May 24, 2019 at 8:38 AM Dave Lee >>>>>> wrote: >>>>>> >>>>>>> Thanks Matt, great suggestion. >>>>>>> >>>>>>> I did indeed find a transpose error this way. The SVD as >>>>>>> reconstructed via U S V^T now matches the input Hessenberg matrix as >>>>>>> derived via the *HES(row,col) macro, and all the singular values >>>>>>> are non-zero. However the solution to example src/ksp/ksp/examples/tutorials/ex1.c >>>>>>> as determined via the expansion over the singular vectors is still >>>>>>> not correct. I suspect I'm doing something wrong with regards to the >>>>>>> expansion over the vec array VEC_VV(), which I assume are the >>>>>>> orthonormal vectors of the Q_k matrix in the Arnoldi iteration.... >>>>>>> >>>>>> >>>>>> Here we are building the solution: >>>>>> >>>>>> >>>>>> https://bitbucket.org/petsc/petsc/src/7c23e6aa64ffbff85a2457e1aa154ec3d7f238e3/src/ksp/ksp/impls/gmres/gmres.c#lines-331 >>>>>> >>>>>> There are some subtleties if you have a nonzero initial guess or a >>>>>> preconditioner. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> Thanks again for your advice, I'll keep digging. >>>>>>> >>>>>>> Cheers, Dave. >>>>>>> >>>>>>> On Thu, May 23, 2019 at 8:20 PM Matthew Knepley >>>>>>> wrote: >>>>>>> >>>>>>>> On Thu, May 23, 2019 at 5:09 AM Dave Lee via petsc-users < >>>>>>>> petsc-users at mcs.anl.gov> wrote: >>>>>>>> >>>>>>>>> Hi PETSc, >>>>>>>>> >>>>>>>>> I'm trying to add a "hook step" to the SNES trust region solver >>>>>>>>> (at the end of the function: KSPGMRESBuildSoln()) >>>>>>>>> >>>>>>>>> I'm testing this using the (linear) example: >>>>>>>>> src/ksp/ksp/examples/tutorials/ex1.c >>>>>>>>> as >>>>>>>>> gdb --args ./test -snes_mf -snes_type newtontr -ksp_rtol 1.0e-12 >>>>>>>>> -snes_stol 1.0e-12 -ksp_converged_reason -snes_converged_reason >>>>>>>>> -ksp_monitor -snes_monitor >>>>>>>>> (Ignore the SNES stuff, this is for when I test nonlinear >>>>>>>>> examples). >>>>>>>>> >>>>>>>>> When I call the LAPACK SVD routine via PETSc as >>>>>>>>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_(...)) >>>>>>>>> I get the following singular values: >>>>>>>>> >>>>>>>>> 0 KSP Residual norm 7.071067811865e-01 >>>>>>>>> 1 KSP Residual norm 3.162277660168e-01 >>>>>>>>> 2 KSP Residual norm 1.889822365046e-01 >>>>>>>>> 3 KSP Residual norm 1.290994448736e-01 >>>>>>>>> 4 KSP Residual norm 9.534625892456e-02 >>>>>>>>> 5 KSP Residual norm 8.082545620881e-16 >>>>>>>>> >>>>>>>>> 1 0.5 -7.85046e-16 1.17757e-15 >>>>>>>>> 0.5 1 0.5 1.7271e-15 >>>>>>>>> 0 0.5 1 0.5 >>>>>>>>> 0 0 0.5 1 >>>>>>>>> 0 0 0 0.5 >>>>>>>>> >>>>>>>>> singular values: 2.36264 0.409816 1.97794e-15 6.67632e-16 >>>>>>>>> >>>>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 5 >>>>>>>>> >>>>>>>>> Where the lines above the singular values are the Hessenberg >>>>>>>>> matrix that I'm doing the SVD on. >>>>>>>>> >>>>>>>> >>>>>>>> First, write out all the SVD matrices you get and make sure that >>>>>>>> they reconstruct the input matrix (that >>>>>>>> you do not have something transposed somewhere). 
>>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> When I build the solution in terms of the leading two right >>>>>>>>> singular vectors (and subsequently the first two orthonormal basis vectors >>>>>>>>> in VECS_VV I get an error norm as: >>>>>>>>> Norm of error 3.16228, Iterations 5 >>>>>>>>> >>>>>>>>> My suspicion is that I'm creating the Hessenberg incorrectly, as I >>>>>>>>> would have thought that this problem should have more than two non-zero >>>>>>>>> leading singular values. >>>>>>>>> >>>>>>>>> Within my modified version of the GMRES build solution function >>>>>>>>> (attached) I'm creating this (and passing it to LAPACK as): >>>>>>>>> >>>>>>>>> nRows = gmres->it+1; >>>>>>>>> nCols = nRows-1; >>>>>>>>> >>>>>>>>> ierr = PetscBLASIntCast(nRows,&nRows_blas);CHKERRQ(ierr); >>>>>>>>> ierr = PetscBLASIntCast(nCols,&nCols_blas);CHKERRQ(ierr); >>>>>>>>> ierr = PetscBLASIntCast(5*nRows,&lwork);CHKERRQ(ierr); >>>>>>>>> ierr = PetscMalloc1(5*nRows,&work);CHKERRQ(ierr); >>>>>>>>> ierr = PetscMalloc1(nRows*nCols,&R);CHKERRQ(ierr); >>>>>>>>> ierr = PetscMalloc1(nRows*nCols,&H);CHKERRQ(ierr); >>>>>>>>> for (jj = 0; jj < nRows; jj++) { >>>>>>>>> for (ii = 0; ii < nCols; ii++) { >>>>>>>>> R[jj*nCols+ii] = *HES(jj,ii); >>>>>>>>> } >>>>>>>>> } >>>>>>>>> // Duplicate the Hessenberg matrix as the one passed to the >>>>>>>>> SVD solver is destroyed >>>>>>>>> for (ii=0; ii>>>>>>>> >>>>>>>>> ierr = PetscMalloc1(nRows*nRows,&U);CHKERRQ(ierr); >>>>>>>>> ierr = PetscMalloc1(nCols*nCols,&VT);CHKERRQ(ierr); >>>>>>>>> ierr = PetscMalloc1(nRows*nRows,&UT);CHKERRQ(ierr); >>>>>>>>> ierr = PetscMalloc1(nCols*nCols,&V);CHKERRQ(ierr); >>>>>>>>> ierr = PetscMalloc1(nRows,&p);CHKERRQ(ierr); >>>>>>>>> ierr = PetscMalloc1(nCols,&q);CHKERRQ(ierr); >>>>>>>>> ierr = PetscMalloc1(nCols,&y);CHKERRQ(ierr); >>>>>>>>> >>>>>>>>> // Perform an SVD on the Hessenberg matrix - Note: this call >>>>>>>>> destroys the input Hessenberg >>>>>>>>> ierr = PetscFPTrapPush(PETSC_FP_TRAP_OFF);CHKERRQ(ierr); >>>>>>>>> >>>>>>>>> PetscStackCallBLAS("LAPACKgesvd",LAPACKgesvd_("A","A",&nRows_blas,&nCols_blas,R,&nRows_blas,S,UT,&nRows_blas,V,&nCols_blas,work,&lwork,&lierr)); >>>>>>>>> if (lierr) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error in >>>>>>>>> SVD Lapack routine %d",(int)lierr); >>>>>>>>> ierr = PetscFPTrapPop();CHKERRQ(ierr); >>>>>>>>> >>>>>>>>> // Find the number of non-zero singular values >>>>>>>>> for(nnz=0; nnz>>>>>>>> if(fabs(S[nnz]) < 1.0e-8) break; >>>>>>>>> } >>>>>>>>> printf("number of nonzero singular values: %d\n",nnz); >>>>>>>>> >>>>>>>>> trans(nRows,nRows,UT,U); >>>>>>>>> trans(nCols,nCols,V,VT); >>>>>>>>> >>>>>>>>> // Compute p = ||r_0|| U^T e_1 >>>>>>>>> beta = gmres->res_beta; >>>>>>>>> for (ii=0; ii>>>>>>>> p[ii] = beta*UT[ii*nRows]; >>>>>>>>> } >>>>>>>>> p[nCols] = 0.0; >>>>>>>>> >>>>>>>>> // Original GMRES solution (\mu = 0) >>>>>>>>> for (ii=0; ii>>>>>>>> q[ii] = p[ii]/S[ii]; >>>>>>>>> } >>>>>>>>> >>>>>>>>> // Expand y in terms of the right singular vectors as y = V q >>>>>>>>> for (jj=0; jj>>>>>>>> y[jj] = 0.0; >>>>>>>>> for (ii=0; ii>>>>>>>> y[jj] += V[jj*nCols+ii]*q[ii]; // transpose of the >>>>>>>>> transpose >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>> // Pass the orthnomalized Krylov vector weights back out >>>>>>>>> for (ii=0; ii>>>>>>>> nrs[ii] = y[ii]; >>>>>>>>> } >>>>>>>>> >>>>>>>>> I just wanted to check that this is the correct way to extract the >>>>>>>>> Hessenberg from the KSP_GMRES structure, and to pass it to LAPACK, and if >>>>>>>>> so, should I really be expecting only two non-zero singular values 
in >>>>>>>>> return for this problem? >>>>>>>>> >>>>>>>>> Cheers, Dave. >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>> >>>>>> >>>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Mon May 27 18:26:14 2019 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Tue, 28 May 2019 11:26:14 +1200 Subject: [petsc-users] parallel dual porosity Message-ID: hi A couple of years back I was asking questions here about implementing "dual porosity" finite volume methods via PETSc (in which flow in fractured media is represented by adding extra "matrix" cells nested inside the original mesh cells). At the time I was asking about how to solve the resulting linear equations more efficiently (I still haven't worked on that part of it yet, so at present it's still just using a naive linear solve which doesn't take advantage of the particular sparsity pattern), and about how to add the extra cells into the DMPlex mesh, which I figured out how to do. It is working OK except that strong scaling performance is not very good, if dual porosity is applied over only part of the mesh. I think the reason is that I read the mesh in and distribute it, then add the dual porosity cells in parallel on each process. So some processes can end up with more cells than others, in which case the load balancing is bad. I'm considering trying to change it so that I add the dual porosity cells to the DMPlex in serial, before distribution, to regain decent load balancing. To do that, I'd also need to compute the cell centroids in serial (as they are often used to identify which cells should have dual porosity applied), using DMPlexComputeGeometryFVM(). The geometry vectors would then have to be distributed later, I guess using something like DMPlexDistributeField(). Should I expect a significant performance hit from calling DMPlexComputeGeometryFVM() on the serial mesh compared with doing it (as now) on the distributed mesh? It will increase the serial fraction of the code but as it's only done once at the start I'm hoping the benefits will outweigh the costs. 
- Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email:a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 From knepley at gmail.com Mon May 27 18:32:38 2019 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 27 May 2019 19:32:38 -0400 Subject: [petsc-users] parallel dual porosity In-Reply-To: References: Message-ID: On Mon, May 27, 2019 at 7:26 PM Adrian Croucher via petsc-users < petsc-users at mcs.anl.gov> wrote: > hi > > A couple of years back I was asking questions here about implementing > "dual porosity" finite volume methods via PETSc (in which flow in > fractured media is represented by adding extra "matrix" cells nested > inside the original mesh cells). > > At the time I was asking about how to solve the resulting linear > equations more efficiently (I still haven't worked on that part of it > yet, so at present it's still just using a naive linear solve which > doesn't take advantage of the particular sparsity pattern), and about > how to add the extra cells into the DMPlex mesh, which I figured out how > to do. > > It is working OK except that strong scaling performance is not very > good, if dual porosity is applied over only part of the mesh. I think > the reason is that I read the mesh in and distribute it, then add the > dual porosity cells in parallel on each process. So some processes can > end up with more cells than others, in which case the load balancing is > bad. > > I'm considering trying to change it so that I add the dual porosity > cells to the DMPlex in serial, before distribution, to regain decent > load balancing. > I would not do that. It should be much easier, and better from a workflow standpoint, to just redistribute in parallel. We now have several test examples that redistribute in parallel, for example https://bitbucket.org/petsc/petsc/src/cd762eb66180d8d1fcc3950bd19a3c1b423f4f20/src/dm/impls/plex/examples/tests/ex1.c#lines-486 Let us know if you have problems. Thanks, Matt > To do that, I'd also need to compute the cell centroids in serial (as > they are often used to identify which cells should have dual porosity > applied), using DMPlexComputeGeometryFVM(). The geometry vectors would > then have to be distributed later, I guess using something like > DMPlexDistributeField(). > > Should I expect a significant performance hit from calling > DMPlexComputeGeometryFVM() on the serial mesh compared with doing it (as > now) on the distributed mesh? It will increase the serial fraction of > the code but as it's only done once at the start I'm hoping the benefits > will outweigh the costs. > > - Adrian > > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email:a.croucher at auckland.ac.nz > tel: +64 (0)9 923 4611 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Mon May 27 18:53:17 2019 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Tue, 28 May 2019 11:53:17 +1200 Subject: [petsc-users] parallel dual porosity In-Reply-To: References: Message-ID: Oh, so DMPlexDistribute() can redistribute an already-distributed mesh? I didn't know that- had assumed it was only for distributing a serial mesh. 
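(For reference, a minimal sketch of that parallel redistribution, with illustrative variable names and abbreviated error handling; here dm is assumed to be the already-distributed DMPlex carrying the extra dual-porosity cells.)

  /* Sketch: redistribute an existing parallel DMPlex.  DMPlexDistribute may
     return NULL (for example on a single process), in which case the
     original DM is simply kept. */
  DM dmRedist = NULL;
  ierr = DMPlexDistribute(dm, 0, NULL, &dmRedist);CHKERRQ(ierr);  /* 0 = no overlap */
  if (dmRedist) {
    ierr = DMDestroy(&dm);CHKERRQ(ierr);
    dm   = dmRedist;
  }

Because the redistribution happens after the extra cells have been added, the partitioner sees the true cell counts on each rank and can rebalance accordingly.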
If so, that sounds like a much better idea. I'll check it out. Thanks! - Adrian On 28/05/19 11:32 AM, Matthew Knepley wrote: > > I would not do that. It should be much easier, and better from a > workflow standpoint, > to just redistribute in parallel. We now have several test examples > that redistribute > in parallel, for example > > https://bitbucket.org/petsc/petsc/src/cd762eb66180d8d1fcc3950bd19a3c1b423f4f20/src/dm/impls/plex/examples/tests/ex1.c#lines-486 > > Let us know if you have problems. > > ? Thanks, > > ? ? ?Matt -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 -------------- next part -------------- An HTML attachment was scrubbed... URL: From janicvermaak at gmail.com Mon May 27 23:33:52 2019 From: janicvermaak at gmail.com (Jan Izak Cornelius Vermaak) Date: Mon, 27 May 2019 23:33:52 -0500 Subject: [petsc-users] Matrix free GMRES seems to ignore my initial guess Message-ID: Hi all, So I am faced with this debacle. I have a neutron transport solver with a sweep code that can compute the action of the matrix on a vector. I use a matrix shell to set up the action of the matrix. The method works but only if I can get the solution converged before GMRES restarts. It gets the right answer. Now my first problem is (and I only saw this when I hit the first restart) is that it looks like the solver completely resets after the GMRES-restart. Below is an iteration log with restart interval set to 10. At first I thought it wasn't updating the initial guess but it became clear that it initial guess always had no effect. I do set KSPSetInitialGuessNonZero but it has no effect. Is the matrix-free business defaulting my initial guess to zero everytime? What can I do to actually supply an initial guess? I've used PETSc for diffusion many times and the initial guess always works, just not now. [0] Computing b [0] Iteration 0 Residual 169.302 [0] Iteration 1 Residual 47.582 [0] Iteration 2 Residual 13.2614 [0] Iteration 3 Residual 4.46795 [0] Iteration 4 Residual 1.03038 [0] Iteration 5 Residual 0.246807 [0] Iteration 6 Residual 0.0828341 [0] Iteration 7 Residual 0.0410627 [0] Iteration 8 Residual 0.0243749 [0] Iteration 9 Residual 0.0136067 [0] Iteration 10 Residual 169.302 [0] Iteration 11 Residual 47.582 [0] Iteration 12 Residual 13.2614 [0] Iteration 13 Residual 4.46795 [0] Iteration 14 Residual 1.03038 [0] Iteration 15 Residual 0.246807 [0] Iteration 16 Residual 0.0828341 [0] Iteration 17 Residual 0.0410627 [0] Iteration 18 Residual 0.0243749 [0] Iteration 19 Residual 0.0136067 [0] Iteration 20 Residual 169.302 -- Jan Izak Cornelius Vermaak (M.Eng Nuclear) Email: janicvermaak at gmail.com Cell: +1-979-739-0789 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon May 27 23:55:15 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 28 May 2019 04:55:15 +0000 Subject: [petsc-users] Matrix free GMRES seems to ignore my initial guess In-Reply-To: References: Message-ID: <393102F5-A229-4C5A-B2B3-3BE9B5B7314B@anl.gov> This behavior where the residual norm jumps at restart indicates something is very very wrong. Run with the option -ksp_monitor_true_residual and I think you'll see the true residual is not decreasing as is the preconditioned residual. My guess is that your "action of the matrix" is incorrect and not actually a linear operator. 
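(One quick way to test that hypothesis, sketched here with illustrative names and with the destroy calls omitted: apply the shell matrix A to random vectors and verify that A(x + y) = A x + A y.)

  /* Sketch: spot-check linearity of the shell operation using random vectors. */
  Vec       x, y, xy, Ax, Ay, Axy;
  PetscReal norm;
  ierr = MatCreateVecs(A, &x, &Ax);CHKERRQ(ierr);
  ierr = VecDuplicate(x, &y);CHKERRQ(ierr);
  ierr = VecDuplicate(x, &xy);CHKERRQ(ierr);
  ierr = VecDuplicate(Ax, &Ay);CHKERRQ(ierr);
  ierr = VecDuplicate(Ax, &Axy);CHKERRQ(ierr);
  ierr = VecSetRandom(x, NULL);CHKERRQ(ierr);
  ierr = VecSetRandom(y, NULL);CHKERRQ(ierr);
  ierr = VecWAXPY(xy, 1.0, x, y);CHKERRQ(ierr);        /* xy = x + y   */
  ierr = MatMult(A, x,  Ax);CHKERRQ(ierr);
  ierr = MatMult(A, y,  Ay);CHKERRQ(ierr);
  ierr = MatMult(A, xy, Axy);CHKERRQ(ierr);
  ierr = VecAXPY(Axy, -1.0, Ax);CHKERRQ(ierr);         /* Axy -= A x   */
  ierr = VecAXPY(Axy, -1.0, Ay);CHKERRQ(ierr);         /* Axy -= A y   */
  ierr = VecNorm(Axy, NORM_2, &norm);CHKERRQ(ierr);    /* should be ~0 */
  ierr = PetscPrintf(PETSC_COMM_WORLD,"||A(x+y) - Ax - Ay|| = %g\n",(double)norm);CHKERRQ(ierr);

The same kind of test with a scalar multiple, comparing A(alpha x) against alpha A x, catches operators that are additive but not homogeneous.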
Try using MatComputeExplicitOperator() and see what explicit matrix it produces, is it what you expect? Barry > On May 27, 2019, at 11:33 PM, Jan Izak Cornelius Vermaak via petsc-users wrote: > > Hi all, > > So I am faced with this debacle. I have a neutron transport solver with a sweep code that can compute the action of the matrix on a vector. > > I use a matrix shell to set up the action of the matrix. The method works but only if I can get the solution converged before GMRES restarts. It gets the right answer. Now my first problem is (and I only saw this when I hit the first restart) is that it looks like the solver completely resets after the GMRES-restart. Below is an iteration log with restart interval set to 10. At first I thought it wasn't updating the initial guess but it became clear that it initial guess always had no effect. I do set KSPSetInitialGuessNonZero but it has no effect. > > Is the matrix-free business defaulting my initial guess to zero everytime? What can I do to actually supply an initial guess? I've used PETSc for diffusion many times and the initial guess always works, just not now. > > [0] Computing b > [0] Iteration 0 Residual 169.302 > [0] Iteration 1 Residual 47.582 > [0] Iteration 2 Residual 13.2614 > [0] Iteration 3 Residual 4.46795 > [0] Iteration 4 Residual 1.03038 > [0] Iteration 5 Residual 0.246807 > [0] Iteration 6 Residual 0.0828341 > [0] Iteration 7 Residual 0.0410627 > [0] Iteration 8 Residual 0.0243749 > [0] Iteration 9 Residual 0.0136067 > [0] Iteration 10 Residual 169.302 > [0] Iteration 11 Residual 47.582 > [0] Iteration 12 Residual 13.2614 > [0] Iteration 13 Residual 4.46795 > [0] Iteration 14 Residual 1.03038 > [0] Iteration 15 Residual 0.246807 > [0] Iteration 16 Residual 0.0828341 > [0] Iteration 17 Residual 0.0410627 > [0] Iteration 18 Residual 0.0243749 > [0] Iteration 19 Residual 0.0136067 > [0] Iteration 20 Residual 169.302 > > -- > Jan Izak Cornelius Vermaak > (M.Eng Nuclear) > Email: janicvermaak at gmail.com > Cell: +1-979-739-0789 From bsmith at mcs.anl.gov Tue May 28 01:16:06 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 28 May 2019 06:16:06 +0000 Subject: [petsc-users] Matrix free GMRES seems to ignore my initial guess In-Reply-To: References: <393102F5-A229-4C5A-B2B3-3BE9B5B7314B@anl.gov> Message-ID: <80420AED-1242-40CE-9D00-BB60371C12F5@mcs.anl.gov> > On May 28, 2019, at 12:41 AM, Jan Izak Cornelius Vermaak wrote: > > Checking the matrix would be hard as I have a really big operator. Its transport sweeps. > > When I increase the restart interval the solution converges to the right one. Run with -ksp_monitor_true_residual what are the true residuals being printed? The GMRES code has been in continuous use for 25 years, it would stunning if you suddenly found a bug in it. How it works, within a restart, the GMRES algorithm uses a simple recursive formula to compute an "estimate" for the residual norm. At restart it actually computes the current solution and then uses that to compute an accurate residual norm via the formula b - A x. When the residual computed by the b - A x is different than that computed by the recursive formula it means the recursive formula has run into some difficulty (bad operator, bad preconditioner, null space in the operator) and is not computing correct values. Now if you increase the restart to past the point when it "converges" you are hiding the incorrectly computed values computed via the recursive formula. 
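(A minimal way to form that b - A x residual explicitly once the solve returns, assuming the shell matrix A, right-hand side b, and the solution x from KSPSolve; this is a sketch, not the monitor PETSc itself uses.)

  /* Sketch: compute the true residual norm ||b - A x|| after KSPSolve. */
  Vec       r;
  PetscReal rnorm;
  ierr = VecDuplicate(b, &r);CHKERRQ(ierr);
  ierr = MatMult(A, x, r);CHKERRQ(ierr);       /* r = A x     */
  ierr = VecAYPX(r, -1.0, b);CHKERRQ(ierr);    /* r = b - A x */
  ierr = VecNorm(r, NORM_2, &rnorm);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"true residual norm = %g\n",(double)rnorm);CHKERRQ(ierr);
  ierr = VecDestroy(&r);CHKERRQ(ierr);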
I urge you to check the residual norm by b - A x at the end of the solve and double check that it is small. It seems unlikely GMRES is providing the correct answer for your problem. Barry > Checked against a reference solution and Classic Richardson. It is really as if the initial guess is completely ignored. > > [0] Computing b > [0] Iteration 0 Residual 169.302 > [0] Iteration 1 Residual 47.582 > [0] Iteration 2 Residual 13.2614 > [0] Iteration 3 Residual 4.46795 > [0] Iteration 4 Residual 1.03038 > [0] Iteration 5 Residual 0.246807 > [0] Iteration 6 Residual 0.0828341 > [0] Iteration 7 Residual 0.0410627 > [0] Iteration 8 Residual 0.0243749 > [0] Iteration 9 Residual 0.0136067 > [0] Iteration 10 Residual 0.00769078 > [0] Iteration 11 Residual 0.00441658 > [0] Iteration 12 Residual 0.00240794 > [0] Iteration 13 Residual 0.00132048 > [0] Iteration 14 Residual 0.00073003 > [0] Iteration 15 Residual 0.000399504 > [0] Iteration 16 Residual 0.000217677 > [0] Iteration 17 Residual 0.000120408 > [0] Iteration 18 Residual 6.49719e-05 > [0] Iteration 19 Residual 3.44523e-05 > [0] Iteration 20 Residual 1.87909e-05 > [0] Iteration 21 Residual 1.02385e-05 > [0] Iteration 22 Residual 5.57859e-06 > [0] Iteration 23 Residual 3.03431e-06 > [0] Iteration 24 Residual 1.63696e-06 > [0] Iteration 25 Residual 8.78202e-07 > > On Mon, May 27, 2019 at 11:55 PM Smith, Barry F. wrote: > > This behavior where the residual norm jumps at restart indicates something is very very wrong. Run with the option -ksp_monitor_true_residual and I think you'll see the true residual is not decreasing as is the preconditioned residual. My guess is that your "action of the matrix" is incorrect and not actually a linear operator. Try using MatComputeExplicitOperator() and see what explicit matrix it produces, is it what you expect? > > Barry > > > > > > On May 27, 2019, at 11:33 PM, Jan Izak Cornelius Vermaak via petsc-users wrote: > > > > Hi all, > > > > So I am faced with this debacle. I have a neutron transport solver with a sweep code that can compute the action of the matrix on a vector. > > > > I use a matrix shell to set up the action of the matrix. The method works but only if I can get the solution converged before GMRES restarts. It gets the right answer. Now my first problem is (and I only saw this when I hit the first restart) is that it looks like the solver completely resets after the GMRES-restart. Below is an iteration log with restart interval set to 10. At first I thought it wasn't updating the initial guess but it became clear that it initial guess always had no effect. I do set KSPSetInitialGuessNonZero but it has no effect. > > > > Is the matrix-free business defaulting my initial guess to zero everytime? What can I do to actually supply an initial guess? I've used PETSc for diffusion many times and the initial guess always works, just not now. 
> > > > [0] Computing b > > [0] Iteration 0 Residual 169.302 > > [0] Iteration 1 Residual 47.582 > > [0] Iteration 2 Residual 13.2614 > > [0] Iteration 3 Residual 4.46795 > > [0] Iteration 4 Residual 1.03038 > > [0] Iteration 5 Residual 0.246807 > > [0] Iteration 6 Residual 0.0828341 > > [0] Iteration 7 Residual 0.0410627 > > [0] Iteration 8 Residual 0.0243749 > > [0] Iteration 9 Residual 0.0136067 > > [0] Iteration 10 Residual 169.302 > > [0] Iteration 11 Residual 47.582 > > [0] Iteration 12 Residual 13.2614 > > [0] Iteration 13 Residual 4.46795 > > [0] Iteration 14 Residual 1.03038 > > [0] Iteration 15 Residual 0.246807 > > [0] Iteration 16 Residual 0.0828341 > > [0] Iteration 17 Residual 0.0410627 > > [0] Iteration 18 Residual 0.0243749 > > [0] Iteration 19 Residual 0.0136067 > > [0] Iteration 20 Residual 169.302 > > > > -- > > Jan Izak Cornelius Vermaak > > (M.Eng Nuclear) > > Email: janicvermaak at gmail.com > > Cell: +1-979-739-0789 > > > > -- > Jan Izak Cornelius Vermaak > (M.Eng Nuclear) > Email: janicvermaak at gmail.com > Cell: +1-979-739-0789 From patrick.sanan at gmail.com Tue May 28 04:36:25 2019 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Tue, 28 May 2019 11:36:25 +0200 Subject: [petsc-users] TS: How to implement a simple stopping criterion Message-ID: I'm working with/on a code which uses TSSUNDIALS, and I'd like to be able to stop the timestepper based on the value of the solution. In particular, I wish to enforce that a given concentration has not changed by more than a specified amount before stopping. Note that this is simpler than general event detection, as I'm happy stopping before the condition is satisfied and don't care about finding the point in time when the condition is satisfied exactly. As far as I know, PETSc's event handling interface isn't supported with the SUNDIALS implementation. (As an aside, I'd be happier using TSARKIMEX or another native timestepper, but so far haven't been able to avoid tiny timesteps). My question is whether the following approach has any obvious fatal flaw, and if any TS gurus have other/better/simpler ideas. The idea is to add my own logic, say with TSSetPreStep(), to: 1. Maintain the previous step's state (this is a 1d problem, so I'm not too concerned about the overhead of this) 2. Check my condition, and if it's satisfied, dump the previous step's data, and use TSSetMaxTime() with the previous step's time, thus ending the solve. -------------- next part -------------- An HTML attachment was scrubbed... URL: From i.gutheil at fz-juelich.de Tue May 28 09:20:00 2019 From: i.gutheil at fz-juelich.de (Inge Gutheil) Date: Tue, 28 May 2019 16:20:00 +0200 Subject: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS In-Reply-To: References: Message-ID: Dear PETSc list, when I try to install the petsc-3.11.2 library as a static library - for some reasons I do not want the dynamic library - I suddenly get the error Cannot determine compiler PIC flags if shared libraries is turned off Either run using --with-shared-libraries or --with-pic=0 and supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS Attached find the configure.log I added --with-pic=0 as can seen from configure.log but I do not know where I can find how to set the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS, at least -fPIC seems to be not sufficient, so what can I do? Wit 3.11.1 I did not have that problem. 
Regards Inge -- Inge Gutheil Juelich Supercomputing Centre Institute for Advanced Simulation Forschungszentrum Juelich GmbH 52425 Juelich, Germany Phone: +49-2461-61-3135 Fax: +49-2461-61-6656 E-mail:i.gutheil at fz-juelich.de Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Volker Rieke Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt -------------- next part -------------- A non-text attachment was scrubbed... Name: configure.log Type: text/x-log Size: 117057 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From balay at mcs.anl.gov Tue May 28 09:30:54 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Tue, 28 May 2019 14:30:54 +0000 Subject: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS In-Reply-To: References: Message-ID: Configure.log shows '--with-pic=1' - hence this error. Remove '--with-pic=1' and retry. Saitsh On Tue, 28 May 2019, Inge Gutheil via petsc-users wrote: > Dear PETSc list, > when I try to install the petsc-3.11.2 library as a static library - for > some reasons I do not want the dynamic library - > I suddenly get the error > > Cannot determine compiler PIC flags if shared libraries is turned off > Either run using --with-shared-libraries or --with-pic=0 and supply the > compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS > > Attached find the configure.log > I added --with-pic=0 as can seen from configure.log but I do not know > where I can find how to set the compiler PIC flag via CFLAGS, CXXXFLAGS, > and FCFLAGS, at least -fPIC seems to be not sufficient, so what can I do? > > Wit 3.11.1 I did not have that problem. > > Regards > Inge > > From dalcinl at gmail.com Tue May 28 10:01:52 2019 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 28 May 2019 18:01:52 +0300 Subject: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS In-Reply-To: References: Message-ID: On Tue, 28 May 2019 at 17:31, Balay, Satish via petsc-users < petsc-users at mcs.anl.gov> wrote: > Configure.log shows '--with-pic=1' - hence this error. > > Remove '--with-pic=1' and retry. > > Nonsense. Why this behavior? Building a static library with PIC code is a perfectly valid use case. -- Lisandro Dalcin ============ Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Tue May 28 10:04:06 2019 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 28 May 2019 15:04:06 +0000 Subject: [petsc-users] TS: How to implement a simple stopping criterion In-Reply-To: References: Message-ID: You are right that TSEvent is not suitable for this case. To stop the timestepper, I would call TSSetConvergedReason(ts,TS_CONVERGED_USER) in a PostStep function. Hong (Mr.) > On May 28, 2019, at 4:36 AM, Patrick Sanan via petsc-users wrote: > > I'm working with/on a code which uses TSSUNDIALS, and I'd like to be able to stop the timestepper based on the value of the solution. 
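(A minimal sketch of the PostStep route suggested above; the names are illustrative, the stopping test is left as a placeholder, and the only calls assumed are TSSetPostStep, TSGetSolution and TSSetConvergedReason.)

  /* Illustrative PostStep callback: stop the integration once a
     user-defined condition on the current solution is met. */
  static PetscErrorCode MyPostStep(TS ts)
  {
    Vec            u;
    PetscBool      stop = PETSC_FALSE;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = TSGetSolution(ts, &u);CHKERRQ(ierr);
    /* Placeholder: compare u with the state saved at the previous step and
       set stop = PETSC_TRUE once the change exceeds the allowed amount. */
    if (stop) {
      ierr = TSSetConvergedReason(ts, TS_CONVERGED_USER);CHKERRQ(ierr);
    }
    PetscFunctionReturn(0);
  }

  /* registered once, before TSSolve(): */
  ierr = TSSetPostStep(ts, MyPostStep);CHKERRQ(ierr);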
In particular, I wish to enforce that a given concentration has not changed by more than a specified amount before stopping. Note that this is simpler than general event detection, as I'm happy stopping before the condition is satisfied and don't care about finding the point in time when the condition is satisfied exactly. > > As far as I know, PETSc's event handling interface isn't supported with the SUNDIALS implementation. (As an aside, I'd be happier using TSARKIMEX or another native timestepper, but so far haven't been able to avoid tiny timesteps). > > My question is whether the following approach has any obvious fatal flaw, and if any TS gurus have other/better/simpler ideas. > > The idea is to add my own logic, say with TSSetPreStep(), to: > > 1. Maintain the previous step's state (this is a 1d problem, so I'm not too concerned about the overhead of this) > 2. Check my condition, and if it's satisfied, dump the previous step's data, and use TSSetMaxTime() with the previous step's time, thus ending the solve. > From bsmith at mcs.anl.gov Tue May 28 10:14:59 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 28 May 2019 15:14:59 +0000 Subject: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS In-Reply-To: References: Message-ID: <3643F65F-A93F-4A97-8C65-C79AEC365227@mcs.anl.gov> > On May 28, 2019, at 10:01 AM, Lisandro Dalcin wrote: > > > > On Tue, 28 May 2019 at 17:31, Balay, Satish via petsc-users wrote: > Configure.log shows '--with-pic=1' - hence this error. > > Remove '--with-pic=1' and retry. > > > Nonsense. Why this behavior? Building a static library with PIC code is a perfectly valid use case. Indeed it is. But for that case PETSc's configure requires the user to pass the appropriate flag for PIC in through the compiler flags. (Should be better documented EXACTLY how) The reason for this nonsense design feature is that Configure does not know how to check for appropriate PIC flags unless Configure has turned on shared libraries for PETSc. It is rather convoluted; I'd love if someone fixed it but obviously Jed, Satish, and I (who agree with you that this is utter nonsense) are not brave enough to muck with that portion of configure. > > -- > Lisandro Dalcin > ============ > Research Scientist > Extreme Computing Research Center (ECRC) > King Abdullah University of Science and Technology (KAUST) > http://ecrc.kaust.edu.sa/ From jed at jedbrown.org Tue May 28 10:19:29 2019 From: jed at jedbrown.org (Jed Brown) Date: Tue, 28 May 2019 09:19:29 -0600 Subject: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS In-Reply-To: References: Message-ID: <87tvdea9oe.fsf@jedbrown.org> Lisandro Dalcin via petsc-users writes: > On Tue, 28 May 2019 at 17:31, Balay, Satish via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Configure.log shows '--with-pic=1' - hence this error. >> >> Remove '--with-pic=1' and retry. >> >> > Nonsense. Why this behavior? Building a static library with PIC code is a > perfectly valid use case. And that's what will happen because Inge passed -fPIC in CFLAGS et al. Do you know how we could confirm that PIC code is generated without attempting to use shared libraries? 
if self.argDB['with-pic'] and not useSharedLibraries: # this is a flaw in configure; it is a legitimate use case where PETSc is built with PIC flags but not shared libraries # to fix it the capability to build shared libraries must be enabled in configure if --with-pic=true even if shared libraries are off and this # test must use that capability instead of using the default shared library build in that case which is static libraries raise RuntimeError("Cannot determine compiler PIC flags if shared libraries is turned off\nEither run using --with-shared-libraries or --with-pic=0 and supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS\n") From dalcinl at gmail.com Tue May 28 12:29:17 2019 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 28 May 2019 20:29:17 +0300 Subject: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS In-Reply-To: <87tvdea9oe.fsf@jedbrown.org> References: <87tvdea9oe.fsf@jedbrown.org> Message-ID: On Tue, 28 May 2019 at 18:19, Jed Brown wrote: > Lisandro Dalcin via petsc-users writes: > > > On Tue, 28 May 2019 at 17:31, Balay, Satish via petsc-users < > > petsc-users at mcs.anl.gov> wrote: > > > >> Configure.log shows '--with-pic=1' - hence this error. > >> > >> Remove '--with-pic=1' and retry. > >> > >> > > Nonsense. Why this behavior? Building a static library with PIC code is a > > perfectly valid use case. > > And that's what will happen because Inge passed -fPIC in CFLAGS et al. > > Do you know how we could confirm that PIC code is generated without > attempting to use shared libraries? > > I know how to do it with the `readelf` command for ELF objects. I even know how to do it compile-time for GCC and clang. Maybe Intel also works this way. I do not know about a general solution, though. $ cat check-pic.c #ifndef __PIC__ #error "no-PIC" #endif $ gcc -c check-pic.c -fPIC $ clang -c check-pic.c -fPIC $ gcc -c check-pic.c check-pic.c:2:2: error: #error "no-PIC" 2 | #error "no-PIC" | ^~~~~ $ clang -c check-pic.c check-pic.c:2:2: error: "no-PIC" #error "no-PIC" ^ 1 error generated. -- Lisandro Dalcin ============ Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue May 28 13:54:44 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Tue, 28 May 2019 18:54:44 +0000 Subject: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS In-Reply-To: References: <87tvdea9oe.fsf@jedbrown.org> Message-ID: <1D6F65B9-8126-47DB-94CE-4D8E5D3B9708@anl.gov> Works for Intel and PGI compiles (the version I checked) bsmith at es:~$ pgcc check-pic.c -PIC pgcc-Error-Unknown switch: -PIC bsmith at es:~$ pgcc check-pic.c -fPIC bsmith at es:~$ pgcc check-pic.c PGC-F-0249-#error -- "no-PIC" (check-pic.c: 2) PGC/x86-64 Linux 19.3-0: compilation aborted bsmith at es:~$ icc check-pic.c check-pic.c(2): error: #error directive: "no-PIC" #error "no-PIC" ^ compilation aborted for check-pic.c (code 2) bsmith at es:~$ icc check-pic.c -PIC icc: command line warning #10006: ignoring unknown option '-PIC' check-pic.c(2): error: #error directive: "no-PIC" #error "no-PIC" ^ compilation aborted for check-pic.c (code 2) bsmith at es:~$ icc check-pic.c -fPIC bsmith at es:~$ You are the man! 
> On May 28, 2019, at 12:29 PM, Lisandro Dalcin via petsc-users wrote: > > > > On Tue, 28 May 2019 at 18:19, Jed Brown wrote: > Lisandro Dalcin via petsc-users writes: > > > On Tue, 28 May 2019 at 17:31, Balay, Satish via petsc-users < > > petsc-users at mcs.anl.gov> wrote: > > > >> Configure.log shows '--with-pic=1' - hence this error. > >> > >> Remove '--with-pic=1' and retry. > >> > >> > > Nonsense. Why this behavior? Building a static library with PIC code is a > > perfectly valid use case. > > And that's what will happen because Inge passed -fPIC in CFLAGS et al. > > Do you know how we could confirm that PIC code is generated without > attempting to use shared libraries? > > > I know how to do it with the `readelf` command for ELF objects. I even know how to do it compile-time for GCC and clang. Maybe Intel also works this way. I do not know about a general solution, though. > > $ cat check-pic.c > #ifndef __PIC__ > #error "no-PIC" > #endif > > $ gcc -c check-pic.c -fPIC > > $ clang -c check-pic.c -fPIC > > $ gcc -c check-pic.c > check-pic.c:2:2: error: #error "no-PIC" > 2 | #error "no-PIC" > | ^~~~~ > > $ clang -c check-pic.c > check-pic.c:2:2: error: "no-PIC" > #error "no-PIC" > ^ > 1 error generated. > > -- > Lisandro Dalcin > ============ > Research Scientist > Extreme Computing Research Center (ECRC) > King Abdullah University of Science and Technology (KAUST) > http://ecrc.kaust.edu.sa/ From jed at jedbrown.org Tue May 28 14:05:43 2019 From: jed at jedbrown.org (Jed Brown) Date: Tue, 28 May 2019 13:05:43 -0600 Subject: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS In-Reply-To: <1D6F65B9-8126-47DB-94CE-4D8E5D3B9708@anl.gov> References: <87tvdea9oe.fsf@jedbrown.org> <1D6F65B9-8126-47DB-94CE-4D8E5D3B9708@anl.gov> Message-ID: <877eaa9z7c.fsf@jedbrown.org> No love with: cc: Sun C 5.12 Linux_i386 2011/11/16 Note that all of these compilers (including Sun C, which doesn't define the macro) recognize -fPIC. (Blue Gene xlc requires -qpic.) Do we still need to test the other alternatives? "Smith, Barry F." writes: > Works for Intel and PGI compiles (the version I checked) > > bsmith at es:~$ pgcc check-pic.c -PIC > pgcc-Error-Unknown switch: -PIC > bsmith at es:~$ pgcc check-pic.c -fPIC > bsmith at es:~$ pgcc check-pic.c > PGC-F-0249-#error -- "no-PIC" (check-pic.c: 2) > PGC/x86-64 Linux 19.3-0: compilation aborted > bsmith at es:~$ icc check-pic.c > check-pic.c(2): error: #error directive: "no-PIC" > #error "no-PIC" > ^ > > compilation aborted for check-pic.c (code 2) > bsmith at es:~$ icc check-pic.c -PIC > icc: command line warning #10006: ignoring unknown option '-PIC' > check-pic.c(2): error: #error directive: "no-PIC" > #error "no-PIC" > ^ > > compilation aborted for check-pic.c (code 2) > bsmith at es:~$ icc check-pic.c -fPIC > bsmith at es:~$ > > > You are the man! > > >> On May 28, 2019, at 12:29 PM, Lisandro Dalcin via petsc-users wrote: >> >> >> >> On Tue, 28 May 2019 at 18:19, Jed Brown wrote: >> Lisandro Dalcin via petsc-users writes: >> >> > On Tue, 28 May 2019 at 17:31, Balay, Satish via petsc-users < >> > petsc-users at mcs.anl.gov> wrote: >> > >> >> Configure.log shows '--with-pic=1' - hence this error. >> >> >> >> Remove '--with-pic=1' and retry. >> >> >> >> >> > Nonsense. Why this behavior? Building a static library with PIC code is a >> > perfectly valid use case. >> >> And that's what will happen because Inge passed -fPIC in CFLAGS et al. 
>> >> Do you know how we could confirm that PIC code is generated without >> attempting to use shared libraries? >> >> >> I know how to do it with the `readelf` command for ELF objects. I even know how to do it compile-time for GCC and clang. Maybe Intel also works this way. I do not know about a general solution, though. >> >> $ cat check-pic.c >> #ifndef __PIC__ >> #error "no-PIC" >> #endif >> >> $ gcc -c check-pic.c -fPIC >> >> $ clang -c check-pic.c -fPIC >> >> $ gcc -c check-pic.c >> check-pic.c:2:2: error: #error "no-PIC" >> 2 | #error "no-PIC" >> | ^~~~~ >> >> $ clang -c check-pic.c >> check-pic.c:2:2: error: "no-PIC" >> #error "no-PIC" >> ^ >> 1 error generated. >> >> -- >> Lisandro Dalcin >> ============ >> Research Scientist >> Extreme Computing Research Center (ECRC) >> King Abdullah University of Science and Technology (KAUST) >> http://ecrc.kaust.edu.sa/ From jczhang at mcs.anl.gov Tue May 28 16:25:45 2019 From: jczhang at mcs.anl.gov (Zhang, Junchao) Date: Tue, 28 May 2019 21:25:45 +0000 Subject: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS In-Reply-To: <1D6F65B9-8126-47DB-94CE-4D8E5D3B9708@anl.gov> References: <87tvdea9oe.fsf@jedbrown.org> <1D6F65B9-8126-47DB-94CE-4D8E5D3B9708@anl.gov> Message-ID: Also works with PathScale EKOPath Compiler Suite installed on MCS machines. $ pathcc -c check-pic.c -fPIC $ pathcc -c check-pic.c check-pic.c:2:2: error: "no-PIC" #error "no-PIC" ^ 1 error generated. --Junchao Zhang On Tue, May 28, 2019 at 1:54 PM Smith, Barry F. via petsc-users > wrote: Works for Intel and PGI compiles (the version I checked) bsmith at es:~$ pgcc check-pic.c -PIC pgcc-Error-Unknown switch: -PIC bsmith at es:~$ pgcc check-pic.c -fPIC bsmith at es:~$ pgcc check-pic.c PGC-F-0249-#error -- "no-PIC" (check-pic.c: 2) PGC/x86-64 Linux 19.3-0: compilation aborted bsmith at es:~$ icc check-pic.c check-pic.c(2): error: #error directive: "no-PIC" #error "no-PIC" ^ compilation aborted for check-pic.c (code 2) bsmith at es:~$ icc check-pic.c -PIC icc: command line warning #10006: ignoring unknown option '-PIC' check-pic.c(2): error: #error directive: "no-PIC" #error "no-PIC" ^ compilation aborted for check-pic.c (code 2) bsmith at es:~$ icc check-pic.c -fPIC bsmith at es:~$ You are the man! > On May 28, 2019, at 12:29 PM, Lisandro Dalcin via petsc-users > wrote: > > > > On Tue, 28 May 2019 at 18:19, Jed Brown > wrote: > Lisandro Dalcin via petsc-users > writes: > > > On Tue, 28 May 2019 at 17:31, Balay, Satish via petsc-users < > > petsc-users at mcs.anl.gov> wrote: > > > >> Configure.log shows '--with-pic=1' - hence this error. > >> > >> Remove '--with-pic=1' and retry. > >> > >> > > Nonsense. Why this behavior? Building a static library with PIC code is a > > perfectly valid use case. > > And that's what will happen because Inge passed -fPIC in CFLAGS et al. > > Do you know how we could confirm that PIC code is generated without > attempting to use shared libraries? > > > I know how to do it with the `readelf` command for ELF objects. I even know how to do it compile-time for GCC and clang. Maybe Intel also works this way. I do not know about a general solution, though. > > $ cat check-pic.c > #ifndef __PIC__ > #error "no-PIC" > #endif > > $ gcc -c check-pic.c -fPIC > > $ clang -c check-pic.c -fPIC > > $ gcc -c check-pic.c > check-pic.c:2:2: error: #error "no-PIC" > 2 | #error "no-PIC" > | ^~~~~ > > $ clang -c check-pic.c > check-pic.c:2:2: error: "no-PIC" > #error "no-PIC" > ^ > 1 error generated. 
> > -- > Lisandro Dalcin > ============ > Research Scientist > Extreme Computing Research Center (ECRC) > King Abdullah University of Science and Technology (KAUST) > http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From edoardo.alinovi at gmail.com Wed May 29 02:06:33 2019 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Wed, 29 May 2019 09:06:33 +0200 Subject: [petsc-users] Stop KSP if diverging Message-ID: Dear PETSc friends, Hope you are doing all well. I have a quick question for you that I am not able to solve by my self. Time to time, while testing new code features, it happens that KSP diverges but it does not stop automatically and the iterations continue even after getting a NaN. In the KSP setup I use the following instruction to set the divergence stopping criteria (div = 10000): call KSPSetTolerances(myksp, rel_tol, abs_tol, div, itmax, ierr) But is does not help. Looking into the documentation I have found also: KSPConvergedDefault (KSP ksp,PetscInt n,PetscReal rnorm,KSPConvergedReason *reason,void *ctx) Which I am not calling in the code. Is this maybe the reason of my problem? If yes how can I use KSPConvergedDefault in FORTRAN? Thanks, Edo ------ Edoardo Alinovi, Ph.D. DICCA, Scuola Politecnica, Universita' degli Studi di Genova, 1, via Montallegro, 16145 Genova, Italy Email: edoardo.alinovi at dicca.unige.it Tel: +39 010 353 2540 Website: https://www.edoardoalinovi.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Wed May 29 04:02:05 2019 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 29 May 2019 12:02:05 +0300 Subject: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS In-Reply-To: <877eaa9z7c.fsf@jedbrown.org> References: <87tvdea9oe.fsf@jedbrown.org> <1D6F65B9-8126-47DB-94CE-4D8E5D3B9708@anl.gov> <877eaa9z7c.fsf@jedbrown.org> Message-ID: On Tue, 28 May 2019 at 22:05, Jed Brown wrote: > > Note that all of these compilers (including Sun C, which doesn't define > the macro) recognize -fPIC. (Blue Gene xlc requires -qpic.) Do we > still need to test the other alternatives? > > Well, worst case, if the configure test always fails with and without all the possible variants of the PIC flag (-fPIC, -kPIC, -qpic, etc.) because they do not define the __PIC__ macro, then you are free to abort configure and ask users to pass the pic flag in CFLAGS and remove --with-pic=1 from the configure line, as we do today for all compilers. BTW, my trick seems to works with the Cray compiler. $ cc CC-2107 craycc: ERROR in command line No valid filenames are specified on the command line. $ cc -c check-pic.c -fPIC $ cc -c check-pic.c CC-35 craycc: ERROR File = check-pic.c, Line = 2 #error directive: "no-PIC" #error "no-PIC" ^ -- Lisandro Dalcin ============ Research Scientist Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 29 05:31:48 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 29 May 2019 06:31:48 -0400 Subject: [petsc-users] Stop KSP if diverging In-Reply-To: References: Message-ID: On Wed, May 29, 2019 at 3:07 AM Edoardo alinovi via petsc-users < petsc-users at mcs.anl.gov> wrote: > Dear PETSc friends, > > Hope you are doing all well. > > I have a quick question for you that I am not able to solve by my self. 
> Time to time, while testing new code features, it happens that KSP diverges > but it does not stop automatically and the iterations continue even after > getting a NaN. > I think you want https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetErrorIfNotConverged.html Thanks, Matt > In the KSP setup I use the following instruction to set the divergence > stopping criteria (div = 10000): > > call KSPSetTolerances(myksp, rel_tol, abs_tol, div, itmax, ierr) > > But is does not help. Looking into the documentation I have found also: > > KSPConvergedDefault (KSP ksp,PetscInt n,PetscReal rnorm,KSPConvergedReason *reason,void *ctx) > > Which I am not calling in the code. Is this maybe the reason of my > problem? If yes how can I use KSPConvergedDefault > in > FORTRAN? > > Thanks, > > Edo > > ------ > > Edoardo Alinovi, Ph.D. > > DICCA, Scuola Politecnica, > Universita' degli Studi di Genova, > 1, via Montallegro, > 16145 Genova, Italy > > Email: edoardo.alinovi at dicca.unige.it > Tel: +39 010 353 2540 > Website: https://www.edoardoalinovi.com/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From edoardo.alinovi at gmail.com Wed May 29 05:35:06 2019 From: edoardo.alinovi at gmail.com (Edoardo alinovi) Date: Wed, 29 May 2019 11:35:06 +0100 Subject: [petsc-users] Stop KSP if diverging In-Reply-To: References: Message-ID: Thanks Matthew, Yes, I will give it a try thid evening. Thak you very much! On Wed, 29 May 2019, 11:32 Matthew Knepley, wrote: > On Wed, May 29, 2019 at 3:07 AM Edoardo alinovi via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Dear PETSc friends, >> >> Hope you are doing all well. >> >> I have a quick question for you that I am not able to solve by my self. >> Time to time, while testing new code features, it happens that KSP diverges >> but it does not stop automatically and the iterations continue even after >> getting a NaN. >> > > I think you want > > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetErrorIfNotConverged.html > > Thanks, > > Matt > > >> In the KSP setup I use the following instruction to set the divergence >> stopping criteria (div = 10000): >> >> call KSPSetTolerances(myksp, rel_tol, abs_tol, div, itmax, ierr) >> >> But is does not help. Looking into the documentation I have found also: >> >> KSPConvergedDefault (KSP ksp,PetscInt n,PetscReal rnorm,KSPConvergedReason *reason,void *ctx) >> >> Which I am not calling in the code. Is this maybe the reason of my >> problem? If yes how can I use KSPConvergedDefault >> in >> FORTRAN? >> >> Thanks, >> >> Edo >> >> ------ >> >> Edoardo Alinovi, Ph.D. >> >> DICCA, Scuola Politecnica, >> Universita' degli Studi di Genova, >> 1, via Montallegro, >> 16145 Genova, Italy >> >> Email: edoardo.alinovi at dicca.unige.it >> Tel: +39 010 353 2540 >> Website: https://www.edoardoalinovi.com/ >> >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From myriam.peyrounette at idris.fr Wed May 29 06:00:24 2019 From: myriam.peyrounette at idris.fr (Myriam Peyrounette) Date: Wed, 29 May 2019 13:00:24 +0200 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: References: <9bb4ddb6-b99e-7a1b-16e1-f226f8fd0d0b@idris.fr> Message-ID: <53aa2d89-137e-850d-1772-109ad6ec3196@idris.fr> Hi, Do you have any idea when Barry's fix (https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff) will be released? I can see it has been merged to the "next" branch. Does it mean it will be soon available on master? +for your information, I plotted a summary of the scalings of interest (memory and time): - using petsc-3.10.2 (ref "bad" scaling) - using petsc-3.6.4 (ref "good" scaling) - using commit d330a26 + Barry's fix and different algorithms (none, scalable, allatonce, allatonce_merged) Best regards, Myriam Le 05/13/19 ? 17:20, Fande Kong a ?crit?: > Hi?Myriam, > > Thanks for your report back. > > On Mon, May 13, 2019 at 2:01 AM Myriam Peyrounette > > wrote: > > Hi all, > > I tried with 3.11.1 version and Barry's fix. The good scaling is back! > See the green curve in the plot attached. It is even better than PETSc > 3.6! And it runs faster (10-15s instead of 200-300s with 3.6). > > > We are glad your issue was resolved here.? > ? > > > So you were right. It seems that not all the PtAPs used the scalable > version. > > I was a bit confused about the options to set... I used the options: > -matptap_via scalable and -mat_freeintermediatedatastructures 1. > Do you > think it would be even better with allatonce? > > > "scalable" and "allatonce" correspond to different algorithms > respectively. ``allatonce" should be using less memory than > "scalable". The "allatonce" algorithm ?would be a good alternative if > your application is memory sensitive and the problem size is large.? > We are definitely curious about the memory usage of ``allatonce" in > your test cases but don't feel obligated to do these tests since your > concern were resolved now. In case you are also interested in how our > new algorithms perform, I post petsc options here that are used to? > choose these algorithms: > > algorithm 1: ``allatonce"? > > -matptap_via allatonce > -mat_freeintermediatedatastructures 1 > > algorithm 2: ``allatonce_merged"? > > -matptap_via allatonce_merged > -mat_freeintermediatedatastructures 1 > > > Again, thanks for your report that help us improve PETSc. > > Fande, > ? > > > It is unfortunate that this fix can't be merged with the master > branch. > But the patch works well and I can consider the issue as solved now. > > Thanks a lot for your time! > > Myriam > > > Le 05/04/19 ? 06:54, Smith, Barry F. a ?crit?: > >? ? Hmm, I had already fixed this, I think, > > > >? ? > https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff > > > >? ? but unfortunately our backlog of pull requests kept it out of > master. We are (well Satish and Jed) working on a new CI > infrastructure that will hopefully be more stable than the current > CI that we are using. > > > >? ? Fande, > >? ? ? ?Sorry you had to spend time on this. > > > > > >? ? Barry > > > > > > > >> On May 3, 2019, at 11:20 PM, Fande Kong via petsc-users > > wrote: > >> > >> Hi Myriam, > >> > >> I run the example you attached earlier with "-mx 48 -my 48 -mz > 48 -levels 3 -ksp_view? -matptap_via allatonce -log_view ".? > >> > >> There are six PtAPs. 
Two of them are sill using the nonscalable > version of the algorithm (this might explain why the memory still > exponentially increases) even though we have asked PETSc to use > the ``allatonce" algorithm. This is happening because MATMAIJ does > not honor the petsc option, instead, it uses the default setting > of MPIAIJ.? I have a fix at > https://bitbucket.org/petsc/petsc/pull-requests/1623/choose-algorithms-in/diff. > The PR should fix the issue. > >> > >> Thanks again for your report, > >> > >> Fande, > >> > >>? > > -- > Myriam Peyrounette > CNRS/IDRIS - HLST > -- > > -- Myriam Peyrounette CNRS/IDRIS - HLST -- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex42_mem_scaling_ada_patched.png Type: image/png Size: 37597 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ex42_time_scaling_ada_patched.png Type: image/png Size: 81468 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2975 bytes Desc: Signature cryptographique S/MIME URL: From hzhang at mcs.anl.gov Wed May 29 09:55:05 2019 From: hzhang at mcs.anl.gov (Zhang, Hong) Date: Wed, 29 May 2019 14:55:05 +0000 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: <53aa2d89-137e-850d-1772-109ad6ec3196@idris.fr> References: <9bb4ddb6-b99e-7a1b-16e1-f226f8fd0d0b@idris.fr> <53aa2d89-137e-850d-1772-109ad6ec3196@idris.fr> Message-ID: Myriam: This branch is merged to master. Thanks for your work and patience. It helps us a lot. The graphs are very nice :-) We plan to re-organise the APIs of mat-mat opts, make them easier for users. Hong Hi, Do you have any idea when Barry's fix (https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff) will be released? I can see it has been merged to the "next" branch. Does it mean it will be soon available on master? +for your information, I plotted a summary of the scalings of interest (memory and time): - using petsc-3.10.2 (ref "bad" scaling) - using petsc-3.6.4 (ref "good" scaling) - using commit d330a26 + Barry's fix and different algorithms (none, scalable, allatonce, allatonce_merged) Best regards, Myriam Le 05/13/19 ? 17:20, Fande Kong a ?crit : Hi Myriam, Thanks for your report back. On Mon, May 13, 2019 at 2:01 AM Myriam Peyrounette > wrote: Hi all, I tried with 3.11.1 version and Barry's fix. The good scaling is back! See the green curve in the plot attached. It is even better than PETSc 3.6! And it runs faster (10-15s instead of 200-300s with 3.6). We are glad your issue was resolved here. So you were right. It seems that not all the PtAPs used the scalable version. I was a bit confused about the options to set... I used the options: -matptap_via scalable and -mat_freeintermediatedatastructures 1. Do you think it would be even better with allatonce? "scalable" and "allatonce" correspond to different algorithms respectively. ``allatonce" should be using less memory than "scalable". The "allatonce" algorithm would be a good alternative if your application is memory sensitive and the problem size is large. We are definitely curious about the memory usage of ``allatonce" in your test cases but don't feel obligated to do these tests since your concern were resolved now. 
In case you are also interested in how our new algorithms perform, I post petsc options here that are used to choose these algorithms: algorithm 1: ``allatonce" -matptap_via allatonce -mat_freeintermediatedatastructures 1 algorithm 2: ``allatonce_merged" -matptap_via allatonce_merged -mat_freeintermediatedatastructures 1 Again, thanks for your report that help us improve PETSc. Fande, It is unfortunate that this fix can't be merged with the master branch. But the patch works well and I can consider the issue as solved now. Thanks a lot for your time! Myriam Le 05/04/19 ? 06:54, Smith, Barry F. a ?crit : > Hmm, I had already fixed this, I think, > > https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff > > but unfortunately our backlog of pull requests kept it out of master. We are (well Satish and Jed) working on a new CI infrastructure that will hopefully be more stable than the current CI that we are using. > > Fande, > Sorry you had to spend time on this. > > > Barry > > > >> On May 3, 2019, at 11:20 PM, Fande Kong via petsc-users > wrote: >> >> Hi Myriam, >> >> I run the example you attached earlier with "-mx 48 -my 48 -mz 48 -levels 3 -ksp_view -matptap_via allatonce -log_view ". >> >> There are six PtAPs. Two of them are sill using the nonscalable version of the algorithm (this might explain why the memory still exponentially increases) even though we have asked PETSc to use the ``allatonce" algorithm. This is happening because MATMAIJ does not honor the petsc option, instead, it uses the default setting of MPIAIJ. I have a fix at https://bitbucket.org/petsc/petsc/pull-requests/1623/choose-algorithms-in/diff. The PR should fix the issue. >> >> Thanks again for your report, >> >> Fande, >> >> -- Myriam Peyrounette CNRS/IDRIS - HLST -- -- Myriam Peyrounette CNRS/IDRIS - HLST -- -------------- next part -------------- An HTML attachment was scrubbed... URL: From myriam.peyrounette at idris.fr Wed May 29 09:58:20 2019 From: myriam.peyrounette at idris.fr (Myriam Peyrounette) Date: Wed, 29 May 2019 16:58:20 +0200 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: References: <9bb4ddb6-b99e-7a1b-16e1-f226f8fd0d0b@idris.fr> <53aa2d89-137e-850d-1772-109ad6ec3196@idris.fr> Message-ID: <197623c9-1060-d2a5-38b4-16d6dc5a0815@idris.fr> Oh sorry, I missed that. That's great! Thanks, Myriam Le 05/29/19 ? 16:55, Zhang, Hong a ?crit?: > Myriam: > This branch is merged to master. > Thanks for your work and patience. It helps us a lot. The graphs are > very nice :-) > > We plan to re-organise the APIs of mat-mat opts, make them easier for > users. > Hong > > Hi, > > Do you have any idea when Barry's fix > (https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff) > will be released? I can see it has been merged to the "next" > branch. Does it mean it will be soon available on master? > > +for your information, I plotted a summary of the scalings of > interest (memory and time): > - using petsc-3.10.2 (ref "bad" scaling) > - using petsc-3.6.4 (ref "good" scaling) > - using commit d330a26 + Barry's fix and different algorithms > (none, scalable, allatonce, allatonce_merged) > > Best regards, > > Myriam > > > Le 05/13/19 ? 17:20, Fande Kong a ?crit?: >> Hi?Myriam, >> >> Thanks for your report back. >> >> On Mon, May 13, 2019 at 2:01 AM Myriam Peyrounette >> > > wrote: >> >> Hi all, >> >> I tried with 3.11.1 version and Barry's fix. The good scaling >> is back! 
>> See the green curve in the plot attached. It is even better >> than PETSc >> 3.6! And it runs faster (10-15s instead of 200-300s with 3.6). >> >> >> We are glad your issue was resolved here.? >> ? >> >> >> So you were right. It seems that not all the PtAPs used the >> scalable >> version. >> >> I was a bit confused about the options to set... I used the >> options: >> -matptap_via scalable and -mat_freeintermediatedatastructures >> 1. Do you >> think it would be even better with allatonce? >> >> >> "scalable" and "allatonce" correspond to different algorithms >> respectively. ``allatonce" should be using less memory than >> "scalable". The "allatonce" algorithm ?would be a good >> alternative if your application is memory sensitive and the >> problem size is large.? >> We are definitely curious about the memory usage of ``allatonce" >> in your test cases but don't feel obligated to do these tests >> since your concern were resolved now. In case you are also >> interested in how our new algorithms perform, I post petsc >> options here that are used to? >> choose these algorithms: >> >> algorithm 1: ``allatonce"? >> >> -matptap_via allatonce >> -mat_freeintermediatedatastructures 1 >> >> algorithm 2: ``allatonce_merged"? >> >> -matptap_via allatonce_merged >> -mat_freeintermediatedatastructures 1 >> >> >> Again, thanks for your report that help us improve PETSc. >> >> Fande, >> ? >> >> >> It is unfortunate that this fix can't be merged with the >> master branch. >> But the patch works well and I can consider the issue as >> solved now. >> >> Thanks a lot for your time! >> >> Myriam >> >> >> Le 05/04/19 ? 06:54, Smith, Barry F. a ?crit?: >> >? ? Hmm, I had already fixed this, I think, >> > >> >? ? >> https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff >> > >> >? ? but unfortunately our backlog of pull requests kept it >> out of master. We are (well Satish and Jed) working on a new >> CI infrastructure that will hopefully be more stable than the >> current CI that we are using. >> > >> >? ? Fande, >> >? ? ? ?Sorry you had to spend time on this. >> > >> > >> >? ? Barry >> > >> > >> > >> >> On May 3, 2019, at 11:20 PM, Fande Kong via petsc-users >> > wrote: >> >> >> >> Hi Myriam, >> >> >> >> I run the example you attached earlier with "-mx 48 -my 48 >> -mz 48 -levels 3 -ksp_view? -matptap_via allatonce -log_view ".? >> >> >> >> There are six PtAPs. Two of them are sill using the >> nonscalable version of the algorithm (this might explain why >> the memory still exponentially increases) even though we have >> asked PETSc to use the ``allatonce" algorithm. This is >> happening because MATMAIJ does not honor the petsc option, >> instead, it uses the default setting of MPIAIJ.? I have a fix >> at >> https://bitbucket.org/petsc/petsc/pull-requests/1623/choose-algorithms-in/diff. >> The PR should fix the issue. >> >> >> >> Thanks again for your report, >> >> >> >> Fande, >> >> >> >>? >> >> -- >> Myriam Peyrounette >> CNRS/IDRIS - HLST >> -- >> >> > > -- > Myriam Peyrounette > CNRS/IDRIS - HLST > -- > -- Myriam Peyrounette CNRS/IDRIS - HLST -- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 2975 bytes Desc: Signature cryptographique S/MIME URL: From fdkong.jd at gmail.com Wed May 29 10:15:09 2019 From: fdkong.jd at gmail.com (Fande Kong) Date: Wed, 29 May 2019 09:15:09 -0600 Subject: [petsc-users] Bad memory scaling with PETSc 3.10 In-Reply-To: <197623c9-1060-d2a5-38b4-16d6dc5a0815@idris.fr> References: <9bb4ddb6-b99e-7a1b-16e1-f226f8fd0d0b@idris.fr> <53aa2d89-137e-850d-1772-109ad6ec3196@idris.fr> <197623c9-1060-d2a5-38b4-16d6dc5a0815@idris.fr> Message-ID: Hi Myriam, Thanks for your valuable feedback. The plots are really cool! Finally all the algorithms are scalable. Thanks, Fande On Wed, May 29, 2019 at 8:58 AM Myriam Peyrounette < myriam.peyrounette at idris.fr> wrote: > Oh sorry, I missed that. That's great! > > Thanks, > > Myriam > > Le 05/29/19 ? 16:55, Zhang, Hong a ?crit : > > Myriam: > This branch is merged to master. > Thanks for your work and patience. It helps us a lot. The graphs are very > nice :-) > > We plan to re-organise the APIs of mat-mat opts, make them easier for > users. > Hong > >> Hi, >> >> Do you have any idea when Barry's fix ( >> https://bitbucket.org/petsc/petsc/pull-requests/1606/change-handling-of-matptap_mpiaij_mpimaij/diff) >> will be released? I can see it has been merged to the "next" branch. Does >> it mean it will be soon available on master? >> >> +for your information, I plotted a summary of the scalings of interest >> (memory and time): >> - using petsc-3.10.2 (ref "bad" scaling) >> - using petsc-3.6.4 (ref "good" scaling) >> - using commit d330a26 + Barry's fix and different algorithms (none, >> scalable, allatonce, allatonce_merged) >> >> Best regards, >> >> Myriam >> >> Le 05/13/19 ? 17:20, Fande Kong a ?crit : >> >> Hi Myriam, >> >> Thanks for your report back. >> >> On Mon, May 13, 2019 at 2:01 AM Myriam Peyrounette < >> myriam.peyrounette at idris.fr> wrote: >> >>> Hi all, >>> >>> I tried with 3.11.1 version and Barry's fix. The good scaling is back! >>> See the green curve in the plot attached. It is even better than PETSc >>> 3.6! And it runs faster (10-15s instead of 200-300s with 3.6). >>> >> >> We are glad your issue was resolved here. >> >> >>> >>> So you were right. It seems that not all the PtAPs used the scalable >>> version. >>> >>> I was a bit confused about the options to set... I used the options: >>> -matptap_via scalable and -mat_freeintermediatedatastructures 1. Do you >>> think it would be even better with allatonce? >>> >> >> "scalable" and "allatonce" correspond to different algorithms >> respectively. ``allatonce" should be using less memory than "scalable". The >> "allatonce" algorithm would be a good alternative if your application is >> memory sensitive and the problem size is large. >> We are definitely curious about the memory usage of ``allatonce" in your >> test cases but don't feel obligated to do these tests since your concern >> were resolved now. In case you are also interested in how our new >> algorithms perform, I post petsc options here that are used to >> choose these algorithms: >> >> algorithm 1: ``allatonce" >> >> -matptap_via allatonce >> -mat_freeintermediatedatastructures 1 >> >> algorithm 2: ``allatonce_merged" >> >> -matptap_via allatonce_merged >> -mat_freeintermediatedatastructures 1 >> >> >> Again, thanks for your report that help us improve PETSc. >> >> Fande, >> >> >>> >>> It is unfortunate that this fix can't be merged with the master branch. >>> But the patch works well and I can consider the issue as solved now. 
>> >>> Thanks a lot for your time!
>>> >>> Myriam
>>>
>>> --
>>> Myriam Peyrounette
>>> CNRS/IDRIS - HLST
>>> --
>>
>> --
>> Myriam Peyrounette
>> CNRS/IDRIS - HLST
>> --
>
> --
> Myriam Peyrounette
> CNRS/IDRIS - HLST
> --
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bsmith at mcs.anl.gov  Wed May 29 13:22:26 2019
From: bsmith at mcs.anl.gov (Smith, Barry F.)
Date: Wed, 29 May 2019 18:22:26 +0000
Subject: [petsc-users] Stop KSP if diverging
In-Reply-To: 
References: 
Message-ID: <2B4302F1-A47E-4865-9601-B42BC618ED29@anl.gov>

   Hmm, in the latest couple of releases of PETSc the KSPSolve is supposed to end as soon as it hits a NaN or Infinity. Is that not happening for you? If you run with -ksp_monitor does it print multiple lines with NaN or Inf? If so please send us the -ksp_view output so we can track down which solver is not correctly handling the NaN or Inf.

   That said, if you call KSPSolve() multiple times in a loop or from SNESSolve(), each new solve may have a NaN or Inf (from the previous one) but it should only do one iteration before exiting.

   You should always call KSPGetConvergedReason() after KSPSolve() and confirm that the reason is positive; if it is negative it indicates something failed in the solve.

   Barry

> On May 29, 2019, at 2:06 AM, Edoardo alinovi via petsc-users wrote:
> 
> Dear PETSc friends,
> 
> Hope you are all doing well.
> 
> I have a quick question for you that I am not able to solve by myself. From time to time, while testing new code features, it happens that KSP diverges but it does not stop automatically and the iterations continue even after getting a NaN.
> 
> In the KSP setup I use the following instruction to set the divergence stopping criterion (div = 10000):
> 
> call KSPSetTolerances(myksp, rel_tol, abs_tol, div, itmax, ierr)
> 
> But it does not help. Looking into the documentation I have also found:
> KSPConvergedDefault(KSP ksp,PetscInt n,PetscReal rnorm,KSPConvergedReason *reason,void *ctx)
> which I am not calling in the code. Is this maybe the reason of my problem? If yes, how can I use KSPConvergedDefault in FORTRAN?
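A minimal sketch of the check Barry describes, written here in C (the Fortran calls take the same arguments plus a trailing ierr); KSPConvergedDefault() is the convergence test that KSP installs by default, so it normally does not need to be called from user code. Here ksp, b and x stand for whatever solver and vectors are already in use:

   KSPConvergedReason reason;
   PetscErrorCode     ierr;

   ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
   ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
   if (reason < 0) {
     /* negative values are KSP_DIVERGED_* codes, e.g. KSP_DIVERGED_NANORINF */
     ierr = PetscPrintf(PETSC_COMM_WORLD, "KSP solve failed, reason %d\n", (int)reason);CHKERRQ(ierr);
   }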
> > Thanks, > > Edo > > ------ > > Edoardo Alinovi, Ph.D. > > DICCA, Scuola Politecnica, > Universita' degli Studi di Genova, > 1, via Montallegro, > 16145 Genova, Italy > > Email: edoardo.alinovi at dicca.unige.it > Tel: +39 010 353 2540 > Website: https://www.edoardoalinovi.com/ > > From jed at jedbrown.org Wed May 29 13:59:35 2019 From: jed at jedbrown.org (Jed Brown) Date: Wed, 29 May 2019 12:59:35 -0600 Subject: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS In-Reply-To: References: <87tvdea9oe.fsf@jedbrown.org> <1D6F65B9-8126-47DB-94CE-4D8E5D3B9708@anl.gov> <877eaa9z7c.fsf@jedbrown.org> Message-ID: <87imttjdd4.fsf@jedbrown.org> Lisandro Dalcin writes: > On Tue, 28 May 2019 at 22:05, Jed Brown wrote: > >> >> Note that all of these compilers (including Sun C, which doesn't define >> the macro) recognize -fPIC. (Blue Gene xlc requires -qpic.) Do we >> still need to test the other alternatives? >> >> > Well, worst case, if the configure test always fails with and without all > the possible variants of the PIC flag (-fPIC, -kPIC, -qpic, etc.) because > they do not define the __PIC__ macro, then you are free to abort configure > and ask users to pass the pic flag in CFLAGS and remove --with-pic=1 from > the configure line, as we do today for all compilers. Yeah, this would also need to apply to shared libraries, where need for PIC is implied and we should proceed to confirm that linking works (i.e., the old test) using user-provided CFLAGS. > BTW, my trick seems to works with the Cray compiler. > > $ cc > CC-2107 craycc: ERROR in command line > No valid filenames are specified on the command line. > > $ cc -c check-pic.c -fPIC > > $ cc -c check-pic.c > CC-35 craycc: ERROR File = check-pic.c, Line = 2 > #error directive: "no-PIC" > #error "no-PIC" > ^ > > > -- > Lisandro Dalcin > ============ > Research Scientist > Extreme Computing Research Center (ECRC) > King Abdullah University of Science and Technology (KAUST) > http://ecrc.kaust.edu.sa/ From bhatiamanav at gmail.com Wed May 29 14:27:16 2019 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Wed, 29 May 2019 14:27:16 -0500 Subject: [petsc-users] Nonzero I-j locations Message-ID: Hi, Once a MPI-AIJ matrix has been assembled, is there a method to get the nonzero I-J locations? I see one for sequential matrices here: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html , but not for parallel matrices. Regards, Manav -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Wed May 29 17:25:40 2019 From: jczhang at mcs.anl.gov (Zhang, Junchao) Date: Wed, 29 May 2019 22:25:40 +0000 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: References: Message-ID: Yes, see MatGetRow https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRow.html --Junchao Zhang On Wed, May 29, 2019 at 2:28 PM Manav Bhatia via petsc-users > wrote: Hi, Once a MPI-AIJ matrix has been assembled, is there a method to get the nonzero I-J locations? I see one for sequential matrices here: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html , but not for parallel matrices. Regards, Manav -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed May 29 18:07:23 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) 
Date: Wed, 29 May 2019 23:07:23 +0000 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: References: Message-ID: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> Manav, For parallel sparse matrices using the standard PETSc formats the matrix is stored in two parts on each process (see the details in MatCreateAIJ()) thus there is no inexpensive way to access directly the IJ locations as a single local matrix. What are you hoping to use the information for? Perhaps we have other suggestions on how to achieve the goal. Barry > On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users wrote: > > Hi, > > Once a MPI-AIJ matrix has been assembled, is there a method to get the nonzero I-J locations? I see one for sequential matrices here: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html , but not for parallel matrices. > > Regards, > Manav > > From bhatiamanav at gmail.com Wed May 29 18:29:08 2019 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Wed, 29 May 2019 18:29:08 -0500 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> References: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> Message-ID: Thanks, Barry. I am working on a FE application (involving bifurcation behavior) with libMesh where I need to solve the system of equations along with a few extra unknowns that are not directly related to the FE mesh. I am able to assemble the n x 1 residual (R_fe) and n x n Jacobian (J_fe ) from my code and libMesh provides me with the sparsity pattern for this. Next, the system of equations that I need to solve is: [ J_fe A ] { dX } = { R_fe } [ B C ] { dV } = {R_ext } Where, C is a dense matrix of size m x m ( m << n ), A is n x m, B is m x n, R_ext is m x 1. A, B and C are dense matrixes. This comes from the bordered system for my path continuation solver. I have implemented a solver using Schur factorization ( this is outside of PETSc and does not use the FieldSplit construct ). This works well for most cases, except when J_fe is close to singular. I am now attempting to create a monolithic matrix that solves the complete system. Currently, the approach I am considering is to compute J_fe using my libMesh application, so that I don?t have to change that. I am defining a new matrix with the extra non-zero locations for A, B, C. With J_fe computed, I am looking to copy its non-zero entries to this new matrix. This is where I am stumbling since I don?t know how best to get the non-zero locations in J_fe. Maybe there is a better approach to copy from J_fe to the new matrix? I have looked through the nested matrix construct, but have not given this a serious consideration. Maybe I should? Note that I don?t want to solve J_fe and C separately (not as separate systems), so the field-split approach will not be suitable here. Also, I am currently using MUMPS for all my parallel solves. I would appreciate any advice. Regards, Manav > On May 29, 2019, at 6:07 PM, Smith, Barry F. wrote: > > > Manav, > > For parallel sparse matrices using the standard PETSc formats the matrix is stored in two parts on each process (see the details in MatCreateAIJ()) thus there is no inexpensive way to access directly the IJ locations as a single local matrix. What are you hoping to use the information for? Perhaps we have other suggestions on how to achieve the goal. 
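As a rough illustration of the MatGetRow() route pointed out earlier in the thread, the locally owned part of the nonzero pattern of an assembled parallel matrix can be walked one row at a time; A is a placeholder for the assembled MPIAIJ matrix:

   PetscInt       rstart, rend, row, ncols;
   const PetscInt *cols;
   PetscErrorCode ierr;

   ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
   for (row = rstart; row < rend; row++) {
     ierr = MatGetRow(A, row, &ncols, &cols, NULL);CHKERRQ(ierr);   /* NULL: column indices only, no values */
     /* cols[0..ncols-1] are the global column indices of the nonzeros in this row */
     ierr = MatRestoreRow(A, row, &ncols, &cols, NULL);CHKERRQ(ierr);
   }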
> > Barry > > >> On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users wrote: >> >> Hi, >> >> Once a MPI-AIJ matrix has been assembled, is there a method to get the nonzero I-J locations? I see one for sequential matrices here: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html , but not for parallel matrices. >> >> Regards, >> Manav >> >> > From knepley at gmail.com Wed May 29 18:43:14 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 29 May 2019 19:43:14 -0400 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: References: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> Message-ID: On Wed, May 29, 2019 at 7:30 PM Manav Bhatia via petsc-users < petsc-users at mcs.anl.gov> wrote: > Thanks, Barry. > > I am working on a FE application (involving bifurcation behavior) with > libMesh where I need to solve the system of equations along with a few > extra unknowns that are not directly related to the FE mesh. I am able to > assemble the n x 1 residual (R_fe) and n x n Jacobian (J_fe ) from my > code and libMesh provides me with the sparsity pattern for this. > > Next, the system of equations that I need to solve is: > > [ J_fe A ] { dX } = { R_fe } > [ B C ] { dV } = {R_ext } > > Where, C is a dense matrix of size m x m ( m << n ), A is n x m, B is m x > n, R_ext is m x 1. A, B and C are dense matrixes. This comes from the > bordered system for my path continuation solver. > > I have implemented a solver using Schur factorization ( this is outside > of PETSc and does not use the FieldSplit construct ). This works well for > most cases, except when J_fe is close to singular. > > I am now attempting to create a monolithic matrix that solves the > complete system. > > Currently, the approach I am considering is to compute J_fe using my > libMesh application, so that I don?t have to change that. I am defining a > new matrix with the extra non-zero locations for A, B, C. > > With J_fe computed, I am looking to copy its non-zero entries to this > new matrix. This is where I am stumbling since I don?t know how best to get > the non-zero locations in J_fe. Maybe there is a better approach to copy > from J_fe to the new matrix? > > I have looked through the nested matrix construct, but have not given > this a serious consideration. Maybe I should? Note that I don?t want to > solve J_fe and C separately (not as separate systems), so the field-split > approach will not be suitable here. > I would not choose Nest if you want to eventually run MUMPS, since that will not work. I would try to build your matrix using https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html obtained from your bigger matrix. This is our interface to assembling into portions or your matrix as if its the entire matrix. Thanks, Matt > Also, I am currently using MUMPS for all my parallel solves. > > I would appreciate any advice. > > Regards, > Manav > > > > On May 29, 2019, at 6:07 PM, Smith, Barry F. wrote: > > > > > > Manav, > > > > For parallel sparse matrices using the standard PETSc formats the > matrix is stored in two parts on each process (see the details in > MatCreateAIJ()) thus there is no inexpensive way to access directly the IJ > locations as a single local matrix. What are you hoping to use the > information for? Perhaps we have other suggestions on how to achieve the > goal. 
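A hedged sketch of the MatGetLocalSubMatrix() approach Matt suggests: Abig stands for the assumed monolithic matrix, isrow/iscol for local index sets describing one block (say the dense C block), and nrows/rowidx/ncols/colidx/values for whatever the application already assembles. The parent matrix needs a local-to-global mapping (MatSetLocalToGlobalMapping()) before local submatrices can be requested:

   Mat            Asub;
   PetscErrorCode ierr;

   ierr = MatGetLocalSubMatrix(Abig, isrow, iscol, &Asub);CHKERRQ(ierr);
   /* indices here are local to isrow/iscol, as in ordinary element assembly */
   ierr = MatSetValuesLocal(Asub, nrows, rowidx, ncols, colidx, values, ADD_VALUES);CHKERRQ(ierr);
   ierr = MatRestoreLocalSubMatrix(Abig, isrow, iscol, &Asub);CHKERRQ(ierr);
   /* assemble the parent once all blocks have been filled */
   ierr = MatAssemblyBegin(Abig, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
   ierr = MatAssemblyEnd(Abig, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);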
> > > > Barry > > > > > >> On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> > >> Hi, > >> > >> Once a MPI-AIJ matrix has been assembled, is there a method to get > the nonzero I-J locations? I see one for sequential matrices here: > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html > , but not for parallel matrices. > >> > >> Regards, > >> Manav > >> > >> > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed May 29 19:04:54 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 30 May 2019 00:04:54 +0000 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: References: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> Message-ID: Understood. Where are you putting the "few extra unknowns" in the vector and matrix? On the first process, on the last process, some places in the middle of the matrix? We don't have any trivial code for copying a big matrix into a even larger matrix directly because we frown on doing that. It is very wasteful in time and memory. The simplest way to do it is call MatGetRow() twice for each row, once to get the nonzero locations for each row to determine the numbers needed for preallocation and then the second time after the big matrix has been preallocated to get the nonzero locations and numerical values for the row to call MatSetValues() with to set that row into the bigger matrix. Note of course when you call MatSetValues() you will need to shift the rows and column locations to take into account the new rows and columns in the bigger matrix. If you put the "extra unknowns" at the every end of the rows/columns on the last process you won't have to shift. Note that B being dense really messes up chances for load balancing since its rows are dense and take a great deal of space so whatever process gets those rows needs to have much less of the mesh. The correct long term approach is to have libmesh provide the needed functionality (for continuation) for the slightly larger matrix directly so huge matrices do not need to be copied. I noticed that libmesh has some functionality related to continuation. I do not know if they handle it by creating the larger matrix and vector and filling that up directly for finite elements. If they do then you should definitely take a look at that and see if it can be extended for your case (ignore the continuation algorithm they may be using, that is not relevant, the question is if they generate the larger matrices and if you can leverage this). The ultimate hack would be to (for example) assign the extra variables to the end of the last process and hack lib mesh a little bit so the matrix it creates (before it puts in the numerical values) has the extra rows and columns, that libmesh will not put the values into but you will. Thus you get libmesh to fill up the true final matrix for its finite element problem (not realizing the matrix is a little bigger then it needs) directly, no copies of the data needed. But this is bit tricky, you'll need to combine libmesh's preallocation information with yours for the final columns and rows before you have lib mesh put the numerical values in. Double check if they have any support for this first. 
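A rough sketch of the second pass Barry describes, assuming Jfe is the assembled libMesh matrix, Jbig is the already-preallocated larger matrix, and the extra unknowns have been appended after the last FE row/column on the last process so that no index shift is needed (otherwise row and cols would be remapped before MatSetValues()):

   PetscInt          rstart, rend, row, ncols;
   const PetscInt    *cols;
   const PetscScalar *vals;
   PetscErrorCode    ierr;

   ierr = MatGetOwnershipRange(Jfe, &rstart, &rend);CHKERRQ(ierr);
   for (row = rstart; row < rend; row++) {
     ierr = MatGetRow(Jfe, row, &ncols, &cols, &vals);CHKERRQ(ierr);
     ierr = MatSetValues(Jbig, 1, &row, ncols, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
     ierr = MatRestoreRow(Jfe, row, &ncols, &cols, &vals);CHKERRQ(ierr);
   }
   ierr = MatAssemblyBegin(Jbig, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
   ierr = MatAssemblyEnd(Jbig, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

The first pass would be the same loop with the values argument left NULL, recording ncols for each row to build the preallocation of Jbig.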
Barry > On May 29, 2019, at 6:29 PM, Manav Bhatia wrote: > > Thanks, Barry. > > I am working on a FE application (involving bifurcation behavior) with libMesh where I need to solve the system of equations along with a few extra unknowns that are not directly related to the FE mesh. I am able to assemble the n x 1 residual (R_fe) and n x n Jacobian (J_fe ) from my code and libMesh provides me with the sparsity pattern for this. > > Next, the system of equations that I need to solve is: > > [ J_fe A ] { dX } = { R_fe } > [ B C ] { dV } = {R_ext } > > Where, C is a dense matrix of size m x m ( m << n ), A is n x m, B is m x n, R_ext is m x 1. A, B and C are dense matrixes. This comes from the bordered system for my path continuation solver. > > I have implemented a solver using Schur factorization ( this is outside of PETSc and does not use the FieldSplit construct ). This works well for most cases, except when J_fe is close to singular. > > I am now attempting to create a monolithic matrix that solves the complete system. > > Currently, the approach I am considering is to compute J_fe using my libMesh application, so that I don?t have to change that. I am defining a new matrix with the extra non-zero locations for A, B, C. > > With J_fe computed, I am looking to copy its non-zero entries to this new matrix. This is where I am stumbling since I don?t know how best to get the non-zero locations in J_fe. Maybe there is a better approach to copy from J_fe to the new matrix? > > I have looked through the nested matrix construct, but have not given this a serious consideration. Maybe I should? Note that I don?t want to solve J_fe and C separately (not as separate systems), so the field-split approach will not be suitable here. > > Also, I am currently using MUMPS for all my parallel solves. > > I would appreciate any advice. > > Regards, > Manav > > >> On May 29, 2019, at 6:07 PM, Smith, Barry F. wrote: >> >> >> Manav, >> >> For parallel sparse matrices using the standard PETSc formats the matrix is stored in two parts on each process (see the details in MatCreateAIJ()) thus there is no inexpensive way to access directly the IJ locations as a single local matrix. What are you hoping to use the information for? Perhaps we have other suggestions on how to achieve the goal. >> >> Barry >> >> >>> On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users wrote: >>> >>> Hi, >>> >>> Once a MPI-AIJ matrix has been assembled, is there a method to get the nonzero I-J locations? I see one for sequential matrices here: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html , but not for parallel matrices. >>> >>> Regards, >>> Manav >>> >>> >> > From s_g at berkeley.edu Wed May 29 19:10:58 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 29 May 2019 17:10:58 -0700 Subject: [petsc-users] Memory inquire functions Message-ID: (In Fortran) do the calls ??????? call PetscMallocGetCurrentUsage(val, ierr) ??????? call PetscMemoryGetCurrentUsage(val, ierr) return the per process memory numbers? or are the returned values summed across all processes? -sanjay From bsmith at mcs.anl.gov Wed May 29 19:13:35 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) 
Date: Thu, 30 May 2019 00:13:35 +0000 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: References: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> Message-ID: <9F2458F2-0CDB-4D7E-B44B-1E2DA4B1C66A@mcs.anl.gov> This is an interesting idea, but unfortunately not directly compatible with libMesh filling up the finite element part of the matrix. Plus it appears MatGetLocalSubMatrix() is only implemented for IS and Nest matrices :-( You could create a MATNEST reusing exactly the matrix from lib mesh as the first block and call MatConvert() to MPIAIJ format. This is easier I guess then coding the conversion yourself. (Still has the memory and copy issues but if that is the best we can do). Note that MatNest() requires that all its matrices live on all the ranks of the MPI_Comm, so for your A B and C you will need to declare them on the MPI_Comm with zero rows and columns for most ranks (maybe all but one). Barry > On May 29, 2019, at 6:43 PM, Matthew Knepley wrote: > > On Wed, May 29, 2019 at 7:30 PM Manav Bhatia via petsc-users wrote: > > I would not choose Nest if you want to eventually run MUMPS, since that will not work. I would > try to build your matrix using > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetLocalSubMatrix.html > > obtained from your bigger matrix. This is our interface to assembling into portions or your matrix as if > its the entire matrix. > > Thanks, > > Matt > > Also, I am currently using MUMPS for all my parallel solves. > > I would appreciate any advice. > > Regards, > Manav > > > > On May 29, 2019, at 6:07 PM, Smith, Barry F. wrote: > > > > > > Manav, > > > > For parallel sparse matrices using the standard PETSc formats the matrix is stored in two parts on each process (see the details in MatCreateAIJ()) thus there is no inexpensive way to access directly the IJ locations as a single local matrix. What are you hoping to use the information for? Perhaps we have other suggestions on how to achieve the goal. > > > > Barry > > > > > >> On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users wrote: > >> > >> Hi, > >> > >> Once a MPI-AIJ matrix has been assembled, is there a method to get the nonzero I-J locations? I see one for sequential matrices here: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html , but not for parallel matrices. > >> > >> Regards, > >> Manav > >> > >> > > > > > > -- > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ From bsmith at mcs.anl.gov Wed May 29 19:27:58 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 30 May 2019 00:27:58 +0000 Subject: [petsc-users] Memory inquire functions In-Reply-To: References: Message-ID: They are for the given process. > On May 29, 2019, at 7:10 PM, Sanjay Govindjee via petsc-users wrote: > > (In Fortran) do the calls > > call PetscMallocGetCurrentUsage(val, ierr) > call PetscMemoryGetCurrentUsage(val, ierr) > > return the per process memory numbers? or are the returned values summed across all processes? > > -sanjay > From bhatiamanav at gmail.com Wed May 29 21:11:02 2019 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Wed, 29 May 2019 21:11:02 -0500 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: References: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> Message-ID: <312FC2BE-E528-47CA-AF94-36DCCB246313@gmail.com> Barry, Thanks for the detailed message. 
I checked libMesh?s continuation sovler and it appears to be using the same system solver without creating a larger matrix: https://github.com/libMesh/libmesh/blob/master/src/systems/continuation_system.C I need to implement this in my code, MAST, for various reasons (mainly, it fits inside a bigger workflow). The current implementation implementation follows the Schur factorization approach: https://mastmultiphysics.github.io/class_m_a_s_t_1_1_continuation_solver_base.html#details I will look into some solutions pertaining to the use of MatGetLocalSubMatrix or leverage some existing functionality in libMesh. Thanks, Manav > On May 29, 2019, at 7:04 PM, Smith, Barry F. wrote: > > > Understood. Where are you putting the "few extra unknowns" in the vector and matrix? On the first process, on the last process, some places in the middle of the matrix? > > We don't have any trivial code for copying a big matrix into a even larger matrix directly because we frown on doing that. It is very wasteful in time and memory. > > The simplest way to do it is call MatGetRow() twice for each row, once to get the nonzero locations for each row to determine the numbers needed for preallocation and then the second time after the big matrix has been preallocated to get the nonzero locations and numerical values for the row to call MatSetValues() with to set that row into the bigger matrix. Note of course when you call MatSetValues() you will need to shift the rows and column locations to take into account the new rows and columns in the bigger matrix. If you put the "extra unknowns" at the every end of the rows/columns on the last process you won't have to shift. > > Note that B being dense really messes up chances for load balancing since its rows are dense and take a great deal of space so whatever process gets those rows needs to have much less of the mesh. > > The correct long term approach is to have libmesh provide the needed functionality (for continuation) for the slightly larger matrix directly so huge matrices do not need to be copied. > > I noticed that libmesh has some functionality related to continuation. I do not know if they handle it by creating the larger matrix and vector and filling that up directly for finite elements. If they do then you should definitely take a look at that and see if it can be extended for your case (ignore the continuation algorithm they may be using, that is not relevant, the question is if they generate the larger matrices and if you can leverage this). > > > The ultimate hack would be to (for example) assign the extra variables to the end of the last process and hack lib mesh a little bit so the matrix it creates (before it puts in the numerical values) has the extra rows and columns, that libmesh will not put the values into but you will. Thus you get libmesh to fill up the true final matrix for its finite element problem (not realizing the matrix is a little bigger then it needs) directly, no copies of the data needed. But this is bit tricky, you'll need to combine libmesh's preallocation information with yours for the final columns and rows before you have lib mesh put the numerical values in. Double check if they have any support for this first. > > Barry > > >> On May 29, 2019, at 6:29 PM, Manav Bhatia wrote: >> >> Thanks, Barry. >> >> I am working on a FE application (involving bifurcation behavior) with libMesh where I need to solve the system of equations along with a few extra unknowns that are not directly related to the FE mesh. 
I am able to assemble the n x 1 residual (R_fe) and n x n Jacobian (J_fe ) from my code and libMesh provides me with the sparsity pattern for this. >> >> Next, the system of equations that I need to solve is: >> >> [ J_fe A ] { dX } = { R_fe } >> [ B C ] { dV } = {R_ext } >> >> Where, C is a dense matrix of size m x m ( m << n ), A is n x m, B is m x n, R_ext is m x 1. A, B and C are dense matrixes. This comes from the bordered system for my path continuation solver. >> >> I have implemented a solver using Schur factorization ( this is outside of PETSc and does not use the FieldSplit construct ). This works well for most cases, except when J_fe is close to singular. >> >> I am now attempting to create a monolithic matrix that solves the complete system. >> >> Currently, the approach I am considering is to compute J_fe using my libMesh application, so that I don?t have to change that. I am defining a new matrix with the extra non-zero locations for A, B, C. >> >> With J_fe computed, I am looking to copy its non-zero entries to this new matrix. This is where I am stumbling since I don?t know how best to get the non-zero locations in J_fe. Maybe there is a better approach to copy from J_fe to the new matrix? >> >> I have looked through the nested matrix construct, but have not given this a serious consideration. Maybe I should? Note that I don?t want to solve J_fe and C separately (not as separate systems), so the field-split approach will not be suitable here. >> >> Also, I am currently using MUMPS for all my parallel solves. >> >> I would appreciate any advice. >> >> Regards, >> Manav >> >> >>> On May 29, 2019, at 6:07 PM, Smith, Barry F. wrote: >>> >>> >>> Manav, >>> >>> For parallel sparse matrices using the standard PETSc formats the matrix is stored in two parts on each process (see the details in MatCreateAIJ()) thus there is no inexpensive way to access directly the IJ locations as a single local matrix. What are you hoping to use the information for? Perhaps we have other suggestions on how to achieve the goal. >>> >>> Barry >>> >>> >>>> On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users wrote: >>>> >>>> Hi, >>>> >>>> Once a MPI-AIJ matrix has been assembled, is there a method to get the nonzero I-J locations? I see one for sequential matrices here: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html , but not for parallel matrices. >>>> >>>> Regards, >>>> Manav >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Wed May 29 21:38:07 2019 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Thu, 30 May 2019 14:38:07 +1200 Subject: [petsc-users] parallel dual porosity In-Reply-To: References: Message-ID: <32084356-1159-0ad7-c510-57ed0fb0d34b@auckland.ac.nz> hi On 28/05/19 11:32 AM, Matthew Knepley wrote: > > I would not do that. It should be much easier, and better from a > workflow standpoint, > to just redistribute in parallel. We now have several test examples > that redistribute > in parallel, for example > > https://bitbucket.org/petsc/petsc/src/cd762eb66180d8d1fcc3950bd19a3c1b423f4f20/src/dm/impls/plex/examples/tests/ex1.c#lines-486 > > Let us know if you have problems. If you use DMPlexDistribute() a second time to redistribute the mesh, presumably it will first delete all the partition ghost cells created from the initial distribution (and create new ones after redistribution). 
If I have also used DMPlexConstructGhostCells() to create boundary condition ghost cells on the distributed mesh, what happens to them when I redistribute? Do they get copied over to the redistributed mesh? or is it better not to add them until the mesh has been redistributed? - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 29 21:45:07 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 29 May 2019 22:45:07 -0400 Subject: [petsc-users] parallel dual porosity In-Reply-To: <32084356-1159-0ad7-c510-57ed0fb0d34b@auckland.ac.nz> References: <32084356-1159-0ad7-c510-57ed0fb0d34b@auckland.ac.nz> Message-ID: On Wed, May 29, 2019 at 10:38 PM Adrian Croucher wrote: > hi > On 28/05/19 11:32 AM, Matthew Knepley wrote: > > > I would not do that. It should be much easier, and better from a workflow > standpoint, > to just redistribute in parallel. We now have several test examples that > redistribute > in parallel, for example > > > https://bitbucket.org/petsc/petsc/src/cd762eb66180d8d1fcc3950bd19a3c1b423f4f20/src/dm/impls/plex/examples/tests/ex1.c#lines-486 > > Let us know if you have problems. > > If you use DMPlexDistribute() a second time to redistribute the mesh, > presumably it will first delete all the partition ghost cells created from > the initial distribution (and create new ones after redistribution). > > Hmm, I had not thought about that. It will not do that at all. We have never rebalanced a simulation using overlap cells. I would have to write the code that strips them out. Not hard, but more code. If you only plan on redistributing once, you can wait until then to add the overlap cells. > If I have also used DMPlexConstructGhostCells() to create boundary > condition ghost cells on the distributed mesh, what happens to them when I > redistribute? Do they get copied over to the redistributed mesh? or is it > better not to add them until the mesh has been redistributed? > > Same thing here. Don't add ghost cells until after redistribution. If you want to redistribute multiple times, again we would need to strip out those cells. Thanks, Matt > - Adrian > > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email: a.croucher at auckland.ac.nz > tel: +64 (0)9 923 4611 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From janicvermaak at gmail.com Wed May 29 21:49:39 2019 From: janicvermaak at gmail.com (Jan Izak Cornelius Vermaak) Date: Wed, 29 May 2019 21:49:39 -0500 Subject: [petsc-users] Matrix free GMRES seems to ignore my initial guess In-Reply-To: <80420AED-1242-40CE-9D00-BB60371C12F5@mcs.anl.gov> References: <393102F5-A229-4C5A-B2B3-3BE9B5B7314B@anl.gov> <80420AED-1242-40CE-9D00-BB60371C12F5@mcs.anl.gov> Message-ID: Just some feedback. I found the problem. For reference my solve was called as follows KSPSolve(ksp,b,phi_new) Inside my matrix operation (the "Matrix-Action" or MAT_OP_MULT) I was using phi_new for a computation and that overwrite my initial guess everytime. 
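The fix Jan goes on to describe, reduced to a sketch: the MATOP_MULT callback of a shell matrix should read only its input vector and write only its output vector, keeping any scratch storage in the user context rather than reusing the b or phi_new that were handed to KSPSolve(). AppCtx, appctx, comm, nlocal and N below are placeholders for whatever the application already has:

   static PetscErrorCode MyMult(Mat A, Vec x, Vec y)
   {
     AppCtx         *ctx;
     PetscErrorCode ierr;

     PetscFunctionBeginUser;
     ierr = MatShellGetContext(A, &ctx);CHKERRQ(ierr);
     /* transport sweep: compute y = A*x using only x, y, and ctx->work for scratch */
     PetscFunctionReturn(0);
   }

   ierr = MatCreateShell(comm, nlocal, nlocal, N, N, &appctx, &A);CHKERRQ(ierr);
   ierr = MatShellSetOperation(A, MATOP_MULT, (void (*)(void))MyMult);CHKERRQ(ierr);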
Looks like the solver still holds on to phi_new and uses it internally and therefore when I modify it it basically changes the entire behavior. Lesson learned: Internal to the custom MAT_OP_MULT, do not modify, b or phi_new. Thanks for the help. Regards, Jan Vermaak On Tue, May 28, 2019 at 1:16 AM Smith, Barry F. wrote: > > > > On May 28, 2019, at 12:41 AM, Jan Izak Cornelius Vermaak < > janicvermaak at gmail.com> wrote: > > > > Checking the matrix would be hard as I have a really big operator. Its > transport sweeps. > > > > When I increase the restart interval the solution converges to the right > one. > > Run with -ksp_monitor_true_residual what are the true residuals being > printed? > > The GMRES code has been in continuous use for 25 years, it would > stunning if you suddenly found a bug in it. > > How it works, within a restart, the GMRES algorithm uses a simple > recursive formula to compute an "estimate" for the residual norm. At > restart it actually computes the current solution and then uses that to > compute an accurate residual norm via the formula b - A x. When the > residual computed by the b - A x is different than that computed by the > recursive formula it means the recursive formula has run into some > difficulty (bad operator, bad preconditioner, null space in the operator) > and is not computing correct values. Now if you increase the restart to > past the point when it "converges" you are hiding the incorrectly computed > values computed via the recursive formula. > > I urge you to check the residual norm by b - A x at the end of the solve > and double check that it is small. It seems unlikely GMRES is providing the > correct answer for your problem. > > Barry > > > > Checked against a reference solution and Classic Richardson. It is > really as if the initial guess is completely ignored. > > > > [0] Computing b > > [0] Iteration 0 Residual 169.302 > > [0] Iteration 1 Residual 47.582 > > [0] Iteration 2 Residual 13.2614 > > [0] Iteration 3 Residual 4.46795 > > [0] Iteration 4 Residual 1.03038 > > [0] Iteration 5 Residual 0.246807 > > [0] Iteration 6 Residual 0.0828341 > > [0] Iteration 7 Residual 0.0410627 > > [0] Iteration 8 Residual 0.0243749 > > [0] Iteration 9 Residual 0.0136067 > > [0] Iteration 10 Residual 0.00769078 > > [0] Iteration 11 Residual 0.00441658 > > [0] Iteration 12 Residual 0.00240794 > > [0] Iteration 13 Residual 0.00132048 > > [0] Iteration 14 Residual 0.00073003 > > [0] Iteration 15 Residual 0.000399504 > > [0] Iteration 16 Residual 0.000217677 > > [0] Iteration 17 Residual 0.000120408 > > [0] Iteration 18 Residual 6.49719e-05 > > [0] Iteration 19 Residual 3.44523e-05 > > [0] Iteration 20 Residual 1.87909e-05 > > [0] Iteration 21 Residual 1.02385e-05 > > [0] Iteration 22 Residual 5.57859e-06 > > [0] Iteration 23 Residual 3.03431e-06 > > [0] Iteration 24 Residual 1.63696e-06 > > [0] Iteration 25 Residual 8.78202e-07 > > > > On Mon, May 27, 2019 at 11:55 PM Smith, Barry F. > wrote: > > > > This behavior where the residual norm jumps at restart indicates > something is very very wrong. Run with the option > -ksp_monitor_true_residual and I think you'll see the true residual is not > decreasing as is the preconditioned residual. My guess is that your "action > of the matrix" is incorrect and not actually a linear operator. Try using > MatComputeExplicitOperator() and see what explicit matrix it produces, is > it what you expect? 
> > > > Barry > > > > > > > > > > > On May 27, 2019, at 11:33 PM, Jan Izak Cornelius Vermaak via > petsc-users wrote: > > > > > > Hi all, > > > > > > So I am faced with this debacle. I have a neutron transport solver > with a sweep code that can compute the action of the matrix on a vector. > > > > > > I use a matrix shell to set up the action of the matrix. The method > works but only if I can get the solution converged before GMRES restarts. > It gets the right answer. Now my first problem is (and I only saw this when > I hit the first restart) is that it looks like the solver completely resets > after the GMRES-restart. Below is an iteration log with restart interval > set to 10. At first I thought it wasn't updating the initial guess but it > became clear that it initial guess always had no effect. I do set > KSPSetInitialGuessNonZero but it has no effect. > > > > > > Is the matrix-free business defaulting my initial guess to zero > everytime? What can I do to actually supply an initial guess? I've used > PETSc for diffusion many times and the initial guess always works, just not > now. > > > > > > [0] Computing b > > > [0] Iteration 0 Residual 169.302 > > > [0] Iteration 1 Residual 47.582 > > > [0] Iteration 2 Residual 13.2614 > > > [0] Iteration 3 Residual 4.46795 > > > [0] Iteration 4 Residual 1.03038 > > > [0] Iteration 5 Residual 0.246807 > > > [0] Iteration 6 Residual 0.0828341 > > > [0] Iteration 7 Residual 0.0410627 > > > [0] Iteration 8 Residual 0.0243749 > > > [0] Iteration 9 Residual 0.0136067 > > > [0] Iteration 10 Residual 169.302 > > > [0] Iteration 11 Residual 47.582 > > > [0] Iteration 12 Residual 13.2614 > > > [0] Iteration 13 Residual 4.46795 > > > [0] Iteration 14 Residual 1.03038 > > > [0] Iteration 15 Residual 0.246807 > > > [0] Iteration 16 Residual 0.0828341 > > > [0] Iteration 17 Residual 0.0410627 > > > [0] Iteration 18 Residual 0.0243749 > > > [0] Iteration 19 Residual 0.0136067 > > > [0] Iteration 20 Residual 169.302 > > > > > > -- > > > Jan Izak Cornelius Vermaak > > > (M.Eng Nuclear) > > > Email: janicvermaak at gmail.com > > > Cell: +1-979-739-0789 > > > > > > > > -- > > Jan Izak Cornelius Vermaak > > (M.Eng Nuclear) > > Email: janicvermaak at gmail.com > > Cell: +1-979-739-0789 > > -- Jan Izak Cornelius Vermaak (M.Eng Nuclear) Email: janicvermaak at gmail.com Cell: +1-979-739-0789 -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Wed May 29 21:54:24 2019 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Thu, 30 May 2019 14:54:24 +1200 Subject: [petsc-users] parallel dual porosity In-Reply-To: References: <32084356-1159-0ad7-c510-57ed0fb0d34b@auckland.ac.nz> Message-ID: <1cd1bf65-3b89-abef-2f18-58f5499499f2@auckland.ac.nz> On 30/05/19 2:45 PM, Matthew Knepley wrote: > > Hmm, I had not thought about that. It will not do that at all. We have > never rebalanced a simulation > using overlap cells. I would have to write the code that strips them > out. Not hard, but more code. > If you only plan on redistributing once, you can wait until then to > add the overlap cells. So for now at least, you would suggest doing the initial distribution with overlap = 0, and then the redistribution with overlap = 1? (I shouldn't need to redistribute more than once.) 
- Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed May 29 22:08:07 2019 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 29 May 2019 23:08:07 -0400 Subject: [petsc-users] parallel dual porosity In-Reply-To: <1cd1bf65-3b89-abef-2f18-58f5499499f2@auckland.ac.nz> References: <32084356-1159-0ad7-c510-57ed0fb0d34b@auckland.ac.nz> <1cd1bf65-3b89-abef-2f18-58f5499499f2@auckland.ac.nz> Message-ID: On Wed, May 29, 2019 at 10:54 PM Adrian Croucher wrote: > On 30/05/19 2:45 PM, Matthew Knepley wrote: > > > Hmm, I had not thought about that. It will not do that at all. We have > never rebalanced a simulation > using overlap cells. I would have to write the code that strips them out. > Not hard, but more code. > If you only plan on redistributing once, you can wait until then to add > the overlap cells. > > > So for now at least, you would suggest doing the initial distribution with > overlap = 0, and then the redistribution with overlap = 1? > > (I shouldn't need to redistribute more than once.) > Yep. Those actions are actually separate too. All Distribute() does is first distribute with no overlap, and then call DIstributeOverlap(). Thanks, Matt > - Adrian > > -- > Dr Adrian Croucher > Senior Research Fellow > Department of Engineering Science > University of Auckland, New Zealand > email: a.croucher at auckland.ac.nz > tel: +64 (0)9 923 4611 > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed May 29 23:13:22 2019 From: jed at jedbrown.org (Jed Brown) Date: Wed, 29 May 2019 22:13:22 -0600 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: <9F2458F2-0CDB-4D7E-B44B-1E2DA4B1C66A@mcs.anl.gov> References: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> <9F2458F2-0CDB-4D7E-B44B-1E2DA4B1C66A@mcs.anl.gov> Message-ID: <8736kwk2al.fsf@jedbrown.org> "Smith, Barry F. via petsc-users" writes: > This is an interesting idea, but unfortunately not directly compatible with libMesh filling up the finite element part of the matrix. Plus it appears MatGetLocalSubMatrix() is only implemented for IS and Nest matrices :-( Maybe I'm missing something, but MatGetLocalSubMatrix *is* implemented for arbitrary Mats; it returns a view that allows you to set entries using local submatrix indexing. That was a key feature of the MatNest work from so many years ago and a paper on which you're coauthor. ;-) From bsmith at mcs.anl.gov Wed May 29 23:27:47 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 30 May 2019 04:27:47 +0000 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: <8736kwk2al.fsf@jedbrown.org> References: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> <9F2458F2-0CDB-4D7E-B44B-1E2DA4B1C66A@mcs.anl.gov> <8736kwk2al.fsf@jedbrown.org> Message-ID: Sorry, my mistake. I assumed that the naming would follow PETSc convention and there would be MatGetLocalSubMatrix_something() as there is MatGetLocalSubMatrix_IS() and MatGetLocalSubMatrix_Nest(). Instead MatGetLocalSubMatrix() is hardwired to call MatCreateLocalRef() if the method is not provide for the original matrix. 
Now interestingly MatCreateLocalRef() has its own manual page which states: Most will use MatGetLocalSubMatrix(). I am not sure why MatCreateLocalRef() is a public function (that is why it would ever be called directly). Perhaps a note could be added to its manual page indicating why it is public. My inclination would be to make it private and call it MatGetLocalSubMatrix_Basic(). There is harm in having multiple similar public functions unless there is a true need for them. Barry I don't remember the names of anything in PETSc, I only remember the naming conventions, hence when something is nonstandard I tend to get lost. > On May 29, 2019, at 11:13 PM, Jed Brown wrote: > > "Smith, Barry F. via petsc-users" writes: > >> This is an interesting idea, but unfortunately not directly compatible with libMesh filling up the finite element part of the matrix. Plus it appears MatGetLocalSubMatrix() is only implemented for IS and Nest matrices :-( > > Maybe I'm missing something, but MatGetLocalSubMatrix *is* implemented > for arbitrary Mats; it returns a view that allows you to set entries > using local submatrix indexing. That was a key feature of the MatNest > work from so many years ago and a paper on which you're coauthor. ;-) From jed at jedbrown.org Wed May 29 23:46:05 2019 From: jed at jedbrown.org (Jed Brown) Date: Wed, 29 May 2019 22:46:05 -0600 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: References: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> <9F2458F2-0CDB-4D7E-B44B-1E2DA4B1C66A@mcs.anl.gov> <8736kwk2al.fsf@jedbrown.org> Message-ID: <87zhn4im7m.fsf@jedbrown.org> "Smith, Barry F." writes: > Sorry, my mistake. I assumed that the naming would follow PETSc convention and there would be MatGetLocalSubMatrix_something() as there is MatGetLocalSubMatrix_IS() and MatGetLocalSubMatrix_Nest(). Instead MatGetLocalSubMatrix() is hardwired to call MatCreateLocalRef() if the > method is not provide for the original matrix. > > Now interestingly MatCreateLocalRef() has its own manual page which states: Most will use MatGetLocalSubMatrix(). I am not sure why MatCreateLocalRef() is a public function (that is why it would ever be called directly). Perhaps a note could be added to its manual page indicating why it is public. My inclination would be to make it private and call it MatGetLocalSubMatrix_Basic(). There is harm in having multiple similar public functions unless there is a true need for them. I think the motivation was that it's actually a Mat implementation and thus made sense as Developer level interface rather than a strictly internal interface. I don't know if we had in mind any use cases where it could be useful to a general caller. I don't have a strong opinion at the moment about whether it makes sense to keep like this or make internal. From s_g at berkeley.edu Wed May 29 23:51:54 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Wed, 29 May 2019 21:51:54 -0700 Subject: [petsc-users] Memory growth issue Message-ID: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> I am trying to track down a memory issue with my code; apologies in advance for the longish message. I am solving a FEA problem with a number of load steps involving about 3000 right hand side and tangent assemblies and solves.? The program is mainly Fortran, with a C memory allocator. When I run my code in strictly serial mode (no Petsc or MPI routines) the memory stays constant over the whole run. 
When I run it in parallel mode with petsc solvers with num_processes=1, the memory (max resident set size) also stays constant: PetscMalloc = 28,976, ProgramNativeMalloc = constant, Resident Size = 24,854,528 (constant) [CG/JACOBI] [PetscMalloc and Resident Size as reported by PetscMallocGetCurrentUsage and PetscMemoryGetCurrentUsage (and summed across processes as needed); ProgramNativeMalloc reported by program memory allocator.] When I run it in parallel mode with petsc solvers but num_processes=2, the resident memory grows steadily during the run: PetscMalloc = 3,039,072 (constant), ProgramNativeMalloc = constant, Resident Size = (finish) 31,313,920 (start) 24,698,880 [CG/JACOBI] When I run it in parallel mode with petsc solvers but num_processes=4, the resident memory grows steadily during the run: PetscMalloc = 3,307,888 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 70,787,072? (start) 45,801,472 [CG/JACOBI] PetscMalloc = 5,903,808 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 112,410,624 (start) 52,076,544 [GMRES/BJACOBI] PetscMalloc = 3,188,944 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 712,798,208 (start) 381,480,960 [SUPERLU] PetscMalloc = 6,539,408 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 591,048,704 (start) 278,671,360 [MUMPS] The memory growth feels alarming but maybe I do not understand the values in ru_maxrss from getrusage(). My box (MacBook Pro) has a broken Valgrind so I need to get to a system with a functional one; notwithstanding, the code has always been Valgrind clean. There are no Fortran Pointers or Fortran Allocatable arrays in the part of the code being used.? The program's C memory allocator keeps track of itself so I do not see that the problem is there.? The Petsc malloc is also steady. Other random hints: 1) If I comment out the call to KSPSolve and to my MPI data-exchange routine (for passing solution values between processes after each solve, use? MPI_Isend, MPI_Recv, MPI_BARRIER)? the memory growth essentially goes away. 2) If I comment out the call to my MPI data-exchange routine but leave the call to KSPSolve the problem remains but is substantially reduced for CG/JACOBI, and is marginally reduced for the GMRES/BJACOBI, SUPERLU, and MUMPS runs. 3) If I comment out the call to KSPSolve but leave the call to my MPI data-exchange routine the problem remains. Any suggestions/hints of where to look will be great. -sanjay From bsmith at mcs.anl.gov Thu May 30 00:17:30 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 30 May 2019 05:17:30 +0000 Subject: [petsc-users] Memory growth issue In-Reply-To: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> Message-ID: <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> This is indeed worrisome. Would it be possible to put PetscMemoryGetCurrentUsage() around each call to KSPSolve() and each call to your data exchange? See if at each step they increase? One thing to be aware of with "max resident set size" is that it measures the number of pages that have been set up. Not the amount of memory allocated. So, if, for example, you allocate a very large array but don't actually read or write the memory in that array until later in the code it won't appear in the "resident set size" until you read or write the memory (because Unix doesn't set up pages until it needs to). You should also try another MPI. 
Both OpenMPI and MPICH can be installed with brew or you can use --download-mpich or --download-openmp to see if the MPI implementation is making a difference. For now I would focus on the PETSc only solvers to eliminate one variable from the equation; once that is understood you can go back to the question of the memory management of the other solvers Barry > On May 29, 2019, at 11:51 PM, Sanjay Govindjee via petsc-users wrote: > > I am trying to track down a memory issue with my code; apologies in advance for the longish message. > > I am solving a FEA problem with a number of load steps involving about 3000 > right hand side and tangent assemblies and solves. The program is mainly Fortran, with a C memory allocator. > > When I run my code in strictly serial mode (no Petsc or MPI routines) the memory stays constant over the whole run. > > When I run it in parallel mode with petsc solvers with num_processes=1, the memory (max resident set size) also stays constant: > > PetscMalloc = 28,976, ProgramNativeMalloc = constant, Resident Size = 24,854,528 (constant) [CG/JACOBI] > > [PetscMalloc and Resident Size as reported by PetscMallocGetCurrentUsage and PetscMemoryGetCurrentUsage (and summed across processes as needed); > ProgramNativeMalloc reported by program memory allocator.] > > When I run it in parallel mode with petsc solvers but num_processes=2, the resident memory grows steadily during the run: > > PetscMalloc = 3,039,072 (constant), ProgramNativeMalloc = constant, Resident Size = (finish) 31,313,920 (start) 24,698,880 [CG/JACOBI] > > When I run it in parallel mode with petsc solvers but num_processes=4, the resident memory grows steadily during the run: > > PetscMalloc = 3,307,888 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 70,787,072 (start) 45,801,472 [CG/JACOBI] > PetscMalloc = 5,903,808 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 112,410,624 (start) 52,076,544 [GMRES/BJACOBI] > PetscMalloc = 3,188,944 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 712,798,208 (start) 381,480,960 [SUPERLU] > PetscMalloc = 6,539,408 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 591,048,704 (start) 278,671,360 [MUMPS] > > The memory growth feels alarming but maybe I do not understand the values in ru_maxrss from getrusage(). > > My box (MacBook Pro) has a broken Valgrind so I need to get to a system with a functional one; notwithstanding, the code has always been Valgrind clean. > There are no Fortran Pointers or Fortran Allocatable arrays in the part of the code being used. The program's C memory allocator keeps track of > itself so I do not see that the problem is there. The Petsc malloc is also steady. > > Other random hints: > > 1) If I comment out the call to KSPSolve and to my MPI data-exchange routine (for passing solution values between processes after each solve, > use MPI_Isend, MPI_Recv, MPI_BARRIER) the memory growth essentially goes away. > > 2) If I comment out the call to my MPI data-exchange routine but leave the call to KSPSolve the problem remains but is substantially reduced > for CG/JACOBI, and is marginally reduced for the GMRES/BJACOBI, SUPERLU, and MUMPS runs. > > 3) If I comment out the call to KSPSolve but leave the call to my MPI data-exchange routine the problem remains. > > Any suggestions/hints of where to look will be great. 
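As a concrete starting point, a minimal sketch of the bracketing suggested above (the ksp, b, x names are assumptions, not taken from the actual code):

      PetscLogDouble mem0, mem1
      PetscErrorCode ierr

!     resident memory before and after each solve
      call PetscMemoryGetCurrentUsage(mem0, ierr)
      call KSPSolve(ksp, b, x, ierr)
      call PetscMemoryGetCurrentUsage(mem1, ierr)
      if (mem1 .gt. mem0) then
        write(*,*) 'KSPSolve grew resident set by', mem1 - mem0
      endif

The same bracketing around the data-exchange call will show which of the two phases is responsible for the growth.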
> > -sanjay > > From stefano.zampini at gmail.com Thu May 30 00:26:36 2019 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 30 May 2019 08:26:36 +0300 Subject: [petsc-users] parallel dual porosity In-Reply-To: References: <32084356-1159-0ad7-c510-57ed0fb0d34b@auckland.ac.nz> <1cd1bf65-3b89-abef-2f18-58f5499499f2@auckland.ac.nz> Message-ID: Matt, redistribution with overlapped mesh is fixed in master (probably also in maint) Stefano Il Gio 30 Mag 2019, 06:09 Matthew Knepley via petsc-users < petsc-users at mcs.anl.gov> ha scritto: > On Wed, May 29, 2019 at 10:54 PM Adrian Croucher < > a.croucher at auckland.ac.nz> wrote: > >> On 30/05/19 2:45 PM, Matthew Knepley wrote: >> >> >> Hmm, I had not thought about that. It will not do that at all. We have >> never rebalanced a simulation >> using overlap cells. I would have to write the code that strips them out. >> Not hard, but more code. >> If you only plan on redistributing once, you can wait until then to add >> the overlap cells. >> >> >> So for now at least, you would suggest doing the initial distribution >> with overlap = 0, and then the redistribution with overlap = 1? >> >> (I shouldn't need to redistribute more than once.) >> > > Yep. Those actions are actually separate too. All Distribute() does is > first distribute with no overlap, and then call DIstributeOverlap(). > > Thanks, > > Matt > >> - Adrian >> >> -- >> Dr Adrian Croucher >> Senior Research Fellow >> Department of Engineering Science >> University of Auckland, New Zealand >> email: a.croucher at auckland.ac.nz >> tel: +64 (0)9 923 4611 >> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Thu May 30 02:01:18 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Thu, 30 May 2019 00:01:18 -0700 Subject: [petsc-users] Memory growth issue In-Reply-To: <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> Message-ID: I put in calls to PetscMemoryGetCurrentUsage( ) around KSPSolve and my data exchange routine.? The problem is clearly mostly in my data exchange routine. Attached are graphs of the change in memory? for each call.? Lots of calls have zero change, but on a periodic regular basis the memory goes up from the data exchange; much less so with the KSPSolve calls (and then mostly on the first calls). For the CG/Jacobi????????? data_exchange_total = 21,311,488; kspsolve_total = 2,625,536 For the GMRES/BJACOBI data_exchange_total = 6,619,136; kspsolve_total = 54,403,072 (dominated by initial calls) I will try to switch up my MPI to see if anything changes; right now my configure is with? --download-openmpi. I've also attached the data exchange routine in case there is something obviously wrong. NB: Graphs/Data are from just one run each. -sanjay On 5/29/19 10:17 PM, Smith, Barry F. wrote: > This is indeed worrisome. > > Would it be possible to put PetscMemoryGetCurrentUsage() around each call to KSPSolve() and each call to your data exchange? See if at each step they increase? > > One thing to be aware of with "max resident set size" is that it measures the number of pages that have been set up. Not the amount of memory allocated. 
So, if, for example, you allocate a very large array but don't actually read or write the memory in that array until later in the code it won't appear in the "resident set size" until you read or write the memory (because Unix doesn't set up pages until it needs to). > > You should also try another MPI. Both OpenMPI and MPICH can be installed with brew or you can use --download-mpich or --download-openmp to see if the MPI implementation is making a difference. > > For now I would focus on the PETSc only solvers to eliminate one variable from the equation; once that is understood you can go back to the question of the memory management of the other solvers > > Barry > > >> On May 29, 2019, at 11:51 PM, Sanjay Govindjee via petsc-users wrote: >> >> I am trying to track down a memory issue with my code; apologies in advance for the longish message. >> >> I am solving a FEA problem with a number of load steps involving about 3000 >> right hand side and tangent assemblies and solves. The program is mainly Fortran, with a C memory allocator. >> >> When I run my code in strictly serial mode (no Petsc or MPI routines) the memory stays constant over the whole run. >> >> When I run it in parallel mode with petsc solvers with num_processes=1, the memory (max resident set size) also stays constant: >> >> PetscMalloc = 28,976, ProgramNativeMalloc = constant, Resident Size = 24,854,528 (constant) [CG/JACOBI] >> >> [PetscMalloc and Resident Size as reported by PetscMallocGetCurrentUsage and PetscMemoryGetCurrentUsage (and summed across processes as needed); >> ProgramNativeMalloc reported by program memory allocator.] >> >> When I run it in parallel mode with petsc solvers but num_processes=2, the resident memory grows steadily during the run: >> >> PetscMalloc = 3,039,072 (constant), ProgramNativeMalloc = constant, Resident Size = (finish) 31,313,920 (start) 24,698,880 [CG/JACOBI] >> >> When I run it in parallel mode with petsc solvers but num_processes=4, the resident memory grows steadily during the run: >> >> PetscMalloc = 3,307,888 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 70,787,072 (start) 45,801,472 [CG/JACOBI] >> PetscMalloc = 5,903,808 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 112,410,624 (start) 52,076,544 [GMRES/BJACOBI] >> PetscMalloc = 3,188,944 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 712,798,208 (start) 381,480,960 [SUPERLU] >> PetscMalloc = 6,539,408 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 591,048,704 (start) 278,671,360 [MUMPS] >> >> The memory growth feels alarming but maybe I do not understand the values in ru_maxrss from getrusage(). >> >> My box (MacBook Pro) has a broken Valgrind so I need to get to a system with a functional one; notwithstanding, the code has always been Valgrind clean. >> There are no Fortran Pointers or Fortran Allocatable arrays in the part of the code being used. The program's C memory allocator keeps track of >> itself so I do not see that the problem is there. The Petsc malloc is also steady. >> >> Other random hints: >> >> 1) If I comment out the call to KSPSolve and to my MPI data-exchange routine (for passing solution values between processes after each solve, >> use MPI_Isend, MPI_Recv, MPI_BARRIER) the memory growth essentially goes away. 
>> >> 2) If I comment out the call to my MPI data-exchange routine but leave the call to KSPSolve the problem remains but is substantially reduced >> for CG/JACOBI, and is marginally reduced for the GMRES/BJACOBI, SUPERLU, and MUMPS runs. >> >> 3) If I comment out the call to KSPSolve but leave the call to my MPI data-exchange routine the problem remains. >> >> Any suggestions/hints of where to look will be great. >> >> -sanjay >> >> -------------- next part -------------- A non-text attachment was scrubbed... Name: cg.png Type: image/png Size: 52635 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gmres.png Type: image/png Size: 51046 bytes Desc: not available URL: -------------- next part -------------- !$Id:$ subroutine psetb(b,getp,getv,senp,senv,eq, ndf, rdatabuf,sdatabuf) ! * * F E A P * * A Finite Element Analysis Program !.... Copyright (c) 1984-2017: Regents of the University of California ! All rights reserved !-----[--.----+----.----+----.-----------------------------------------] ! Modification log Date (dd/mm/year) ! Original version 01/11/2006 ! 1. Revise send/receive data add barrier 24/11/2006 ! 2. Correct send/receive and add error messages 16/03/2007 ! 3. Change 'include/finclude' to 'finclude' 23/01/2009 ! 4. Remove common 'pfeapa' (values in 'setups') 05/02/2009 ! 5. finclude -> petsc/finclude 12/05/2016 ! 6. Update for PETSc 3.8.3 28/02/2018 ! 7. Change 'id' to 'eq' 23/05/2019 !-----[--.----+----.----+----.-----------------------------------------] ! Purpose: Transfer PETSC vector to local arrays including ghost ! node data via MPI messaging ! Inputs: ! getp(ntasks) - Pointer array for getting ghost data ! getv(sum(getp)) - Local node numbers for getting ghost data ! senp(ntasks) - Pointer array for sending ghost data ! senv(sum(senp)) - Local node numbers to send out as ghost data ! eq(ndf,numnp) - Local equation numbers ! ndf - dofs per node ! rdatabuf(*) - receive communication array ! sdatabuf(*) - send communication array ! Outputs: ! b(neq) - Local solution vector !-----[--.----+----.----+----.-----------------------------------------] # include use petscsys implicit none # include "cdata.h" # include "iofile.h" # include "pfeapb.h" # include "setups.h" integer ndf integer i, j, k, lnodei, eqi, soff, rbuf,sbuf,tbuf, idesp integer getp(*),getv(*),senp(*),senv(*) integer eq(ndf,*) real*8 b(*), rdatabuf(*), sdatabuf(*) integer usolve_msg, req parameter (usolve_msg=10) ! Petsc values PetscErrorCode ierr integer msg_stat(MPI_STATUS_SIZE) ! Sending Data Asynchronously soff = 0 idesp = 0 do i = 1, ntasks if (senp(i) .gt. 0) then sbuf = soff do j = 1, senp(i) lnodei = senv(j + idesp) do k = 1, ndf eqi = eq(k,lnodei) if (eqi .gt. 0) then sbuf = sbuf + 1 sdatabuf(sbuf) = b(eqi) endif end do ! k end do ! j idesp = idesp + senp(i) sbuf = sbuf - soff call MPI_Isend( sdatabuf(soff+1), sbuf, MPI_DOUBLE_PRECISION, & i-1, usolve_msg, MPI_COMM_WORLD, req, ierr ) ! Report send error if(ierr.ne.0) then write(iow,*) ' -->> Send Error[',rank+1,'->',i,']',ierr write( *,*) ' -->> Send Error[',rank+1,'->',i,']',ierr endif soff = soff + sbuf endif end do ! i ! Receiving Data in blocking mode idesp = 0 do i = 1, ntasks if (getp(i) .gt. 0) then ! 
Determine receive length tbuf = getp(i)*ndf call MPI_Recv( rdatabuf, tbuf, MPI_DOUBLE_PRECISION, i-1, & usolve_msg, MPI_COMM_WORLD,msg_stat, ierr) if(ierr.ne.0) then write(iow,*) 'Recv Error[',i,'->',rank+1,']',ierr write( *,*) 'Recv Error[',i,'->',rank+1,']',ierr endif rbuf = 0 do j = 1, getp(i) lnodei = getv(j + idesp) do k = 1, ndf eqi = eq(k,lnodei) if (eqi .gt. 0 ) then rbuf = rbuf + 1 b( eqi ) = rdatabuf( rbuf ) endif end do ! k end do ! j idesp = idesp + getp(i) endif end do ! i call MPI_BARRIER(MPI_COMM_WORLD, ierr) end From bsmith at mcs.anl.gov Thu May 30 02:14:41 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 30 May 2019 07:14:41 +0000 Subject: [petsc-users] Memory growth issue In-Reply-To: References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> Message-ID: <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> Let us know how it goes with MPICH > On May 30, 2019, at 2:01 AM, Sanjay Govindjee wrote: > > I put in calls to PetscMemoryGetCurrentUsage( ) around KSPSolve and my data exchange routine. The problem is clearly mostly in my data exchange routine. > Attached are graphs of the change in memory for each call. Lots of calls have zero change, but on a periodic regular basis the memory goes up from the data exchange; much less > so with the KSPSolve calls (and then mostly on the first calls). > > For the CG/Jacobi data_exchange_total = 21,311,488; kspsolve_total = 2,625,536 > For the GMRES/BJACOBI data_exchange_total = 6,619,136; kspsolve_total = 54,403,072 (dominated by initial calls) > > I will try to switch up my MPI to see if anything changes; right now my configure is with --download-openmpi. > I've also attached the data exchange routine in case there is something obviously wrong. > > NB: Graphs/Data are from just one run each. > > -sanjay > > On 5/29/19 10:17 PM, Smith, Barry F. wrote: >> This is indeed worrisome. >> >> Would it be possible to put PetscMemoryGetCurrentUsage() around each call to KSPSolve() and each call to your data exchange? See if at each step they increase? >> >> One thing to be aware of with "max resident set size" is that it measures the number of pages that have been set up. Not the amount of memory allocated. So, if, for example, you allocate a very large array but don't actually read or write the memory in that array until later in the code it won't appear in the "resident set size" until you read or write the memory (because Unix doesn't set up pages until it needs to). >> >> You should also try another MPI. Both OpenMPI and MPICH can be installed with brew or you can use --download-mpich or --download-openmp to see if the MPI implementation is making a difference. >> >> For now I would focus on the PETSc only solvers to eliminate one variable from the equation; once that is understood you can go back to the question of the memory management of the other solvers >> >> Barry >> >> >>> On May 29, 2019, at 11:51 PM, Sanjay Govindjee via petsc-users wrote: >>> >>> I am trying to track down a memory issue with my code; apologies in advance for the longish message. >>> >>> I am solving a FEA problem with a number of load steps involving about 3000 >>> right hand side and tangent assemblies and solves. The program is mainly Fortran, with a C memory allocator. >>> >>> When I run my code in strictly serial mode (no Petsc or MPI routines) the memory stays constant over the whole run. 
>>> >>> When I run it in parallel mode with petsc solvers with num_processes=1, the memory (max resident set size) also stays constant: >>> >>> PetscMalloc = 28,976, ProgramNativeMalloc = constant, Resident Size = 24,854,528 (constant) [CG/JACOBI] >>> >>> [PetscMalloc and Resident Size as reported by PetscMallocGetCurrentUsage and PetscMemoryGetCurrentUsage (and summed across processes as needed); >>> ProgramNativeMalloc reported by program memory allocator.] >>> >>> When I run it in parallel mode with petsc solvers but num_processes=2, the resident memory grows steadily during the run: >>> >>> PetscMalloc = 3,039,072 (constant), ProgramNativeMalloc = constant, Resident Size = (finish) 31,313,920 (start) 24,698,880 [CG/JACOBI] >>> >>> When I run it in parallel mode with petsc solvers but num_processes=4, the resident memory grows steadily during the run: >>> >>> PetscMalloc = 3,307,888 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 70,787,072 (start) 45,801,472 [CG/JACOBI] >>> PetscMalloc = 5,903,808 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 112,410,624 (start) 52,076,544 [GMRES/BJACOBI] >>> PetscMalloc = 3,188,944 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 712,798,208 (start) 381,480,960 [SUPERLU] >>> PetscMalloc = 6,539,408 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 591,048,704 (start) 278,671,360 [MUMPS] >>> >>> The memory growth feels alarming but maybe I do not understand the values in ru_maxrss from getrusage(). >>> >>> My box (MacBook Pro) has a broken Valgrind so I need to get to a system with a functional one; notwithstanding, the code has always been Valgrind clean. >>> There are no Fortran Pointers or Fortran Allocatable arrays in the part of the code being used. The program's C memory allocator keeps track of >>> itself so I do not see that the problem is there. The Petsc malloc is also steady. >>> >>> Other random hints: >>> >>> 1) If I comment out the call to KSPSolve and to my MPI data-exchange routine (for passing solution values between processes after each solve, >>> use MPI_Isend, MPI_Recv, MPI_BARRIER) the memory growth essentially goes away. >>> >>> 2) If I comment out the call to my MPI data-exchange routine but leave the call to KSPSolve the problem remains but is substantially reduced >>> for CG/JACOBI, and is marginally reduced for the GMRES/BJACOBI, SUPERLU, and MUMPS runs. >>> >>> 3) If I comment out the call to KSPSolve but leave the call to my MPI data-exchange routine the problem remains. >>> >>> Any suggestions/hints of where to look will be great. >>> >>> -sanjay >>> >>> > > From s_g at berkeley.edu Thu May 30 02:58:59 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Thu, 30 May 2019 00:58:59 -0700 Subject: [petsc-users] Memory growth issue In-Reply-To: <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> Message-ID: <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> The problem seems to persist but with a different signature.? Graphs attached as before. 
Totals with MPICH (NB: single run) For the CG/Jacobi data_exchange_total = 41,385,984; kspsolve_total = 38,289,408 For the GMRES/BJACOBI data_exchange_total = 41,324,544; kspsolve_total = 41,324,544 Just reading the MPI docs I am wondering if I need some sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine? I would have thought that with the blocking receives and the MPI_Barrier that everything will have fully completed and cleaned up before all processes exited the routine, but perhaps I am wrong on that. -sanjay On 5/30/19 12:14 AM, Smith, Barry F. wrote: > Let us know how it goes with MPICH > > >> On May 30, 2019, at 2:01 AM, Sanjay Govindjee wrote: >> >> I put in calls to PetscMemoryGetCurrentUsage( ) around KSPSolve and my data exchange routine. The problem is clearly mostly in my data exchange routine. >> Attached are graphs of the change in memory for each call. Lots of calls have zero change, but on a periodic regular basis the memory goes up from the data exchange; much less >> so with the KSPSolve calls (and then mostly on the first calls). >> >> For the CG/Jacobi data_exchange_total = 21,311,488; kspsolve_total = 2,625,536 >> For the GMRES/BJACOBI data_exchange_total = 6,619,136; kspsolve_total = 54,403,072 (dominated by initial calls) >> >> I will try to switch up my MPI to see if anything changes; right now my configure is with --download-openmpi. >> I've also attached the data exchange routine in case there is something obviously wrong. >> >> NB: Graphs/Data are from just one run each. >> >> -sanjay >> >> On 5/29/19 10:17 PM, Smith, Barry F. wrote: >>> This is indeed worrisome. >>> >>> Would it be possible to put PetscMemoryGetCurrentUsage() around each call to KSPSolve() and each call to your data exchange? See if at each step they increase? >>> >>> One thing to be aware of with "max resident set size" is that it measures the number of pages that have been set up. Not the amount of memory allocated. So, if, for example, you allocate a very large array but don't actually read or write the memory in that array until later in the code it won't appear in the "resident set size" until you read or write the memory (because Unix doesn't set up pages until it needs to). >>> >>> You should also try another MPI. Both OpenMPI and MPICH can be installed with brew or you can use --download-mpich or --download-openmp to see if the MPI implementation is making a difference. >>> >>> For now I would focus on the PETSc only solvers to eliminate one variable from the equation; once that is understood you can go back to the question of the memory management of the other solvers >>> >>> Barry >>> >>> >>>> On May 29, 2019, at 11:51 PM, Sanjay Govindjee via petsc-users wrote: >>>> >>>> I am trying to track down a memory issue with my code; apologies in advance for the longish message. >>>> >>>> I am solving a FEA problem with a number of load steps involving about 3000 >>>> right hand side and tangent assemblies and solves. The program is mainly Fortran, with a C memory allocator. >>>> >>>> When I run my code in strictly serial mode (no Petsc or MPI routines) the memory stays constant over the whole run. 
>>>> >>>> When I run it in parallel mode with petsc solvers with num_processes=1, the memory (max resident set size) also stays constant: >>>> >>>> PetscMalloc = 28,976, ProgramNativeMalloc = constant, Resident Size = 24,854,528 (constant) [CG/JACOBI] >>>> >>>> [PetscMalloc and Resident Size as reported by PetscMallocGetCurrentUsage and PetscMemoryGetCurrentUsage (and summed across processes as needed); >>>> ProgramNativeMalloc reported by program memory allocator.] >>>> >>>> When I run it in parallel mode with petsc solvers but num_processes=2, the resident memory grows steadily during the run: >>>> >>>> PetscMalloc = 3,039,072 (constant), ProgramNativeMalloc = constant, Resident Size = (finish) 31,313,920 (start) 24,698,880 [CG/JACOBI] >>>> >>>> When I run it in parallel mode with petsc solvers but num_processes=4, the resident memory grows steadily during the run: >>>> >>>> PetscMalloc = 3,307,888 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 70,787,072 (start) 45,801,472 [CG/JACOBI] >>>> PetscMalloc = 5,903,808 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 112,410,624 (start) 52,076,544 [GMRES/BJACOBI] >>>> PetscMalloc = 3,188,944 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 712,798,208 (start) 381,480,960 [SUPERLU] >>>> PetscMalloc = 6,539,408 (constant), ProgramNativeMalloc = 1,427,584 (constant), Resident Size = (finish) 591,048,704 (start) 278,671,360 [MUMPS] >>>> >>>> The memory growth feels alarming but maybe I do not understand the values in ru_maxrss from getrusage(). >>>> >>>> My box (MacBook Pro) has a broken Valgrind so I need to get to a system with a functional one; notwithstanding, the code has always been Valgrind clean. >>>> There are no Fortran Pointers or Fortran Allocatable arrays in the part of the code being used. The program's C memory allocator keeps track of >>>> itself so I do not see that the problem is there. The Petsc malloc is also steady. >>>> >>>> Other random hints: >>>> >>>> 1) If I comment out the call to KSPSolve and to my MPI data-exchange routine (for passing solution values between processes after each solve, >>>> use MPI_Isend, MPI_Recv, MPI_BARRIER) the memory growth essentially goes away. >>>> >>>> 2) If I comment out the call to my MPI data-exchange routine but leave the call to KSPSolve the problem remains but is substantially reduced >>>> for CG/JACOBI, and is marginally reduced for the GMRES/BJACOBI, SUPERLU, and MUMPS runs. >>>> >>>> 3) If I comment out the call to KSPSolve but leave the call to my MPI data-exchange routine the problem remains. >>>> >>>> Any suggestions/hints of where to look will be great. >>>> >>>> -sanjay >>>> >>>> >> -------------- next part -------------- A non-text attachment was scrubbed... Name: gmres_mpich.png Type: image/png Size: 47684 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cg_mpich.png Type: image/png Size: 46839 bytes Desc: not available URL: From knepley at gmail.com Thu May 30 06:36:06 2019 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 30 May 2019 07:36:06 -0400 Subject: [petsc-users] parallel dual porosity In-Reply-To: References: <32084356-1159-0ad7-c510-57ed0fb0d34b@auckland.ac.nz> <1cd1bf65-3b89-abef-2f18-58f5499499f2@auckland.ac.nz> Message-ID: On Thu, May 30, 2019 at 1:26 AM Stefano Zampini wrote: > Matt, > > redistribution with overlapped mesh is fixed in master (probably also in > maint) > Thanks! 
Do you just strip out the overlap cells from the partition calculation? Matt > Stefano > > Il Gio 30 Mag 2019, 06:09 Matthew Knepley via petsc-users < > petsc-users at mcs.anl.gov> ha scritto: > >> On Wed, May 29, 2019 at 10:54 PM Adrian Croucher < >> a.croucher at auckland.ac.nz> wrote: >> >>> On 30/05/19 2:45 PM, Matthew Knepley wrote: >>> >>> >>> Hmm, I had not thought about that. It will not do that at all. We have >>> never rebalanced a simulation >>> using overlap cells. I would have to write the code that strips them >>> out. Not hard, but more code. >>> If you only plan on redistributing once, you can wait until then to add >>> the overlap cells. >>> >>> >>> So for now at least, you would suggest doing the initial distribution >>> with overlap = 0, and then the redistribution with overlap = 1? >>> >>> (I shouldn't need to redistribute more than once.) >>> >> >> Yep. Those actions are actually separate too. All Distribute() does is >> first distribute with no overlap, and then call DIstributeOverlap(). >> >> Thanks, >> >> Matt >> >>> - Adrian >>> >>> -- >>> Dr Adrian Croucher >>> Senior Research Fellow >>> Department of Engineering Science >>> University of Auckland, New Zealand >>> email: a.croucher at auckland.ac.nz >>> tel: +64 (0)9 923 4611 >>> >>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Thu May 30 06:37:01 2019 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Thu, 30 May 2019 14:37:01 +0300 Subject: [petsc-users] parallel dual porosity In-Reply-To: References: <32084356-1159-0ad7-c510-57ed0fb0d34b@auckland.ac.nz> <1cd1bf65-3b89-abef-2f18-58f5499499f2@auckland.ac.nz> Message-ID: Yes Il Gio 30 Mag 2019, 14:36 Matthew Knepley ha scritto: > On Thu, May 30, 2019 at 1:26 AM Stefano Zampini > wrote: > >> Matt, >> >> redistribution with overlapped mesh is fixed in master (probably also in >> maint) >> > > Thanks! Do you just strip out the overlap cells from the partition > calculation? > > Matt > > >> Stefano >> >> Il Gio 30 Mag 2019, 06:09 Matthew Knepley via petsc-users < >> petsc-users at mcs.anl.gov> ha scritto: >> >>> On Wed, May 29, 2019 at 10:54 PM Adrian Croucher < >>> a.croucher at auckland.ac.nz> wrote: >>> >>>> On 30/05/19 2:45 PM, Matthew Knepley wrote: >>>> >>>> >>>> Hmm, I had not thought about that. It will not do that at all. We have >>>> never rebalanced a simulation >>>> using overlap cells. I would have to write the code that strips them >>>> out. Not hard, but more code. >>>> If you only plan on redistributing once, you can wait until then to add >>>> the overlap cells. >>>> >>>> >>>> So for now at least, you would suggest doing the initial distribution >>>> with overlap = 0, and then the redistribution with overlap = 1? >>>> >>>> (I shouldn't need to redistribute more than once.) >>>> >>> >>> Yep. Those actions are actually separate too. All Distribute() does is >>> first distribute with no overlap, and then call DIstributeOverlap(). 
>>> >>> Thanks, >>> >>> Matt >>> >>>> - Adrian >>>> >>>> -- >>>> Dr Adrian Croucher >>>> Senior Research Fellow >>>> Department of Engineering Science >>>> University of Auckland, New Zealand >>>> email: a.croucher at auckland.ac.nz >>>> tel: +64 (0)9 923 4611 >>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >>> >> > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wence at gmx.li Thu May 30 06:48:45 2019 From: wence at gmx.li (Lawrence Mitchell) Date: Thu, 30 May 2019 12:48:45 +0100 Subject: [petsc-users] Memory growth issue In-Reply-To: <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> Message-ID: <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> Hi Sanjay, > On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users wrote: > > The problem seems to persist but with a different signature. Graphs attached as before. > > Totals with MPICH (NB: single run) > > For the CG/Jacobi data_exchange_total = 41,385,984; kspsolve_total = 38,289,408 > For the GMRES/BJACOBI data_exchange_total = 41,324,544; kspsolve_total = 41,324,544 > > Just reading the MPI docs I am wondering if I need some sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine? > I would have thought that with the blocking receives and the MPI_Barrier that everything will have fully completed and cleaned up before > all processes exited the routine, but perhaps I am wrong on that. Skimming the fortran code you sent you do: for i in ...: call MPI_Isend(..., req, ierr) for i in ...: call MPI_Recv(..., ierr) But you never call MPI_Wait on the request you got back from the Isend. So the MPI library will never free the data structures it created. The usual pattern for these non-blocking communications is to allocate an array for the requests of length nsend+nrecv and then do: for i in nsend: call MPI_Isend(..., req[i], ierr) for j in nrecv: call MPI_Irecv(..., req[nsend+j], ierr) call MPI_Waitall(req, ..., ierr) I note also there's no need for the Barrier at the end of the routine, this kind of communication does neighbourwise synchronisation, no need to add (unnecessary) global synchronisation too. As an aside, is there a reason you don't use PETSc's VecScatter to manage this global to local exchange? Cheers, Lawrence From bsmith at mcs.anl.gov Thu May 30 09:58:46 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 30 May 2019 14:58:46 +0000 Subject: [petsc-users] Stop KSP if diverging In-Reply-To: References: <2B4302F1-A47E-4865-9601-B42BC618ED29@anl.gov> <00BF757D-21B7-4660-A6BE-FE03A5AFB3BE@mcs.anl.gov> Message-ID: You most definitely want to call KSPGetConvergedReason() after every solve KSPConvergedReason reason .... call KSPSolve(ksp,b,x,ierr) call KSPGetConvergedReason(ksp,reason,ierr) if (reason .lt. 
0) then printf*,'KSPSolve() has not converged' return endif > On May 30, 2019, at 5:13 AM, Edoardo alinovi wrote: > > Hello Barry, > > I am solving ns eqs using the standard simple algorithm. > > As example let's say the the u velocity compoment diverges, the output of ksp is a nan as solution vector, but the program keeps going ahead solving also for the other components and for the pressure where again I get NaN. The only way to stop it is by hand using ctrl+C. > > I am not using kspConvergedReason, maybe can you provide me an example in fortran? > > Many thanks as always! > > On Wed, 29 May 2019, 23:54 Smith, Barry F., wrote: > > > > On May 29, 2019, at 2:54 PM, Edoardo alinovi wrote: > > > > Hello Barry, > > > > Using Matt's fix KSP stops exactly at the first NaN. Otherwise the simulation continues > > Yes, but what do you mean by the simulation? Are you calling KSPGetConvergedReason() after each solve to check if it was valid? Is the particular linear solver continuing to run with Nan? > > > > > even if everything is a NaN. I am using version 3.10.4, is this not enough up to date maybe? > > > > Tomorrow, I will run the monitor and get back to you. For today I have got my "safe working limit" :) > > > > Thanks a lot guys for the help! > > > > ------ > > > > Edoardo Alinovi, Ph.D. > > > > DICCA, Scuola Politecnica, > > Universita' degli Studi di Genova, > > 1, via Montallegro, > > 16145 Genova, Italy > > > > Email: edoardo.alinovi at dicca.unige.it > > Tel: +39 010 353 2540 > > Website: https://www.edoardoalinovi.com/ > > > > > > > > > > Il giorno mer 29 mag 2019 alle ore 20:22 Smith, Barry F. ha scritto: > > > > Hmm, in the lastest couple of releases of PETSc the KSPSolve is suppose to end as soon as it hits a NaN or Infinity. Is that not happening for you? If you run with -ksp_monitor does it print multiple lines with Nan or Inf? If so please send use the -ksp_view output so we can track down which solver is not correctly handling the Nan or Info. > > > > That said if you call KSPSolve() multiple times in a loop or from SNESSolve() each new solve may have Nan or Inf (from the previous) but it should only do one iteration before exiting. > > > > You should always call KSPGetConvergedReason() after KSPSolve() and confirm that the reason is positive, if it is native it indicates something failed in the solve. > > > > Barry > > > > > > > On May 29, 2019, at 2:06 AM, Edoardo alinovi via petsc-users wrote: > > > > > > Dear PETSc friends, > > > > > > Hope you are doing all well. > > > > > > I have a quick question for you that I am not able to solve by my self. Time to time, while testing new code features, it happens that KSP diverges but it does not stop automatically and the iterations continue even after getting a NaN. > > > > > > In the KSP setup I use the following instruction to set the divergence stopping criteria (div = 10000): > > > > > > call KSPSetTolerances(myksp, rel_tol, abs_tol, div, itmax, ierr) > > > > > > But is does not help. Looking into the documentation I have found also: > > > KSPConvergedDefault(KSP ksp,PetscInt n,PetscReal rnorm,KSPConvergedReason *reason,void *ctx) > > > Which I am not calling in the code. Is this maybe the reason of my problem? If yes how can I use KSPConvergedDefault in FORTRAN? > > > > > > Thanks, > > > > > > Edo > > > > > > ------ > > > > > > Edoardo Alinovi, Ph.D. 
> > > > > > DICCA, Scuola Politecnica, > > > Universita' degli Studi di Genova, > > > 1, via Montallegro, > > > 16145 Genova, Italy > > > > > > Email: edoardo.alinovi at dicca.unige.it > > > Tel: +39 010 353 2540 > > > Website: https://www.edoardoalinovi.com/ > > > > > > > > > From bsmith at mcs.anl.gov Thu May 30 10:16:45 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 30 May 2019 15:16:45 +0000 Subject: [petsc-users] Memory growth issue In-Reply-To: <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> Message-ID: <8DB59F2B-8704-4ACE-BA4F-94882976EC8A@mcs.anl.gov> Great observation Lawrence. https://www.slideshare.net/jsquyres/friends-dont-let-friends-leak-mpirequests You can add the following option to --download-mpich --download-mpich-configure-arguments="--enable-error-messages=all --enable-g" then MPICH will report all MPI resources that have not been freed during the run. This helps catching missing waits, etc. We have a nightly test that utilizes this for the PETSc libraries. Barry > On May 30, 2019, at 6:48 AM, Lawrence Mitchell wrote: > > Hi Sanjay, > >> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users wrote: >> >> The problem seems to persist but with a different signature. Graphs attached as before. >> >> Totals with MPICH (NB: single run) >> >> For the CG/Jacobi data_exchange_total = 41,385,984; kspsolve_total = 38,289,408 >> For the GMRES/BJACOBI data_exchange_total = 41,324,544; kspsolve_total = 41,324,544 >> >> Just reading the MPI docs I am wondering if I need some sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine? >> I would have thought that with the blocking receives and the MPI_Barrier that everything will have fully completed and cleaned up before >> all processes exited the routine, but perhaps I am wrong on that. > > > Skimming the fortran code you sent you do: > > for i in ...: > call MPI_Isend(..., req, ierr) > > for i in ...: > call MPI_Recv(..., ierr) > > But you never call MPI_Wait on the request you got back from the Isend. So the MPI library will never free the data structures it created. > > The usual pattern for these non-blocking communications is to allocate an array for the requests of length nsend+nrecv and then do: > > for i in nsend: > call MPI_Isend(..., req[i], ierr) > for j in nrecv: > call MPI_Irecv(..., req[nsend+j], ierr) > > call MPI_Waitall(req, ..., ierr) > > I note also there's no need for the Barrier at the end of the routine, this kind of communication does neighbourwise synchronisation, no need to add (unnecessary) global synchronisation too. > > As an aside, is there a reason you don't use PETSc's VecScatter to manage this global to local exchange? 
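For anyone who wants to try that route, a rough sketch of a VecScatter-based version of such an exchange; everything here (xglobal, xghost, nghost, ghostglob, scat) is an illustrative name, not taken from the actual code, with ghostglob assumed to hold the global equation numbers this rank needs:

#include <petsc/finclude/petscvec.h>
      use petscis
      use petscvec
      Vec            xglobal, xghost
      IS             isglob, isloc
      VecScatter     scat
      PetscInt       nghost
      PetscErrorCode ierr

!     one-time setup: a sequential work vector for the ghost values and
!     the scatter that fills it from the parallel solution vector
      call VecCreateSeq(PETSC_COMM_SELF, nghost, xghost, ierr)
      call ISCreateGeneral(PETSC_COMM_WORLD, nghost, ghostglob,
     &                     PETSC_COPY_VALUES, isglob, ierr)
      call ISCreateStride(PETSC_COMM_SELF, nghost, 0, 1, isloc, ierr)
      call VecScatterCreate(xglobal, isglob, xghost, isloc, scat, ierr)
      call ISDestroy(isglob, ierr)
      call ISDestroy(isloc, ierr)

!     after every solve: pull the needed global values into xghost
      call VecScatterBegin(scat, xglobal, xghost, INSERT_VALUES,
     &                     SCATTER_FORWARD, ierr)
      call VecScatterEnd(scat, xglobal, xghost, INSERT_VALUES,
     &                   SCATTER_FORWARD, ierr)

The scatter is created once and reused for every solve; a single VecScatterDestroy() at the end releases all the MPI resources it holds.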
> > Cheers, > > Lawrence From s_g at berkeley.edu Thu May 30 13:47:22 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Thu, 30 May 2019 11:47:22 -0700 Subject: [petsc-users] Memory growth issue In-Reply-To: <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> Message-ID: <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> Lawrence, Thanks for taking a look!? This is what I had been wondering about -- my knowledge of MPI is pretty minimal and this origins of the routine were from a programmer we hired a decade+ back from NERSC.? I'll have to look into VecScatter.? It will be great to dispense with our roll-your-own routines (we even have our own reduceALL scattered around the code). Interestingly, the MPI_WaitALL has solved the problem when using OpenMPI but it still persists with MPICH.? Graphs attached. I'm going to run with openmpi for now (but I guess I really still need to figure out what is wrong with MPICH and WaitALL; I'll try Barry's suggestion of --download-mpich-configure-arguments="--enable-error-messages=all --enable-g" later today and report back). Regarding MPI_Barrier, it was put in due a problem that some processes were finishing up sending and receiving and exiting the subroutine before the receiving processes had completed (which resulted in data loss as the buffers are freed after the call to the routine). MPI_Barrier was the solution proposed to us.? I don't think I can dispense with it, but will think about some more. I'm not so sure about using MPI_IRecv as it will require a bit of rewriting since right now I process the received data sequentially after each blocking MPI_Recv -- clearly slower but easier to code. Thanks again for the help. -sanjay On 5/30/19 4:48 AM, Lawrence Mitchell wrote: > Hi Sanjay, > >> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users wrote: >> >> The problem seems to persist but with a different signature. Graphs attached as before. >> >> Totals with MPICH (NB: single run) >> >> For the CG/Jacobi data_exchange_total = 41,385,984; kspsolve_total = 38,289,408 >> For the GMRES/BJACOBI data_exchange_total = 41,324,544; kspsolve_total = 41,324,544 >> >> Just reading the MPI docs I am wondering if I need some sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine? >> I would have thought that with the blocking receives and the MPI_Barrier that everything will have fully completed and cleaned up before >> all processes exited the routine, but perhaps I am wrong on that. > > Skimming the fortran code you sent you do: > > for i in ...: > call MPI_Isend(..., req, ierr) > > for i in ...: > call MPI_Recv(..., ierr) > > But you never call MPI_Wait on the request you got back from the Isend. So the MPI library will never free the data structures it created. > > The usual pattern for these non-blocking communications is to allocate an array for the requests of length nsend+nrecv and then do: > > for i in nsend: > call MPI_Isend(..., req[i], ierr) > for j in nrecv: > call MPI_Irecv(..., req[nsend+j], ierr) > > call MPI_Waitall(req, ..., ierr) > > I note also there's no need for the Barrier at the end of the routine, this kind of communication does neighbourwise synchronisation, no need to add (unnecessary) global synchronisation too. 
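For the record, a sketch of that pattern in the shape of psetb; the request array and the soff/scnt/roff offset bookkeeping are assumptions added for illustration, while senp, getp, ntasks, ndf, usolve_msg, sdatabuf and rdatabuf are the names already used in the routine:

      integer, allocatable :: req(:)
      integer nreq, i
!     soff/scnt and roff: precomputed send/receive offsets and counts
!     into sdatabuf/rdatabuf (assumed set up elsewhere)

      allocate(req(2*ntasks))
      nreq = 0
!     post all sends
      do i = 1, ntasks
        if (senp(i) .gt. 0) then
          nreq = nreq + 1
          call MPI_Isend(sdatabuf(soff(i)), scnt(i),
     &                   MPI_DOUBLE_PRECISION, i-1, usolve_msg,
     &                   MPI_COMM_WORLD, req(nreq), ierr)
        endif
      end do ! i
!     post all receives, each into its own slice of rdatabuf
      do i = 1, ntasks
        if (getp(i) .gt. 0) then
          nreq = nreq + 1
          call MPI_Irecv(rdatabuf(roff(i)), getp(i)*ndf,
     &                   MPI_DOUBLE_PRECISION, i-1, usolve_msg,
     &                   MPI_COMM_WORLD, req(nreq), ierr)
        endif
      end do ! i
!     completes every send and receive; afterwards the buffers may be
!     unpacked and freed, and no trailing MPI_Barrier is required
      call MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE, ierr)
      deallocate(req)

The one structural change from the current routine is that each source rank gets its own slice of rdatabuf, since with MPI_Irecv the messages complete in arbitrary order instead of being unpacked after each blocking MPI_Recv.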
> > As an aside, is there a reason you don't use PETSc's VecScatter to manage this global to local exchange? > > Cheers, > > Lawrence -------------- next part -------------- A non-text attachment was scrubbed... Name: cg_mpichwall.png Type: image/png Size: 49339 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: cg_wall.png Type: image/png Size: 44147 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gmres_mpichwall.png Type: image/png Size: 47961 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gmres_wall.png Type: image/png Size: 44029 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu May 30 13:59:51 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Thu, 30 May 2019 18:59:51 +0000 Subject: [petsc-users] Memory growth issue In-Reply-To: <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> Message-ID: <0A8E3440-6893-41A3-BE6B-B9C99574164A@mcs.anl.gov> Thanks for the update. So the current conclusions are that using the Waitall in your code 1) solves the memory issue with OpenMPI in your code 2) does not solve the memory issue with PETSc KSPSolve 3) MPICH has memory issues both for your code and PETSc KSPSolve (despite) the wait all fix? If you literately just comment out the call to KSPSolve() with OpenMPI is there no growth in memory usage? Both 2 and 3 are concerning, indicate possible memory leak bugs in MPICH and not freeing all MPI resources in KSPSolve() Junchao, can you please investigate 2 and 3 with, for example, a TS example that uses the linear solver (like with -ts_type beuler)? Thanks Barry > On May 30, 2019, at 1:47 PM, Sanjay Govindjee wrote: > > Lawrence, > Thanks for taking a look! This is what I had been wondering about -- my knowledge of MPI is pretty minimal and > this origins of the routine were from a programmer we hired a decade+ back from NERSC. I'll have to look into > VecScatter. It will be great to dispense with our roll-your-own routines (we even have our own reduceALL scattered around the code). > > Interestingly, the MPI_WaitALL has solved the problem when using OpenMPI but it still persists with MPICH. Graphs attached. > I'm going to run with openmpi for now (but I guess I really still need to figure out what is wrong with MPICH and WaitALL; > I'll try Barry's suggestion of --download-mpich-configure-arguments="--enable-error-messages=all --enable-g" later today and report back). > > Regarding MPI_Barrier, it was put in due a problem that some processes were finishing up sending and receiving and exiting the subroutine > before the receiving processes had completed (which resulted in data loss as the buffers are freed after the call to the routine). MPI_Barrier was the solution proposed > to us. I don't think I can dispense with it, but will think about some more. > > I'm not so sure about using MPI_IRecv as it will require a bit of rewriting since right now I process the received > data sequentially after each blocking MPI_Recv -- clearly slower but easier to code. > > Thanks again for the help. 
> > -sanjay > > On 5/30/19 4:48 AM, Lawrence Mitchell wrote: >> Hi Sanjay, >> >>> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users wrote: >>> >>> The problem seems to persist but with a different signature. Graphs attached as before. >>> >>> Totals with MPICH (NB: single run) >>> >>> For the CG/Jacobi data_exchange_total = 41,385,984; kspsolve_total = 38,289,408 >>> For the GMRES/BJACOBI data_exchange_total = 41,324,544; kspsolve_total = 41,324,544 >>> >>> Just reading the MPI docs I am wondering if I need some sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine? >>> I would have thought that with the blocking receives and the MPI_Barrier that everything will have fully completed and cleaned up before >>> all processes exited the routine, but perhaps I am wrong on that. >> >> Skimming the fortran code you sent you do: >> >> for i in ...: >> call MPI_Isend(..., req, ierr) >> >> for i in ...: >> call MPI_Recv(..., ierr) >> >> But you never call MPI_Wait on the request you got back from the Isend. So the MPI library will never free the data structures it created. >> >> The usual pattern for these non-blocking communications is to allocate an array for the requests of length nsend+nrecv and then do: >> >> for i in nsend: >> call MPI_Isend(..., req[i], ierr) >> for j in nrecv: >> call MPI_Irecv(..., req[nsend+j], ierr) >> >> call MPI_Waitall(req, ..., ierr) >> >> I note also there's no need for the Barrier at the end of the routine, this kind of communication does neighbourwise synchronisation, no need to add (unnecessary) global synchronisation too. >> >> As an aside, is there a reason you don't use PETSc's VecScatter to manage this global to local exchange? >> >> Cheers, >> >> Lawrence > > From s_g at berkeley.edu Thu May 30 15:31:00 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Thu, 30 May 2019 13:31:00 -0700 Subject: [petsc-users] Memory growth issue In-Reply-To: <0A8E3440-6893-41A3-BE6B-B9C99574164A@mcs.anl.gov> References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> <0A8E3440-6893-41A3-BE6B-B9C99574164A@mcs.anl.gov> Message-ID: <70795be3-bee1-e1a3-46d8-345c2ab0ed38@berkeley.edu> 1) Correct: Placing a WaitAll before the MPI_Barrier solve the problem in our send-get routine for OPENMPI 2) Correct: The problem persists with KSPSolve 3) Correct: WaitAll did not fix the problem in our send-get nor in KSPSolve when using MPICH Also correct.? Commenting out the call to KSPSolve results in zero memory growth on OPENMPI. On 5/30/19 11:59 AM, Smith, Barry F. wrote: > Thanks for the update. So the current conclusions are that using the Waitall in your code > > 1) solves the memory issue with OpenMPI in your code > > 2) does not solve the memory issue with PETSc KSPSolve > > 3) MPICH has memory issues both for your code and PETSc KSPSolve (despite) the wait all fix? > > If you literately just comment out the call to KSPSolve() with OpenMPI is there no growth in memory usage? > > > Both 2 and 3 are concerning, indicate possible memory leak bugs in MPICH and not freeing all MPI resources in KSPSolve() > > Junchao, can you please investigate 2 and 3 with, for example, a TS example that uses the linear solver (like with -ts_type beuler)? 
Thanks > > > Barry > > > >> On May 30, 2019, at 1:47 PM, Sanjay Govindjee wrote: >> >> Lawrence, >> Thanks for taking a look! This is what I had been wondering about -- my knowledge of MPI is pretty minimal and >> this origins of the routine were from a programmer we hired a decade+ back from NERSC. I'll have to look into >> VecScatter. It will be great to dispense with our roll-your-own routines (we even have our own reduceALL scattered around the code). >> >> Interestingly, the MPI_WaitALL has solved the problem when using OpenMPI but it still persists with MPICH. Graphs attached. >> I'm going to run with openmpi for now (but I guess I really still need to figure out what is wrong with MPICH and WaitALL; >> I'll try Barry's suggestion of --download-mpich-configure-arguments="--enable-error-messages=all --enable-g" later today and report back). >> >> Regarding MPI_Barrier, it was put in due a problem that some processes were finishing up sending and receiving and exiting the subroutine >> before the receiving processes had completed (which resulted in data loss as the buffers are freed after the call to the routine). MPI_Barrier was the solution proposed >> to us. I don't think I can dispense with it, but will think about some more. >> >> I'm not so sure about using MPI_IRecv as it will require a bit of rewriting since right now I process the received >> data sequentially after each blocking MPI_Recv -- clearly slower but easier to code. >> >> Thanks again for the help. >> >> -sanjay >> >> On 5/30/19 4:48 AM, Lawrence Mitchell wrote: >>> Hi Sanjay, >>> >>>> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users wrote: >>>> >>>> The problem seems to persist but with a different signature. Graphs attached as before. >>>> >>>> Totals with MPICH (NB: single run) >>>> >>>> For the CG/Jacobi data_exchange_total = 41,385,984; kspsolve_total = 38,289,408 >>>> For the GMRES/BJACOBI data_exchange_total = 41,324,544; kspsolve_total = 41,324,544 >>>> >>>> Just reading the MPI docs I am wondering if I need some sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine? >>>> I would have thought that with the blocking receives and the MPI_Barrier that everything will have fully completed and cleaned up before >>>> all processes exited the routine, but perhaps I am wrong on that. >>> Skimming the fortran code you sent you do: >>> >>> for i in ...: >>> call MPI_Isend(..., req, ierr) >>> >>> for i in ...: >>> call MPI_Recv(..., ierr) >>> >>> But you never call MPI_Wait on the request you got back from the Isend. So the MPI library will never free the data structures it created. >>> >>> The usual pattern for these non-blocking communications is to allocate an array for the requests of length nsend+nrecv and then do: >>> >>> for i in nsend: >>> call MPI_Isend(..., req[i], ierr) >>> for j in nrecv: >>> call MPI_Irecv(..., req[nsend+j], ierr) >>> >>> call MPI_Waitall(req, ..., ierr) >>> >>> I note also there's no need for the Barrier at the end of the routine, this kind of communication does neighbourwise synchronisation, no need to add (unnecessary) global synchronisation too. >>> >>> As an aside, is there a reason you don't use PETSc's VecScatter to manage this global to local exchange? 
>>> >>> Cheers, >>> >>> Lawrence >> From swarnava89 at gmail.com Thu May 30 17:19:23 2019 From: swarnava89 at gmail.com (Swarnava Ghosh) Date: Thu, 30 May 2019 15:19:23 -0700 Subject: [petsc-users] Shared vertices of DMPlex mesh Message-ID: Hi PETSc users and developers, I am trying to find the list of vertices of a DMPlex mesh that is shared among processes and also the ranks of these processes. Thoughts on how to do this would be helpful. Sincerely, SG -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.croucher at auckland.ac.nz Thu May 30 17:53:18 2019 From: a.croucher at auckland.ac.nz (Adrian Croucher) Date: Fri, 31 May 2019 10:53:18 +1200 Subject: [petsc-users] parallel dual porosity In-Reply-To: References: <32084356-1159-0ad7-c510-57ed0fb0d34b@auckland.ac.nz> <1cd1bf65-3b89-abef-2f18-58f5499499f2@auckland.ac.nz> Message-ID: On 30/05/19 5:26 PM, Stefano Zampini wrote: > Matt, > > ?redistribution with overlapped mesh is fixed in master (probably also > in maint) Even better. Thanks very much... - Adrian -- Dr Adrian Croucher Senior Research Fellow Department of Engineering Science University of Auckland, New Zealand email: a.croucher at auckland.ac.nz tel: +64 (0)9 923 4611 From knepley at gmail.com Thu May 30 19:38:54 2019 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 30 May 2019 20:38:54 -0400 Subject: [petsc-users] Shared vertices of DMPlex mesh In-Reply-To: References: Message-ID: On Thu, May 30, 2019 at 6:21 PM Swarnava Ghosh via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi PETSc users and developers, > > I am trying to find the list of vertices of a DMPlex mesh that is shared > among processes and also the ranks of these processes. Thoughts on how to > do this would be helpful. > The list of ghost points for a process is in the pointSF, DMPlexGetPointSF(). You can get the owned points which are ghosts on other processes using https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PetscSF/PetscSFComputeDegreeBegin.html Does that make sense? Thanks, Matt > Sincerely, > SG > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Thu May 30 22:41:52 2019 From: jczhang at mcs.anl.gov (Zhang, Junchao) Date: Fri, 31 May 2019 03:41:52 +0000 Subject: [petsc-users] Memory growth issue In-Reply-To: <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> Message-ID: Hi, Sanjay, Could you send your modified data exchange code (psetb.F) with MPI_Waitall? See other inlined comments below. Thanks. On Thu, May 30, 2019 at 1:49 PM Sanjay Govindjee via petsc-users > wrote: Lawrence, Thanks for taking a look! This is what I had been wondering about -- my knowledge of MPI is pretty minimal and this origins of the routine were from a programmer we hired a decade+ back from NERSC. I'll have to look into VecScatter. It will be great to dispense with our roll-your-own routines (we even have our own reduceALL scattered around the code). 
Petsc VecScatter has a very simple interface and you definitely should go with. With VecScatter, you can think in familiar vectors and indices instead of the low level MPI_Send/Recv. Besides that, PETSc has optimized VecScatter so that communication is efficient. Interestingly, the MPI_WaitALL has solved the problem when using OpenMPI but it still persists with MPICH. Graphs attached. I'm going to run with openmpi for now (but I guess I really still need to figure out what is wrong with MPICH and WaitALL; I'll try Barry's suggestion of --download-mpich-configure-arguments="--enable-error-messages=all --enable-g" later today and report back). Regarding MPI_Barrier, it was put in due a problem that some processes were finishing up sending and receiving and exiting the subroutine before the receiving processes had completed (which resulted in data loss as the buffers are freed after the call to the routine). MPI_Barrier was the solution proposed to us. I don't think I can dispense with it, but will think about some more. After MPI_Send(), or after MPI_Isend(..,req) and MPI_Wait(req), you can safely free the send buffer without worry that the receive has not completed. MPI guarantees the receiver can get the data, for example, through internal buffering. I'm not so sure about using MPI_IRecv as it will require a bit of rewriting since right now I process the received data sequentially after each blocking MPI_Recv -- clearly slower but easier to code. Thanks again for the help. -sanjay On 5/30/19 4:48 AM, Lawrence Mitchell wrote: > Hi Sanjay, > >> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users > wrote: >> >> The problem seems to persist but with a different signature. Graphs attached as before. >> >> Totals with MPICH (NB: single run) >> >> For the CG/Jacobi data_exchange_total = 41,385,984; kspsolve_total = 38,289,408 >> For the GMRES/BJACOBI data_exchange_total = 41,324,544; kspsolve_total = 41,324,544 >> >> Just reading the MPI docs I am wondering if I need some sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine? >> I would have thought that with the blocking receives and the MPI_Barrier that everything will have fully completed and cleaned up before >> all processes exited the routine, but perhaps I am wrong on that. > > Skimming the fortran code you sent you do: > > for i in ...: > call MPI_Isend(..., req, ierr) > > for i in ...: > call MPI_Recv(..., ierr) > > But you never call MPI_Wait on the request you got back from the Isend. So the MPI library will never free the data structures it created. > > The usual pattern for these non-blocking communications is to allocate an array for the requests of length nsend+nrecv and then do: > > for i in nsend: > call MPI_Isend(..., req[i], ierr) > for j in nrecv: > call MPI_Irecv(..., req[nsend+j], ierr) > > call MPI_Waitall(req, ..., ierr) > > I note also there's no need for the Barrier at the end of the routine, this kind of communication does neighbourwise synchronisation, no need to add (unnecessary) global synchronisation too. > > As an aside, is there a reason you don't use PETSc's VecScatter to manage this global to local exchange? > > Cheers, > > Lawrence -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From s_g at berkeley.edu Thu May 30 22:54:04 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Thu, 30 May 2019 20:54:04 -0700 Subject: [petsc-users] Memory growth issue In-Reply-To: References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> Message-ID: Hi Junchao, Thanks for the hints below, they will take some time to absorb as the vectors that are being moved around are actually partly petsc vectors and partly local process vectors. Attached is the modified routine that now works (no leaking memory) with openmpi. -sanjay On 5/30/19 8:41 PM, Zhang, Junchao wrote: > > Hi, Sanjay, > Could you send your modified data exchange code (psetb.F) with > MPI_Waitall? See other inlined comments below. Thanks. > > On Thu, May 30, 2019 at 1:49 PM Sanjay Govindjee via petsc-users > > wrote: > > Lawrence, > Thanks for taking a look! This is what I had been wondering about > -- my > knowledge of MPI is pretty minimal and > the origins of the routine were from a programmer we hired a decade+ > back from NERSC. I'll have to look into > VecScatter. It will be great to dispense with our roll-your-own > routines (we even have our own reduceALL scattered around the code). > > Petsc VecScatter has a very simple interface and you definitely should > go with. With VecScatter, you can think in familiar vectors and > indices instead of the low level MPI_Send/Recv. Besides that, PETSc > has optimized VecScatter so that communication is efficient. > > > Interestingly, the MPI_WaitALL has solved the problem when using > OpenMPI > but it still persists with MPICH. Graphs attached. > I'm going to run with openmpi for now (but I guess I really still > need > to figure out what is wrong with MPICH and WaitALL; > I'll try Barry's suggestion of > --download-mpich-configure-arguments="--enable-error-messages=all > --enable-g" later today and report back). > > Regarding MPI_Barrier, it was put in due to a problem that some > processes > were finishing up sending and receiving and exiting the subroutine > before the receiving processes had completed (which resulted in data > loss as the buffers are freed after the call to the routine). > MPI_Barrier was the solution proposed > to us. I don't think I can dispense with it, but will think about > some > more. > > After MPI_Send(), or after MPI_Isend(..,req) and MPI_Wait(req), you > can safely free the send buffer without worry that the receive has not > completed. MPI guarantees the receiver can get the data, for example, > through internal buffering. > > > I'm not so sure about using MPI_IRecv as it will require a bit of > rewriting since right now I process the received > data sequentially after each blocking MPI_Recv -- clearly slower but > easier to code. > > Thanks again for the help. > > -sanjay > > On 5/30/19 4:48 AM, Lawrence Mitchell wrote: > > Hi Sanjay, > > > >> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users > > wrote: > >> > >> The problem seems to persist but with a different signature. Graphs attached as before. > >> > >> Totals with MPICH (NB: single run) > >> > >> For the CG/Jacobi data_exchange_total = 41,385,984; > kspsolve_total = 38,289,408 > >> For the GMRES/BJACOBI
data_exchange_total = 41,324,544; > kspsolve_total = 41,324,544 > >> > >> Just reading the MPI docs I am wondering if I need some sort of > MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange > routine? > >> I would have thought that with the blocking receives and the > MPI_Barrier that everything will have fully completed and cleaned > up before > >> all processes exited the routine, but perhaps I am wrong on that. > > > > Skimming the fortran code you sent you do: > > > > for i in ...: > >? ? ?call MPI_Isend(..., req, ierr) > > > > for i in ...: > >? ? ?call MPI_Recv(..., ierr) > > > > But you never call MPI_Wait on the request you got back from the > Isend. So the MPI library will never free the data structures it > created. > > > > The usual pattern for these non-blocking communications is to > allocate an array for the requests of length nsend+nrecv and then do: > > > > for i in nsend: > >? ? ?call MPI_Isend(..., req[i], ierr) > > for j in nrecv: > >? ? ?call MPI_Irecv(..., req[nsend+j], ierr) > > > > call MPI_Waitall(req, ..., ierr) > > > > I note also there's no need for the Barrier at the end of the > routine, this kind of communication does neighbourwise > synchronisation, no need to add (unnecessary) global > synchronisation too. > > > > As an aside, is there a reason you don't use PETSc's VecScatter > to manage this global to local exchange? > > > > Cheers, > > > > Lawrence > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- !$Id:$ subroutine psetb(b,getp,getv,senp,senv,eq, ndf, rdatabuf,sdatabuf) ! * * F E A P * * A Finite Element Analysis Program !.... Copyright (c) 1984-2017: Regents of the University of California ! All rights reserved !-----[--.----+----.----+----.-----------------------------------------] ! Modification log Date (dd/mm/year) ! Original version 01/11/2006 ! 1. Revise send/receive data add barrier 24/11/2006 ! 2. Correct send/receive and add error messages 16/03/2007 ! 3. Change 'include/finclude' to 'finclude' 23/01/2009 ! 4. Remove common 'pfeapa' (values in 'setups') 05/02/2009 ! 5. finclude -> petsc/finclude 12/05/2016 ! 6. Update for PETSc 3.8.3 28/02/2018 ! 7. Change 'id' to 'eq' 23/05/2019 !-----[--.----+----.----+----.-----------------------------------------] ! Purpose: Transfer PETSC vector to local arrays including ghost ! node data via MPI messaging ! Inputs: ! getp(ntasks) - Pointer array for getting ghost data ! getv(sum(getp)) - Local node numbers for getting ghost data ! senp(ntasks) - Pointer array for sending ghost data ! senv(sum(senp)) - Local node numbers to send out as ghost data ! eq(ndf,numnp) - Local equation numbers ! ndf - dofs per node ! rdatabuf(*) - receive communication array ! sdatabuf(*) - send communication array ! Outputs: ! b(neq) - Local solution vector !-----[--.----+----.----+----.-----------------------------------------] # include use petscsys implicit none # include "cdata.h" # include "iofile.h" # include "pfeapb.h" # include "setups.h" integer ndf integer i, j, k, lnodei, eqi, soff, rbuf,sbuf,tbuf, idesp integer getp(*),getv(*),senp(*),senv(*) integer eq(ndf,*) real*8 b(*), rdatabuf(*), sdatabuf(*) integer usolve_msg, req(ntasks),reqcnt parameter (usolve_msg=10) ! Petsc values PetscErrorCode ierr integer msg_stat(MPI_STATUS_SIZE) ! Sending Data Asynchronously soff = 0 idesp = 0 req(:) = 0 reqcnt = 0 do i = 1, ntasks if (senp(i) .gt. 
0) then sbuf = soff do j = 1, senp(i) lnodei = senv(j + idesp) do k = 1, ndf eqi = eq(k,lnodei) if (eqi .gt. 0) then sbuf = sbuf + 1 sdatabuf(sbuf) = b(eqi) endif end do ! k end do ! j idesp = idesp + senp(i) sbuf = sbuf - soff reqcnt = reqcnt + 1 call MPI_Isend( sdatabuf(soff+1), sbuf, MPI_DOUBLE_PRECISION, & i-1, usolve_msg, MPI_COMM_WORLD, req(reqcnt), & ierr) ! Report send error if(ierr.ne.0) then write(iow,*) ' -->> Send Error[',rank+1,'->',i,']',ierr write( *,*) ' -->> Send Error[',rank+1,'->',i,']',ierr endif soff = soff + sbuf endif end do ! i ! Receiving Data in blocking mode idesp = 0 do i = 1, ntasks if (getp(i) .gt. 0) then ! Determine receive length tbuf = getp(i)*ndf call MPI_Recv( rdatabuf, tbuf, MPI_DOUBLE_PRECISION, i-1, & usolve_msg, MPI_COMM_WORLD,msg_stat, ierr) if(ierr.ne.0) then write(iow,*) 'Recv Error[',i,'->',rank+1,']',ierr write( *,*) 'Recv Error[',i,'->',rank+1,']',ierr endif rbuf = 0 do j = 1, getp(i) lnodei = getv(j + idesp) do k = 1, ndf eqi = eq(k,lnodei) if (eqi .gt. 0 ) then rbuf = rbuf + 1 b( eqi ) = rdatabuf( rbuf ) endif end do ! k end do ! j idesp = idesp + getp(i) endif end do ! i call MPI_WaitALL(reqcnt,req,MPI_STATUSES_IGNORE,ierr) call MPI_BARRIER(MPI_COMM_WORLD, ierr) end From bhatiamanav at gmail.com Thu May 30 23:08:43 2019 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Thu, 30 May 2019 23:08:43 -0500 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: <312FC2BE-E528-47CA-AF94-36DCCB246313@gmail.com> References: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> <312FC2BE-E528-47CA-AF94-36DCCB246313@gmail.com> Message-ID: I managed to get this to work. I defined a larger matrix with the dense blocks appended to the end of the matrix on the last processor. Currently, I am only running with one extra unknown, so this should not be a significant penalty for load balancing. Since the larger matrix has the same I-j locations for the FE non-zeros, I use it directly in the FE assembly. I have tested with parallel MUMPS solves and it working smoothly. Also, the monolithic system removes the issue with the singularity of J_fe at/near the bifurcation point. Next, I would like to figure out if there are ways to bring in iterative solvers to solve this more efficiently. My J_fe comes from a nonlinear shell deformation problem with snap through response. I am not sure if it would make sense to use an AMG solver on this monolithic matrix, as opposed to using it as a preconditioner for J_fe in the Schur-factorization approach. The LOCA solver in Trillions was able to find some success with the latter approach: https://www.worldscientific.com/doi/abs/10.1142/S0218127405012508 I would appreciate any general thoughts concerning this. Regards, Manav > On May 29, 2019, at 9:11 PM, Manav Bhatia wrote: > > Barry, > > Thanks for the detailed message. > > I checked libMesh?s continuation sovler and it appears to be using the same system solver without creating a larger matrix: https://github.com/libMesh/libmesh/blob/master/src/systems/continuation_system.C > > I need to implement this in my code, MAST, for various reasons (mainly, it fits inside a bigger workflow). The current implementation implementation follows the Schur factorization approach: https://mastmultiphysics.github.io/class_m_a_s_t_1_1_continuation_solver_base.html#details > > I will look into some solutions pertaining to the use of MatGetLocalSubMatrix or leverage some existing functionality in libMesh. > > Thanks, > Manav > > >> On May 29, 2019, at 7:04 PM, Smith, Barry F. 
> wrote: >> >> >> Understood. Where are you putting the "few extra unknowns" in the vector and matrix? On the first process, on the last process, some places in the middle of the matrix? >> >> We don't have any trivial code for copying a big matrix into a even larger matrix directly because we frown on doing that. It is very wasteful in time and memory. >> >> The simplest way to do it is call MatGetRow() twice for each row, once to get the nonzero locations for each row to determine the numbers needed for preallocation and then the second time after the big matrix has been preallocated to get the nonzero locations and numerical values for the row to call MatSetValues() with to set that row into the bigger matrix. Note of course when you call MatSetValues() you will need to shift the rows and column locations to take into account the new rows and columns in the bigger matrix. If you put the "extra unknowns" at the every end of the rows/columns on the last process you won't have to shift. >> >> Note that B being dense really messes up chances for load balancing since its rows are dense and take a great deal of space so whatever process gets those rows needs to have much less of the mesh. >> >> The correct long term approach is to have libmesh provide the needed functionality (for continuation) for the slightly larger matrix directly so huge matrices do not need to be copied. >> >> I noticed that libmesh has some functionality related to continuation. I do not know if they handle it by creating the larger matrix and vector and filling that up directly for finite elements. If they do then you should definitely take a look at that and see if it can be extended for your case (ignore the continuation algorithm they may be using, that is not relevant, the question is if they generate the larger matrices and if you can leverage this). >> >> >> The ultimate hack would be to (for example) assign the extra variables to the end of the last process and hack lib mesh a little bit so the matrix it creates (before it puts in the numerical values) has the extra rows and columns, that libmesh will not put the values into but you will. Thus you get libmesh to fill up the true final matrix for its finite element problem (not realizing the matrix is a little bigger then it needs) directly, no copies of the data needed. But this is bit tricky, you'll need to combine libmesh's preallocation information with yours for the final columns and rows before you have lib mesh put the numerical values in. Double check if they have any support for this first. >> >> Barry >> >> >>> On May 29, 2019, at 6:29 PM, Manav Bhatia > wrote: >>> >>> Thanks, Barry. >>> >>> I am working on a FE application (involving bifurcation behavior) with libMesh where I need to solve the system of equations along with a few extra unknowns that are not directly related to the FE mesh. I am able to assemble the n x 1 residual (R_fe) and n x n Jacobian (J_fe ) from my code and libMesh provides me with the sparsity pattern for this. >>> >>> Next, the system of equations that I need to solve is: >>> >>> [ J_fe A ] { dX } = { R_fe } >>> [ B C ] { dV } = {R_ext } >>> >>> Where, C is a dense matrix of size m x m ( m << n ), A is n x m, B is m x n, R_ext is m x 1. A, B and C are dense matrixes. This comes from the bordered system for my path continuation solver. >>> >>> I have implemented a solver using Schur factorization ( this is outside of PETSc and does not use the FieldSplit construct ). 
This works well for most cases, except when J_fe is close to singular. >>> >>> I am now attempting to create a monolithic matrix that solves the complete system. >>> >>> Currently, the approach I am considering is to compute J_fe using my libMesh application, so that I don?t have to change that. I am defining a new matrix with the extra non-zero locations for A, B, C. >>> >>> With J_fe computed, I am looking to copy its non-zero entries to this new matrix. This is where I am stumbling since I don?t know how best to get the non-zero locations in J_fe. Maybe there is a better approach to copy from J_fe to the new matrix? >>> >>> I have looked through the nested matrix construct, but have not given this a serious consideration. Maybe I should? Note that I don?t want to solve J_fe and C separately (not as separate systems), so the field-split approach will not be suitable here. >>> >>> Also, I am currently using MUMPS for all my parallel solves. >>> >>> I would appreciate any advice. >>> >>> Regards, >>> Manav >>> >>> >>>> On May 29, 2019, at 6:07 PM, Smith, Barry F. > wrote: >>>> >>>> >>>> Manav, >>>> >>>> For parallel sparse matrices using the standard PETSc formats the matrix is stored in two parts on each process (see the details in MatCreateAIJ()) thus there is no inexpensive way to access directly the IJ locations as a single local matrix. What are you hoping to use the information for? Perhaps we have other suggestions on how to achieve the goal. >>>> >>>> Barry >>>> >>>> >>>>> On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users > wrote: >>>>> >>>>> Hi, >>>>> >>>>> Once a MPI-AIJ matrix has been assembled, is there a method to get the nonzero I-J locations? I see one for sequential matrices here: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html , but not for parallel matrices. >>>>> >>>>> Regards, >>>>> Manav >>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu May 30 23:46:22 2019 From: bsmith at mcs.anl.gov (Smith, Barry F.) Date: Fri, 31 May 2019 04:46:22 +0000 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: References: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> <312FC2BE-E528-47CA-AF94-36DCCB246313@gmail.com> Message-ID: > On May 30, 2019, at 11:08 PM, Manav Bhatia wrote: > > I managed to get this to work. > > I defined a larger matrix with the dense blocks appended to the end of the matrix on the last processor. Currently, I am only running with one extra unknown, so this should not be a significant penalty for load balancing. > > Since the larger matrix has the same I-j locations for the FE non-zeros, I use it directly in the FE assembly. Great! > > I have tested with parallel MUMPS solves and it working smoothly. Also, the monolithic system removes the issue with the singularity of J_fe at/near the bifurcation point. > > Next, I would like to figure out if there are ways to bring in iterative solvers to solve this more efficiently. My J_fe comes from a nonlinear shell deformation problem with snap through response. This can be a tough problem for AMG (or any) iterative method. > > I am not sure if it would make sense to use an AMG solver on this monolithic matrix, Almost surely not. > as opposed to using it as a preconditioner for J_fe in the Schur-factorization approach. 
The LOCA solver in Trillions was able to find some success with the latter approach: https://www.worldscientific.com/doi/abs/10.1142/S0218127405012508 In theory you can use PCFIELDSPLIT to do the Schur factorization approach with your monothic matrix. You would create two IS, one for the FE problem, you can create this by using a ISCreateStride() (each process would create an IS for all the local variables except the last process which would skip the last variable) and the second IS would be of size zero on all processes except the last process where it would have only the last variable. This would be fine for testing if Schur factorization plus AMG works in your case. The drawback is that PCFIELDSPLIT in this circumstance will pull out of the big matrix (copy) the ever so slightly smaller matrix that needs to be passed to AMG; it needs the copy because GAMG currently needs to directly work with a AIJ matrix. Thus what you need to do is to use MatCreateNest(). You can do this and share almost all your current code. You would create the FE matrix (using, for example libmesh), this matrix is what you would have the FE assembly code fill in (because MATNEST does not support MatSetValues()). You would also create A, B, C as MPIAIJ matrices. Your Jacobian filling routine would then fill up separately the FE matrix, A, B, and C. For a PC you would use field split with the IS I indicated above. GAMG will then directly use the FE matrix with no copy needed inside the PCFIELDSPLIT. So the only difference in your code for the two cases would be 1) The creation of the matrix. 2) The filling up of the pieces of the matrix versus just filling up the big matrix directly. 3) for field split you would need to create the IS and provide them to the PC. Based on your previous rapid progress I have no doubt that you will be able to achieve this approach rapidly, good luck Barry > > I would appreciate any general thoughts concerning this. > > Regards, > Manav > > >> On May 29, 2019, at 9:11 PM, Manav Bhatia wrote: >> >> Barry, >> >> Thanks for the detailed message. >> >> I checked libMesh?s continuation sovler and it appears to be using the same system solver without creating a larger matrix: https://github.com/libMesh/libmesh/blob/master/src/systems/continuation_system.C >> >> I need to implement this in my code, MAST, for various reasons (mainly, it fits inside a bigger workflow). The current implementation implementation follows the Schur factorization approach: https://mastmultiphysics.github.io/class_m_a_s_t_1_1_continuation_solver_base.html#details >> >> I will look into some solutions pertaining to the use of MatGetLocalSubMatrix or leverage some existing functionality in libMesh. >> >> Thanks, >> Manav >> >> >>> On May 29, 2019, at 7:04 PM, Smith, Barry F. wrote: >>> >>> >>> Understood. Where are you putting the "few extra unknowns" in the vector and matrix? On the first process, on the last process, some places in the middle of the matrix? >>> >>> We don't have any trivial code for copying a big matrix into a even larger matrix directly because we frown on doing that. It is very wasteful in time and memory. >>> >>> The simplest way to do it is call MatGetRow() twice for each row, once to get the nonzero locations for each row to determine the numbers needed for preallocation and then the second time after the big matrix has been preallocated to get the nonzero locations and numerical values for the row to call MatSetValues() with to set that row into the bigger matrix. 
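A minimal sketch of that second (copy) pass, assuming the bigger matrix has already been preallocated from a first MatGetRow() sweep and that the extra unknowns sit at the very end of the last process so no index shift is needed; the matrix names Jfe and Jbig are hypothetical:

#include <petscmat.h>

PetscErrorCode CopyFEBlock(Mat Jfe, Mat Jbig)
{
  PetscErrorCode     ierr;
  PetscInt           rstart, rend, row, ncols;
  const PetscInt    *cols;
  const PetscScalar *vals;

  PetscFunctionBeginUser;
  ierr = MatGetOwnershipRange(Jfe, &rstart, &rend);CHKERRQ(ierr);
  for (row = rstart; row < rend; row++) {
    ierr = MatGetRow(Jfe, row, &ncols, &cols, &vals);CHKERRQ(ierr);
    /* row and column numbering are unchanged because the extra unknowns are appended at the end */
    ierr = MatSetValues(Jbig, 1, &row, ncols, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
    ierr = MatRestoreRow(Jfe, row, &ncols, &cols, &vals);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(Jbig, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(Jbig, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The dense coupling blocks A, B, and C from the bordered system would still be inserted separately before the final assembly.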
Note of course when you call MatSetValues() you will need to shift the rows and column locations to take into account the new rows and columns in the bigger matrix. If you put the "extra unknowns" at the every end of the rows/columns on the last process you won't have to shift. >>> >>> Note that B being dense really messes up chances for load balancing since its rows are dense and take a great deal of space so whatever process gets those rows needs to have much less of the mesh. >>> >>> The correct long term approach is to have libmesh provide the needed functionality (for continuation) for the slightly larger matrix directly so huge matrices do not need to be copied. >>> >>> I noticed that libmesh has some functionality related to continuation. I do not know if they handle it by creating the larger matrix and vector and filling that up directly for finite elements. If they do then you should definitely take a look at that and see if it can be extended for your case (ignore the continuation algorithm they may be using, that is not relevant, the question is if they generate the larger matrices and if you can leverage this). >>> >>> >>> The ultimate hack would be to (for example) assign the extra variables to the end of the last process and hack lib mesh a little bit so the matrix it creates (before it puts in the numerical values) has the extra rows and columns, that libmesh will not put the values into but you will. Thus you get libmesh to fill up the true final matrix for its finite element problem (not realizing the matrix is a little bigger then it needs) directly, no copies of the data needed. But this is bit tricky, you'll need to combine libmesh's preallocation information with yours for the final columns and rows before you have lib mesh put the numerical values in. Double check if they have any support for this first. >>> >>> Barry >>> >>> >>>> On May 29, 2019, at 6:29 PM, Manav Bhatia wrote: >>>> >>>> Thanks, Barry. >>>> >>>> I am working on a FE application (involving bifurcation behavior) with libMesh where I need to solve the system of equations along with a few extra unknowns that are not directly related to the FE mesh. I am able to assemble the n x 1 residual (R_fe) and n x n Jacobian (J_fe ) from my code and libMesh provides me with the sparsity pattern for this. >>>> >>>> Next, the system of equations that I need to solve is: >>>> >>>> [ J_fe A ] { dX } = { R_fe } >>>> [ B C ] { dV } = {R_ext } >>>> >>>> Where, C is a dense matrix of size m x m ( m << n ), A is n x m, B is m x n, R_ext is m x 1. A, B and C are dense matrixes. This comes from the bordered system for my path continuation solver. >>>> >>>> I have implemented a solver using Schur factorization ( this is outside of PETSc and does not use the FieldSplit construct ). This works well for most cases, except when J_fe is close to singular. >>>> >>>> I am now attempting to create a monolithic matrix that solves the complete system. >>>> >>>> Currently, the approach I am considering is to compute J_fe using my libMesh application, so that I don?t have to change that. I am defining a new matrix with the extra non-zero locations for A, B, C. >>>> >>>> With J_fe computed, I am looking to copy its non-zero entries to this new matrix. This is where I am stumbling since I don?t know how best to get the non-zero locations in J_fe. Maybe there is a better approach to copy from J_fe to the new matrix? >>>> >>>> I have looked through the nested matrix construct, but have not given this a serious consideration. Maybe I should? 
Note that I don?t want to solve J_fe and C separately (not as separate systems), so the field-split approach will not be suitable here. >>>> >>>> Also, I am currently using MUMPS for all my parallel solves. >>>> >>>> I would appreciate any advice. >>>> >>>> Regards, >>>> Manav >>>> >>>> >>>>> On May 29, 2019, at 6:07 PM, Smith, Barry F. wrote: >>>>> >>>>> >>>>> Manav, >>>>> >>>>> For parallel sparse matrices using the standard PETSc formats the matrix is stored in two parts on each process (see the details in MatCreateAIJ()) thus there is no inexpensive way to access directly the IJ locations as a single local matrix. What are you hoping to use the information for? Perhaps we have other suggestions on how to achieve the goal. >>>>> >>>>> Barry >>>>> >>>>> >>>>>> On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Once a MPI-AIJ matrix has been assembled, is there a method to get the nonzero I-J locations? I see one for sequential matrices here: https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html , but not for parallel matrices. >>>>>> >>>>>> Regards, >>>>>> Manav >>>>>> >>>>>> >>>>> >>>> >>> >> > From knepley at gmail.com Fri May 31 05:37:30 2019 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 31 May 2019 06:37:30 -0400 Subject: [petsc-users] Memory growth issue In-Reply-To: References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> Message-ID: On Thu, May 30, 2019 at 11:55 PM Sanjay Govindjee via petsc-users < petsc-users at mcs.anl.gov> wrote: > Hi Juanchao, > Thanks for the hints below, they will take some time to absorb as the > vectors that are being moved around > are actually partly petsc vectors and partly local process vectors. > Is this code just doing a global-to-local map? Meaning, does it just map all the local unknowns to some global unknown on some process? We have an even simpler interface for that, where we make the VecScatter automatically, https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate Then you can use it with Vecs, Mats, etc. Thanks, Matt > Attached is the modified routine that now works (on leaking memory) with > openmpi. > > -sanjay > > On 5/30/19 8:41 PM, Zhang, Junchao wrote: > > > Hi, Sanjay, > Could you send your modified data exchange code (psetb.F) with > MPI_Waitall? See other inlined comments below. Thanks. > > On Thu, May 30, 2019 at 1:49 PM Sanjay Govindjee via petsc-users < > petsc-users at mcs.anl.gov> wrote: > >> Lawrence, >> Thanks for taking a look! This is what I had been wondering about -- my >> knowledge of MPI is pretty minimal and >> this origins of the routine were from a programmer we hired a decade+ >> back from NERSC. I'll have to look into >> VecScatter. It will be great to dispense with our roll-your-own >> routines (we even have our own reduceALL scattered around the code). >> > Petsc VecScatter has a very simple interface and you definitely should go > with. With VecScatter, you can think in familiar vectors and indices > instead of the low level MPI_Send/Recv. Besides that, PETSc has optimized > VecScatter so that communication is efficient. 
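For reference, a small sketch of the ISLocalToGlobalMappingCreate interface Matt links above; the index array mapping each local slot to its global equation number is hypothetical and would come from the application's own numbering:

#include <petscvec.h>

PetscErrorCode AttachLocalNumbering(Vec sol, PetscInt nlocal, const PetscInt ltog[])
{
  PetscErrorCode         ierr;
  ISLocalToGlobalMapping map;

  PetscFunctionBeginUser;
  ierr = ISLocalToGlobalMappingCreate(PETSC_COMM_WORLD, 1, nlocal, ltog, PETSC_COPY_VALUES, &map);CHKERRQ(ierr);
  ierr = VecSetLocalToGlobalMapping(sol, map);CHKERRQ(ierr);
  /* the vector keeps a reference, so the mapping can be released here */
  ierr = ISLocalToGlobalMappingDestroy(&map);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Once the mapping is attached, values can be set with purely local indices via VecSetValuesLocal() (or MatSetValuesLocal() for matrices), and PETSc handles the off-process communication during assembly.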
> >> >> Interestingly, the MPI_WaitALL has solved the problem when using OpenMPI >> but it still persists with MPICH. Graphs attached. >> I'm going to run with openmpi for now (but I guess I really still need >> to figure out what is wrong with MPICH and WaitALL; >> I'll try Barry's suggestion of >> --download-mpich-configure-arguments="--enable-error-messages=all >> --enable-g" later today and report back). >> >> Regarding MPI_Barrier, it was put in due a problem that some processes >> were finishing up sending and receiving and exiting the subroutine >> before the receiving processes had completed (which resulted in data >> loss as the buffers are freed after the call to the routine). >> MPI_Barrier was the solution proposed >> to us. I don't think I can dispense with it, but will think about some >> more. > > After MPI_Send(), or after MPI_Isend(..,req) and MPI_Wait(req), you can > safely free the send buffer without worry that the receive has not > completed. MPI guarantees the receiver can get the data, for example, > through internal buffering. > >> >> I'm not so sure about using MPI_IRecv as it will require a bit of >> rewriting since right now I process the received >> data sequentially after each blocking MPI_Recv -- clearly slower but >> easier to code. >> >> Thanks again for the help. >> >> -sanjay >> >> On 5/30/19 4:48 AM, Lawrence Mitchell wrote: >> > Hi Sanjay, >> > >> >> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users < >> petsc-users at mcs.anl.gov> wrote: >> >> >> >> The problem seems to persist but with a different signature. Graphs >> attached as before. >> >> >> >> Totals with MPICH (NB: single run) >> >> >> >> For the CG/Jacobi data_exchange_total = 41,385,984; >> kspsolve_total = 38,289,408 >> >> For the GMRES/BJACOBI data_exchange_total = 41,324,544; >> kspsolve_total = 41,324,544 >> >> >> >> Just reading the MPI docs I am wondering if I need some sort of >> MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine? >> >> I would have thought that with the blocking receives and the >> MPI_Barrier that everything will have fully completed and cleaned up before >> >> all processes exited the routine, but perhaps I am wrong on that. >> > >> > Skimming the fortran code you sent you do: >> > >> > for i in ...: >> > call MPI_Isend(..., req, ierr) >> > >> > for i in ...: >> > call MPI_Recv(..., ierr) >> > >> > But you never call MPI_Wait on the request you got back from the Isend. >> So the MPI library will never free the data structures it created. >> > >> > The usual pattern for these non-blocking communications is to allocate >> an array for the requests of length nsend+nrecv and then do: >> > >> > for i in nsend: >> > call MPI_Isend(..., req[i], ierr) >> > for j in nrecv: >> > call MPI_Irecv(..., req[nsend+j], ierr) >> > >> > call MPI_Waitall(req, ..., ierr) >> > >> > I note also there's no need for the Barrier at the end of the routine, >> this kind of communication does neighbourwise synchronisation, no need to >> add (unnecessary) global synchronisation too. >> > >> > As an aside, is there a reason you don't use PETSc's VecScatter to >> manage this global to local exchange? >> > >> > Cheers, >> > >> > Lawrence >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri May 31 05:44:28 2019 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 31 May 2019 06:44:28 -0400 Subject: [petsc-users] Nonzero I-j locations In-Reply-To: References: <320DB646-4D80-4273-BD82-A33014D0DF65@anl.gov> <312FC2BE-E528-47CA-AF94-36DCCB246313@gmail.com> Message-ID: On Fri, May 31, 2019 at 12:09 AM Manav Bhatia via petsc-users < petsc-users at mcs.anl.gov> wrote: > I managed to get this to work. > > I defined a larger matrix with the dense blocks appended to the end of the > matrix on the last processor. Currently, I am only running with one extra > unknown, so this should not be a significant penalty for load balancing. > > Since the larger matrix has the same I-j locations for the FE non-zeros, I > use it directly in the FE assembly. > > I have tested with parallel MUMPS solves and it working smoothly. Also, > the monolithic system removes the issue with the singularity of J_fe > at/near the bifurcation point. > > Next, I would like to figure out if there are ways to bring in iterative > solvers to solve this more efficiently. My J_fe comes from a nonlinear > shell deformation problem with snap through response. > > I am not sure if it would make sense to use an AMG solver on this > monolithic matrix, as opposed to using it as a preconditioner for J_fe in > the Schur-factorization approach. The LOCA solver in Trillions was able to > find some success with the latter approach: > https://www.worldscientific.com/doi/abs/10.1142/S0218127405012508 > > I would appreciate any general thoughts concerning this. > Hi Manav, If you are using this formulation to uncover sophisticated properties of the system, I can understand all the work for solving it. However, if you are just using it to follow solution branches like LOCA, I think there is a better way. Look at this algorithm, https://arxiv.org/abs/1410.5620 https://arxiv.org/abs/1904.13299 It is strikingly simple, and you can compute it only with a modified FEM matrix, so your subblock solver will work fine. He uses it on a bunch of examples https://arxiv.org/abs/1603.00809 https://arxiv.org/abs/1609.08842 https://arxiv.org/abs/1706.04597 There are more if you check arXiv. Thanks, Matt > Regards, > Manav > > > On May 29, 2019, at 9:11 PM, Manav Bhatia wrote: > > Barry, > > Thanks for the detailed message. > > I checked libMesh?s continuation sovler and it appears to be using the > same system solver without creating a larger matrix: > https://github.com/libMesh/libmesh/blob/master/src/systems/continuation_system.C > > > I need to implement this in my code, MAST, for various reasons (mainly, > it fits inside a bigger workflow). The current implementation > implementation follows the Schur factorization approach: > https://mastmultiphysics.github.io/class_m_a_s_t_1_1_continuation_solver_base.html#details > > > I will look into some solutions pertaining to the use of > MatGetLocalSubMatrix or leverage some existing functionality in libMesh. > > Thanks, > Manav > > > On May 29, 2019, at 7:04 PM, Smith, Barry F. wrote: > > > Understood. Where are you putting the "few extra unknowns" in the vector > and matrix? On the first process, on the last process, some places in the > middle of the matrix? > > We don't have any trivial code for copying a big matrix into a even > larger matrix directly because we frown on doing that. It is very wasteful > in time and memory. 
> > The simplest way to do it is call MatGetRow() twice for each row, once > to get the nonzero locations for each row to determine the numbers needed > for preallocation and then the second time after the big matrix has been > preallocated to get the nonzero locations and numerical values for the row > to call MatSetValues() with to set that row into the bigger matrix. Note of > course when you call MatSetValues() you will need to shift the rows and > column locations to take into account the new rows and columns in the > bigger matrix. If you put the "extra unknowns" at the every end of the > rows/columns on the last process you won't have to shift. > > Note that B being dense really messes up chances for load balancing > since its rows are dense and take a great deal of space so whatever process > gets those rows needs to have much less of the mesh. > > The correct long term approach is to have libmesh provide the needed > functionality (for continuation) for the slightly larger matrix directly so > huge matrices do not need to be copied. > > I noticed that libmesh has some functionality related to continuation. I > do not know if they handle it by creating the larger matrix and vector and > filling that up directly for finite elements. If they do then you should > definitely take a look at that and see if it can be extended for your case > (ignore the continuation algorithm they may be using, that is not relevant, > the question is if they generate the larger matrices and if you can > leverage this). > > > The ultimate hack would be to (for example) assign the extra variables to > the end of the last process and hack lib mesh a little bit so the matrix it > creates (before it puts in the numerical values) has the extra rows and > columns, that libmesh will not put the values into but you will. Thus you > get libmesh to fill up the true final matrix for its finite element problem > (not realizing the matrix is a little bigger then it needs) directly, no > copies of the data needed. But this is bit tricky, you'll need to combine > libmesh's preallocation information with yours for the final columns and > rows before you have lib mesh put the numerical values in. Double check if > they have any support for this first. > > Barry > > > On May 29, 2019, at 6:29 PM, Manav Bhatia wrote: > > Thanks, Barry. > > I am working on a FE application (involving bifurcation behavior) with > libMesh where I need to solve the system of equations along with a few > extra unknowns that are not directly related to the FE mesh. I am able to > assemble the n x 1 residual (R_fe) and n x n Jacobian (J_fe ) from my > code and libMesh provides me with the sparsity pattern for this. > > Next, the system of equations that I need to solve is: > > [ J_fe A ] { dX } = { R_fe } > [ B C ] { dV } = {R_ext } > > Where, C is a dense matrix of size m x m ( m << n ), A is n x m, B is m x > n, R_ext is m x 1. A, B and C are dense matrixes. This comes from the > bordered system for my path continuation solver. > > I have implemented a solver using Schur factorization ( this is outside of > PETSc and does not use the FieldSplit construct ). This works well for most > cases, except when J_fe is close to singular. > > I am now attempting to create a monolithic matrix that solves the complete > system. > > Currently, the approach I am considering is to compute J_fe using my > libMesh application, so that I don?t have to change that. I am defining a > new matrix with the extra non-zero locations for A, B, C. 
> > With J_fe computed, I am looking to copy its non-zero entries to this new > matrix. This is where I am stumbling since I don?t know how best to get the > non-zero locations in J_fe. Maybe there is a better approach to copy from > J_fe to the new matrix? > > I have looked through the nested matrix construct, but have not given this > a serious consideration. Maybe I should? Note that I don?t want to solve > J_fe and C separately (not as separate systems), so the field-split > approach will not be suitable here. > > Also, I am currently using MUMPS for all my parallel solves. > > I would appreciate any advice. > > Regards, > Manav > > > On May 29, 2019, at 6:07 PM, Smith, Barry F. wrote: > > > Manav, > > For parallel sparse matrices using the standard PETSc formats the matrix > is stored in two parts on each process (see the details in MatCreateAIJ()) > thus there is no inexpensive way to access directly the IJ locations as a > single local matrix. What are you hoping to use the information for? > Perhaps we have other suggestions on how to achieve the goal. > > Barry > > > On May 29, 2019, at 2:27 PM, Manav Bhatia via petsc-users < > petsc-users at mcs.anl.gov> wrote: > > Hi, > > Once a MPI-AIJ matrix has been assembled, is there a method to get the > nonzero I-J locations? I see one for sequential matrices here: > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatGetRowIJ.html > , but not for parallel matrices. > > Regards, > Manav > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From xiaoma5 at illinois.edu Fri May 31 13:26:50 2019 From: xiaoma5 at illinois.edu (Ma, Xiao) Date: Fri, 31 May 2019 18:26:50 +0000 Subject: [petsc-users] Configuration process of Petsc hanging Message-ID: Hi , I am trying to install Pylith which is a earthquake simulator using Petsc library, I am building it in PSC bridge cluster, during the steps of building Petsc, the configuration hanging at TESTING: configureMPITypes from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:247) I am not sure if this has to with the configuration setup of the mpi version I am using. Any help would be deeply appreciated. I am attaching the configure options here: Saving to: ?petsc-pylith-2.2.1.tgz? 100%[===========================================================>] 10,415,016 37.3MB/s in 0.3s 2019-05-31 14:03:13 (37.3 MB/s) - ?petsc-pylith-2.2.1.tgz? 
saved [10415016/10415016] FINISHED --2019-05-31 14:03:13-- Total wall clock time: 1.1s Downloaded: 1 files, 9.9M in 0.3s (37.3 MB/s) /usr/bin/tar -zxf petsc-pylith-2.2.1.tgz cd petsc-pylith && \ ./configure --prefix=/home/xm12345/pylith \ --with-c2html=0 --with-x=0 \ --with-clanguage=C \ --with-mpicompilers=1 \ --with-shared-libraries=1 --with-64-bit-points=1 --with-large-file-io=1 \ --download-chaco=1 --download-ml=1 --download-f2cblaslapack=1 --with-hdf5=1 --with -debugging=0 --with-fc=0 CPPFLAGS="-I/home/xm12345/pylith/include -I/home/xm12345/pylith/include " L DFLAGS="-L/home/xm12345/pylith/lib -L/home/xm12345/pylith/lib64 -L/home/xm12345/pylith/lib -L/home/xm 12345/pylith/lib64 " CFLAGS="-g -O2" CXXFLAGS="-g -O2 -DMPICH_IGNORE_CXX_SEEK" FCFLAGS="" \ PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith PETSC_ARCH=arch-pylith && \ make -f gmakefile -j2 PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith PETSC_ARCH=arch-pylit h && \ make PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith install && \ make PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith test && \ touch ../installed_petsc =============================================================================== Configuring PETSc to compile on your system =============================================================================== =============================================================================== ***** WARNING: MAKEFLAGS (set to w) found in environment variables - ignoring use ./configure MAKEFLAGS=$MAKEFLAGS if you really want to use that value ****** =============================================================================== =============================================================================== WARNING! Compiling PETSc with no debugging, this should only be done for timing and production runs. All development should be done when configured using --with-debugging=1 =============================================================================== TESTING: configureMPITypes from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:247) -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri May 31 13:35:15 2019 From: balay at mcs.anl.gov (Balay, Satish) Date: Fri, 31 May 2019 18:35:15 +0000 Subject: [petsc-users] Configuration process of Petsc hanging In-Reply-To: References: Message-ID: PETSc configure is attempting to run some MPI binaries - and that is hanging with this MPI. You can retry with the options: --batch=1 --known-64-bit-blas-indices=0 -known-mpi-shared-libraries=0 [and follow instructions provided by configure] Satish On Fri, 31 May 2019, Ma, Xiao via petsc-users wrote: > Hi , > > I am trying to install Pylith which is a earthquake simulator using Petsc library, I am building it in PSC bridge cluster, during the steps of building Petsc, the configuration hanging at > > TESTING: configureMPITypes from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:247) > > > I am not sure if this has to with the configuration setup of the mpi version I am using. > > Any help would be deeply appreciated. > > I am attaching the configure options here: > > > Saving to: ?petsc-pylith-2.2.1.tgz? > > 100%[===========================================================>] 10,415,016 37.3MB/s in 0.3s > > 2019-05-31 14:03:13 (37.3 MB/s) - ?petsc-pylith-2.2.1.tgz? 
saved [10415016/10415016] > > FINISHED --2019-05-31 14:03:13-- > Total wall clock time: 1.1s > Downloaded: 1 files, 9.9M in 0.3s (37.3 MB/s) > /usr/bin/tar -zxf petsc-pylith-2.2.1.tgz > cd petsc-pylith && \ > ./configure --prefix=/home/xm12345/pylith \ > --with-c2html=0 --with-x=0 \ > --with-clanguage=C \ > --with-mpicompilers=1 \ > --with-shared-libraries=1 --with-64-bit-points=1 --with-large-file-io=1 \ > --download-chaco=1 --download-ml=1 --download-f2cblaslapack=1 --with-hdf5=1 --with -debugging=0 --with-fc=0 CPPFLAGS="-I/home/xm12345/pylith/include -I/home/xm12345/pylith/include " L DFLAGS="-L/home/xm12345/pylith/lib -L/home/xm12345/pylith/lib64 -L/home/xm12345/pylith/lib -L/home/xm 12345/pylith/lib64 " CFLAGS="-g -O2" CXXFLAGS="-g -O2 -DMPICH_IGNORE_CXX_SEEK" FCFLAGS="" \ > PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith PETSC_ARCH=arch-pylith && \ > make -f gmakefile -j2 PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith PETSC_ARCH=arch-pylit h && \ > make PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith install && \ > make PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith test && \ > touch ../installed_petsc > =============================================================================== > Configuring PETSc to compile on your system > =============================================================================== > =============================================================================== ***** WARNING: MAKEFLAGS (set to w) found in environment variables - ignoring use ./configure MAKEFLAGS=$MAKEFLAGS if you really want to use that value ****** =============================================================================== =============================================================================== WARNING! Compiling PETSc with no debugging, this should only be done for timing and production runs. All development should be done when configured using --with-debugging=1 =============================================================================== TESTING: configureMPITypes from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:247) > > > From s_g at berkeley.edu Fri May 31 13:50:25 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Fri, 31 May 2019 11:50:25 -0700 Subject: [petsc-users] Memory growth issue In-Reply-To: References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> Message-ID: <5faac7eb-706c-139b-7fe9-2476ea0990b2@berkeley.edu> Matt, ? Here is the process as it currently stands: 1) I have a PETSc Vec (sol), which come from a KSPSolve 2) Each processor grabs its section of sol via VecGetOwnershipRange and VecGetArrayReadF90 and inserts parts of its section of sol in a local array (locarr) using a complex but easily computable mapping. 3) The routine you are looking at then exchanges various parts of the locarr between the processors. 4) Each processor then does computations using its updated locarr. Typing it out this way, I guess the answer to your question is "yes."? I have a global Vec and I want its values sent in a complex but computable way to local vectors on each process. 
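For reference, a minimal sketch of how steps 2 and 3 above might be expressed with a single VecScatter: gather the globally numbered entries this rank needs from sol into a local sequential vector. The index array needed[] stands in for the "complex but easily computable mapping" and is purely illustrative:

#include <petscvec.h>

PetscErrorCode GatherLocal(Vec sol, PetscInt nneeded, const PetscInt needed[], Vec *loc)
{
  PetscErrorCode ierr;
  IS             from;
  VecScatter     ctx;

  PetscFunctionBeginUser;
  ierr = VecCreateSeq(PETSC_COMM_SELF, nneeded, loc);CHKERRQ(ierr);
  ierr = ISCreateGeneral(PETSC_COMM_SELF, nneeded, needed, PETSC_COPY_VALUES, &from);CHKERRQ(ierr);
  /* a NULL 'to' index set means entries land in loc in order 0..nneeded-1 */
  ierr = VecScatterCreate(sol, from, *loc, NULL, &ctx);CHKERRQ(ierr);
  ierr = VecScatterBegin(ctx, sol, *loc, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterEnd(ctx, sol, *loc, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = VecScatterDestroy(&ctx);CHKERRQ(ierr);  /* in practice the scatter would be kept and reused for every solve */
  ierr = ISDestroy(&from);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Run with SCATTER_REVERSE, the same scatter object pushes locally updated values back into the global vector.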
-sanjay On 5/31/19 3:37 AM, Matthew Knepley wrote: > On Thu, May 30, 2019 at 11:55 PM Sanjay Govindjee via petsc-users > > wrote: > > Hi Juanchao, > Thanks for the hints below, they will take some time to absorb as > the vectors that are being? moved around > are actually partly petsc vectors and partly local process vectors. > > > Is this code just doing a global-to-local map? Meaning, does it just > map all the local unknowns to some global > unknown on some process? We have an even simpler interface for that, > where we make the VecScatter > automatically, > > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate > > Then you can use it with Vecs, Mats, etc. > > ? Thanks, > > ? ? ?Matt > > Attached is the modified routine that now works (on leaking > memory) with openmpi. > > -sanjay > On 5/30/19 8:41 PM, Zhang, Junchao wrote: >> >> Hi, Sanjay, >> ? Could you send your modified data exchange code (psetb.F) with >> MPI_Waitall? See other inlined comments below. Thanks. >> >> On Thu, May 30, 2019 at 1:49 PM Sanjay Govindjee via petsc-users >> > wrote: >> >> Lawrence, >> Thanks for taking a look!? This is what I had been wondering >> about -- my >> knowledge of MPI is pretty minimal and >> this origins of the routine were from a programmer we hired a >> decade+ >> back from NERSC.? I'll have to look into >> VecScatter.? It will be great to dispense with our roll-your-own >> routines (we even have our own reduceALL scattered around the >> code). >> >> Petsc VecScatter has a very simple interface and you definitely >> should go with.? With VecScatter, you can think in familiar >> vectors and indices instead of the low level MPI_Send/Recv. >> Besides that, PETSc has optimized VecScatter so that >> communication is efficient. >> >> >> Interestingly, the MPI_WaitALL has solved the problem when >> using OpenMPI >> but it still persists with MPICH.? Graphs attached. >> I'm going to run with openmpi for now (but I guess I really >> still need >> to figure out what is wrong with MPICH and WaitALL; >> I'll try Barry's suggestion of >> --download-mpich-configure-arguments="--enable-error-messages=all >> >> --enable-g" later today and report back). >> >> Regarding MPI_Barrier, it was put in due a problem that some >> processes >> were finishing up sending and receiving and exiting the >> subroutine >> before the receiving processes had completed (which resulted >> in data >> loss as the buffers are freed after the call to the routine). >> MPI_Barrier was the solution proposed >> to us.? I don't think I can dispense with it, but will think >> about some >> more. >> >> After MPI_Send(), or after MPI_Isend(..,req) and MPI_Wait(req), >> you can safely free the send buffer without worry that the >> receive has not completed. MPI guarantees the receiver can get >> the data, for example, through internal buffering. >> >> >> I'm not so sure about using MPI_IRecv as it will require a >> bit of >> rewriting since right now I process the received >> data sequentially after each blocking MPI_Recv -- clearly >> slower but >> easier to code. >> >> Thanks again for the help. >> >> -sanjay >> >> On 5/30/19 4:48 AM, Lawrence Mitchell wrote: >> > Hi Sanjay, >> > >> >> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users >> > wrote: >> >> >> >> The problem seems to persist but with a different >> signature.? Graphs attached as before. 
>> >> >> >> Totals with MPICH (NB: single run) >> >> >> >> For the CG/Jacobi data_exchange_total = 41,385,984; >> kspsolve_total = 38,289,408 >> >> For the GMRES/BJACOBI data_exchange_total = 41,324,544; >> kspsolve_total = 41,324,544 >> >> >> >> Just reading the MPI docs I am wondering if I need some >> sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the >> data exchange routine? >> >> I would have thought that with the blocking receives and >> the MPI_Barrier that everything will have fully completed and >> cleaned up before >> >> all processes exited the routine, but perhaps I am wrong >> on that. >> > >> > Skimming the fortran code you sent you do: >> > >> > for i in ...: >> >? ? ?call MPI_Isend(..., req, ierr) >> > >> > for i in ...: >> >? ? ?call MPI_Recv(..., ierr) >> > >> > But you never call MPI_Wait on the request you got back >> from the Isend. So the MPI library will never free the data >> structures it created. >> > >> > The usual pattern for these non-blocking communications is >> to allocate an array for the requests of length nsend+nrecv >> and then do: >> > >> > for i in nsend: >> >? ? ?call MPI_Isend(..., req[i], ierr) >> > for j in nrecv: >> >? ? ?call MPI_Irecv(..., req[nsend+j], ierr) >> > >> > call MPI_Waitall(req, ..., ierr) >> > >> > I note also there's no need for the Barrier at the end of >> the routine, this kind of communication does neighbourwise >> synchronisation, no need to add (unnecessary) global >> synchronisation too. >> > >> > As an aside, is there a reason you don't use PETSc's >> VecScatter to manage this global to local exchange? >> > >> > Cheers, >> > >> > Lawrence >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Fri May 31 14:02:27 2019 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Fri, 31 May 2019 22:02:27 +0300 Subject: [petsc-users] Memory growth issue In-Reply-To: <5faac7eb-706c-139b-7fe9-2476ea0990b2@berkeley.edu> References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> <5faac7eb-706c-139b-7fe9-2476ea0990b2@berkeley.edu> Message-ID: > On May 31, 2019, at 9:50 PM, Sanjay Govindjee via petsc-users wrote: > > Matt, > Here is the process as it currently stands: > > 1) I have a PETSc Vec (sol), which come from a KSPSolve > > 2) Each processor grabs its section of sol via VecGetOwnershipRange and VecGetArrayReadF90 > and inserts parts of its section of sol in a local array (locarr) using a complex but easily computable mapping. > > 3) The routine you are looking at then exchanges various parts of the locarr between the processors. > You need a VecScatter object https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate > 4) Each processor then does computations using its updated locarr. > > Typing it out this way, I guess the answer to your question is "yes." I have a global Vec and I want its values > sent in a complex but computable way to local vectors on each process. 
> > -sanjay > On 5/31/19 3:37 AM, Matthew Knepley wrote: >> On Thu, May 30, 2019 at 11:55 PM Sanjay Govindjee via petsc-users > wrote: >> Hi Juanchao, >> Thanks for the hints below, they will take some time to absorb as the vectors that are being moved around >> are actually partly petsc vectors and partly local process vectors. >> >> Is this code just doing a global-to-local map? Meaning, does it just map all the local unknowns to some global >> unknown on some process? We have an even simpler interface for that, where we make the VecScatter >> automatically, >> >> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate >> >> Then you can use it with Vecs, Mats, etc. >> >> Thanks, >> >> Matt >> >> Attached is the modified routine that now works (on leaking memory) with openmpi. >> >> -sanjay >> On 5/30/19 8:41 PM, Zhang, Junchao wrote: >>> >>> Hi, Sanjay, >>> Could you send your modified data exchange code (psetb.F) with MPI_Waitall? See other inlined comments below. Thanks. >>> >>> On Thu, May 30, 2019 at 1:49 PM Sanjay Govindjee via petsc-users > wrote: >>> Lawrence, >>> Thanks for taking a look! This is what I had been wondering about -- my >>> knowledge of MPI is pretty minimal and >>> this origins of the routine were from a programmer we hired a decade+ >>> back from NERSC. I'll have to look into >>> VecScatter. It will be great to dispense with our roll-your-own >>> routines (we even have our own reduceALL scattered around the code). >>> Petsc VecScatter has a very simple interface and you definitely should go with. With VecScatter, you can think in familiar vectors and indices instead of the low level MPI_Send/Recv. Besides that, PETSc has optimized VecScatter so that communication is efficient. >>> >>> Interestingly, the MPI_WaitALL has solved the problem when using OpenMPI >>> but it still persists with MPICH. Graphs attached. >>> I'm going to run with openmpi for now (but I guess I really still need >>> to figure out what is wrong with MPICH and WaitALL; >>> I'll try Barry's suggestion of >>> --download-mpich-configure-arguments="--enable-error-messages=all >>> --enable-g" later today and report back). >>> >>> Regarding MPI_Barrier, it was put in due a problem that some processes >>> were finishing up sending and receiving and exiting the subroutine >>> before the receiving processes had completed (which resulted in data >>> loss as the buffers are freed after the call to the routine). >>> MPI_Barrier was the solution proposed >>> to us. I don't think I can dispense with it, but will think about some >>> more. >>> After MPI_Send(), or after MPI_Isend(..,req) and MPI_Wait(req), you can safely free the send buffer without worry that the receive has not completed. MPI guarantees the receiver can get the data, for example, through internal buffering. >>> >>> I'm not so sure about using MPI_IRecv as it will require a bit of >>> rewriting since right now I process the received >>> data sequentially after each blocking MPI_Recv -- clearly slower but >>> easier to code. >>> >>> Thanks again for the help. >>> >>> -sanjay >>> >>> On 5/30/19 4:48 AM, Lawrence Mitchell wrote: >>> > Hi Sanjay, >>> > >>> >> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users > wrote: >>> >> >>> >> The problem seems to persist but with a different signature. Graphs attached as before. 
>>> >> >>> >> Totals with MPICH (NB: single run) >>> >> >>> >> For the CG/Jacobi data_exchange_total = 41,385,984; kspsolve_total = 38,289,408 >>> >> For the GMRES/BJACOBI data_exchange_total = 41,324,544; kspsolve_total = 41,324,544 >>> >> >>> >> Just reading the MPI docs I am wondering if I need some sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine? >>> >> I would have thought that with the blocking receives and the MPI_Barrier that everything will have fully completed and cleaned up before >>> >> all processes exited the routine, but perhaps I am wrong on that. >>> > >>> > Skimming the fortran code you sent you do: >>> > >>> > for i in ...: >>> > call MPI_Isend(..., req, ierr) >>> > >>> > for i in ...: >>> > call MPI_Recv(..., ierr) >>> > >>> > But you never call MPI_Wait on the request you got back from the Isend. So the MPI library will never free the data structures it created. >>> > >>> > The usual pattern for these non-blocking communications is to allocate an array for the requests of length nsend+nrecv and then do: >>> > >>> > for i in nsend: >>> > call MPI_Isend(..., req[i], ierr) >>> > for j in nrecv: >>> > call MPI_Irecv(..., req[nsend+j], ierr) >>> > >>> > call MPI_Waitall(req, ..., ierr) >>> > >>> > I note also there's no need for the Barrier at the end of the routine, this kind of communication does neighbourwise synchronisation, no need to add (unnecessary) global synchronisation too. >>> > >>> > As an aside, is there a reason you don't use PETSc's VecScatter to manage this global to local exchange? >>> > >>> > Cheers, >>> > >>> > Lawrence >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xiaoma5 at illinois.edu Fri May 31 14:21:29 2019 From: xiaoma5 at illinois.edu (Ma, Xiao) Date: Fri, 31 May 2019 19:21:29 +0000 Subject: [petsc-users] Configuration process of Petsc hanging In-Reply-To: References: , Message-ID: Hi Satish, I have added these configure options --batch=1 --known-64-bit-blas-indices=0 -known-mpi-shared-libraries=0 It is still hanging Best, Xiao ________________________________ From: Balay, Satish Sent: Friday, May 31, 2019 12:35 To: Ma, Xiao Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Configuration process of Petsc hanging PETSc configure is attempting to run some MPI binaries - and that is hanging with this MPI. You can retry with the options: --batch=1 --known-64-bit-blas-indices=0 -known-mpi-shared-libraries=0 [and follow instructions provided by configure] Satish On Fri, 31 May 2019, Ma, Xiao via petsc-users wrote: > Hi , > > I am trying to install Pylith which is a earthquake simulator using Petsc library, I am building it in PSC bridge cluster, during the steps of building Petsc, the configuration hanging at > > TESTING: configureMPITypes from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:247) > > > I am not sure if this has to with the configuration setup of the mpi version I am using. > > Any help would be deeply appreciated. > > I am attaching the configure options here: > > > Saving to: ?petsc-pylith-2.2.1.tgz? > > 100%[===========================================================>] 10,415,016 37.3MB/s in 0.3s > > 2019-05-31 14:03:13 (37.3 MB/s) - ?petsc-pylith-2.2.1.tgz? 
saved [10415016/10415016] > > FINISHED --2019-05-31 14:03:13-- > Total wall clock time: 1.1s > Downloaded: 1 files, 9.9M in 0.3s (37.3 MB/s) > /usr/bin/tar -zxf petsc-pylith-2.2.1.tgz > cd petsc-pylith && \ > ./configure --prefix=/home/xm12345/pylith \ > --with-c2html=0 --with-x=0 \ > --with-clanguage=C \ > --with-mpicompilers=1 \ > --with-shared-libraries=1 --with-64-bit-points=1 --with-large-file-io=1 \ > --download-chaco=1 --download-ml=1 --download-f2cblaslapack=1 --with-hdf5=1 --with -debugging=0 --with-fc=0 CPPFLAGS="-I/home/xm12345/pylith/include -I/home/xm12345/pylith/include " L DFLAGS="-L/home/xm12345/pylith/lib -L/home/xm12345/pylith/lib64 -L/home/xm12345/pylith/lib -L/home/xm 12345/pylith/lib64 " CFLAGS="-g -O2" CXXFLAGS="-g -O2 -DMPICH_IGNORE_CXX_SEEK" FCFLAGS="" \ > PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith PETSC_ARCH=arch-pylith && \ > make -f gmakefile -j2 PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith PETSC_ARCH=arch-pylit h && \ > make PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith install && \ > make PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith test && \ > touch ../installed_petsc > =============================================================================== > Configuring PETSc to compile on your system > =============================================================================== > =============================================================================== ***** WARNING: MAKEFLAGS (set to w) found in environment variables - ignoring use ./configure MAKEFLAGS=$MAKEFLAGS if you really want to use that value ****** =============================================================================== =============================================================================== WARNING! Compiling PETSc with no debugging, this should only be done for timing and production runs. All development should be done when configured using --with-debugging=1 =============================================================================== TESTING: configureMPITypes from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:247) > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.zampini at gmail.com Fri May 31 14:22:32 2019 From: stefano.zampini at gmail.com (Stefano Zampini) Date: Fri, 31 May 2019 22:22:32 +0300 Subject: [petsc-users] Configuration process of Petsc hanging In-Reply-To: References: Message-ID: It should be ?with-batch=1 > On May 31, 2019, at 10:21 PM, Ma, Xiao via petsc-users wrote: > > Hi Satish, > > I have added these configure options > --batch=1 --known-64-bit-blas-indices=0 -known-mpi-shared-libraries=0 > > It is still hanging > > Best, > Xiao > From: Balay, Satish > Sent: Friday, May 31, 2019 12:35 > To: Ma, Xiao > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Configuration process of Petsc hanging > > PETSc configure is attempting to run some MPI binaries - and that is hanging with this MPI. 
> > You can retry with the options: > > --batch=1 --known-64-bit-blas-indices=0 -known-mpi-shared-libraries=0 > > [and follow instructions provided by configure] > > Satish > > > > On Fri, 31 May 2019, Ma, Xiao via petsc-users wrote: > > > Hi , > > > > I am trying to install Pylith which is a earthquake simulator using Petsc library, I am building it in PSC bridge cluster, during the steps of building Petsc, the configuration hanging at > > > > TESTING: configureMPITypes from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:247) > > > > > > I am not sure if this has to with the configuration setup of the mpi version I am using. > > > > Any help would be deeply appreciated. > > > > I am attaching the configure options here: > > > > > > Saving to: ?petsc-pylith-2.2.1.tgz? > > > > 100%[===========================================================>] 10,415,016 37.3MB/s in 0.3s > > > > 2019-05-31 14:03:13 (37.3 MB/s) - ?petsc-pylith-2.2.1.tgz? saved [10415016/10415016] > > > > FINISHED --2019-05-31 14:03:13-- > > Total wall clock time: 1.1s > > Downloaded: 1 files, 9.9M in 0.3s (37.3 MB/s) > > /usr/bin/tar -zxf petsc-pylith-2.2.1.tgz > > cd petsc-pylith && \ > > ./configure --prefix=/home/xm12345/pylith \ > > --with-c2html=0 --with-x=0 \ > > --with-clanguage=C \ > > --with-mpicompilers=1 \ > > --with-shared-libraries=1 --with-64-bit-points=1 --with-large-file-io=1 \ > > --download-chaco=1 --download-ml=1 --download-f2cblaslapack=1 --with-hdf5=1 --with -debugging=0 --with-fc=0 CPPFLAGS="-I/home/xm12345/pylith/include -I/home/xm12345/pylith/include " L DFLAGS="-L/home/xm12345/pylith/lib -L/home/xm12345/pylith/lib64 -L/home/xm12345/pylith/lib -L/home/xm 12345/pylith/lib64 " CFLAGS="-g -O2" CXXFLAGS="-g -O2 -DMPICH_IGNORE_CXX_SEEK" FCFLAGS="" \ > > PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith PETSC_ARCH=arch-pylith && \ > > make -f gmakefile -j2 PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith PETSC_ARCH=arch-pylit h && \ > > make PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith install && \ > > make PETSC_DIR=/home/xm12345/build/pylith/petsc-pylith test && \ > > touch ../installed_petsc > > =============================================================================== > > Configuring PETSc to compile on your system > > =============================================================================== > > =============================================================================== ***** WARNING: MAKEFLAGS (set to w) found in environment variables - ignoring use ./configure MAKEFLAGS=$MAKEFLAGS if you really want to use that value ****** =============================================================================== =============================================================================== WARNING! Compiling PETSc with no debugging, this should only be done for timing and production runs. All development should be done when configured using --with-debugging=1 =============================================================================== TESTING: configureMPITypes from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:247) > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jczhang at mcs.anl.gov Fri May 31 14:53:39 2019 From: jczhang at mcs.anl.gov (Zhang, Junchao) Date: Fri, 31 May 2019 19:53:39 +0000 Subject: [petsc-users] Memory growth issue In-Reply-To: <0a0555fc86314d02ac23b06bee61fc3f@BYAPR09MB3063.namprd09.prod.outlook.com> References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> <0a0555fc86314d02ac23b06bee61fc3f@BYAPR09MB3063.namprd09.prod.outlook.com> Message-ID: Sanjay, I tried petsc with MPICH and OpenMPI on my Macbook. I inserted PetscMemoryGetCurrentUsage/PetscMallocGetCurrentUsage at the beginning and end of KSPSolve and then computed the delta and summed over processes. Then I tested with src/ts/examples/tutorials/advection-diffusion-reaction/ex5.c With OpenMPI, mpirun -n 4 ./ex5 -da_grid_x 128 -da_grid_y 128 -ts_type beuler -ts_max_steps 500 > 128.log grep -n -v "RSS Delta= 0, Malloc Delta= 0" 128.log 1:RSS Delta= 69632, Malloc Delta= 0 2:RSS Delta= 69632, Malloc Delta= 0 3:RSS Delta= 69632, Malloc Delta= 0 4:RSS Delta= 69632, Malloc Delta= 0 9:RSS Delta=9.25286e+06, Malloc Delta= 0 22:RSS Delta= 49152, Malloc Delta= 0 44:RSS Delta= 20480, Malloc Delta= 0 53:RSS Delta= 49152, Malloc Delta= 0 66:RSS Delta= 4096, Malloc Delta= 0 97:RSS Delta= 16384, Malloc Delta= 0 119:RSS Delta= 20480, Malloc Delta= 0 141:RSS Delta= 53248, Malloc Delta= 0 176:RSS Delta= 16384, Malloc Delta= 0 308:RSS Delta= 16384, Malloc Delta= 0 352:RSS Delta= 16384, Malloc Delta= 0 550:RSS Delta= 16384, Malloc Delta= 0 572:RSS Delta= 16384, Malloc Delta= 0 669:RSS Delta= 40960, Malloc Delta= 0 924:RSS Delta= 32768, Malloc Delta= 0 1694:RSS Delta= 20480, Malloc Delta= 0 2099:RSS Delta= 16384, Malloc Delta= 0 2244:RSS Delta= 20480, Malloc Delta= 0 3001:RSS Delta= 16384, Malloc Delta= 0 5883:RSS Delta= 16384, Malloc Delta= 0 If I increased the grid mpirun -n 4 ./ex5 -da_grid_x 512 -da_grid_y 512 -ts_type beuler -ts_max_steps 500 -malloc_test >512.log grep -n -v "RSS Delta= 0, Malloc Delta= 0" 512.log 1:RSS Delta=1.05267e+06, Malloc Delta= 0 2:RSS Delta=1.05267e+06, Malloc Delta= 0 3:RSS Delta=1.05267e+06, Malloc Delta= 0 4:RSS Delta=1.05267e+06, Malloc Delta= 0 13:RSS Delta=1.24932e+08, Malloc Delta= 0 So we did see RSS increase in 4k-page sizes after KSPSolve. As long as no memory leaks, why do you care about it? Is it because you run out of memory? On Thu, May 30, 2019 at 1:59 PM Smith, Barry F. > wrote: Thanks for the update. So the current conclusions are that using the Waitall in your code 1) solves the memory issue with OpenMPI in your code 2) does not solve the memory issue with PETSc KSPSolve 3) MPICH has memory issues both for your code and PETSc KSPSolve (despite) the wait all fix? If you literately just comment out the call to KSPSolve() with OpenMPI is there no growth in memory usage? Both 2 and 3 are concerning, indicate possible memory leak bugs in MPICH and not freeing all MPI resources in KSPSolve() Junchao, can you please investigate 2 and 3 with, for example, a TS example that uses the linear solver (like with -ts_type beuler)? Thanks Barry > On May 30, 2019, at 1:47 PM, Sanjay Govindjee > wrote: > > Lawrence, > Thanks for taking a look! 
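For anyone wanting to reproduce the numbers above, a rough sketch of the instrumentation Junchao describes, written here as a wrapper around KSPSolve rather than as edits inside it. Summing the per-rank deltas with MPIU_Allreduce is an assumption about how the totals were formed, and the wrapper name is made up.

  #include <petscksp.h>

  /* Hypothetical wrapper: measure resident-set and PETSc-malloc growth caused
     by one KSPSolve and report the sum over all ranks.                        */
  static PetscErrorCode KSPSolveMeasured(KSP ksp, Vec b, Vec x)
  {
    PetscLogDouble rss0, rss1, mal0, mal1, delta[2], total[2];
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = PetscMemoryGetCurrentUsage(&rss0);CHKERRQ(ierr);
    ierr = PetscMallocGetCurrentUsage(&mal0);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
    ierr = PetscMemoryGetCurrentUsage(&rss1);CHKERRQ(ierr);
    ierr = PetscMallocGetCurrentUsage(&mal1);CHKERRQ(ierr);
    delta[0] = rss1 - rss0;
    delta[1] = mal1 - mal0;
    ierr = MPIU_Allreduce(delta, total, 2, MPIU_PETSCLOGDOUBLE, MPI_SUM, PetscObjectComm((PetscObject)ksp));CHKERRQ(ierr);
    ierr = PetscPrintf(PetscObjectComm((PetscObject)ksp), "RSS Delta=%g, Malloc Delta=%g\n", (double)total[0], (double)total[1]);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }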
This is what I had been wondering about -- my knowledge of MPI is pretty minimal and > this origins of the routine were from a programmer we hired a decade+ back from NERSC. I'll have to look into > VecScatter. It will be great to dispense with our roll-your-own routines (we even have our own reduceALL scattered around the code). > > Interestingly, the MPI_WaitALL has solved the problem when using OpenMPI but it still persists with MPICH. Graphs attached. > I'm going to run with openmpi for now (but I guess I really still need to figure out what is wrong with MPICH and WaitALL; > I'll try Barry's suggestion of --download-mpich-configure-arguments="--enable-error-messages=all --enable-g" later today and report back). > > Regarding MPI_Barrier, it was put in due a problem that some processes were finishing up sending and receiving and exiting the subroutine > before the receiving processes had completed (which resulted in data loss as the buffers are freed after the call to the routine). MPI_Barrier was the solution proposed > to us. I don't think I can dispense with it, but will think about some more. > > I'm not so sure about using MPI_IRecv as it will require a bit of rewriting since right now I process the received > data sequentially after each blocking MPI_Recv -- clearly slower but easier to code. > > Thanks again for the help. > > -sanjay > > On 5/30/19 4:48 AM, Lawrence Mitchell wrote: >> Hi Sanjay, >> >>> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users > wrote: >>> >>> The problem seems to persist but with a different signature. Graphs attached as before. >>> >>> Totals with MPICH (NB: single run) >>> >>> For the CG/Jacobi data_exchange_total = 41,385,984; kspsolve_total = 38,289,408 >>> For the GMRES/BJACOBI data_exchange_total = 41,324,544; kspsolve_total = 41,324,544 >>> >>> Just reading the MPI docs I am wondering if I need some sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine? >>> I would have thought that with the blocking receives and the MPI_Barrier that everything will have fully completed and cleaned up before >>> all processes exited the routine, but perhaps I am wrong on that. >> >> Skimming the fortran code you sent you do: >> >> for i in ...: >> call MPI_Isend(..., req, ierr) >> >> for i in ...: >> call MPI_Recv(..., ierr) >> >> But you never call MPI_Wait on the request you got back from the Isend. So the MPI library will never free the data structures it created. >> >> The usual pattern for these non-blocking communications is to allocate an array for the requests of length nsend+nrecv and then do: >> >> for i in nsend: >> call MPI_Isend(..., req[i], ierr) >> for j in nrecv: >> call MPI_Irecv(..., req[nsend+j], ierr) >> >> call MPI_Waitall(req, ..., ierr) >> >> I note also there's no need for the Barrier at the end of the routine, this kind of communication does neighbourwise synchronisation, no need to add (unnecessary) global synchronisation too. >> >> As an aside, is there a reason you don't use PETSc's VecScatter to manage this global to local exchange? >> >> Cheers, >> >> Lawrence > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From s_g at berkeley.edu Fri May 31 15:10:00 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Fri, 31 May 2019 13:10:00 -0700 Subject: [petsc-users] Memory growth issue In-Reply-To: References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> <0a0555fc86314d02ac23b06bee61fc3f@BYAPR09MB3063.namprd09.prod.outlook.com> Message-ID: <979b9cd8-7d22-98c7-b8c7-48d1515531eb@berkeley.edu> Yes, the issue is running out of memory on long runs. Perhaps some clean-up happens latter when the memory pressure builds but that is a bit non-ideal. -sanjay On 5/31/19 12:53 PM, Zhang, Junchao wrote: > Sanjay, > I tried petsc with MPICH and OpenMPI on my Macbook. I > inserted?PetscMemoryGetCurrentUsage/PetscMallocGetCurrentUsage at the > beginning and end of KSPSolve and then computed the delta and summed > over processes. Then I tested > with?src/ts/examples/tutorials/advection-diffusion-reaction/ex5.c > With OpenMPI, > mpirun -n 4 ./ex5 -da_grid_x 128 -da_grid_y 128 -ts_type beuler > -ts_max_steps 500 > 128.log > grep -n -v "RSS Delta= ? ? ? ? 0, Malloc Delta= 0" 128.log > 1:RSS Delta= ? ? 69632, Malloc Delta= ? ? ? ? 0 > 2:RSS Delta= ? ? 69632, Malloc Delta= ? ? ? ? 0 > 3:RSS Delta= ? ? 69632, Malloc Delta= ? ? ? ? 0 > 4:RSS Delta= ? ? 69632, Malloc Delta= ? ? ? ? 0 > 9:RSS Delta=9.25286e+06, Malloc Delta= ? ? ? ? 0 > 22:RSS Delta= ? ? 49152, Malloc Delta= ? ? ? ? 0 > 44:RSS Delta= ? ? 20480, Malloc Delta= ? ? ? ? 0 > 53:RSS Delta= ? ? 49152, Malloc Delta= ? ? ? ? 0 > 66:RSS Delta= ? ? ?4096, Malloc Delta= ? ? ? ? 0 > 97:RSS Delta= ? ? 16384, Malloc Delta= ? ? ? ? 0 > 119:RSS Delta= ? ? 20480, Malloc Delta= ? ? ? ? 0 > 141:RSS Delta= ? ? 53248, Malloc Delta= ? ? ? ? 0 > 176:RSS Delta= ? ? 16384, Malloc Delta= ? ? ? ? 0 > 308:RSS Delta= ? ? 16384, Malloc Delta= ? ? ? ? 0 > 352:RSS Delta= ? ? 16384, Malloc Delta= ? ? ? ? 0 > 550:RSS Delta= ? ? 16384, Malloc Delta= ? ? ? ? 0 > 572:RSS Delta= ? ? 16384, Malloc Delta= ? ? ? ? 0 > 669:RSS Delta= ? ? 40960, Malloc Delta= ? ? ? ? 0 > 924:RSS Delta= ? ? 32768, Malloc Delta= ? ? ? ? 0 > 1694:RSS Delta= ? ? 20480, Malloc Delta= ? ? ? ? 0 > 2099:RSS Delta= ? ? 16384, Malloc Delta= ? ? ? ? 0 > 2244:RSS Delta= ? ? 20480, Malloc Delta= ? ? ? ? 0 > 3001:RSS Delta= ? ? 16384, Malloc Delta= ? ? ? ? 0 > 5883:RSS Delta= ? ? 16384, Malloc Delta= ? ? ? ? 0 > > If I increased the grid > mpirun -n 4 ./ex5 -da_grid_x 512 -da_grid_y 512 -ts_type beuler > -ts_max_steps 500 -malloc_test >512.log > grep -n -v "RSS Delta= ? ? ? ? 0, Malloc Delta= 0" 512.log > 1:RSS Delta=1.05267e+06, Malloc Delta= ? ? ? ? 0 > 2:RSS Delta=1.05267e+06, Malloc Delta= ? ? ? ? 0 > 3:RSS Delta=1.05267e+06, Malloc Delta= ? ? ? ? 0 > 4:RSS Delta=1.05267e+06, Malloc Delta= ? ? ? ? 0 > 13:RSS Delta=1.24932e+08, Malloc Delta= ? ? ? ? 0 > > So we did see RSS increase in 4k-page sizes after KSPSolve. As long as > no memory leaks, why do you care about it? Is it because you run out > of memory? > > On Thu, May 30, 2019 at 1:59 PM Smith, Barry F. > wrote: > > > ? ?Thanks for the update. 
So the current conclusions are that > using the Waitall in your code > > 1) solves the memory issue with OpenMPI in your code > > 2) does not solve the memory issue with PETSc KSPSolve > > 3) MPICH has memory issues both for your code and PETSc KSPSolve > (despite) the wait all fix? > > If you literately just comment out the call to KSPSolve() with > OpenMPI is there no growth in memory usage? > > > Both 2 and 3 are concerning, indicate possible memory leak bugs in > MPICH and not freeing all MPI resources in KSPSolve() > > Junchao, can you please investigate 2 and 3 with, for example, a > TS example that uses the linear solver (like with -ts_type > beuler)? Thanks > > > ? Barry > > > > > On May 30, 2019, at 1:47 PM, Sanjay Govindjee > wrote: > > > > Lawrence, > > Thanks for taking a look!? This is what I had been wondering > about -- my knowledge of MPI is pretty minimal and > > this origins of the routine were from a programmer we hired a > decade+ back from NERSC.? I'll have to look into > > VecScatter.? It will be great to dispense with our roll-your-own > routines (we even have our own reduceALL scattered around the code). > > > > Interestingly, the MPI_WaitALL has solved the problem when using > OpenMPI but it still persists with MPICH.? Graphs attached. > > I'm going to run with openmpi for now (but I guess I really > still need to figure out what is wrong with MPICH and WaitALL; > > I'll try Barry's suggestion of > --download-mpich-configure-arguments="--enable-error-messages=all > --enable-g" later today and report back). > > > > Regarding MPI_Barrier, it was put in due a problem that some > processes were finishing up sending and receiving and exiting the > subroutine > > before the receiving processes had completed (which resulted in > data loss as the buffers are freed after the call to the routine). > MPI_Barrier was the solution proposed > > to us.? I don't think I can dispense with it, but will think > about some more. > > > > I'm not so sure about using MPI_IRecv as it will require a bit > of rewriting since right now I process the received > > data sequentially after each blocking MPI_Recv -- clearly slower > but easier to code. > > > > Thanks again for the help. > > > > -sanjay > > > > On 5/30/19 4:48 AM, Lawrence Mitchell wrote: > >> Hi Sanjay, > >> > >>> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users > > wrote: > >>> > >>> The problem seems to persist but with a different signature.? > Graphs attached as before. > >>> > >>> Totals with MPICH (NB: single run) > >>> > >>> For the CG/Jacobi? ? ? ? ? data_exchange_total = 41,385,984; > kspsolve_total = 38,289,408 > >>> For the GMRES/BJACOBI? ? ? data_exchange_total = 41,324,544; > kspsolve_total = 41,324,544 > >>> > >>> Just reading the MPI docs I am wondering if I need some sort > of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange > routine? > >>> I would have thought that with the blocking receives and the > MPI_Barrier that everything will have fully completed and cleaned > up before > >>> all processes exited the routine, but perhaps I am wrong on that. > >> > >> Skimming the fortran code you sent you do: > >> > >> for i in ...: > >>? ? call MPI_Isend(..., req, ierr) > >> > >> for i in ...: > >>? ? call MPI_Recv(..., ierr) > >> > >> But you never call MPI_Wait on the request you got back from > the Isend. So the MPI library will never free the data structures > it created. 
> >> > >> The usual pattern for these non-blocking communications is to > allocate an array for the requests of length nsend+nrecv and then do: > >> > >> for i in nsend: > >>? ? call MPI_Isend(..., req[i], ierr) > >> for j in nrecv: > >>? ? call MPI_Irecv(..., req[nsend+j], ierr) > >> > >> call MPI_Waitall(req, ..., ierr) > >> > >> I note also there's no need for the Barrier at the end of the > routine, this kind of communication does neighbourwise > synchronisation, no need to add (unnecessary) global > synchronisation too. > >> > >> As an aside, is there a reason you don't use PETSc's VecScatter > to manage this global to local exchange? > >> > >> Cheers, > >> > >> Lawrence > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s_g at berkeley.edu Fri May 31 15:46:55 2019 From: s_g at berkeley.edu (Sanjay Govindjee) Date: Fri, 31 May 2019 13:46:55 -0700 Subject: [petsc-users] Memory growth issue In-Reply-To: References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> <5faac7eb-706c-139b-7fe9-2476ea0990b2@berkeley.edu> Message-ID: <0fd530bb-65c9-6cbe-6ab1-d15398129719@berkeley.edu> Thanks Stefano. Reading the manual pages a bit more carefully, I think I can see what I should be doing.? Which should be roughly to 1. Set up target Seq vectors on PETSC_COMM_SELF 2. Use ISCreateGeneral to create ISs for the target Vecs? and the source Vec which will be MPI on PETSC_COMM_WORLD. 3. Create the scatter context with VecScatterCreate 4. Call VecScatterBegin/End on each process (instead of using my prior routine). Lingering questions: a. Is there any performance advantage/disadvantage to creating a single parallel target Vec instead of multiple target Seq Vecs (in terms of the scatter operation)? b. The data that ends up in the target on each processor needs to be in an application array.? Is there a clever way to 'move' the data from the scatter target to the array (short of just running a loop over it and copying)? -sanjay On 5/31/19 12:02 PM, Stefano Zampini wrote: > > >> On May 31, 2019, at 9:50 PM, Sanjay Govindjee via petsc-users >> > wrote: >> >> Matt, >> ? Here is the process as it currently stands: >> >> 1) I have a PETSc Vec (sol), which come from a KSPSolve >> >> 2) Each processor grabs its section of sol via VecGetOwnershipRange >> and VecGetArrayReadF90 >> and inserts parts of its section of sol in a local array (locarr) >> using a complex but easily computable mapping. >> >> 3) The routine you are looking at then exchanges various parts of the >> locarr between the processors. >> > > You need a VecScatter object > https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate > > >> 4) Each processor then does computations using its updated locarr. >> >> Typing it out this way, I guess the answer to your question is >> "yes."? I have a global Vec and I want its values >> sent in a complex but computable way to local vectors on each process. 
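A compact sketch of steps 1-4 above, in C (the actual application is Fortran, and every name here is a placeholder): needed[] holds the globally numbered entries of the parallel vector sol that this rank wants, and the scatter lands them, in that order, in a new sequential vector on the rank. Since the source is an MPI vector its indices are global, while the indices into the Seq target are local.

  #include <petscvec.h>

  /* Sketch of the outline above: pull the globally numbered entries listed in
     needed[] (length nloc) out of the parallel vector sol into a sequential
     vector *loc on this rank.  Names are hypothetical.                        */
  PetscErrorCode GatherNeeded(Vec sol, PetscInt nloc, const PetscInt needed[], Vec *loc, VecScatter *scat)
  {
    IS             is_from, is_to;
    PetscInt       i, *toidx;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    /* 1. sequential target vector, one per process */
    ierr = VecCreateSeq(PETSC_COMM_SELF, nloc, loc);CHKERRQ(ierr);
    /* 2. index sets: global indices into sol, local indices 0..nloc-1 into loc */
    ierr = ISCreateGeneral(PETSC_COMM_SELF, nloc, needed, PETSC_COPY_VALUES, &is_from);CHKERRQ(ierr);
    ierr = PetscMalloc1(nloc, &toidx);CHKERRQ(ierr);
    for (i = 0; i < nloc; i++) toidx[i] = i;
    ierr = ISCreateGeneral(PETSC_COMM_SELF, nloc, toidx, PETSC_OWN_POINTER, &is_to);CHKERRQ(ierr);
    /* 3. scatter context; every rank calls this with its own local index sets */
    ierr = VecScatterCreate(sol, is_from, *loc, is_to, scat);CHKERRQ(ierr);
    ierr = ISDestroy(&is_from);CHKERRQ(ierr);
    ierr = ISDestroy(&is_to);CHKERRQ(ierr);
    /* 4. the communication itself; the context is reusable whenever sol changes */
    ierr = VecScatterBegin(*scat, sol, *loc, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
    ierr = VecScatterEnd(*scat, sol, *loc, INSERT_VALUES, SCATTER_FORWARD);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }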
>> >> -sanjay >> On 5/31/19 3:37 AM, Matthew Knepley wrote: >>> On Thu, May 30, 2019 at 11:55 PM Sanjay Govindjee via petsc-users >>> > wrote: >>> >>> Hi Juanchao, >>> Thanks for the hints below, they will take some time to absorb >>> as the vectors that are being moved around >>> are actually partly petsc vectors and partly local process vectors. >>> >>> >>> Is this code just doing a global-to-local map? Meaning, does it just >>> map all the local unknowns to some global >>> unknown on some process? We have an even simpler interface for that, >>> where we make the VecScatter >>> automatically, >>> >>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate >>> >>> Then you can use it with Vecs, Mats, etc. >>> >>> ? Thanks, >>> >>> ? ? ?Matt >>> >>> Attached is the modified routine that now works (on leaking >>> memory) with openmpi. >>> >>> -sanjay >>> On 5/30/19 8:41 PM, Zhang, Junchao wrote: >>>> >>>> Hi, Sanjay, >>>> ? Could you send your modified data exchange code (psetb.F) >>>> with MPI_Waitall? See other inlined comments below. Thanks. >>>> >>>> On Thu, May 30, 2019 at 1:49 PM Sanjay Govindjee via >>>> petsc-users >>> > wrote: >>>> >>>> Lawrence, >>>> Thanks for taking a look!? This is what I had been >>>> wondering about -- my >>>> knowledge of MPI is pretty minimal and >>>> this origins of the routine were from a programmer we hired >>>> a decade+ >>>> back from NERSC.? I'll have to look into >>>> VecScatter.? It will be great to dispense with our >>>> roll-your-own >>>> routines (we even have our own reduceALL scattered around >>>> the code). >>>> >>>> Petsc VecScatter has a very simple interface and you definitely >>>> should go with.? With VecScatter, you can think in familiar >>>> vectors and indices instead of the low level MPI_Send/Recv. >>>> Besides that, PETSc has optimized VecScatter so that >>>> communication is efficient. >>>> >>>> >>>> Interestingly, the MPI_WaitALL has solved the problem when >>>> using OpenMPI >>>> but it still persists with MPICH. Graphs attached. >>>> I'm going to run with openmpi for now (but I guess I really >>>> still need >>>> to figure out what is wrong with MPICH and WaitALL; >>>> I'll try Barry's suggestion of >>>> --download-mpich-configure-arguments="--enable-error-messages=all >>>> >>>> --enable-g" later today and report back). >>>> >>>> Regarding MPI_Barrier, it was put in due a problem that >>>> some processes >>>> were finishing up sending and receiving and exiting the >>>> subroutine >>>> before the receiving processes had completed (which >>>> resulted in data >>>> loss as the buffers are freed after the call to the routine). >>>> MPI_Barrier was the solution proposed >>>> to us.? I don't think I can dispense with it, but will >>>> think about some >>>> more. >>>> >>>> After MPI_Send(), or after MPI_Isend(..,req) and MPI_Wait(req), >>>> you can safely free the send buffer without worry that the >>>> receive has not completed. MPI guarantees the receiver can get >>>> the data, for example, through internal buffering. >>>> >>>> >>>> I'm not so sure about using MPI_IRecv as it will require a >>>> bit of >>>> rewriting since right now I process the received >>>> data sequentially after each blocking MPI_Recv -- clearly >>>> slower but >>>> easier to code. >>>> >>>> Thanks again for the help. 
>>>> >>>> -sanjay >>>> >>>> On 5/30/19 4:48 AM, Lawrence Mitchell wrote: >>>> > Hi Sanjay, >>>> > >>>> >> On 30 May 2019, at 08:58, Sanjay Govindjee via >>>> petsc-users >>> > wrote: >>>> >> >>>> >> The problem seems to persist but with a different >>>> signature.? Graphs attached as before. >>>> >> >>>> >> Totals with MPICH (NB: single run) >>>> >> >>>> >> For the CG/Jacobi data_exchange_total = 41,385,984; >>>> kspsolve_total = 38,289,408 >>>> >> For the GMRES/BJACOBI data_exchange_total = 41,324,544; >>>> kspsolve_total = 41,324,544 >>>> >> >>>> >> Just reading the MPI docs I am wondering if I need some >>>> sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the >>>> data exchange routine? >>>> >> I would have thought that with the blocking receives and >>>> the MPI_Barrier that everything will have fully completed >>>> and cleaned up before >>>> >> all processes exited the routine, but perhaps I am wrong >>>> on that. >>>> > >>>> > Skimming the fortran code you sent you do: >>>> > >>>> > for i in ...: >>>> >? ? ?call MPI_Isend(..., req, ierr) >>>> > >>>> > for i in ...: >>>> >? ? ?call MPI_Recv(..., ierr) >>>> > >>>> > But you never call MPI_Wait on the request you got back >>>> from the Isend. So the MPI library will never free the data >>>> structures it created. >>>> > >>>> > The usual pattern for these non-blocking communications >>>> is to allocate an array for the requests of length >>>> nsend+nrecv and then do: >>>> > >>>> > for i in nsend: >>>> >? ? ?call MPI_Isend(..., req[i], ierr) >>>> > for j in nrecv: >>>> >? ? ?call MPI_Irecv(..., req[nsend+j], ierr) >>>> > >>>> > call MPI_Waitall(req, ..., ierr) >>>> > >>>> > I note also there's no need for the Barrier at the end of >>>> the routine, this kind of communication does neighbourwise >>>> synchronisation, no need to add (unnecessary) global >>>> synchronisation too. >>>> > >>>> > As an aside, is there a reason you don't use PETSc's >>>> VecScatter to manage this global to local exchange? >>>> > >>>> > Cheers, >>>> > >>>> > Lawrence >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which >>> their experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jczhang at mcs.anl.gov Fri May 31 16:17:34 2019 From: jczhang at mcs.anl.gov (Zhang, Junchao) Date: Fri, 31 May 2019 21:17:34 +0000 Subject: [petsc-users] Memory growth issue In-Reply-To: <0fd530bb-65c9-6cbe-6ab1-d15398129719@berkeley.edu> References: <93669f36-2cfd-6772-abf9-3636e8d09fc3@berkeley.edu> <2076CEA0-CE15-486E-B001-EDE7D86DACA8@anl.gov> <0EC809E5-9DE2-4470-BF58-F8EBDECF3ACD@mcs.anl.gov> <9f80a732-c2a8-1bab-8b2e-591e3f3d65ba@berkeley.edu> <52F9D2F5-7EA4-4225-928B-A2C02DC47DE3@gmx.li> <026b416b-ea74-c73a-285f-484a82c806f2@berkeley.edu> <5faac7eb-706c-139b-7fe9-2476ea0990b2@berkeley.edu> <0fd530bb-65c9-6cbe-6ab1-d15398129719@berkeley.edu> Message-ID: On Fri, May 31, 2019 at 3:48 PM Sanjay Govindjee via petsc-users > wrote: Thanks Stefano. Reading the manual pages a bit more carefully, I think I can see what I should be doing. Which should be roughly to 1. Set up target Seq vectors on PETSC_COMM_SELF 2. Use ISCreateGeneral to create ISs for the target Vecs and the source Vec which will be MPI on PETSC_COMM_WORLD. 3. Create the scatter context with VecScatterCreate 4. 
Call VecScatterBegin/End on each process (instead of using my prior routine). Lingering questions: a. Is there any performance advantage/disadvantage to creating a single parallel target Vec instead of multiple target Seq Vecs (in terms of the scatter operation)? No performance difference. But pay attention, if you use seq vec, the indices in IS are locally numbered; if you use MPI vec, the indices are globally numbered. b. The data that ends up in the target on each processor needs to be in an application array. Is there a clever way to 'move' the data from the scatter target to the array (short of just running a loop over it and copying)? See VecGetArray, VecGetArrayRead etc, which pull the data out of Vecs without memory copying. -sanjay On 5/31/19 12:02 PM, Stefano Zampini wrote: On May 31, 2019, at 9:50 PM, Sanjay Govindjee via petsc-users > wrote: Matt, Here is the process as it currently stands: 1) I have a PETSc Vec (sol), which come from a KSPSolve 2) Each processor grabs its section of sol via VecGetOwnershipRange and VecGetArrayReadF90 and inserts parts of its section of sol in a local array (locarr) using a complex but easily computable mapping. 3) The routine you are looking at then exchanges various parts of the locarr between the processors. You need a VecScatter object https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate 4) Each processor then does computations using its updated locarr. Typing it out this way, I guess the answer to your question is "yes." I have a global Vec and I want its values sent in a complex but computable way to local vectors on each process. -sanjay On 5/31/19 3:37 AM, Matthew Knepley wrote: On Thu, May 30, 2019 at 11:55 PM Sanjay Govindjee via petsc-users > wrote: Hi Juanchao, Thanks for the hints below, they will take some time to absorb as the vectors that are being moved around are actually partly petsc vectors and partly local process vectors. Is this code just doing a global-to-local map? Meaning, does it just map all the local unknowns to some global unknown on some process? We have an even simpler interface for that, where we make the VecScatter automatically, https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/IS/ISLocalToGlobalMappingCreate.html#ISLocalToGlobalMappingCreate Then you can use it with Vecs, Mats, etc. Thanks, Matt Attached is the modified routine that now works (on leaking memory) with openmpi. -sanjay On 5/30/19 8:41 PM, Zhang, Junchao wrote: Hi, Sanjay, Could you send your modified data exchange code (psetb.F) with MPI_Waitall? See other inlined comments below. Thanks. On Thu, May 30, 2019 at 1:49 PM Sanjay Govindjee via petsc-users > wrote: Lawrence, Thanks for taking a look! This is what I had been wondering about -- my knowledge of MPI is pretty minimal and this origins of the routine were from a programmer we hired a decade+ back from NERSC. I'll have to look into VecScatter. It will be great to dispense with our roll-your-own routines (we even have our own reduceALL scattered around the code). Petsc VecScatter has a very simple interface and you definitely should go with. With VecScatter, you can think in familiar vectors and indices instead of the low level MPI_Send/Recv. Besides that, PETSc has optimized VecScatter so that communication is efficient. Interestingly, the MPI_WaitALL has solved the problem when using OpenMPI but it still persists with MPICH. Graphs attached. 
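To make question (b) and the VecGetArrayRead answer above concrete, a small hypothetical example of reading the scattered values in place: loc is the sequential target vector and locarr stands for the application's array. If the target vector were instead created with VecCreateSeqWithArray wrapped around locarr, the copy loop would disappear entirely, but that goes beyond what was suggested in the thread.

  #include <petscvec.h>

  /* Hypothetical illustration: access the scattered values without an extra
     allocation.  "loc" is the sequential target Vec, "locarr" the application
     array it should end up in.                                                */
  PetscErrorCode UseScattered(Vec loc, PetscScalar locarr[])
  {
    const PetscScalar *a;
    PetscInt           i, n;
    PetscErrorCode     ierr;

    PetscFunctionBeginUser;
    ierr = VecGetLocalSize(loc, &n);CHKERRQ(ierr);
    ierr = VecGetArrayRead(loc, &a);CHKERRQ(ierr);   /* a points at the Vec's own storage, no copy */
    for (i = 0; i < n; i++) locarr[i] = a[i];        /* or compute directly from a[] and drop locarr */
    ierr = VecRestoreArrayRead(loc, &a);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }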
I'm going to run with openmpi for now (but I guess I really still need to figure out what is wrong with MPICH and WaitALL; I'll try Barry's suggestion of --download-mpich-configure-arguments="--enable-error-messages=all --enable-g" later today and report back). Regarding MPI_Barrier, it was put in due a problem that some processes were finishing up sending and receiving and exiting the subroutine before the receiving processes had completed (which resulted in data loss as the buffers are freed after the call to the routine). MPI_Barrier was the solution proposed to us. I don't think I can dispense with it, but will think about some more. After MPI_Send(), or after MPI_Isend(..,req) and MPI_Wait(req), you can safely free the send buffer without worry that the receive has not completed. MPI guarantees the receiver can get the data, for example, through internal buffering. I'm not so sure about using MPI_IRecv as it will require a bit of rewriting since right now I process the received data sequentially after each blocking MPI_Recv -- clearly slower but easier to code. Thanks again for the help. -sanjay On 5/30/19 4:48 AM, Lawrence Mitchell wrote: > Hi Sanjay, > >> On 30 May 2019, at 08:58, Sanjay Govindjee via petsc-users > wrote: >> >> The problem seems to persist but with a different signature. Graphs attached as before. >> >> Totals with MPICH (NB: single run) >> >> For the CG/Jacobi data_exchange_total = 41,385,984; kspsolve_total = 38,289,408 >> For the GMRES/BJACOBI data_exchange_total = 41,324,544; kspsolve_total = 41,324,544 >> >> Just reading the MPI docs I am wondering if I need some sort of MPI_Wait/MPI_Waitall before my MPI_Barrier in the data exchange routine? >> I would have thought that with the blocking receives and the MPI_Barrier that everything will have fully completed and cleaned up before >> all processes exited the routine, but perhaps I am wrong on that. > > Skimming the fortran code you sent you do: > > for i in ...: > call MPI_Isend(..., req, ierr) > > for i in ...: > call MPI_Recv(..., ierr) > > But you never call MPI_Wait on the request you got back from the Isend. So the MPI library will never free the data structures it created. > > The usual pattern for these non-blocking communications is to allocate an array for the requests of length nsend+nrecv and then do: > > for i in nsend: > call MPI_Isend(..., req[i], ierr) > for j in nrecv: > call MPI_Irecv(..., req[nsend+j], ierr) > > call MPI_Waitall(req, ..., ierr) > > I note also there's no need for the Barrier at the end of the routine, this kind of communication does neighbourwise synchronisation, no need to add (unnecessary) global synchronisation too. > > As an aside, is there a reason you don't use PETSc's VecScatter to manage this global to local exchange? > > Cheers, > > Lawrence -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ -------------- next part -------------- An HTML attachment was scrubbed... URL: