From sospinar at unal.edu.co Sun Apr 2 06:43:19 2017 From: sospinar at unal.edu.co (Unal) Date: Sun, 2 Apr 2017 13:43:19 +0200 Subject: [petsc-users] [3.7.5] strange config error on macOS with XCode 8.3 and Clang 8.1.0 In-Reply-To: <7651BACB-F833-400A-85AF-D990FF4FC024@gmail.com> References: <668CE3D2-B464-4AE2-82F6-F87F88D2A53B@gmail.com> <5BCD5BA6-D6A5-49CD-B199-1E90AC557935@gmail.com> <7C9859FD-FAA6-4038-974F-507BE295992C@mcs.anl.gov> <7651BACB-F833-400A-85AF-D990FF4FC024@gmail.com> Message-ID: <0E041E8B-9869-442B-8D2C-2CA6F76B782A@unal.edu.co> In my experience, each time you update Xcode, you also have to update the Command Line Tools (https://developer.apple.com/downloads/index.action?=command%20line%20tools), and reinstall everything in the right order; in my case, I have to reinstall Fortran, MPI, HDF5 and PETSc. When one doesn't do it in the right order, problems like the one you reported appear. Santiago > On 29 Mar 2017, at 07:26, Denis Davydov wrote: > > Thanks Barry, I can confirm that an adaptation of your patch to 3.7.5 allows me to compile PETSc. > > Regards, > Denis. > > >> On 29 Mar 2017, at 06:23, Barry Smith wrote: >> >> >> I have added the commit https://bitbucket.org/petsc/petsc/commits/4f290403fdd060d09d5cb07345cbfd52670e3cbc to the maint, master and next branches, which allows ./configure to go through in this situation. If the change does not break other tests it will be included in the next patch release. >> >> Thanks for reporting the problem, >> >> Barry >> >> This patch does not directly deal with the problem (which I don't understand but seems to be an Apple Xcode problem) but works around the problem on my machine. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From davydden at gmail.com Sun Apr 2 09:07:16 2017 From: davydden at gmail.com (Denis Davydov) Date: Sun, 2 Apr 2017 16:07:16 +0200 Subject: [petsc-users] [3.7.5] strange config error on macOS with XCode 8.3 and Clang 8.1.0 In-Reply-To: <0E041E8B-9869-442B-8D2C-2CA6F76B782A@unal.edu.co> References: <668CE3D2-B464-4AE2-82F6-F87F88D2A53B@gmail.com> <5BCD5BA6-D6A5-49CD-B199-1E90AC557935@gmail.com> <7C9859FD-FAA6-4038-974F-507BE295992C@mcs.anl.gov> <7651BACB-F833-400A-85AF-D990FF4FC024@gmail.com> <0E041E8B-9869-442B-8D2C-2CA6F76B782A@unal.edu.co> Message-ID: <1586BA3F-6A6F-4076-951F-F9AF97BF74F6@gmail.com> I don't think your comment is related to this problem. The Command Line Tools were also updated. The "right order" is quite simple with Spack; obviously I had a whole lot of things re-compiled, including GCC and OpenMPI, as well as third-party packages used in PETSc. Regards, Denis > On 2 Apr 2017, at 13:43, Unal wrote: > > In my experience, each time you update Xcode, you also have to update the Command Line Tools (https://developer.apple.com/downloads/index.action?=command%20line%20tools), and reinstall everything in the right order; in my case, I have to reinstall Fortran, MPI, HDF5 and PETSc. When one doesn't do it in the right order, problems like the one you reported appear. > > Santiago > >> On 29 Mar 2017, at 07:26, Denis Davydov wrote: >> >> Thanks Barry, I can confirm that an adaptation of your patch to 3.7.5 allows me to compile PETSc. >> >> Regards, >> Denis. >> >> >>> On 29 Mar 2017, at 06:23, Barry Smith wrote: >>> >>> >>> I have added the commit https://bitbucket.org/petsc/petsc/commits/4f290403fdd060d09d5cb07345cbfd52670e3cbc to the maint, master and next branches, which allows ./configure to go through in this situation. If the change does not break other tests it will be included in the next patch release. 
>>> >>> Thanks for reporting the problem, >>> >>> Barry >>> >>> This patch does not directly deal with the problem (which i don't understand but seems to be an Apple Xcode problem) but works around the problem on my machine. >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Sun Apr 2 09:25:58 2017 From: jychang48 at gmail.com (Justin Chang) Date: Sun, 2 Apr 2017 09:25:58 -0500 Subject: [petsc-users] Correlation between da_refine and pg_mg_levels In-Reply-To: References: <87y3vlmqye.fsf@jedbrown.org> Message-ID: Thanks guys, So I want to run SNES ex48 across 1032 processes on Edison, but I keep getting segmentation violations. These are the parameters I am trying: srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg -thi_mat_type baij -mg_coarse_pc_type gamg The above works perfectly fine if I used 96 processes. I also tried to use a finer coarse mesh on 1032 but the error persists. Any ideas why this is happening? What are the ideal parameters to use if I want to use 1k+ cores? Thanks, Justin On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith wrote: > > > On Mar 31, 2017, at 10:00 AM, Jed Brown wrote: > > > > Justin Chang writes: > > > >> Yeah based on my experiments it seems setting pc_mg_levels to $DAREFINE > + 1 > >> has decent performance. > >> > >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In > some of > >> the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/ > >> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide 203/227) > >> they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran this, it > >> was almost twice as slow as if $MGLEVELS >= $DAREFINE > > > > Smaller coarse grids are generally more scalable -- when the problem > > data is distributed, multigrid is a good solution algorithm. 
But if > > multigrid stops being effective because it is not preserving sufficient > > coarse grid accuracy (e.g., for transport-dominated problems in > > complicated domains) then you might want to stop early and use a more > > robust method (like direct solves). > > Basically for symmetric positive definite operators you can make the > coarse problem as small as you like (even 1 point) in theory. For > indefinite and non-symmetric problems the theory says the "coarse grid must > be sufficiently fine" (loosely speaking the coarse grid has to resolve the > eigenmodes for the eigenvalues to the left of the x = 0). > > https://www.jstor.org/stable/2158375?seq=1#page_scan_tab_contents > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sun Apr 2 11:29:38 2017 From: jed at jedbrown.org (Jed Brown) Date: Sun, 02 Apr 2017 10:29:38 -0600 Subject: [petsc-users] Correlation between da_refine and pg_mg_levels In-Reply-To: References: <87y3vlmqye.fsf@jedbrown.org> Message-ID: <8760imsrgt.fsf@jedbrown.org> Justin Chang writes: > Thanks guys, > > So I want to run SNES ex48 across 1032 processes on Edison, How did you decide on 1032 processes? What shape did the DMDA produce? Of course this should work, but we didn't explicitly test that in the paper since we were running on BG/P. https://github.com/jedbrown/tme-ice/tree/master/shaheen/b > but I keep getting segmentation violations. These are the parameters I > am trying: > > srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg > -thi_mat_type baij -mg_coarse_pc_type gamg > > The above works perfectly fine if I used 96 processes. I also tried to use > a finer coarse mesh on 1032 but the error persists. > > Any ideas why this is happening? What are the ideal parameters to use if I > want to use 1k+ cores? 
> > Thanks, > Justin > > On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith wrote: > >> >> > On Mar 31, 2017, at 10:00 AM, Jed Brown wrote: >> > >> > Justin Chang writes: >> > >> >> Yeah based on my experiments it seems setting pc_mg_levels to $DAREFINE >> + 1 >> >> has decent performance. >> >> >> >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In >> some of >> >> the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/ >> >> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide 203/227) >> >> they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran this, it >> >> was almost twice as slow as if $MGLEVELS >= $DAREFINE >> > >> > Smaller coarse grids are generally more scalable -- when the problem >> > data is distributed, multigrid is a good solution algorithm. But if >> > multigrid stops being effective because it is not preserving sufficient >> > coarse grid accuracy (e.g., for transport-dominated problems in >> > complicated domains) then you might want to stop early and use a more >> > robust method (like direct solves). >> >> Basically for symmetric positive definite operators you can make the >> coarse problem as small as you like (even 1 point) in theory. For >> indefinite and non-symmetric problems the theory says the "coarse grid must >> be sufficiently fine" (loosely speaking the coarse grid has to resolve the >> eigenmodes for the eigenvalues to the left of the x = 0). >> >> https://www.jstor.org/stable/2158375?seq=1#page_scan_tab_contents >> >> >> -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jychang48 at gmail.com Sun Apr 2 11:54:02 2017 From: jychang48 at gmail.com (Justin Chang) Date: Sun, 2 Apr 2017 11:54:02 -0500 Subject: [petsc-users] Correlation between da_refine and pg_mg_levels In-Reply-To: <8760imsrgt.fsf@jedbrown.org> References: <87y3vlmqye.fsf@jedbrown.org> <8760imsrgt.fsf@jedbrown.org> Message-ID: It was sort of arbitrary. I want to conduct a performance spectrum (dofs/sec) study where at least 1k processors are used on various HPC machines (and hopefully one more case with 10k procs). Assuming all available cores on these compute nodes (which I know is not the greatest idea here), 1032 Ivybridge (24 cores/node) on Edison best matches Cori's 1024 Haswell (32 core/node). How do I determine the shape of the DMDA? I am guessing the number of MPI processes needs to be compatible with this? Thanks, Justin On Sun, Apr 2, 2017 at 11:29 AM, Jed Brown wrote: > Justin Chang writes: > > > Thanks guys, > > > > So I want to run SNES ex48 across 1032 processes on Edison, > > How did you decide on 1032 processes? What shape did the DMDA produce? > Of course this should work, but we didn't explicitly test that in the > paper since we were running on BG/P. > > https://github.com/jedbrown/tme-ice/tree/master/shaheen/b > > > but I keep getting segmentation violations. These are the parameters I > > am trying: > > > > srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg > > -thi_mat_type baij -mg_coarse_pc_type gamg > > > > The above works perfectly fine if I used 96 processes. I also tried to > use > > a finer coarse mesh on 1032 but the error persists. > > > > Any ideas why this is happening? What are the ideal parameters to use if > I > > want to use 1k+ cores? 
> > > > Thanks, > > Justin > > > > On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith > wrote: > > > >> > >> > On Mar 31, 2017, at 10:00 AM, Jed Brown wrote: > >> > > >> > Justin Chang writes: > >> > > >> >> Yeah based on my experiments it seems setting pc_mg_levels to > $DAREFINE > >> + 1 > >> >> has decent performance. > >> >> > >> >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In > >> some of > >> >> the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/ > >> >> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide > 203/227) > >> >> they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran > this, it > >> >> was almost twice as slow as if $MGLEVELS >= $DAREFINE > >> > > >> > Smaller coarse grids are generally more scalable -- when the problem > >> > data is distributed, multigrid is a good solution algorithm. But if > >> > multigrid stops being effective because it is not preserving > sufficient > >> > coarse grid accuracy (e.g., for transport-dominated problems in > >> > complicated domains) then you might want to stop early and use a more > >> > robust method (like direct solves). > >> > >> Basically for symmetric positive definite operators you can make the > >> coarse problem as small as you like (even 1 point) in theory. For > >> indefinite and non-symmetric problems the theory says the "coarse grid > must > >> be sufficiently fine" (loosely speaking the coarse grid has to resolve > the > >> eigenmodes for the eigenvalues to the left of the x = 0). > >> > >> https://www.jstor.org/stable/2158375?seq=1#page_scan_tab_contents > >> > >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Sun Apr 2 12:18:07 2017 From: jed at jedbrown.org (Jed Brown) Date: Sun, 02 Apr 2017 11:18:07 -0600 Subject: [petsc-users] Correlation between da_refine and pg_mg_levels In-Reply-To: References: <87y3vlmqye.fsf@jedbrown.org> <8760imsrgt.fsf@jedbrown.org> Message-ID: <87tw66rank.fsf@jedbrown.org> Justin Chang writes: > It was sort of arbitrary. I want to conduct a performance spectrum > (dofs/sec) study where at least 1k processors are used on various HPC > machines (and hopefully one more case with 10k procs). Assuming all > available cores on these compute nodes (which I know is not the greatest > idea here), 1032 Ivybridge (24 cores/node) on Edison best matches Cori's > 1024 Haswell (32 core/node). > > How do I determine the shape of the DMDA? I am guessing the number of MPI > processes needs to be compatible with this? Running with -dm_view gives you the ownership by process. DMDA is supposed to be tolerant of relatively weird numbers of processes and in any case, should error in an understandable way. 1032 has a prime factor of 43 which, with only 80 elements in a decomposed dimension, means that some processes must have only one element in that direction. Of course this hasn't been a problem at smaller scale. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From bsmith at mcs.anl.gov Sun Apr 2 14:13:43 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 2 Apr 2017 14:13:43 -0500 Subject: [petsc-users] Correlation between da_refine and pg_mg_levels In-Reply-To: References: <87y3vlmqye.fsf@jedbrown.org> Message-ID: <3FA19A8A-CED7-4D65-9A4C-03D10CCBF3EF@mcs.anl.gov> > On Apr 2, 2017, at 9:25 AM, Justin Chang wrote: > > Thanks guys, > > So I want to run SNES ex48 across 1032 processes on Edison, but I keep getting segmentation violations. 
These are the parameters I am trying: > > srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg -thi_mat_type baij -mg_coarse_pc_type gamg > > The above works perfectly fine if I used 96 processes. I also tried to use a finer coarse mesh on 1032 but the error persists. > > Any ideas why this is happening? What are the ideal parameters to use if I want to use 1k+ cores? > Hmm, one should never get segmentation violations. You should only get not completely useful error messages about incompatible sizes etc. Send an example of the segmentation violations. (I sure hope you are checking the error return codes for all functions?). Barry > Thanks, > Justin > > On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith wrote: > > > On Mar 31, 2017, at 10:00 AM, Jed Brown wrote: > > > > Justin Chang writes: > > > >> Yeah based on my experiments it seems setting pc_mg_levels to $DAREFINE + 1 > >> has decent performance. > >> > >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In some of > >> the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/ > >> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide 203/227) > >> they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran this, it > >> was almost twice as slow as if $MGLEVELS >= $DAREFINE > > > > Smaller coarse grids are generally more scalable -- when the problem > > data is distributed, multigrid is a good solution algorithm. But if > > multigrid stops being effective because it is not preserving sufficient > > coarse grid accuracy (e.g., for transport-dominated problems in > > complicated domains) then you might want to stop early and use a more > > robust method (like direct solves). > > Basically for symmetric positive definite operators you can make the coarse problem as small as you like (even 1 point) in theory. 
For indefinite and non-symmetric problems the theory says the "coarse grid must be sufficiently fine" (loosely speaking the coarse grid has to resolve the eigenmodes for the eigenvalues to the left of the x = 0). > > https://www.jstor.org/stable/2158375?seq=1#page_scan_tab_contents > > > From knepley at gmail.com Sun Apr 2 14:15:43 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 2 Apr 2017 14:15:43 -0500 Subject: [petsc-users] Correlation between da_refine and pg_mg_levels In-Reply-To: <3FA19A8A-CED7-4D65-9A4C-03D10CCBF3EF@mcs.anl.gov> References: <87y3vlmqye.fsf@jedbrown.org> <3FA19A8A-CED7-4D65-9A4C-03D10CCBF3EF@mcs.anl.gov> Message-ID: On Sun, Apr 2, 2017 at 2:13 PM, Barry Smith wrote: > > > On Apr 2, 2017, at 9:25 AM, Justin Chang wrote: > > > > Thanks guys, > > > > So I want to run SNES ex48 across 1032 processes on Edison, but I keep > getting segmentation violations. These are the parameters I am trying: > > > > srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg > -thi_mat_type baij -mg_coarse_pc_type gamg > > > > The above works perfectly fine if I used 96 processes. I also tried to > use a finer coarse mesh on 1032 but the error persists. > > > > Any ideas why this is happening? What are the ideal parameters to use if > I want to use 1k+ cores? > > > > Hmm, one should never get segmentation violations. You should only get > not completely useful error messages about incompatible sizes etc. Send an > example of the segmentation violations. (I sure hope you are checking the > error return codes for all functions?). He is just running SNES ex48. Matt > > Barry > > > Thanks, > > Justin > > > > On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith > wrote: > > > > > On Mar 31, 2017, at 10:00 AM, Jed Brown wrote: > > > > > > Justin Chang writes: > > > > > >> Yeah based on my experiments it seems setting pc_mg_levels to > $DAREFINE + 1 > > >> has decent performance. 
> > >> > > >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In > some of > > >> the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/ > > >> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide > 203/227) > > >> they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran this, > it > > >> was almost twice as slow as if $MGLEVELS >= $DAREFINE > > > > > > Smaller coarse grids are generally more scalable -- when the problem > > > data is distributed, multigrid is a good solution algorithm. But if > > > multigrid stops being effective because it is not preserving sufficient > > > coarse grid accuracy (e.g., for transport-dominated problems in > > > complicated domains) then you might want to stop early and use a more > > > robust method (like direct solves). > > > > Basically for symmetric positive definite operators you can make the > coarse problem as small as you like (even 1 point) in theory. For > indefinite and non-symmetric problems the theory says the "coarse grid must > be sufficiently fine" (loosely speaking the coarse grid has to resolve the > eigenmodes for the eigenvalues to the left of the x = 0). > > > > https://www.jstor.org/stable/2158375?seq=1#page_scan_tab_contents > > > > > > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From filippo.leon at gmail.com Sun Apr 2 14:15:53 2017 From: filippo.leon at gmail.com (Filippo Leonardi) Date: Sun, 02 Apr 2017 19:15:53 +0000 Subject: [petsc-users] PETSc with modern C++ Message-ID: Hello, I have a project in mind and seek feedback. Disclaimer: I hope I am not abusing of this mailing list with this idea. If so, please ignore. As a thought experiment, and to have a bit of fun, I am currently writing/thinking on writing, a small (modern) C++ wrapper around PETSc. 
Premise: PETSc is awesome, I love it and use it in many projects. Sometimes I am just not super comfortable writing C. (I know my idea goes against PETSc's design philosophy). I know there are many around, and there is not really a need for this (especially since PETSc has its own object-oriented style), but there are a few things I would really like to include in this wrapper that I have found nowhere else: - I am currently only thinking about the Vector/Matrix/KSP/DM part of the framework; there are many other cool things that PETSc does, but I do not have the brainpower to consider those as well. - expression templates (in my opinion this is where C++ shines): this would replace all code bloat that a user might need with cool/easy to read expressions (this could increase the number of axpy-like routines); - those expression templates should use SSE and AVX whenever available; - expressions like x += alpha * y should fall back to BLAS axpy (though sometimes this is not even faster than a simple loop); - all calls to PETSc should be less verbose, more C++-like: * for instance a VecGlobalToLocalBegin could return an empty object that calls VecGlobalToLocalEnd when it is destroyed. * some cool idea to easily write GPU kernels. - the idea would be to have safer routines (at compile time), by means of RAII etc. I aim for zero/near-zero/negligible overhead with full optimization; to verify this I include benchmarks and extensive unit tests. So my question is: - anyone that would be interested (in the product/in developing)? - anyone that has suggestions (maybe what I have in mind is nonsense)? If you have read up to here, thanks. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mpovolot at purdue.edu Sun Apr 2 17:15:36 2017 From: mpovolot at purdue.edu (Michael Povolotskyi) Date: Sun, 2 Apr 2017 18:15:36 -0400 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: Message-ID: <7b43ad9a-292c-e790-6692-ee09d8420153@purdue.edu> Hello Filippo, we had to write a wrapper around Petsc to use both double and double complex functions in the same code. We achieved it by creating two shared object libraries and hiding Petsc symbols. Once we had to achieve it for a statically linked executable, this was really painful, we had to change symbols in the object files. If your wrapper can make this process easier, it would be great. I'm not interested in developing as such, but I'm interested in the product and in a good performance. Michael. On 4/2/2017 3:15 PM, Filippo Leonardi wrote: > > Hello, > > I have a project in mind and seek feedback. > > Disclaimer: I hope I am not abusing of this mailing list with this > idea. If so, please ignore. > > As a thought experiment, and to have a bit of fun, I am currently > writing/thinking on writing, a small (modern) C++ wrapper around PETSc. > > Premise: PETSc is awesome, I love it and use in many projects. > Sometimes I am just not super comfortable writing C. (I know my idea > goes against PETSc's design philosophy). > > I know there are many around, and there is not really a need for this > (especially since PETSc has his own object-oriented style), but there > are a few things I would like to really include in this wrapper, that > I found nowhere): > - I am currently only thinking about the Vector/Matrix/KSP/DM part of > the Framework, there are many other cool things that PETSc does that I > do not have the brainpower to consider those as well. 
> - expression templates (in my opinion this is where C++ shines): this > would replace all code bloat that a user might need with cool/easy to > read expressions (this could increase the number of axpy-like routines); > - those expression templates should use SSE and AVX whenever available; > - expressions like x += alpha * y should fall back to BLAS axpy (tough > sometimes this is not even faster than a simple loop); > - all calls to PETSc should be less verbose, more C++-like: > * for instance a VecGlobalToLocalBegin could return an empty object > that calls VecGlobalToLocalEnd when it is destroyed. > * some cool idea to easily write GPU kernels. > - the idea would be to have safer routines(at compile time), by means > of RAII etc. > > I aim for zero/near-zero/negligible overhead with full optimization, > for that I include benchmarks and extensive test units. > > So my question is: > - anyone that would be interested (in the product/in developing)? > - anyone that has suggestions (maybe that what I have in mind is > nonsense)? > > If you have read up to here, thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Apr 2 19:00:53 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 2 Apr 2017 19:00:53 -0500 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: Message-ID: On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi wrote: > > Hello, > > I have a project in mind and seek feedback. > > Disclaimer: I hope I am not abusing of this mailing list with this idea. > If so, please ignore. > > As a thought experiment, and to have a bit of fun, I am currently > writing/thinking on writing, a small (modern) C++ wrapper around PETSc. > > Premise: PETSc is awesome, I love it and use in many projects. Sometimes I > am just not super comfortable writing C. (I know my idea goes against > PETSc's design philosophy). 
> > I know there are many around, and there is not really a need for this > (especially since PETSc has his own object-oriented style), but there are a > few things I would like to really include in this wrapper, that I found > nowhere): > - I am currently only thinking about the Vector/Matrix/KSP/DM part of the > Framework, there are many other cool things that PETSc does that I do not > have the brainpower to consider those as well. > - expression templates (in my opinion this is where C++ shines): this > would replace all code bloat that a user might need with cool/easy to read > expressions (this could increase the number of axpy-like routines); > - those expression templates should use SSE and AVX whenever available; > - expressions like x += alpha * y should fall back to BLAS axpy (tough > sometimes this is not even faster than a simple loop); > The idea for the above is not clear. Do you want templates generating calls to BLAS? Or scalar code that operates on raw arrays with SSE/AVX? There is some advantage here of expanding the range of BLAS operations, which has been done to death by Liz Jessup and collaborators, but not that much. > - all calls to PETSc should be less verbose, more C++-like: > * for instance a VecGlobalToLocalBegin could return an empty object that > calls VecGlobalToLocalEnd when it is destroyed. > * some cool idea to easily write GPU kernels. > If you find a way to make this pay off it would be amazing, since currently nothing but BLAS3 has a hope of mattering in this context. > - the idea would be to have safer routines (at compile time), by means of > RAII etc. > > I aim for zero/near-zero/negligible overhead with full optimization, for > that I include benchmarks and extensive test units. > > So my question is: > - anyone that would be interested (in the product/in developing)? > - anyone that has suggestions (maybe that what I have in mind is nonsense)? 
> I would suggest making a simple performance model that says what you will do will have at least a 2x speed gain, because anything less is not worth your time, and inevitably you will not get the whole multiplier. I am really skeptical that is possible with the above sketch. Second, I would try to convince myself that what you propose would be simpler, in terms of lines of code, number of objects, number of concepts, etc. Right now, that is not clear to me either. Barring that, maybe you can argue that new capabilities, such as the type flexibility described by Michael, are enabled. That would be the most convincing, I think. Thanks, Matt If you have read up to here, thanks. > 
Error as follows: ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 -ts_final_time 0.001 -snes_monitor 0 SNES Function norm 1.059316422854e+01 1 SNES Function norm 1.035505461114e-05 2 SNES Function norm 5.498223366328e-12 ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 -ts_final_time 0.001 -snes_monitor -snes_fd 0 SNES Function norm 1.059316422854e+01 1 SNES Function norm 1.208245988550e-05 2 SNES Function norm 6.022374930788e-12 ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 -ts_final_time 0.001 -snes_monitor -snes_mf 0 SNES Function norm 1.059316422854e+01 1 SNES Function norm 6.136984336801e-05 2 SNES Function norm 5.355730806625e-10 ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 -ts_final_time 0.001 -snes_monitor -snes_mf_operator 0 SNES Function norm 1.059316422854e+01 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Mat type mffd [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3426-g0c7851c GIT Date: 2017-04-01 18:40:06 -0600 [0]PETSC ERROR: ./ex2 on a linux-c-dbg named ed-lemur by ed Sun Apr 2 17:02:47 2017 [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 [0]PETSC ERROR: #1 MatZeroEntries() line 5471 in /home/ed/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #2 TSComputeIJacobian() line 965 in /home/ed/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #3 SNESTSFormJacobian_Theta() line 515 in /home/ed/petsc/src/ts/impls/implicit/theta/theta.c [0]PETSC ERROR: #4 SNESTSFormJacobian() line 5078 in /home/ed/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #5 SNESComputeJacobian() line 2276 in /home/ed/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #6 SNESSolve_NEWTONLS() line 222 in /home/ed/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #7 SNESSolve() line 3967 in /home/ed/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #8 TS_SNESSolve() line 171 in /home/ed/petsc/src/ts/impls/implicit/theta/theta.c [0]PETSC ERROR: #9 TSStep_Theta() line 211 in /home/ed/petsc/src/ts/impls/implicit/theta/theta.c [0]PETSC ERROR: #10 TSStep() line 3843 in /home/ed/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #11 TSSolve() line 4088 in /home/ed/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #12 main() line 194 in /home/ed/petsc/src/ts/examples/tutorials/ex2.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -snes_mf_operator [0]PETSC ERROR: -snes_monitor [0]PETSC ERROR: -ts_dt 0.001 [0]PETSC ERROR: -ts_final_time 0.001 [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 [unset]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 Is this intended or a bug? If it is intended, what is the issue? I am using current master branch (commit 0c7851c55cba8e40da5083f79ba1ff846acd45b2). Thanks for your help and awesome library! 
Ed -- Ed Bueler Dept of Math and Stat and Geophysical Institute University of Alaska Fairbanks Fairbanks, AK 99775-6660 301C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Sun Apr 2 21:23:14 2017 From: jed at jedbrown.org (Jed Brown) Date: Sun, 02 Apr 2017 20:23:14 -0600 Subject: [petsc-users] -snes_mf_operator yields "No support for this operation for this object type" in TS codes? In-Reply-To: References: Message-ID: <877f32qlf1.fsf@jedbrown.org> The issue is that we need to create a*I - Jrhs and this is currently done by creating a*I first when we have separate matrices for the left and right hand sides. There is code to just scale and shift Jrhs when there is no IJacobian, but the creation logic got messed up at some point (or at least for some TS configurations that are common). We were discussing this recently in a performance context and this branch is supposed to fix that logic. Does this branch work for you? https://bitbucket.org/petsc/petsc/pull-requests/655/fix-flaw-with-tssetrhsjacobian-and-no/diff Ed Bueler writes: > Dear PETSc -- > > I have a TS-using and DMDA-using code in which I want to set a RHSJacobian > which is only approximate. (The Jacobian uses first-order upwinding MOL > while the RHSFunction uses a flux-limited MOL.) While it works with the > analytical Jacobian, and -snes_fd, and -snes_fd_color, and -snes_mf, I get > a "No support ..." message for -snes_mf_operator. > > It suffices to show the problem with > > src/ts/examples/tutorials/ex2.c > > (Note ex2.c does not use DMDA so ..._color is not available for ex2.c.) 
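Jed's point about forming a*I - Jrhs by scaling and shifting the existing right-hand-side Jacobian, rather than creating a*I first, can be illustrated without PETSc on a small dense matrix. This is a plain-C sketch: the row-major dense storage and the helper name are illustrative only, not PETSc's API (in PETSc terms this would be MatScale/MatShift applied to Jrhs).

```c
#include <assert.h>

/* Form a*I - J in place on a dense n-by-n row-major matrix J:
   negate every entry (scale by -1), then add a on the diagonal
   (shift by a). No separate a*I matrix is ever created, so this
   path works even when J cannot be duplicated entry-wise. */
static void scale_and_shift(double *J, int n, double a)
{
    for (int i = 0; i < n * n; i++) J[i] = -J[i];  /* J <- -J      */
    for (int i = 0; i < n; i++) J[i * n + i] += a; /* J <- J + a*I */
}
```

For example, with J = [[1,2],[3,4]] and a = 10 this yields [[9,-2],[-3,6]], i.e. exactly 10*I - J.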
> > Error as follows: > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > -ts_final_time 0.001 -snes_monitor > 0 SNES Function norm 1.059316422854e+01 > 1 SNES Function norm 1.035505461114e-05 > 2 SNES Function norm 5.498223366328e-12 > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > -ts_final_time 0.001 -snes_monitor -snes_fd > 0 SNES Function norm 1.059316422854e+01 > 1 SNES Function norm 1.208245988550e-05 > 2 SNES Function norm 6.022374930788e-12 > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > -ts_final_time 0.001 -snes_monitor -snes_mf > 0 SNES Function norm 1.059316422854e+01 > 1 SNES Function norm 6.136984336801e-05 > 2 SNES Function norm 5.355730806625e-10 > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > -ts_final_time 0.001 -snes_monitor -snes_mf_operator > 0 SNES Function norm 1.059316422854e+01 > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: Mat type mffd > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3426-g0c7851c GIT > Date: 2017-04-01 18:40:06 -0600 > [0]PETSC ERROR: ./ex2 on a linux-c-dbg named ed-lemur by ed Sun Apr 2 > 17:02:47 2017 > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > [0]PETSC ERROR: #1 MatZeroEntries() line 5471 in > /home/ed/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: #2 TSComputeIJacobian() line 965 in > /home/ed/petsc/src/ts/interface/ts.c > [0]PETSC ERROR: #3 SNESTSFormJacobian_Theta() line 515 in > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > [0]PETSC ERROR: #4 SNESTSFormJacobian() line 5078 in > /home/ed/petsc/src/ts/interface/ts.c > [0]PETSC ERROR: #5 SNESComputeJacobian() line 2276 in > /home/ed/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #6 SNESSolve_NEWTONLS() line 222 in > /home/ed/petsc/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #7 SNESSolve() line 3967 in > /home/ed/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #8 TS_SNESSolve() line 171 in > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > [0]PETSC ERROR: #9 TSStep_Theta() line 211 in > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > [0]PETSC ERROR: #10 TSStep() line 3843 in > /home/ed/petsc/src/ts/interface/ts.c > [0]PETSC ERROR: #11 TSSolve() line 4088 in > /home/ed/petsc/src/ts/interface/ts.c > [0]PETSC ERROR: #12 main() line 194 in > /home/ed/petsc/src/ts/examples/tutorials/ex2.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -snes_mf_operator > [0]PETSC ERROR: -snes_monitor > [0]PETSC ERROR: -ts_dt 0.001 > [0]PETSC ERROR: -ts_final_time 0.001 > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 > [unset]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 > > Is this intended or a bug? If it is intended, what is the issue? 
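For context on the "No support for this operation" failure in the trace above: an mffd matrix only knows how to apply itself to a vector, so entry-wise operations such as MatZeroEntries() have no implementation to dispatch to. A toy model of that operations-table dispatch follows; the struct and names are illustrative, not PETSc's internals, though 56 happens to match the MPI_Abort error code visible in the log above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy operations table in the spirit of PETSc's Mat: each operation
   is a function pointer, and a missing pointer means "no support". */
typedef struct {
    void (*mult)(const double *x, double *y, int n); /* y = A*x */
    int  (*zeroentries)(void); /* NULL for matrix-free operators */
} ToyMat;

/* An assembled-matrix implementation would zero its stored entries;
   here it just reports success. */
static int dense_zero_entries(void) { return 0; }

/* Dispatch in the style of MatZeroEntries(): error out when the
   operation is unimplemented, as it is for a matrix-free operator. */
static int toy_zero_entries(ToyMat *A)
{
    if (!A->zeroentries) return 56; /* "No support for this operation" */
    return A->zeroentries();
}
```

A matrix-free operator (zeroentries left NULL) fails the call, while an assembled one succeeds.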
> > I am using current master branch (commit > 0c7851c55cba8e40da5083f79ba1ff846acd45b2). > > Thanks for your help and awesome library! > > Ed > > > > > > -- > Ed Bueler > Dept of Math and Stat and Geophysical Institute > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 301C Chapman -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From elbueler at alaska.edu Sun Apr 2 21:39:39 2017 From: elbueler at alaska.edu (Ed Bueler) Date: Sun, 2 Apr 2017 18:39:39 -0800 Subject: [petsc-users] -snes_mf_operator yields "No support for this operation for this object type" in TS codes? In-Reply-To: <877f32qlf1.fsf@jedbrown.org> References: <877f32qlf1.fsf@jedbrown.org> Message-ID: I checked out branch barry/fix-huge-flaw-in-ts (see pull request #655) and reconfigured and rebuilt. No, the ts/.../ex2.c example is not fixed. It gives same error: ~/petsc/src/ts/examples/tutorials[barry/fix-huge-flaw-in-ts*]$ ./ex2 -ts_dt 0.001 -ts_final_time 0.001 -snes_monitor -snes_mf_operator 0 SNES Function norm 1.059316422854e+01 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: No support for this operation for this object type [0]PETSC ERROR: Mat type mffd [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
[0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3346-g41a1d4d GIT Date: 2017-03-24 14:39:28 -0500 [0]PETSC ERROR: ./ex2 on a linux-c-dbg named bueler-leopard by ed Sun Apr 2 18:31:55 2017 [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 [0]PETSC ERROR: #1 MatZeroEntries() line 5471 in /home/ed/petsc/src/mat/interface/matrix.c [0]PETSC ERROR: #2 TSComputeIJacobian() line 942 in /home/ed/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #3 SNESTSFormJacobian_Theta() line 515 in /home/ed/petsc/src/ts/impls/implicit/theta/theta.c [0]PETSC ERROR: #4 SNESTSFormJacobian() line 5055 in /home/ed/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #5 SNESComputeJacobian() line 2276 in /home/ed/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #6 SNESSolve_NEWTONLS() line 222 in /home/ed/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #7 SNESSolve() line 3967 in /home/ed/petsc/src/snes/interface/snes.c [0]PETSC ERROR: #8 TS_SNESSolve() line 171 in /home/ed/petsc/src/ts/impls/implicit/theta/theta.c [0]PETSC ERROR: #9 TSStep_Theta() line 211 in /home/ed/petsc/src/ts/impls/implicit/theta/theta.c [0]PETSC ERROR: #10 TSStep() line 3820 in /home/ed/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #11 TSSolve() line 4065 in /home/ed/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #12 main() line 194 in /home/ed/petsc/src/ts/examples/tutorials/ex2.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -snes_mf_operator [0]PETSC ERROR: -snes_monitor [0]PETSC ERROR: -ts_dt 0.001 [0]PETSC ERROR: -ts_final_time 0.001 [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 [unset]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 Curiously, my own code (attached), which differs from ex2.c in that it uses DMDA, *does* change behavior to a DIVERGED_NONLINEAR_SOLVE error: $ ./advect -da_refine 0 -ts_monitor -adv_circlewind 
-adv_conex 0.3 -adv_coney 0.3 -ts_type beuler -snes_monitor_short -ts_final_time 0.01 -ts_dt 0.01 -snes_rtol 1.0e-4 -adv_firstorder -snes_mf_operator solving on 5 x 5 grid with dx=0.2 x dy=0.2 cells, t0=0., and initial step dt=0.01 ... 0 TS dt 0.01 time 0. 0 SNES Function norm 2.1277 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3346-g41a1d4d GIT Date: 2017-03-24 14:39:28 -0500 [0]PETSC ERROR: ./advect on a linux-c-dbg named bueler-leopard by ed Sun Apr 2 18:34:18 2017 [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 [0]PETSC ERROR: #1 TSStep() line 3829 in /home/ed/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #2 TSSolve() line 4065 in /home/ed/petsc/src/ts/interface/ts.c [0]PETSC ERROR: #3 main() line 314 in /home/ed/repos/p4pdes/c/ch9/advect.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -adv_circlewind [0]PETSC ERROR: -adv_conex 0.3 [0]PETSC ERROR: -adv_coney 0.3 [0]PETSC ERROR: -adv_firstorder [0]PETSC ERROR: -da_refine 0 [0]PETSC ERROR: -snes_mf_operator [0]PETSC ERROR: -snes_monitor_short [0]PETSC ERROR: -snes_rtol 1.0e-4 [0]PETSC ERROR: -ts_dt 0.01 [0]PETSC ERROR: -ts_final_time 0.01 [0]PETSC ERROR: -ts_monitor [0]PETSC ERROR: -ts_type beuler [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- application called MPI_Abort(MPI_COMM_WORLD, 91) - process 0 [unset]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 91) - process 0 Previously it gave a "No support for this operation for this object type ... Mat type mffd" error like ex2.c. Progress? Note that my code works fine (i.e. 
no errors) with any of: the analytical Jacobian or -snes_fd or -snes_fd_color or -snes_mf. Ed On Sun, Apr 2, 2017 at 6:23 PM, Jed Brown wrote: > The issue is that we need to create > > a*I - Jrhs > > and this is currently done by creating a*I first when we have separate > matrices for the left and right hand sides. There is code to just scale > and shift Jrhs when there is no IJacobian, but the creation logic got > messed up at some point (or at least for some TS configurations that are > common). We were discussing this recently in a performance context and > this branch is supposed to fix that logic. Does this branch work for > you? > > https://bitbucket.org/petsc/petsc/pull-requests/655/fix- > flaw-with-tssetrhsjacobian-and-no/diff > > Ed Bueler writes: > > > Dear PETSc -- > > > > I have a TS-using and DMDA-using code in which I want to set a > RHSJacobian > > which is only approximate. (The Jacobian uses first-order upwinding MOL > > while the RHSFunction uses a flux-limited MOL.) While it works with the > > analytical Jacobian, and -snes_fd, and -snes_fd_color, and -snes_mf, I > get > > a "No support ..." message for -snes_mf_operator. > > > > It suffices to show the problem with > > > > src/ts/examples/tutorials/ex2.c > > > > (Note ex2.c does not use DMDA so ..._color is not available for ex2.c.) 
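The options Ed lists that do work share one mechanism worth spelling out: -snes_mf (and the operator side of -snes_mf_operator) never assembles a Jacobian at all, approximating its action on a vector by differencing the residual. A bare-bones sketch of that approximation in plain C follows; PETSc's MFFD machinery chooses the differencing parameter h far more carefully than the fixed value used here, and sample_F is an invented residual for demonstration.

```c
#include <assert.h>
#include <math.h>

/* Matrix-free Jacobian action: J(u) v ~= (F(u + h v) - F(u)) / h.
   Only residual evaluations of F are needed; J is never stored. */
static void mf_jacobian_apply(void (*F)(const double *, double *),
                              const double *u, const double *v,
                              double *Jv, int n, double h)
{
    double up[8], Fu[8], Fup[8]; /* small fixed buffers for this sketch */
    for (int i = 0; i < n; i++) up[i] = u[i] + h * v[i];
    F(u, Fu);
    F(up, Fup);
    for (int i = 0; i < n; i++) Jv[i] = (Fup[i] - Fu[i]) / h;
}

/* Sample residual F(u) = (u0^2, u0*u1), so J = [[2*u0, 0], [u1, u0]]. */
static void sample_F(const double *u, double *f)
{
    f[0] = u[0] * u[0];
    f[1] = u[0] * u[1];
}
```

At u = (1,2) and v = (1,1) the exact product is (2,3); with h = 1e-6 the difference approximation agrees to several digits.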
> > > > Error as follows: > > > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > > -ts_final_time 0.001 -snes_monitor > > 0 SNES Function norm 1.059316422854e+01 > > 1 SNES Function norm 1.035505461114e-05 > > 2 SNES Function norm 5.498223366328e-12 > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > > -ts_final_time 0.001 -snes_monitor -snes_fd > > 0 SNES Function norm 1.059316422854e+01 > > 1 SNES Function norm 1.208245988550e-05 > > 2 SNES Function norm 6.022374930788e-12 > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > > -ts_final_time 0.001 -snes_monitor -snes_mf > > 0 SNES Function norm 1.059316422854e+01 > > 1 SNES Function norm 6.136984336801e-05 > > 2 SNES Function norm 5.355730806625e-10 > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > > -ts_final_time 0.001 -snes_monitor -snes_mf_operator > > 0 SNES Function norm 1.059316422854e+01 > > [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [0]PETSC ERROR: No support for this operation for this object type > > [0]PETSC ERROR: Mat type mffd > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for > > trouble shooting. 
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3426-g0c7851c GIT > > Date: 2017-04-01 18:40:06 -0600 > > [0]PETSC ERROR: ./ex2 on a linux-c-dbg named ed-lemur by ed Sun Apr 2 > > 17:02:47 2017 > > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > > [0]PETSC ERROR: #1 MatZeroEntries() line 5471 in > > /home/ed/petsc/src/mat/interface/matrix.c > > [0]PETSC ERROR: #2 TSComputeIJacobian() line 965 in > > /home/ed/petsc/src/ts/interface/ts.c > > [0]PETSC ERROR: #3 SNESTSFormJacobian_Theta() line 515 in > > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > > [0]PETSC ERROR: #4 SNESTSFormJacobian() line 5078 in > > /home/ed/petsc/src/ts/interface/ts.c > > [0]PETSC ERROR: #5 SNESComputeJacobian() line 2276 in > > /home/ed/petsc/src/snes/interface/snes.c > > [0]PETSC ERROR: #6 SNESSolve_NEWTONLS() line 222 in > > /home/ed/petsc/src/snes/impls/ls/ls.c > > [0]PETSC ERROR: #7 SNESSolve() line 3967 in > > /home/ed/petsc/src/snes/interface/snes.c > > [0]PETSC ERROR: #8 TS_SNESSolve() line 171 in > > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > > [0]PETSC ERROR: #9 TSStep_Theta() line 211 in > > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > > [0]PETSC ERROR: #10 TSStep() line 3843 in > > /home/ed/petsc/src/ts/interface/ts.c > > [0]PETSC ERROR: #11 TSSolve() line 4088 in > > /home/ed/petsc/src/ts/interface/ts.c > > [0]PETSC ERROR: #12 main() line 194 in > > /home/ed/petsc/src/ts/examples/tutorials/ex2.c > > [0]PETSC ERROR: PETSc Option Table entries: > > [0]PETSC ERROR: -snes_mf_operator > > [0]PETSC ERROR: -snes_monitor > > [0]PETSC ERROR: -ts_dt 0.001 > > [0]PETSC ERROR: -ts_final_time 0.001 > > [0]PETSC ERROR: ----------------End of Error Message -------send entire > > error message to petsc-maint at mcs.anl.gov---------- > > application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 > > [unset]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 > > > > Is this intended or a bug? 
If it is intended, what is the issue? > > > > I am using current master branch (commit > > 0c7851c55cba8e40da5083f79ba1ff846acd45b2). > > > > Thanks for your help and awesome library! > > > > Ed > > > > > > > > > > > > -- > > Ed Bueler > > Dept of Math and Stat and Geophysical Institute > > University of Alaska Fairbanks > > Fairbanks, AK 99775-6660 > > 301C Chapman > -- Ed Bueler Dept of Math and Stat and Geophysical Institute University of Alaska Fairbanks Fairbanks, AK 99775-6660 301C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: advect.c Type: text/x-csrc Size: 14636 bytes Desc: not available URL: From ling.zou at inl.gov Sun Apr 2 21:51:55 2017 From: ling.zou at inl.gov (Zou, Ling) Date: Sun, 2 Apr 2017 20:51:55 -0600 Subject: [petsc-users] Proper way to abort/restart SNESSolve due to exceptions In-Reply-To: References: Message-ID: Barry, appreciate your detailed answers. I will give it a try. Best, Ling On Fri, Mar 31, 2017 at 3:45 PM, Barry Smith wrote: > > PETSc doesn't use C++ exceptions. If a catastrophic unrecoverable error > occurs each PETSc routine returns a nonzero error code. All the application > can do in that case is end. > > If a solver does not converge then PETSc does not use a nonzero error > code, instead you obtain information from calls to SNESGetConvergedReason() > (or KSPGetConvergedReason()) to determine if there was a convergence > failure or not. If there was a lack of convergence PETSc solvers still > remain in a valid state and there is no "cleanup" on the solver objects > needed. > > We have improved the PETSc handling of failed function evaluations, > failed linear solvers etc in the past year so you MUST use the master > branch of the PETSc repository and not the release version. 
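Barry's recipe — no exceptions, just inspect the converged reason after the solve and shrink dt on failure — can be mocked up end to end in plain C. In this sketch, solve_step() is a stand-in for SNESSolve() followed by the SNESGetConvergedReason() check, not a real PETSc call.

```c
#include <assert.h>

/* Stand-in for SNESSolve + SNESGetConvergedReason: pretend the
   nonlinear solve only converges once the step is small enough. */
static int solve_step(double dt, double dt_ok)
{
    return (dt <= dt_ok) ? 1 : -1; /* negative reason = diverged */
}

/* Retry loop in the shape Barry sketches: on a negative reason,
   halve dt and simply call the solver again -- the solver object
   remains in a valid state, so no cleanup is needed in between. */
static double solve_with_retries(double dt, double dt_ok, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        int reason = solve_step(dt, dt_ok);
        if (reason >= 0) return dt; /* converged at this dt */
        dt *= 0.5;
    }
    return -1.0; /* still failing after max_tries halvings */
}
```

Starting from dt = 1.0 with a solver that first converges at dt <= 0.25, the loop returns 0.25 after two halvings; if the solve can never converge, it gives up after max_tries attempts.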
>
> So your code can look like
>
>   SNESSolve()
>   SNESGetConvergedReason(snes,&reason);
>   if (reason < 0) dt = .5*dt;
>   else no_converged = false;
>
> Note that the PETSc TS time-stepping object already manages stuff like this, as well as providing local error estimate controls, so it is better to use the PETSc TS rather than using SNES and having to manage all the time-stepping yourself.
>
> Barry
>
> > On Mar 31, 2017, at 10:28 AM, Zou, Ling wrote:
> >
> > Hi All,
> >
> > I have done some research in the PETSc email archive, but did not find a decent way to do it. Assume that, during a SNESSolve, something unphysical happens and a C++ exception is thrown, and now I want to stop the SNESSolve and let it try a smaller time step. Here is some pseudocode I have:
> >
> > while(no_converged)
> > {
> >   try
> >   {
> >     SNESSolve(...);
> >     if (SNESSolve converged)
> >       no_converged = false;
> >   }
> >   catch(int err)
> >   {
> >     /* do some clean work here? */
> >     dt = 0.5 * dt;
> >   }
> > }
> >
> > It seems to me that this is not a good way to do it: if an exception is thrown during SNESSolve, the code goes to the error catching, changes the time step, and immediately does SNESSolve again. I would expect that there should be some cleanup needed before another SNESSolve call, as I commented in the code.
> >
> > I found two related email threads:
> >
> > http://lists.mcs.anl.gov/pipermail/petsc-users/2014-August/022597.html
> > http://lists.mcs.anl.gov/pipermail/petsc-users/2015-February/024367.html
> >
> > But I don't see a clear answer there.
> > Any comments on this issue?
> >
> > Best,
> >
> > Ling

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jychang48 at gmail.com (Justin Chang)
Date: Mon, 3 Apr 2017 00:45:26 -0500
Subject: [petsc-users] Correlation between da_refine and pg_mg_levels
In-Reply-To: 
References: <87y3vlmqye.fsf@jedbrown.org> <3FA19A8A-CED7-4D65-9A4C-03D10CCBF3EF@mcs.anl.gov>
Message-ID: 

So if I begin with a 128x128x8 grid on 1032 procs, it works fine for the first two levels of da_refine. However, on the third level I get this error:

Level 3 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 1024 x 1024 x 57 (59768832), size (m) 9.76562 x 9.76562 x 17.8571
Level 2 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 512 x 512 x 29 (7602176), size (m) 19.5312 x 19.5312 x 35.7143
Level 1 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 256 x 256 x 15 (983040), size (m) 39.0625 x 39.0625 x 71.4286
Level 0 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 128 x 128 x 8 (131072), size (m) 78.125 x 78.125 x 142.857
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Petsc has generated inconsistent data
[0]PETSC ERROR: Eigen estimator failed: DIVERGED_NANORINF at iteration 0
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3418-ge372536 GIT Date: 2017-03-30 13:35:15 -0500 [0]PETSC ERROR: /scratch2/scratchdirs/jychang/Icesheet/./ex48edison on a arch-edison-c-opt named nid00865 by jychang Sun Apr 2 21:44:44 2017 [0]PETSC ERROR: Configure options --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-edison-c-opt [0]PETSC ERROR: #1 KSPSolve_Chebyshev() line 380 in /global/u1/j/jychang/Software/petsc/src/ksp/ksp/impls/cheby/cheby.c [0]PETSC ERROR: #2 KSPSolve() line 655 in /global/u1/j/jychang/Software/ petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #3 PCMGMCycle_Private() line 19 in /global/u1/j/jychang/Software/petsc/src/ksp/pc/impls/mg/mg.c [0]PETSC ERROR: #4 PCMGMCycle_Private() line 53 in /global/u1/j/jychang/Software/petsc/src/ksp/pc/impls/mg/mg.c [0]PETSC ERROR: #5 PCApply_MG() line 331 in /global/u1/j/jychang/Software/ petsc/src/ksp/pc/impls/mg/mg.c [0]PETSC ERROR: #6 PCApply() line 458 in /global/u1/j/jychang/Software/ petsc/src/ksp/pc/interface/precon.c [0]PETSC ERROR: #7 KSP_PCApply() line 251 in /global/homes/j/jychang/ Software/petsc/include/petsc/private/kspimpl.h [0]PETSC ERROR: #8 KSPInitialResidual() line 67 in /global/u1/j/jychang/Software/petsc/src/ksp/ksp/interface/itres.c [0]PETSC ERROR: #9 KSPSolve_GMRES() line 233 in /global/u1/j/jychang/Software/petsc/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: #10 KSPSolve() line 655 in /global/u1/j/jychang/Software/ petsc/src/ksp/ksp/interface/itfunc.c [0]PETSC ERROR: #11 SNESSolve_NEWTONLS() line 224 in /global/u1/j/jychang/Software/petsc/src/snes/impls/ls/ls.c [0]PETSC ERROR: #12 SNESSolve() line 3967 in /global/u1/j/jychang/Software/ petsc/src/snes/interface/snes.c [0]PETSC ERROR: #13 main() line 1548 in /scratch2/scratchdirs/jychang/ Icesheet/ex48.c 
[0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -M 128 [0]PETSC ERROR: -N 128 [0]PETSC ERROR: -P 8 [0]PETSC ERROR: -da_refine 3 [0]PETSC ERROR: -mg_coarse_pc_type gamg [0]PETSC ERROR: -pc_mg_levels 4 [0]PETSC ERROR: -pc_type mg [0]PETSC ERROR: -thi_mat_type baij [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- If I changed the coarse grid to 129x129x8, no error whatsoever for up to 4 levels of refinement. However, I am having trouble getting this started up on Cori's KNL... I am using a coarse grid 136x136x8 across 1088 cores, and slurm is simply cancelling the job. No other PETSc error was given. This is literally what my log files say: Level 1 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 272 x 272 x 15 (1109760), size (m) 36.7647 x 36.7647 x 71.4286 Level 0 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 136 x 136 x 8 (147968), size (m) 73.5294 x 73.5294 x 142.857 makefile:25: recipe for target 'runcori' failed Level 2 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 544 x 544 x 29 (8582144), size (m) 18.3824 x 18.3824 x 35.7143 Level 1 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 272 x 272 x 15 (1109760), size (m) 36.7647 x 36.7647 x 71.4286 Level 0 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 136 x 136 x 8 (147968), size (m) 73.5294 x 73.5294 x 142.857 srun: error: nid04139: task 480: Killed srun: Terminating job step 4387719.0 srun: Job step aborted: Waiting up to 32 seconds for job step to finish. 
slurmstepd: error: *** STEP 4387719.0 ON nid03873 CANCELLED AT 2017-04-02T22:21:21 *** srun: error: nid03960: task 202: Killed srun: error: nid04005: task 339: Killed srun: error: nid03873: task 32: Killed srun: error: nid03960: task 203: Killed srun: error: nid03873: task 3: Killed srun: error: nid03960: task 199: Killed srun: error: nid04004: task 264: Killed srun: error: nid04141: task 660: Killed srun: error: nid04139: task 539: Killed srun: error: nid03873: task 63: Killed srun: error: nid03960: task 170: Killed srun: error: nid08164: task 821: Killed srun: error: nid04139: task 507: Killed srun: error: nid04005: task 299: Killed srun: error: nid03960: tasks 136-169,171-198,200-201: Killed srun: error: nid04005: task 310: Killed srun: error: nid08166: task 1008: Killed srun: error: nid04141: task 671: Killed srun: error: nid03873: task 18: Killed srun: error: nid04139: tasks 476-479,481-506,508-538,540-543: Killed srun: error: nid04005: tasks 272-298,300-309,311-338: Killed srun: error: nid04140: tasks 544-611: Killed srun: error: nid04142: tasks 680-747: Killed srun: error: nid04138: tasks 408-475: Killed srun: error: nid04006: tasks 340-407: Killed srun: error: nid08163: tasks 748-815: Killed srun: error: nid08166: tasks 952-1007,1009-1019: Killed srun: error: nid03873: tasks 0-2,4-17,19-31,33-62,64-67: Killed srun: error: nid08165: tasks 884-951: Killed srun: error: nid03883: tasks 68-135: Killed srun: error: nid08164: tasks 816-820,822-883: Killed srun: error: nid08167: tasks 1020-1087: Killed srun: error: nid04141: tasks 612-659,661-670,672-679: Killed srun: error: nid04004: tasks 204-263,265-271: Killed make: [runcori] Error 137 (ignored) [257]PETSC ERROR: ------------------------------------------------------------------------ [257]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end [257]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [257]PETSC ERROR: or see 
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[257]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[257]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[257]PETSC ERROR: to get more information on the crash.
[878]PETSC ERROR: ------------------------------------------------------------------------
[878]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[878]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[878]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[878]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[878]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[878]PETSC ERROR: to get more information on the crash.
.... [clipped] ....

my job script for KNL looks like this:

#!/bin/bash
#SBATCH -N 16
#SBATCH -C knl,quad,cache
#SBATCH -p regular
#SBATCH -J knl1024
#SBATCH -L SCRATCH
#SBATCH -o knl1088.o%j
#SBATCH -e knl1088.e%j
#SBATCH --mail-type=ALL
#SBATCH --mail-user=jychang48 at gmail.com
#SBATCH -t 00:20:00

srun -n 1088 -c 4 --cpu_bind=cores ./ex48 ....

Any ideas why this is happening? Or do I need to contact the NERSC folks?

Thanks,
Justin

On Sun, Apr 2, 2017 at 2:15 PM, Matthew Knepley wrote:
> On Sun, Apr 2, 2017 at 2:13 PM, Barry Smith wrote:
>>
>> > On Apr 2, 2017, at 9:25 AM, Justin Chang wrote:
>> >
>> > Thanks guys,
>> >
>> > So I want to run SNES ex48 across 1032 processes on Edison, but I keep getting segmentation violations. These are the parameters I am trying:
>> >
>> > srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg -thi_mat_type baij -mg_coarse_pc_type gamg
>> >
>> > The above works perfectly fine if I used 96 processes. I also tried to use a finer coarse mesh on 1032 but the error persists.
>> > >> > Any ideas why this is happening? What are the ideal parameters to use >> if I want to use 1k+ cores? >> > >> >> Hmm, one should never get segmentation violations. You should only get >> not completely useful error messages about incompatible sizes etc. Send an >> example of the segmentation violations. (I sure hope you are checking the >> error return codes for all functions?). > > > He is just running SNES ex48. > > Matt > > >> >> Barry >> >> > Thanks, >> > Justin >> > >> > On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith >> wrote: >> > >> > > On Mar 31, 2017, at 10:00 AM, Jed Brown wrote: >> > > >> > > Justin Chang writes: >> > > >> > >> Yeah based on my experiments it seems setting pc_mg_levels to >> $DAREFINE + 1 >> > >> has decent performance. >> > >> >> > >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In >> some of >> > >> the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/ >> > >> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide >> 203/227) >> > >> they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran >> this, it >> > >> was almost twice as slow as if $MGLEVELS >= $DAREFINE >> > > >> > > Smaller coarse grids are generally more scalable -- when the problem >> > > data is distributed, multigrid is a good solution algorithm. But if >> > > multigrid stops being effective because it is not preserving >> sufficient >> > > coarse grid accuracy (e.g., for transport-dominated problems in >> > > complicated domains) then you might want to stop early and use a more >> > > robust method (like direct solves). >> > >> > Basically for symmetric positive definite operators you can make the >> coarse problem as small as you like (even 1 point) in theory. For >> indefinite and non-symmetric problems the theory says the "coarse grid must >> be sufficiently fine" (loosely speaking the coarse grid has to resolve the >> eigenmodes for the eigenvalues to the left of the x = 0). 
>> > >> > https://www.jstor.org/stable/2158375?seq=1#page_scan_tab_contents >> > >> > >> > >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Apr 3 00:54:46 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 3 Apr 2017 01:54:46 -0400 Subject: [petsc-users] -snes_mf_operator yields "No support for this operation for this object type" in TS codes? In-Reply-To: References: <877f32qlf1.fsf@jedbrown.org> Message-ID: <61C35066-A114-4C0B-A026-94677F4B3C7B@mcs.anl.gov> Jed, Here is the problem. https://bitbucket.org/petsc/petsc/branch/barry/fix/even-huger-flaw-in-ts > On Apr 2, 2017, at 10:39 PM, Ed Bueler wrote: > > I checked out branch barry/fix-huge-flaw-in-ts (see pull request #655) and reconfigured and rebuilt. > > No, the ts/.../ex2.c example is not fixed. It gives same error: > > ~/petsc/src/ts/examples/tutorials[barry/fix-huge-flaw-in-ts*]$ ./ex2 -ts_dt 0.001 -ts_final_time 0.001 -snes_monitor -snes_mf_operator > 0 SNES Function norm 1.059316422854e+01 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: No support for this operation for this object type > [0]PETSC ERROR: Mat type mffd > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3346-g41a1d4d GIT Date: 2017-03-24 14:39:28 -0500 > [0]PETSC ERROR: ./ex2 on a linux-c-dbg named bueler-leopard by ed Sun Apr 2 18:31:55 2017 > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > [0]PETSC ERROR: #1 MatZeroEntries() line 5471 in /home/ed/petsc/src/mat/interface/matrix.c > [0]PETSC ERROR: #2 TSComputeIJacobian() line 942 in /home/ed/petsc/src/ts/interface/ts.c > [0]PETSC ERROR: #3 SNESTSFormJacobian_Theta() line 515 in /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > [0]PETSC ERROR: #4 SNESTSFormJacobian() line 5055 in /home/ed/petsc/src/ts/interface/ts.c > [0]PETSC ERROR: #5 SNESComputeJacobian() line 2276 in /home/ed/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #6 SNESSolve_NEWTONLS() line 222 in /home/ed/petsc/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #7 SNESSolve() line 3967 in /home/ed/petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #8 TS_SNESSolve() line 171 in /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > [0]PETSC ERROR: #9 TSStep_Theta() line 211 in /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > [0]PETSC ERROR: #10 TSStep() line 3820 in /home/ed/petsc/src/ts/interface/ts.c > [0]PETSC ERROR: #11 TSSolve() line 4065 in /home/ed/petsc/src/ts/interface/ts.c > [0]PETSC ERROR: #12 main() line 194 in /home/ed/petsc/src/ts/examples/tutorials/ex2.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -snes_mf_operator > [0]PETSC ERROR: -snes_monitor > [0]PETSC ERROR: -ts_dt 0.001 > [0]PETSC ERROR: -ts_final_time 0.001 > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 > [unset]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 > > Curiously, my own code (attached), which differs from ex2.c in that it uses DMDA, *does* change behavior to a DIVERGED_NONLINEAR_SOLVE error: > > $ 
./advect -da_refine 0 -ts_monitor -adv_circlewind -adv_conex 0.3 -adv_coney 0.3 -ts_type beuler -snes_monitor_short -ts_final_time 0.01 -ts_dt 0.01 -snes_rtol 1.0e-4 -adv_firstorder -snes_mf_operator > solving on 5 x 5 grid with dx=0.2 x dy=0.2 cells, t0=0., > and initial step dt=0.01 ... > 0 TS dt 0.01 time 0. > 0 SNES Function norm 2.1277 > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: > [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3346-g41a1d4d GIT Date: 2017-03-24 14:39:28 -0500 > [0]PETSC ERROR: ./advect on a linux-c-dbg named bueler-leopard by ed Sun Apr 2 18:34:18 2017 > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > [0]PETSC ERROR: #1 TSStep() line 3829 in /home/ed/petsc/src/ts/interface/ts.c > [0]PETSC ERROR: #2 TSSolve() line 4065 in /home/ed/petsc/src/ts/interface/ts.c > [0]PETSC ERROR: #3 main() line 314 in /home/ed/repos/p4pdes/c/ch9/advect.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -adv_circlewind > [0]PETSC ERROR: -adv_conex 0.3 > [0]PETSC ERROR: -adv_coney 0.3 > [0]PETSC ERROR: -adv_firstorder > [0]PETSC ERROR: -da_refine 0 > [0]PETSC ERROR: -snes_mf_operator > [0]PETSC ERROR: -snes_monitor_short > [0]PETSC ERROR: -snes_rtol 1.0e-4 > [0]PETSC ERROR: -ts_dt 0.01 > [0]PETSC ERROR: -ts_final_time 0.01 > [0]PETSC ERROR: -ts_monitor > [0]PETSC ERROR: -ts_type beuler > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- > application called MPI_Abort(MPI_COMM_WORLD, 91) - process 0 > [unset]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 91) - process 0 > > Previously it gave a "No support for 
this operation for this object type ... Mat type mffd" error like ex2.c. Progress? > > Note that my code works fine (i.e. no errors) with any of: the analytical Jacobian or -snes_fd or -snes_fd_color or -snes_mf. > > Ed > > > > On Sun, Apr 2, 2017 at 6:23 PM, Jed Brown wrote: > The issue is that we need to create > > a*I - Jrhs > > and this is currently done by creating a*I first when we have separate > matrices for the left and right hand sides. There is code to just scale > and shift Jrhs when there is no IJacobian, but the creation logic got > messed up at some point (or at least for some TS configurations that are > common). We were discussing this recently in a performance context and > this branch is supposed to fix that logic. Does this branch work for > you? > > https://bitbucket.org/petsc/petsc/pull-requests/655/fix-flaw-with-tssetrhsjacobian-and-no/diff > > Ed Bueler writes: > > > Dear PETSc -- > > > > I have a TS-using and DMDA-using code in which I want to set a RHSJacobian > > which is only approximate. (The Jacobian uses first-order upwinding MOL > > while the RHSFunction uses a flux-limited MOL.) While it works with the > > analytical Jacobian, and -snes_fd, and -snes_fd_color, and -snes_mf, I get > > a "No support ..." message for -snes_mf_operator. > > > > It suffices to show the problem with > > > > src/ts/examples/tutorials/ex2.c > > > > (Note ex2.c does not use DMDA so ..._color is not available for ex2.c.) 
> > > > Error as follows: > > > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > > -ts_final_time 0.001 -snes_monitor > > 0 SNES Function norm 1.059316422854e+01 > > 1 SNES Function norm 1.035505461114e-05 > > 2 SNES Function norm 5.498223366328e-12 > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > > -ts_final_time 0.001 -snes_monitor -snes_fd > > 0 SNES Function norm 1.059316422854e+01 > > 1 SNES Function norm 1.208245988550e-05 > > 2 SNES Function norm 6.022374930788e-12 > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > > -ts_final_time 0.001 -snes_monitor -snes_mf > > 0 SNES Function norm 1.059316422854e+01 > > 1 SNES Function norm 6.136984336801e-05 > > 2 SNES Function norm 5.355730806625e-10 > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > > -ts_final_time 0.001 -snes_monitor -snes_mf_operator > > 0 SNES Function norm 1.059316422854e+01 > > [0]PETSC ERROR: --------------------- Error Message > > -------------------------------------------------------------- > > [0]PETSC ERROR: No support for this operation for this object type > > [0]PETSC ERROR: Mat type mffd > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for > > trouble shooting. 
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3426-g0c7851c GIT > > Date: 2017-04-01 18:40:06 -0600 > > [0]PETSC ERROR: ./ex2 on a linux-c-dbg named ed-lemur by ed Sun Apr 2 > > 17:02:47 2017 > > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > > [0]PETSC ERROR: #1 MatZeroEntries() line 5471 in > > /home/ed/petsc/src/mat/interface/matrix.c > > [0]PETSC ERROR: #2 TSComputeIJacobian() line 965 in > > /home/ed/petsc/src/ts/interface/ts.c > > [0]PETSC ERROR: #3 SNESTSFormJacobian_Theta() line 515 in > > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > > [0]PETSC ERROR: #4 SNESTSFormJacobian() line 5078 in > > /home/ed/petsc/src/ts/interface/ts.c > > [0]PETSC ERROR: #5 SNESComputeJacobian() line 2276 in > > /home/ed/petsc/src/snes/interface/snes.c > > [0]PETSC ERROR: #6 SNESSolve_NEWTONLS() line 222 in > > /home/ed/petsc/src/snes/impls/ls/ls.c > > [0]PETSC ERROR: #7 SNESSolve() line 3967 in > > /home/ed/petsc/src/snes/interface/snes.c > > [0]PETSC ERROR: #8 TS_SNESSolve() line 171 in > > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > > [0]PETSC ERROR: #9 TSStep_Theta() line 211 in > > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > > [0]PETSC ERROR: #10 TSStep() line 3843 in > > /home/ed/petsc/src/ts/interface/ts.c > > [0]PETSC ERROR: #11 TSSolve() line 4088 in > > /home/ed/petsc/src/ts/interface/ts.c > > [0]PETSC ERROR: #12 main() line 194 in > > /home/ed/petsc/src/ts/examples/tutorials/ex2.c > > [0]PETSC ERROR: PETSc Option Table entries: > > [0]PETSC ERROR: -snes_mf_operator > > [0]PETSC ERROR: -snes_monitor > > [0]PETSC ERROR: -ts_dt 0.001 > > [0]PETSC ERROR: -ts_final_time 0.001 > > [0]PETSC ERROR: ----------------End of Error Message -------send entire > > error message to petsc-maint at mcs.anl.gov---------- > > application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 > > [unset]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 > > > > Is this intended or a bug? 
If it is intended, what is the issue? > > > > I am using current master branch (commit > > 0c7851c55cba8e40da5083f79ba1ff846acd45b2). > > > > Thanks for your help and awesome library! > > > > Ed > > > > > > > > > > > > -- > > Ed Bueler > > Dept of Math and Stat and Geophysical Institute > > University of Alaska Fairbanks > > Fairbanks, AK 99775-6660 > > 301C Chapman > > > > -- > Ed Bueler > Dept of Math and Stat and Geophysical Institute > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 301C Chapman > From elbueler at alaska.edu Mon Apr 3 01:04:27 2017 From: elbueler at alaska.edu (Ed Bueler) Date: Sun, 2 Apr 2017 22:04:27 -0800 Subject: [petsc-users] -snes_mf_operator yields "No support for this operation for this object type" in TS codes? In-Reply-To: <61C35066-A114-4C0B-A026-94677F4B3C7B@mcs.anl.gov> References: <877f32qlf1.fsf@jedbrown.org> <61C35066-A114-4C0B-A026-94677F4B3C7B@mcs.anl.gov> Message-ID: Works for my code and ts/../ex2.c ... as you probably know. Ed On Sun, Apr 2, 2017 at 9:54 PM, Barry Smith wrote: > > Jed, > > Here is the problem. > > https://bitbucket.org/petsc/petsc/branch/barry/fix/even-huger-flaw-in-ts > > > > On Apr 2, 2017, at 10:39 PM, Ed Bueler wrote: > > > > I checked out branch barry/fix-huge-flaw-in-ts (see pull request #655) > and reconfigured and rebuilt. > > > > No, the ts/.../ex2.c example is not fixed. It gives same error: > > > > ~/petsc/src/ts/examples/tutorials[barry/fix-huge-flaw-in-ts*]$ ./ex2 > -ts_dt 0.001 -ts_final_time 0.001 -snes_monitor -snes_mf_operator > > 0 SNES Function norm 1.059316422854e+01 > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: No support for this operation for this object type > > [0]PETSC ERROR: Mat type mffd > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. 
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3346-g41a1d4d > GIT Date: 2017-03-24 14:39:28 -0500 > > [0]PETSC ERROR: ./ex2 on a linux-c-dbg named bueler-leopard by ed Sun > Apr 2 18:31:55 2017 > > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > > [0]PETSC ERROR: #1 MatZeroEntries() line 5471 in /home/ed/petsc/src/mat/ > interface/matrix.c > > [0]PETSC ERROR: #2 TSComputeIJacobian() line 942 in > /home/ed/petsc/src/ts/interface/ts.c > > [0]PETSC ERROR: #3 SNESTSFormJacobian_Theta() line 515 in > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > > [0]PETSC ERROR: #4 SNESTSFormJacobian() line 5055 in > /home/ed/petsc/src/ts/interface/ts.c > > [0]PETSC ERROR: #5 SNESComputeJacobian() line 2276 in > /home/ed/petsc/src/snes/interface/snes.c > > [0]PETSC ERROR: #6 SNESSolve_NEWTONLS() line 222 in > /home/ed/petsc/src/snes/impls/ls/ls.c > > [0]PETSC ERROR: #7 SNESSolve() line 3967 in /home/ed/petsc/src/snes/ > interface/snes.c > > [0]PETSC ERROR: #8 TS_SNESSolve() line 171 in > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > > [0]PETSC ERROR: #9 TSStep_Theta() line 211 in > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > > [0]PETSC ERROR: #10 TSStep() line 3820 in /home/ed/petsc/src/ts/ > interface/ts.c > > [0]PETSC ERROR: #11 TSSolve() line 4065 in /home/ed/petsc/src/ts/ > interface/ts.c > > [0]PETSC ERROR: #12 main() line 194 in /home/ed/petsc/src/ts/ > examples/tutorials/ex2.c > > [0]PETSC ERROR: PETSc Option Table entries: > > [0]PETSC ERROR: -snes_mf_operator > > [0]PETSC ERROR: -snes_monitor > > [0]PETSC ERROR: -ts_dt 0.001 > > [0]PETSC ERROR: -ts_final_time 0.001 > > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > > application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 > > [unset]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 > > > > Curiously, my own code (attached), which differs from ex2.c in 
that it > uses DMDA, *does* change behavior to a DIVERGED_NONLINEAR_SOLVE error: > > > > $ ./advect -da_refine 0 -ts_monitor -adv_circlewind -adv_conex 0.3 > -adv_coney 0.3 -ts_type beuler -snes_monitor_short -ts_final_time 0.01 > -ts_dt 0.01 -snes_rtol 1.0e-4 -adv_firstorder -snes_mf_operator > > solving on 5 x 5 grid with dx=0.2 x dy=0.2 cells, t0=0., > > and initial step dt=0.01 ... > > 0 TS dt 0.01 time 0. > > 0 SNES Function norm 2.1277 > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: > > [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, > increase -ts_max_snes_failures or make negative to attempt recovery > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3346-g41a1d4d > GIT Date: 2017-03-24 14:39:28 -0500 > > [0]PETSC ERROR: ./advect on a linux-c-dbg named bueler-leopard by ed Sun > Apr 2 18:34:18 2017 > > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > > [0]PETSC ERROR: #1 TSStep() line 3829 in /home/ed/petsc/src/ts/ > interface/ts.c > > [0]PETSC ERROR: #2 TSSolve() line 4065 in /home/ed/petsc/src/ts/ > interface/ts.c > > [0]PETSC ERROR: #3 main() line 314 in /home/ed/repos/p4pdes/c/ch9/ > advect.c > > [0]PETSC ERROR: PETSc Option Table entries: > > [0]PETSC ERROR: -adv_circlewind > > [0]PETSC ERROR: -adv_conex 0.3 > > [0]PETSC ERROR: -adv_coney 0.3 > > [0]PETSC ERROR: -adv_firstorder > > [0]PETSC ERROR: -da_refine 0 > > [0]PETSC ERROR: -snes_mf_operator > > [0]PETSC ERROR: -snes_monitor_short > > [0]PETSC ERROR: -snes_rtol 1.0e-4 > > [0]PETSC ERROR: -ts_dt 0.01 > > [0]PETSC ERROR: -ts_final_time 0.01 > > [0]PETSC ERROR: -ts_monitor > > [0]PETSC ERROR: -ts_type beuler > > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > > application called 
MPI_Abort(MPI_COMM_WORLD, 91) - process 0 > > [unset]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 91) - process 0 > > > > Previously it gave a "No support for this operation for this object type > ... Mat type mffd" error like ex2.c. Progress? > > > > Note that my code works fine (i.e. no errors) with any of: the > analytical Jacobian or -snes_fd or -snes_fd_color or -snes_mf. > > > > Ed > > > > > > > > On Sun, Apr 2, 2017 at 6:23 PM, Jed Brown wrote: > > The issue is that we need to create > > > > a*I - Jrhs > > > > and this is currently done by creating a*I first when we have separate > > matrices for the left and right hand sides. There is code to just scale > > and shift Jrhs when there is no IJacobian, but the creation logic got > > messed up at some point (or at least for some TS configurations that are > > common). We were discussing this recently in a performance context and > > this branch is supposed to fix that logic. Does this branch work for > > you? > > > > https://bitbucket.org/petsc/petsc/pull-requests/655/fix- > flaw-with-tssetrhsjacobian-and-no/diff > > > > Ed Bueler writes: > > > > > Dear PETSc -- > > > > > > I have a TS-using and DMDA-using code in which I want to set a > RHSJacobian > > > which is only approximate. (The Jacobian uses first-order upwinding > MOL > > > while the RHSFunction uses a flux-limited MOL.) While it works with > the > > > analytical Jacobian, and -snes_fd, and -snes_fd_color, and -snes_mf, I > get > > > a "No support ..." message for -snes_mf_operator. > > > > > > It suffices to show the problem with > > > > > > src/ts/examples/tutorials/ex2.c > > > > > > (Note ex2.c does not use DMDA so ..._color is not available for ex2.c.) 
> > > > > > Error as follows: > > > > > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > > > -ts_final_time 0.001 -snes_monitor > > > 0 SNES Function norm 1.059316422854e+01 > > > 1 SNES Function norm 1.035505461114e-05 > > > 2 SNES Function norm 5.498223366328e-12 > > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > > > -ts_final_time 0.001 -snes_monitor -snes_fd > > > 0 SNES Function norm 1.059316422854e+01 > > > 1 SNES Function norm 1.208245988550e-05 > > > 2 SNES Function norm 6.022374930788e-12 > > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > > > -ts_final_time 0.001 -snes_monitor -snes_mf > > > 0 SNES Function norm 1.059316422854e+01 > > > 1 SNES Function norm 6.136984336801e-05 > > > 2 SNES Function norm 5.355730806625e-10 > > > ~/petsc/src/ts/examples/tutorials[master*]$ ./ex2 -ts_dt 0.001 > > > -ts_final_time 0.001 -snes_monitor -snes_mf_operator > > > 0 SNES Function norm 1.059316422854e+01 > > > [0]PETSC ERROR: --------------------- Error Message > > > -------------------------------------------------------------- > > > [0]PETSC ERROR: No support for this operation for this object type > > > [0]PETSC ERROR: Mat type mffd > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/ > documentation/faq.html for > > > trouble shooting. 
> > > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3426-g0c7851c > GIT > > > Date: 2017-04-01 18:40:06 -0600 > > > [0]PETSC ERROR: ./ex2 on a linux-c-dbg named ed-lemur by ed Sun Apr 2 > > > 17:02:47 2017 > > > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > > > [0]PETSC ERROR: #1 MatZeroEntries() line 5471 in > > > /home/ed/petsc/src/mat/interface/matrix.c > > > [0]PETSC ERROR: #2 TSComputeIJacobian() line 965 in > > > /home/ed/petsc/src/ts/interface/ts.c > > > [0]PETSC ERROR: #3 SNESTSFormJacobian_Theta() line 515 in > > > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > > > [0]PETSC ERROR: #4 SNESTSFormJacobian() line 5078 in > > > /home/ed/petsc/src/ts/interface/ts.c > > > [0]PETSC ERROR: #5 SNESComputeJacobian() line 2276 in > > > /home/ed/petsc/src/snes/interface/snes.c > > > [0]PETSC ERROR: #6 SNESSolve_NEWTONLS() line 222 in > > > /home/ed/petsc/src/snes/impls/ls/ls.c > > > [0]PETSC ERROR: #7 SNESSolve() line 3967 in > > > /home/ed/petsc/src/snes/interface/snes.c > > > [0]PETSC ERROR: #8 TS_SNESSolve() line 171 in > > > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > > > [0]PETSC ERROR: #9 TSStep_Theta() line 211 in > > > /home/ed/petsc/src/ts/impls/implicit/theta/theta.c > > > [0]PETSC ERROR: #10 TSStep() line 3843 in > > > /home/ed/petsc/src/ts/interface/ts.c > > > [0]PETSC ERROR: #11 TSSolve() line 4088 in > > > /home/ed/petsc/src/ts/interface/ts.c > > > [0]PETSC ERROR: #12 main() line 194 in > > > /home/ed/petsc/src/ts/examples/tutorials/ex2.c > > > [0]PETSC ERROR: PETSc Option Table entries: > > > [0]PETSC ERROR: -snes_mf_operator > > > [0]PETSC ERROR: -snes_monitor > > > [0]PETSC ERROR: -ts_dt 0.001 > > > [0]PETSC ERROR: -ts_final_time 0.001 > > > [0]PETSC ERROR: ----------------End of Error Message -------send entire > > > error message to petsc-maint at mcs.anl.gov---------- > > > application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0 > > > [unset]: aborting job: > > > application called 
MPI_Abort(MPI_COMM_WORLD, 56) - process 0 > > > > > > Is this intended or a bug? If it is intended, what is the issue? > > > > > > I am using current master branch (commit > > > 0c7851c55cba8e40da5083f79ba1ff846acd45b2). > > > > > > Thanks for your help and awesome library! > > > > > > Ed > > > > > > > > > > > > > > > > > > -- > > > Ed Bueler > > > Dept of Math and Stat and Geophysical Institute > > > University of Alaska Fairbanks > > > Fairbanks, AK 99775-6660 > > > 301C Chapman > > > > > > > > -- > > Ed Bueler > > Dept of Math and Stat and Geophysical Institute > > University of Alaska Fairbanks > > Fairbanks, AK 99775-6660 > > 301C Chapman > > > > -- Ed Bueler Dept of Math and Stat and Geophysical Institute University of Alaska Fairbanks Fairbanks, AK 99775-6660 301C Chapman From jychang48 at gmail.com Mon Apr 3 03:41:43 2017 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 3 Apr 2017 03:41:43 -0500 Subject: [petsc-users] Correlation between da_refine and pg_mg_levels In-Reply-To: References: <87y3vlmqye.fsf@jedbrown.org> <3FA19A8A-CED7-4D65-9A4C-03D10CCBF3EF@mcs.anl.gov> Message-ID: I fixed the KNL issue - apparently "larger" jobs need to have the executable copied into the /tmp directory to speed up loading/startup time so I did that instead of executing the program via makefile. On Mon, Apr 3, 2017 at 12:45 AM, Justin Chang wrote: > So if I begin with a 128x128x8 grid on 1032 procs, it works fine for the > first two levels of da_refine. 
However, on the third level I get this error: > > Level 3 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 1024 > x 1024 x 57 (59768832), size (m) 9.76562 x 9.76562 x 17.8571 > Level 2 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 512 x > 512 x 29 (7602176), size (m) 19.5312 x 19.5312 x 35.7143 > Level 1 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 256 x > 256 x 15 (983040), size (m) 39.0625 x 39.0625 x 71.4286 > Level 0 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 128 x > 128 x 8 (131072), size (m) 78.125 x 78.125 x 142.857 > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: Eigen estimator failed: DIVERGED_NANORINF at iteration 0 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3418-ge372536 GIT > Date: 2017-03-30 13:35:15 -0500 > [0]PETSC ERROR: /scratch2/scratchdirs/jychang/Icesheet/./ex48edison on a > arch-edison-c-opt named nid00865 by jychang Sun Apr 2 21:44:44 2017 > [0]PETSC ERROR: Configure options --download-fblaslapack --with-cc=cc > --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 > --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 > --with-mpiexec=srun --with-64-bit-indices=1 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 > FOPTFLAGS=-O3 PETSC_ARCH=arch-edison-c-opt > [0]PETSC ERROR: #1 KSPSolve_Chebyshev() line 380 in > /global/u1/j/jychang/Software/petsc/src/ksp/ksp/impls/cheby/cheby.c > [0]PETSC ERROR: #2 KSPSolve() line 655 in /global/u1/j/jychang/Software/ > petsc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #3 PCMGMCycle_Private() line 19 in > /global/u1/j/jychang/Software/petsc/src/ksp/pc/impls/mg/mg.c > [0]PETSC ERROR: #4 PCMGMCycle_Private() line 53 in > /global/u1/j/jychang/Software/petsc/src/ksp/pc/impls/mg/mg.c > [0]PETSC ERROR: #5 PCApply_MG() line 331 in 
/global/u1/j/jychang/Software/ > petsc/src/ksp/pc/impls/mg/mg.c > [0]PETSC ERROR: #6 PCApply() line 458 in /global/u1/j/jychang/Software/ > petsc/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #7 KSP_PCApply() line 251 in /global/homes/j/jychang/Softwa > re/petsc/include/petsc/private/kspimpl.h > [0]PETSC ERROR: #8 KSPInitialResidual() line 67 in > /global/u1/j/jychang/Software/petsc/src/ksp/ksp/interface/itres.c > [0]PETSC ERROR: #9 KSPSolve_GMRES() line 233 in > /global/u1/j/jychang/Software/petsc/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: #10 KSPSolve() line 655 in /global/u1/j/jychang/Software/ > petsc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #11 SNESSolve_NEWTONLS() line 224 in > /global/u1/j/jychang/Software/petsc/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #12 SNESSolve() line 3967 in /global/u1/j/jychang/Software/ > petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #13 main() line 1548 in /scratch2/scratchdirs/jychang/ > Icesheet/ex48.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -M 128 > [0]PETSC ERROR: -N 128 > [0]PETSC ERROR: -P 8 > [0]PETSC ERROR: -da_refine 3 > [0]PETSC ERROR: -mg_coarse_pc_type gamg > [0]PETSC ERROR: -pc_mg_levels 4 > [0]PETSC ERROR: -pc_type mg > [0]PETSC ERROR: -thi_mat_type baij > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > > If I changed the coarse grid to 129x129x8, no error whatsoever for up to 4 > levels of refinement. > > However, I am having trouble getting this started up on Cori's KNL... > > I am using a coarse grid 136x136x8 across 1088 cores, and slurm is simply > cancelling the job. No other PETSc error was given. 
This is literally what > my log files say: > > Level 1 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 272 x > 272 x 15 (1109760), size (m) 36.7647 x 36.7647 x 71.4286 > Level 0 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 136 x > 136 x 8 (147968), size (m) 73.5294 x 73.5294 x 142.857 > makefile:25: recipe for target 'runcori' failed > Level 2 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 544 x > 544 x 29 (8582144), size (m) 18.3824 x 18.3824 x 35.7143 > Level 1 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 272 x > 272 x 15 (1109760), size (m) 36.7647 x 36.7647 x 71.4286 > Level 0 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 136 x > 136 x 8 (147968), size (m) 73.5294 x 73.5294 x 142.857 > srun: error: nid04139: task 480: Killed > srun: Terminating job step 4387719.0 > srun: Job step aborted: Waiting up to 32 seconds for job step to finish. > slurmstepd: error: *** STEP 4387719.0 ON nid03873 CANCELLED AT > 2017-04-02T22:21:21 *** > srun: error: nid03960: task 202: Killed > srun: error: nid04005: task 339: Killed > srun: error: nid03873: task 32: Killed > srun: error: nid03960: task 203: Killed > srun: error: nid03873: task 3: Killed > srun: error: nid03960: task 199: Killed > srun: error: nid04004: task 264: Killed > srun: error: nid04141: task 660: Killed > srun: error: nid04139: task 539: Killed > srun: error: nid03873: task 63: Killed > srun: error: nid03960: task 170: Killed > srun: error: nid08164: task 821: Killed > srun: error: nid04139: task 507: Killed > srun: error: nid04005: task 299: Killed > srun: error: nid03960: tasks 136-169,171-198,200-201: Killed > srun: error: nid04005: task 310: Killed > srun: error: nid08166: task 1008: Killed > srun: error: nid04141: task 671: Killed > srun: error: nid03873: task 18: Killed > srun: error: nid04139: tasks 476-479,481-506,508-538,540-543: Killed > srun: error: nid04005: tasks 272-298,300-309,311-338: Killed > srun: error: nid04140: tasks 544-611: Killed > srun: error: nid04142: tasks 
680-747: Killed > srun: error: nid04138: tasks 408-475: Killed > srun: error: nid04006: tasks 340-407: Killed > srun: error: nid08163: tasks 748-815: Killed > srun: error: nid08166: tasks 952-1007,1009-1019: Killed > srun: error: nid03873: tasks 0-2,4-17,19-31,33-62,64-67: Killed > srun: error: nid08165: tasks 884-951: Killed > srun: error: nid03883: tasks 68-135: Killed > srun: error: nid08164: tasks 816-820,822-883: Killed > srun: error: nid08167: tasks 1020-1087: Killed > srun: error: nid04141: tasks 612-659,661-670,672-679: Killed > srun: error: nid04004: tasks 204-263,265-271: Killed > make: [runcori] Error 137 (ignored) > [257]PETSC ERROR: ------------------------------ > ------------------------------------------ > [257]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > [257]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [257]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/ > documentation/faq.html#valgrind > [257]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [257]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [257]PETSC ERROR: to get more information on the crash. > [878]PETSC ERROR: ------------------------------ > ------------------------------------------ > [878]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > [878]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > [878]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/ > documentation/faq.html#valgrind > [878]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > [878]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [878]PETSC ERROR: to get more information on the crash. > .... > [clipped] > .... 
> > > > my job script for KNL looks like this: > > #!/bin/bash > #SBATCH -N 16 > #SBATCH -C knl,quad,cache > #SBATCH -p regular > #SBATCH -J knl1024 > #SBATCH -L SCRATCH > #SBATCH -o knl1088.o%j > #SBATCH -e knl1088.e%j > #SBATCH --mail-type=ALL > #SBATCH --mail-user=jychang48 at gmail.com > #SBATCH -t 00:20:00 > > srun -n 1088 -c 4 --cpu_bind=cores ./ex48 .... > > Any ideas why this is happening? Or do I need to contact the NERSC folks? > > Thanks, > Justin > > On Sun, Apr 2, 2017 at 2:15 PM, Matthew Knepley wrote: > >> On Sun, Apr 2, 2017 at 2:13 PM, Barry Smith wrote: >> >>> >>> > On Apr 2, 2017, at 9:25 AM, Justin Chang wrote: >>> > >>> > Thanks guys, >>> > >>> > So I want to run SNES ex48 across 1032 processes on Edison, but I keep >>> getting segmentation violations. These are the parameters I am trying: >>> > >>> > srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg >>> -thi_mat_type baij -mg_coarse_pc_type gamg >>> > >>> > The above works perfectly fine if I used 96 processes. I also tried to >>> use a finer coarse mesh on 1032 but the error persists. >>> > >>> > Any ideas why this is happening? What are the ideal parameters to use >>> if I want to use 1k+ cores? >>> > >>> >>> Hmm, one should never get segmentation violations. You should only >>> get not completely useful error messages about incompatible sizes etc. Send >>> an example of the segmentation violations. (I sure hope you are checking >>> the error return codes for all functions?). >> >> >> He is just running SNES ex48. >> >> Matt >> >> >>> >>> Barry >>> >>> > Thanks, >>> > Justin >>> > >>> > On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith >>> wrote: >>> > >>> > > On Mar 31, 2017, at 10:00 AM, Jed Brown wrote: >>> > > >>> > > Justin Chang writes: >>> > > >>> > >> Yeah based on my experiments it seems setting pc_mg_levels to >>> $DAREFINE + 1 >>> > >> has decent performance. >>> > >> >>> > >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? 
In >>> some of >>> > >> the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/ >>> > >> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide >>> 203/227) >>> > >> they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran >>> this, it >>> > >> was almost twice as slow as if $MGLEVELS >= $DAREFINE >>> > > >>> > > Smaller coarse grids are generally more scalable -- when the problem >>> > > data is distributed, multigrid is a good solution algorithm. But if >>> > > multigrid stops being effective because it is not preserving >>> sufficient >>> > > coarse grid accuracy (e.g., for transport-dominated problems in >>> > > complicated domains) then you might want to stop early and use a more >>> > > robust method (like direct solves). >>> > >>> > Basically for symmetric positive definite operators you can make the >>> coarse problem as small as you like (even 1 point) in theory. For >>> indefinite and non-symmetric problems the theory says the "coarse grid must >>> be sufficiently fine" (loosely speaking the coarse grid has to resolve the >>> eigenmodes for the eigenvalues to the left of the x = 0). >>> > >>> > https://www.jstor.org/stable/2158375?seq=1#page_scan_tab_contents >>> > >>> > >>> > >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Mon Apr 3 06:11:38 2017 From: jed at jedbrown.org (Jed Brown) Date: Mon, 03 Apr 2017 05:11:38 -0600 Subject: [petsc-users] Correlation between da_refine and pg_mg_levels In-Reply-To: References: <87y3vlmqye.fsf@jedbrown.org> <3FA19A8A-CED7-4D65-9A4C-03D10CCBF3EF@mcs.anl.gov> Message-ID: <87vaqlpwyd.fsf@jedbrown.org> Justin Chang writes: > So if I begin with a 128x128x8 grid on 1032 procs, it works fine for the > first two levels of da_refine. 
However, on the third level I get this error: > > Level 3 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 1024 x > 1024 x 57 (59768832), size (m) 9.76562 x 9.76562 x 17.8571 > Level 2 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 512 x > 512 x 29 (7602176), size (m) 19.5312 x 19.5312 x 35.7143 > Level 1 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 256 x > 256 x 15 (983040), size (m) 39.0625 x 39.0625 x 71.4286 > Level 0 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 128 x > 128 x 8 (131072), size (m) 78.125 x 78.125 x 142.857 > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: Eigen estimator failed: DIVERGED_NANORINF at iteration 0 Building with debugging and adding -fp_trap to get a stack trace would be really useful. Or reproducing at smaller scale. > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. 
> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3418-ge372536 GIT > Date: 2017-03-30 13:35:15 -0500 > [0]PETSC ERROR: /scratch2/scratchdirs/jychang/Icesheet/./ex48edison on a > arch-edison-c-opt named nid00865 by jychang Sun Apr 2 21:44:44 2017 > [0]PETSC ERROR: Configure options --download-fblaslapack --with-cc=cc > --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 > --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 > --with-mpiexec=srun --with-64-bit-indices=1 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 > FOPTFLAGS=-O3 PETSC_ARCH=arch-edison-c-opt > [0]PETSC ERROR: #1 KSPSolve_Chebyshev() line 380 in > /global/u1/j/jychang/Software/petsc/src/ksp/ksp/impls/cheby/cheby.c > [0]PETSC ERROR: #2 KSPSolve() line 655 in /global/u1/j/jychang/Software/ > petsc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #3 PCMGMCycle_Private() line 19 in > /global/u1/j/jychang/Software/petsc/src/ksp/pc/impls/mg/mg.c > [0]PETSC ERROR: #4 PCMGMCycle_Private() line 53 in > /global/u1/j/jychang/Software/petsc/src/ksp/pc/impls/mg/mg.c > [0]PETSC ERROR: #5 PCApply_MG() line 331 in /global/u1/j/jychang/Software/ > petsc/src/ksp/pc/impls/mg/mg.c > [0]PETSC ERROR: #6 PCApply() line 458 in /global/u1/j/jychang/Software/ > petsc/src/ksp/pc/interface/precon.c > [0]PETSC ERROR: #7 KSP_PCApply() line 251 in /global/homes/j/jychang/ > Software/petsc/include/petsc/private/kspimpl.h > [0]PETSC ERROR: #8 KSPInitialResidual() line 67 in > /global/u1/j/jychang/Software/petsc/src/ksp/ksp/interface/itres.c > [0]PETSC ERROR: #9 KSPSolve_GMRES() line 233 in > /global/u1/j/jychang/Software/petsc/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: #10 KSPSolve() line 655 in /global/u1/j/jychang/Software/ > petsc/src/ksp/ksp/interface/itfunc.c > [0]PETSC ERROR: #11 SNESSolve_NEWTONLS() line 224 in > /global/u1/j/jychang/Software/petsc/src/snes/impls/ls/ls.c > [0]PETSC ERROR: #12 SNESSolve() line 3967 in /global/u1/j/jychang/Software/ > petsc/src/snes/interface/snes.c > [0]PETSC ERROR: #13 
main() line 1548 in /scratch2/scratchdirs/jychang/ > Icesheet/ex48.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -M 128 > [0]PETSC ERROR: -N 128 > [0]PETSC ERROR: -P 8 > [0]PETSC ERROR: -da_refine 3 > [0]PETSC ERROR: -mg_coarse_pc_type gamg > [0]PETSC ERROR: -pc_mg_levels 4 > [0]PETSC ERROR: -pc_type mg > [0]PETSC ERROR: -thi_mat_type baij > [0]PETSC ERROR: ----------------End of Error Message -------send entire > error message to petsc-maint at mcs.anl.gov---------- > > If I changed the coarse grid to 129x129x8, no error whatsoever for up to 4 > levels of refinement. > > However, I am having trouble getting this started up on Cori's KNL... > > I am using a coarse grid 136x136x8 across 1088 cores, and slurm is simply > cancelling the job. No other PETSc error was given. This is literally what > my log files say: > > Level 1 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 272 x > 272 x 15 (1109760), size (m) 36.7647 x 36.7647 x 71.4286 > Level 0 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 136 x > 136 x 8 (147968), size (m) 73.5294 x 73.5294 x 142.857 Why are levels 1 and 0 printed above, then 2,1,0 below. > makefile:25: recipe for target 'runcori' failed What is this makefile message doing? > Level 2 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 544 x > 544 x 29 (8582144), size (m) 18.3824 x 18.3824 x 35.7143 > Level 1 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 272 x > 272 x 15 (1109760), size (m) 36.7647 x 36.7647 x 71.4286 > Level 0 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 136 x > 136 x 8 (147968), size (m) 73.5294 x 73.5294 x 142.857 > srun: error: nid04139: task 480: Killed > srun: Terminating job step 4387719.0 > srun: Job step aborted: Waiting up to 32 seconds for job step to finish. 
> slurmstepd: error: *** STEP 4387719.0 ON nid03873 CANCELLED AT > 2017-04-02T22:21:21 *** > srun: error: nid03960: task 202: Killed > srun: error: nid04005: task 339: Killed > srun: error: nid03873: task 32: Killed > srun: error: nid03960: task 203: Killed > srun: error: nid03873: task 3: Killed > srun: error: nid03960: task 199: Killed > srun: error: nid04004: task 264: Killed > srun: error: nid04141: task 660: Killed > srun: error: nid04139: task 539: Killed > srun: error: nid03873: task 63: Killed > srun: error: nid03960: task 170: Killed > srun: error: nid08164: task 821: Killed > srun: error: nid04139: task 507: Killed > srun: error: nid04005: task 299: Killed > srun: error: nid03960: tasks 136-169,171-198,200-201: Killed > srun: error: nid04005: task 310: Killed > srun: error: nid08166: task 1008: Killed > srun: error: nid04141: task 671: Killed > srun: error: nid03873: task 18: Killed > srun: error: nid04139: tasks 476-479,481-506,508-538,540-543: Killed > srun: error: nid04005: tasks 272-298,300-309,311-338: Killed > srun: error: nid04140: tasks 544-611: Killed > srun: error: nid04142: tasks 680-747: Killed > srun: error: nid04138: tasks 408-475: Killed > srun: error: nid04006: tasks 340-407: Killed > srun: error: nid08163: tasks 748-815: Killed > srun: error: nid08166: tasks 952-1007,1009-1019: Killed > srun: error: nid03873: tasks 0-2,4-17,19-31,33-62,64-67: Killed > srun: error: nid08165: tasks 884-951: Killed > srun: error: nid03883: tasks 68-135: Killed > srun: error: nid08164: tasks 816-820,822-883: Killed > srun: error: nid08167: tasks 1020-1087: Killed > srun: error: nid04141: tasks 612-659,661-670,672-679: Killed > srun: error: nid04004: tasks 204-263,265-271: Killed > make: [runcori] Error 137 (ignored) > [257]PETSC ERROR: > ------------------------------------------------------------------------ > [257]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > [257]PETSC ERROR: Try option 
-start_in_debugger or -on_error_attach_debugger > [257]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [257]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [257]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [257]PETSC ERROR: to get more information on the crash. > [878]PETSC ERROR: > ------------------------------------------------------------------------ > [878]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > [878]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [878]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [878]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [878]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [878]PETSC ERROR: to get more information on the crash. > .... > [clipped] > .... > > > > my job script for KNL looks like this: > > #!/bin/bash > #SBATCH -N 16 > #SBATCH -C knl,quad,cache > #SBATCH -p regular > #SBATCH -J knl1024 > #SBATCH -L SCRATCH > #SBATCH -o knl1088.o%j > #SBATCH -e knl1088.e%j > #SBATCH --mail-type=ALL > #SBATCH --mail-user=jychang48 at gmail.com > #SBATCH -t 00:20:00 > > srun -n 1088 -c 4 --cpu_bind=cores ./ex48 .... > > Any ideas why this is happening? Or do I need to contact the NERSC folks? > > Thanks, > Justin > > On Sun, Apr 2, 2017 at 2:15 PM, Matthew Knepley wrote: > >> On Sun, Apr 2, 2017 at 2:13 PM, Barry Smith wrote: >> >>> >>> > On Apr 2, 2017, at 9:25 AM, Justin Chang wrote: >>> > >>> > Thanks guys, >>> > >>> > So I want to run SNES ex48 across 1032 processes on Edison, but I keep >>> getting segmentation violations. 
These are the parameters I am trying: >>> > >>> > srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg >>> -thi_mat_type baij -mg_coarse_pc_type gamg >>> > >>> > The above works perfectly fine if I used 96 processes. I also tried to >>> use a finer coarse mesh on 1032 but the error persists. >>> > >>> > Any ideas why this is happening? What are the ideal parameters to use >>> if I want to use 1k+ cores? >>> > >>> >>> Hmm, one should never get segmentation violations. You should only get >>> not completely useful error messages about incompatible sizes etc. Send an >>> example of the segmentation violations. (I sure hope you are checking the >>> error return codes for all functions?). >> >> >> He is just running SNES ex48. >> >> Matt >> >> >>> >>> Barry >>> >>> > Thanks, >>> > Justin >>> > >>> > On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith >>> wrote: >>> > >>> > > On Mar 31, 2017, at 10:00 AM, Jed Brown wrote: >>> > > >>> > > Justin Chang writes: >>> > > >>> > >> Yeah based on my experiments it seems setting pc_mg_levels to >>> $DAREFINE + 1 >>> > >> has decent performance. >>> > >> >>> > >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In >>> some of >>> > >> the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/ >>> > >> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide >>> 203/227) >>> > >> they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran >>> this, it >>> > >> was almost twice as slow as if $MGLEVELS >= $DAREFINE >>> > > >>> > > Smaller coarse grids are generally more scalable -- when the problem >>> > > data is distributed, multigrid is a good solution algorithm. But if >>> > > multigrid stops being effective because it is not preserving >>> sufficient >>> > > coarse grid accuracy (e.g., for transport-dominated problems in >>> > > complicated domains) then you might want to stop early and use a more >>> > > robust method (like direct solves). 
>>> > >>> > Basically for symmetric positive definite operators you can make the >>> coarse problem as small as you like (even 1 point) in theory. For >>> indefinite and non-symmetric problems the theory says the "coarse grid must >>> be sufficiently fine" (loosely speaking the coarse grid has to resolve the >>> eigenmodes for the eigenvalues to the left of the x = 0). >>> > >>> > https://www.jstor.org/stable/2158375?seq=1#page_scan_tab_contents >>> > >>> > >>> > >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From knepley at gmail.com Mon Apr 3 06:15:18 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 3 Apr 2017 06:15:18 -0500 Subject: [petsc-users] Correlation between da_refine and pg_mg_levels In-Reply-To: <87vaqlpwyd.fsf@jedbrown.org> References: <87y3vlmqye.fsf@jedbrown.org> <3FA19A8A-CED7-4D65-9A4C-03D10CCBF3EF@mcs.anl.gov> <87vaqlpwyd.fsf@jedbrown.org> Message-ID: On Mon, Apr 3, 2017 at 6:11 AM, Jed Brown wrote: > Justin Chang writes: > > > So if I begin with a 128x128x8 grid on 1032 procs, it works fine for the > > first two levels of da_refine. 
> > However, on the third level I get this error:
> >
> > Level 3 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 1024 x 1024 x 57 (59768832), size (m) 9.76562 x 9.76562 x 17.8571
> > Level 2 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 512 x 512 x 29 (7602176), size (m) 19.5312 x 19.5312 x 35.7143
> > Level 1 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 256 x 256 x 15 (983040), size (m) 39.0625 x 39.0625 x 71.4286
> > Level 0 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 128 x 128 x 8 (131072), size (m) 78.125 x 78.125 x 142.857
> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > [0]PETSC ERROR: Petsc has generated inconsistent data
> > [0]PETSC ERROR: Eigen estimator failed: DIVERGED_NANORINF at iteration 0

> Building with debugging and adding -fp_trap to get a stack trace would
> be really useful. Or reproducing at smaller scale.

I can't think why it would fail there, but DMDA really likes odd numbers of vertices, because it wants to take every other point; 129 seems good. I will see if I can reproduce once I get a chance.

And now you see why it almost always takes a full-time person just to run jobs on one of these machines. Horrible design flaws never get fixed.

   Matt

> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3418-ge372536 GIT > > Date: 2017-03-30 13:35:15 -0500 > > [0]PETSC ERROR: /scratch2/scratchdirs/jychang/Icesheet/./ex48edison on a > > arch-edison-c-opt named nid00865 by jychang Sun Apr 2 21:44:44 2017 > > [0]PETSC ERROR: Configure options --download-fblaslapack --with-cc=cc > > --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 > > --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 > > --with-mpiexec=srun --with-64-bit-indices=1 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 > > FOPTFLAGS=-O3 PETSC_ARCH=arch-edison-c-opt > > [0]PETSC ERROR: #1 KSPSolve_Chebyshev() line 380 in > > /global/u1/j/jychang/Software/petsc/src/ksp/ksp/impls/cheby/cheby.c > > [0]PETSC ERROR: #2 KSPSolve() line 655 in /global/u1/j/jychang/Software/ > > petsc/src/ksp/ksp/interface/itfunc.c > > [0]PETSC ERROR: #3 PCMGMCycle_Private() line 19 in > > /global/u1/j/jychang/Software/petsc/src/ksp/pc/impls/mg/mg.c > > [0]PETSC ERROR: #4 PCMGMCycle_Private() line 53 in > > /global/u1/j/jychang/Software/petsc/src/ksp/pc/impls/mg/mg.c > > [0]PETSC ERROR: #5 PCApply_MG() line 331 in > /global/u1/j/jychang/Software/ > > petsc/src/ksp/pc/impls/mg/mg.c > > [0]PETSC ERROR: #6 PCApply() line 458 in /global/u1/j/jychang/Software/ > > petsc/src/ksp/pc/interface/precon.c > > [0]PETSC ERROR: #7 KSP_PCApply() line 251 in /global/homes/j/jychang/ > > Software/petsc/include/petsc/private/kspimpl.h > > [0]PETSC ERROR: #8 KSPInitialResidual() line 67 in > > /global/u1/j/jychang/Software/petsc/src/ksp/ksp/interface/itres.c > > [0]PETSC ERROR: #9 KSPSolve_GMRES() line 233 in > > /global/u1/j/jychang/Software/petsc/src/ksp/ksp/impls/gmres/gmres.c > > [0]PETSC ERROR: #10 KSPSolve() line 655 in /global/u1/j/jychang/Software/ > > petsc/src/ksp/ksp/interface/itfunc.c > > [0]PETSC ERROR: #11 SNESSolve_NEWTONLS() line 224 in > > /global/u1/j/jychang/Software/petsc/src/snes/impls/ls/ls.c > > [0]PETSC ERROR: #12 SNESSolve() line 3967 in > 
/global/u1/j/jychang/Software/ > > petsc/src/snes/interface/snes.c > > [0]PETSC ERROR: #13 main() line 1548 in /scratch2/scratchdirs/jychang/ > > Icesheet/ex48.c > > [0]PETSC ERROR: PETSc Option Table entries: > > [0]PETSC ERROR: -M 128 > > [0]PETSC ERROR: -N 128 > > [0]PETSC ERROR: -P 8 > > [0]PETSC ERROR: -da_refine 3 > > [0]PETSC ERROR: -mg_coarse_pc_type gamg > > [0]PETSC ERROR: -pc_mg_levels 4 > > [0]PETSC ERROR: -pc_type mg > > [0]PETSC ERROR: -thi_mat_type baij > > [0]PETSC ERROR: ----------------End of Error Message -------send entire > > error message to petsc-maint at mcs.anl.gov---------- > > > > If I changed the coarse grid to 129x129x8, no error whatsoever for up to > 4 > > levels of refinement. > > > > However, I am having trouble getting this started up on Cori's KNL... > > > > I am using a coarse grid 136x136x8 across 1088 cores, and slurm is simply > > cancelling the job. No other PETSc error was given. This is literally > what > > my log files say: > > > > Level 1 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 272 > x > > 272 x 15 (1109760), size (m) 36.7647 x 36.7647 x 71.4286 > > Level 0 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 136 > x > > 136 x 8 (147968), size (m) 73.5294 x 73.5294 x 142.857 > > Why are levels 1 and 0 printed above, then 2,1,0 below. > > > makefile:25: recipe for target 'runcori' failed > > What is this makefile message doing? > > > Level 2 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 544 > x > > 544 x 29 (8582144), size (m) 18.3824 x 18.3824 x 35.7143 > > Level 1 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 272 > x > > 272 x 15 (1109760), size (m) 36.7647 x 36.7647 x 71.4286 > > Level 0 domain size (m) 1e+04 x 1e+04 x 1e+03, num elements 136 > x > > 136 x 8 (147968), size (m) 73.5294 x 73.5294 x 142.857 > > srun: error: nid04139: task 480: Killed > > srun: Terminating job step 4387719.0 > > srun: Job step aborted: Waiting up to 32 seconds for job step to finish. 
> > slurmstepd: error: *** STEP 4387719.0 ON nid03873 CANCELLED AT > > 2017-04-02T22:21:21 *** > > srun: error: nid03960: task 202: Killed > > srun: error: nid04005: task 339: Killed > > srun: error: nid03873: task 32: Killed > > srun: error: nid03960: task 203: Killed > > srun: error: nid03873: task 3: Killed > > srun: error: nid03960: task 199: Killed > > srun: error: nid04004: task 264: Killed > > srun: error: nid04141: task 660: Killed > > srun: error: nid04139: task 539: Killed > > srun: error: nid03873: task 63: Killed > > srun: error: nid03960: task 170: Killed > > srun: error: nid08164: task 821: Killed > > srun: error: nid04139: task 507: Killed > > srun: error: nid04005: task 299: Killed > > srun: error: nid03960: tasks 136-169,171-198,200-201: Killed > > srun: error: nid04005: task 310: Killed > > srun: error: nid08166: task 1008: Killed > > srun: error: nid04141: task 671: Killed > > srun: error: nid03873: task 18: Killed > > srun: error: nid04139: tasks 476-479,481-506,508-538,540-543: Killed > > srun: error: nid04005: tasks 272-298,300-309,311-338: Killed > > srun: error: nid04140: tasks 544-611: Killed > > srun: error: nid04142: tasks 680-747: Killed > > srun: error: nid04138: tasks 408-475: Killed > > srun: error: nid04006: tasks 340-407: Killed > > srun: error: nid08163: tasks 748-815: Killed > > srun: error: nid08166: tasks 952-1007,1009-1019: Killed > > srun: error: nid03873: tasks 0-2,4-17,19-31,33-62,64-67: Killed > > srun: error: nid08165: tasks 884-951: Killed > > srun: error: nid03883: tasks 68-135: Killed > > srun: error: nid08164: tasks 816-820,822-883: Killed > > srun: error: nid08167: tasks 1020-1087: Killed > > srun: error: nid04141: tasks 612-659,661-670,672-679: Killed > > srun: error: nid04004: tasks 204-263,265-271: Killed > > make: [runcori] Error 137 (ignored) > > [257]PETSC ERROR: > > ------------------------------------------------------------------------ > > [257]PETSC ERROR: Caught signal number 15 Terminate: Some process (or 
the > > batch system) has told this process to end > > [257]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > [257]PETSC ERROR: or see > > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [257]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS > > X to find memory corruption errors > > [257]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > > and run > > [257]PETSC ERROR: to get more information on the crash. > > [878]PETSC ERROR: > > ------------------------------------------------------------------------ > > [878]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > > batch system) has told this process to end > > [878]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > [878]PETSC ERROR: or see > > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [878]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS > > X to find memory corruption errors > > [878]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > > and run > > [878]PETSC ERROR: to get more information on the crash. > > .... > > [clipped] > > .... > > > > > > > > my job script for KNL looks like this: > > > > #!/bin/bash > > #SBATCH -N 16 > > #SBATCH -C knl,quad,cache > > #SBATCH -p regular > > #SBATCH -J knl1024 > > #SBATCH -L SCRATCH > > #SBATCH -o knl1088.o%j > > #SBATCH -e knl1088.e%j > > #SBATCH --mail-type=ALL > > #SBATCH --mail-user=jychang48 at gmail.com > > #SBATCH -t 00:20:00 > > > > srun -n 1088 -c 4 --cpu_bind=cores ./ex48 .... > > > > Any ideas why this is happening? Or do I need to contact the NERSC folks? 
> > > > Thanks, > > Justin > > > > On Sun, Apr 2, 2017 at 2:15 PM, Matthew Knepley > wrote: > > > >> On Sun, Apr 2, 2017 at 2:13 PM, Barry Smith wrote: > >> > >>> > >>> > On Apr 2, 2017, at 9:25 AM, Justin Chang > wrote: > >>> > > >>> > Thanks guys, > >>> > > >>> > So I want to run SNES ex48 across 1032 processes on Edison, but I > keep > >>> getting segmentation violations. These are the parameters I am trying: > >>> > > >>> > srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg > >>> -thi_mat_type baij -mg_coarse_pc_type gamg > >>> > > >>> > The above works perfectly fine if I used 96 processes. I also tried > to > >>> use a finer coarse mesh on 1032 but the error persists. > >>> > > >>> > Any ideas why this is happening? What are the ideal parameters to use > >>> if I want to use 1k+ cores? > >>> > > >>> > >>> Hmm, one should never get segmentation violations. You should only > get > >>> not completely useful error messages about incompatible sizes etc. > Send an > >>> example of the segmentation violations. (I sure hope you are checking > the > >>> error return codes for all functions?). > >> > >> > >> He is just running SNES ex48. > >> > >> Matt > >> > >> > >>> > >>> Barry > >>> > >>> > Thanks, > >>> > Justin > >>> > > >>> > On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith > >>> wrote: > >>> > > >>> > > On Mar 31, 2017, at 10:00 AM, Jed Brown wrote: > >>> > > > >>> > > Justin Chang writes: > >>> > > > >>> > >> Yeah based on my experiments it seems setting pc_mg_levels to > >>> $DAREFINE + 1 > >>> > >> has decent performance. > >>> > >> > >>> > >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? 
> In > >>> some of > >>> > >> the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/ > >>> > >> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide > >>> 203/227) > >>> > >> they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran > >>> this, it > >>> > >> was almost twice as slow as if $MGLEVELS >= $DAREFINE > >>> > > > >>> > > Smaller coarse grids are generally more scalable -- when the > problem > >>> > > data is distributed, multigrid is a good solution algorithm. But > if > >>> > > multigrid stops being effective because it is not preserving > >>> sufficient > >>> > > coarse grid accuracy (e.g., for transport-dominated problems in > >>> > > complicated domains) then you might want to stop early and use a > more > >>> > > robust method (like direct solves). > >>> > > >>> > Basically for symmetric positive definite operators you can make the > >>> coarse problem as small as you like (even 1 point) in theory. For > >>> indefinite and non-symmetric problems the theory says the "coarse grid > must > >>> be sufficiently fine" (loosely speaking the coarse grid has to resolve > the > >>> eigenmodes for the eigenvalues to the left of the x = 0). > >>> > > >>> > https://www.jstor.org/stable/2158375?seq=1#page_scan_tab_contents > >>> > > >>> > > >>> > > >>> > >>> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which > their > >> experiments lead. > >> -- Norbert Wiener > >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From jed at jedbrown.org  Mon Apr  3 06:23:18 2017
From: jed at jedbrown.org (Jed Brown)
Date: Mon, 03 Apr 2017 05:23:18 -0600
Subject: [petsc-users] Correlation between da_refine and pg_mg_levels
In-Reply-To: 
References: <87y3vlmqye.fsf@jedbrown.org> <3FA19A8A-CED7-4D65-9A4C-03D10CCBF3EF@mcs.anl.gov> <87vaqlpwyd.fsf@jedbrown.org>
Message-ID: <87pogtpwex.fsf@jedbrown.org>

Matthew Knepley writes:

> I can't think why it would fail there, but DMDA really likes odd numbers of
> vertices, because it wants to take every other point, 129 seems good. I will
> see if I can reproduce once I get a chance.

This problem uses periodic boundary conditions so even is right, but Justin only defines the coarsest grid and uses -da_refine so it should actually be irrelevant.

From jed at jedbrown.org  Mon Apr  3 07:51:15 2017
From: jed at jedbrown.org (Jed Brown)
Date: Mon, 03 Apr 2017 06:51:15 -0600
Subject: [petsc-users] -snes_mf_operator yields "No support for this operation for this object type" in TS codes?
In-Reply-To: <61C35066-A114-4C0B-A026-94677F4B3C7B@mcs.anl.gov>
References: <877f32qlf1.fsf@jedbrown.org> <61C35066-A114-4C0B-A026-94677F4B3C7B@mcs.anl.gov>
Message-ID: <87h925pscc.fsf@jedbrown.org>

Barry Smith writes:

> Jed,
>
> Here is the problem.
>
> https://bitbucket.org/petsc/petsc/branch/barry/fix/even-huger-flaw-in-ts

Hmm, when someone uses -snes_mf_operator, we really just need SNESTSFormJacobian to ignore the Amat. However, the user is allowed to create a MatMFFD and have their TSRHSJacobian function use MatMFFD on their RHSFunction. That might even be more accurate, but would require the shift/scale. But I'm not aware of any way in which TS can distinguish these cases.

What if SNESComputeJacobian was aware of -snes_mf_operator and just passed Pmat in both slots?
Or does the user sometimes need access to the MatMFFD created by -snes_mf_operator? (Seems like possibly, e.g., to adjust differencing parameters.)

From bsmith at mcs.anl.gov  Mon Apr  3 08:24:28 2017
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Mon, 3 Apr 2017 09:24:28 -0400
Subject: [petsc-users] -snes_mf_operator yields "No support for this operation for this object type" in TS codes?
In-Reply-To: <87h925pscc.fsf@jedbrown.org>
References: <877f32qlf1.fsf@jedbrown.org> <61C35066-A114-4C0B-A026-94677F4B3C7B@mcs.anl.gov> <87h925pscc.fsf@jedbrown.org>
Message-ID: <7CDCCF9F-F6B0-44E5-AB91-B20A977F3D23@mcs.anl.gov>

> On Apr 3, 2017, at 8:51 AM, Jed Brown wrote:
>
> Barry Smith writes:
>
>> Jed,
>>
>> Here is the problem.
>>
>> https://bitbucket.org/petsc/petsc/branch/barry/fix/even-huger-flaw-in-ts
>
> Hmm, when someone uses -snes_mf_operator, we really just need
> SNESTSFormJacobian to ignore the Amat. However, the user is allowed to
> create a MatMFFD and have their TSRHSJacobian function use MatMFFD on
> their RHSFunction. That might even be more accurate, but would require
> the shift/scale. But I'm not aware of any way in which TS can
> distinguish these cases.

   SNESGetUsingInternalMatMFFD(snes,&flg);

Then you can get rid of the horrible

   PetscBool flg;
   ierr = PetscObjectTypeCompare((PetscObject)A,MATMFFD,&flg);CHKERRQ(ierr);

that I had to add in two places. Still ugly but I think less buggy.

> What if SNESComputeJacobian was aware of -snes_mf_operator and just
> passed Pmat in both slots? Or does the user sometimes need access to
> the MatMFFD created by -snes_mf_operator? (Seems like possibly, e.g.,
> to adjust differencing parameters.)
From jed at jedbrown.org Mon Apr 3 09:05:48 2017 From: jed at jedbrown.org (Jed Brown) Date: Mon, 03 Apr 2017 08:05:48 -0600 Subject: [petsc-users] -snes_mf_operator yields "No support for this operation for this object type" in TS codes? In-Reply-To: <7CDCCF9F-F6B0-44E5-AB91-B20A977F3D23@mcs.anl.gov> References: <877f32qlf1.fsf@jedbrown.org> <61C35066-A114-4C0B-A026-94677F4B3C7B@mcs.anl.gov> <87h925pscc.fsf@jedbrown.org> <7CDCCF9F-F6B0-44E5-AB91-B20A977F3D23@mcs.anl.gov> Message-ID: <874ly5pow3.fsf@jedbrown.org> Barry Smith writes: >> On Apr 3, 2017, at 8:51 AM, Jed Brown wrote: >> >> Barry Smith writes: >> >>> Jed, >>> >>> Here is the problem. >>> >>> https://bitbucket.org/petsc/petsc/branch/barry/fix/even-huger-flaw-in-ts >> >> Hmm, when someone uses -snes_mf_operator, we really just need >> SNESTSFormJacobian to ignore the Amat. However, the user is allowed to >> create a MatMFFD and have their TSRHSJacobian function use MatMFFD on >> their RHSFunction. That might even be more accurate, but would require >> the shift/scale. But I'm not aware of any way in which TS can >> distinguish these cases. > > SNESGetUsingInternalMatMFFD(snes,&flg); Then you can get rid of the horrible > > PetscBool flg; > ierr = PetscObjectTypeCompare((PetscObject)A,MATMFFD,&flg);CHKERRQ(ierr); > > I had to add in two places. Still ugly but I think less buggy. Yeah, there are also MATMFFD checks in SNESComputeJacobian. > > > > >> >> What if SNESComputeJacobian was aware of -snes_mf_operator and just >> passed Pmat in both slots? Or does the user sometimes need access to >> the MatMFFD created by -snes_mf_operator? (Seems like possibly, e.g., >> to adjust differencing parameters.) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From filippo.leon at gmail.com Mon Apr 3 11:45:15 2017 From: filippo.leon at gmail.com (Filippo Leonardi) Date: Mon, 03 Apr 2017 16:45:15 +0000 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: Message-ID: On Monday, 3 April 2017 02:00:53 CEST you wrote: > On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi > > wrote: > > Hello, > > > > I have a project in mind and seek feedback. > > > > Disclaimer: I hope I am not abusing of this mailing list with this idea. > > If so, please ignore. > > > > As a thought experiment, and to have a bit of fun, I am currently > > writing/thinking on writing, a small (modern) C++ wrapper around PETSc. > > > > Premise: PETSc is awesome, I love it and use in many projects. Sometimes I > > am just not super comfortable writing C. (I know my idea goes against > > PETSc's design philosophy). > > > > I know there are many around, and there is not really a need for this > > (especially since PETSc has his own object-oriented style), but there are > > a > > few things I would like to really include in this wrapper, that I found > > nowhere): > > - I am currently only thinking about the Vector/Matrix/KSP/DM part of the > > Framework, there are many other cool things that PETSc does that I do not > > have the brainpower to consider those as well. > > - expression templates (in my opinion this is where C++ shines): this > > would replace all code bloat that a user might need with cool/easy to read > > expressions (this could increase the number of axpy-like routines); > > - those expression templates should use SSE and AVX whenever available; > > - expressions like x += alpha * y should fall back to BLAS axpy (tough > > sometimes this is not even faster than a simple loop); > > The idea for the above is not clear. Do you want templates generating calls > to BLAS? Or scalar code that operates on raw arrays with SSE/AVX? 
> There is some advantage here of expanding the range of BLAS operations,
> which has been done to death by Liz Jessup and collaborators, but not
> that much.

Templates should generate scalar code operating on raw arrays using SIMD. But I can detect if you want to use axpbycz or gemv, and use the BLAS implementation instead. I do not think there is a point in trying to "beat" BLAS. (Here an interesting point opens: I assume an efficient BLAS implementation, but I am not so sure about how the different BLAS libraries do things internally. I work from the assumption that we have a very well tuned BLAS implementation at our disposal.)

> > - all calls to PETSc should be less verbose, more C++-like:
> >   * for instance a VecGlobalToLocalBegin could return an empty object that
> >     calls VecGlobalToLocalEnd when it is destroyed.
> >   * some cool idea to easily write GPU kernels.
>
> If you find a way to make this pay off it would be amazing, since currently
> nothing but BLAS3 has a hope of mattering in this context.
>
> > - the idea would be to have safer routines (at compile time), by means of
> >   RAII etc.
> >
> > I aim for zero/near-zero/negligible overhead with full optimization, for
> > that I include benchmarks and extensive test units.
> >
> > So my question is:
> > - anyone that would be interested (in the product/in developing)?
> > - anyone that has suggestions (maybe that what I have in mind is nonsense)?
>
> I would suggest making a simple performance model that says what you will
> do will have at least a 2x speed gain. Because anything less is not worth
> your time, and inevitably you will not get the whole multiplier. I am really
> skeptical that is possible with the above sketch.

That I will do as a next step, for sure. But I also doubt this much will be achievable in any case.

> Second, I would try to convince myself that what you propose would be
> simpler, in terms of lines of code,
> number of objects, number of concepts, etc.
> Right now, that is not clear to me either.

Number of objects per se may not be smaller. I am more thinking about reducing lines of code (verbosity) and concepts, and increasing safety. I have two examples I've been burnt by in the past:
- casting to void* to pass custom contexts to PETSc routines
- forgetting to call the corresponding XXXEnd after a call to XXXBegin (PETSc notices that, ofc., but at runtime, and that might be too late).

Example: I can imagine that I need PETSc's internal array. In this case I call VecGetArray. However, I will inevitably forget to return the array to PETSc. I could have my new VecArray return an object that restores the array when it goes out of scope. I can also flag the function with [[nodiscard]] to prevent the user from discarding the returned object in the first place.

> Baring that, maybe you can argue that new capabilities, such as the type
> flexibility described by Michael, are enabled. That
> would be the most convincing I think.

This would be very interesting indeed, but I see only two options:
- recompile PETSc twice
- manually implement all complex routines, which might be too much of a task

> Thanks,
>
> Matt

Thanks for the feedback Matt.

> If you have read up to here, thanks.

On Mon, 3 Apr 2017 at 02:00 Matthew Knepley wrote:

> On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi wrote:
>
> > Hello,
> >
> > I have a project in mind and seek feedback.
> >
> > Disclaimer: I hope I am not abusing of this mailing list with this idea.
> > If so, please ignore.
> >
> > As a thought experiment, and to have a bit of fun, I am currently
> > writing/thinking on writing, a small (modern) C++ wrapper around PETSc.
> >
> > Premise: PETSc is awesome, I love it and use in many projects. Sometimes I
> > am just not super comfortable writing C. (I know my idea goes against
> > PETSc's design philosophy).
> > I know there are many around, and there is not really a need for this > (especially since PETSc has his own object-oriented style), but there are a > few things I would like to really include in this wrapper, that I found > nowhere): > - I am currently only thinking about the Vector/Matrix/KSP/DM part of the > Framework, there are many other cool things that PETSc does that I do not > have the brainpower to consider those as well. > - expression templates (in my opinion this is where C++ shines): this > would replace all code bloat that a user might need with cool/easy to read > expressions (this could increase the number of axpy-like routines); > - those expression templates should use SSE and AVX whenever available; > - expressions like x += alpha * y should fall back to BLAS axpy (tough > sometimes this is not even faster than a simple loop); > > > The idea for the above is not clear. Do you want templates generating > calls to BLAS? Or scalar code that operates on raw arrays with SSE/AVX? > There is some advantage here of expanding the range of BLAS operations, > which has been done to death by Liz Jessup and collaborators, but not > that much. > > > - all calls to PETSc should be less verbose, more C++-like: > * for instance a VecGlobalToLocalBegin could return an empty object that > calls VecGlobalToLocalEnd when it is destroyed. > * some cool idea to easily write GPU kernels. > > > If you find a way to make this pay off it would be amazing, since > currently nothing but BLAS3 has a hope of mattering in this context. > > > - the idea would be to have safer routines (at compile time), by means of > RAII etc. > > I aim for zero/near-zero/negligible overhead with full optimization, for > that I include benchmarks and extensive test units. > > So my question is: > - anyone that would be interested (in the product/in developing)? > - anyone that has suggestions (maybe that what I have in mind is nonsense)? 
> > > I would suggest making a simple performance model that says what you will > do will have at least > a 2x speed gain. Because anything less is not worth your time, and > inevitably you will not get the > whole multiplier. I am really skeptical that is possible with the above > sketch. > > Second, I would try to convince myself that what you propose would be > simpler, in terms of lines of code, > number of objects, number of concepts, etc. Right now, that is not clear > to me either. > > Barring that, maybe you can argue that new capabilities, such as the type > flexibility described by Michael, are enabled. That > would be the most convincing I think. > > Thanks, > > Matt > > If you have read up to here, thanks. > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhaowenbo.npic at gmail.com Mon Apr 3 12:10:36 2017 From: zhaowenbo.npic at gmail.com (Wenbo Zhao) Date: Tue, 4 Apr 2017 01:10:36 +0800 Subject: [petsc-users] Question about DMDA BOUNDARY_CONDITION set Message-ID: Barry, Hi. I am sorry for taking so long to reply to you. I read the code you sent me, which creates a VecScatter for the ghost points on the rotation boundary, but I am still not clear on how to use it to assemble the matrix. I studied the example "$SLEPC_DIR/src/eps/examples/tutorials/ex19.c", which is a 3D eigenvalue problem. The example uses DMCreateMatrix and MatSetValuesStencil. In addition, I use MatSetValues to insert values for the rotation boundary condition.
I ran it with the command "mpirun -n 1 ./step-2 -eps_nev 3 -eps_ncv 9". The error message is listed below: " 3-D Laplacian Eigenproblem Grid partitioning: 1 1 1 [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Argument out of range [0]PETSC ERROR: Inserting a new nonzero at (91,100) in the matrix [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016 [0]PETSC ERROR: ./step-2 on a arch-linux2-c-debug named ubuntu by zhaowenbo Mon Apr 3 09:39:08 2017 [0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-shared-libraries=1 --with-hypre=1 --with-hypre-dir=/usr/local/hypre [0]PETSC ERROR: #1 MatSetValues_SeqAIJ() line 484 in /home/zhaowenbo/research/petsc/petsc-3.7.4/src/mat/impls/aij/seq/aij.c [0]PETSC ERROR: #2 MatSetValues() line 1190 in /home/zhaowenbo/research/petsc/petsc-3.7.4/src/mat/interface/matrix.c [0]PETSC ERROR: #3 FillMatrix() line 107 in /home/zhaowenbo/test_slepc/step-2/step-2.c [0]PETSC ERROR: #4 main() line 152 in /home/zhaowenbo/test_slepc/step-2/step-2.c [0]PETSC ERROR: PETSc Option Table entries: [0]PETSC ERROR: -eps_ncv 9 [0]PETSC ERROR: -eps_nev 3 [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov---------- " I think the reason is matrix preallocation. If so, do I need to create the matrix myself using MatCreate, following the source code of DMCreateMatrix_DA? Is that correct? Could you give some explanation about how to use the VecScatter? In $PETSC_DIR/src/ksp/ksp/examples/tutorials/ex2.c, I did not find the VecScatter. I found that the VecScatter is created in the source code of dm/impls/da/da1.c, but I am not clear about it. BEST, Wenbo -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: step-2.c Type: text/x-csrc Size: 10267 bytes Desc: not available URL: From mpovolot at purdue.edu Mon Apr 3 12:11:53 2017 From: mpovolot at purdue.edu (Michael Povolotskyi) Date: Mon, 03 Apr 2017 13:11:53 -0400 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: Message-ID: <58E28259.2070708@purdue.edu> Hi Filippo, recompiling PETSc twice is easy. The difficulty is that both libraries will contain the same symbols for the double and double-complex functions. If they were part of a C++ namespace, it would be easier. Michael. On 04/03/2017 12:45 PM, Filippo Leonardi wrote: > > On Monday, 3 April 2017 02:00:53 CEST you wrote: > > > On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi > > > > > > > > wrote: > > > > Hello, > > > > > > > > I have a project in mind and seek feedback. > > > > > > > > Disclaimer: I hope I am not abusing of this mailing list with this > idea. > > > > If so, please ignore. > > > > > > > > As a thought experiment, and to have a bit of fun, I am currently > > > > writing/thinking on writing, a small (modern) C++ wrapper around > PETSc. > > > > > > > > Premise: PETSc is awesome, I love it and use in many projects. > Sometimes I > > > > am just not super comfortable writing C. (I know my idea goes against > > > > PETSc's design philosophy). > > > > > > > > I know there are many around, and there is not really a need for this > > > > (especially since PETSc has his own object-oriented style), but > there are > > > > a > > > > few things I would like to really include in this wrapper, that I > found > > > > nowhere): > > > > - I am currently only thinking about the Vector/Matrix/KSP/DM part > of the > > > > Framework, there are many other cool things that PETSc does that I > do not > > > > have the brainpower to consider those as well.
> > > > - expression templates (in my opinion this is where C++ shines): this > > > > would replace all code bloat that a user might need with cool/easy > to read > > > > expressions (this could increase the number of axpy-like routines); > > > > - those expression templates should use SSE and AVX whenever > available; > > > > - expressions like x += alpha * y should fall back to BLAS axpy (tough > > > > sometimes this is not even faster than a simple loop); > > > > > > The idea for the above is not clear. Do you want templates > generating calls > > > to BLAS? Or scalar code that operates on raw arrays with SSE/AVX? > > > There is some advantage here of expanding the range of BLAS operations, > > > which has been done to death by Liz Jessup and collaborators, but not > > > that much. > > > Templates should generate scalar code operating on raw arrays using > SIMD. But > > I can detect if you want to use axpbycz or gemv, and use the blas > > implementation instead. I do not think there is a point in trying to > "beat" > > BLAS. (Here a interesting point opens: I assume an efficient BLAS > > implementation, but I am not so sure about how the different BLAS do > things > > internally. I work from the assumption that we have a very well tuned > BLAS > > implementation at our disposal). > > > > > > > > - all calls to PETSc should be less verbose, more C++-like: > > > > * for instance a VecGlobalToLocalBegin could return an empty > object that > > > > > > > > calls VecGlobalToLocalEnd when it is destroyed. > > > > > > > > * some cool idea to easily write GPU kernels. > > > > > > If you find a way to make this pay off it would be amazing, since > currently > > > nothing but BLAS3 has a hope of mattering in this context. > > > > > > > - the idea would be to have safer routines (at compile time), by > means of > > > > RAII etc. 
> > > > > > > > I aim for zero/near-zero/negligible overhead with full > optimization, for > > > > that I include benchmarks and extensive test units. > > > > > > > > So my question is: > > > > - anyone that would be interested (in the product/in developing)? > > > > - anyone that has suggestions (maybe that what I have in mind is > > > > nonsense)? > > > > > > I would suggest making a simple performance model that says what you > will > > > do will have at least > > > a 2x speed gain. Because anything less is not worth your time, and > > > inevitably you will not get the > > > whole multiplier. I am really skeptical that is possible with the above > > > sketch. > > > That I will do as next steps for sure. But I also doubt this much of > will be > > achievable in any case. > > > > > > > Second, I would try to convince myself that what you propose would be > > > simpler, in terms of lines of code, > > > number of objects, number of concepts, etc. Right now, that is not > clear to > > > me either. > > > Number of objects per se may not be smaller. I am more thinking about > reducing > > lines of codes (verbosity), concepts and increase safety. > > > I have two examples I've been burnt with in the past: > > - casting to void* to pass custom contexts to PETSc routines > > - forgetting to call the corresponding XXXEnd after a call to XXXBegin > > (PETSc notices that, ofc., but at runtime, and that might be too late). > > > Example: I can imagine that I need a Petsc's internal array. In this > case I > > call VecGetArray. However I will inevitably foget to return the array to > > PETSc. I could have my new VecArray returning an object that restores the > > array > > when it goes out of scope. I can also flag the function with > [[nodiscard]] to > > prevent the user to destroy the returned object from the start. > > > > > > > Baring that, maybe you can argue that new capabilities, such as the type > > > flexibility described by Michael, are enabled. 
That > > > would be the most convincing I think. > > > This would be very interesting indeed, but I see only two options: > > - recompile PETSc twice > > - manually implement all complex routines, which might be to much of a > task > > > > > > > Thanks, > > > > > > Matt > > > Thanks for the feedback Matt. > > > > > > > If you have read up to here, thanks. > > > > > On Mon, 3 Apr 2017 at 02:00 Matthew Knepley > wrote: > > On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi > > wrote: > > > Hello, > > I have a project in mind and seek feedback. > > Disclaimer: I hope I am not abusing of this mailing list with > this idea. If so, please ignore. > > As a thought experiment, and to have a bit of fun, I am > currently writing/thinking on writing, a small (modern) C++ > wrapper around PETSc. > > Premise: PETSc is awesome, I love it and use in many projects. > Sometimes I am just not super comfortable writing C. (I know > my idea goes against PETSc's design philosophy). > > I know there are many around, and there is not really a need > for this (especially since PETSc has his own object-oriented > style), but there are a few things I would like to really > include in this wrapper, that I found nowhere): > - I am currently only thinking about the Vector/Matrix/KSP/DM > part of the Framework, there are many other cool things that > PETSc does that I do not have the brainpower to consider those > as well. > - expression templates (in my opinion this is where C++ > shines): this would replace all code bloat that a user might > need with cool/easy to read expressions (this could increase > the number of axpy-like routines); > - those expression templates should use SSE and AVX whenever > available; > - expressions like x += alpha * y should fall back to BLAS > axpy (tough sometimes this is not even faster than a simple loop); > > > The idea for the above is not clear. Do you want templates > generating calls to BLAS? Or scalar code that operates on raw > arrays with SSE/AVX? 
> There is some advantage here of expanding the range of BLAS > operations, which has been done to death by Liz Jessup and > collaborators, but not > that much. > > - all calls to PETSc should be less verbose, more C++-like: > * for instance a VecGlobalToLocalBegin could return an empty > object that calls VecGlobalToLocalEnd when it is destroyed. > * some cool idea to easily write GPU kernels. > > > If you find a way to make this pay off it would be amazing, since > currently nothing but BLAS3 has a hope of mattering in this context. > > - the idea would be to have safer routines(at compile time), > by means of RAII etc. > > I aim for zero/near-zero/negligible overhead with full > optimization, for that I include benchmarks and extensive test > units. > > So my question is: > - anyone that would be interested (in the product/in developing)? > - anyone that has suggestions (maybe that what I have in mind > is nonsense)? > > > I would suggest making a simple performance model that says what > you will do will have at least > a 2x speed gain. Because anything less is not worth your time, and > inevitably you will not get the > whole multiplier. I am really skeptical that is possible with the > above sketch. > > Second, I would try to convince myself that what you propose would > be simpler, in terms of lines of code, > number of objects, number of concepts, etc. Right now, that is not > clear to me either. > > Baring that, maybe you can argue that new capabilities, such as > the type flexibility described by Michael, are enabled. That > would be the most convincing I think. > > Thanks, > > Matt > > If you have read up to here, thanks. > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to > which their experiments lead. 
> -- Norbert Wiener > -- Michael Povolotskyi, PhD Research Assistant Professor Network for Computational Nanotechnology Hall for Discovery and Learning Research, Room 441 207 South Martin Jischke Drive West Lafayette, IN 47907 Phone (765) 4949396 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Mon Apr 3 13:04:13 2017 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 3 Apr 2017 13:04:13 -0500 Subject: [petsc-users] Correlation between da_refine and pg_mg_levels In-Reply-To: <87pogtpwex.fsf@jedbrown.org> References: <87y3vlmqye.fsf@jedbrown.org> <3FA19A8A-CED7-4D65-9A4C-03D10CCBF3EF@mcs.anl.gov> <87vaqlpwyd.fsf@jedbrown.org> <87pogtpwex.fsf@jedbrown.org> Message-ID: So my makefile/script is slightly different from the tutorial directory. Basically I have a shell for loop that runs 'make runex48' four times, where -da_refine is increased each time. It showed Levels 1 0 then 2 1 0 because the job was in the middle of the loop, and I cancelled it halfway when I realized it was returning errors, as I didn't want to burn any precious SU's :) Anyway, I ended up using Edison with 16 cores/node and Cori/Haswell with 32 cores/node and got some nice numbers for a 128x128x16 coarse grid. I am, however, having issues with Cori/KNL, which I think has more to do with how I configured PETSc and/or the job scripts. On Mon, Apr 3, 2017 at 6:23 AM, Jed Brown wrote: > Matthew Knepley writes: > > I can't think why it would fail there, but DMDA really likes odd numbers > of > > vertices, because it wants > > to take every other point, 129 seems good. I will see if I can reproduce > > once I get a chance. > > This problem uses periodic boundary conditions so even is right, but > Justin only defines the coarsest grid and uses -da_refine so it should > actually be irrelevant. > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jychang48 at gmail.com Mon Apr 3 13:13:27 2017 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 3 Apr 2017 13:13:27 -0500 Subject: [petsc-users] Configuring PETSc for KNL Message-ID: Hi all, On NERSC's Cori I have the following configure options for PETSc: ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt Where I swapped out the default Intel programming environment with that of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I want to document the performance difference between Cori's Haswell and KNL processors. When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes. Which leads me to suspect that I am not doing something right for KNL. Does anyone know what are some "optimal" configure options for running PETSc on KNL? Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From jroman at dsic.upv.es Mon Apr 3 13:29:53 2017 From: jroman at dsic.upv.es (Jose E. Roman) Date: Mon, 3 Apr 2017 20:29:53 +0200 Subject: [petsc-users] Slepc JD and GD converge to wrong eigenpair In-Reply-To: References: <65A0A5E7-399B-4D19-A967-73765A96DB98@dsic.upv.es> <2A5BFE40-C401-42CA-944A-9008E57B55EB@dsic.upv.es> Message-ID: <9BC45C28-0AB4-48EF-9981-B54A0FD45F41@dsic.upv.es> > El 1 abr 2017, a las 0:01, Toon Weyens escribió: > > Dear Jose, > > I have saved the matrices in Matlab format and am sending them to you using pCloud. If you want another format, please tell me. Please also note that they are about 1.4GB each. > > I also attach a typical output of eps_view and log_view in output.txt, for 8 processes. > > Thanks so much for helping me out!
I think Petsc and Slepc are amazing inventions that really have saved me many months of work! > > Regards I played a little bit with your matrices. With Krylov-Schur I can solve the problem quite easily. Note that in generalized eigenvalue problems it is always better to use STSINVERT because you have to invert a matrix anyway. So instead of setting which=smallest_real, use shift-and-invert with a target that is close to the wanted eigenvalue. For instance, with target=-0.005 I get convergence with just one iteration: $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -st_type sinvert -eps_target -0.005 Generalized eigenproblem stored in file. Reading COMPLEX matrices from binary files... Number of iterations of the method: 1 Number of linear iterations of the method: 16 Solution method: krylovschur Number of requested eigenvalues: 1 Stopping condition: tol=1e-05, maxit=7500 Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; iterations 1 ---------------------- -------------------- k ||Ax-kBx||/||kx|| ---------------------- -------------------- -0.004809-0.000000i 8.82085e-05 ---------------------- -------------------- Of course, you don't know a priori where your eigenvalue is. Alternatively, you can set the target at 0 and get rid of positive eigenvalues with a region filtering. For instance: $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -st_type sinvert -eps_target 0 -rg_type interval -rg_interval_endpoints -1,0,-.05,.05 -eps_nev 2 Generalized eigenproblem stored in file. Reading COMPLEX matrices from binary files... 
Number of iterations of the method: 8 Number of linear iterations of the method: 74 Solution method: krylovschur Number of requested eigenvalues: 2 Stopping condition: tol=1e-05, maxit=7058 Linear eigensolve converged (2 eigenpairs) due to CONVERGED_TOL; iterations 8 ---------------------- -------------------- k ||Ax-kBx||/||kx|| ---------------------- -------------------- -0.000392-0.000000i 2636.4 -0.004809+0.000000i 318441 ---------------------- -------------------- In this case, the residuals seem very bad. But this is due to the fact that your matrices have huge norms. Adding the option -eps_error_backward ::ascii_info_detail will show residuals relative to the matrix norms: ---------------------- -------------------- k eta(x,k) ---------------------- -------------------- -0.000392-0.000000i 3.78647e-11 -0.004809+0.000000i 5.61419e-08 ---------------------- -------------------- Regarding the GD solver, I am also getting the correct solution. I don't know why you are not getting convergence to the wanted eigenvalue: $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -eps_smallest_real -eps_ncv 32 -eps_type gd Generalized eigenproblem stored in file. Reading COMPLEX matrices from binary files... Number of iterations of the method: 132 Number of linear iterations of the method: 0 Solution method: gd Number of requested eigenvalues: 1 Stopping condition: tol=1e-05, maxit=120000 Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; iterations 132 ---------------------- -------------------- k ||Ax-kBx||/||kx|| ---------------------- -------------------- -0.004809+0.000000i 2.16223e-05 ---------------------- -------------------- Again, it is much better to use a target instead of smallest_real: $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -eps_type gd -eps_target -0.005 Generalized eigenproblem stored in file. 
Reading COMPLEX matrices from binary files... Number of iterations of the method: 23 Number of linear iterations of the method: 0 Solution method: gd Number of requested eigenvalues: 1 Stopping condition: tol=1e-05, maxit=120000 Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; iterations 23 ---------------------- -------------------- k ||Ax-kBx||/||kx|| ---------------------- -------------------- -0.004809-0.000000i 2.06572e-05 ---------------------- -------------------- Jose From richardtmills at gmail.com Mon Apr 3 13:36:34 2017 From: richardtmills at gmail.com (Richard Mills) Date: Mon, 3 Apr 2017 11:36:34 -0700 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: Message-ID: Hi Justin, How is the MCDRAM (on-package "high-bandwidth memory") configured for your KNL runs? And if it is in "flat" mode, what are you doing to ensure that you use the MCDRAM? Doing this wrong seems to be one of the most common reasons for unexpected poor performance on KNL. I'm not that familiar with the environment on Cori, but I think that if you are building for KNL, you should add "-xMIC-AVX512" to your compiler flags to explicitly instruct the compiler to use the AVX512 instruction set. I usually use something along the lines of 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' (The "-g" just adds symbols, which make the output from performance profiling tools much more useful.) That said, I think that if you are comparing 1024 Haswell cores vs. 1024 KNL cores (so double the number of Haswell nodes), I'm not surprised that the simulations are almost twice as fast using the Haswell nodes. Keep in mind that individual KNL cores are much less powerful than an individual Haswell node. You are also using roughly twice the power footprint (dual socket Haswell node should be roughly equivalent to a KNL node, I believe). How do things look on when you compare equal nodes? 
Cheers, Richard On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang wrote: > Hi all, > > On NERSC's Cori I have the following configure options for PETSc: > > ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 > --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn > --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 > COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt > > Where I swapped out the default Intel programming environment with that of > Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I want > to document the performance difference between Cori's Haswell and KNL > processors. > > When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and 16 > KNL nodes), the simulations are almost twice as fast on Haswell nodes. > Which leads me to suspect that I am not doing something right for KNL. Does > anyone know what are some "optimal" configure options for running PETSc on > KNL? > > Thanks, > Justin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richardtmills at gmail.com Mon Apr 3 13:40:33 2017 From: richardtmills at gmail.com (Richard Mills) Date: Mon, 3 Apr 2017 11:40:33 -0700 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: Message-ID: Fixing typo: Meant to say "Keep in mind that individual KNL cores are much less powerful than an individual Haswell *core*." --Richard On Mon, Apr 3, 2017 at 11:36 AM, Richard Mills wrote: > Hi Justin, > > How is the MCDRAM (on-package "high-bandwidth memory") configured for your > KNL runs? And if it is in "flat" mode, what are you doing to ensure that > you use the MCDRAM? Doing this wrong seems to be one of the most common > reasons for unexpected poor performance on KNL. 
> > I'm not that familiar with the environment on Cori, but I think that if > you are building for KNL, you should add "-xMIC-AVX512" to your compiler > flags to explicitly instruct the compiler to use the AVX512 instruction > set. I usually use something along the lines of > > 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' > > (The "-g" just adds symbols, which make the output from performance > profiling tools much more useful.) > > That said, I think that if you are comparing 1024 Haswell cores vs. 1024 > KNL cores (so double the number of Haswell nodes), I'm not surprised that > the simulations are almost twice as fast using the Haswell nodes. Keep in > mind that individual KNL cores are much less powerful than an individual > Haswell node. You are also using roughly twice the power footprint (dual > socket Haswell node should be roughly equivalent to a KNL node, I > believe). How do things look on when you compare equal nodes? > > Cheers, > Richard > > On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang wrote: > >> Hi all, >> >> On NERSC's Cori I have the following configure options for PETSc: >> >> ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 >> --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn >> --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 >> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt >> >> Where I swapped out the default Intel programming environment with that >> of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I >> want to document the performance difference between Cori's Haswell and KNL >> processors. >> >> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and >> 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes. >> Which leads me to suspect that I am not doing something right for KNL. Does >> anyone know what are some "optimal" configure options for running PETSc on >> KNL? 
>> >> Thanks, >> Justin >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Mon Apr 3 13:44:42 2017 From: jychang48 at gmail.com (Justin Chang) Date: Mon, 3 Apr 2017 13:44:42 -0500 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: Message-ID: Richard, This is what my job script looks like: #!/bin/bash #SBATCH -N 16 #SBATCH -C knl,quad,flat #SBATCH -p regular #SBATCH -J knlflat1024 #SBATCH -L SCRATCH #SBATCH -o knlflat1024.o%j #SBATCH --mail-type=ALL #SBATCH --mail-user=jychang48 at gmail.com #SBATCH -t 00:20:00 #run the application: cd $SCRATCH/Icesheet sbcast --compress=lz4 ./ex48cori /tmp/ex48cori srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine 1 According to the NERSC info pages, they say to add the "numactl" if using flat mode. Previously I tried cache mode but the performance seems to be unaffected. I also compared 256 Haswell nodes vs 256 KNL nodes, and Haswell is nearly 4-5x faster. Though I suspect this drastic change has much to do with the initial coarse grid size now being extremely small. I'll give the COPTFLAGS a try and see what happens. Thanks, Justin On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills wrote: > Hi Justin, > > How is the MCDRAM (on-package "high-bandwidth memory") configured for your > KNL runs? And if it is in "flat" mode, what are you doing to ensure that > you use the MCDRAM? Doing this wrong seems to be one of the most common > reasons for unexpected poor performance on KNL. > > I'm not that familiar with the environment on Cori, but I think that if > you are building for KNL, you should add "-xMIC-AVX512" to your compiler > flags to explicitly instruct the compiler to use the AVX512 instruction > set.
I usually use something along the lines of > > 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' > > (The "-g" just adds symbols, which make the output from performance > profiling tools much more useful.) > > That said, I think that if you are comparing 1024 Haswell cores vs. 1024 > KNL cores (so double the number of Haswell nodes), I'm not surprised that > the simulations are almost twice as fast using the Haswell nodes. Keep in > mind that individual KNL cores are much less powerful than an individual > Haswell node. You are also using roughly twice the power footprint (dual > socket Haswell node should be roughly equivalent to a KNL node, I > believe). How do things look on when you compare equal nodes? > > Cheers, > Richard > > On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang wrote: > >> Hi all, >> >> On NERSC's Cori I have the following configure options for PETSc: >> >> ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 >> --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn >> --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 >> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt >> >> Where I swapped out the default Intel programming environment with that >> of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I >> want to document the performance difference between Cori's Haswell and KNL >> processors. >> >> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and >> 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes. >> Which leads me to suspect that I am not doing something right for KNL. Does >> anyone know what are some "optimal" configure options for running PETSc on >> KNL? >> >> Thanks, >> Justin >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From hongzhang at anl.gov Mon Apr 3 14:24:41 2017
From: hongzhang at anl.gov (Zhang, Hong)
Date: Mon, 3 Apr 2017 19:24:41 +0000
Subject: [petsc-users] Configuring PETSc for KNL
In-Reply-To: References: Message-ID: 

On Apr 3, 2017, at 1:44 PM, Justin Chang > wrote:

Richard,

This is what my job script looks like:

#!/bin/bash
#SBATCH -N 16
#SBATCH -C knl,quad,flat
#SBATCH -p regular
#SBATCH -J knlflat1024
#SBATCH -L SCRATCH
#SBATCH -o knlflat1024.o%j
#SBATCH --mail-type=ALL
#SBATCH --mail-user=jychang48 at gmail.com
#SBATCH -t 00:20:00

#run the application:
cd $SCRATCH/Icesheet
sbcast --compress=lz4 ./ex48cori /tmp/ex48cori
srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine 1

Maybe it is a typo. It should be numactl -m 1.

According to the NERSC info pages, they say to add the "numactl" if using flat mode. Previously I tried cache mode but the performance seems to be unaffected.

Using cache mode should give similar performance as using flat mode with the numactl option. But both approaches should be significantly faster than using flat mode without the numactl option. I usually see over 3X speedup. You can also do such a comparison to see if the high-bandwidth memory is working properly.

I also compared 256 Haswell nodes vs 256 KNL nodes, and Haswell is nearly 4-5x faster. Though I suspect this drastic change has much to do with the initial coarse grid size now being extremely small. I'll give the COPTFLAGS a try and see what happens

Make sure to use --with-memalign=64 for data alignment when configuring PETSc. The option -xMIC-AVX512 would improve the vectorization performance. But it may cause problems for the MPIBAIJ format for some unknown reason. MPIAIJ should work fine with this option.

Hong (Mr.)

Thanks,
Justin

On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills > wrote:

Hi Justin,

How is the MCDRAM (on-package "high-bandwidth memory") configured for your KNL runs?
And if it is in "flat" mode, what are you doing to ensure that you use the MCDRAM? Doing this wrong seems to be one of the most common reasons for unexpected poor performance on KNL. I'm not that familiar with the environment on Cori, but I think that if you are building for KNL, you should add "-xMIC-AVX512" to your compiler flags to explicitly instruct the compiler to use the AVX512 instruction set. I usually use something along the lines of 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' (The "-g" just adds symbols, which make the output from performance profiling tools much more useful.) That said, I think that if you are comparing 1024 Haswell cores vs. 1024 KNL cores (so double the number of Haswell nodes), I'm not surprised that the simulations are almost twice as fast using the Haswell nodes. Keep in mind that individual KNL cores are much less powerful than an individual Haswell node. You are also using roughly twice the power footprint (dual socket Haswell node should be roughly equivalent to a KNL node, I believe). How do things look on when you compare equal nodes? Cheers, Richard On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang > wrote: Hi all, On NERSC's Cori I have the following configure options for PETSc: ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt Where I swapped out the default Intel programming environment with that of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I want to document the performance difference between Cori's Haswell and KNL processors. When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes. Which leads me to suspect that I am not doing something right for KNL. 
Does anyone know what are some "optimal" configure options for running PETSc on KNL? Thanks, Justin -------------- next part -------------- An HTML attachment was scrubbed... URL: From richardtmills at gmail.com Mon Apr 3 14:45:05 2017 From: richardtmills at gmail.com (Richard Mills) Date: Mon, 3 Apr 2017 12:45:05 -0700 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: Message-ID: On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong wrote: > > On Apr 3, 2017, at 1:44 PM, Justin Chang wrote: > > Richard, > > This is what my job script looks like: > > #!/bin/bash > #SBATCH -N 16 > #SBATCH -C knl,quad,flat > #SBATCH -p regular > #SBATCH -J knlflat1024 > #SBATCH -L SCRATCH > #SBATCH -o knlflat1024.o%j > #SBATCH --mail-type=ALL > #SBATCH --mail-user=jychang48 at gmail.com > #SBATCH -t 00:20:00 > > #run the application: > cd $SCRATCH/Icesheet > sbcast --compress=lz4 ./ex48cori /tmp/ex48cori > srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N > 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine > 1 > > > Maybe it is a typo. It should be numactl -m 1. > "-p 1" will also work. "-p" means to "prefer" NUMA node 1 (the MCDRAM), whereas "-m" means to use only NUMA node 1. In the former case, MCDRAM will be used for allocations until the available memory there has been exhausted, and then things will spill over into the DRAM. One would think that "-m" would be better for doing performance studies, but on systems where the nodes have swap space enabled, you can get terrible performance if your code's working set exceeds the size of the MCDRAM, as the system will obediently obey your wishes to not use the DRAM and go straight to the swap disk! I assume the Cori nodes don't have swap space, though I could be wrong. > According to the NERSC info pages, they say to add the "numactl" if using > flat mode. Previously I tried cache mode but the performance seems to be > unaffected. 
> > > Using cache mode should give similar performance as using flat mode with > the numactl option. But both approaches should be significant faster than > using flat mode without the numactl option. I usually see over 3X speedup. > You can also do such comparison to see if the high-bandwidth memory is > working properly. > > I also comparerd 256 haswell nodes vs 256 KNL nodes and haswell is nearly > 4-5x faster. Though I suspect this drastic change has much to do with the > initial coarse grid size now being extremely small. > > I think you may be right about why you see such a big difference. The KNL nodes need enough work to be able to use the SIMD lanes effectively. Also, if your problem gets small enough, then it's going to be able to fit in the Haswell's L3 cache. Although KNL has MCDRAM and this delivers *a lot* more memory bandwidth than the DDR4 memory, it will deliver a lot less bandwidth than the Haswell's L3. > I'll give the COPTFLAGS a try and see what happens > > > Make sure to use --with-memalign=64 for data alignment when configuring > PETSc. > Ah, yes, I forgot that. Thanks for mentioning it, Hong! > The option -xMIC-AVX512 would improve the vectorization performance. But > it may cause problems for the MPIBAIJ format for some unknown reason. > MPIAIJ should work fine with this option. > Hmm. Try both, and, if you see worse performance with MPIBAIJ, let us know and I'll try to figure this out. --Richard > > Hong (Mr.) > > Thanks, > Justin > > On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills > wrote: > >> Hi Justin, >> >> How is the MCDRAM (on-package "high-bandwidth memory") configured for >> your KNL runs? And if it is in "flat" mode, what are you doing to ensure >> that you use the MCDRAM? Doing this wrong seems to be one of the most >> common reasons for unexpected poor performance on KNL. 
>> >> I'm not that familiar with the environment on Cori, but I think that if >> you are building for KNL, you should add "-xMIC-AVX512" to your compiler >> flags to explicitly instruct the compiler to use the AVX512 instruction >> set. I usually use something along the lines of >> >> 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' >> >> (The "-g" just adds symbols, which make the output from performance >> profiling tools much more useful.) >> >> That said, I think that if you are comparing 1024 Haswell cores vs. 1024 >> KNL cores (so double the number of Haswell nodes), I'm not surprised that >> the simulations are almost twice as fast using the Haswell nodes. Keep in >> mind that individual KNL cores are much less powerful than an individual >> Haswell node. You are also using roughly twice the power footprint (dual >> socket Haswell node should be roughly equivalent to a KNL node, I >> believe). How do things look on when you compare equal nodes? >> >> Cheers, >> Richard >> >> On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang >> wrote: >> >>> Hi all, >>> >>> On NERSC's Cori I have the following configure options for PETSc: >>> >>> ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 >>> --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn >>> --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 >>> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt >>> >>> Where I swapped out the default Intel programming environment with that >>> of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I >>> want to document the performance difference between Cori's Haswell and KNL >>> processors. >>> >>> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and >>> 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes. >>> Which leads me to suspect that I am not doing something right for KNL. 
Does >>> anyone know what are some "optimal" configure options for running PETSc on >>> KNL? >>> >>> Thanks, >>> Justin >>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: 

From ingogaertner.tus at gmail.com Mon Apr 3 14:50:25 2017
From: ingogaertner.tus at gmail.com (Ingo Gaertner)
Date: Mon, 3 Apr 2017 21:50:25 +0200
Subject: [petsc-users] examples of DMPlex*FVM methods
Message-ID: 

Dear all,

as part of my studies I would like to implement a simple finite volume CFD solver (following the textbook by Ferziger) on an unstructured, distributed mesh. It seems like the DMPlex class with its DMPlex*FVM methods has prepared much of what is needed for such a CFD solver. Unfortunately I couldn't find any examples of how all the great solutions that appear to be already implemented in PETSc can be used. Are some basic tutorials available, for example how to solve a simple Poisson equation on a DMPlex using PETSc's FVM methods, so that I don't have to reinvent the wheel?

Thanks
Ingo

-------------- next part -------------- An HTML attachment was scrubbed... URL: 

From keceli at gmail.com Mon Apr 3 14:57:36 2017
From: keceli at gmail.com (murat keçeli)
Date: Mon, 3 Apr 2017 14:57:36 -0500
Subject: [petsc-users] Configuring PETSc for KNL
In-Reply-To: References: Message-ID: 

How about replacing --download-fblaslapack with vendor-specific BLAS/LAPACK?
Murat On Mon, Apr 3, 2017 at 2:45 PM, Richard Mills wrote: > On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong wrote: > >> >> On Apr 3, 2017, at 1:44 PM, Justin Chang wrote: >> >> Richard, >> >> This is what my job script looks like: >> >> #!/bin/bash >> #SBATCH -N 16 >> #SBATCH -C knl,quad,flat >> #SBATCH -p regular >> #SBATCH -J knlflat1024 >> #SBATCH -L SCRATCH >> #SBATCH -o knlflat1024.o%j >> #SBATCH --mail-type=ALL >> #SBATCH --mail-user=jychang48 at gmail.com >> #SBATCH -t 00:20:00 >> >> #run the application: >> cd $SCRATCH/Icesheet >> sbcast --compress=lz4 ./ex48cori /tmp/ex48cori >> srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N >> 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine >> 1 >> >> >> Maybe it is a typo. It should be numactl -m 1. >> > > "-p 1" will also work. "-p" means to "prefer" NUMA node 1 (the MCDRAM), > whereas "-m" means to use only NUMA node 1. In the former case, MCDRAM > will be used for allocations until the available memory there has been > exhausted, and then things will spill over into the DRAM. One would think > that "-m" would be better for doing performance studies, but on systems > where the nodes have swap space enabled, you can get terrible performance > if your code's working set exceeds the size of the MCDRAM, as the system > will obediently obey your wishes to not use the DRAM and go straight to the > swap disk! I assume the Cori nodes don't have swap space, though I could > be wrong. > > >> According to the NERSC info pages, they say to add the "numactl" if using >> flat mode. Previously I tried cache mode but the performance seems to be >> unaffected. >> >> >> Using cache mode should give similar performance as using flat mode with >> the numactl option. But both approaches should be significant faster than >> using flat mode without the numactl option. I usually see over 3X speedup. >> You can also do such comparison to see if the high-bandwidth memory is >> working properly. 
>> >> I also comparerd 256 haswell nodes vs 256 KNL nodes and haswell is nearly >> 4-5x faster. Though I suspect this drastic change has much to do with the >> initial coarse grid size now being extremely small. >> >> I think you may be right about why you see such a big difference. The > KNL nodes need enough work to be able to use the SIMD lanes effectively. > Also, if your problem gets small enough, then it's going to be able to fit > in the Haswell's L3 cache. Although KNL has MCDRAM and this delivers *a > lot* more memory bandwidth than the DDR4 memory, it will deliver a lot less > bandwidth than the Haswell's L3. > >> I'll give the COPTFLAGS a try and see what happens >> >> >> Make sure to use --with-memalign=64 for data alignment when configuring >> PETSc. >> > > Ah, yes, I forgot that. Thanks for mentioning it, Hong! > > >> The option -xMIC-AVX512 would improve the vectorization performance. But >> it may cause problems for the MPIBAIJ format for some unknown reason. >> MPIAIJ should work fine with this option. >> > > Hmm. Try both, and, if you see worse performance with MPIBAIJ, let us > know and I'll try to figure this out. > > --Richard > > >> >> Hong (Mr.) >> >> Thanks, >> Justin >> >> On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills >> wrote: >> >>> Hi Justin, >>> >>> How is the MCDRAM (on-package "high-bandwidth memory") configured for >>> your KNL runs? And if it is in "flat" mode, what are you doing to ensure >>> that you use the MCDRAM? Doing this wrong seems to be one of the most >>> common reasons for unexpected poor performance on KNL. >>> >>> I'm not that familiar with the environment on Cori, but I think that if >>> you are building for KNL, you should add "-xMIC-AVX512" to your compiler >>> flags to explicitly instruct the compiler to use the AVX512 instruction >>> set. 
I usually use something along the lines of >>> >>> 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' >>> >>> (The "-g" just adds symbols, which make the output from performance >>> profiling tools much more useful.) >>> >>> That said, I think that if you are comparing 1024 Haswell cores vs. 1024 >>> KNL cores (so double the number of Haswell nodes), I'm not surprised that >>> the simulations are almost twice as fast using the Haswell nodes. Keep in >>> mind that individual KNL cores are much less powerful than an individual >>> Haswell node. You are also using roughly twice the power footprint (dual >>> socket Haswell node should be roughly equivalent to a KNL node, I >>> believe). How do things look on when you compare equal nodes? >>> >>> Cheers, >>> Richard >>> >>> On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang >>> wrote: >>> >>>> Hi all, >>>> >>>> On NERSC's Cori I have the following configure options for PETSc: >>>> >>>> ./configure --download-fblaslapack --with-cc=cc >>>> --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 >>>> --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 >>>> --with-mpiexec=srun --with-64-bit-indices=1 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 >>>> FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt >>>> >>>> Where I swapped out the default Intel programming environment with that >>>> of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I >>>> want to document the performance difference between Cori's Haswell and KNL >>>> processors. >>>> >>>> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and >>>> 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes. >>>> Which leads me to suspect that I am not doing something right for KNL. Does >>>> anyone know what are some "optimal" configure options for running PETSc on >>>> KNL? >>>> >>>> Thanks, >>>> Justin >>>> >>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From richardtmills at gmail.com Mon Apr 3 15:06:22 2017 From: richardtmills at gmail.com (Richard Mills) Date: Mon, 3 Apr 2017 13:06:22 -0700 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: Message-ID: Yes, one should rely on MKL (or Cray LibSci, if using the Cray toolchain) on Cori. But I'm guessing that this will make no noticeable difference for what Justin is doing. --Richard On Mon, Apr 3, 2017 at 12:57 PM, murat ke?eli wrote: > How about replacing --download-fblaslapack with vendor specific > BLAS/LAPACK? > > Murat > > On Mon, Apr 3, 2017 at 2:45 PM, Richard Mills > wrote: > >> On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong wrote: >> >>> >>> On Apr 3, 2017, at 1:44 PM, Justin Chang wrote: >>> >>> Richard, >>> >>> This is what my job script looks like: >>> >>> #!/bin/bash >>> #SBATCH -N 16 >>> #SBATCH -C knl,quad,flat >>> #SBATCH -p regular >>> #SBATCH -J knlflat1024 >>> #SBATCH -L SCRATCH >>> #SBATCH -o knlflat1024.o%j >>> #SBATCH --mail-type=ALL >>> #SBATCH --mail-user=jychang48 at gmail.com >>> #SBATCH -t 00:20:00 >>> >>> #run the application: >>> cd $SCRATCH/Icesheet >>> sbcast --compress=lz4 ./ex48cori /tmp/ex48cori >>> srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N >>> 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine >>> 1 >>> >>> >>> Maybe it is a typo. It should be numactl -m 1. >>> >> >> "-p 1" will also work. "-p" means to "prefer" NUMA node 1 (the MCDRAM), >> whereas "-m" means to use only NUMA node 1. In the former case, MCDRAM >> will be used for allocations until the available memory there has been >> exhausted, and then things will spill over into the DRAM. 
One would think >> that "-m" would be better for doing performance studies, but on systems >> where the nodes have swap space enabled, you can get terrible performance >> if your code's working set exceeds the size of the MCDRAM, as the system >> will obediently obey your wishes to not use the DRAM and go straight to the >> swap disk! I assume the Cori nodes don't have swap space, though I could >> be wrong. >> >> >>> According to the NERSC info pages, they say to add the "numactl" if >>> using flat mode. Previously I tried cache mode but the performance seems to >>> be unaffected. >>> >>> >>> Using cache mode should give similar performance as using flat mode with >>> the numactl option. But both approaches should be significant faster than >>> using flat mode without the numactl option. I usually see over 3X speedup. >>> You can also do such comparison to see if the high-bandwidth memory is >>> working properly. >>> >>> I also comparerd 256 haswell nodes vs 256 KNL nodes and haswell is >>> nearly 4-5x faster. Though I suspect this drastic change has much to do >>> with the initial coarse grid size now being extremely small. >>> >>> I think you may be right about why you see such a big difference. The >> KNL nodes need enough work to be able to use the SIMD lanes effectively. >> Also, if your problem gets small enough, then it's going to be able to fit >> in the Haswell's L3 cache. Although KNL has MCDRAM and this delivers *a >> lot* more memory bandwidth than the DDR4 memory, it will deliver a lot less >> bandwidth than the Haswell's L3. >> >>> I'll give the COPTFLAGS a try and see what happens >>> >>> >>> Make sure to use --with-memalign=64 for data alignment when configuring >>> PETSc. >>> >> >> Ah, yes, I forgot that. Thanks for mentioning it, Hong! >> >> >>> The option -xMIC-AVX512 would improve the vectorization performance. But >>> it may cause problems for the MPIBAIJ format for some unknown reason. >>> MPIAIJ should work fine with this option. 
>>> >> >> Hmm. Try both, and, if you see worse performance with MPIBAIJ, let us >> know and I'll try to figure this out. >> >> --Richard >> >> >>> >>> Hong (Mr.) >>> >>> Thanks, >>> Justin >>> >>> On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills >>> wrote: >>> >>>> Hi Justin, >>>> >>>> How is the MCDRAM (on-package "high-bandwidth memory") configured for >>>> your KNL runs? And if it is in "flat" mode, what are you doing to ensure >>>> that you use the MCDRAM? Doing this wrong seems to be one of the most >>>> common reasons for unexpected poor performance on KNL. >>>> >>>> I'm not that familiar with the environment on Cori, but I think that if >>>> you are building for KNL, you should add "-xMIC-AVX512" to your compiler >>>> flags to explicitly instruct the compiler to use the AVX512 instruction >>>> set. I usually use something along the lines of >>>> >>>> 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' >>>> >>>> (The "-g" just adds symbols, which make the output from performance >>>> profiling tools much more useful.) >>>> >>>> That said, I think that if you are comparing 1024 Haswell cores vs. >>>> 1024 KNL cores (so double the number of Haswell nodes), I'm not surprised >>>> that the simulations are almost twice as fast using the Haswell nodes. >>>> Keep in mind that individual KNL cores are much less powerful than an >>>> individual Haswell node. You are also using roughly twice the power >>>> footprint (dual socket Haswell node should be roughly equivalent to a KNL >>>> node, I believe). How do things look on when you compare equal nodes? 
>>>> >>>> Cheers, >>>> Richard >>>> >>>> On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang >>>> wrote: >>>> >>>>> Hi all, >>>>> >>>>> On NERSC's Cori I have the following configure options for PETSc: >>>>> >>>>> ./configure --download-fblaslapack --with-cc=cc >>>>> --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 >>>>> --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 >>>>> --with-mpiexec=srun --with-64-bit-indices=1 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 >>>>> FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt >>>>> >>>>> Where I swapped out the default Intel programming environment with >>>>> that of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). >>>>> I want to document the performance difference between Cori's Haswell and >>>>> KNL processors. >>>>> >>>>> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell >>>>> and 16 KNL nodes), the simulations are almost twice as fast on Haswell >>>>> nodes. Which leads me to suspect that I am not doing something right for >>>>> KNL. Does anyone know what are some "optimal" configure options for running >>>>> PETSc on KNL? >>>>> >>>>> Thanks, >>>>> Justin >>>>> >>>> >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Apr 3 15:33:51 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 3 Apr 2017 16:33:51 -0400 Subject: [petsc-users] Question about DMDA BOUNDARY_CONDITION set In-Reply-To: References: Message-ID: > On Apr 3, 2017, at 1:10 PM, Wenbo Zhao wrote: > > Barry, > Hi. I am sorry for too late to reply you. > I read the code you send to me which create a VecScatter for ghost points on rotation boundary. > But I am still not clear to how to use it to assemble the matrix. 
You did not ask specifically about assembling the matrix for the rotated boundary; what I provided was a way to update the rotated ghost locations with the correct ghost values so you could write function evaluations that use the local representation, including the rotated ghost locations.

The easiest way to avoid the error below is to turn off the flag that disallows additional matrix allocations when you set the values:

MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);

This will make the assembly process slower. But if this is a one-time thing you want to do (i.e. you are not devoting your career to this problem with the rotated boundary) then just wait for the slow assembly.

> I studied the example "$SLEPC_DIR/src/eps/examples/tutorials/ex19.c", which is a 3D eigenvalue problem.
>
> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
>
> I think the reason is Mat Preallocation. If so, I need to create the matrix using MatCreate by myself according to the src code of DMCreateMatrix_DA? Is it correct?

Yes, you can do this. It will be tedious; you can reuse some of the code I sent in the example to get the mappings correct.

> Could you give some explanation about how to use VecScatter?

I don't understand. The example shows exactly how to use the scatter (but it won't help you fill up matrices).

> In the $PETSC_DIR/src/ksp/ksp/examples/tutorials/ex2.c, I did not find the VecScatter.
> I found VecScatter is created in the src code of dm/impls/da/da1.c but I am not clear about it.

Barry

> BEST,
> Wenbo

From bsmith at mcs.anl.gov Mon Apr 3 16:26:53 2017
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Mon, 3 Apr 2017 17:26:53 -0400
Subject: [petsc-users] -snes_mf_operator yields "No support for this operation for this object type" in TS codes?
In-Reply-To: <874ly5pow3.fsf@jedbrown.org> References: <877f32qlf1.fsf@jedbrown.org> <61C35066-A114-4C0B-A026-94677F4B3C7B@mcs.anl.gov> <87h925pscc.fsf@jedbrown.org> <7CDCCF9F-F6B0-44E5-AB91-B20A977F3D23@mcs.anl.gov> <874ly5pow3.fsf@jedbrown.org> Message-ID: <005AE402-02DE-4C50-B913-ECFA401E8618@mcs.anl.gov> > On Apr 3, 2017, at 10:05 AM, Jed Brown wrote: > > Barry Smith writes: > >> >> SNESGetUsingInternalMatMFFD(snes,&flg); Then you can get rid of the horrible >> >> PetscBool flg; >> ierr = PetscObjectTypeCompare((PetscObject)A,MATMFFD,&flg);CHKERRQ(ierr); >> >> I had to add in two places. Still ugly but I think less buggy. > > Yeah, there are also MATMFFD checks in SNESComputeJacobian. These are different in that, I think, the same code needs to be used regardless of whether the user just used -snes_mf[*] or the user provided a matrix free matrix directly to the TS or SNES. So I don't think these can be changed to call SNESGetUsingInternalMatMFFD(). > > >> >> >> >> >>> >>> What if SNESComputeJacobian was aware of -snes_mf_operator and just >>> passed Pmat in both slots? Or does the user sometimes need access to >>> the MatMFFD created by -snes_mf_operator? (Seems like possibly, e.g., >>> to adjust differencing parameters.) From knepley at gmail.com Mon Apr 3 16:50:26 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 3 Apr 2017 16:50:26 -0500 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: Message-ID: On Mon, Apr 3, 2017 at 11:45 AM, Filippo Leonardi wrote: > On Monday, 3 April 2017 02:00:53 CEST you wrote: > > > On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi > > > > > > > wrote: > > > > Hello, > > > > > > > > I have a project in mind and seek feedback. > > > > > > > > Disclaimer: I hope I am not abusing of this mailing list with this > idea. > > > > If so, please ignore. 
> > > > > > > > As a thought experiment, and to have a bit of fun, I am currently > > > > writing/thinking on writing, a small (modern) C++ wrapper around PETSc. > > > > > > > > Premise: PETSc is awesome, I love it and use in many projects. > Sometimes I > > > > am just not super comfortable writing C. (I know my idea goes against > > > > PETSc's design philosophy). > > > > > > > > I know there are many around, and there is not really a need for this > > > > (especially since PETSc has his own object-oriented style), but there > are > > > > a > > > > few things I would like to really include in this wrapper, that I found > > > > nowhere): > > > > - I am currently only thinking about the Vector/Matrix/KSP/DM part of > the > > > > Framework, there are many other cool things that PETSc does that I do > not > > > > have the brainpower to consider those as well. > > > > - expression templates (in my opinion this is where C++ shines): this > > > > would replace all code bloat that a user might need with cool/easy to > read > > > > expressions (this could increase the number of axpy-like routines); > > > > - those expression templates should use SSE and AVX whenever available; > > > > - expressions like x += alpha * y should fall back to BLAS axpy (tough > > > > sometimes this is not even faster than a simple loop); > > > > > > The idea for the above is not clear. Do you want templates generating > calls > > > to BLAS? Or scalar code that operates on raw arrays with SSE/AVX? > > > There is some advantage here of expanding the range of BLAS operations, > > > which has been done to death by Liz Jessup and collaborators, but not > > > that much. > > > Templates should generate scalar code operating on raw arrays using SIMD. > But > > I can detect if you want to use axpbycz or gemv, and use the blas > > implementation instead. I do not think there is a point in trying to > "beat" > > BLAS. 
(Here a interesting point opens: I assume an efficient BLAS > > implementation, but I am not so sure about how the different BLAS do > things > > internally. I work from the assumption that we have a very well tuned BLAS > > implementation at our disposal). > The speed improvement comes from pulling vectors through memory fewer times by merging operations (kernel fusion). > > > > > > - all calls to PETSc should be less verbose, more C++-like: > > > > * for instance a VecGlobalToLocalBegin could return an empty object > that > > > > > > > > calls VecGlobalToLocalEnd when it is destroyed. > > > > > > > > * some cool idea to easily write GPU kernels. > > > > > > If you find a way to make this pay off it would be amazing, since > currently > > > nothing but BLAS3 has a hope of mattering in this context. > > > > > > > - the idea would be to have safer routines (at compile time), by means > of > > > > RAII etc. > > > > > > > > I aim for zero/near-zero/negligible overhead with full optimization, > for > > > > that I include benchmarks and extensive test units. > > > > > > > > So my question is: > > > > - anyone that would be interested (in the product/in developing)? > > > > - anyone that has suggestions (maybe that what I have in mind is > > > > nonsense)? > > > > > > I would suggest making a simple performance model that says what you will > > > do will have at least > > > a 2x speed gain. Because anything less is not worth your time, and > > > inevitably you will not get the > > > whole multiplier. I am really skeptical that is possible with the above > > > sketch. > > > That I will do as next steps for sure. But I also doubt this much of will > be > > achievable in any case. > > > > > > > Second, I would try to convince myself that what you propose would be > > > simpler, in terms of lines of code, > > > number of objects, number of concepts, etc. Right now, that is not clear > to > > > me either. > > > Number of objects per se may not be smaller. 
> I am more thinking about reducing lines of code (verbosity) and concepts, and increasing safety.
>
> I have two examples I've been burnt by in the past:
> - casting to void* to pass custom contexts to PETSc routines

I don't think you can beat this cast, and I do not know of a way to enforce type safety with this open world assumption using C++ either.

> - forgetting to call the corresponding XXXEnd after a call to XXXBegin
> (PETSc notices that, ofc., but at runtime, and that might be too late).
>
> Example: I can imagine that I need one of PETSc's internal arrays. In this case I
> call VecGetArray. However I will inevitably forget to return the array to
> PETSc. I could have my new VecArray return an object that restores the array
> when it goes out of scope. I can also flag the function with [[nodiscard]] to
> prevent the user from discarding the returned object from the start.

Jed claims that this pattern is no longer preferred, but I have forgotten his argument. Jed?

> > Barring that, maybe you can argue that new capabilities, such as the type
> > flexibility described by Michael, are enabled. That
> > would be the most convincing I think.
>
> This would be very interesting indeed, but I see only two options:
> - recompile PETSc twice
> - manually implement all complex routines, which might be too much of a task

We have had this discussion for years on this list. Having separate names for each type is really ugly and does not achieve what we want. We want smooth interoperability between objects with different backing types, but it is still not clear how to do this.

Thanks,

Matt

> > > Thanks,
> > > Matt
>
> Thanks for the feedback Matt.
>
> > > If you have read up to here, thanks.
>
> On Mon, 3 Apr 2017 at 02:00 Matthew Knepley wrote:
>> On Sun, Apr 2, 2017 at 2:15 PM, Filippo Leonardi >> wrote:
>>
>> Hello,
>>
>> I have a project in mind and seek feedback.
>> >> Disclaimer: I hope I am not abusing this mailing list with this idea. >> If so, please ignore. >> >> As a thought experiment, and to have a bit of fun, I am currently >> writing/thinking on writing, a small (modern) C++ wrapper around PETSc. >> >> Premise: PETSc is awesome, I love it and use it in many projects. Sometimes >> I am just not super comfortable writing C. (I know my idea goes against >> PETSc's design philosophy). >> >> I know there are many around, and there is not really a need for this >> (especially since PETSc has its own object-oriented style), but there are a >> few things I would like to really include in this wrapper, that I found >> nowhere: >> - I am currently only thinking about the Vector/Matrix/KSP/DM part of the >> framework; there are many other cool things that PETSc does, but I do >> not have the brainpower to consider those as well. >> - expression templates (in my opinion this is where C++ shines): this >> would replace all code bloat that a user might need with cool/easy to read >> expressions (this could increase the number of axpy-like routines); >> - those expression templates should use SSE and AVX whenever available; >> - expressions like x += alpha * y should fall back to BLAS axpy (though >> sometimes this is not even faster than a simple loop); >> >> >> The idea for the above is not clear. Do you want templates generating >> calls to BLAS? Or scalar code that operates on raw arrays with SSE/AVX? >> There is some advantage here of expanding the range of BLAS operations, >> which has been done to death by Liz Jessup and collaborators, but not >> that much. >> >> >> - all calls to PETSc should be less verbose, more C++-like: >> * for instance a VecGlobalToLocalBegin could return an empty object >> that calls VecGlobalToLocalEnd when it is destroyed. >> * some cool idea to easily write GPU kernels.
>> >> >> If you find a way to make this pay off it would be amazing, since >> currently nothing but BLAS3 has a hope of mattering in this context. >> >> >> - the idea would be to have safer routines (at compile time), by means >> of RAII etc. >> >> I aim for zero/near-zero/negligible overhead with full optimization, for >> that I include benchmarks and extensive test units. >> >> So my question is: >> - anyone that would be interested (in the product/in developing)? >> - anyone that has suggestions (maybe that what I have in mind is >> nonsense)? >> >> >> I would suggest making a simple performance model that says what you will >> do will have at least >> a 2x speed gain. Because anything less is not worth your time, and >> inevitably you will not get the >> whole multiplier. I am really skeptical that is possible with the above >> sketch. >> >> Second, I would try to convince myself that what you propose would be >> simpler, in terms of lines of code, >> number of objects, number of concepts, etc. Right now, that is not clear >> to me either. >> >> Barring that, maybe you can argue that new capabilities, such as the type >> flexibility described by Michael, are enabled. That >> would be the most convincing I think. >> >> Thanks, >> >> Matt >> >> If you have read up to here, thanks. >> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed...
URL: From knepley at gmail.com Mon Apr 3 16:58:58 2017 From: knepley at gmail.com (Matthew Knepley) Date: Mon, 3 Apr 2017 16:58:58 -0500 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: Message-ID: On Mon, Apr 3, 2017 at 2:50 PM, Ingo Gaertner wrote: > Dear all, > as part of my studies I would like to implement a simple finite volume CFD > solver (following the textbook by Ferziger) on an unstructured, distributed > mesh. It seems like the DMPlex class with its DMPlex*FVM methods has > prepared much of what is needed for such a CFD solver. > Unfortunately I couldn't find any examples how all the great solutions can > be used that appear to be already implemented in PETSc. Are some basic > tutorials available, for example how to solve a simple poisson equation on > a DMPlex using PETSc's FVM methods, so that I don't have to reinvent the > wheel? > There are no tutorials, and almost no documentation. The best thing to look at is TS ex11. This solves a bunch of different equations (advection, shallow water, Euler) with a "pointwise" Riemann solver. Feel free to ask questions about it. However, I acknowledge that the learning curve is steep. I have put much more time into the FEM examples. I will eventually get around to FVM, but feel free to contribute a simple example (the TS ex45, 46, and 47 were contributed FEM examples). Thanks, Matt > Thanks > Ingo > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jed at jedbrown.org Mon Apr 3 18:08:43 2017 From: jed at jedbrown.org (Jed Brown) Date: Mon, 03 Apr 2017 17:08:43 -0600 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: Message-ID: <87o9wdm6mc.fsf@jedbrown.org> Matthew Knepley writes: >> BLAS. (Here an interesting point opens: I assume an efficient BLAS >> >> implementation, but I am not so sure about how the different BLAS do >> things >> >> internally. I work from the assumption that we have a very well tuned BLAS >> >> implementation at our disposal). >> > > The speed improvement comes from pulling vectors through memory fewer > times by merging operations (kernel fusion). Typical examples are VecMAXPY and VecMDot, but note that these are not xGEMV because the vectors are independent arrays rather than single arrays with a constant leading dimension. >> call VecGetArray. However I will inevitably forget to return the array to >> >> PETSc. I could have my new VecArray returning an object that restores the >> >> array >> >> when it goes out of scope. I can also flag the function with [[nodiscard]] >> to >> >> prevent the user from discarding the returned object from the start. >> > > Jed claims that this pattern is no longer preferred, but I have forgotten > his argument. > Jed? Destruction order matters and needs to be collective. If an error condition causes destruction to occur in a different order on different processes, you can get deadlock. I would much rather have an error leave some resources (for the OS to collect) than escalate into deadlock. > We have had this discussion for years on this list. Having separate names > for each type > is really ugly and does not achieve what we want. We want smooth > interoperability between > objects with different backing types, but it is still not clear how to do > this. Hide it internally and implicitly promote. Only the *GetArray functions need to be parametrized on numeric type. But it's a lot of work on the backend.
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From zhaowenbo.npic at gmail.com Tue Apr 4 01:24:53 2017 From: zhaowenbo.npic at gmail.com (Wenbo Zhao) Date: Tue, 4 Apr 2017 14:24:53 +0800 Subject: [petsc-users] Question about DMDA BOUNDARY_CONDITION set Message-ID: Barry, Thanks. It is my fault. I should not mix the VecScatter and MatSetValues. 1. Matrix assembly There are only two matrix options for the case with a rotation boundary. The first is using "MatSetOption(A,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE)". The second is to create the matrix by hand. Is it correct? 2. VecScatter It will be used to get the values on ghosted cells. My case is the neutron diffusion equation, which is given below: \( -\nabla \cdot D_g \nabla \phi_g + \Sigma_{r,g}\phi_g - \sum_{g' \neq g} \Sigma_{s,g'\to g} \phi_{g'} = \frac{1}{Keff} \chi_g \sum_{g'=1}^{G} \nu\Sigma_{f,g'}\phi_{g'} \) where g denotes the energy group, G is the number of energy groups, \(D_g\) is the diffusion coefficient of group g, \(\Sigma_{r,g}\) is the removal cross section of group g, \(\Sigma_{s,g'\to g}\) is the scattering cross section from group g' to group g, \(\chi_g\) is the fission spectrum of group g, \(\nu\Sigma_{f,g'}\) is the fission production cross section, \(\phi_g\) is the neutron flux and the eigenvector, and Keff is the eigenvalue. The diffusion coefficients and other cross sections vary over the region and are distributed over procs. If I use a mesh-centered seven-point finite difference method in 3D, the number of degrees of freedom per cell is G and \(D_g\) needs communication. I get the \(D_g\) of the ghost cells through VecScatter and insert the values into the matrix. Is it correct? BEST, Wenbo -------------- next part -------------- An HTML attachment was scrubbed...
URL: From toon.weyens at gmail.com Tue Apr 4 02:20:57 2017 From: toon.weyens at gmail.com (Toon Weyens) Date: Tue, 04 Apr 2017 07:20:57 +0000 Subject: [petsc-users] Slepc JD and GD converge to wrong eigenpair In-Reply-To: <9BC45C28-0AB4-48EF-9981-B54A0FD45F41@dsic.upv.es> References: <65A0A5E7-399B-4D19-A967-73765A96DB98@dsic.upv.es> <2A5BFE40-C401-42CA-944A-9008E57B55EB@dsic.upv.es> <9BC45C28-0AB4-48EF-9981-B54A0FD45F41@dsic.upv.es> Message-ID: Dear Jose and Matthew, Thank you so much for the effort! I still don't manage to converge using the range interval technique to filter out the positive eigenvalues, but using shift-invert combined with a target eigenvalue does true miracles. I get extremely fast convergence. The truth of the matter is that we are mainly interested in negative eigenvalues (unstable modes), and from physical considerations they are more or less situated in -0.2 wrote: > > > El 1 abr 2017, a las 0:01, Toon Weyens escribió: > > > > Dear Jose, > > > > I have saved the matrices in Matlab format and am sending them to you > using pCloud. If you want another format, please tell me. Please also note > that they are about 1.4GB each. > > > > I also attach a typical output of eps_view and log_view in output.txt, > for 8 processes. > > > > Thanks so much for helping me out! I think PETSc and SLEPc are amazing > inventions that really have saved me many months of work! > > > > Regards > > I played a little bit with your matrices. > > With Krylov-Schur I can solve the problem quite easily. Note that in > generalized eigenvalue problems it is always better to use STSINVERT > because you have to invert a matrix anyway. So instead of setting > which=smallest_real, use shift-and-invert with a target that is close to > the wanted eigenvalue.
For instance, with target=-0.005 I get convergence > with just one iteration: > > $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -st_type sinvert > -eps_target -0.005 > > Generalized eigenproblem stored in file. > > Reading COMPLEX matrices from binary files... > Number of iterations of the method: 1 > Number of linear iterations of the method: 16 > Solution method: krylovschur > > Number of requested eigenvalues: 1 > Stopping condition: tol=1e-05, maxit=7500 > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; > iterations 1 > ---------------------- -------------------- > k ||Ax-kBx||/||kx|| > ---------------------- -------------------- > -0.004809-0.000000i 8.82085e-05 > ---------------------- -------------------- > > > Of course, you don't know a priori where your eigenvalue is. > Alternatively, you can set the target at 0 and get rid of positive > eigenvalues with a region filtering. For instance: > > $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -st_type sinvert > -eps_target 0 -rg_type interval -rg_interval_endpoints -1,0,-.05,.05 > -eps_nev 2 > > Generalized eigenproblem stored in file. > > Reading COMPLEX matrices from binary files... > Number of iterations of the method: 8 > Number of linear iterations of the method: 74 > Solution method: krylovschur > > Number of requested eigenvalues: 2 > Stopping condition: tol=1e-05, maxit=7058 > Linear eigensolve converged (2 eigenpairs) due to CONVERGED_TOL; > iterations 8 > ---------------------- -------------------- > k ||Ax-kBx||/||kx|| > ---------------------- -------------------- > -0.000392-0.000000i 2636.4 > -0.004809+0.000000i 318441 > ---------------------- -------------------- > > In this case, the residuals seem very bad. But this is due to the fact > that your matrices have huge norms. 
Adding the option -eps_error_backward > ::ascii_info_detail will show residuals relative to the matrix norms: > ---------------------- -------------------- > k eta(x,k) > ---------------------- -------------------- > -0.000392-0.000000i 3.78647e-11 > -0.004809+0.000000i 5.61419e-08 > ---------------------- -------------------- > > > Regarding the GD solver, I am also getting the correct solution. I don't > know why you are not getting convergence to the wanted eigenvalue: > > $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -eps_smallest_real > -eps_ncv 32 -eps_type gd > > Generalized eigenproblem stored in file. > > Reading COMPLEX matrices from binary files... > Number of iterations of the method: 132 > Number of linear iterations of the method: 0 > Solution method: gd > > Number of requested eigenvalues: 1 > Stopping condition: tol=1e-05, maxit=120000 > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; > iterations 132 > ---------------------- -------------------- > k ||Ax-kBx||/||kx|| > ---------------------- -------------------- > -0.004809+0.000000i 2.16223e-05 > ---------------------- -------------------- > > > Again, it is much better to use a target instead of smallest_real: > > $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -eps_type gd > -eps_target -0.005 > > Generalized eigenproblem stored in file. > > Reading COMPLEX matrices from binary files... 
> Number of iterations of the method: 23 > Number of linear iterations of the method: 0 > Solution method: gd > > Number of requested eigenvalues: 1 > Stopping condition: tol=1e-05, maxit=120000 > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; > iterations 23 > ---------------------- -------------------- > k ||Ax-kBx||/||kx|| > ---------------------- -------------------- > -0.004809-0.000000i 2.06572e-05 > ---------------------- -------------------- > > > Jose > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_streams.pdf Type: application/pdf Size: 9487 bytes Desc: not available URL: From knepley at gmail.com Tue Apr 4 06:40:39 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Apr 2017 06:40:39 -0500 Subject: [petsc-users] Slepc JD and GD converge to wrong eigenpair In-Reply-To: References: <65A0A5E7-399B-4D19-A967-73765A96DB98@dsic.upv.es> <2A5BFE40-C401-42CA-944A-9008E57B55EB@dsic.upv.es> <9BC45C28-0AB4-48EF-9981-B54A0FD45F41@dsic.upv.es> Message-ID: On Tue, Apr 4, 2017 at 2:20 AM, Toon Weyens wrote: > Dear Jose and Matthew, > > Thank you so much for the effort! > > I still don't manage to converge using the range interval technique to > filter out the positive eigenvalues, but using shift-invert combined with a > target eigenvalue does true miracles. I get extremely fast convergence. > > The truth of the matter is that we are mainly interested in negative > eigenvalues (unstable modes), and from physical considerations they are > more or less situated in -0.2 use. So we will just use guesses. > > Thank you so much again! > > Also, I have finally managed to run streams (the cluster is quite full > atm). These are the outputs: > 1) This shows you have a bad process mapping. You could get much more speedup for 1-4 procs by properly mapping processes to cores, perhaps with numactl. 
2) Essentially 3 processes can saturate your memory bandwidth, so I would not expect much gain from using more than 4. Thanks, Matt > 1 processes > Number of MPI processes 1 Processor names c04b27 > Triad: 12352.0825 Rate (MB/s) > 2 processes > Number of MPI processes 2 Processor names c04b27 c04b27 > Triad: 18968.0226 Rate (MB/s) > 3 processes > Number of MPI processes 3 Processor names c04b27 c04b27 c04b27 > Triad: 21106.8580 Rate (MB/s) > 4 processes > Number of MPI processes 4 Processor names c04b27 c04b27 c04b27 c04b27 > Triad: 21655.5885 Rate (MB/s) > 5 processes > Number of MPI processes 5 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 > Triad: 21627.5559 Rate (MB/s) > 6 processes > Number of MPI processes 6 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 > Triad: 21394.9620 Rate (MB/s) > 7 processes > Number of MPI processes 7 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 > Triad: 24952.7076 Rate (MB/s) > 8 processes > Number of MPI processes 8 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 > Triad: 28357.1062 Rate (MB/s) > 9 processes > Number of MPI processes 9 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 c04b27 > Triad: 31720.4545 Rate (MB/s) > 10 processes > Number of MPI processes 10 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 > Triad: 35198.7412 Rate (MB/s) > 11 processes > Number of MPI processes 11 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 > Triad: 38616.0615 Rate (MB/s) > 12 processes > Number of MPI processes 12 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 > Triad: 41939.3994 Rate (MB/s) > > I attach a figure. > > Thanks again! > > On Mon, Apr 3, 2017 at 8:29 PM Jose E. 
Roman wrote: > >> >> > El 1 abr 2017, a las 0:01, Toon Weyens >> escribi?: >> > >> > Dear jose, >> > >> > I have saved the matrices in Matlab format and am sending them to you >> using pCloud. If you want another format, please tell me. Please also note >> that they are about 1.4GB each. >> > >> > I also attach a typical output of eps_view and log_view in output.txt, >> for 8 processes. >> > >> > Thanks so much for helping me out! I think Petsc and Slepc are amazing >> inventions that really have saved me many months of work! >> > >> > Regards >> >> I played a little bit with your matrices. >> >> With Krylov-Schur I can solve the problem quite easily. Note that in >> generalized eigenvalue problems it is always better to use STSINVERT >> because you have to invert a matrix anyway. So instead of setting >> which=smallest_real, use shift-and-invert with a target that is close to >> the wanted eigenvalue. For instance, with target=-0.005 I get convergence >> with just one iteration: >> >> $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu >> -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -st_type sinvert >> -eps_target -0.005 >> >> Generalized eigenproblem stored in file. >> >> Reading COMPLEX matrices from binary files... >> Number of iterations of the method: 1 >> Number of linear iterations of the method: 16 >> Solution method: krylovschur >> >> Number of requested eigenvalues: 1 >> Stopping condition: tol=1e-05, maxit=7500 >> Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; >> iterations 1 >> ---------------------- -------------------- >> k ||Ax-kBx||/||kx|| >> ---------------------- -------------------- >> -0.004809-0.000000i 8.82085e-05 >> ---------------------- -------------------- >> >> >> Of course, you don't know a priori where your eigenvalue is. >> Alternatively, you can set the target at 0 and get rid of positive >> eigenvalues with a region filtering. 
For instance: >> >> $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu >> -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -st_type sinvert >> -eps_target 0 -rg_type interval -rg_interval_endpoints -1,0,-.05,.05 >> -eps_nev 2 >> >> Generalized eigenproblem stored in file. >> >> Reading COMPLEX matrices from binary files... >> Number of iterations of the method: 8 >> Number of linear iterations of the method: 74 >> Solution method: krylovschur >> >> Number of requested eigenvalues: 2 >> Stopping condition: tol=1e-05, maxit=7058 >> Linear eigensolve converged (2 eigenpairs) due to CONVERGED_TOL; >> iterations 8 >> ---------------------- -------------------- >> k ||Ax-kBx||/||kx|| >> ---------------------- -------------------- >> -0.000392-0.000000i 2636.4 >> -0.004809+0.000000i 318441 >> ---------------------- -------------------- >> >> In this case, the residuals seem very bad. But this is due to the fact >> that your matrices have huge norms. Adding the option -eps_error_backward >> ::ascii_info_detail will show residuals relative to the matrix norms: >> ---------------------- -------------------- >> k eta(x,k) >> ---------------------- -------------------- >> -0.000392-0.000000i 3.78647e-11 >> -0.004809+0.000000i 5.61419e-08 >> ---------------------- -------------------- >> >> >> Regarding the GD solver, I am also getting the correct solution. I don't >> know why you are not getting convergence to the wanted eigenvalue: >> >> $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu >> -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -eps_smallest_real >> -eps_ncv 32 -eps_type gd >> >> Generalized eigenproblem stored in file. >> >> Reading COMPLEX matrices from binary files... 
>> Number of iterations of the method: 132 >> Number of linear iterations of the method: 0 >> Solution method: gd >> >> Number of requested eigenvalues: 1 >> Stopping condition: tol=1e-05, maxit=120000 >> Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; >> iterations 132 >> ---------------------- -------------------- >> k ||Ax-kBx||/||kx|| >> ---------------------- -------------------- >> -0.004809+0.000000i 2.16223e-05 >> ---------------------- -------------------- >> >> >> Again, it is much better to use a target instead of smallest_real: >> >> $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu >> -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -eps_type gd >> -eps_target -0.005 >> >> Generalized eigenproblem stored in file. >> >> Reading COMPLEX matrices from binary files... >> Number of iterations of the method: 23 >> Number of linear iterations of the method: 0 >> Solution method: gd >> >> Number of requested eigenvalues: 1 >> Stopping condition: tol=1e-05, maxit=120000 >> Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; >> iterations 23 >> ---------------------- -------------------- >> k ||Ax-kBx||/||kx|| >> ---------------------- -------------------- >> -0.004809-0.000000i 2.06572e-05 >> ---------------------- -------------------- >> >> >> Jose >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From toon.weyens at gmail.com Tue Apr 4 06:58:04 2017 From: toon.weyens at gmail.com (Toon Weyens) Date: Tue, 04 Apr 2017 11:58:04 +0000 Subject: [petsc-users] Slepc JD and GD converge to wrong eigenpair In-Reply-To: References: <65A0A5E7-399B-4D19-A967-73765A96DB98@dsic.upv.es> <2A5BFE40-C401-42CA-944A-9008E57B55EB@dsic.upv.es> <9BC45C28-0AB4-48EF-9981-B54A0FD45F41@dsic.upv.es> Message-ID: Dear Matthew, Thanks for your answer, but this is something I do not really know much about... The node I used has 12 cores and about 24GB of RAM. But for these test cases, isn't the distribution of memory over cores handled automatically by SLEPC? Regards On Tue, Apr 4, 2017 at 1:40 PM Matthew Knepley wrote: > On Tue, Apr 4, 2017 at 2:20 AM, Toon Weyens wrote: > > Dear Jose and Matthew, > > Thank you so much for the effort! > > I still don't manage to converge using the range interval technique to > filter out the positive eigenvalues, but using shift-invert combined with a > target eigenvalue does true miracles. I get extremely fast convergence. > > The truth of the matter is that we are mainly interested in negative > eigenvalues (unstable modes), and from physical considerations they are > more or less situated in -0.2 use. So we will just use guesses. > > Thank you so much again! > > Also, I have finally managed to run streams (the cluster is quite full > atm). These are the outputs: > > > 1) This shows you have a bad process mapping. You could get much more > speedup for 1-4 procs by properly mapping processes to cores, perhaps with > numactl. > > 2) Essentially 3 processes can saturate your memory bandwidth, so I would > not expect much gain from using more than 4. 
> > Thanks, > > Matt > > > 1 processes > Number of MPI processes 1 Processor names c04b27 > Triad: 12352.0825 Rate (MB/s) > 2 processes > Number of MPI processes 2 Processor names c04b27 c04b27 > Triad: 18968.0226 Rate (MB/s) > 3 processes > Number of MPI processes 3 Processor names c04b27 c04b27 c04b27 > Triad: 21106.8580 Rate (MB/s) > 4 processes > Number of MPI processes 4 Processor names c04b27 c04b27 c04b27 c04b27 > Triad: 21655.5885 Rate (MB/s) > 5 processes > Number of MPI processes 5 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 > Triad: 21627.5559 Rate (MB/s) > 6 processes > Number of MPI processes 6 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 > Triad: 21394.9620 Rate (MB/s) > 7 processes > Number of MPI processes 7 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 > Triad: 24952.7076 Rate (MB/s) > 8 processes > Number of MPI processes 8 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 > Triad: 28357.1062 Rate (MB/s) > 9 processes > Number of MPI processes 9 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 c04b27 > Triad: 31720.4545 Rate (MB/s) > 10 processes > Number of MPI processes 10 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 > Triad: 35198.7412 Rate (MB/s) > 11 processes > Number of MPI processes 11 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 > Triad: 38616.0615 Rate (MB/s) > 12 processes > Number of MPI processes 12 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 > Triad: 41939.3994 Rate (MB/s) > > I attach a figure. > > Thanks again! > > On Mon, Apr 3, 2017 at 8:29 PM Jose E. Roman wrote: > > > > El 1 abr 2017, a las 0:01, Toon Weyens escribi?: > > > > Dear jose, > > > > I have saved the matrices in Matlab format and am sending them to you > using pCloud. If you want another format, please tell me. 
Please also note > that they are about 1.4GB each. > > > > I also attach a typical output of eps_view and log_view in output.txt, > for 8 processes. > > > > Thanks so much for helping me out! I think Petsc and Slepc are amazing > inventions that really have saved me many months of work! > > > > Regards > > I played a little bit with your matrices. > > With Krylov-Schur I can solve the problem quite easily. Note that in > generalized eigenvalue problems it is always better to use STSINVERT > because you have to invert a matrix anyway. So instead of setting > which=smallest_real, use shift-and-invert with a target that is close to > the wanted eigenvalue. For instance, with target=-0.005 I get convergence > with just one iteration: > > $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -st_type sinvert > -eps_target -0.005 > > Generalized eigenproblem stored in file. > > Reading COMPLEX matrices from binary files... > Number of iterations of the method: 1 > Number of linear iterations of the method: 16 > Solution method: krylovschur > > Number of requested eigenvalues: 1 > Stopping condition: tol=1e-05, maxit=7500 > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; > iterations 1 > ---------------------- -------------------- > k ||Ax-kBx||/||kx|| > ---------------------- -------------------- > -0.004809-0.000000i 8.82085e-05 > ---------------------- -------------------- > > > Of course, you don't know a priori where your eigenvalue is. > Alternatively, you can set the target at 0 and get rid of positive > eigenvalues with a region filtering. For instance: > > $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -st_type sinvert > -eps_target 0 -rg_type interval -rg_interval_endpoints -1,0,-.05,.05 > -eps_nev 2 > > Generalized eigenproblem stored in file. > > Reading COMPLEX matrices from binary files... 
> Number of iterations of the method: 8 > Number of linear iterations of the method: 74 > Solution method: krylovschur > > Number of requested eigenvalues: 2 > Stopping condition: tol=1e-05, maxit=7058 > Linear eigensolve converged (2 eigenpairs) due to CONVERGED_TOL; > iterations 8 > ---------------------- -------------------- > k ||Ax-kBx||/||kx|| > ---------------------- -------------------- > -0.000392-0.000000i 2636.4 > -0.004809+0.000000i 318441 > ---------------------- -------------------- > > In this case, the residuals seem very bad. But this is due to the fact > that your matrices have huge norms. Adding the option -eps_error_backward > ::ascii_info_detail will show residuals relative to the matrix norms: > ---------------------- -------------------- > k eta(x,k) > ---------------------- -------------------- > -0.000392-0.000000i 3.78647e-11 > -0.004809+0.000000i 5.61419e-08 > ---------------------- -------------------- > > > Regarding the GD solver, I am also getting the correct solution. I don't > know why you are not getting convergence to the wanted eigenvalue: > > $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -eps_smallest_real > -eps_ncv 32 -eps_type gd > > Generalized eigenproblem stored in file. > > Reading COMPLEX matrices from binary files... 
> Number of iterations of the method: 132 > Number of linear iterations of the method: 0 > Solution method: gd > > Number of requested eigenvalues: 1 > Stopping condition: tol=1e-05, maxit=120000 > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; > iterations 132 > ---------------------- -------------------- > k ||Ax-kBx||/||kx|| > ---------------------- -------------------- > -0.004809+0.000000i 2.16223e-05 > ---------------------- -------------------- > > > Again, it is much better to use a target instead of smallest_real: > > $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -eps_type gd > -eps_target -0.005 > > Generalized eigenproblem stored in file. > > Reading COMPLEX matrices from binary files... > Number of iterations of the method: 23 > Number of linear iterations of the method: 0 > Solution method: gd > > Number of requested eigenvalues: 1 > Stopping condition: tol=1e-05, maxit=120000 > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; > iterations 23 > ---------------------- -------------------- > k ||Ax-kBx||/||kx|| > ---------------------- -------------------- > -0.004809-0.000000i 2.06572e-05 > ---------------------- -------------------- > > > Jose > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Tue Apr 4 07:00:29 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Apr 2017 07:00:29 -0500 Subject: [petsc-users] Slepc JD and GD converge to wrong eigenpair In-Reply-To: References: <65A0A5E7-399B-4D19-A967-73765A96DB98@dsic.upv.es> <2A5BFE40-C401-42CA-944A-9008E57B55EB@dsic.upv.es> <9BC45C28-0AB4-48EF-9981-B54A0FD45F41@dsic.upv.es> Message-ID: On Tue, Apr 4, 2017 at 6:58 AM, Toon Weyens wrote: > Dear Matthew, > > Thanks for your answer, but this is something I do not really know much > about... The node I used has 12 cores and about 24GB of RAM. > > But for these test cases, isn't the distribution of memory over cores > handled automatically by SLEPc? > No. It's handled by MPI, which just passes that job off to the OS, which does a crap job. Matt > Regards > > On Tue, Apr 4, 2017 at 1:40 PM Matthew Knepley wrote: > >> On Tue, Apr 4, 2017 at 2:20 AM, Toon Weyens >> wrote: >> >> Dear Jose and Matthew, >> >> Thank you so much for the effort! >> >> I still don't manage to converge using the range interval technique to >> filter out the positive eigenvalues, but using shift-invert combined with a >> target eigenvalue does true miracles. I get extremely fast convergence. >> >> The truth of the matter is that we are mainly interested in negative >> eigenvalues (unstable modes), and from physical considerations they are >> more or less situated in -0.2> use. So we will just use guesses. >> >> Thank you so much again! >> >> Also, I have finally managed to run streams (the cluster is quite full >> atm). These are the outputs: >> >> >> 1) This shows you have a bad process mapping. You could get much more >> speedup for 1-4 procs by properly mapping processes to cores, perhaps with >> numactl. >> >> 2) Essentially 3 processes can saturate your memory bandwidth, so I would >> not expect much gain from using more than 4. 
>> >> Thanks, >> >> Matt >> >> >> 1 processes >> Number of MPI processes 1 Processor names c04b27 >> Triad: 12352.0825 Rate (MB/s) >> 2 processes >> Number of MPI processes 2 Processor names c04b27 c04b27 >> Triad: 18968.0226 Rate (MB/s) >> 3 processes >> Number of MPI processes 3 Processor names c04b27 c04b27 c04b27 >> Triad: 21106.8580 Rate (MB/s) >> 4 processes >> Number of MPI processes 4 Processor names c04b27 c04b27 c04b27 c04b27 >> Triad: 21655.5885 Rate (MB/s) >> 5 processes >> Number of MPI processes 5 Processor names c04b27 c04b27 c04b27 c04b27 >> c04b27 >> Triad: 21627.5559 Rate (MB/s) >> 6 processes >> Number of MPI processes 6 Processor names c04b27 c04b27 c04b27 c04b27 >> c04b27 c04b27 >> Triad: 21394.9620 Rate (MB/s) >> 7 processes >> Number of MPI processes 7 Processor names c04b27 c04b27 c04b27 c04b27 >> c04b27 c04b27 c04b27 >> Triad: 24952.7076 Rate (MB/s) >> 8 processes >> Number of MPI processes 8 Processor names c04b27 c04b27 c04b27 c04b27 >> c04b27 c04b27 c04b27 c04b27 >> Triad: 28357.1062 Rate (MB/s) >> 9 processes >> Number of MPI processes 9 Processor names c04b27 c04b27 c04b27 c04b27 >> c04b27 c04b27 c04b27 c04b27 c04b27 >> Triad: 31720.4545 Rate (MB/s) >> 10 processes >> Number of MPI processes 10 Processor names c04b27 c04b27 c04b27 c04b27 >> c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 >> Triad: 35198.7412 Rate (MB/s) >> 11 processes >> Number of MPI processes 11 Processor names c04b27 c04b27 c04b27 c04b27 >> c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 >> Triad: 38616.0615 Rate (MB/s) >> 12 processes >> Number of MPI processes 12 Processor names c04b27 c04b27 c04b27 c04b27 >> c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 >> Triad: 41939.3994 Rate (MB/s) >> >> I attach a figure. >> >> Thanks again! >> >> On Mon, Apr 3, 2017 at 8:29 PM Jose E. 
Roman wrote: >> >> >> > El 1 abr 2017, a las 0:01, Toon Weyens >> escribió: >> > >> > Dear jose, >> > >> > I have saved the matrices in Matlab format and am sending them to you >> using pCloud. If you want another format, please tell me. Please also note >> that they are about 1.4GB each. >> > >> > I also attach a typical output of eps_view and log_view in output.txt, >> for 8 processes. >> > >> > Thanks so much for helping me out! I think Petsc and Slepc are amazing >> inventions that really have saved me many months of work! >> > >> > Regards >> >> I played a little bit with your matrices. >> >> With Krylov-Schur I can solve the problem quite easily. Note that in >> generalized eigenvalue problems it is always better to use STSINVERT >> because you have to invert a matrix anyway. So instead of setting >> which=smallest_real, use shift-and-invert with a target that is close to >> the wanted eigenvalue. For instance, with target=-0.005 I get convergence >> with just one iteration: >> >> $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu >> -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -st_type sinvert >> -eps_target -0.005 >> >> Generalized eigenproblem stored in file. >> >> Reading COMPLEX matrices from binary files... >> Number of iterations of the method: 1 >> Number of linear iterations of the method: 16 >> Solution method: krylovschur >> >> Number of requested eigenvalues: 1 >> Stopping condition: tol=1e-05, maxit=7500 >> Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; >> iterations 1 >> ---------------------- -------------------- >> k ||Ax-kBx||/||kx|| >> ---------------------- -------------------- >> -0.004809-0.000000i 8.82085e-05 >> ---------------------- -------------------- >> >> >> Of course, you don't know a priori where your eigenvalue is. >> Alternatively, you can set the target at 0 and get rid of positive >> eigenvalues with a region filtering. 
For instance: >> >> $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu >> -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -st_type sinvert >> -eps_target 0 -rg_type interval -rg_interval_endpoints -1,0,-.05,.05 >> -eps_nev 2 >> >> Generalized eigenproblem stored in file. >> >> Reading COMPLEX matrices from binary files... >> Number of iterations of the method: 8 >> Number of linear iterations of the method: 74 >> Solution method: krylovschur >> >> Number of requested eigenvalues: 2 >> Stopping condition: tol=1e-05, maxit=7058 >> Linear eigensolve converged (2 eigenpairs) due to CONVERGED_TOL; >> iterations 8 >> ---------------------- -------------------- >> k ||Ax-kBx||/||kx|| >> ---------------------- -------------------- >> -0.000392-0.000000i 2636.4 >> -0.004809+0.000000i 318441 >> ---------------------- -------------------- >> >> In this case, the residuals seem very bad. But this is due to the fact >> that your matrices have huge norms. Adding the option -eps_error_backward >> ::ascii_info_detail will show residuals relative to the matrix norms: >> ---------------------- -------------------- >> k eta(x,k) >> ---------------------- -------------------- >> -0.000392-0.000000i 3.78647e-11 >> -0.004809+0.000000i 5.61419e-08 >> ---------------------- -------------------- >> >> >> Regarding the GD solver, I am also getting the correct solution. I don't >> know why you are not getting convergence to the wanted eigenvalue: >> >> $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu >> -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -eps_smallest_real >> -eps_ncv 32 -eps_type gd >> >> Generalized eigenproblem stored in file. >> >> Reading COMPLEX matrices from binary files... 
>> Number of iterations of the method: 132 >> Number of linear iterations of the method: 0 >> Solution method: gd >> >> Number of requested eigenvalues: 1 >> Stopping condition: tol=1e-05, maxit=120000 >> Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; >> iterations 132 >> ---------------------- -------------------- >> k ||Ax-kBx||/||kx|| >> ---------------------- -------------------- >> -0.004809+0.000000i 2.16223e-05 >> ---------------------- -------------------- >> >> >> Again, it is much better to use a target instead of smallest_real: >> >> $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu >> -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -eps_type gd >> -eps_target -0.005 >> >> Generalized eigenproblem stored in file. >> >> Reading COMPLEX matrices from binary files... >> Number of iterations of the method: 23 >> Number of linear iterations of the method: 0 >> Solution method: gd >> >> Number of requested eigenvalues: 1 >> Stopping condition: tol=1e-05, maxit=120000 >> Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; >> iterations 23 >> ---------------------- -------------------- >> k ||Ax-kBx||/||kx|| >> ---------------------- -------------------- >> -0.004809-0.000000i 2.06572e-05 >> ---------------------- -------------------- >> >> >> Jose >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
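For a bandwidth-bound code like this, the Triad numbers quoted above translate directly into a speedup ceiling. A quick back-of-the-envelope script over the quoted rates (the second-socket remark in the comment is my own guess, consistent with Matt's bad-mapping diagnosis but not verified on this node):

```python
# Triad rates (MB/s) quoted earlier in this thread for 1..12 processes on
# one node. For memory-bandwidth-bound kernels, the achievable speedup on
# n processes is roughly capped at triad(n) / triad(1).
triad = [12352.0825, 18968.0226, 21106.8580, 21655.5885, 21627.5559,
         21394.9620, 24952.7076, 28357.1062, 31720.4545, 35198.7412,
         38616.0615, 41939.3994]

for n, rate in enumerate(triad, start=1):
    print(f"{n:2d} procs: bandwidth-bound speedup cap {rate / triad[0]:.2f}x")

# The plateau around 3-6 processes is the saturation Matt describes; the
# climb from 7 to 12 processes may be ranks spilling onto the node's second
# socket (second memory controller) -- a guess, not confirmed in the thread.
```

The cap at 4 processes comes out to about 1.75x, which matches the advice that little is gained beyond 3-4 ranks per socket for this workload.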
URL: From toon.weyens at gmail.com Tue Apr 4 07:34:38 2017 From: toon.weyens at gmail.com (Toon Weyens) Date: Tue, 04 Apr 2017 12:34:38 +0000 Subject: [petsc-users] Slepc JD and GD converge to wrong eigenpair In-Reply-To: References: <65A0A5E7-399B-4D19-A967-73765A96DB98@dsic.upv.es> <2A5BFE40-C401-42CA-944A-9008E57B55EB@dsic.upv.es> <9BC45C28-0AB4-48EF-9981-B54A0FD45F41@dsic.upv.es> Message-ID: Ah ok. When I find the time I will have a look into mapping processes to cores. I guess it is possible using the torque scheduler. Thank you! On Tue, Apr 4, 2017 at 2:00 PM Matthew Knepley wrote: > On Tue, Apr 4, 2017 at 6:58 AM, Toon Weyens wrote: > > Dear Matthew, > > Thanks for your answer, but this is something I do not really know much > about... The node I used has 12 cores and about 24GB of RAM. > > But for these test cases, isn't the distribution of memory over cores > handled automatically by SLEPC? > > > No. Its handled by MPI, which just passes that job off to the OS, which > does a crap job. > > Matt > > > Regards > > On Tue, Apr 4, 2017 at 1:40 PM Matthew Knepley wrote: > > On Tue, Apr 4, 2017 at 2:20 AM, Toon Weyens wrote: > > Dear Jose and Matthew, > > Thank you so much for the effort! > > I still don't manage to converge using the range interval technique to > filter out the positive eigenvalues, but using shift-invert combined with a > target eigenvalue does true miracles. I get extremely fast convergence. > > The truth of the matter is that we are mainly interested in negative > eigenvalues (unstable modes), and from physical considerations they are > more or less situated in -0.2 use. So we will just use guesses. > > Thank you so much again! > > Also, I have finally managed to run streams (the cluster is quite full > atm). These are the outputs: > > > 1) This shows you have a bad process mapping. You could get much more > speedup for 1-4 procs by properly mapping processes to cores, perhaps with > numactl. 
> > 2) Essentially 3 processes can saturate your memory bandwidth, so I would > not expect much gain from using more than 4. > > Thanks, > > Matt > > > 1 processes > Number of MPI processes 1 Processor names c04b27 > Triad: 12352.0825 Rate (MB/s) > 2 processes > Number of MPI processes 2 Processor names c04b27 c04b27 > Triad: 18968.0226 Rate (MB/s) > 3 processes > Number of MPI processes 3 Processor names c04b27 c04b27 c04b27 > Triad: 21106.8580 Rate (MB/s) > 4 processes > Number of MPI processes 4 Processor names c04b27 c04b27 c04b27 c04b27 > Triad: 21655.5885 Rate (MB/s) > 5 processes > Number of MPI processes 5 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 > Triad: 21627.5559 Rate (MB/s) > 6 processes > Number of MPI processes 6 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 > Triad: 21394.9620 Rate (MB/s) > 7 processes > Number of MPI processes 7 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 > Triad: 24952.7076 Rate (MB/s) > 8 processes > Number of MPI processes 8 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 > Triad: 28357.1062 Rate (MB/s) > 9 processes > Number of MPI processes 9 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 c04b27 > Triad: 31720.4545 Rate (MB/s) > 10 processes > Number of MPI processes 10 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 > Triad: 35198.7412 Rate (MB/s) > 11 processes > Number of MPI processes 11 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 > Triad: 38616.0615 Rate (MB/s) > 12 processes > Number of MPI processes 12 Processor names c04b27 c04b27 c04b27 c04b27 > c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 c04b27 > Triad: 41939.3994 Rate (MB/s) > > I attach a figure. > > Thanks again! > > On Mon, Apr 3, 2017 at 8:29 PM Jose E. 
Roman wrote: > > > > El 1 abr 2017, a las 0:01, Toon Weyens escribi?: > > > > Dear jose, > > > > I have saved the matrices in Matlab format and am sending them to you > using pCloud. If you want another format, please tell me. Please also note > that they are about 1.4GB each. > > > > I also attach a typical output of eps_view and log_view in output.txt, > for 8 processes. > > > > Thanks so much for helping me out! I think Petsc and Slepc are amazing > inventions that really have saved me many months of work! > > > > Regards > > I played a little bit with your matrices. > > With Krylov-Schur I can solve the problem quite easily. Note that in > generalized eigenvalue problems it is always better to use STSINVERT > because you have to invert a matrix anyway. So instead of setting > which=smallest_real, use shift-and-invert with a target that is close to > the wanted eigenvalue. For instance, with target=-0.005 I get convergence > with just one iteration: > > $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -st_type sinvert > -eps_target -0.005 > > Generalized eigenproblem stored in file. > > Reading COMPLEX matrices from binary files... > Number of iterations of the method: 1 > Number of linear iterations of the method: 16 > Solution method: krylovschur > > Number of requested eigenvalues: 1 > Stopping condition: tol=1e-05, maxit=7500 > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; > iterations 1 > ---------------------- -------------------- > k ||Ax-kBx||/||kx|| > ---------------------- -------------------- > -0.004809-0.000000i 8.82085e-05 > ---------------------- -------------------- > > > Of course, you don't know a priori where your eigenvalue is. > Alternatively, you can set the target at 0 and get rid of positive > eigenvalues with a region filtering. 
For instance: > > $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -st_type sinvert > -eps_target 0 -rg_type interval -rg_interval_endpoints -1,0,-.05,.05 > -eps_nev 2 > > Generalized eigenproblem stored in file. > > Reading COMPLEX matrices from binary files... > Number of iterations of the method: 8 > Number of linear iterations of the method: 74 > Solution method: krylovschur > > Number of requested eigenvalues: 2 > Stopping condition: tol=1e-05, maxit=7058 > Linear eigensolve converged (2 eigenpairs) due to CONVERGED_TOL; > iterations 8 > ---------------------- -------------------- > k ||Ax-kBx||/||kx|| > ---------------------- -------------------- > -0.000392-0.000000i 2636.4 > -0.004809+0.000000i 318441 > ---------------------- -------------------- > > In this case, the residuals seem very bad. But this is due to the fact > that your matrices have huge norms. Adding the option -eps_error_backward > ::ascii_info_detail will show residuals relative to the matrix norms: > ---------------------- -------------------- > k eta(x,k) > ---------------------- -------------------- > -0.000392-0.000000i 3.78647e-11 > -0.004809+0.000000i 5.61419e-08 > ---------------------- -------------------- > > > Regarding the GD solver, I am also getting the correct solution. I don't > know why you are not getting convergence to the wanted eigenvalue: > > $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -eps_smallest_real > -eps_ncv 32 -eps_type gd > > Generalized eigenproblem stored in file. > > Reading COMPLEX matrices from binary files... 
> Number of iterations of the method: 132 > Number of linear iterations of the method: 0 > Solution method: gd > > Number of requested eigenvalues: 1 > Stopping condition: tol=1e-05, maxit=120000 > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; > iterations 132 > ---------------------- -------------------- > k ||Ax-kBx||/||kx|| > ---------------------- -------------------- > -0.004809+0.000000i 2.16223e-05 > ---------------------- -------------------- > > > Again, it is much better to use a target instead of smallest_real: > > $ ./ex7 -f1 A.bin -f2 B.bin -st_ksp_type preonly -st_pc_type lu > -st_pc_factor_mat_solver_package mumps -eps_tol 1e-5 -eps_type gd > -eps_target -0.005 > > Generalized eigenproblem stored in file. > > Reading COMPLEX matrices from binary files... > Number of iterations of the method: 23 > Number of linear iterations of the method: 0 > Solution method: gd > > Number of requested eigenvalues: 1 > Stopping condition: tol=1e-05, maxit=120000 > Linear eigensolve converged (1 eigenpair) due to CONVERGED_TOL; > iterations 23 > ---------------------- -------------------- > k ||Ax-kBx||/||kx|| > ---------------------- -------------------- > -0.004809-0.000000i 2.06572e-05 > ---------------------- -------------------- > > > Jose > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
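On the process-mapping question: one cheap way to see whether placement matters is to measure achievable bandwidth under different bindings. The sketch below is a rough Triad-style probe in NumPy, a stand-in for the STREAMS benchmark used in this thread, not a replacement for it:

```python
import time
import numpy as np

# A minimal Triad-style probe (a = b + s*c) that can be run under different
# process/memory bindings (e.g. numactl or srun --cpu_bind) to see how
# placement affects achievable memory bandwidth on one process.
n = 10_000_000                      # ~80 MB per array, well past any cache
b = np.random.rand(n)
c = np.random.rand(n)
s = 3.0

best = float("inf")
for _ in range(5):                  # best-of-5 to damp timing noise
    t0 = time.perf_counter()
    a = b + s * c
    best = min(best, time.perf_counter() - t0)

mb_moved = 3 * n * 8 / 1e6          # read b, read c, write a (MB)
print(f"Triad-style rate: {mb_moved / best:.0f} MB/s")
```

Running this under, say, `numactl --cpunodebind=0 --membind=0` versus other NUMA nodes (flag names from the numactl man page; adapt to your site) makes mapping effects visible without a full STREAMS build.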
URL: From filippo.leon at gmail.com Tue Apr 4 10:25:15 2017 From: filippo.leon at gmail.com (Filippo Leonardi) Date: Tue, 04 Apr 2017 15:25:15 +0000 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: <87o9wdm6mc.fsf@jedbrown.org> References: <87o9wdm6mc.fsf@jedbrown.org> Message-ID: I really appreciate the feedback. Thanks. The possibility of deadlock when the order of destruction is not preserved is a point I hadn't thought of. Maybe it can be cleverly addressed. PS: If you are interested, I ran some benchmarks on BLAS1 operations and, on a single processor, I obtain: Example for MAXPY, with expression templates: BM_Vector_petscxx_MAXPY/8 38 ns 38 ns 18369805 BM_Vector_petscxx_MAXPY/64 622 ns 622 ns 1364335 BM_Vector_petscxx_MAXPY/512 281 ns 281 ns 2477718 BM_Vector_petscxx_MAXPY/4096 2046 ns 2046 ns 349954 BM_Vector_petscxx_MAXPY/32768 18012 ns 18012 ns 38788 BM_Vector_petscxx_MAXPY_BigO 0.55 N 0.55 N BM_Vector_petscxx_MAXPY_RMS 7 % 7 % Direct call to MAXPY: BM_Vector_PETSc_MAXPY/8 33 ns 33 ns 20973674 BM_Vector_PETSc_MAXPY/64 116 ns 116 ns 5992878 BM_Vector_PETSc_MAXPY/512 731 ns 731 ns 963340 BM_Vector_PETSc_MAXPY/4096 5739 ns 5739 ns 122414 BM_Vector_PETSc_MAXPY/32768 46346 ns 46346 ns 15312 BM_Vector_PETSc_MAXPY_BigO 1.41 N 1.41 N BM_Vector_PETSc_MAXPY_RMS 0 % 0 % And 3x speedup on 2 MPI ranks (not much communication here, anyway). I am now convinced that this warrants some further investigation/testing. On Tue, 4 Apr 2017 at 01:08 Jed Brown wrote: > Matthew Knepley writes: > > >> BLAS. (Here a interesting point opens: I assume an efficient BLAS > >> > >> implementation, but I am not so sure about how the different BLAS do > >> things > >> > >> internally. I work from the assumption that we have a very well tuned BLAS > >> > >> implementation at our disposal). > >> > > > > The speed improvement comes from pulling vectors through memory fewer > > times by merging operations (kernel fusion). 
> > Typical examples are VecMAXPY and VecMDot, but note that these are not > xGEMV because the vectors are independent arrays rather than single > arrays with a constant leading dimension. > > >> call VecGetArray. However I will inevitably foget to return the array to > >> > >> PETSc. I could have my new VecArray returning an object that restores > the > >> > >> array > >> > >> when it goes out of scope. I can also flag the function with > [[nodiscard]] > >> to > >> > >> prevent the user to destroy the returned object from the start. > >> > > > > Jed claims that this pattern is no longer preferred, but I have forgotten > > his argument. > > Jed? > > Destruction order matters and needs to be collective. If an error > condition causes destruction to occur in a different order on different > processes, you can get deadlock. I would much rather have an error > leave some resources (for the OS to collect) than escalate into > deadlock. > > > We have had this discussion for years on this list. Having separate names > > for each type > > is really ugly and does not achieve what we want. We want smooth > > interoperability between > > objects with different backing types, but it is still not clear how to do > > this. > > Hide it internally and implicitly promote. Only the *GetArray functions > need to be parametrized on numeric type. But it's a lot of work on the > backend. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Apr 4 10:50:29 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Apr 2017 10:50:29 -0500 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: <87o9wdm6mc.fsf@jedbrown.org> Message-ID: On Tue, Apr 4, 2017 at 10:25 AM, Filippo Leonardi wrote: > I really appreciate the feedback. Thanks. > > That of deadlock, when the order of destruction is not preserved, is a > point I hadn't thought of. Maybe it can be cleverly addressed. 
> > PS: If you are interested, I ran some benchmark on BLAS1 stuff and, for a > single processor, I obtain: > > Example for MAXPY, with expression templates: > > BM_Vector_petscxx_MAXPY/8 38 ns 38 ns 18369805 > > BM_Vector_petscxx_MAXPY/64 622 ns 622 ns 1364335 > > BM_Vector_petscxx_MAXPY/512 281 ns 281 ns 2477718 > > BM_Vector_petscxx_MAXPY/4096 2046 ns 2046 ns 349954 > > BM_Vector_petscxx_MAXPY/32768 18012 ns 18012 ns 38788 > > BM_Vector_petscxx_MAXPY_BigO 0.55 N 0.55 N > > BM_Vector_petscxx_MAXPY_RMS 7 % 7 % > > Direct call to MAXPY: > > BM_Vector_PETSc_MAXPY/8 33 ns 33 ns 20973674 > > BM_Vector_PETSc_MAXPY/64 116 ns 116 ns 5992878 > > BM_Vector_PETSc_MAXPY/512 731 ns 731 ns 963340 > > BM_Vector_PETSc_MAXPY/4096 5739 ns 5739 ns 122414 > > BM_Vector_PETSc_MAXPY/32768 46346 ns 46346 ns 15312 > > BM_Vector_PETSc_MAXPY_BigO 1.41 N 1.41 N > > BM_Vector_PETSc_MAXPY_RMS 0 % 0 % > > > And 3x speedup on 2 MPI ranks (not much communication here, anyway). I am > now convinced that this warrants some further investigation/testing. > 1) There is no communication here, so 3x for 2 ranks means the result is not believable 2) I do not understand what the two cases are. Do you mean that expression templates vectorize the scalar work? I am surprised that BLAS is so bad, but I guess not completely surprised. What BLAS? Matt > > On Tue, 4 Apr 2017 at 01:08 Jed Brown wrote: > >> Matthew Knepley writes: >> >> >> BLAS. (Here a interesting point opens: I assume an efficient BLAS >> >> >> >> implementation, but I am not so sure about how the different BLAS do >> >> things >> >> >> >> internally. I work from the assumption that we have a very well tuned >> BLAS >> >> >> >> implementation at our disposal). >> >> >> > >> > The speed improvement comes from pulling vectors through memory fewer >> > times by merging operations (kernel fusion). 
>> >> Typical examples are VecMAXPY and VecMDot, but note that these are not >> xGEMV because the vectors are independent arrays rather than single >> arrays with a constant leading dimension. >> >> >> call VecGetArray. However I will inevitably foget to return the array >> to >> >> >> >> PETSc. I could have my new VecArray returning an object that restores >> the >> >> >> >> array >> >> >> >> when it goes out of scope. I can also flag the function with >> [[nodiscard]] >> >> to >> >> >> >> prevent the user to destroy the returned object from the start. >> >> >> > >> > Jed claims that this pattern is no longer preferred, but I have >> forgotten >> > his argument. >> > Jed? >> >> Destruction order matters and needs to be collective. If an error >> condition causes destruction to occur in a different order on different >> processes, you can get deadlock. I would much rather have an error >> leave some resources (for the OS to collect) than escalate into >> deadlock. >> >> > We have had this discussion for years on this list. Having separate >> names >> > for each type >> > is really ugly and does not achieve what we want. We want smooth >> > interoperability between >> > objects with different backing types, but it is still not clear how to >> do >> > this. >> >> Hide it internally and implicitly promote. Only the *GetArray functions >> need to be parametrized on numeric type. But it's a lot of work on the >> backend. >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Tue Apr 4 10:57:59 2017 From: jychang48 at gmail.com (Justin Chang) Date: Tue, 04 Apr 2017 15:57:59 +0000 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: Message-ID: Thanks everyone for the helpful advice. 
So I tried all the suggestions including using libsci. The performance did not improve for my particular runs, which I think suggests the problem parameters chosen for my tests (SNES ex48) are not optimal for KNL. Does anyone have example test runs I could reproduce that compare the performance between KNL and Haswell/Ivybridge/etc? On Mon, Apr 3, 2017 at 3:06 PM Richard Mills wrote: > Yes, one should rely on MKL (or Cray LibSci, if using the Cray toolchain) > on Cori. But I'm guessing that this will make no noticeable difference for > what Justin is doing. > > --Richard > > On Mon, Apr 3, 2017 at 12:57 PM, murat keçeli wrote: > > How about replacing --download-fblaslapack with vendor-specific > BLAS/LAPACK? > > Murat > > On Mon, Apr 3, 2017 at 2:45 PM, Richard Mills > wrote: > > On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong wrote: > > > On Apr 3, 2017, at 1:44 PM, Justin Chang wrote: > > Richard, > > This is what my job script looks like: > > #!/bin/bash > #SBATCH -N 16 > #SBATCH -C knl,quad,flat > #SBATCH -p regular > #SBATCH -J knlflat1024 > #SBATCH -L SCRATCH > #SBATCH -o knlflat1024.o%j > #SBATCH --mail-type=ALL > #SBATCH --mail-user=jychang48 at gmail.com > #SBATCH -t 00:20:00 > > #run the application: > cd $SCRATCH/Icesheet > sbcast --compress=lz4 ./ex48cori /tmp/ex48cori > srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N > 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine > 1 > > > Maybe it is a typo. It should be numactl -m 1. > > > "-p 1" will also work. "-p" means to "prefer" NUMA node 1 (the MCDRAM), > whereas "-m" means to use only NUMA node 1. In the former case, MCDRAM > will be used for allocations until the available memory there has been > exhausted, and then things will spill over into the DRAM. 
One would think > that "-m" would be better for doing performance studies, but on systems > where the nodes have swap space enabled, you can get terrible performance > if your code's working set exceeds the size of the MCDRAM, as the system > will obediently obey your wishes to not use the DRAM and go straight to the > swap disk! I assume the Cori nodes don't have swap space, though I could > be wrong. > > > According to the NERSC info pages, they say to add the "numactl" if using > flat mode. Previously I tried cache mode but the performance seems to be > unaffected. > > > Using cache mode should give similar performance to using flat mode with > the numactl option. But both approaches should be significantly faster than > using flat mode without the numactl option. I usually see over 3X speedup. > You can also do such a comparison to see if the high-bandwidth memory is > working properly. > > I also compared 256 Haswell nodes vs 256 KNL nodes and Haswell is nearly > 4-5x faster. Though I suspect this drastic change has much to do with the > initial coarse grid size now being extremely small. > > I think you may be right about why you see such a big difference. The KNL > nodes need enough work to be able to use the SIMD lanes effectively. Also, > if your problem gets small enough, then it's going to be able to fit in the > Haswell's L3 cache. Although KNL has MCDRAM and this delivers *a lot* more > memory bandwidth than the DDR4 memory, it will deliver a lot less bandwidth > than the Haswell's L3. > > I'll give the COPTFLAGS a try and see what happens > > > Make sure to use --with-memalign=64 for data alignment when configuring > PETSc. > > > Ah, yes, I forgot that. Thanks for mentioning it, Hong! > > > The option -xMIC-AVX512 would improve the vectorization performance. But > it may cause problems for the MPIBAIJ format for some unknown reason. > MPIAIJ should work fine with this option. > > > Hmm. 
Try both, and, if you see worse performance with MPIBAIJ, let us > know and I'll try to figure this out. > > --Richard > > > > Hong (Mr.) > > Thanks, > Justin > > On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills > wrote: > > Hi Justin, > > How is the MCDRAM (on-package "high-bandwidth memory") configured for your > KNL runs? And if it is in "flat" mode, what are you doing to ensure that > you use the MCDRAM? Doing this wrong seems to be one of the most common > reasons for unexpected poor performance on KNL. > > I'm not that familiar with the environment on Cori, but I think that if > you are building for KNL, you should add "-xMIC-AVX512" to your compiler > flags to explicitly instruct the compiler to use the AVX512 instruction > set. I usually use something along the lines of > > 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' > > (The "-g" just adds symbols, which make the output from performance > profiling tools much more useful.) > > That said, I think that if you are comparing 1024 Haswell cores vs. 1024 > KNL cores (so double the number of Haswell nodes), I'm not surprised that > the simulations are almost twice as fast using the Haswell nodes. Keep in > mind that individual KNL cores are much less powerful than individual > Haswell cores. You are also using roughly twice the power footprint (a dual- > socket Haswell node should be roughly equivalent to a KNL node, I > believe). How do things look when you compare equal numbers of nodes? 
> Cheers, > Richard > > On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang wrote: > > Hi all, > > On NERSC's Cori I have the following configure options for PETSc: > > ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 > --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn > --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 > COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt > > Where I swapped out the default Intel programming environment for that of > Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I want > to document the performance difference between Cori's Haswell and KNL > processors. > > When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and 16 > KNL nodes), the simulations are almost twice as fast on Haswell nodes, > which leads me to suspect that I am not doing something right for KNL. Does > anyone know some "optimal" configure options for running PETSc on > KNL? > > Thanks, > Justin > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Apr 4 11:05:33 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Apr 2017 11:05:33 -0500 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: Message-ID: On Tue, Apr 4, 2017 at 10:57 AM, Justin Chang wrote: > Thanks everyone for the helpful advice. So I tried all the suggestions > including using libsci. The performance did not improve for my particular > runs, which I think suggests the problem parameters chosen for my tests > (SNES ex48) are not optimal for KNL. Does anyone have example test runs I > could reproduce that compare the performance between KNL and > Haswell/Ivybridge/etc? > Let's try to see what is going on with your existing data first. First, I think the main thing is to make sure we are using MCDRAM. Everything else in KNL is window dressing (IMHO). 
All we have to look at is something like MAXPY. You can get the bandwidth estimate from the flop rate and problem size (I think), and we can at least get bandwidth ratios between Haswell and KNL with that number. Matt > On Mon, Apr 3, 2017 at 3:06 PM Richard Mills > wrote: > >> Yes, one should rely on MKL (or Cray LibSci, if using the Cray toolchain) >> on Cori. But I'm guessing that this will make no noticeable difference for >> what Justin is doing. >> >> --Richard >> >> On Mon, Apr 3, 2017 at 12:57 PM, murat ke?eli wrote: >> >> How about replacing --download-fblaslapack with vendor specific >> BLAS/LAPACK? >> >> Murat >> >> On Mon, Apr 3, 2017 at 2:45 PM, Richard Mills >> wrote: >> >> On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong wrote: >> >> >> On Apr 3, 2017, at 1:44 PM, Justin Chang wrote: >> >> Richard, >> >> This is what my job script looks like: >> >> #!/bin/bash >> #SBATCH -N 16 >> #SBATCH -C knl,quad,flat >> #SBATCH -p regular >> #SBATCH -J knlflat1024 >> #SBATCH -L SCRATCH >> #SBATCH -o knlflat1024.o%j >> #SBATCH --mail-type=ALL >> #SBATCH --mail-user=jychang48 at gmail.com >> #SBATCH -t 00:20:00 >> >> #run the application: >> cd $SCRATCH/Icesheet >> sbcast --compress=lz4 ./ex48cori /tmp/ex48cori >> srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N >> 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine >> 1 >> >> >> Maybe it is a typo. It should be numactl -m 1. >> >> >> "-p 1" will also work. "-p" means to "prefer" NUMA node 1 (the MCDRAM), >> whereas "-m" means to use only NUMA node 1. In the former case, MCDRAM >> will be used for allocations until the available memory there has been >> exhausted, and then things will spill over into the DRAM. 
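Matt's back-of-the-envelope suggestion can be made concrete. For a MAXPY y += sum_i a_i*x_i over m vectors of length N, a simple traffic model (an assumption, ignoring cache reuse) is 2*m*N flops against (m+2)*N*8 bytes streamed (load each x_i, load y, store y), so a reported flop rate converts directly into a bandwidth estimate:

```python
def maxpy_bandwidth_gbs(mflops, m):
    """Rough streaming-bandwidth estimate for y += sum_i a_i * x_i.

    Flops: 2*m*N (one multiply plus one add per vector element).
    Bytes: (m + 2)*N*8 (read each x_i, read y, write y); cache reuse
    is ignored, so this is an upper bound on the implied traffic.
    """
    flops_per_byte = (2.0 * m) / ((m + 2) * 8.0)
    return mflops / 1000.0 / flops_per_byte  # GB/s

# e.g. a 4-vector MAXPY reported at 5000 Mflop/s implies ~30 GB/s:
bw = maxpy_bandwidth_gbs(5000.0, 4)
```

Comparing the numbers this yields on Haswell and KNL runs of the same problem gives the bandwidth ratio, which is the quantity that actually matters for these bandwidth-bound kernels.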
One would think >> that "-m" would be better for doing performance studies, but on systems >> where the nodes have swap space enabled, you can get terrible performance >> if your code's working set exceeds the size of the MCDRAM, as the system >> will obediently obey your wishes to not use the DRAM and go straight to the >> swap disk! I assume the Cori nodes don't have swap space, though I could >> be wrong. >> >> >> According to the NERSC info pages, they say to add the "numactl" if using >> flat mode. Previously I tried cache mode but the performance seems to be >> unaffected. >> >> >> Using cache mode should give similar performance to using flat mode with >> the numactl option. But both approaches should be significantly faster than >> using flat mode without the numactl option. I usually see over 3X speedup. >> You can also do such a comparison to see if the high-bandwidth memory is >> working properly. >> >> I also compared 256 Haswell nodes vs 256 KNL nodes and Haswell is nearly >> 4-5x faster. Though I suspect this drastic change has much to do with the >> initial coarse grid size now being extremely small. >> >> I think you may be right about why you see such a big difference. The >> KNL nodes need enough work to be able to use the SIMD lanes effectively. >> Also, if your problem gets small enough, then it's going to be able to fit >> in the Haswell's L3 cache. Although KNL has MCDRAM and this delivers *a >> lot* more memory bandwidth than the DDR4 memory, it will deliver a lot less >> bandwidth than the Haswell's L3. >> >> I'll give the COPTFLAGS a try and see what happens >> >> Make sure to use --with-memalign=64 for data alignment when configuring >> PETSc. >> >> >> Ah, yes, I forgot that. Thanks for mentioning it, Hong! >> >> >> The option -xMIC-AVX512 would improve the vectorization performance. But >> it may cause problems for the MPIBAIJ format for some unknown reason. >> MPIAIJ should work fine with this option. >> >> >> Hmm.
Try both, and, if you see worse performance with MPIBAIJ, let us >> know and I'll try to figure this out. >> >> --Richard >> >> >> >> Hong (Mr.) >> >> Thanks, >> Justin >> >> On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills >> wrote: >> >> Hi Justin, >> >> How is the MCDRAM (on-package "high-bandwidth memory") configured for >> your KNL runs? And if it is in "flat" mode, what are you doing to ensure >> that you use the MCDRAM? Doing this wrong seems to be one of the most >> common reasons for unexpected poor performance on KNL. >> >> I'm not that familiar with the environment on Cori, but I think that if >> you are building for KNL, you should add "-xMIC-AVX512" to your compiler >> flags to explicitly instruct the compiler to use the AVX512 instruction >> set. I usually use something along the lines of >> >> 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' >> >> (The "-g" just adds symbols, which make the output from performance >> profiling tools much more useful.) >> >> That said, I think that if you are comparing 1024 Haswell cores vs. 1024 >> KNL cores (so double the number of Haswell nodes), I'm not surprised that >> the simulations are almost twice as fast using the Haswell nodes. Keep in >> mind that individual KNL cores are much less powerful than an individual >> Haswell node. You are also using roughly twice the power footprint (dual >> socket Haswell node should be roughly equivalent to a KNL node, I >> believe). How do things look on when you compare equal nodes? 
>> >> Cheers, >> Richard >> >> On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang >> wrote: >> >> Hi all, >> >> On NERSC's Cori I have the following configure options for PETSc: >> >> ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 >> --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn >> --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 >> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt >> >> Where I swapped out the default Intel programming environment with that >> of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I >> want to document the performance difference between Cori's Haswell and KNL >> processors. >> >> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and >> 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes. >> Which leads me to suspect that I am not doing something right for KNL. Does >> anyone know what are some "optimal" configure options for running PETSc on >> KNL? >> >> Thanks, >> Justin >> >> >> >> >> >> >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From fande.kong at inl.gov Tue Apr 4 11:46:03 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Tue, 4 Apr 2017 10:46:03 -0600 Subject: [petsc-users] GAMG for the unsymmetrical matrix Message-ID: Hi All, I am using GAMG to solve a group of coupled diffusion equations, but the resulting matrix is not symmetrical. I got the following error messages: *[0]PETSC ERROR: Petsc has generated inconsistent data[0]PETSC ERROR: Have un-symmetric graph (apparently). 
Use '-pc_gamg_sym_graph true' to symetrize the graph or '-pc_gamg_threshold -1.0' if the matrix is structurally symmetric.[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.[0]PETSC ERROR: Petsc Release Version 3.7.5, unknown [0]PETSC ERROR: /home/kongf/workhome/projects/yak/yak-opt on a arch-linux2-c-opt named r2i2n0 by kongf Mon Apr 3 16:19:59 2017[0]PETSC ERROR: /home/kongf/workhome/projects/yak/yak-opt on a arch-linux2-c-opt named r2i2n0 by kongf Mon Apr 3 16:19:59 2017[0]PETSC ERROR: #1 smoothAggs() line 462 in /home/kongf/workhome/projects/petsc/src/ksp/pc/impls/gamg/agg.c[0]PETSC ERROR: #2 PCGAMGCoarsen_AGG() line 998 in /home/kongf/workhome/projects/petsc/src/ksp/pc/impls/gamg/agg.c[0]PETSC ERROR: #3 PCSetUp_GAMG() line 571 in /home/kongf/workhome/projects/petsc/src/ksp/pc/impls/gamg/gamg.c[0]PETSC ERROR: #3 PCSetUp_GAMG() line 571 in /home/kongf/workhome/projects/petsc/src/ksp/pc/impls/gamg/gamg.c* Does this mean that GAMG works for the symmetrical matrix only? Fande, -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Apr 4 12:10:47 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 4 Apr 2017 12:10:47 -0500 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: Message-ID: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> > Does this mean that GAMG works for the symmetrical matrix only? No, it means that for non symmetric nonzero structure you need the extra flag. So use the extra flag. The reason we don't always use the flag is because it adds extra cost and isn't needed if the matrix already has a symmetric nonzero structure. Barry > On Apr 4, 2017, at 11:46 AM, Kong, Fande wrote: > > Hi All, > > I am using GAMG to solve a group of coupled diffusion equations, but the resulting matrix is not symmetrical. 
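The distinction Barry draws above is about the nonzero *structure*, not the values. A toy structural-symmetry check makes it concrete (a plain-Python sketch over a set of index pairs, not PETSc API):

```python
def is_structurally_symmetric(pattern):
    """True if for every stored (i, j) the entry (j, i) is also stored.

    `pattern` is a set of (row, col) index pairs; the stored values are
    irrelevant -- GAMG's coarsening graph only looks at the structure.
    """
    return all((j, i) in pattern for (i, j) in pattern)

# A centered stencil is structurally symmetric even if the values differ:
sym = {(0, 0), (0, 1), (1, 0), (1, 1)}
# One-sided (e.g. upwinded) coupling breaks structural symmetry:
unsym = {(0, 0), (0, 1), (1, 1)}  # (1, 0) missing
```

When the check fails, that is exactly the situation where -pc_gamg_sym_graph true is needed; when it passes, the flag would only add cost.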
I got the following error messages: > > > [0]PETSC ERROR: Petsc has generated inconsistent data > [0]PETSC ERROR: Have un-symmetric graph (apparently). Use '-pc_gamg_sym_graph true' to symetrize the graph or '-pc_gamg_threshold -1.0' if the matrix is structurally symmetric. > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Release Version 3.7.5, unknown > [0]PETSC ERROR: /home/kongf/workhome/projects/yak/yak-opt on a arch-linux2-c-opt named r2i2n0 by kongf Mon Apr 3 16:19:59 2017 > [0]PETSC ERROR: /home/kongf/workhome/projects/yak/yak-opt on a arch-linux2-c-opt named r2i2n0 by kongf Mon Apr 3 16:19:59 2017 > [0]PETSC ERROR: #1 smoothAggs() line 462 in /home/kongf/workhome/projects/petsc/src/ksp/pc/impls/gamg/agg.c > [0]PETSC ERROR: #2 PCGAMGCoarsen_AGG() line 998 in /home/kongf/workhome/projects/petsc/src/ksp/pc/impls/gamg/agg.c > [0]PETSC ERROR: #3 PCSetUp_GAMG() line 571 in /home/kongf/workhome/projects/petsc/src/ksp/pc/impls/gamg/gamg.c > [0]PETSC ERROR: #3 PCSetUp_GAMG() line 571 in /home/kongf/workhome/projects/petsc/src/ksp/pc/impls/gamg/gamg.c > > Does this mean that GAMG works for the symmetrical matrix only? > > Fande, From bsmith at mcs.anl.gov Tue Apr 4 12:12:39 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 4 Apr 2017 12:12:39 -0500 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: <87o9wdm6mc.fsf@jedbrown.org> Message-ID: <026B5877-0005-4759-B205-86C24C69628C@mcs.anl.gov> MAXPY isn't really a BLAS 1 since it can reuse some data in certain vectors. > On Apr 4, 2017, at 10:25 AM, Filippo Leonardi wrote: > > I really appreciate the feedback. Thanks. > > That of deadlock, when the order of destruction is not preserved, is a point I hadn't thought of. Maybe it can be cleverly addressed. 
> > PS: If you are interested, I ran some benchmark on BLAS1 stuff and, for a single processor, I obtain: > > Example for MAXPY, with expression templates: > BM_Vector_petscxx_MAXPY/8 38 ns 38 ns 18369805 > BM_Vector_petscxx_MAXPY/64 622 ns 622 ns 1364335 > BM_Vector_petscxx_MAXPY/512 281 ns 281 ns 2477718 > BM_Vector_petscxx_MAXPY/4096 2046 ns 2046 ns 349954 > BM_Vector_petscxx_MAXPY/32768 18012 ns 18012 ns 38788 > BM_Vector_petscxx_MAXPY_BigO 0.55 N 0.55 N > BM_Vector_petscxx_MAXPY_RMS 7 % 7 % > Direct call to MAXPY: > BM_Vector_PETSc_MAXPY/8 33 ns 33 ns 20973674 > BM_Vector_PETSc_MAXPY/64 116 ns 116 ns 5992878 > BM_Vector_PETSc_MAXPY/512 731 ns 731 ns 963340 > BM_Vector_PETSc_MAXPY/4096 5739 ns 5739 ns 122414 > BM_Vector_PETSc_MAXPY/32768 46346 ns 46346 ns 15312 > BM_Vector_PETSc_MAXPY_BigO 1.41 N 1.41 N > BM_Vector_PETSc_MAXPY_RMS 0 % 0 % > > And 3x speedup on 2 MPI ranks (not much communication here, anyway). I am now convinced that this warrants some further investigation/testing. > > > On Tue, 4 Apr 2017 at 01:08 Jed Brown wrote: > Matthew Knepley writes: > > >> BLAS. (Here a interesting point opens: I assume an efficient BLAS > >> > >> implementation, but I am not so sure about how the different BLAS do > >> things > >> > >> internally. I work from the assumption that we have a very well tuned BLAS > >> > >> implementation at our disposal). > >> > > > > The speed improvement comes from pulling vectors through memory fewer > > times by merging operations (kernel fusion). > > Typical examples are VecMAXPY and VecMDot, but note that these are not > xGEMV because the vectors are independent arrays rather than single > arrays with a constant leading dimension. > > >> call VecGetArray. However I will inevitably foget to return the array to > >> > >> PETSc. I could have my new VecArray returning an object that restores the > >> > >> array > >> > >> when it goes out of scope. 
I can also flag the function with [[nodiscard]] > >> to > >> > >> prevent the user to destroy the returned object from the start. > >> > > > > Jed claims that this pattern is no longer preferred, but I have forgotten > > his argument. > > Jed? > > Destruction order matters and needs to be collective. If an error > condition causes destruction to occur in a different order on different > processes, you can get deadlock. I would much rather have an error > leave some resources (for the OS to collect) than escalate into > deadlock. > > > We have had this discussion for years on this list. Having separate names > > for each type > > is really ugly and does not achieve what we want. We want smooth > > interoperability between > > objects with different backing types, but it is still not clear how to do > > this. > > Hide it internally and implicitly promote. Only the *GetArray functions > need to be parametrized on numeric type. But it's a lot of work on the > backend. From ingogaertner.tus at gmail.com Tue Apr 4 12:18:00 2017 From: ingogaertner.tus at gmail.com (Ingo Gaertner) Date: Tue, 4 Apr 2017 19:18:00 +0200 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: Message-ID: 2017-04-03 23:58 GMT+02:00 Matthew Knepley : > There are no tutorials, and almost no documentation. > Uhh, I'm not sure whether it makes sense for me to use PETSc then. > The best thing to look at is TS ex11. This solves a bunch of different > equations > (advection, shallow water, Euler) with a "pointwise" Riemann solver. Feel > free to ask questions about it. > We have never talked about Riemann solvers in our CFD course, and I don't understand what's going on in ex11. However, if you could answer a few of my questions, you'll give me a good start with PETSc. For the simple poisson problem that I am trying to implement, I have to discretize div(k grad u) integrated over each FV cell, where k is the known diffusivity, and u is the vector to solve for. 
The cell integral is approximated as the sum of the fluxes (k grad u) in each face centroid multiplied by each surface area vector. In principle, I have all necessary information available in the DMPlex to assemble the FV matrix for div(k grad u), assuming an orthogonal grid to begin with. But I thought that the gradient coefficients should be available in the PetscFVFaceGeom.grad elements of the faceGeometry vector after using the methods DMPlexComputeGeometryFVM (DM dm, Vec *cellgeom, Vec *facegeom) and DMPlexComputeGradientFVM (DM dm, PetscFV fvm, Vec faceGeometry, Vec cellGeometry, DM *dmGrad) But, while these calls fill in the cell and face centroids, volumes and normals, the PetscFVFaceGeom.grad elements of the faceGeometry vector are all zero. Am I misunderstanding the purpose of DMPlexComputeGradientFVM? (My second question is more general, about the PETSc installation. When I configure PETSc with "--prefix=/somewhere --download-triangle --download-parmetis" etc., these extra libraries are built correctly during the make step, but they are not copied to /somewhere during the "make install" step. Also the pkg-config files don't include the details about the extra libraries. Is it intended that one has to manually correct the install directory, or am I missing something during the configuration or installation steps?) If I ever get this Poisson problem implemented, you're welcome to share it as another FVM example. But I am afraid I'll have a few more questions before I can come up with something useful. Thanks Ingo -------------- next part -------------- An HTML attachment was scrubbed...
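The two-point flux scheme Ingo describes can be sketched in isolation, independent of the DMPlex API. The following is a hypothetical 1D assembly (face-wise transmissibilities, with made-up Dirichlet boundary handling via a half-cell distance), just to show the face-loop structure:

```python
def assemble_fv_poisson_1d(k, h):
    """Assemble -d/dx (k du/dx) on a uniform 1D FV mesh with two-point fluxes.

    k[f] is the diffusivity on interior face f (between cells f and f+1),
    h is the cell width. Homogeneous Dirichlet data is folded into the
    diagonal using a half-cell center-to-face distance at the boundaries.
    Returns a dense row-major matrix (list of lists) for illustration.
    """
    n = len(k) + 1                      # number of cells
    A = [[0.0] * n for _ in range(n)]
    for f, kf in enumerate(k):          # loop over interior faces
        t = kf / h                      # transmissibility k * |face| / dist
        A[f][f] += t
        A[f + 1][f + 1] += t
        A[f][f + 1] -= t
        A[f + 1][f] -= t
    A[0][0] += 2.0 / h                  # boundary faces: assume k = 1 there,
    A[n - 1][n - 1] += 2.0 / h          # distance h/2 from center to face
    return A

A = assemble_fv_poisson_1d([1.0, 1.0, 1.0], h=0.25)
```

Note that for this orthogonal, two-point scheme no cell-gradient reconstruction is needed at all; the PetscFVFaceGeom.grad machinery only becomes relevant for higher-order or non-orthogonal reconstructions.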
URL: From filippo.leon at gmail.com Tue Apr 4 13:19:04 2017 From: filippo.leon at gmail.com (Filippo Leonardi) Date: Tue, 04 Apr 2017 18:19:04 +0000 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: <026B5877-0005-4759-B205-86C24C69628C@mcs.anl.gov> References: <87o9wdm6mc.fsf@jedbrown.org> <026B5877-0005-4759-B205-86C24C69628C@mcs.anl.gov> Message-ID: You are in fact right, it is the same speedup of approximatively 2.5x (with 2 ranks), my brain rounded up to 3. (This was just a test done in 10 min on my Workstation, so no pretence to be definite, I just wanted to have an indication). As you say, I am using OpenBLAS, so I wouldn't be surprised of those results. If/when I use MKL (or something similar), I really do not expect such an improvement). Since you seem interested (if you are interested, I can give you all the details): the comparison I make, is with "petscxx" which is my template code (which uses a single loop) using AVX (I force PETSc to align the memory to 32 bit boundary and then I use packets of 4 doubles). Also notice that I use vectors with nice lengths, so there is no need to "peel" the end of the loop. The "PETSc" simulation is using PETSc's VecMAXPY. On Tue, 4 Apr 2017 at 19:12 Barry Smith wrote: > > MAXPY isn't really a BLAS 1 since it can reuse some data in certain > vectors. > > > > On Apr 4, 2017, at 10:25 AM, Filippo Leonardi > wrote: > > > > I really appreciate the feedback. Thanks. > > > > That of deadlock, when the order of destruction is not preserved, is a > point I hadn't thought of. Maybe it can be cleverly addressed. 
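The kernel-fusion effect behind these numbers — streaming y through memory once instead of m times — can be mimicked with a toy pass-counting model (plain Python, purely illustrative, not a benchmark):

```python
def maxpy_unfused(y, coeffs, xs):
    """m separate AXPYs: y is re-read and re-written once per term."""
    passes = 0
    for a, x in zip(coeffs, xs):
        for i in range(len(y)):
            y[i] += a * x[i]
        passes += 2          # y streamed in and out once per AXPY
    return passes + len(xs)  # plus one read of each x

def maxpy_fused(y, coeffs, xs):
    """One fused loop (what VecMAXPY / expression templates aim for)."""
    for i in range(len(y)):
        y[i] += sum(a * x[i] for a, x in zip(coeffs, xs))
    return 2 + len(xs)       # one read + one write of y, one read per x

y1, y2 = [1.0, 2.0], [1.0, 2.0]
xs = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
a = [0.5, 0.5, 0.5]
p_unfused = maxpy_unfused(y1, a, xs)
p_fused = maxpy_fused(y2, a, xs)
```

Both variants compute the same result; for m vectors the fused form does m+2 array passes instead of 3m, which is where the memory-bandwidth savings come from.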
> > > > PS: If you are interested, I ran some benchmark on BLAS1 stuff and, for > a single processor, I obtain: > > > > Example for MAXPY, with expression templates: > > BM_Vector_petscxx_MAXPY/8 38 ns 38 ns 18369805 > > BM_Vector_petscxx_MAXPY/64 622 ns 622 ns 1364335 > > BM_Vector_petscxx_MAXPY/512 281 ns 281 ns 2477718 > > BM_Vector_petscxx_MAXPY/4096 2046 ns 2046 ns 349954 > > BM_Vector_petscxx_MAXPY/32768 18012 ns 18012 ns 38788 > > BM_Vector_petscxx_MAXPY_BigO 0.55 N 0.55 N > > BM_Vector_petscxx_MAXPY_RMS 7 % 7 % > > Direct call to MAXPY: > > BM_Vector_PETSc_MAXPY/8 33 ns 33 ns 20973674 > > BM_Vector_PETSc_MAXPY/64 116 ns 116 ns 5992878 > > BM_Vector_PETSc_MAXPY/512 731 ns 731 ns 963340 > > BM_Vector_PETSc_MAXPY/4096 5739 ns 5739 ns 122414 > > BM_Vector_PETSc_MAXPY/32768 46346 ns 46346 ns 15312 > > BM_Vector_PETSc_MAXPY_BigO 1.41 N 1.41 N > > BM_Vector_PETSc_MAXPY_RMS 0 % 0 % > > > > And 3x speedup on 2 MPI ranks (not much communication here, anyway). I > am now convinced that this warrants some further investigation/testing. > > > > > > On Tue, 4 Apr 2017 at 01:08 Jed Brown wrote: > > Matthew Knepley writes: > > > > >> BLAS. (Here a interesting point opens: I assume an efficient BLAS > > >> > > >> implementation, but I am not so sure about how the different BLAS do > > >> things > > >> > > >> internally. I work from the assumption that we have a very well tuned > BLAS > > >> > > >> implementation at our disposal). > > >> > > > > > > The speed improvement comes from pulling vectors through memory fewer > > > times by merging operations (kernel fusion). > > > > Typical examples are VecMAXPY and VecMDot, but note that these are not > > xGEMV because the vectors are independent arrays rather than single > > arrays with a constant leading dimension. > > > > >> call VecGetArray. However I will inevitably foget to return the array > to > > >> > > >> PETSc. 
I could have my new VecArray returning an object that restores > the > > >> > > >> array > > >> > > >> when it goes out of scope. I can also flag the function with > [[nodiscard]] > > >> to > > >> > > >> prevent the user to destroy the returned object from the start. > > >> > > > > > > Jed claims that this pattern is no longer preferred, but I have > forgotten > > > his argument. > > > Jed? > > > > Destruction order matters and needs to be collective. If an error > > condition causes destruction to occur in a different order on different > > processes, you can get deadlock. I would much rather have an error > > leave some resources (for the OS to collect) than escalate into > > deadlock. > > > > > We have had this discussion for years on this list. Having separate > names > > > for each type > > > is really ugly and does not achieve what we want. We want smooth > > > interoperability between > > > objects with different backing types, but it is still not clear how to > do > > > this. > > > > Hide it internally and implicitly promote. Only the *GetArray functions > > need to be parametrized on numeric type. But it's a lot of work on the > > backend. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aherrema at iastate.edu Tue Apr 4 13:24:25 2017 From: aherrema at iastate.edu (Austin Herrema) Date: Tue, 4 Apr 2017 13:24:25 -0500 Subject: [petsc-users] Simultaneous use petsc4py and fortran/petsc-based python module Message-ID: Hello all, Another question in a fairly long line of questions from me. Thank you to this community for all the help I've gotten. I have a Fortran/PETSc-based code that, with the help of f2py and some of you, I have compiled into a python module (we'll call it pc_fort_mod). So I can now successfully execute my code with import pc_fort_mod pc_fort_mod.execute() I am now hoping to use this analysis module in a large optimization problem being solved with OpenMDAO . 
OpenMDAO also makes use of PETSc/petsc4py, which, unsurprisingly, does not play well with my PETSc-based module. So doing from petsc4py import PETSc import pc_fort_mod pc_fort_mod.execute() causes the pc_fort_mod execution to fail (in particular, preallocation fails with an exit code of 63, "input argument out of range." I assume the matrix is invalid or something along those lines). So my question is, is there any way to make this work? Or is this pretty much out of the realm of what should be possible at this point? Thank you, Austin -- *Austin Herrema* PhD Student | Graduate Research Assistant | Iowa State University Wind Energy Science, Engineering, and Policy | Mechanical Engineering -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaetank at gmail.com Tue Apr 4 13:30:58 2017 From: gaetank at gmail.com (Gaetan Kenway) Date: Tue, 4 Apr 2017 11:30:58 -0700 Subject: [petsc-users] Simultaneous use petsc4py and fortran/petsc-based python module In-Reply-To: References: Message-ID: There shouldn't be any additional issue with the petsc4py wrapper. We do this all the time. In fact, it's generally best to use the petsc4py to do the initialization of petsc at the very top of your highest level python script. You'll need to do this anyway if you want to use command line arguments to change the petsc arch. Again, its probably some 4/8 byte issue or maybe a real/complex issue that is caused by the petsc4py import initializing something different from what you expect. Gaetan On Tue, Apr 4, 2017 at 11:24 AM, Austin Herrema wrote: > Hello all, > > Another question in a fairly long line of questions from me. Thank you to > this community for all the help I've gotten. > > I have a Fortran/PETSc-based code that, with the help of f2py and some of > you, I have compiled into a python module (we'll call it pc_fort_mod). 
So I > can now successfully execute my code with > > import pc_fort_mod > pc_fort_mod.execute() > > I am now hoping to use this analysis module in a large optimization > problem being solved with OpenMDAO . OpenMDAO also > makes use of PETSc/petsc4py, which, unsurprisingly, does not play well with > my PETSc-based module. So doing > > from petsc4py import PETSc > import pc_fort_mod > pc_fort_mod.execute() > > causes the pc_fort_mod execution to fail (in particular, preallocation > fails with an exit code of 63, "input argument out of range." I assume the > matrix is invalid or something along those lines). > > So my question is, is there any way to make this work? Or is this pretty > much out of the realm of what should be possible at this point? > > Thank you, > Austin > > > -- > *Austin Herrema* > PhD Student | Graduate Research Assistant | Iowa State University > Wind Energy Science, Engineering, and Policy | Mechanical Engineering > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Apr 4 13:59:39 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 4 Apr 2017 13:59:39 -0500 Subject: [petsc-users] Question about DMDA BOUNDARY_CONDITION set In-Reply-To: References: Message-ID: <4DD6D7BC-8BA6-4283-B005-629016BA3F7D@mcs.anl.gov> > On Apr 4, 2017, at 1:24 AM, Wenbo Zhao wrote: > > Barry, > > Thanks. > > It is my fault. I should not mix the VecScatter and MatSetValues. > > 1. Matrix assemble > There are only two options matrix for case with rotation boundary. > The first is using "MatSetoption(A,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_FALSE)". > The second is to create matrix by hand. > Is it correct? Yes > > 2. VecScatter > It will be used to get the values on ghosted cell. 
> My case is the neutron diffusion equation, which looks like below: > \( -\nabla \cdot D_g \nabla \phi_g + \Sigma_{r,g}\phi_g - \sum_{g' \neq g} \Sigma_{s,g'\to g} \phi_{g'} = \frac{1}{Keff} \chi_g \sum_{g' = 1, G} \nu\Sigma_{f,g'}\phi_{g'} \) > where g denotes the energy group, G is the number of energy groups, \(D_g\) is the diffusion coefficient of group g, \(\Sigma_{r,g}\) is the removal cross section of group g, > \(\Sigma_{s,g'\to g}\) is the scatter cross section from group g' to group g, \(\chi_g\) is the fission spectrum of group g, \(\nu\Sigma_{f,g'}\) is the fission production cross section, > \(\phi_g\) is the neutron flux and eigenvector, and Keff is the eigenvalue. > The diffusion coefficients and other cross sections vary over the region and are distributed on procs. > If I use a mesh-centered seven-point finite difference method for 3D, the number of degrees of freedom is G and \(D_g\) needs communication. > > I get the \(D_g\) of the ghost cell through VecScatter and insert values into the matrix. I don't understand what you are getting at here. Why would you ever take values from a vector and put them in a matrix? In PETSc vectors contain field variables and matrices contain operators on fields. Or do you mean by "matrix" a multidimensional array used to represent a field? The actual matrix operator for the equation above is very complicated if you fully compute the entries in the operator. > Is it correct? > > BEST, > > Wenbo
(This was just a test done in 10 > min on my Workstation, so no pretence to be definite, I just wanted to have > an indication). > Hmm, it seems like PetscKernelAXPY4() is just not vectorizing correctly then. I would be interested to see your code. As you say, I am using OpenBLAS, so I wouldn't be surprised of those > results. If/when I use MKL (or something similar), I really do not expect > such an improvement). > > Since you seem interested (if you are interested, I can give you all the > details): the comparison I make, is with "petscxx" which is my template > code (which uses a single loop) using AVX (I force PETSc to align the > memory to 32 bit boundary and then I use packets of 4 doubles). Also notice > that I use vectors with nice lengths, so there is no need to "peel" the end > of the loop. The "PETSc" simulation is using PETSc's VecMAXPY. > Thanks, Matt > On Tue, 4 Apr 2017 at 19:12 Barry Smith wrote: > >> >> MAXPY isn't really a BLAS 1 since it can reuse some data in certain >> vectors. >> >> >> > On Apr 4, 2017, at 10:25 AM, Filippo Leonardi >> wrote: >> > >> > I really appreciate the feedback. Thanks. >> > >> > That of deadlock, when the order of destruction is not preserved, is a >> point I hadn't thought of. Maybe it can be cleverly addressed. 
>> > >> > PS: If you are interested, I ran some benchmark on BLAS1 stuff and, for >> a single processor, I obtain: >> > >> > Example for MAXPY, with expression templates: >> > BM_Vector_petscxx_MAXPY/8 38 ns 38 ns 18369805 >> > BM_Vector_petscxx_MAXPY/64 622 ns 622 ns 1364335 >> > BM_Vector_petscxx_MAXPY/512 281 ns 281 ns 2477718 >> > BM_Vector_petscxx_MAXPY/4096 2046 ns 2046 ns 349954 >> > BM_Vector_petscxx_MAXPY/32768 18012 ns 18012 ns 38788 >> > BM_Vector_petscxx_MAXPY_BigO 0.55 N 0.55 N >> > BM_Vector_petscxx_MAXPY_RMS 7 % 7 % >> > Direct call to MAXPY: >> > BM_Vector_PETSc_MAXPY/8 33 ns 33 ns 20973674 >> > BM_Vector_PETSc_MAXPY/64 116 ns 116 ns 5992878 >> > BM_Vector_PETSc_MAXPY/512 731 ns 731 ns 963340 >> > BM_Vector_PETSc_MAXPY/4096 5739 ns 5739 ns 122414 >> > BM_Vector_PETSc_MAXPY/32768 46346 ns 46346 ns 15312 >> > BM_Vector_PETSc_MAXPY_BigO 1.41 N 1.41 N >> > BM_Vector_PETSc_MAXPY_RMS 0 % 0 % >> > >> > And 3x speedup on 2 MPI ranks (not much communication here, anyway). I >> am now convinced that this warrants some further investigation/testing. >> > >> > >> > On Tue, 4 Apr 2017 at 01:08 Jed Brown wrote: >> > Matthew Knepley writes: >> > >> > >> BLAS. (Here a interesting point opens: I assume an efficient BLAS >> > >> >> > >> implementation, but I am not so sure about how the different BLAS do >> > >> things >> > >> >> > >> internally. I work from the assumption that we have a very well >> tuned BLAS >> > >> >> > >> implementation at our disposal). >> > >> >> > > >> > > The speed improvement comes from pulling vectors through memory fewer >> > > times by merging operations (kernel fusion). >> > >> > Typical examples are VecMAXPY and VecMDot, but note that these are not >> > xGEMV because the vectors are independent arrays rather than single >> > arrays with a constant leading dimension. >> > >> > >> call VecGetArray. However I will inevitably foget to return the >> array to >> > >> >> > >> PETSc. 
I could have my new VecArray returning an object that >> restores the >> > >> >> > >> array >> > >> >> > >> when it goes out of scope. I can also flag the function with >> [[nodiscard]] >> > >> to >> > >> >> > >> prevent the user to destroy the returned object from the start. >> > >> >> > > >> > > Jed claims that this pattern is no longer preferred, but I have >> forgotten >> > > his argument. >> > > Jed? >> > >> > Destruction order matters and needs to be collective. If an error >> > condition causes destruction to occur in a different order on different >> > processes, you can get deadlock. I would much rather have an error >> > leave some resources (for the OS to collect) than escalate into >> > deadlock. >> > >> > > We have had this discussion for years on this list. Having separate >> names >> > > for each type >> > > is really ugly and does not achieve what we want. We want smooth >> > > interoperability between >> > > objects with different backing types, but it is still not clear how >> to do >> > > this. >> > >> > Hide it internally and implicitly promote. Only the *GetArray functions >> > need to be parametrized on numeric type. But it's a lot of work on the >> > backend. >> >> -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Tue Apr 4 14:41:02 2017 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 4 Apr 2017 19:41:02 +0000 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: Message-ID: <7C5297AA-4558-4194-9AC0-4BB5BD5375B0@anl.gov> I did some quick tests (with a different example) on a single KNL node and a single Haswell node, both using 4 processes. Check below for the results about MatMult. And the total running time on KNL is a bit more than two times of that on Haswell. 
So I think the results Justin got with SNES ex48 are reasonable, considering the fact that KNL cores are much less powerful than Haswell cores, as Richard mentioned.

------------------------------------------------------------------------------------------------------------------------
Event               Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                      Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
MatMult(KNL)         1609 1.0 1.4044e+02 1.0 6.41e+10 1.0 1.3e+04 3.3e+04 0.0e+00 18 19 91 93  0  18 19 91 93  0  1826
MatMult(Haswell)     1609 1.0 4.4927e+01 1.0 6.41e+10 1.0 1.3e+04 3.3e+04 0.0e+00 18 19 91 93  0  18 19 91 93  0  5708

Hong (Mr.)

On Apr 4, 2017, at 11:05 AM, Matthew Knepley > wrote: On Tue, Apr 4, 2017 at 10:57 AM, Justin Chang > wrote: Thanks everyone for the helpful advice. So I tried all the suggestions including using libsci. The performance did not improve for my particular runs, which I think suggests the problem parameters chosen for my tests (SNES ex48) are not optimal for KNL. Does anyone have example test runs I could reproduce that compare the performance between KNL and Haswell/Ivybridge/etc?

Let's try to see what is going on with your existing data first. First, I think the main thing is to make sure we are using MCDRAM. Everything else in KNL is window dressing (IMHO). All we have to look at is something like MAXPY. You can get the bandwidth estimate from the flop rate and problem size (I think), and we can at least get bandwidth ratios between Haswell and KNL with that number. Matt

On Mon, Apr 3, 2017 at 3:06 PM Richard Mills > wrote: Yes, one should rely on MKL (or Cray LibSci, if using the Cray toolchain) on Cori. But I'm guessing that this will make no noticeable difference for what Justin is doing.
--Richard

On Mon, Apr 3, 2017 at 12:57 PM, murat keçeli > wrote: How about replacing --download-fblaslapack with vendor specific BLAS/LAPACK? Murat

On Mon, Apr 3, 2017 at 2:45 PM, Richard Mills > wrote: On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong > wrote: On Apr 3, 2017, at 1:44 PM, Justin Chang > wrote: Richard, This is what my job script looks like:

#!/bin/bash
#SBATCH -N 16
#SBATCH -C knl,quad,flat
#SBATCH -p regular
#SBATCH -J knlflat1024
#SBATCH -L SCRATCH
#SBATCH -o knlflat1024.o%j
#SBATCH --mail-type=ALL
#SBATCH --mail-user=jychang48 at gmail.com
#SBATCH -t 00:20:00

#run the application:
cd $SCRATCH/Icesheet
sbcast --compress=lz4 ./ex48cori /tmp/ex48cori
srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine 1

Maybe it is a typo. It should be numactl -m 1.

"-p 1" will also work. "-p" means to "prefer" NUMA node 1 (the MCDRAM), whereas "-m" means to use only NUMA node 1. In the former case, MCDRAM will be used for allocations until the available memory there has been exhausted, and then things will spill over into the DRAM. One would think that "-m" would be better for doing performance studies, but on systems where the nodes have swap space enabled, you can get terrible performance if your code's working set exceeds the size of the MCDRAM, as the system will obediently obey your wishes to not use the DRAM and go straight to the swap disk! I assume the Cori nodes don't have swap space, though I could be wrong.

According to the NERSC info pages, they say to add the "numactl" if using flat mode. Previously I tried cache mode but the performance seems to be unaffected.

Using cache mode should give similar performance as using flat mode with the numactl option. But both approaches should be significantly faster than using flat mode without the numactl option. I usually see over 3X speedup.
You can also do such comparison to see if the high-bandwidth memory is working properly.

I also compared 256 haswell nodes vs 256 KNL nodes and haswell is nearly 4-5x faster. Though I suspect this drastic change has much to do with the initial coarse grid size now being extremely small.

I think you may be right about why you see such a big difference. The KNL nodes need enough work to be able to use the SIMD lanes effectively. Also, if your problem gets small enough, then it's going to be able to fit in the Haswell's L3 cache. Although KNL has MCDRAM and this delivers *a lot* more memory bandwidth than the DDR4 memory, it will deliver a lot less bandwidth than the Haswell's L3.

I'll give the COPTFLAGS a try and see what happens.

Make sure to use --with-memalign=64 for data alignment when configuring PETSc.

Ah, yes, I forgot that. Thanks for mentioning it, Hong!

The option -xMIC-AVX512 would improve the vectorization performance. But it may cause problems for the MPIBAIJ format for some unknown reason. MPIAIJ should work fine with this option.

Hmm. Try both, and, if you see worse performance with MPIBAIJ, let us know and I'll try to figure this out.

--Richard

Hong (Mr.)

Thanks, Justin

On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills > wrote: Hi Justin,

How is the MCDRAM (on-package "high-bandwidth memory") configured for your KNL runs? And if it is in "flat" mode, what are you doing to ensure that you use the MCDRAM? Doing this wrong seems to be one of the most common reasons for unexpected poor performance on KNL.

I'm not that familiar with the environment on Cori, but I think that if you are building for KNL, you should add "-xMIC-AVX512" to your compiler flags to explicitly instruct the compiler to use the AVX512 instruction set. I usually use something along the lines of

'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512'

(The "-g" just adds symbols, which make the output from performance profiling tools much more useful.)
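Pulling the suggestions in this thread together (-xMIC-AVX512 in the optimization flags, --with-memalign=64, a vendor BLAS/LAPACK such as Cray LibSci instead of --download-fblaslapack), a KNL-oriented configure line might look like the sketch below. This is an assembled example, not an official NERSC recipe; the PETSC_ARCH name is made up, and the Cray wrappers (cc/CC/ftn) are carried over from Justin's original invocation. The script only prints the command so the pieces are visible:

```shell
# Sketch only: a KNL-leaning PETSc ./configure line assembled from the
# advice in this thread. Flag and arch-name choices are assumptions;
# adapt to your site. We print the command instead of running it.
OPTFLAGS='-g -O3 -fp-model fast -xMIC-AVX512'   # -g just adds symbols

CONFIGURE="./configure --with-cc=cc --with-cxx=CC --with-fc=ftn \
--with-debugging=0 --with-memalign=64 --with-mpiexec=srun \
--with-64-bit-indices=1 \
COPTFLAGS='$OPTFLAGS' CXXOPTFLAGS='$OPTFLAGS' FOPTFLAGS='$OPTFLAGS' \
PETSC_ARCH=arch-cori-knl-opt"

echo "$CONFIGURE"
```

Note that -xMIC-AVX512 is an Intel compiler flag, so this assumes the Intel programming environment (PrgEnv-intel) rather than the Cray compilers; omitting --download-fblaslapack lets the Cray wrappers supply LibSci automatically.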
That said, I think that if you are comparing 1024 Haswell cores vs. 1024 KNL cores (so double the number of Haswell nodes), I'm not surprised that the simulations are almost twice as fast using the Haswell nodes. Keep in mind that individual KNL cores are much less powerful than an individual Haswell core. You are also using roughly twice the power footprint (a dual socket Haswell node should be roughly equivalent to a KNL node, I believe). How do things look when you compare equal nodes?

Cheers, Richard

On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang > wrote: Hi all,

On NERSC's Cori I have the following configure options for PETSc:

./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt

Where I swapped out the default Intel programming environment with that of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I want to document the performance difference between Cori's Haswell and KNL processors.

When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes. Which leads me to suspect that I am not doing something right for KNL. Does anyone know what are some "optimal" configure options for running PETSc on KNL?

Thanks, Justin

-- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From jychang48 at gmail.com Tue Apr 4 14:44:39 2017 From: jychang48 at gmail.com (Justin Chang) Date: Tue, 4 Apr 2017 14:44:39 -0500 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: Message-ID:

Attached are the job output files (which include -log_view) for SNES ex48 run on a single haswell and knl node (32 and 64 cores respectively). Started off with a coarse grid of size 40x40x5 and ran three different tests with -da_refine 1/2/3 and -pc_type mg

What's interesting/strange is that if i try to do -da_refine 4 on KNL, i get a slurm error that says: "slurmstepd: error: Step 4408401.0 exceeded memory limit (96737652 > 94371840), being killed" but it runs perfectly fine on Haswell. Adding -pc_mg_levels 7 enables KNL to run on -da_refine 4 but the performance still does not beat out haswell.

The performance spectrum (dofs/sec) for 1-3 levels of refinement looks like this:

Haswell:
2.416e+03
1.490e+04
5.188e+04

KNL:
9.308e+02
7.257e+03
3.838e+04

Which might suggest to me that KNL performs better with larger problem sizes.

On Tue, Apr 4, 2017 at 11:05 AM, Matthew Knepley wrote: > On Tue, Apr 4, 2017 at 10:57 AM, Justin Chang wrote: > >> Thanks everyone for the helpful advice. So I tried all the suggestions >> including using libsci. The performance did not improve for my particular >> runs, which I think suggests the problem parameters chosen for my tests >> (SNES ex48) are not optimal for KNL. Does anyone have example test runs I >> could reproduce that compare the performance between KNL and >> Haswell/Ivybridge/etc? >> > > Lets try to see what is going on with your existing data first. > > First, I think that main thing is to make sure we are using MCDRAM. > Everything else in KNL > is window dressing (IMHO). All we have to look at is something like MAXPY. > You can get the > bandwidth estimate from the flop rate and problem size (I think), and we > can at least get > bandwidth ratios between Haswell and KNL with that number.
> > Matt > > >> On Mon, Apr 3, 2017 at 3:06 PM Richard Mills >> wrote: >> >>> Yes, one should rely on MKL (or Cray LibSci, if using the Cray >>> toolchain) on Cori. But I'm guessing that this will make no noticeable >>> difference for what Justin is doing. >>> >>> --Richard >>> >>> On Mon, Apr 3, 2017 at 12:57 PM, murat ke?eli wrote: >>> >>> How about replacing --download-fblaslapack with vendor specific >>> BLAS/LAPACK? >>> >>> Murat >>> >>> On Mon, Apr 3, 2017 at 2:45 PM, Richard Mills >>> wrote: >>> >>> On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong wrote: >>> >>> >>> On Apr 3, 2017, at 1:44 PM, Justin Chang wrote: >>> >>> Richard, >>> >>> This is what my job script looks like: >>> >>> #!/bin/bash >>> #SBATCH -N 16 >>> #SBATCH -C knl,quad,flat >>> #SBATCH -p regular >>> #SBATCH -J knlflat1024 >>> #SBATCH -L SCRATCH >>> #SBATCH -o knlflat1024.o%j >>> #SBATCH --mail-type=ALL >>> #SBATCH --mail-user=jychang48 at gmail.com >>> #SBATCH -t 00:20:00 >>> >>> #run the application: >>> cd $SCRATCH/Icesheet >>> sbcast --compress=lz4 ./ex48cori /tmp/ex48cori >>> srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N >>> 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine >>> 1 >>> >>> >>> Maybe it is a typo. It should be numactl -m 1. >>> >>> >>> "-p 1" will also work. "-p" means to "prefer" NUMA node 1 (the MCDRAM), >>> whereas "-m" means to use only NUMA node 1. In the former case, MCDRAM >>> will be used for allocations until the available memory there has been >>> exhausted, and then things will spill over into the DRAM. One would think >>> that "-m" would be better for doing performance studies, but on systems >>> where the nodes have swap space enabled, you can get terrible performance >>> if your code's working set exceeds the size of the MCDRAM, as the system >>> will obediently obey your wishes to not use the DRAM and go straight to the >>> swap disk! I assume the Cori nodes don't have swap space, though I could >>> be wrong. 
>>> >>> >>> According to the NERSC info pages, they say to add the "numactl" if >>> using flat mode. Previously I tried cache mode but the performance seems to >>> be unaffected. >>> >>> >>> Using cache mode should give similar performance as using flat mode with >>> the numactl option. But both approaches should be significant faster than >>> using flat mode without the numactl option. I usually see over 3X speedup. >>> You can also do such comparison to see if the high-bandwidth memory is >>> working properly. >>> >>> I also comparerd 256 haswell nodes vs 256 KNL nodes and haswell is >>> nearly 4-5x faster. Though I suspect this drastic change has much to do >>> with the initial coarse grid size now being extremely small. >>> >>> I think you may be right about why you see such a big difference. The >>> KNL nodes need enough work to be able to use the SIMD lanes effectively. >>> Also, if your problem gets small enough, then it's going to be able to fit >>> in the Haswell's L3 cache. Although KNL has MCDRAM and this delivers *a >>> lot* more memory bandwidth than the DDR4 memory, it will deliver a lot less >>> bandwidth than the Haswell's L3. >>> >>> I'll give the COPTFLAGS a try and see what happens >>> >>> >>> Make sure to use --with-memalign=64 for data alignment when configuring >>> PETSc. >>> >>> >>> Ah, yes, I forgot that. Thanks for mentioning it, Hong! >>> >>> >>> The option -xMIC-AVX512 would improve the vectorization performance. But >>> it may cause problems for the MPIBAIJ format for some unknown reason. >>> MPIAIJ should work fine with this option. >>> >>> >>> Hmm. Try both, and, if you see worse performance with MPIBAIJ, let us >>> know and I'll try to figure this out. >>> >>> --Richard >>> >>> >>> >>> Hong (Mr.) >>> >>> Thanks, >>> Justin >>> >>> On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills >>> wrote: >>> >>> Hi Justin, >>> >>> How is the MCDRAM (on-package "high-bandwidth memory") configured for >>> your KNL runs? 
And if it is in "flat" mode, what are you doing to ensure >>> that you use the MCDRAM? Doing this wrong seems to be one of the most >>> common reasons for unexpected poor performance on KNL. >>> >>> I'm not that familiar with the environment on Cori, but I think that if >>> you are building for KNL, you should add "-xMIC-AVX512" to your compiler >>> flags to explicitly instruct the compiler to use the AVX512 instruction >>> set. I usually use something along the lines of >>> >>> 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' >>> >>> (The "-g" just adds symbols, which make the output from performance >>> profiling tools much more useful.) >>> >>> That said, I think that if you are comparing 1024 Haswell cores vs. 1024 >>> KNL cores (so double the number of Haswell nodes), I'm not surprised that >>> the simulations are almost twice as fast using the Haswell nodes. Keep in >>> mind that individual KNL cores are much less powerful than an individual >>> Haswell node. You are also using roughly twice the power footprint (dual >>> socket Haswell node should be roughly equivalent to a KNL node, I >>> believe). How do things look on when you compare equal nodes? >>> >>> Cheers, >>> Richard >>> >>> On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang >>> wrote: >>> >>> Hi all, >>> >>> On NERSC's Cori I have the following configure options for PETSc: >>> >>> ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 >>> --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn >>> --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 >>> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt >>> >>> Where I swapped out the default Intel programming environment with that >>> of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I >>> want to document the performance difference between Cori's Haswell and KNL >>> processors. 
>>> >>> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and >>> 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes. >>> Which leads me to suspect that I am not doing something right for KNL. Does >>> anyone know what are some "optimal" configure options for running PETSc on >>> KNL? >>> >>> Thanks, >>> Justin >>> >>> >>> >>> >>> >>> >>> >>> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: testhas_flat_1node.o4407087 Type: application/octet-stream Size: 38661 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: testknl_flat_1node.o4407080 Type: application/octet-stream Size: 38551 bytes Desc: not available URL: From jed at jedbrown.org Tue Apr 4 15:02:07 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 04 Apr 2017 14:02:07 -0600 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: Message-ID: <87mvbwkklc.fsf@jedbrown.org> Justin Chang writes: > Thanks everyone for the helpful advice. So I tried all the suggestions > including using libsci. The performance did not improve for my particular > runs, which I think suggests the problem parameters chosen for my tests > (SNES ex48) are not optimal for KNL. Where is -log_view output on the two machines? Note that ex48 has SSE2 intrinsics, but KNL would like AVX512 (versus AVX on Haswell). -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jed at jedbrown.org Tue Apr 4 15:18:54 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 04 Apr 2017 14:18:54 -0600 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: Message-ID: <87k270kjtd.fsf@jedbrown.org> Ingo Gaertner writes: > We have never talked about Riemann solvers in our CFD course, and I don't > understand what's going on in ex11. > However, if you could answer a few of my questions, you'll give me a good > start with PETSc. For the simple poisson problem that I am trying to > implement, I have to discretize div(k grad u) integrated over each FV cell, > where k is the known diffusivity, and u is the vector to solve for. Note that ex11 solves hyperbolic conservation laws, but you are solving an elliptic equation. > The cell integral is approximated as the sum of the fluxes (k grad u) > in each face centroid multiplied by each surface area vector. In > principle, I have all necessary information available in the DMPlex to > assemble the FV matrix for div(k grad u), assuming an orthogonal grid > for the beginning. But I thought that the gradient coefficients > should be available in the PETScFVFaceGeom.grad elements of the > faceGeometry vector after using the methods > > DMPlexComputeGeometryFVM > (DM > > dm, Vec > *cellgeom, Vec > *facegeom) and > DMPlexComputeGradientFVM > (DM > > dm, PetscFV > fvm, Vec > faceGeometry, Vec > > cellGeometry, DM > > *dmGrad) > > But, while these calls fill in the cell and face centroids, volumes and > normals, the PETScFVFaceGeom.grad elements of the faceGeometry vector are > all zero. Am I misunderstanding the purpose of DMPlexComputeGradientFVM? This routine is intended to do a conservative reconstruction of a piecewise linear function from piecewise constant data. Is that what you want? > (My second question is more general about the PETSc installation. 
When I > configure PETSc with "--prefix=/somewhere --download-triangle > --download-parmetis" etc., these extra libraries are built correctly during > the make step, but they are not copied to /somewhere during the "make > install" step. Where are they put during configure? > Also the pkg-config files don't include the details about the extra > libraries. Is it intended that one has to manually correct the install > directory, or am I missing something during the configuration or > installation steps?) You shouldn't need that if linking with shared libraries (pkg-config says not to list them), but you can use pkg-config --static to get the transitive dependencies. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From me at karlrupp.net Tue Apr 4 15:34:00 2017 From: me at karlrupp.net (Karl Rupp) Date: Tue, 4 Apr 2017 22:34:00 +0200 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: Message-ID: <50f047d3-ef83-ce67-1c00-31711db59976@karlrupp.net> Hey, here's some data on what you should see with STREAM when comparing against conventional XEONs: https://www.karlrupp.net/2016/07/knights-landing-vs-knights-corner-haswell-ivy-bridge-and-sandy-bridge-stream-benchmark-results/ Note that MCDRAM only pays off if you can keep enough cores busy. Thus, anything below 16 processes is unlikely to give you any benefit. Also, your working set must be large enough not to stay in L3 on Haswell (I think this was already mentioned earlier in this thread). Best regards, Karli On 04/04/2017 06:05 PM, Matthew Knepley wrote: > On Tue, Apr 4, 2017 at 10:57 AM, Justin Chang > wrote: > > Thanks everyone for the helpful advice. So I tried all the > suggestions including using libsci. The performance did not improve > for my particular runs, which I think suggests the problem > parameters chosen for my tests (SNES ex48) are not optimal for KNL. 
> Does anyone have example test runs I could reproduce that compare > the performance between KNL and Haswell/Ivybridge/etc? > > > Lets try to see what is going on with your existing data first. > > First, I think that main thing is to make sure we are using MCDRAM. > Everything else in KNL > is window dressing (IMHO). All we have to look at is something like > MAXPY. You can get the > bandwidth estimate from the flop rate and problem size (I think), and we > can at least get > bandwidth ratios between Haswell and KNL with that number. > > Matt > > > On Mon, Apr 3, 2017 at 3:06 PM Richard Mills > > wrote: > > Yes, one should rely on MKL (or Cray LibSci, if using the Cray > toolchain) on Cori. But I'm guessing that this will make no > noticeable difference for what Justin is doing. > > --Richard > > On Mon, Apr 3, 2017 at 12:57 PM, murat ke?eli > wrote: > > How about replacing --download-fblaslapack with vendor > specific BLAS/LAPACK? > > Murat > > On Mon, Apr 3, 2017 at 2:45 PM, Richard Mills > > > wrote: > > On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong > > wrote: > > >> On Apr 3, 2017, at 1:44 PM, Justin Chang >> > >> wrote: >> >> Richard, >> >> This is what my job script looks like: >> >> #!/bin/bash >> #SBATCH -N 16 >> #SBATCH -C knl,quad,flat >> #SBATCH -p regular >> #SBATCH -J knlflat1024 >> #SBATCH -L SCRATCH >> #SBATCH -o knlflat1024.o%j >> #SBATCH --mail-type=ALL >> #SBATCH --mail-user=jychang48 at gmail.com >> >> #SBATCH -t 00:20:00 >> >> #run the application: >> cd $SCRATCH/Icesheet >> sbcast --compress=lz4 ./ex48cori /tmp/ex48cori >> srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 >> /tmp/ex48cori -M 128 -N 128 -P 16 -thi_mat_type >> baij -pc_type mg -mg_coarse_pc_type gamg -da_refine 1 >> > > Maybe it is a typo. It should be numactl -m 1. > > > "-p 1" will also work. "-p" means to "prefer" NUMA node > 1 (the MCDRAM), whereas "-m" means to use only NUMA node > 1. 
In the former case, MCDRAM will be used for > allocations until the available memory there has been > exhausted, and then things will spill over into the > DRAM. One would think that "-m" would be better for > doing performance studies, but on systems where the > nodes have swap space enabled, you can get terrible > performance if your code's working set exceeds the size > of the MCDRAM, as the system will obediently obey your > wishes to not use the DRAM and go straight to the swap > disk! I assume the Cori nodes don't have swap space, > though I could be wrong. > > >> According to the NERSC info pages, they say to add >> the "numactl" if using flat mode. Previously I >> tried cache mode but the performance seems to be >> unaffected. > > Using cache mode should give similar performance as > using flat mode with the numactl option. But both > approaches should be significant faster than using > flat mode without the numactl option. I usually see > over 3X speedup. You can also do such comparison to > see if the high-bandwidth memory is working properly. > >> I also comparerd 256 haswell nodes vs 256 KNL >> nodes and haswell is nearly 4-5x faster. Though I >> suspect this drastic change has much to do with >> the initial coarse grid size now being extremely >> small. > > I think you may be right about why you see such a big > difference. The KNL nodes need enough work to be able > to use the SIMD lanes effectively. Also, if your > problem gets small enough, then it's going to be able to > fit in the Haswell's L3 cache. Although KNL has MCDRAM > and this delivers *a lot* more memory bandwidth than the > DDR4 memory, it will deliver a lot less bandwidth than > the Haswell's L3. > >> I'll give the COPTFLAGS a try and see what happens > > Make sure to use --with-memalign=64 for data > alignment when configuring PETSc. > > > Ah, yes, I forgot that. Thanks for mentioning it, Hong! > > > The option -xMIC-AVX512 would improve the > vectorization performance. 
But it may cause problems > for the MPIBAIJ format for some unknown reason. > MPIAIJ should work fine with this option. > > > Hmm. Try both, and, if you see worse performance with > MPIBAIJ, let us know and I'll try to figure this out. > > --Richard > > > > Hong (Mr.) > >> Thanks, >> Justin >> >> On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills >> > > wrote: >> >> Hi Justin, >> >> How is the MCDRAM (on-package "high-bandwidth >> memory") configured for your KNL runs? And if >> it is in "flat" mode, what are you doing to >> ensure that you use the MCDRAM? Doing this >> wrong seems to be one of the most common >> reasons for unexpected poor performance on KNL. >> >> I'm not that familiar with the environment on >> Cori, but I think that if you are building for >> KNL, you should add "-xMIC-AVX512" to your >> compiler flags to explicitly instruct the >> compiler to use the AVX512 instruction set. I >> usually use something along the lines of >> >> 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' >> >> (The "-g" just adds symbols, which make the >> output from performance profiling tools much >> more useful.) >> >> That said, I think that if you are comparing >> 1024 Haswell cores vs. 1024 KNL cores (so >> double the number of Haswell nodes), I'm not >> surprised that the simulations are almost >> twice as fast using the Haswell nodes. Keep >> in mind that individual KNL cores are much >> less powerful than an individual Haswell >> node. You are also using roughly twice the >> power footprint (dual socket Haswell node >> should be roughly equivalent to a KNL node, I >> believe). How do things look on when you >> compare equal nodes? 
>> >> Cheers, >> Richard >> >> On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang >> > > wrote: >> >> Hi all, >> >> On NERSC's Cori I have the following >> configure options for PETSc: >> >> ./configure --download-fblaslapack >> --with-cc=cc --with-clib-autodetect=0 >> --with-cxx=CC --with-cxxlib-autodetect=0 >> --with-debugging=0 --with-fc=ftn >> --with-fortranlib-autodetect=0 >> --with-mpiexec=srun >> --with-64-bit-indices=1 COPTFLAGS=-O3 >> CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 >> PETSC_ARCH=arch-cori-opt >> >> Where I swapped out the default Intel >> programming environment with that of Cray >> (e.g., 'module switch PrgEnv-intel/6.0.3 >> PrgEnv-cray/6.0.3'). I want to document >> the performance difference between Cori's >> Haswell and KNL processors. >> >> When I run a PETSc example like SNES ex48 >> on 1024 cores (32 Haswell and 16 KNL >> nodes), the simulations are almost twice >> as fast on Haswell nodes. Which leads me >> to suspect that I am not doing something >> right for KNL. Does anyone know what are >> some "optimal" configure options for >> running PETSc on KNL? >> >> Thanks, >> Justin >> >> >> > > > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which > their experiments lead. > -- Norbert Wiener From filippo.leon at gmail.com Tue Apr 4 15:40:32 2017 From: filippo.leon at gmail.com (Filippo Leonardi) Date: Tue, 04 Apr 2017 20:40:32 +0000 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: <87o9wdm6mc.fsf@jedbrown.org> <026B5877-0005-4759-B205-86C24C69628C@mcs.anl.gov> Message-ID: I had weird issues where gcc (that I am using for my tests right now) wasn't vectorising properly (even enabling all flags, from tree-vectorize, to mavx). According to my tests, I know the Intel compiler was a bit better at that. I actually did not know PETSc was doing some unrolling himself. On my machine, PETSc aligns the memory to 16 bytes, that might also be a cause. 
However, I have no idea on the ability of the different compilers to detect and vectorize codes, even when those are manually unrolled by the user. I see PETSc unrolls the loop for MAXPY up to 4 right hand side vectors. Just by coincidence, the numbers I reported are actually for this latter case.

I pushed my code to a public git repository. However let me stress that the code is extremely ugly and buggy. I actually feel shame in showing you the code :) : https://bitbucket.org/FilippoL/petscxx/commits/branch/master

If you want to look at the API usage, you can find it in the file "/src/benchmarks/bench_vector.cpp" after line 153. If you want the expressions generating the kernels you have to look at the file "/src/base/vectorexpression.hpp", specifically the "evalat" members.

On Tue, 4 Apr 2017 at 21:39 Matthew Knepley wrote: On Tue, Apr 4, 2017 at 1:19 PM, Filippo Leonardi wrote: You are in fact right, it is the same speedup of approximately 2.5x (with 2 ranks), my brain rounded up to 3. (This was just a test done in 10 min on my Workstation, so no pretence to be definite, I just wanted to have an indication.)

Hmm, it seems like PetscKernelAXPY4() is just not vectorizing correctly then. I would be interested to see your code.

As you say, I am using OpenBLAS, so I wouldn't be surprised by those results. If/when I use MKL (or something similar), I really do not expect such an improvement.

Since you seem interested (if you are interested, I can give you all the details): the comparison I make is with "petscxx", which is my template code (which uses a single loop) using AVX (I force PETSc to align the memory to a 32-byte boundary and then I use packets of 4 doubles). Also notice that I use vectors with nice lengths, so there is no need to "peel" the end of the loop. The "PETSc" simulation is using PETSc's VecMAXPY.

Thanks, Matt

On Tue, 4 Apr 2017 at 19:12 Barry Smith wrote: MAXPY isn't really a BLAS 1 since it can reuse some data in certain vectors.
> On Apr 4, 2017, at 10:25 AM, Filippo Leonardi wrote: > > I really appreciate the feedback. Thanks. > > That of deadlock, when the order of destruction is not preserved, is a point I hadn't thought of. Maybe it can be cleverly addressed. > > PS: If you are interested, I ran some benchmark on BLAS1 stuff and, for a single processor, I obtain: > > Example for MAXPY, with expression templates: > BM_Vector_petscxx_MAXPY/8 38 ns 38 ns 18369805 > BM_Vector_petscxx_MAXPY/64 622 ns 622 ns 1364335 > BM_Vector_petscxx_MAXPY/512 281 ns 281 ns 2477718 > BM_Vector_petscxx_MAXPY/4096 2046 ns 2046 ns 349954 > BM_Vector_petscxx_MAXPY/32768 18012 ns 18012 ns 38788 > BM_Vector_petscxx_MAXPY_BigO 0.55 N 0.55 N > BM_Vector_petscxx_MAXPY_RMS 7 % 7 % > Direct call to MAXPY: > BM_Vector_PETSc_MAXPY/8 33 ns 33 ns 20973674 > BM_Vector_PETSc_MAXPY/64 116 ns 116 ns 5992878 > BM_Vector_PETSc_MAXPY/512 731 ns 731 ns 963340 > BM_Vector_PETSc_MAXPY/4096 5739 ns 5739 ns 122414 > BM_Vector_PETSc_MAXPY/32768 46346 ns 46346 ns 15312 > BM_Vector_PETSc_MAXPY_BigO 1.41 N 1.41 N > BM_Vector_PETSc_MAXPY_RMS 0 % 0 % > > And 3x speedup on 2 MPI ranks (not much communication here, anyway). I am now convinced that this warrants some further investigation/testing. > > > On Tue, 4 Apr 2017 at 01:08 Jed Brown wrote: > Matthew Knepley writes: > > >> BLAS. (Here a interesting point opens: I assume an efficient BLAS > >> > >> implementation, but I am not so sure about how the different BLAS do > >> things > >> > >> internally. I work from the assumption that we have a very well tuned BLAS > >> > >> implementation at our disposal). > >> > > > > The speed improvement comes from pulling vectors through memory fewer > > times by merging operations (kernel fusion). > > Typical examples are VecMAXPY and VecMDot, but note that these are not > xGEMV because the vectors are independent arrays rather than single > arrays with a constant leading dimension. > > >> call VecGetArray. 
However I will inevitably forget to return the array to > >> > >> PETSc. I could have my new VecArray returning an object that restores the > >> > >> array > >> > >> when it goes out of scope. I can also flag the function with [[nodiscard]] > >> to > >> > >> prevent the user from discarding the returned object in the first place. > >> > > > > Jed claims that this pattern is no longer preferred, but I have forgotten > > his argument. > > Jed? > > Destruction order matters and needs to be collective. If an error > condition causes destruction to occur in a different order on different > processes, you can get deadlock. I would much rather have an error > leave some resources (for the OS to collect) than escalate into > deadlock. > > > We have had this discussion for years on this list. Having separate names > > for each type > > is really ugly and does not achieve what we want. We want smooth > > interoperability between > > objects with different backing types, but it is still not clear how to do > > this. > > Hide it internally and implicitly promote. Only the *GetArray functions > need to be parametrized on numeric type. But it's a lot of work on the > backend. -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Apr 4 16:19:42 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 04 Apr 2017 15:19:42 -0600 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: Message-ID: <87h923lvkh.fsf@jedbrown.org> Justin Chang writes: > Attached are the job output files (which include -log_view) for SNES ex48 > run on a single Haswell and KNL node (32 and 64 cores respectively).
> Started off with a coarse grid of size 40x40x5 and ran three different > tests with -da_refine 1/2/3 and -pc_type mg > > What's interesting/strange is that if i try to do -da_refine 4 on KNL, i > get a slurm error that says: "slurmstepd: error: Step 4408401.0 exceeded > memory limit (96737652 > 94371840), being killed" but it runs perfectly > fine on Haswell. Adding -pc_mg_levels 7 enables KNL to run on -da_refine 4 > but the performance still does not beat out haswell. > > The performance spectrum (dofs/sec) for 1-3 levels of refinement looks like > this: > > Haswell: > 2.416e+03 > 1.490e+04 > 5.188e+04 > > KNL: > 9.308e+02 > 7.257e+03 > 3.838e+04 > > Which might suggest to me that KNL performs better with larger problem > sizes. Look at the events. The (redundant) coarse LU factorization takes most of the run time on KNL. The PETSc sparse LU is not vectorized and doesn't exploit dense blocks in the way that the optimized direct solvers do. You'll note that the paper was more aggressive about minimizing the coarse grid size and used BoomerAMG instead of redundant direct solves to avoid this scaling problem. > On Tue, Apr 4, 2017 at 11:05 AM, Matthew Knepley wrote: > >> On Tue, Apr 4, 2017 at 10:57 AM, Justin Chang wrote: >> >>> Thanks everyone for the helpful advice. So I tried all the suggestions >>> including using libsci. The performance did not improve for my particular >>> runs, which I think suggests the problem parameters chosen for my tests >>> (SNES ex48) are not optimal for KNL. Does anyone have example test runs I >>> could reproduce that compare the performance between KNL and >>> Haswell/Ivybridge/etc? >>> >> >> Lets try to see what is going on with your existing data first. >> >> First, I think that main thing is to make sure we are using MCDRAM. >> Everything else in KNL >> is window dressing (IMHO). All we have to look at is something like MAXPY. 
>> You can get the >> bandwidth estimate from the flop rate and problem size (I think), and we >> can at least get >> bandwidth ratios between Haswell and KNL with that number. >> >> Matt >> >> >>> On Mon, Apr 3, 2017 at 3:06 PM Richard Mills >>> wrote: >>> >>>> Yes, one should rely on MKL (or Cray LibSci, if using the Cray >>>> toolchain) on Cori. But I'm guessing that this will make no noticeable >>>> difference for what Justin is doing. >>>> >>>> --Richard >>>> >>>> On Mon, Apr 3, 2017 at 12:57 PM, murat ke?eli wrote: >>>> >>>> How about replacing --download-fblaslapack with vendor specific >>>> BLAS/LAPACK? >>>> >>>> Murat >>>> >>>> On Mon, Apr 3, 2017 at 2:45 PM, Richard Mills >>>> wrote: >>>> >>>> On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong wrote: >>>> >>>> >>>> On Apr 3, 2017, at 1:44 PM, Justin Chang wrote: >>>> >>>> Richard, >>>> >>>> This is what my job script looks like: >>>> >>>> #!/bin/bash >>>> #SBATCH -N 16 >>>> #SBATCH -C knl,quad,flat >>>> #SBATCH -p regular >>>> #SBATCH -J knlflat1024 >>>> #SBATCH -L SCRATCH >>>> #SBATCH -o knlflat1024.o%j >>>> #SBATCH --mail-type=ALL >>>> #SBATCH --mail-user=jychang48 at gmail.com >>>> #SBATCH -t 00:20:00 >>>> >>>> #run the application: >>>> cd $SCRATCH/Icesheet >>>> sbcast --compress=lz4 ./ex48cori /tmp/ex48cori >>>> srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N >>>> 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine >>>> 1 >>>> >>>> >>>> Maybe it is a typo. It should be numactl -m 1. >>>> >>>> >>>> "-p 1" will also work. "-p" means to "prefer" NUMA node 1 (the MCDRAM), >>>> whereas "-m" means to use only NUMA node 1. In the former case, MCDRAM >>>> will be used for allocations until the available memory there has been >>>> exhausted, and then things will spill over into the DRAM. 
One would think >>>> that "-m" would be better for doing performance studies, but on systems >>>> where the nodes have swap space enabled, you can get terrible performance >>>> if your code's working set exceeds the size of the MCDRAM, as the system >>>> will obediently obey your wishes to not use the DRAM and go straight to the >>>> swap disk! I assume the Cori nodes don't have swap space, though I could >>>> be wrong. >>>> >>>> >>>> According to the NERSC info pages, they say to add the "numactl" if >>>> using flat mode. Previously I tried cache mode but the performance seems to >>>> be unaffected. >>>> >>>> >>>> Using cache mode should give similar performance to using flat mode with >>>> the numactl option. But both approaches should be significantly faster than >>>> using flat mode without the numactl option. I usually see over 3X speedup. >>>> You can also do such a comparison to see if the high-bandwidth memory is >>>> working properly. >>>> >>>> I also compared 256 Haswell nodes vs 256 KNL nodes and Haswell is >>>> nearly 4-5x faster. Though I suspect this drastic change has much to do >>>> with the initial coarse grid size now being extremely small. >>>> >>>> I think you may be right about why you see such a big difference. The >>>> KNL nodes need enough work to be able to use the SIMD lanes effectively. >>>> Also, if your problem gets small enough, then it's going to be able to fit >>>> in the Haswell's L3 cache. Although KNL has MCDRAM and this delivers *a >>>> lot* more memory bandwidth than the DDR4 memory, it will deliver a lot less >>>> bandwidth than the Haswell's L3. >>>> >>>> I'll give the COPTFLAGS a try and see what happens >>>> >>>> >>>> Make sure to use --with-memalign=64 for data alignment when configuring >>>> PETSc. >>>> >>>> >>>> Ah, yes, I forgot that. Thanks for mentioning it, Hong! >>>> >>>> >>>> The option -xMIC-AVX512 would improve the vectorization performance.
But >>>> it may cause problems for the MPIBAIJ format for some unknown reason. >>>> MPIAIJ should work fine with this option. >>>> >>>> >>>> Hmm. Try both, and, if you see worse performance with MPIBAIJ, let us >>>> know and I'll try to figure this out. >>>> >>>> --Richard >>>> >>>> >>>> >>>> Hong (Mr.) >>>> >>>> Thanks, >>>> Justin >>>> >>>> On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills >>>> wrote: >>>> >>>> Hi Justin, >>>> >>>> How is the MCDRAM (on-package "high-bandwidth memory") configured for >>>> your KNL runs? And if it is in "flat" mode, what are you doing to ensure >>>> that you use the MCDRAM? Doing this wrong seems to be one of the most >>>> common reasons for unexpected poor performance on KNL. >>>> >>>> I'm not that familiar with the environment on Cori, but I think that if >>>> you are building for KNL, you should add "-xMIC-AVX512" to your compiler >>>> flags to explicitly instruct the compiler to use the AVX512 instruction >>>> set. I usually use something along the lines of >>>> >>>> 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' >>>> >>>> (The "-g" just adds symbols, which makes the output from performance >>>> profiling tools much more useful.) >>>> >>>> That said, I think that if you are comparing 1024 Haswell cores vs. 1024 >>>> KNL cores (so double the number of Haswell nodes), I'm not surprised that >>>> the simulations are almost twice as fast using the Haswell nodes. Keep in >>>> mind that individual KNL cores are much less powerful than an individual >>>> Haswell core. You are also using roughly twice the power footprint (dual >>>> socket Haswell node should be roughly equivalent to a KNL node, I >>>> believe). How do things look when you compare equal numbers of nodes?
>>>> >>>> Cheers, >>>> Richard >>>> >>>> On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang >>>> wrote: >>>> >>>> Hi all, >>>> >>>> On NERSC's Cori I have the following configure options for PETSc: >>>> >>>> ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 >>>> --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn >>>> --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 >>>> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt >>>> >>>> Where I swapped out the default Intel programming environment with that >>>> of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I >>>> want to document the performance difference between Cori's Haswell and KNL >>>> processors. >>>> >>>> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and >>>> 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes. >>>> Which leads me to suspect that I am not doing something right for KNL. Does >>>> anyone know what are some "optimal" configure options for running PETSc on >>>> KNL? >>>> >>>> Thanks, >>>> Justin >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From knepley at gmail.com Tue Apr 4 20:17:46 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Apr 2017 20:17:46 -0500 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: <87o9wdm6mc.fsf@jedbrown.org> <026B5877-0005-4759-B205-86C24C69628C@mcs.anl.gov> Message-ID: On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi wrote: > I had weird issues where gcc (that I am using for my tests right now) > wasn't vectorising properly (even enabling all flags, from tree-vectorize, > to mavx). According to my tests, I know the Intel compiler was a bit better > at that. > We are definitely at the mercy of the compiler for this. Maybe Jed has an idea why it's not vectorizing. > I actually did not know PETSc was doing some unrolling itself. On my > machine, PETSc aligns the memory to 16 bytes, that might also be a cause. > However, I have no idea about the ability of the different compilers to detect > and vectorize codes, even when those are manually unrolled by the user. I > see PETSc unrolls the loop for MAXPY up to 4 right hand side vectors. Just by > coincidence, the numbers I reported are actually for this latter case. > > I pushed my code to a public git repository. However, let me stress that > the code is extremely ugly and buggy. I actually feel shame in showing you > the code :) : > https://bitbucket.org/FilippoL/petscxx/commits/branch/master > > If you want to look at the API usage, you can find it in the file > "/src/benchmarks/bench_vector.cpp" after line 153. > > If you want the expressions generating the kernels you have to look at the > file "/src/base/vectorexpression.hpp", specifically the "evalat" members. > I looked. I guess it is a matter of taste whether this seems simpler, or unrolling and putting in an intrinsic call. The automation is undeniable.
Matt > On Tue, 4 Apr 2017 at 21:39 Matthew Knepley wrote: > > On Tue, Apr 4, 2017 at 1:19 PM, Filippo Leonardi > wrote: > > You are in fact right, it is the same speedup of approximatively 2.5x > (with 2 ranks), my brain rounded up to 3. (This was just a test done in 10 > min on my Workstation, so no pretence to be definite, I just wanted to have > an indication). > > > Hmm, it seems like PetscKernelAXPY4() is just not vectorizing correctly > then. I would be interested to see your code. > > As you say, I am using OpenBLAS, so I wouldn't be surprised of those > results. If/when I use MKL (or something similar), I really do not expect > such an improvement). > > Since you seem interested (if you are interested, I can give you all the > details): the comparison I make, is with "petscxx" which is my template > code (which uses a single loop) using AVX (I force PETSc to align the > memory to 32 bit boundary and then I use packets of 4 doubles). Also notice > that I use vectors with nice lengths, so there is no need to "peel" the end > of the loop. The "PETSc" simulation is using PETSc's VecMAXPY. > > > Thanks, > > Matt > > > On Tue, 4 Apr 2017 at 19:12 Barry Smith wrote: > > > MAXPY isn't really a BLAS 1 since it can reuse some data in certain > vectors. > > > > On Apr 4, 2017, at 10:25 AM, Filippo Leonardi > wrote: > > > > I really appreciate the feedback. Thanks. > > > > That of deadlock, when the order of destruction is not preserved, is a > point I hadn't thought of. Maybe it can be cleverly addressed. 
> > > > PS: If you are interested, I ran some benchmark on BLAS1 stuff and, for > a single processor, I obtain: > > > > Example for MAXPY, with expression templates: > > BM_Vector_petscxx_MAXPY/8 38 ns 38 ns 18369805 > > BM_Vector_petscxx_MAXPY/64 622 ns 622 ns 1364335 > > BM_Vector_petscxx_MAXPY/512 281 ns 281 ns 2477718 > > BM_Vector_petscxx_MAXPY/4096 2046 ns 2046 ns 349954 > > BM_Vector_petscxx_MAXPY/32768 18012 ns 18012 ns 38788 > > BM_Vector_petscxx_MAXPY_BigO 0.55 N 0.55 N > > BM_Vector_petscxx_MAXPY_RMS 7 % 7 % > > Direct call to MAXPY: > > BM_Vector_PETSc_MAXPY/8 33 ns 33 ns 20973674 > > BM_Vector_PETSc_MAXPY/64 116 ns 116 ns 5992878 > > BM_Vector_PETSc_MAXPY/512 731 ns 731 ns 963340 > > BM_Vector_PETSc_MAXPY/4096 5739 ns 5739 ns 122414 > > BM_Vector_PETSc_MAXPY/32768 46346 ns 46346 ns 15312 > > BM_Vector_PETSc_MAXPY_BigO 1.41 N 1.41 N > > BM_Vector_PETSc_MAXPY_RMS 0 % 0 % > > > > And 3x speedup on 2 MPI ranks (not much communication here, anyway). I > am now convinced that this warrants some further investigation/testing. > > > > > > On Tue, 4 Apr 2017 at 01:08 Jed Brown wrote: > > Matthew Knepley writes: > > > > >> BLAS. (Here a interesting point opens: I assume an efficient BLAS > > >> > > >> implementation, but I am not so sure about how the different BLAS do > > >> things > > >> > > >> internally. I work from the assumption that we have a very well tuned > BLAS > > >> > > >> implementation at our disposal). > > >> > > > > > > The speed improvement comes from pulling vectors through memory fewer > > > times by merging operations (kernel fusion). > > > > Typical examples are VecMAXPY and VecMDot, but note that these are not > > xGEMV because the vectors are independent arrays rather than single > > arrays with a constant leading dimension. > > > > >> call VecGetArray. However I will inevitably foget to return the array > to > > >> > > >> PETSc. 
I could have my new VecArray returning an object that restores > the > > >> > > >> array > > >> > > >> when it goes out of scope. I can also flag the function with > [[nodiscard]] > > >> to > > >> > > >> prevent the user to destroy the returned object from the start. > > >> > > > > > > Jed claims that this pattern is no longer preferred, but I have > forgotten > > > his argument. > > > Jed? > > > > Destruction order matters and needs to be collective. If an error > > condition causes destruction to occur in a different order on different > > processes, you can get deadlock. I would much rather have an error > > leave some resources (for the OS to collect) than escalate into > > deadlock. > > > > > We have had this discussion for years on this list. Having separate > names > > > for each type > > > is really ugly and does not achieve what we want. We want smooth > > > interoperability between > > > objects with different backing types, but it is still not clear how to > do > > > this. > > > > Hide it internally and implicitly promote. Only the *GetArray functions > > need to be parametrized on numeric type. But it's a lot of work on the > > backend. > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Apr 4 21:03:36 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 4 Apr 2017 21:03:36 -0500 Subject: [petsc-users] -snes_mf_operator yields "No support for this operation for this object type" in TS codes? 
In-Reply-To: <874ly5pow3.fsf@jedbrown.org> References: <877f32qlf1.fsf@jedbrown.org> <61C35066-A114-4C0B-A026-94677F4B3C7B@mcs.anl.gov> <87h925pscc.fsf@jedbrown.org> <7CDCCF9F-F6B0-44E5-AB91-B20A977F3D23@mcs.anl.gov> <874ly5pow3.fsf@jedbrown.org> Message-ID: <0CC80ACE-93F9-4E69-9D25-983C8CB126BB@mcs.anl.gov> > On Apr 3, 2017, at 10:05 AM, Jed Brown wrote: > > Barry Smith writes: > >> >> SNESGetUsingInternalMatMFFD(snes,&flg); Then you can get rid of the horrible >> >> PetscBool flg; >> ierr = PetscObjectTypeCompare((PetscObject)A,MATMFFD,&flg);CHKERRQ(ierr); >> >> I had to add in two places. Still ugly but I think less buggy. > > Yeah, there are also MATMFFD checks in SNESComputeJacobian. These are different in that, I think, the same code needs to be used regardless of whether the user just used -snes_mf[*] or the user provided a matrix-free matrix directly to the TS or SNES. So I don't think these can be changed to call SNESGetUsingInternalMatMFFD(). > > >> >> >> >> >>> >>> What if SNESComputeJacobian was aware of -snes_mf_operator and just >>> passed Pmat in both slots? Or does the user sometimes need access to >>> the MatMFFD created by -snes_mf_operator? (Seems like possibly, e.g., >>> to adjust differencing parameters.) From bsmith at mcs.anl.gov Tue Apr 4 21:19:43 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 4 Apr 2017 21:19:43 -0500 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: Message-ID: <6E184775-E51E-48CE-BA78-308AC60BD116@mcs.anl.gov> > On Apr 2, 2017, at 2:15 PM, Filippo Leonardi wrote: > > > Hello, > > I have a project in mind and seek feedback. > > Disclaimer: I hope I am not abusing this mailing list with this idea. If so, please ignore. > > As a thought experiment, and to have a bit of fun, I am currently writing, or thinking about writing, a small (modern) C++ wrapper around PETSc. > > Premise: PETSc is awesome, I love it and use it in many projects. Sometimes I am just not super comfortable writing C.
(I know my idea goes against PETSc's design philosophy). > > I know there are many around, and there is not really a need for this (especially since PETSc has its own object-oriented style), but there are a few things I would like to really include in this wrapper that I have found nowhere else: > - I am currently only thinking about the Vector/Matrix/KSP/DM part of the framework; there are many other cool things that PETSc does, but I do not have the brainpower to consider those as well. > - expression templates (in my opinion this is where C++ shines): this would replace all code bloat that a user might need with cool/easy to read expressions (this could increase the number of axpy-like routines); > - those expression templates should use SSE and AVX whenever available; > - expressions like x += alpha * y should fall back to BLAS axpy (though sometimes this is not even faster than a simple loop); People have been playing with this type of thing for well over 20 years, for example, Rebecca Parsons and Dan Quinlan. A++/P++ array classes for architecture independent finite difference computations. In Proceedings of the Second Annual Object-Oriented Numerics Conference (OONSKI '94), April 1994. and it seems never to have gotten to the level of maturity needed for common usage (i.e. no one that I know of uses it seriously). Has something changed in 1) the templating abilities of C++ (newer, better standards?) 2) people's (your?) abilities to utilize the templating abilities that have always been there? 3) something else? that would make this project a meaningful thing to do now? Frankly, not worrying about technical details, couldn't someone have done what you are suggesting 20 years ago?
I actually considered trying to utilize these techniques when starting PETSc 2.0 but concluded that the benefits are minimal (slightly more readable/writable code) while the costs are an endless rat-hole of complexity and unmaintainable infrastructure that only the single super-clever author can understand and update. Barry > - all calls to PETSc should be less verbose, more C++-like: > * for instance a VecGlobalToLocalBegin could return an empty object that calls VecGlobalToLocalEnd when it is destroyed. > * some cool idea to easily write GPU kernels. > - the idea would be to have safer routines (at compile time), by means of RAII etc. > > I aim for zero/near-zero/negligible overhead with full optimization; to that end I include benchmarks and extensive unit tests. > > So my question is: > - anyone that would be interested (in the product/in developing)? > - anyone that has suggestions (maybe what I have in mind is nonsense)? > > If you have read up to here, thanks. From jed at jedbrown.org Tue Apr 4 22:02:54 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 04 Apr 2017 21:02:54 -0600 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: <87o9wdm6mc.fsf@jedbrown.org> <026B5877-0005-4759-B205-86C24C69628C@mcs.anl.gov> Message-ID: <87zifvk141.fsf@jedbrown.org> Matthew Knepley writes: > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi > wrote: > >> I had weird issues where gcc (that I am using for my tests right now) >> wasn't vectorising properly (even enabling all flags, from tree-vectorize, >> to mavx). According to my tests, I know the Intel compiler was a bit better >> at that. >> > > We are definitely at the mercy of the compiler for this. Maybe Jed has an > idea why it's not vectorizing. Is this so bad?
000000000024080e mov rax,QWORD PTR [rbp-0xb0] 0000000000240815 add ebx,0x1 0000000000240818 vmulpd ymm0,ymm7,YMMWORD PTR [rax+r9*1] 000000000024081e mov rax,QWORD PTR [rbp-0xa8] 0000000000240825 vfmadd231pd ymm0,ymm8,YMMWORD PTR [rax+r9*1] 000000000024082b mov rax,QWORD PTR [rbp-0xb8] 0000000000240832 vfmadd231pd ymm0,ymm6,YMMWORD PTR [rax+r9*1] 0000000000240838 vfmadd231pd ymm0,ymm5,YMMWORD PTR [r10+r9*1] 000000000024083e vaddpd ymm0,ymm0,YMMWORD PTR [r11+r9*1] 0000000000240844 vmovapd YMMWORD PTR [r11+r9*1],ymm0 000000000024084a add r9,0x20 000000000024084e cmp DWORD PTR [rbp-0xa0],ebx 0000000000240854 ja 000000000024080e -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From knepley at gmail.com Tue Apr 4 22:34:21 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 4 Apr 2017 22:34:21 -0500 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: <87zifvk141.fsf@jedbrown.org> References: <87o9wdm6mc.fsf@jedbrown.org> <026B5877-0005-4759-B205-86C24C69628C@mcs.anl.gov> <87zifvk141.fsf@jedbrown.org> Message-ID: On Tue, Apr 4, 2017 at 10:02 PM, Jed Brown wrote: > Matthew Knepley writes: > > > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi > > > wrote: > > > >> I had weird issues where gcc (that I am using for my tests right now) > >> wasn't vectorising properly (even enabling all flags, from > tree-vectorize, > >> to mavx). According to my tests, I know the Intel compiler was a bit > better > >> at that. > >> > > > > We are definitely at the mercy of the compiler for this. Maybe Jed has an > > idea why its not vectorizing. > > Is this so bad? 
> > 000000000024080e mov rax,QWORD PTR [rbp-0xb0] > 0000000000240815 add ebx,0x1 > 0000000000240818 vmulpd ymm0,ymm7,YMMWORD PTR > [rax+r9*1] > 000000000024081e mov rax,QWORD PTR [rbp-0xa8] > 0000000000240825 vfmadd231pd ymm0,ymm8,YMMWORD PTR > [rax+r9*1] > 000000000024082b mov rax,QWORD PTR [rbp-0xb8] > 0000000000240832 vfmadd231pd ymm0,ymm6,YMMWORD PTR > [rax+r9*1] > 0000000000240838 vfmadd231pd ymm0,ymm5,YMMWORD PTR > [r10+r9*1] > 000000000024083e vaddpd ymm0,ymm0,YMMWORD PTR > [r11+r9*1] > 0000000000240844 vmovapd YMMWORD PTR [r11+r9*1],ymm0 > 000000000024084a add r9,0x20 > 000000000024084e cmp DWORD PTR [rbp-0xa0],ebx > 0000000000240854 ja 000000000024080e > > I agree that is what we should see. It cannot be what Fillippo has if he is getting ~4x with the template stuff. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Tue Apr 4 22:39:31 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 04 Apr 2017 21:39:31 -0600 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: <87o9wdm6mc.fsf@jedbrown.org> <026B5877-0005-4759-B205-86C24C69628C@mcs.anl.gov> <87zifvk141.fsf@jedbrown.org> Message-ID: <87wpazjzf0.fsf@jedbrown.org> Matthew Knepley writes: > On Tue, Apr 4, 2017 at 10:02 PM, Jed Brown wrote: > >> Matthew Knepley writes: >> >> > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi > > >> > wrote: >> > >> >> I had weird issues where gcc (that I am using for my tests right now) >> >> wasn't vectorising properly (even enabling all flags, from >> tree-vectorize, >> >> to mavx). According to my tests, I know the Intel compiler was a bit >> better >> >> at that. >> >> >> > >> > We are definitely at the mercy of the compiler for this. Maybe Jed has an >> > idea why its not vectorizing. >> >> Is this so bad? 
>> >> 000000000024080e mov rax,QWORD PTR [rbp-0xb0] >> 0000000000240815 add ebx,0x1 >> 0000000000240818 vmulpd ymm0,ymm7,YMMWORD PTR >> [rax+r9*1] >> 000000000024081e mov rax,QWORD PTR [rbp-0xa8] >> 0000000000240825 vfmadd231pd ymm0,ymm8,YMMWORD PTR >> [rax+r9*1] >> 000000000024082b mov rax,QWORD PTR [rbp-0xb8] >> 0000000000240832 vfmadd231pd ymm0,ymm6,YMMWORD PTR >> [rax+r9*1] >> 0000000000240838 vfmadd231pd ymm0,ymm5,YMMWORD PTR >> [r10+r9*1] >> 000000000024083e vaddpd ymm0,ymm0,YMMWORD PTR >> [r11+r9*1] >> 0000000000240844 vmovapd YMMWORD PTR [r11+r9*1],ymm0 >> 000000000024084a add r9,0x20 >> 000000000024084e cmp DWORD PTR [rbp-0xa0],ebx >> 0000000000240854 ja 000000000024080e >> >> > > I agree that is what we should see. It cannot be what Fillippo has if he is > getting ~4x with the template stuff. I'm using gcc. Fillippo, can you make an easy to run test that we can evaluate on Xeon and KNL? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jychang48 at gmail.com Tue Apr 4 22:45:52 2017 From: jychang48 at gmail.com (Justin Chang) Date: Tue, 4 Apr 2017 22:45:52 -0500 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: <87h923lvkh.fsf@jedbrown.org> References: <87h923lvkh.fsf@jedbrown.org> Message-ID: So I tried the following options: -M 40 -N 40 -P 5 -da_refine 1/2/3/4 -log_view -mg_coarse_pc_type gamg -mg_levels_0_pc_type gamg -mg_levels_1_sub_pc_type cholesky -pc_type mg -thi_mat_type baij Performance improved dramatically. However, Haswell still beats out KNL but only by a little. Now it seems like MatSOR is taking some time (though I can't really judge whether it's significant or not). Attached are the log files. If ex48 has SSE2 intrinsics, does that mean Haswell would almost always be better? 
On Tue, Apr 4, 2017 at 4:19 PM, Jed Brown wrote: > Justin Chang writes: > > > Attached are the job output files (which include -log_view) for SNES ex48 > > run on a single haswell and knl node (32 and 64 cores respectively). > > Started off with a coarse grid of size 40x40x5 and ran three different > > tests with -da_refine 1/2/3 and -pc_type mg > > > > What's interesting/strange is that if i try to do -da_refine 4 on KNL, i > > get a slurm error that says: "slurmstepd: error: Step 4408401.0 exceeded > > memory limit (96737652 > 94371840), being killed" but it runs perfectly > > fine on Haswell. Adding -pc_mg_levels 7 enables KNL to run on -da_refine > 4 > > but the performance still does not beat out haswell. > > > > The performance spectrum (dofs/sec) for 1-3 levels of refinement looks > like > > this: > > > > Haswell: > > 2.416e+03 > > 1.490e+04 > > 5.188e+04 > > > > KNL: > > 9.308e+02 > > 7.257e+03 > > 3.838e+04 > > > > Which might suggest to me that KNL performs better with larger problem > > sizes. > > Look at the events. The (redundant) coarse LU factorization takes most > of the run time on KNL. The PETSc sparse LU is not vectorized and > doesn't exploit dense blocks in the way that the optimized direct > solvers do. You'll note that the paper was more aggressive about > minimizing the coarse grid size and used BoomerAMG instead of redundant > direct solves to avoid this scaling problem. > > > On Tue, Apr 4, 2017 at 11:05 AM, Matthew Knepley > wrote: > > > >> On Tue, Apr 4, 2017 at 10:57 AM, Justin Chang > wrote: > >> > >>> Thanks everyone for the helpful advice. So I tried all the suggestions > >>> including using libsci. The performance did not improve for my > particular > >>> runs, which I think suggests the problem parameters chosen for my tests > >>> (SNES ex48) are not optimal for KNL. Does anyone have example test > runs I > >>> could reproduce that compare the performance between KNL and > >>> Haswell/Ivybridge/etc? 
> >>> > >> > >> Lets try to see what is going on with your existing data first. > >> > >> First, I think that main thing is to make sure we are using MCDRAM. > >> Everything else in KNL > >> is window dressing (IMHO). All we have to look at is something like > MAXPY. > >> You can get the > >> bandwidth estimate from the flop rate and problem size (I think), and we > >> can at least get > >> bandwidth ratios between Haswell and KNL with that number. > >> > >> Matt > >> > >> > >>> On Mon, Apr 3, 2017 at 3:06 PM Richard Mills > >>> wrote: > >>> > >>>> Yes, one should rely on MKL (or Cray LibSci, if using the Cray > >>>> toolchain) on Cori. But I'm guessing that this will make no > noticeable > >>>> difference for what Justin is doing. > >>>> > >>>> --Richard > >>>> > >>>> On Mon, Apr 3, 2017 at 12:57 PM, murat ke?eli > wrote: > >>>> > >>>> How about replacing --download-fblaslapack with vendor specific > >>>> BLAS/LAPACK? > >>>> > >>>> Murat > >>>> > >>>> On Mon, Apr 3, 2017 at 2:45 PM, Richard Mills < > richardtmills at gmail.com> > >>>> wrote: > >>>> > >>>> On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong > wrote: > >>>> > >>>> > >>>> On Apr 3, 2017, at 1:44 PM, Justin Chang wrote: > >>>> > >>>> Richard, > >>>> > >>>> This is what my job script looks like: > >>>> > >>>> #!/bin/bash > >>>> #SBATCH -N 16 > >>>> #SBATCH -C knl,quad,flat > >>>> #SBATCH -p regular > >>>> #SBATCH -J knlflat1024 > >>>> #SBATCH -L SCRATCH > >>>> #SBATCH -o knlflat1024.o%j > >>>> #SBATCH --mail-type=ALL > >>>> #SBATCH --mail-user=jychang48 at gmail.com > >>>> #SBATCH -t 00:20:00 > >>>> > >>>> #run the application: > >>>> cd $SCRATCH/Icesheet > >>>> sbcast --compress=lz4 ./ex48cori /tmp/ex48cori > >>>> srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 > -N > >>>> 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg > -da_refine > >>>> 1 > >>>> > >>>> > >>>> Maybe it is a typo. It should be numactl -m 1. > >>>> > >>>> > >>>> "-p 1" will also work. 
"-p" means to "prefer" NUMA node 1 (the > MCDRAM), > >>>> whereas "-m" means to use only NUMA node 1. In the former case, > MCDRAM > >>>> will be used for allocations until the available memory there has been > >>>> exhausted, and then things will spill over into the DRAM. One would > think > >>>> that "-m" would be better for doing performance studies, but on > systems > >>>> where the nodes have swap space enabled, you can get terrible > performance > >>>> if your code's working set exceeds the size of the MCDRAM, as the > system > >>>> will obediently obey your wishes to not use the DRAM and go straight > to the > >>>> swap disk! I assume the Cori nodes don't have swap space, though I > could > >>>> be wrong. > >>>> > >>>> > >>>> According to the NERSC info pages, they say to add the "numactl" if > >>>> using flat mode. Previously I tried cache mode but the performance > seems to > >>>> be unaffected. > >>>> > >>>> > >>>> Using cache mode should give similar performance as using flat mode > with > >>>> the numactl option. But both approaches should be significant faster > than > >>>> using flat mode without the numactl option. I usually see over 3X > speedup. > >>>> You can also do such comparison to see if the high-bandwidth memory is > >>>> working properly. > >>>> > >>>> I also comparerd 256 haswell nodes vs 256 KNL nodes and haswell is > >>>> nearly 4-5x faster. Though I suspect this drastic change has much to > do > >>>> with the initial coarse grid size now being extremely small. > >>>> > >>>> I think you may be right about why you see such a big difference. The > >>>> KNL nodes need enough work to be able to use the SIMD lanes > effectively. > >>>> Also, if your problem gets small enough, then it's going to be able > to fit > >>>> in the Haswell's L3 cache. Although KNL has MCDRAM and this delivers > *a > >>>> lot* more memory bandwidth than the DDR4 memory, it will deliver a > lot less > >>>> bandwidth than the Haswell's L3. 
> >>>> > >>>> I'll give the COPTFLAGS a try and see what happens > >>>> > >>>> > >>>> Make sure to use --with-memalign=64 for data alignment when > configuring > >>>> PETSc. > >>>> > >>>> > >>>> Ah, yes, I forgot that. Thanks for mentioning it, Hong! > >>>> > >>>> > >>>> The option -xMIC-AVX512 would improve the vectorization performance. > But > >>>> it may cause problems for the MPIBAIJ format for some unknown reason. > >>>> MPIAIJ should work fine with this option. > >>>> > >>>> > >>>> Hmm. Try both, and, if you see worse performance with MPIBAIJ, let us > >>>> know and I'll try to figure this out. > >>>> > >>>> --Richard > >>>> > >>>> > >>>> > >>>> Hong (Mr.) > >>>> > >>>> Thanks, > >>>> Justin > >>>> > >>>> On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills < > richardtmills at gmail.com> > >>>> wrote: > >>>> > >>>> Hi Justin, > >>>> > >>>> How is the MCDRAM (on-package "high-bandwidth memory") configured for > >>>> your KNL runs? And if it is in "flat" mode, what are you doing to > ensure > >>>> that you use the MCDRAM? Doing this wrong seems to be one of the most > >>>> common reasons for unexpected poor performance on KNL. > >>>> > >>>> I'm not that familiar with the environment on Cori, but I think that > if > >>>> you are building for KNL, you should add "-xMIC-AVX512" to your > compiler > >>>> flags to explicitly instruct the compiler to use the AVX512 > instruction > >>>> set. I usually use something along the lines of > >>>> > >>>> 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' > >>>> > >>>> (The "-g" just adds symbols, which make the output from performance > >>>> profiling tools much more useful.) > >>>> > >>>> That said, I think that if you are comparing 1024 Haswell cores vs. > 1024 > >>>> KNL cores (so double the number of Haswell nodes), I'm not surprised > that > >>>> the simulations are almost twice as fast using the Haswell nodes. > Keep in > >>>> mind that individual KNL cores are much less powerful than an > individual > >>>> Haswell node. 
You are also using roughly twice the power footprint > (dual > >>>> socket Haswell node should be roughly equivalent to a KNL node, I > >>>> believe). How do things look when you compare equal nodes? > >>>> > >>>> Cheers, > >>>> Richard > >>>> > >>>> On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang > >>>> wrote: > >>>> > >>>> Hi all, > >>>> > >>>> On NERSC's Cori I have the following configure options for PETSc: > >>>> > >>>> ./configure --download-fblaslapack --with-cc=cc > --with-clib-autodetect=0 > >>>> --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 > --with-fc=ftn > >>>> --with-fortranlib-autodetect=0 --with-mpiexec=srun > --with-64-bit-indices=1 > >>>> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt > >>>> > >>>> Where I swapped out the default Intel programming environment with > that > >>>> of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). > I > >>>> want to document the performance difference between Cori's Haswell > and KNL > >>>> processors. > >>>> > >>>> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell > and > >>>> 16 KNL nodes), the simulations are almost twice as fast on Haswell > nodes. > >>>> Which leads me to suspect that I am not doing something right for > KNL. Does > >>>> anyone know what are some "optimal" configure options for running > PETSc on > >>>> KNL? > >>>> > >>>> Thanks, > >>>> Justin > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >> > >> > >> -- > >> What most experimenters take for granted before they begin their > >> experiments is infinitely more interesting than any results to which > their > >> experiments lead. > >> -- Norbert Wiener > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: testhas_sor_1node.o4410779 Type: application/octet-stream Size: 69141 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: testknl_sor_1node.o4410753 Type: application/octet-stream Size: 68989 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Apr 4 23:03:29 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 4 Apr 2017 23:03:29 -0500 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: <87h923lvkh.fsf@jedbrown.org> Message-ID: <8084C1D4-27BB-4C08-9735-82C15554B61B@mcs.anl.gov> These results seem reasonable to me. What makes you think that KNL should be doing better than it does in comparison to Haswell? The entire reason for the existence of KNL is that it is a way for Intel to be able to "compete" with Nvidia GPUs for numerics and data processing, for example in the financial industry. By "compete" I mean convince gullible purchasing agents for large companies to purchase Intel KNL systems instead of Nvidia GPU systems. There is nothing in the hardware specifications of KNL that would indicate that it should work better on this type of problem than Haswell, in fact the specifications indicate that the Haskell should perform better, as it does. No surprises. The surprise is that allegedly smart HPC members fall for Intel's scam because they don't do the most basic homework, or because they are so obsessed with "power usage". > On Apr 4, 2017, at 10:45 PM, Justin Chang wrote: > > So I tried the following options: > > -M 40 > -N 40 > -P 5 > -da_refine 1/2/3/4 > -log_view > -mg_coarse_pc_type gamg > -mg_levels_0_pc_type gamg > -mg_levels_1_sub_pc_type cholesky > -pc_type mg > -thi_mat_type baij > > Performance improved dramatically. However, Haswell still beats out KNL but only by a little. Now it seems like MatSOR is taking some time (though I can't really judge whether it's significant or not). Attached are the log files. 
> > If ex48 has SSE2 intrinsics, does that mean Haswell would almost always be better?
From jed at jedbrown.org Tue Apr 4 23:05:18 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 04 Apr 2017 22:05:18 -0600 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: <87h923lvkh.fsf@jedbrown.org> Message-ID: <87r317jy81.fsf@jedbrown.org> Justin Chang writes: > So I tried the following options: > > -M 40 > -N 40 > -P 5 > -da_refine 1/2/3/4 > -log_view > -mg_coarse_pc_type gamg > -mg_levels_0_pc_type gamg > -mg_levels_1_sub_pc_type cholesky > -pc_type mg > -thi_mat_type baij > > Performance improved dramatically. However, Haswell still beats out KNL but > only by a little. Now it seems like MatSOR is taking some time (though I > can't really judge whether it's significant or not). Attached are the log > files. MatSOR is gonna suck on KNL without some dirty work. > If ex48 has SSE2 intrinsics, does that mean Haswell would almost always be > better? The SSE2 intrinsics are only in Jacobian assembly and could be upgraded to AVX/512. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jed at jedbrown.org Tue Apr 4 23:10:35 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 04 Apr 2017 22:10:35 -0600 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: <8084C1D4-27BB-4C08-9735-82C15554B61B@mcs.anl.gov> References: <87h923lvkh.fsf@jedbrown.org> <8084C1D4-27BB-4C08-9735-82C15554B61B@mcs.anl.gov> Message-ID: <87o9wbjxz8.fsf@jedbrown.org> Barry Smith writes: > These results seem reasonable to me. > > What makes you think that KNL should be doing better than it does in comparison to Haswell? > > The entire reason for the existence of KNL is that it is a way for > Intel to be able to "compete" with Nvidia GPUs for numerics and > data processing, for example in the financial industry.
By > "compete" I mean convince gullible purchasing agents for large > companies to purchase Intel KNL systems instead of Nvidia GPU > systems. There is nothing in the hardware specifications of KNL > that would indicate that it should work better on this type of > problem than Haswell, in fact the specifications indicate that the > Haskell should perform better Boom! Time to rewrite PETSc in Haskell! -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Apr 4 23:13:20 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 4 Apr 2017 23:13:20 -0500 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: <87o9wbjxz8.fsf@jedbrown.org> References: <87h923lvkh.fsf@jedbrown.org> <8084C1D4-27BB-4C08-9735-82C15554B61B@mcs.anl.gov> <87o9wbjxz8.fsf@jedbrown.org> Message-ID: > On Apr 4, 2017, at 11:10 PM, Jed Brown wrote: > > Barry Smith writes: > >> These results seem reasonable to me. >> >> What makes you think that KNL should be doing better than it does in comparison to Haswell? >> >> The entire reason for the existence of KNL is that it is a way for >> Intel to be able to "compete" with Nvidia GPUs for numerics and >> data processing, for example in the financial industry. By >> "compete" I mean convince gullible purchasing agents for large >> companies to purchase Intel KNL systems instead of Nvidia GPU >> systems. There is nothing in the hardware specifications of KNL >> that would indicate that it should work better on this type of >> problem than Haswell, in fact the specifications indicate that the >> Haskell should perform better > > Boom! Time to rewrite PETSc in Haskell! Damn spell checker! 
From richardtmills at gmail.com Wed Apr 5 00:12:09 2017 From: richardtmills at gmail.com (Richard Mills) Date: Tue, 4 Apr 2017 22:12:09 -0700 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: <87o9wbjxz8.fsf@jedbrown.org> References: <87h923lvkh.fsf@jedbrown.org> <8084C1D4-27BB-4C08-9735-82C15554B61B@mcs.anl.gov> <87o9wbjxz8.fsf@jedbrown.org> Message-ID: On Tue, Apr 4, 2017 at 9:10 PM, Jed Brown wrote: > Barry Smith writes: > > > These results seem reasonable to me. > > > > What makes you think that KNL should be doing better than it does in > comparison to Haswell? > > > > The entire reason for the existence of KNL is that it is a way for > > Intel to be able to "compete" with Nvidia GPUs for numerics and > > data processing, for example in the financial industry. By > > "compete" I mean convince gullible purchasing agents for large > > companies to purchase Intel KNL systems instead of Nvidia GPU > > systems. There is nothing in the hardware specifications of KNL > > that would indicate that it should work better on this type of > > problem than Haswell, in fact the specifications indicate that the > > Haskell should perform better > > Boom! Time to rewrite PETSc in Haskell! > Yeah, forget this debate about using C++! -------------- next part -------------- An HTML attachment was scrubbed... URL: From filippo.leon at gmail.com Wed Apr 5 00:42:42 2017 From: filippo.leon at gmail.com (Filippo Leonardi) Date: Wed, 05 Apr 2017 05:42:42 +0000 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: <87wpazjzf0.fsf@jedbrown.org> References: <87o9wdm6mc.fsf@jedbrown.org> <026B5877-0005-4759-B205-86C24C69628C@mcs.anl.gov> <87zifvk141.fsf@jedbrown.org> <87wpazjzf0.fsf@jedbrown.org> Message-ID: @jed: Your assembly is what I would've expected. Let me simplify my code and see if I can provide a useful test example. (also: I assume your assembly is for xeon, so I should definitely use avx512).
Let me get back to you in a few days (work permitting) with something you can use. From your example I wouldn't expect any benefit with my code compared to just calling PETSc (for those simple kernels). A big plus I hadn't thought of, would be that the compiler is really forced to vectorise (like in my case, where I might have messed up some config parameter). @barry: I'm definitely too young to comment here (i.e. it's me that changed, not the world). Definitely this is not new stuff, and, for instance, Armadillo/boost/Eigen have been successfully production ready for many years now. I have somehow the impression that now that c++11 is more mainstream, it is much easier to write easily readable/maintainable code (still ugly as hell, though). I think we can now take for granted a c++11 compiler on any "supercomputer", and even c++14 and soon c++17... and this makes development and interfaces much nicer. What I would like to see is something like PETSc (where I have nice, hidden MPI calls for instance), combined with the niceness of those libraries (where many operations can be written in a, if I might say so, more natural way). (My plan is: you did all the hard work, C++ can put a ribbon on it and see what comes out.) On 5 Apr 2017 5:39 am, "Jed Brown" wrote: Matthew Knepley writes: > On Tue, Apr 4, 2017 at 10:02 PM, Jed Brown wrote: > >> Matthew Knepley writes: >> >> > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi < filippo.leon at gmail.com >> > >> > wrote: >> > >> >> I had weird issues where gcc (that I am using for my tests right now) >> >> wasn't vectorising properly (even enabling all flags, from >> tree-vectorize, >> >> to mavx). According to my tests, I know the Intel compiler was a bit >> better >> >> at that. >> >> >> > >> > We are definitely at the mercy of the compiler for this. Maybe Jed has an >> > idea why it's not vectorizing. >> >> Is this so bad?
>> >> 000000000024080e mov rax,QWORD PTR [rbp-0xb0] >> 0000000000240815 add ebx,0x1 >> 0000000000240818 vmulpd ymm0,ymm7,YMMWORD PTR >> [rax+r9*1] >> 000000000024081e mov rax,QWORD PTR [rbp-0xa8] >> 0000000000240825 vfmadd231pd ymm0,ymm8,YMMWORD PTR >> [rax+r9*1] >> 000000000024082b mov rax,QWORD PTR [rbp-0xb8] >> 0000000000240832 vfmadd231pd ymm0,ymm6,YMMWORD PTR >> [rax+r9*1] >> 0000000000240838 vfmadd231pd ymm0,ymm5,YMMWORD PTR >> [r10+r9*1] >> 000000000024083e vaddpd ymm0,ymm0,YMMWORD PTR >> [r11+r9*1] >> 0000000000240844 vmovapd YMMWORD PTR [r11+r9*1],ymm0 >> 000000000024084a add r9,0x20 >> 000000000024084e cmp DWORD PTR [rbp-0xa0],ebx >> 0000000000240854 ja 000000000024080e >> >> > > I agree that is what we should see. It cannot be what Filippo has > if he is > > getting ~4x with the template stuff. I'm using gcc. Filippo, can you make an easy to run test that we can evaluate on Xeon and KNL? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From rupp at iue.tuwien.ac.at Wed Apr 5 04:19:52 2017 From: rupp at iue.tuwien.ac.at (Karl Rupp) Date: Wed, 5 Apr 2017 11:19:52 +0200 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: References: <87o9wdm6mc.fsf@jedbrown.org> <026B5877-0005-4759-B205-86C24C69628C@mcs.anl.gov> <87zifvk141.fsf@jedbrown.org> <87wpazjzf0.fsf@jedbrown.org> Message-ID: <0bddea6d-7bf6-9dbc-c101-96cbecbe6686@iue.tuwien.ac.at> Hi Filippo, did you compile PETSc with the same level of optimization as your template code? In particular, did you turn debugging off for the timings? Either way, let me share some of the mental stages I've gone through with ViennaCL. It started out with the same premises as you provide: There is nice C++, compilers are 'now' mature enough, and everything can be made a lot 'easier' and 'simpler' for the user. This was in 2010, back when all this GPU computing took off.
Now, fast-forward to this day, I consider the extensive use of C++ with expression templates and other fanciness to be the biggest mistake of ViennaCL. Here is an incomplete list of why:

a) The 'generic' interface mimics that of Boost.uBLAS, so you can just mix&match the two libraries. Problems: First, one replicates the really poor design choices in uBLAS. Second, most users didn't even *think* about using these generic routines for anything other than ViennaCL types. There are just many more computational scientists familiar with Fortran or C than with "modern C++".

b) Keeping the type system consistent. So you would probably introduce a matrix, a vector, and operator overloads. For example, take y = A * x for a matrix A and vectors y, x. Sooner or later, users will write y = R * A * P * x and all of a sudden your left-to-right associativity results in an evaluation as y = ((R * A) * P) * x. (How to 'solve' this particular case: Introduce even more expression templates! See Blaze for an example. So you soon have hundreds of lines of code just to fix problems you never had with just calling functions.)

c) Deal with errors: Continuing with the above example, what if R * A runs out of memory? Sure, you can throw exceptions, but the resulting stack trace will contain all the expression template frenzy, plus you will have a hard time to find out whether the operation R * A or the subsequent multiplication with P causes the error.

d) Expression templates are limited to a single line of code. Every now and then, algorithms require repeated operations on different data. For example, you may have vector operations of the form

  x_i <- x_{i-1} + alpha * p_{i-1}
  r_i <- r_{i-1} - alpha * y_i
  p_i <- r_i + beta * p_{i-1}

(with alpha and beta being scalars, everything else being vectors) in a pipelined conjugate gradient formulation.
The "C"-way of doing this is to grab the pointers to the data and compute all three vector operations in a single for-loop, thus saving on repeated data access from main memory. With expression templates, there is no way to make this as efficient, because the lifetime of expression templates is exactly one line of code. e) Lack of a stable ABI. There is a good reason why many vendor libraries come with a C interface, not with a C++ interface. If you try to link C++ object files generated by different C++ compilers (it is enough to have different versions of the same compiler, see Microsoft), it is undefined behavior. In best case, it fails with a linker error. In worst case, it will produce random crashes right before the end of a one-month simulation that you needed for your paper to be submitted in a few days. If I remember correctly, one group got ambitious with C++ for writing an MPI library, running into these kinds of problems with the C++ standard library. f) The entry bar gets way too high. PETSc's current code base allows several dozens of users to contribute to each release. If some problem shows up in a particular function, you just go there, hook up the debugger, and figure out what is going on. Now assume there are C++ expression templates involved. Your stack trace may now spread over several screen pages. Once you navigate to the offending line, you find that a certain overload of a traits class coupled with two functors provides wrong results. The reverse lookup of how exactly the traits class was instantiated, which working set was passed in, and how those functors interact may easily take you many minutes to digest - if you are the guy who has written that piece of code. People new to PETSc are likely to just give up. Note that this is a problem for new library users across the spectrum of C++ libraries; you can find evidence e.g. on the Eigen mailinglist. 
In the end, C++ templates are leaky abstractions and their use will sooner or later hit the user. Long story short: Keep in mind that there is a cost to "modern C++" in exchange for eventual performance benefits. For PETSc applications, the bottleneck is all too often not the performance of an expression that could be 'expression templated', but in areas where the use of C++ wouldn't make a difference: algorithm selection, network performance, or poor process placement, just to name a few. Nonetheless, don't feel set back; you may have better ideas and know better ways of dealing with C++ than we do. A lightweight wrapper on top of PETSc, similar to what petsc4py does for Python, is something that a share of users may find handy. Whether it's worth the extra coding and maintenance effort? I don't know. Best regards, Karli On 04/05/2017 07:42 AM, Filippo Leonardi wrote: > @jed: You assembly is what I would've expected. Let me simplify my code > and see if I can provide a useful test example. (also: I assume your > assembly is for xeon, so I should definitely use avx512). > > Let me get back at you in a few days (work permitting) with something > you can use. > > From your example I wouldn't expect any benefit with my code compared to > just calling petsc (for those simple kernels). > > A big plus I hadn't thought of, would be that the compiler is really > forced to vectorise (like in my case, where I might'have messed up some > config parameter). > > @barry: I'm definitely too young to comment here (i.e. it's me that > changed, not the world). Definitely this is not new stuff, and, for > instance, Armadillo/boost/Eigen have been successfully production ready > for many years now. I have somehow the impression that now that c++11 is > more mainstream, it is much easier to write easily readable/maintainable > code (still ugly as hell tough). I think we can now give for granted a > c++11 compiler on any "supercomputer", and even c++14 and soon c++17... 
> and this makes development and interfaces much nicer. > > What I would like to see is something like PETSc (where I have nice, > hidden MPI calls for instance), combined with the niceness of those > libraries (where many operations can be written in a, if I might say so, > more natural way). (My plan is: you did all the hard work, C++ can put a > ribbon on it and see what comes out.) > > On 5 Apr 2017 5:39 am, "Jed Brown" > wrote: > > Matthew Knepley > writes: > > > On Tue, Apr 4, 2017 at 10:02 PM, Jed Brown > wrote: > > > >> Matthew Knepley > > writes: > >> > >> > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi > > >> > > >> > wrote: > >> > > >> >> I had weird issues where gcc (that I am using for my tests > right now) > >> >> wasn't vectorising properly (even enabling all flags, from > >> tree-vectorize, > >> >> to mavx). According to my tests, I know the Intel compiler was > a bit > >> better > >> >> at that. > >> >> > >> > > >> > We are definitely at the mercy of the compiler for this. Maybe > Jed has an > >> > idea why its not vectorizing. > >> > >> Is this so bad? > >> > >> 000000000024080e mov rax,QWORD PTR [rbp-0xb0] > >> 0000000000240815 add ebx,0x1 > >> 0000000000240818 vmulpd ymm0,ymm7,YMMWORD PTR > >> [rax+r9*1] > >> 000000000024081e mov rax,QWORD PTR [rbp-0xa8] > >> 0000000000240825 vfmadd231pd > ymm0,ymm8,YMMWORD PTR > >> [rax+r9*1] > >> 000000000024082b mov rax,QWORD PTR [rbp-0xb8] > >> 0000000000240832 vfmadd231pd > ymm0,ymm6,YMMWORD PTR > >> [rax+r9*1] > >> 0000000000240838 vfmadd231pd > ymm0,ymm5,YMMWORD PTR > >> [r10+r9*1] > >> 000000000024083e vaddpd ymm0,ymm0,YMMWORD PTR > >> [r11+r9*1] > >> 0000000000240844 vmovapd YMMWORD PTR > [r11+r9*1],ymm0 > >> 000000000024084a add r9,0x20 > >> 000000000024084e cmp DWORD PTR [rbp-0xa0],ebx > >> 0000000000240854 ja 000000000024080e > >> > >> > > > > I agree that is what we should see. It cannot be what Fillippo has > if he is > > getting ~4x with the template stuff. > > I'm using gcc. 
Filippo, can you make an easy-to-run test that we can > evaluate on Xeon and KNL? > From francesco.caimmi at polimi.it Wed Apr 5 04:27:37 2017 From: francesco.caimmi at polimi.it (Francesco Caimmi) Date: Wed, 5 Apr 2017 11:27:37 +0200 Subject: [petsc-users] Understanding DMPlexDistribute overlap Message-ID: <8575748.yB8pcOHeRQ@pc-fcaimmi> Dear all, I was playing with DMPlex objects and I was trying to figure out exactly what the `overlap` parameter in DMPlexDistribute does. From the tutorial "Flexible, Scalable Mesh and Data Management using PETSc DMPlex" (slide 10) and from the work by Knepley et al. "Unstructured Overlapping Mesh Distribution in Parallel" I somehow got the idea that it should control the "depth" of the mesh overlap. That is, given the partition boundary, if overlap is set to 0 only the entities adjacent (in the DMPlex topological sense and with the "style" defined by the AdjacencyUse routines) to entities at the boundary are shared; if overlap is 1, the first and the second neighbors (always in the DMPlex topological sense) are shared; and so on, up to the point where we have a full duplicate of the mesh on each process (i.e. there is no upper bound on `overlap`). Is this correct or am I -totally- misunderstanding the meaning of the parameter? I am asking this because I see some behavior I cannot explain when varying the value of the overlap, but before going into the details I would like to be sure I understand exactly what the overlap parameter is supposed to do. Many thanks, -- Francesco Caimmi Laboratorio di Ingegneria dei Polimeri http://www.chem.polimi.it/polyenglab/ Politecnico di Milano - Dipartimento di Chimica, Materiali e Ingegneria Chimica "Giulio Natta" P.zza Leonardo da Vinci, 32 I-20133 Milano Tel.
+39.02.2399.4711 Fax +39.02.7063.8173 francesco.caimmi at polimi.it Skype: fmglcaimmi (please arrange meetings by e-mail) From michael.lange at imperial.ac.uk Wed Apr 5 04:50:59 2017 From: michael.lange at imperial.ac.uk (Michael Lange) Date: Wed, 5 Apr 2017 10:50:59 +0100 Subject: [petsc-users] Understanding DMPlexDistribute overlap In-Reply-To: <8575748.yB8pcOHeRQ@pc-fcaimmi> References: <8575748.yB8pcOHeRQ@pc-fcaimmi> Message-ID: <47d05e4c-7ec3-7b51-8893-d34ba9df0ceb@imperial.ac.uk> Hi Francesco, Your description is almost correct: the overlap defines the topological depth of shared entities as counted in "neighboring cells", where a cell counts as a neighbor of an owned cell according to the defined adjacency style. So for overlap=0 only facets, edges and vertices may be shared along the partition boundary, whereas for overlap=1 you can expect one additional "layer" of cells around each partition (the partitioning is done based on cell connectivity). For second neighbors, however, you need overlap=2. And yes, there is conceptually no upper bound on the overlap. Hope this helps, Michael On 05/04/17 10:27, Francesco Caimmi wrote: > Dear all, > > I was playing with DMPlex objects and I was trying to figure out exactly what > the `overlap` parameter in DMPlexDistribute does. > > From the tutorial "Flexible, Scalable Mesh and Data Management > using PETSc DMPlex" (slide 10) and from the work by Knepley et al. > "Unstructured Overlapping Mesh Distribution in Parallel" I somehow got the > idea that it should control the "depth" of the mesh overlap. > That is, given the partition boundary, if overlap is set to 0 only the > entities adjacent (in the DMPlex topological sense and with the "style" defined > by the AdjacencyUse routines) to entities at the boundary are shared, if > overlap is 1 the first and the second neighbors (always in the DMPlex > topological sense) are shared and so on, up to the point where we have a full > duplicate of the mesh on each process (i.e.
there is no upper bound on > `overlap`). > > Is this correct or am I -totally- misunderstanding the meaning of the > parameter? > > I am asking this because I see some behavior I cannot explain when varying the > value of the overlap, but before going into the details I would like to be > sure I understand exactly what the overlap parameter is supposed to do. > > Many thanks, From francesco.caimmi at polimi.it Wed Apr 5 06:03:34 2017 From: francesco.caimmi at polimi.it (Francesco Caimmi) Date: Wed, 5 Apr 2017 13:03:34 +0200 Subject: [petsc-users] Understanding DMPlexDistribute overlap In-Reply-To: <47d05e4c-7ec3-7b51-8893-d34ba9df0ceb@imperial.ac.uk> References: <8575748.yB8pcOHeRQ@pc-fcaimmi> <47d05e4c-7ec3-7b51-8893-d34ba9df0ceb@imperial.ac.uk> Message-ID: <8336361.4c60jFNEBE@pc-fcaimmi> Hi Michael, thanks for the prompt reply! While I am happy I mostly got it right, this means I have some kind of problem I cannot solve on my own. :( I have this very simple 2D mesh I am experimenting with: a rectangle with 64 vertices and 45 cells (attached in exodus format as cantilever.e); I am using this very simple petsc4py program to read it, define a section and output a vector. The overlap value can be controlled by the -o command-line switch. The program is executed as: mpiexec -np 2 python overlay-test.py -o -log_view. Everything works smoothly for -o set to 0 or 1, but for values >= 2 the program fails with the error message captured in the attached file error.log. Changing the number of processors does not alter the behavior. Note also that the same holds if I use a mesh generated by DMPlexCreateBoxMesh. I would really appreciate hints on how to solve this issue and I will of course provide any needed additional information. Thank you very much, FC On Wednesday
5 April 2017 at 10:50:59 CEST you wrote: > Hi Francesco, > > Your description is almost correct: the overlap defines the topological > depth of shared entities as counted in "neighboring cells", where a cell > counts as a neighbor of an owned cell according to the defined adjacency > style. So for overlap=0 only facets, edges and vertices may be shared > along the partition boundary, whereas for overlap=1 you can expect one > additional "layer" of cells around each partition (the partitioning is > done based on cell connectivity). For second neighbors, however, you > need overlap=2. And yes, there is conceptually no upper bound on the > overlap. > > Hope this helps, > > Michael > > On 05/04/17 10:27, Francesco Caimmi wrote: > > Dear all, > > > > I was playing with DMPlex objects and I was trying to figure out exactly > > what the `overlap` parameter in DMPlexDistribute does. > > > > From the tutorial "Flexible, Scalable Mesh and Data Management > > > > using PETSc DMPlex" (slide 10) and from the work by Knepley et al. > > "Unstructured Overlapping Mesh Distribution in Parallel" I somehow got the > > idea that it should control the "depth" of the mesh overlap. > > That is, given the partition boundary, if overlap is set to 0 only the > > entities adjacent (in the DMPlex topological sense and with the "style" > > defined by the AdjacencyUse routines) to entities at the boundary are > > shared, if overlap is 1 the first and the second neighbors (always in the > > DMPlex topological sense) are shared and so on, up to the point where we > > have a full duplicate of the mesh on each process (i.e. there is no upper > > bound on `overlap`). > > > > Is this correct or am I -totally- misunderstanding the meaning of the > > parameter? > > > > I am asking this because I see some behavior I cannot explain when varying > > the value of the overlap, but before going into the details I would like > > to be sure I understand exactly what the overlap parameter is supposed > > to do.
> > > > Many thanks, -- Francesco Caimmi Laboratorio di Ingegneria dei Polimeri http://www.chem.polimi.it/polyenglab/ Politecnico di Milano - Dipartimento di Chimica, Materiali e Ingegneria Chimica "Giulio Natta" P.zza Leonardo da Vinci, 32 I-20133 Milano Tel. +39.02.2399.4711 Fax +39.02.7063.8173 francesco.caimmi at polimi.it Skype: fmglcaimmi (please arrange meetings by e-mail) -------------- next part -------------- A non-text attachment was scrubbed... Name: cantilever.e Type: text/x-eiffel Size: 5348 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: overlay-test.py Type: text/x-python Size: 2475 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: error.log Type: text/x-log Size: 6062 bytes Desc: not available URL: From filippo.leon at gmail.com Wed Apr 5 06:39:53 2017 From: filippo.leon at gmail.com (Filippo Leonardi) Date: Wed, 05 Apr 2017 11:39:53 +0000 Subject: [petsc-users] PETSc with modern C++ In-Reply-To: <0bddea6d-7bf6-9dbc-c101-96cbecbe6686@iue.tuwien.ac.at> References: <87o9wdm6mc.fsf@jedbrown.org> <026B5877-0005-4759-B205-86C24C69628C@mcs.anl.gov> <87zifvk141.fsf@jedbrown.org> <87wpazjzf0.fsf@jedbrown.org> <0bddea6d-7bf6-9dbc-c101-96cbecbe6686@iue.tuwien.ac.at> Message-ID: Awesome, I appreciate the feedback from all of you. @Karli: Just to be clear: my original idea is to wrap up and make PETSc C++ bindings in a clean C++-ish way. Then an added value would be expression templates ("easy" to implement once you have a clear infrastructure), with all the benefits and drawbacks, as we discussed. But this would just be an "extra" thing. For my computations: I really have to say that I must've messed up something (gcc not vectorizing some PETSc call), and I really do not expect a 4x improvement from templates.
If I manage, I will do some precise test and provide you a simple code to do so (if you are interested), since I am not sure I will have KNL available for those tests. On Wed, 5 Apr 2017 at 11:19 Karl Rupp wrote: > Hi Filippo, > > did you compile PETSc with the same level of optimization than your > template code? In particular, did you turn debugging off for the timings? > > Either way, let me share some of the mental stages I've gone through > with ViennaCL. It started out with the same premises as you provide: > There is nice C++, compilers are 'now' mature enough, and everything can > be made a lot 'easier' and 'simpler' for the user. This was in 2010, > back when all this GPU computing took off. > > Now, fast-forward to this day, I consider the extensive use of C++ with > expression templates and other fanciness to be the biggest mistake of > ViennaCL. Here is an incomplete list of why: > > a) The 'generic' interface mimics that of Boost.uBLAS, so you can just > mix&match the two libraries. Problems: First, one replicates the really > poor design choices in uBLAS. Second, most users didn't even *think* > about using these generic routines for anything other than ViennaCL > types. There are just many more computational scientists familiar with > Fortran or C than with "modern C++". > > b) Keeping the type system consistent. So you would probably introduce > a matrix, a vector, and operator overloads. For example, take y = A * x > for a matrix A and vectors y, x. Sooner or later, users will write y = R > * A * P * x and all of a sudden your left-to-right associativity results > in an evaluation as > y = ((R * A) * P) * x. > (How to 'solve' this particular case: Introduce even more expression > templates! See Blaze for an example. So you soon have hundreds of lines > of code just to fix problems you never had with just calling functions.) > > c) Deal with errors: Continuing with the above example, what if R * A > runs out of memory? 
Sure, you can throw exceptions, but the resulting > stack trace will contain all the expression template frenzy, plus you > will have a hard time to find out whether the operation R * A or the > subsequent multiplication with P causes the error. > > d) Expression templates are limited to a single line of code. Every > now and then, algorithms require repeated operations on different data. > For example, you may have vector operations of the form > x_i <- x_{i-1} + alpha * p_{i-1} > r_i <- r_{i-1} - alpha * y_i > p_i <- r_i + beta * p_{i-1} > (with alpha and beta being scalars, everything else being vectors) > in a pipelined conjugate gradient formulation. The "C"-way of doing this > is to grab the pointers to the data and compute all three vector > operations in a single for-loop, thus saving on repeated data access > from main memory. With expression templates, there is no way to make > this as efficient, because the lifetime of expression templates is > exactly one line of code. > > e) Lack of a stable ABI. There is a good reason why many vendor > libraries come with a C interface, not with a C++ interface. If you try > to link C++ object files generated by different C++ compilers (it is > enough to have different versions of the same compiler, see Microsoft), > it is undefined behavior. In best case, it fails with a linker error. In > worst case, it will produce random crashes right before the end of a > one-month simulation that you needed for your paper to be submitted in a > few days. If I remember correctly, one group got ambitious with C++ for > writing an MPI library, running into these kinds of problems with the > C++ standard library. > > f) The entry bar gets way too high. PETSc's current code base allows > several dozens of users to contribute to each release. If some problem > shows up in a particular function, you just go there, hook up the > debugger, and figure out what is going on. Now assume there are C++ > expression templates involved. 
Your stack trace may now spread over > several screen pages. Once you navigate to the offending line, you find > that a certain overload of a traits class coupled with two functors > provides wrong results. The reverse lookup of how exactly the traits > class was instantiated, which working set was passed in, and how those > functors interact may easily take you many minutes to digest - if you > are the guy who has written that piece of code. People new to PETSc are > likely to just give up. Note that this is a problem for new library > users across the spectrum of C++ libraries; you can find evidence e.g. > on the Eigen mailinglist. In the end, C++ templates are leaky > abstractions and their use will sooner or later hit the user. > > Long story short: Keep in mind that there is a cost to "modern C++" in > exchange for eventual performance benefits. For PETSc applications, the > bottleneck is all too often not the performance of an expression that > could be 'expression templated', but in areas where the use of C++ > wouldn't make a difference: algorithm selection, network performance, or > poor process placement, just to name a few. > > Nonetheless, don't feel set back; you may have better ideas and know > better ways of dealing with C++ than we do. A lightweight wrapper on top > of PETSc, similar to what petsc4py does for Python, is something that a > share of users may find handy. Whether it's worth the extra coding and > maintenance effort? I don't know. > > Best regards, > Karli > > > On 04/05/2017 07:42 AM, Filippo Leonardi wrote: > > @jed: You assembly is what I would've expected. Let me simplify my code > > and see if I can provide a useful test example. (also: I assume your > > assembly is for xeon, so I should definitely use avx512). > > > > Let me get back at you in a few days (work permitting) with something > > you can use. 
> > > > From your example I wouldn't expect any benefit with my code compared to > > just calling petsc (for those simple kernels). > > > > A big plus I hadn't thought of, would be that the compiler is really > > forced to vectorise (like in my case, where I might'have messed up some > > config parameter). > > > > @barry: I'm definitely too young to comment here (i.e. it's me that > > changed, not the world). Definitely this is not new stuff, and, for > > instance, Armadillo/boost/Eigen have been successfully production ready > > for many years now. I have somehow the impression that now that c++11 is > > more mainstream, it is much easier to write easily readable/maintainable > > code (still ugly as hell tough). I think we can now give for granted a > > c++11 compiler on any "supercomputer", and even c++14 and soon c++17... > > and this makes development and interfaces much nicer. > > > > What I would like to see is something like PETSc (where I have nice, > > hidden MPI calls for instance), combined with the niceness of those > > libraries (where many operations can be written in a, if I might say so, > > more natural way). (My plan is: you did all the hard work, C++ can put a > > ribbon on it and see what comes out.) > > > > On 5 Apr 2017 5:39 am, "Jed Brown" > > wrote: > > > > Matthew Knepley > > writes: > > > > > On Tue, Apr 4, 2017 at 10:02 PM, Jed Brown > > wrote: > > > > > >> Matthew Knepley > > > writes: > > >> > > >> > On Tue, Apr 4, 2017 at 3:40 PM, Filippo Leonardi > > > > >> > > > >> > wrote: > > >> > > > >> >> I had weird issues where gcc (that I am using for my tests > > right now) > > >> >> wasn't vectorising properly (even enabling all flags, from > > >> tree-vectorize, > > >> >> to mavx). According to my tests, I know the Intel compiler was > > a bit > > >> better > > >> >> at that. > > >> >> > > >> > > > >> > We are definitely at the mercy of the compiler for this. Maybe > > Jed has an > > >> > idea why its not vectorizing. 
> > >> > > >> Is this so bad? > > >> > > >> 000000000024080e mov rax,QWORD PTR > [rbp-0xb0] > > >> 0000000000240815 add ebx,0x1 > > >> 0000000000240818 vmulpd ymm0,ymm7,YMMWORD PTR > > >> [rax+r9*1] > > >> 000000000024081e mov rax,QWORD PTR > [rbp-0xa8] > > >> 0000000000240825 vfmadd231pd > > ymm0,ymm8,YMMWORD PTR > > >> [rax+r9*1] > > >> 000000000024082b mov rax,QWORD PTR > [rbp-0xb8] > > >> 0000000000240832 vfmadd231pd > > ymm0,ymm6,YMMWORD PTR > > >> [rax+r9*1] > > >> 0000000000240838 vfmadd231pd > > ymm0,ymm5,YMMWORD PTR > > >> [r10+r9*1] > > >> 000000000024083e vaddpd ymm0,ymm0,YMMWORD PTR > > >> [r11+r9*1] > > >> 0000000000240844 vmovapd YMMWORD PTR > > [r11+r9*1],ymm0 > > >> 000000000024084a add r9,0x20 > > >> 000000000024084e cmp DWORD PTR > [rbp-0xa0],ebx > > >> 0000000000240854 ja 000000000024080e > > >> > > >> > > > > > > I agree that is what we should see. It cannot be what Fillippo has > > if he is > > > getting ~4x with the template stuff. > > > > I'm using gcc. Fillippo, can you make an easy to run test that we > can > > evaluate on Xeon and KNL? > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Apr 5 07:29:41 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Apr 2017 07:29:41 -0500 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: <87h923lvkh.fsf@jedbrown.org> <8084C1D4-27BB-4C08-9735-82C15554B61B@mcs.anl.gov> <87o9wbjxz8.fsf@jedbrown.org> Message-ID: On Wed, Apr 5, 2017 at 12:12 AM, Richard Mills wrote: > On Tue, Apr 4, 2017 at 9:10 PM, Jed Brown wrote: > >> Barry Smith writes: >> >> > These results seem reasonable to me. >> > >> > What makes you think that KNL should be doing better than it does in >> comparison to Haswell? >> > >> > The entire reason for the existence of KNL is that it is a way for >> > Intel to be able to "compete" with Nvidia GPUs for numerics and >> > data processing, for example in the financial industry. 
By >> > "compete" I mean convince gullible purchasing agents for large >> > companies to purchase Intel KNL systems instead of Nvidia GPU >> > systems. There is nothing in the hardware specifications of KNL >> > that would indicate that it should work better on this type of >> > problem than Haswell, in fact the specifications indicate that the >> > Haskell should perform better >> >> Boom! Time to rewrite PETSc in Haskell! >> > > Yeah, forget this debate about using C++! > I think what Jed means is Time to write a Haskell program to Write PETSc. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Apr 5 07:54:19 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 05 Apr 2017 06:54:19 -0600 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: <87h923lvkh.fsf@jedbrown.org> <8084C1D4-27BB-4C08-9735-82C15554B61B@mcs.anl.gov> <87o9wbjxz8.fsf@jedbrown.org> Message-ID: <87bmsbj9qc.fsf@jedbrown.org> Matthew Knepley writes: > On Wed, Apr 5, 2017 at 12:12 AM, Richard Mills > wrote: > >> On Tue, Apr 4, 2017 at 9:10 PM, Jed Brown wrote: >> >>> Barry Smith writes: >>> >>> > These results seem reasonable to me. >>> > >>> > What makes you think that KNL should be doing better than it does in >>> comparison to Haswell? >>> > >>> > The entire reason for the existence of KNL is that it is a way for >>> > Intel to be able to "compete" with Nvidia GPUs for numerics and >>> > data processing, for example in the financial industry. By >>> > "compete" I mean convince gullible purchasing agents for large >>> > companies to purchase Intel KNL systems instead of Nvidia GPU >>> > systems. 
There is nothing in the hardware specifications of KNL >>> > that would indicate that it should work better on this type of >>> > problem than Haswell, in fact the specifications indicate that the >>> > Haskell should perform better >>> >>> Boom! Time to rewrite PETSc in Haskell! >>> >> >> Yeah, forget this debate about using C++! >> > > I think what Jed means is Time to write a Haskell program to Write PETSc. Free your points and your mind will follow. https://wiki.haskell.org/Pointfree -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From knepley at gmail.com Wed Apr 5 07:58:27 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Apr 2017 07:58:27 -0500 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: <87bmsbj9qc.fsf@jedbrown.org> References: <87h923lvkh.fsf@jedbrown.org> <8084C1D4-27BB-4C08-9735-82C15554B61B@mcs.anl.gov> <87o9wbjxz8.fsf@jedbrown.org> <87bmsbj9qc.fsf@jedbrown.org> Message-ID: On Wed, Apr 5, 2017 at 7:54 AM, Jed Brown wrote: > Matthew Knepley writes: > > > On Wed, Apr 5, 2017 at 12:12 AM, Richard Mills > > wrote: > > > >> On Tue, Apr 4, 2017 at 9:10 PM, Jed Brown wrote: > >> > >>> Barry Smith writes: > >>> > >>> > These results seem reasonable to me. > >>> > > >>> > What makes you think that KNL should be doing better than it does > in > >>> comparison to Haswell? > >>> > > >>> > The entire reason for the existence of KNL is that it is a way for > >>> > Intel to be able to "compete" with Nvidia GPUs for numerics and > >>> > data processing, for example in the financial industry. By > >>> > "compete" I mean convince gullible purchasing agents for large > >>> > companies to purchase Intel KNL systems instead of Nvidia GPU > >>> > systems. 
There is nothing in the hardware specifications of KNL > >>> > that would indicate that it should work better on this type of > >>> > problem than Haswell, in fact the specifications indicate that the > >>> > Haskell should perform better > >>> > >>> Boom! Time to rewrite PETSc in Haskell! > >>> > >> > >> Yeah, forget this debate about using C++! > >> > > > > I think what Jed means is Time to write a Haskell program to Write PETSc. > > Free your points and your mind will follow. > > https://wiki.haskell.org/Pointfree > After reading that, its hard to see why people use anything else Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongzhang at anl.gov Wed Apr 5 10:10:48 2017 From: hongzhang at anl.gov (Zhang, Hong) Date: Wed, 5 Apr 2017 15:10:48 +0000 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: <87h923lvkh.fsf@jedbrown.org> Message-ID: <78CC519D-CA8C-4735-AD22-48F63CF60442@anl.gov> On Apr 4, 2017, at 10:45 PM, Justin Chang > wrote: So I tried the following options: -M 40 -N 40 -P 5 -da_refine 1/2/3/4 -log_view -mg_coarse_pc_type gamg -mg_levels_0_pc_type gamg -mg_levels_1_sub_pc_type cholesky -pc_type mg -thi_mat_type baij Performance improved dramatically. However, Haswell still beats out KNL but only by a little. Now it seems like MatSOR is taking some time (though I can't really judge whether it's significant or not). Attached are the log files. MatSOR takes only 3% of the total time. Most of the time is spent on PCSetUp (~30%) and PCApply (~11%). If ex48 has SSE2 intrinsics, does that mean Haswell would almost always be better? The Jacobian evaluation (which has SSE2 intrinsics) on Haswell is about two times as fast as on KNL, but it eats only 3%-4% of the total time. 
According to your logs, the compute-intensive kernels such as MatMult, MatSOR, PCApply run faster (~2X) on Haswell. But since the setup time dominates in this test, Haswell would not show much benefit. If you increase the problem size, it could be expected that the performance gap would also increase. Hong (Mr.) On Tue, Apr 4, 2017 at 4:19 PM, Jed Brown > wrote: Justin Chang > writes: > Attached are the job output files (which include -log_view) for SNES ex48 > run on a single haswell and knl node (32 and 64 cores respectively). > Started off with a coarse grid of size 40x40x5 and ran three different > tests with -da_refine 1/2/3 and -pc_type mg > > What's interesting/strange is that if i try to do -da_refine 4 on KNL, i > get a slurm error that says: "slurmstepd: error: Step 4408401.0 exceeded > memory limit (96737652 > 94371840), being killed" but it runs perfectly > fine on Haswell. Adding -pc_mg_levels 7 enables KNL to run on -da_refine 4 > but the performance still does not beat out haswell. > > The performance spectrum (dofs/sec) for 1-3 levels of refinement looks like > this: > > Haswell: > 2.416e+03 > 1.490e+04 > 5.188e+04 > > KNL: > 9.308e+02 > 7.257e+03 > 3.838e+04 > > Which might suggest to me that KNL performs better with larger problem > sizes. Look at the events. The (redundant) coarse LU factorization takes most of the run time on KNL. The PETSc sparse LU is not vectorized and doesn't exploit dense blocks in the way that the optimized direct solvers do. You'll note that the paper was more aggressive about minimizing the coarse grid size and used BoomerAMG instead of redundant direct solves to avoid this scaling problem. > On Tue, Apr 4, 2017 at 11:05 AM, Matthew Knepley > wrote: > >> On Tue, Apr 4, 2017 at 10:57 AM, Justin Chang > wrote: >> >>> Thanks everyone for the helpful advice. So I tried all the suggestions >>> including using libsci. 
The performance did not improve for my particular >>> runs, which I think suggests the problem parameters chosen for my tests >>> (SNES ex48) are not optimal for KNL. Does anyone have example test runs I >>> could reproduce that compare the performance between KNL and >>> Haswell/Ivybridge/etc? >>> >> >> Let's try to see what is going on with your existing data first. >> >> First, I think the main thing is to make sure we are using MCDRAM. >> Everything else in KNL >> is window dressing (IMHO). All we have to look at is something like MAXPY. >> You can get the >> bandwidth estimate from the flop rate and problem size (I think), and we >> can at least get >> bandwidth ratios between Haswell and KNL with that number. >> >> Matt >> >> >>> On Mon, Apr 3, 2017 at 3:06 PM Richard Mills > >>> wrote: >>>> Yes, one should rely on MKL (or Cray LibSci, if using the Cray >>>> toolchain) on Cori. But I'm guessing that this will make no noticeable >>>> difference for what Justin is doing. >>>> >>>> --Richard >>>> >>>> On Mon, Apr 3, 2017 at 12:57 PM, Murat Keçeli > wrote: >>>> >>>> How about replacing --download-fblaslapack with vendor-specific >>>> BLAS/LAPACK?
>>>> >>>> Murat >>>> >>>> On Mon, Apr 3, 2017 at 2:45 PM, Richard Mills > >>>> wrote: >>>> >>>> On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong > wrote: >>>> >>>> >>>> On Apr 3, 2017, at 1:44 PM, Justin Chang > wrote: >>>> >>>> Richard, >>>> >>>> This is what my job script looks like: >>>> >>>> #!/bin/bash >>>> #SBATCH -N 16 >>>> #SBATCH -C knl,quad,flat >>>> #SBATCH -p regular >>>> #SBATCH -J knlflat1024 >>>> #SBATCH -L SCRATCH >>>> #SBATCH -o knlflat1024.o%j >>>> #SBATCH --mail-type=ALL >>>> #SBATCH --mail-user=jychang48 at gmail.com >>>> #SBATCH -t 00:20:00 >>>> >>>> #run the application: >>>> cd $SCRATCH/Icesheet >>>> sbcast --compress=lz4 ./ex48cori /tmp/ex48cori >>>> srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N >>>> 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine >>>> 1 >>>> >>>> >>>> Maybe it is a typo. It should be numactl -m 1. >>>> >>>> >>>> "-p 1" will also work. "-p" means to "prefer" NUMA node 1 (the MCDRAM), >>>> whereas "-m" means to use only NUMA node 1. In the former case, MCDRAM >>>> will be used for allocations until the available memory there has been >>>> exhausted, and then things will spill over into the DRAM. One would think >>>> that "-m" would be better for doing performance studies, but on systems >>>> where the nodes have swap space enabled, you can get terrible performance >>>> if your code's working set exceeds the size of the MCDRAM, as the system >>>> will obediently obey your wishes to not use the DRAM and go straight to the >>>> swap disk! I assume the Cori nodes don't have swap space, though I could >>>> be wrong. >>>> >>>> >>>> According to the NERSC info pages, they say to add the "numactl" if >>>> using flat mode. Previously I tried cache mode but the performance seems to >>>> be unaffected. >>>> >>>> >>>> Using cache mode should give similar performance as using flat mode with >>>> the numactl option. 
But both approaches should be significantly faster than >>>> using flat mode without the numactl option. I usually see over 3X speedup. >>>> You can also do such a comparison to see if the high-bandwidth memory is >>>> working properly. >>>> >>>> I also compared 256 Haswell nodes vs 256 KNL nodes and Haswell is >>>> nearly 4-5x faster. Though I suspect this drastic change has much to do >>>> with the initial coarse grid size now being extremely small. >>>> >>>> I think you may be right about why you see such a big difference. The >>>> KNL nodes need enough work to be able to use the SIMD lanes effectively. >>>> Also, if your problem gets small enough, then it's going to be able to fit >>>> in the Haswell's L3 cache. Although KNL has MCDRAM and this delivers *a >>>> lot* more memory bandwidth than the DDR4 memory, it will deliver a lot less >>>> bandwidth than the Haswell's L3. >>>> >>>> I'll give the COPTFLAGS a try and see what happens >>>> >>>> >>>> Make sure to use --with-memalign=64 for data alignment when configuring >>>> PETSc. >>>> >>>> >>>> Ah, yes, I forgot that. Thanks for mentioning it, Hong! >>>> >>>> >>>> The option -xMIC-AVX512 would improve the vectorization performance. But >>>> it may cause problems for the MPIBAIJ format for some unknown reason. >>>> MPIAIJ should work fine with this option. >>>> >>>> >>>> Hmm. Try both, and, if you see worse performance with MPIBAIJ, let us >>>> know and I'll try to figure this out. >>>> >>>> --Richard >>>> >>>> >>>> >>>> Hong (Mr.) >>>> >>>> Thanks, >>>> Justin >>>> >>>> On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills > >>>> wrote: >>>> >>>> Hi Justin, >>>> >>>> How is the MCDRAM (on-package "high-bandwidth memory") configured for >>>> your KNL runs? And if it is in "flat" mode, what are you doing to ensure >>>> that you use the MCDRAM? Doing this wrong seems to be one of the most >>>> common reasons for unexpected poor performance on KNL.
>>>> >>>> I'm not that familiar with the environment on Cori, but I think that if >>>> you are building for KNL, you should add "-xMIC-AVX512" to your compiler >>>> flags to explicitly instruct the compiler to use the AVX512 instruction >>>> set. I usually use something along the lines of >>>> >>>> 'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512' >>>> >>>> (The "-g" just adds symbols, which make the output from performance >>>> profiling tools much more useful.) >>>> >>>> That said, I think that if you are comparing 1024 Haswell cores vs. 1024 >>>> KNL cores (so double the number of Haswell nodes), I'm not surprised that >>>> the simulations are almost twice as fast using the Haswell nodes. Keep in >>>> mind that individual KNL cores are much less powerful than an individual >>>> Haswell core. You are also using roughly twice the power footprint (a dual >>>> socket Haswell node should be roughly equivalent to a KNL node, I >>>> believe). How do things look when you compare equal numbers of nodes? >>>> >>>> Cheers, >>>> Richard >>>> >>>> On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang > >>>> wrote: >>>> >>>> Hi all, >>>> >>>> On NERSC's Cori I have the following configure options for PETSc: >>>> >>>> ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0 >>>> --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn >>>> --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1 >>>> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt >>>> >>>> Where I swapped out the default Intel programming environment with that >>>> of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I >>>> want to document the performance difference between Cori's Haswell and KNL >>>> processors. >>>> >>>> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and >>>> 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes.
>>>> Which leads me to suspect that I am not doing something right for KNL. Does >>>> anyone know what are some "optimal" configure options for running PETSc on >>>> KNL? >>>> >>>> Thanks, >>>> Justin >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Apr 5 10:53:52 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 05 Apr 2017 09:53:52 -0600 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: <78CC519D-CA8C-4735-AD22-48F63CF60442@anl.gov> References: <87h923lvkh.fsf@jedbrown.org> <78CC519D-CA8C-4735-AD22-48F63CF60442@anl.gov> Message-ID: <87inmij1f3.fsf@jedbrown.org> "Zhang, Hong" writes: > On Apr 4, 2017, at 10:45 PM, Justin Chang > wrote: > > So I tried the following options: > > -M 40 > -N 40 > -P 5 > -da_refine 1/2/3/4 > -log_view > -mg_coarse_pc_type gamg > -mg_levels_0_pc_type gamg > -mg_levels_1_sub_pc_type cholesky > -pc_type mg > -thi_mat_type baij > > Performance improved dramatically. However, Haswell still beats out KNL but only by a little. Now it seems like MatSOR is taking some time (though I can't really judge whether it's significant or not). Attached are the log files. > > > MatSOR takes only 3% of the total time. Most of the time is spent on PCSetUp (~30%) and PCApply (~11%). I don't see any of your conclusions in the actual data, unless you only looked at the smallest size that Justin tested. 
For example, from the largest problem size in Justin's logs:

KNL:
MatSOR 2688 1.0 2.3942e+02 1.1 4.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 36 45 0 0 0 36 45 0 0 0 11946
KSPSolve 8 1.0 4.3837e+02 1.0 9.87e+10 1.0 1.5e+06 8.8e+03 5.0e+03 68 99 98 61 98 68 99 98 61 98 14409
SNESSolve 1 1.0 6.1583e+02 1.0 9.95e+10 1.0 1.6e+06 1.4e+04 5.1e+03 96100100100 99 96100100100 99 10338
SNESFunctionEval 9 1.0 3.8730e+01 1.0 0.00e+00 0.0 9.2e+03 3.2e+04 0.0e+00 6 0 1 1 0 6 0 1 1 0 0
SNESJacobianEval 40 1.0 1.5628e+02 1.0 0.00e+00 0.0 4.4e+04 2.5e+05 1.4e+02 24 0 3 49 3 24 0 3 49 3 0
PCSetUp 16 1.0 3.4525e+01 1.0 6.52e+07 1.0 2.8e+05 1.0e+04 3.8e+03 5 0 18 13 74 5 0 18 13 74 119
PCSetUpOnBlocks 60 1.0 9.5716e-01 1.1 1.41e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 60 1.0 3.8705e+02 1.0 9.32e+10 1.0 1.2e+06 8.0e+03 1.1e+03 60 94 79 45 21 60 94 79 45 21 15407
MatMult 2860 1.0 1.4578e+02 1.1 4.92e+10 1.0 1.2e+06 8.8e+03 0.0e+00 21 49 77 48 0 21 49 77 48 0 21579

Haswell:
MatSOR 2262 1.0 2.2116e+02 1.1 7.56e+10 1.0 0.0e+00 0.0e+00 0.0e+00 48 45 0 0 0 48 45 0 0 0 10936
KSPSolve 7 1.0 3.5937e+02 1.0 1.67e+11 1.0 6.7e+05 1.3e+04 4.5e+03 81 99 98 60 98 81 99 98 60 98 14828
SNESSolve 1 1.0 4.3749e+02 1.0 1.68e+11 1.0 6.8e+05 2.1e+04 4.5e+03 99100100100 99 99100100100 99 12280
SNESFunctionEval 8 1.0 1.5460e+01 1.0 0.00e+00 0.0 4.1e+03 4.7e+04 0.0e+00 3 0 1 1 0 3 0 1 1 0 0
SNESJacobianEval 35 1.0 6.8994e+01 1.0 0.00e+00 0.0 1.9e+04 3.8e+05 1.3e+02 16 0 3 50 3 16 0 3 50 3 0
PCSetUp 14 1.0 1.0860e+01 1.0 1.15e+08 1.0 1.3e+05 1.4e+04 3.4e+03 2 0 19 13 74 2 0 19 13 74 335
PCSetUpOnBlocks 50 1.0 4.5601e-02 1.6 2.89e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6
PCApply 50 1.0 3.3545e+02 1.0 1.57e+11 1.0 5.3e+05 1.2e+04 9.7e+02 75 94 77 44 21 75 94 77 44 21 15017
MatMult 2410 1.0 1.2050e+02 1.1 8.28e+10 1.0 5.1e+05 1.3e+04 0.0e+00 27 49 75 46 0 27 49 75 46 0 21983

> If ex48 has SSE2 intrinsics, does that mean Haswell would almost always be better?
> > The Jacobian evaluation (which has SSE2 intrinsics) on Haswell is about two times as fast as on KNL, but it eats only 3%-4% of the total time. SNESJacobianEval alone accounts for 90 seconds of the 180 second difference between KNL and Haswell. > According to your logs, the compute-intensive kernels such as MatMult, > MatSOR, PCApply run faster (~2X) on Haswell. They run almost the same speed. > But since the setup time dominates in this test, It doesn't dominate on the larger sizes. > Haswell would not show much benefit. If you increase the problem size, > it could be expected that the performance gap would also increase. Backwards. Haswell is great for low latency on small problem sizes while KNL offers higher theoretical throughput (often not realized due to lack of vectorization) for sufficiently large problem sizes (especially if they don't fit in Haswell L3 cache but do fit in MCDRAM). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From ingogaertner.tus at gmail.com Wed Apr 5 11:50:54 2017 From: ingogaertner.tus at gmail.com (Ingo Gaertner) Date: Wed, 5 Apr 2017 18:50:54 +0200 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: <87k270kjtd.fsf@jedbrown.org> References: <87k270kjtd.fsf@jedbrown.org> Message-ID: Hi Jed, thank you for your reply. Two followup questions below: 2017-04-04 22:18 GMT+02:00 Jed Brown : > Ingo Gaertner writes: > > > We have never talked about Riemann solvers in our CFD course, and I don't > > understand what's going on in ex11. > > However, if you could answer a few of my questions, you'll give me a good > > start with PETSc. For the simple poisson problem that I am trying to > > implement, I have to discretize div(k grad u) integrated over each FV > cell, > > where k is the known diffusivity, and u is the vector to solve for. 
> > Note that ex11 solves hyperbolic conservation laws, but you are solving > an elliptic equation. > I begin to understand. PETSc's FVM methods don't provide an FVM library that can be used to implement the FV control volume approach (see Ferziger) for general CFD problems? They are around just because they have been used to tackle one or two specific problems, is this correct? I thought they could be used similarly to the OpenFvm or OpenFoam libraries, which seem to solve Poisson, Navier-Stokes, Euler and other problems. If such methods have not been prepared for PETSc, I'll just follow Ferziger's book and start my work on a lower level than I thought would be necessary. More work, more fun :) > > (My second question is more general about the PETSc installation. When I > > configure PETSc with "--prefix=/somewhere --download-triangle > > --download-parmetis" etc., these extra libraries are built correctly during > > the make step, but they are not copied to /somewhere during the "make > > install" step. > > Where are they put during configure? > My bad, PETSc installation works as expected. But the build system that I am using is doing something weird. I'll have to find out what's going wrong there, but it is not related to PETSc. Thank you! Ingo -------------- next part -------------- An HTML attachment was scrubbed...
URL: From hongzhang at anl.gov Wed Apr 5 11:54:44 2017 From: hongzhang at anl.gov (Zhang, Hong) Date: Wed, 5 Apr 2017 16:54:44 +0000 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: <87inmij1f3.fsf@jedbrown.org> References: <87h923lvkh.fsf@jedbrown.org> <78CC519D-CA8C-4735-AD22-48F63CF60442@anl.gov> <87inmij1f3.fsf@jedbrown.org> Message-ID: > On Apr 5, 2017, at 10:53 AM, Jed Brown wrote: > > "Zhang, Hong" writes: > >> On Apr 4, 2017, at 10:45 PM, Justin Chang > wrote: >> >> So I tried the following options: >> >> -M 40 >> -N 40 >> -P 5 >> -da_refine 1/2/3/4 >> -log_view >> -mg_coarse_pc_type gamg >> -mg_levels_0_pc_type gamg >> -mg_levels_1_sub_pc_type cholesky >> -pc_type mg >> -thi_mat_type baij >> >> Performance improved dramatically. However, Haswell still beats out KNL but only by a little. Now it seems like MatSOR is taking some time (though I can't really judge whether it's significant or not). Attached are the log files. >> >> >> MatSOR takes only 3% of the total time. Most of the time is spent on PCSetUp (~30%) and PCApply (~11%). > > I don't see any of your conclusions in the actual data, unless you only > looked at the smallest size that Justin tested. For example, from the > largest problem size in Justin's logs: My mistake. I did not see the results for the large problem sizes. I was talking about the data for the smallest case. 
Now I am very surprised by the performance of MatSOR:

-da_refine 1 ~2x slower on KNL
-da_refine 2 ~2x faster on KNL
-da_refine 3 ~2x faster on KNL
-da_refine 4 almost the same

KNL
-da_refine 1 MatSOR 1185 1.0 2.8965e-01 1.1 7.01e+07 1.0 0.0e+00 0.0e+00 0.0e+00 3 41 0 0 0 3 41 0 0 0 15231
-da_refine 2 MatSOR 1556 1.0 1.6883e+00 1.0 5.82e+08 1.0 0.0e+00 0.0e+00 0.0e+00 11 44 0 0 0 11 44 0 0 0 22019
-da_refine 3 MatSOR 2240 1.0 1.4959e+01 1.0 5.51e+09 1.0 0.0e+00 0.0e+00 0.0e+00 22 45 0 0 0 22 45 0 0 0 23571
-da_refine 4 MatSOR 2688 1.0 2.3942e+02 1.1 4.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 36 45 0 0 0 36 45 0 0 0 11946

Haswell
-da_refine 1 MatSOR 1167 1.0 1.4839e-01 1.1 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 3 42 0 0 0 3 42 0 0 0 30450
-da_refine 2 MatSOR 1532 1.0 2.9772e+00 1.0 1.17e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 44 0 0 0 28 44 0 0 0 12539
-da_refine 3 MatSOR 1915 1.0 2.7142e+01 1.1 9.51e+09 1.0 0.0e+00 0.0e+00 0.0e+00 45 45 0 0 0 45 45 0 0 0 11216
-da_refine 4 MatSOR 2262 1.0 2.2116e+02 1.1 7.56e+10 1.0 0.0e+00 0.0e+00 0.0e+00 48 45 0 0 0 48 45 0 0 0 10936

Hong (Mr.)
> KNL: > MatSOR 2688 1.0 2.3942e+02 1.1 4.47e+10 1.0 0.0e+00 0.0e+00 0.0e+00 36 45 0 0 0 36 45 0 0 0 11946 > KSPSolve 8 1.0 4.3837e+02 1.0 9.87e+10 1.0 1.5e+06 8.8e+03 5.0e+03 68 99 98 61 98 68 99 98 61 98 14409 > SNESSolve 1 1.0 6.1583e+02 1.0 9.95e+10 1.0 1.6e+06 1.4e+04 5.1e+03 96100100100 99 96100100100 99 10338 > SNESFunctionEval 9 1.0 3.8730e+01 1.0 0.00e+00 0.0 9.2e+03 3.2e+04 0.0e+00 6 0 1 1 0 6 0 1 1 0 0 > SNESJacobianEval 40 1.0 1.5628e+02 1.0 0.00e+00 0.0 4.4e+04 2.5e+05 1.4e+02 24 0 3 49 3 24 0 3 49 3 0 > PCSetUp 16 1.0 3.4525e+01 1.0 6.52e+07 1.0 2.8e+05 1.0e+04 3.8e+03 5 0 18 13 74 5 0 18 13 74 119 > PCSetUpOnBlocks 60 1.0 9.5716e-01 1.1 1.41e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > PCApply 60 1.0 3.8705e+02 1.0 9.32e+10 1.0 1.2e+06 8.0e+03 1.1e+03 60 94 79 45 21 60 94 79 45 21 15407 > MatMult 2860 1.0 1.4578e+02 1.1 4.92e+10 1.0 1.2e+06 8.8e+03 0.0e+00 21 49 77 48 0 21 49 77 48 0 21579 > > Haswell: > MatSOR 2262 1.0 2.2116e+02 1.1 7.56e+10 1.0 0.0e+00 0.0e+00 0.0e+00 48 45 0 0 0 48 45 0 0 0 10936 > KSPSolve 7 1.0 3.5937e+02 1.0 1.67e+11 1.0 6.7e+05 1.3e+04 4.5e+03 81 99 98 60 98 81 99 98 60 98 14828 > SNESSolve 1 1.0 4.3749e+02 1.0 1.68e+11 1.0 6.8e+05 2.1e+04 4.5e+03 99100100100 99 99100100100 99 12280 > SNESFunctionEval 8 1.0 1.5460e+01 1.0 0.00e+00 0.0 4.1e+03 4.7e+04 0.0e+00 3 0 1 1 0 3 0 1 1 0 0 > SNESJacobianEval 35 1.0 6.8994e+01 1.0 0.00e+00 0.0 1.9e+04 3.8e+05 1.3e+02 16 0 3 50 3 16 0 3 50 3 0 > PCSetUp 14 1.0 1.0860e+01 1.0 1.15e+08 1.0 1.3e+05 1.4e+04 3.4e+03 2 0 19 13 74 2 0 19 13 74 335 > PCSetUpOnBlocks 50 1.0 4.5601e-02 1.6 2.89e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6 > PCApply 50 1.0 3.3545e+02 1.0 1.57e+11 1.0 5.3e+05 1.2e+04 9.7e+02 75 94 77 44 21 75 94 77 44 21 15017 > MatMult 2410 1.0 1.2050e+02 1.1 8.28e+10 1.0 5.1e+05 1.3e+04 0.0e+00 27 49 75 46 0 27 49 75 46 0 21983 > >> If ex48 has SSE2 intrinsics, does that mean Haswell would almost always be better? 
>> >> The Jacobian evaluation (which has SSE2 intrinsics) on Haswell is about two times as fast as on KNL, but it eats only 3%-4% of the total time. > > SNESJacobianEval alone accounts for 90 seconds of the 180 second > difference between KNL and Haswell. > >> According to your logs, the compute-intensive kernels such as MatMult, >> MatSOR, PCApply run faster (~2X) on Haswell. > > They run almost the same speed. > >> But since the setup time dominates in this test, > > It doesn't dominate on the larger sizes. > >> Haswell would not show much benefit. If you increase the problem size, >> it could be expected that the performance gap would also increase. > > Backwards. Haswell is great for low latency on small problem sizes > while KNL offers higher theoretical throughput (often not realized due > to lack of vectorization) for sufficiently large problem sizes > (especially if they don't fit in Haswell L3 cache but do fit in MCDRAM). From knepley at gmail.com Wed Apr 5 11:56:46 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Apr 2017 11:56:46 -0500 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: <87k270kjtd.fsf@jedbrown.org> Message-ID: On Wed, Apr 5, 2017 at 11:50 AM, Ingo Gaertner wrote: > Hi Jed, > thank you for your reply. Two followup questions below: > > 2017-04-04 22:18 GMT+02:00 Jed Brown : > >> Ingo Gaertner writes: >> >> > We have never talked about Riemann solvers in our CFD course, and I >> don't >> > understand what's going on in ex11. >> > However, if you could answer a few of my questions, you'll give me a >> good >> > start with PETSc. For the simple poisson problem that I am trying to >> > implement, I have to discretize div(k grad u) integrated over each FV >> cell, >> > where k is the known diffusivity, and u is the vector to solve for. >> >> Note that ex11 solves hyperbolic conservation laws, but you are solving >> an elliptic equation. >> > > I begin to understand. 
PETSc's FVM methods don't provide an FVM library that > can be used to implement the FV control volume approach (see Ferziger) for > general CFD problems? They are around just because they have been used to > tackle one or two specific problems, is this correct? > I thought they could be used similarly to the OpenFvm or OpenFoam libraries > which seem to solve Poisson, Navier-Stokes, Euler and other problems. If > such methods have not been prepared for PETSc, I'll just follow Ferziger's > book and start my work on a lower level than I thought would be necessary. > More work, more fun :) > Yes, that is correct. As a side note, I think using FV to solve an elliptic equation should be a felony. Continuous FEM is excellent for this, whereas FV needs a variety of twisted hacks and is always worse in terms of computation and accuracy. Hyperbolic problems are what FV is designed for and I don't think I would ever support it for anything but that. Thanks, Matt > > (My second question is more general about the PETSc installation. When I >> > configure PETSc with "--prefix=/somewhere --download-triangle >> > --download-parmetis" etc., these extra libraries are built correctly >> during >> > the make step, but they are not copied to /somewhere during the "make >> > install" step. >> >> Where are they put during configure? >> > > My bad, PETSc installation works as expected. But the build system that I > am using is doing something weird. I'll have to find out what's going > wrong there, but it is not related to PETSc. > > Thank you! > Ingo > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed...
URL: From knepley at gmail.com Wed Apr 5 12:00:07 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Apr 2017 12:00:07 -0500 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: <87h923lvkh.fsf@jedbrown.org> <78CC519D-CA8C-4735-AD22-48F63CF60442@anl.gov> <87inmij1f3.fsf@jedbrown.org> Message-ID: On Wed, Apr 5, 2017 at 11:54 AM, Zhang, Hong wrote: > > > On Apr 5, 2017, at 10:53 AM, Jed Brown wrote: > > > > "Zhang, Hong" writes: > > > >> On Apr 4, 2017, at 10:45 PM, Justin Chang jychang48 at gmail.com>> wrote: > >> > >> So I tried the following options: > >> > >> -M 40 > >> -N 40 > >> -P 5 > >> -da_refine 1/2/3/4 > >> -log_view > >> -mg_coarse_pc_type gamg > >> -mg_levels_0_pc_type gamg > >> -mg_levels_1_sub_pc_type cholesky > >> -pc_type mg > >> -thi_mat_type baij > >> > >> Performance improved dramatically. However, Haswell still beats out KNL > but only by a little. Now it seems like MatSOR is taking some time (though > I can't really judge whether it's significant or not). Attached are the log > files. > >> > >> > >> MatSOR takes only 3% of the total time. Most of the time is spent on > PCSetUp (~30%) and PCApply (~11%). > > > > I don't see any of your conclusions in the actual data, unless you only > > looked at the smallest size that Justin tested. For example, from the > > largest problem size in Justin's logs: > > My mistake. I did not see the results for the large problem sizes. I was > talking about the data for the smallest case. 
> > Now I am very surprised by the performance of MatSOR: > > -da_refine 1 ~2x slower on KNL > -da_refine 2 ~2x faster on KNL > -da_refine 3 ~2x faster on KNL > -da_refine 4 almost the same > > KNL > > -da_refine 1 MatSOR 1185 1.0 2.8965e-01 1.1 7.01e+07 1.0 > 0.0e+00 0.0e+00 0.0e+00 3 41 0 0 0 3 41 0 0 0 15231 > -da_refine 2 MatSOR 1556 1.0 1.6883e+00 1.0 5.82e+08 1.0 > 0.0e+00 0.0e+00 0.0e+00 11 44 0 0 0 11 44 0 0 0 22019 > -da_refine 3 MatSOR 2240 1.0 1.4959e+01 1.0 5.51e+09 1.0 > 0.0e+00 0.0e+00 0.0e+00 22 45 0 0 0 22 45 0 0 0 23571 > -da_refine 4 MatSOR 2688 1.0 2.3942e+02 1.1 4.47e+10 1.0 > 0.0e+00 0.0e+00 0.0e+00 36 45 0 0 0 36 45 0 0 0 11946 > > > Haswell > -da_refine 1 MatSOR 1167 1.0 1.4839e-01 1.1 1.42e+08 1.0 > 0.0e+00 0.0e+00 0.0e+00 3 42 0 0 0 3 42 0 0 0 30450 > -da_refine 2 MatSOR 1532 1.0 2.9772e+00 1.0 1.17e+09 1.0 > 0.0e+00 0.0e+00 0.0e+00 28 44 0 0 0 28 44 0 0 0 12539 > -da_refine 3 MatSOR 1915 1.0 2.7142e+01 1.1 9.51e+09 1.0 > 0.0e+00 0.0e+00 0.0e+00 45 45 0 0 0 45 45 0 0 0 11216 > -da_refine 4 MatSOR 2262 1.0 2.2116e+02 1.1 7.56e+10 1.0 > 0.0e+00 0.0e+00 0.0e+00 48 45 0 0 0 48 45 0 0 0 10936 > SOR should track memory bandwidth, so it seems to me either a) We fell out of MCDRAM or b) We saturated the KNL node, but not the Haswell configuration I think these are all runs with identical parallelism, so its not b). Justin, did you tell it to fall back to DRAM, or fail? Thanks, Matt > Hong (Mr.) 
> > > > KNL: > > MatSOR 2688 1.0 2.3942e+02 1.1 4.47e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 36 45 0 0 0 36 45 0 0 0 11946 > > KSPSolve 8 1.0 4.3837e+02 1.0 9.87e+10 1.0 1.5e+06 8.8e+03 > 5.0e+03 68 99 98 61 98 68 99 98 61 98 14409 > > SNESSolve 1 1.0 6.1583e+02 1.0 9.95e+10 1.0 1.6e+06 1.4e+04 > 5.1e+03 96100100100 99 96100100100 99 10338 > > SNESFunctionEval 9 1.0 3.8730e+01 1.0 0.00e+00 0.0 9.2e+03 3.2e+04 > 0.0e+00 6 0 1 1 0 6 0 1 1 0 0 > > SNESJacobianEval 40 1.0 1.5628e+02 1.0 0.00e+00 0.0 4.4e+04 2.5e+05 > 1.4e+02 24 0 3 49 3 24 0 3 49 3 0 > > PCSetUp 16 1.0 3.4525e+01 1.0 6.52e+07 1.0 2.8e+05 1.0e+04 > 3.8e+03 5 0 18 13 74 5 0 18 13 74 119 > > PCSetUpOnBlocks 60 1.0 9.5716e-01 1.1 1.41e+05 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > PCApply 60 1.0 3.8705e+02 1.0 9.32e+10 1.0 1.2e+06 8.0e+03 > 1.1e+03 60 94 79 45 21 60 94 79 45 21 15407 > > MatMult 2860 1.0 1.4578e+02 1.1 4.92e+10 1.0 1.2e+06 8.8e+03 > 0.0e+00 21 49 77 48 0 21 49 77 48 0 21579 > > > > Haswell: > > MatSOR 2262 1.0 2.2116e+02 1.1 7.56e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 48 45 0 0 0 48 45 0 0 0 10936 > > KSPSolve 7 1.0 3.5937e+02 1.0 1.67e+11 1.0 6.7e+05 1.3e+04 > 4.5e+03 81 99 98 60 98 81 99 98 60 98 14828 > > SNESSolve 1 1.0 4.3749e+02 1.0 1.68e+11 1.0 6.8e+05 2.1e+04 > 4.5e+03 99100100100 99 99100100100 99 12280 > > SNESFunctionEval 8 1.0 1.5460e+01 1.0 0.00e+00 0.0 4.1e+03 4.7e+04 > 0.0e+00 3 0 1 1 0 3 0 1 1 0 0 > > SNESJacobianEval 35 1.0 6.8994e+01 1.0 0.00e+00 0.0 1.9e+04 3.8e+05 > 1.3e+02 16 0 3 50 3 16 0 3 50 3 0 > > PCSetUp 14 1.0 1.0860e+01 1.0 1.15e+08 1.0 1.3e+05 1.4e+04 > 3.4e+03 2 0 19 13 74 2 0 19 13 74 335 > > PCSetUpOnBlocks 50 1.0 4.5601e-02 1.6 2.89e+05 0.0 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 6 > > PCApply 50 1.0 3.3545e+02 1.0 1.57e+11 1.0 5.3e+05 1.2e+04 > 9.7e+02 75 94 77 44 21 75 94 77 44 21 15017 > > MatMult 2410 1.0 1.2050e+02 1.1 8.28e+10 1.0 5.1e+05 1.3e+04 > 0.0e+00 27 49 75 46 0 27 49 75 46 0 21983 > > > >> If ex48 has SSE2 intrinsics, does that mean 
Haswell would almost always > be better? > >> > >> The Jacobian evaluation (which has SSE2 intrinsics) on Haswell is about > two times as fast as on KNL, but it eats only 3%-4% of the total time. > > > > SNESJacobianEval alone accounts for 90 seconds of the 180 second > > difference between KNL and Haswell. > > > >> According to your logs, the compute-intensive kernels such as MatMult, > >> MatSOR, PCApply run faster (~2X) on Haswell. > > > > They run almost the same speed. > > > >> But since the setup time dominates in this test, > > > > It doesn't dominate on the larger sizes. > > > >> Haswell would not show much benefit. If you increase the problem size, > >> it could be expected that the performance gap would also increase. > > > > Backwards. Haswell is great for low latency on small problem sizes > > while KNL offers higher theoretical throughput (often not realized due > > to lack of vectorization) for sufficiently large problem sizes > > (especially if they don't fit in Haswell L3 cache but do fit in MCDRAM). > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Apr 5 12:03:46 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 05 Apr 2017 11:03:46 -0600 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: <87k270kjtd.fsf@jedbrown.org> Message-ID: <877f2yiy6l.fsf@jedbrown.org> Matthew Knepley writes: > As a side note, I think using FV to solve an elliptic equation should be a > felony. Continuous FEM is excellent for this, whereas FV needs > a variety of twisted hacks and is always worse in terms of computation and > accuracy. 
Unless you need exact (no discretization error) local conservation, e.g., for a projection in a staggered grid incompressible flow problem, in which case you can use either FV or mixed FEM (algebraically equivalent to FV in some cases). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jychang48 at gmail.com Wed Apr 5 12:23:49 2017 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 5 Apr 2017 12:23:49 -0500 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: <87h923lvkh.fsf@jedbrown.org> <78CC519D-CA8C-4735-AD22-48F63CF60442@anl.gov> <87inmij1f3.fsf@jedbrown.org> Message-ID: I simply ran these KNL simulations in flat mode with the following options: srun -n 64 -c 4 --cpu_bind=cores numactl -p 1 ./ex48 .... Basically I told it that MCDRAM usage in NUMA domain 1 is preferred. I followed the last example: http://www.nersc.gov/users/computational-systems/cori/configuration/knl-processor-modes/ On Wed, Apr 5, 2017 at 12:00 PM, Matthew Knepley wrote: > On Wed, Apr 5, 2017 at 11:54 AM, Zhang, Hong wrote: > >> >> > On Apr 5, 2017, at 10:53 AM, Jed Brown wrote: >> > >> > "Zhang, Hong" writes: >> > >> >> On Apr 4, 2017, at 10:45 PM, Justin Chang > jychang48 at gmail.com>> wrote: >> >> >> >> So I tried the following options: >> >> >> >> -M 40 >> >> -N 40 >> >> -P 5 >> >> -da_refine 1/2/3/4 >> >> -log_view >> >> -mg_coarse_pc_type gamg >> >> -mg_levels_0_pc_type gamg >> >> -mg_levels_1_sub_pc_type cholesky >> >> -pc_type mg >> >> -thi_mat_type baij >> >> >> >> Performance improved dramatically. However, Haswell still beats out >> KNL but only by a little. Now it seems like MatSOR is taking some time >> (though I can't really judge whether it's significant or not). Attached are >> the log files. >> >> >> >> >> >> MatSOR takes only 3% of the total time. Most of the time is spent on >> PCSetUp (~30%) and PCApply (~11%). 
>> > >> > I don't see any of your conclusions in the actual data, unless you only >> > looked at the smallest size that Justin tested. For example, from the >> > largest problem size in Justin's logs: >> >> My mistake. I did not see the results for the large problem sizes. I was >> talking about the data for the smallest case. >> >> Now I am very surprised by the performance of MatSOR: >> >> -da_refine 1 ~2x slower on KNL >> -da_refine 2 ~2x faster on KNL >> -da_refine 3 ~2x faster on KNL >> -da_refine 4 almost the same >> >> KNL >> >> -da_refine 1 MatSOR 1185 1.0 2.8965e-01 1.1 7.01e+07 1.0 >> 0.0e+00 0.0e+00 0.0e+00 3 41 0 0 0 3 41 0 0 0 15231 >> -da_refine 2 MatSOR 1556 1.0 1.6883e+00 1.0 5.82e+08 1.0 >> 0.0e+00 0.0e+00 0.0e+00 11 44 0 0 0 11 44 0 0 0 22019 >> -da_refine 3 MatSOR 2240 1.0 1.4959e+01 1.0 5.51e+09 1.0 >> 0.0e+00 0.0e+00 0.0e+00 22 45 0 0 0 22 45 0 0 0 23571 >> -da_refine 4 MatSOR 2688 1.0 2.3942e+02 1.1 4.47e+10 1.0 >> 0.0e+00 0.0e+00 0.0e+00 36 45 0 0 0 36 45 0 0 0 11946 >> >> >> Haswell >> -da_refine 1 MatSOR 1167 1.0 1.4839e-01 1.1 1.42e+08 1.0 >> 0.0e+00 0.0e+00 0.0e+00 3 42 0 0 0 3 42 0 0 0 30450 >> -da_refine 2 MatSOR 1532 1.0 2.9772e+00 1.0 1.17e+09 1.0 >> 0.0e+00 0.0e+00 0.0e+00 28 44 0 0 0 28 44 0 0 0 12539 >> -da_refine 3 MatSOR 1915 1.0 2.7142e+01 1.1 9.51e+09 1.0 >> 0.0e+00 0.0e+00 0.0e+00 45 45 0 0 0 45 45 0 0 0 11216 >> -da_refine 4 MatSOR 2262 1.0 2.2116e+02 1.1 7.56e+10 1.0 >> 0.0e+00 0.0e+00 0.0e+00 48 45 0 0 0 48 45 0 0 0 10936 >> > > SOR should track memory bandwidth, so it seems to me either > > a) We fell out of MCDRAM > > or > > b) We saturated the KNL node, but not the Haswell configuration > > I think these are all runs with identical parallelism, so its not b). > Justin, did you tell it to fall back to DRAM, or fail? > > Thanks, > > Matt > > > >> Hong (Mr.) 
>> >> >> > KNL: >> > MatSOR 2688 1.0 2.3942e+02 1.1 4.47e+10 1.0 0.0e+00 >> 0.0e+00 0.0e+00 36 45 0 0 0 36 45 0 0 0 11946 >> > KSPSolve 8 1.0 4.3837e+02 1.0 9.87e+10 1.0 1.5e+06 >> 8.8e+03 5.0e+03 68 99 98 61 98 68 99 98 61 98 14409 >> > SNESSolve 1 1.0 6.1583e+02 1.0 9.95e+10 1.0 1.6e+06 >> 1.4e+04 5.1e+03 96100100100 99 96100100100 99 10338 >> > SNESFunctionEval 9 1.0 3.8730e+01 1.0 0.00e+00 0.0 9.2e+03 >> 3.2e+04 0.0e+00 6 0 1 1 0 6 0 1 1 0 0 >> > SNESJacobianEval 40 1.0 1.5628e+02 1.0 0.00e+00 0.0 4.4e+04 >> 2.5e+05 1.4e+02 24 0 3 49 3 24 0 3 49 3 0 >> > PCSetUp 16 1.0 3.4525e+01 1.0 6.52e+07 1.0 2.8e+05 >> 1.0e+04 3.8e+03 5 0 18 13 74 5 0 18 13 74 119 >> > PCSetUpOnBlocks 60 1.0 9.5716e-01 1.1 1.41e+05 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >> > PCApply 60 1.0 3.8705e+02 1.0 9.32e+10 1.0 1.2e+06 >> 8.0e+03 1.1e+03 60 94 79 45 21 60 94 79 45 21 15407 >> > MatMult 2860 1.0 1.4578e+02 1.1 4.92e+10 1.0 1.2e+06 >> 8.8e+03 0.0e+00 21 49 77 48 0 21 49 77 48 0 21579 >> > >> > Haswell: >> > MatSOR 2262 1.0 2.2116e+02 1.1 7.56e+10 1.0 0.0e+00 >> 0.0e+00 0.0e+00 48 45 0 0 0 48 45 0 0 0 10936 >> > KSPSolve 7 1.0 3.5937e+02 1.0 1.67e+11 1.0 6.7e+05 >> 1.3e+04 4.5e+03 81 99 98 60 98 81 99 98 60 98 14828 >> > SNESSolve 1 1.0 4.3749e+02 1.0 1.68e+11 1.0 6.8e+05 >> 2.1e+04 4.5e+03 99100100100 99 99100100100 99 12280 >> > SNESFunctionEval 8 1.0 1.5460e+01 1.0 0.00e+00 0.0 4.1e+03 >> 4.7e+04 0.0e+00 3 0 1 1 0 3 0 1 1 0 0 >> > SNESJacobianEval 35 1.0 6.8994e+01 1.0 0.00e+00 0.0 1.9e+04 >> 3.8e+05 1.3e+02 16 0 3 50 3 16 0 3 50 3 0 >> > PCSetUp 14 1.0 1.0860e+01 1.0 1.15e+08 1.0 1.3e+05 >> 1.4e+04 3.4e+03 2 0 19 13 74 2 0 19 13 74 335 >> > PCSetUpOnBlocks 50 1.0 4.5601e-02 1.6 2.89e+05 0.0 0.0e+00 >> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6 >> > PCApply 50 1.0 3.3545e+02 1.0 1.57e+11 1.0 5.3e+05 >> 1.2e+04 9.7e+02 75 94 77 44 21 75 94 77 44 21 15017 >> > MatMult 2410 1.0 1.2050e+02 1.1 8.28e+10 1.0 5.1e+05 >> 1.3e+04 0.0e+00 27 49 75 46 0 27 49 75 46 0 21983 >> > >> >> 
If ex48 has SSE2 intrinsics, does that mean Haswell would almost >> always be better? >> >> >> >> The Jacobian evaluation (which has SSE2 intrinsics) on Haswell is >> about two times as fast as on KNL, but it eats only 3%-4% of the total time. >> > >> > SNESJacobianEval alone accounts for 90 seconds of the 180 second >> > difference between KNL and Haswell. >> > >> >> According to your logs, the compute-intensive kernels such as MatMult, >> >> MatSOR, PCApply run faster (~2X) on Haswell. >> > >> > They run almost the same speed. >> > >> >> But since the setup time dominates in this test, >> > >> > It doesn't dominate on the larger sizes. >> > >> >> Haswell would not show much benefit. If you increase the problem size, >> >> it could be expected that the performance gap would also increase. >> > >> > Backwards. Haswell is great for low latency on small problem sizes >> > while KNL offers higher theoretical throughput (often not realized due >> > to lack of vectorization) for sufficiently large problem sizes >> > (especially if they don't fit in Haswell L3 cache but do fit in MCDRAM). >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Apr 5 12:34:12 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Apr 2017 12:34:12 -0500 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: <877f2yiy6l.fsf@jedbrown.org> References: <87k270kjtd.fsf@jedbrown.org> <877f2yiy6l.fsf@jedbrown.org> Message-ID: On Wed, Apr 5, 2017 at 12:03 PM, Jed Brown wrote: > Matthew Knepley writes: > > As a side note, I think using FV to solve an elliptic equation should be > a > > felony. 
Continuous FEM is excellent for this, whereas FV needs > > a variety of twisted hacks and is always worse in terms of computation > and > > accuracy. > > Unless you need exact (no discretization error) local conservation, > e.g., for a projection in a staggered grid incompressible flow problem, > in which case you can use either FV or mixed FEM (algebraically > equivalent to FV in some cases). > Okay, the words are getting in the way of me understanding. I want to see if I can pull something I can use out of the above explanation. First, "locally conservative" bothers me. It does not seem to indicate what it really does. I start with the Poisson equation \Delta p = f So the setup is then that I discretize both the quantity and its derivative (I will use mixed FEM style since I know it better) div v = f grad p = v Now, you might expect that "local conservation" would give me the exact result for \int_T p everywhere, meaning the integral of p over every cell T. However, there is discretization error in the fluxes v, and then I determine p by adding them up. So the thing that is exact is the fact that anything going out of one cell T, goes into another cell T'. Thus \int_\Omega p is exact IF my boundary conditions for p are also exact. Otherwise they cannot be more accurate than the inflows/outflows. So suppose you have exact BC on your Poisson equation, then you can get exact global conservation. Why is it "locally conservative"? Moreover, why would exact global conservation of p be necessary? That might be equivalent to mass loss, but are you telling me that 10^-8 mass loss is distinguishable from 10^-14? I do not understand that. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Wed Apr 5 12:36:58 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Apr 2017 12:36:58 -0500 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: <87h923lvkh.fsf@jedbrown.org> <78CC519D-CA8C-4735-AD22-48F63CF60442@anl.gov> <87inmij1f3.fsf@jedbrown.org> Message-ID: On Wed, Apr 5, 2017 at 12:23 PM, Justin Chang wrote: > I simply ran these KNL simulations in flat mode with the following options: > > srun -n 64 -c 4 --cpu_bind=cores numactl -p 1 ./ex48 .... > > Basically I told it that MCDRAM usage in NUMA domain 1 is preferred. I > followed the last example: http://www.nersc.gov/users/ > computational-systems/cori/configuration/knl-processor-modes/ > Right. I think, from the prior discussion, that -m 1 causes the run to fail if you spill out of MCDRAM. I think that is usually what we want since it makes things easier to interpret and running MKL from DRAM is like towing your McLaren with your Toyota. Matt > On Wed, Apr 5, 2017 at 12:00 PM, Matthew Knepley > wrote: > >> On Wed, Apr 5, 2017 at 11:54 AM, Zhang, Hong wrote: >> >>> >>> > On Apr 5, 2017, at 10:53 AM, Jed Brown wrote: >>> > >>> > "Zhang, Hong" writes: >>> > >>> >> On Apr 4, 2017, at 10:45 PM, Justin Chang >> > wrote: >>> >> >>> >> So I tried the following options: >>> >> >>> >> -M 40 >>> >> -N 40 >>> >> -P 5 >>> >> -da_refine 1/2/3/4 >>> >> -log_view >>> >> -mg_coarse_pc_type gamg >>> >> -mg_levels_0_pc_type gamg >>> >> -mg_levels_1_sub_pc_type cholesky >>> >> -pc_type mg >>> >> -thi_mat_type baij >>> >> >>> >> Performance improved dramatically. However, Haswell still beats out >>> KNL but only by a little. Now it seems like MatSOR is taking some time >>> (though I can't really judge whether it's significant or not). Attached are >>> the log files. >>> >> >>> >> >>> >> MatSOR takes only 3% of the total time. Most of the time is spent on >>> PCSetUp (~30%) and PCApply (~11%). 
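The difference between the two numactl policies discussed above can be sketched as a job-script fragment (illustration only: it assumes, as in the NERSC example linked earlier, that flat mode exposes MCDRAM as NUMA node 1, and the ex48 options are elided):

```shell
# Flat mode: MCDRAM shows up as a separate, CPU-less NUMA node
# (node 1 on these nodes); inspect the layout first.
numactl --hardware

# Preferred policy (-p / --preferred): allocate from MCDRAM while
# space remains, then silently spill the rest into DDR4.
srun -n 64 -c 4 --cpu_bind=cores numactl -p 1 ./ex48 ...

# Bind policy (-m / --membind): all allocations must come from
# MCDRAM; the run aborts out of memory instead of spilling, which
# makes "fell out of MCDRAM" easy to detect.
srun -n 64 -c 4 --cpu_bind=cores numactl -m 1 ./ex48 ...
```

A run that completes under the bind policy is known to have stayed inside MCDRAM, which is why it gives logs that are easier to interpret.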
>>> > >>> > I don't see any of your conclusions in the actual data, unless you only >>> > looked at the smallest size that Justin tested. For example, from the >>> > largest problem size in Justin's logs: >>> >>> My mistake. I did not see the results for the large problem sizes. I was >>> talking about the data for the smallest case. >>> >>> Now I am very surprised by the performance of MatSOR: >>> >>> -da_refine 1 ~2x slower on KNL >>> -da_refine 2 ~2x faster on KNL >>> -da_refine 3 ~2x faster on KNL >>> -da_refine 4 almost the same >>> >>> KNL >>> >>> -da_refine 1 MatSOR 1185 1.0 2.8965e-01 1.1 7.01e+07 1.0 >>> 0.0e+00 0.0e+00 0.0e+00 3 41 0 0 0 3 41 0 0 0 15231 >>> -da_refine 2 MatSOR 1556 1.0 1.6883e+00 1.0 5.82e+08 1.0 >>> 0.0e+00 0.0e+00 0.0e+00 11 44 0 0 0 11 44 0 0 0 22019 >>> -da_refine 3 MatSOR 2240 1.0 1.4959e+01 1.0 5.51e+09 1.0 >>> 0.0e+00 0.0e+00 0.0e+00 22 45 0 0 0 22 45 0 0 0 23571 >>> -da_refine 4 MatSOR 2688 1.0 2.3942e+02 1.1 4.47e+10 1.0 >>> 0.0e+00 0.0e+00 0.0e+00 36 45 0 0 0 36 45 0 0 0 11946 >>> >>> >>> Haswell >>> -da_refine 1 MatSOR 1167 1.0 1.4839e-01 1.1 1.42e+08 1.0 >>> 0.0e+00 0.0e+00 0.0e+00 3 42 0 0 0 3 42 0 0 0 30450 >>> -da_refine 2 MatSOR 1532 1.0 2.9772e+00 1.0 1.17e+09 1.0 >>> 0.0e+00 0.0e+00 0.0e+00 28 44 0 0 0 28 44 0 0 0 12539 >>> -da_refine 3 MatSOR 1915 1.0 2.7142e+01 1.1 9.51e+09 1.0 >>> 0.0e+00 0.0e+00 0.0e+00 45 45 0 0 0 45 45 0 0 0 11216 >>> -da_refine 4 MatSOR 2262 1.0 2.2116e+02 1.1 7.56e+10 1.0 >>> 0.0e+00 0.0e+00 0.0e+00 48 45 0 0 0 48 45 0 0 0 10936 >>> >> >> SOR should track memory bandwidth, so it seems to me either >> >> a) We fell out of MCDRAM >> >> or >> >> b) We saturated the KNL node, but not the Haswell configuration >> >> I think these are all runs with identical parallelism, so its not b). >> Justin, did you tell it to fall back to DRAM, or fail? >> >> Thanks, >> >> Matt >> >> >> >>> Hong (Mr.) 
>>> >>> >>> > KNL: >>> > MatSOR 2688 1.0 2.3942e+02 1.1 4.47e+10 1.0 0.0e+00 >>> 0.0e+00 0.0e+00 36 45 0 0 0 36 45 0 0 0 11946 >>> > KSPSolve 8 1.0 4.3837e+02 1.0 9.87e+10 1.0 1.5e+06 >>> 8.8e+03 5.0e+03 68 99 98 61 98 68 99 98 61 98 14409 >>> > SNESSolve 1 1.0 6.1583e+02 1.0 9.95e+10 1.0 1.6e+06 >>> 1.4e+04 5.1e+03 96100100100 99 96100100100 99 10338 >>> > SNESFunctionEval 9 1.0 3.8730e+01 1.0 0.00e+00 0.0 9.2e+03 >>> 3.2e+04 0.0e+00 6 0 1 1 0 6 0 1 1 0 0 >>> > SNESJacobianEval 40 1.0 1.5628e+02 1.0 0.00e+00 0.0 4.4e+04 >>> 2.5e+05 1.4e+02 24 0 3 49 3 24 0 3 49 3 0 >>> > PCSetUp 16 1.0 3.4525e+01 1.0 6.52e+07 1.0 2.8e+05 >>> 1.0e+04 3.8e+03 5 0 18 13 74 5 0 18 13 74 119 >>> > PCSetUpOnBlocks 60 1.0 9.5716e-01 1.1 1.41e+05 0.0 0.0e+00 >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 >>> > PCApply 60 1.0 3.8705e+02 1.0 9.32e+10 1.0 1.2e+06 >>> 8.0e+03 1.1e+03 60 94 79 45 21 60 94 79 45 21 15407 >>> > MatMult 2860 1.0 1.4578e+02 1.1 4.92e+10 1.0 1.2e+06 >>> 8.8e+03 0.0e+00 21 49 77 48 0 21 49 77 48 0 21579 >>> > >>> > Haswell: >>> > MatSOR 2262 1.0 2.2116e+02 1.1 7.56e+10 1.0 0.0e+00 >>> 0.0e+00 0.0e+00 48 45 0 0 0 48 45 0 0 0 10936 >>> > KSPSolve 7 1.0 3.5937e+02 1.0 1.67e+11 1.0 6.7e+05 >>> 1.3e+04 4.5e+03 81 99 98 60 98 81 99 98 60 98 14828 >>> > SNESSolve 1 1.0 4.3749e+02 1.0 1.68e+11 1.0 6.8e+05 >>> 2.1e+04 4.5e+03 99100100100 99 99100100100 99 12280 >>> > SNESFunctionEval 8 1.0 1.5460e+01 1.0 0.00e+00 0.0 4.1e+03 >>> 4.7e+04 0.0e+00 3 0 1 1 0 3 0 1 1 0 0 >>> > SNESJacobianEval 35 1.0 6.8994e+01 1.0 0.00e+00 0.0 1.9e+04 >>> 3.8e+05 1.3e+02 16 0 3 50 3 16 0 3 50 3 0 >>> > PCSetUp 14 1.0 1.0860e+01 1.0 1.15e+08 1.0 1.3e+05 >>> 1.4e+04 3.4e+03 2 0 19 13 74 2 0 19 13 74 335 >>> > PCSetUpOnBlocks 50 1.0 4.5601e-02 1.6 2.89e+05 0.0 0.0e+00 >>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6 >>> > PCApply 50 1.0 3.3545e+02 1.0 1.57e+11 1.0 5.3e+05 >>> 1.2e+04 9.7e+02 75 94 77 44 21 75 94 77 44 21 15017 >>> > MatMult 2410 1.0 1.2050e+02 1.1 8.28e+10 1.0 5.1e+05 >>> 1.3e+04 0.0e+00 27 49 
75 46 0 27 49 75 46 0 21983 >>> > >>> >> If ex48 has SSE2 intrinsics, does that mean Haswell would almost >>> always be better? >>> >> >>> >> The Jacobian evaluation (which has SSE2 intrinsics) on Haswell is >>> about two times as fast as on KNL, but it eats only 3%-4% of the total time. >>> > >>> > SNESJacobianEval alone accounts for 90 seconds of the 180 second >>> > difference between KNL and Haswell. >>> > >>> >> According to your logs, the compute-intensive kernels such as MatMult, >>> >> MatSOR, PCApply run faster (~2X) on Haswell. >>> > >>> > They run almost the same speed. >>> > >>> >> But since the setup time dominates in this test, >>> > >>> > It doesn't dominate on the larger sizes. >>> > >>> >> Haswell would not show much benefit. If you increase the problem size, >>> >> it could be expected that the performance gap would also increase. >>> > >>> > Backwards. Haswell is great for low latency on small problem sizes >>> > while KNL offers higher theoretical throughput (often not realized due >>> > to lack of vectorization) for sufficiently large problem sizes >>> > (especially if they don't fit in Haswell L3 cache but do fit in >>> MCDRAM). >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ingogaertner.tus at gmail.com Wed Apr 5 12:46:45 2017 From: ingogaertner.tus at gmail.com (Ingo Gaertner) Date: Wed, 5 Apr 2017 19:46:45 +0200 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: <87k270kjtd.fsf@jedbrown.org> Message-ID: Hi Matt, I don't care if FV is suboptimal to solve the Poisson equation. 
I only want to better understand the method by getting my hands dirty, and also implement the general transport equation later. We were told that FVM is far more efficient for the transport equation than FEM, and this is why most CFD codes would use FVM. Do you contradict? Do you have benchmarks that show bad performance for the (parabolic) transport equation solved by FVM, or why do you think that FVM was designed only for hyperbolic problems? The decision whether to focus on FEM or FVM is quite interesting for me, because it seems like a matter of taste, and our professor of numerical methods for CFD seems to strongly prefer FVM without a solid basis to justify his preference. Thanks Ingo 2017-04-05 18:56 GMT+02:00 Matthew Knepley : > On Wed, Apr 5, 2017 at 11:50 AM, Ingo Gaertner > wrote: > >> Hi Jed, >> thank you for your reply. Two followup questions below: >> >> 2017-04-04 22:18 GMT+02:00 Jed Brown : >> >>> Ingo Gaertner writes: >>> >>> > We have never talked about Riemann solvers in our CFD course, and I >>> don't >>> > understand what's going on in ex11. >>> > However, if you could answer a few of my questions, you'll give me a >>> good >>> > start with PETSc. For the simple poisson problem that I am trying to >>> > implement, I have to discretize div(k grad u) integrated over each FV >>> cell, >>> > where k is the known diffusivity, and u is the vector to solve for. >>> >>> Note that ex11 solves hyperbolic conservation laws, but you are solving >>> an elliptic equation. >>> >> >> I begin to understand. Petscs FVM methods don't provide a FVM library >> that can be used to implement the FV control volume approach (see Ferziger) >> for general CFD problems? They are around just because they have been used >> to tackle one or two specific problems, is this correct? >> I thought they could be used similar to the OpenFvm or OpenFoam libraries >> which seem to solve Poisson, Navier-Stokes, Euler and other problems. 
If >> such methods have not been prepared for Petsc, I'll just follow Ferzigers >> book and start my work on a lower level than I thought would be necessary. >> More work, more fun :) >> > > Yes, that is correct. > > As a side note, I think using FV to solve an elliptic equation should be a > felony. Continuous FEM is excellent for this, whereas FV needs > a variety of twisted hacks and is always worse in terms of computation and > accuracy. Hyperbolic problems are what FV is designed for > and I don't think I would ever support it for anything but that. > > Thanks, > > Matt > > >> > (My second question is more general about the PETSc installation. When I >>> > configure PETSc with "--prefix=/somewhere --download-triangle >>> > --download-parmetis" etc., these extra libraries are built correctly >>> during >>> > the make step, but they are not copied to /somewhere during the "make >>> > install" step. >>> >>> Where are they put during configure? >>> >> >> My bad, Petsc installation works as expected. But the build system that I >> am using is doing something weird. I'll have to find out, what's going >> wrong there, but it is not related to Petsc. >> >> Thank you! >> Ingo >> >> >> >> Virenfrei. >> www.avast.com >> >> <#m_-443598393346959692_m_-9031669297911553285_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Apr 5 12:53:09 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Apr 2017 12:53:09 -0500 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: <87k270kjtd.fsf@jedbrown.org> Message-ID: On Wed, Apr 5, 2017 at 12:46 PM, Ingo Gaertner wrote: > Hi Matt, > I don't care if FV is suboptimal to solve the Poisson equation. 
I only > want to better understand the method by getting my hands dirty, and also > implement the general transport equation later. We were told that FVM is > far more efficient for the transport equation than FEM, and this is why > most CFD codes would use FVM. Do you contradict? > For transport, there are issues with accuracy first. You normally want to preserve positivity of the field since its a concentration. FV will do this, but there are also strategies for FEM. Plain vanilla FEM will do a bad job no doubt. > Do you have benchmarks that show bad performance for the (parabolic) > transport equation solved by FVM, or why do you think that FVM was designed > only for hyperbolic problems? > There are benchmarks for Euler/high-Re NS which show FVM losing badly to FEM (Ryan Glasby on the COFFEE project, Masayuki Yano from MIT), since FEM can do better than first order. Justin Chang from UH has nice papers enforcing positivity in FEM transport but I do not think there are comparisons to FV. That would make a good paper for him soon :) > The decision whether to focus on FEM or FVM is quite interesting for me, > because it seems like a matter of taste, and our professor of numerical > methods for CFD seems to strongly prefer FVM without a solid basis to > justify his preference. > I agree with Jed here that the proper way to evaluate this tradeoff is to agree on a figure of merit (perfect or not) and try to construct work-precision diagrams (despite the imperfect measures of work). Thanks, Matt > Thanks > Ingo > > > 2017-04-05 18:56 GMT+02:00 Matthew Knepley : > >> On Wed, Apr 5, 2017 at 11:50 AM, Ingo Gaertner < >> ingogaertner.tus at gmail.com> wrote: >> >>> Hi Jed, >>> thank you for your reply. Two followup questions below: >>> >>> 2017-04-04 22:18 GMT+02:00 Jed Brown : >>> >>>> Ingo Gaertner writes: >>>> >>>> > We have never talked about Riemann solvers in our CFD course, and I >>>> don't >>>> > understand what's going on in ex11. 
>>>> > However, if you could answer a few of my questions, you'll give me a >>>> good >>>> > start with PETSc. For the simple poisson problem that I am trying to >>>> > implement, I have to discretize div(k grad u) integrated over each FV >>>> cell, >>>> > where k is the known diffusivity, and u is the vector to solve for. >>>> >>>> Note that ex11 solves hyperbolic conservation laws, but you are solving >>>> an elliptic equation. >>>> >>> >>> I begin to understand. Petscs FVM methods don't provide a FVM library >>> that can be used to implement the FV control volume approach (see Ferziger) >>> for general CFD problems? They are around just because they have been used >>> to tackle one or two specific problems, is this correct? >>> I thought they could be used similar to the OpenFvm or OpenFoam >>> libraries which seem to solve Poisson, Navier-Stokes, Euler and other >>> problems. If such methods have not been prepared for Petsc, I'll just >>> follow Ferzigers book and start my work on a lower level than I thought >>> would be necessary. More work, more fun :) >>> >> >> Yes, that is correct. >> >> As a side note, I think using FV to solve an elliptic equation should be >> a felony. Continuous FEM is excellent for this, whereas FV needs >> a variety of twisted hacks and is always worse in terms of computation >> and accuracy. Hyperbolic problems are what FV is designed for >> and I don't think I would ever support it for anything but that. >> >> Thanks, >> >> Matt >> >> >>> > (My second question is more general about the PETSc installation. When >>>> I >>>> > configure PETSc with "--prefix=/somewhere --download-triangle >>>> > --download-parmetis" etc., these extra libraries are built correctly >>>> during >>>> > the make step, but they are not copied to /somewhere during the "make >>>> > install" step. >>>> >>>> Where are they put during configure? >>>> >>> >>> My bad, Petsc installation works as expected. 
But the build system that >>> I am using is doing something weird. I'll have to find out, what's going >>> wrong there, but it is not related to Petsc. >>> >>> Thank you! >>> Ingo >>> >>> >>> >>> Virenfrei. >>> www.avast.com >>> >>> <#m_2465698883969087045_m_-443598393346959692_m_-9031669297911553285_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Apr 5 12:56:10 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 05 Apr 2017 11:56:10 -0600 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: <87k270kjtd.fsf@jedbrown.org> Message-ID: <8737dmivr9.fsf@jedbrown.org> Ingo Gaertner writes: > Hi Matt, > I don't care if FV is suboptimal to solve the Poisson equation. I only want > to better understand the method by getting my hands dirty, and also > implement the general transport equation later. We were told that FVM is > far more efficient for the transport equation than FEM, and this is why > most CFD codes would use FVM. Do you contradict? Do you have benchmarks > that show bad performance for the (parabolic) transport equation What is the "parabolic transport equation"? Advection-dominated diffusion? The hyperbolic part is usually the hard part. FEM can solve these problems, but FV is a good method, particularly if you want local conservation and monotonicity. > solved by FVM, or why do you think that FVM was designed only for > hyperbolic problems? 
The decision whether to focus on FEM or FVM is > quite interesting for me, because it seems like a matter of taste, and > our professor of numerical methods for CFD seems to strongly prefer > FVM without a solid basis to justify his preference. > > Thanks > Ingo > > > 2017-04-05 18:56 GMT+02:00 Matthew Knepley : > >> On Wed, Apr 5, 2017 at 11:50 AM, Ingo Gaertner > > wrote: >> >>> Hi Jed, >>> thank you for your reply. Two followup questions below: >>> >>> 2017-04-04 22:18 GMT+02:00 Jed Brown : >>> >>>> Ingo Gaertner writes: >>>> >>>> > We have never talked about Riemann solvers in our CFD course, and I >>>> don't >>>> > understand what's going on in ex11. >>>> > However, if you could answer a few of my questions, you'll give me a >>>> good >>>> > start with PETSc. For the simple poisson problem that I am trying to >>>> > implement, I have to discretize div(k grad u) integrated over each FV >>>> cell, >>>> > where k is the known diffusivity, and u is the vector to solve for. >>>> >>>> Note that ex11 solves hyperbolic conservation laws, but you are solving >>>> an elliptic equation. >>>> >>> >>> I begin to understand. Petscs FVM methods don't provide a FVM library >>> that can be used to implement the FV control volume approach (see Ferziger) >>> for general CFD problems? They are around just because they have been used >>> to tackle one or two specific problems, is this correct? >>> I thought they could be used similar to the OpenFvm or OpenFoam libraries >>> which seem to solve Poisson, Navier-Stokes, Euler and other problems. If >>> such methods have not been prepared for Petsc, I'll just follow Ferzigers >>> book and start my work on a lower level than I thought would be necessary. >>> More work, more fun :) >>> >> >> Yes, that is correct. >> >> As a side note, I think using FV to solve an elliptic equation should be a >> felony. 
Continuous FEM is excellent for this, whereas FV needs >> a variety of twisted hacks and is always worse in terms of computation and >> accuracy. Hyperbolic problems are what FV is designed for >> and I don't think I would ever support it for anything but that. >> >> Thanks, >> >> Matt >> >> >>> > (My second question is more general about the PETSc installation. When I >>>> > configure PETSc with "--prefix=/somewhere --download-triangle >>>> > --download-parmetis" etc., these extra libraries are built correctly >>>> during >>>> > the make step, but they are not copied to /somewhere during the "make >>>> > install" step. >>>> >>>> Where are they put during configure? >>>> >>> >>> My bad, Petsc installation works as expected. But the build system that I >>> am using is doing something weird. I'll have to find out, what's going >>> wrong there, but it is not related to Petsc. >>> >>> Thank you! >>> Ingo >>> >>> >>> >>> Virenfrei. >>> www.avast.com >>> >>> <#m_-443598393346959692_m_-9031669297911553285_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jed at jedbrown.org Wed Apr 5 13:13:58 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 05 Apr 2017 12:13:58 -0600 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: <87k270kjtd.fsf@jedbrown.org> <877f2yiy6l.fsf@jedbrown.org> Message-ID: <87zifuhgd5.fsf@jedbrown.org> Matthew Knepley writes: > On Wed, Apr 5, 2017 at 12:03 PM, Jed Brown wrote: > >> Matthew Knepley writes: >> > As a side note, I think using FV to solve an elliptic equation should be >> a >> > felony. 
Continuous FEM is excellent for this, whereas FV needs >> > a variety of twisted hacks and is always worse in terms of computation >> and >> > accuracy. >> >> Unless you need exact (no discretization error) local conservation, >> e.g., for a projection in a staggered grid incompressible flow problem, >> in which case you can use either FV or mixed FEM (algebraically >> equivalent to FV in some cases). >> > > Okay, the words are getting in the way of me understanding. I want to see > if I can pull something I can use out of the above explanation. > > First, "locally conservative" bothers me. It does not seem to indicate what > it really does. I start with the Poisson equation > > \Delta p = f > > So the setup is then that I discretize both the quantity and its derivative > (I will use mixed FEM style since I know it better) > > div v = f > grad p = v > > Now, you might expect that "local conservation" would give me the exact > result for > > \int_T p > > everywhere, meaning the integral of p over every cell T. Since when is pressure a conserved quantity? In your notation above, local conservation means \int_T (div v - f) = 0 I.e., if you have a tracer moving in a source-free velocity field v solving the above equation, its concentration satisfies c_t + div(c v) = 0 and it will be conserved element-wise. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jychang48 at gmail.com Wed Apr 5 13:31:42 2017 From: jychang48 at gmail.com (Justin Chang) Date: Wed, 5 Apr 2017 13:31:42 -0500 Subject: [petsc-users] Correlation between da_refine and pg_mg_levels In-Reply-To: References: <87y3vlmqye.fsf@jedbrown.org> <3FA19A8A-CED7-4D65-9A4C-03D10CCBF3EF@mcs.anl.gov> <87vaqlpwyd.fsf@jedbrown.org> <87pogtpwex.fsf@jedbrown.org> Message-ID: BTW what are the relevant papers that describe this problem? 
Is this one http://epubs.siam.org/doi/abs/10.1137/110834512 On Mon, Apr 3, 2017 at 1:04 PM Justin Chang wrote: > So my makefile/script is slightly different from the tutorial directory. > Basically I have a shell for loop that runs the 'make runex48' four times > where -da_refine is increased each time. It showed Levels 1 0 then 2 1 0 > because the job was in the middle of the loop, and I cancelled it halfway > when I realized it was returning errors as I didn't want to burn any > precious SU's :) > > Anyway, I ended up using Edison with 16 cores/node and Cori/Haswell with > 32 cores/node and got some nice numbers for 128x128x16 coarse grid. I am > however having issues with Cori/KNL, which I think has more to do with how > I configured PETSc and/or the job scripts. > > On Mon, Apr 3, 2017 at 6:23 AM, Jed Brown wrote: > > Matthew Knepley writes: > > I can't think why it would fail there, but DMDA really likes odd numbers of > > vertices, because it wants > > to take every other point, 129 seems good. I will see if I can reproduce > > once I get a chance. > > This problem uses periodic boundary conditions so even is right, but > Justin only defines the coarsest grid and uses -da_refine so it should > actually be irrelevant. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Apr 5 13:41:45 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 05 Apr 2017 12:41:45 -0600 Subject: [petsc-users] Correlation between da_refine and pg_mg_levels In-Reply-To: References: <87y3vlmqye.fsf@jedbrown.org> <3FA19A8A-CED7-4D65-9A4C-03D10CCBF3EF@mcs.anl.gov> <87vaqlpwyd.fsf@jedbrown.org> <87pogtpwex.fsf@jedbrown.org> Message-ID: <87pogqhf2u.fsf@jedbrown.org> Justin Chang writes: > BTW what are the relevant papers that describe this problem? Is this one > > http://epubs.siam.org/doi/abs/10.1137/110834512 Yup. -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From aherrema at iastate.edu Wed Apr 5 15:05:13 2017 From: aherrema at iastate.edu (Austin Herrema) Date: Wed, 5 Apr 2017 15:05:13 -0500 Subject: [petsc-users] Simultaneous use petsc4py and fortran/petsc-based python module In-Reply-To: References: Message-ID: When I was having this issue, I was using Homebrew-based PETSc. After switching to my using my own build, I no longer have the issues described above. Thanks again, Austin On Tue, Apr 4, 2017 at 1:30 PM, Gaetan Kenway wrote: > There shouldn't be any additional issue with the petsc4py wrapper. We do > this all the time. In fact, it's generally best to use the petsc4py to do > the initialization of petsc at the very top of your highest level python > script. You'll need to do this anyway if you want to use command line > arguments to change the petsc arch. > Again, its probably some 4/8 byte issue or maybe a real/complex issue that > is caused by the petsc4py import initializing something different from what > you expect. > > Gaetan > > On Tue, Apr 4, 2017 at 11:24 AM, Austin Herrema > wrote: > >> Hello all, >> >> Another question in a fairly long line of questions from me. Thank you to >> this community for all the help I've gotten. >> >> I have a Fortran/PETSc-based code that, with the help of f2py and some of >> you, I have compiled into a python module (we'll call it pc_fort_mod). So I >> can now successfully execute my code with >> >> import pc_fort_mod >> pc_fort_mod.execute() >> >> I am now hoping to use this analysis module in a large optimization >> problem being solved with OpenMDAO . OpenMDAO also >> makes use of PETSc/petsc4py, which, unsurprisingly, does not play well with >> my PETSc-based module. 
So doing >> >> from petsc4py import PETSc >> import pc_fort_mod >> pc_fort_mod.execute() >> >> causes the pc_fort_mod execution to fail (in particular, preallocation >> fails with an exit code of 63, "input argument out of range." I assume the >> matrix is invalid or something along those lines). >> >> So my question is, is there any way to make this work? Or is this pretty >> much out of the realm of what should be possible at this point? >> >> Thank you, >> Austin >> >> >> -- >> *Austin Herrema* >> PhD Student | Graduate Research Assistant | Iowa State University >> Wind Energy Science, Engineering, and Policy | Mechanical Engineering >> > > -- *Austin Herrema* PhD Student | Graduate Research Assistant | Iowa State University Wind Energy Science, Engineering, and Policy | Mechanical Engineering -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Apr 5 15:14:27 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Apr 2017 15:14:27 -0500 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: <87zifuhgd5.fsf@jedbrown.org> References: <87k270kjtd.fsf@jedbrown.org> <877f2yiy6l.fsf@jedbrown.org> <87zifuhgd5.fsf@jedbrown.org> Message-ID: On Wed, Apr 5, 2017 at 1:13 PM, Jed Brown wrote: > Matthew Knepley writes: > > > On Wed, Apr 5, 2017 at 12:03 PM, Jed Brown wrote: > > > >> Matthew Knepley writes: > >> > As a side note, I think using FV to solve an elliptic equation should > be > >> a > >> > felony. Continuous FEM is excellent for this, whereas FV needs > >> > a variety of twisted hacks and is always worse in terms of computation > >> and > >> > accuracy. > >> > >> Unless you need exact (no discretization error) local conservation, > >> e.g., for a projection in a staggered grid incompressible flow problem, > >> in which case you can use either FV or mixed FEM (algebraically > >> equivalent to FV in some cases). > >> > > > > Okay, the words are getting in the way of me understanding. 
I want to see > > if I can pull something I can use out of the above explanation. > > > > First, "locally conservative" bothers me. It does not seem to indicate > what > > it really does. I start with the Poisson equation > > > > \Delta p = f > > > > So the setup is then that I discretize both the quantity and its > derivative > > (I will use mixed FEM style since I know it better) > > > > div v = f > > grad p = v > > > > Now, you might expect that "local conservation" would give me the exact > > result for > > > > \int_T p > > > > everywhere, meaning the integral of p over every cell T. > > Since when is pressure a conserved quantity? > > In your notation above, local conservation means > > \int_T (div v - f) = 0 > > I.e., if you have a tracer moving in a source-free velocity field v > solving the above equation, its concentration satisfies > > c_t + div(c v) = 0 > > and it will be conserved element-wise. > But again that seems like a terrible term. What that statement above means is that globally I will have no loss, but the individual amounts in each cell are not accurate to machine error, they are accurate to discretization error because the flux is only accurate to discretization error. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Apr 5 21:09:43 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 5 Apr 2017 21:09:43 -0500 Subject: [petsc-users] Understanding DMPlexDistribute overlap In-Reply-To: <8336361.4c60jFNEBE@pc-fcaimmi> References: <8575748.yB8pcOHeRQ@pc-fcaimmi> <47d05e4c-7ec3-7b51-8893-d34ba9df0ceb@imperial.ac.uk> <8336361.4c60jFNEBE@pc-fcaimmi> Message-ID: On Wed, Apr 5, 2017 at 6:03 AM, Francesco Caimmi wrote: > Hi Michael, > > thanks for the prompt reply! 
> > While I am happy I mostly got it right, this means I have some kind of problem > I cannot solve on my own. :( > > I have this very simple 2D mesh I am experimenting with: a rectangle with 64 > vertexes and 45 cells (attached in exodus format as cantilever.e); I am using > this very simple petsc4py program to read it, define a section and output a > vector. The overlap value can be controlled by the -o command line switch. The > program is executed as: > mpiexec -np 2 python overlay-test.py -o <overlap> -log_view. > > > Everything works smoothly for <overlap> = 0 or 1, but for values > 2 > the program fails with the error message captured in the attached file > error.log. Changing the number of processors does not alter the behavior. Note > also that the same holds if I use a mesh generated by DMPlexCreateBoxMesh. > Francesco, I will reproduce your problem, but it may take me a few days. It is strange since we have tests for overlap > 1 that do use CreateBoxMesh. For example, cd src/dm/impls/plex/examples/tests make ex12 mpiexec -n 8 ./ex12 -test_partition -overlap 2 -dm_view ::ascii_info_detail Thanks, Matt > I would really appreciate hints on how to solve this issue and I will of > course provide any needed additional information. > > Thank you very much, > FC > > On Wednesday 5 April 2017 10:50:59 CEST you wrote: > > Hi Francesco, > > > > Your description is almost correct: the overlap defines the topological > > depth of shared entities as counted in "neighboring cells", where a cell > > counts as a neighbor of an owned cell according to the defined adjacency > > style. So for overlap=0 only facets, edges and vertices may be shared > > along the partition boundary, whereas for overlap=1 you can expect one > > additional "layer" of cells around each partition (the partitioning is > > done based on cell connectivity). For second neighbors, however, you > > need overlap=2. And yes, there is conceptually no upper bound on the > > overlap.
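The "layers of cells" description above can be mimicked in a few lines of plain Python, with no PETSc needed. The 1-D mesh, the two-rank partition, and the facet-only adjacency rule below are made up for illustration; the overlap actually gathered by DMPlex also depends on the configured adjacency style mentioned above:

```python
# Toy model of DMPlexDistribute's `overlap` parameter: each level
# adds one more "layer" of neighboring cells around the owned set.
# Illustrative only: 1-D chain of 10 cells, 2 ranks, facet adjacency.

def neighbors(cell, ncells=10):
    """Cells sharing a facet with `cell` in a 1-D chain mesh."""
    return {c for c in (cell - 1, cell + 1) if 0 <= c < ncells}

def with_overlap(owned, overlap, ncells=10):
    """Grow the owned cell set by `overlap` layers of neighbors."""
    local = set(owned)
    for _ in range(overlap):
        local |= {n for c in local for n in neighbors(c, ncells)}
    return local

for o in (0, 1, 2):
    print(o, sorted(with_overlap(range(0, 5), o)))
# overlap=0 -> rank 0 keeps only its owned cells [0, 1, 2, 3, 4]
# overlap=1 -> one ghost layer: [0, 1, 2, 3, 4, 5]
# overlap=2 -> second neighbors too: [0, 1, 2, 3, 4, 5, 6]
```

With a large enough overlap (anything from 5 up in this toy mesh), each rank ends up holding the full mesh, matching the "full duplicate of the mesh on each process" limit described in the original question.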
> > > > Hope this helps, > > Michael > > On 05/04/17 10:27, Francesco Caimmi wrote: > > > Dear all, > > > > > > I was playing with DMPlex objects and I was trying to exactly figure > out > > > what the `overlap` parameter in DMPlexDistribute does. > > > > > > From the tutorial "Flexible, Scalable Mesh and Data Management > > > > > > using PETSc DMPlex" (slide 10) and from the work by Knepley et al. > > > "Unstructured Overlapping Mesh Distribution in Parallel" I somehow got > the > > > idea that it should control the "depth" of the mesh overlap. > > > That is, given the partition boundary, if overlap is set to 0 only the > > > entities adjacent (in the DMPlex topological sense and with the "style" > > > defined by the AdjacencyUse routines) to entities at the boundary are > > > shared, if overlap is 1 the first and the second neighbors (always in > the > > > DMPlex topological sense) are shared and so on, up to the point where we > > > have a full duplicate of the mesh on each process (i.e. there is no > upper > > > bound on `overlap`). > > > > > > Is this correct or am I -totally- misunderstanding the meaning of the > > > parameter? > > > > > > I am asking this because I see some behavior I cannot explain when > varying > > > the value of the overlap, but before going into the details I would > like > > > to be sure to understand exactly what the overlap parameter is > supposed > > > to do. > > > > > > Many thanks, > > > -- > Francesco Caimmi > > Laboratorio di Ingegneria dei Polimeri > http://www.chem.polimi.it/polyenglab/ > > Politecnico di Milano - Dipartimento di Chimica, > Materiali e Ingegneria Chimica "Giulio Natta" > > P.zza Leonardo da Vinci, 32 > I-20133 Milano > Tel. 
+39.02.2399.4711 > Fax +39.02.7063.8173 > > francesco.caimmi at polimi.it > Skype: fmglcaimmi (please arrange meetings by e-mail) > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Apr 5 21:57:27 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 05 Apr 2017 20:57:27 -0600 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: <87k270kjtd.fsf@jedbrown.org> <877f2yiy6l.fsf@jedbrown.org> <87zifuhgd5.fsf@jedbrown.org> Message-ID: <87fuhmgs4o.fsf@jedbrown.org> Matthew Knepley writes: > On Wed, Apr 5, 2017 at 1:13 PM, Jed Brown wrote: > >> Matthew Knepley writes: >> >> > On Wed, Apr 5, 2017 at 12:03 PM, Jed Brown wrote: >> > >> >> Matthew Knepley writes: >> >> > As a side note, I think using FV to solve an elliptic equation should >> be >> >> a >> >> > felony. Continuous FEM is excellent for this, whereas FV needs >> >> > a variety of twisted hacks and is always worse in terms of computation >> >> and >> >> > accuracy. >> >> >> >> Unless you need exact (no discretization error) local conservation, >> >> e.g., for a projection in a staggered grid incompressible flow problem, >> >> in which case you can use either FV or mixed FEM (algebraically >> >> equivalent to FV in some cases). >> >> >> > >> > Okay, the words are getting in the way of me understanding. I want to see >> > if I can pull something I can use out of the above explanation. >> > >> > First, "locally conservative" bothers me. It does not seem to indicate >> what >> > it really does. 
I start with the Poisson equation >> > >> > \Delta p = f >> > >> > So the setup is then that I discretize both the quantity and its >> derivative >> > (I will use mixed FEM style since I know it better) >> > >> > div v = f >> > grad p = v >> > >> > Now, you might expect that "local conservation" would give me the exact >> > result for >> > >> > \int_T p >> > >> > everywhere, meaning the integral of p over every cell T. >> >> Since when is pressure a conserved quantity? >> >> In your notation above, local conservation means >> >> \int_T (div v - f) = 0 >> >> I.e., if you have a tracer moving in a source-free velocity field v >> solving the above equation, its concentration satisfies >> >> c_t + div(c v) = 0 >> >> and it will be conserved element-wise. >> > > But again that seems like a terrible term. What that statement above means > is that globally > I will have no loss, but the individual amounts in each cell are not > accurate to machine error, > they are accurate to discretization error because the flux is only accurate > to discretization error. No. The velocity field is divergence-free up to solver tolerance. Since the piecewise constants are in the test space, there is a literal equation that reads \int_T (div v - f) = 0. That holds up to solver tolerance, not just up to discretization error. That's what local conservation means. If you use continuous FEM, you don't have a statement like the above. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jed at jedbrown.org Wed Apr 5 22:26:11 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 05 Apr 2017 21:26:11 -0600 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: <87h923lvkh.fsf@jedbrown.org> <78CC519D-CA8C-4735-AD22-48F63CF60442@anl.gov> <87inmij1f3.fsf@jedbrown.org> Message-ID: <87wpayfc8c.fsf@jedbrown.org> Matthew Knepley writes: > On Wed, Apr 5, 2017 at 12:23 PM, Justin Chang wrote: > >> I simply ran these KNL simulations in flat mode with the following options: >> >> srun -n 64 -c 4 --cpu_bind=cores numactl -p 1 ./ex48 .... >> >> Basically I told it that MCDRAM usage in NUMA domain 1 is preferred. I >> followed the last example: http://www.nersc.gov/users/ >> computational-systems/cori/configuration/knl-processor-modes/ >> > > Right. I think, from the prior discussion, that -m 1 causes the run to fail > if you spill out of MCDRAM. I think that is usually > what we want since it makes things easier to interpret and running MKL from > DRAM is like towing your McLaren with > your Toyota. I'm not sure whether getting the Intel acronyms mixed up (KNL vs MKL) makes the quote above better or worse. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From ingogaertner.tus at gmail.com Thu Apr 6 01:16:27 2017 From: ingogaertner.tus at gmail.com (Ingo Gaertner) Date: Thu, 6 Apr 2017 08:16:27 +0200 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: <8737dmivr9.fsf@jedbrown.org> References: <87k270kjtd.fsf@jedbrown.org> <8737dmivr9.fsf@jedbrown.org> Message-ID: 2017-04-05 19:56 GMT+02:00 Jed Brown : > Ingo Gaertner writes: > > > Hi Matt, > > I don't care if FV is suboptimal to solve the Poisson equation. 
I only > want > to better understand the method by getting my hands dirty, and also > implement the general transport equation later. We were told that FVM is > far more efficient for the transport equation than FEM, and this is why > most CFD codes would use FVM. Do you disagree? Do you have benchmarks > that show bad performance for the (parabolic) transport equation? > > What is the "parabolic transport equation"? Advection-dominated > diffusion? The hyperbolic part is usually the hard part. FEM can solve > these problems, but FV is a good method, particularly if you want local > conservation and monotonicity. > By transport equation I mean the advection-diffusion equation. This is always parabolic, independent of whether it is advection dominated or diffusion dominated. And the elliptic Poisson equation can be solved by making it time-dependent and letting it converge to steady state, again solving a parabolic equation. At least this is how I learned the terms. My impression is that everybody has his hammer, be it FEM or FVM, so that every problem looks like a nail. You can also hammer a screw into the wall if the wall isn't too hard. Ingo -------------- next part -------------- An HTML attachment was scrubbed... URL: From jychang48 at gmail.com Thu Apr 6 02:56:09 2017 From: jychang48 at gmail.com (Justin Chang) Date: Thu, 6 Apr 2017 02:56:09 -0500 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: <87k270kjtd.fsf@jedbrown.org> <8737dmivr9.fsf@jedbrown.org> Message-ID: There are many flavors of FEM and FVM. If by FEM you mean the Continuous Galerkin FEM, then yes it is a far from ideal method for solving advection-diffusion equations, especially when advection is the dominating effect. The Discontinuous Galerkin (DG) FEM on the other hand is much better for advection-diffusion equations (though far from perfect). 
It has properties very similar to the FVM as it also ensures local/element-wise mass conservation. Ferziger's book is quite biased in favor of FVM and doesn't discuss other numerical methods in depth. I don't think there are any PETSc DMPlex DG examples at the moment (though I could be wrong). But this paper gives a nice introduction/overview to DG: http://epubs.siam.org/doi/abs/10.1137/S0036142901384162 Now if you are interested in a really sophisticated "PETSc example" of solving the advection-diffusion equation using the two-point flux FVM, there's PFLOTRAN: http://www.pflotran.org Justin PS - Is there any particular reason why PFLOTRAN is not listed on the PETSc homepage? It seems to be a pretty major "Related package that uses PETSc." On Thu, Apr 6, 2017 at 1:16 AM, Ingo Gaertner wrote: > > > 2017-04-05 19:56 GMT+02:00 Jed Brown : > >> Ingo Gaertner writes: >> >> > Hi Matt, >> > I don't care if FV is suboptimal to solve the Poisson equation. I only >> want >> > to better understand the method by getting my hands dirty, and also >> > implement the general transport equation later. We were told that FVM is >> > far more efficient for the transport equation than FEM, and this is why >> > most CFD codes would use FVM. Do you contradict? Do you have benchmarks >> > that show bad performance for the (parabolic) transport equation >> >> What is the "parabolic transport equation"? Advection-dominated >> diffusion? The hyperbolic part is usually the hard part. FEM can solve >> these problems, but FV is a good method, particularly if you want local >> conservation and monotonicity. >> > > By transport equation I mean the advection-diffusion equation. This is > always parabolic, independent of whether it is advection dominated or > diffusion dominated. And the elliptic Poisson equation can be solved by > making it time-dependent and converge to steady state, again solving a > parabolic equation. At least this is how I learned the terms. 
> My impression is that everybody has his hammer, be it FEM or FVM, so that > every problem looks like a nail. You can also hammer a screw into the wall > if the wall isn't too hard. > > Ingo > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesco.caimmi at polimi.it Thu Apr 6 04:25:04 2017 From: francesco.caimmi at polimi.it (Francesco Caimmi) Date: Thu, 6 Apr 2017 11:25:04 +0200 Subject: [petsc-users] Understanding DMPlexDistribute overlap In-Reply-To: References: <8575748.yB8pcOHeRQ@pc-fcaimmi> <8336361.4c60jFNEBE@pc-fcaimmi> Message-ID: <3596627.diPbWC0zYa@pc-fcaimmi> Dear Matt, thanks for your reply. On Wednesday, 5 April 2017 at 21:09:43 CEST Matthew Knepley wrote: > On Wed, Apr 5, 2017 at 6:03 AM, Francesco Caimmi >[...] > > the program fails with the error message captured in the attached file > > error.log. Changing the number of processors does not alter the behavior. > > Note > > also that the same holds if I use a mesh generated by DMPlexCreateBoxMesh. > > Francesco, I will reproduce your problem, but it may take me a few days. Thank you very much for your time. > > It is strange since we have tests for overlap > 1 that do use > CreateBoxMesh. For example, > > cd src/dm/impls/plex/examples/tests > make ex12 > mpiexec -n 8 ./ex12 -test_partition -overlap 2 -dm_view ::ascii_info_detail Anyway, there might be something related to my install: for the record, if I do $cd src/dm/impls/plex/examples/tests $make ex12 I get /home/fcaimmi/Packages/petsc/my-linux-petsc/bin/mpicc ex12.c -o ex12 ex12.c:3:25: fatal error: petscdmplex.h: No such file or directory #include <petscdmplex.h> ^ compilation terminated. 
: recipe for target 'ex12' failed make: *** [ex12] Error 1 I get this error I cannot explain (I am not that much into the PETSc build system, I mostly use the python bindings), since for example I can successfully build ex1.c or ex3.c which of course contain the same include statement... I don't know if this is useful to track the issue, but hope it helps. Thank you again, FC -- Francesco Caimmi Laboratorio di Ingegneria dei Polimeri http://www.chem.polimi.it/polyenglab/ Politecnico di Milano - Dipartimento di Chimica, Materiali e Ingegneria Chimica "Giulio Natta" P.zza Leonardo da Vinci, 32 I-20133 Milano Tel. +39.02.2399.4711 Fax +39.02.7063.8173 francesco.caimmi at polimi.it Skype: fmglcaimmi (please arrange meetings by e-mail) From michael.lange at imperial.ac.uk Thu Apr 6 05:40:27 2017 From: michael.lange at imperial.ac.uk (Michael Lange) Date: Thu, 6 Apr 2017 11:40:27 +0100 Subject: [petsc-users] Understanding DMPlexDistribute overlap In-Reply-To: <3596627.diPbWC0zYa@pc-fcaimmi> References: <8575748.yB8pcOHeRQ@pc-fcaimmi> <8336361.4c60jFNEBE@pc-fcaimmi> <3596627.diPbWC0zYa@pc-fcaimmi> Message-ID: Hi Francesco, Ok, I can confirm that your test runs fine for me with the latest master branch. I'm attaching the log for two processes up to overlap 7, where the entire mesh is effectively replicated on each partition. The command I ran was: for OL in 1 2 3 4 5 6 7; do mpiexec -np 2 python overlay-test.py -o $OL; done Looks like this might be a problem with your local build. Can you please try and update to the latest master? Hope this helps, Michael On 06/04/17 10:25, Francesco Caimmi wrote: > Dear Matt, > > thanks for your reply. > > On Wednesday, 5 April 2017 at 21:09:43 CEST Matthew Knepley wrote: >> On Wed, Apr 5, 2017 at 6:03 AM, Francesco Caimmi >> [...] >>> the program fails with the error message captured in the attached file >>> error.log. Changing the number of processors does not alter the behavior. 
>>> Note >>> also that the same holds if I use a mesh generated by DMPlexCreateBoxMesh. >> Francesco, I will reproduce your problem, but it may take me a few days. > Thank you very much for your time. > >> It is strange since we have tests for overlap > 1 that do use >> CreateBoxMesh. For example, >> >> cd src/dm/impls/plex/examples/tests >> make ex12 >> mpiexec -n 8 -test_partition -overlap 2 -dm_view ::ascii_info_detail > Anyway, there might be something related to my install: for the records, if I > do > > $cd src/dm/impls/plex/examples/tests > $make ex12 > > I get > > /home/fcaimmi/Packages/petsc/my-linux-petsc/bin/mpicc ex12.c -o ex12 > ex12.c:3:25: fatal error: petscdmplex.h: No such file or directory > #include > ^ > compilation terminated. > : recipe for target 'ex12' failed > make: *** [ex12] Error 1 > > I get this error I cannot explain ( I am not that much into PETSc build > system, I mostly use the python bindings), since for example I can > successfully build ex1.c or ex3.c which of course contain the same include > statement... > > I don't know if this is useful to track the issue, but hope it helps. > > Thank you again, > FC > -------------- next part -------------- A non-text attachment was scrubbed... Name: overlap_test.log Type: text/x-log Size: 4223 bytes Desc: not available URL: From knepley at gmail.com Thu Apr 6 06:25:31 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 Apr 2017 06:25:31 -0500 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: <87wpayfc8c.fsf@jedbrown.org> References: <87h923lvkh.fsf@jedbrown.org> <78CC519D-CA8C-4735-AD22-48F63CF60442@anl.gov> <87inmij1f3.fsf@jedbrown.org> <87wpayfc8c.fsf@jedbrown.org> Message-ID: On Wed, Apr 5, 2017 at 10:26 PM, Jed Brown wrote: > Matthew Knepley writes: > > > On Wed, Apr 5, 2017 at 12:23 PM, Justin Chang > wrote: > > > >> I simply ran these KNL simulations in flat mode with the following > options: > >> > >> srun -n 64 -c 4 --cpu_bind=cores numactl -p 1 ./ex48 .... 
> >> > >> Basically I told it that MCDRAM usage in NUMA domain 1 is preferred. I > >> followed the last example: http://www.nersc.gov/users/ > >> computational-systems/cori/configuration/knl-processor-modes/ > >> > > > > Right. I think, from the prior discussion, that -m 1 causes the run to > fail > > if you spill out of MCDRAM. I think that is usually > > what we want since it makes things easier to interpret and running MKL > from > > DRAM is like towing your McLaren with > > your Toyota. > > I'm not sure whether getting the Intel acronyms mixed up (KNL vs MKL) > makes the quote above better or worse. > Too cryptic. Are you saying that this cannot be what is happening? How would you explain the drop off in performance? Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From lawrence.mitchell at imperial.ac.uk Thu Apr 6 06:26:35 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Thu, 6 Apr 2017 12:26:35 +0100 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: References: <87h923lvkh.fsf@jedbrown.org> <78CC519D-CA8C-4735-AD22-48F63CF60442@anl.gov> <87inmij1f3.fsf@jedbrown.org> <87wpayfc8c.fsf@jedbrown.org> Message-ID: <369a543d-41eb-63e5-74fd-3a580a71aff5@imperial.ac.uk> On 06/04/17 12:25, Matthew Knepley wrote: > I'm not sure whether getting the Intel acronyms mixed up (KNL vs MKL) > makes the quote above better or worse. > > > Too cryptic. Are you saying that this cannot be what is happening? How > would you explain > the drop off in performance? > I think Jed is being facetious about both KNL and MKL. Lawrence -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 473 bytes Desc: OpenPGP digital signature URL: From knepley at gmail.com Thu Apr 6 06:34:02 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 Apr 2017 06:34:02 -0500 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: <87fuhmgs4o.fsf@jedbrown.org> References: <87k270kjtd.fsf@jedbrown.org> <877f2yiy6l.fsf@jedbrown.org> <87zifuhgd5.fsf@jedbrown.org> <87fuhmgs4o.fsf@jedbrown.org> Message-ID: On Wed, Apr 5, 2017 at 9:57 PM, Jed Brown wrote: > Matthew Knepley writes: > > > On Wed, Apr 5, 2017 at 1:13 PM, Jed Brown wrote: > > > >> Matthew Knepley writes: > >> > >> > On Wed, Apr 5, 2017 at 12:03 PM, Jed Brown wrote: > >> > > >> >> Matthew Knepley writes: > >> >> > As a side note, I think using FV to solve an elliptic equation > should > >> be > >> >> a > >> >> > felony. Continuous FEM is excellent for this, whereas FV needs > >> >> > a variety of twisted hacks and is always worse in terms of > computation > >> >> and > >> >> > accuracy. > >> >> > >> >> Unless you need exact (no discretization error) local conservation, > >> >> e.g., for a projection in a staggered grid incompressible flow > problem, > >> >> in which case you can use either FV or mixed FEM (algebraically > >> >> equivalent to FV in some cases). > >> >> > >> > > >> > Okay, the words are getting in the way of me understanding. I want to > see > >> > if I can pull something I can use out of the above explanation. > >> > > >> > First, "locally conservative" bothers me. It does not seem to indicate > >> what > >> > it really does. 
I start with the Poisson equation > >> > > >> > \Delta p = f > >> > > >> > So the setup is then that I discretize both the quantity and its > >> derivative > >> > (I will use mixed FEM style since I know it better) > >> > > >> > div v = f > >> > grad p = v > >> > > >> > Now, you might expect that "local conservation" would give me the > exact > >> > result for > >> > > >> > \int_T p > >> > > >> > everywhere, meaning the integral of p over every cell T. > >> > >> Since when is pressure a conserved quantity? > >> > >> In your notation above, local conservation means > >> > >> \int_T (div v - f) = 0 > >> > >> I.e., if you have a tracer moving in a source-free velocity field v > >> solving the above equation, its concentration satisfies > >> > >> c_t + div(c v) = 0 > >> > >> and it will be conserved element-wise. > >> > > > > But again that seems like a terrible term. What that statement above > means > > is that globally > > I will have no loss, but the individual amounts in each cell are not > > accurate to machine error, > > they are accurate to discretization error because the flux is only > accurate > > to discretization error. > > No. The velocity field is divergence-free up to solver tolerance. Since > the piecewise constants are in the test space, there is a literal > equation that reads > > \int_T (div v - f) = 0. > > That holds up to solver tolerance, not just up to discretization error. > That's what local conservation means. > > If you use continuous FEM, you don't have a statement like the above. > Okay, that is what you mean by local conservation. That state is still only accurate to discretization error. Why do I care about satisfying that equation to machine precision? Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jed at jedbrown.org Thu Apr 6 06:52:13 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 06 Apr 2017 05:52:13 -0600 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: <87k270kjtd.fsf@jedbrown.org> <8737dmivr9.fsf@jedbrown.org> Message-ID: <87pogpg3de.fsf@jedbrown.org> Ingo Gaertner writes: > By transport equation I mean the advection-diffusion equation. This is > always parabolic, independent of whether it is advection dominated or > diffusion dominated. This is true from an analysis perspective, but nearly meaningless from the perspective of numerical methods on finite grids. > And the elliptic Poisson equation can be solved by making it > timedependent and converge to steady state, again solving a parabolic > equation. Yes, but also nearly meaningless because that is a tremendously inefficient method unless you have an efficient solver for the elliptic case, in which case you may as well use it. > At least this is how I learned the terms. My impression is that > everybody has his hammer, be it FEM or FVM, so that every problem > looks like a nail. You can also hammer a screw into the wall if the > wall isn't too hard. True, but it isn't just religion, these choices depend on what you consider to be important, and if you have the same goals, the methods can sometimes be made to coincide. Anyway, there are several ways of implementing finite volume methods for elliptic problems, but if your problem is advection-dominated (high cell Péclet number), the discretization of advective terms will be more important. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jed at jedbrown.org Thu Apr 6 07:04:15 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 06 Apr 2017 06:04:15 -0600 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: <87k270kjtd.fsf@jedbrown.org> <877f2yiy6l.fsf@jedbrown.org> <87zifuhgd5.fsf@jedbrown.org> <87fuhmgs4o.fsf@jedbrown.org> Message-ID: <87mvbtg2tc.fsf@jedbrown.org> Matthew Knepley writes: > On Wed, Apr 5, 2017 at 9:57 PM, Jed Brown wrote: > >> Matthew Knepley writes: >> >> > On Wed, Apr 5, 2017 at 1:13 PM, Jed Brown wrote: >> > >> >> Matthew Knepley writes: >> >> >> >> > On Wed, Apr 5, 2017 at 12:03 PM, Jed Brown wrote: >> >> > >> >> >> Matthew Knepley writes: >> >> >> > As a side note, I think using FV to solve an elliptic equation >> should >> >> be >> >> >> a >> >> >> > felony. Continuous FEM is excellent for this, whereas FV needs >> >> >> > a variety of twisted hacks and is always worse in terms of >> computation >> >> >> and >> >> >> > accuracy. >> >> >> >> >> >> Unless you need exact (no discretization error) local conservation, >> >> >> e.g., for a projection in a staggered grid incompressible flow >> problem, >> >> >> in which case you can use either FV or mixed FEM (algebraically >> >> >> equivalent to FV in some cases). >> >> >> >> >> > >> >> > Okay, the words are getting in the way of me understanding. I want to >> see >> >> > if I can pull something I can use out of the above explanation. >> >> > >> >> > First, "locally conservative" bothers me. It does not seem to indicate >> >> what >> >> > it really does. 
I start with the Poisson equation >> >> > >> >> > \Delta p = f >> >> > >> >> > So the setup is then that I discretize both the quantity and its >> >> derivative >> >> > (I will use mixed FEM style since I know it better) >> >> > >> >> > div v = f >> >> > grad p = v >> >> > >> >> > Now, you might expect that "local conservation" would give me the >> exact >> >> > result for >> >> > >> >> > \int_T p >> >> > >> >> > everywhere, meaning the integral of p over every cell T. >> >> >> >> Since when is pressure a conserved quantity? >> >> >> >> In your notation above, local conservation means >> >> >> >> \int_T (div v - f) = 0 >> >> >> >> I.e., if you have a tracer moving in a source-free velocity field v >> >> solving the above equation, its concentration satisfies >> >> >> >> c_t + div(c v) = 0 >> >> >> >> and it will be conserved element-wise. >> >> >> > >> > But again that seems like a terrible term. What that statement above >> means >> > is that globally >> > I will have no loss, but the individual amounts in each cell are not >> > accurate to machine error, >> > they are accurate to discretization error because the flux is only >> accurate >> > to discretization error. >> >> No. The velocity field is divergence-free up to solver tolerance. Since >> the piecewise constants are in the test space, there is a literal >> equation that reads >> >> \int_T (div v - f) = 0. >> >> That holds up to solver tolerance, not just up to discretization error. >> That's what local conservation means. >> >> If you use continuous FEM, you don't have a statement like the above. >> > > Okay, that is what you mean by local conservation. That state is still only > accurate to discretization error. > Why do I care about satisfying that equation to machine precision? I swear we've had this discussion before. 
If you have a tracer moving in a velocity field that is not discrete divergence-free (i.e., satisfying the element-wise equation above), you'll get artifacts in the concentration (possibly violating positivity or a maximum principle). The (normal component of) velocity is also more accurate when you solve in mixed H(div) form (or an equivalent FV method) than if you solve in H^1. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jed at jedbrown.org Thu Apr 6 07:13:39 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 06 Apr 2017 06:13:39 -0600 Subject: [petsc-users] Configuring PETSc for KNL In-Reply-To: <369a543d-41eb-63e5-74fd-3a580a71aff5@imperial.ac.uk> References: <87h923lvkh.fsf@jedbrown.org> <78CC519D-CA8C-4735-AD22-48F63CF60442@anl.gov> <87inmij1f3.fsf@jedbrown.org> <87wpayfc8c.fsf@jedbrown.org> <369a543d-41eb-63e5-74fd-3a580a71aff5@imperial.ac.uk> Message-ID: <87k26xg2do.fsf@jedbrown.org> Lawrence Mitchell writes: > On 06/04/17 12:25, Matthew Knepley wrote: >> I'm not sure whether getting the Intel acronyms mixed up (KNL vs MKL) >> makes the quote above better or worse. >> >> >> Too cryptic. Are you saying that this cannot be what is happening? How >> would you explain >> the drop off in performance? >> > I think Jed is being facetious about both KNL and MKL. > > Lawrence Thanks. Matt almost certainly meant to write KNL (the statement makes sense in that context), but instead wrote MKL (not relevant in this context). -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From knepley at gmail.com Thu Apr 6 08:18:14 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 Apr 2017 08:18:14 -0500 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: <87mvbtg2tc.fsf@jedbrown.org> References: <87k270kjtd.fsf@jedbrown.org> <877f2yiy6l.fsf@jedbrown.org> <87zifuhgd5.fsf@jedbrown.org> <87fuhmgs4o.fsf@jedbrown.org> <87mvbtg2tc.fsf@jedbrown.org> Message-ID: On Thu, Apr 6, 2017 at 7:04 AM, Jed Brown wrote: > Matthew Knepley writes: > > > On Wed, Apr 5, 2017 at 9:57 PM, Jed Brown wrote: > > > >> Matthew Knepley writes: > >> > >> > On Wed, Apr 5, 2017 at 1:13 PM, Jed Brown wrote: > >> > > >> >> Matthew Knepley writes: > >> >> > >> >> > On Wed, Apr 5, 2017 at 12:03 PM, Jed Brown > wrote: > >> >> > > >> >> >> Matthew Knepley writes: > >> >> >> > As a side note, I think using FV to solve an elliptic equation > >> should > >> >> be > >> >> >> a > >> >> >> > felony. Continuous FEM is excellent for this, whereas FV needs > >> >> >> > a variety of twisted hacks and is always worse in terms of > >> computation > >> >> >> and > >> >> >> > accuracy. > >> >> >> > >> >> >> Unless you need exact (no discretization error) local > conservation, > >> >> >> e.g., for a projection in a staggered grid incompressible flow > >> problem, > >> >> >> in which case you can use either FV or mixed FEM (algebraically > >> >> >> equivalent to FV in some cases). > >> >> >> > >> >> > > >> >> > Okay, the words are getting in the way of me understanding. I want > to > >> see > >> >> > if I can pull something I can use out of the above explanation. > >> >> > > >> >> > First, "locally conservative" bothers me. It does not seem to > indicate > >> >> what > >> >> > it really does. 
I start with the Poisson equation > >> >> > > >> >> > \Delta p = f > >> >> > > >> >> > So the setup is then that I discretize both the quantity and its > >> >> derivative > >> >> > (I will use mixed FEM style since I know it better) > >> >> > > >> >> > div v = f > >> >> > grad p = v > >> >> > > >> >> > Now, you might expect that "local conservation" would give me the > >> exact > >> >> > result for > >> >> > > >> >> > \int_T p > >> >> > > >> >> > everywhere, meaning the integral of p over every cell T. > >> >> > >> >> Since when is pressure a conserved quantity? > >> >> > >> >> In your notation above, local conservation means > >> >> > >> >> \int_T (div v - f) = 0 > >> >> > >> >> I.e., if you have a tracer moving in a source-free velocity field v > >> >> solving the above equation, its concentration satisfies > >> >> > >> >> c_t + div(c v) = 0 > >> >> > >> >> and it will be conserved element-wise. > >> >> > >> > > >> > But again that seems like a terrible term. What that statement above > >> means > >> > is that globally > >> > I will have no loss, but the individual amounts in each cell are not > >> > accurate to machine error, > >> > they are accurate to discretization error because the flux is only > >> accurate > >> > to discretization error. > >> > >> No. The velocity field is divergence-free up to solver tolerance. Since > >> the piecewise constants are in the test space, there is a literal > >> equation that reads > >> > >> \int_T (div v - f) = 0. > >> > >> That holds up to solver tolerance, not just up to discretization error. > >> That's what local conservation means. > >> > >> If you use continuous FEM, you don't have a statement like the above. > >> > > > > Okay, that is what you mean by local conservation. That state is still > only > > accurate to discretization error. > > Why do I care about satisfying that equation to machine precision? > > I swear we've had this discussion before. 
If you have a tracer moving > in a velocity field that is not discrete divergence-free (i.e., > satisfying the element-wise equation above), you'll get artifacts in the > concentration (possibly violating positivity or a maximum principle). > The (normal component of) velocity is also more accurate when you solve > in mixed H(div) form (or an equivalent FV method) than if you solve in > H^1. > Okay, that makes sense. If I do not have fluxes matching the sources, I do not preserve montonicity for an advected field. I might need this to machine precision because some other equations cannot tolerate a negative number there. I will write this one down. Why do I need it "for a projection in a staggered grid incompressible flow problem". This would mean I satisfy (I think) \int_T div p = 0 meaning that there is a force balance on each cell to machine precision. If I just care about the fluid flow, this does not seem important. Thanks, Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Thu Apr 6 08:32:50 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 06 Apr 2017 07:32:50 -0600 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: <87k270kjtd.fsf@jedbrown.org> <877f2yiy6l.fsf@jedbrown.org> <87zifuhgd5.fsf@jedbrown.org> <87fuhmgs4o.fsf@jedbrown.org> <87mvbtg2tc.fsf@jedbrown.org> Message-ID: <877f2xfypp.fsf@jedbrown.org> Matthew Knepley writes: > Okay, that makes sense. If I do not have fluxes matching the sources, I do > not > preserve montonicity for an advected field. I might need this to machine > precision > because some other equations cannot tolerate a negative number there. I will > write this one down. > > Why do I need it "for a projection in a staggered grid incompressible flow > problem". 
> This would mean I satisfy (I think) > > \int_T div p = 0 Matt Knepley can take the divergence of a scalar. > meaning that there is a force balance on each cell to machine precision. If > I just care > about the fluid flow, this does not seem important. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From knepley at gmail.com Thu Apr 6 09:13:23 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 6 Apr 2017 09:13:23 -0500 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: <877f2xfypp.fsf@jedbrown.org> References: <87k270kjtd.fsf@jedbrown.org> <877f2yiy6l.fsf@jedbrown.org> <87zifuhgd5.fsf@jedbrown.org> <87fuhmgs4o.fsf@jedbrown.org> <87mvbtg2tc.fsf@jedbrown.org> <877f2xfypp.fsf@jedbrown.org> Message-ID: On Thu, Apr 6, 2017 at 8:32 AM, Jed Brown wrote: > Matthew Knepley writes: > > Okay, that makes sense. If I do not have fluxes matching the sources, I > do > > not > > preserve montonicity for an advected field. I might need this to machine > > precision > > because some other equations cannot tolerate a negative number there. I > will > > write this one down. > > > > Why do I need it "for a projection in a staggered grid incompressible > flow > > problem". > > This would mean I satisfy (I think) > > > > \int_T div p = 0 > > Matt Knepley can take the divergence of a scalar. Yes, I forgot the grad. It is crazy here. Same question. Matt > > meaning that there is a force balance on each cell to machine precision. > If > > I just care > > about the fluid flow, this does not seem important. > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
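To make the point about local conservation concrete: the element-wise statement \int_T (div v - f) = 0 can be observed directly in a toy computation. The sketch below is illustrative only (NumPy instead of PETSc; the 1D grid, source term, and Dirichlet conditions are invented): a cell-centered finite-volume solve of -div(grad p) = f on [0,1], after which the reconstructed face fluxes balance the source in every cell to solver roundoff, not merely to discretization error.

```python
import numpy as np

n = 64
h = 1.0 / n
xc = (np.arange(n) + 0.5) * h        # cell centers
f = np.sin(np.pi * xc)               # an arbitrary smooth source term

# Cell-centered FV discretization of -p'' = f with p = 0 at both walls.
# Boundary cells see the wall at distance h/2, hence the 3/h^2 diagonal.
A = (2.0 / h**2) * np.eye(n) - (1.0 / h**2) * (np.eye(n, k=1) + np.eye(n, k=-1))
A[0, 0] = A[-1, -1] = 3.0 / h**2
p = np.linalg.solve(A, f)

# Reconstruct the face fluxes v = -grad p; mirrored ghost values enforce
# p = 0 at the walls (the average at the boundary face vanishes).
pg = np.concatenate(([-p[0]], p, [-p[-1]]))
v = -(pg[1:] - pg[:-1]) / h          # n + 1 face fluxes

# Local conservation: (v_{i+1/2} - v_{i-1/2})/h - f_i vanishes on EVERY
# cell to solver roundoff, because the piecewise constants are in the
# test space -- each cell's balance is literally a row of the system.
cellwise = (v[1:] - v[:-1]) / h - f
print(np.abs(cellwise).max())
```

(With a continuous Galerkin solve of the same problem there is no such cell-by-cell identity; only the global balance is controlled, which is the contrast drawn in the thread.)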
URL: From mfadams at lbl.gov Thu Apr 6 09:27:24 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 6 Apr 2017 07:27:24 -0700 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> Message-ID: On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith wrote: > >> Does this mean that GAMG works for the symmetrical matrix only? > > No, it means that for a non-symmetric nonzero structure you need the extra flag. So use the extra flag. The reason we don't always use the flag is because it adds extra cost and isn't needed if the matrix already has a symmetric nonzero structure. BTW, if you have symmetric non-zero structure you can just set -pc_gamg_threshold -1.0, note the "or" in the message. If you want to mess with the threshold then you need to use the symmetrized flag. From fande.kong at inl.gov Thu Apr 6 09:39:31 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Thu, 6 Apr 2017 08:39:31 -0600 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> Message-ID: Thanks, Mark and Barry, It works pretty well in terms of the number of linear iterations (using "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am using the two-level method via "-pc_mg_levels 2". The reason the compute time is larger than with other preconditioning options is that a matrix-free method is used on the fine level, and in my particular problem the function evaluation is expensive. I am using "-snes_mf_operator 1" to turn on Jacobian-free Newton, but I do not think I want to make the preconditioning part matrix-free. Do you guys know how to turn off the matrix-free method for GAMG?

Here is the detailed solver: *SNES Object: 384 MPI processes type: newtonls maximum iterations=200, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 total number of linear solver iterations=20 total number of function evaluations=166 norm schedule ALWAYS SNESLineSearch Object: 384 MPI processes type: bt interpolation: cubic alpha=1.000000e-04 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: 384 MPI processes type: gmres GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=100, initial guess is zero tolerances: relative=0.001, absolute=1e-50, divergence=10000. right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 384 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=2 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0. AGG specific options Symmetric graph true Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 384 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 384 MPI processes type: bjacobi block Jacobi: number of blocks = 384 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 1.31367 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=37, cols=37 package used to perform factorization: petsc total: nonzeros=913, allocated nonzeros=913 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=37, cols=37 total: nonzeros=695, allocated nonzeros=695 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 384 MPI processes type: mpiaij rows=18145, cols=18145 total: nonzeros=1709115, allocated nonzeros=1709115 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 384 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.133339, max = 1.46673 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_1_esteig_) 384 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 384 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix followed by preconditioner matrix: Mat Object: 384 MPI processes type: mffd rows=3020875, cols=3020875 Matrix-free approximation: err=1.49012e-08 (relative error in function evaluation) Using wp compute h routine Does not compute normU Mat Object: () 384 MPI processes type: mpiaij rows=3020875, cols=3020875 total: nonzeros=215671710, allocated nonzeros=241731750 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix followed by preconditioner matrix: Mat Object: 384 MPI processes type: mffd rows=3020875, cols=3020875 Matrix-free approximation: err=1.49012e-08 (relative error in function evaluation) Using wp compute h routine Does not compute normU Mat Object: () 384 MPI processes type: mpiaij rows=3020875, cols=3020875 total: nonzeros=215671710, allocated nonzeros=241731750 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines* Fande, On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote: > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith wrote: > > > >> Does this mean that GAMG works for the symmetrical matrix only? > > > > No, it means that for non symmetric nonzero structure you need the > extra flag. So use the extra flag. The reason we don't always use the flag > is because it adds extra cost and isn't needed if the matrix already has a > symmetric nonzero structure. > > BTW, if you have symmetric non-zero structure you can just set > -pc_gamg_threshold -1.0', note the "or" in the message. > > If you want to mess with the threshold then you need to use the > symmetrized flag. > -------------- next part -------------- An HTML attachment was scrubbed... 
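A side note on Barry's remark above, since the distinction trips people up: what -pc_gamg_sym_graph compensates for is a non-symmetric *nonzero structure*, not non-symmetric *values*. Checking structural symmetry of one's own operator is easy; the sketch below is illustrative only (dense NumPy arrays, and both small matrices are invented examples — for a real PETSc Mat one would compare the matrix's nonzero pattern with that of its transpose):

```python
import numpy as np

def has_symmetric_nonzero_structure(A, tol=0.0):
    """Return True if nonzero A[i, j] implies nonzero A[j, i]."""
    pattern = np.abs(A) > tol
    return bool(np.array_equal(pattern, pattern.T))

# Numerically unsymmetric but structurally symmetric: GAMG's default
# graph construction is fine, no -pc_gamg_sym_graph needed.
A = np.array([[ 2.0, -1.0,  0.0],
              [-3.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

# One-sided (upwind-style) couplings break structural symmetry,
# which is when the symmetrizing flag (or threshold -1) matters.
B = np.array([[ 2.0,  0.0,  0.0],
              [-1.0,  2.0,  0.0],
              [ 0.0, -1.0,  2.0]])

print(has_symmetric_nonzero_structure(A))  # True
print(has_symmetric_nonzero_structure(B))  # False
```

The first matrix has unequal off-diagonal values yet a symmetric pattern, so no extra work is required; the second would need the flag.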
URL: From mfadams at lbl.gov Thu Apr 6 09:47:04 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 6 Apr 2017 07:47:04 -0700 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> Message-ID: On Thu, Apr 6, 2017 at 7:39 AM, Kong, Fande wrote: > Thanks, Mark and Barry, > > It works pretty wells in terms of the number of linear iterations (using > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am > using the two-level method via "-pc_mg_levels 2". The reason why the compute > time is larger than other preconditioning options is that a matrix free > method is used in the fine level and in my particular problem the function > evaluation is expensive. > > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, but I > do not think I want to make the preconditioning part matrix-free. Do you > guys know how to turn off the matrix-free method for GAMG? You do have an option to use the operator or the preconditioner operator (matrix) for the fine grid smoother, but I thought it uses the PC matrix by default. I don't recall the parameters nor do I see this in the view output. Others should be able to help. From francesco.caimmi at polimi.it Thu Apr 6 10:03:59 2017 From: francesco.caimmi at polimi.it (Francesco Caimmi) Date: Thu, 6 Apr 2017 17:03:59 +0200 Subject: [petsc-users] Understanding DMPlexDistribute overlap In-Reply-To: References: <8575748.yB8pcOHeRQ@pc-fcaimmi> <3596627.diPbWC0zYa@pc-fcaimmi> Message-ID: <3810305.GoBycLGZHu@pc-fcaimmi> Hi Michael, thank you for verifying that the test works. I actually was on `maint`, as per https://www.mcs.anl.gov/petsc/download/index.html. Now that I switched to `master ` everything works as expected, both my test example and the DMPlex test ex12, which I can now build without issue. 
I thought a problem with my setup was unlikely, because I saw the same behaviour on two different machines; but it turned out they had the same setup. Thank you (and Matt) for helping me track down the source of the problem. Best, Francesco On giovedì 6 aprile 2017 11:40:27 CEST Michael Lange wrote: > Hi Francesco, > > Ok, I can confirm that your test runs fine for me with the latest master > branch. I'm attaching the log for two processes up to overlap 7, where > the entire mesh is effectively replicated on each partition. The command > I ran was: > > for OL in 1 2 3 4 5 6 7; do mpiexec -np 2 python overlay-test.py -o $OL; > done > > Looks like this might be a problem with your local build. Can you please > try and update to the latest master? > > Hope this helps, > > Michael > > On 06/04/17 10:25, Francesco Caimmi wrote: > > Dear Matt, > > > > thanks for your reply. > > > > On mercoledì 5 aprile 2017 21:09:43 CEST Matthew Knepley wrote: > >> On Wed, Apr 5, 2017 at 6:03 AM, Francesco Caimmi > >> > > >>> [...] > >>> the program fails with the error message captured in the attached file > >>> error.log. Changing the number of processors does not alter the > >>> behavior. > >>> Note > >>> also that the same holds if I use a mesh generated by > >>> DMPlexCreateBoxMesh. > >> > >> Francesco, I will reproduce your problem, but it may take me a few days. > > > > Thank you very much for your time. > > > >> It is strange since we have tests for overlap > 1 that do use > >> CreateBoxMesh.
For example, > >> > >> cd src/dm/impls/plex/examples/tests > >> make ex12 > >> mpiexec -n 8 -test_partition -overlap 2 -dm_view ::ascii_info_detail > > > > Anyway, there might be something related to my install: for the records, > > if I do > > > > $cd src/dm/impls/plex/examples/tests > > $make ex12 > > > > I get > > > > /home/fcaimmi/Packages/petsc/my-linux-petsc/bin/mpicc ex12.c -o > > ex12 > > > > ex12.c:3:25: fatal error: petscdmplex.h: No such file or directory > > > > #include > > > > ^ > > > > compilation terminated. > > : recipe for target 'ex12' failed > > make: *** [ex12] Error 1 > > > > I get this error I cannot explain ( I am not that much into PETSc build > > system, I mostly use the python bindings), since for example I can > > successfully build ex1.c or ex3.c which of course contain the same include > > statement... > > > > I don't know if this is useful to track the issue, but hope it helps. > > > > Thank you again, > > FC -- Francesco Caimmi Laboratorio di Ingegneria dei Polimeri http://www.chem.polimi.it/polyenglab/ Politecnico di Milano - Dipartimento di Chimica, Materiali e Ingegneria Chimica ?Giulio Natta? P.zza Leonardo da Vinci, 32 I-20133 Milano Tel. +39.02.2399.4711 Fax +39.02.7063.8173 francesco.caimmi at polimi.it Skype: fmglcaimmi (please arrange meetings by e-mail) From jed at jedbrown.org Thu Apr 6 10:52:38 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 06 Apr 2017 09:52:38 -0600 Subject: [petsc-users] examples of DMPlex*FVM methods In-Reply-To: References: <87k270kjtd.fsf@jedbrown.org> <877f2yiy6l.fsf@jedbrown.org> <87zifuhgd5.fsf@jedbrown.org> <87fuhmgs4o.fsf@jedbrown.org> <87mvbtg2tc.fsf@jedbrown.org> <877f2xfypp.fsf@jedbrown.org> Message-ID: <8737dlfs8p.fsf@jedbrown.org> Matthew Knepley writes: > On Thu, Apr 6, 2017 at 8:32 AM, Jed Brown wrote: > >> Matthew Knepley writes: >> > Okay, that makes sense. If I do not have fluxes matching the sources, I >> do >> > not >> > preserve montonicity for an advected field. 
I might need this to machine >> > precision >> > because some other equations cannot tolerate a negative number there. I >> will >> > write this one down. >> > >> > Why do I need it "for a projection in a staggered grid incompressible >> flow >> > problem". >> > This would mean I satisfy (I think) >> > >> > \int_T div p = 0 >> >> Matt Knepley can take the divergence of a scalar. > > > Yes, I forgot the grad. It is crazy here. Same question. It's crazy here too, but as far as I know, I didn't wake up this morning with discretization amnesia. Recall that pressure projection is derived by taking the divergence of the velocity equation and using div u = 0 as an identity. If you want a velocity field that is element-wise divergence-free (as is desirable for any advected field) then you need a compatible pressure space. With staggered FD, that pressure space is the piecewise constants. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From abhyshr at anl.gov Thu Apr 6 13:04:54 2017 From: abhyshr at anl.gov (Abhyankar, Shrirang G.) Date: Thu, 6 Apr 2017 18:04:54 +0000 Subject: [petsc-users] Using same DMPlex for solving two different problems Message-ID: I am solving a time-dependent problem using DMNetwork (uses DMPlex internally) to manage the network. To find the initial conditions, I need to solve a nonlinear problem on the same network but with different number of dofs on the nodes and edges. Question: Can I reuse the same DMNetwork (DMPlex) for solving the two problems. The way I am trying to approach this currently is by creating two PetscSections to be used with Plex. The first one is used for the initial conditions and the second one is for the time-stepping. 
Here's the code I have for(i=vStart; i < vEnd; i++) { /* Two variables at each vertex for the initial condition problem */ ierr = PetscSectionSetDof(dyn->initpflowpsection,i,2);CHKERRQ(ierr); } ierr = PetscSectionSetUp(dyn->initpflowpsection);CHKERRQ(ierr); /* Get the plex dm */ ierr = DMNetworkGetPlex(networkdm,&plexdm);CHKERRQ(ierr); /* Get default sections associated with this plex set for time-stepping */ ierr = DMGetDefaultSection(plexdm,&dyn->defaultsection);CHKERRQ(ierr); ierr = DMGetDefaultGlobalSection(plexdm,&dyn->defaultglobalsection);CHKERRQ(ierr); /* Increase the reference count so that the section does not get destroyed when a new one is set with DMSetDefaultSection */ ierr = PetscObjectReference((PetscObject)dyn->defaultsection);CHKERRQ(ierr); ierr = PetscObjectReference((PetscObject)dyn->defaultglobalsection);CHKERRQ(ierr); /* Set the new section created for initial conditions */ ierr = DMSetDefaultSection(plexdm,dyn->initpflowpsection);CHKERRQ(ierr); ierr = DMGetDefaultGlobalSection(plexdm,&dyn->initpflowpglobsection);CHKERRQ(ierr); Would this work or should I rather use DMPlexCreateSection to create the PetscSection used for initial conditions (dyn->initpflowpsection)? Any other problems that I should be aware of? Has anyone else attempted using the same plex for solving two different problems? Thanks, Shri From jmlarson at mcs.anl.gov Thu Apr 6 13:32:43 2017 From: jmlarson at mcs.anl.gov (Larson, Jeffrey M.) Date: Thu, 6 Apr 2017 18:32:43 +0000 Subject: [petsc-users] Calling PETSc functions from Python using petsc4py Message-ID: <4914053FAC39104BAF8D7532A4EF2073AE1D7DEC@HALAS.anl.gov> Hello, Is there a way to call a function in a PETSc example file from python? Explicitly, I'd like to call EvaluateFunction and EvaluateJacobian (resp. lines 123 and 147) of https://www.mcs.anl.gov/petsc/petsc-current/src/tao/leastsquares/examples/tutorials/chwirut1.c.html from Python. Can petsc4py version 3.7.0 help with this?
Thank you for your help, Jeffrey Larson -------------- next part -------------- An HTML attachment was scrubbed... URL: From fande.kong at inl.gov Thu Apr 6 15:27:59 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Thu, 6 Apr 2017 14:27:59 -0600 Subject: [petsc-users] EPSViewer in SLEPc Message-ID: Hi All, The EPSViewer in SLEPc looks weird. I do not understand the viewer logic. For example there is a piece of code in SLEPc (at line 225 of epsview.c): * if (!ispower) { if (!eps->ds) { ierr = EPSGetDS(eps,&eps->ds);CHKERRQ(ierr); } ierr = DSView(eps->ds,viewer);CHKERRQ(ierr); }* If eps->ds is NULL, why we are going to create a new one? I just want to view this object. If it is NULL, you should just show me that this object is empty. You could print out: ds: null. If a user wants to develop a new EPS solver, and then register the new EPS to SLEPc. But the user does not want to use DS, and DSView will show some error messages: *[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------[0]PETSC ERROR: Object is in wrong state[0]PETSC ERROR: Requested matrix was not created in this DS[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.[0]PETSC ERROR: Petsc Release Version 3.7.5, unknown [0]PETSC ERROR: ../../../moose_test-opt on a arch-darwin-c-debug named FN604208 by kongf Thu Apr 6 14:22:14 2017[0]PETSC ERROR: #1 DSViewMat() line 149 in /slepc/src/sys/classes/ds/interface/dspriv.c[0]PETSC ERROR: #2 DSView_NHEP() line 47 in/slepc/src/sys/classes/ds/impls/nhep/dsnhep.c[0]PETSC ERROR: #3 DSView() line 772 in/slepc/src/sys/classes/ds/interface/dsbasic.c[0]PETSC ERROR: #4 EPSView() line 227 in /slepc/src/eps/interface/epsview.c[0]PETSC ERROR: #5 PetscObjectView() line 106 in/petsc/src/sys/objects/destroy.c[0]PETSC ERROR: #6 PetscObjectViewFromOptions() line 2808 in /petsc/src/sys/objects/options.c[0]PETSC ERROR: #7 EPSSolve() line 159 in /slepc/src/eps/interface/epssolve.c* 
Fande, -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Apr 6 16:39:08 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 6 Apr 2017 16:39:08 -0500 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> Message-ID: <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: > > Thanks, Mark and Barry, > > It works pretty wells in terms of the number of linear iterations (using "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am using the two-level method via "-pc_mg_levels 2". The reason why the compute time is larger than other preconditioning options is that a matrix free method is used in the fine level and in my particular problem the function evaluation is expensive. > > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, but I do not think I want to make the preconditioning part matrix-free. Do you guys know how to turn off the matrix-free method for GAMG? -pc_use_amat false > > Here is the detailed solver: > > SNES Object: 384 MPI processes > type: newtonls > maximum iterations=200, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 > total number of linear solver iterations=20 > total number of function evaluations=166 > norm schedule ALWAYS > SNESLineSearch Object: 384 MPI processes > type: bt > interpolation: cubic > alpha=1.000000e-04 > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=40 > KSP Object: 384 MPI processes > type: gmres > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=100, initial guess is zero > tolerances: relative=0.001, absolute=1e-50, divergence=10000. 
> right preconditioning > using UNPRECONDITIONED norm type for convergence test > PC Object: 384 MPI processes > type: gamg > MG: type is MULTIPLICATIVE, levels=2 cycles=v > Cycles per PCApply=1 > Using Galerkin computed coarse grid matrices > GAMG specific options > Threshold for dropping small values from graph 0. > AGG specific options > Symmetric graph true > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 384 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 384 MPI processes > type: bjacobi > block Jacobi: number of blocks = 384 > Local solve is same for all blocks, in the following KSP and PC objects: > KSP Object: (mg_coarse_sub_) 1 MPI processes > type: preonly > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_sub_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > matrix ordering: nd > factor fill ratio given 5., needed 1.31367 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=37, cols=37 > package used to perform factorization: petsc > total: nonzeros=913, allocated nonzeros=913 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=37, cols=37 > total: nonzeros=695, allocated nonzeros=695 > total number of mallocs used during MatSetValues calls =0 > not using I-node routines > linear system matrix = precond matrix: > Mat Object: 384 MPI processes > type: mpiaij > rows=18145, cols=18145 > total: nonzeros=1709115, allocated nonzeros=1709115 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Down solver (pre-smoother) on level 1 ------------------------------- > KSP Object: (mg_levels_1_) 384 MPI processes > type: chebyshev > Chebyshev: eigenvalue estimates: min = 0.133339, max = 1.46673 > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > KSP Object: (mg_levels_1_esteig_) 384 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10, initial guess is zero > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > maximum iterations=2 > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using nonzero initial guess > using NONE norm type for convergence test > PC Object: (mg_levels_1_) 384 MPI processes > type: sor > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > linear system matrix followed by preconditioner matrix: > Mat Object: 384 MPI processes > type: mffd > rows=3020875, cols=3020875 > Matrix-free approximation: > err=1.49012e-08 (relative error in function evaluation) > Using wp compute h routine > Does not compute normU > Mat Object: () 384 MPI processes > type: mpiaij > rows=3020875, cols=3020875 > total: nonzeros=215671710, allocated nonzeros=241731750 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > Up solver (post-smoother) same as down solver (pre-smoother) > linear system matrix followed by preconditioner matrix: > Mat Object: 384 MPI processes > type: mffd > rows=3020875, cols=3020875 > Matrix-free approximation: > err=1.49012e-08 (relative error in function evaluation) > Using wp compute h routine > Does not compute normU > Mat Object: () 384 MPI processes > type: mpiaij > rows=3020875, cols=3020875 > total: nonzeros=215671710, allocated nonzeros=241731750 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > > > Fande, > > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote: > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith wrote: > > > >> Does this mean that GAMG works for the symmetrical matrix only? > > > > No, it means that for non symmetric nonzero structure you need the extra flag. So use the extra flag. The reason we don't always use the flag is because it adds extra cost and isn't needed if the matrix already has a symmetric nonzero structure. > > BTW, if you have symmetric non-zero structure you can just set > -pc_gamg_threshold -1.0', note the "or" in the message. > > If you want to mess with the threshold then you need to use the > symmetrized flag. 
> From jed at jedbrown.org Thu Apr 6 18:33:59 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 06 Apr 2017 17:33:59 -0600 Subject: [petsc-users] Calling PETSc functions from Python using petsc4py In-Reply-To: <4914053FAC39104BAF8D7532A4EF2073AE1D7DEC@HALAS.anl.gov> References: <4914053FAC39104BAF8D7532A4EF2073AE1D7DEC@HALAS.anl.gov> Message-ID: <87o9w9dsbc.fsf@jedbrown.org> No, these are not part of the PETSc library so they would need to be compiled and called separately (you can do that, but it isn't part of petsc4py). "Larson, Jeffrey M." writes: > Hello, > > Is there a way to call a function in a PETSc example file from python? > > Explicitly, I'd like to call EvaluateFunction and EvaluateJacobian (resp. lines 123 and 147) of https://www.mcs.anl.gov/petsc/petsc-current/src/tao/leastsquares/examples/tutorials/chwirut1.c.html > from Python. > > Can petsc4py version 3.7.0 help with this? > > Thank you for your help, > Jeffrey Larson -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Apr 6 18:50:06 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 6 Apr 2017 18:50:06 -0500 Subject: [petsc-users] Calling PETSc functions from Python using petsc4py In-Reply-To: <87o9w9dsbc.fsf@jedbrown.org> References: <4914053FAC39104BAF8D7532A4EF2073AE1D7DEC@HALAS.anl.gov> <87o9w9dsbc.fsf@jedbrown.org> Message-ID: <559AFC99-49D2-42B9-B3B5-00C9DB1F8437@mcs.anl.gov> > On Apr 6, 2017, at 6:33 PM, Jed Brown wrote: > > No, these are not part of the PETSc library so they would need to be > compiled and called separately (you can do that, but it isn't part of > petsc4py). Jeff, See demo/wrap-cypython; I believe this example does pretty much exactly what you need, only use SNES instead of Tao. Barry > > "Larson, Jeffrey M." writes: > >> Hello, >> >> Is there a way to call a function in a PETSc example file from python? 
>> >> Explicitly, I'd like to call EvaluateFunction and EvaluateJacobian (resp. lines 123 and 147) of https://www.mcs.anl.gov/petsc/petsc-current/src/tao/leastsquares/examples/tutorials/chwirut1.c.html >> from Python. >> >> Can petsc4py version 3.7.0 help with this? >> >> Thank you for your help, >> Jeffrey Larson From francescomigliorini93 at gmail.com Fri Apr 7 05:11:12 2017 From: francescomigliorini93 at gmail.com (Francesco Migliorini) Date: Fri, 7 Apr 2017 12:11:12 +0200 Subject: [petsc-users] Using MatShell without MatMult Message-ID: Hello, I need to solve a linear system using GMRES without creating explicitly the matrix because very large. So, I am trying to use the MatShell strategy but I am stucked. The problem is that it seems to me that inside the user-defined MyMatMult it is required to use MatMult and this would honestly vanish all the gain from using this strategy. Indeed, I would need to access directly to the entries of the input vector, multiply them by some parameters imported in MyMatMult with *common* and finally compose the output vector without creating any matrix. First of all, is it possible? Secondly, if so, where is my mistake? Here's an example of my code with a very simple 10x10 system with the identity matrix: [...] 
call PetscInitialize(PETSC_NULL_CHARACTER,perr)
ind(1) = 10
call VecCreate(PETSC_COMM_WORLD,feP,perr)
call VecSetSizes(feP,PETSC_DECIDE,ind,perr)
call VecSetFromOptions(feP,perr)
call VecDuplicate(feP,u1P,perr)
do jt = 1,10
  ind(1) = jt-1
  fval(1) = jt
  call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr)
enddo
call VecAssemblyBegin(feP,perr)
call VecAssemblyEnd(feP,perr)
ind(1) = 10
call MatCreateShell(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, ind, ind, PETSC_NULL_INTEGER, TheShellMatrix, perr)
call MatShellSetOperation(TheShellMatrix, MATOP_MULT, MyMatMult, perr)
call KSPCreate(PETSC_COMM_WORLD, ksp, perr)
call KSPSetType(ksp,KSPGMRES,perr)
call KSPSetOperators(ksp,TheShellMatrix,TheShellMatrix,perr)
call KSPSolve(ksp,feP,u1P,perr)
call PetscFinalize(PETSC_NULL_CHARACTER,perr)
[...]

subroutine MyMatMult(TheShellMatrix,T,AT,ierr)
[...]
Vec T, AT
Mat TheShellMatrix
PetscReal fval(1), u0(1)
[...]
call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
ind(1) = 10
call VecCreate(PETSC_COMM_WORLD,AT,ierr)
call VecSetSizes(AT,PETSC_DECIDE,ind,ierr)
call VecSetFromOptions(AT,ierr)
do i = 0,9
  ind(1) = i
  call VecGetValues(T,1,ind,u0(1),ierr)
  fval(1) = u0(1)
  call VecSetValues(AT,1,ind,fval(1),INSERT_VALUES,ierr)
enddo
call VecAssemblyBegin(AT,ierr)
call VecAssemblyEnd(AT,ierr)
return
end subroutine MyMatMult

The output of this code is something completely invented but in some way related to the actual solution:
5.0964719143762542E-002
0.10192943828752508
0.15289415743128765
0.20385887657505017
0.25482359571881275
0.30578831486257529
0.35675303400633784
0.40771775315010034
0.45868247229386289
0.50964719143762549

Instead, if I use MatMult in MyMatMult I get the right solution. Here's the code:

subroutine MyMatMult(TheShellMatrix,T,AT,ierr)
[...]
Vec T, AT
Mat TheShellMatrix, IDEN
PetscReal fval(1)
[...]
call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
ind(1) = 10
call MatCreate(PETSC_COMM_WORLD,IDEN,ierr)
call MatSetSizes(IDEN,PETSC_DECIDE,PETSC_DECIDE,ind,ind,ierr)
call MatSetUp(IDEN,ierr)
do i = 0,9
  ind(1) = i
  fval(1) = 1
  call MatSetValues(IDEN,1,ind,1,ind,fval(1),INSERT_VALUES,ierr)
enddo
call MatAssemblyBegin(IDEN,MAT_FINAL_ASSEMBLY,ierr)
call MatAssemblyEnd(IDEN,MAT_FINAL_ASSEMBLY,ierr)
call MatMult(IDEN,T,AT,ierr)
return
end subroutine MyMatMult

Thanks in advance for any answer!
Francesco Migliorini
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jroman at dsic.upv.es Fri Apr 7 05:41:44 2017
From: jroman at dsic.upv.es (Jose E. Roman)
Date: Fri, 7 Apr 2017 12:41:44 +0200
Subject: [petsc-users] EPSViewer in SLEPc
In-Reply-To: 
References: 
Message-ID: <889F2A82-ECD1-441C-A859-35A16243B147@dsic.upv.es>

I have pushed a commit that should avoid this problem.
Jose

> On 6 Apr 2017, at 22:27, Kong, Fande wrote:
>
> Hi All,
>
> The EPSViewer in SLEPc looks weird. I do not understand the viewer logic. For example, there is a piece of code in SLEPc (at line 225 of epsview.c):
>
> if (!ispower) {
>   if (!eps->ds) { ierr = EPSGetDS(eps,&eps->ds);CHKERRQ(ierr); }
>   ierr = DSView(eps->ds,viewer);CHKERRQ(ierr);
> }
>
> If eps->ds is NULL, why are we going to create a new one? I just want to view this object. If it is NULL, you should just show me that this object is empty. You could print out: ds: null.
>
> Suppose a user develops a new EPS solver and then registers it with SLEPc, but does not want to use DS; then DSView will show error messages such as:
>
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Object is in wrong state
> [0]PETSC ERROR: Requested matrix was not created in this DS
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.7.5, unknown > [0]PETSC ERROR: ../../../moose_test-opt on a arch-darwin-c-debug named FN604208 by kongf Thu Apr 6 14:22:14 2017 > [0]PETSC ERROR: #1 DSViewMat() line 149 in /slepc/src/sys/classes/ds/interface/dspriv.c > [0]PETSC ERROR: #2 DSView_NHEP() line 47 in/slepc/src/sys/classes/ds/impls/nhep/dsnhep.c > [0]PETSC ERROR: #3 DSView() line 772 in/slepc/src/sys/classes/ds/interface/dsbasic.c > [0]PETSC ERROR: #4 EPSView() line 227 in /slepc/src/eps/interface/epsview.c > [0]PETSC ERROR: #5 PetscObjectView() line 106 in/petsc/src/sys/objects/destroy.c > [0]PETSC ERROR: #6 PetscObjectViewFromOptions() line 2808 in /petsc/src/sys/objects/options.c > [0]PETSC ERROR: #7 EPSSolve() line 159 in /slepc/src/eps/interface/epssolve.c > > > > Fande, From knepley at gmail.com Fri Apr 7 06:23:59 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 7 Apr 2017 06:23:59 -0500 Subject: [petsc-users] Using MatShell without MatMult In-Reply-To: References: Message-ID: On Fri, Apr 7, 2017 at 5:11 AM, Francesco Migliorini < francescomigliorini93 at gmail.com> wrote: > Hello, > > I need to solve a linear system using GMRES without creating explicitly > the matrix because very large. So, I am trying to use the MatShell strategy > but I am stucked. The problem is that it seems to me that inside the > user-defined MyMatMult it is required to use MatMult and this would > honestly vanish all the gain from using this strategy. Indeed, I would need > to access directly to the entries of the input vector, multiply them by > some parameters imported in MyMatMult with *common* and finally compose > the output vector without creating any matrix. First of all, is it > possible? > Yes. > Secondly, if so, where is my mistake? Here's an example of my code with a > very simple 10x10 system with the identity matrix: > > [...] 
> call PetscInitialize(PETSC_NULL_CHARACTER,perr) > ind(1) = 10 > call VecCreate(PETSC_COMM_WORLD,feP,perr) > call VecSetSizes(feP,PETSC_DECIDE,ind,perr) > call VecSetFromOptions(feP,perr) > call VecDuplicate(feP,u1P,perr) > do jt = 1,10 > ind(1) = jt-1 > fval(1) = jt > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) > enddo > call VecAssemblyBegin(feP,perr) > call VecAssemblyEnd(feP,perr) > ind(1) = 10 > call MatCreateShell(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, ind, > ind, PETSC_NULL_INTEGER, TheShellMatrix, perr) > call MatShellSetOperation(TheShellMatrix, MATOP_MULT, MyMatMult, perr) > Here I would probably use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatShellSetContext.html instead of a common block, but that works too. > call KSPCreate(PETSC_COMM_WORLD, ksp, perr) > call KSPSetType(ksp,KSPGMRES,perr) > call KSPSetOperators(ksp,TheShellMatrix,TheShellMatrix,perr) > call KSPSolve(ksp,feP,u1P,perr) > call PetscFinalize(PETSC_NULL_CHARACTER,perr) > [...] > > subroutine MyMatMult(TheShellMatrix,T,AT,ierr) > [...] > Vec T, AT > Mat TheShellMatrix > PetscReal fval(1), u0(1) > [...] > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > ind(1) = 10 > call VecCreate(PETSC_COMM_WORLD,AT,ierr) > call VecSetSizes(AT,PETSC_DECIDE,ind,ierr) > call VecSetFromOptions(AT,ierr) > Its not your job to create AT. We are passing it in, so just use it. > do i =0,9 > ind(1) = i > call VecGetValues(T,1,ind,u0(1),ierr) > fval(1) = u0(1) > call VecSetValues(AT,1,ind,fval(1),INSERT_VALUES,ierr) > You can do it this way, but its easier to use http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetArray.html outside the loop for both vectors. 
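Matt's two suggestions boil down to treating the shell multiply as a plain function y = A(x): read the entries of the input vector, write directly into the caller-supplied output vector, and never build a matrix object. A minimal sketch of that matrix-free pattern in plain C (no PETSc here; the tridiagonal stencil is a hypothetical stand-in for the user's *common*-block operator):

```c
#include <stddef.h>

/* Matrix-free "mult": computes y = A*x for a 1-D Laplacian-like stencil
 * without ever storing A. x is read-only and y is supplied by the caller,
 * just like the T and AT vectors handed to a MatShell MATOP_MULT. */
void my_mat_mult(size_t n, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = 2.0 * x[i];                 /* diagonal entry, applied on the fly */
        if (i > 0)     y[i] -= x[i - 1];   /* sub-diagonal */
        if (i + 1 < n) y[i] -= x[i + 1];   /* super-diagonal */
    }
}
```

Inside the Fortran callback the shape is the same: obtain array access to T and to the AT that was passed in, fill AT in a loop, restore the arrays, and do no VecCreate and no PetscInitialize inside the callback.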
Matt > enddo > call VecAssemblyBegin(AT,ierr) > call VecAssemblyEnd(AT,ierr) > return > end subroutine MyMatMult > > The output of this code is something completely invented but in some way > related to the actual solution: > 5.0964719143762542E-002 > 0.10192943828752508 > 0.15289415743128765 > 0.20385887657505017 > 0.25482359571881275 > 0.30578831486257529 > 0.35675303400633784 > 0.40771775315010034 > 0.45868247229386289 > 0.50964719143762549 > > Instead, if I use MatMult in MyMatMult I get the right solution. Here's > the code > > subroutine MyMatMult(TheShellMatrix,T,AT,ierr) > [...] > Vec T, AT > Mat TheShellMatrix, IDEN > PetscReal fval(1) > [...] > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > ind(1) = 10 > call MatCreate(PETSC_COMM_WORLD,IDEN,ierr) > call MatSetSizes(IDEN,PETSC_DECIDE,PETSC_DECIDE,ind,ind,ierr) > call MatSetUp(IDEN,ierr) > do i =0,9 > ind(1) = i > fval(1) = 1 > call MatSetValues(IDEN,1,ind,1,ind,fval(1),INSERT_VALUES,ierr) > enddo > call MatAssemblyBegin(IDEN,MAT_FINAL_ASSEMBLY,ierr) > call MatAssemblyEnd(IDEN,MAT_FINAL_ASSEMBLY,ierr) > call MatMult(IDEN,T,AT,ierr) > return > end subroutine MyMatMult > > Thanks in advance for any answer! > Francesco Migliorini > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Apr 7 06:25:25 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 7 Apr 2017 06:25:25 -0500 Subject: [petsc-users] Using same DMPlex for solving two different problems In-Reply-To: References: Message-ID: On Thu, Apr 6, 2017 at 1:04 PM, Abhyankar, Shrirang G. wrote: > I am solving a time-dependent problem using DMNetwork (uses DMPlex > internally) to manage the network. 
To find the initial conditions, I need > to solve a nonlinear problem on the same network but with different number > of dofs on the nodes and edges. > > Question: Can I reuse the same DMNetwork (DMPlex) for solving the two > problems? The way I am trying to approach this currently is by creating > two PetscSections to be used with Plex. The first one is used for the > initial conditions and the second one is for the time-stepping. Here's the > code I have > This is the right way to do it, but it's easier if you call DMClone() first to get a new DM with the same Plex, and then change its default Section and solve your problem with it. Matt > > for(i=vStart; i < vEnd; i++) { > /* Two variables at each vertex for the initial condition problem */ > ierr = PetscSectionSetDof(dyn->initpflowpsection,i,2);CHKERRQ(ierr); > } > ierr = PetscSectionSetUp(dyn->initpflowpsection);CHKERRQ(ierr); > > /* Get the plex dm */ > ierr = DMNetworkGetPlex(networkdm,&plexdm);CHKERRQ(ierr); > > /* Get default sections associated with this plex set for time-stepping > */ > ierr = DMGetDefaultSection(plexdm,&dyn->defaultsection);CHKERRQ(ierr); > ierr = > DMGetDefaultGlobalSection(plexdm,&dyn->defaultglobalsection);CHKERRQ( > ierr); > > /* Increase the reference count so that the section does not get destroyed > when a new one is set with DMSetDefaultSection */ > ierr = > PetscObjectReference((PetscObject)dyn->defaultsection);CHKERRQ(ierr); > ierr = > PetscObjectReference((PetscObject)dyn->defaultglobalsection);CHKERRQ( > ierr); > > > /* Set the new section created for initial conditions */ > ierr = DMSetDefaultSection(plexdm,dyn->initpflowpsection);CHKERRQ(ierr); > ierr = > DMGetDefaultGlobalSection(plexdm,&dyn->initpflowpglobsection); > CHKERRQ(ierr) > ; > > > > Would this work or should I rather use DMPlexCreateSection to create the > PetscSection used for initial conditions (dyn->initpflowpsection)? Any > other problems that I should be aware of?
Has anyone else attempted using > the same plex for solving two different problems? > > Thanks, > Shri > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhyshr at anl.gov Fri Apr 7 06:32:27 2017 From: abhyshr at anl.gov (Abhyankar, Shrirang G.) Date: Fri, 7 Apr 2017 11:32:27 +0000 Subject: [petsc-users] Using same DMPlex for solving two different problems In-Reply-To: References: , Message-ID: On Apr 7, 2017, at 6:25 AM, Matthew Knepley > wrote: On Thu, Apr 6, 2017 at 1:04 PM, Abhyankar, Shrirang G. > wrote: I am solving a time-dependent problem using DMNetwork (uses DMPlex internally) to manage the network. To find the initial conditions, I need to solve a nonlinear problem on the same network but with different number of dofs on the nodes and edges. Question: Can I reuse the same DMNetwork (DMPlex) for solving the two problems. The way I am trying to approach this currently is by creating two PetscSections to be used with Plex. The first one is used for the initial conditions and the second one is for the time-stepping. Here?s the code I have This is the right way to do it, but its easier if you call DMClone() first to get a new DM with the same Plex, and then change its default Section and solve your problem with it. Thanks Matt. Doesn't DMClone() make a shallow copy for Plex? So any section set for the cloned Plex will also be set for the original one? 
Matt for(i=vStart; i < vEnd; i++) { /* Two variables at each vertex for the initial condition problem */ ierr = PetscSectionSetDof(dyn->initpflowpsection,i,2);CHKERRQ(ierr); } ierr = PetscSectionSetUp(dyn->initpflowpsection);CHKERRQ(ierr); /* Get the plex dm */ ierr = DMNetworkGetPlex(networkdm,&plexdm);CHKERRQ(ierr); /* Get default sections associated with this plex set for time-stepping */ ierr = DMGetDefaultSection(plexdm,&dyn->defaultsection);CHKERRQ(ierr); ierr = DMGetDefaultGlobalSection(plexdm,&dyn->defaultglobalsection);CHKERRQ(ierr); /* Increase the reference count so that the section does not get destroyed when a new one is set with DMSetDefaultSection */ ierr = PetscObjectReference((PetscObject)dyn->defaultsection);CHKERRQ(ierr); ierr = PetscObjectReference((PetscObject)dyn->defaultglobalsection);CHKERRQ(ierr); /* Set the new section created for initial conditions */ ierr = DMSetDefaultSection(plexdm,dyn->initpflowpsection);CHKERRQ(ierr); ierr = DMGetDefaultGlobalSection(plexdm,&dyn->initpflowpglobsection);CHKERRQ(ierr) ; Would this work or should I rather use DMPlexCreateSection to create the PetscSection used for initial conditions (dyn->initpflowpsection)? Any other problems that I should be aware of? Has anyone else attempted using the same plex for solving two different problems? Thanks, Shri -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Apr 7 06:35:21 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 7 Apr 2017 06:35:21 -0500 Subject: [petsc-users] Using same DMPlex for solving two different problems In-Reply-To: References: Message-ID: On Fri, Apr 7, 2017 at 6:32 AM, Abhyankar, Shrirang G. 
wrote: > > On Apr 7, 2017, at 6:25 AM, Matthew Knepley wrote: > > On Thu, Apr 6, 2017 at 1:04 PM, Abhyankar, Shrirang G. > wrote: > >> I am solving a time-dependent problem using DMNetwork (uses DMPlex >> internally) to manage the network. To find the initial conditions, I need >> to solve a nonlinear problem on the same network but with different number >> of dofs on the nodes and edges. >> >> Question: Can I reuse the same DMNetwork (DMPlex) for solving the two >> problems. The way I am trying to approach this currently is by creating >> two PetscSections to be used with Plex. The first one is used for the >> initial conditions and the second one is for the time-stepping. Here's the >> code I have >> > > This is the right way to do it, but it's easier if you call DMClone() first > to get a new > DM with the same Plex, and then change its default Section and solve your > problem > with it. > > > Thanks Matt. Doesn't DMClone() make a shallow copy for Plex? So any > section set for the cloned Plex will also be set for the original one? > The Plex is just a reference, but it's the implementation. The DM itself is copied, so the Section is independent. That is the point.
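The ownership rule Matt states, namely that a clone shares one reference-counted Plex but gets its own Section slot, can be mimicked in a few lines of plain C. This is a hypothetical miniature of the pattern, not PETSc's actual structs:

```c
#include <stddef.h>

typedef struct { int refct; /* mesh topology would live here */ } Plex;

typedef struct {
    Plex *plex;    /* shared implementation, held by reference */
    int  *section; /* per-DM metadata, NOT shared by clones    */
} DM;

/* Clone: bump the reference count on the shared Plex and start with an
 * empty section, mirroring how DMClone() shares the mesh but not the
 * default Section. */
DM dm_clone(DM *dm)
{
    DM c;
    dm->plex->refct++;
    c.plex    = dm->plex;
    c.section = NULL;
    return c;
}
```

Setting a section on the clone therefore never touches the original DM, which is why the initial-condition solve and the time-stepping solve can coexist on one mesh.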
Thanks, Matt > Matt > > >> >> for(i=vStart; i < vEnd; i++) { >> /* Two variables at each vertex for the initial condition problem */ >> ierr = PetscSectionSetDof(dyn->initpflowpsection,i,2);CHKERRQ(ierr); >> } >> ierr = PetscSectionSetUp(dyn->initpflowpsection);CHKERRQ(ierr); >> >> /* Get the plex dm */ >> ierr = DMNetworkGetPlex(networkdm,&plexdm);CHKERRQ(ierr); >> >> /* Get default sections associated with this plex set for time-stepping >> */ >> ierr = DMGetDefaultSection(plexdm,&dyn->defaultsection);CHKERRQ(ierr); >> ierr = >> DMGetDefaultGlobalSection(plexdm,&dyn->defaultglobalsection) >> ;CHKERRQ(ierr); >> >> /* Increase the reference count so that the section does not get destroyed >> when a new one is set with DMSetDefaultSection */ >> ierr = >> PetscObjectReference((PetscObject)dyn->defaultsection);CHKERRQ(ierr); >> ierr = >> PetscObjectReference((PetscObject)dyn->defaultglobalsection) >> ;CHKERRQ(ierr); >> >> >> /* Set the new section created for initial conditions */ >> ierr = DMSetDefaultSection(plexdm,dyn->initpflowpsection);CHKERRQ( >> ierr); >> ierr = >> DMGetDefaultGlobalSection(plexdm,&dyn->initpflowpglobsection >> );CHKERRQ(ierr) >> ; >> >> >> >> Would this work or should I rather use DMPlexCreateSection to create the >> PetscSection used for initial conditions (dyn->initpflowpsection)? Any >> other problems that I should be aware of? Has anyone else attempted using >> the same plex for solving two different problems? >> >> Thanks, >> Shri >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From mailinglists at xgm.de Fri Apr 7 06:40:10 2017
From: mailinglists at xgm.de (Florian Lindner)
Date: Fri, 7 Apr 2017 13:40:10 +0200
Subject: [petsc-users] Symmetric matrix: Setting entries below diagonal
Message-ID: <495dc5fd-402a-bb30-05f6-e5913392dd23@xgm.de>

Hello,

two questions about symmetric (MATSBAIJ) matrices:

+ Entries set with MatSetValue below the main diagonal are ignored. Is that by design? I rather expected setting A_ij to have the same effect as setting A_ji.

+ Does MatSetOption with MAT_SYMMETRIC and MAT_SYMMETRY_ETERNAL give any gain on MATSBAIJ matrices?

Thanks,
Florian

Test program:

#include "petscmat.h"
#include "petscviewer.h"

int main(int argc, char **argv)
{
  PetscInitialize(&argc, &argv, "", NULL);
  PetscErrorCode ierr = 0;

  Mat A;
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  MatSetType(A, MATSBAIJ); CHKERRQ(ierr);
  ierr = MatSetSizes(A, 4, 4, PETSC_DECIDE, PETSC_DECIDE); CHKERRQ(ierr);
  ierr = MatSetUp(A); CHKERRQ(ierr);
  ierr = MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); CHKERRQ(ierr);
  ierr = MatSetOption(A, MAT_SYMMETRY_ETERNAL, PETSC_TRUE); CHKERRQ(ierr);

  // Stored
  ierr = MatSetValue(A, 1, 2, 21, INSERT_VALUES); CHKERRQ(ierr);
  ierr = MatSetValue(A, 1, 1, 11, INSERT_VALUES); CHKERRQ(ierr);

  // Ignored
  ierr = MatSetValue(A, 2, 1, 22, INSERT_VALUES); CHKERRQ(ierr);
  ierr = MatSetValue(A, 3, 2, 32, INSERT_VALUES); CHKERRQ(ierr);
  ierr = MatSetValue(A, 3, 1, 31, INSERT_VALUES); CHKERRQ(ierr);

  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  PetscViewer viewer;
  ierr = PetscViewerCreate(PETSC_COMM_WORLD, &viewer); CHKERRQ(ierr);
  ierr = PetscViewerSetType(viewer, PETSCVIEWERASCII); CHKERRQ(ierr);
  ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_ASCII_DENSE); CHKERRQ(ierr);
  ierr = MatView(A, viewer); CHKERRQ(ierr);
  ierr = PetscViewerPopFormat(viewer); CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr);

  PetscFinalize();
  return 0;
}

From dave.mayhem23 at 
gmail.com Fri Apr 7 06:55:49 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Fri, 07 Apr 2017 11:55:49 +0000 Subject: [petsc-users] Using MatShell without MatMult In-Reply-To: References: Message-ID: You should also not call PetscInitialize() from within your user MatMult function. On Fri, 7 Apr 2017 at 13:24, Matthew Knepley wrote: > On Fri, Apr 7, 2017 at 5:11 AM, Francesco Migliorini < > francescomigliorini93 at gmail.com> wrote: > > Hello, > > I need to solve a linear system using GMRES without creating explicitly > the matrix because very large. So, I am trying to use the MatShell strategy > but I am stucked. The problem is that it seems to me that inside the > user-defined MyMatMult it is required to use MatMult and this would > honestly vanish all the gain from using this strategy. Indeed, I would need > to access directly to the entries of the input vector, multiply them by > some parameters imported in MyMatMult with *common* and finally compose > the output vector without creating any matrix. First of all, is it > possible? > > > Yes. > > > Secondly, if so, where is my mistake? Here's an example of my code with a > very simple 10x10 system with the identity matrix: > > [...] > call PetscInitialize(PETSC_NULL_CHARACTER,perr) > ind(1) = 10 > call VecCreate(PETSC_COMM_WORLD,feP,perr) > call VecSetSizes(feP,PETSC_DECIDE,ind,perr) > call VecSetFromOptions(feP,perr) > call VecDuplicate(feP,u1P,perr) > do jt = 1,10 > ind(1) = jt-1 > fval(1) = jt > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) > enddo > call VecAssemblyBegin(feP,perr) > call VecAssemblyEnd(feP,perr) > ind(1) = 10 > call MatCreateShell(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, ind, > ind, PETSC_NULL_INTEGER, TheShellMatrix, perr) > call MatShellSetOperation(TheShellMatrix, MATOP_MULT, MyMatMult, perr) > > > Here I would probably use > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatShellSetContext.html > > instead of a common block, but that works too. 
> > > call KSPCreate(PETSC_COMM_WORLD, ksp, perr) > call KSPSetType(ksp,KSPGMRES,perr) > call KSPSetOperators(ksp,TheShellMatrix,TheShellMatrix,perr) > call KSPSolve(ksp,feP,u1P,perr) > call PetscFinalize(PETSC_NULL_CHARACTER,perr) > [...] > > subroutine MyMatMult(TheShellMatrix,T,AT,ierr) > [...] > Vec T, AT > Mat TheShellMatrix > PetscReal fval(1), u0(1) > [...] > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > ind(1) = 10 > call VecCreate(PETSC_COMM_WORLD,AT,ierr) > call VecSetSizes(AT,PETSC_DECIDE,ind,ierr) > call VecSetFromOptions(AT,ierr) > > > Its not your job to create AT. We are passing it in, so just use it. > > > do i =0,9 > ind(1) = i > call VecGetValues(T,1,ind,u0(1),ierr) > fval(1) = u0(1) > call VecSetValues(AT,1,ind,fval(1),INSERT_VALUES,ierr) > > > You can do it this way, but its easier to use > > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetArray.html > > outside the loop for both vectors. > > Matt > > > enddo > call VecAssemblyBegin(AT,ierr) > call VecAssemblyEnd(AT,ierr) > return > end subroutine MyMatMult > > The output of this code is something completely invented but in some way > related to the actual solution: > 5.0964719143762542E-002 > 0.10192943828752508 > 0.15289415743128765 > 0.20385887657505017 > 0.25482359571881275 > 0.30578831486257529 > 0.35675303400633784 > 0.40771775315010034 > 0.45868247229386289 > 0.50964719143762549 > > Instead, if I use MatMult in MyMatMult I get the right solution. Here's > the code > > subroutine MyMatMult(TheShellMatrix,T,AT,ierr) > [...] > Vec T, AT > Mat TheShellMatrix, IDEN > PetscReal fval(1) > [...] 
> call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > ind(1) = 10 > call MatCreate(PETSC_COMM_WORLD,IDEN,ierr) > call MatSetSizes(IDEN,PETSC_DECIDE,PETSC_DECIDE,ind,ind,ierr) > call MatSetUp(IDEN,ierr) > do i =0,9 > ind(1) = i > fval(1) = 1 > call MatSetValues(IDEN,1,ind,1,ind,fval(1),INSERT_VALUES,ierr) > enddo > call MatAssemblyBegin(IDEN,MAT_FINAL_ASSEMBLY,ierr) > call MatAssemblyEnd(IDEN,MAT_FINAL_ASSEMBLY,ierr) > call MatMult(IDEN,T,AT,ierr) > return > end subroutine MyMatMult > > Thanks in advance for any answer! > Francesco Migliorini > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fande.kong at inl.gov Fri Apr 7 09:31:15 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Fri, 7 Apr 2017 08:31:15 -0600 Subject: [petsc-users] EPSViewer in SLEPc In-Reply-To: <889F2A82-ECD1-441C-A859-35A16243B147@dsic.upv.es> References: <889F2A82-ECD1-441C-A859-35A16243B147@dsic.upv.es> Message-ID: Thanks, Jose Fande, On Fri, Apr 7, 2017 at 4:41 AM, Jose E. Roman wrote: > I have pushed a commit that should avoid this problem. > Jose > > > El 6 abr 2017, a las 22:27, Kong, Fande escribi?: > > > > Hi All, > > > > The EPSViewer in SLEPc looks weird. I do not understand the viewer > logic. For example there is a piece of code in SLEPc (at line 225 of > epsview.c): > > > > if (!ispower) { > > if (!eps->ds) { ierr = EPSGetDS(eps,&eps->ds);CHKERRQ(ierr); } > > ierr = DSView(eps->ds,viewer);CHKERRQ(ierr); > > } > > > > > > If eps->ds is NULL, why we are going to create a new one? I just want to > view this object. If it is NULL, you should just show me that this object > is empty. You could print out: ds: null. > > > > If a user wants to develop a new EPS solver, and then register the new > EPS to SLEPc. 
But the user does not want to use DS, and DSView will show > some error messages: > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Object is in wrong state > > [0]PETSC ERROR: Requested matrix was not created in this DS > > [0]PETSC ERROR: See https://urldefense.proofpoint. > com/v2/url?u=http-3A__www.mcs.anl.gov_petsc_documentation_ > faq.html&d=DwIFaQ&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB_ > _aEkJFOKJFd00&r=DUUt3SRGI0_JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m= > RUH2LlACLIVsE06Hdki8z27uIfsiU8hQJ2mN6Lxo628&s= > T1QKhCMs9EnX64WJhlZd0wRvwQB0W6aeVSiC6R02Gag&e= for trouble shooting. > > [0]PETSC ERROR: Petsc Release Version 3.7.5, unknown > > [0]PETSC ERROR: ../../../moose_test-opt on a arch-darwin-c-debug named > FN604208 by kongf Thu Apr 6 14:22:14 2017 > > [0]PETSC ERROR: #1 DSViewMat() line 149 in /slepc/src/sys/classes/ds/ > interface/dspriv.c > > [0]PETSC ERROR: #2 DSView_NHEP() line 47 in/slepc/src/sys/classes/ds/ > impls/nhep/dsnhep.c > > [0]PETSC ERROR: #3 DSView() line 772 in/slepc/src/sys/classes/ds/ > interface/dsbasic.c > > [0]PETSC ERROR: #4 EPSView() line 227 in /slepc/src/eps/interface/ > epsview.c > > [0]PETSC ERROR: #5 PetscObjectView() line 106 in/petsc/src/sys/objects/ > destroy.c > > [0]PETSC ERROR: #6 PetscObjectViewFromOptions() line 2808 in > /petsc/src/sys/objects/options.c > > [0]PETSC ERROR: #7 EPSSolve() line 159 in /slepc/src/eps/interface/ > epssolve.c > > > > > > > > Fande, > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsmith at mcs.anl.gov Fri Apr 7 10:19:52 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Apr 2017 10:19:52 -0500 Subject: [petsc-users] Symmetric matrix: Setting entries below diagonal In-Reply-To: <495dc5fd-402a-bb30-05f6-e5913392dd23@xgm.de> References: <495dc5fd-402a-bb30-05f6-e5913392dd23@xgm.de> Message-ID: > On Apr 7, 2017, at 6:40 AM, Florian Lindner wrote: > > Hello, > > two questions about symmetric (MATSBAIJ) matrices. > > + Entries set with MatSetValue below the main diagonal are ignored. Is that by design? Yes > I rather expected setting A_ij to > have the same effect as setting A_ji. You need to check the relationship between i and j and swap them if needed before the call. > > + Has MatSetOption to MAT_SYMMETRIC and MAT_SYMMETRIC_ETERNAL any gain on MATSBAIJ matrices? No > > Thanks, > Florian > > Test programm: > > > #include "petscmat.h" > #include "petscviewer.h" > > int main(int argc, char **argv) > { > PetscInitialize(&argc, &argv, "", NULL); > PetscErrorCode ierr = 0; > > Mat A; > ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); > MatSetType(A, MATSBAIJ); CHKERRQ(ierr); > ierr = MatSetSizes(A, 4, 4, PETSC_DECIDE, PETSC_DECIDE); CHKERRQ(ierr); > ierr = MatSetUp(A); CHKERRQ(ierr); > ierr = MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); CHKERRQ(ierr); > ierr = MatSetOption(A, MAT_SYMMETRY_ETERNAL, PETSC_TRUE); CHKERRQ(ierr); > > // Stored > ierr = MatSetValue(A, 1, 2, 21, INSERT_VALUES); CHKERRQ(ierr); > ierr = MatSetValue(A, 1, 1, 11, INSERT_VALUES); CHKERRQ(ierr); > > // Ignored > ierr = MatSetValue(A, 2, 1, 22, INSERT_VALUES); CHKERRQ(ierr); > ierr = MatSetValue(A, 3, 2, 32, INSERT_VALUES); CHKERRQ(ierr); > ierr = MatSetValue(A, 3, 1, 31, INSERT_VALUES); CHKERRQ(ierr); > > ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > PetscViewer viewer; > ierr = PetscViewerCreate(PETSC_COMM_WORLD, &viewer); CHKERRQ(ierr); > ierr = 
PetscViewerSetType(viewer, PETSCVIEWERASCII); CHKERRQ(ierr); > ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_ASCII_DENSE); CHKERRQ(ierr); > ierr = MatView(A, viewer); CHKERRQ(ierr); > ierr = PetscViewerPopFormat(viewer); CHKERRQ(ierr); > ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr); > > PetscFinalize(); > return 0; > } From ibarletta at inogs.it Fri Apr 7 11:23:40 2017 From: ibarletta at inogs.it (Barletta, Ivano) Date: Fri, 7 Apr 2017 18:23:40 +0200 Subject: [petsc-users] Symmetric matrix: Setting entries below diagonal In-Reply-To: References: <495dc5fd-402a-bb30-05f6-e5913392dd23@xgm.de> Message-ID: So, as far as I understand, the only benefit of PETSc with symmetric matrices is only when Matrix values are set, by reducing the overhead of MatSetValue calls? Thanks, Ivano 2017-04-07 17:19 GMT+02:00 Barry Smith : > > > On Apr 7, 2017, at 6:40 AM, Florian Lindner wrote: > > > > Hello, > > > > two questions about symmetric (MATSBAIJ) matrices. > > > > + Entries set with MatSetValue below the main diagonal are ignored. Is > that by design? > > Yes > > > I rather expected setting A_ij to > > have the same effect as setting A_ji. > > You need to check the relationship between i and j and swap them if > needed before the call. > > > > > + Has MatSetOption to MAT_SYMMETRIC and MAT_SYMMETRIC_ETERNAL any gain > on MATSBAIJ matrices? 
> > No > > > > > Thanks, > > Florian > > > > Test programm: > > > > > > #include "petscmat.h" > > #include "petscviewer.h" > > > > int main(int argc, char **argv) > > { > > PetscInitialize(&argc, &argv, "", NULL); > > PetscErrorCode ierr = 0; > > > > Mat A; > > ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); > > MatSetType(A, MATSBAIJ); CHKERRQ(ierr); > > ierr = MatSetSizes(A, 4, 4, PETSC_DECIDE, PETSC_DECIDE); CHKERRQ(ierr); > > ierr = MatSetUp(A); CHKERRQ(ierr); > > ierr = MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); CHKERRQ(ierr); > > ierr = MatSetOption(A, MAT_SYMMETRY_ETERNAL, PETSC_TRUE); CHKERRQ(ierr); > > > > // Stored > > ierr = MatSetValue(A, 1, 2, 21, INSERT_VALUES); CHKERRQ(ierr); > > ierr = MatSetValue(A, 1, 1, 11, INSERT_VALUES); CHKERRQ(ierr); > > > > // Ignored > > ierr = MatSetValue(A, 2, 1, 22, INSERT_VALUES); CHKERRQ(ierr); > > ierr = MatSetValue(A, 3, 2, 32, INSERT_VALUES); CHKERRQ(ierr); > > ierr = MatSetValue(A, 3, 1, 31, INSERT_VALUES); CHKERRQ(ierr); > > > > ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > > > PetscViewer viewer; > > ierr = PetscViewerCreate(PETSC_COMM_WORLD, &viewer); CHKERRQ(ierr); > > ierr = PetscViewerSetType(viewer, PETSCVIEWERASCII); CHKERRQ(ierr); > > ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_ASCII_DENSE); > CHKERRQ(ierr); > > ierr = MatView(A, viewer); CHKERRQ(ierr); > > ierr = PetscViewerPopFormat(viewer); CHKERRQ(ierr); > > ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr); > > > > PetscFinalize(); > > return 0; > > } > > -------------- next part -------------- An HTML attachment was scrubbed... 
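Barry's swap-the-indices advice is exactly what symmetric packed storage does internally: an entry requested at (j,i) below the diagonal is routed to the slot for (i,j), and only the upper triangle is ever stored. A plain-C miniature of the idea (row-major packed upper triangle; a simplification, not the real blocked SBAIJ layout):

```c
/* Index into a packed upper-triangular array holding an n x n symmetric
 * matrix: n*(n+1)/2 values instead of n*n. A request for (j,i) below the
 * diagonal lands in the same slot as (i,j), which is the swap Barry
 * describes doing by hand before MatSetValue. */
static int sym_idx(int n, int i, int j)
{
    if (i > j) { int t = i; i = j; j = t; } /* ensure i <= j */
    return i * n - i * (i + 1) / 2 + j;
}
```

For n = 4 this stores 10 values instead of 16, and the saving approaches a factor of two as n grows, which is where the memory benefit of the symmetric format comes from.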
URL: From bsmith at mcs.anl.gov Fri Apr 7 11:27:35 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Apr 2017 11:27:35 -0500 Subject: [petsc-users] Symmetric matrix: Setting entries below diagonal In-Reply-To: References: <495dc5fd-402a-bb30-05f6-e5913392dd23@xgm.de> Message-ID: <5EE2915D-6481-4C18-B9AA-A571C3EA950F@mcs.anl.gov> If you want to set all values in the matrix and have the SBAIJ matrix ignore those below the diagonal you can use MatSetOption(mat,MAT_IGNORE_LOWER_TRIANGULAR,PETSC_TRUE); or the options database -mat_ignore_lower_triangular This is useful when you have a symmetric matrix but you want to switch between using AIJ and SBAIJ format without changing anything in the code. Barry > On Apr 7, 2017, at 10:19 AM, Barry Smith wrote: > > >> On Apr 7, 2017, at 6:40 AM, Florian Lindner wrote: >> >> Hello, >> >> two questions about symmetric (MATSBAIJ) matrices. >> >> + Entries set with MatSetValue below the main diagonal are ignored. Is that by design? > > Yes > >> I rather expected setting A_ij to >> have the same effect as setting A_ji. > > You need to check the relationship between i and j and swap them if needed before the call. > >> >> + Has MatSetOption to MAT_SYMMETRIC and MAT_SYMMETRIC_ETERNAL any gain on MATSBAIJ matrices? 
> > No > >> >> Thanks, >> Florian >> >> Test programm: >> >> >> #include "petscmat.h" >> #include "petscviewer.h" >> >> int main(int argc, char **argv) >> { >> PetscInitialize(&argc, &argv, "", NULL); >> PetscErrorCode ierr = 0; >> >> Mat A; >> ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); >> MatSetType(A, MATSBAIJ); CHKERRQ(ierr); >> ierr = MatSetSizes(A, 4, 4, PETSC_DECIDE, PETSC_DECIDE); CHKERRQ(ierr); >> ierr = MatSetUp(A); CHKERRQ(ierr); >> ierr = MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); CHKERRQ(ierr); >> ierr = MatSetOption(A, MAT_SYMMETRY_ETERNAL, PETSC_TRUE); CHKERRQ(ierr); >> >> // Stored >> ierr = MatSetValue(A, 1, 2, 21, INSERT_VALUES); CHKERRQ(ierr); >> ierr = MatSetValue(A, 1, 1, 11, INSERT_VALUES); CHKERRQ(ierr); >> >> // Ignored >> ierr = MatSetValue(A, 2, 1, 22, INSERT_VALUES); CHKERRQ(ierr); >> ierr = MatSetValue(A, 3, 2, 32, INSERT_VALUES); CHKERRQ(ierr); >> ierr = MatSetValue(A, 3, 1, 31, INSERT_VALUES); CHKERRQ(ierr); >> >> ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); >> ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); >> >> PetscViewer viewer; >> ierr = PetscViewerCreate(PETSC_COMM_WORLD, &viewer); CHKERRQ(ierr); >> ierr = PetscViewerSetType(viewer, PETSCVIEWERASCII); CHKERRQ(ierr); >> ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_ASCII_DENSE); CHKERRQ(ierr); >> ierr = MatView(A, viewer); CHKERRQ(ierr); >> ierr = PetscViewerPopFormat(viewer); CHKERRQ(ierr); >> ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr); >> >> PetscFinalize(); >> return 0; >> } > From bsmith at mcs.anl.gov Fri Apr 7 11:30:33 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Apr 2017 11:30:33 -0500 Subject: [petsc-users] Symmetric matrix: Setting entries below diagonal In-Reply-To: References: <495dc5fd-402a-bb30-05f6-e5913392dd23@xgm.de> Message-ID: > On Apr 7, 2017, at 11:23 AM, Barletta, Ivano wrote: > > So, as far as I understand, the only benefit of PETSc with symmetric matrices > is only when Matrix values are 
set, by reducing the overhead of MatSetValue calls? The benefits of using SBAIJ matrices are 1 You don't need to compute or set values below the diagonal 2) the matrix storage requires about 1/2 the memory since the lower diagonal part is not stored If you use AIJ or BAIJ matrices and MatSetOption() to indicate they are symmetric there is, of course no benefit of 1 or 2. Barry > > Thanks, > Ivano > > 2017-04-07 17:19 GMT+02:00 Barry Smith : > > > On Apr 7, 2017, at 6:40 AM, Florian Lindner wrote: > > > > Hello, > > > > two questions about symmetric (MATSBAIJ) matrices. > > > > + Entries set with MatSetValue below the main diagonal are ignored. Is that by design? > > Yes > > > I rather expected setting A_ij to > > have the same effect as setting A_ji. > > You need to check the relationship between i and j and swap them if needed before the call. > > > > > + Has MatSetOption to MAT_SYMMETRIC and MAT_SYMMETRIC_ETERNAL any gain on MATSBAIJ matrices? > > No > > > > > Thanks, > > Florian > > > > Test programm: > > > > > > #include "petscmat.h" > > #include "petscviewer.h" > > > > int main(int argc, char **argv) > > { > > PetscInitialize(&argc, &argv, "", NULL); > > PetscErrorCode ierr = 0; > > > > Mat A; > > ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); > > MatSetType(A, MATSBAIJ); CHKERRQ(ierr); > > ierr = MatSetSizes(A, 4, 4, PETSC_DECIDE, PETSC_DECIDE); CHKERRQ(ierr); > > ierr = MatSetUp(A); CHKERRQ(ierr); > > ierr = MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); CHKERRQ(ierr); > > ierr = MatSetOption(A, MAT_SYMMETRY_ETERNAL, PETSC_TRUE); CHKERRQ(ierr); > > > > // Stored > > ierr = MatSetValue(A, 1, 2, 21, INSERT_VALUES); CHKERRQ(ierr); > > ierr = MatSetValue(A, 1, 1, 11, INSERT_VALUES); CHKERRQ(ierr); > > > > // Ignored > > ierr = MatSetValue(A, 2, 1, 22, INSERT_VALUES); CHKERRQ(ierr); > > ierr = MatSetValue(A, 3, 2, 32, INSERT_VALUES); CHKERRQ(ierr); > > ierr = MatSetValue(A, 3, 1, 31, INSERT_VALUES); CHKERRQ(ierr); > > > > ierr = MatAssemblyBegin(A, 
MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); > > > > PetscViewer viewer; > > ierr = PetscViewerCreate(PETSC_COMM_WORLD, &viewer); CHKERRQ(ierr); > > ierr = PetscViewerSetType(viewer, PETSCVIEWERASCII); CHKERRQ(ierr); > > ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_ASCII_DENSE); CHKERRQ(ierr); > > ierr = MatView(A, viewer); CHKERRQ(ierr); > > ierr = PetscViewerPopFormat(viewer); CHKERRQ(ierr); > > ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr); > > > > PetscFinalize(); > > return 0; > > } > > From knepley at gmail.com Fri Apr 7 12:10:57 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 7 Apr 2017 12:10:57 -0500 Subject: [petsc-users] Symmetric matrix: Setting entries below diagonal In-Reply-To: References: <495dc5fd-402a-bb30-05f6-e5913392dd23@xgm.de> Message-ID: On Fri, Apr 7, 2017 at 11:23 AM, Barletta, Ivano wrote: > So, as far as I understand, the only benefit of PETSc with symmetric > matrices > is only when Matrix values are set, by reducing the overhead of > MatSetValue calls? > It halves the storage. There is a slight advantage from not having to load the lower triangle. Matt > Thanks, > Ivano > > 2017-04-07 17:19 GMT+02:00 Barry Smith : > >> >> > On Apr 7, 2017, at 6:40 AM, Florian Lindner >> wrote: >> > >> > Hello, >> > >> > two questions about symmetric (MATSBAIJ) matrices. >> > >> > + Entries set with MatSetValue below the main diagonal are ignored. Is >> that by design? >> >> Yes >> >> > I rather expected setting A_ij to >> > have the same effect as setting A_ji. >> >> You need to check the relationship between i and j and swap them if >> needed before the call. >> >> > >> > + Has MatSetOption to MAT_SYMMETRIC and MAT_SYMMETRIC_ETERNAL any gain >> on MATSBAIJ matrices? 
>> >> No >> >> > >> > Thanks, >> > Florian >> > >> > Test programm: >> > >> > >> > #include "petscmat.h" >> > #include "petscviewer.h" >> > >> > int main(int argc, char **argv) >> > { >> > PetscInitialize(&argc, &argv, "", NULL); >> > PetscErrorCode ierr = 0; >> > >> > Mat A; >> > ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr); >> > MatSetType(A, MATSBAIJ); CHKERRQ(ierr); >> > ierr = MatSetSizes(A, 4, 4, PETSC_DECIDE, PETSC_DECIDE); CHKERRQ(ierr); >> > ierr = MatSetUp(A); CHKERRQ(ierr); >> > ierr = MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE); CHKERRQ(ierr); >> > ierr = MatSetOption(A, MAT_SYMMETRY_ETERNAL, PETSC_TRUE); >> CHKERRQ(ierr); >> > >> > // Stored >> > ierr = MatSetValue(A, 1, 2, 21, INSERT_VALUES); CHKERRQ(ierr); >> > ierr = MatSetValue(A, 1, 1, 11, INSERT_VALUES); CHKERRQ(ierr); >> > >> > // Ignored >> > ierr = MatSetValue(A, 2, 1, 22, INSERT_VALUES); CHKERRQ(ierr); >> > ierr = MatSetValue(A, 3, 2, 32, INSERT_VALUES); CHKERRQ(ierr); >> > ierr = MatSetValue(A, 3, 1, 31, INSERT_VALUES); CHKERRQ(ierr); >> > >> > ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); >> > ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); >> > >> > PetscViewer viewer; >> > ierr = PetscViewerCreate(PETSC_COMM_WORLD, &viewer); CHKERRQ(ierr); >> > ierr = PetscViewerSetType(viewer, PETSCVIEWERASCII); CHKERRQ(ierr); >> > ierr = PetscViewerPushFormat(viewer, PETSC_VIEWER_ASCII_DENSE); >> CHKERRQ(ierr); >> > ierr = MatView(A, viewer); CHKERRQ(ierr); >> > ierr = PetscViewerPopFormat(viewer); CHKERRQ(ierr); >> > ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr); >> > >> > PetscFinalize(); >> > return 0; >> > } >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bhatiamanav at gmail.com Fri Apr 7 13:46:13 2017 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Fri, 7 Apr 2017 13:46:13 -0500 Subject: [petsc-users] odd behavior when using lapack's dgeev with petsc Message-ID: <9ABBA7EA-63AC-40BD-BA47-64D77E6B0471@gmail.com> Hi, I have compiled PETSc on my Ubuntu machine (also Mac OS 10.12 separately) to link to the system lapack and blas libraries (shown below). I have created an interface class to dgeev in lapack to calculate the eigenvalues of a matrix. My application code links to multiple libraries: libMesh, petsc, slepc, hdf5, etc. If I test my interface inside this application code, I get junk results. However, on the same machine, if I use the interface in a separate main() function without linking to any of the libraries except lapack and blas, then I get expected results. Also, this problem does not show up on Mac. I am not sure what could be causing this and don't quite know where to start. Could PETSc have anything to do with this? Any insight would be greatly appreciated. 
Regards, Manav manav at manav1:~/test$ ldd /opt/local/lib/libpetsc.so linux-vdso.so.1 => (0x00007fff3e7a8000) libsuperlu_dist.so.5 => /opt/local/lib/libsuperlu_dist.so.5 (0x00007f721fbd1000) libparmetis.so => /opt/local/lib/libparmetis.so (0x00007f721f990000) libmetis.so => /opt/local/lib/libmetis.so (0x00007f721f718000) libsuperlu.so.5 => /opt/local/lib/libsuperlu.so.5 (0x00007f721f4a7000) libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f721f124000) liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f721e92c000) libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f721e6bd000) libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f721e382000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f721e079000) libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x00007f721de20000) libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f721daf4000) libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f721d8f0000) libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f721d61a000) libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f721d403000) libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f721d1e6000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f721ce1d000) /lib64/ld-linux-x86-64.so.2 (0x000055d739f1b000) libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f721cbcf000) libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00007f721c932000) libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f721c6f2000) libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007f721c4e3000) libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00007f721c269000) libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f721c064000) libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f721be5e000) librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f721bc56000) libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f721ba52000) libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 
(0x00007f721b818000) libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f721b60c000) libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f721b402000) -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Fri Apr 7 14:40:29 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Apr 2017 14:40:29 -0500 Subject: [petsc-users] odd behavior when using lapack's dgeev with petsc In-Reply-To: <9ABBA7EA-63AC-40BD-BA47-64D77E6B0471@gmail.com> References: <9ABBA7EA-63AC-40BD-BA47-64D77E6B0471@gmail.com> Message-ID: <74FE8F6B-C424-4E87-A3FD-29C4F1213236@mcs.anl.gov> > On Apr 7, 2017, at 1:46 PM, Manav Bhatia wrote: > > Hi, > > I have compile petsc on my Ubuntu machine (also Mac OS 10.12 separately) to link to the system lapack and blas libraries (shown below). > > I have created an interface class to dgeev in lapack to calculate the eigenvalues of a matrix. > > My application code links to multiple libraries: libMesh, petsc, slepc, hdf5, etc. > > If I test my interface inside this application code, I get junk results. This is easy to debug because you have a version that works. Run both versions in separate windows each in a debugger and put a break point in the dgeev_ function. When it gets there check that it is the same dgeev_ function in both cases and check that the inputs are the same then step through both to see when things start to change between the two. > > However, on the same machine, if I use the interface in a separate main() function without linking to any of the libraries except lapack and blas, then I get expected results. > > Also, this problem does not show up on Mac. > > I am not sure what could be causing this and don?t quite know where to start. Could Petsc have anything to do with this? > > Any insight would be greatly appreciated. 
> > Regards, > Manav > > manav at manav1:~/test$ ldd /opt/local/lib/libpetsc.so > linux-vdso.so.1 => (0x00007fff3e7a8000) > libsuperlu_dist.so.5 => /opt/local/lib/libsuperlu_dist.so.5 (0x00007f721fbd1000) > libparmetis.so => /opt/local/lib/libparmetis.so (0x00007f721f990000) > libmetis.so => /opt/local/lib/libmetis.so (0x00007f721f718000) > libsuperlu.so.5 => /opt/local/lib/libsuperlu.so.5 (0x00007f721f4a7000) > libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f721f124000) > liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f721e92c000) > libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f721e6bd000) > libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f721e382000) > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f721e079000) > libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x00007f721de20000) > libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f721daf4000) > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f721d8f0000) > libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f721d61a000) > libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f721d403000) > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f721d1e6000) > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f721ce1d000) > /lib64/ld-linux-x86-64.so.2 (0x000055d739f1b000) > libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f721cbcf000) > libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00007f721c932000) > libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f721c6f2000) > libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007f721c4e3000) > libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00007f721c269000) > libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f721c064000) > libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f721be5e000) > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f721bc56000) > libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 
(0x00007f721ba52000) > libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f721b818000) > libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f721b60c000) > libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f721b402000) > From bhatiamanav at gmail.com Fri Apr 7 14:57:13 2017 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Fri, 7 Apr 2017 14:57:13 -0500 Subject: [petsc-users] odd behavior when using lapack's dgeev with petsc In-Reply-To: <74FE8F6B-C424-4E87-A3FD-29C4F1213236@mcs.anl.gov> References: <9ABBA7EA-63AC-40BD-BA47-64D77E6B0471@gmail.com> <74FE8F6B-C424-4E87-A3FD-29C4F1213236@mcs.anl.gov> Message-ID: Hi Barry, Thanks for the inputs. I did try that, but the debugger (gdb) stepped right over the dgeev_ call, without getting inside the function. I am wondering if this has anything to do with the fact that the system lapack library might not have any debugging info in it. Thoughts? Regards, Manav > On Apr 7, 2017, at 2:40 PM, Barry Smith wrote: > > >> On Apr 7, 2017, at 1:46 PM, Manav Bhatia wrote: >> >> Hi, >> >> I have compile petsc on my Ubuntu machine (also Mac OS 10.12 separately) to link to the system lapack and blas libraries (shown below). >> >> I have created an interface class to dgeev in lapack to calculate the eigenvalues of a matrix. >> >> My application code links to multiple libraries: libMesh, petsc, slepc, hdf5, etc. >> >> If I test my interface inside this application code, I get junk results. > > This is easy to debug because you have a version that works. > > Run both versions in separate windows each in a debugger and put a break point in the dgeev_ function. When it gets there check that it is the same dgeev_ function in both cases and check that the inputs are the same then step through both to see when things start to change between the two. 
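Barry's recipe quoted above can be sketched as a gdb session. This transcript is illustrative, not from the thread: because the system LAPACK here carries no debug symbols, the arguments must be read from registers, and the register names assume the x86-64 System V calling convention. dgeev_'s leading arguments are jobvl, jobvr, and n, all passed by reference (Fortran convention).

```gdb
(gdb) break dgeev_
(gdb) run
# at the breakpoint, confirm which shared library this dgeev_ lives in
(gdb) info symbol $pc
# Fortran passes by reference; on x86-64 SysV the first three pointer
# arguments (jobvl, jobvr, n) arrive in rdi, rsi, rdx
(gdb) p *(char *) $rdi
(gdb) p *(char *) $rsi
(gdb) p *(int *)  $rdx
```

Repeating this in both the working and the failing run, as Barry suggests, shows whether the two resolve dgeev_ to the same library and receive the same inputs.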
> >> >> However, on the same machine, if I use the interface in a separate main() function without linking to any of the libraries except lapack and blas, then I get expected results. >> >> Also, this problem does not show up on Mac. >> >> I am not sure what could be causing this and don?t quite know where to start. Could Petsc have anything to do with this? >> >> Any insight would be greatly appreciated. >> >> Regards, >> Manav >> >> manav at manav1:~/test$ ldd /opt/local/lib/libpetsc.so >> linux-vdso.so.1 => (0x00007fff3e7a8000) >> libsuperlu_dist.so.5 => /opt/local/lib/libsuperlu_dist.so.5 (0x00007f721fbd1000) >> libparmetis.so => /opt/local/lib/libparmetis.so (0x00007f721f990000) >> libmetis.so => /opt/local/lib/libmetis.so (0x00007f721f718000) >> libsuperlu.so.5 => /opt/local/lib/libsuperlu.so.5 (0x00007f721f4a7000) >> libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f721f124000) >> liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f721e92c000) >> libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f721e6bd000) >> libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f721e382000) >> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f721e079000) >> libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x00007f721de20000) >> libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f721daf4000) >> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f721d8f0000) >> libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f721d61a000) >> libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f721d403000) >> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f721d1e6000) >> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f721ce1d000) >> /lib64/ld-linux-x86-64.so.2 (0x000055d739f1b000) >> libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f721cbcf000) >> libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00007f721c932000) >> libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f721c6f2000) 
>> libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007f721c4e3000) >> libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00007f721c269000) >> libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f721c064000) >> libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f721be5e000) >> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f721bc56000) >> libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f721ba52000) >> libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f721b818000) >> libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f721b60c000) >> libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f721b402000) >> > From bsmith at mcs.anl.gov Fri Apr 7 15:22:17 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Apr 2017 15:22:17 -0500 Subject: [petsc-users] odd behavior when using lapack's dgeev with petsc In-Reply-To: References: <9ABBA7EA-63AC-40BD-BA47-64D77E6B0471@gmail.com> <74FE8F6B-C424-4E87-A3FD-29C4F1213236@mcs.anl.gov> Message-ID: > On Apr 7, 2017, at 2:57 PM, Manav Bhatia wrote: > > Hi Barry, > > Thanks for the inputs. > > I did try that, but the debugger (gdb) stepped right over the dgeev_ call, without getting inside the function. Did it at least stop at the function so you do an up and print all the arguments passed in? > > I am wondering if this has anything to do with the fact that the system lapack library might not have any debugging info in it. Yeah I forgot it might not have them. Barry > > Thoughts? > > Regards, > Manav > >> On Apr 7, 2017, at 2:40 PM, Barry Smith wrote: >> >> >>> On Apr 7, 2017, at 1:46 PM, Manav Bhatia wrote: >>> >>> Hi, >>> >>> I have compile petsc on my Ubuntu machine (also Mac OS 10.12 separately) to link to the system lapack and blas libraries (shown below). >>> >>> I have created an interface class to dgeev in lapack to calculate the eigenvalues of a matrix. >>> >>> My application code links to multiple libraries: libMesh, petsc, slepc, hdf5, etc. 
>>> >>> If I test my interface inside this application code, I get junk results. >> >> This is easy to debug because you have a version that works. >> >> Run both versions in separate windows each in a debugger and put a break point in the dgeev_ function. When it gets there check that it is the same dgeev_ function in both cases and check that the inputs are the same then step through both to see when things start to change between the two. >> >>> >>> However, on the same machine, if I use the interface in a separate main() function without linking to any of the libraries except lapack and blas, then I get expected results. >>> >>> Also, this problem does not show up on Mac. >>> >>> I am not sure what could be causing this and don?t quite know where to start. Could Petsc have anything to do with this? >>> >>> Any insight would be greatly appreciated. >>> >>> Regards, >>> Manav >>> >>> manav at manav1:~/test$ ldd /opt/local/lib/libpetsc.so >>> linux-vdso.so.1 => (0x00007fff3e7a8000) >>> libsuperlu_dist.so.5 => /opt/local/lib/libsuperlu_dist.so.5 (0x00007f721fbd1000) >>> libparmetis.so => /opt/local/lib/libparmetis.so (0x00007f721f990000) >>> libmetis.so => /opt/local/lib/libmetis.so (0x00007f721f718000) >>> libsuperlu.so.5 => /opt/local/lib/libsuperlu.so.5 (0x00007f721f4a7000) >>> libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f721f124000) >>> liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f721e92c000) >>> libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f721e6bd000) >>> libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f721e382000) >>> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f721e079000) >>> libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x00007f721de20000) >>> libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f721daf4000) >>> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f721d8f0000) >>> libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f721d61a000) >>> libgcc_s.so.1 => 
/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f721d403000) >>> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f721d1e6000) >>> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f721ce1d000) >>> /lib64/ld-linux-x86-64.so.2 (0x000055d739f1b000) >>> libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f721cbcf000) >>> libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00007f721c932000) >>> libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f721c6f2000) >>> libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007f721c4e3000) >>> libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00007f721c269000) >>> libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f721c064000) >>> libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f721be5e000) >>> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f721bc56000) >>> libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f721ba52000) >>> libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f721b818000) >>> libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f721b60c000) >>> libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f721b402000) >>> >> > From bhatiamanav at gmail.com Fri Apr 7 15:34:11 2017 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Fri, 7 Apr 2017 15:34:11 -0500 Subject: [petsc-users] odd behavior when using lapack's dgeev with petsc In-Reply-To: References: <9ABBA7EA-63AC-40BD-BA47-64D77E6B0471@gmail.com> <74FE8F6B-C424-4E87-A3FD-29C4F1213236@mcs.anl.gov> Message-ID: Yes, I printed the data in both cases and they look the same. I also used 'set step-mode on' to show the system lapack info, and they both are using the same lapack routine. This is still baffling me. -Manav > On Apr 7, 2017, at 3:22 PM, Barry Smith wrote: > > >> On Apr 7, 2017, at 2:57 PM, Manav Bhatia wrote: >> >> Hi Barry, >> >> Thanks for the inputs. 
>> >> I did try that, but the debugger (gdb) stepped right over the dgeev_ call, without getting inside the function. > > Did it at least stop at the function so you do an up and print all the arguments passed in? > >> >> I am wondering if this has anything to do with the fact that the system lapack library might not have any debugging info in it. > > Yeah I forgot it might not have them. > > Barry > >> >> Thoughts? >> >> Regards, >> Manav >> >>> On Apr 7, 2017, at 2:40 PM, Barry Smith wrote: >>> >>> >>>> On Apr 7, 2017, at 1:46 PM, Manav Bhatia wrote: >>>> >>>> Hi, >>>> >>>> I have compile petsc on my Ubuntu machine (also Mac OS 10.12 separately) to link to the system lapack and blas libraries (shown below). >>>> >>>> I have created an interface class to dgeev in lapack to calculate the eigenvalues of a matrix. >>>> >>>> My application code links to multiple libraries: libMesh, petsc, slepc, hdf5, etc. >>>> >>>> If I test my interface inside this application code, I get junk results. >>> >>> This is easy to debug because you have a version that works. >>> >>> Run both versions in separate windows each in a debugger and put a break point in the dgeev_ function. When it gets there check that it is the same dgeev_ function in both cases and check that the inputs are the same then step through both to see when things start to change between the two. >>> >>>> >>>> However, on the same machine, if I use the interface in a separate main() function without linking to any of the libraries except lapack and blas, then I get expected results. >>>> >>>> Also, this problem does not show up on Mac. >>>> >>>> I am not sure what could be causing this and don?t quite know where to start. Could Petsc have anything to do with this? >>>> >>>> Any insight would be greatly appreciated. 
>>>> >>>> Regards, >>>> Manav >>>> >>>> manav at manav1:~/test$ ldd /opt/local/lib/libpetsc.so >>>> linux-vdso.so.1 => (0x00007fff3e7a8000) >>>> libsuperlu_dist.so.5 => /opt/local/lib/libsuperlu_dist.so.5 (0x00007f721fbd1000) >>>> libparmetis.so => /opt/local/lib/libparmetis.so (0x00007f721f990000) >>>> libmetis.so => /opt/local/lib/libmetis.so (0x00007f721f718000) >>>> libsuperlu.so.5 => /opt/local/lib/libsuperlu.so.5 (0x00007f721f4a7000) >>>> libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f721f124000) >>>> liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f721e92c000) >>>> libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f721e6bd000) >>>> libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f721e382000) >>>> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f721e079000) >>>> libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x00007f721de20000) >>>> libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f721daf4000) >>>> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f721d8f0000) >>>> libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f721d61a000) >>>> libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f721d403000) >>>> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f721d1e6000) >>>> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f721ce1d000) >>>> /lib64/ld-linux-x86-64.so.2 (0x000055d739f1b000) >>>> libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f721cbcf000) >>>> libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00007f721c932000) >>>> libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f721c6f2000) >>>> libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007f721c4e3000) >>>> libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00007f721c269000) >>>> libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f721c064000) >>>> libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f721be5e000) >>>> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 
(0x00007f721bc56000) >>>> libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f721ba52000) >>>> libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f721b818000) >>>> libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f721b60c000) >>>> libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f721b402000) >>>> >>> >> > From fande.kong at inl.gov Fri Apr 7 16:29:47 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Fri, 7 Apr 2017 15:29:47 -0600 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> Message-ID: Thanks, Barry. It works. GAMG is three times better than ASM in terms of the number of linear iterations, but it is five times slower than ASM. Any suggestions to improve the performance of GAMG? Log files are attached. Fande, On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith wrote: > > > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: > > > > Thanks, Mark and Barry, > > > > It works pretty well in terms of the number of linear iterations (using > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am > using the two-level method via "-pc_mg_levels 2". The reason why the > compute time is larger than other preconditioning options is that a > matrix-free method is used on the fine level and in my particular problem the > function evaluation is expensive. > > > > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, > but I do not think I want to make the preconditioning part matrix-free. Do > you guys know how to turn off the matrix-free method for GAMG? 
> > -pc_use_amat false > > > > > Here is the detailed solver: > > > > SNES Object: 384 MPI processes > > type: newtonls > > maximum iterations=200, maximum function evaluations=10000 > > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 > > total number of linear solver iterations=20 > > total number of function evaluations=166 > > norm schedule ALWAYS > > SNESLineSearch Object: 384 MPI processes > > type: bt > > interpolation: cubic > > alpha=1.000000e-04 > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > lambda=1.000000e-08 > > maximum iterations=40 > > KSP Object: 384 MPI processes > > type: gmres > > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > GMRES: happy breakdown tolerance 1e-30 > > maximum iterations=100, initial guess is zero > > tolerances: relative=0.001, absolute=1e-50, divergence=10000. > > right preconditioning > > using UNPRECONDITIONED norm type for convergence test > > PC Object: 384 MPI processes > > type: gamg > > MG: type is MULTIPLICATIVE, levels=2 cycles=v > > Cycles per PCApply=1 > > Using Galerkin computed coarse grid matrices > > GAMG specific options > > Threshold for dropping small values from graph 0. > > AGG specific options > > Symmetric graph true > > Coarse grid solver -- level ------------------------------- > > KSP Object: (mg_coarse_) 384 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> > left preconditioning > > using NONE norm type for convergence test > > PC Object: (mg_coarse_) 384 MPI processes > > type: bjacobi > > block Jacobi: number of blocks = 384 > > Local solve is same for all blocks, in the following KSP and > PC objects: > > KSP Object: (mg_coarse_sub_) 1 MPI processes > > type: preonly > > maximum iterations=1, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (mg_coarse_sub_) 1 MPI processes > > type: lu > > LU: out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > using diagonal shift on blocks to prevent zero pivot > [INBLOCKS] > > matrix ordering: nd > > factor fill ratio given 5., needed 1.31367 > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=37, cols=37 > > package used to perform factorization: petsc > > total: nonzeros=913, allocated nonzeros=913 > > total number of mallocs used during MatSetValues calls > =0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=37, cols=37 > > total: nonzeros=695, allocated nonzeros=695 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 384 MPI processes > > type: mpiaij > > rows=18145, cols=18145 > > total: nonzeros=1709115, allocated nonzeros=1709115 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node (on process 0) routines > > Down solver (pre-smoother) on level 1 ------------------------------ > - > > KSP Object: (mg_levels_1_) 384 MPI processes > > type: chebyshev > > Chebyshev: eigenvalue estimates: min = 0.133339, max = 1.46673 > > Chebyshev: eigenvalues estimated using gmres with > translations [0. 0.1; 0. 
1.1] > > KSP Object: (mg_levels_1_esteig_) 384 MPI > processes > > type: gmres > > GMRES: restart=30, using Classical (unmodified) > Gram-Schmidt Orthogonalization with no iterative refinement > > GMRES: happy breakdown tolerance 1e-30 > > maximum iterations=10, initial guess is zero > > tolerances: relative=1e-12, absolute=1e-50, > divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > maximum iterations=2 > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using nonzero initial guess > > using NONE norm type for convergence test > > PC Object: (mg_levels_1_) 384 MPI processes > > type: sor > > SOR: type = local_symmetric, iterations = 1, local iterations > = 1, omega = 1. > > linear system matrix followed by preconditioner matrix: > > Mat Object: 384 MPI processes > > type: mffd > > rows=3020875, cols=3020875 > > Matrix-free approximation: > > err=1.49012e-08 (relative error in function evaluation) > > Using wp compute h routine > > Does not compute normU > > Mat Object: () 384 MPI processes > > type: mpiaij > > rows=3020875, cols=3020875 > > total: nonzeros=215671710, allocated nonzeros=241731750 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node (on process 0) routines > > Up solver (post-smoother) same as down solver (pre-smoother) > > linear system matrix followed by preconditioner matrix: > > Mat Object: 384 MPI processes > > type: mffd > > rows=3020875, cols=3020875 > > Matrix-free approximation: > > err=1.49012e-08 (relative error in function evaluation) > > Using wp compute h routine > > Does not compute normU > > Mat Object: () 384 MPI processes > > type: mpiaij > > rows=3020875, cols=3020875 > > total: nonzeros=215671710, allocated nonzeros=241731750 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node (on process 0) routines > > > > > > Fande, > > > > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote: > 
> On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith wrote: > > > > > >> Does this mean that GAMG works for symmetric matrices only? > > > > > > No, it means that for a non-symmetric nonzero structure you need the > extra flag. So use the extra flag. The reason we don't always use the flag > is because it adds extra cost and isn't needed if the matrix already has a > symmetric nonzero structure. > > > > BTW, if you have symmetric non-zero structure you can just set > > -pc_gamg_threshold -1.0, note the "or" in the message. > > > > If you want to mess with the threshold then you need to use the > > symmetrized flag. > > > > -------------- next part -------------- Time Step 10, time = 0.1 dt = 0.01 0 Nonlinear |R| = 2.004779e-03 0 Linear |R| = 2.004779e-03 1 Linear |R| = 1.080152e-03 2 Linear |R| = 5.066679e-04 3 Linear |R| = 3.045271e-04 4 Linear |R| = 1.925133e-04 5 Linear |R| = 1.404396e-04 6 Linear |R| = 1.087962e-04 7 Linear |R| = 9.433190e-05 8 Linear |R| = 8.650164e-05 9 Linear |R| = 7.511298e-05 10 Linear |R| = 6.116103e-05 11 Linear |R| = 5.097880e-05 12 Linear |R| = 4.528093e-05 13 Linear |R| = 4.238188e-05 14 Linear |R| = 3.852598e-05 15 Linear |R| = 3.211727e-05 16 Linear |R| = 2.655089e-05 17 Linear |R| = 2.308499e-05 18 Linear |R| = 1.988423e-05 19 Linear |R| = 1.686685e-05 20 Linear |R| = 1.453042e-05 21 Linear |R| = 1.227912e-05 22 Linear |R| = 9.829701e-06 23 Linear |R| = 7.695993e-06 24 Linear |R| = 6.092649e-06 25 Linear |R| = 5.293533e-06 26 Linear |R| = 4.583670e-06 27 Linear |R| = 3.427266e-06 28 Linear |R| = 2.442730e-06 29 Linear |R| = 1.855485e-06 1 Nonlinear |R| = 1.855485e-06 0 Linear |R| = 1.855485e-06 1 Linear |R| = 1.626392e-06 2 Linear |R| = 1.505583e-06 3 Linear |R| = 1.258325e-06 4 Linear |R| = 8.295100e-07 5 Linear |R| = 6.184171e-07 6 Linear |R| = 5.114149e-07 7 Linear |R| = 4.146942e-07 8 Linear |R| = 3.335395e-07 9 Linear |R| = 2.647491e-07 10 Linear |R| 
= 2.099801e-07 11 Linear |R| = 1.774148e-07 12 Linear |R| = 1.508766e-07 13 Linear |R| = 1.214361e-07 14 Linear |R| = 1.009707e-07 15 Linear |R| = 9.148193e-08 16 Linear |R| = 8.608036e-08 17 Linear |R| = 7.997930e-08 18 Linear |R| = 7.004223e-08 19 Linear |R| = 5.671891e-08 20 Linear |R| = 4.909039e-08 21 Linear |R| = 4.690188e-08 22 Linear |R| = 4.309895e-08 23 Linear |R| = 3.325854e-08 24 Linear |R| = 2.375529e-08 25 Linear |R| = 1.690025e-08 26 Linear |R| = 1.237871e-08 27 Linear |R| = 8.720643e-09 28 Linear |R| = 5.961891e-09 29 Linear |R| = 4.283073e-09 30 Linear |R| = 3.126338e-09 31 Linear |R| = 2.185008e-09 32 Linear |R| = 1.411854e-09 2 Nonlinear |R| = 1.411854e-09 SNES Object: 384 MPI processes type: newtonls maximum iterations=200, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 total number of linear solver iterations=61 total number of function evaluations=66 norm schedule ALWAYS SNESLineSearch Object: 384 MPI processes type: bt interpolation: cubic alpha=1.000000e-04 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: 384 MPI processes type: gmres GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=100, initial guess is zero tolerances: relative=0.001, absolute=1e-50, divergence=10000. right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 384 MPI processes type: asm Additive Schwarz: total subdomain blocks = 384, amount of overlap = 1 Additive Schwarz: restriction/interpolation type - RESTRICT Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (sub_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (sub_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=20493, cols=20493 package used to perform factorization: petsc total: nonzeros=1270950, allocated nonzeros=1270950 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: () 1 MPI processes type: seqaij rows=20493, cols=20493 total: nonzeros=1270950, allocated nonzeros=1270950 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix followed by preconditioner matrix: Mat Object: 384 MPI processes type: mffd rows=3020875, cols=3020875 Matrix-free approximation: err=1.49012e-08 (relative error in function evaluation) Using wp compute h routine Does not compute normU Mat Object: () 384 MPI processes type: mpiaij rows=3020875, cols=3020875 total: nonzeros=215671710, allocated nonzeros=241731750 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Solve Converged! 
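The advice quoted in the exchange above (use the graph-symmetrization flag when the nonzero structure is non-symmetric, or disable the drop threshold when it is symmetric) can be written out as explicit run-time options. This is only a sketch: the executable name, process count, and input file are placeholders, not taken from the runs shown here.

```shell
# Non-symmetric nonzero structure: tell GAMG to symmetrize the graph
# before coarsening (app and input.i are hypothetical placeholders).
mpiexec -n 384 ./app -i input.i \
    -snes_mf_operator \
    -pc_type gamg \
    -pc_gamg_sym_graph true

# Symmetric nonzero structure: skip symmetrization and instead disable
# threshold-based dropping of small graph entries.
mpiexec -n 384 ./app -i input.i \
    -snes_mf_operator \
    -pc_type gamg \
    -pc_gamg_threshold -1.0
```

As the thread notes, the second form avoids the extra cost of symmetrizing the graph, which is only needed when the matrix structure is not already symmetric.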
---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- /home/kongf/workhome/projects/yak/yak-opt on a arch-linux2-c-opt named r4i0n1 with 384 processors, by kongf Tue Mar 14 16:28:04 2017 Using Petsc Release Version 3.7.5, unknown Max Max/Min Avg Total Time (sec): 4.387e+02 1.00001 4.387e+02 Objects: 1.279e+03 1.00000 1.279e+03 Flops: 4.230e+09 1.99161 2.946e+09 1.131e+12 Flops/sec: 9.642e+06 1.99162 6.716e+06 2.579e+09 MPI Messages: 2.935e+05 4.95428 1.810e+05 6.951e+07 MPI Message Lengths: 3.105e+09 3.16103 1.072e+04 7.449e+11 MPI Reductions: 5.022e+04 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.3875e+02 100.0% 1.1314e+12 100.0% 6.951e+07 100.0% 1.072e+04 100.0% 5.022e+04 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 20 1.0 3.2134e-03 2.4 4.53e+05 2.3 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 37601 VecMDot 839 1.0 6.7209e-01 1.2 3.52e+08 2.3 0.0e+00 0.0e+00 8.4e+02 0 8 0 0 2 0 8 0 0 2 139634 VecNorm 1802 1.0 6.7932e+00 2.5 4.08e+07 2.3 0.0e+00 0.0e+00 1.8e+03 1 1 0 0 4 1 1 0 0 4 1603 VecScale 3877 1.0 1.0508e-01 1.4 1.34e+08 1.3 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 439546 VecCopy 4153 1.0 7.2803e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 5493 1.0 5.1735e-01 6.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 5365 1.0 4.0282e-01 2.3 3.01e+08 1.4 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 251646 VecWAXPY 884 1.0 5.5227e-02 3.5 1.97e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 95341 VecMAXPY 864 1.0 1.7126e-01 2.6 3.71e+08 2.3 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 577621 VecAssemblyBegin 15491 1.0 1.3738e+02 3.0 0.00e+00 0.0 8.9e+06 1.8e+04 4.6e+04 28 0 13 22 93 28 0 13 22 93 0 VecAssemblyEnd 15491 1.0 7.9072e-0128.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecScatterBegin 13390 1.0 2.5097e+00 3.6 0.00e+00 0.0 5.9e+07 8.4e+03 2.8e+01 0 0 85 67 0 0 0 85 67 0 0 VecScatterEnd 13362 1.0 5.7428e+00 7.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 VecReduceArith 55 1.0 1.2808e-03 2.2 1.25e+06 
2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 259431 VecReduceComm 25 1.0 5.5003e-02 4.7 0.00e+00 0.0 0.0e+00 0.0e+00 2.5e+01 0 0 0 0 0 0 0 0 0 0 0 VecNormalize 864 1.0 4.4664e+00 3.5 2.93e+07 2.3 0.0e+00 0.0e+00 8.6e+02 1 1 0 0 2 1 1 0 0 2 1753 MatMult MF 859 1.0 3.1339e+02 1.0 4.12e+08 1.4 5.7e+07 9.6e+03 4.2e+04 71 12 81 73 83 71 12 81 73 83 439 MatMult 859 1.0 3.1340e+02 1.0 4.12e+08 1.4 5.7e+07 9.6e+03 4.2e+04 71 12 81 73 83 71 12 81 73 83 439 MatSolve 864 1.0 2.1255e+00 2.0 1.83e+09 2.1 0.0e+00 0.0e+00 0.0e+00 0 43 0 0 0 0 43 0 0 0 226791 MatLUFactorNum 25 1.0 1.0920e+00 2.4 1.20e+09 2.5 0.0e+00 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 267745 MatILUFactorSym 13 1.0 1.0606e-01 5.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatAssemblyBegin 150 1.0 2.0643e+00 1.2 0.00e+00 0.0 1.7e+05 1.7e+05 2.0e+02 0 0 0 4 0 0 0 0 4 0 0 MatAssemblyEnd 150 1.0 4.3198e+00 1.1 0.00e+00 0.0 1.9e+04 1.1e+03 2.1e+02 1 0 0 0 0 1 0 0 0 0 0 MatGetRowIJ 13 1.0 1.3113e-0513.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetSubMatrice 25 1.0 4.4022e+00 2.8 0.00e+00 0.0 5.9e+05 8.4e+04 7.5e+01 1 0 1 7 0 1 0 1 7 0 0 MatGetOrdering 13 1.0 1.7283e-0217.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatIncreaseOvrlp 13 1.0 2.0244e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.3e+01 0 0 0 0 0 0 0 0 0 0 0 MatZeroEntries 29 1.0 5.0908e-02 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatView 52 2.0 5.5351e-02 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.3e+01 0 0 0 0 0 0 0 0 0 0 0 SNESSolve 13 1.0 3.7214e+02 1.0 4.21e+09 2.0 6.6e+07 1.0e+04 4.8e+04 85100 95 92 95 85100 95 92 95 3026 SNESFunctionEval 897 1.0 3.2606e+02 1.0 3.62e+08 1.3 5.9e+07 9.6e+03 4.3e+04 74 11 85 76 85 74 11 85 76 85 384 SNESJacobianEval 25 1.0 3.4770e+01 1.0 1.95e+07 1.4 2.3e+06 2.3e+04 1.9e+03 8 1 3 7 4 8 1 3 7 4 195 SNESLineSearch 25 1.0 1.8090e+01 1.0 2.57e+07 1.4 3.1e+06 1.0e+04 2.3e+03 4 1 4 4 5 4 1 4 4 5 475 BuildTwoSided 25 1.0 4.6378e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFSetGraph 25 1.0 2.7061e-04 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFReduceBegin 25 1.0 4.6412e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 SFReduceEnd 25 1.0 8.1301e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPGMRESOrthog 839 1.0 8.0119e-01 1.2 7.03e+08 2.3 0.0e+00 0.0e+00 8.4e+02 0 17 0 0 2 0 17 0 0 2 234277 KSPSetUp 50 1.0 3.0220e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 KSPSolve 25 1.0 3.1444e+02 1.0 4.16e+09 2.0 6.0e+07 9.9e+03 4.3e+04 72 98 86 80 85 72 98 86 80 85 3526 PCSetUp 50 1.0 5.4896e+00 2.4 1.20e+09 2.5 7.1e+05 7.0e+04 1.8e+02 1 26 1 7 0 1 26 1 7 0 53260 PCSetUpOnBlocks 25 1.0 1.1928e+00 2.4 1.20e+09 2.5 0.0e+00 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 245124 PCApply 864 1.0 2.4803e+00 2.0 1.83e+09 2.1 4.1e+06 4.4e+03 0.0e+00 0 43 6 2 0 0 43 6 2 0 194354 ------------------------------------------------------------------------------------------------------------------------ Memory usage is given in bytes: Object Type Creations Destructions Memory Descendants' Mem. Reports information only for process 0. --- Event Stage 0: Main Stage Vector 740 740 732012968 0. Vector Scatter 76 76 1212680 0. Index Set 176 176 4673716 0. IS L to G Mapping 33 33 3228828 0. MatMFFD 13 13 10088 0. Matrix 45 45 364469360 0. SNES 13 13 17316 0. SNESLineSearch 13 13 12896 0. DMSNES 13 13 8632 0. Distributed Mesh 13 13 60320 0. Star Forest Bipartite Graph 51 51 43248 0. Discrete System 13 13 11232 0. Krylov Solver 26 26 2223520 0. DMKSP interface 13 13 8424 0. Preconditioner 26 26 25688 0. Viewer 15 13 10816 0. ======================================================================================================================== Average time to get PetscTime(): 0. 
Average time for MPI_Barrier(): 1.27792e-05 Average time for zero size MPI_Send(): 2.08554e-06 #PETSc Option Table entries: --n-threads=1 -i treat-cube_transient.i -ksp_gmres_restart 100 -log_view -pc_hypre_boomeramg_max_iter 4 -pc_hypre_boomeramg_strong_threshold 0.7 -pc_hypre_boomeramg_tol 1.0e-6 -pc_hypre_type boomeramg -pc_type asm -snes_mf_operator -snes_view #End of PETSc Option Table entries Compiled without FORTRAN kernels Compiled with full precision matrices (default) sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4 Configure options: --download-hypre=1 --with-ssl=0 --with-debugging=no --with-pic=1 --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-fblaslapack=1 --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 -CC=mpicc -CXX=mpicxx -FC=mpif90 -F77=mpif77 -F90=mpif90 -CFLAGS="-fPIC -fopenmp" -CXXFLAGS="-fPIC -fopenmp" -FFLAGS="-fPIC -fopenmp" -FCFLAGS="-fPIC -fopenmp" -F90FLAGS="-fPIC -fopenmp" -F77FLAGS="-fPIC -fopenmp" PETSC_DIR=/home/kongf/workhome/projects/petsc -download-cmake=1 ----------------------------------------- Libraries compiled on Tue Feb 7 16:47:41 2017 on falcon1 Machine characteristics: Linux-3.0.101-84.1.11909.0.PTF-default-x86_64-with-SuSE-11-x86_64 Using PETSc directory: /home/kongf/workhome/projects/petsc Using PETSc arch: arch-linux2-c-opt ----------------------------------------- Using C compiler: mpicc -fPIC -fopenmp -g -O ${COPTFLAGS} ${CFLAGS} Using Fortran compiler: mpif90 -fPIC -fopenmp -g -O ${FOPTFLAGS} ${FFLAGS} ----------------------------------------- Using include paths: -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: 
mpif90 Using libraries: -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lsuperlu_dist -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lHYPRE -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -lmpichcxx -lstdc++ -lscalapack -lflapack -lfblas -lX11 -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib 
-L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -ldl -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -lmpich -lopa -lmpl -lgomp -lgcc_s -lpthread -ldl ----------------------------------------- -------------- next part -------------- Time Step 10, time = 0.1 dt = 0.01 0 Nonlinear |R| = 2.004778e-03 0 Linear |R| = 2.004778e-03 1 Linear |R| = 4.440581e-04 2 Linear |R| = 1.283930e-04 3 Linear |R| = 9.874954e-05 4 Linear |R| = 6.589984e-05 5 Linear |R| = 4.483411e-05 6 Linear |R| = 2.787575e-05 7 Linear |R| = 1.435839e-05 8 Linear |R| = 8.720579e-06 9 Linear |R| = 3.704796e-06 10 Linear |R| = 2.317054e-06 11 Linear |R| = 9.060942e-07 1 Nonlinear |R| = 9.060942e-07 0 Linear |R| = 9.060942e-07 1 Linear |R| = 6.874101e-07 2 Linear |R| = 3.052995e-07 3 Linear |R| = 1.728171e-07 4 Linear |R| = 7.805237e-08 5 Linear |R| = 5.011253e-08 6 Linear |R| = 2.903814e-08 7 Linear |R| = 2.421108e-08 8 Linear |R| = 1.594860e-08 9 Linear |R| = 1.116189e-08 10 Linear |R| = 4.372907e-09 11 Linear |R| = 1.575997e-09 12 Linear |R| = 5.765413e-10 2 Nonlinear |R| = 5.765413e-10 SNES Object: 384 MPI processes type: newtonls maximum iterations=200, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 total number of linear solver iterations=23 total number of function evaluations=28 norm schedule ALWAYS SNESLineSearch Object: 384 MPI processes type: bt 
interpolation: cubic alpha=1.000000e-04 maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=40 KSP Object: 384 MPI processes type: gmres GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=100, initial guess is zero tolerances: relative=0.001, absolute=1e-50, divergence=10000. right preconditioning using UNPRECONDITIONED norm type for convergence test PC Object: 384 MPI processes type: gamg MG: type is MULTIPLICATIVE, levels=2 cycles=v Cycles per PCApply=1 Using Galerkin computed coarse grid matrices GAMG specific options Threshold for dropping small values from graph 0. AGG specific options Symmetric graph true Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 384 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 384 MPI processes type: bjacobi block Jacobi: number of blocks = 384 Local solve is same for all blocks, in the following KSP and PC objects: KSP Object: (mg_coarse_sub_) 1 MPI processes type: preonly maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_sub_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 using diagonal shift on blocks to prevent zero pivot [INBLOCKS] matrix ordering: nd factor fill ratio given 5., needed 1.31367 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=37, cols=37 package used to perform factorization: petsc total: nonzeros=913, allocated nonzeros=913 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=37, cols=37 total: nonzeros=695, allocated nonzeros=695 total number of mallocs used during MatSetValues calls =0 not using I-node routines linear system matrix = precond matrix: Mat Object: 384 MPI processes type: mpiaij rows=18145, cols=18145 total: nonzeros=1709115, allocated nonzeros=1709115 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Down solver (pre-smoother) on level 1 ------------------------------- KSP Object: (mg_levels_1_) 384 MPI processes type: chebyshev Chebyshev: eigenvalue estimates: min = 0.138116, max = 1.51927 Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] KSP Object: (mg_levels_1_esteig_) 384 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10, initial guess is zero tolerances: relative=1e-12, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test maximum iterations=2 tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
left preconditioning using nonzero initial guess using NONE norm type for convergence test PC Object: (mg_levels_1_) 384 MPI processes type: sor SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. linear system matrix = precond matrix: Mat Object: () 384 MPI processes type: mpiaij rows=3020875, cols=3020875 total: nonzeros=215671710, allocated nonzeros=241731750 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Up solver (post-smoother) same as down solver (pre-smoother) linear system matrix followed by preconditioner matrix: Mat Object: 384 MPI processes type: mffd rows=3020875, cols=3020875 Matrix-free approximation: err=1.49012e-08 (relative error in function evaluation) Using wp compute h routine Does not compute normU Mat Object: () 384 MPI processes type: mpiaij rows=3020875, cols=3020875 total: nonzeros=215671710, allocated nonzeros=241731750 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines Solve Converged! 
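As a side note on reading the performance summaries: the flop-counting convention they state (VecAXPY on a real vector of length N counts 2N flops, 8N for complex) is simple arithmetic. The helper below is an illustration of that convention only, not PETSc code; the function name is made up.

```python
def vecaxpy_flops(n, complex_scalars=False):
    """Flops PETSc's convention attributes to y <- alpha*x + y on length n.

    Real scalars: one multiply and one add per entry -> 2n flops.
    Complex scalars: each complex multiply is 4 real multiplies plus
    2 adds, and the complex add contributes 2 more adds -> 8n flops.
    """
    return 8 * n if complex_scalars else 2 * n

# For the 3020875-row vectors in these runs:
print(vecaxpy_flops(3020875))        # real case: 6041750
print(vecaxpy_flops(3020875, True))  # complex case: 24167000
```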
---------------------------------------------- PETSc Performance Summary: ---------------------------------------------- /home/kongf/workhome/projects/yak/yak-opt on a arch-linux2-c-opt named r4i4n2 with 384 processors, by kongf Fri Apr 7 13:36:35 2017 Using Petsc Release Version 3.7.5, unknown Max Max/Min Avg Total Time (sec): 2.266e+03 1.00001 2.266e+03 Objects: 6.020e+03 1.00000 6.020e+03 Flops: 1.064e+10 2.27050 7.337e+09 2.817e+12 Flops/sec: 4.695e+06 2.27050 3.237e+06 1.243e+09 MPI Messages: 3.459e+05 5.11666 2.112e+05 8.111e+07 MPI Message Lengths: 3.248e+09 3.35280 9.453e+03 7.667e+11 MPI Reductions: 4.610e+04 1.00000 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N --> 2N flops and VecAXPY() for complex vectors of length N --> 8N flops Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 2.2663e+03 100.0% 2.8172e+12 100.0% 8.111e+07 100.0% 9.453e+03 100.0% 4.610e+04 100.0% ------------------------------------------------------------------------------------------------------------------------ See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %F - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) ------------------------------------------------------------------------------------------------------------------------ Event Count Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s ------------------------------------------------------------------------------------------------------------------------ --- Event Stage 0: Main Stage VecDot 20 1.0 6.1171e-01 1.6 4.53e+05 2.3 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 198 VecMDot 1091 1.0 3.4823e+01 1.7 1.05e+08 2.3 0.0e+00 0.0e+00 1.1e+03 1 1 0 0 2 1 1 0 0 2 803 VecNorm 1943 1.0 6.9656e+01 1.6 3.66e+07 2.3 0.0e+00 0.0e+00 1.9e+03 3 0 0 0 4 3 0 0 0 4 140 VecScale 2928 1.0 1.1091e-01 2.8 7.24e+07 1.4 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 219463 VecCopy 3086 1.0 6.0201e-01 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 7168 1.0 4.2314e-01 7.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 3263 1.0 3.7908e-01 4.1 1.59e+08 1.4 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 138504 VecAYPX 4112 1.0 1.1982e-01 4.2 3.59e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 80071 VecAXPBYCZ 2056 1.0 7.5538e-02 3.3 7.18e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 254030 VecWAXPY 743 1.0 7.8864e-02 4.9 1.65e+07 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 55963 VecMAXPY 1196 1.0 7.9660e-02 3.3 1.23e+08 2.3 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 411137 VecAssemblyBegin 12333 1.0 1.1090e+03 1.2 0.00e+00 0.0 7.6e+06 1.9e+04 3.7e+04 48 0 9 19 80 48 0 9 19 80 0 VecAssemblyEnd 12333 1.0 4.2957e-0124.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecPointwiseMult 440 1.0 2.2301e-02 5.7 3.12e+06 2.3 0.0e+00 
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 37433
VecScatterBegin 13638 1.0 2.3693e+00 4.9 0.00e+00 0.0 6.4e+07 5.6e+03 2.8e+01 0 0 79 46 0 0 0 79 46 0 0
VecScatterEnd 13610 1.0 2.1648e+02 13.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecSetRandom 40 1.0 4.5372e-02 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 55 1.0 1.3552e-03 2.7 1.25e+06 2.3 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 245191
VecReduceComm 25 1.0 2.3911e+00 4.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.5e+01 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 1196 1.0 2.8596e+01 1.1 2.95e+07 2.3 0.0e+00 0.0e+00 1.2e+03 1 0 0 0 3 1 0 0 0 3 275
MatMult MF 718 1.0 1.4078e+03 1.0 2.00e+08 1.4 4.2e+07 8.2e+03 3.2e+04 62 2 52 45 69 62 2 52 45 69 46
MatMult 4195 1.0 1.4272e+03 1.0 3.33e+09 2.2 5.8e+07 6.6e+03 3.2e+04 63 32 72 50 69 63 32 72 50 69 627
MatMultAdd 514 1.0 9.7981e+00 16.1 3.84e+07 2.4 2.0e+06 1.3e+02 0.0e+00 0 0 2 0 0 0 0 2 0 0 995
MatMultTranspose 514 1.0 6.0183e+00 19.9 3.84e+07 2.4 2.0e+06 1.3e+02 0.0e+00 0 0 2 0 0 0 0 2 0 0 1620
MatSolve 316 1.3 1.7905e-02 19.7 1.76e+06 4.6 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 18236
MatSOR 3524 1.0 6.6987e+00 3.9 2.50e+09 2.6 0.0e+00 0.0e+00 0.0e+00 0 23 0 0 0 0 23 0 0 0 97291
MatLUFactorSym 25 1.0 1.7944e-02 17.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 25 1.0 2.2082e-03 6.0 2.10e+06 10.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 136111
MatConvert 40 1.0 2.6915e-01 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 120 1.0 1.0204e+00 22.5 3.86e+07 2.3 1.9e+05 2.9e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 10018
MatResidual 514 1.0 5.3226e+01 1.1 4.35e+08 2.3 3.7e+06 4.2e+03 1.1e+03 2 4 5 2 2 2 4 5 2 2 2165
MatAssemblyBegin 1010 1.0 6.0257e+01 2.2 0.00e+00 0.0 1.7e+06 3.5e+04 8.4e+02 2 0 2 8 2 2 0 2 8 2 0
MatAssemblyEnd 1010 1.0 7.7316e+01 1.0 0.00e+00 0.0 2.5e+06 4.6e+02 2.1e+03 3 0 3 0 5 3 0 3 0 5 0
MatGetRow 1078194 2.3 2.4485e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 25 1.2 3.7956e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrix 30 1.0 1.6949e+01 1.0 0.00e+00 0.0 1.2e+05 2.8e+02 5.1e+02 1 0 0 0 1 1 0 0 0 1 0
MatGetOrdering 25 1.2 1.8878e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 40 1.0 1.5944e+01 1.1 0.00e+00 0.0 2.6e+06 2.3e+03 3.0e+02 1 0 3 1 1 1 0 3 1 1 0
MatZeroEntries 69 1.0 7.3145e-02 7.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 90 1.4 1.1229e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.8e+01 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 40 1.0 3.4301e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+01 0 0 0 0 0 0 0 0 0 0 0
MatTranspose 20 1.0 1.2561e+01 1.1 0.00e+00 0.0 7.1e+05 2.0e+04 2.4e+02 1 0 1 2 1 1 0 1 2 1 0
MatMatMult 40 1.0 2.6365e+01 1.0 3.56e+07 2.3 1.2e+06 1.4e+03 6.4e+02 1 0 1 0 1 1 0 1 0 1 358
MatMatMultSym 40 1.0 2.3430e+01 1.0 0.00e+00 0.0 9.8e+05 1.1e+03 5.6e+02 1 0 1 0 1 1 0 1 0 1 0
MatMatMultNum 40 1.0 2.9809e+00 1.1 3.56e+07 2.3 1.9e+05 2.9e+03 8.0e+01 0 0 0 0 0 0 0 0 0 0 3170
MatPtAP 40 1.0 3.1763e+01 1.0 2.59e+08 2.3 2.7e+06 2.6e+03 6.8e+02 1 2 3 1 1 1 2 3 1 1 2012
MatPtAPSymbolic 40 1.0 1.7240e+01 1.1 0.00e+00 0.0 1.2e+06 4.6e+03 2.8e+02 1 0 1 1 1 1 0 1 1 1 0
MatPtAPNumeric 40 1.0 1.5004e+01 1.1 2.59e+08 2.3 1.5e+06 1.0e+03 4.0e+02 1 2 2 0 1 1 2 2 0 1 4259
MatTrnMatMult 25 1.0 1.1522e+02 1.0 4.05e+09 2.3 7.5e+05 2.6e+05 4.8e+02 5 37 1 25 1 5 37 1 25 1 9105
MatTrnMatMultSym 25 1.0 7.3735e+01 1.0 0.00e+00 0.0 6.3e+05 1.0e+05 4.2e+02 3 0 1 8 1 3 0 1 8 1 0
MatTrnMatMultNum 25 1.0 4.1508e+01 1.0 4.05e+09 2.3 1.2e+05 1.1e+06 5.0e+01 2 37 0 17 0 2 37 0 17 0 25275
MatGetLocalMat 170 1.0 6.0506e-01 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 120 1.0 3.7906e+00 5.3 0.00e+00 0.0 1.3e+06 5.0e+03 0.0e+00 0 0 2 1 0 0 0 2 1 0 0
SNESSolve 13 1.0 1.9975e+03 1.0 1.06e+10 2.3 7.8e+07 9.1e+03 4.3e+04 88 100 96 92 94 88 100 96 92 94 1408
SNESFunctionEval 756 1.0 1.4539e+03 1.0 1.62e+08 1.4 4.4e+07 8.3e+03 3.3e+04 64 2 55 48 71 64 2 55 48 71 38
SNESJacobianEval 25 1.0 1.0415e+02 1.0 1.95e+07 1.4 2.3e+06 2.3e+04 1.9e+03 5 0 3 7 4 5 0 3 7 4 65
SNESLineSearch 25 1.0 1.0113e+02 1.0 2.57e+07 1.4 3.1e+06 1.0e+04 2.3e+03 4 0 4 4 5 4 0 4 4 5 85
BuildTwoSided 85 1.0 5.0838e+00 1.5 0.00e+00 0.0 1.5e+05 4.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 85 1.0 3.2002e-02 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFBcastBegin 382 1.0 3.1338e+00 1.4 0.00e+00 0.0 2.6e+06 2.3e+03 0.0e+00 0 0 3 1 0 0 0 3 1 0 0
SFBcastEnd 382 1.0 5.2611e+00 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 45 1.0 2.5858e+00 1.5 0.00e+00 0.0 2.4e+05 1.8e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 45 1.0 3.6487e-01 253.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 1091 1.0 3.4858e+01 1.7 2.09e+08 2.3 0.0e+00 0.0e+00 1.1e+03 1 2 0 0 2 1 2 0 0 2 1604
KSPSetUp 195 1.0 2.9202e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+01 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 25 1.0 1.7661e+03 1.0 1.06e+10 2.3 7.2e+07 8.6e+03 3.9e+04 78 99 88 80 84 78 99 88 80 84 1582
PCGAMGGraph_AGG 40 1.0 3.5930e+01 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 7.6e+02 2 0 2 4 2 2 0 2 4 2 263
PCGAMGCoarse_AGG 40 1.0 1.4450e+02 1.0 4.05e+09 2.3 4.0e+06 5.1e+04 1.2e+03 6 37 5 27 3 6 37 5 27 3 7260
PCGAMGProl_AGG 40 1.0 3.2209e+01 1.0 0.00e+00 0.0 9.8e+05 2.9e+03 9.6e+02 1 0 1 0 2 1 0 1 0 2 0
PCGAMGPOpt_AGG 40 1.0 6.3251e+01 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 1.9e+03 3 4 4 1 4 3 4 4 1 4 1987
GAMG: createProl 40 1.0 2.7631e+02 1.0 4.56e+09 2.3 9.6e+06 2.5e+04 4.8e+03 12 42 12 32 10 12 42 12 32 10 4286
  Graph 80 1.0 3.5926e+01 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 7.6e+02 2 0 2 4 2 2 0 2 4 2 263
  MIS/Agg 40 1.0 1.5945e+01 1.1 0.00e+00 0.0 2.6e+06 2.3e+03 3.0e+02 1 0 3 1 1 1 0 3 1 1 0
  SA: col data 40 1.0 1.3401e+01 1.1 0.00e+00 0.0 4.2e+05 6.1e+03 4.0e+02 1 0 1 0 1 1 0 1 0 1 0
  SA: frmProl0 40 1.0 1.4033e+01 1.1 0.00e+00 0.0 5.6e+05 4.6e+02 4.0e+02 1 0 1 0 1 1 0 1 0 1 0
  SA: smooth 40 1.0 6.3251e+01 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 1.9e+03 3 4 4 1 4 3 4 4 1 4 1987
GAMG: partLevel 40 1.0 5.8738e+01 1.0 2.59e+08 2.3 2.9e+06 2.5e+03 1.5e+03 3 2 4 1 3 3 2 4 1 3 1088
  repartition 35 1.0 3.3741e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+01 0 0 0 0 0 0 0 0 0 0 0
  Invert-Sort 15 1.0 2.7445e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 0 0 0 0 0 0 0
  Move A 15 1.0 9.3221e+00 1.0 0.00e+00 0.0 6.6e+04 4.9e+02 2.7e+02 0 0 0 0 1 0 0 0 0 1 0
  Move P 15 1.0 8.7196e+00 1.0 0.00e+00 0.0 5.7e+04 3.6e+01 2.7e+02 0 0 0 0 1 0 0 0 0 1 0
PCSetUp 50 1.0 3.4248e+02 1.0 4.81e+09 2.3 1.2e+07 2.0e+04 6.5e+03 15 44 15 33 14 15 44 15 33 14 3645
PCSetUpOnBlocks 316 1.0 2.1314e-02 6.3 2.10e+06 10.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14102
PCApply 316 1.0 7.8870e+02 1.0 5.52e+09 2.4 4.0e+07 4.4e+03 1.7e+04 34 52 49 23 37 34 52 49 23 37 1863
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

Vector 2951 2951 828338752 0.
Vector Scatter 353 353 367264 0.
Index Set 833 833 6198336 0.
IS L to G Mapping 33 33 3228828 0.
MatMFFD 13 13 10088 0.
Matrix 1334 1334 3083683516 0.
Matrix Coarsen 40 40 25120 0.
SNES 13 13 17316 0.
SNESLineSearch 13 13 12896 0.
DMSNES 13 13 8632 0.
Distributed Mesh 13 13 60320 0.
Star Forest Bipartite Graph 111 111 94128 0.
Discrete System 13 13 11232 0.
Krylov Solver 123 123 4660776 0.
DMKSP interface 13 13 8424 0.
Preconditioner 123 123 117692 0.
PetscRandom 13 13 8294 0.
Viewer 15 13 10816 0.
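[Editorial note: a quick way to digest a `-log_view` table like the one above is to pull out a few headline ratios. The sketch below is an editorial addition; the timings are copied from the SNESSolve, PCSetUp, and PCApply rows of the log above, and it computes what fraction of the total nonlinear solve the GAMG setup and the preconditioner application account for.]

```python
# Timings in seconds, copied from the -log_view event table above.
snes_solve = 1.9975e+03   # SNESSolve: total nonlinear solve time
pc_setup   = 3.4248e+02   # PCSetUp:   GAMG setup cost
pc_apply   = 7.8870e+02   # PCApply:   preconditioner application

setup_frac = pc_setup / snes_solve
apply_frac = pc_apply / snes_solve
print(f"PCSetUp is {setup_frac:.0%} and PCApply is {apply_frac:.0%} of SNESSolve")
```

These fractions (roughly 17% setup, 39% apply) are the numbers behind the later remark in this thread that GAMG's setup time makes it hard to beat ASM here.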
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.0217308
Average time for zero size MPI_Send(): 0.000133693
#PETSc Option Table entries:
--n-threads=1
-i treat-cube_transient.i
-ksp_gmres_restart 100
-log_view
-pc_gamg_sym_graph true
-pc_hypre_boomeramg_max_iter 4
-pc_hypre_boomeramg_strong_threshold 0.7
-pc_hypre_boomeramg_tol 1.0e-6
-pc_hypre_type boomeramg
-pc_mg_levels 2
-pc_type gamg
-pc_use_amat false
-snes_mf_operator
-snes_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --download-hypre=1 --with-ssl=0 --with-debugging=no --with-pic=1 --with-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-fblaslapack=1 --download-metis=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-scalapack=1 -CC=mpicc -CXX=mpicxx -FC=mpif90 -F77=mpif77 -F90=mpif90 -CFLAGS="-fPIC -fopenmp" -CXXFLAGS="-fPIC -fopenmp" -FFLAGS="-fPIC -fopenmp" -FCFLAGS="-fPIC -fopenmp" -F90FLAGS="-fPIC -fopenmp" -F77FLAGS="-fPIC -fopenmp" PETSC_DIR=/home/kongf/workhome/projects/petsc -download-cmake=1
-----------------------------------------
Libraries compiled on Tue Feb 7 16:47:41 2017 on falcon1
Machine characteristics: Linux-3.0.101-84.1.11909.0.PTF-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /home/kongf/workhome/projects/petsc
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -fopenmp -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -fopenmp -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include
-I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/include -I/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/include ----------------------------------------- Using C linker: mpicc Using Fortran linker: mpif90 Using libraries: -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -L/home/kongf/workhome/projects/petsc/arch-linux2-c-opt/lib -lsuperlu_dist -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lHYPRE -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -lmpichcxx -lstdc++ -lscalapack -lflapack -lfblas -lX11 -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 
-L/apps/local/easybuild/software/GCC/4.9.2/lib/gcc/x86_64-unknown-linux-gnu/4.9.2 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -L/apps/local/easybuild/software/tbb/4.3.0.090/tbb/lib -Wl,-rpath,/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -L/apps/local/easybuild/software/cppunit/1.12.1-GCC-4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib64 -L/apps/local/easybuild/software/GCC/4.9.2/lib64 -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -Wl,-rpath,/apps/local/easybuild/software/GCC/4.9.2/lib -L/apps/local/easybuild/software/GCC/4.9.2/lib -ldl -Wl,-rpath,/apps/local/easybuild/software/MVAPICH2/2.0.1-GCC-4.9.2/lib -lmpich -lopa -lmpl -lgomp -lgcc_s -lpthread -ldl ----------------------------------------- From bsmith at mcs.anl.gov Fri Apr 7 16:35:36 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Apr 2017 16:35:36 -0500 Subject: [petsc-users] odd behavior when using lapack's dgeev with petsc In-Reply-To: References: <9ABBA7EA-63AC-40BD-BA47-64D77E6B0471@gmail.com> <74FE8F6B-C424-4E87-A3FD-29C4F1213236@mcs.anl.gov> Message-ID: <6D896251-CE0D-4EA9-92D0-4E65B5F749AF@mcs.anl.gov> > On Apr 7, 2017, at 3:34 PM, Manav Bhatia wrote: > > Yes, I printed the data in both cases and they look the same. > > I also used "set step-mode on" to show the system lapack info, and they both are using the same lapack routine. > > This is still baffling me. Is the alignment of the input arrays the same in both cases? I don't know why this is happening; what if you use your standalone code but link it against all the libraries that the PETSc case links against?
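[Editorial note: a cheap known-good reference for a hand-rolled dgeev wrapper, along the lines of the standalone check discussed here, is NumPy: `numpy.linalg.eig` dispatches to LAPACK's `_geev` for general real matrices. The 2x2 matrix below is a hypothetical test case with known eigenvalues 1 and 3, not one from this thread.]

```python
import numpy as np

# Symmetric 2x2 test matrix; eigenvalues are exactly 2 - 1 = 1 and 2 + 1 = 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig calls LAPACK's dgeev-family routine for real general matrices.
w, v = np.linalg.eig(A)
w_sorted = np.sort(w.real)
print(w_sorted)
```

Running the same matrix through the custom dgeev interface and comparing against these values shows whether the junk results come from the LAPACK call itself or from how the arrays are laid out and passed.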
> > -Manav > > >> On Apr 7, 2017, at 3:22 PM, Barry Smith wrote: >> >> >>> On Apr 7, 2017, at 2:57 PM, Manav Bhatia wrote: >>> >>> Hi Barry, >>> >>> Thanks for the inputs. >>> >>> I did try that, but the debugger (gdb) stepped right over the dgeev_ call, without getting inside the function. >> >> Did it at least stop at the function so you do an up and print all the arguments passed in? >> >>> >>> I am wondering if this has anything to do with the fact that the system lapack library might not have any debugging info in it. >> >> Yeah I forgot it might not have them. >> >> Barry >> >>> >>> Thoughts? >>> >>> Regards, >>> Manav >>> >>>> On Apr 7, 2017, at 2:40 PM, Barry Smith wrote: >>>> >>>> >>>>> On Apr 7, 2017, at 1:46 PM, Manav Bhatia wrote: >>>>> >>>>> Hi, >>>>> >>>>> I have compiled petsc on my Ubuntu machine (also Mac OS 10.12 separately) to link to the system lapack and blas libraries (shown below). >>>>> >>>>> I have created an interface class to dgeev in lapack to calculate the eigenvalues of a matrix. >>>>> >>>>> My application code links to multiple libraries: libMesh, petsc, slepc, hdf5, etc. >>>>> >>>>> If I test my interface inside this application code, I get junk results. >>>> >>>> This is easy to debug because you have a version that works. >>>> >>>> Run both versions in separate windows each in a debugger and put a break point in the dgeev_ function. When it gets there check that it is the same dgeev_ function in both cases and check that the inputs are the same then step through both to see when things start to change between the two. >>>>> >>>>> However, on the same machine, if I use the interface in a separate main() function without linking to any of the libraries except lapack and blas, then I get expected results. >>>>> >>>>> Also, this problem does not show up on Mac. >>>>> >>>>> I am not sure what could be causing this and don't quite know where to start. Could Petsc have anything to do with this?
>>>>> >>>>> Any insight would be greatly appreciated. >>>>> >>>>> Regards, >>>>> Manav >>>>> >>>>> manav at manav1:~/test$ ldd /opt/local/lib/libpetsc.so >>>>> linux-vdso.so.1 => (0x00007fff3e7a8000) >>>>> libsuperlu_dist.so.5 => /opt/local/lib/libsuperlu_dist.so.5 (0x00007f721fbd1000) >>>>> libparmetis.so => /opt/local/lib/libparmetis.so (0x00007f721f990000) >>>>> libmetis.so => /opt/local/lib/libmetis.so (0x00007f721f718000) >>>>> libsuperlu.so.5 => /opt/local/lib/libsuperlu.so.5 (0x00007f721f4a7000) >>>>> libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f721f124000) >>>>> liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f721e92c000) >>>>> libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f721e6bd000) >>>>> libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f721e382000) >>>>> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f721e079000) >>>>> libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x00007f721de20000) >>>>> libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f721daf4000) >>>>> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f721d8f0000) >>>>> libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f721d61a000) >>>>> libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f721d403000) >>>>> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f721d1e6000) >>>>> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f721ce1d000) >>>>> /lib64/ld-linux-x86-64.so.2 (0x000055d739f1b000) >>>>> libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f721cbcf000) >>>>> libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00007f721c932000) >>>>> libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f721c6f2000) >>>>> libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007f721c4e3000) >>>>> libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00007f721c269000) >>>>> libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f721c064000) >>>>> libXdmcp.so.6 => 
/usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f721be5e000) >>>>> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f721bc56000) >>>>> libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f721ba52000) >>>>> libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f721b818000) >>>>> libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f721b60c000) >>>>> libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f721b402000) >>>>> >>>> >>> >> > From bsmith at mcs.anl.gov Fri Apr 7 16:39:05 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Apr 2017 16:39:05 -0500 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> Message-ID: Using Petsc Release Version 3.7.5, unknown So are you using the release or are you using master branch? If you use master the ASM will be even faster. > On Apr 7, 2017, at 4:29 PM, Kong, Fande wrote: > > Thanks, Barry. > > It works. > > GAMG is three times better than ASM in terms of the number of linear iterations, but it is five times slower than ASM. Any suggestions to improve the performance of GAMG? Log files are attached. > > Fande, > > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith wrote: > > > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: > > > > Thanks, Mark and Barry, > > > > It works pretty wells in terms of the number of linear iterations (using "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am using the two-level method via "-pc_mg_levels 2". The reason why the compute time is larger than other preconditioning options is that a matrix free method is used in the fine level and in my particular problem the function evaluation is expensive. > > > > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, but I do not think I want to make the preconditioning part matrix-free. 
Do you guys know how to turn off the matrix-free method for GAMG? > > -pc_use_amat false > > > > > Here is the detailed solver: > > > > SNES Object: 384 MPI processes > > type: newtonls > > maximum iterations=200, maximum function evaluations=10000 > > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 > > total number of linear solver iterations=20 > > total number of function evaluations=166 > > norm schedule ALWAYS > > SNESLineSearch Object: 384 MPI processes > > type: bt > > interpolation: cubic > > alpha=1.000000e-04 > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > > maximum iterations=40 > > KSP Object: 384 MPI processes > > type: gmres > > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > GMRES: happy breakdown tolerance 1e-30 > > maximum iterations=100, initial guess is zero > > tolerances: relative=0.001, absolute=1e-50, divergence=10000. > > right preconditioning > > using UNPRECONDITIONED norm type for convergence test > > PC Object: 384 MPI processes > > type: gamg > > MG: type is MULTIPLICATIVE, levels=2 cycles=v > > Cycles per PCApply=1 > > Using Galerkin computed coarse grid matrices > > GAMG specific options > > Threshold for dropping small values from graph 0. > > AGG specific options > > Symmetric graph true > > Coarse grid solver -- level ------------------------------- > > KSP Object: (mg_coarse_) 384 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> > left preconditioning > > using NONE norm type for convergence test > > PC Object: (mg_coarse_) 384 MPI processes > > type: bjacobi > > block Jacobi: number of blocks = 384 > > Local solve is same for all blocks, in the following KSP and PC objects: > > KSP Object: (mg_coarse_sub_) 1 MPI processes > > type: preonly > > maximum iterations=1, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (mg_coarse_sub_) 1 MPI processes > > type: lu > > LU: out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > > matrix ordering: nd > > factor fill ratio given 5., needed 1.31367 > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=37, cols=37 > > package used to perform factorization: petsc > > total: nonzeros=913, allocated nonzeros=913 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=37, cols=37 > > total: nonzeros=695, allocated nonzeros=695 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 384 MPI processes > > type: mpiaij > > rows=18145, cols=18145 > > total: nonzeros=1709115, allocated nonzeros=1709115 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node (on process 0) routines > > Down solver (pre-smoother) on level 1 ------------------------------- > > KSP Object: (mg_levels_1_) 384 MPI processes > > type: chebyshev > > Chebyshev: eigenvalue estimates: min = 0.133339, max = 1.46673 > > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 
1.1] > > KSP Object: (mg_levels_1_esteig_) 384 MPI processes > > type: gmres > > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > GMRES: happy breakdown tolerance 1e-30 > > maximum iterations=10, initial guess is zero > > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > maximum iterations=2 > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using nonzero initial guess > > using NONE norm type for convergence test > > PC Object: (mg_levels_1_) 384 MPI processes > > type: sor > > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > > linear system matrix followed by preconditioner matrix: > > Mat Object: 384 MPI processes > > type: mffd > > rows=3020875, cols=3020875 > > Matrix-free approximation: > > err=1.49012e-08 (relative error in function evaluation) > > Using wp compute h routine > > Does not compute normU > > Mat Object: () 384 MPI processes > > type: mpiaij > > rows=3020875, cols=3020875 > > total: nonzeros=215671710, allocated nonzeros=241731750 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node (on process 0) routines > > Up solver (post-smoother) same as down solver (pre-smoother) > > linear system matrix followed by preconditioner matrix: > > Mat Object: 384 MPI processes > > type: mffd > > rows=3020875, cols=3020875 > > Matrix-free approximation: > > err=1.49012e-08 (relative error in function evaluation) > > Using wp compute h routine > > Does not compute normU > > Mat Object: () 384 MPI processes > > type: mpiaij > > rows=3020875, cols=3020875 > > total: nonzeros=215671710, allocated nonzeros=241731750 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node (on process 0) routines > > > > > > Fande, > > > > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote: > > On 
Tue, Apr 4, 2017 at 10:10 AM, Barry Smith wrote: > > > > > >> Does this mean that GAMG works for the symmetrical matrix only? > > > > > > No, it means that for non symmetric nonzero structure you need the extra flag. So use the extra flag. The reason we don't always use the flag is because it adds extra cost and isn't needed if the matrix already has a symmetric nonzero structure. > > > > BTW, if you have symmetric non-zero structure you can just set > > -pc_gamg_threshold -1.0', note the "or" in the message. > > > > If you want to mess with the threshold then you need to use the > > symmetrized flag. > > > From fande.kong at inl.gov Fri Apr 7 16:46:12 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Fri, 7 Apr 2017 15:46:12 -0600 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> Message-ID: On Fri, Apr 7, 2017 at 3:39 PM, Barry Smith wrote: > > Using Petsc Release Version 3.7.5, unknown > > So are you using the release or are you using master branch? > I am working on the maint branch. I did something two months ago: git clone -b maint https://bitbucket.org/petsc/petsc petsc. I am interested in improving the GAMG performance. Is that possible, or can it not beat ASM at all? The multilevel method should be better than a one-level method when the number of processor cores is large. Fande, > > If you use master the ASM will be even faster. > What's new in master? Fande, > > > > On Apr 7, 2017, at 4:29 PM, Kong, Fande wrote: > > > > Thanks, Barry. > > > > It works. > > > > GAMG is three times better than ASM in terms of the number of linear > iterations, but it is five times slower than ASM. Any suggestions to > improve the performance of GAMG? Log files are attached.
> > > > Fande, > > > > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith wrote: > > > > > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: > > > > > > Thanks, Mark and Barry, > > > > > > It works pretty wells in terms of the number of linear iterations > (using "-pc_gamg_sym_graph true"), but it is horrible in the compute time. > I am using the two-level method via "-pc_mg_levels 2". The reason why the > compute time is larger than other preconditioning options is that a matrix > free method is used in the fine level and in my particular problem the > function evaluation is expensive. > > > > > > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, > but I do not think I want to make the preconditioning part matrix-free. Do > you guys know how to turn off the matrix-free method for GAMG? > > > > -pc_use_amat false > > > > > > > > Here is the detailed solver: > > > > > > SNES Object: 384 MPI processes > > > type: newtonls > > > maximum iterations=200, maximum function evaluations=10000 > > > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 > > > total number of linear solver iterations=20 > > > total number of function evaluations=166 > > > norm schedule ALWAYS > > > SNESLineSearch Object: 384 MPI processes > > > type: bt > > > interpolation: cubic > > > alpha=1.000000e-04 > > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > lambda=1.000000e-08 > > > maximum iterations=40 > > > KSP Object: 384 MPI processes > > > type: gmres > > > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > > GMRES: happy breakdown tolerance 1e-30 > > > maximum iterations=100, initial guess is zero > > > tolerances: relative=0.001, absolute=1e-50, divergence=10000. 
> > > right preconditioning > > > using UNPRECONDITIONED norm type for convergence test > > > PC Object: 384 MPI processes > > > type: gamg > > > MG: type is MULTIPLICATIVE, levels=2 cycles=v > > > Cycles per PCApply=1 > > > Using Galerkin computed coarse grid matrices > > > GAMG specific options > > > Threshold for dropping small values from graph 0. > > > AGG specific options > > > Symmetric graph true > > > Coarse grid solver -- level ------------------------------- > > > KSP Object: (mg_coarse_) 384 MPI processes > > > type: preonly > > > maximum iterations=10000, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > > left preconditioning > > > using NONE norm type for convergence test > > > PC Object: (mg_coarse_) 384 MPI processes > > > type: bjacobi > > > block Jacobi: number of blocks = 384 > > > Local solve is same for all blocks, in the following KSP and > PC objects: > > > KSP Object: (mg_coarse_sub_) 1 MPI processes > > > type: preonly > > > maximum iterations=1, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. 
> > > left preconditioning > > > using NONE norm type for convergence test > > > PC Object: (mg_coarse_sub_) 1 MPI processes > > > type: lu > > > LU: out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > using diagonal shift on blocks to prevent zero pivot > [INBLOCKS] > > > matrix ordering: nd > > > factor fill ratio given 5., needed 1.31367 > > > Factored matrix follows: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=37, cols=37 > > > package used to perform factorization: petsc > > > total: nonzeros=913, allocated nonzeros=913 > > > total number of mallocs used during MatSetValues > calls =0 > > > not using I-node routines > > > linear system matrix = precond matrix: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=37, cols=37 > > > total: nonzeros=695, allocated nonzeros=695 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > linear system matrix = precond matrix: > > > Mat Object: 384 MPI processes > > > type: mpiaij > > > rows=18145, cols=18145 > > > total: nonzeros=1709115, allocated nonzeros=1709115 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > Down solver (pre-smoother) on level 1 > ------------------------------- > > > KSP Object: (mg_levels_1_) 384 MPI processes > > > type: chebyshev > > > Chebyshev: eigenvalue estimates: min = 0.133339, max = > 1.46673 > > > Chebyshev: eigenvalues estimated using gmres with > translations [0. 0.1; 0. 1.1] > > > KSP Object: (mg_levels_1_esteig_) 384 MPI > processes > > > type: gmres > > > GMRES: restart=30, using Classical (unmodified) > Gram-Schmidt Orthogonalization with no iterative refinement > > > GMRES: happy breakdown tolerance 1e-30 > > > maximum iterations=10, initial guess is zero > > > tolerances: relative=1e-12, absolute=1e-50, > divergence=10000. 
> > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > maximum iterations=2 > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > > left preconditioning > > > using nonzero initial guess > > > using NONE norm type for convergence test > > > PC Object: (mg_levels_1_) 384 MPI processes > > > type: sor > > > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. > > > linear system matrix followed by preconditioner matrix: > > > Mat Object: 384 MPI processes > > > type: mffd > > > rows=3020875, cols=3020875 > > > Matrix-free approximation: > > > err=1.49012e-08 (relative error in function evaluation) > > > Using wp compute h routine > > > Does not compute normU > > > Mat Object: () 384 MPI processes > > > type: mpiaij > > > rows=3020875, cols=3020875 > > > total: nonzeros=215671710, allocated nonzeros=241731750 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > Up solver (post-smoother) same as down solver (pre-smoother) > > > linear system matrix followed by preconditioner matrix: > > > Mat Object: 384 MPI processes > > > type: mffd > > > rows=3020875, cols=3020875 > > > Matrix-free approximation: > > > err=1.49012e-08 (relative error in function evaluation) > > > Using wp compute h routine > > > Does not compute normU > > > Mat Object: () 384 MPI processes > > > type: mpiaij > > > rows=3020875, cols=3020875 > > > total: nonzeros=215671710, allocated nonzeros=241731750 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > > > > > > > Fande, > > > > > > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote: > > > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith > wrote: > > > > > > > >> Does this mean that GAMG works for the symmetrical matrix only? > > > > > > > > No, it means that for non symmetric nonzero structure you need the > extra flag. So use the extra flag. 
The reason we don't always use the flag > is because it adds extra cost and isn't needed if the matrix already has a > symmetric nonzero structure. > > > > > > BTW, if you have symmetric non-zero structure you can just set > > > -pc_gamg_threshold -1.0', note the "or" in the message. > > > > > > If you want to mess with the threshold then you need to use the > > > symmetrized flag. > > > > > > > > > > > From bsmith at mcs.anl.gov Fri Apr 7 16:52:06 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 7 Apr 2017 16:52:06 -0500 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> Message-ID: <95206537-BB6B-49D9-A500-BFF8903442F4@mcs.anl.gov> > On Apr 7, 2017, at 4:46 PM, Kong, Fande wrote: > > > > On Fri, Apr 7, 2017 at 3:39 PM, Barry Smith wrote: > > Using Petsc Release Version 3.7.5, unknown > > So are you using the release or are you using master branch? > > I am working on the maint branch. > > I did something two months ago: > > git clone -b maint https://bitbucket.org/petsc/petsc petsc. > > > I am interested to improve the GAMG performance. Why? Why not use the best solver for your problem? > Is it possible? It can not beat ASM at all? The multilevel method should be better than the one-level if the number of processor cores is large. The ASM is taking 30 iterations; this is fantastic. It is really going to be tough to get GAMG to be faster (the setup time for GAMG is high). What happens to both with 10 times as many processes? 100 times as many? Barry > > Fande, > > > If you use master the ASM will be even faster. > > What's new in master? > > > Fande, > > > > > On Apr 7, 2017, at 4:29 PM, Kong, Fande wrote: > > > > Thanks, Barry. > > > > It works.
> > > > GAMG is three times better than ASM in terms of the number of linear iterations, but it is five times slower than ASM. Any suggestions to improve the performance of GAMG? Log files are attached. > > > > Fande, > > > > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith wrote: > > > > > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: > > > > > > Thanks, Mark and Barry, > > > > > > It works pretty wells in terms of the number of linear iterations (using "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am using the two-level method via "-pc_mg_levels 2". The reason why the compute time is larger than other preconditioning options is that a matrix free method is used in the fine level and in my particular problem the function evaluation is expensive. > > > > > > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, but I do not think I want to make the preconditioning part matrix-free. Do you guys know how to turn off the matrix-free method for GAMG? > > > > -pc_use_amat false > > > > > > > > Here is the detailed solver: > > > > > > SNES Object: 384 MPI processes > > > type: newtonls > > > maximum iterations=200, maximum function evaluations=10000 > > > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 > > > total number of linear solver iterations=20 > > > total number of function evaluations=166 > > > norm schedule ALWAYS > > > SNESLineSearch Object: 384 MPI processes > > > type: bt > > > interpolation: cubic > > > alpha=1.000000e-04 > > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > > > maximum iterations=40 > > > KSP Object: 384 MPI processes > > > type: gmres > > > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > > GMRES: happy breakdown tolerance 1e-30 > > > maximum iterations=100, initial guess is zero > > > tolerances: relative=0.001, absolute=1e-50, divergence=10000. 
> > > right preconditioning > > > using UNPRECONDITIONED norm type for convergence test > > > PC Object: 384 MPI processes > > > type: gamg > > > MG: type is MULTIPLICATIVE, levels=2 cycles=v > > > Cycles per PCApply=1 > > > Using Galerkin computed coarse grid matrices > > > GAMG specific options > > > Threshold for dropping small values from graph 0. > > > AGG specific options > > > Symmetric graph true > > > Coarse grid solver -- level ------------------------------- > > > KSP Object: (mg_coarse_) 384 MPI processes > > > type: preonly > > > maximum iterations=10000, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > > left preconditioning > > > using NONE norm type for convergence test > > > PC Object: (mg_coarse_) 384 MPI processes > > > type: bjacobi > > > block Jacobi: number of blocks = 384 > > > Local solve is same for all blocks, in the following KSP and PC objects: > > > KSP Object: (mg_coarse_sub_) 1 MPI processes > > > type: preonly > > > maximum iterations=1, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> > > left preconditioning > > > using NONE norm type for convergence test > > > PC Object: (mg_coarse_sub_) 1 MPI processes > > > type: lu > > > LU: out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > using diagonal shift on blocks to prevent zero pivot [INBLOCKS] > > > matrix ordering: nd > > > factor fill ratio given 5., needed 1.31367 > > > Factored matrix follows: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=37, cols=37 > > > package used to perform factorization: petsc > > > total: nonzeros=913, allocated nonzeros=913 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > linear system matrix = precond matrix: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=37, cols=37 > > > total: nonzeros=695, allocated nonzeros=695 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node routines > > > linear system matrix = precond matrix: > > > Mat Object: 384 MPI processes > > > type: mpiaij > > > rows=18145, cols=18145 > > > total: nonzeros=1709115, allocated nonzeros=1709115 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > Down solver (pre-smoother) on level 1 ------------------------------- > > > KSP Object: (mg_levels_1_) 384 MPI processes > > > type: chebyshev > > > Chebyshev: eigenvalue estimates: min = 0.133339, max = 1.46673 > > > Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1] > > > KSP Object: (mg_levels_1_esteig_) 384 MPI processes > > > type: gmres > > > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > > > GMRES: happy breakdown tolerance 1e-30 > > > maximum iterations=10, initial guess is zero > > > tolerances: relative=1e-12, absolute=1e-50, divergence=10000. 
> > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > maximum iterations=2 > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > > left preconditioning > > > using nonzero initial guess > > > using NONE norm type for convergence test > > > PC Object: (mg_levels_1_) 384 MPI processes > > > type: sor > > > SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1. > > > linear system matrix followed by preconditioner matrix: > > > Mat Object: 384 MPI processes > > > type: mffd > > > rows=3020875, cols=3020875 > > > Matrix-free approximation: > > > err=1.49012e-08 (relative error in function evaluation) > > > Using wp compute h routine > > > Does not compute normU > > > Mat Object: () 384 MPI processes > > > type: mpiaij > > > rows=3020875, cols=3020875 > > > total: nonzeros=215671710, allocated nonzeros=241731750 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > Up solver (post-smoother) same as down solver (pre-smoother) > > > linear system matrix followed by preconditioner matrix: > > > Mat Object: 384 MPI processes > > > type: mffd > > > rows=3020875, cols=3020875 > > > Matrix-free approximation: > > > err=1.49012e-08 (relative error in function evaluation) > > > Using wp compute h routine > > > Does not compute normU > > > Mat Object: () 384 MPI processes > > > type: mpiaij > > > rows=3020875, cols=3020875 > > > total: nonzeros=215671710, allocated nonzeros=241731750 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > > > > > > > Fande, > > > > > > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote: > > > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith wrote: > > > > > > > >> Does this mean that GAMG works for the symmetrical matrix only? > > > > > > > > No, it means that for non symmetric nonzero structure you need the extra flag. So use the extra flag. 
The reason we don't always use the flag is because it adds extra cost and isn't needed if the matrix already has a symmetric nonzero structure. > > > > > > BTW, if you have symmetric non-zero structure you can just set > > > -pc_gamg_threshold -1.0', note the "or" in the message. > > > > > > If you want to mess with the threshold then you need to use the > > > symmetrized flag. > > > > > > > > > > > From fande.kong at inl.gov Fri Apr 7 17:03:14 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Fri, 7 Apr 2017 16:03:14 -0600 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: <95206537-BB6B-49D9-A500-BFF8903442F4@mcs.anl.gov> References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> <95206537-BB6B-49D9-A500-BFF8903442F4@mcs.anl.gov> Message-ID: On Fri, Apr 7, 2017 at 3:52 PM, Barry Smith wrote: > > > On Apr 7, 2017, at 4:46 PM, Kong, Fande wrote: > > > > > > > > On Fri, Apr 7, 2017 at 3:39 PM, Barry Smith wrote: > > > > Using Petsc Release Version 3.7.5, unknown > > > > So are you using the release or are you using master branch? > > > > I am working on the maint branch. > > > > I did something two months ago: > > > > git clone -b maint https://urldefense.proofpoint. > com/v2/url?u=https-3A__bitbucket.org_petsc_petsc&d=DwIFAg&c= > 54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=DUUt3SRGI0_ > JgtNaS3udV68GRkgV4ts7XKfj2opmiCY&m=c92UNplDTVgzFrXIn_ > 70buWa2rXPGUKN083_aJYI0FQ&s=yrulwZxJiduZc-703r7PJOUApPDehsFIkhS0BTrroXc&e= > petsc. > > > > > > I am interested to improve the GAMG performance. > > Why, why not use the best solver for your problem? > I am just curious. I want to understand the potential of interesting preconditioners. > > > Is it possible? It can not beat ASM at all? The multilevel method should > be better than the one-level if the number of processor cores is large. 
> > The ASM is taking 30 iterations, this is fantastic, it is really going > to be tough to get GAMG to be faster (set up time for GAMG is high). > > What happens to both with 10 times as many processes? 100 times as many? > Did not try many processes yet. Fande, > > > Barry > > > > > Fande, > > > > > > If you use master the ASM will be even faster. > > > > What's new in master? > > > > > > Fande, > > > > > > > > > On Apr 7, 2017, at 4:29 PM, Kong, Fande wrote: > > > > > > Thanks, Barry. > > > > > > It works. > > > > > > GAMG is three times better than ASM in terms of the number of linear > iterations, but it is five times slower than ASM. Any suggestions to > improve the performance of GAMG? Log files are attached. > > > > > > Fande, > > > > > > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith > wrote: > > > > > > > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: > > > > > > > > Thanks, Mark and Barry, > > > > > > > > It works pretty wells in terms of the number of linear iterations > (using "-pc_gamg_sym_graph true"), but it is horrible in the compute time. > I am using the two-level method via "-pc_mg_levels 2". The reason why the > compute time is larger than other preconditioning options is that a matrix > free method is used in the fine level and in my particular problem the > function evaluation is expensive. > > > > > > > > I am using "-snes_mf_operator 1" to turn on the Jacobian-free > Newton, but I do not think I want to make the preconditioning part > matrix-free. Do you guys know how to turn off the matrix-free method for > GAMG? 
> > > > > > -pc_use_amat false > > > > > > > > > > > Here is the detailed solver: > > > > > > > > SNES Object: 384 MPI processes > > > > type: newtonls > > > > maximum iterations=200, maximum function evaluations=10000 > > > > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 > > > > total number of linear solver iterations=20 > > > > total number of function evaluations=166 > > > > norm schedule ALWAYS > > > > SNESLineSearch Object: 384 MPI processes > > > > type: bt > > > > interpolation: cubic > > > > alpha=1.000000e-04 > > > > maxstep=1.000000e+08, minlambda=1.000000e-12 > > > > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > lambda=1.000000e-08 > > > > maximum iterations=40 > > > > KSP Object: 384 MPI processes > > > > type: gmres > > > > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt > Orthogonalization with no iterative refinement > > > > GMRES: happy breakdown tolerance 1e-30 > > > > maximum iterations=100, initial guess is zero > > > > tolerances: relative=0.001, absolute=1e-50, divergence=10000. > > > > right preconditioning > > > > using UNPRECONDITIONED norm type for convergence test > > > > PC Object: 384 MPI processes > > > > type: gamg > > > > MG: type is MULTIPLICATIVE, levels=2 cycles=v > > > > Cycles per PCApply=1 > > > > Using Galerkin computed coarse grid matrices > > > > GAMG specific options > > > > Threshold for dropping small values from graph 0. > > > > AGG specific options > > > > Symmetric graph true > > > > Coarse grid solver -- level ------------------------------- > > > > KSP Object: (mg_coarse_) 384 MPI processes > > > > type: preonly > > > > maximum iterations=10000, initial guess is zero > > > > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. 
> > > > left preconditioning > > > > using NONE norm type for convergence test > > > > PC Object: (mg_coarse_) 384 MPI processes > > > > type: bjacobi > > > > block Jacobi: number of blocks = 384 > > > > Local solve is same for all blocks, in the following KSP > and PC objects: > > > > KSP Object: (mg_coarse_sub_) 1 MPI processes > > > > type: preonly > > > > maximum iterations=1, initial guess is zero > > > > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > > > > left preconditioning > > > > using NONE norm type for convergence test > > > > PC Object: (mg_coarse_sub_) 1 MPI processes > > > > type: lu > > > > LU: out-of-place factorization > > > > tolerance for zero pivot 2.22045e-14 > > > > using diagonal shift on blocks to prevent zero pivot > [INBLOCKS] > > > > matrix ordering: nd > > > > factor fill ratio given 5., needed 1.31367 > > > > Factored matrix follows: > > > > Mat Object: 1 MPI processes > > > > type: seqaij > > > > rows=37, cols=37 > > > > package used to perform factorization: petsc > > > > total: nonzeros=913, allocated nonzeros=913 > > > > total number of mallocs used during MatSetValues > calls =0 > > > > not using I-node routines > > > > linear system matrix = precond matrix: > > > > Mat Object: 1 MPI processes > > > > type: seqaij > > > > rows=37, cols=37 > > > > total: nonzeros=695, allocated nonzeros=695 > > > > total number of mallocs used during MatSetValues calls =0 > > > > not using I-node routines > > > > linear system matrix = precond matrix: > > > > Mat Object: 384 MPI processes > > > > type: mpiaij > > > > rows=18145, cols=18145 > > > > total: nonzeros=1709115, allocated nonzeros=1709115 > > > > total number of mallocs used during MatSetValues calls =0 > > > > not using I-node (on process 0) routines > > > > Down solver (pre-smoother) on level 1 > ------------------------------- > > > > KSP Object: (mg_levels_1_) 384 MPI processes > > > > type: chebyshev > > > > Chebyshev: eigenvalue estimates: min = 0.133339, max = > 
1.46673 > > > > Chebyshev: eigenvalues estimated using gmres with > translations [0. 0.1; 0. 1.1] > > > > KSP Object: (mg_levels_1_esteig_) 384 > MPI processes > > > > type: gmres > > > > GMRES: restart=30, using Classical (unmodified) > Gram-Schmidt Orthogonalization with no iterative refinement > > > > GMRES: happy breakdown tolerance 1e-30 > > > > maximum iterations=10, initial guess is zero > > > > tolerances: relative=1e-12, absolute=1e-50, > divergence=10000. > > > > left preconditioning > > > > using PRECONDITIONED norm type for convergence test > > > > maximum iterations=2 > > > > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. > > > > left preconditioning > > > > using nonzero initial guess > > > > using NONE norm type for convergence test > > > > PC Object: (mg_levels_1_) 384 MPI processes > > > > type: sor > > > > SOR: type = local_symmetric, iterations = 1, local > iterations = 1, omega = 1. > > > > linear system matrix followed by preconditioner matrix: > > > > Mat Object: 384 MPI processes > > > > type: mffd > > > > rows=3020875, cols=3020875 > > > > Matrix-free approximation: > > > > err=1.49012e-08 (relative error in function evaluation) > > > > Using wp compute h routine > > > > Does not compute normU > > > > Mat Object: () 384 MPI processes > > > > type: mpiaij > > > > rows=3020875, cols=3020875 > > > > total: nonzeros=215671710, allocated nonzeros=241731750 > > > > total number of mallocs used during MatSetValues calls =0 > > > > not using I-node (on process 0) routines > > > > Up solver (post-smoother) same as down solver (pre-smoother) > > > > linear system matrix followed by preconditioner matrix: > > > > Mat Object: 384 MPI processes > > > > type: mffd > > > > rows=3020875, cols=3020875 > > > > Matrix-free approximation: > > > > err=1.49012e-08 (relative error in function evaluation) > > > > Using wp compute h routine > > > > Does not compute normU > > > > Mat Object: () 384 MPI processes > > > > type: mpiaij > > > > 
rows=3020875, cols=3020875
> > > > total: nonzeros=215671710, allocated nonzeros=241731750
> > > > total number of mallocs used during MatSetValues calls =0
> > > > not using I-node (on process 0) routines
> > > >
> > > >
> > > > Fande,
> > > >
> > > > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote:
> > > > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith wrote:
> > > > >
> > > > > >> Does this mean that GAMG works for the symmetrical matrix only?
> > > > >
> > > > > No, it means that for non symmetric nonzero structure you need the extra flag. So use the extra flag. The reason we don't always use the flag is because it adds extra cost and isn't needed if the matrix already has a symmetric nonzero structure.
> > > >
> > > > BTW, if you have symmetric non-zero structure you can just set
> > > > -pc_gamg_threshold -1.0', note the "or" in the message.
> > > >
> > > > If you want to mess with the threshold then you need to use the
> > > > symmetrized flag.
> > > >
> > > >
> > > >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From balay at mcs.anl.gov  Sat Apr  8 02:20:43 2017
From: balay at mcs.anl.gov (Satish Balay)
Date: Sat, 8 Apr 2017 02:20:43 -0500
Subject: [petsc-users] odd behavior when using lapack's dgeev with petsc
In-Reply-To: <6D896251-CE0D-4EA9-92D0-4E65B5F749AF@mcs.anl.gov>
References: <9ABBA7EA-63AC-40BD-BA47-64D77E6B0471@gmail.com>
 <74FE8F6B-C424-4E87-A3FD-29C4F1213236@mcs.anl.gov>
 <6D896251-CE0D-4EA9-92D0-4E65B5F749AF@mcs.anl.gov>
Message-ID: 

I would do a minimal petsc build - without any packages from /usr/local -
and see if the problem persists..

Satish

On Fri, 7 Apr 2017, Barry Smith wrote:

>
> > On Apr 7, 2017, at 3:34 PM, Manav Bhatia wrote:
> >
> > Yes, I printed the data in both cases and they look the same.
> >
> > I also used 'set step-mode on' to show the system lapack info, and they both are using the same lapack routine.
> >
> > This is still baffling me.
> > alignment of the input arrays, both the same? > > I don't know why this is happening; what if you use your standalone code but link against all the libraries that are linked against for the PETSc case. > > > > > > > > -Manav > > > > > >> On Apr 7, 2017, at 3:22 PM, Barry Smith wrote: > >> > >> > >>> On Apr 7, 2017, at 2:57 PM, Manav Bhatia wrote: > >>> > >>> Hi Barry, > >>> > >>> Thanks for the inputs. > >>> > >>> I did try that, but the debugger (gdb) stepped right over the dgeev_ call, without getting inside the function. > >> > >> Did it at least stop at the function so you do an up and print all the arguments passed in? > >> > >>> > >>> I am wondering if this has anything to do with the fact that the system lapack library might not have any debugging info in it. > >> > >> Yeah I forgot it might not have them. > >> > >> Barry > >> > >>> > >>> Thoughts? > >>> > >>> Regards, > >>> Manav > >>> > >>>> On Apr 7, 2017, at 2:40 PM, Barry Smith wrote: > >>>> > >>>> > >>>>> On Apr 7, 2017, at 1:46 PM, Manav Bhatia wrote: > >>>>> > >>>>> Hi, > >>>>> > >>>>> I have compile petsc on my Ubuntu machine (also Mac OS 10.12 separately) to link to the system lapack and blas libraries (shown below). > >>>>> > >>>>> I have created an interface class to dgeev in lapack to calculate the eigenvalues of a matrix. > >>>>> > >>>>> My application code links to multiple libraries: libMesh, petsc, slepc, hdf5, etc. > >>>>> > >>>>> If I test my interface inside this application code, I get junk results. > >>>> > >>>> This is easy to debug because you have a version that works. > >>>> > >>>> Run both versions in separate windows each in a debugger and put a break point in the dgeev_ function. When it gets there check that it is the same dgeev_ function in both cases and check that the inputs are the same then step through both to see when things start to change between the two. 
> >>>>
> >>>>>
> >>>>> However, on the same machine, if I use the interface in a separate main() function without linking to any of the libraries except lapack and blas, then I get expected results.
> >>>>>
> >>>>> Also, this problem does not show up on Mac.
> >>>>>
> >>>>> I am not sure what could be causing this and don't quite know where to start. Could Petsc have anything to do with this?
> >>>>>
> >>>>> Any insight would be greatly appreciated.
> >>>>>
> >>>>> Regards,
> >>>>> Manav
> >>>>>
> >>>>> manav at manav1:~/test$ ldd /opt/local/lib/libpetsc.so
> >>>>> linux-vdso.so.1 => (0x00007fff3e7a8000)
> >>>>> libsuperlu_dist.so.5 => /opt/local/lib/libsuperlu_dist.so.5 (0x00007f721fbd1000)
> >>>>> libparmetis.so => /opt/local/lib/libparmetis.so (0x00007f721f990000)
> >>>>> libmetis.so => /opt/local/lib/libmetis.so (0x00007f721f718000)
> >>>>> libsuperlu.so.5 => /opt/local/lib/libsuperlu.so.5 (0x00007f721f4a7000)
> >>>>> libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f721f124000)
> >>>>> liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f721e92c000)
> >>>>> libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f721e6bd000)
> >>>>> libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f721e382000)
> >>>>> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f721e079000)
> >>>>> libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x00007f721de20000)
> >>>>> libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f721daf4000)
> >>>>> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f721d8f0000)
> >>>>> libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f721d61a000)
> >>>>> libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f721d403000)
> >>>>> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f721d1e6000)
> >>>>> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f721ce1d000)
> >>>>> /lib64/ld-linux-x86-64.so.2 (0x000055d739f1b000)
> >>>>> libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1
(0x00007f721cbcf000)
> >>>>> libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00007f721c932000)
> >>>>> libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f721c6f2000)
> >>>>> libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007f721c4e3000)
> >>>>> libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00007f721c269000)
> >>>>> libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f721c064000)
> >>>>> libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f721be5e000)
> >>>>> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f721bc56000)
> >>>>> libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f721ba52000)
> >>>>> libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f721b818000)
> >>>>> libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f721b60c000)
> >>>>> libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f721b402000)
> >>>>>
> >>>>
> >>>
> >>
>

From kaushikggg at gmail.com  Sat Apr  8 06:54:56 2017
From: kaushikggg at gmail.com (Kaushik Kulkarni)
Date: Sat, 8 Apr 2017 17:24:56 +0530
Subject: [petsc-users] KSP Solver giving inf's
Message-ID: 

Hello,
I just started with PETSc and was trying the KSP Solvers. I was trying for
the problem -

int main(int argc, char *argv[])
{
  /// matrix creation variables.
  PetscInt *idxm = new PetscInt[3];
  PetscInt *idxn = new PetscInt[3];
  PetscReal loc[] = { 1.0, -2.0,  -6.0,
                      2.0,  4.0,  12.0,
                      1.0, -4.0, -12.0};
  PetscReal b_array[] = { 12.0,
                         -17.0,
                          22.0};
  PetscInt i;
  KSP ksp;

  /// Declaring the vectors
  Vec x, b;

  // Declaring matrices
  Mat A;

  PetscInitialize(&argc,&argv,(char*)0,help);
  // Creating vectors
  VecCreateSeq(PETSC_COMM_SELF, 3, &x);
  VecCreateSeq(PETSC_COMM_SELF, 3, &b);
  // Creating matrix
  MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 3, NULL, &A);
  // Creating the indices
  for(i=0; i<3; i++) {
    idxm[i] = i;
    idxn[i] = i;
  }
  // Assembling the vector b and x
  VecSetValues(b, 3, idxm, b_array, INSERT_VALUES);
  VecAssemblyBegin(b);
  VecAssemblyEnd(b);

  // Assembling the Matrix
  MatSetValues(A, 3, idxm, 3, idxn, loc, INSERT_VALUES);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  // KSP related operations
  KSPCreate(PETSC_COMM_SELF, &ksp);
  KSPSetType(ksp, KSPGMRES);
  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);
  KSPSolve(ksp,b,x);
  KSPDestroy(&ksp);

  VecView(x, PETSC_VIEWER_STDOUT_SELF);

  PetscFinalize();
  return 0;
}

But the obtained solution is found to be (inf, inf, inf).

I wanted to know whether I am doing something wrong or whether the problem
is inherently not solvable using GMRES. Currently I am running the code in
a sequential manner (no parallelism intended).

Thank you.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From knepley at gmail.com  Sat Apr  8 08:22:41 2017
From: knepley at gmail.com (Matthew Knepley)
Date: Sat, 8 Apr 2017 08:22:41 -0500
Subject: [petsc-users] KSP Solver giving inf's
In-Reply-To: 
References: 
Message-ID: 

On Sat, Apr 8, 2017 at 6:54 AM, Kaushik Kulkarni wrote:

> Hello,
> I just started with PETSc and was trying the KSP Solvers. I was trying for
> the problem -
>
> int main(int argc, char *argv[])
> {
>   /// matrix creation variables.
> PetscInt *idxm = new PetscInt[3]; > PetscInt *idxn = new PetscInt[3]; > Note that the problem below is rank deficient (2 * row 1 + row 2 = 0). Its not clear to me whether b is in the range space of the operator. Thanks, Matt > PetscReal loc[] = { 1.0, -2.0, -6.0, > 2.0, 4.0, 12.0, > 1.0, -4.0,-12.0}; > PetscReal b_array[] = { 12.0, > -17.0, > 22.0}; > PetscInt i; > KSP ksp; > > /// Declaring the vectors > Vec x, b; > > // Declaring matrices > Mat A; > > PetscInitialize(&argc,&argv,(char*)0,help); > // Creating vectors > VecCreateSeq(PETSC_COMM_SELF, 3, &x); > VecCreateSeq(PETSC_COMM_SELF, 3, &b); > // Creating matrix > MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 3, NULL, &A); > // Creating the indices > for(i=0; i<3; i++) { > idxm[i] = i; > idxn[i] = i; > } > // Assembling the vector b and x > VecSetValues(b, 3, idxm, b_array, INSERT_VALUES); > VecAssemblyBegin(b); > VecAssemblyEnd(b); > > //Assembling the Matrix > MatSetValues(A, 3, idxm, 3, idxn, loc, INSERT_VALUES); > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > // KSP related operations > KSPCreate(PETSC_COMM_SELF, &ksp); > KSPSetType(ksp, KSPGMRES); > KSPSetOperators(ksp, A, A); > KSPSetFromOptions(ksp); > KSPSolve(ksp,b,x); > KSPDestroy(&ksp); > > VecView(x, PETSC_VIEWER_STDOUT_SELF); > > PetscFinalize(); > return 0; > } > > But the obtained solution is found out to be- (inf, inf, inf). > > I wanted to know whether I am doing something wrong or is the problem > inherently not solvable using GMRES. Currently I am running the code in a > sequential manner(no parallelism intended). > > Thank you. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kaushikggg at gmail.com Sat Apr 8 08:26:14 2017 From: kaushikggg at gmail.com (Kaushik Kulkarni) Date: Sat, 8 Apr 2017 18:56:14 +0530 Subject: [petsc-users] KSP Solver giving inf's In-Reply-To: References: Message-ID: I guess there is a miscalculation: 2*row1 + row2 = (4, 0, 0) Thanks, Kaushik On Apr 8, 2017 6:52 PM, "Matthew Knepley" wrote: On Sat, Apr 8, 2017 at 6:54 AM, Kaushik Kulkarni wrote: > Hello, > I just started with PETSc and was trying the KSP Solvers. I was trying for > the problem - > > int main(int argc, char *argv[]) > { > /// matrix creation variables. > PetscInt *idxm = new PetscInt[3]; > PetscInt *idxn = new PetscInt[3]; > Note that the problem below is rank deficient (2 * row 1 + row 2 = 0). Its not clear to me whether b is in the range space of the operator. Thanks, Matt > PetscReal loc[] = { 1.0, -2.0, -6.0, > 2.0, 4.0, 12.0, > 1.0, -4.0,-12.0}; > PetscReal b_array[] = { 12.0, > -17.0, > 22.0}; > PetscInt i; > KSP ksp; > > /// Declaring the vectors > Vec x, b; > > // Declaring matrices > Mat A; > > PetscInitialize(&argc,&argv,(char*)0,help); > // Creating vectors > VecCreateSeq(PETSC_COMM_SELF, 3, &x); > VecCreateSeq(PETSC_COMM_SELF, 3, &b); > // Creating matrix > MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 3, NULL, &A); > // Creating the indices > for(i=0; i<3; i++) { > idxm[i] = i; > idxn[i] = i; > } > // Assembling the vector b and x > VecSetValues(b, 3, idxm, b_array, INSERT_VALUES); > VecAssemblyBegin(b); > VecAssemblyEnd(b); > > //Assembling the Matrix > MatSetValues(A, 3, idxm, 3, idxn, loc, INSERT_VALUES); > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > // KSP related operations > KSPCreate(PETSC_COMM_SELF, &ksp); > KSPSetType(ksp, KSPGMRES); > KSPSetOperators(ksp, A, A); > KSPSetFromOptions(ksp); > KSPSolve(ksp,b,x); > KSPDestroy(&ksp); > > VecView(x, PETSC_VIEWER_STDOUT_SELF); > > PetscFinalize(); > return 0; > } > > But the obtained solution is found out to be- (inf, inf, inf). 
> > I wanted to know whether I am doing something wrong or is the problem > inherently not solvable using GMRES. Currently I am running the code in a > sequential manner(no parallelism intended). > > Thank you. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From kaushikggg at gmail.com Sat Apr 8 08:28:08 2017 From: kaushikggg at gmail.com (Kaushik Kulkarni) Date: Sat, 8 Apr 2017 18:58:08 +0530 Subject: [petsc-users] KSP Solver giving inf's In-Reply-To: References: Message-ID: Oh sorry I get it. The determinant is zero, it is rank deficient. Sorry for the trouble. On Apr 8, 2017 6:56 PM, "Kaushik Kulkarni" wrote: I guess there is a miscalculation: 2*row1 + row2 = (4, 0, 0) Thanks, Kaushik On Apr 8, 2017 6:52 PM, "Matthew Knepley" wrote: On Sat, Apr 8, 2017 at 6:54 AM, Kaushik Kulkarni wrote: > Hello, > I just started with PETSc and was trying the KSP Solvers. I was trying for > the problem - > > int main(int argc, char *argv[]) > { > /// matrix creation variables. > PetscInt *idxm = new PetscInt[3]; > PetscInt *idxn = new PetscInt[3]; > Note that the problem below is rank deficient (2 * row 1 + row 2 = 0). Its not clear to me whether b is in the range space of the operator. 
Thanks, Matt > PetscReal loc[] = { 1.0, -2.0, -6.0, > 2.0, 4.0, 12.0, > 1.0, -4.0,-12.0}; > PetscReal b_array[] = { 12.0, > -17.0, > 22.0}; > PetscInt i; > KSP ksp; > > /// Declaring the vectors > Vec x, b; > > // Declaring matrices > Mat A; > > PetscInitialize(&argc,&argv,(char*)0,help); > // Creating vectors > VecCreateSeq(PETSC_COMM_SELF, 3, &x); > VecCreateSeq(PETSC_COMM_SELF, 3, &b); > // Creating matrix > MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 3, NULL, &A); > // Creating the indices > for(i=0; i<3; i++) { > idxm[i] = i; > idxn[i] = i; > } > // Assembling the vector b and x > VecSetValues(b, 3, idxm, b_array, INSERT_VALUES); > VecAssemblyBegin(b); > VecAssemblyEnd(b); > > //Assembling the Matrix > MatSetValues(A, 3, idxm, 3, idxn, loc, INSERT_VALUES); > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > // KSP related operations > KSPCreate(PETSC_COMM_SELF, &ksp); > KSPSetType(ksp, KSPGMRES); > KSPSetOperators(ksp, A, A); > KSPSetFromOptions(ksp); > KSPSolve(ksp,b,x); > KSPDestroy(&ksp); > > VecView(x, PETSC_VIEWER_STDOUT_SELF); > > PetscFinalize(); > return 0; > } > > But the obtained solution is found out to be- (inf, inf, inf). > > I wanted to know whether I am doing something wrong or is the problem > inherently not solvable using GMRES. Currently I am running the code in a > sequential manner(no parallelism intended). > > Thank you. > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Sat Apr 8 08:31:40 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 8 Apr 2017 08:31:40 -0500 Subject: [petsc-users] KSP Solver giving inf's In-Reply-To: References: Message-ID: On Sat, Apr 8, 2017 at 8:26 AM, Kaushik Kulkarni wrote: > I guess there is a miscalculation: > 2*row1 + row2 = (4, 0, 0) > You are right here, but I think it is still rank deficient. Thanks, Matt > Thanks, > Kaushik > > On Apr 8, 2017 6:52 PM, "Matthew Knepley" wrote: > > On Sat, Apr 8, 2017 at 6:54 AM, Kaushik Kulkarni > wrote: > >> Hello, >> I just started with PETSc and was trying the KSP Solvers. I was trying >> for the problem - >> >> int main(int argc, char *argv[]) >> { >> /// matrix creation variables. >> PetscInt *idxm = new PetscInt[3]; >> PetscInt *idxn = new PetscInt[3]; >> > > Note that the problem below is rank deficient (2 * row 1 + row 2 = 0). Its > not clear to me whether b is > in the range space of the operator. > > Thanks, > > Matt > > >> PetscReal loc[] = { 1.0, -2.0, -6.0, >> 2.0, 4.0, 12.0, >> 1.0, -4.0,-12.0}; >> PetscReal b_array[] = { 12.0, >> -17.0, >> 22.0}; >> PetscInt i; >> KSP ksp; >> >> /// Declaring the vectors >> Vec x, b; >> >> // Declaring matrices >> Mat A; >> >> PetscInitialize(&argc,&argv,(char*)0,help); >> // Creating vectors >> VecCreateSeq(PETSC_COMM_SELF, 3, &x); >> VecCreateSeq(PETSC_COMM_SELF, 3, &b); >> // Creating matrix >> MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 3, NULL, &A); >> // Creating the indices >> for(i=0; i<3; i++) { >> idxm[i] = i; >> idxn[i] = i; >> } >> // Assembling the vector b and x >> VecSetValues(b, 3, idxm, b_array, INSERT_VALUES); >> VecAssemblyBegin(b); >> VecAssemblyEnd(b); >> >> //Assembling the Matrix >> MatSetValues(A, 3, idxm, 3, idxn, loc, INSERT_VALUES); >> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); >> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); >> >> // KSP related operations >> KSPCreate(PETSC_COMM_SELF, &ksp); >> KSPSetType(ksp, KSPGMRES); >> KSPSetOperators(ksp, A, A); >> 
KSPSetFromOptions(ksp); >> KSPSolve(ksp,b,x); >> KSPDestroy(&ksp); >> >> VecView(x, PETSC_VIEWER_STDOUT_SELF); >> >> PetscFinalize(); >> return 0; >> } >> >> But the obtained solution is found out to be- (inf, inf, inf). >> >> I wanted to know whether I am doing something wrong or is the problem >> inherently not solvable using GMRES. Currently I am running the code in a >> sequential manner(no parallelism intended). >> >> Thank you. >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From kaushikggg at gmail.com Sat Apr 8 10:20:28 2017 From: kaushikggg at gmail.com (Kaushik Kulkarni) Date: Sat, 8 Apr 2017 20:50:28 +0530 Subject: [petsc-users] KSP Solver giving inf's In-Reply-To: References: Message-ID: Hello Matthew, The same happened for: PetscReal loc[] = { 0.0, 2.0, 1.0, 1.0, -2.0, -3.0, -1.0, 1.0, 2.0}; PetscReal b_array[] = { -8.0, 0.0, 3.0}; And this time I checked the matrix to be non-singular(determinant=1). I am again getting (inf, inf, inf) as the solution. On running with -ksp_converged_reason the following message comes: Linear solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0 PCSETUP_FAILED due to FACTOR_NUMERIC_ZEROPIVOT I am once again appending the whole code for reference. Thanks. --------- static char help[] = "KSP try.\n\n"; #include "petsc.h" #include "petscksp.h" int main(int argc, char *argv[]) { /// matrix creation variables. 
PetscInt *idxm = new PetscInt[3]; PetscInt *idxn = new PetscInt[3]; PetscReal loc[] = { 0.0, 2.0, 1.0, 1.0, -2.0, -3.0, -1.0, 1.0, 2.0}; PetscReal b_array[] = { -8.0, 0.0, 3.0}; PetscInt i; KSP ksp; /// Declaring the vectors Vec x, b; // Declaring matrices Mat A; PetscInitialize(&argc,&argv,(char*)0,help); // Creating vectors VecCreateSeq(PETSC_COMM_SELF, 3, &x); VecCreateSeq(PETSC_COMM_SELF, 3, &b); // Creating matrix MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 3, NULL, &A); // Creating the indices for(i=0; i<3; i++) { idxm[i] = i; idxn[i] = i; } // Assembling the vector b and x VecSetValues(b, 3, idxm, b_array, INSERT_VALUES); VecAssemblyBegin(b); VecAssemblyEnd(b); //Assembling the Matrix MatSetValues(A, 3, idxm, 3, idxn, loc, INSERT_VALUES); MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); // KSP related operations KSPCreate(PETSC_COMM_SELF, &ksp); KSPSetType(ksp, KSPGMRES); KSPSetOperators(ksp, A, A); KSPSetFromOptions(ksp); KSPSolve(ksp,b,x); KSPDestroy(&ksp); VecView(x, PETSC_VIEWER_STDOUT_SELF); PetscFinalize(); return 0; }//End of file -------- On Sat, Apr 8, 2017 at 7:01 PM, Matthew Knepley wrote: > On Sat, Apr 8, 2017 at 8:26 AM, Kaushik Kulkarni > wrote: > >> I guess there is a miscalculation: >> 2*row1 + row2 = (4, 0, 0) >> > > You are right here, but I think it is still rank deficient. > > Thanks, > > Matt > > >> Thanks, >> Kaushik >> >> On Apr 8, 2017 6:52 PM, "Matthew Knepley" wrote: >> >> On Sat, Apr 8, 2017 at 6:54 AM, Kaushik Kulkarni >> wrote: >> >>> Hello, >>> I just started with PETSc and was trying the KSP Solvers. I was trying >>> for the problem - >>> >>> int main(int argc, char *argv[]) >>> { >>> /// matrix creation variables. >>> PetscInt *idxm = new PetscInt[3]; >>> PetscInt *idxn = new PetscInt[3]; >>> >> >> Note that the problem below is rank deficient (2 * row 1 + row 2 = 0). >> Its not clear to me whether b is >> in the range space of the operator. 
>> >> Thanks, >> >> Matt >> >> >>> PetscReal loc[] = { 1.0, -2.0, -6.0, >>> 2.0, 4.0, 12.0, >>> 1.0, -4.0,-12.0}; >>> PetscReal b_array[] = { 12.0, >>> -17.0, >>> 22.0}; >>> PetscInt i; >>> KSP ksp; >>> >>> /// Declaring the vectors >>> Vec x, b; >>> >>> // Declaring matrices >>> Mat A; >>> >>> PetscInitialize(&argc,&argv,(char*)0,help); >>> // Creating vectors >>> VecCreateSeq(PETSC_COMM_SELF, 3, &x); >>> VecCreateSeq(PETSC_COMM_SELF, 3, &b); >>> // Creating matrix >>> MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 3, NULL, &A); >>> // Creating the indices >>> for(i=0; i<3; i++) { >>> idxm[i] = i; >>> idxn[i] = i; >>> } >>> // Assembling the vector b and x >>> VecSetValues(b, 3, idxm, b_array, INSERT_VALUES); >>> VecAssemblyBegin(b); >>> VecAssemblyEnd(b); >>> >>> //Assembling the Matrix >>> MatSetValues(A, 3, idxm, 3, idxn, loc, INSERT_VALUES); >>> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); >>> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); >>> >>> // KSP related operations >>> KSPCreate(PETSC_COMM_SELF, &ksp); >>> KSPSetType(ksp, KSPGMRES); >>> KSPSetOperators(ksp, A, A); >>> KSPSetFromOptions(ksp); >>> KSPSolve(ksp,b,x); >>> KSPDestroy(&ksp); >>> >>> VecView(x, PETSC_VIEWER_STDOUT_SELF); >>> >>> PetscFinalize(); >>> return 0; >>> } >>> >>> But the obtained solution is found out to be- (inf, inf, inf). >>> >>> I wanted to know whether I am doing something wrong or is the problem >>> inherently not solvable using GMRES. Currently I am running the code in a >>> sequential manner(no parallelism intended). >>> >>> Thank you. >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > -- Kaushik Kulkarni Fourth Year Undergraduate Department of Mechanical Engineering Indian Institute of Technology Bombay Mumbai, India https://kaushikcfd.github.io/About/ +91-9967687150 -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sat Apr 8 11:19:10 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sat, 8 Apr 2017 11:19:10 -0500 Subject: [petsc-users] KSP Solver giving inf's In-Reply-To: References: Message-ID: On Apr 8, 2017 10:20, "Kaushik Kulkarni" wrote: Hello Matthew, The same happened for: PetscReal loc[] = { 0.0, 2.0, 1.0, 1.0, -2.0, -3.0, -1.0, 1.0, 2.0}; PetscReal b_array[] = { -8.0, 0.0, 3.0}; And this time I checked the matrix to be non-singular(determinant=1). I am again getting (inf, inf, inf) as the solution. On running with -ksp_converged_reason the following message comes: Linear solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0 PCSETUP_FAILED due to FACTOR_NUMERIC_ZEROPIVOT 1) Always check the return code of every call using CHKERRQ(ierr) 2) You get Info because x I'd uninitialized and the solve failed 3) You are using LU but have a zero on the diagonal. Use -pc_type jacobi and it will work Matt I am once again appending the whole code for reference. Thanks. --------- static char help[] = "KSP try.\n\n"; #include "petsc.h" #include "petscksp.h" int main(int argc, char *argv[]) { /// matrix creation variables. 
PetscInt *idxm = new PetscInt[3]; PetscInt *idxn = new PetscInt[3]; PetscReal loc[] = { 0.0, 2.0, 1.0, 1.0, -2.0, -3.0, -1.0, 1.0, 2.0}; PetscReal b_array[] = { -8.0, 0.0, 3.0}; PetscInt i; KSP ksp; /// Declaring the vectors Vec x, b; // Declaring matrices Mat A; PetscInitialize(&argc,&argv,(char*)0,help); // Creating vectors VecCreateSeq(PETSC_COMM_SELF, 3, &x); VecCreateSeq(PETSC_COMM_SELF, 3, &b); // Creating matrix MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 3, NULL, &A); // Creating the indices for(i=0; i<3; i++) { idxm[i] = i; idxn[i] = i; } // Assembling the vector b and x VecSetValues(b, 3, idxm, b_array, INSERT_VALUES); VecAssemblyBegin(b); VecAssemblyEnd(b); //Assembling the Matrix MatSetValues(A, 3, idxm, 3, idxn, loc, INSERT_VALUES); MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); // KSP related operations KSPCreate(PETSC_COMM_SELF, &ksp); KSPSetType(ksp, KSPGMRES); KSPSetOperators(ksp, A, A); KSPSetFromOptions(ksp); KSPSolve(ksp,b,x); KSPDestroy(&ksp); VecView(x, PETSC_VIEWER_STDOUT_SELF); PetscFinalize(); return 0; }//End of file -------- On Sat, Apr 8, 2017 at 7:01 PM, Matthew Knepley wrote: > On Sat, Apr 8, 2017 at 8:26 AM, Kaushik Kulkarni > wrote: > >> I guess there is a miscalculation: >> 2*row1 + row2 = (4, 0, 0) >> > > You are right here, but I think it is still rank deficient. > > Thanks, > > Matt > > >> Thanks, >> Kaushik >> >> On Apr 8, 2017 6:52 PM, "Matthew Knepley" wrote: >> >> On Sat, Apr 8, 2017 at 6:54 AM, Kaushik Kulkarni >> wrote: >> >>> Hello, >>> I just started with PETSc and was trying the KSP Solvers. I was trying >>> for the problem - >>> >>> int main(int argc, char *argv[]) >>> { >>> /// matrix creation variables. >>> PetscInt *idxm = new PetscInt[3]; >>> PetscInt *idxn = new PetscInt[3]; >>> >> >> Note that the problem below is rank deficient (2 * row 1 + row 2 = 0). >> Its not clear to me whether b is >> in the range space of the operator. 
>> >> Thanks, >> >> Matt >> >> >>> PetscReal loc[] = { 1.0, -2.0, -6.0, >>> 2.0, 4.0, 12.0, >>> 1.0, -4.0,-12.0}; >>> PetscReal b_array[] = { 12.0, >>> -17.0, >>> 22.0}; >>> PetscInt i; >>> KSP ksp; >>> >>> /// Declaring the vectors >>> Vec x, b; >>> >>> // Declaring matrices >>> Mat A; >>> >>> PetscInitialize(&argc,&argv,(char*)0,help); >>> // Creating vectors >>> VecCreateSeq(PETSC_COMM_SELF, 3, &x); >>> VecCreateSeq(PETSC_COMM_SELF, 3, &b); >>> // Creating matrix >>> MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 3, NULL, &A); >>> // Creating the indices >>> for(i=0; i<3; i++) { >>> idxm[i] = i; >>> idxn[i] = i; >>> } >>> // Assembling the vector b and x >>> VecSetValues(b, 3, idxm, b_array, INSERT_VALUES); >>> VecAssemblyBegin(b); >>> VecAssemblyEnd(b); >>> >>> //Assembling the Matrix >>> MatSetValues(A, 3, idxm, 3, idxn, loc, INSERT_VALUES); >>> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); >>> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); >>> >>> // KSP related operations >>> KSPCreate(PETSC_COMM_SELF, &ksp); >>> KSPSetType(ksp, KSPGMRES); >>> KSPSetOperators(ksp, A, A); >>> KSPSetFromOptions(ksp); >>> KSPSolve(ksp,b,x); >>> KSPDestroy(&ksp); >>> >>> VecView(x, PETSC_VIEWER_STDOUT_SELF); >>> >>> PetscFinalize(); >>> return 0; >>> } >>> >>> But the obtained solution is found out to be- (inf, inf, inf). >>> >>> I wanted to know whether I am doing something wrong or is the problem >>> inherently not solvable using GMRES. Currently I am running the code in a >>> sequential manner(no parallelism intended). >>> >>> Thank you. >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
> -- Norbert Wiener > -- Kaushik Kulkarni Fourth Year Undergraduate Department of Mechanical Engineering Indian Institute of Technology Bombay Mumbai, India https://kaushikcfd.github.io/About/ +91-9967687150 <+91%2099676%2087150> -------------- next part -------------- An HTML attachment was scrubbed... URL: From kaushikggg at gmail.com Sat Apr 8 11:21:49 2017 From: kaushikggg at gmail.com (Kaushik Kulkarni) Date: Sat, 8 Apr 2017 21:51:49 +0530 Subject: [petsc-users] KSP Solver giving inf's In-Reply-To: References: Message-ID: Yes that worked. Thanks a lot. On Sat, Apr 8, 2017 at 9:49 PM, Matthew Knepley wrote: > > > On Apr 8, 2017 10:20, "Kaushik Kulkarni" wrote: > > Hello Matthew, > The same happened for: > > PetscReal loc[] = { 0.0, 2.0, 1.0, > 1.0, -2.0, -3.0, > -1.0, 1.0, 2.0}; > PetscReal b_array[] = { -8.0, > 0.0, > 3.0}; > > And this time I checked the matrix to be non-singular(determinant=1). I am > again getting (inf, inf, inf) as the solution. > > On running with -ksp_converged_reason the following message comes: > Linear solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0 > PCSETUP_FAILED due to FACTOR_NUMERIC_ZEROPIVOT > > > 1) Always check the return code of every call using CHKERRQ(ierr) > > 2) You get Info because x I'd uninitialized and the solve failed > > 3) You are using LU but have a zero on the diagonal. Use -pc_type jacobi > and it will work > > > Matt > > I am once again appending the whole code for reference. > > Thanks. > --------- > > static char help[] = "KSP try.\n\n"; > > > #include "petsc.h" > #include "petscksp.h" > > > int main(int argc, char *argv[]) > { > /// matrix creation variables. 
> PetscInt *idxm = new PetscInt[3]; > PetscInt *idxn = new PetscInt[3]; > PetscReal loc[] = { 0.0, 2.0, 1.0, > 1.0, -2.0, -3.0, > -1.0, 1.0, 2.0}; > PetscReal b_array[] = { -8.0, > 0.0, > 3.0}; > > PetscInt i; > KSP ksp; > > /// Declaring the vectors > Vec x, b; > > // Declaring matrices > Mat A; > > PetscInitialize(&argc,&argv,(char*)0,help); > // Creating vectors > VecCreateSeq(PETSC_COMM_SELF, 3, &x); > VecCreateSeq(PETSC_COMM_SELF, 3, &b); > // Creating matrix > MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 3, NULL, &A); > // Creating the indices > for(i=0; i<3; i++) { > idxm[i] = i; > idxn[i] = i; > } > // Assembling the vector b and x > VecSetValues(b, 3, idxm, b_array, INSERT_VALUES); > VecAssemblyBegin(b); > VecAssemblyEnd(b); > > //Assembling the Matrix > MatSetValues(A, 3, idxm, 3, idxn, loc, INSERT_VALUES); > MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); > MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); > > // KSP related operations > KSPCreate(PETSC_COMM_SELF, &ksp); > KSPSetType(ksp, KSPGMRES); > KSPSetOperators(ksp, A, A); > KSPSetFromOptions(ksp); > KSPSolve(ksp,b,x); > KSPDestroy(&ksp); > > VecView(x, PETSC_VIEWER_STDOUT_SELF); > > PetscFinalize(); > return 0; > }//End of file > -------- > > > On Sat, Apr 8, 2017 at 7:01 PM, Matthew Knepley wrote: > >> On Sat, Apr 8, 2017 at 8:26 AM, Kaushik Kulkarni >> wrote: >> >>> I guess there is a miscalculation: >>> 2*row1 + row2 = (4, 0, 0) >>> >> >> You are right here, but I think it is still rank deficient. >> >> Thanks, >> >> Matt >> >> >>> Thanks, >>> Kaushik >>> >>> On Apr 8, 2017 6:52 PM, "Matthew Knepley" wrote: >>> >>> On Sat, Apr 8, 2017 at 6:54 AM, Kaushik Kulkarni >>> wrote: >>> >>>> Hello, >>>> I just started with PETSc and was trying the KSP Solvers. I was trying >>>> for the problem - >>>> >>>> int main(int argc, char *argv[]) >>>> { >>>> /// matrix creation variables. 
>>>> PetscInt *idxm = new PetscInt[3]; >>>> PetscInt *idxn = new PetscInt[3]; >>>> >>> >>> Note that the problem below is rank deficient (2 * row 1 + row 2 = 0). >>> Its not clear to me whether b is >>> in the range space of the operator. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> PetscReal loc[] = { 1.0, -2.0, -6.0, >>>> 2.0, 4.0, 12.0, >>>> 1.0, -4.0,-12.0}; >>>> PetscReal b_array[] = { 12.0, >>>> -17.0, >>>> 22.0}; >>>> PetscInt i; >>>> KSP ksp; >>>> >>>> /// Declaring the vectors >>>> Vec x, b; >>>> >>>> // Declaring matrices >>>> Mat A; >>>> >>>> PetscInitialize(&argc,&argv,(char*)0,help); >>>> // Creating vectors >>>> VecCreateSeq(PETSC_COMM_SELF, 3, &x); >>>> VecCreateSeq(PETSC_COMM_SELF, 3, &b); >>>> // Creating matrix >>>> MatCreateSeqAIJ(PETSC_COMM_SELF, 3, 3, 3, NULL, &A); >>>> // Creating the indices >>>> for(i=0; i<3; i++) { >>>> idxm[i] = i; >>>> idxn[i] = i; >>>> } >>>> // Assembling the vector b and x >>>> VecSetValues(b, 3, idxm, b_array, INSERT_VALUES); >>>> VecAssemblyBegin(b); >>>> VecAssemblyEnd(b); >>>> >>>> //Assembling the Matrix >>>> MatSetValues(A, 3, idxm, 3, idxn, loc, INSERT_VALUES); >>>> MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); >>>> MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); >>>> >>>> // KSP related operations >>>> KSPCreate(PETSC_COMM_SELF, &ksp); >>>> KSPSetType(ksp, KSPGMRES); >>>> KSPSetOperators(ksp, A, A); >>>> KSPSetFromOptions(ksp); >>>> KSPSolve(ksp,b,x); >>>> KSPDestroy(&ksp); >>>> >>>> VecView(x, PETSC_VIEWER_STDOUT_SELF); >>>> >>>> PetscFinalize(); >>>> return 0; >>>> } >>>> >>>> But the obtained solution is found out to be- (inf, inf, inf). >>>> >>>> I wanted to know whether I am doing something wrong or is the problem >>>> inherently not solvable using GMRES. Currently I am running the code in a >>>> sequential manner(no parallelism intended). >>>> >>>> Thank you. 
>>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > > > -- > Kaushik Kulkarni > Fourth Year Undergraduate > Department of Mechanical Engineering > Indian Institute of Technology Bombay > Mumbai, India > https://kaushikcfd.github.io/About/ > +91-9967687150 <+91%2099676%2087150> > > > -- Kaushik Kulkarni Fourth Year Undergraduate Department of Mechanical Engineering Indian Institute of Technology Bombay Mumbai, India https://kaushikcfd.github.io/About/ +91-9967687150 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bhatiamanav at gmail.com Sat Apr 8 23:21:17 2017 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Sat, 8 Apr 2017 23:21:17 -0500 Subject: [petsc-users] odd behavior when using lapack's dgeev with petsc In-Reply-To: References: <9ABBA7EA-63AC-40BD-BA47-64D77E6B0471@gmail.com> <74FE8F6B-C424-4E87-A3FD-29C4F1213236@mcs.anl.gov> <6D896251-CE0D-4EA9-92D0-4E65B5F749AF@mcs.anl.gov> Message-ID: <323B2492-44A6-41E4-ABDE-0EB2781EF072@gmail.com> It turned out to be another third-party library that had some functions names same as those from BLAS. This was confusing LAPACK. Building without that library is leading to expected results from the code. What I don?t understand is why this shows up on one platform out of the four different OSs that I have been using this code on. -Manav > On Apr 8, 2017, at 2:20 AM, Satish Balay wrote: > > I would do a minimal petsc build - without any packages from > /usr/local - and see if the problem presists.. 
> > Satish > > On Fri, 7 Apr 2017, Barry Smith wrote: > >> >>> On Apr 7, 2017, at 3:34 PM, Manav Bhatia wrote: >>> >>> Yes, I printed the data in both cases and they look the same. >>> >>> I also used ?set step-mode on? to show the system lapack info, and they both are using the same lapack routine. >>> >>> This is still baffling me. >> >> alignment of the input arrays, both the same? >> >> I don't know why this is happening; what if you use your standalone code but link against all the libraries that are linked against for the PETSc case. >> >> >> >> >>> >>> -Manav >>> >>> >>>> On Apr 7, 2017, at 3:22 PM, Barry Smith wrote: >>>> >>>> >>>>> On Apr 7, 2017, at 2:57 PM, Manav Bhatia wrote: >>>>> >>>>> Hi Barry, >>>>> >>>>> Thanks for the inputs. >>>>> >>>>> I did try that, but the debugger (gdb) stepped right over the dgeev_ call, without getting inside the function. >>>> >>>> Did it at least stop at the function so you do an up and print all the arguments passed in? >>>> >>>>> >>>>> I am wondering if this has anything to do with the fact that the system lapack library might not have any debugging info in it. >>>> >>>> Yeah I forgot it might not have them. >>>> >>>> Barry >>>> >>>>> >>>>> Thoughts? >>>>> >>>>> Regards, >>>>> Manav >>>>> >>>>>> On Apr 7, 2017, at 2:40 PM, Barry Smith wrote: >>>>>> >>>>>> >>>>>>> On Apr 7, 2017, at 1:46 PM, Manav Bhatia wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I have compile petsc on my Ubuntu machine (also Mac OS 10.12 separately) to link to the system lapack and blas libraries (shown below). >>>>>>> >>>>>>> I have created an interface class to dgeev in lapack to calculate the eigenvalues of a matrix. >>>>>>> >>>>>>> My application code links to multiple libraries: libMesh, petsc, slepc, hdf5, etc. >>>>>>> >>>>>>> If I test my interface inside this application code, I get junk results. >>>>>> >>>>>> This is easy to debug because you have a version that works. 
>>>>>> >>>>>> Run both versions in separate windows each in a debugger and put a break point in the dgeev_ function. When it gets there check that it is the same dgeev_ function in both cases and check that the inputs are the same then step through both to see when things start to change between the two. >>>>>> >>>>>>> >>>>>>> However, on the same machine, if I use the interface in a separate main() function without linking to any of the libraries except lapack and blas, then I get expected results. >>>>>>> >>>>>>> Also, this problem does not show up on Mac. >>>>>>> >>>>>>> I am not sure what could be causing this and don?t quite know where to start. Could Petsc have anything to do with this? >>>>>>> >>>>>>> Any insight would be greatly appreciated. >>>>>>> >>>>>>> Regards, >>>>>>> Manav >>>>>>> >>>>>>> manav at manav1:~/test$ ldd /opt/local/lib/libpetsc.so >>>>>>> linux-vdso.so.1 => (0x00007fff3e7a8000) >>>>>>> libsuperlu_dist.so.5 => /opt/local/lib/libsuperlu_dist.so.5 (0x00007f721fbd1000) >>>>>>> libparmetis.so => /opt/local/lib/libparmetis.so (0x00007f721f990000) >>>>>>> libmetis.so => /opt/local/lib/libmetis.so (0x00007f721f718000) >>>>>>> libsuperlu.so.5 => /opt/local/lib/libsuperlu.so.5 (0x00007f721f4a7000) >>>>>>> libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f721f124000) >>>>>>> liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f721e92c000) >>>>>>> libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f721e6bd000) >>>>>>> libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f721e382000) >>>>>>> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f721e079000) >>>>>>> libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x00007f721de20000) >>>>>>> libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f721daf4000) >>>>>>> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f721d8f0000) >>>>>>> libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f721d61a000) >>>>>>> libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 
(0x00007f721d403000) >>>>>>> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f721d1e6000) >>>>>>> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f721ce1d000) >>>>>>> /lib64/ld-linux-x86-64.so.2 (0x000055d739f1b000) >>>>>>> libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f721cbcf000) >>>>>>> libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00007f721c932000) >>>>>>> libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f721c6f2000) >>>>>>> libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007f721c4e3000) >>>>>>> libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00007f721c269000) >>>>>>> libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f721c064000) >>>>>>> libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f721be5e000) >>>>>>> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f721bc56000) >>>>>>> libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f721ba52000) >>>>>>> libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f721b818000) >>>>>>> libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f721b60c000) >>>>>>> libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f721b402000) >>>>>>> >>>>>> >>>>> >>>> >>> >> >> From bsmith at mcs.anl.gov Sat Apr 8 23:28:42 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 8 Apr 2017 23:28:42 -0500 Subject: [petsc-users] odd behavior when using lapack's dgeev with petsc In-Reply-To: <323B2492-44A6-41E4-ABDE-0EB2781EF072@gmail.com> References: <9ABBA7EA-63AC-40BD-BA47-64D77E6B0471@gmail.com> <74FE8F6B-C424-4E87-A3FD-29C4F1213236@mcs.anl.gov> <6D896251-CE0D-4EA9-92D0-4E65B5F749AF@mcs.anl.gov> <323B2492-44A6-41E4-ABDE-0EB2781EF072@gmail.com> Message-ID: <8D3D9529-87F4-46EA-B8FF-DB16D1A6F277@mcs.anl.gov> > On Apr 8, 2017, at 11:21 PM, Manav Bhatia wrote: > > It turned out to be another third-party library that had some functions names same as those from BLAS. Can you tell us what library it was? 
We can add some checks for this and thus prevent others from having to spend so much time, like you had to, figure it out. IMHO it is evil for a library to include routines with the same names as in BLAS and they should fix their libraries. > This was confusing LAPACK. > Building without that library is leading to expected results from the code. > > What I don?t understand is why this shows up on one platform out of the four different OSs that I have been using this code on. It depends on the order that the OS references the libraries, so this is actually not surprising. Barry > > -Manav > >> On Apr 8, 2017, at 2:20 AM, Satish Balay wrote: >> >> I would do a minimal petsc build - without any packages from >> /usr/local - and see if the problem presists.. >> >> Satish >> >> On Fri, 7 Apr 2017, Barry Smith wrote: >> >>> >>>> On Apr 7, 2017, at 3:34 PM, Manav Bhatia wrote: >>>> >>>> Yes, I printed the data in both cases and they look the same. >>>> >>>> I also used ?set step-mode on? to show the system lapack info, and they both are using the same lapack routine. >>>> >>>> This is still baffling me. >>> >>> alignment of the input arrays, both the same? >>> >>> I don't know why this is happening; what if you use your standalone code but link against all the libraries that are linked against for the PETSc case. >>> >>> >>> >>> >>>> >>>> -Manav >>>> >>>> >>>>> On Apr 7, 2017, at 3:22 PM, Barry Smith wrote: >>>>> >>>>> >>>>>> On Apr 7, 2017, at 2:57 PM, Manav Bhatia wrote: >>>>>> >>>>>> Hi Barry, >>>>>> >>>>>> Thanks for the inputs. >>>>>> >>>>>> I did try that, but the debugger (gdb) stepped right over the dgeev_ call, without getting inside the function. >>>>> >>>>> Did it at least stop at the function so you do an up and print all the arguments passed in? >>>>> >>>>>> >>>>>> I am wondering if this has anything to do with the fact that the system lapack library might not have any debugging info in it. >>>>> >>>>> Yeah I forgot it might not have them. 
>>>>> >>>>> Barry >>>>> >>>>>> >>>>>> Thoughts? >>>>>> >>>>>> Regards, >>>>>> Manav >>>>>> >>>>>>> On Apr 7, 2017, at 2:40 PM, Barry Smith wrote: >>>>>>> >>>>>>> >>>>>>>> On Apr 7, 2017, at 1:46 PM, Manav Bhatia wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I have compile petsc on my Ubuntu machine (also Mac OS 10.12 separately) to link to the system lapack and blas libraries (shown below). >>>>>>>> >>>>>>>> I have created an interface class to dgeev in lapack to calculate the eigenvalues of a matrix. >>>>>>>> >>>>>>>> My application code links to multiple libraries: libMesh, petsc, slepc, hdf5, etc. >>>>>>>> >>>>>>>> If I test my interface inside this application code, I get junk results. >>>>>>> >>>>>>> This is easy to debug because you have a version that works. >>>>>>> >>>>>>> Run both versions in separate windows each in a debugger and put a break point in the dgeev_ function. When it gets there check that it is the same dgeev_ function in both cases and check that the inputs are the same then step through both to see when things start to change between the two. >>>>>>> >>>>>>>> >>>>>>>> However, on the same machine, if I use the interface in a separate main() function without linking to any of the libraries except lapack and blas, then I get expected results. >>>>>>>> >>>>>>>> Also, this problem does not show up on Mac. >>>>>>>> >>>>>>>> I am not sure what could be causing this and don?t quite know where to start. Could Petsc have anything to do with this? >>>>>>>> >>>>>>>> Any insight would be greatly appreciated. 
>>>>>>>> >>>>>>>> Regards, >>>>>>>> Manav >>>>>>>> >>>>>>>> manav at manav1:~/test$ ldd /opt/local/lib/libpetsc.so >>>>>>>> linux-vdso.so.1 => (0x00007fff3e7a8000) >>>>>>>> libsuperlu_dist.so.5 => /opt/local/lib/libsuperlu_dist.so.5 (0x00007f721fbd1000) >>>>>>>> libparmetis.so => /opt/local/lib/libparmetis.so (0x00007f721f990000) >>>>>>>> libmetis.so => /opt/local/lib/libmetis.so (0x00007f721f718000) >>>>>>>> libsuperlu.so.5 => /opt/local/lib/libsuperlu.so.5 (0x00007f721f4a7000) >>>>>>>> libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f721f124000) >>>>>>>> liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f721e92c000) >>>>>>>> libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f721e6bd000) >>>>>>>> libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f721e382000) >>>>>>>> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f721e079000) >>>>>>>> libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x00007f721de20000) >>>>>>>> libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f721daf4000) >>>>>>>> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f721d8f0000) >>>>>>>> libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f721d61a000) >>>>>>>> libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f721d403000) >>>>>>>> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f721d1e6000) >>>>>>>> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f721ce1d000) >>>>>>>> /lib64/ld-linux-x86-64.so.2 (0x000055d739f1b000) >>>>>>>> libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f721cbcf000) >>>>>>>> libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00007f721c932000) >>>>>>>> libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f721c6f2000) >>>>>>>> libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007f721c4e3000) >>>>>>>> libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00007f721c269000) >>>>>>>> libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f721c064000) >>>>>>>> 
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f721be5e000) >>>>>>>> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f721bc56000) >>>>>>>> libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f721ba52000) >>>>>>>> libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f721b818000) >>>>>>>> libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f721b60c000) >>>>>>>> libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f721b402000) >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >>> > From bhatiamanav at gmail.com Sat Apr 8 23:31:32 2017 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Sat, 8 Apr 2017 23:31:32 -0500 Subject: [petsc-users] odd behavior when using lapack's dgeev with petsc In-Reply-To: <8D3D9529-87F4-46EA-B8FF-DB16D1A6F277@mcs.anl.gov> References: <9ABBA7EA-63AC-40BD-BA47-64D77E6B0471@gmail.com> <74FE8F6B-C424-4E87-A3FD-29C4F1213236@mcs.anl.gov> <6D896251-CE0D-4EA9-92D0-4E65B5F749AF@mcs.anl.gov> <323B2492-44A6-41E4-ABDE-0EB2781EF072@gmail.com> <8D3D9529-87F4-46EA-B8FF-DB16D1A6F277@mcs.anl.gov> Message-ID: It was a library from a colleague, particularly for internal use. So, this will not perpetuate to your users. I appreciate your concern and support. We will fix this at our end. -Manav > On Apr 8, 2017, at 11:28 PM, Barry Smith wrote: > > >> On Apr 8, 2017, at 11:21 PM, Manav Bhatia wrote: >> >> It turned out to be another third-party library that had some functions names same as those from BLAS. > > Can you tell us what library it was? We can add some checks for this and thus prevent others from having to spend so much time, like you had to, figure it out. IMHO it is evil for a library to include routines with the same names as in BLAS and they should fix their libraries. > >> This was confusing LAPACK. >> Building without that library is leading to expected results from the code. 
>> >> What I don?t understand is why this shows up on one platform out of the four different OSs that I have been using this code on. > > > It depends on the order that the OS references the libraries, so this is actually not surprising. > > Barry > > >> >> -Manav >> >>> On Apr 8, 2017, at 2:20 AM, Satish Balay wrote: >>> >>> I would do a minimal petsc build - without any packages from >>> /usr/local - and see if the problem presists.. >>> >>> Satish >>> >>> On Fri, 7 Apr 2017, Barry Smith wrote: >>> >>>> >>>>> On Apr 7, 2017, at 3:34 PM, Manav Bhatia wrote: >>>>> >>>>> Yes, I printed the data in both cases and they look the same. >>>>> >>>>> I also used ?set step-mode on? to show the system lapack info, and they both are using the same lapack routine. >>>>> >>>>> This is still baffling me. >>>> >>>> alignment of the input arrays, both the same? >>>> >>>> I don't know why this is happening; what if you use your standalone code but link against all the libraries that are linked against for the PETSc case. >>>> >>>> >>>> >>>> >>>>> >>>>> -Manav >>>>> >>>>> >>>>>> On Apr 7, 2017, at 3:22 PM, Barry Smith wrote: >>>>>> >>>>>> >>>>>>> On Apr 7, 2017, at 2:57 PM, Manav Bhatia wrote: >>>>>>> >>>>>>> Hi Barry, >>>>>>> >>>>>>> Thanks for the inputs. >>>>>>> >>>>>>> I did try that, but the debugger (gdb) stepped right over the dgeev_ call, without getting inside the function. >>>>>> >>>>>> Did it at least stop at the function so you do an up and print all the arguments passed in? >>>>>> >>>>>>> >>>>>>> I am wondering if this has anything to do with the fact that the system lapack library might not have any debugging info in it. >>>>>> >>>>>> Yeah I forgot it might not have them. >>>>>> >>>>>> Barry >>>>>> >>>>>>> >>>>>>> Thoughts? 
>>>>>>> >>>>>>> Regards, >>>>>>> Manav >>>>>>> >>>>>>>> On Apr 7, 2017, at 2:40 PM, Barry Smith wrote: >>>>>>>> >>>>>>>> >>>>>>>>> On Apr 7, 2017, at 1:46 PM, Manav Bhatia wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I have compiled petsc on my Ubuntu machine (also Mac OS 10.12 separately) to link to the system lapack and blas libraries (shown below). >>>>>>>>> >>>>>>>>> I have created an interface class to dgeev in lapack to calculate the eigenvalues of a matrix. >>>>>>>>> >>>>>>>>> My application code links to multiple libraries: libMesh, petsc, slepc, hdf5, etc. >>>>>>>>> >>>>>>>>> If I test my interface inside this application code, I get junk results. >>>>>>>> >>>>>>>> This is easy to debug because you have a version that works. >>>>>>>> >>>>>>>> Run both versions in separate windows each in a debugger and put a break point in the dgeev_ function. When it gets there check that it is the same dgeev_ function in both cases and check that the inputs are the same, then step through both to see when things start to change between the two. >>>>>>>> >>>>>>>>> >>>>>>>>> However, on the same machine, if I use the interface in a separate main() function without linking to any of the libraries except lapack and blas, then I get expected results. >>>>>>>>> >>>>>>>>> Also, this problem does not show up on Mac. >>>>>>>>> >>>>>>>>> I am not sure what could be causing this and don't quite know where to start. Could Petsc have anything to do with this? >>>>>>>>> >>>>>>>>> Any insight would be greatly appreciated.
>>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Manav >>>>>>>>> >>>>>>>>> manav at manav1:~/test$ ldd /opt/local/lib/libpetsc.so >>>>>>>>> linux-vdso.so.1 => (0x00007fff3e7a8000) >>>>>>>>> libsuperlu_dist.so.5 => /opt/local/lib/libsuperlu_dist.so.5 (0x00007f721fbd1000) >>>>>>>>> libparmetis.so => /opt/local/lib/libparmetis.so (0x00007f721f990000) >>>>>>>>> libmetis.so => /opt/local/lib/libmetis.so (0x00007f721f718000) >>>>>>>>> libsuperlu.so.5 => /opt/local/lib/libsuperlu.so.5 (0x00007f721f4a7000) >>>>>>>>> libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f721f124000) >>>>>>>>> liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007f721e92c000) >>>>>>>>> libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f721e6bd000) >>>>>>>>> libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f721e382000) >>>>>>>>> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f721e079000) >>>>>>>>> libmpi_mpifh.so.12 => /usr/lib/libmpi_mpifh.so.12 (0x00007f721de20000) >>>>>>>>> libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f721daf4000) >>>>>>>>> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f721d8f0000) >>>>>>>>> libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f721d61a000) >>>>>>>>> libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f721d403000) >>>>>>>>> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f721d1e6000) >>>>>>>>> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f721ce1d000) >>>>>>>>> /lib64/ld-linux-x86-64.so.2 (0x000055d739f1b000) >>>>>>>>> libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f721cbcf000) >>>>>>>>> libopen-pal.so.13 => /usr/lib/libopen-pal.so.13 (0x00007f721c932000) >>>>>>>>> libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f721c6f2000) >>>>>>>>> libibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x00007f721c4e3000) >>>>>>>>> libopen-rte.so.12 => /usr/lib/libopen-rte.so.12 (0x00007f721c269000) >>>>>>>>> libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 
(0x00007f721c064000) >>>>>>>>> libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f721be5e000) >>>>>>>>> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f721bc56000) >>>>>>>>> libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f721ba52000) >>>>>>>>> libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f721b818000) >>>>>>>>> libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f721b60c000) >>>>>>>>> libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f721b402000) >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >> > From mfadams at lbl.gov Sun Apr 9 07:04:37 2017 From: mfadams at lbl.gov (Mark Adams) Date: Sun, 9 Apr 2017 08:04:37 -0400 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> Message-ID: You seem to have two levels here and 3M eqs on the fine grid and 37 on the coarse grid. I don't understand that. You are also calling the AMG setup a lot, but not spending much time in it. Try running with -info and grep on "GAMG". On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande wrote: > Thanks, Barry. > > It works. > > GAMG is three times better than ASM in terms of the number of linear > iterations, but it is five times slower than ASM. Any suggestions to improve > the performance of GAMG? Log files are attached. > > Fande, > > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith wrote: >> >> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: >> > >> > Thanks, Mark and Barry, >> > >> > It works pretty wells in terms of the number of linear iterations (using >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am >> > using the two-level method via "-pc_mg_levels 2". 
The reason why the compute >> > time is larger than other preconditioning options is that a matrix free >> > method is used in the fine level and in my particular problem the function >> > evaluation is expensive. >> > >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, >> > but I do not think I want to make the preconditioning part matrix-free. Do >> > you guys know how to turn off the matrix-free method for GAMG? >> >> -pc_use_amat false >> >> > >> > Here is the detailed solver: >> > >> > SNES Object: 384 MPI processes >> > type: newtonls >> > maximum iterations=200, maximum function evaluations=10000 >> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 >> > total number of linear solver iterations=20 >> > total number of function evaluations=166 >> > norm schedule ALWAYS >> > SNESLineSearch Object: 384 MPI processes >> > type: bt >> > interpolation: cubic >> > alpha=1.000000e-04 >> > maxstep=1.000000e+08, minlambda=1.000000e-12 >> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, >> > lambda=1.000000e-08 >> > maximum iterations=40 >> > KSP Object: 384 MPI processes >> > type: gmres >> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt >> > Orthogonalization with no iterative refinement >> > GMRES: happy breakdown tolerance 1e-30 >> > maximum iterations=100, initial guess is zero >> > tolerances: relative=0.001, absolute=1e-50, divergence=10000. >> > right preconditioning >> > using UNPRECONDITIONED norm type for convergence test >> > PC Object: 384 MPI processes >> > type: gamg >> > MG: type is MULTIPLICATIVE, levels=2 cycles=v >> > Cycles per PCApply=1 >> > Using Galerkin computed coarse grid matrices >> > GAMG specific options >> > Threshold for dropping small values from graph 0. 
>> > AGG specific options >> > Symmetric graph true >> > Coarse grid solver -- level ------------------------------- >> > KSP Object: (mg_coarse_) 384 MPI processes >> > type: preonly >> > maximum iterations=10000, initial guess is zero >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> > left preconditioning >> > using NONE norm type for convergence test >> > PC Object: (mg_coarse_) 384 MPI processes >> > type: bjacobi >> > block Jacobi: number of blocks = 384 >> > Local solve is same for all blocks, in the following KSP and >> > PC objects: >> > KSP Object: (mg_coarse_sub_) 1 MPI processes >> > type: preonly >> > maximum iterations=1, initial guess is zero >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> > left preconditioning >> > using NONE norm type for convergence test >> > PC Object: (mg_coarse_sub_) 1 MPI processes >> > type: lu >> > LU: out-of-place factorization >> > tolerance for zero pivot 2.22045e-14 >> > using diagonal shift on blocks to prevent zero pivot >> > [INBLOCKS] >> > matrix ordering: nd >> > factor fill ratio given 5., needed 1.31367 >> > Factored matrix follows: >> > Mat Object: 1 MPI processes >> > type: seqaij >> > rows=37, cols=37 >> > package used to perform factorization: petsc >> > total: nonzeros=913, allocated nonzeros=913 >> > total number of mallocs used during MatSetValues calls >> > =0 >> > not using I-node routines >> > linear system matrix = precond matrix: >> > Mat Object: 1 MPI processes >> > type: seqaij >> > rows=37, cols=37 >> > total: nonzeros=695, allocated nonzeros=695 >> > total number of mallocs used during MatSetValues calls =0 >> > not using I-node routines >> > linear system matrix = precond matrix: >> > Mat Object: 384 MPI processes >> > type: mpiaij >> > rows=18145, cols=18145 >> > total: nonzeros=1709115, allocated nonzeros=1709115 >> > total number of mallocs used during MatSetValues calls =0 >> > not using I-node (on process 0) routines >> > Down solver (pre-smoother) 
on level 1 >> > ------------------------------- >> > KSP Object: (mg_levels_1_) 384 MPI processes >> > type: chebyshev >> > Chebyshev: eigenvalue estimates: min = 0.133339, max = >> > 1.46673 >> > Chebyshev: eigenvalues estimated using gmres with translations >> > [0. 0.1; 0. 1.1] >> > KSP Object: (mg_levels_1_esteig_) 384 MPI >> > processes >> > type: gmres >> > GMRES: restart=30, using Classical (unmodified) >> > Gram-Schmidt Orthogonalization with no iterative refinement >> > GMRES: happy breakdown tolerance 1e-30 >> > maximum iterations=10, initial guess is zero >> > tolerances: relative=1e-12, absolute=1e-50, >> > divergence=10000. >> > left preconditioning >> > using PRECONDITIONED norm type for convergence test >> > maximum iterations=2 >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> > left preconditioning >> > using nonzero initial guess >> > using NONE norm type for convergence test >> > PC Object: (mg_levels_1_) 384 MPI processes >> > type: sor >> > SOR: type = local_symmetric, iterations = 1, local iterations >> > = 1, omega = 1. 
>> > linear system matrix followed by preconditioner matrix: >> > Mat Object: 384 MPI processes >> > type: mffd >> > rows=3020875, cols=3020875 >> > Matrix-free approximation: >> > err=1.49012e-08 (relative error in function evaluation) >> > Using wp compute h routine >> > Does not compute normU >> > Mat Object: () 384 MPI processes >> > type: mpiaij >> > rows=3020875, cols=3020875 >> > total: nonzeros=215671710, allocated nonzeros=241731750 >> > total number of mallocs used during MatSetValues calls =0 >> > not using I-node (on process 0) routines >> > Up solver (post-smoother) same as down solver (pre-smoother) >> > linear system matrix followed by preconditioner matrix: >> > Mat Object: 384 MPI processes >> > type: mffd >> > rows=3020875, cols=3020875 >> > Matrix-free approximation: >> > err=1.49012e-08 (relative error in function evaluation) >> > Using wp compute h routine >> > Does not compute normU >> > Mat Object: () 384 MPI processes >> > type: mpiaij >> > rows=3020875, cols=3020875 >> > total: nonzeros=215671710, allocated nonzeros=241731750 >> > total number of mallocs used during MatSetValues calls =0 >> > not using I-node (on process 0) routines >> > >> > >> > Fande, >> > >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote: >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith wrote: >> > > >> > >> Does this mean that GAMG works for the symmetrical matrix only? >> > > >> > > No, it means that for non symmetric nonzero structure you need the >> > > extra flag. So use the extra flag. The reason we don't always use the flag >> > > is because it adds extra cost and isn't needed if the matrix already has a >> > > symmetric nonzero structure. >> > >> > BTW, if you have symmetric non-zero structure you can just set >> > -pc_gamg_threshold -1.0', note the "or" in the message. >> > >> > If you want to mess with the threshold then you need to use the >> > symmetrized flag. 
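Pulling the flags discussed in this thread together, a run along the lines of Fande's setup might use something like the following command line. This is only a sketch assembled from the messages above — the executable name and process count are placeholders, and the individual options are the ones quoted in this exchange: "-snes_mf_operator" for Jacobian-free Newton, "-pc_use_amat false" to keep the preconditioner side assembled, "-pc_gamg_sym_graph true" for the unsymmetric nonzero structure, and "-info" (grepped for "GAMG") for the setup logging Mark asks for:

```
mpiexec -n 384 ./my_app \
    -snes_mf_operator \
    -pc_type gamg -pc_mg_levels 2 \
    -pc_gamg_sym_graph true \
    -pc_use_amat false \
    -info | grep GAMG
```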
>> > >> > From sdaralagodudatta at wpi.edu Sun Apr 9 14:21:08 2017 From: sdaralagodudatta at wpi.edu (Daralagodu Dattatreya Jois, Sathwik Bharadw) Date: Sun, 9 Apr 2017 19:21:08 +0000 Subject: [petsc-users] Patching in generalized eigen value problems Message-ID: Dear petsc users, I am solving generalized eigenvalue problems using petsc and slepc. Our equation will be of the form, A X = λ B X. I am constructing the A and B matrix of type MATMPIAIJ. Let us consider that both of my matrices are of dimension 10*10. When we are solving for a closed geometry, we need to add all the entries of the last (9th) row and column to the first (0th) row and column respectively for both matrices. In a high density mesh, I will have a large number of such row to row and column to column additions. For example, I may have to add the last 200 rows and columns to the first 200 rows and columns respectively. We will then zero the copied row and column except the diagonal element (9th row/column in the former case). I understand that MatGetRow, MatGetColumnVector, MatGetValues or any other MatGet- or VecGet- functions are not collective. Can you suggest any efficient algorithm or function to achieve this way of patching? One way I can think of is to obtain the column vector using MatGetColumnVector and the row vector by MatZeroRows and then scatter these vectors to all processes. Once we have the entire row/column vector entries in each process, we can add the values to the matrix by their global index. Of course, care should be taken to add the value of the diagonal element only once. But this will be quite a slow process. Any ideas are appreciated. Thanks, Sathwik Bharadwaj -------------- next part -------------- An HTML attachment was scrubbed...
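The folding operation described above (add the last rows/columns into the first ones, then zero the copied rows/columns except for the diagonal) can be sketched on a small dense matrix. This is purely illustrative serial code — the function name is made up, and a real implementation would act on the distributed MATMPIAIJ rather than a NumPy array:

```python
import numpy as np

def fold_rows_cols(A, k=1):
    """Add the last k rows/columns of A into the first k rows/columns,
    then zero the copied rows/columns, leaving 1 on their diagonal."""
    A = A.astype(float).copy()
    n = A.shape[0]
    src = np.arange(n - k, n)   # rows/columns being folded away
    dst = np.arange(k)          # rows/columns receiving the entries
    A[dst, :] += A[src, :]      # row -> row addition
    A[:, dst] += A[:, src]      # column -> column addition
    A[src, :] = 0.0             # zero the copied rows...
    A[:, src] = 0.0             # ...and columns,
    A[src, src] = 1.0           # except the diagonal entries
    return A

A = np.arange(16, dtype=float).reshape(4, 4)
B = fold_rows_cols(A, k=1)   # entry (0,0) picks up A[3,0], A[0,3] and A[3,3]
```

In parallel the awkward part is the column addition, since a column's entries are spread over all processes; the row part alone can be done locally on whichever process owns both the source and destination rows.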
URL: From bsmith at mcs.anl.gov Sun Apr 9 14:34:13 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 9 Apr 2017 14:34:13 -0500 Subject: [petsc-users] Patching in generalized eigen value problems In-Reply-To: References: Message-ID: > On Apr 9, 2017, at 2:21 PM, Daralagodu Dattatreya Jois, Sathwik Bharadw wrote: > > Dear petsc users, > > I am solving for generalized eigen value problems using petsc and slepc. > Our equation will be of the form, > > A X=? B X. > > I am constructing the A and B matrix of type MATMPIAIJ. Let us consider that > both of my matrices are of dimension 10*10. When we are solving for a closed > geometry, we require to add all the entries of the last (9th) row and column to > the first (0th) row and column respectively for both matrices. In a high density mesh, > I will have a large number of such row to row and column to column additions. > For example, I may have to add last 200 rows and columns to first 200 rows and columns > respectively. We will then zero the copied row and column expect the diagonal > element (9th row/column in the former case). Where is this "strange" operation coming from? Boundary conditions? Is there any way to assemble matrices initially with these sums instead of doing it after the fact? Why is it always the "last rows" and the "first rows"? What happens when you run in parallel where first and last rows are on different processes? How large will the matrices get? Are the matrices symmetric? > > I understand that MatGetRow, MatGetColumnVector, MatGetValues or any other > MatGet- or VecGet- functions are not collective. Can you suggest any > efficient algorithm or function to achieve this way of patching? > > One way I can think of is to obtain the column vector using MatGetColumnVector and > row vector by MatZeroRows and then scatter these vectors to all processes. Once we have > entire row/column vector entries in each process, we can add the values to the matrix by > their global index. 
Of course, care should be taken to add the value of the diagonal element > only once. But this will be quite a slow process. > Any ideas are appreciated. > > Thanks, > Sathwik Bharadwaj From sdaralagodudatta at wpi.edu Sun Apr 9 14:47:47 2017 From: sdaralagodudatta at wpi.edu (Daralagodu Dattatreya Jois, Sathwik Bharadw) Date: Sun, 9 Apr 2017 19:47:47 +0000 Subject: [petsc-users] Patching in generalized eigen value problems In-Reply-To: References: , Message-ID: Dear Barry, These operations come into the picture when I have to map a plane surface to a closed surface (let's say a cylinder). As you can imagine, nodes (hence nodal values) at two opposite sides of the plane have to add up to give a closed geometry. Matrices can be as large as 30,000*30,000 or more depending on the density of the mesh. Since, in effect, different elements sit in different processes, doing this at the assembly level will be tricky. Sathwik Bharadwaj ________________________________ From: Barry Smith Sent: Sunday, April 9, 2017 3:34:13 PM To: Daralagodu Dattatreya Jois, Sathwik Bharadw Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] Patching in generalized eigen value problems > On Apr 9, 2017, at 2:21 PM, Daralagodu Dattatreya Jois, Sathwik Bharadw wrote: > > Dear petsc users, > > I am solving for generalized eigen value problems using petsc and slepc. > Our equation will be of the form, > > A X = λ B X. > > I am constructing the A and B matrix of type MATMPIAIJ. Let us consider that > both of my matrices are of dimension 10*10. When we are solving for a closed > geometry, we require to add all the entries of the last (9th) row and column to > the first (0th) row and column respectively for both matrices. In a high density mesh, > I will have a large number of such row to row and column to column additions. > For example, I may have to add last 200 rows and columns to first 200 rows and columns > respectively.
We will then zero the copied row and column expect the diagonal > element (9th row/column in the former case). Where is this "strange" operation coming from? Boundary conditions? Is there any way to assemble matrices initially with these sums instead of doing it after the fact? Why is it always the "last rows" and the "first rows"? What happens when you run in parallel where first and last rows are on different processes? How large will the matrices get? Are the matrices symmetric? > > I understand that MatGetRow, MatGetColumnVector, MatGetValues or any other > MatGet- or VecGet- functions are not collective. Can you suggest any > efficient algorithm or function to achieve this way of patching? > > One way I can think of is to obtain the column vector using MatGetColumnVector and > row vector by MatZeroRows and then scatter these vectors to all processes. Once we have > entire row/column vector entries in each process, we can add the values to the matrix by > their global index. Of course, care should be taken to add the values of diagonal element > only once. But this will be a quite slow process. > Any ideas are appreciated. > > Thanks, > Sathwik Bharadwaj -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Apr 9 15:36:10 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 9 Apr 2017 15:36:10 -0500 Subject: [petsc-users] Patching in generalized eigen value problems In-Reply-To: References: Message-ID: On Sun, Apr 9, 2017 at 2:47 PM, Daralagodu Dattatreya Jois, Sathwik Bharadw wrote: > Dear Barry, > > > These operations comes into picture when I have to map a > > plane surface to a closed surface (lets say cylinder). As you > > can imagine nodes (hence nodal values) at 2 opposite sides > > of the plane have to add up to give a closed geometry. > > Instead of assembling both, why not just make one row into constraints, namely 1 for the primal node and -1 for the host node. 
Then you do not need communication, and it's sparser. IMHO, these "extra" nodes should just be eliminated from the system. Matt > Matrices can be as large as 30,000*30,000 or more depending > > on the density of the mesh. Since, in effect different elements sit in > > different processes doing this at the assembly level will be tricky. > > > Sathwik Bharadwaj > ------------------------------ > *From:* Barry Smith > *Sent:* Sunday, April 9, 2017 3:34:13 PM > *To:* Daralagodu Dattatreya Jois, Sathwik Bharadw > *Cc:* petsc-users at mcs.anl.gov > *Subject:* Re: [petsc-users] Patching in generalized eigen value problems > > > > On Apr 9, 2017, at 2:21 PM, Daralagodu Dattatreya Jois, Sathwik Bharadw < > sdaralagodudatta at wpi.edu> wrote: > > > > Dear petsc users, > > > > I am solving for generalized eigen value problems using petsc and slepc. > > Our equation will be of the form, > > > > A X = λ B X. > > > > I am constructing the A and B matrix of type MATMPIAIJ. Let us consider > that > > both of my matrices are of dimension 10*10. When we are solving for a > closed > > geometry, we require to add all the entries of the last (9th) row and > column to > > the first (0th) row and column respectively for both matrices. In a high > density mesh, > > I will have a large number of such row to row and column to column > additions. > > For example, I may have to add last 200 rows and columns to first 200 > rows and columns > > respectively. We will then zero the copied row and column except the > diagonal > > element (9th row/column in the former case). > > Where is this "strange" operation coming from? > > Boundary conditions? > > Is there any way to assemble matrices initially with these sums instead > of doing it after the fact? > > Why is it always the "last rows" and the "first rows"? > > What happens when you run in parallel where first and last rows are on > different processes? > > How large will the matrices get? > > Are the matrices symmetric?
> > > > > > > > I understand that MatGetRow, MatGetColumnVector, MatGetValues or any > other > > MatGet- or VecGet- functions are not collective. Can you suggest any > > efficient algorithm or function to achieve this way of patching? > > > > One way I can think of is to obtain the column vector using > MatGetColumnVector and > > row vector by MatZeroRows and then scatter these vectors to all > processes. Once we have > > entire row/column vector entries in each process, we can add the values > to the matrix by > > their global index. Of course, care should be taken to add the values of > diagonal element > > only once. But this will be a quite slow process. > > Any ideas are appreciated. > > > > Thanks, > > Sathwik Bharadwaj > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Apr 9 15:55:17 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 9 Apr 2017 15:55:17 -0500 Subject: [petsc-users] Patching in generalized eigen value problems In-Reply-To: References: Message-ID: > On Apr 9, 2017, at 2:47 PM, Daralagodu Dattatreya Jois, Sathwik Bharadw wrote: > > Dear Barry, > > These operations comes into picture when I have to map a > plane surface to a closed surface (lets say cylinder). As you > can imagine nodes (hence nodal values) at 2 opposite sides > of the plane have to add up to give a closed geometry. > Matrices can be as large as 30,000*30,000 or more depending > on the density of the mesh. Since, in effect different elements sits in > different processes doing this in assembly level will be tricky. Hmm, for this specific case you can use a 2d DMDA with periodic boundary conditions in one direction (the wrapped direction). 
The DMDA will create an empty matrix with the needed nonzero structure (including the "periodic connecting parts") and you can use MatSetValuesStencil() to provide the matrix entries; since the "stencil" knows about the periodic boundary condition it automatically puts the provided values into the correct place in the matrix. Can also be used in 3d and you can have any combination of some edges periodic and some not. If you can't use a simple structured grid with DMDA then DMPLEX (which allows unstructured grids) can also automatically handle the periodic boundary condition in assembly. But I would definitely start with DMDA. Barry > > Sathwik Bharadwaj > From: Barry Smith > Sent: Sunday, April 9, 2017 3:34:13 PM > To: Daralagodu Dattatreya Jois, Sathwik Bharadw > Cc: petsc-users at mcs.anl.gov > Subject: Re: [petsc-users] Patching in generalized eigen value problems > > > > On Apr 9, 2017, at 2:21 PM, Daralagodu Dattatreya Jois, Sathwik Bharadw wrote: > > > > Dear petsc users, > > > > I am solving for generalized eigen value problems using petsc and slepc. > > Our equation will be of the form, > > > > A X=? B X. > > > > I am constructing the A and B matrix of type MATMPIAIJ. Let us consider that > > both of my matrices are of dimension 10*10. When we are solving for a closed > > geometry, we require to add all the entries of the last (9th) row and column to > > the first (0th) row and column respectively for both matrices. In a high density mesh, > > I will have a large number of such row to row and column to column additions. > > For example, I may have to add last 200 rows and columns to first 200 rows and columns > > respectively. We will then zero the copied row and column expect the diagonal > > element (9th row/column in the former case). > > Where is this "strange" operation coming from? > > Boundary conditions? > > Is there any way to assemble matrices initially with these sums instead of doing it after the fact? 
> > Why is it always the "last rows" and the "first rows"? > > What happens when you run in parallel where first and last rows are on different processes? > > How large will the matrices get? > > Are the matrices symmetric? > > > > > > > > I understand that MatGetRow, MatGetColumnVector, MatGetValues or any other > > MatGet- or VecGet- functions are not collective. Can you suggest any > > efficient algorithm or function to achieve this way of patching? > > > > One way I can think of is to obtain the column vector using MatGetColumnVector and > > row vector by MatZeroRows and then scatter these vectors to all processes. Once we have > > entire row/column vector entries in each process, we can add the values to the matrix by > > their global index. Of course, care should be taken to add the values of diagonal element > > only once. But this will be a quite slow process. > > Any ideas are appreciated. > > > > Thanks, > > Sathwik Bharadwaj From fande.kong at inl.gov Mon Apr 10 11:17:23 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Mon, 10 Apr 2017 10:17:23 -0600 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> Message-ID: On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams wrote: > You seem to have two levels here and 3M eqs on the fine grid and 37 on > the coarse grid. 37 is on the sub domain. rows=18145, cols=18145 on the entire coarse grid. > I don't understand that. > > You are also calling the AMG setup a lot, but not spending much time > in it. Try running with -info and grep on "GAMG". > > > On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande wrote: > > Thanks, Barry. > > > > It works. > > > > GAMG is three times better than ASM in terms of the number of linear > > iterations, but it is five times slower than ASM. Any suggestions to > improve > > the performance of GAMG? Log files are attached. 
> > > > Fande, > > > > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith wrote: > >> > >> > >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: > >> > > >> > Thanks, Mark and Barry, > >> > > >> > It works pretty wells in terms of the number of linear iterations > (using > >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I > am > >> > using the two-level method via "-pc_mg_levels 2". The reason why the > compute > >> > time is larger than other preconditioning options is that a matrix > free > >> > method is used in the fine level and in my particular problem the > function > >> > evaluation is expensive. > >> > > >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, > >> > but I do not think I want to make the preconditioning part > matrix-free. Do > >> > you guys know how to turn off the matrix-free method for GAMG? > >> > >> -pc_use_amat false > >> > >> > > >> > Here is the detailed solver: > >> > > >> > SNES Object: 384 MPI processes > >> > type: newtonls > >> > maximum iterations=200, maximum function evaluations=10000 > >> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 > >> > total number of linear solver iterations=20 > >> > total number of function evaluations=166 > >> > norm schedule ALWAYS > >> > SNESLineSearch Object: 384 MPI processes > >> > type: bt > >> > interpolation: cubic > >> > alpha=1.000000e-04 > >> > maxstep=1.000000e+08, minlambda=1.000000e-12 > >> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > >> > lambda=1.000000e-08 > >> > maximum iterations=40 > >> > KSP Object: 384 MPI processes > >> > type: gmres > >> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt > >> > Orthogonalization with no iterative refinement > >> > GMRES: happy breakdown tolerance 1e-30 > >> > maximum iterations=100, initial guess is zero > >> > tolerances: relative=0.001, absolute=1e-50, divergence=10000. 
> >> > right preconditioning > >> > using UNPRECONDITIONED norm type for convergence test > >> > PC Object: 384 MPI processes > >> > type: gamg > >> > MG: type is MULTIPLICATIVE, levels=2 cycles=v > >> > Cycles per PCApply=1 > >> > Using Galerkin computed coarse grid matrices > >> > GAMG specific options > >> > Threshold for dropping small values from graph 0. > >> > AGG specific options > >> > Symmetric graph true > >> > Coarse grid solver -- level ------------------------------- > >> > KSP Object: (mg_coarse_) 384 MPI processes > >> > type: preonly > >> > maximum iterations=10000, initial guess is zero > >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> > left preconditioning > >> > using NONE norm type for convergence test > >> > PC Object: (mg_coarse_) 384 MPI processes > >> > type: bjacobi > >> > block Jacobi: number of blocks = 384 > >> > Local solve is same for all blocks, in the following KSP and > >> > PC objects: > >> > KSP Object: (mg_coarse_sub_) 1 MPI processes > >> > type: preonly > >> > maximum iterations=1, initial guess is zero > >> > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. 
> >> > left preconditioning > >> > using NONE norm type for convergence test > >> > PC Object: (mg_coarse_sub_) 1 MPI processes > >> > type: lu > >> > LU: out-of-place factorization > >> > tolerance for zero pivot 2.22045e-14 > >> > using diagonal shift on blocks to prevent zero pivot > >> > [INBLOCKS] > >> > matrix ordering: nd > >> > factor fill ratio given 5., needed 1.31367 > >> > Factored matrix follows: > >> > Mat Object: 1 MPI processes > >> > type: seqaij > >> > rows=37, cols=37 > >> > package used to perform factorization: petsc > >> > total: nonzeros=913, allocated nonzeros=913 > >> > total number of mallocs used during MatSetValues > calls > >> > =0 > >> > not using I-node routines > >> > linear system matrix = precond matrix: > >> > Mat Object: 1 MPI processes > >> > type: seqaij > >> > rows=37, cols=37 > >> > total: nonzeros=695, allocated nonzeros=695 > >> > total number of mallocs used during MatSetValues calls =0 > >> > not using I-node routines > >> > linear system matrix = precond matrix: > >> > Mat Object: 384 MPI processes > >> > type: mpiaij > >> > rows=18145, cols=18145 > >> > total: nonzeros=1709115, allocated nonzeros=1709115 > >> > total number of mallocs used during MatSetValues calls =0 > >> > not using I-node (on process 0) routines > >> > Down solver (pre-smoother) on level 1 > >> > ------------------------------- > >> > KSP Object: (mg_levels_1_) 384 MPI processes > >> > type: chebyshev > >> > Chebyshev: eigenvalue estimates: min = 0.133339, max = > >> > 1.46673 > >> > Chebyshev: eigenvalues estimated using gmres with > translations > >> > [0. 0.1; 0. 1.1] > >> > KSP Object: (mg_levels_1_esteig_) 384 MPI > >> > processes > >> > type: gmres > >> > GMRES: restart=30, using Classical (unmodified) > >> > Gram-Schmidt Orthogonalization with no iterative refinement > >> > GMRES: happy breakdown tolerance 1e-30 > >> > maximum iterations=10, initial guess is zero > >> > tolerances: relative=1e-12, absolute=1e-50, > >> > divergence=10000. 
> >> > left preconditioning > >> > using PRECONDITIONED norm type for convergence test > >> > maximum iterations=2 > >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> > left preconditioning > >> > using nonzero initial guess > >> > using NONE norm type for convergence test > >> > PC Object: (mg_levels_1_) 384 MPI processes > >> > type: sor > >> > SOR: type = local_symmetric, iterations = 1, local > iterations > >> > = 1, omega = 1. > >> > linear system matrix followed by preconditioner matrix: > >> > Mat Object: 384 MPI processes > >> > type: mffd > >> > rows=3020875, cols=3020875 > >> > Matrix-free approximation: > >> > err=1.49012e-08 (relative error in function evaluation) > >> > Using wp compute h routine > >> > Does not compute normU > >> > Mat Object: () 384 MPI processes > >> > type: mpiaij > >> > rows=3020875, cols=3020875 > >> > total: nonzeros=215671710, allocated nonzeros=241731750 > >> > total number of mallocs used during MatSetValues calls =0 > >> > not using I-node (on process 0) routines > >> > Up solver (post-smoother) same as down solver (pre-smoother) > >> > linear system matrix followed by preconditioner matrix: > >> > Mat Object: 384 MPI processes > >> > type: mffd > >> > rows=3020875, cols=3020875 > >> > Matrix-free approximation: > >> > err=1.49012e-08 (relative error in function evaluation) > >> > Using wp compute h routine > >> > Does not compute normU > >> > Mat Object: () 384 MPI processes > >> > type: mpiaij > >> > rows=3020875, cols=3020875 > >> > total: nonzeros=215671710, allocated nonzeros=241731750 > >> > total number of mallocs used during MatSetValues calls =0 > >> > not using I-node (on process 0) routines > >> > > >> > > >> > Fande, > >> > > >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote: > >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith > wrote: > >> > > > >> > >> Does this mean that GAMG works for the symmetrical matrix only? 
> >> > > > >> > > No, it means that for non symmetric nonzero structure you need the > >> > > extra flag. So use the extra flag. The reason we don't always use > the flag > >> > > is because it adds extra cost and isn't needed if the matrix > already has a > >> > > symmetric nonzero structure. > >> > > >> > BTW, if you have symmetric non-zero structure you can just set > >> > -pc_gamg_threshold -1.0', note the "or" in the message. > >> > > >> > If you want to mess with the threshold then you need to use the > >> > symmetrized flag. > >> > > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Rodrigo.Felicio at iongeo.com Mon Apr 10 11:49:43 2017 From: Rodrigo.Felicio at iongeo.com (Rodrigo Felicio) Date: Mon, 10 Apr 2017 16:49:43 +0000 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? Message-ID: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> Hello all, Sorry for the newbie question, but is there a way of making petsc4py work with an MPI group or subcommunicator? I saw a solution posted back in 2010 (http://lists.mcs.anl.gov/pipermail/petsc-users/2010-May/006382.html), but it does not work for me. Indeed, if I try to use petsc4py.init(comm=newcom), then my sample code prints a msg "Attempting to use an MPI routine before initializing MPI". Below I attach both the output and the source of the python code. kind regards Rodrigo time mpirun -n 5 python split_comm_ex2.py Global: rank 0 of 5. New comm : rank 0 of 3 Global: rank 1 of 5. New comm : rank 0 of 2 Global: rank 2 of 5. New comm : rank 1 of 3 Global: rank 3 of 5. New comm : rank 1 of 2 Global: rank 4 of 5. 
New comm : rank 2 of 3 Attempting to use an MPI routine before initializing MPI Attempting to use an MPI routine before initializing MPI real 0m0.655s user 0m1.122s sys 0m1.047s And the python code: from mpi4py import MPI comm = MPI.COMM_WORLD world_rank = comm.rank world_size = comm.size color = world_rank % 2 newcomm = comm.Split(color) newcomm_rank = newcomm.rank newcomm_size = newcomm.size for i in range(world_size): comm.Barrier() if (world_rank == i): print ("Global: rank %d of %d. New comm : rank % d of %d" % (world_rank, world_size, newcomm_rank, newcomm_size)) if newcomm.rank == 0: import petsc4py petsc4py.init(comm=newcomm) from petsc4py import PETSc pcomm = PETSc.COMM_WORLD print('pcomm size is {}/{}'.format(pcomm.rank, pcomm.size)) newcomm.Free() ________________________________ This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kaushikggg at gmail.com Mon Apr 10 13:17:58 2017 From: kaushikggg at gmail.com (Kaushik Kulkarni) Date: Mon, 10 Apr 2017 23:47:58 +0530 Subject: [petsc-users] Solving NON-Diagonally dominant sparse system Message-ID: Hello, I am trying to solve a 2500x2500 sparse matrix. To get an idea about the matrix structure I have added a file matrix.log which contains the output of MatView() and also the output of Matview_draw in the image file. 
From the matrix structure it can be seen that Jacobi iteration won't work, and since some of the diagonal entries are very low (of the order of 1E-16), LU factorization would also fail. Can someone please suggest what I could try next, in order to make the solution converge? Thanks, Kaushik -- Kaushik Kulkarni Fourth Year Undergraduate Department of Mechanical Engineering Indian Institute of Technology Bombay Mumbai, India https://kaushikcfd.github.io/About/ +91-9967687150 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matrix.log Type: text/x-log Size: 2091693 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matrix_pattern.png Type: image/png Size: 25197 bytes Desc: not available URL: From friedenhe at gmail.com Mon Apr 10 13:22:38 2017 From: friedenhe at gmail.com (Ping He) Date: Mon, 10 Apr 2017 14:22:38 -0400 Subject: [petsc-users] SNES diverges when KSP reach max iteration? Message-ID: <686a48a0-f1cb-d248-958f-ecb3574d2c41@gmail.com> Dear all, I am using SNES for an incompressible flow problem. I choose the preconditioned matrix-free Newton approach with the Eisenstat-Walker option, and the KSP solver is GMRES. Now I have a question on how to set up the convergence criteria for KSP. I notice that if the ksp iteration reaches the maxIter_ksp (I set it to 300), SNES will treat it as divergence and quit. However, this is usually not the case and the SNES norm can definitely drop more. So I am wondering if there is a way to tell SNES that if KSP reaches maxIter_ksp, continue doing line search and don't quit. I know that I can play with the maxIter_ksp and EW parameters to mitigate this issue, but I don't want KSP to over-solve for each SNES iteration. Anyway, I want a setup that SNES will quit based on rTol_snes and maxIter_snes, instead of maxIter_ksp. Any suggestions?
Thanks very much in advance! Best regards, Jack From bsmith at mcs.anl.gov Mon Apr 10 14:07:26 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 10 Apr 2017 14:07:26 -0500 Subject: [petsc-users] Solving NON-Diagonally dominant sparse system In-Reply-To: References: Message-ID: I would suggest using ./configure --download-superlu and then when running the program -pc_type lu -pc_factor_mat_solver_package superlu Note that this is SuperLU, it is not SuperLU_DIST. Superlu uses partial pivoting for numerical stability so should be able to handle the small or zero diagonal entries. Barry > On Apr 10, 2017, at 1:17 PM, Kaushik Kulkarni wrote: > > Hello, > I am trying to solve a 2500x2500 sparse matrix. To get an idea about the matrix structure I have added a file matrix.log which contains the output of MatView() and also the output of Matview_draw in the image file. > > From the matrix structure it can be seen that Jacobi iteration won't work and some of the diagonal entries being very low(of the order of 1E-16) LU factorization would also fail. > > C?an someone please suggest what all could I try next, in order to make the solution converge? > > Thanks, > Kaushik > ? > -- > Kaushik Kulkarni > Fourth Year Undergraduate > Department of Mechanical Engineering > Indian Institute of Technology Bombay > Mumbai, India > https://kaushikcfd.github.io/About/ > +91-9967687150 > From bsmith at mcs.anl.gov Mon Apr 10 14:14:22 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 10 Apr 2017 14:14:22 -0500 Subject: [petsc-users] SNES diverges when KSP reach max iteration? In-Reply-To: <686a48a0-f1cb-d248-958f-ecb3574d2c41@gmail.com> References: <686a48a0-f1cb-d248-958f-ecb3574d2c41@gmail.com> Message-ID: SNESSetMaxLinearSolveFailures() or -snes_max_linear_solve_fail 1000 (use some large number here). > On Apr 10, 2017, at 1:22 PM, Ping He wrote: > > Dear all, > > I am using SNES for an incompressible flow problem. 
I choose the preconditioned matrix-free Newton approach with the Eisenstat-Walker option, and the KSP solver is GMRES. > > Now I have a question on how do I setup the convergence criteria for KSP. I notice that if the ksp iteration reaches the maxIter_ksp (I set it to 300), SNES will treat it as divergence and quit. However, this is usually not the case and the SNES norm can definitely drop more. So I am wondering if there is a way to tell SNES that if KSP reaches maxIter_ksp, continue doing line search and don't quit. I know that I can play with the maxIter_ksp and EW parameters to mitigate this issue, but I don't want KSP to over-solve for each SNES iteration. > > Anyway, I want a setup that SNES will quit based on rTol_snes and maxIter_snes, instead of maxIter_ksp. Any suggestions? Thanks very much in advance! > > Best regards, > > Jack > From xsli at lbl.gov Mon Apr 10 14:34:41 2017 From: xsli at lbl.gov (Xiaoye S. Li) Date: Mon, 10 Apr 2017 12:34:41 -0700 Subject: [petsc-users] Solving NON-Diagonally dominant sparse system In-Reply-To: References: Message-ID: If you need to use SuperLU_DIST, the pivoting is done statically, using maximum weighted matching, so the small diagonals are usually taken care as well. It is not as good as partial pivoting, but works most of the time. Sherry On Mon, Apr 10, 2017 at 12:07 PM, Barry Smith wrote: > > I would suggest using ./configure --download-superlu and then when > running the program -pc_type lu -pc_factor_mat_solver_package superlu > > Note that this is SuperLU, it is not SuperLU_DIST. Superlu uses > partial pivoting for numerical stability so should be able to handle the > small or zero diagonal entries. > > Barry > > > On Apr 10, 2017, at 1:17 PM, Kaushik Kulkarni > wrote: > > > > Hello, > > I am trying to solve a 2500x2500 sparse matrix. To get an idea about the > matrix structure I have added a file matrix.log which contains the output > of MatView() and also the output of Matview_draw in the image file. 
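[Editor's note: Barry's point — that partial pivoting lets an LU factorization survive tiny diagonal entries — can be illustrated outside PETSc with a small pure-Python sketch. The 2x2 matrix below is hypothetical toy data, not the poster's system: without row exchanges, a ~1e-20 pivot destroys one solution component through round-off, while partial pivoting recovers the correct answer.]

```python
# Why partial pivoting matters for tiny diagonal entries.
# A = [[1e-20, 1], [1, 1]], b = [1, 2]; the true solution is ~[1, 1].

def solve2x2(A, b, pivot):
    # Unpack into scalars so the caller's lists stay untouched.
    (a11, a12), (a21, a22) = A
    b1, b2 = b
    if pivot and abs(a21) > abs(a11):
        # Partial pivoting: swap rows so the largest column entry is the pivot.
        a11, a12, b1, a21, a22, b2 = a21, a22, b2, a11, a12, b1
    m = a21 / a11                 # elimination multiplier
    a22 -= m * a12                # eliminate below the pivot
    b2 -= m * b1
    x2 = b2 / a22                 # back-substitution
    x1 = (b1 - a12 * x2) / a11
    return x1, x2

A = [[1e-20, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]

print(solve2x2(A, b, pivot=False))  # (0.0, 1.0) -- x1 destroyed by round-off
print(solve2x2(A, b, pivot=True))   # (1.0, 1.0) -- correct
```

Without pivoting the multiplier is 1e20, so `1 - 1e20` rounds to `-1e20` and the contribution of the first row is absorbed entirely, giving x1 = 0 instead of 1. This is the behavior Barry's SuperLU suggestion avoids.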
> > > > From the matrix structure it can be seen that Jacobi iteration won't > work and some of the diagonal entries being very low(of the order of 1E-16) > LU factorization would also fail. > > > > C?an someone please suggest what all could I try next, in order to make > the solution converge? > > > > Thanks, > > Kaushik > > ? > > -- > > Kaushik Kulkarni > > Fourth Year Undergraduate > > Department of Mechanical Engineering > > Indian Institute of Technology Bombay > > Mumbai, India > > https://kaushikcfd.github.io/About/ > > +91-9967687150 > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From friedenhe at gmail.com Mon Apr 10 14:44:28 2017 From: friedenhe at gmail.com (Ping He) Date: Mon, 10 Apr 2017 15:44:28 -0400 Subject: [petsc-users] SNES diverges when KSP reach max iteration? In-Reply-To: References: <686a48a0-f1cb-d248-958f-ecb3574d2c41@gmail.com> Message-ID: <6d9434d1-896f-ded8-8bfc-ceb8bf34ba9a@gmail.com> It works. Thanks! On 04/10/2017 03:14 PM, Barry Smith wrote: > SNESSetMaxLinearSolveFailures() or -snes_max_linear_solve_fail 1000 (use some large number here). > > > > >> On Apr 10, 2017, at 1:22 PM, Ping He wrote: >> >> Dear all, >> >> I am using SNES for an incompressible flow problem. I choose the preconditioned matrix-free Newton approach with the Eisenstat-Walker option, and the KSP solver is GMRES. >> >> Now I have a question on how do I setup the convergence criteria for KSP. I notice that if the ksp iteration reaches the maxIter_ksp (I set it to 300), SNES will treat it as divergence and quit. However, this is usually not the case and the SNES norm can definitely drop more. So I am wondering if there is a way to tell SNES that if KSP reaches maxIter_ksp, continue doing line search and don't quit. I know that I can play with the maxIter_ksp and EW parameters to mitigate this issue, but I don't want KSP to over-solve for each SNES iteration. 
>> >> Anyway, I want a setup that SNES will quit based on rTol_snes and maxIter_snes, instead of maxIter_ksp. Any suggestions? Thanks very much in advance! >> >> Best regards, >> >> Jack >> From jed at jedbrown.org Mon Apr 10 20:45:24 2017 From: jed at jedbrown.org (Jed Brown) Date: Mon, 10 Apr 2017 19:45:24 -0600 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? In-Reply-To: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> Message-ID: <87bms367kb.fsf@jedbrown.org> Rodrigo Felicio writes: > Hello all, > > Sorry for the newbie question, but is there a way of making petsc4py work with an MPI group or subcommunicator? I saw a solution posted back in 2010 (http://lists.mcs.anl.gov/pipermail/petsc-users/2010-May/006382.html), but it does not work for me. Indeed, if I try to use petsc4py.init(comm=newcom), then my sample code prints a msg "Attempting to use an MPI routine before initializing MPI". Below I attach both the output and the source of the python code. > > kind regards > Rodrigo > > > time mpirun -n 5 python split_comm_ex2.py > Global: rank 0 of 5. New comm : rank 0 of 3 > Global: rank 1 of 5. New comm : rank 0 of 2 > Global: rank 2 of 5. New comm : rank 1 of 3 > Global: rank 3 of 5. New comm : rank 1 of 2 > Global: rank 4 of 5. New comm : rank 2 of 3 > Attempting to use an MPI routine before initializing MPI > Attempting to use an MPI routine before initializing MPI > > real 0m0.655s > user 0m1.122s > sys 0m1.047s > > And the python code: > > from mpi4py import MPI > > comm = MPI.COMM_WORLD > world_rank = comm.rank > world_size = comm.size > > color = world_rank % 2 > > newcomm = comm.Split(color) > newcomm_rank = newcomm.rank > newcomm_size = newcomm.size > > for i in range(world_size): > comm.Barrier() > if (world_rank == i): > print ("Global: rank %d of %d. 
New comm : rank % d of %d" % > (world_rank, world_size, newcomm_rank, newcomm_size)) > > if newcomm.rank == 0: > import petsc4py > petsc4py.init(comm=newcomm) I don't know if it fixes your problem, but you definitely need to call this collectively on newcomm. For example, using "if color == 0" above would have the right collective semantics. > from petsc4py import PETSc > > pcomm = PETSc.COMM_WORLD > print('pcomm size is {}/{}'.format(pcomm.rank, pcomm.size)) > > newcomm.Free() > > ________________________________ > > > This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From kaushikggg at gmail.com Mon Apr 10 23:38:03 2017 From: kaushikggg at gmail.com (Kaushik Kulkarni) Date: Tue, 11 Apr 2017 10:08:03 +0530 Subject: [petsc-users] Solving NON-Diagonally dominant sparse system In-Reply-To: References: Message-ID: Thank you for the inputs. I tried Barry' s suggestion to use SuperLU, but the solution does not converge and on doing -ksp_monitor -ksp_converged_reason. I get the following error:- 240 KSP Residual norm 1.722571678777e+07 Linear solve did not converge due to DIVERGED_DTOL iterations 240 For some reason it is diverging, although I am sure that for the given system a unique solution exists. Thanks, Kaushik On Tue, Apr 11, 2017 at 1:04 AM, Xiaoye S. 
Li wrote: > If you need to use SuperLU_DIST, the pivoting is done statically, using > maximum weighted matching, so the small diagonals are usually taken care as > well. It is not as good as partial pivoting, but works most of the time. > > Sherry > > On Mon, Apr 10, 2017 at 12:07 PM, Barry Smith wrote: > >> >> I would suggest using ./configure --download-superlu and then when >> running the program -pc_type lu -pc_factor_mat_solver_package superlu >> >> Note that this is SuperLU, it is not SuperLU_DIST. Superlu uses >> partial pivoting for numerical stability so should be able to handle the >> small or zero diagonal entries. >> >> Barry >> >> > On Apr 10, 2017, at 1:17 PM, Kaushik Kulkarni >> wrote: >> > >> > Hello, >> > I am trying to solve a 2500x2500 sparse matrix. To get an idea about >> the matrix structure I have added a file matrix.log which contains the >> output of MatView() and also the output of Matview_draw in the image file. >> > >> > From the matrix structure it can be seen that Jacobi iteration won't >> work and some of the diagonal entries being very low(of the order of 1E-16) >> LU factorization would also fail. >> > >> > C?an someone please suggest what all could I try next, in order to make >> the solution converge? >> > >> > Thanks, >> > Kaushik >> > ? >> > -- >> > Kaushik Kulkarni >> > Fourth Year Undergraduate >> > Department of Mechanical Engineering >> > Indian Institute of Technology Bombay >> > Mumbai, India >> > https://kaushikcfd.github.io/About/ >> > +91-9967687150 >> > >> >> > -- Kaushik Kulkarni Fourth Year Undergraduate Department of Mechanical Engineering Indian Institute of Technology Bombay Mumbai, India https://kaushikcfd.github.io/About/ +91-9967687150 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kaushikggg at gmail.com Tue Apr 11 00:28:14 2017 From: kaushikggg at gmail.com (Kaushik Kulkarni) Date: Tue, 11 Apr 2017 10:58:14 +0530 Subject: [petsc-users] Solving NON-Diagonally dominant sparse system In-Reply-To: References: Message-ID: A strange behavior I am observing is: Problem: I have to solve A*x=rhs, and currently I am currently trying to solve for a system where I know the exact solution. I have initialized the exact solution in the Vec x_exact. MatMult(A, x_exact, dummy);// Storing the value of A*x_exact in dummy VecAXPY(dummy, -1.0, rhs); // dummy = dummy -rhs VecNorm(dummy, NORM_INFINITY, &norm_val); // norm_val = ||dummy||, which gives us the residual norm PetscPrintf(PETSC_COMM_SELF, "Norm = %f\n", norm_val); // Printing the norm. // Starting with the linear solver KSPCreate(PETSC_COMM_SELF, &ksp); KSPSetOperators(ksp, A, A); KSPSetFromOptions(ksp); KSPSolve(ksp,rhs,x_exact); // Solving the system A*x= rhs, with the given initial input x_exact. So the result will also be stored in x_exact On running with -pc_type lu -pc_factor_mat_solver_package superlu -ksp_monitor I get the following output: Norm = 0.000000 0 KSP Residual norm 4.371606462669e+04 1 KSP Residual norm 5.850058113796e+02 2 KSP Residual norm 5.832677911508e+02 3 KSP Residual norm 1.987386549571e+02 4 KSP Residual norm 1.220006530614e+02 . . . Since the initial guess is the exact solution should'nt the first residual itself be zero and converge in one iteration. Thanks, Kaushik On Tue, Apr 11, 2017 at 10:08 AM, Kaushik Kulkarni wrote: > Thank you for the inputs. > I tried Barry' s suggestion to use SuperLU, but the solution does not > converge and on doing -ksp_monitor -ksp_converged_reason. I get the > following error:- > 240 KSP Residual norm 1.722571678777e+07 > Linear solve did not converge due to DIVERGED_DTOL iterations 240 > For some reason it is diverging, although I am sure that for the given > system a unique solution exists. 
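[Editor's note: the MatMult/VecAXPY/VecNorm sanity check in the snippet above can be mirrored in plain Python. The dense 2x2 data below is a toy stand-in, not the poster's matrix; with exactly representable (integer-valued) entries, the residual of the exact solution comes out exactly zero, matching the "Norm = 0.000000" printed in the report.]

```python
# Plain-Python analogue of:
#   MatMult(A, x_exact, dummy); VecAXPY(dummy, -1.0, rhs);
#   VecNorm(dummy, NORM_INFINITY, &norm_val);

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inf_norm(v):
    return max(abs(vi) for vi in v)

A = [[2.0, 1.0], [1.0, 3.0]]
x_exact = [1.0, 2.0]
rhs = matvec(A, x_exact)                       # rhs built from the exact solution

dummy = matvec(A, x_exact)                     # MatMult(A, x_exact, dummy)
dummy = [d - r for d, r in zip(dummy, rhs)]    # VecAXPY(dummy, -1.0, rhs)
norm_val = inf_norm(dummy)                     # VecNorm(..., NORM_INFINITY, ...)
print("Norm =", norm_val)                      # Norm = 0.0
```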
> > Thanks, > Kaushik > > On Tue, Apr 11, 2017 at 1:04 AM, Xiaoye S. Li wrote: > >> If you need to use SuperLU_DIST, the pivoting is done statically, using >> maximum weighted matching, so the small diagonals are usually taken care as >> well. It is not as good as partial pivoting, but works most of the time. >> >> Sherry >> >> On Mon, Apr 10, 2017 at 12:07 PM, Barry Smith wrote: >> >>> >>> I would suggest using ./configure --download-superlu and then when >>> running the program -pc_type lu -pc_factor_mat_solver_package superlu >>> >>> Note that this is SuperLU, it is not SuperLU_DIST. Superlu uses >>> partial pivoting for numerical stability so should be able to handle the >>> small or zero diagonal entries. >>> >>> Barry >>> >>> > On Apr 10, 2017, at 1:17 PM, Kaushik Kulkarni >>> wrote: >>> > >>> > Hello, >>> > I am trying to solve a 2500x2500 sparse matrix. To get an idea about >>> the matrix structure I have added a file matrix.log which contains the >>> output of MatView() and also the output of Matview_draw in the image file. >>> > >>> > From the matrix structure it can be seen that Jacobi iteration won't >>> work and some of the diagonal entries being very low(of the order of 1E-16) >>> LU factorization would also fail. >>> > >>> > C?an someone please suggest what all could I try next, in order to >>> make the solution converge? >>> > >>> > Thanks, >>> > Kaushik >>> > ? 
>>> > -- >>> > Kaushik Kulkarni >>> > Fourth Year Undergraduate >>> > Department of Mechanical Engineering >>> > Indian Institute of Technology Bombay >>> > Mumbai, India >>> > https://kaushikcfd.github.io/About/ >>> > +91-9967687150 >>> > >>> >>> >> > > > -- > Kaushik Kulkarni > Fourth Year Undergraduate > Department of Mechanical Engineering > Indian Institute of Technology Bombay > Mumbai, India > https://kaushikcfd.github.io/About/ > +91-9967687150 > -- Kaushik Kulkarni Fourth Year Undergraduate Department of Mechanical Engineering Indian Institute of Technology Bombay Mumbai, India https://kaushikcfd.github.io/About/ +91-9967687150 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Apr 11 01:27:09 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 11 Apr 2017 06:27:09 +0000 Subject: [petsc-users] Solving NON-Diagonally dominant sparse system In-Reply-To: References: Message-ID: On Tue, 11 Apr 2017 at 07:28, Kaushik Kulkarni wrote: > A strange behavior I am observing is: > Problem: I have to solve A*x=rhs, and currently I am currently trying to > solve for a system where I know the exact solution. I have initialized the > exact solution in the Vec x_exact. > > MatMult(A, x_exact, dummy);// Storing the value of A*x_exact in dummy > VecAXPY(dummy, -1.0, rhs); // dummy = dummy -rhs > VecNorm(dummy, NORM_INFINITY, &norm_val); // norm_val = ||dummy||, which > gives us the residual norm > PetscPrintf(PETSC_COMM_SELF, "Norm = %f\n", norm_val); // Printing the > norm. > > // Starting with the linear solver > KSPCreate(PETSC_COMM_SELF, &ksp); > KSPSetOperators(ksp, A, A); > KSPSetFromOptions(ksp); > KSPSolve(ksp,rhs,x_exact); // Solving the system A*x= rhs, with the given > initial input x_exact. 
So the result will also be stored in x_exact > > On running with -pc_type lu -pc_factor_mat_solver_package superlu > -ksp_monitor I get the following output: > Norm = 0.000000 > 0 KSP Residual norm 4.371606462669e+04 > 1 KSP Residual norm 5.850058113796e+02 > 2 KSP Residual norm 5.832677911508e+02 > 3 KSP Residual norm 1.987386549571e+02 > 4 KSP Residual norm 1.220006530614e+02 > . > . > . > The default KSP is left preconditioned GMRES. Hence the above iterates report the preconditioned residual. If your operator is singular, and LU generated garbage, the preconditioned residual can be very different to the true residual. To see the true residual, use -ksp_monitor_true_residual Alternatively, use a right preconditioned KSP method, e.g. -ksp_type fgmres (or -ksp_type gcr) With these methods, you will see the true residual with just -ksp_monitor Thanks Dave > > Since the initial guess is the exact solution should'nt the first residual > itself be zero and converge in one iteration. > > Thanks, > Kaushik > > > On Tue, Apr 11, 2017 at 10:08 AM, Kaushik Kulkarni > wrote: > > Thank you for the inputs. > I tried Barry' s suggestion to use SuperLU, but the solution does not > converge and on doing -ksp_monitor -ksp_converged_reason. I get the > following error:- > 240 KSP Residual norm 1.722571678777e+07 > Linear solve did not converge due to DIVERGED_DTOL iterations 240 > For some reason it is diverging, although I am sure that for the given > system a unique solution exists. > > Thanks, > Kaushik > > On Tue, Apr 11, 2017 at 1:04 AM, Xiaoye S. Li wrote: > > If you need to use SuperLU_DIST, the pivoting is done statically, using > maximum weighted matching, so the small diagonals are usually taken care as > well. It is not as good as partial pivoting, but works most of the time. 
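[Editor's note: Dave's distinction between the preconditioned residual (what left-preconditioned GMRES monitors) and the true residual can be sketched numerically. The 2-vectors and the badly scaled P^{-1} below are hypothetical toy data: when the preconditioner is garbage, ||P^{-1}(b - Ax)|| can be enormous even though ||b - Ax|| says the system is essentially solved.]

```python
# Left-preconditioned GMRES reports || P^{-1} (b - A x) ||, not || b - A x ||.

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def norm2(v):
    return sum(vi * vi for vi in v) ** 0.5

A = [[1.0, 0.0], [0.0, 1.0]]       # toy operator
x = [1.0, 1.0]
b = [1.0 + 1e-6, 1.0]              # x is very nearly the exact solution

r_true = [bi - ri for bi, ri in zip(b, matvec(A, x))]   # b - A x
Pinv = [[1e8, 0.0], [0.0, 1.0]]    # hypothetical badly scaled P^{-1}
r_prec = matvec(Pinv, r_true)      # what a left-preconditioned monitor sees

print(norm2(r_true))   # ~1e-6: the true residual is tiny
print(norm2(r_prec))   # ~1e2: the preconditioned residual looks huge
```

This is exactly why `-ksp_monitor_true_residual` (or a right-preconditioned method such as `-ksp_type fgmres`) is the safer diagnostic when the preconditioner is suspect.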
> > Sherry > > On Mon, Apr 10, 2017 at 12:07 PM, Barry Smith wrote: > > > I would suggest using ./configure --download-superlu and then when > running the program -pc_type lu -pc_factor_mat_solver_package superlu > > Note that this is SuperLU, it is not SuperLU_DIST. Superlu uses > partial pivoting for numerical stability so should be able to handle the > small or zero diagonal entries. > > Barry > > > On Apr 10, 2017, at 1:17 PM, Kaushik Kulkarni > wrote: > > > > Hello, > > I am trying to solve a 2500x2500 sparse matrix. To get an idea about the > matrix structure I have added a file matrix.log which contains the output > of MatView() and also the output of Matview_draw in the image file. > > > > From the matrix structure it can be seen that Jacobi iteration won't > work and some of the diagonal entries being very low(of the order of 1E-16) > LU factorization would also fail. > > > > C?an someone please suggest what all could I try next, in order to make > the solution converge? > > > > Thanks, > > Kaushik > > ? > > -- > > Kaushik Kulkarni > > Fourth Year Undergraduate > > Department of Mechanical Engineering > > Indian Institute of Technology Bombay > > Mumbai, India > > https://kaushikcfd.github.io/About/ > > +91-9967687150 > > > > > > > > -- > Kaushik Kulkarni > Fourth Year Undergraduate > Department of Mechanical Engineering > Indian Institute of Technology Bombay > Mumbai, India > https://kaushikcfd.github.io/About/ > +91-9967687150 > > > > > -- > Kaushik Kulkarni > Fourth Year Undergraduate > Department of Mechanical Engineering > Indian Institute of Technology Bombay > Mumbai, India > https://kaushikcfd.github.io/About/ > +91-9967687150 > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From j.wuttke at fz-juelich.de Tue Apr 11 02:53:58 2017 From: j.wuttke at fz-juelich.de (Joachim Wuttke) Date: Tue, 11 Apr 2017 09:53:58 +0200 Subject: [petsc-users] Manual mistaken about VecSetValues Message-ID: <2c2051ff-ad79-f7ea-d095-de669e43f3b3@fz-juelich.de> The PETSc User Manual [ANL-95/11 Rev 3.7 ] says on p. 44: Example usage of VecSetValues() may be found in ${PETSC_DIR}/src/vec/vec/examples/tutorials/ex2.c However, that example [pp. 32-35 in the User Manual] does not contain "VecSetValues". -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5110 bytes Desc: S/MIME Cryptographic Signature URL: From patrick.sanan at gmail.com Tue Apr 11 03:03:21 2017 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Tue, 11 Apr 2017 10:03:21 +0200 Subject: [petsc-users] Manual mistaken about VecSetValues In-Reply-To: <2c2051ff-ad79-f7ea-d095-de669e43f3b3@fz-juelich.de> References: <2c2051ff-ad79-f7ea-d095-de669e43f3b3@fz-juelich.de> Message-ID: The path is important, as there are many examples named "ex2.c". The example on pp. 32-35 is ${PETSC_DIR}/src/ksp/ksp/examples/tutorials/ex2.c , which is about KSP, not Vec. 2017-04-11 9:53 GMT+02:00 Joachim Wuttke : > The PETSc User Manual [ANL-95/11 Rev 3.7 ] says on p. 44: > Example usage of VecSetValues() may be found in > ${PETSC_DIR}/src/vec/vec/examples/tutorials/ex2.c > > However, that example [pp. 32-35 in the User Manual] > does not contain "VecSetValues". > From kaushikggg at gmail.com Tue Apr 11 07:07:32 2017 From: kaushikggg at gmail.com (Kaushik Kulkarni) Date: Tue, 11 Apr 2017 17:37:32 +0530 Subject: [petsc-users] Solving NON-Diagonally dominant sparse system In-Reply-To: References: Message-ID: But anyway since I am starting off with the exact solution itself, shouldn't the norm should be zero independent of the conditioning? 
On Tue, Apr 11, 2017 at 11:57 AM, Dave May wrote: > > > On Tue, 11 Apr 2017 at 07:28, Kaushik Kulkarni > wrote: > >> A strange behavior I am observing is: >> Problem: I have to solve A*x=rhs, and currently I am currently trying to >> solve for a system where I know the exact solution. I have initialized the >> exact solution in the Vec x_exact. >> >> MatMult(A, x_exact, dummy);// Storing the value of A*x_exact in dummy >> VecAXPY(dummy, -1.0, rhs); // dummy = dummy -rhs >> VecNorm(dummy, NORM_INFINITY, &norm_val); // norm_val = ||dummy||, which >> gives us the residual norm >> PetscPrintf(PETSC_COMM_SELF, "Norm = %f\n", norm_val); // Printing the >> norm. >> >> // Starting with the linear solver >> KSPCreate(PETSC_COMM_SELF, &ksp); >> KSPSetOperators(ksp, A, A); >> KSPSetFromOptions(ksp); >> KSPSolve(ksp,rhs,x_exact); // Solving the system A*x= rhs, with the given >> initial input x_exact. So the result will also be stored in x_exact >> >> On running with -pc_type lu -pc_factor_mat_solver_package superlu >> -ksp_monitor I get the following output: >> Norm = 0.000000 >> 0 KSP Residual norm 4.371606462669e+04 >> 1 KSP Residual norm 5.850058113796e+02 >> 2 KSP Residual norm 5.832677911508e+02 >> 3 KSP Residual norm 1.987386549571e+02 >> 4 KSP Residual norm 1.220006530614e+02 >> . >> . >> . >> > > The default KSP is left preconditioned GMRES. Hence the above iterates > report the preconditioned residual. If your operator is singular, and LU > generated garbage, the preconditioned residual can be very different to the > true residual. > > To see the true residual, use > -ksp_monitor_true_residual > > Alternatively, use a right preconditioned KSP method, e.g. > -ksp_type fgmres > (or -ksp_type gcr) > With these methods, you will see the true residual with just -ksp_monitor > > > Thanks > Dave > > > > >> >> Since the initial guess is the exact solution should'nt the first >> residual itself be zero and converge in one iteration. 
>> >> Thanks, >> Kaushik >> >> >> On Tue, Apr 11, 2017 at 10:08 AM, Kaushik Kulkarni >> wrote: >> >> Thank you for the inputs. >> I tried Barry' s suggestion to use SuperLU, but the solution does not >> converge and on doing -ksp_monitor -ksp_converged_reason. I get the >> following error:- >> 240 KSP Residual norm 1.722571678777e+07 >> Linear solve did not converge due to DIVERGED_DTOL iterations 240 >> For some reason it is diverging, although I am sure that for the given >> system a unique solution exists. >> >> Thanks, >> Kaushik >> >> On Tue, Apr 11, 2017 at 1:04 AM, Xiaoye S. Li wrote: >> >> If you need to use SuperLU_DIST, the pivoting is done statically, using >> maximum weighted matching, so the small diagonals are usually taken care as >> well. It is not as good as partial pivoting, but works most of the time. >> >> Sherry >> >> On Mon, Apr 10, 2017 at 12:07 PM, Barry Smith wrote: >> >> >> I would suggest using ./configure --download-superlu and then when >> running the program -pc_type lu -pc_factor_mat_solver_package superlu >> >> Note that this is SuperLU, it is not SuperLU_DIST. Superlu uses >> partial pivoting for numerical stability so should be able to handle the >> small or zero diagonal entries. >> >> Barry >> >> > On Apr 10, 2017, at 1:17 PM, Kaushik Kulkarni >> wrote: >> > >> > Hello, >> > I am trying to solve a 2500x2500 sparse matrix. To get an idea about >> the matrix structure I have added a file matrix.log which contains the >> output of MatView() and also the output of Matview_draw in the image file. >> > >> > From the matrix structure it can be seen that Jacobi iteration won't >> work and some of the diagonal entries being very low(of the order of 1E-16) >> LU factorization would also fail. >> > >> > C?an someone please suggest what all could I try next, in order to make >> the solution converge? >> > >> > Thanks, >> > Kaushik >> > ? 
>> > -- >> > Kaushik Kulkarni >> > Fourth Year Undergraduate >> > Department of Mechanical Engineering >> > Indian Institute of Technology Bombay >> > Mumbai, India >> > https://kaushikcfd.github.io/About/ >> > +91-9967687150 >> > >> >> >> >> >> >> -- >> Kaushik Kulkarni >> Fourth Year Undergraduate >> Department of Mechanical Engineering >> Indian Institute of Technology Bombay >> Mumbai, India >> https://kaushikcfd.github.io/About/ >> +91-9967687150 >> >> >> >> >> -- >> Kaushik Kulkarni >> Fourth Year Undergraduate >> Department of Mechanical Engineering >> Indian Institute of Technology Bombay >> Mumbai, India >> https://kaushikcfd.github.io/About/ >> +91-9967687150 >> > -- Kaushik Kulkarni Fourth Year Undergraduate Department of Mechanical Engineering Indian Institute of Technology Bombay Mumbai, India https://kaushikcfd.github.io/About/ +91-9967687150 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.mayhem23 at gmail.com Tue Apr 11 07:58:47 2017 From: dave.mayhem23 at gmail.com (Dave May) Date: Tue, 11 Apr 2017 12:58:47 +0000 Subject: [petsc-users] Solving NON-Diagonally dominant sparse system In-Reply-To: References: Message-ID: Nope - welcome to finite precision arithmetic. What's the condition number? On Tue, 11 Apr 2017 at 14:07, Kaushik Kulkarni wrote: > But anyway since I am starting off with the exact solution itself, > shouldn't the norm should be zero independent of the conditioning? > > On Tue, Apr 11, 2017 at 11:57 AM, Dave May > wrote: > > > > On Tue, 11 Apr 2017 at 07:28, Kaushik Kulkarni > wrote: > > A strange behavior I am observing is: > Problem: I have to solve A*x=rhs, and currently I am currently trying to > solve for a system where I know the exact solution. I have initialized the > exact solution in the Vec x_exact. 
> > MatMult(A, x_exact, dummy);// Storing the value of A*x_exact in dummy > VecAXPY(dummy, -1.0, rhs); // dummy = dummy -rhs > VecNorm(dummy, NORM_INFINITY, &norm_val); // norm_val = ||dummy||, which > gives us the residual norm > PetscPrintf(PETSC_COMM_SELF, "Norm = %f\n", norm_val); // Printing the > norm. > > // Starting with the linear solver > KSPCreate(PETSC_COMM_SELF, &ksp); > KSPSetOperators(ksp, A, A); > KSPSetFromOptions(ksp); > KSPSolve(ksp,rhs,x_exact); // Solving the system A*x= rhs, with the given > initial input x_exact. So the result will also be stored in x_exact > > On running with -pc_type lu -pc_factor_mat_solver_package superlu > -ksp_monitor I get the following output: > Norm = 0.000000 > 0 KSP Residual norm 4.371606462669e+04 > 1 KSP Residual norm 5.850058113796e+02 > 2 KSP Residual norm 5.832677911508e+02 > 3 KSP Residual norm 1.987386549571e+02 > 4 KSP Residual norm 1.220006530614e+02 > . > . > . > > > The default KSP is left preconditioned GMRES. Hence the above iterates > report the preconditioned residual. If your operator is singular, and LU > generated garbage, the preconditioned residual can be very different to the > true residual. > > To see the true residual, use > -ksp_monitor_true_residual > > Alternatively, use a right preconditioned KSP method, e.g. > -ksp_type fgmres > (or -ksp_type gcr) > With these methods, you will see the true residual with just -ksp_monitor > > > Thanks > Dave > > > > > > Since the initial guess is the exact solution should'nt the first residual > itself be zero and converge in one iteration. > > Thanks, > Kaushik > > > On Tue, Apr 11, 2017 at 10:08 AM, Kaushik Kulkarni > wrote: > > Thank you for the inputs. > I tried Barry' s suggestion to use SuperLU, but the solution does not > converge and on doing -ksp_monitor -ksp_converged_reason. 
I get the > following error:- > 240 KSP Residual norm 1.722571678777e+07 > Linear solve did not converge due to DIVERGED_DTOL iterations 240 > For some reason it is diverging, although I am sure that for the given > system a unique solution exists. > > Thanks, > Kaushik > > On Tue, Apr 11, 2017 at 1:04 AM, Xiaoye S. Li wrote: > > If you need to use SuperLU_DIST, the pivoting is done statically, using > maximum weighted matching, so the small diagonals are usually taken care as > well. It is not as good as partial pivoting, but works most of the time. > > Sherry > > On Mon, Apr 10, 2017 at 12:07 PM, Barry Smith wrote: > > > I would suggest using ./configure --download-superlu and then when > running the program -pc_type lu -pc_factor_mat_solver_package superlu > > Note that this is SuperLU, it is not SuperLU_DIST. Superlu uses > partial pivoting for numerical stability so should be able to handle the > small or zero diagonal entries. > > Barry > > > On Apr 10, 2017, at 1:17 PM, Kaushik Kulkarni > wrote: > > > > Hello, > > I am trying to solve a 2500x2500 sparse matrix. To get an idea about the > matrix structure I have added a file matrix.log which contains the output > of MatView() and also the output of Matview_draw in the image file. > > > > From the matrix structure it can be seen that Jacobi iteration won't > work and some of the diagonal entries being very low(of the order of 1E-16) > LU factorization would also fail. > > > > Can someone please suggest what all could I try next, in order to make > the solution converge? > > > > Thanks, > > Kaushik
> > -- > > Kaushik Kulkarni > > Fourth Year Undergraduate > > Department of Mechanical Engineering > > Indian Institute of Technology Bombay > > Mumbai, India > > https://kaushikcfd.github.io/About/ > > +91-9967687150 > > > > > > > > -- > Kaushik Kulkarni > Fourth Year Undergraduate > Department of Mechanical Engineering > Indian Institute of Technology Bombay > Mumbai, India > https://kaushikcfd.github.io/About/ > +91-9967687150 > > > > > -- > Kaushik Kulkarni > Fourth Year Undergraduate > Department of Mechanical Engineering > Indian Institute of Technology Bombay > Mumbai, India > https://kaushikcfd.github.io/About/ > +91-9967687150 > > > > > -- > Kaushik Kulkarni > Fourth Year Undergraduate > Department of Mechanical Engineering > Indian Institute of Technology Bombay > Mumbai, India > https://kaushikcfd.github.io/About/ > +91-9967687150 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From peetz2 at illinois.edu Tue Apr 11 07:59:26 2017 From: peetz2 at illinois.edu (Peetz, Darin T) Date: Tue, 11 Apr 2017 12:59:26 +0000 Subject: [petsc-users] Solving NON-Diagonally dominant sparse system In-Reply-To: References: , Message-ID: Did you call KSPSetInitialGuessNonzero() or use the option -ksp_initial_guess_nonzero? Otherwise I think Petsc zeroes out your initial guess when you call KSPSolve(). ________________________________ From: petsc-users-bounces at mcs.anl.gov [petsc-users-bounces at mcs.anl.gov] on behalf of Kaushik Kulkarni [kaushikggg at gmail.com] Sent: Tuesday, April 11, 2017 7:07 AM To: Dave May Cc: PETSc users list Subject: Re: [petsc-users] Solving NON-Diagonally dominant sparse system But anyway since I am starting off with the exact solution itself, shouldn't the norm should be zero independent of the conditioning? 
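Darin's observation can be illustrated without PETSc. The sketch below is hypothetical pure Python, not the PETSc API: it mimics a solver that discards the caller's initial guess unless a nonzero-guess flag is set, which is why the first monitored residual equals ||b|| even when the exact solution is passed in. In PETSc itself the flag is KSPSetInitialGuessNonzero() / -ksp_initial_guess_nonzero, as Darin says.

```python
# Pure-Python sketch (not the PETSc API): unless told the initial guess is
# nonzero, the solver starts from x = 0, so the first monitored residual is
# ||b||, no matter what was in the solution vector on entry.

def residual_norm(A, x, b):
    r = [bi - sum(Ai[j] * x[j] for j in range(len(x))) for Ai, bi in zip(A, b)]
    return max(abs(ri) for ri in r)

def first_residual(A, b, x_in, initial_guess_nonzero):
    # default behaviour: the incoming guess is zeroed out
    x0 = x_in if initial_guess_nonzero else [0.0] * len(x_in)
    return residual_norm(A, x0, b)

A = [[4.0, 1.0], [1.0, 3.0]]
x_exact = [1.0, 2.0]                         # pretend we know the answer
b = [4.0 * 1 + 1.0 * 2, 1.0 * 1 + 3.0 * 2]   # b = A @ x_exact = [6, 7]

print(first_residual(A, b, x_exact, False))  # 7.0 -> ||b||, guess was discarded
print(first_residual(A, b, x_exact, True))   # 0.0 -> guess kept, already solved
```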
On Tue, Apr 11, 2017 at 11:57 AM, Dave May > wrote: On Tue, 11 Apr 2017 at 07:28, Kaushik Kulkarni > wrote: A strange behavior I am observing is: Problem: I have to solve A*x=rhs, and currently I am currently trying to solve for a system where I know the exact solution. I have initialized the exact solution in the Vec x_exact. MatMult(A, x_exact, dummy);// Storing the value of A*x_exact in dummy VecAXPY(dummy, -1.0, rhs); // dummy = dummy -rhs VecNorm(dummy, NORM_INFINITY, &norm_val); // norm_val = ||dummy||, which gives us the residual norm PetscPrintf(PETSC_COMM_SELF, "Norm = %f\n", norm_val); // Printing the norm. // Starting with the linear solver KSPCreate(PETSC_COMM_SELF, &ksp); KSPSetOperators(ksp, A, A); KSPSetFromOptions(ksp); KSPSolve(ksp,rhs,x_exact); // Solving the system A*x= rhs, with the given initial input x_exact. So the result will also be stored in x_exact On running with -pc_type lu -pc_factor_mat_solver_package superlu -ksp_monitor I get the following output: Norm = 0.000000 0 KSP Residual norm 4.371606462669e+04 1 KSP Residual norm 5.850058113796e+02 2 KSP Residual norm 5.832677911508e+02 3 KSP Residual norm 1.987386549571e+02 4 KSP Residual norm 1.220006530614e+02 . . . The default KSP is left preconditioned GMRES. Hence the above iterates report the preconditioned residual. If your operator is singular, and LU generated garbage, the preconditioned residual can be very different to the true residual. To see the true residual, use -ksp_monitor_true_residual Alternatively, use a right preconditioned KSP method, e.g. -ksp_type fgmres (or -ksp_type gcr) With these methods, you will see the true residual with just -ksp_monitor Thanks Dave Since the initial guess is the exact solution should'nt the first residual itself be zero and converge in one iteration. Thanks, Kaushik On Tue, Apr 11, 2017 at 10:08 AM, Kaushik Kulkarni > wrote: Thank you for the inputs. 
I tried Barry's suggestion to use SuperLU, but the solution does not converge and on doing -ksp_monitor -ksp_converged_reason. I get the following error:- 240 KSP Residual norm 1.722571678777e+07 Linear solve did not converge due to DIVERGED_DTOL iterations 240 For some reason it is diverging, although I am sure that for the given system a unique solution exists. Thanks, Kaushik On Tue, Apr 11, 2017 at 1:04 AM, Xiaoye S. Li > wrote: If you need to use SuperLU_DIST, the pivoting is done statically, using maximum weighted matching, so the small diagonals are usually taken care as well. It is not as good as partial pivoting, but works most of the time. Sherry On Mon, Apr 10, 2017 at 12:07 PM, Barry Smith > wrote: I would suggest using ./configure --download-superlu and then when running the program -pc_type lu -pc_factor_mat_solver_package superlu Note that this is SuperLU, it is not SuperLU_DIST. Superlu uses partial pivoting for numerical stability so should be able to handle the small or zero diagonal entries. Barry > On Apr 10, 2017, at 1:17 PM, Kaushik Kulkarni > wrote: > > Hello, > I am trying to solve a 2500x2500 sparse matrix. To get an idea about the matrix structure I have added a file matrix.log which contains the output of MatView() and also the output of Matview_draw in the image file. > > From the matrix structure it can be seen that Jacobi iteration won't work and some of the diagonal entries being very low(of the order of 1E-16) LU factorization would also fail. > > Can someone please suggest what all could I try next, in order to make the solution converge? > > Thanks, > Kaushik
> -- > Kaushik Kulkarni > Fourth Year Undergraduate > Department of Mechanical Engineering > Indian Institute of Technology Bombay > Mumbai, India > https://kaushikcfd.github.io/About/ > +91-9967687150 > -- Kaushik Kulkarni Fourth Year Undergraduate Department of Mechanical Engineering Indian Institute of Technology Bombay Mumbai, India https://kaushikcfd.github.io/About/ +91-9967687150 -- Kaushik Kulkarni Fourth Year Undergraduate Department of Mechanical Engineering Indian Institute of Technology Bombay Mumbai, India https://kaushikcfd.github.io/About/ +91-9967687150 -- Kaushik Kulkarni Fourth Year Undergraduate Department of Mechanical Engineering Indian Institute of Technology Bombay Mumbai, India https://kaushikcfd.github.io/About/ +91-9967687150 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Hassan.Raiesi at aero.bombardier.com Tue Apr 11 09:21:29 2017 From: Hassan.Raiesi at aero.bombardier.com (Hassan Raiesi) Date: Tue, 11 Apr 2017 14:21:29 +0000 Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel Message-ID: Hello, I'm trying to use DMPlexCreateFromCellListParallel to create a DM from an already partitioned mesh. It requires an array of numVertices*spaceDim numbers, but how should one order the coordinates of the vertices? We only pass the global vertex numbers using 'const int cells[]' to define the cell-connectivity, so passing the vertex coordinates in local ordering wouldn't make sense? If it needs to be in global ordering, should I sort the global index of the node numbers owned by each rank (as they won't be contiguous)? Thank you Hassan Raiesi, Bombardier Aerospace www.bombardier.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Rodrigo.Felicio at iongeo.com Tue Apr 11 09:31:21 2017 From: Rodrigo.Felicio at iongeo.com (Rodrigo Felicio) Date: Tue, 11 Apr 2017 14:31:21 +0000 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? In-Reply-To: <87bms367kb.fsf@jedbrown.org> References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> <87bms367kb.fsf@jedbrown.org> Message-ID: <350529B93F4E2F4497FD8DE4E86E84AA16F1FD65@AUS1EXMBX04.ioinc.ioroot.tld> Thanks, Jed, but using color == 0 led to the same error msg. Is there no way to set PETSc.COMM_WORLD to a subcomm instead of MPI.COMM_WORLD in python? Cheers Rodrigo -----Original Message----- From: Jed Brown [mailto:jed at jedbrown.org] Sent: Monday, April 10, 2017 8:45 PM To: Rodrigo Felicio; petsc-users at mcs.anl.gov Subject: Re: [petsc-users] how to use petsc4py with mpi subcommunicators? Rodrigo Felicio writes: > Hello all, > > Sorry for the newbie question, but is there a way of making petsc4py work with an MPI group or subcommunicator? I saw a solution posted back in 2010 (http://lists.mcs.anl.gov/pipermail/petsc-users/2010-May/006382.html), but it does not work for me. Indeed, if I try to use petsc4py.init(comm=newcom), then my sample code prints a msg "Attempting to use an MPI routine before initializing MPI". Below I attach both the output and the source of the python code. > > kind regards > Rodrigo > > > time mpirun -n 5 python split_comm_ex2.py > Global: rank 0 of 5. New comm : rank 0 of 3 > Global: rank 1 of 5. New comm : rank 0 of 2 > Global: rank 2 of 5. New comm : rank 1 of 3 > Global: rank 3 of 5. New comm : rank 1 of 2 > Global: rank 4 of 5. 
New comm : rank 2 of 3 Attempting to use an MPI > routine before initializing MPI Attempting to use an MPI routine > before initializing MPI > > real 0m0.655s > user 0m1.122s > sys 0m1.047s > > And the python code: > > from mpi4py import MPI > > comm = MPI.COMM_WORLD > world_rank = comm.rank > world_size = comm.size > > color = world_rank % 2 > > newcomm = comm.Split(color) > newcomm_rank = newcomm.rank > newcomm_size = newcomm.size > > for i in range(world_size): > comm.Barrier() > if (world_rank == i): > print ("Global: rank %d of %d. New comm : rank % d of %d" % > (world_rank, world_size, newcomm_rank, newcomm_size)) > > if newcomm.rank == 0: > import petsc4py > petsc4py.init(comm=newcomm) I don't know if it fixes your problem, but you definitely need to call this collectively on newcomm. For example, using "if color == 0" above would have the right collective semantics. > from petsc4py import PETSc > > pcomm = PETSc.COMM_WORLD > print('pcomm size is {}/{}'.format(pcomm.rank, pcomm.size)) > > newcomm.Free() > > ________________________________ > > > This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. ________________________________ This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. 
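Jed's point about collective semantics can be checked without launching MPI at all. The sketch below is plain Python doing hypothetical rank bookkeeping (no mpi4py): for 5 world ranks split with color = rank % 2, it recomputes which ranks each guard selects. `newcomm.rank == 0` picks only one rank per subcommunicator, so petsc4py.init(comm=newcomm) would not be called by every member of either newcomm, whereas `color == 0` picks the entire color-0 subcommunicator.

```python
# No-MPI sketch of which world ranks enter each guard after comm.Split(color).

world_size = 5
colors = {rank: rank % 2 for rank in range(world_size)}

# rank within each subcommunicator, mimicking comm.Split(color):
# ranks with the same color are ordered by world rank
subrank = {}
for rank in range(world_size):
    subrank[rank] = sum(1 for r in range(rank) if colors[r] == colors[rank])

guard_subrank0 = [r for r in range(world_size) if subrank[r] == 0]
guard_color0 = [r for r in range(world_size) if colors[r] == 0]

print(guard_subrank0)  # [0, 1]    -- one rank per subcomm: not collective on either
print(guard_color0)    # [0, 2, 4] -- every member of the color-0 subcomm
```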
If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. From j.wuttke at fz-juelich.de Tue Apr 11 10:19:23 2017 From: j.wuttke at fz-juelich.de (Joachim Wuttke) Date: Tue, 11 Apr 2017 17:19:23 +0200 Subject: [petsc-users] status of PETSc installation via CMake? Message-ID: <9f6402ad-a82e-8049-44d6-430805d2bab0@fz-juelich.de> The current source archive petsc-3.7.5.tar.gz comes with files - configure for Autotools based installation - CMakeLists.txt for CMake based installation However, the installation web page https://www.mcs.anl.gov/petsc/documentation/installation.html only mentions "./configure". So what is the status of support for CMake based installation? The following attempt failed: cd petsc-3.7.5 [the unpacked source archive] mkdir build cd build cmake .. Result: [...] CMake Error at CMakeLists.txt:4 (include): include could not find load file: /lib/petsc/conf/PETScBuildInternal.cmake -- Configuring incomplete, errors occurred! In-place build (which of course is disadvised) results in the same error. -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5110 bytes Desc: S/MIME Cryptographic Signature URL: From balay at mcs.anl.gov Tue Apr 11 10:26:45 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 11 Apr 2017 10:26:45 -0500 Subject: [petsc-users] status of PETSc installation via CMake? In-Reply-To: <9f6402ad-a82e-8049-44d6-430805d2bab0@fz-juelich.de> References: <9f6402ad-a82e-8049-44d6-430805d2bab0@fz-juelich.de> Message-ID: We never had a cmake build infrastructure.. It was always configure && make. 
However - earlier on - we did have a mode of using cmake to generate gnumakefiles [currently deprecated - it might still work] - but we've moved on to using native gnumakefiles - so don't need cmake to generate them anymore. To use the cmake generated makefiles - you would do: ./configure && make all-cmake Satish On Tue, 11 Apr 2017, Joachim Wuttke wrote: > The current source archive petsc-3.7.5.tar.gz > comes with files > - configure for Autotools based installation > - CMakeLists.txt for CMake based installation > > However, the installation web page > https://www.mcs.anl.gov/petsc/documentation/installation.html > only mentions "./configure". > > So what is the status of support for CMake based installation? > > The following attempt failed: > cd petsc-3.7.5 [the unpacked source archive] > mkdir build > cd build > cmake .. > Result: > [...] > CMake Error at CMakeLists.txt:4 (include): > include could not find load file: > /lib/petsc/conf/PETScBuildInternal.cmake > -- Configuring incomplete, errors occurred! > > In-place build (which of course is disadvised) > results in the same error. > > > From j.wuttke at fz-juelich.de Tue Apr 11 10:27:28 2017 From: j.wuttke at fz-juelich.de (Joachim Wuttke) Date: Tue, 11 Apr 2017 17:27:28 +0200 Subject: [petsc-users] status of "make install" ? Message-ID: <0d622f68-5e24-2675-eca2-d4ced7fef53f@fz-juelich.de> Does PETSc support the usual installation command sequence configure; make; make install? The installation web page does not mention "make install". The Makefile generated by configure does contain a target "install". However, "make install" does not work; it yields the error message "Incorrect prefix usage. Specified destDir same as current PETSC_DIR/PETSC_ARCH" -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5110 bytes Desc: S/MIME Cryptographic Signature URL: From balay at mcs.anl.gov Tue Apr 11 10:29:03 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 11 Apr 2017 10:29:03 -0500 Subject: [petsc-users] status of PETSc installation via CMake? In-Reply-To: References: <9f6402ad-a82e-8049-44d6-430805d2bab0@fz-juelich.de> Message-ID: BTW: we don't use Autotools either. [our configure tool is homegrown] Satish On Tue, 11 Apr 2017, Satish Balay wrote: > We never had a cmake build infrastructure.. It was always configure && make. > > However - earlier on - we did have a mode of using cmake to generate > gnumakefiles [currently depricated - it might still work] - but we've > moved on to using native gnumakefiles - so don't need cmake to > generate them anymore. > > To use the cmake generated makefiles - you would do: > > ./configure && make all-cmake > > Satish > > On Tue, 11 Apr 2017, Joachim Wuttke wrote: > > > The current source archive petsc-3.7.5.tar.gz > > comes with files > > - configure for Autotools based installation > > - CMakeLists.txt for CMake based installation > > > > However, the installation web page > > https://www.mcs.anl.gov/petsc/documentation/installation.html > > only mentions "./configure". > > > > So what is the status of support for CMake based installation? > > > > The following attempt failed: > > cd petsc-3.7.5 [the unpacked source archive] > > mkdir build > > cd build > > cmake .. > > Result: > > [...] > > CMake Error at CMakeLists.txt:4 (include): > > include could not find load file: > > /lib/petsc/conf/PETScBuildInternal.cmake > > -- Configuring incomplete, errors occurred! > > > > In-place build (which of course is disadvised) > > results in the same error. > > > > > > > > From balay at mcs.anl.gov Tue Apr 11 10:31:05 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 11 Apr 2017 10:31:05 -0500 Subject: [petsc-users] status of "make install" ? 
In-Reply-To: <0d622f68-5e24-2675-eca2-d4ced7fef53f@fz-juelich.de> References: <0d622f68-5e24-2675-eca2-d4ced7fef53f@fz-juelich.de> Message-ID: It supports 2 modes: 1. inplace [default]: ./configure && make 2. prefix ./configure --prefix=/prefix/location && make && make install So to use 'make install' - you need to run configure with the correct prefix option. Satish On Tue, 11 Apr 2017, Joachim Wuttke wrote: > Does PETSc support the usual installation command sequence > configure; make; make install? > > The installation web page does not mention "make install". > > The Makefile generated by configure does contain a target "install". > However, "make install" does not work; it yields the error message > "Incorrect prefix usage. Specified destDir same as current > PETSC_DIR/PETSC_ARCH" > > From j.wuttke at fz-juelich.de Tue Apr 11 10:43:03 2017 From: j.wuttke at fz-juelich.de (Joachim Wuttke) Date: Tue, 11 Apr 2017 17:43:03 +0200 Subject: [petsc-users] FindPETSc.cmake expects petscvariables in wrong location Message-ID: The PETSc FAQ recommends to use the FindPETSc.cmake module from the repository https://github.com/jedbrown/cmake-modules. This doesn't work for me. CMake fails with the following message: CMake Error at cmake/modules/FindPETSc.cmake:125 (message): The pair PETSC_DIR=/usr/lib/src/petsc-3.7.5 PETSC_ARCH=linux-amd64 do not specify a valid PETSc installation Line 115 of FindPETSc.cmake shows that the file petscvariables is expected at location ${PETSC_DIR}/${PETSC_ARCH}/lib/petsc/conf/petscvariables However, in the current source archive, it is located in ${PETSC_DIR}/lib/petsc/conf/petscvariables, and the default install procedure does not copy it to ${PETSC_ARCH}. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5110 bytes Desc: S/MIME Cryptographic Signature URL: From balay at mcs.anl.gov Tue Apr 11 10:49:10 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 11 Apr 2017 10:49:10 -0500 Subject: [petsc-users] FindPETSc.cmake expects petscvariables in wrong location In-Reply-To: References: Message-ID: If you use a prefix install - after the install process - you would Not use PETSC_ARCH. i.e PETSC_ARCH='' Satish On Tue, 11 Apr 2017, Joachim Wuttke wrote: > The PETSc FAQ recommends to use the FindPETSc.cmake module > from the repository https://github.com/jedbrown/cmake-modules. > > This doesn't work for me. CMake fails with the following message: > CMake Error at cmake/modules/FindPETSc.cmake:125 (message): > The pair PETSC_DIR=/usr/lib/src/petsc-3.7.5 PETSC_ARCH=linux-amd64 do not > specify a valid PETSc installation > > Line 115 of FindPETSc.cmake shows that the file petscvariables > is expected at location > ${PETSC_DIR}/${PETSC_ARCH}/lib/petsc/conf/petscvariables > > However, in the current source archive, it is located in > ${PETSC_DIR}/lib/petsc/conf/petscvariables, > and the default install procedure does not copy it to ${PETSC_ARCH}. > > > From j.wuttke at fz-juelich.de Tue Apr 11 10:58:28 2017 From: j.wuttke at fz-juelich.de (Joachim Wuttke) Date: Tue, 11 Apr 2017 17:58:28 +0200 Subject: [petsc-users] FindPETSc.cmake expects petscvariables in wrong location In-Reply-To: References: Message-ID: <3d2637ec-eec9-22dc-4841-4d2ad23b6589@fz-juelich.de> Satish, thank you very much for your prompt answers. Your last one, however, does not solve my problem. PETSC_ARCH='' is not the solution. Line 115 of FindPETSc.cmake, if (EXISTS "${PETSC_DIR}/${PETSC_ARCH}/lib/petsc/conf/petscvariables") # > 3.5 would then search for ${PETSC_DIR}//lib/petsc/conf/petscvariables which obviously fails. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5110 bytes Desc: S/MIME Cryptographic Signature URL: From balay at mcs.anl.gov Tue Apr 11 11:11:26 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 11 Apr 2017 11:11:26 -0500 Subject: [petsc-users] FindPETSc.cmake expects petscvariables in wrong location In-Reply-To: <3d2637ec-eec9-22dc-4841-4d2ad23b6589@fz-juelich.de> References: <3d2637ec-eec9-22dc-4841-4d2ad23b6589@fz-juelich.de> Message-ID: you need to give me details on how you installed petsc. [say make.log] Satish On Tue, 11 Apr 2017, Joachim Wuttke wrote: > Satish, thank you very much for your prompt answers. > Your last one, however, does not solve my problem. > > PETSC_ARCH='' is not the solution. > > Line 115 of FindPETSc.cmake, > > if (EXISTS "${PETSC_DIR}/${PETSC_ARCH}/lib/petsc/conf/petscvariables") # > > 3.5 > > would then search for > > ${PETSC_DIR}//lib/petsc/conf/petscvariables > > which obviously fails. > > From j.wuttke at fz-juelich.de Tue Apr 11 11:15:59 2017 From: j.wuttke at fz-juelich.de (Joachim Wuttke) Date: Tue, 11 Apr 2017 18:15:59 +0200 Subject: [petsc-users] FindPETSc.cmake expects petscvariables in wrong location In-Reply-To: References: <3d2637ec-eec9-22dc-4841-4d2ad23b6589@fz-juelich.de> Message-ID: <31b8eb5c-250f-f02f-62c9-997041cfc9a3@fz-juelich.de> > you need to give me details on how you installed petsc. [say make.log] tar zxvf petsc-3.7.5.tar.gz cd petsc-3.7.5 ./configure --with-shared-libraries PETSC_DIR=/usr/local/src/petsc-3.7.5 PETSC_ARCH=linux-amd64 --with-mpi-dir=/usr/lib/x86_64-linux-gnu/openmpi make MAKE_NP=7 PETSC_DIR=/usr/local/src/petsc-3.7.5 PETSC_ARCH=linux-amd64 all -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5110 bytes Desc: S/MIME Cryptographic Signature URL: From balay at mcs.anl.gov Tue Apr 11 11:23:17 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 11 Apr 2017 11:23:17 -0500 Subject: [petsc-users] FindPETSc.cmake expects petscvariables in wrong location In-Reply-To: <31b8eb5c-250f-f02f-62c9-997041cfc9a3@fz-juelich.de> References: <3d2637ec-eec9-22dc-4841-4d2ad23b6589@fz-juelich.de> <31b8eb5c-250f-f02f-62c9-997041cfc9a3@fz-juelich.de> Message-ID: On Tue, 11 Apr 2017, Joachim Wuttke wrote: > > you need to give me details on how you installed petsc. [say make.log] > > tar zxvf petsc-3.7.5.tar.gz > > cd petsc-3.7.5 > > ./configure --with-shared-libraries PETSC_DIR=/usr/local/src/petsc-3.7.5 > PETSC_ARCH=linux-amd64 --with-mpi-dir=/usr/lib/x86_64-linux-gnu/openmpi > > make MAKE_NP=7 PETSC_DIR=/usr/local/src/petsc-3.7.5 PETSC_ARCH=linux-amd64 all So you are not using prefix install - so would require to use 'PETSC_DIR=/usr/local/src/petsc-3.7.5 PETSC_ARCH=linux-amd64' with FindPETSc.cmake Wrt your previous note: >>> However, in the current source archive, it is located in ${PETSC_DIR}/lib/petsc/conf/petscvariables, and the default install procedure does not copy it to ${PETSC_ARCH}. <<<< There is also ${PETSC_DIR}/${PETSC_ARCH}/lib/petsc/conf/petscvariables - that FindPETSc.cmake should find. If it doesn't work - send the relevant cmake logs - and Jed might provide better debug instructions. Satish From balay at mcs.anl.gov Tue Apr 11 11:27:05 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 11 Apr 2017 11:27:05 -0500 Subject: [petsc-users] FindPETSc.cmake expects petscvariables in wrong location 
[say make.log] > > > > tar zxvf petsc-3.7.5.tar.gz > > > > cd petsc-3.7.5 > > > > ./configure --with-shared-libraries PETSC_DIR=/usr/local/src/petsc-3.7.5 > > PETSC_ARCH=linux-amd64 --with-mpi-dir=/usr/lib/x86_64-linux-gnu/openmpi > > > > make MAKE_NP=7 PETSC_DIR=/usr/local/src/petsc-3.7.5 PETSC_ARCH=linux-amd64 all > > So you are not using prefix install - so would require to use 'PETSC_DIR=/usr/local/src/petsc-3.7.5 PETSC_ARCH=linux-amd64' with FindPETSc.cmake > > Wrt your previous note: > > >>> > However, in the current source archive, it is located in > ${PETSC_DIR}/lib/petsc/conf/petscvariables, > and the default install procedure does not copy it to ${PETSC_ARCH}. > > <<<< > > Thre is also be > ${PETSC_DIR}/${PETSC_ARCH}/lib/petsc/conf/petscvariables - that > FindPETSc.cmake should find > > If it doesn't work - send the relavent cmake logs - and Jed might > provide better debug instructions. Also - please confirm if 'make test' was successful for your build. > > Satish > From j.wuttke at fz-juelich.de Tue Apr 11 11:30:55 2017 From: j.wuttke at fz-juelich.de (Joachim Wuttke) Date: Tue, 11 Apr 2017 18:30:55 +0200 Subject: [petsc-users] FindPETSc.cmake expects petscvariables in wrong location In-Reply-To: References: <3d2637ec-eec9-22dc-4841-4d2ad23b6589@fz-juelich.de> <31b8eb5c-250f-f02f-62c9-997041cfc9a3@fz-juelich.de> Message-ID: <43bc6f6c-4cc4-4e91-827e-f390decedcc9@fz-juelich.de> 'make test' was successful. And indeed, ${PETSC_DIR}/${PETSC_ARCH}/lib/petsc/conf/petscvariables exists. There was a mistake in my PETSC_DIR. Now FindPETSc.cmake does find petscvariables ... ... and I am at the next error: CMake Error at cmake/modules/FindPETSc.cmake:164 (include): include could not find load file: ResolveCompilerPaths -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5110 bytes Desc: S/MIME Cryptographic Signature URL: From j.wuttke at fz-juelich.de Tue Apr 11 11:34:05 2017 From: j.wuttke at fz-juelich.de (Joachim Wuttke) Date: Tue, 11 Apr 2017 18:34:05 +0200 Subject: [petsc-users] FindPETSc.cmake expects petscvariables in wrong location In-Reply-To: <43bc6f6c-4cc4-4e91-827e-f390decedcc9@fz-juelich.de> References: <3d2637ec-eec9-22dc-4841-4d2ad23b6589@fz-juelich.de> <31b8eb5c-250f-f02f-62c9-997041cfc9a3@fz-juelich.de> <43bc6f6c-4cc4-4e91-827e-f390decedcc9@fz-juelich.de> Message-ID: <84ff4747-670b-9e0b-17a5-c14031d65798@fz-juelich.de> > ... and I am at the next error: > CMake Error at cmake/modules/FindPETSc.cmake:164 (include): > include could not find load file: > > ResolveCompilerPaths which is easily solved by copying that file from the external repository https://github.com/jedbrown/cmake-modules/ -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5110 bytes Desc: S/MIME Cryptographic Signature URL: From jed at jedbrown.org Tue Apr 11 11:36:17 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 11 Apr 2017 10:36:17 -0600 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? In-Reply-To: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> Message-ID: <874lxu52bi.fsf@jedbrown.org> Looks like a question for Lisandro. I believe the code you have (with appropriate collective semantics) was intended to work, but I'm not in a position to debug right now. Have you confirmed that mpi4py is linked to the same MPI as petsc4py/PETSc? Rodrigo Felicio writes: > Hello all, > > Sorry for the newbie question, but is there a way of making petsc4py work with an MPI group or subcommunicator? 
I saw a solution posted back in 2010 (http://lists.mcs.anl.gov/pipermail/petsc-users/2010-May/006382.html), but it does not work for me. Indeed, if I try to use petsc4py.init(comm=newcom), then my sample code prints a msg "Attempting to use an MPI routine before initializing MPI". Below I attach both the output and the source of the python code. > > kind regards > Rodrigo > > > time mpirun -n 5 python split_comm_ex2.py > Global: rank 0 of 5. New comm : rank 0 of 3 > Global: rank 1 of 5. New comm : rank 0 of 2 > Global: rank 2 of 5. New comm : rank 1 of 3 > Global: rank 3 of 5. New comm : rank 1 of 2 > Global: rank 4 of 5. New comm : rank 2 of 3 > Attempting to use an MPI routine before initializing MPI > Attempting to use an MPI routine before initializing MPI > > real 0m0.655s > user 0m1.122s > sys 0m1.047s > > And the python code: > > from mpi4py import MPI > > comm = MPI.COMM_WORLD > world_rank = comm.rank > world_size = comm.size > > color = world_rank % 2 > > newcomm = comm.Split(color) > newcomm_rank = newcomm.rank > newcomm_size = newcomm.size > > for i in range(world_size): > comm.Barrier() > if (world_rank == i): > print ("Global: rank %d of %d. New comm : rank % d of %d" % > (world_rank, world_size, newcomm_rank, newcomm_size)) > > if newcomm.rank == 0: > import petsc4py > petsc4py.init(comm=newcomm) > from petsc4py import PETSc > > pcomm = PETSc.COMM_WORLD > print('pcomm size is {}/{}'.format(pcomm.rank, pcomm.size)) > > newcomm.Free() > > ________________________________ > > > This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. 
If you received this email in error, please immediately notify the sender and delete the original. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From balay at mcs.anl.gov Tue Apr 11 11:37:15 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 11 Apr 2017 11:37:15 -0500 Subject: [petsc-users] FindPETSc.cmake expects petscvariables in wrong location In-Reply-To: <43bc6f6c-4cc4-4e91-827e-f390decedcc9@fz-juelich.de> References: <3d2637ec-eec9-22dc-4841-4d2ad23b6589@fz-juelich.de> <31b8eb5c-250f-f02f-62c9-997041cfc9a3@fz-juelich.de> <43bc6f6c-4cc4-4e91-827e-f390decedcc9@fz-juelich.de> Message-ID: On Tue, 11 Apr 2017, Joachim Wuttke wrote: > 'make test' was successful. > > And indeed, > ${PETSC_DIR}/${PETSC_ARCH}/lib/petsc/conf/petscvariables > exists. > There was a mistake in my PETSC_DIR. > Now FindPETSc.cmake does find petscvariables ... > > ... and I am at the next error: > CMake Error at cmake/modules/FindPETSc.cmake:164 (include): > include could not find load file: > > ResolveCompilerPaths https://github.com/jedbrown/cmake-modules The repo has ResolveCompilerPaths.cmake Did you not get the whole repo? Satish From Rodrigo.Felicio at iongeo.com Tue Apr 11 12:14:05 2017 From: Rodrigo.Felicio at iongeo.com (Rodrigo Felicio) Date: Tue, 11 Apr 2017 17:14:05 +0000 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? In-Reply-To: <874lxu52bi.fsf@jedbrown.org> References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> <874lxu52bi.fsf@jedbrown.org> Message-ID: <350529B93F4E2F4497FD8DE4E86E84AA16F1FDDA@AUS1EXMBX04.ioinc.ioroot.tld> I think they are linked to the same MPI, but I am not sure how to confirm that. Looking into mpi4py/mpi.cfg I see the expected mpicc. 
On petsc4py/lib/petsc.cfg points to the petsc install directory and in there on the configure.log file I see the same path for the same intel mpi compiler... I noticed that if I instead of petsc4py.init(comm=newcomm) using petsc4py.init() the works, but then PETSc.COMM_WORLD = MPI.COMM_WORLD Further searching on the web suggest you are right and that the error msg that I am getting points to mismatch between mpirun and linked libraries, but why does it only happens if trying to initiate petsc with petsc4py.init(comm=newcomm)? Anyway, thanks Jed, really appreciate the help. Cheers Rodrigo -----Original Message----- From: Jed Brown [mailto:jed at jedbrown.org] Sent: Tuesday, April 11, 2017 11:36 AM To: Rodrigo Felicio; petsc-users at mcs.anl.gov Cc: Lisandro Dalcin Subject: Re: [petsc-users] how to use petsc4py with mpi subcommunicators? Looks like a question for Lisandro. I believe the code you have (with appropriate collective semantics) was intended to work, but I'm not in a position to debug right now. Have you confirmed that mpi4py is linked to the same MPI as petsc4py/PETSc? Rodrigo Felicio writes: > Hello all, > > Sorry for the newbie question, but is there a way of making petsc4py work with an MPI group or subcommunicator? I saw a solution posted back in 2010 (http://lists.mcs.anl.gov/pipermail/petsc-users/2010-May/006382.html), but it does not work for me. Indeed, if I try to use petsc4py.init(comm=newcom), then my sample code prints a msg "Attempting to use an MPI routine before initializing MPI". Below I attach both the output and the source of the python code. > > kind regards > Rodrigo > > > time mpirun -n 5 python split_comm_ex2.py > Global: rank 0 of 5. New comm : rank 0 of 3 > Global: rank 1 of 5. New comm : rank 0 of 2 > Global: rank 2 of 5. New comm : rank 1 of 3 > Global: rank 3 of 5. New comm : rank 1 of 2 > Global: rank 4 of 5. 
New comm : rank 2 of 3 Attempting to use an MPI > routine before initializing MPI Attempting to use an MPI routine > before initializing MPI > > real 0m0.655s > user 0m1.122s > sys 0m1.047s > > And the python code: > > from mpi4py import MPI > > comm = MPI.COMM_WORLD > world_rank = comm.rank > world_size = comm.size > > color = world_rank % 2 > > newcomm = comm.Split(color) > newcomm_rank = newcomm.rank > newcomm_size = newcomm.size > > for i in range(world_size): > comm.Barrier() > if (world_rank == i): > print ("Global: rank %d of %d. New comm : rank % d of %d" % > (world_rank, world_size, newcomm_rank, newcomm_size)) > > if newcomm.rank == 0: > import petsc4py > petsc4py.init(comm=newcomm) > from petsc4py import PETSc > > pcomm = PETSc.COMM_WORLD > print('pcomm size is {}/{}'.format(pcomm.rank, pcomm.size)) > > newcomm.Free() > > ________________________________ > > > This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. ________________________________ This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. 
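[Editorial aside: the subgroup sizes in the mpirun output quoted above follow directly from the color = world_rank % 2 split. A quick pure-Python check of the expected layout — no MPI and no petsc4py needed; the helper function below is only illustrative, not part of mpi4py:]

```python
# Pure-Python illustration of how comm.Split(color) with color = rank % 2
# groups the world ranks; a rank's subcommunicator rank is simply its
# index within its color group. No MPI is needed to verify the layout.

def split_by_color(world_size):
    """Map color -> list of world ranks, mirroring comm.Split(color)."""
    groups = {}
    for rank in range(world_size):
        groups.setdefault(rank % 2, []).append(rank)
    return groups

groups = split_by_color(5)
print(groups)  # {0: [0, 2, 4], 1: [1, 3]}
# Matches the output above: global rank 4 is "rank 2 of 3" in its subcomm.
print(groups[0].index(4), len(groups[0]))  # 2 3
```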
From gaetank at gmail.com Tue Apr 11 12:21:33 2017
From: gaetank at gmail.com (Gaetan Kenway)
Date: Tue, 11 Apr 2017 10:21:33 -0700
Subject: [petsc-users] how to use petsc4py with mpi subcommunicators?
In-Reply-To: <350529B93F4E2F4497FD8DE4E86E84AA16F1FDDA@AUS1EXMBX04.ioinc.ioroot.tld>
References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> <874lxu52bi.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FDDA@AUS1EXMBX04.ioinc.ioroot.tld>
Message-ID:

Hi all

I think I remember having this or a similar issue at some point as well. The
issue was we had two (python-wrapped) codes on MPI_COMM_WORLD but only one of
them used PETSc. We never did figure out how to get petsc initialized on just
the one sub-comm. The workaround was just to do

    from petsc4py import PETSc

at the very top of the highest-level python execution script, which has
worked for us.

Gaetan

On Tue, Apr 11, 2017 at 10:14 AM, Rodrigo Felicio <
Rodrigo.Felicio at iongeo.com> wrote:

> I think they are linked to the same MPI, but I am not sure how to confirm
> that. Looking into mpi4py/mpi.cfg I see the expected mpicc. On
> petsc4py/lib/petsc.cfg points to the petsc install directory and in there
> on the configure.log file I see the same path for the same intel mpi
> compiler...
>
> I noticed that if I instead of
> petsc4py.init(comm=newcomm)
>
> using
> petsc4py.init()
>
> the works, but then PETSc.COMM_WORLD = MPI.COMM_WORLD
>
> Further searching on the web suggest you are right and that the error msg
> that I am getting points to mismatch between mpirun and linked libraries,
> but why does it only happens if trying to initiate petsc with
> petsc4py.init(comm=newcomm)?
> Anyway, thanks Jed, really appreciate the help. 
> > Cheers > Rodrigo > > > -----Original Message----- > From: Jed Brown [mailto:jed at jedbrown.org] > Sent: Tuesday, April 11, 2017 11:36 AM > To: Rodrigo Felicio; petsc-users at mcs.anl.gov > Cc: Lisandro Dalcin > Subject: Re: [petsc-users] how to use petsc4py with mpi subcommunicators? > > Looks like a question for Lisandro. I believe the code you have (with > appropriate collective semantics) was intended to work, but I'm not in a > position to debug right now. Have you confirmed that mpi4py is linked to > the same MPI as petsc4py/PETSc? > > Rodrigo Felicio writes: > > > Hello all, > > > > Sorry for the newbie question, but is there a way of making petsc4py > work with an MPI group or subcommunicator? I saw a solution posted back > in 2010 (http://lists.mcs.anl.gov/pipermail/petsc-users/2010- > May/006382.html), but it does not work for me. Indeed, if I try to use > petsc4py.init(comm=newcom), then my sample code prints a msg "Attempting to > use an MPI routine before initializing MPI". Below I attach both the > output and the source of the python code. > > > > kind regards > > Rodrigo > > > > > > time mpirun -n 5 python split_comm_ex2.py > > Global: rank 0 of 5. New comm : rank 0 of 3 > > Global: rank 1 of 5. New comm : rank 0 of 2 > > Global: rank 2 of 5. New comm : rank 1 of 3 > > Global: rank 3 of 5. New comm : rank 1 of 2 > > Global: rank 4 of 5. New comm : rank 2 of 3 Attempting to use an MPI > > routine before initializing MPI Attempting to use an MPI routine > > before initializing MPI > > > > real 0m0.655s > > user 0m1.122s > > sys 0m1.047s > > > > And the python code: > > > > from mpi4py import MPI > > > > comm = MPI.COMM_WORLD > > world_rank = comm.rank > > world_size = comm.size > > > > color = world_rank % 2 > > > > newcomm = comm.Split(color) > > newcomm_rank = newcomm.rank > > newcomm_size = newcomm.size > > > > for i in range(world_size): > > comm.Barrier() > > if (world_rank == i): > > print ("Global: rank %d of %d. 
New comm : rank % d of %d" % > > (world_rank, world_size, newcomm_rank, newcomm_size)) > > > > if newcomm.rank == 0: > > import petsc4py > > petsc4py.init(comm=newcomm) > > from petsc4py import PETSc > > > > pcomm = PETSc.COMM_WORLD > > print('pcomm size is {}/{}'.format(pcomm.rank, pcomm.size)) > > > > newcomm.Free() > > > > ________________________________ > > > > > > This email and any files transmitted with it are confidential and are > intended solely for the use of the individual or entity to whom they are > addressed. If you are not the original recipient or the person responsible > for delivering the email to the intended recipient, be advised that you > have received this email in error, and that any use, dissemination, > forwarding, printing, or copying of this email is strictly prohibited. If > you received this email in error, please immediately notify the sender and > delete the original. > > ________________________________ > > > This email and any files transmitted with it are confidential and are > intended solely for the use of the individual or entity to whom they are > addressed. If you are not the original recipient or the person responsible > for delivering the email to the intended recipient, be advised that you > have received this email in error, and that any use, dissemination, > forwarding, printing, or copying of this email is strictly prohibited. If > you received this email in error, please immediately notify the sender and > delete the original. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Apr 11 12:34:39 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 11 Apr 2017 12:34:39 -0500 Subject: [petsc-users] status of "make install" ? In-Reply-To: References: <0d622f68-5e24-2675-eca2-d4ced7fef53f@fz-juelich.de> Message-ID: >> "Incorrect prefix usage. Specified destDir same as current >> PETSC_DIR/PETSC_ARCH" Satish, We need a better error message here. 
It should tell the user what you said below, that they need to run ./configure with a prefix before doing make install Barry > On Apr 11, 2017, at 10:31 AM, Satish Balay wrote: > > It supports 2 modes: > > 1. inplace [default]: > > ./configure && make > > 2. prefix > > ./configure --prefix=/prefix/location && make && make install > > > So to use 'make install' - you need to run configure with the correct prefix option. > > Satish > > On Tue, 11 Apr 2017, Joachim Wuttke wrote: > >> Does PETSc support the usual installation command sequence >> configure; make; make install? >> >> The installation web page does not mention "make install". >> >> The Makefile generated by configure does contain a target "install". >> However, "make install" does not work; it yields the error message >> "Incorrect prefix usage. Specified destDir same as current >> PETSC_DIR/PETSC_ARCH" >> >> > From jed at jedbrown.org Tue Apr 11 21:40:45 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 11 Apr 2017 20:40:45 -0600 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? In-Reply-To: <350529B93F4E2F4497FD8DE4E86E84AA16F1FDDA@AUS1EXMBX04.ioinc.ioroot.tld> References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> <874lxu52bi.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FDDA@AUS1EXMBX04.ioinc.ioroot.tld> Message-ID: <87h91u2vrm.fsf@jedbrown.org> Rodrigo Felicio writes: > I think they are linked to the same MPI, but I am not sure how to confirm that. Looking into mpi4py/mpi.cfg I see the expected mpicc. On petsc4py/lib/petsc.cfg points to the petsc install directory and in there on the configure.log file I see the same path for the same intel mpi compiler... 
> > I noticed that if I instead of > petsc4py.init(comm=newcomm) > > using > petsc4py.init() > > the works, but then PETSc.COMM_WORLD = MPI.COMM_WORLD > > Further searching on the web suggest you are right and that the error msg that I am getting points to mismatch between mpirun and linked libraries, but why does it only happens if trying to initiate petsc with petsc4py.init(comm=newcomm)? If you don't load mpi4py then PETSc would be the only library directly calling MPI, so all would be consistent. Note that you can initialize PETSc on MPI_COMM_WORLD (the default) and still create a subcommunicator for individual objects. That is normally recommended because the debugging and profiling tools can be more accurate and produce cleaner output. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From j.wuttke at fz-juelich.de Wed Apr 12 04:58:39 2017 From: j.wuttke at fz-juelich.de (Joachim Wuttke) Date: Wed, 12 Apr 2017 11:58:39 +0200 Subject: [petsc-users] -log_summary: User Manual outdated Message-ID: <49b5f909-ce01-8567-3d62-2a59cb13a86a@fz-juelich.de> pp. 174, 183 in the current User Manual describe option -log_summary. Running code with this option yields WARNING: -log_summary is being deprecated; switch to -log_view Btw either form of the option is missing in the Index. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 5110 bytes Desc: S/MIME Cryptographic Signature URL: From patrick.sanan at gmail.com Wed Apr 12 05:01:04 2017 From: patrick.sanan at gmail.com (Patrick Sanan) Date: Wed, 12 Apr 2017 12:01:04 +0200 Subject: [petsc-users] -log_summary: User Manual outdated In-Reply-To: <49b5f909-ce01-8567-3d62-2a59cb13a86a@fz-juelich.de> References: <49b5f909-ce01-8567-3d62-2a59cb13a86a@fz-juelich.de> Message-ID: This has been fixed in the master branch (and thus the pdf will be fixed when the next version of the manual comes out with PETSc 3.8). On Wed, Apr 12, 2017 at 11:58 AM, Joachim Wuttke wrote: > pp. 174, 183 in the current User Manual describe option -log_summary. > > Running code with this option yields > > WARNING: -log_summary is being deprecated; switch to -log_view > > Btw either form of the option is missing in the Index. > > From Rodrigo.Felicio at iongeo.com Wed Apr 12 09:30:56 2017 From: Rodrigo.Felicio at iongeo.com (Rodrigo Felicio) Date: Wed, 12 Apr 2017 14:30:56 +0000 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? In-Reply-To: <87h91u2vrm.fsf@jedbrown.org> References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> <874lxu52bi.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FDDA@AUS1EXMBX04.ioinc.ioroot.tld> <87h91u2vrm.fsf@jedbrown.org> Message-ID: <350529B93F4E2F4497FD8DE4E86E84AA16F1FE5E@AUS1EXMBX04.ioinc.ioroot.tld> Thanks Jed and Gaetan. I will try that approach of splitting PETSc.COMM_WORLD, but I still need to load mpi4py (probably after PETSc), because PETSc.Comm is very limited, i.e., it does not have the split function, for example. 
My goal is to be
able to set different matrices and vectors for each subcommunicator, and I
am guessing that I can create them using something like
PETSc.Mat().createAij(comm=subcomm)

Kind regards
Rodrigo

From mfadams at lbl.gov Wed Apr 12 10:16:02 2017
From: mfadams at lbl.gov (Mark Adams)
Date: Wed, 12 Apr 2017 11:16:02 -0400
Subject: [petsc-users] GAMG for the unsymmetrical matrix
In-Reply-To:
References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov>
Message-ID:

The problem comes from setting the number of MG levels (-pc_mg_levels 2).
Not your fault, it looks like the GAMG logic is faulty, in your version at
least. GAMG will force the coarsest grid to one processor by default, in
newer versions. You can override the default with:

-pc_gamg_use_parallel_coarse_grid_solver

Your coarse grid solver is ASM with 37 equations per process and 512
processes. That is bad. Note, you could run this on one process to see the
proper convergence rate.

You can fix this with the parameters:

> -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations per process on coarse grids (PCGAMGSetProcEqLim)
> -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the coarse grid (PCGAMGSetCoarseEqLim)

If you really want two levels then set something like
-pc_gamg_coarse_eq_limit 18145 (or higher). 
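[Editorial aside: the GAMG options named in this message can be collected into a PETSc options file, or given equivalently on the command line via -options_file. The numeric values below are only illustrative; 18145 is the coarse-grid size from this particular run:]

```
# keep the coarsest grid distributed instead of forcing it onto one process
-pc_gamg_use_parallel_coarse_grid_solver
# goal for equations per process on coarse grids (PCGAMGSetProcEqLim)
-pc_gamg_process_eq_limit 50
# cap on total equations on the coarsest grid (PCGAMGSetCoarseEqLim);
# raise it to >= 18145 only if you really want to stop at two levels
-pc_gamg_coarse_eq_limit 3000
```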
You can run with -info and grep on GAMG and you will meta-data for each level. you should see "npe=1" for the coarsest, last, grid. Or use a parallel direct solver. Note, you should not see much degradation as you increase the number of levels. 18145 eqs on a 3D problem will probably be noticeable. I generally aim for about 3000. On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande wrote: > > > On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams wrote: > >> You seem to have two levels here and 3M eqs on the fine grid and 37 on >> the coarse grid. > > > 37 is on the sub domain. > > rows=18145, cols=18145 on the entire coarse grid. > > > > > >> I don't understand that. >> >> You are also calling the AMG setup a lot, but not spending much time >> in it. Try running with -info and grep on "GAMG". >> >> >> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande wrote: >> > Thanks, Barry. >> > >> > It works. >> > >> > GAMG is three times better than ASM in terms of the number of linear >> > iterations, but it is five times slower than ASM. Any suggestions to >> improve >> > the performance of GAMG? Log files are attached. >> > >> > Fande, >> > >> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith wrote: >> >> >> >> >> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: >> >> > >> >> > Thanks, Mark and Barry, >> >> > >> >> > It works pretty wells in terms of the number of linear iterations >> (using >> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. >> I am >> >> > using the two-level method via "-pc_mg_levels 2". The reason why the >> compute >> >> > time is larger than other preconditioning options is that a matrix >> free >> >> > method is used in the fine level and in my particular problem the >> function >> >> > evaluation is expensive. >> >> > >> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, >> >> > but I do not think I want to make the preconditioning part >> matrix-free. Do >> >> > you guys know how to turn off the matrix-free method for GAMG? 
>> >> >> >> -pc_use_amat false >> >> >> >> > >> >> > Here is the detailed solver: >> >> > >> >> > SNES Object: 384 MPI processes >> >> > type: newtonls >> >> > maximum iterations=200, maximum function evaluations=10000 >> >> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 >> >> > total number of linear solver iterations=20 >> >> > total number of function evaluations=166 >> >> > norm schedule ALWAYS >> >> > SNESLineSearch Object: 384 MPI processes >> >> > type: bt >> >> > interpolation: cubic >> >> > alpha=1.000000e-04 >> >> > maxstep=1.000000e+08, minlambda=1.000000e-12 >> >> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, >> >> > lambda=1.000000e-08 >> >> > maximum iterations=40 >> >> > KSP Object: 384 MPI processes >> >> > type: gmres >> >> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt >> >> > Orthogonalization with no iterative refinement >> >> > GMRES: happy breakdown tolerance 1e-30 >> >> > maximum iterations=100, initial guess is zero >> >> > tolerances: relative=0.001, absolute=1e-50, divergence=10000. >> >> > right preconditioning >> >> > using UNPRECONDITIONED norm type for convergence test >> >> > PC Object: 384 MPI processes >> >> > type: gamg >> >> > MG: type is MULTIPLICATIVE, levels=2 cycles=v >> >> > Cycles per PCApply=1 >> >> > Using Galerkin computed coarse grid matrices >> >> > GAMG specific options >> >> > Threshold for dropping small values from graph 0. >> >> > AGG specific options >> >> > Symmetric graph true >> >> > Coarse grid solver -- level ------------------------------- >> >> > KSP Object: (mg_coarse_) 384 MPI processes >> >> > type: preonly >> >> > maximum iterations=10000, initial guess is zero >> >> > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. 
>> >> > left preconditioning >> >> > using NONE norm type for convergence test >> >> > PC Object: (mg_coarse_) 384 MPI processes >> >> > type: bjacobi >> >> > block Jacobi: number of blocks = 384 >> >> > Local solve is same for all blocks, in the following KSP >> and >> >> > PC objects: >> >> > KSP Object: (mg_coarse_sub_) 1 MPI processes >> >> > type: preonly >> >> > maximum iterations=1, initial guess is zero >> >> > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> >> > left preconditioning >> >> > using NONE norm type for convergence test >> >> > PC Object: (mg_coarse_sub_) 1 MPI processes >> >> > type: lu >> >> > LU: out-of-place factorization >> >> > tolerance for zero pivot 2.22045e-14 >> >> > using diagonal shift on blocks to prevent zero pivot >> >> > [INBLOCKS] >> >> > matrix ordering: nd >> >> > factor fill ratio given 5., needed 1.31367 >> >> > Factored matrix follows: >> >> > Mat Object: 1 MPI processes >> >> > type: seqaij >> >> > rows=37, cols=37 >> >> > package used to perform factorization: petsc >> >> > total: nonzeros=913, allocated nonzeros=913 >> >> > total number of mallocs used during MatSetValues >> calls >> >> > =0 >> >> > not using I-node routines >> >> > linear system matrix = precond matrix: >> >> > Mat Object: 1 MPI processes >> >> > type: seqaij >> >> > rows=37, cols=37 >> >> > total: nonzeros=695, allocated nonzeros=695 >> >> > total number of mallocs used during MatSetValues calls =0 >> >> > not using I-node routines >> >> > linear system matrix = precond matrix: >> >> > Mat Object: 384 MPI processes >> >> > type: mpiaij >> >> > rows=18145, cols=18145 >> >> > total: nonzeros=1709115, allocated nonzeros=1709115 >> >> > total number of mallocs used during MatSetValues calls =0 >> >> > not using I-node (on process 0) routines >> >> > Down solver (pre-smoother) on level 1 >> >> > ------------------------------- >> >> > KSP Object: (mg_levels_1_) 384 MPI processes >> >> > type: chebyshev >> >> > Chebyshev: eigenvalue 
estimates: min = 0.133339, max = >> >> > 1.46673 >> >> > Chebyshev: eigenvalues estimated using gmres with >> translations >> >> > [0. 0.1; 0. 1.1] >> >> > KSP Object: (mg_levels_1_esteig_) 384 >> MPI >> >> > processes >> >> > type: gmres >> >> > GMRES: restart=30, using Classical (unmodified) >> >> > Gram-Schmidt Orthogonalization with no iterative refinement >> >> > GMRES: happy breakdown tolerance 1e-30 >> >> > maximum iterations=10, initial guess is zero >> >> > tolerances: relative=1e-12, absolute=1e-50, >> >> > divergence=10000. >> >> > left preconditioning >> >> > using PRECONDITIONED norm type for convergence test >> >> > maximum iterations=2 >> >> > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> >> > left preconditioning >> >> > using nonzero initial guess >> >> > using NONE norm type for convergence test >> >> > PC Object: (mg_levels_1_) 384 MPI processes >> >> > type: sor >> >> > SOR: type = local_symmetric, iterations = 1, local >> iterations >> >> > = 1, omega = 1. 
>> >> > linear system matrix followed by preconditioner matrix: >> >> > Mat Object: 384 MPI processes >> >> > type: mffd >> >> > rows=3020875, cols=3020875 >> >> > Matrix-free approximation: >> >> > err=1.49012e-08 (relative error in function evaluation) >> >> > Using wp compute h routine >> >> > Does not compute normU >> >> > Mat Object: () 384 MPI processes >> >> > type: mpiaij >> >> > rows=3020875, cols=3020875 >> >> > total: nonzeros=215671710, allocated nonzeros=241731750 >> >> > total number of mallocs used during MatSetValues calls =0 >> >> > not using I-node (on process 0) routines >> >> > Up solver (post-smoother) same as down solver (pre-smoother) >> >> > linear system matrix followed by preconditioner matrix: >> >> > Mat Object: 384 MPI processes >> >> > type: mffd >> >> > rows=3020875, cols=3020875 >> >> > Matrix-free approximation: >> >> > err=1.49012e-08 (relative error in function evaluation) >> >> > Using wp compute h routine >> >> > Does not compute normU >> >> > Mat Object: () 384 MPI processes >> >> > type: mpiaij >> >> > rows=3020875, cols=3020875 >> >> > total: nonzeros=215671710, allocated nonzeros=241731750 >> >> > total number of mallocs used during MatSetValues calls =0 >> >> > not using I-node (on process 0) routines >> >> > >> >> > >> >> > Fande, >> >> > >> >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote: >> >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith >> wrote: >> >> > > >> >> > >> Does this mean that GAMG works for the symmetrical matrix only? >> >> > > >> >> > > No, it means that for non symmetric nonzero structure you need >> the >> >> > > extra flag. So use the extra flag. The reason we don't always use >> the flag >> >> > > is because it adds extra cost and isn't needed if the matrix >> already has a >> >> > > symmetric nonzero structure. >> >> > >> >> > BTW, if you have symmetric non-zero structure you can just set >> >> > -pc_gamg_threshold -1.0', note the "or" in the message. 
>> >> > >> >> > If you want to mess with the threshold then you need to use the >> >> > symmetrized flag. >> >> > >> >> >> > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ingogaertner.tus at gmail.com Wed Apr 12 10:52:32 2017 From: ingogaertner.tus at gmail.com (Ingo Gaertner) Date: Wed, 12 Apr 2017 17:52:32 +0200 Subject: [petsc-users] dmplex face normals orientation Message-ID: Hello, I have problems determining the orientation of the face normals of a DMPlex. I create a DMPlex, for example with DMPlexCreateHexBoxMesh(). Next, I get the face normals using DMPlexComputeGeometryFVM(DM dm, Vec *cellgeom, Vec *facegeom). facegeom gives the correct normals, but I don't know how the inside/outside is defined with respect to the adjacant cells? Finally, I iterate over all cells. For each cell I iterate over the bounding faces (obtained from DMPlexGetCone) and try to obtain their orientation with respect to the current cell using DMPlexGetConeOrientation(). However, the six integers for the orientation are the same for each cell. I expect them to flip between neighbour cells, because if a face normal is pointing outside for any cell, the same normal is pointing inside for its neighbour. Apparently I have a misunderstanding here. How can I make use of the face normals in facegeom and the orientation values from DMPlexGetConeOrientation() to get the outside face normals for each cell? Thank you Ingo Virenfrei. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaetank at gmail.com Wed Apr 12 11:04:00 2017 From: gaetank at gmail.com (Gaetan Kenway) Date: Wed, 12 Apr 2017 09:04:00 -0700 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? 
In-Reply-To: <350529B93F4E2F4497FD8DE4E86E84AA16F1FE5E@AUS1EXMBX04.ioinc.ioroot.tld>
References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> <874lxu52bi.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FDDA@AUS1EXMBX04.ioinc.ioroot.tld> <87h91u2vrm.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FE5E@AUS1EXMBX04.ioinc.ioroot.tld>
Message-ID:

One other quick note: sometimes it appears that mpi4py and petsc4py do not
always play nicely together. I think you want to do the petsc4py import
first and then the mpi4py import. Then you can split MPI.COMM_WORLD all you
want and create any petsc4py objects on the subcommunicators. Or, if that
doesn't work, swap the import order. If I recall correctly, you could get a
warning on exit that something in mpi4py wasn't cleaned up correctly.

Gaetan

On Wed, Apr 12, 2017 at 7:30 AM, Rodrigo Felicio wrote:

> Thanks Jed and Gaetan.
> I will try that approach of splitting PETSc.COMM_WORLD, but I still need
> to load mpi4py (probably after PETSc), because PETSc.Comm is very limited,
> i.e., it does not have the split function, for example. My goal is to be
> able to set different matrices and vectors for each subcommunicator, and I
> am guessing that I can create them using something like
> PETSc.Mat().createAij(comm=subcomm)
>
> Kind regards
> Rodrigo

-------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Rodrigo.Felicio at iongeo.com Wed Apr 12 12:10:14 2017 From: Rodrigo.Felicio at iongeo.com (Rodrigo Felicio) Date: Wed, 12 Apr 2017 17:10:14 +0000 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? In-Reply-To: <350529B93F4E2F4497FD8DE4E86E84AA16F1FE5E@AUS1EXMBX04.ioinc.ioroot.tld> References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> <874lxu52bi.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FDDA@AUS1EXMBX04.ioinc.ioroot.tld> <87h91u2vrm.fsf@jedbrown.org>, <350529B93F4E2F4497FD8DE4E86E84AA16F1FE5E@AUS1EXMBX04.ioinc.ioroot.tld> Message-ID: <350529B93F4E2F4497FD8DE4E86E84AA16F1FEF5@AUS1EXMBX04.ioinc.ioroot.tld> Going over my older codes I found out that I have already tried the approach of splitting PETSc.COMM_WORLD, but whenever I try to create a matrix using a subcommunicator, the program fails. For example, executing the following python code attached to this msg, I get the following output time mpirun -n 5 python another_split_ex.py petsc rank=2, petsc size=5 petsc rank=3, petsc size=5 petsc rank=0, petsc size=5 petsc rank=1, petsc size=5 petsc rank=4, petsc size=5 number of subcomms = 2 sub rank 0/3, color:0 sub rank 0/2, color:1 sub rank 1/3, color:0 sub rank 1/2, color:1 sub rank 2/3, color:0 creating A in subcomm 1= 2, 1 creating A in subcomm 1= 2, 0 Traceback (most recent call last): File "another_split_ex.py", line 43, in Traceback (most recent call last): File "another_split_ex.py", line 43, in A = PETSc.Mat().createDense([n,n], comm=subcomm) File "PETSc/Mat.pyx", line 390, in petsc4py.PETSc.Mat.createDense (src/petsc4py.PETSc.c:113792) A = PETSc.Mat().createDense([n,n], comm=subcomm) File "PETSc/Mat.pyx", line 390, in petsc4py.PETSc.Mat.createDense (src/petsc4py.PETSc.c:113792) File "PETSc/petscmat.pxi", line 602, in petsc4py.PETSc.Mat_Create (src/petsc4py.PETSc.c:25274) File "PETSc/petscmat.pxi", line 602, in petsc4py.PETSc.Mat_Create (src/petsc4py.PETSc.c:25274) File
"PETSc/petscsys.pxi", line 104, in petsc4py.PETSc.Sys_Layout (src/petsc4py.PETSc.c:13666) File "PETSc/petscsys.pxi", line 104, in petsc4py.PETSc.Sys_Layout (src/petsc4py.PETSc.c:13666) petsc4py.PETSc.Error: petsc4py.PETSc.Errorerror code 608517 [1] PetscSplitOwnership() line 86 in ~/mylocal/petsc/src/sys/utils/psplit.c : error code 134826245 [3] PetscSplitOwnership() line 86 in ~/mylocal/petsc/src/sys/utils/psplit.c Checking the traceback, all I can say is that when the subcommunicator object reaches psplit.c code it gets somehow corrupted, because PetscSplitOwnership() fails to retrieve the size of the subcommunicator ... :-( regards Rodrigo ________________________________ This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. ________________________________ This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: another_split_ex.py Type: text/x-python Size: 1709 bytes Desc: another_split_ex.py URL: From gaetank at gmail.com Wed Apr 12 12:17:02 2017 From: gaetank at gmail.com (Gaetan Kenway) Date: Wed, 12 Apr 2017 10:17:02 -0700 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? In-Reply-To: <350529B93F4E2F4497FD8DE4E86E84AA16F1FEF5@AUS1EXMBX04.ioinc.ioroot.tld> References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> <874lxu52bi.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FDDA@AUS1EXMBX04.ioinc.ioroot.tld> <87h91u2vrm.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FE5E@AUS1EXMBX04.ioinc.ioroot.tld> <350529B93F4E2F4497FD8DE4E86E84AA16F1FEF5@AUS1EXMBX04.ioinc.ioroot.tld> Message-ID: Hi Rodrigo I just ran your example on Nasa's Pleiades system. Here's what I got: PBS r459i4n11:~> time mpiexec -n 5 python3.5 another_split_ex.py number of subcomms = 2.5 petsc rank=2, petsc size=5 sub rank 1/3, color:0 petsc rank=4, petsc size=5 sub rank 2/3, color:0 petsc rank=0, petsc size=5 sub rank 0/3, color:0 KSP Object: 2 MPI processes type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using DEFAULT norm type for convergence test PC Object: 2 MPI processes type: none PC has not been set up so information may be incomplete linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpidense rows=100, cols=100 total: nonzeros=10000, allocated nonzeros=10000 total number of mallocs used during MatSetValues calls =0 petsc rank=1, petsc size=5 sub rank 0/2, color:1 creating A in subcomm 1= 2, 0 petsc rank=3, petsc size=5 sub rank 1/2, color:1 creating A in subcomm 1= 2, 1 real 0m1.236s user 0m0.088s sys 0m0.008s So everything looks like it went through fine. I know this doesn't help you directly, but we can confirm at least the python code itself is fine. 
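[Editorial sketch] The another_split_ex.py attachment was scrubbed from the archive, but the rank-to-subcommunicator mapping implied by the output above can be reconstructed in plain Python. This is a hypothetical reconstruction: the color/key arithmetic below reproduces the printed "sub rank i/n, color:c" lines, and the real script presumably passes an equivalent color to MPI.COMM_WORLD.Split() (via mpi4py) before handing the resulting subcommunicator to petsc4py.

```python
# Hypothetical reconstruction of the rank -> (color, sub-rank) mapping that
# matches the "sub rank i/n, color:c" lines in this thread. The real script
# (another_split_ex.py, scrubbed from the archive) presumably calls
# MPI.COMM_WORLD.Split(color) with an equivalent round-robin coloring.

def split_layout(world_size, n_subcomms):
    """Return {rank: (color, sub_rank, sub_size)} for a round-robin split."""
    layout = {}
    for rank in range(world_size):
        color = rank % n_subcomms      # which subcommunicator this rank joins
        sub_rank = rank // n_subcomms  # its position within that subcommunicator
        # ranks sharing this color are color, color + n_subcomms, ...
        sub_size = len(range(color, world_size, n_subcomms))
        layout[rank] = (color, sub_rank, sub_size)
    return layout

if __name__ == "__main__":
    for rank, (color, sub_rank, sub_size) in sorted(split_layout(5, 2).items()):
        print(f"petsc rank={rank}: sub rank {sub_rank}/{sub_size}, color:{color}")
```

Run for 5 ranks and 2 subcommunicators, this yields the group sizes 3 and 2 seen in the thread (ranks 0, 2, 4 in color 0; ranks 1, 3 in color 1).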
Gaetan On Wed, Apr 12, 2017 at 10:10 AM, Rodrigo Felicio < Rodrigo.Felicio at iongeo.com> wrote: > Going over my older codes I found out that I have already tried the > approach of splitting PETSc.COMM_WORLD, but whenever I try to create a > matrix using a subcommuicator, the program fails. For example, executing > the following python code attached to this msg, I get the following output > > time mpirun -n 5 python another_split_ex.py > petsc rank=2, petsc size=5 > petsc rank=3, petsc size=5 > petsc rank=0, petsc size=5 > petsc rank=1, petsc size=5 > petsc rank=4, petsc size=5 > number of subcomms = 2 > sub rank 0/3, color:0 > sub rank 0/2, color:1 > sub rank 1/3, color:0 > sub rank 1/2, color:1 > sub rank 2/3, color:0 > creating A in subcomm 1= 2, 1 > creating A in subcomm 1= 2, 0 > Traceback (most recent call last): > File "another_split_ex.py", line 43, in > Traceback (most recent call last): > File "another_split_ex.py", line 43, in > A = PETSc.Mat().createDense([n,n], comm=subcomm) > File "PETSc/Mat.pyx", line 390, in petsc4py.PETSc.Mat.createDense > (src/petsc4py.PETSc.c:113792) > A = PETSc.Mat().createDense([n,n], comm=subcomm) > File "PETSc/Mat.pyx", line 390, in petsc4py.PETSc.Mat.createDense > (src/petsc4py.PETSc.c:113792) > File "PETSc/petscmat.pxi", line 602, in petsc4py.PETSc.Mat_Create > (src/petsc4py.PETSc.c:25274) > File "PETSc/petscmat.pxi", line 602, in petsc4py.PETSc.Mat_Create > (src/petsc4py.PETSc.c:25274) > File "PETSc/petscsys.pxi", line 104, in petsc4py.PETSc.Sys_Layout > (src/petsc4py.PETSc.c:13666) > File "PETSc/petscsys.pxi", line 104, in petsc4py.PETSc.Sys_Layout > (src/petsc4py.PETSc.c:13666) > petsc4py.PETSc.Error: petsc4py.PETSc.Errorerror code 608517 > [1] PetscSplitOwnership() line 86 in ~/mylocal/petsc/src/sys/utils/ > psplit.c > : error code 134826245 > [3] PetscSplitOwnership() line 86 in ~/mylocal/petsc/src/sys/utils/ > psplit.c > > > Checking the traceback, all I can say is that when the subcommunicator > object reaches 
psplit.c code it gets somehow corrupted, because > PetscSplitOwnership() fails to retrieve the size of the subcommunicator ... > :-( > > regards > Rodrigo > > ________________________________ > > > This email and any files transmitted with it are confidential and are > intended solely for the use of the individual or entity to whom they are > addressed. If you are not the original recipient or the person responsible > for delivering the email to the intended recipient, be advised that you > have received this email in error, and that any use, dissemination, > forwarding, printing, or copying of this email is strictly prohibited. If > you received this email in error, please immediately notify the sender and > delete the original. > > > ________________________________ > > > This email and any files transmitted with it are confidential and are > intended solely for the use of the individual or entity to whom they are > addressed. If you are not the original recipient or the person responsible > for delivering the email to the intended recipient, be advised that you > have received this email in error, and that any use, dissemination, > forwarding, printing, or copying of this email is strictly prohibited. If > you received this email in error, please immediately notify the sender and > delete the original. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gaetank at gmail.com Wed Apr 12 12:19:32 2017 From: gaetank at gmail.com (Gaetan Kenway) Date: Wed, 12 Apr 2017 10:19:32 -0700 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? 
In-Reply-To: References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> <874lxu52bi.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FDDA@AUS1EXMBX04.ioinc.ioroot.tld> <87h91u2vrm.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FE5E@AUS1EXMBX04.ioinc.ioroot.tld> <350529B93F4E2F4497FD8DE4E86E84AA16F1FEF5@AUS1EXMBX04.ioinc.ioroot.tld> Message-ID: Maybe try doing pComm=MPI.COMM_WORLD instead of the PETSc.COMM_WORLD. I know it shouldn't matter, but it's worth a shot. Also then you won't need the tompi4py() i guess. Gaetan On Wed, Apr 12, 2017 at 10:17 AM, Gaetan Kenway wrote: > Hi Rodrigo > > I just ran your example on Nasa's Pleiades system. Here's what I got: > > PBS r459i4n11:~> time mpiexec -n 5 python3.5 another_split_ex.py > number of subcomms = 2.5 > petsc rank=2, petsc size=5 > sub rank 1/3, color:0 > petsc rank=4, petsc size=5 > sub rank 2/3, color:0 > petsc rank=0, petsc size=5 > sub rank 0/3, color:0 > KSP Object: 2 MPI processes > type: cg > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using DEFAULT norm type for convergence test > PC Object: 2 MPI processes > type: none > PC has not been set up so information may be incomplete > linear system matrix = precond matrix: > Mat Object: 2 MPI processes > type: mpidense > rows=100, cols=100 > total: nonzeros=10000, allocated nonzeros=10000 > total number of mallocs used during MatSetValues calls =0 > petsc rank=1, petsc size=5 > sub rank 0/2, color:1 > creating A in subcomm 1= 2, 0 > petsc rank=3, petsc size=5 > sub rank 1/2, color:1 > creating A in subcomm 1= 2, 1 > > real 0m1.236s > user 0m0.088s > sys 0m0.008s > > So everything looks like it went through fine. I know this doesn't help > you directly, but we can confirm at least the python code itself is fine. 
> > Gaetan > > On Wed, Apr 12, 2017 at 10:10 AM, Rodrigo Felicio < > Rodrigo.Felicio at iongeo.com> wrote: > >> Going over my older codes I found out that I have already tried the >> approach of splitting PETSc.COMM_WORLD, but whenever I try to create a >> matrix using a subcommuicator, the program fails. For example, executing >> the following python code attached to this msg, I get the following output >> >> time mpirun -n 5 python another_split_ex.py >> petsc rank=2, petsc size=5 >> petsc rank=3, petsc size=5 >> petsc rank=0, petsc size=5 >> petsc rank=1, petsc size=5 >> petsc rank=4, petsc size=5 >> number of subcomms = 2 >> sub rank 0/3, color:0 >> sub rank 0/2, color:1 >> sub rank 1/3, color:0 >> sub rank 1/2, color:1 >> sub rank 2/3, color:0 >> creating A in subcomm 1= 2, 1 >> creating A in subcomm 1= 2, 0 >> Traceback (most recent call last): >> File "another_split_ex.py", line 43, in >> Traceback (most recent call last): >> File "another_split_ex.py", line 43, in >> A = PETSc.Mat().createDense([n,n], comm=subcomm) >> File "PETSc/Mat.pyx", line 390, in petsc4py.PETSc.Mat.createDense >> (src/petsc4py.PETSc.c:113792) >> A = PETSc.Mat().createDense([n,n], comm=subcomm) >> File "PETSc/Mat.pyx", line 390, in petsc4py.PETSc.Mat.createDense >> (src/petsc4py.PETSc.c:113792) >> File "PETSc/petscmat.pxi", line 602, in petsc4py.PETSc.Mat_Create >> (src/petsc4py.PETSc.c:25274) >> File "PETSc/petscmat.pxi", line 602, in petsc4py.PETSc.Mat_Create >> (src/petsc4py.PETSc.c:25274) >> File "PETSc/petscsys.pxi", line 104, in petsc4py.PETSc.Sys_Layout >> (src/petsc4py.PETSc.c:13666) >> File "PETSc/petscsys.pxi", line 104, in petsc4py.PETSc.Sys_Layout >> (src/petsc4py.PETSc.c:13666) >> petsc4py.PETSc.Error: petsc4py.PETSc.Errorerror code 608517 >> [1] PetscSplitOwnership() line 86 in ~/mylocal/petsc/src/sys/utils/ >> psplit.c >> : error code 134826245 >> [3] PetscSplitOwnership() line 86 in ~/mylocal/petsc/src/sys/utils/ >> psplit.c >> >> >> Checking the traceback, all I can 
say is that when the subcommunicator >> object reaches psplit.c code it gets somehow corrupted, because >> PetscSplitOwnership() fails to retrieve the size of the subcommunicator ... >> :-( >> >> regards >> Rodrigo >> >> ________________________________ >> >> >> This email and any files transmitted with it are confidential and are >> intended solely for the use of the individual or entity to whom they are >> addressed. If you are not the original recipient or the person responsible >> for delivering the email to the intended recipient, be advised that you >> have received this email in error, and that any use, dissemination, >> forwarding, printing, or copying of this email is strictly prohibited. If >> you received this email in error, please immediately notify the sender and >> delete the original. >> >> >> ________________________________ >> >> >> This email and any files transmitted with it are confidential and are >> intended solely for the use of the individual or entity to whom they are >> addressed. If you are not the original recipient or the person responsible >> for delivering the email to the intended recipient, be advised that you >> have received this email in error, and that any use, dissemination, >> forwarding, printing, or copying of this email is strictly prohibited. If >> you received this email in error, please immediately notify the sender and >> delete the original. >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Rodrigo.Felicio at iongeo.com Wed Apr 12 12:30:56 2017 From: Rodrigo.Felicio at iongeo.com (Rodrigo Felicio) Date: Wed, 12 Apr 2017 17:30:56 +0000 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? 
In-Reply-To: References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> <874lxu52bi.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FDDA@AUS1EXMBX04.ioinc.ioroot.tld> <87h91u2vrm.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FE5E@AUS1EXMBX04.ioinc.ioroot.tld> <350529B93F4E2F4497FD8DE4E86E84AA16F1FEF5@AUS1EXMBX04.ioinc.ioroot.tld> Message-ID: <350529B93F4E2F4497FD8DE4E86E84AA16F1FF13@AUS1EXMBX04.ioinc.ioroot.tld> Thanks, Gaetan, for your suggestions and for running the code. Now I know for sure there is something wrong with my installation! Cheers Rodrigo From: Gaetan Kenway [mailto:gaetank at gmail.com] Sent: Wednesday, April 12, 2017 12:20 PM To: Rodrigo Felicio Cc: petsc-users at mcs.anl.gov Subject: Re: [petsc-users] how to use petsc4py with mpi subcommunicators? Maybe try doing pComm=MPI.COMM_WORLD instead of the PETSc.COMM_WORLD. I know it shouldn't matter, but it's worth a shot. Also then you won't need the tompi4py() i guess. Gaetan On Wed, Apr 12, 2017 at 10:17 AM, Gaetan Kenway > wrote: Hi Rodrigo I just ran your example on Nasa's Pleiades system. Here's what I got: PBS r459i4n11:~> time mpiexec -n 5 python3.5 another_split_ex.py number of subcomms = 2.5 petsc rank=2, petsc size=5 sub rank 1/3, color:0 petsc rank=4, petsc size=5 sub rank 2/3, color:0 petsc rank=0, petsc size=5 sub rank 0/3, color:0 KSP Object: 2 MPI processes type: cg maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning using DEFAULT norm type for convergence test PC Object: 2 MPI processes type: none PC has not been set up so information may be incomplete linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpidense rows=100, cols=100 total: nonzeros=10000, allocated nonzeros=10000 total number of mallocs used during MatSetValues calls =0 petsc rank=1, petsc size=5 sub rank 0/2, color:1 creating A in subcomm 1= 2, 0 petsc rank=3, petsc size=5 sub rank 1/2, color:1 creating A in subcomm 1= 2, 1 real 0m1.236s user 0m0.088s sys 0m0.008s So everything looks like it went through fine. I know this doesn't help you directly, but we can confirm at least the python code itself is fine. Gaetan On Wed, Apr 12, 2017 at 10:10 AM, Rodrigo Felicio > wrote: Going over my older codes I found out that I have already tried the approach of splitting PETSc.COMM_WORLD, but whenever I try to create a matrix using a subcommuicator, the program fails. For example, executing the following python code attached to this msg, I get the following output time mpirun -n 5 python another_split_ex.py petsc rank=2, petsc size=5 petsc rank=3, petsc size=5 petsc rank=0, petsc size=5 petsc rank=1, petsc size=5 petsc rank=4, petsc size=5 number of subcomms = 2 sub rank 0/3, color:0 sub rank 0/2, color:1 sub rank 1/3, color:0 sub rank 1/2, color:1 sub rank 2/3, color:0 creating A in subcomm 1= 2, 1 creating A in subcomm 1= 2, 0 Traceback (most recent call last): File "another_split_ex.py", line 43, in Traceback (most recent call last): File "another_split_ex.py", line 43, in A = PETSc.Mat().createDense([n,n], comm=subcomm) File "PETSc/Mat.pyx", line 390, in petsc4py.PETSc.Mat.createDense (src/petsc4py.PETSc.c:113792) A = PETSc.Mat().createDense([n,n], comm=subcomm) File "PETSc/Mat.pyx", line 390, in petsc4py.PETSc.Mat.createDense (src/petsc4py.PETSc.c:113792) File "PETSc/petscmat.pxi", line 602, in petsc4py.PETSc.Mat_Create (src/petsc4py.PETSc.c:25274) File "PETSc/petscmat.pxi", 
line 602, in petsc4py.PETSc.Mat_Create (src/petsc4py.PETSc.c:25274) File "PETSc/petscsys.pxi", line 104, in petsc4py.PETSc.Sys_Layout (src/petsc4py.PETSc.c:13666) File "PETSc/petscsys.pxi", line 104, in petsc4py.PETSc.Sys_Layout (src/petsc4py.PETSc.c:13666) petsc4py.PETSc.Error: petsc4py.PETSc.Errorerror code 608517 [1] PetscSplitOwnership() line 86 in ~/mylocal/petsc/src/sys/utils/psplit.c : error code 134826245 [3] PetscSplitOwnership() line 86 in ~/mylocal/petsc/src/sys/utils/psplit.c Checking the traceback, all I can say is that when the subcommunicator object reaches psplit.c code it gets somehow corrupted, because PetscSplitOwnership() fails to retrieve the size of the subcommunicator ... :-( regards Rodrigo ________________________________ This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. ________________________________ This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. 
________________________________ This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify the sender and delete the original. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fande.kong at inl.gov Wed Apr 12 12:31:57 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Wed, 12 Apr 2017 11:31:57 -0600 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> Message-ID: Hi Mark, Thanks for your reply. On Wed, Apr 12, 2017 at 9:16 AM, Mark Adams wrote: > The problem comes from setting the number of MG levels (-pc_mg_levels 2). > Not your fault, it looks like the GAMG logic is faulty, in your version at > least. > What I want is that GAMG coarsens the fine matrix once and then stops doing anything. I did not see any benefits to having more levels if the number of processors is small. > > GAMG will force the coarsest grid to one processor by default, in newer > versions. You can override the default with: > > -pc_gamg_use_parallel_coarse_grid_solver > > Your coarse grid solver is ASM with these 37 equations per process and 512 > processes. That is bad. > Why is this bad? The subdomain problem is too small? > Note, you could run this on one process to see the proper convergence > rate. > Convergence rate for which part? coarse solver, subdomain solver?
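[Editorial sketch] The sizing trade-off discussed above (37 equations per process on the coarse grid is very little work per process) can be made concrete with a rough model. This is illustrative only and not PETSc's actual GAMG reduction logic, which also weighs data movement and repartitioning costs; it simply computes how many processes would stay active if each must keep roughly a target number of equations.

```python
import math

# Illustrative model (not PETSc's implementation): shrink the set of active
# processes on a coarse grid so each keeps >= roughly eq_per_proc_target
# equations, instead of leaving all processes active with a handful each.

def active_coarse_procs(n_coarse_eqs, n_procs, eq_per_proc_target):
    """Number of processes to keep active on the coarse grid."""
    wanted = math.ceil(n_coarse_eqs / eq_per_proc_target)
    return max(1, min(n_procs, wanted))

if __name__ == "__main__":
    # The coarse grid in this thread: 18145 equations on 384 processes,
    # i.e. roughly 47 equations per process before any reduction.
    print(active_coarse_procs(18145, 384, 50))
    print(active_coarse_procs(18145, 384, 3000))
```

With a target in the thousands of equations per process, only a handful of processes would remain active on this coarse grid, which is the motivation for either reducing the active process count or using a parallel direct coarse solver rather than keeping 384 processes with 37 equations each.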
> You can fix this with parameters: > > > -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations > per process on coarse grids (PCGAMGSetProcEqLim) > > -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the > coarse grid (PCGAMGSetCoarseEqLim) > > If you really want two levels then set something like > -pc_gamg_coarse_eq_limit 18145 (or higher) -pc_gamg_coarse_eq_limit 18145 > (or higher). > Maybe have something like: make the coarse problem 1/8 as large as the original problem? Otherwise, this number is just problem dependent. > You can run with -info and grep on GAMG and you will see meta-data for each > level. You should see "npe=1" for the coarsest, last, grid. Or use a > parallel direct solver. > I will try. > > Note, you should not see much degradation as you increase the number of > levels. 18145 eqs on a 3D problem will probably be noticeable. I generally > aim for about 3000. > It should be fine as long as the coarse problem is solved by a parallel solver. Fande, > > > On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande wrote: >> >> >> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams wrote: >> >>> You seem to have two levels here and 3M eqs on the fine grid and 37 on >>> the coarse grid. >> >> >> 37 is on the sub domain. >> >> rows=18145, cols=18145 on the entire coarse grid. >> >> >> >> >> >>> I don't understand that. >>> >>> You are also calling the AMG setup a lot, but not spending much time >>> in it. Try running with -info and grep on "GAMG". >>> >>> >>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande wrote: >>> > Thanks, Barry. >>> > >>> > It works. >>> > >>> > GAMG is three times better than ASM in terms of the number of linear >>> > iterations, but it is five times slower than ASM. Any suggestions to >>> improve >>> > the performance of GAMG? Log files are attached.
>>> > >>> > Fande, >>> > >>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith >>> wrote: >>> >> >>> >> >>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: >>> >> > >>> >> > Thanks, Mark and Barry, >>> >> > >>> >> > It works pretty wells in terms of the number of linear iterations >>> (using >>> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. >>> I am >>> >> > using the two-level method via "-pc_mg_levels 2". The reason why >>> the compute >>> >> > time is larger than other preconditioning options is that a matrix >>> free >>> >> > method is used in the fine level and in my particular problem the >>> function >>> >> > evaluation is expensive. >>> >> > >>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free >>> Newton, >>> >> > but I do not think I want to make the preconditioning part >>> matrix-free. Do >>> >> > you guys know how to turn off the matrix-free method for GAMG? >>> >> >>> >> -pc_use_amat false >>> >> >>> >> > >>> >> > Here is the detailed solver: >>> >> > >>> >> > SNES Object: 384 MPI processes >>> >> > type: newtonls >>> >> > maximum iterations=200, maximum function evaluations=10000 >>> >> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 >>> >> > total number of linear solver iterations=20 >>> >> > total number of function evaluations=166 >>> >> > norm schedule ALWAYS >>> >> > SNESLineSearch Object: 384 MPI processes >>> >> > type: bt >>> >> > interpolation: cubic >>> >> > alpha=1.000000e-04 >>> >> > maxstep=1.000000e+08, minlambda=1.000000e-12 >>> >> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, >>> >> > lambda=1.000000e-08 >>> >> > maximum iterations=40 >>> >> > KSP Object: 384 MPI processes >>> >> > type: gmres >>> >> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt >>> >> > Orthogonalization with no iterative refinement >>> >> > GMRES: happy breakdown tolerance 1e-30 >>> >> > maximum iterations=100, initial guess is zero >>> >> > tolerances: relative=0.001, 
absolute=1e-50, divergence=10000. >>> >> > right preconditioning >>> >> > using UNPRECONDITIONED norm type for convergence test >>> >> > PC Object: 384 MPI processes >>> >> > type: gamg >>> >> > MG: type is MULTIPLICATIVE, levels=2 cycles=v >>> >> > Cycles per PCApply=1 >>> >> > Using Galerkin computed coarse grid matrices >>> >> > GAMG specific options >>> >> > Threshold for dropping small values from graph 0. >>> >> > AGG specific options >>> >> > Symmetric graph true >>> >> > Coarse grid solver -- level ------------------------------- >>> >> > KSP Object: (mg_coarse_) 384 MPI processes >>> >> > type: preonly >>> >> > maximum iterations=10000, initial guess is zero >>> >> > tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> >> > left preconditioning >>> >> > using NONE norm type for convergence test >>> >> > PC Object: (mg_coarse_) 384 MPI processes >>> >> > type: bjacobi >>> >> > block Jacobi: number of blocks = 384 >>> >> > Local solve is same for all blocks, in the following KSP >>> and >>> >> > PC objects: >>> >> > KSP Object: (mg_coarse_sub_) 1 MPI processes >>> >> > type: preonly >>> >> > maximum iterations=1, initial guess is zero >>> >> > tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. 
>>> >> > left preconditioning >>> >> > using NONE norm type for convergence test >>> >> > PC Object: (mg_coarse_sub_) 1 MPI processes >>> >> > type: lu >>> >> > LU: out-of-place factorization >>> >> > tolerance for zero pivot 2.22045e-14 >>> >> > using diagonal shift on blocks to prevent zero pivot >>> >> > [INBLOCKS] >>> >> > matrix ordering: nd >>> >> > factor fill ratio given 5., needed 1.31367 >>> >> > Factored matrix follows: >>> >> > Mat Object: 1 MPI processes >>> >> > type: seqaij >>> >> > rows=37, cols=37 >>> >> > package used to perform factorization: petsc >>> >> > total: nonzeros=913, allocated nonzeros=913 >>> >> > total number of mallocs used during MatSetValues >>> calls >>> >> > =0 >>> >> > not using I-node routines >>> >> > linear system matrix = precond matrix: >>> >> > Mat Object: 1 MPI processes >>> >> > type: seqaij >>> >> > rows=37, cols=37 >>> >> > total: nonzeros=695, allocated nonzeros=695 >>> >> > total number of mallocs used during MatSetValues calls >>> =0 >>> >> > not using I-node routines >>> >> > linear system matrix = precond matrix: >>> >> > Mat Object: 384 MPI processes >>> >> > type: mpiaij >>> >> > rows=18145, cols=18145 >>> >> > total: nonzeros=1709115, allocated nonzeros=1709115 >>> >> > total number of mallocs used during MatSetValues calls =0 >>> >> > not using I-node (on process 0) routines >>> >> > Down solver (pre-smoother) on level 1 >>> >> > ------------------------------- >>> >> > KSP Object: (mg_levels_1_) 384 MPI processes >>> >> > type: chebyshev >>> >> > Chebyshev: eigenvalue estimates: min = 0.133339, max = >>> >> > 1.46673 >>> >> > Chebyshev: eigenvalues estimated using gmres with >>> translations >>> >> > [0. 0.1; 0. 
1.1] >>> >> > KSP Object: (mg_levels_1_esteig_) 384 >>> MPI >>> >> > processes >>> >> > type: gmres >>> >> > GMRES: restart=30, using Classical (unmodified) >>> >> > Gram-Schmidt Orthogonalization with no iterative refinement >>> >> > GMRES: happy breakdown tolerance 1e-30 >>> >> > maximum iterations=10, initial guess is zero >>> >> > tolerances: relative=1e-12, absolute=1e-50, >>> >> > divergence=10000. >>> >> > left preconditioning >>> >> > using PRECONDITIONED norm type for convergence test >>> >> > maximum iterations=2 >>> >> > tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> >> > left preconditioning >>> >> > using nonzero initial guess >>> >> > using NONE norm type for convergence test >>> >> > PC Object: (mg_levels_1_) 384 MPI processes >>> >> > type: sor >>> >> > SOR: type = local_symmetric, iterations = 1, local >>> iterations >>> >> > = 1, omega = 1. >>> >> > linear system matrix followed by preconditioner matrix: >>> >> > Mat Object: 384 MPI processes >>> >> > type: mffd >>> >> > rows=3020875, cols=3020875 >>> >> > Matrix-free approximation: >>> >> > err=1.49012e-08 (relative error in function >>> evaluation) >>> >> > Using wp compute h routine >>> >> > Does not compute normU >>> >> > Mat Object: () 384 MPI processes >>> >> > type: mpiaij >>> >> > rows=3020875, cols=3020875 >>> >> > total: nonzeros=215671710, allocated nonzeros=241731750 >>> >> > total number of mallocs used during MatSetValues calls =0 >>> >> > not using I-node (on process 0) routines >>> >> > Up solver (post-smoother) same as down solver (pre-smoother) >>> >> > linear system matrix followed by preconditioner matrix: >>> >> > Mat Object: 384 MPI processes >>> >> > type: mffd >>> >> > rows=3020875, cols=3020875 >>> >> > Matrix-free approximation: >>> >> > err=1.49012e-08 (relative error in function evaluation) >>> >> > Using wp compute h routine >>> >> > Does not compute normU >>> >> > Mat Object: () 384 MPI processes >>> >> > type: mpiaij >>> >> > rows=3020875, 
cols=3020875 >>> >> > total: nonzeros=215671710, allocated nonzeros=241731750 >>> >> > total number of mallocs used during MatSetValues calls =0 >>> >> > not using I-node (on process 0) routines >>> >> > >>> >> > >>> >> > Fande, >>> >> > >>> >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote: >>> >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith >>> wrote: >>> >> > > >>> >> > >> Does this mean that GAMG works for the symmetrical matrix only? >>> >> > > >>> >> > > No, it means that for non symmetric nonzero structure you need >>> the >>> >> > > extra flag. So use the extra flag. The reason we don't always use >>> the flag >>> >> > > is because it adds extra cost and isn't needed if the matrix >>> already has a >>> >> > > symmetric nonzero structure. >>> >> > >>> >> > BTW, if you have symmetric non-zero structure you can just set >>> >> > -pc_gamg_threshold -1.0', note the "or" in the message. >>> >> > >>> >> > If you want to mess with the threshold then you need to use the >>> >> > symmetrized flag. >>> >> > >>> >> >>> > >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fande.kong at inl.gov Wed Apr 12 18:04:12 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Wed, 12 Apr 2017 17:04:12 -0600 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> Message-ID: On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams wrote: > You seem to have two levels here and 3M eqs on the fine grid and 37 on > the coarse grid. I don't understand that. > > You are also calling the AMG setup a lot, but not spending much time > in it. Try running with -info and grep on "GAMG". > I got the following output: [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1, nnz/row (ave)=71, np=384 [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold 0., 73.6364 nnz ave. 
(N=3020875) [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square [0] PCGAMGProlongator_AGG(): New grid 18162 nodes [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978702e+00 min=2.559747e-02 PC=jacobi [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384, neq(loc)=40 [0] PCSetUp_GAMG(): 1) N=18162, n data cols=1, nnz/row (ave)=94, 384 active pes [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00795 [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1, nnz/row (ave)=71, np=384 [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold 0., 73.6364 nnz ave. (N=3020875) [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square [0] PCGAMGProlongator_AGG(): New grid 18145 nodes [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978584e+00 min=2.557887e-02 PC=jacobi [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384, neq(loc)=37 [0] PCSetUp_GAMG(): 1) N=18145, n data cols=1, nnz/row (ave)=94, 384 active pes [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00792 GAMG specific options PCGAMGGraph_AGG 40 1.0 8.0759e+00 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 7.6e+02 2 0 2 4 2 2 0 2 4 2 1170 PCGAMGCoarse_AGG 40 1.0 7.1698e+01 1.0 4.05e+09 2.3 4.0e+06 5.1e+04 1.2e+03 18 37 5 27 3 18 37 5 27 3 14632 PCGAMGProl_AGG 40 1.0 9.2650e-01 1.2 0.00e+00 0.0 9.8e+05 2.9e+03 9.6e+02 0 0 1 0 2 0 0 1 0 2 0 PCGAMGPOpt_AGG 40 1.0 2.4484e+00 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 1.9e+03 1 4 4 1 4 1 4 4 1 4 51328 GAMG: createProl 40 1.0 8.3786e+01 1.0 4.56e+09 2.3 9.6e+06 2.5e+04 4.8e+03 21 42 12 32 10 21 42 12 32 10 14134 GAMG: partLevel 40 1.0 6.7755e+00 1.1 2.59e+08 2.3 2.9e+06 2.5e+03 1.5e+03 2 2 4 1 3 2 2 4 1 3 9431 > > > On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande wrote: > > Thanks, Barry. > > > > It works. > > > > GAMG is three times better than ASM in terms of the number of linear > > iterations, but it is five times slower than ASM. Any suggestions to > improve > > the performance of GAMG? 
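Mark's "run with -info and grep on GAMG" suggestion produces PCSetUp_GAMG lines like the ones shown above. A small stand-alone helper can pull the level size and active-process count out of such a log; note that gamg_levels and the regex are illustrative only, not part of PETSc:

```python
import re

# Extract (N, active processes) from "-info | grep GAMG" output.
# Matches both the "np=384" form (fine grid) and the "384 active pes" form.
pat = re.compile(r"PCSetUp_GAMG\(\).*?N=(\d+).*?(?:np=(\d+)|(\d+) active pes)")

def gamg_levels(log_lines):
    levels = []
    for line in log_lines:
        m = pat.search(line)
        if m:
            pes = m.group(2) or m.group(3)
            levels.append((int(m.group(1)), int(pes)))
    return levels

sample = [
    "[0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1, nnz/row (ave)=71, np=384",
    "[0] PCSetUp_GAMG(): 1) N=18145, n data cols=1, nnz/row (ave)=94, 384 active pes",
    "[0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00792",
]
print(gamg_levels(sample))  # [(3020875, 384), (18145, 384)]
```

If the coarsest level reports "1 active pes" (or "npe=1" in other PETSc versions), the coarse solve runs on a single process, which is the condition Mark recommends checking for.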
Log files are attached. > > > > Fande, > > > > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith wrote: > >> > >> > >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: > >> > > >> > Thanks, Mark and Barry, > >> > > >> > It works pretty wells in terms of the number of linear iterations > (using > >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I > am > >> > using the two-level method via "-pc_mg_levels 2". The reason why the > compute > >> > time is larger than other preconditioning options is that a matrix > free > >> > method is used in the fine level and in my particular problem the > function > >> > evaluation is expensive. > >> > > >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, > >> > but I do not think I want to make the preconditioning part > matrix-free. Do > >> > you guys know how to turn off the matrix-free method for GAMG? > >> > >> -pc_use_amat false > >> > >> > > >> > Here is the detailed solver: > >> > > >> > SNES Object: 384 MPI processes > >> > type: newtonls > >> > maximum iterations=200, maximum function evaluations=10000 > >> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 > >> > total number of linear solver iterations=20 > >> > total number of function evaluations=166 > >> > norm schedule ALWAYS > >> > SNESLineSearch Object: 384 MPI processes > >> > type: bt > >> > interpolation: cubic > >> > alpha=1.000000e-04 > >> > maxstep=1.000000e+08, minlambda=1.000000e-12 > >> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, > >> > lambda=1.000000e-08 > >> > maximum iterations=40 > >> > KSP Object: 384 MPI processes > >> > type: gmres > >> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt > >> > Orthogonalization with no iterative refinement > >> > GMRES: happy breakdown tolerance 1e-30 > >> > maximum iterations=100, initial guess is zero > >> > tolerances: relative=0.001, absolute=1e-50, divergence=10000. 
> >> > right preconditioning > >> > using UNPRECONDITIONED norm type for convergence test > >> > PC Object: 384 MPI processes > >> > type: gamg > >> > MG: type is MULTIPLICATIVE, levels=2 cycles=v > >> > Cycles per PCApply=1 > >> > Using Galerkin computed coarse grid matrices > >> > GAMG specific options > >> > Threshold for dropping small values from graph 0. > >> > AGG specific options > >> > Symmetric graph true > >> > Coarse grid solver -- level ------------------------------- > >> > KSP Object: (mg_coarse_) 384 MPI processes > >> > type: preonly > >> > maximum iterations=10000, initial guess is zero > >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> > left preconditioning > >> > using NONE norm type for convergence test > >> > PC Object: (mg_coarse_) 384 MPI processes > >> > type: bjacobi > >> > block Jacobi: number of blocks = 384 > >> > Local solve is same for all blocks, in the following KSP and > >> > PC objects: > >> > KSP Object: (mg_coarse_sub_) 1 MPI processes > >> > type: preonly > >> > maximum iterations=1, initial guess is zero > >> > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000. 
> >> > left preconditioning > >> > using NONE norm type for convergence test > >> > PC Object: (mg_coarse_sub_) 1 MPI processes > >> > type: lu > >> > LU: out-of-place factorization > >> > tolerance for zero pivot 2.22045e-14 > >> > using diagonal shift on blocks to prevent zero pivot > >> > [INBLOCKS] > >> > matrix ordering: nd > >> > factor fill ratio given 5., needed 1.31367 > >> > Factored matrix follows: > >> > Mat Object: 1 MPI processes > >> > type: seqaij > >> > rows=37, cols=37 > >> > package used to perform factorization: petsc > >> > total: nonzeros=913, allocated nonzeros=913 > >> > total number of mallocs used during MatSetValues > calls > >> > =0 > >> > not using I-node routines > >> > linear system matrix = precond matrix: > >> > Mat Object: 1 MPI processes > >> > type: seqaij > >> > rows=37, cols=37 > >> > total: nonzeros=695, allocated nonzeros=695 > >> > total number of mallocs used during MatSetValues calls =0 > >> > not using I-node routines > >> > linear system matrix = precond matrix: > >> > Mat Object: 384 MPI processes > >> > type: mpiaij > >> > rows=18145, cols=18145 > >> > total: nonzeros=1709115, allocated nonzeros=1709115 > >> > total number of mallocs used during MatSetValues calls =0 > >> > not using I-node (on process 0) routines > >> > Down solver (pre-smoother) on level 1 > >> > ------------------------------- > >> > KSP Object: (mg_levels_1_) 384 MPI processes > >> > type: chebyshev > >> > Chebyshev: eigenvalue estimates: min = 0.133339, max = > >> > 1.46673 > >> > Chebyshev: eigenvalues estimated using gmres with > translations > >> > [0. 0.1; 0. 1.1] > >> > KSP Object: (mg_levels_1_esteig_) 384 MPI > >> > processes > >> > type: gmres > >> > GMRES: restart=30, using Classical (unmodified) > >> > Gram-Schmidt Orthogonalization with no iterative refinement > >> > GMRES: happy breakdown tolerance 1e-30 > >> > maximum iterations=10, initial guess is zero > >> > tolerances: relative=1e-12, absolute=1e-50, > >> > divergence=10000. 
> >> > left preconditioning > >> > using PRECONDITIONED norm type for convergence test > >> > maximum iterations=2 > >> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> > left preconditioning > >> > using nonzero initial guess > >> > using NONE norm type for convergence test > >> > PC Object: (mg_levels_1_) 384 MPI processes > >> > type: sor > >> > SOR: type = local_symmetric, iterations = 1, local > iterations > >> > = 1, omega = 1. > >> > linear system matrix followed by preconditioner matrix: > >> > Mat Object: 384 MPI processes > >> > type: mffd > >> > rows=3020875, cols=3020875 > >> > Matrix-free approximation: > >> > err=1.49012e-08 (relative error in function evaluation) > >> > Using wp compute h routine > >> > Does not compute normU > >> > Mat Object: () 384 MPI processes > >> > type: mpiaij > >> > rows=3020875, cols=3020875 > >> > total: nonzeros=215671710, allocated nonzeros=241731750 > >> > total number of mallocs used during MatSetValues calls =0 > >> > not using I-node (on process 0) routines > >> > Up solver (post-smoother) same as down solver (pre-smoother) > >> > linear system matrix followed by preconditioner matrix: > >> > Mat Object: 384 MPI processes > >> > type: mffd > >> > rows=3020875, cols=3020875 > >> > Matrix-free approximation: > >> > err=1.49012e-08 (relative error in function evaluation) > >> > Using wp compute h routine > >> > Does not compute normU > >> > Mat Object: () 384 MPI processes > >> > type: mpiaij > >> > rows=3020875, cols=3020875 > >> > total: nonzeros=215671710, allocated nonzeros=241731750 > >> > total number of mallocs used during MatSetValues calls =0 > >> > not using I-node (on process 0) routines > >> > > >> > > >> > Fande, > >> > > >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote: > >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith > wrote: > >> > > > >> > >> Does this mean that GAMG works for the symmetrical matrix only? 
> >> > > > >> > > No, it means that for non symmetric nonzero structure you need the > >> > > extra flag. So use the extra flag. The reason we don't always use > the flag > >> > > is because it adds extra cost and isn't needed if the matrix > already has a > >> > > symmetric nonzero structure. > >> > > >> > BTW, if you have symmetric non-zero structure you can just set > >> > -pc_gamg_threshold -1.0', note the "or" in the message. > >> > > >> > If you want to mess with the threshold then you need to use the > >> > symmetrized flag. > >> > > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Thu Apr 13 02:16:22 2017 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 13 Apr 2017 10:16:22 +0300 Subject: [petsc-users] how to use petsc4py with mpi subcommunicators? In-Reply-To: <350529B93F4E2F4497FD8DE4E86E84AA16F1FD65@AUS1EXMBX04.ioinc.ioroot.tld> References: <350529B93F4E2F4497FD8DE4E86E84AA16F1FC5C@AUS1EXMBX04.ioinc.ioroot.tld> <87bms367kb.fsf@jedbrown.org> <350529B93F4E2F4497FD8DE4E86E84AA16F1FD65@AUS1EXMBX04.ioinc.ioroot.tld> Message-ID: On 11 April 2017 at 17:31, Rodrigo Felicio wrote: > Thanks, Jed, but using color == 0 lead to the same error msg. Is there no > way to set PETSc.COMM_WORLD to a subcomm instead of MPI.COMM_WORLD in > python? > > You can do it, but it is a bit tricky, and related to the fact that PETSC_COMM_WORLD has to be set to subcomm before calling PetscInitialize(), which happens automatically the first time Python executes "from petsc4py import PETSc". 
The way to do it would be the following:

    import sys, petsc4py
    petsc4py.init(sys.argv, comm=subcomm)
    from petsc4py import PETSc

--
Lisandro Dalcin
============
Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa
Office Phone: +966 12 808-0459

From mfadams at lbl.gov Thu Apr 13 10:12:50 2017
From: mfadams at lbl.gov (Mark Adams)
Date: Thu, 13 Apr 2017 11:12:50 -0400
Subject: [petsc-users] GAMG for the unsymmetrical matrix
In-Reply-To:
References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov>
Message-ID:

On Wed, Apr 12, 2017 at 1:31 PM, Kong, Fande wrote:

> Hi Mark,
>
> Thanks for your reply.
>
> On Wed, Apr 12, 2017 at 9:16 AM, Mark Adams wrote:
>
>> The problem comes from setting the number of MG levels (-pc_mg_levels 2).
>> Not your fault, it looks like the GAMG logic is faulty, in your version at
>> least.
>
> What I want is that GAMG coarsens the fine matrix once and then stops
> doing anything. I did not see any benefit in having more levels if the
> number of processors is small.

The number of levels is a math issue and has nothing to do with
parallelism. If you do just one level your coarse grid is very large and
expensive to solve, so you want to keep coarsening. There is rarely a need
to set -pc_mg_levels.

>> GAMG will force the coarsest grid to one processor by default, in newer
>> versions. You can override the default with:
>>
>>   -pc_gamg_use_parallel_coarse_grid_solver
>>
>> Your coarse grid solver is ASM with these 37 equations per process and 512
>> processes. That is bad.
> Why is this bad? The subdomain problem is too small?

Because ASM with 512 blocks is a weak solver. You want the coarse grid to
be solved exactly.

>> Note, you could run this on one process to see the proper convergence
>> rate.
>
> Convergence rate for which part? coarse solver, subdomain solver?

The overall convergence rate.

>> You can fix this with parameters:
>>
>>   -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations
>>   per process on coarse grids (PCGAMGSetProcEqLim)
>>   -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the
>>   coarse grid (PCGAMGSetCoarseEqLim)
>>
>> If you really want two levels then set something like
>> -pc_gamg_coarse_eq_limit 18145 (or higher).
>
> Maybe have something like: make the coarse problem 1/8 as large as the
> original problem? Otherwise, this number is just problem dependent.

GAMG will stop automatically so that you do not need problem-dependent
parameters.

>> You can run with -info and grep on GAMG and you will see meta-data for
>> each level. You should see "npe=1" for the coarsest, last, grid. Or use a
>> parallel direct solver.
>
> I will try.

>> Note, you should not see much degradation as you increase the number of
>> levels. 18145 eqs on a 3D problem will probably be noticeable. I generally
>> aim for about 3000.
>
> It should be fine as long as the coarse problem is solved by a parallel
> solver.
>
> Fande,
>
>> On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande wrote:
>>
>>> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams wrote:
>>>
>>>> You seem to have two levels here and 3M eqs on the fine grid and 37 on
>>>> the coarse grid.
>>>
>>> 37 is on the subdomain.
>>>
>>> rows=18145, cols=18145 on the entire coarse grid.
>>>
>>>> I don't understand that.
>>>>
>>>> You are also calling the AMG setup a lot, but not spending much time
>>>> in it.
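The sizing advice in this exchange reduces to simple arithmetic. With the numbers from this thread (18145 coarse-grid equations spread over 384 ranks) and Mark's rule of thumb of roughly 3000 equations per process, a quick check shows why the fully parallel coarse solve is wasteful; the 3000 target is a heuristic from the discussion, not a PETSc constant:

```python
# Coarse-grid size and rank count from the -ksp_view output in this thread.
coarse_eqs = 18145
ranks = 384
target_per_rank = 3000          # Mark's rule-of-thumb workload per process

per_rank = coarse_eqs / ranks   # ~47 unknowns per rank: communication-dominated
suggested_ranks = max(1, coarse_eqs // target_per_rank)

print(round(per_rank, 1), suggested_ranks)  # 47.3 6
```

So the coarse grid would be better served by a handful of processes (or just one, the GAMG default in newer versions) than by all 384.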
Try running with -info and grep on "GAMG". >>>> >>>> >>>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande wrote: >>>> > Thanks, Barry. >>>> > >>>> > It works. >>>> > >>>> > GAMG is three times better than ASM in terms of the number of linear >>>> > iterations, but it is five times slower than ASM. Any suggestions to >>>> improve >>>> > the performance of GAMG? Log files are attached. >>>> > >>>> > Fande, >>>> > >>>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith >>>> wrote: >>>> >> >>>> >> >>>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande >>>> wrote: >>>> >> > >>>> >> > Thanks, Mark and Barry, >>>> >> > >>>> >> > It works pretty wells in terms of the number of linear iterations >>>> (using >>>> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute >>>> time. I am >>>> >> > using the two-level method via "-pc_mg_levels 2". The reason why >>>> the compute >>>> >> > time is larger than other preconditioning options is that a matrix >>>> free >>>> >> > method is used in the fine level and in my particular problem the >>>> function >>>> >> > evaluation is expensive. >>>> >> > >>>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free >>>> Newton, >>>> >> > but I do not think I want to make the preconditioning part >>>> matrix-free. Do >>>> >> > you guys know how to turn off the matrix-free method for GAMG? 
>>>> >> >>>> >> -pc_use_amat false >>>> >> >>>> >> > >>>> >> > Here is the detailed solver: >>>> >> > >>>> >> > SNES Object: 384 MPI processes >>>> >> > type: newtonls >>>> >> > maximum iterations=200, maximum function evaluations=10000 >>>> >> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 >>>> >> > total number of linear solver iterations=20 >>>> >> > total number of function evaluations=166 >>>> >> > norm schedule ALWAYS >>>> >> > SNESLineSearch Object: 384 MPI processes >>>> >> > type: bt >>>> >> > interpolation: cubic >>>> >> > alpha=1.000000e-04 >>>> >> > maxstep=1.000000e+08, minlambda=1.000000e-12 >>>> >> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, >>>> >> > lambda=1.000000e-08 >>>> >> > maximum iterations=40 >>>> >> > KSP Object: 384 MPI processes >>>> >> > type: gmres >>>> >> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt >>>> >> > Orthogonalization with no iterative refinement >>>> >> > GMRES: happy breakdown tolerance 1e-30 >>>> >> > maximum iterations=100, initial guess is zero >>>> >> > tolerances: relative=0.001, absolute=1e-50, divergence=10000. >>>> >> > right preconditioning >>>> >> > using UNPRECONDITIONED norm type for convergence test >>>> >> > PC Object: 384 MPI processes >>>> >> > type: gamg >>>> >> > MG: type is MULTIPLICATIVE, levels=2 cycles=v >>>> >> > Cycles per PCApply=1 >>>> >> > Using Galerkin computed coarse grid matrices >>>> >> > GAMG specific options >>>> >> > Threshold for dropping small values from graph 0. >>>> >> > AGG specific options >>>> >> > Symmetric graph true >>>> >> > Coarse grid solver -- level ------------------------------- >>>> >> > KSP Object: (mg_coarse_) 384 MPI processes >>>> >> > type: preonly >>>> >> > maximum iterations=10000, initial guess is zero >>>> >> > tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. 
>>>> >> > left preconditioning >>>> >> > using NONE norm type for convergence test >>>> >> > PC Object: (mg_coarse_) 384 MPI processes >>>> >> > type: bjacobi >>>> >> > block Jacobi: number of blocks = 384 >>>> >> > Local solve is same for all blocks, in the following KSP >>>> and >>>> >> > PC objects: >>>> >> > KSP Object: (mg_coarse_sub_) 1 MPI processes >>>> >> > type: preonly >>>> >> > maximum iterations=1, initial guess is zero >>>> >> > tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> >> > left preconditioning >>>> >> > using NONE norm type for convergence test >>>> >> > PC Object: (mg_coarse_sub_) 1 MPI processes >>>> >> > type: lu >>>> >> > LU: out-of-place factorization >>>> >> > tolerance for zero pivot 2.22045e-14 >>>> >> > using diagonal shift on blocks to prevent zero pivot >>>> >> > [INBLOCKS] >>>> >> > matrix ordering: nd >>>> >> > factor fill ratio given 5., needed 1.31367 >>>> >> > Factored matrix follows: >>>> >> > Mat Object: 1 MPI processes >>>> >> > type: seqaij >>>> >> > rows=37, cols=37 >>>> >> > package used to perform factorization: petsc >>>> >> > total: nonzeros=913, allocated nonzeros=913 >>>> >> > total number of mallocs used during MatSetValues >>>> calls >>>> >> > =0 >>>> >> > not using I-node routines >>>> >> > linear system matrix = precond matrix: >>>> >> > Mat Object: 1 MPI processes >>>> >> > type: seqaij >>>> >> > rows=37, cols=37 >>>> >> > total: nonzeros=695, allocated nonzeros=695 >>>> >> > total number of mallocs used during MatSetValues calls >>>> =0 >>>> >> > not using I-node routines >>>> >> > linear system matrix = precond matrix: >>>> >> > Mat Object: 384 MPI processes >>>> >> > type: mpiaij >>>> >> > rows=18145, cols=18145 >>>> >> > total: nonzeros=1709115, allocated nonzeros=1709115 >>>> >> > total number of mallocs used during MatSetValues calls =0 >>>> >> > not using I-node (on process 0) routines >>>> >> > Down solver (pre-smoother) on level 1 >>>> >> > ------------------------------- >>>> >> 
> KSP Object: (mg_levels_1_) 384 MPI processes >>>> >> > type: chebyshev >>>> >> > Chebyshev: eigenvalue estimates: min = 0.133339, max = >>>> >> > 1.46673 >>>> >> > Chebyshev: eigenvalues estimated using gmres with >>>> translations >>>> >> > [0. 0.1; 0. 1.1] >>>> >> > KSP Object: (mg_levels_1_esteig_) 384 >>>> MPI >>>> >> > processes >>>> >> > type: gmres >>>> >> > GMRES: restart=30, using Classical (unmodified) >>>> >> > Gram-Schmidt Orthogonalization with no iterative refinement >>>> >> > GMRES: happy breakdown tolerance 1e-30 >>>> >> > maximum iterations=10, initial guess is zero >>>> >> > tolerances: relative=1e-12, absolute=1e-50, >>>> >> > divergence=10000. >>>> >> > left preconditioning >>>> >> > using PRECONDITIONED norm type for convergence test >>>> >> > maximum iterations=2 >>>> >> > tolerances: relative=1e-05, absolute=1e-50, >>>> divergence=10000. >>>> >> > left preconditioning >>>> >> > using nonzero initial guess >>>> >> > using NONE norm type for convergence test >>>> >> > PC Object: (mg_levels_1_) 384 MPI processes >>>> >> > type: sor >>>> >> > SOR: type = local_symmetric, iterations = 1, local >>>> iterations >>>> >> > = 1, omega = 1. 
>>>> >> > linear system matrix followed by preconditioner matrix: >>>> >> > Mat Object: 384 MPI processes >>>> >> > type: mffd >>>> >> > rows=3020875, cols=3020875 >>>> >> > Matrix-free approximation: >>>> >> > err=1.49012e-08 (relative error in function >>>> evaluation) >>>> >> > Using wp compute h routine >>>> >> > Does not compute normU >>>> >> > Mat Object: () 384 MPI processes >>>> >> > type: mpiaij >>>> >> > rows=3020875, cols=3020875 >>>> >> > total: nonzeros=215671710, allocated nonzeros=241731750 >>>> >> > total number of mallocs used during MatSetValues calls =0 >>>> >> > not using I-node (on process 0) routines >>>> >> > Up solver (post-smoother) same as down solver (pre-smoother) >>>> >> > linear system matrix followed by preconditioner matrix: >>>> >> > Mat Object: 384 MPI processes >>>> >> > type: mffd >>>> >> > rows=3020875, cols=3020875 >>>> >> > Matrix-free approximation: >>>> >> > err=1.49012e-08 (relative error in function evaluation) >>>> >> > Using wp compute h routine >>>> >> > Does not compute normU >>>> >> > Mat Object: () 384 MPI processes >>>> >> > type: mpiaij >>>> >> > rows=3020875, cols=3020875 >>>> >> > total: nonzeros=215671710, allocated nonzeros=241731750 >>>> >> > total number of mallocs used during MatSetValues calls =0 >>>> >> > not using I-node (on process 0) routines >>>> >> > >>>> >> > >>>> >> > Fande, >>>> >> > >>>> >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams >>>> wrote: >>>> >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith >>>> wrote: >>>> >> > > >>>> >> > >> Does this mean that GAMG works for the symmetrical matrix only? >>>> >> > > >>>> >> > > No, it means that for non symmetric nonzero structure you need >>>> the >>>> >> > > extra flag. So use the extra flag. The reason we don't always >>>> use the flag >>>> >> > > is because it adds extra cost and isn't needed if the matrix >>>> already has a >>>> >> > > symmetric nonzero structure. 
>>>> >> > >>>> >> > BTW, if you have symmetric non-zero structure you can just set >>>> >> > -pc_gamg_threshold -1.0', note the "or" in the message. >>>> >> > >>>> >> > If you want to mess with the threshold then you need to use the >>>> >> > symmetrized flag. >>>> >> > >>>> >> >>>> > >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Apr 13 10:14:30 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 13 Apr 2017 11:14:30 -0400 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> Message-ID: On Wed, Apr 12, 2017 at 7:04 PM, Kong, Fande wrote: > > > On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams wrote: > >> You seem to have two levels here and 3M eqs on the fine grid and 37 on >> the coarse grid. I don't understand that. >> >> You are also calling the AMG setup a lot, but not spending much time >> in it. Try running with -info and grep on "GAMG". >> > > I got the following output: > > [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1, > nnz/row (ave)=71, np=384 > [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold > 0., 73.6364 nnz ave. (N=3020875) > [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square > [0] PCGAMGProlongator_AGG(): New grid 18162 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978702e+00 > min=2.559747e-02 PC=jacobi > [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384, > neq(loc)=40 > [0] PCSetUp_GAMG(): 1) N=18162, n data cols=1, nnz/row (ave)=94, 384 > active pes > [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00795 > [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1, > nnz/row (ave)=71, np=384 > [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold > 0., 73.6364 nnz ave. 
(N=3020875) > [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square > [0] PCGAMGProlongator_AGG(): New grid 18145 nodes > [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978584e+00 > min=2.557887e-02 PC=jacobi > [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384, > neq(loc)=37 > [0] PCSetUp_GAMG(): 1) N=18145, n data cols=1, nnz/row (ave)=94, 384 > active pes > You are still doing two levels. Just use the parameters that I told you and you should see that 1) this coarsest (last) grid has "1 active pes" and 2) the overall solve time and overall convergence rate is much better. > [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00792 > GAMG specific options > PCGAMGGraph_AGG 40 1.0 8.0759e+00 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 > 7.6e+02 2 0 2 4 2 2 0 2 4 2 1170 > PCGAMGCoarse_AGG 40 1.0 7.1698e+01 1.0 4.05e+09 2.3 4.0e+06 5.1e+04 > 1.2e+03 18 37 5 27 3 18 37 5 27 3 14632 > PCGAMGProl_AGG 40 1.0 9.2650e-01 1.2 0.00e+00 0.0 9.8e+05 2.9e+03 > 9.6e+02 0 0 1 0 2 0 0 1 0 2 0 > PCGAMGPOpt_AGG 40 1.0 2.4484e+00 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 > 1.9e+03 1 4 4 1 4 1 4 4 1 4 51328 > GAMG: createProl 40 1.0 8.3786e+01 1.0 4.56e+09 2.3 9.6e+06 2.5e+04 > 4.8e+03 21 42 12 32 10 21 42 12 32 10 14134 > GAMG: partLevel 40 1.0 6.7755e+00 1.1 2.59e+08 2.3 2.9e+06 2.5e+03 > 1.5e+03 2 2 4 1 3 2 2 4 1 3 9431 > > > > > > > > >> >> >> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande wrote: >> > Thanks, Barry. >> > >> > It works. >> > >> > GAMG is three times better than ASM in terms of the number of linear >> > iterations, but it is five times slower than ASM. Any suggestions to >> improve >> > the performance of GAMG? Log files are attached. 
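The "grid complexity = 1.00792" figure in the -info output above can be reproduced from the nonzero counts that -ksp_view reports for the two operators: it is the total number of nonzeros over all levels divided by the nonzeros on the finest level.

```python
# Nonzero counts from the -ksp_view output in this thread.
fine_nnz = 215671710    # fine-grid mpiaij matrix (3020875 rows)
coarse_nnz = 1709115    # coarse-grid mpiaij matrix (18145 rows)

complexity = (fine_nnz + coarse_nnz) / fine_nnz
print(f"{complexity:.5f}")  # 1.00792
```

A complexity barely above 1 confirms the coarse grid adds almost no storage or work per V-cycle; the cost problem in this thread lies elsewhere (the matrix-free fine-grid operator and the parallel coarse solve).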
>> > >> > Fande, >> > >> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith wrote: >> >> >> >> >> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: >> >> > >> >> > Thanks, Mark and Barry, >> >> > >> >> > It works pretty wells in terms of the number of linear iterations >> (using >> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. >> I am >> >> > using the two-level method via "-pc_mg_levels 2". The reason why the >> compute >> >> > time is larger than other preconditioning options is that a matrix >> free >> >> > method is used in the fine level and in my particular problem the >> function >> >> > evaluation is expensive. >> >> > >> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, >> >> > but I do not think I want to make the preconditioning part >> matrix-free. Do >> >> > you guys know how to turn off the matrix-free method for GAMG? >> >> >> >> -pc_use_amat false >> >> >> >> > >> >> > Here is the detailed solver: >> >> > >> >> > SNES Object: 384 MPI processes >> >> > type: newtonls >> >> > maximum iterations=200, maximum function evaluations=10000 >> >> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 >> >> > total number of linear solver iterations=20 >> >> > total number of function evaluations=166 >> >> > norm schedule ALWAYS >> >> > SNESLineSearch Object: 384 MPI processes >> >> > type: bt >> >> > interpolation: cubic >> >> > alpha=1.000000e-04 >> >> > maxstep=1.000000e+08, minlambda=1.000000e-12 >> >> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, >> >> > lambda=1.000000e-08 >> >> > maximum iterations=40 >> >> > KSP Object: 384 MPI processes >> >> > type: gmres >> >> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt >> >> > Orthogonalization with no iterative refinement >> >> > GMRES: happy breakdown tolerance 1e-30 >> >> > maximum iterations=100, initial guess is zero >> >> > tolerances: relative=0.001, absolute=1e-50, divergence=10000. 
>> >> > right preconditioning >> >> > using UNPRECONDITIONED norm type for convergence test >> >> > PC Object: 384 MPI processes >> >> > type: gamg >> >> > MG: type is MULTIPLICATIVE, levels=2 cycles=v >> >> > Cycles per PCApply=1 >> >> > Using Galerkin computed coarse grid matrices >> >> > GAMG specific options >> >> > Threshold for dropping small values from graph 0. >> >> > AGG specific options >> >> > Symmetric graph true >> >> > Coarse grid solver -- level ------------------------------- >> >> > KSP Object: (mg_coarse_) 384 MPI processes >> >> > type: preonly >> >> > maximum iterations=10000, initial guess is zero >> >> > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> >> > left preconditioning >> >> > using NONE norm type for convergence test >> >> > PC Object: (mg_coarse_) 384 MPI processes >> >> > type: bjacobi >> >> > block Jacobi: number of blocks = 384 >> >> > Local solve is same for all blocks, in the following KSP >> and >> >> > PC objects: >> >> > KSP Object: (mg_coarse_sub_) 1 MPI processes >> >> > type: preonly >> >> > maximum iterations=1, initial guess is zero >> >> > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. 
>> >> > left preconditioning >> >> > using NONE norm type for convergence test >> >> > PC Object: (mg_coarse_sub_) 1 MPI processes >> >> > type: lu >> >> > LU: out-of-place factorization >> >> > tolerance for zero pivot 2.22045e-14 >> >> > using diagonal shift on blocks to prevent zero pivot >> >> > [INBLOCKS] >> >> > matrix ordering: nd >> >> > factor fill ratio given 5., needed 1.31367 >> >> > Factored matrix follows: >> >> > Mat Object: 1 MPI processes >> >> > type: seqaij >> >> > rows=37, cols=37 >> >> > package used to perform factorization: petsc >> >> > total: nonzeros=913, allocated nonzeros=913 >> >> > total number of mallocs used during MatSetValues >> calls >> >> > =0 >> >> > not using I-node routines >> >> > linear system matrix = precond matrix: >> >> > Mat Object: 1 MPI processes >> >> > type: seqaij >> >> > rows=37, cols=37 >> >> > total: nonzeros=695, allocated nonzeros=695 >> >> > total number of mallocs used during MatSetValues calls =0 >> >> > not using I-node routines >> >> > linear system matrix = precond matrix: >> >> > Mat Object: 384 MPI processes >> >> > type: mpiaij >> >> > rows=18145, cols=18145 >> >> > total: nonzeros=1709115, allocated nonzeros=1709115 >> >> > total number of mallocs used during MatSetValues calls =0 >> >> > not using I-node (on process 0) routines >> >> > Down solver (pre-smoother) on level 1 >> >> > ------------------------------- >> >> > KSP Object: (mg_levels_1_) 384 MPI processes >> >> > type: chebyshev >> >> > Chebyshev: eigenvalue estimates: min = 0.133339, max = >> >> > 1.46673 >> >> > Chebyshev: eigenvalues estimated using gmres with >> translations >> >> > [0. 0.1; 0. 
1.1] >> >> > KSP Object: (mg_levels_1_esteig_) 384 >> MPI >> >> > processes >> >> > type: gmres >> >> > GMRES: restart=30, using Classical (unmodified) >> >> > Gram-Schmidt Orthogonalization with no iterative refinement >> >> > GMRES: happy breakdown tolerance 1e-30 >> >> > maximum iterations=10, initial guess is zero >> >> > tolerances: relative=1e-12, absolute=1e-50, >> >> > divergence=10000. >> >> > left preconditioning >> >> > using PRECONDITIONED norm type for convergence test >> >> > maximum iterations=2 >> >> > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> >> > left preconditioning >> >> > using nonzero initial guess >> >> > using NONE norm type for convergence test >> >> > PC Object: (mg_levels_1_) 384 MPI processes >> >> > type: sor >> >> > SOR: type = local_symmetric, iterations = 1, local >> iterations >> >> > = 1, omega = 1. >> >> > linear system matrix followed by preconditioner matrix: >> >> > Mat Object: 384 MPI processes >> >> > type: mffd >> >> > rows=3020875, cols=3020875 >> >> > Matrix-free approximation: >> >> > err=1.49012e-08 (relative error in function evaluation) >> >> > Using wp compute h routine >> >> > Does not compute normU >> >> > Mat Object: () 384 MPI processes >> >> > type: mpiaij >> >> > rows=3020875, cols=3020875 >> >> > total: nonzeros=215671710, allocated nonzeros=241731750 >> >> > total number of mallocs used during MatSetValues calls =0 >> >> > not using I-node (on process 0) routines >> >> > Up solver (post-smoother) same as down solver (pre-smoother) >> >> > linear system matrix followed by preconditioner matrix: >> >> > Mat Object: 384 MPI processes >> >> > type: mffd >> >> > rows=3020875, cols=3020875 >> >> > Matrix-free approximation: >> >> > err=1.49012e-08 (relative error in function evaluation) >> >> > Using wp compute h routine >> >> > Does not compute normU >> >> > Mat Object: () 384 MPI processes >> >> > type: mpiaij >> >> > rows=3020875, cols=3020875 >> >> > total: nonzeros=215671710, allocated 
nonzeros=241731750
>> >> > total number of mallocs used during MatSetValues calls =0
>> >> > not using I-node (on process 0) routines
>> >> >
>> >> > Fande,
>> >> >
>> >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote:
>> >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith wrote:
>> >> > >> Does this mean that GAMG works for the symmetrical matrix only?
>> >> > > No, it means that for a non-symmetric nonzero structure you need the extra flag. So use the extra flag. The reason we don't always use the flag is that it adds extra cost and isn't needed if the matrix already has a symmetric nonzero structure.
>> >> > BTW, if you have a symmetric non-zero structure you can just set -pc_gamg_threshold -1.0; note the "or" in the message.
>> >> > If you want to mess with the threshold then you need to use the symmetrized flag.
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From ibarletta at inogs.it Thu Apr 13 10:56:07 2017
From: ibarletta at inogs.it (Barletta, Ivano)
Date: Thu, 13 Apr 2017 17:56:07 +0200
Subject: [petsc-users] specifying vertex coordinates using DMPlexCreateFromCellListParallel
In-Reply-To: References: Message-ID:
I'm interested in this topic as well
Thanks
Ivano

2017-04-11 16:21 GMT+02:00 Hassan Raiesi :
> Hello,
>
> I'm trying to use DMPlexCreateFromCellListParallel to create a DM from an already partitioned mesh.
> It requires an array of numVertices*spaceDim numbers, but how should one order the coordinates of the vertices?
> We only pass the global vertex numbers using "const int cells[]" to define the cell connectivity, so passing the vertex coordinates in local ordering wouldn't make sense.
> If it needs to be in global ordering, should I sort the global index of the node numbers owned by each rank (as they won't be continuous).
> Thank you
>
> Hassan Raiesi,
> Bombardier Aerospace
> www.bombardier.com
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From lailaizhu00 at gmail.com Thu Apr 13 20:29:05 2017
From: lailaizhu00 at gmail.com (Lailai Zhu)
Date: Thu, 13 Apr 2017 21:29:05 -0400
Subject: [petsc-users] including petsc Mat and Vec in a user-defined structure in FORTRAN
Message-ID:
Dear petsc developers and users,

I am currently using the Fortran version of PETSc 3.7.*. I tried to define Mat or Vec variables in a user-defined structure like below:

module myMOD
  type, public :: myStr
    Mat A
    Vec x,b
  end type myStr

  interface myStr !! user-defined constructor
    module procedure new_Str
  end interface myStr

contains
  function new_Str()
    type(myStr) :: new_Str
    call VecCreate(petsc_comm_self,10,new_str%x,ierr)
    call vecgetsize(new_str%x, size, ierr)
  end function new_Str

end module myMOD

then I define an instance of myStr in another file like below:

type(myStr),save :: mystr1
mystr1 = myStr()

It compiles and the VecCreate executes without problem; however, an error occurs in the VecGetSize part, telling me

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Null argument, when expecting valid pointer
[0]PETSC ERROR: Null Object: Parameter # 1
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017
[0]PETSC ERROR: ./nek5000 on a pet3.7.5-mpich-intel named quququ by user Thu Apr 13 21:11:05 2017
[0]PETSC ERROR: Configure options --with-c++-support --with-shared-libraries=1 --known-mpi-shared-libraries=1 --with-batch=0 --with-mpi=1 --with-debugging=1 -download-fblaslapack=1 --download-blacs=1 --download-scalapack=1 --download-plapack=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --download-dscpack=1
[0]PETSC ERROR: #1 VecGetSize() line 667 in /home/user/petsc/3.7.5/mpich_intel/src/vec/vec/interface/vector.c

It seems to me that the PETSc vector x belonging to the derived-type variable is recognized by the VecCreate subroutine but not by the VecGetSize one. Is there a way to work around this, or can one simply not use PETSc objects in such ways?
Thanks in advance,
best,
lailai

From may at bu.edu Thu Apr 13 21:13:06 2017
From: may at bu.edu (Young, Matthew, Adam)
Date: Fri, 14 Apr 2017 02:13:06 +0000
Subject: [petsc-users] Field data in KSPSetCompute[RHS/Operators]
Message-ID:
I'd like to develop a hybrid fluid/PIC code based on petsc/petsc-3.7/src/ksp/ksp/examples/tutorials/ex50.c.html, in which KSPSolve() solves an equation for the electrostatic potential at each time step. To do so, I need KSPSetComputeOperators() and KSPSetComputeRHS() to know about scalar fields (e.g. density) that I compute by gathering the particles before solving for the potential. Should I pass them via an application context, store them in a DM dof, or something else?

--Matt
----------------------------
Matthew Young
PhD Candidate
Astronomy Department
Boston University
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From bsmith at mcs.anl.gov Thu Apr 13 21:34:37 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 13 Apr 2017 21:34:37 -0500 Subject: [petsc-users] including petsc Mat and Vec in a user-defined structure in FORTRAN In-Reply-To: References: Message-ID: <8FCEDC9B-2854-48AD-AC90-F5287EDB3B0D@mcs.anl.gov> VecCreate() does not take a size argument (perhaps you mean VecCreateSeq()?) hence when you try to get the size it is confused. Barry > On Apr 13, 2017, at 8:29 PM, Lailai Zhu wrote: > > Dear petsc developers and users, > > I am currently using fortran version of petsc 3.7.*. > I tried to define Mat or Vec variables in a user-defined structure like below, > > module myMOD > type, public :: myStr > Mat A > Vec x,b > end type myStr > > interface myStr !! user-defined constructor > module procedure new_Str > end interface myStr > > contains > function new_Str() > type(myStr) :: new_Str > call VecCreate(petsc_comm_self,10,new_str%x,ierr) > call vecgetsize(new_str%x, size, ierr) > end function new_Str > > end module myMOD > > > then i define an instance of myStr in another file like below > > type(myStr),save :: mystr1 > mystr1 = myStr() > > It compiles and the veccreate executes without problem, however error occurs on the vecgetsize part, > telling me > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Null argument, when expecting valid pointer > [0]PETSC ERROR: Null Object: Parameter # 1 > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. 
> [0]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 > [0]PETSC ERROR: ./nek5000 on a pet3.7.5-mpich-intel named quququ by user Thu Apr 13 21:11:05 2017 > [0]PETSC ERROR: Configure options --with-c++-support --with-shared-libraries=1 --known-mpi-shared-libraries=1 --with-batch=0 --with-mpi=1 --with-debugging=1 -download-fblaslapack=1 --download-blacs=1 --download-scalapack=1 --download-plapack=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --download-dscpack=1 > [0]PETSC ERROR: #1 VecGetSize() line 667 in /home/user/petsc/3.7.5/mpich_intel/src/vec/vec/interface/vector.c > > It seems to me that the petsc vector x belong to the derived type variable is recognized in the veccreate subroutine, but > not known by the vecgetsize one. Is there way to work this around? or perhaps one cannot use petsc objects in such ways? > Thanks in advance, > > best, > lailai From bsmith at mcs.anl.gov Thu Apr 13 21:41:56 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 13 Apr 2017 21:41:56 -0500 Subject: [petsc-users] Field data in KSPSetCompute[RHS/Operators] In-Reply-To: References: Message-ID: <813A9E13-3950-4E85-8FF0-18F9B3F11FA0@mcs.anl.gov> Store the information in the context argument for these two functions (and use the same context argument for both, for simplicity). Barry > On Apr 13, 2017, at 9:13 PM, Young, Matthew, Adam wrote: > > I'd like to develop a hybrid fluid/PIC code based on petsc/petsc-3.7/src/ksp/ksp/examples/tutorials/ex50.c.html, in which KSPSolve() solves an equation for the electrostatic potential at each time step. To do so, I need KSPSetComputeOperators() and KSPSetComputeRHS() to know about scalar fields (e.g. density) that I compute by gathering the particles before solving for the potential. Should I pass them via an application context, store them in a DM dof, or something else? 
> > --Matt
> > ----------------------------
> > Matthew Young
> > PhD Candidate
> > Astronomy Department
> > Boston University

From ingogaertner.tus at gmail.com Fri Apr 14 03:55:55 2017
From: ingogaertner.tus at gmail.com (Ingo Gaertner)
Date: Fri, 14 Apr 2017 10:55:55 +0200
Subject: [petsc-users] Off-diagonal matrix-vector product y=(A-diag(A))x
Message-ID:
Does PETSc include an efficient implementation for the operation y=(A-diag(A))x, or y_i=\sum_{j!=i}A_{ij}x_j, on a sparse matrix A?

In words, I need a matrix-vector product after the matrix diagonal has been set to zero. For efficiency reasons I can't copy and modify the matrix, or first calculate the full product and then subtract the diagonal contribution.

Thanks
Ingo
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From knepley at gmail.com Fri Apr 14 04:00:31 2017
From: knepley at gmail.com (Matthew Knepley)
Date: Fri, 14 Apr 2017 04:00:31 -0500
Subject: [petsc-users] dmplex face normals orientation
In-Reply-To: References: Message-ID:
On Wed, Apr 12, 2017 at 10:52 AM, Ingo Gaertner wrote:
> Hello,
> I have problems determining the orientation of the face normals of a DMPlex.
> I create a DMPlex, for example with DMPlexCreateHexBoxMesh(). Next, I get the face normals using DMPlexComputeGeometryFVM(DM dm, Vec *cellgeom, Vec *facegeom). facegeom gives the correct normals, but I don't know how the inside/outside is defined with respect to the adjacent cells.

The normal should be outward, unless the face has orientation < 0.

> Finally, I iterate over all cells. For each cell I iterate over the bounding faces (obtained from DMPlexGetCone) and try to obtain their orientation with respect to the current cell using DMPlexGetConeOrientation(). However, the six integers for the orientation are the same for each cell.
I expect them to flip between neighbour cells, > because if a face normal is pointing outside for any cell, the same normal > is pointing inside for its neighbour. Apparently I have a misunderstanding > here. > I see the orientations changing sign for adjacent cells. Want to send a simple code? You should see this for examples. You can run SNES ex12 with -dm_view ::ascii_info_detail to see the change in sign. Thanks, Matt > How can I make use of the face normals in facegeom and the orientation > values from DMPlexGetConeOrientation() to get the outside face normals for > each cell? > > Thank you > Ingo > > > Virenfrei. > www.avast.com > > <#m_-6432344443275881332_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From ingogaertner.tus at gmail.com Fri Apr 14 04:28:12 2017 From: ingogaertner.tus at gmail.com (Ingo Gaertner) Date: Fri, 14 Apr 2017 11:28:12 +0200 Subject: [petsc-users] dmplex face normals orientation In-Reply-To: References: Message-ID: Thank you, Matt, as you say, the face orientations do change sign when switching between the two adjacent cells. (I confused my program output. You are right.) But it seems not to be always correct to keep the normal direction for orientation>=0 and invert it for orientation<0: I include sample code below to make my question more clear. 
The program creates a HexBoxMesh split horizontally in two cells (at x=0.5). It produces this output:

"Face centroids (c) and normals (n):
face #008 c=(0.250000 0.000000 0.000000) n=(-0.000000 -0.500000 0.000000)
face #009 c=(0.750000 0.000000 0.000000) n=(-0.000000 -0.500000 0.000000)
face #010 c=(0.250000 1.000000 0.000000) n=(0.000000 0.500000 0.000000)
face #011 c=(0.750000 1.000000 0.000000) n=(0.000000 0.500000 0.000000)
face #012 c=(0.000000 0.500000 0.000000) n=(-1.000000 -0.000000 0.000000)
face #013 c=(0.500000 0.500000 0.000000) n=(1.000000 0.000000 0.000000)
face #014 c=(1.000000 0.500000 0.000000) n=(1.000000 0.000000 0.000000)
Cell faces orientations:
cell #0, faces:[8 13 10 12] orientations:[0 0 -2 -2]
cell #1, faces:[9 14 11 13] orientations:[0 0 -2 -2]"

Looking at the face normals, all boundary normals point outside (good). The normal of face #013 points outside with respect to the left cell #0, but inside w.r.t. the right cell #1. Face 13 is shared between both cells. It has orientation 0 for cell #0, but orientation -2 for cell #1 (good). What I don't understand is the orientation of face 12 (cell 0) and of face 11 (cell 1). These are negative, which would make them point into the cell. Have I done some other stupid mistake?
Thanks
Ingo

Here is the code:

static char help[] = "Check face normals orientations.\n\n";

#include
#undef __FUNCT__
#define __FUNCT__ "main"
int main(int argc, char **argv)
{
  DM dm;
  PetscErrorCode ierr;
  PetscFVCellGeom *cgeom;
  PetscFVFaceGeom *fgeom;
  Vec cellgeom,facegeom;
  int dim=2;
  int cells[]={2,1};
  int cStart,cEnd,fStart,fEnd;
  int coneSize,supportSize;
  const int *cone,*coneOrientation;

  ierr = PetscInitialize(&argc, &argv, NULL, help);CHKERRQ(ierr);
  ierr = PetscOptionsGetInt(NULL, NULL, "-dim", &dim, NULL);CHKERRQ(ierr);
  ierr = DMPlexCreateHexBoxMesh(PETSC_COMM_WORLD, dim, cells,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE, &dm);CHKERRQ(ierr);
  ierr = DMPlexComputeGeometryFVM(dm, &cellgeom,&facegeom);CHKERRQ(ierr);
  ierr = VecGetArray(cellgeom, (PetscScalar**)&cgeom);CHKERRQ(ierr);
  ierr = VecGetArray(facegeom, (PetscScalar**)&fgeom);CHKERRQ(ierr);
  ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr);
  ierr = DMPlexGetHeightStratum(dm, 1, &fStart, &fEnd);CHKERRQ(ierr);
  fprintf(stderr,"Face centroids (c) and normals(n):\n");
  for (int f=fStart;f: > On Wed, Apr 12, 2017 at 10:52 AM, Ingo Gaertner < > ingogaertner.tus at gmail.com> wrote:
>> Hello,
>> I have problems determining the orientation of the face normals of a DMPlex.
>> I create a DMPlex, for example with DMPlexCreateHexBoxMesh(). Next, I get the face normals using DMPlexComputeGeometryFVM(DM dm, Vec *cellgeom, Vec *facegeom). facegeom gives the correct normals, but I don't know how the inside/outside is defined with respect to the adjacent cells.
>
> The normal should be outward, unless the face has orientation < 0.
>
>> Finally, I iterate over all cells. For each cell I iterate over the bounding faces (obtained from DMPlexGetCone) and try to obtain their orientation with respect to the current cell using DMPlexGetConeOrientation(). However, the six integers for the orientation are the same for each cell.
I expect them to flip between neighbour cells, >> because if a face normal is pointing outside for any cell, the same normal >> is pointing inside for its neighbour. Apparently I have a misunderstanding >> here. >> > > I see the orientations changing sign for adjacent cells. Want to send a > simple code? You should see this > for examples. You can run SNES ex12 with -dm_view ::ascii_info_detail to > see the change in sign. > > Thanks, > > Matt > > >> How can I make use of the face normals in facegeom and the orientation >> values from DMPlexGetConeOrientation() to get the outside face normals for >> each cell? >> >> Thank you >> Ingo >> >> >> Virenfrei. >> www.avast.com >> >> <#m_-1897645241821352774_m_-6432344443275881332_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Apr 14 09:50:10 2017 From: jed at jedbrown.org (Jed Brown) Date: Fri, 14 Apr 2017 08:50:10 -0600 Subject: [petsc-users] Off-diagonal matrix-vector product y=(A-diag(A))x In-Reply-To: References: Message-ID: <87d1cft55p.fsf@jedbrown.org> Ingo Gaertner writes: > Does PETSc include an efficient implementation for the operation > y=(A-diag(A))x or y_i=\sum_{j!=i}A_{ij}x_j on a sparse matrix A? > > In words, I need a matrix-vector product after the matrix diagonal has been > set to zero. For efficiency reasons I can't copy and modify the matrix or > first calculate the full product and then subtract the diagonal > contribution. How many entries per row in your matrix? There isn't a special function for this, but I'm skeptical that the performance gains of a custom implementation would be significant. Do you have a profile showing that it is? 
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL:
From lailaizhu00 at gmail.com Fri Apr 14 11:24:56 2017
From: lailaizhu00 at gmail.com (Lailai Zhu)
Date: Fri, 14 Apr 2017 12:24:56 -0400
Subject: [petsc-users] including petsc Mat and Vec in a user-defined structure in FORTRAN
In-Reply-To: <8FCEDC9B-2854-48AD-AC90-F5287EDB3B0D@mcs.anl.gov>
References: <8FCEDC9B-2854-48AD-AC90-F5287EDB3B0D@mcs.anl.gov>
Message-ID: <85f49a9c-c940-6aa3-09e3-54a49f7fff5b@gmail.com>
Thanks, Barry, indeed this is due to this naive mistake. Thanks again.
best,
lailai

On 04/13/2017 10:34 PM, Barry Smith wrote:
> VecCreate() does not take a size argument (perhaps you mean VecCreateSeq()?), hence when you try to get the size it is confused.
>
> Barry
>
>> On Apr 13, 2017, at 8:29 PM, Lailai Zhu wrote:
>>
>> Dear petsc developers and users,
>>
>> I am currently using fortran version of petsc 3.7.*.
>> I tried to define Mat or Vec variables in a user-defined structure like below,
>>
>> module myMOD
>> type, public :: myStr
>> Mat A
>> Vec x,b
>> end type myStr
>>
>> interface myStr !!
user-defined constructor >> module procedure new_Str >> end interface myStr >> >> contains >> function new_Str() >> type(myStr) :: new_Str >> call VecCreate(petsc_comm_self,10,new_str%x,ierr) >> call vecgetsize(new_str%x, size, ierr) >> end function new_Str >> >> end module myMOD >> >> >> then i define an instance of myStr in another file like below >> >> type(myStr),save :: mystr1 >> mystr1 = myStr() >> >> It compiles and the veccreate executes without problem, however error occurs on the vecgetsize part, >> telling me >> >> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> [0]PETSC ERROR: Null argument, when expecting valid pointer >> [0]PETSC ERROR: Null Object: Parameter # 1 >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> [0]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017 >> [0]PETSC ERROR: ./nek5000 on a pet3.7.5-mpich-intel named quququ by user Thu Apr 13 21:11:05 2017 >> [0]PETSC ERROR: Configure options --with-c++-support --with-shared-libraries=1 --known-mpi-shared-libraries=1 --with-batch=0 --with-mpi=1 --with-debugging=1 -download-fblaslapack=1 --download-blacs=1 --download-scalapack=1 --download-plapack=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort --download-dscpack=1 >> [0]PETSC ERROR: #1 VecGetSize() line 667 in /home/user/petsc/3.7.5/mpich_intel/src/vec/vec/interface/vector.c >> >> It seems to me that the petsc vector x belong to the derived type variable is recognized in the veccreate subroutine, but >> not known by the vecgetsize one. Is there way to work this around? or perhaps one cannot use petsc objects in such ways? 
>> Thanks in advance,
>>
>> best,
>> lailai

From bsmith at mcs.anl.gov Fri Apr 14 11:37:15 2017
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Fri, 14 Apr 2017 11:37:15 -0500
Subject: [petsc-users] Off-diagonal matrix-vector product y=(A-diag(A))x
In-Reply-To: References: Message-ID: <4C503D16-85FF-48DF-BB99-8FC349F9E81F@mcs.anl.gov>
You can do MatGetDiagonal(), then do a MatMult() followed by subtracting a VecPointwiseMult(). Sure, it is extra work, but likely it is good enough. Only if you run profiling on your full application and find substantial time in these two ops should you write a custom multiply, by taking MatMult_SeqAIJ, copying it, and modifying it.

Barry

> On Apr 14, 2017, at 3:55 AM, Ingo Gaertner wrote:
>
> Does PETSc include an efficient implementation for the operation y=(A-diag(A))x or y_i=\sum_{j!=i}A_{ij}x_j on a sparse matrix A?
>
> In words, I need a matrix-vector product after the matrix diagonal has been set to zero. For efficiency reasons I can't copy and modify the matrix or first calculate the full product and then subtract the diagonal contribution.
>
> Thanks
> Ingo

From jed at jedbrown.org Fri Apr 14 12:38:18 2017
From: jed at jedbrown.org (Jed Brown)
Date: Fri, 14 Apr 2017 11:38:18 -0600
Subject: [petsc-users] Off-diagonal matrix-vector product y=(A-diag(A))x
In-Reply-To: <4C503D16-85FF-48DF-BB99-8FC349F9E81F@mcs.anl.gov>
References: <4C503D16-85FF-48DF-BB99-8FC349F9E81F@mcs.anl.gov>
Message-ID: <87shlasxdh.fsf@jedbrown.org>
Barry Smith writes:
> You can do MatGetDiagonal(), then do a MatMult() followed by subtracting a VecPointwiseMult().

I would use VecPointwiseMult followed by MatMultAdd. One fewer traversal of a vector.
-------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From ingogaertner.tus at gmail.com Fri Apr 14 15:38:48 2017 From: ingogaertner.tus at gmail.com (Ingo Gaertner) Date: Fri, 14 Apr 2017 22:38:48 +0200 Subject: [petsc-users] Off-diagonal matrix-vector product y=(A-diag(A))x In-Reply-To: <87d1cft55p.fsf@jedbrown.org> References: <87d1cft55p.fsf@jedbrown.org> Message-ID: I have 2*ndim+1 entries per row (including the diagonal). In 2 dimensions, the suggested solution is 6 multiplications + 5 additions = 11 flops per row. The optimized solution is 4 multiplications + 3 additions = 7 flops per row. I call this significant. Thanks Ingo Virenfrei. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> 2017-04-14 16:50 GMT+02:00 Jed Brown : > Ingo Gaertner writes: > > > Does PETSc include an efficient implementation for the operation > > y=(A-diag(A))x or y_i=\sum_{j!=i}A_{ij}x_j on a sparse matrix A? > > > > In words, I need a matrix-vector product after the matrix diagonal has > been > > set to zero. For efficiency reasons I can't copy and modify the matrix or > > first calculate the full product and then subtract the diagonal > > contribution. > > How many entries per row in your matrix? There isn't a special function > for this, but I'm skeptical that the performance gains of a custom > implementation would be significant. Do you have a profile showing that > it is? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Fri Apr 14 16:57:38 2017 From: jed at jedbrown.org (Jed Brown) Date: Fri, 14 Apr 2017 15:57:38 -0600 Subject: [petsc-users] Off-diagonal matrix-vector product y=(A-diag(A))x In-Reply-To: References: <87d1cft55p.fsf@jedbrown.org> Message-ID: <8737dasld9.fsf@jedbrown.org> Ingo Gaertner writes: > I have 2*ndim+1 entries per row (including the diagonal). > In 2 dimensions, the suggested solution is 6 multiplications + 5 additions > = 11 flops per row. 
> The optimized solution is 4 multiplications + 3 additions = 7 flops per row. Run it. Flops are not the performance limiting factor for these operations. Your algorithm still needs to traverse the matrix, which for 2D is 5*sizeof(PetscScalar)+6*sizeof(PetscInt) = 64 bytes per dof. It almost certainly does not cost less to apply the matrix than to apply it while skipping the diagonal entry. A custom implementation also will not be able to use vector-friendly matrix formats without extra masking which may indeed impact performance and in any case, would require a custom implementation for that format. I think you're wasting your time, but please run it to see. > I call this significant. > > Thanks > Ingo > > > > Virenfrei. > www.avast.com > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > > 2017-04-14 16:50 GMT+02:00 Jed Brown : > >> Ingo Gaertner writes: >> >> > Does PETSc include an efficient implementation for the operation >> > y=(A-diag(A))x or y_i=\sum_{j!=i}A_{ij}x_j on a sparse matrix A? >> > >> > In words, I need a matrix-vector product after the matrix diagonal has >> been >> > set to zero. For efficiency reasons I can't copy and modify the matrix or >> > first calculate the full product and then subtract the diagonal >> > contribution. >> >> How many entries per row in your matrix? There isn't a special function >> for this, but I'm skeptical that the performance gains of a custom >> implementation would be significant. Do you have a profile showing that >> it is? >> -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL:
From orxan.shibli at gmail.com Sat Apr 15 12:13:46 2017
From: orxan.shibli at gmail.com (Orxan Shibliyev)
Date: Sat, 15 Apr 2017 20:13:46 +0300
Subject: [petsc-users] Partitioning does not work
Message-ID:
I modified ex11.c: Tests MatMeshToDual() in order to partition the unstructured grid provided in the PETSc documentation, page 71. The resulting code is as follows, but MatView() does not print the entries of the matrix dual, whereas in the original example it does. Why?

static char help[] = "Tests MatMeshToDual()\n\n";

/*T
   Concepts: Mat^mesh partitioning
   Processors: n
T*/

/*
  Include "petscmat.h" so that we can use matrices. automatically includes:
  petscsys.h - base PETSc routines   petscvec.h - vectors
  petscmat.h - matrices
  petscis.h - index sets   petscviewer.h - viewers
*/
#include

#undef __FUNCT__
#define __FUNCT__ "main"
int main(int argc,char **args)
{
  Mat             mesh,dual;
  PetscErrorCode  ierr;
  PetscInt        Nvertices = 4;  /* total number of vertices */
  PetscInt        ncells    = 2;  /* number of cells on this process */
  PetscInt        *ii,*jj;
  PetscMPIInt     size,rank;
  MatPartitioning part;
  IS              is;

  PetscInitialize(&argc,&args,(char*)0,help);
  ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);CHKERRQ(ierr);
  if (size != 2) SETERRQ(PETSC_COMM_WORLD,PETSC_ERR_SUP,"This example is for exactly two processes");
  ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);CHKERRQ(ierr);

  ierr = PetscMalloc1(3,&ii);CHKERRQ(ierr);
  ierr = PetscMalloc1(3,&jj);CHKERRQ(ierr);
  if (rank == 0) {
    ii[0] = 0; ii[1] = 2; ii[2] = 3;
    jj[0] = 2; jj[1] = 3; jj[2] = 3;
  } else {
    ii[0] = 0; ii[1] = 1; ii[2] = 3;
    jj[0] = 0; jj[1] = 0; jj[2] = 1;
  }
  ierr = MatCreateMPIAdj(MPI_COMM_WORLD,ncells,Nvertices,ii,jj,NULL,&mesh);CHKERRQ(ierr);
  ierr = MatMeshToCellGraph(mesh,2,&dual);CHKERRQ(ierr);
  ierr = MatView(dual,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr);
  ierr = MatPartitioningSetAdjacency(part,dual);CHKERRQ(ierr);
  ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr);
  ierr = MatPartitioningApply(part,&is);CHKERRQ(ierr);
  ierr = ISView(is,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  ierr = ISDestroy(&is);CHKERRQ(ierr);
  ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr);

  ierr = MatDestroy(&mesh);CHKERRQ(ierr);
  ierr = MatDestroy(&dual);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From bsmith at mcs.anl.gov Sat Apr 15 12:22:32 2017
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Sat, 15 Apr 2017 12:22:32 -0500
Subject: [petsc-users] Partitioning does not work
In-Reply-To: References: Message-ID:
> On Apr 15, 2017, at 12:13 PM, Orxan Shibliyev wrote:
>
> I modified ex11.c:

How did you modify it, what exactly did you change?

> Tests MatMeshToDual() in order to partition the unstructured grid provided in Petsc documentation, page 71. The resulting code is as follows but MatView() does not print entries of matrix, dual whereas in the original example it does.

What does MatView() show instead? How is the output different? Send as attachments the "modified" example and the output from both.

> Why?
>
> > static char help[] = "Tests MatMeshToDual()\n\n";
> > /*T
> > Concepts: Mat^mesh partitioning
> > Processors: n
> > T*/
> > /*
> > Include "petscmat.h" so that we can use matrices.
> automatically includes: > petscsys.h - base PETSc routines petscvec.h - vectors > petscmat.h - matrices > petscis.h - index sets petscviewer.h - viewers > */ > #include > > #undef __FUNCT__ > #define __FUNCT__ "main" > int main(int argc,char **args) > { > Mat mesh,dual; > PetscErrorCode ierr; > PetscInt Nvertices = 4; /* total number of vertices */ > PetscInt ncells = 2; /* number cells on this process */ > PetscInt *ii,*jj; > PetscMPIInt size,rank; > MatPartitioning part; > IS is; > > PetscInitialize(&argc,&args,(char*)0,help); > ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);CHKERRQ(ierr); > if (size != 2) SETERRQ(PETSC_COMM_WORLD,PETSC_ERR_SUP,"This example is for exactly two processes"); > ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);CHKERRQ(ierr); > > ierr = PetscMalloc1(3,&ii);CHKERRQ(ierr); > ierr = PetscMalloc1(3,&jj);CHKERRQ(ierr); > if (rank == 0) { > ii[0] = 0; ii[1] = 2; ii[2] = 3; > jj[0] = 2; jj[1] = 3; jj[2] = 3; > } else { > ii[0] = 0; ii[1] = 1; ii[2] = 3; > jj[0] = 0; jj[1] = 0; jj[2] = 1; > } > ierr = MatCreateMPIAdj(MPI_COMM_WORLD,ncells,Nvertices,ii,jj,NULL,&mesh);CHKERRQ(ierr); > ierr = MatMeshToCellGraph(mesh,2,&dual);CHKERRQ(ierr); > ierr = MatView(dual,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > ierr = MatPartitioningSetAdjacency(part,dual);CHKERRQ(ierr); > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > ierr = MatPartitioningApply(part,&is);CHKERRQ(ierr); > ierr = ISView(is,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > ierr = ISDestroy(&is);CHKERRQ(ierr); > ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr); > > ierr = MatDestroy(&mesh);CHKERRQ(ierr); > ierr = MatDestroy(&dual);CHKERRQ(ierr); > ierr = PetscFinalize(); > return 0; > } From bsmith at mcs.anl.gov Sat Apr 15 14:33:14 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 15 Apr 2017 14:33:14 -0500 Subject: [petsc-users] Partitioning does not work In-Reply-To: References: Message-ID: Put a MatView() 
on the mesh matrix. The mesh information in your modified version is not a list of cells of triangles or quads, and hence the conversion to the dual doesn't find anything. Each row of the mesh matrix (for a triangular mesh) needs three entries indicating the vertices of the triangle (four entries for a quad), and then the different rows need to make sense with respect to the other rows, so there are no overlapping triangles etc. In order to call MatMeshToDual() the Mat has to correspond to a real mesh; it cannot be an arbitrary graph/sparse matrix.

Barry

> On Apr 15, 2017, at 12:40 PM, Orxan Shibliyev wrote:
>
> I attached two files. One includes the original example and the output at the end of the file, while the other file includes the modified example (with the modified lines commented) and also its output at the EOF.
>
> On Sat, Apr 15, 2017 at 8:22 PM, Barry Smith wrote:
> > On Apr 15, 2017, at 12:13 PM, Orxan Shibliyev wrote:
> > > I modified ex11.c:
> > How did you modify it, what exactly did you change?
> > > Tests MatMeshToDual() in order to partition the unstructured grid provided in Petsc documentation, page 71. The resulting code is as follows but MatView() does not print entries of matrix, dual whereas in the original example it does.
> > What does MatView() show instead? How is the output different? Send as attachments the "modified" example and the output from both.
> > > Why?
> > > static char help[] = "Tests MatMeshToDual()\n\n";
> > > /*T
> > > Concepts: Mat^mesh partitioning
> > > Processors: n
> > > T*/
> > > /*
> > Include "petscmat.h" so that we can use matrices.
> > automatically includes: > > petscsys.h - base PETSc routines petscvec.h - vectors > > petscmat.h - matrices > > petscis.h - index sets petscviewer.h - viewers > > */ > > #include > > > > #undef __FUNCT__ > > #define __FUNCT__ "main" > > int main(int argc,char **args) > > { > > Mat mesh,dual; > > PetscErrorCode ierr; > > PetscInt Nvertices = 4; /* total number of vertices */ > > PetscInt ncells = 2; /* number cells on this process */ > > PetscInt *ii,*jj; > > PetscMPIInt size,rank; > > MatPartitioning part; > > IS is; > > > > PetscInitialize(&argc,&args,(char*)0,help); > > ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);CHKERRQ(ierr); > > if (size != 2) SETERRQ(PETSC_COMM_WORLD,PETSC_ERR_SUP,"This example is for exactly two processes"); > > ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);CHKERRQ(ierr); > > > > ierr = PetscMalloc1(3,&ii);CHKERRQ(ierr); > > ierr = PetscMalloc1(3,&jj);CHKERRQ(ierr); > > if (rank == 0) { > > ii[0] = 0; ii[1] = 2; ii[2] = 3; > > jj[0] = 2; jj[1] = 3; jj[2] = 3; > > } else { > > ii[0] = 0; ii[1] = 1; ii[2] = 3; > > jj[0] = 0; jj[1] = 0; jj[2] = 1; > > } > > ierr = MatCreateMPIAdj(MPI_COMM_WORLD,ncells,Nvertices,ii,jj,NULL,&mesh);CHKERRQ(ierr); > > ierr = MatMeshToCellGraph(mesh,2,&dual);CHKERRQ(ierr); > > ierr = MatView(dual,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > > > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > > ierr = MatPartitioningSetAdjacency(part,dual);CHKERRQ(ierr); > > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > > ierr = MatPartitioningApply(part,&is);CHKERRQ(ierr); > > ierr = ISView(is,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > ierr = ISDestroy(&is);CHKERRQ(ierr); > > ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr); > > > > ierr = MatDestroy(&mesh);CHKERRQ(ierr); > > ierr = MatDestroy(&dual);CHKERRQ(ierr); > > ierr = PetscFinalize(); > > return 0; > > } > > > From bsmith at mcs.anl.gov Sat Apr 15 15:30:23 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 15 Apr 2017 15:30:23 
-0500 Subject: [petsc-users] Partitioning does not work In-Reply-To: References: Message-ID: > On Apr 15, 2017, at 2:48 PM, Orxan Shibliyev wrote: > > Yes, I just realized that before your reply. I would like to ask a few more questions. > - MatCreateMPIAdj() takes "number of local rows" as an argument. That is normally what I don't know prior to partitioning. Shouldn't the number of elements assigned to each partition be determined by PETSc? After all, the numbers of elements in the partitions do not need to be equal. I have used METIS before but never provided this kind of parameter. It is whatever it is when you are providing the information. It need not be associated with a "good" partitioning; in fact, it probably is not associated with a good partitioning, otherwise you would not have to call the partitioner. For example, if you are reading the list of cells from a file, you just put an equal number of cells on each process and then do the partitioning process to determine where they "should go" to have a good partitioning. > - Do I need to migrate elements to processes myself? This is normally what I do after using METIS. Yes, this API in PETSc is only useful for telling you how to partition; it doesn't move all the data to the "correct" processes. > - Examples just stop after partitioning. After partitioning I just know to which processes the elements are assigned, and the orderings (IS objects). How can I get a matrix A to use in Ax=b? It depends: if you are using the finite element method, you need to compute the element stiffness matrix for each element and call MatSetValues() to put them into the matrix. The PETSc DMPlex object manages most of the finite element/volume process for you, so you should consider that. It handles doing the partitioning, moving the element information to the right process, and doing the finite element integrations for you. Much easier than doing it all yourself.
Barry > > > > On Sat, Apr 15, 2017 at 10:33 PM, Barry Smith wrote: > > Put a MatView() on the mesh matrix. > > The mesh information in your modified version is not a list of cells of triangles or quads and hence the conversion to dual doesn't find anything. > Each row of the mesh matrix (for trianglular mesh) needs three entries indicating the vertices of the triangles (four entries for quad) and then the different rows need to make sense with respect to the other rows, so there are no overlapping triangles etc. > > In order to call MatMeshToDual() the mat has to correspond to a real mesh it cannot be any graph/sparse matrix. > > Barry > > > > > On Apr 15, 2017, at 12:40 PM, Orxan Shibliyev wrote: > > > > I attached two files. One includes the original example and the output at the end of the file while the other file includes modified example (commented modified lines) and also its output at the EOF. > > > > On Sat, Apr 15, 2017 at 8:22 PM, Barry Smith wrote: > > > > > On Apr 15, 2017, at 12:13 PM, Orxan Shibliyev wrote: > > > > > > I modified ex11.c: > > > > How did you modify it, what exactly did you change? > > > > > Tests MatMeshToDual() in order to partition the unstructured grid provided in Petsc documentation, page 71. The resulting code is as follows but MatView() does not print entries of matrix, dual whereas in the original example it does. > > > > What does MatView() show instead? How is the output different? Send as attachments the "modified" example and the output from both. > > > > > > > > > > > Why? > > > > > > static char help[] = "Tests MatMeshToDual()\n\n"; > > > > > > /*T > > > Concepts: Mat^mesh partitioning > > > Processors: n > > > T*/ > > > > > > /* > > > Include "petscmat.h" so that we can use matrices. 
> > > automatically includes: > > > petscsys.h - base PETSc routines petscvec.h - vectors > > > petscmat.h - matrices > > > petscis.h - index sets petscviewer.h - viewers > > > */ > > > #include > > > > > > #undef __FUNCT__ > > > #define __FUNCT__ "main" > > > int main(int argc,char **args) > > > { > > > Mat mesh,dual; > > > PetscErrorCode ierr; > > > PetscInt Nvertices = 4; /* total number of vertices */ > > > PetscInt ncells = 2; /* number cells on this process */ > > > PetscInt *ii,*jj; > > > PetscMPIInt size,rank; > > > MatPartitioning part; > > > IS is; > > > > > > PetscInitialize(&argc,&args,(char*)0,help); > > > ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);CHKERRQ(ierr); > > > if (size != 2) SETERRQ(PETSC_COMM_WORLD,PETSC_ERR_SUP,"This example is for exactly two processes"); > > > ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);CHKERRQ(ierr); > > > > > > ierr = PetscMalloc1(3,&ii);CHKERRQ(ierr); > > > ierr = PetscMalloc1(3,&jj);CHKERRQ(ierr); > > > if (rank == 0) { > > > ii[0] = 0; ii[1] = 2; ii[2] = 3; > > > jj[0] = 2; jj[1] = 3; jj[2] = 3; > > > } else { > > > ii[0] = 0; ii[1] = 1; ii[2] = 3; > > > jj[0] = 0; jj[1] = 0; jj[2] = 1; > > > } > > > ierr = MatCreateMPIAdj(MPI_COMM_WORLD,ncells,Nvertices,ii,jj,NULL,&mesh);CHKERRQ(ierr); > > > ierr = MatMeshToCellGraph(mesh,2,&dual);CHKERRQ(ierr); > > > ierr = MatView(dual,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > > > > > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > > > ierr = MatPartitioningSetAdjacency(part,dual);CHKERRQ(ierr); > > > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > > > ierr = MatPartitioningApply(part,&is);CHKERRQ(ierr); > > > ierr = ISView(is,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > > ierr = ISDestroy(&is);CHKERRQ(ierr); > > > ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr); > > > > > > ierr = MatDestroy(&mesh);CHKERRQ(ierr); > > > ierr = MatDestroy(&dual);CHKERRQ(ierr); > > > ierr = PetscFinalize(); > > > return 0; > > > } > > > > > > > > From bsmith at 
mcs.anl.gov Sat Apr 15 16:31:12 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 15 Apr 2017 16:31:12 -0500 Subject: [petsc-users] Partitioning does not work In-Reply-To: References: Message-ID: <4619E702-D4E7-42C4-92DC-5BF5D4D35C11@mcs.anl.gov> The DMPlex experts can point you in the correct direction. > On Apr 15, 2017, at 4:28 PM, Orxan Shibliyev wrote: > > About partitioning of DMPlex, manual says that, > > In exactly the same way as MatPartitioning and MatOrdering, we encode the results of a partition or order in an IS. However, the graph we are dealing with now is not the adjacency graph of the problem Jacobian, but the mesh itself. > > I could not find a function related to partitioning to feed in DM object. I know how to partition with adjacency matrix but not with DM. Would you refer me to an example in which DMPlex is partitioned or to a proper function? > > On Sat, Apr 15, 2017 at 11:30 PM, Barry Smith wrote: > > > On Apr 15, 2017, at 2:48 PM, Orxan Shibliyev wrote: > > > > Yes I just realized that before your reply. I would like to ask a few more questions. > > - MatCreateMPIAdj() takes "number of local rows" as argument. Well that's what normally I don't know prior to partitioning. Should not the number of elements assigned to partitions determined by Petsc? After all the number of elements in partitions do not need to equal. I have used METIS before but never provided this kind of parameter. > > It is whatever it is when you are providing the information. It need not be associated with a "good" partitioning, in fact, it probably is not associated with a good partitioning, otherwise you would not have to call the partitioner. For example if you are reading the list of cells from a file you just put an equal number of cells on each process and then do the partitioning process to determine where they "should go" to have a good partitioning. > > > > - Do I need to migrate elements to processes myself? This is normally what I do after using METIS.
> > Yes, this API in PETSc is only useful for telling you how to partition, it doesn't move all the data to the "correct" processes. > > > ? Examples just stop after partitioning. After partitioning I just know to which processes the elements are assigned and orderings (IS objects). How can I get a matrix, A to use it in Ax=b? > > Depends, if you are using the finite element method you need to compute the element stiffness matrices for each element and call MatSetValues() to put them into the matrix. > > The PETSc DMPlex object manages most of the finite element/volume process for you so you should consider that. It handles doing the partitioning and moving the element information to the right process and doing the finite element integrations for you. Much easier than doing it all yourself. > > > Barry > > > > > > > > > On Sat, Apr 15, 2017 at 10:33 PM, Barry Smith wrote: > > > > Put a MatView() on the mesh matrix. > > > > The mesh information in your modified version is not a list of cells of triangles or quads and hence the conversion to dual doesn't find anything. > > Each row of the mesh matrix (for trianglular mesh) needs three entries indicating the vertices of the triangles (four entries for quad) and then the different rows need to make sense with respect to the other rows, so there are no overlapping triangles etc. > > > > In order to call MatMeshToDual() the mat has to correspond to a real mesh it cannot be any graph/sparse matrix. > > > > Barry > > > > > > > > > On Apr 15, 2017, at 12:40 PM, Orxan Shibliyev wrote: > > > > > > I attached two files. One includes the original example and the output at the end of the file while the other file includes modified example (commented modified lines) and also its output at the EOF. > > > > > > On Sat, Apr 15, 2017 at 8:22 PM, Barry Smith wrote: > > > > > > > On Apr 15, 2017, at 12:13 PM, Orxan Shibliyev wrote: > > > > > > > > I modified ex11.c: > > > > > > How did you modify it, what exactly did you change? 
> > > > > > > Tests MatMeshToDual() in order to partition the unstructured grid provided in Petsc documentation, page 71. The resulting code is as follows but MatView() does not print entries of matrix, dual whereas in the original example it does. > > > > > > What does MatView() show instead? How is the output different? Send as attachments the "modified" example and the output from both. > > > > > > > > > > > > > > > > Why? > > > > > > > > static char help[] = "Tests MatMeshToDual()\n\n"; > > > > > > > > /*T > > > > Concepts: Mat^mesh partitioning > > > > Processors: n > > > > T*/ > > > > > > > > /* > > > > Include "petscmat.h" so that we can use matrices. > > > > automatically includes: > > > > petscsys.h - base PETSc routines petscvec.h - vectors > > > > petscmat.h - matrices > > > > petscis.h - index sets petscviewer.h - viewers > > > > */ > > > > #include > > > > > > > > #undef __FUNCT__ > > > > #define __FUNCT__ "main" > > > > int main(int argc,char **args) > > > > { > > > > Mat mesh,dual; > > > > PetscErrorCode ierr; > > > > PetscInt Nvertices = 4; /* total number of vertices */ > > > > PetscInt ncells = 2; /* number cells on this process */ > > > > PetscInt *ii,*jj; > > > > PetscMPIInt size,rank; > > > > MatPartitioning part; > > > > IS is; > > > > > > > > PetscInitialize(&argc,&args,(char*)0,help); > > > > ierr = MPI_Comm_size(MPI_COMM_WORLD,&size);CHKERRQ(ierr); > > > > if (size != 2) SETERRQ(PETSC_COMM_WORLD,PETSC_ERR_SUP,"This example is for exactly two processes"); > > > > ierr = MPI_Comm_rank(MPI_COMM_WORLD,&rank);CHKERRQ(ierr); > > > > > > > > ierr = PetscMalloc1(3,&ii);CHKERRQ(ierr); > > > > ierr = PetscMalloc1(3,&jj);CHKERRQ(ierr); > > > > if (rank == 0) { > > > > ii[0] = 0; ii[1] = 2; ii[2] = 3; > > > > jj[0] = 2; jj[1] = 3; jj[2] = 3; > > > > } else { > > > > ii[0] = 0; ii[1] = 1; ii[2] = 3; > > > > jj[0] = 0; jj[1] = 0; jj[2] = 1; > > > > } > > > > ierr = MatCreateMPIAdj(MPI_COMM_WORLD,ncells,Nvertices,ii,jj,NULL,&mesh);CHKERRQ(ierr); > > > > 
ierr = MatMeshToCellGraph(mesh,2,&dual);CHKERRQ(ierr); > > > > ierr = MatView(dual,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > > > > > > > ierr = MatPartitioningCreate(MPI_COMM_WORLD,&part);CHKERRQ(ierr); > > > > ierr = MatPartitioningSetAdjacency(part,dual);CHKERRQ(ierr); > > > > ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr); > > > > ierr = MatPartitioningApply(part,&is);CHKERRQ(ierr); > > > > ierr = ISView(is,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); > > > > ierr = ISDestroy(&is);CHKERRQ(ierr); > > > > ierr = MatPartitioningDestroy(&part);CHKERRQ(ierr); > > > > > > > > ierr = MatDestroy(&mesh);CHKERRQ(ierr); > > > > ierr = MatDestroy(&dual);CHKERRQ(ierr); > > > > ierr = PetscFinalize(); > > > > return 0; > > > > } > > > > > > > > > > > > > > > From ingogaertner.tus at gmail.com Tue Apr 18 00:46:32 2017 From: ingogaertner.tus at gmail.com (Ingo Gaertner) Date: Tue, 18 Apr 2017 07:46:32 +0200 Subject: [petsc-users] dmplex face normals orientation In-Reply-To: References: Message-ID: Dear Matt, please explain your previous answer ("The normal should be outward, unless the face has orientation < 0") with respect to my example. As I think, the example shows that the face normals for faces 11 and 12 are pointing outward, although they have orientation=-2. For all other faces the normal direction and their orientation sign agree with what you said. Thanks Ingo 2017-04-14 11:28 GMT+02:00 Ingo Gaertner : > Thank you, Matt, > as you say, the face orientations do change sign when switching between > the two adjacent cells. (I confused my program output. You are right.) But > it seems not to be always correct to keep the normal direction for > orientation>=0 and invert it for orientation<0: > > I include sample code below to make my question more clear. 
The program > creates a HexBoxMesh split horizontally in two cells (at x=0.5) It produces > this output: > > "Face centroids (c) and normals(n): > face #008 c=(0.250000 0.000000 0.000000) n=(-0.000000 -0.500000 0.000000) > face #009 c=(0.750000 0.000000 0.000000) n=(-0.000000 -0.500000 0.000000) > face #010 c=(0.250000 1.000000 0.000000) n=(0.000000 0.500000 0.000000) > face #011 c=(0.750000 1.000000 0.000000) n=(0.000000 0.500000 0.000000) > face #012 c=(0.000000 0.500000 0.000000) n=(-1.000000 -0.000000 0.000000) > face #013 c=(0.500000 0.500000 0.000000) n=(1.000000 0.000000 0.000000) > face #014 c=(1.000000 0.500000 0.000000) n=(1.000000 0.000000 0.000000) > Cell faces orientations: > cell #0, faces:[8 13 10 12] orientations:[0 0 -2 -2] > cell #1, faces:[9 14 11 13] orientations:[0 0 -2 -2]" > > Looking at the face normals, all boundary normals point outside (good). > The normal of face #013 points outside with respect to the left cell #0, > but inside w.r.t. the right cell #1. > > Face 13 is shared between both cells. It has orientation 0 for cell #0, > but orientation -2 for cell #1 (good). > What I don't understand is the orientation of face 12 (cell 0) and of face > 11 (cell 1). These are negative, which would make them point into the cell. > Have I done some other stupid mistake? 
> > Thanks > Ingo > > Here is the code: > > static char help[] = "Check face normals orientations.\n\n"; > #include > #undef __FUNCT__ > #define __FUNCT__ "main" > > int main(int argc, char **argv) > { > DM dm; > PetscErrorCode ierr; > PetscFVCellGeom *cgeom; > PetscFVFaceGeom *fgeom; > Vec cellgeom,facegeom; > int dim=2; > int cells[]={2,1}; > int cStart,cEnd,fStart,fEnd; > int coneSize,supportSize; > const int *cone,*coneOrientation; > > ierr = PetscInitialize(&argc, &argv, NULL, help);CHKERRQ(ierr); > ierr = PetscOptionsGetInt(NULL, NULL, "-dim", &dim, NULL);CHKERRQ(ierr); > ierr = DMPlexCreateHexBoxMesh(PETSC_COMM_WORLD, dim, > cells,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE, > &dm);CHKERRQ(ierr); > > ierr = DMPlexComputeGeometryFVM(dm, &cellgeom,&facegeom);CHKERRQ(ierr); > ierr = VecGetArray(cellgeom, (PetscScalar**)&cgeom);CHKERRQ(ierr); > ierr = VecGetArray(facegeom, (PetscScalar**)&fgeom);CHKERRQ(ierr); > ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr); > ierr = DMPlexGetHeightStratum(dm, 1, &fStart, &fEnd);CHKERRQ(ierr); > > fprintf(stderr,"Face centroids (c) and normals(n):\n"); > for (int f=fStart;f<fEnd;f++){ > fprintf(stderr,"face #%03d c=(%03f %03f %03f) n=(%03f %03f %03f)\n",f, > fgeom[f-fStart].centroid[0],fgeom[f-fStart].centroid[1],fgeom[f-fStart].centroid[2], > fgeom[f-fStart].normal[0],fgeom[f-fStart].normal[1],fgeom[f-fStart].normal[2]); > } > > fprintf(stderr,"Cell faces orientations:\n"); > for (int c=cStart;c<cEnd;c++){ > ierr = DMPlexGetConeSize(dm, c, &coneSize);CHKERRQ(ierr); > ierr = DMPlexGetCone(dm, c, &cone);CHKERRQ(ierr); > ierr = DMPlexGetConeOrientation(dm, c, &coneOrientation);CHKERRQ(ierr); > if (dim==2){ > if (coneSize!=4){ > fprintf(stderr,"Expected coneSize 4, got %d.\n",coneSize); > exit(1); > } > fprintf(stderr,"cell #%d, faces:[%d %d %d %d] orientations:[%d %d %d %d]\n",c, > cone[0],cone[1],cone[2],cone[3], > coneOrientation[0],coneOrientation[1],coneOrientation[2],coneOrientation[3] > ); > } else if
(dim==3){ > if (coneSize!=6){ > fprintf(stderr,"Expected coneSize 6, got %d.\n",coneSize); > exit(1); > } > fprintf(stderr,"cell #%d, faces:[%d %d %d %d %d %d] > orientations:[%d %d %d %d %d %d]\n",c, > cone[0],cone[1],cone[2],cone[3],cone[4],cone[5], > coneOrientation[0],coneOrientation[1],coneOrientation[2],con > eOrientation[3],coneOrientation[4],coneOrientation[5] > ); > } else { > fprintf(stderr,"Dimension %d not implemented.\n",dim); > exit(1); > } > } > ierr = PetscFinalize(); > > } > > > 2017-04-14 11:00 GMT+02:00 Matthew Knepley : > >> On Wed, Apr 12, 2017 at 10:52 AM, Ingo Gaertner < >> ingogaertner.tus at gmail.com> wrote: >> >>> Hello, >>> I have problems determining the orientation of the face normals of a >>> DMPlex. >>> >>> I create a DMPlex, for example with DMPlexCreateHexBoxMesh(). >>> Next, I get the face normals using DMPlexComputeGeometryFVM(DM dm, Vec >>> *cellgeom, Vec *facegeom). facegeom gives the correct normals, but I don't >>> know how the inside/outside is defined with respect to the adjacant cells? >>> >> >> The normal should be outward, unless the face has orientation < 0. >> >> >>> Finally, I iterate over all cells. For each cell I iterate over the >>> bounding faces (obtained from DMPlexGetCone) and try to obtain their >>> orientation with respect to the current cell using >>> DMPlexGetConeOrientation(). However, the six integers for the orientation >>> are the same for each cell. I expect them to flip between neighbour cells, >>> because if a face normal is pointing outside for any cell, the same normal >>> is pointing inside for its neighbour. Apparently I have a misunderstanding >>> here. >>> >> >> I see the orientations changing sign for adjacent cells. Want to send a >> simple code? You should see this >> for examples. You can run SNES ex12 with -dm_view ::ascii_info_detail to >> see the change in sign. 
>> >> Thanks, >> >> Matt >> >> >>> How can I make use of the face normals in facegeom and the orientation >>> values from DMPlexGetConeOrientation() to get the outside face normals for >>> each cell? >>> >>> Thank you >>> Ingo >>> >>> >>> Virenfrei. >>> www.avast.com >>> >>> <#m_2360740100696784902_m_-1897645241821352774_m_-6432344443275881332_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> >>> >> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lawrence.mitchell at imperial.ac.uk Tue Apr 18 07:02:27 2017 From: lawrence.mitchell at imperial.ac.uk (Lawrence Mitchell) Date: Tue, 18 Apr 2017 13:02:27 +0100 Subject: [petsc-users] Partitioning does not work In-Reply-To: <4619E702-D4E7-42C4-92DC-5BF5D4D35C11@mcs.anl.gov> References: <4619E702-D4E7-42C4-92DC-5BF5D4D35C11@mcs.anl.gov> Message-ID: <4C2AB3EC-512C-4836-AA1E-F12A72CE75EF@imperial.ac.uk> > On 15 Apr 2017, at 22:31, Barry Smith wrote: > > > The DMPlex experts and point you in the correct direction. > > >> On Apr 15, 2017, at 4:28 PM, Orxan Shibliyev wrote: >> >> About partitioning of DMPlex, manual says that, >> >> In exactly the same way as MatPartitioning and MatOrdering, we encode the results of a partition or order in an IS. However, the graph we are dealing with now is not the adjacency graph of the problem Jacobian, but the mesh itself. >> >> I could not find a function related to partitioning to feed in DM object. I know how to partition with adjacency matrix but not with DM. Would you refer me to an example in which DMPlex is partitioned or to a proper function? You want DMPlexDistribute. See, for example snes/examples/tutorials/ex12.c, which solves Poisson with simplicial elements. 
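Following Lawrence's pointer, a minimal call sequence might look like the sketch below. This is an untested sketch against the PETSc 3.7-era API (DMPlexDistribute took an overlap and an optional PetscSF in that release); see snes/examples/tutorials/ex12.c for the authoritative usage.

```c
/* Sketch only (PETSc 3.7-era API, untested here): create a serial DMPlex
 * mesh and let PETSc partition it and migrate the cells in one call. */
static char help[] = "Distribute a DMPlex mesh.\n\n";

#include <petscdmplex.h>

int main(int argc, char **argv)
{
  DM             dm, dmDist = NULL;
  PetscInt       cells[2]   = {2, 1};
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, help);CHKERRQ(ierr);
  ierr = DMPlexCreateHexBoxMesh(PETSC_COMM_WORLD, 2, cells,
           DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
           &dm);CHKERRQ(ierr);
  /* overlap = 0: no ghost cells; pass a PetscSF* instead of NULL to
   * recover the point-migration map. */
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
  if (dmDist) { /* NULL when there is nothing to move (one process) */
    ierr = DMDestroy(&dm);CHKERRQ(ierr);
    dm   = dmDist;
  }
  ierr = DMViewFromOptions(dm, NULL, "-dm_view");CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}
```

Unlike the MatPartitioning route, the distributed DM already has the cells on their new owners, so element assembly can proceed directly.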
Lawrence From knepley at gmail.com Tue Apr 18 13:20:39 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 18 Apr 2017 13:20:39 -0500 Subject: [petsc-users] dmplex face normals orientation In-Reply-To: References: Message-ID: On Tue, Apr 18, 2017 at 12:46 AM, Ingo Gaertner wrote: > Dear Matt, > please explain your previous answer ("The normal should be outward, unless > the face has orientation < 0") with respect to my example. > As I think, the example shows that the face normals for faces 11 and 12 > are pointing outward, although they have orientation=-2. For all other > faces the normal direction and their orientation sign agree with what you > said. > 1) You should destroy the objects at the end ierr = VecDestroy(&cellgeom);CHKERRQ(ierr); ierr = VecDestroy(&facegeom);CHKERRQ(ierr); ierr = DMDestroy(&dm);CHKERRQ(ierr); 2) You should call ierr = DMSetFromOptions(dm);CHKERRQ(ierr); ierr = DMViewFromOptions(dm, NULL, "-dm_view");CHKERRQ(ierr); after DM creation to make it easier to customize and debug. Then we can use -dm_view ::ascii_info_detail to look at the DM. 3) Lets look at Cell 0 [0]: 0 <---- 8 (0) [0]: 0 <---- 13 (0) [0]: 0 <---- 10 (-2) [0]: 0 <---- 12 (-2) There are two edges reversed. The edges themselves {8, 13, 10, 12} should proceed counter-clockwise from the bottom, and have vertices [0]: 8 <---- 2 (0) [0]: 8 <---- 3 (0) [0]: 13 <---- 3 (0) [0]: 13 <---- 6 (0) [0]: 10 <---- 5 (0) [0]: 10 <---- 6 (0) [0]: 12 <---- 2 (0) [0]: 12 <---- 5 (0) so we get as we expect 2 --> 3 --> 6 --> 5 which agrees with the coordinates ( 2) dim 2 offset 0 0. 0. ( 3) dim 2 offset 2 0.5 0. ( 4) dim 2 offset 4 1. 0. ( 5) dim 2 offset 6 0. 1. ( 6) dim 2 offset 8 0.5 1. ( 7) dim 2 offset 10 1. 1. Which part does not make sense? Thanks, Matt > Thanks > Ingo > > 2017-04-14 11:28 GMT+02:00 Ingo Gaertner : > >> Thank you, Matt, >> as you say, the face orientations do change sign when switching between >> the two adjacent cells. (I confused my program output. 
You are right.) But >> it seems not to be always correct to keep the normal direction for >> orientation>=0 and invert it for orientation<0: >> >> I include sample code below to make my question more clear. The program >> creates a HexBoxMesh split horizontally in two cells (at x=0.5) It produces >> this output: >> >> "Face centroids (c) and normals(n): >> face #008 c=(0.250000 0.000000 0.000000) n=(-0.000000 -0.500000 0.000000) >> face #009 c=(0.750000 0.000000 0.000000) n=(-0.000000 -0.500000 0.000000) >> face #010 c=(0.250000 1.000000 0.000000) n=(0.000000 0.500000 0.000000) >> face #011 c=(0.750000 1.000000 0.000000) n=(0.000000 0.500000 0.000000) >> face #012 c=(0.000000 0.500000 0.000000) n=(-1.000000 -0.000000 0.000000) >> face #013 c=(0.500000 0.500000 0.000000) n=(1.000000 0.000000 0.000000) >> face #014 c=(1.000000 0.500000 0.000000) n=(1.000000 0.000000 0.000000) >> Cell faces orientations: >> cell #0, faces:[8 13 10 12] orientations:[0 0 -2 -2] >> cell #1, faces:[9 14 11 13] orientations:[0 0 -2 -2]" >> >> Looking at the face normals, all boundary normals point outside (good). >> The normal of face #013 points outside with respect to the left cell #0, >> but inside w.r.t. the right cell #1. >> >> Face 13 is shared between both cells. It has orientation 0 for cell #0, >> but orientation -2 for cell #1 (good). >> What I don't understand is the orientation of face 12 (cell 0) and of >> face 11 (cell 1). These are negative, which would make them point into the >> cell. Have I done some other stupid mistake? 
>> >> Thanks >> Ingo >> >> Here is the code: >> >> static char help[] = "Check face normals orientations.\n\n"; >> #include >> #undef __FUNCT__ >> #define __FUNCT__ "main" >> >> int main(int argc, char **argv) >> { >> DM dm; >> PetscErrorCode ierr; >> PetscFVCellGeom *cgeom; >> PetscFVFaceGeom *fgeom; >> Vec cellgeom,facegeom; >> int dim=2; >> int cells[]={2,1}; >> int cStart,cEnd,fStart,fEnd; >> int coneSize,supportSize; >> const int *cone,*coneOrientation; >> >> ierr = PetscInitialize(&argc, &argv, NULL, help);CHKERRQ(ierr); >> ierr = PetscOptionsGetInt(NULL, NULL, "-dim", &dim, NULL);CHKERRQ(ierr); >> ierr = DMPlexCreateHexBoxMesh(PETSC_COMM_WORLD, dim, >> cells,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE, >> &dm);CHKERRQ(ierr); >> >> ierr = DMPlexComputeGeometryFVM(dm, &cellgeom,&facegeom);CHKERRQ(ierr); >> ierr = VecGetArray(cellgeom, (PetscScalar**)&cgeom);CHKERRQ(ierr); >> ierr = VecGetArray(facegeom, (PetscScalar**)&fgeom);CHKERRQ(ierr); >> ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr); >> ierr = DMPlexGetHeightStratum(dm, 1, &fStart, &fEnd);CHKERRQ(ierr); >> >> fprintf(stderr,"Face centroids (c) and normals(n):\n"); >> for (int f=fStart;f> fprintf(stderr,"face #%03d c=(%03f %03f %03f) n=(%03f %03f %03f)\n",f, >> fgeom[f-fStart].centroid[0],fgeom[f-fStart].centroid[1],fgeo >> m[f-fStart].centroid[2], >> fgeom[f-fStart].normal[0],fgeom[f-fStart].normal[1],fgeom[f- >> fStart].normal[2]); >> } >> >> fprintf(stderr,"Cell faces orientations:\n"); >> for (int c=cStart;c> ierr = DMPlexGetConeSize(dm, c, &coneSize);CHKERRQ(ierr); >> ierr = DMPlexGetCone(dm, c, &cone);CHKERRQ(ierr); >> ierr = DMPlexGetConeOrientation(dm, c, >> &coneOrientation);CHKERRQ(ierr); >> if (dim==2){ >> if (coneSize!=4){ >> fprintf(stderr,"Expected coneSize 4, got %d.\n",coneSize); >> exit(1); >> } >> fprintf(stderr,"cell #%d, faces:[%d %d %d %d] orientations:[%d %d >> %d %d]\n",c, >> cone[0],cone[1],cone[2],cone[3], >> 
coneOrientation[0],coneOrientation[1],coneOrientation[2],con >> eOrientation[3] >> ); >> } else if (dim==3){ >> if (coneSize!=6){ >> fprintf(stderr,"Expected coneSize 6, got %d.\n",coneSize); >> exit(1); >> } >> fprintf(stderr,"cell #%d, faces:[%d %d %d %d %d %d] >> orientations:[%d %d %d %d %d %d]\n",c, >> cone[0],cone[1],cone[2],cone[3],cone[4],cone[5], >> coneOrientation[0],coneOrientation[1],coneOrientation[2],con >> eOrientation[3],coneOrientation[4],coneOrientation[5] >> ); >> } else { >> fprintf(stderr,"Dimension %d not implemented.\n",dim); >> exit(1); >> } >> } >> ierr = PetscFinalize(); >> >> } >> >> >> 2017-04-14 11:00 GMT+02:00 Matthew Knepley : >> >>> On Wed, Apr 12, 2017 at 10:52 AM, Ingo Gaertner < >>> ingogaertner.tus at gmail.com> wrote: >>> >>>> Hello, >>>> I have problems determining the orientation of the face normals of a >>>> DMPlex. >>>> >>>> I create a DMPlex, for example with DMPlexCreateHexBoxMesh(). >>>> Next, I get the face normals using DMPlexComputeGeometryFVM(DM dm, Vec >>>> *cellgeom, Vec *facegeom). facegeom gives the correct normals, but I don't >>>> know how the inside/outside is defined with respect to the adjacant cells? >>>> >>> >>> The normal should be outward, unless the face has orientation < 0. >>> >>> >>>> Finally, I iterate over all cells. For each cell I iterate over the >>>> bounding faces (obtained from DMPlexGetCone) and try to obtain their >>>> orientation with respect to the current cell using >>>> DMPlexGetConeOrientation(). However, the six integers for the orientation >>>> are the same for each cell. I expect them to flip between neighbour cells, >>>> because if a face normal is pointing outside for any cell, the same normal >>>> is pointing inside for its neighbour. Apparently I have a misunderstanding >>>> here. >>>> >>> >>> I see the orientations changing sign for adjacent cells. Want to send a >>> simple code? You should see this >>> for examples. 
You can run SNES ex12 with -dm_view ::ascii_info_detail to >>> see the change in sign. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> How can I make use of the face normals in facegeom and the orientation >>>> values from DMPlexGetConeOrientation() to get the outside face normals for >>>> each cell? >>>> >>>> Thank you >>>> Ingo >>>> >>>> >>>> Virenfrei. >>>> www.avast.com >>>> >>>> <#m_8179380476569161657_m_2360740100696784902_m_-1897645241821352774_m_-6432344443275881332_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> >>>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Tue Apr 18 15:21:28 2017 From: knepley at gmail.com (Matthew Knepley) Date: Tue, 18 Apr 2017 15:21:28 -0500 Subject: [petsc-users] dmplex face normals orientation In-Reply-To: References: Message-ID: On Tue, Apr 18, 2017 at 2:33 PM, Ingo Gaertner wrote: > The part that does not make sense is: > The code calculates that > face 11 (or edge 11 if you call the boundary of a 2D cell an edge) has the > centroid c=(0.750000 1.000000 0.000000) and the normal n=(0.000000 0.500000 > 0.000000) and that > face 12 (or edge 12) has the centroid c=(0.000000 0.500000 0.000000) and > the normal n=(-1.000000 -0.000000 0.000000). > I understood your previous answer ("The normal should be outward, unless > the face has orientation < 0") such that I have to reverse these normals to > get an outward normal, because faces 11 and 12 have orientation=-2. But the > normals n=(0.000000 0.500000 0.000000) and n=(-1.000000 -0.000000 > 0.000000) do already point outward. 
If I reverse them, they point inward. > > I need a CONSISTENT rule how to make use of the normals obtained from > DMPlexComputeGeometryFVM(dm, &cellgeom,&facegeom) to calculate OUTWARD > pointing normals with respect to EACH INDIVIDUAL cell. > Or do I have to iterate over each face of each cell and use a geometric > check to see if the calculated normals are pointing inward or outward? > I apologize. I did not understand the question before. The convention I have chosen might not be the best one, but it seems appropriate for FVM. The orientation of a face normal is chosen such that n . r > 0, where r is the vector from the centroid of the left cell to the centroid of the right cell. If we do GetSupport() for a face we get back {l, r}, where r could be empty, meaning the cell at infinity. This convention means that I never have to check orientations when I am using the facegeom[] stuff in FVM, I just need to have {l, r} which I generally have in the loop. Thanks, Matt > Thanks > Ingo > > > > 2017-04-18 20:20 GMT+02:00 Matthew Knepley : > >> On Tue, Apr 18, 2017 at 12:46 AM, Ingo Gaertner < >> ingogaertner.tus at gmail.com> wrote: >> >>> Dear Matt, >>> please explain your previous answer ("The normal should be outward, >>> unless the face has orientation < 0") with respect to my example. >>> As I think, the example shows that the face normals for faces 11 and 12 >>> are pointing outward, although they have orientation=-2. For all other >>> faces the normal direction and their orientation sign agree with what you >>> said. >>> >> >> 1) You should destroy the objects at the end >> >> ierr = VecDestroy(&cellgeom);CHKERRQ(ierr); >> ierr = VecDestroy(&facegeom);CHKERRQ(ierr); >> ierr = DMDestroy(&dm);CHKERRQ(ierr); >> >> 2) You should call >> >> ierr = DMSetFromOptions(dm);CHKERRQ(ierr); >> ierr = DMViewFromOptions(dm, NULL, "-dm_view");CHKERRQ(ierr); >> >> after DM creation to make it easier to customize and debug.
Then we can >> use >> >> -dm_view ::ascii_info_detail >> >> to look at the DM. >> >> 3) Lets look at Cell 0 >> >> [0]: 0 <---- 8 (0) >> [0]: 0 <---- 13 (0) >> [0]: 0 <---- 10 (-2) >> [0]: 0 <---- 12 (-2) >> >> There are two edges reversed. The edges themselves {8, 13, 10, 12} >> should proceed counter-clockwise from the bottom, and have vertices >> >> [0]: 8 <---- 2 (0) >> [0]: 8 <---- 3 (0) >> [0]: 13 <---- 3 (0) >> [0]: 13 <---- 6 (0) >> [0]: 10 <---- 5 (0) >> [0]: 10 <---- 6 (0) >> [0]: 12 <---- 2 (0) >> [0]: 12 <---- 5 (0) >> >> so we get as we expect >> >> 2 --> 3 --> 6 --> 5 >> >> which agrees with the coordinates >> >> ( 2) dim 2 offset 0 0. 0. >> ( 3) dim 2 offset 2 0.5 0. >> ( 4) dim 2 offset 4 1. 0. >> ( 5) dim 2 offset 6 0. 1. >> ( 6) dim 2 offset 8 0.5 1. >> ( 7) dim 2 offset 10 1. 1. >> >> Which part does not make sense? >> >> Thanks, >> >> Matt >> >> >>> Thanks >>> Ingo >>> >>> 2017-04-14 11:28 GMT+02:00 Ingo Gaertner : >>> >>>> Thank you, Matt, >>>> as you say, the face orientations do change sign when switching between >>>> the two adjacent cells. (I confused my program output. You are right.) But >>>> it seems not to be always correct to keep the normal direction for >>>> orientation>=0 and invert it for orientation<0: >>>> >>>> I include sample code below to make my question more clear. 
The program >>>> creates a HexBoxMesh split horizontally in two cells (at x=0.5) It produces >>>> this output: >>>> >>>> "Face centroids (c) and normals(n): >>>> face #008 c=(0.250000 0.000000 0.000000) n=(-0.000000 -0.500000 >>>> 0.000000) >>>> face #009 c=(0.750000 0.000000 0.000000) n=(-0.000000 -0.500000 >>>> 0.000000) >>>> face #010 c=(0.250000 1.000000 0.000000) n=(0.000000 0.500000 0.000000) >>>> face #011 c=(0.750000 1.000000 0.000000) n=(0.000000 0.500000 0.000000) >>>> face #012 c=(0.000000 0.500000 0.000000) n=(-1.000000 -0.000000 >>>> 0.000000) >>>> face #013 c=(0.500000 0.500000 0.000000) n=(1.000000 0.000000 0.000000) >>>> face #014 c=(1.000000 0.500000 0.000000) n=(1.000000 0.000000 0.000000) >>>> Cell faces orientations: >>>> cell #0, faces:[8 13 10 12] orientations:[0 0 -2 -2] >>>> cell #1, faces:[9 14 11 13] orientations:[0 0 -2 -2]" >>>> >>>> Looking at the face normals, all boundary normals point outside (good). >>>> The normal of face #013 points outside with respect to the left cell >>>> #0, but inside w.r.t. the right cell #1. >>>> >>>> Face 13 is shared between both cells. It has orientation 0 for cell #0, >>>> but orientation -2 for cell #1 (good). >>>> What I don't understand is the orientation of face 12 (cell 0) and of >>>> face 11 (cell 1). These are negative, which would make them point into the >>>> cell. Have I done some other stupid mistake? 
>>>>
>>>> Thanks
>>>> Ingo
>>>>
>>>> Here is the code:
>>>>
>>>> static char help[] = "Check face normals orientations.\n\n";
>>>> #include <petsc.h>
>>>> #undef __FUNCT__
>>>> #define __FUNCT__ "main"
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>   DM dm;
>>>>   PetscErrorCode ierr;
>>>>   PetscFVCellGeom *cgeom;
>>>>   PetscFVFaceGeom *fgeom;
>>>>   Vec cellgeom,facegeom;
>>>>   int dim=2;
>>>>   int cells[]={2,1};
>>>>   int cStart,cEnd,fStart,fEnd;
>>>>   int coneSize,supportSize;
>>>>   const int *cone,*coneOrientation;
>>>>
>>>>   ierr = PetscInitialize(&argc, &argv, NULL, help);CHKERRQ(ierr);
>>>>   ierr = PetscOptionsGetInt(NULL, NULL, "-dim", &dim, NULL);CHKERRQ(ierr);
>>>>   ierr = DMPlexCreateHexBoxMesh(PETSC_COMM_WORLD, dim, cells,
>>>>     DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE, &dm);CHKERRQ(ierr);
>>>>
>>>>   ierr = DMPlexComputeGeometryFVM(dm, &cellgeom,&facegeom);CHKERRQ(ierr);
>>>>   ierr = VecGetArray(cellgeom, (PetscScalar**)&cgeom);CHKERRQ(ierr);
>>>>   ierr = VecGetArray(facegeom, (PetscScalar**)&fgeom);CHKERRQ(ierr);
>>>>   ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr);
>>>>   ierr = DMPlexGetHeightStratum(dm, 1, &fStart, &fEnd);CHKERRQ(ierr);
>>>>
>>>>   fprintf(stderr,"Face centroids (c) and normals(n):\n");
>>>>   for (int f=fStart;f<fEnd;f++){
>>>>     fprintf(stderr,"face #%03d c=(%03f %03f %03f) n=(%03f %03f %03f)\n",f,
>>>>       fgeom[f-fStart].centroid[0],fgeom[f-fStart].centroid[1],fgeom[f-fStart].centroid[2],
>>>>       fgeom[f-fStart].normal[0],fgeom[f-fStart].normal[1],fgeom[f-fStart].normal[2]);
>>>>   }
>>>>
>>>>   fprintf(stderr,"Cell faces orientations:\n");
>>>>   for (int c=cStart;c<cEnd;c++){
>>>>     ierr = DMPlexGetConeSize(dm, c, &coneSize);CHKERRQ(ierr);
>>>>     ierr = DMPlexGetCone(dm, c, &cone);CHKERRQ(ierr);
>>>>     ierr = DMPlexGetConeOrientation(dm, c, &coneOrientation);CHKERRQ(ierr);
>>>>     if (dim==2){
>>>>       if (coneSize!=4){
>>>>         fprintf(stderr,"Expected coneSize 4, got %d.\n",coneSize);
>>>>         exit(1);
>>>>       }
>>>>       fprintf(stderr,"cell #%d, faces:[%d
%d %d %d] orientations:[%d %d %d %d]\n",c,
>>>>         cone[0],cone[1],cone[2],cone[3],
>>>>         coneOrientation[0],coneOrientation[1],coneOrientation[2],coneOrientation[3]
>>>>       );
>>>>     } else if (dim==3){
>>>>       if (coneSize!=6){
>>>>         fprintf(stderr,"Expected coneSize 6, got %d.\n",coneSize);
>>>>         exit(1);
>>>>       }
>>>>       fprintf(stderr,"cell #%d, faces:[%d %d %d %d %d %d] orientations:[%d %d %d %d %d %d]\n",c,
>>>>         cone[0],cone[1],cone[2],cone[3],cone[4],cone[5],
>>>>         coneOrientation[0],coneOrientation[1],coneOrientation[2],coneOrientation[3],coneOrientation[4],coneOrientation[5]
>>>>       );
>>>>     } else {
>>>>       fprintf(stderr,"Dimension %d not implemented.\n",dim);
>>>>       exit(1);
>>>>     }
>>>>   }
>>>>   ierr = PetscFinalize();
>>>>
>>>> }
>>>>
>>>>
>>>> 2017-04-14 11:00 GMT+02:00 Matthew Knepley : >>>>> On Wed, Apr 12, 2017 at 10:52 AM, Ingo Gaertner < >>>>> ingogaertner.tus at gmail.com> wrote: >>>>> >>>>>> Hello, >>>>>> I have problems determining the orientation of the face normals of a >>>>>> DMPlex. >>>>>> >>>>>> I create a DMPlex, for example with DMPlexCreateHexBoxMesh(). >>>>>> Next, I get the face normals using DMPlexComputeGeometryFVM(DM dm, >>>>>> Vec *cellgeom, Vec *facegeom). facegeom gives the correct normals, but I >>>>>> don't know how the inside/outside is defined with respect to the adjacent >>>>>> cells? >>>>>> >>>>> >>>>> The normal should be outward, unless the face has orientation < 0. >>>>>> Finally, I iterate over all cells. For each cell I iterate over the >>>>>> bounding faces (obtained from DMPlexGetCone) and try to obtain their >>>>>> orientation with respect to the current cell using >>>>>> DMPlexGetConeOrientation(). However, the six integers for the orientation >>>>>> are the same for each cell. I expect them to flip between neighbour cells, >>>>>> because if a face normal is pointing outside for any cell, the same normal >>>>>> is pointing inside for its neighbour. Apparently I have a misunderstanding >>>>>> here.
>>>>>> >>>>> I see the orientations changing sign for adjacent cells. Want to send >>>>> a simple code? You should see this >>>>> for examples. You can run SNES ex12 with -dm_view ::ascii_info_detail >>>>> to see the change in sign. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> How can I make use of the face normals in facegeom and the >>>>>> orientation values from DMPlexGetConeOrientation() to get the outside face >>>>>> normals for each cell? >>>>>> >>>>>> Thank you >>>>>> Ingo >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>> >>>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From francescomigliorini93 at gmail.com Wed Apr 19 05:26:05 2017 From: francescomigliorini93 at gmail.com (Francesco Migliorini) Date: Wed, 19 Apr 2017 12:26:05 +0200 Subject: [petsc-users] VecAssembly gives segmentation fault with MPI Message-ID: Hello! I have an MPI code in which a linear system is created and solved with PETSc. It works in sequential run but when I use multiple cores the VecAssemblyBegin/End give segmentation fault.
Here's a sample of my code: call PetscInitialize(PETSC_NULL_CHARACTER,perr) ind(1) = 3*nnod_loc*max_time_deg call VecCreate(PETSC_COMM_WORLD,feP,perr) call VecSetSizes(feP,PETSC_DECIDE,ind,perr) call VecSetFromOptions(feP,perr) do in = nnod_loc do jt = 1,mm ind(1) = 3*((in -1)*max_time_deg + (jt-1)) fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +1) call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) ind(1) = 3*((in -1)*max_time_deg + (jt-1)) +1 fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +2) call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) ind(1) = 3*((in -1)*max_time_deg + (jt-1)) +2 fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +3) call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) enddo enddo enddo call VecAssemblyBegin(feP,perr) call VecAssemblyEnd(feP,perr) The vector has 640.000 elements more or less but I am running on a high performing computer so there shouldn't be memory issues. Does anyone know where is the problem and how can I fix it? Thank you, Francesco Migliorini -------------- next part -------------- An HTML attachment was scrubbed... URL: From jed at jedbrown.org Wed Apr 19 06:20:05 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 19 Apr 2017 05:20:05 -0600 Subject: [petsc-users] VecAssembly gives segmentation fault with MPI In-Reply-To: References: Message-ID: <8737d4lk4a.fsf@jedbrown.org> Francesco Migliorini writes: > Hello! > > I have an MPI code in which a linear system is created and solved with > PETSc. It works in sequential run but when I use multiple cores the > VecAssemblyBegin/End give segmentation fault. Here's a sample of my code: > > call PetscInitialize(PETSC_NULL_CHARACTER,perr) > > ind(1) = 3*nnod_loc*max_time_deg > call VecCreate(PETSC_COMM_WORLD,feP,perr) > call VecSetSizes(feP,PETSC_DECIDE,ind,perr) You set the global size here (does "nnod_loc" mean local? and is it the same size on every process?), but then set values for all of these below. 
> call VecSetFromOptions(feP,perr) > > do in = nnod_loc > do jt = 1,mm What is mm? > ind(1) = 3*((in -1)*max_time_deg + (jt-1)) > fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +1) > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) > ind(1) = 3*((in -1)*max_time_deg + (jt-1)) +1 > fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +2) > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) > ind(1) = 3*((in -1)*max_time_deg + (jt-1)) +2 > fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +3) > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) > enddo > enddo > enddo > call VecAssemblyBegin(feP,perr) > call VecAssemblyEnd(feP,perr) > > The vector has 640.000 elements more or less but I am running on a high > performing computer so there shouldn't be memory issues. Does anyone know > where is the problem and how can I fix it? > > Thank you, > Francesco Migliorini -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From rupp at iue.tuwien.ac.at Wed Apr 19 06:25:06 2017 From: rupp at iue.tuwien.ac.at (Karl Rupp) Date: Wed, 19 Apr 2017 13:25:06 +0200 Subject: [petsc-users] VecAssembly gives segmentation fault with MPI In-Reply-To: References: Message-ID: <1d8e5502-2d9c-92db-6287-1823adbf2a99@iue.tuwien.ac.at> Hi Francesco, please consider the following: a) run your code through valgrind to locate the segmentation fault. Maybe there is already a memory access problem in the sequential version. b) send any error messages as well as the stack trace. c) what is your intent with "do in = nnod_loc"? Isn't nnod_loc the number of local elements? Best regards, Karli On 04/19/2017 12:26 PM, Francesco Migliorini wrote: > Hello! > > I have an MPI code in which a linear system is created and solved with > PETSc. It works in sequential run but when I use multiple cores the > VecAssemblyBegin/End give segmentation fault.
Here's a sample of my code: > > call PetscInitialize(PETSC_NULL_CHARACTER,perr) > > ind(1) = 3*nnod_loc*max_time_deg > call VecCreate(PETSC_COMM_WORLD,feP,perr) > call VecSetSizes(feP,PETSC_DECIDE,ind,perr) > call VecSetFromOptions(feP,perr) > > do in = nnod_loc > do jt = 1,mm > ind(1) = 3*((in -1)*max_time_deg + (jt-1)) > fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +1) > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) > ind(1) = 3*((in -1)*max_time_deg + (jt-1)) +1 > fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +2) > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) > ind(1) = 3*((in -1)*max_time_deg + (jt-1)) +2 > fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +3) > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) > enddo > enddo > enddo > call VecAssemblyBegin(feP,perr) > call VecAssemblyEnd(feP,perr) > > The vector has 640.000 elements more or less but I am running on a high > performing computer so there shouldn't be memory issues. Does anyone > know where is the problem and how can I fix it? > > Thank you, > Francesco Migliorini From jed at jedbrown.org Wed Apr 19 08:07:44 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 19 Apr 2017 07:07:44 -0600 Subject: [petsc-users] VecAssembly gives segmentation fault with MPI In-Reply-To: References: <8737d4lk4a.fsf@jedbrown.org> Message-ID: <87tw5kk0kf.fsf@jedbrown.org> Please always use "reply-all" so that your messages go to the list. This is standard mailing list etiquette. It is important to preserve threading for people who find this discussion later and so that we do not waste our time re-answering the same questions that have already been answered in private side-conversations. You'll likely get an answer faster that way too. Francesco Migliorini writes: > Hi, thank you for your answer! > > Yes xxx_loc means local but it is referred to the MPI processes, so each > process has different xxx_loc values. 
Always use Debug mode PETSc so we can check when you pass inconsistent information to Vec. The global size needs to be the same on every process. > Indeed, the program arrives to Petsc initialization with already > multiple processes. Then I thought Petsc was applied to all the > processes separately and therefore the global dimensions of the system > were the local ones of the MPI processes. Maybe it does not work in > this way... However mm is parameter equal for all the processes (in > particular it is 3) and the processes do not have exactly the same > number of nodes. > > 2017-04-19 13:20 GMT+02:00 Jed Brown : > >> Francesco Migliorini writes: >> >> > Hello! >> > >> > I have an MPI code in which a linear system is created and solved with >> > PETSc. It works in sequential run but when I use multiple cores the >> > VecAssemblyBegin/End give segmentation fault. Here's a sample of my code: >> > >> > call PetscInitialize(PETSC_NULL_CHARACTER,perr) >> > >> > ind(1) = 3*nnod_loc*max_time_deg >> > call VecCreate(PETSC_COMM_WORLD,feP,perr) >> > call VecSetSizes(feP,PETSC_DECIDE,ind,perr) >> >> You set the global size here (does "nnod_loc" mean local? and is it the >> same size on every process?), but then set values for all of these >> below. >> >> > call VecSetFromOptions(feP,perr) >> > >> > do in = nnod_loc >> > do jt = 1,mm >> >> What is mm? 
>> >> > ind(1) = 3*((in -1)*max_time_deg + (jt-1)) >> > fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +1) >> > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) >> > ind(1) = 3*((in -1)*max_time_deg + (jt-1)) +1 >> > fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +2) >> > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) >> > ind(1) = 3*((in -1)*max_time_deg + (jt-1)) +2 >> > fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +3) >> > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) >> > enddo >> > enddo >> > enddo >> > call VecAssemblyBegin(feP,perr) >> > call VecAssemblyEnd(feP,perr) >> > >> > The vector has 640.000 elements more or less but I am running on a high >> > performing computer so there shouldn't be memory issues. Does anyone know >> > where is the problem and how can I fix it? >> > >> > Thank you, >> > Francesco Migliorini >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From rupp at iue.tuwien.ac.at Wed Apr 19 08:09:57 2017 From: rupp at iue.tuwien.ac.at (Karl Rupp) Date: Wed, 19 Apr 2017 15:09:57 +0200 Subject: [petsc-users] VecAssembly gives segmentation fault with MPI In-Reply-To: References: <1d8e5502-2d9c-92db-6287-1823adbf2a99@iue.tuwien.ac.at> Message-ID: <2d9dae82-e733-a0ed-5606-364dd8e542e1@iue.tuwien.ac.at> Hi Francesco, please don't drop petsc-users from the communication. This will likely provide you with better and faster answers. Since your current build is with debugging turned off, please reconfigure with debugging turned on, as the error message says. Chances are good that you will get much more precise information about what went wrong. Best regards, Karli On 04/19/2017 03:03 PM, Francesco Migliorini wrote: > Hi, thank you for your answer! > > Unfortunately I cannot use Valgrind on the machine I am using, but I am > sure that the error is in using VecAssembly.
Here's the error message > from PETSc: > > [1]PETSC ERROR: > ------------------------------------------------------------------------ > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > probably memory access out of range > [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [1]PETSC ERROR: or see > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, > and run > [1]PETSC ERROR: to get more information on the crash. > [1]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [1]PETSC ERROR: Signal received > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > [1]PETSC ERROR: Petsc Release Version 3.6.3, Dec, 03, 2015 > [1]PETSC ERROR: /u/migliorini/SPEED/SPEED on a arch-linux-opt named > idra116 by migliorini Wed Apr 19 10:20:48 2017 > [1]PETSC ERROR: Configure options > --prefix=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/petsc/3.6.3 > --with-petsc-arch=arch-linux-opt --with-fortran=1 --with-pic=1 > --with-debugging=0 --with-x=0 --with-blas-lapack=1 > --with-blas-lib=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/openblas/0.2.17/lib/libopenblas.so > --with-lapack-lib=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/openblas/0.2.17/lib/libopenblas.so > --with-boost=1 > --with-boost-dir=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/boost/1.60.0 > --with-fftw=1 > --with-fftw-dir=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/fftw/3.3.4 > --with-hdf5=1 > --with-hdf5-dir=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/hdf5/1.8.16 > --with-hypre=1 > --with-hypre-dir=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/hypre/2.11.0 > --with-metis=1 > --with-metis-dir=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/metis/5 > --with-mumps=1 > 
--with-mumps-dir=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/mumps/5.0.1 > --with-netcdf=1 > --with-netcdf-dir=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/netcdf/4.4.0 > --with-p4est=1 > --with-p4est-dir=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/p4est/1.1 > --with-parmetis=1 > --with-parmetis-dir=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/metis/5 > --with-ptscotch=1 > --with-ptscotch-dir=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/scotch/6.0.4 > --with-scalapack=1 > --with-scalapack-dir=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/scalapack/2.0.2 > --with-suitesparse=1 > --with-suitesparse-dir=/u/sw/pkgs/toolchains/gcc-glibc/5/pkgs/suitesparse/4.5.1 > [1]PETSC ERROR: #1 User provided function() line 0 in unknown file > -------------------------------------------------------------------------- > MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD > with errorcode 59. > > do in=1,nnod_loc is a loop over the nodes contained in the local vector > because the program arrives to Petsc initialization with already > multiple processes. Then I thought Petsc was applied to all the > processes separately and therefore the global dimensions of the system > were the local ones of the MPI processes. Maybe it does not work in this > way... > > 2017-04-19 13:25 GMT+02:00 Karl Rupp >: > > Hi Francesco, > > please consider the following: > > a) run your code through valgrind to locate the segmentation fault. > Maybe there is already a memory access problem in the sequential > version. > > b) send any error messages as well as the stack trace. > > c) what is you intent with "do in = nnod_loc"? Isn't nnoc_loc the > number of local elements? > > Best regards, > Karli > > > > > On 04/19/2017 12:26 PM, Francesco Migliorini wrote: > > Hello! > > I have an MPI code in which a linear system is created and > solved with > PETSc. It works in sequential run but when I use multiple cores the > VecAssemblyBegin/End give segmentation fault. 
Here's a sample of > my code: > > call PetscInitialize(PETSC_NULL_CHARACTER,perr) > > ind(1) = 3*nnod_loc*max_time_deg > call VecCreate(PETSC_COMM_WORLD,feP,perr) > call VecSetSizes(feP,PETSC_DECIDE,ind,perr) > call VecSetFromOptions(feP,perr) > > do in = nnod_loc > do jt = 1,mm > ind(1) = 3*((in -1)*max_time_deg + (jt-1)) > fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +1) > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) > ind(1) = 3*((in -1)*max_time_deg + (jt-1)) +1 > fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +2) > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) > ind(1) = 3*((in -1)*max_time_deg + (jt-1)) +2 > fval(1) = fe(3*((in -1)*max_time_deg + (jt-1)) +3) > call VecSetValues(feP,1,ind,fval(1),INSERT_VALUES,perr) > enddo > enddo > enddo > call VecAssemblyBegin(feP,perr) > call VecAssemblyEnd(feP,perr) > > The vector has 640.000 elements more or less but I am running on > a high > performing computer so there shouldn't be memory issues. Does anyone > know where is the problem and how can I fix it? > > Thank you, > Francesco Migliorini > > From fande.kong at inl.gov Wed Apr 19 10:31:30 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Wed, 19 Apr 2017 09:31:30 -0600 Subject: [petsc-users] GAMG for the unsymmetrical matrix In-Reply-To: References: <66D79752-D4C3-4A2E-ADCE-EFC23C93CCD7@mcs.anl.gov> <772D2966-F917-44D1-B2AC-B0F4E506DC7C@mcs.anl.gov> Message-ID: Thanks, Mark, Now, the total compute time using GAMG is competitive with ASM. Looks like I could not use something like: "-mg_level_1_ksp_type gmres" because this option makes the compute time much worse. Fande, On Thu, Apr 13, 2017 at 9:14 AM, Mark Adams wrote: > > > On Wed, Apr 12, 2017 at 7:04 PM, Kong, Fande wrote: > >> >> >> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams wrote: >> >>> You seem to have two levels here and 3M eqs on the fine grid and 37 on >>> the coarse grid. I don't understand that. 
>>> >>> You are also calling the AMG setup a lot, but not spending much time >>> in it. Try running with -info and grep on "GAMG". >>> >> >> I got the following output: >> >> [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1, >> nnz/row (ave)=71, np=384 >> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold >> 0., 73.6364 nnz ave. (N=3020875) >> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square >> [0] PCGAMGProlongator_AGG(): New grid 18162 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978702e+00 >> min=2.559747e-02 PC=jacobi >> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384, >> neq(loc)=40 >> [0] PCSetUp_GAMG(): 1) N=18162, n data cols=1, nnz/row (ave)=94, 384 >> active pes >> [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00795 >> [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1, >> nnz/row (ave)=71, np=384 >> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with threshold >> 0., 73.6364 nnz ave. (N=3020875) >> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square >> [0] PCGAMGProlongator_AGG(): New grid 18145 nodes >> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978584e+00 >> min=2.557887e-02 PC=jacobi >> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384, >> neq(loc)=37 >> [0] PCSetUp_GAMG(): 1) N=18145, n data cols=1, nnz/row (ave)=94, 384 >> active pes >> > > You are still doing two levels. Just use the parameters that I told you > and you should see that 1) this coarsest (last) grid has "1 active pes" and > 2) the overall solve time and overall convergence rate is much better. 
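Collecting the GAMG options mentioned in this thread in one place may help; this is a hedged starting point rather than a recipe (the level count and other values must be tuned per problem, and the flags here are only the ones that appear in the surrounding discussion):

```shell
# Options discussed in this thread (illustrative values):
#   -pc_gamg_sym_graph true  for matrices with unsymmetric nonzero structure
#   -pc_use_amat false       to keep the preconditioner non-matrix-free
GAMG_OPTS="-pc_type gamg \
  -pc_gamg_sym_graph true \
  -pc_use_amat false \
  -pc_mg_levels 2"
echo $GAMG_OPTS
# then, e.g.:  mpiexec -n 384 ./app $GAMG_OPTS -info 2>&1 | grep GAMG
```

The final (commented) line shows the -info / grep "GAMG" diagnostic suggested above for inspecting the coarsening behavior.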
> > >> [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00792 >> GAMG specific options >> PCGAMGGraph_AGG 40 1.0 8.0759e+00 1.0 3.56e+07 2.3 1.6e+06 1.9e+04 >> 7.6e+02 2 0 2 4 2 2 0 2 4 2 1170 >> PCGAMGCoarse_AGG 40 1.0 7.1698e+01 1.0 4.05e+09 2.3 4.0e+06 5.1e+04 >> 1.2e+03 18 37 5 27 3 18 37 5 27 3 14632 >> PCGAMGProl_AGG 40 1.0 9.2650e-01 1.2 0.00e+00 0.0 9.8e+05 2.9e+03 >> 9.6e+02 0 0 1 0 2 0 0 1 0 2 0 >> PCGAMGPOpt_AGG 40 1.0 2.4484e+00 1.0 4.72e+08 2.3 3.1e+06 2.3e+03 >> 1.9e+03 1 4 4 1 4 1 4 4 1 4 51328 >> GAMG: createProl 40 1.0 8.3786e+01 1.0 4.56e+09 2.3 9.6e+06 2.5e+04 >> 4.8e+03 21 42 12 32 10 21 42 12 32 10 14134 >> GAMG: partLevel 40 1.0 6.7755e+00 1.1 2.59e+08 2.3 2.9e+06 2.5e+03 >> 1.5e+03 2 2 4 1 3 2 2 4 1 3 9431 >> >> >> >> >> >> >> >> >>> >>> >>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande wrote: >>> > Thanks, Barry. >>> > >>> > It works. >>> > >>> > GAMG is three times better than ASM in terms of the number of linear >>> > iterations, but it is five times slower than ASM. Any suggestions to >>> improve >>> > the performance of GAMG? Log files are attached. >>> > >>> > Fande, >>> > >>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith >>> wrote: >>> >> >>> >> >>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande wrote: >>> >> > >>> >> > Thanks, Mark and Barry, >>> >> > >>> >> > It works pretty wells in terms of the number of linear iterations >>> (using >>> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. >>> I am >>> >> > using the two-level method via "-pc_mg_levels 2". The reason why >>> the compute >>> >> > time is larger than other preconditioning options is that a matrix >>> free >>> >> > method is used in the fine level and in my particular problem the >>> function >>> >> > evaluation is expensive. >>> >> > >>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free >>> Newton, >>> >> > but I do not think I want to make the preconditioning part >>> matrix-free. 
Do >>> >> > you guys know how to turn off the matrix-free method for GAMG? >>> >> >>> >> -pc_use_amat false >>> >> >>> >> > >>> >> > Here is the detailed solver: >>> >> > >>> >> > SNES Object: 384 MPI processes >>> >> > type: newtonls >>> >> > maximum iterations=200, maximum function evaluations=10000 >>> >> > tolerances: relative=1e-08, absolute=1e-08, solution=1e-50 >>> >> > total number of linear solver iterations=20 >>> >> > total number of function evaluations=166 >>> >> > norm schedule ALWAYS >>> >> > SNESLineSearch Object: 384 MPI processes >>> >> > type: bt >>> >> > interpolation: cubic >>> >> > alpha=1.000000e-04 >>> >> > maxstep=1.000000e+08, minlambda=1.000000e-12 >>> >> > tolerances: relative=1.000000e-08, absolute=1.000000e-15, >>> >> > lambda=1.000000e-08 >>> >> > maximum iterations=40 >>> >> > KSP Object: 384 MPI processes >>> >> > type: gmres >>> >> > GMRES: restart=100, using Classical (unmodified) Gram-Schmidt >>> >> > Orthogonalization with no iterative refinement >>> >> > GMRES: happy breakdown tolerance 1e-30 >>> >> > maximum iterations=100, initial guess is zero >>> >> > tolerances: relative=0.001, absolute=1e-50, divergence=10000. >>> >> > right preconditioning >>> >> > using UNPRECONDITIONED norm type for convergence test >>> >> > PC Object: 384 MPI processes >>> >> > type: gamg >>> >> > MG: type is MULTIPLICATIVE, levels=2 cycles=v >>> >> > Cycles per PCApply=1 >>> >> > Using Galerkin computed coarse grid matrices >>> >> > GAMG specific options >>> >> > Threshold for dropping small values from graph 0. >>> >> > AGG specific options >>> >> > Symmetric graph true >>> >> > Coarse grid solver -- level ------------------------------- >>> >> > KSP Object: (mg_coarse_) 384 MPI processes >>> >> > type: preonly >>> >> > maximum iterations=10000, initial guess is zero >>> >> > tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. 
>>> >> > left preconditioning >>> >> > using NONE norm type for convergence test >>> >> > PC Object: (mg_coarse_) 384 MPI processes >>> >> > type: bjacobi >>> >> > block Jacobi: number of blocks = 384 >>> >> > Local solve is same for all blocks, in the following KSP >>> and >>> >> > PC objects: >>> >> > KSP Object: (mg_coarse_sub_) 1 MPI processes >>> >> > type: preonly >>> >> > maximum iterations=1, initial guess is zero >>> >> > tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> >> > left preconditioning >>> >> > using NONE norm type for convergence test >>> >> > PC Object: (mg_coarse_sub_) 1 MPI processes >>> >> > type: lu >>> >> > LU: out-of-place factorization >>> >> > tolerance for zero pivot 2.22045e-14 >>> >> > using diagonal shift on blocks to prevent zero pivot >>> >> > [INBLOCKS] >>> >> > matrix ordering: nd >>> >> > factor fill ratio given 5., needed 1.31367 >>> >> > Factored matrix follows: >>> >> > Mat Object: 1 MPI processes >>> >> > type: seqaij >>> >> > rows=37, cols=37 >>> >> > package used to perform factorization: petsc >>> >> > total: nonzeros=913, allocated nonzeros=913 >>> >> > total number of mallocs used during MatSetValues >>> calls >>> >> > =0 >>> >> > not using I-node routines >>> >> > linear system matrix = precond matrix: >>> >> > Mat Object: 1 MPI processes >>> >> > type: seqaij >>> >> > rows=37, cols=37 >>> >> > total: nonzeros=695, allocated nonzeros=695 >>> >> > total number of mallocs used during MatSetValues calls >>> =0 >>> >> > not using I-node routines >>> >> > linear system matrix = precond matrix: >>> >> > Mat Object: 384 MPI processes >>> >> > type: mpiaij >>> >> > rows=18145, cols=18145 >>> >> > total: nonzeros=1709115, allocated nonzeros=1709115 >>> >> > total number of mallocs used during MatSetValues calls =0 >>> >> > not using I-node (on process 0) routines >>> >> > Down solver (pre-smoother) on level 1 >>> >> > ------------------------------- >>> >> > KSP Object: (mg_levels_1_) 384 MPI processes >>> 
>> > type: chebyshev >>> >> > Chebyshev: eigenvalue estimates: min = 0.133339, max = >>> >> > 1.46673 >>> >> > Chebyshev: eigenvalues estimated using gmres with >>> translations >>> >> > [0. 0.1; 0. 1.1] >>> >> > KSP Object: (mg_levels_1_esteig_) 384 >>> MPI >>> >> > processes >>> >> > type: gmres >>> >> > GMRES: restart=30, using Classical (unmodified) >>> >> > Gram-Schmidt Orthogonalization with no iterative refinement >>> >> > GMRES: happy breakdown tolerance 1e-30 >>> >> > maximum iterations=10, initial guess is zero >>> >> > tolerances: relative=1e-12, absolute=1e-50, >>> >> > divergence=10000. >>> >> > left preconditioning >>> >> > using PRECONDITIONED norm type for convergence test >>> >> > maximum iterations=2 >>> >> > tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000. >>> >> > left preconditioning >>> >> > using nonzero initial guess >>> >> > using NONE norm type for convergence test >>> >> > PC Object: (mg_levels_1_) 384 MPI processes >>> >> > type: sor >>> >> > SOR: type = local_symmetric, iterations = 1, local >>> iterations >>> >> > = 1, omega = 1. 
>>> >> > linear system matrix followed by preconditioner matrix: >>> >> > Mat Object: 384 MPI processes >>> >> > type: mffd >>> >> > rows=3020875, cols=3020875 >>> >> > Matrix-free approximation: >>> >> > err=1.49012e-08 (relative error in function >>> evaluation) >>> >> > Using wp compute h routine >>> >> > Does not compute normU >>> >> > Mat Object: () 384 MPI processes >>> >> > type: mpiaij >>> >> > rows=3020875, cols=3020875 >>> >> > total: nonzeros=215671710, allocated nonzeros=241731750 >>> >> > total number of mallocs used during MatSetValues calls =0 >>> >> > not using I-node (on process 0) routines >>> >> > Up solver (post-smoother) same as down solver (pre-smoother) >>> >> > linear system matrix followed by preconditioner matrix: >>> >> > Mat Object: 384 MPI processes >>> >> > type: mffd >>> >> > rows=3020875, cols=3020875 >>> >> > Matrix-free approximation: >>> >> > err=1.49012e-08 (relative error in function evaluation) >>> >> > Using wp compute h routine >>> >> > Does not compute normU >>> >> > Mat Object: () 384 MPI processes >>> >> > type: mpiaij >>> >> > rows=3020875, cols=3020875 >>> >> > total: nonzeros=215671710, allocated nonzeros=241731750 >>> >> > total number of mallocs used during MatSetValues calls =0 >>> >> > not using I-node (on process 0) routines >>> >> > >>> >> > >>> >> > Fande, >>> >> > >>> >> > On Thu, Apr 6, 2017 at 8:27 AM, Mark Adams wrote: >>> >> > On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith >>> wrote: >>> >> > > >>> >> > >> Does this mean that GAMG works for the symmetrical matrix only? >>> >> > > >>> >> > > No, it means that for non symmetric nonzero structure you need >>> the >>> >> > > extra flag. So use the extra flag. The reason we don't always use >>> the flag >>> >> > > is because it adds extra cost and isn't needed if the matrix >>> already has a >>> >> > > symmetric nonzero structure. 
>>> >> > >>> >> > BTW, if you have symmetric non-zero structure you can just set >>> >> > -pc_gamg_threshold -1.0, note the "or" in the message. >>> >> > >>> >> > If you want to mess with the threshold then you need to use the >>> >> > symmetrized flag. >>> >> > >>> >> >>> > >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pvsang002 at gmail.com Wed Apr 19 10:37:37 2017 From: pvsang002 at gmail.com (Pham Pham) Date: Wed, 19 Apr 2017 23:37:37 +0800 Subject: [petsc-users] Installation question Message-ID: Hi, I just installed petsc-3.7.5 on my university cluster. When evaluating the computer system, PETSc reports "It appears you have 1 node(s)"; I do not understand this, since the system is a multi-node system. Could you please explain this to me? Thank you very much. S. Output: ========================================= Now to evaluate the computer systems you plan use - do: make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt streams [mpepvs at atlas7-c10 petsc-3.7.5]$ make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt streams cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt streams /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx -o MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O -I/home/svu/mpepvs/petsc/petsc-3.7.5/include -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include `pwd`/MPIVersion.c Running streams with '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec ' using 'NPMAX=12' Number of MPI processes 1 Processor names atlas7-c10 Triad: 9137.5025 Rate (MB/s) Number of MPI processes 2 Processor names atlas7-c10 atlas7-c10 Triad: 9707.2815 Rate (MB/s) Number of MPI processes 3 Processor names atlas7-c10 atlas7-c10 atlas7-c10 Triad: 13559.5275 Rate (MB/s) Number of MPI
processes 4 Processor names atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 Triad: 14193.0597 Rate (MB/s) Number of MPI processes 5 Processor names atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 Triad: 14492.9234 Rate (MB/s) Number of MPI processes 6 Processor names atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 Triad: 15476.5912 Rate (MB/s) Number of MPI processes 7 Processor names atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 Triad: 15148.7388 Rate (MB/s) Number of MPI processes 8 Processor names atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 Triad: 15799.1290 Rate (MB/s) Number of MPI processes 9 Processor names atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 Triad: 15671.3104 Rate (MB/s) Number of MPI processes 10 Processor names atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 Triad: 15601.4754 Rate (MB/s) Number of MPI processes 11 Processor names atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 Triad: 15434.5790 Rate (MB/s) Number of MPI processes 12 Processor names atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 Triad: 15134.1263 Rate (MB/s) ------------------------------------------------ np speedup 1 1.0 2 1.06 3 1.48 4 1.55 5 1.59 6 1.69 7 1.66 8 1.73 9 1.72 10 1.71 11 1.69 12 1.66 Estimation of possible speedup of MPI programs based on Streams benchmark. It appears you have 1 node(s) Unable to plot speedup to a file Unable to open matplotlib to plot speedup [mpepvs at atlas7-c10 petsc-3.7.5]$ [mpepvs at atlas7-c10 petsc-3.7.5]$ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: text/x-log Size: 6194578 bytes Desc: not available URL: From balay at mcs.anl.gov Wed Apr 19 10:43:55 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 19 Apr 2017 10:43:55 -0500 Subject: [petsc-users] Installation question In-Reply-To: References: Message-ID: Presumably your cluster already has a recommended MPI to use [which is already installed. So you should use that - instead of --download-mpich=1 Satish On Wed, 19 Apr 2017, Pham Pham wrote: > Hi, > > I just installed petsc-3.7.5 into my university cluster. When evaluating > the computer system, PETSc reports "It appears you have 1 node(s)", I donot > understand this, since the system is a multinodes system. Could you please > explain this to me? > > Thank you very much. > > S. > > Output: > ========================================= > Now to evaluate the computer systems you plan use - do: > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > PETSC_ARCH=arch-linux-cxx-opt streams > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt > streams > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 PETSC_ARCH=arch-linux-cxx-opt > streams > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx -o > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing > -Wno-unknown-pragmas -fvisibility=hidden -g -O > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include > `pwd`/MPIVersion.c > Running streams with > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec ' using > 'NPMAX=12' > Number of MPI processes 1 Processor names atlas7-c10 > Triad: 9137.5025 Rate (MB/s) > Number of MPI processes 2 Processor names atlas7-c10 atlas7-c10 > Triad: 9707.2815 Rate (MB/s) > Number of MPI processes 3 Processor names atlas7-c10 atlas7-c10 atlas7-c10 > Triad: 13559.5275 Rate (MB/s) > Number of MPI 
processes 4 Processor names atlas7-c10 atlas7-c10 atlas7-c10 > atlas7-c10 > Triad: 14193.0597 Rate (MB/s) > Number of MPI processes 5 Processor names atlas7-c10 atlas7-c10 atlas7-c10 > atlas7-c10 atlas7-c10 > Triad: 14492.9234 Rate (MB/s) > Number of MPI processes 6 Processor names atlas7-c10 atlas7-c10 atlas7-c10 > atlas7-c10 atlas7-c10 atlas7-c10 > Triad: 15476.5912 Rate (MB/s) > Number of MPI processes 7 Processor names atlas7-c10 atlas7-c10 atlas7-c10 > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > Triad: 15148.7388 Rate (MB/s) > Number of MPI processes 8 Processor names atlas7-c10 atlas7-c10 atlas7-c10 > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > Triad: 15799.1290 Rate (MB/s) > Number of MPI processes 9 Processor names atlas7-c10 atlas7-c10 atlas7-c10 > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > Triad: 15671.3104 Rate (MB/s) > Number of MPI processes 10 Processor names atlas7-c10 atlas7-c10 > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > atlas7-c10 atlas7-c10 > Triad: 15601.4754 Rate (MB/s) > Number of MPI processes 11 Processor names atlas7-c10 atlas7-c10 > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > atlas7-c10 atlas7-c10 atlas7-c10 > Triad: 15434.5790 Rate (MB/s) > Number of MPI processes 12 Processor names atlas7-c10 atlas7-c10 > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > Triad: 15134.1263 Rate (MB/s) > ------------------------------------------------ > np speedup > 1 1.0 > 2 1.06 > 3 1.48 > 4 1.55 > 5 1.59 > 6 1.69 > 7 1.66 > 8 1.73 > 9 1.72 > 10 1.71 > 11 1.69 > 12 1.66 > Estimation of possible speedup of MPI programs based on Streams benchmark. 
> It appears you have 1 node(s) > Unable to plot speedup to a file > Unable to open matplotlib to plot speedup > [mpepvs at atlas7-c10 petsc-3.7.5]$ > [mpepvs at atlas7-c10 petsc-3.7.5]$ > From pvsang002 at gmail.com Wed Apr 19 11:34:56 2017 From: pvsang002 at gmail.com (Pham Pham) Date: Thu, 20 Apr 2017 00:34:56 +0800 Subject: [petsc-users] Installation question In-Reply-To: References: Message-ID: I reconfigured PETSc with the installed MPI; however, I got a serious error: **************************ERROR************************************* Error during compile, check arch-linux-cxx-opt/lib/petsc/conf/make.log Send it and arch-linux-cxx-opt/lib/petsc/conf/configure.log to petsc-maint at mcs.anl.gov ******************************************************************** Please explain what is happening. Thank you very much. On Wed, Apr 19, 2017 at 11:43 PM, Satish Balay wrote: > Presumably your cluster already has a recommended MPI to use [which is > already installed. So you should use that - instead of > --download-mpich=1 > > Satish > > On Wed, 19 Apr 2017, Pham Pham wrote: > > > Hi, > > > > I just installed petsc-3.7.5 into my university cluster. When evaluating > > the computer system, PETSc reports "It appears you have 1 node(s)", I > donot > > understand this, since the system is a multinodes system. Could you > please > > explain this to me? > > > > Thank you very much. > > > > S.
> > > > Output: > > ========================================= > > Now to evaluate the computer systems you plan use - do: > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > PETSC_ARCH=arch-linux-cxx-opt streams > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > PETSC_ARCH=arch-linux-cxx-opt > > streams > > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > PETSC_ARCH=arch-linux-cxx-opt > > streams > > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx -o > > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing > > -Wno-unknown-pragmas -fvisibility=hidden -g -O > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include > > `pwd`/MPIVersion.c > > Running streams with > > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec ' > using > > 'NPMAX=12' > > Number of MPI processes 1 Processor names atlas7-c10 > > Triad: 9137.5025 Rate (MB/s) > > Number of MPI processes 2 Processor names atlas7-c10 atlas7-c10 > > Triad: 9707.2815 Rate (MB/s) > > Number of MPI processes 3 Processor names atlas7-c10 atlas7-c10 > atlas7-c10 > > Triad: 13559.5275 Rate (MB/s) > > Number of MPI processes 4 Processor names atlas7-c10 atlas7-c10 > atlas7-c10 > > atlas7-c10 > > Triad: 14193.0597 Rate (MB/s) > > Number of MPI processes 5 Processor names atlas7-c10 atlas7-c10 > atlas7-c10 > > atlas7-c10 atlas7-c10 > > Triad: 14492.9234 Rate (MB/s) > > Number of MPI processes 6 Processor names atlas7-c10 atlas7-c10 > atlas7-c10 > > atlas7-c10 atlas7-c10 atlas7-c10 > > Triad: 15476.5912 Rate (MB/s) > > Number of MPI processes 7 Processor names atlas7-c10 atlas7-c10 > atlas7-c10 > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > Triad: 15148.7388 Rate (MB/s) > > Number of MPI processes 8 Processor names atlas7-c10 atlas7-c10 > atlas7-c10 > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > 
Triad: 15799.1290 Rate (MB/s) > > Number of MPI processes 9 Processor names atlas7-c10 atlas7-c10 > atlas7-c10 > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > Triad: 15671.3104 Rate (MB/s) > > Number of MPI processes 10 Processor names atlas7-c10 atlas7-c10 > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > atlas7-c10 atlas7-c10 > > Triad: 15601.4754 Rate (MB/s) > > Number of MPI processes 11 Processor names atlas7-c10 atlas7-c10 > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > atlas7-c10 atlas7-c10 atlas7-c10 > > Triad: 15434.5790 Rate (MB/s) > > Number of MPI processes 12 Processor names atlas7-c10 atlas7-c10 > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > Triad: 15134.1263 Rate (MB/s) > > ------------------------------------------------ > > np speedup > > 1 1.0 > > 2 1.06 > > 3 1.48 > > 4 1.55 > > 5 1.59 > > 6 1.69 > > 7 1.66 > > 8 1.73 > > 9 1.72 > > 10 1.71 > > 11 1.69 > > 12 1.66 > > Estimation of possible speedup of MPI programs based on Streams > benchmark. > > It appears you have 1 node(s) > > Unable to plot speedup to a file > > Unable to open matplotlib to plot speedup > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: make.log Type: text/x-log Size: 22382 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: configure.log Type: text/x-log Size: 4405520 bytes Desc: not available URL: From balay at mcs.anl.gov Wed Apr 19 13:02:20 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 19 Apr 2017 13:02:20 -0500 Subject: [petsc-users] Installation question In-Reply-To: References: Message-ID: Sorry - should have mentioned: do 'rm -rf arch-linux-cxx-opt' and rerun configure again. The mpich install from previous build [that is currently in arch-linux-cxx-opt/] is conflicting with --with-mpi-dir=/app1/centos6.3/gnu/mvapich2-1.9/ Satish On Wed, 19 Apr 2017, Pham Pham wrote: > I reconfigured PETSs with installed MPI, however, I got serous error: > > **************************ERROR************************************* > Error during compile, check arch-linux-cxx-opt/lib/petsc/conf/make.log > Send it and arch-linux-cxx-opt/lib/petsc/conf/configure.log to > petsc-maint at mcs.anl.gov > ******************************************************************** > > Please explain what is happening? > > Thank you very much. > > > > > On Wed, Apr 19, 2017 at 11:43 PM, Satish Balay wrote: > > > Presumably your cluster already has a recommended MPI to use [which is > > already installed. So you should use that - instead of > > --download-mpich=1 > > > > Satish > > > > On Wed, 19 Apr 2017, Pham Pham wrote: > > > > > Hi, > > > > > > I just installed petsc-3.7.5 into my university cluster. When evaluating > > > the computer system, PETSc reports "It appears you have 1 node(s)", I > > donot > > > understand this, since the system is a multinodes system. Could you > > please > > > explain this to me? > > > > > > Thank you very much. > > > > > > S. 
> > > > > > Output: > > > ========================================= > > > Now to evaluate the computer systems you plan use - do: > > > make PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > > PETSC_ARCH=arch-linux-cxx-opt streams > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ make > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > PETSC_ARCH=arch-linux-cxx-opt > > > streams > > > cd src/benchmarks/streams; /usr/bin/gmake --no-print-directory > > > PETSC_DIR=/home/svu/mpepvs/petsc/petsc-3.7.5 > > PETSC_ARCH=arch-linux-cxx-opt > > > streams > > > /home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpicxx -o > > > MPIVersion.o -c -Wall -Wwrite-strings -Wno-strict-aliasing > > > -Wno-unknown-pragmas -fvisibility=hidden -g -O > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/include > > > -I/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/include > > > `pwd`/MPIVersion.c > > > Running streams with > > > '/home/svu/mpepvs/petsc/petsc-3.7.5/arch-linux-cxx-opt/bin/mpiexec ' > > using > > > 'NPMAX=12' > > > Number of MPI processes 1 Processor names atlas7-c10 > > > Triad: 9137.5025 Rate (MB/s) > > > Number of MPI processes 2 Processor names atlas7-c10 atlas7-c10 > > > Triad: 9707.2815 Rate (MB/s) > > > Number of MPI processes 3 Processor names atlas7-c10 atlas7-c10 > > atlas7-c10 > > > Triad: 13559.5275 Rate (MB/s) > > > Number of MPI processes 4 Processor names atlas7-c10 atlas7-c10 > > atlas7-c10 > > > atlas7-c10 > > > Triad: 14193.0597 Rate (MB/s) > > > Number of MPI processes 5 Processor names atlas7-c10 atlas7-c10 > > atlas7-c10 > > > atlas7-c10 atlas7-c10 > > > Triad: 14492.9234 Rate (MB/s) > > > Number of MPI processes 6 Processor names atlas7-c10 atlas7-c10 > > atlas7-c10 > > > atlas7-c10 atlas7-c10 atlas7-c10 > > > Triad: 15476.5912 Rate (MB/s) > > > Number of MPI processes 7 Processor names atlas7-c10 atlas7-c10 > > atlas7-c10 > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > Triad: 15148.7388 Rate (MB/s) > > > Number of MPI processes 8 Processor names 
atlas7-c10 atlas7-c10 > > atlas7-c10 > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > Triad: 15799.1290 Rate (MB/s) > > > Number of MPI processes 9 Processor names atlas7-c10 atlas7-c10 > > atlas7-c10 > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > Triad: 15671.3104 Rate (MB/s) > > > Number of MPI processes 10 Processor names atlas7-c10 atlas7-c10 > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > atlas7-c10 atlas7-c10 > > > Triad: 15601.4754 Rate (MB/s) > > > Number of MPI processes 11 Processor names atlas7-c10 atlas7-c10 > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > atlas7-c10 atlas7-c10 atlas7-c10 > > > Triad: 15434.5790 Rate (MB/s) > > > Number of MPI processes 12 Processor names atlas7-c10 atlas7-c10 > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > atlas7-c10 atlas7-c10 atlas7-c10 atlas7-c10 > > > Triad: 15134.1263 Rate (MB/s) > > > ------------------------------------------------ > > > np speedup > > > 1 1.0 > > > 2 1.06 > > > 3 1.48 > > > 4 1.55 > > > 5 1.59 > > > 6 1.69 > > > 7 1.66 > > > 8 1.73 > > > 9 1.72 > > > 10 1.71 > > > 11 1.69 > > > 12 1.66 > > > Estimation of possible speedup of MPI programs based on Streams > > benchmark. > > > It appears you have 1 node(s) > > > Unable to plot speedup to a file > > > Unable to open matplotlib to plot speedup > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > [mpepvs at atlas7-c10 petsc-3.7.5]$ > > > > > > > > From ingogaertner.tus at gmail.com Thu Apr 20 00:41:43 2017 From: ingogaertner.tus at gmail.com (Ingo Gaertner) Date: Thu, 20 Apr 2017 07:41:43 +0200 Subject: [petsc-users] dmplex face normals orientation In-Reply-To: References: Message-ID: Thank you, Matt, this answers my question. 
Ingo 2017-04-18 22:21 GMT+02:00 Matthew Knepley : > On Tue, Apr 18, 2017 at 2:33 PM, Ingo Gaertner > wrote: > >> The part that does not make sense is: >> The code calculates that >> face 11 (or edge 11 if you call the boundary of a 2D cell an edge) has >> the centroid c=(0.750000 1.000000 0.000000) and the normal n=(0.000000 >> 0.500000 0.000000) and that >> face 12 (or edge 12) has the centroid c=(0.000000 0.500000 0.000000) and >> the normal n=(-1.000000 -0.000000 0.000000). >> I understood your previous answer ("The normal should be outward, unless >> the face has orientation < 0") such that I have to reverse these normals to >> get an outward normal, because faces 11 and 12 have orientation=-2. But the >> normals n=(0.000000 0.500000 0.000000) and n=(-1.000000 -0.000000 >> 0.000000) do already point outward. If I reverse them, they point inward. >> >> I need a CONSISTENT rule how to make use of the normals obtained from >> DMPlexComputeGeometryFVM(dm, &cellgeom,&facegeom) to calculate OUTWARD >> pointing normals with respect to EACH INDIVIDUAL cell. >> Or do I have to iterate over each face of each cell and use a geometric >> check to see if the calculated normals are pointing inward or outward? >> > > I apologize. I did not understand the question before. The convention I > have chosen might not be the best one, > but it seems appropriate for FVM. > > The orientation of a face normal is chosen such that > > n . r > 0 > > where r is the vector from the centroid of the left cell to > the centroid of the right cell. If we do GetSupport() for > a face we get back {l, r}, where r could be empty, meaning > the cell at infinity. > > This convention means that I never have to check orientations > when I am using the facegeom[] stuff in FVM, I just need to have > {l, r} which I generally have in the loop.
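The convention above can be sketched in a few lines of plain Python (not petsc4py API; the helper name and the (left, right) support tuple are illustrative assumptions): since the stored normal satisfies n . r > 0, it already points outward for the left support cell, and only needs a sign flip for the right one.

```python
def outward_normal(face_normal, support, cell):
    """Outward-pointing normal of a face with respect to `cell`.

    `support` is the (left, right) pair of cells sharing the face.
    By the n . r > 0 convention, the stored normal points from the
    left cell toward the right one, so it is already outward for the
    left cell and must be flipped for the right cell.
    """
    if cell == support[0]:
        return face_normal
    return tuple(-comp for comp in face_normal)

# Face 13 from the example output: normal (1, 0), shared by cells {0, 1}.
outward_normal((1.0, 0.0), (0, 1), 0)  # unchanged: outward for cell 0
outward_normal((1.0, 0.0), (0, 1), 1)  # flipped: outward for cell 1
```

This is exactly the "I just need to have {l, r}" rule: no cone orientation lookup is required, only which side of the face the current cell sits on.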
> > Thanks, > > Matt > > >> Thanks >> Ingo >> >> >> >> 2017-04-18 20:20 GMT+02:00 Matthew Knepley : >> >>> On Tue, Apr 18, 2017 at 12:46 AM, Ingo Gaertner < >>> ingogaertner.tus at gmail.com> wrote: >>> >>>> Dear Matt, >>>> please explain your previous answer ("The normal should be outward, >>>> unless the face has orientation < 0") with respect to my example. >>>> As I think, the example shows that the face normals for faces 11 and 12 >>>> are pointing outward, although they have orientation=-2. For all other >>>> faces the normal direction and their orientation sign agree with what you >>>> said. >>>> >>> >>> 1) You should destroy the objects at the end >>> >>> ierr = VecDestroy(&cellgeom);CHKERRQ(ierr); >>> ierr = VecDestroy(&facegeom);CHKERRQ(ierr); >>> ierr = DMDestroy(&dm);CHKERRQ(ierr); >>> >>> 2) You should call >>> >>> ierr = DMSetFromOptions(dm);CHKERRQ(ierr); >>> ierr = DMViewFromOptions(dm, NULL, "-dm_view");CHKERRQ(ierr); >>> >>> after DM creation to make it easier to customize and debug. Then we can >>> use >>> >>> -dm_view ::ascii_info_detail >>> >>> to look at the DM. >>> >>> 3) Lets look at Cell 0 >>> >>> [0]: 0 <---- 8 (0) >>> [0]: 0 <---- 13 (0) >>> [0]: 0 <---- 10 (-2) >>> [0]: 0 <---- 12 (-2) >>> >>> There are two edges reversed. The edges themselves {8, 13, 10, 12} >>> should proceed counter-clockwise from the bottom, and have vertices >>> >>> [0]: 8 <---- 2 (0) >>> [0]: 8 <---- 3 (0) >>> [0]: 13 <---- 3 (0) >>> [0]: 13 <---- 6 (0) >>> [0]: 10 <---- 5 (0) >>> [0]: 10 <---- 6 (0) >>> [0]: 12 <---- 2 (0) >>> [0]: 12 <---- 5 (0) >>> >>> so we get as we expect >>> >>> 2 --> 3 --> 6 --> 5 >>> >>> which agrees with the coordinates >>> >>> ( 2) dim 2 offset 0 0. 0. >>> ( 3) dim 2 offset 2 0.5 0. >>> ( 4) dim 2 offset 4 1. 0. >>> ( 5) dim 2 offset 6 0. 1. >>> ( 6) dim 2 offset 8 0.5 1. >>> ( 7) dim 2 offset 10 1. 1. >>> >>> Which part does not make sense? 
>>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks >>>> Ingo >>>> >>>> 2017-04-14 11:28 GMT+02:00 Ingo Gaertner : >>>> >>>>> Thank you, Matt, >>>>> as you say, the face orientations do change sign when switching >>>>> between the two adjacent cells. (I confused my program output. You are >>>>> right.) But it seems not to be always correct to keep the normal direction >>>>> for orientation>=0 and invert it for orientation<0: >>>>> >>>>> I include sample code below to make my question more clear. The >>>>> program creates a HexBoxMesh split horizontally in two cells (at x=0.5) It >>>>> produces this output: >>>>> >>>>> "Face centroids (c) and normals(n): >>>>> face #008 c=(0.250000 0.000000 0.000000) n=(-0.000000 -0.500000 >>>>> 0.000000) >>>>> face #009 c=(0.750000 0.000000 0.000000) n=(-0.000000 -0.500000 >>>>> 0.000000) >>>>> face #010 c=(0.250000 1.000000 0.000000) n=(0.000000 0.500000 0.000000) >>>>> face #011 c=(0.750000 1.000000 0.000000) n=(0.000000 0.500000 0.000000) >>>>> face #012 c=(0.000000 0.500000 0.000000) n=(-1.000000 -0.000000 >>>>> 0.000000) >>>>> face #013 c=(0.500000 0.500000 0.000000) n=(1.000000 0.000000 0.000000) >>>>> face #014 c=(1.000000 0.500000 0.000000) n=(1.000000 0.000000 0.000000) >>>>> Cell faces orientations: >>>>> cell #0, faces:[8 13 10 12] orientations:[0 0 -2 -2] >>>>> cell #1, faces:[9 14 11 13] orientations:[0 0 -2 -2]" >>>>> >>>>> Looking at the face normals, all boundary normals point outside (good). >>>>> The normal of face #013 points outside with respect to the left cell >>>>> #0, but inside w.r.t. the right cell #1. >>>>> >>>>> Face 13 is shared between both cells. It has orientation 0 for cell >>>>> #0, but orientation -2 for cell #1 (good). >>>>> What I don't understand is the orientation of face 12 (cell 0) and of >>>>> face 11 (cell 1). These are negative, which would make them point into the >>>>> cell. Have I done some other stupid mistake? 
>>>>> >>>>> Thanks >>>>> Ingo >>>>> >>>>> Here is the code: >>>>> >>>>> static char help[] = "Check face normals orientations.\n\n"; >>>>> #include <petsc.h> >>>>> #undef __FUNCT__ >>>>> #define __FUNCT__ "main" >>>>> >>>>> int main(int argc, char **argv) >>>>> { >>>>> DM dm; >>>>> PetscErrorCode ierr; >>>>> PetscFVCellGeom *cgeom; >>>>> PetscFVFaceGeom *fgeom; >>>>> Vec cellgeom,facegeom; >>>>> int dim=2; >>>>> int cells[]={2,1}; >>>>> int cStart,cEnd,fStart,fEnd; >>>>> int coneSize,supportSize; >>>>> const int *cone,*coneOrientation; >>>>> >>>>> ierr = PetscInitialize(&argc, &argv, NULL, help);CHKERRQ(ierr); >>>>> ierr = PetscOptionsGetInt(NULL, NULL, "-dim", &dim, >>>>> NULL);CHKERRQ(ierr); >>>>> ierr = DMPlexCreateHexBoxMesh(PETSC_COMM_WORLD, dim, >>>>> cells,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE, >>>>> &dm);CHKERRQ(ierr); >>>>> >>>>> ierr = DMPlexComputeGeometryFVM(dm, &cellgeom,&facegeom);CHKERRQ(ierr); >>>>> ierr = VecGetArray(cellgeom, (PetscScalar**)&cgeom);CHKERRQ(ierr); >>>>> ierr = VecGetArray(facegeom, (PetscScalar**)&fgeom);CHKERRQ(ierr); >>>>> ierr = DMPlexGetHeightStratum(dm, 0, &cStart, &cEnd);CHKERRQ(ierr); >>>>> ierr = DMPlexGetHeightStratum(dm, 1, &fStart, &fEnd);CHKERRQ(ierr); >>>>> >>>>> fprintf(stderr,"Face centroids (c) and normals(n):\n"); >>>>> for (int f=fStart;f<fEnd;f++){ >>>>> fprintf(stderr,"face #%03d c=(%03f %03f %03f) n=(%03f %03f >>>>> %03f)\n",f, >>>>> fgeom[f-fStart].centroid[0],fgeom[f-fStart].centroid[1],fgeom[f-fStart].centroid[2], >>>>> fgeom[f-fStart].normal[0],fgeom[f-fStart].normal[1],fgeom[f-fStart].normal[2]); >>>>> } >>>>> >>>>> fprintf(stderr,"Cell faces orientations:\n"); >>>>> for (int c=cStart;c<cEnd;c++){ >>>>> ierr = DMPlexGetConeSize(dm, c, &coneSize);CHKERRQ(ierr); >>>>> ierr = DMPlexGetCone(dm, c, &cone);CHKERRQ(ierr); >>>>> ierr = DMPlexGetConeOrientation(dm, c, >>>>> &coneOrientation);CHKERRQ(ierr); >>>>> if (dim==2){ >>>>> if (coneSize!=4){ >>>>> fprintf(stderr,"Expected coneSize 4, got %d.\n",coneSize); >>>>> exit(1); >>>>> } >>>>> fprintf(stderr,"cell #%d, faces:[%d %d %d %d] orientations:[%d >>>>> %d %d %d]\n",c, >>>>> cone[0],cone[1],cone[2],cone[3], >>>>> coneOrientation[0],coneOrientation[1],coneOrientation[2],coneOrientation[3] >>>>> ); >>>>> } else if (dim==3){ >>>>> if (coneSize!=6){ >>>>> fprintf(stderr,"Expected coneSize 6, got %d.\n",coneSize); >>>>> exit(1); >>>>> } >>>>> fprintf(stderr,"cell #%d, faces:[%d %d %d %d %d %d] >>>>> orientations:[%d %d %d %d %d %d]\n",c, >>>>> cone[0],cone[1],cone[2],cone[3],cone[4],cone[5], >>>>> coneOrientation[0],coneOrientation[1],coneOrientation[2],coneOrientation[3],coneOrientation[4],coneOrientation[5] >>>>> ); >>>>> } else { >>>>> fprintf(stderr,"Dimension %d not implemented.\n",dim); >>>>> exit(1); >>>>> } >>>>> } >>>>> ierr = PetscFinalize(); >>>>> >>>>> } >>>>> >>>>> >>>>> 2017-04-14 11:00 GMT+02:00 Matthew Knepley : >>>>> >>>>>> On Wed, Apr 12, 2017 at 10:52 AM, Ingo Gaertner < >>>>>> ingogaertner.tus at gmail.com> wrote: >>>>>> >>>>>>> Hello, >>>>>>> I have problems determining the orientation of the face normals of a >>>>>>> DMPlex. >>>>>>> >>>>>>> I create a DMPlex, for example with DMPlexCreateHexBoxMesh(). >>>>>>> Next, I get the face normals using DMPlexComputeGeometryFVM(DM dm, >>>>>>> Vec *cellgeom, Vec *facegeom). facegeom gives the correct normals, but I >>>>>>> don't know how the inside/outside is defined with respect to the adjacent >>>>>>> cells? >>>>>>> >>>>>> >>>>>> The normal should be outward, unless the face has orientation < 0. >>>>>> >>>>>> >>>>>>> Finally, I iterate over all cells. For each cell I iterate over the >>>>>>> bounding faces (obtained from DMPlexGetCone) and try to obtain their >>>>>>> orientation with respect to the current cell using >>>>>>> DMPlexGetConeOrientation(). However, the six integers for the orientation >>>>>>> are the same for each cell.
I expect them to flip between neighbour cells, >>>>>>> because if a face normal is pointing outside for any cell, the same normal >>>>>>> is pointing inside for its neighbour. Apparently I have a misunderstanding >>>>>>> here. >>>>>>> >>>>>> >>>>>> I see the orientations changing sign for adjacent cells. Want to send >>>>>> a simple code? You should see this >>>>>> for examples. You can run SNES ex12 with -dm_view ::ascii_info_detail >>>>>> to see the change in sign. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> How can I make use of the face normals in facegeom and the >>>>>>> orientation values from DMPlexGetConeOrientation() to get the outside face >>>>>>> normals for each cell? >>>>>>> >>>>>>> Thank you >>>>>>> Ingo >>>>>>> >>>>>>> >>>>>>> Virenfrei. >>>>>>> www.avast.com >>>>>>> >>>>>>> <#m_8361884064281082256_m_1492622701141221377_m_-6171374370410339176_m_8179380476569161657_m_2360740100696784902_m_-1897645241821352774_m_-6432344443275881332_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which their >>>>>> experiments lead. >>>>>> -- Norbert Wiener >>>>>> >>>>> >>>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From damian at man.poznan.pl Thu Apr 20 05:03:16 2017 From: damian at man.poznan.pl (Damian Kaliszan) Date: Thu, 20 Apr 2017 12:03:16 +0200 Subject: [petsc-users] petsc4py & GPU Message-ID: <33676872.20170420120316@man.poznan.pl> Hi, Currently I'm doing a simple Ax=b solve with a KSP method and MPI, loading the data from a file in parallel: viewer = petsc4py.PETSc.Viewer().createBinary(Afilename, 'r') A = petsc4py.PETSc.Mat().load(viewer). (same for the b & x vectors) and then calling a solver ksp.solve(b, x) I would like to do the same using the GPU. Accordingly, I changed the above to viewer = petsc4py.PETSc.Viewer().createBinary(Afilename, 'r') A = PETSc.Mat().create(comm=comm) A.setType(PETSc.Mat.Type.MPIAIJCUSPARSE) A.load(viewer) ... What else needs to be changed? Running the above and checking the nvidia-smi output confirms that the python script and computations run on the CPU, not on the GPU as I'd like... Any help would be appreciated. Best, Damian From knepley at gmail.com Thu Apr 20 07:10:11 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 20 Apr 2017 07:10:11 -0500 Subject: [petsc-users] petsc4py & GPU In-Reply-To: <33676872.20170420120316@man.poznan.pl> References: <33676872.20170420120316@man.poznan.pl> Message-ID: On Thu, Apr 20, 2017 at 5:03 AM, Damian Kaliszan wrote: > Hi, > > Currently I'm doing a simple Ax=b with KSP method and MPI by loading > in parallel a data from file > > viewer = petsc4py.PETSc.Viewer().createBinary(Afilename, 'r') > A = petsc4py.PETSc.Mat().load(viewer). > > (same for b & x vectors) and then calling a solver > > ksp.solve(b, x) > > I would like to do the same using GPU. > I changed respectively the above to > viewer = petsc4py.PETSc.Viewer().createBinary(Afilename, 'r') > A = PETSc.Mat().create(comm=comm) > A.setType(PETSc.Mat.Type.MPIAIJCUSPARSE) > A.load(viewer) > ... > What else needs to be changed?
> Running the above and checking nvidia-smi output confirms the python > script and computations are done on CPU, not GPU as I 'd like to... > 1) Make sure the Vec types are also for the GPU 2) Send the output of -ksp_view -log_view Thanks, Matt > Any help would be appreciated. > > Best, > Damian > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From damian at man.poznan.pl Thu Apr 20 09:09:44 2017 From: damian at man.poznan.pl (Damian Kaliszan) Date: Thu, 20 Apr 2017 16:09:44 +0200 Subject: [petsc-users] petsc4py & GPU In-Reply-To: References: <33676872.20170420120316@man.poznan.pl> Message-ID: <691962187.20170420160944@man.poznan.pl> An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Apr 20 10:26:34 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 20 Apr 2017 10:26:34 -0500 Subject: [petsc-users] petsc4py & GPU In-Reply-To: <691962187.20170420160944@man.poznan.pl> References: <33676872.20170420120316@man.poznan.pl> <691962187.20170420160944@man.poznan.pl> Message-ID: On Thu, Apr 20, 2017 at 9:09 AM, Damian Kaliszan wrote: > Hi, > > Thank you for reply:) Sorry for maybe stupid question in the scope of > setting petsc(4py) options. > Should the following calls (somewhere before creating matrix & vectors): > > PETSc.Options().setValue("ksp_view", "") > PETSc.Options().setValue("log_view", "") > > > be enough to enable extended output? > I think so, but why not just give them on the command line? 
Thanks, Matt > Best, > Damian > > > In a message dated 20 April 2017 (14:10:11), the following was written: > > > On Thu, Apr 20, 2017 at 5:03 AM, Damian Kaliszan > wrote: > Hi, > > Currently I'm doing a simple Ax=b with KSP method and MPI by loading > in parallel a data from file > > viewer = petsc4py.PETSc.Viewer().createBinary(Afilename, 'r') > A = petsc4py.PETSc.Mat().load(viewer). > > (same for b & x vectors) and then calling a solver > > ksp.solve(b, x) > > I would like to do the same using GPU. > I changed respectively the above to > viewer = petsc4py.PETSc.Viewer().createBinary(Afilename, 'r') > A = PETSc.Mat().create(comm=comm) > A.setType(PETSc.Mat.Type.MPIAIJCUSPARSE) > A.load(viewer) > ... > What else needs to be changed? > Running the above and checking nvidia-smi output confirms the python > script and computations are done on CPU, not GPU as I 'd like to... > > 1) Make sure the Vec types are also for the GPU > > 2) Send the output of > > -ksp_view -log_view > > Thanks, > > Matt > > Any help would be appreciated. > > Best, > Damian > > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > ------------------------------------------------------- > Damian Kaliszan > > Poznan Supercomputing and Networking Center > HPC and Data Centres Technologies > ul. Jana Pawła II 10 > 61-139 Poznan > POLAND > > phone (+48 61) 858 5109 > e-mail damian at man.poznan.pl > www - http://www.man.poznan.pl/ > ------------------------------------------------------- > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dalcinl at gmail.com Thu Apr 20 10:36:48 2017 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Thu, 20 Apr 2017 18:36:48 +0300 Subject: [petsc-users] petsc4py & GPU In-Reply-To: <691962187.20170420160944@man.poznan.pl> References: <33676872.20170420120316@man.poznan.pl> <691962187.20170420160944@man.poznan.pl> Message-ID: On 20 April 2017 at 17:09, Damian Kaliszan wrote: > Thank you for reply:) Sorry for maybe stupid question in the scope of > setting petsc(4py) options. > Should the following calls (somewhere before creating matrix & vectors): > > PETSc.Options().setValue("ksp_view", "") > PETSc.Options().setValue("log_view", "") > Unfortunately, no. There are a few options (-log_view ?) that you should set before calling PetscInitialize() (which happens automatically at import time), otherwise things do not work as expected. To pass things from the command line and set them before PetscInitialize() the usual idiom is: import sys, petsc4py petsc4py.init(sys.argv) from petsc4py import PETSc -- Lisandro Dalcin ============ Research Scientist Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ 4700 King Abdullah University of Science and Technology al-Khawarizmi Bldg (Bldg 1), Office # 0109 Thuwal 23955-6900, Kingdom of Saudi Arabia http://www.kaust.edu.sa Office Phone: +966 12 808-0459 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From damian at man.poznan.pl Thu Apr 20 11:26:56 2017 From: damian at man.poznan.pl (Damian Kaliszan) Date: Thu, 20 Apr 2017 18:26:56 +0200 Subject: [petsc-users] petsc4py & GPU In-Reply-To: References: <33676872.20170420120316@man.poznan.pl> <691962187.20170420160944@man.poznan.pl> Message-ID: <43415ff5-1e9b-40e1-a51e-7cc220cc997d@man.poznan.pl> Hi, That might be the problem: I'm using the ArgumentParser class to handle complex command-line arguments. In this case, is there any way to make the two cooperate, or is the only solution to pass everything through the argument of petsc4py's init method? Best, Damian On 20 Apr 2017 at 17:37, Lisandro Dalcin wrote: >On 20 April 2017 at 17:09, Damian Kaliszan >wrote: > >> Thank you for reply:) Sorry for maybe stupid question in the scope of >> setting petsc(4py) options. >> Should the following calls (somewhere before creating matrix & >vectors): >> >> PETSc.Options().setValue("ksp_view", "") >> PETSc.Options().setValue("log_view", "") >> > >Unfortunately, no. There are a few options (-log_view ?) that you >should >set before calling PetscInitialize() (which happens automatically at >import >time), otherwise things do not work as expected. To pass things from >the >command line and set them before PetscInitialize() the usual idiom is: > >import sys, petsc4py >petsc4py.init(sys.argv) >from petsc4py import PETSc > > > >-- >Lisandro Dalcin >============ >Research Scientist >Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) >Extreme Computing Research Center (ECRC) >King Abdullah University of Science and Technology (KAUST) >http://ecrc.kaust.edu.sa/ > >4700 King Abdullah University of Science and Technology >al-Khawarizmi Bldg (Bldg 1), Office # 0109 >Thuwal 23955-6900, Kingdom of Saudi Arabia >http://www.kaust.edu.sa > >Office Phone: +966 12 808-0459 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From francesco.caimmi at polimi.it Thu Apr 20 12:15:33 2017 From: francesco.caimmi at polimi.it (Francesco Caimmi) Date: Thu, 20 Apr 2017 17:15:33 +0000 Subject: [petsc-users] petsc4py & GPU In-Reply-To: <43415ff5-1e9b-40e1-a51e-7cc220cc997d@man.poznan.pl> References: <33676872.20170420120316@man.poznan.pl> <691962187.20170420160944@man.poznan.pl> , <43415ff5-1e9b-40e1-a51e-7cc220cc997d@man.poznan.pl> Message-ID: Hi Damian, You can use the "parse_known_args" method of the ArgumentParser class; it will create a Namespace with the args you defined and return the unknown ones as a list of strings you can pass to petsc4py.init. See: https://docs.python.org/3.6/library/argparse.html section 16.4.5.7. I used it successfully in the past. Hope this helps. Best, -- Francesco Caimmi Laboratorio di Ingegneria dei Polimeri http://www.chem.polimi.it/polyenglab/ Politecnico di Milano - Dipartimento di Chimica, Materiali e Ingegneria Chimica "Giulio Natta" P.zza Leonardo da Vinci, 32 I-20133 Milano Tel. +39.02.2399.4711 Fax +39.02.7063.8173 francesco.caimmi at polimi.it Skype: fmglcaimmi ________________________________________ From: petsc-users-bounces at mcs.anl.gov on behalf of Damian Kaliszan Sent: Thursday, April 20, 2017 6:26 PM To: Lisandro Dalcin Cc: PETSc Subject: Re: [petsc-users] petsc4py & GPU Hi, There might be the problem because I'm using ArgumentParser class to catch complex command line arguments. In this case is there any chance to make both to cooperate or the only solution is to pass everything through argument to init method of petsc4py? Best, Damian On 20 Apr 2017 at 17:37, Lisandro Dalcin wrote: On 20 April 2017 at 17:09, Damian Kaliszan > wrote: Thank you for reply:) Sorry for maybe stupid question in the scope of setting petsc(4py) options. Should the following calls (somewhere before creating matrix & vectors): PETSc.Options().setValue("ksp_view", "") PETSc.Options().setValue("log_view", "") Unfortunately, no. 
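The parse_known_args approach suggested above can be sketched as follows. This is a hedged toy: it runs without petsc4py (the two commented lines mark where the real initialization would go, before the PETSc import), and the `--matrix-file` option is purely illustrative.

```python
# Let argparse consume the options it knows and forward the rest
# (the PETSc-style ones) to petsc4py before PETSc is imported.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--matrix-file", default="A.dat")  # hypothetical app option

# Stand-in for sys.argv[1:]; mixes an application option with PETSc options.
argv = ["--matrix-file", "A.bin", "-ksp_view", "-log_view"]
args, petsc_args = parser.parse_known_args(argv)

# import petsc4py
# petsc4py.init(petsc_args)   # must run before: from petsc4py import PETSc

print(args.matrix_file)  # the application's own option
print(petsc_args)        # leftover strings, ready for petsc4py.init
```

The unrecognized single-dash options end up in the second return value untouched, which is exactly the list petsc4py.init expects.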
There are a few options (-log_view ?) that you should set before calling PetscInitialize() (which happens automatically at import time), otherwise things do not work as expected. To pass things from the command line and set them before PetscInitialize() the usual idiom is: import sys, petsc4py petsc4py.init(sys.argv) from petsc4py import PETSc -- Lisandro Dalcin ============ Research Scientist Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ 4700 King Abdullah University of Science and Technology al-Khawarizmi Bldg (Bldg 1), Office # 0109 Thuwal 23955-6900, Kingdom of Saudi Arabia http://www.kaust.edu.sa Office Phone: +966 12 808-0459 From bhatiamanav at gmail.com Thu Apr 20 14:30:43 2017 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Thu, 20 Apr 2017 14:30:43 -0500 Subject: [petsc-users] sees convergence with pc_fieldsplit Message-ID: <302E9415-1E6A-4DF4-9B9F-FA6240DF4C18@gmail.com> Hi, I have a time-dependent multiphysics problem that I am trying to solve using pc_fieldsplit. I have defined a nested matrix for the Jacobian with the diagonal block matrices explicitly created and the off-diagonal blocks defined using shell matrices, so that the matrix-vector product is defined. The nonlinear system of equations at each time-step is solved using an SNES construct. I have been facing some convergence issues, so I have reduced the problem scope to ensure that the code converges to a single-discipline solution when the off-diagonal couplings are ignored. Here, I have provided a constant forcing function to discipline two, which is a linear problem, so that I expect convergence in a single iteration. The linear solver, defined using pc_fieldsplit, seems to be converging without problems. The nonlinear solver converges in a single step, due to FNORM_RELATIVE, in the first time step. 
From the second time-step onwards, the nonlinear solver does not converge in a single step, and terminates due to SNORM_RELATIVE. I am not sure why this is happening. What is intriguing is that the solution at the end of the n^th time-step is n times the solution after the first time step. In other words, SNES at each time-step is taking the same step as was used in the first time-step. Not sure why this is happening. I would appreciate any advice. Regards, Manav Time step: 0 : t = 0.000 || R ||_2 = 433013 : || R_i ||_2 = ( 2.81069e-07 , 433013 ) 0 KSP Residual norm 1.746840810717e-02 1 KSP Residual norm 5.983637077441e-12 Linear solve converged due to CONVERGED_RTOL iterations 1 || R ||_2 = 5.07622e-07 : || R_i ||_2 = ( 5.07622e-07 , 7.12896e-11 ) Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 Time step: 1 : t = 0.001 || R ||_2 = 1.73186e+06 : || R_i ||_2 = ( 5.77273e-07 , 1.73186e+06 ) 0 KSP Residual norm 1.745842035995e-02 1 KSP Residual norm 2.366595812944e-12 Linear solve converged due to CONVERGED_RTOL iterations 1 || R ||_2 = 1.29889e+06 : || R_i ||_2 = ( 9.7379e-07 , 1.29889e+06 ) Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 Time step: 2 : t = 0.002 : xdot-L2 = 2.72522e-06 || R ||_2 = 433159 : || R_i ||_2 = ( 1.35694e-06 , 433159 ) 0 KSP Residual norm 1.744848431182e-02 1 KSP Residual norm 7.650255893811e-12 Linear solve converged due to CONVERGED_RTOL iterations 1 || R ||_2 = 866074 : || R_i ||_2 = ( 8.42454e-07 , 866074 ) Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 Time step: 3 : t = 0.003 || R ||_2 = 2.59764e+06 : || R_i ||_2 = ( 1.12168e-06 , 2.59764e+06 ) 0 KSP Residual norm 1.743859969865e-02 1 KSP Residual norm 1.045225058356e-11 Linear solve converged due to CONVERGED_RTOL iterations 1 || R ||_2 = 2.16477e+06 : || R_i ||_2 = ( 9.73157e-07 , 2.16477e+06 ) Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 Following is the output of snes_view: SNES 
Object: 1 MPI processes type: newtonls maximum iterations=50, maximum function evaluations=10000 tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 total number of linear solver iterations=1 total number of function evaluations=2 norm schedule ALWAYS SNESLineSearch Object: 1 MPI processes type: basic maxstep=1.000000e+08, minlambda=1.000000e-12 tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 maximum iterations=1 KSP Object: 1 MPI processes type: gmres GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement GMRES: happy breakdown tolerance 1e-30 maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 1 MPI processes type: fieldsplit FieldSplit with SYMMETRIC_MULTIPLICATIVE composition: total splits = 2 Solver info for each split is in the following KSP objects: Split number 0 Defined by IS KSP Object: (fieldsplit_0_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_0_) 1 MPI processes type: ilu ILU: out-of-place factorization 0 levels of fill tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 1., needed 1. 
Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=108, cols=108 package used to perform factorization: petsc total: nonzeros=2800, allocated nonzeros=2800 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 27 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (fieldsplit_0_) 1 MPI processes type: seqaij rows=108, cols=108 total: nonzeros=2800, allocated nonzeros=2800 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 27 nodes, limit used is 5 Split number 1 Defined by IS KSP Object: (fieldsplit_1_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_1_) 1 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: nd factor fill ratio given 5., needed 1.15385 Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=30, cols=30 package used to perform factorization: petsc total: nonzeros=540, allocated nonzeros=540 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 9 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: (fieldsplit_1_) 1 MPI processes type: seqaij rows=30, cols=30 total: nonzeros=468, allocated nonzeros=468 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 10 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: nest rows=138, cols=138 Matrix object: type=nest, rows=2, cols=2 MatNest structure: (0,0) : prefix="fieldsplit_0_", type=seqaij, rows=108, cols=108 (0,1) : type=shell, rows=108, cols=30 (1,0) : type=shell, rows=30, cols=108 (1,1) : prefix="fieldsplit_1_", type=seqaij, rows=30, cols=30 -------------- next part -------------- An HTML attachment was scrubbed... 
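The pattern in the log above — the nonlinear solve declaring CONVERGED_SNORM_RELATIVE while the function norm stays large — can be reproduced with a scalar toy (pure Python, unrelated to the actual multiphysics system): when the Jacobian used for the linear solve is badly wrong, the computed step is tiny relative to the iterate, so a step-norm (stol-style) test fires even though the residual is huge.

```python
# Toy scalar "Newton" iteration: a correct Jacobian converges to the
# root; a wildly wrong (too stiff) Jacobian yields a tiny step that
# trips the relative step-norm test with the residual still large.
# This only illustrates the failure mode; it is not the poster's system.

def solve(x0, jac, stol=1e-8, max_it=50):
    f = lambda x: x**3 - 1000.0        # residual; the true root is 10
    x = x0
    for it in range(1, max_it + 1):
        dx = -f(x) / jac(x)            # "Newton" step with supplied Jacobian
        x += dx
        if abs(dx) <= stol * abs(x):   # SNORM_RELATIVE-style test
            return x, f(x), it
    return x, f(x), max_it

# Correct Jacobian 3*x^2: converges to the root with a tiny residual.
x, r, its = solve(2.0, lambda x: 3 * x**2)
print(x, abs(r) < 1e-6)

# Wrong, much-too-stiff Jacobian: the step test fires on the very first
# iteration while |f(x)| is still huge.
x, r, its = solve(2.0, lambda x: 1e12)
print(x, abs(r) > 100.0)
```

The toy also matches the "same step every time" symptom: with a fixed wrong Jacobian and a residual that barely changes, each solve produces essentially the same increment, so after n steps the accumulated update is n times the first one.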
URL: From bsmith at mcs.anl.gov Thu Apr 20 14:38:06 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 20 Apr 2017 14:38:06 -0500 Subject: [petsc-users] sees convergence with pc_fieldsplit In-Reply-To: <302E9415-1E6A-4DF4-9B9F-FA6240DF4C18@gmail.com> References: <302E9415-1E6A-4DF4-9B9F-FA6240DF4C18@gmail.com> Message-ID: <86483B5B-8BB5-49FA-9909-00047041FC7C@mcs.anl.gov> Run with -snes_monitor also and send the output > On Apr 20, 2017, at 2:30 PM, Manav Bhatia wrote: > > Hi, > > I have a time-dependent multiphysics problem that I am trying to solve using pc_fieldsplit. I have defined a nested matrix for the jacobian with the diagonal block matrices explicitly created and the off-diagonal blocks defined using shell matrices, so that the matrix vector product is defined. The nonlinear system of equations at each time-step is solved using an snes construct. > > I have been facing some convergence issues, so I have reduced the problem scope to ensure that the code converges to a single discipline solution when the off-diagonal couplings are ignored. > > Here, I have provided a constant forcing function to discipline two, which is a linear problem, so that I expect convergence in a single iteration. > > The linear solver, defined using pc_fieldsplit seems to be converging without problems. The nonlinear solver convergence in a single time-step with FNORM in the first time step. > > The second time-step onwards, the nonlinear solver does not converge in a single step, and is terminating due to SNORM_RELATIVE. I am not sure why this is happening. > > What is intriguing is that the solution at the end of the n^th time-step is n times the solution after the first time step. In other words, snes at each time-step is taking the same step as was used in the first time-step. > > Not sure sure why this is happening. I would appreciate any advice. 
> > Regards, > Manav > > Time step: 0 : t = 0.000 > || R ||_2 = 433013 : || R_i ||_2 = ( 2.81069e-07 , 433013 ) > 0 KSP Residual norm 1.746840810717e-02 > 1 KSP Residual norm 5.983637077441e-12 > Linear solve converged due to CONVERGED_RTOL iterations 1 > || R ||_2 = 5.07622e-07 : || R_i ||_2 = ( 5.07622e-07 , 7.12896e-11 ) > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > Time step: 1 : t = 0.001 > || R ||_2 = 1.73186e+06 : || R_i ||_2 = ( 5.77273e-07 , 1.73186e+06 ) > 0 KSP Residual norm 1.745842035995e-02 > 1 KSP Residual norm 2.366595812944e-12 > Linear solve converged due to CONVERGED_RTOL iterations 1 > || R ||_2 = 1.29889e+06 : || R_i ||_2 = ( 9.7379e-07 , 1.29889e+06 ) > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > Time step: 2 : t = 0.002 : xdot-L2 = 2.72522e-06 > || R ||_2 = 433159 : || R_i ||_2 = ( 1.35694e-06 , 433159 ) > 0 KSP Residual norm 1.744848431182e-02 > 1 KSP Residual norm 7.650255893811e-12 > Linear solve converged due to CONVERGED_RTOL iterations 1 > || R ||_2 = 866074 : || R_i ||_2 = ( 8.42454e-07 , 866074 ) > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > Time step: 3 : t = 0.003 > || R ||_2 = 2.59764e+06 : || R_i ||_2 = ( 1.12168e-06 , 2.59764e+06 ) > 0 KSP Residual norm 1.743859969865e-02 > 1 KSP Residual norm 1.045225058356e-11 > Linear solve converged due to CONVERGED_RTOL iterations 1 > || R ||_2 = 2.16477e+06 : || R_i ||_2 = ( 9.73157e-07 , 2.16477e+06 ) > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > > Following is the output of snesview: > SNES Object: 1 MPI processes > type: newtonls > maximum iterations=50, maximum function evaluations=10000 > tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 > total number of linear solver iterations=1 > total number of function evaluations=2 > norm schedule ALWAYS > SNESLineSearch Object: 1 MPI processes > type: basic > maxstep=1.000000e+08, minlambda=1.000000e-12 > tolerances: 
relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 > maximum iterations=1 > KSP Object: 1 MPI processes > type: gmres > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 1 MPI processes > type: fieldsplit > FieldSplit with SYMMETRIC_MULTIPLICATIVE composition: total splits = 2 > Solver info for each split is in the following KSP objects: > Split number 0 Defined by IS > KSP Object: (fieldsplit_0_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_0_) 1 MPI processes > type: ilu > ILU: out-of-place factorization > 0 levels of fill > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 1., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=108, cols=108 > package used to perform factorization: petsc > total: nonzeros=2800, allocated nonzeros=2800 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 27 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: (fieldsplit_0_) 1 MPI processes > type: seqaij > rows=108, cols=108 > total: nonzeros=2800, allocated nonzeros=2800 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 27 nodes, limit used is 5 > Split number 1 Defined by IS > KSP Object: (fieldsplit_1_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_1_) 1 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: nd > factor fill ratio given 5., needed 1.15385 > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=30, cols=30 > package used to perform factorization: petsc > total: nonzeros=540, allocated nonzeros=540 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 9 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: (fieldsplit_1_) 1 MPI processes > type: seqaij > rows=30, cols=30 > total: nonzeros=468, allocated nonzeros=468 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 10 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: nest > rows=138, cols=138 > Matrix object: > type=nest, rows=2, cols=2 > MatNest structure: > (0,0) : prefix="fieldsplit_0_", type=seqaij, rows=108, cols=108 > (0,1) : type=shell, rows=108, cols=30 > (1,0) : type=shell, rows=30, cols=108 > (1,1) : prefix="fieldsplit_1_", type=seqaij, rows=30, cols=30 > > From bhatiamanav at gmail.com Thu Apr 20 14:40:00 2017 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Thu, 20 Apr 2017 14:40:00 -0500 Subject: [petsc-users] sees convergence with pc_fieldsplit In-Reply-To: <86483B5B-8BB5-49FA-9909-00047041FC7C@mcs.anl.gov> References: <302E9415-1E6A-4DF4-9B9F-FA6240DF4C18@gmail.com> <86483B5B-8BB5-49FA-9909-00047041FC7C@mcs.anl.gov> Message-ID: <0810F087-05DB-4C6C-B6F7-E2C9D578E11F@gmail.com> Hi Barry, Attached is the output. 
-Manav || R ||_2 = 433013 : || R_i ||_2 = ( 2.81069e-07 , 433013 ) 0 SNES Function norm 4.330127018922e+05 0 KSP Residual norm 1.746840810717e-02 1 KSP Residual norm 5.983637077441e-12 Linear solve converged due to CONVERGED_RTOL iterations 1 || R ||_2 = 5.07622e-07 : || R_i ||_2 = ( 5.07622e-07 , 7.12896e-11 ) 1 SNES Function norm 5.076218984984e-07 Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 Time step: 1 : t = 0.001 : xdot-L2 = 2.01301e-06 || R ||_2 = 1.73186e+06 : || R_i ||_2 = ( 5.77273e-07 , 1.73186e+06 ) 0 SNES Function norm 1.731856101521e+06 0 KSP Residual norm 1.745842035995e-02 1 KSP Residual norm 2.366595812944e-12 Linear solve converged due to CONVERGED_RTOL iterations 1 || R ||_2 = 1.29889e+06 : || R_i ||_2 = ( 9.7379e-07 , 1.29889e+06 ) 1 SNES Function norm 1.298892076140e+06 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 Time step: 2 : t = 0.002 : xdot-L2 = 2.72522e-06 || R ||_2 = 433159 : || R_i ||_2 = ( 1.35694e-06 , 433159 ) 0 SNES Function norm 4.331589481273e+05 0 KSP Residual norm 1.744848431182e-02 1 KSP Residual norm 7.650255893811e-12 Linear solve converged due to CONVERGED_RTOL iterations 1 || R ||_2 = 866074 : || R_i ||_2 = ( 8.42454e-07 , 866074 ) 1 SNES Function norm 8.660737893156e+05 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 Time step: 3 : t = 0.003 : xdot-L2 = 3.59383e-06 || R ||_2 = 2.59764e+06 : || R_i ||_2 = ( 1.12168e-06 , 2.59764e+06 ) 0 SNES Function norm 2.597639281693e+06 0 KSP Residual norm 1.743859969865e-02 1 KSP Residual norm 1.045225058356e-11 Linear solve converged due to CONVERGED_RTOL iterations 1 || R ||_2 = 2.16477e+06 : || R_i ||_2 = ( 9.73157e-07 , 2.16477e+06 ) 1 SNES Function norm 2.164772029312e+06 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 Changing dt: old dt = 0.001 new dt = 0.001 Time step: 4 : t = 0.004 : xdot-L2 = 4.9281e-06 || R ||_2 = 1.29933e+06 : || R_i ||_2 = ( 1.2554e-06 , 1.29933e+06 ) 0 SNES 
Function norm 1.299329216581e+06 0 KSP Residual norm 1.742876625743e-02 1 KSP Residual norm 6.512084951772e-12 Linear solve converged due to CONVERGED_RTOL iterations 1 || R ||_2 = 1.73215e+06 : || R_i ||_2 = ( 1.00171e-06 , 1.73215e+06 ) 1 SNES Function norm 1.732147071753e+06 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > On Apr 20, 2017, at 2:38 PM, Barry Smith wrote: > > > Run with -snes_monitor also and send the output > > >> On Apr 20, 2017, at 2:30 PM, Manav Bhatia wrote: >> >> Hi, >> >> I have a time-dependent multiphysics problem that I am trying to solve using pc_fieldsplit. I have defined a nested matrix for the jacobian with the diagonal block matrices explicitly created and the off-diagonal blocks defined using shell matrices, so that the matrix vector product is defined. The nonlinear system of equations at each time-step is solved using an snes construct. >> >> I have been facing some convergence issues, so I have reduced the problem scope to ensure that the code converges to a single discipline solution when the off-diagonal couplings are ignored. >> >> Here, I have provided a constant forcing function to discipline two, which is a linear problem, so that I expect convergence in a single iteration. >> >> The linear solver, defined using pc_fieldsplit seems to be converging without problems. The nonlinear solver convergence in a single time-step with FNORM in the first time step. >> >> The second time-step onwards, the nonlinear solver does not converge in a single step, and is terminating due to SNORM_RELATIVE. I am not sure why this is happening. >> >> What is intriguing is that the solution at the end of the n^th time-step is n times the solution after the first time step. In other words, snes at each time-step is taking the same step as was used in the first time-step. >> >> Not sure sure why this is happening. I would appreciate any advice. 
>> >> Regards, >> Manav >> >> Time step: 0 : t = 0.000 >> || R ||_2 = 433013 : || R_i ||_2 = ( 2.81069e-07 , 433013 ) >> 0 KSP Residual norm 1.746840810717e-02 >> 1 KSP Residual norm 5.983637077441e-12 >> Linear solve converged due to CONVERGED_RTOL iterations 1 >> || R ||_2 = 5.07622e-07 : || R_i ||_2 = ( 5.07622e-07 , 7.12896e-11 ) >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >> Time step: 1 : t = 0.001 >> || R ||_2 = 1.73186e+06 : || R_i ||_2 = ( 5.77273e-07 , 1.73186e+06 ) >> 0 KSP Residual norm 1.745842035995e-02 >> 1 KSP Residual norm 2.366595812944e-12 >> Linear solve converged due to CONVERGED_RTOL iterations 1 >> || R ||_2 = 1.29889e+06 : || R_i ||_2 = ( 9.7379e-07 , 1.29889e+06 ) >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >> Time step: 2 : t = 0.002 : xdot-L2 = 2.72522e-06 >> || R ||_2 = 433159 : || R_i ||_2 = ( 1.35694e-06 , 433159 ) >> 0 KSP Residual norm 1.744848431182e-02 >> 1 KSP Residual norm 7.650255893811e-12 >> Linear solve converged due to CONVERGED_RTOL iterations 1 >> || R ||_2 = 866074 : || R_i ||_2 = ( 8.42454e-07 , 866074 ) >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >> Time step: 3 : t = 0.003 >> || R ||_2 = 2.59764e+06 : || R_i ||_2 = ( 1.12168e-06 , 2.59764e+06 ) >> 0 KSP Residual norm 1.743859969865e-02 >> 1 KSP Residual norm 1.045225058356e-11 >> Linear solve converged due to CONVERGED_RTOL iterations 1 >> || R ||_2 = 2.16477e+06 : || R_i ||_2 = ( 9.73157e-07 , 2.16477e+06 ) >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >> >> >> Following is the output of snesview: >> SNES Object: 1 MPI processes >> type: newtonls >> maximum iterations=50, maximum function evaluations=10000 >> tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 >> total number of linear solver iterations=1 >> total number of function evaluations=2 >> norm schedule ALWAYS >> SNESLineSearch Object: 1 MPI processes >> type: basic >> 
maxstep=1.000000e+08, minlambda=1.000000e-12 >> tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 >> maximum iterations=1 >> KSP Object: 1 MPI processes >> type: gmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using PRECONDITIONED norm type for convergence test >> PC Object: 1 MPI processes >> type: fieldsplit >> FieldSplit with SYMMETRIC_MULTIPLICATIVE composition: total splits = 2 >> Solver info for each split is in the following KSP objects: >> Split number 0 Defined by IS >> KSP Object: (fieldsplit_0_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_) 1 MPI processes >> type: ilu >> ILU: out-of-place factorization >> 0 levels of fill >> tolerance for zero pivot 2.22045e-14 >> matrix ordering: natural >> factor fill ratio given 1., needed 1. 
>> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=108, cols=108 >> package used to perform factorization: petsc >> total: nonzeros=2800, allocated nonzeros=2800 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 27 nodes, limit used is 5 >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_0_) 1 MPI processes >> type: seqaij >> rows=108, cols=108 >> total: nonzeros=2800, allocated nonzeros=2800 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 27 nodes, limit used is 5 >> Split number 1 Defined by IS >> KSP Object: (fieldsplit_1_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_1_) 1 MPI processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> matrix ordering: nd >> factor fill ratio given 5., needed 1.15385 >> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=30, cols=30 >> package used to perform factorization: petsc >> total: nonzeros=540, allocated nonzeros=540 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 9 nodes, limit used is 5 >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_1_) 1 MPI processes >> type: seqaij >> rows=30, cols=30 >> total: nonzeros=468, allocated nonzeros=468 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 10 nodes, limit used is 5 >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: nest >> rows=138, cols=138 >> Matrix object: >> type=nest, rows=2, cols=2 >> MatNest structure: >> (0,0) : prefix="fieldsplit_0_", type=seqaij, rows=108, cols=108 >> (0,1) : type=shell, rows=108, cols=30 >> (1,0) : type=shell, 
rows=30, cols=108 >> (1,1) : prefix="fieldsplit_1_", type=seqaij, rows=30, cols=30 >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Thu Apr 20 14:47:08 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 20 Apr 2017 14:47:08 -0500 Subject: [petsc-users] sees convergence with pc_fieldsplit In-Reply-To: <0810F087-05DB-4C6C-B6F7-E2C9D578E11F@gmail.com> References: <302E9415-1E6A-4DF4-9B9F-FA6240DF4C18@gmail.com> <86483B5B-8BB5-49FA-9909-00047041FC7C@mcs.anl.gov> <0810F087-05DB-4C6C-B6F7-E2C9D578E11F@gmail.com> Message-ID: <15FD6AE3-C46A-4CDE-B114-7C33D54F7E93@mcs.anl.gov> 1 SNES Function norm 1.298892076140e+06 Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 This is not good. snorm convergence means the full Newton step is small relative to the current solution. It should generally not happen when the function norm is huge like in your case. My guess is that your "Newton direction" is not a Newton (descent) direction at the second time step. So something is wrong with the generation of the Jacobian at the second time step. Barry > On Apr 20, 2017, at 2:40 PM, Manav Bhatia wrote: > > Hi Barry, > > Attached is the output. 
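Barry's point — SNORM_RELATIVE firing while the function norm is still huge means the step stagnated without the residual dropping — can be reproduced in a toy Newton iteration. A minimal pure-Python sketch (the scalar residual, tolerances, and stopping logic below are illustrative only, not PETSc's actual implementation):

```python
def newton(F, J, x0, max_it=50, rtol=1e-8, stol=1e-8):
    """Scalar Newton iteration with two stopping tests, loosely
    mirroring PETSc's CONVERGED_FNORM_RELATIVE / CONVERGED_SNORM_RELATIVE."""
    x, f0 = x0, abs(F(x0))
    for it in range(1, max_it + 1):
        dx = -F(x) / J(x)                       # step from the supplied Jacobian
        x += dx
        if abs(F(x)) <= rtol * f0:              # residual really dropped
            return x, it, "FNORM_RELATIVE"
        if abs(dx) <= stol * max(abs(x), 1.0):  # step stagnated
            return x, it, "SNORM_RELATIVE"
    return x, max_it, "MAX_IT"

F = lambda x: 2.0 * x - 6.0                     # residual with root x = 3

x_ok, it_ok, why_ok = newton(F, lambda x: 2.0, 0.0)       # correct Jacobian
x_bad, it_bad, why_bad = newton(F, lambda x: 2.0e9, 0.0)  # badly wrong Jacobian
# correct J: one step, FNORM_RELATIVE; wrong J: the steps are ~1e-9 long,
# so the snorm test fires while |F| is still ~6 -- exactly the symptom above
```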
> > -Manav > > || R ||_2 = 433013 : || R_i ||_2 = ( 2.81069e-07 , 433013 ) > 0 SNES Function norm 4.330127018922e+05 > 0 KSP Residual norm 1.746840810717e-02 > 1 KSP Residual norm 5.983637077441e-12 > Linear solve converged due to CONVERGED_RTOL iterations 1 > || R ||_2 = 5.07622e-07 : || R_i ||_2 = ( 5.07622e-07 , 7.12896e-11 ) > 1 SNES Function norm 5.076218984984e-07 > Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 > Time step: 1 : t = 0.001 : xdot-L2 = 2.01301e-06 > || R ||_2 = 1.73186e+06 : || R_i ||_2 = ( 5.77273e-07 , 1.73186e+06 ) > 0 SNES Function norm 1.731856101521e+06 > 0 KSP Residual norm 1.745842035995e-02 > 1 KSP Residual norm 2.366595812944e-12 > Linear solve converged due to CONVERGED_RTOL iterations 1 > || R ||_2 = 1.29889e+06 : || R_i ||_2 = ( 9.7379e-07 , 1.29889e+06 ) > 1 SNES Function norm 1.298892076140e+06 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > Time step: 2 : t = 0.002 : xdot-L2 = 2.72522e-06 > || R ||_2 = 433159 : || R_i ||_2 = ( 1.35694e-06 , 433159 ) > 0 SNES Function norm 4.331589481273e+05 > 0 KSP Residual norm 1.744848431182e-02 > 1 KSP Residual norm 7.650255893811e-12 > Linear solve converged due to CONVERGED_RTOL iterations 1 > || R ||_2 = 866074 : || R_i ||_2 = ( 8.42454e-07 , 866074 ) > 1 SNES Function norm 8.660737893156e+05 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > Time step: 3 : t = 0.003 : xdot-L2 = 3.59383e-06 > || R ||_2 = 2.59764e+06 : || R_i ||_2 = ( 1.12168e-06 , 2.59764e+06 ) > 0 SNES Function norm 2.597639281693e+06 > 0 KSP Residual norm 1.743859969865e-02 > 1 KSP Residual norm 1.045225058356e-11 > Linear solve converged due to CONVERGED_RTOL iterations 1 > || R ||_2 = 2.16477e+06 : || R_i ||_2 = ( 9.73157e-07 , 2.16477e+06 ) > 1 SNES Function norm 2.164772029312e+06 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > Changing dt: old dt = 0.001 new dt = 0.001 > Time step: 4 : t = 0.004 : xdot-L2 = 
4.9281e-06 > || R ||_2 = 1.29933e+06 : || R_i ||_2 = ( 1.2554e-06 , 1.29933e+06 ) > 0 SNES Function norm 1.299329216581e+06 > 0 KSP Residual norm 1.742876625743e-02 > 1 KSP Residual norm 6.512084951772e-12 > Linear solve converged due to CONVERGED_RTOL iterations 1 > || R ||_2 = 1.73215e+06 : || R_i ||_2 = ( 1.00171e-06 , 1.73215e+06 ) > 1 SNES Function norm 1.732147071753e+06 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > >> On Apr 20, 2017, at 2:38 PM, Barry Smith wrote: >> >> >> Run with -snes_monitor also and send the output >> >> >>> On Apr 20, 2017, at 2:30 PM, Manav Bhatia wrote: >>> >>> Hi, >>> >>> I have a time-dependent multiphysics problem that I am trying to solve using pc_fieldsplit. I have defined a nested matrix for the Jacobian, with the diagonal block matrices created explicitly and the off-diagonal blocks defined as shell matrices, so that the matrix-vector product is defined. The nonlinear system of equations at each time-step is solved using an SNES construct. >>> >>> I have been facing some convergence issues, so I have reduced the problem scope to ensure that the code converges to a single-discipline solution when the off-diagonal couplings are ignored. >>> >>> Here, I have provided a constant forcing function to discipline two, which is a linear problem, so I expect convergence in a single iteration. >>> >>> The linear solver, defined using pc_fieldsplit, seems to be converging without problems. The nonlinear solver converges in a single iteration, due to FNORM_RELATIVE, in the first time-step. >>> >>> From the second time-step onwards, the nonlinear solver does not converge in a single step, and is terminating due to SNORM_RELATIVE. I am not sure why this is happening. >>> >>> What is intriguing is that the solution at the end of the n^th time-step is n times the solution after the first time-step. In other words, SNES at each time-step is taking the same step as was used in the first time-step.
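The setup described above — explicit diagonal blocks plus off-diagonal blocks that only supply a matrix-vector product — can be pictured with a small pure-Python analogue. This is only an illustration of the MatNest/shell-matrix idea, not the PETSc API, and the toy blocks are made up:

```python
def dense_matvec(A, x):
    """Row-by-row dense matrix-vector product for list-of-lists matrices."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

class BlockOperator:
    """2x2 block operator: diagonal blocks stored explicitly, off-diagonal
    blocks given only as matvec callables (the 'shell matrix' idea)."""
    def __init__(self, A00, B01, B10, A11):
        self.A00, self.A11 = A00, A11   # explicit matrices
        self.B01, self.B10 = B01, B10   # callables: x -> A*x, matrix-free
    def matvec(self, x0, x1):
        y0 = [a + b for a, b in zip(dense_matvec(self.A00, x0), self.B01(x1))]
        y1 = [a + b for a, b in zip(self.B10(x0), dense_matvec(self.A11, x1))]
        return y0, y1

# toy blocks: a 2x2 and a 1x1 diagonal block, matrix-free couplings
A00 = [[2.0, 0.0], [0.0, 2.0]]
A11 = [[3.0]]
B01 = lambda x1: [x1[0], -x1[0]]     # 2x1 coupling applied without storage
B10 = lambda x0: [x0[0] + x0[1]]     # 1x2 coupling applied without storage

J = BlockOperator(A00, B01, B10, A11)
y0, y1 = J.matvec([1.0, 2.0], [4.0])   # y0 = [6.0, 0.0], y1 = [15.0]
```

A fieldsplit preconditioner only ever needs this matvec from the off-diagonal blocks, which is why shell matrices suffice there while the diagonal blocks must be assembled for the sub-preconditioners.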
>>> >>> Not sure why this is happening. I would appreciate any advice. >>> >>> Regards, >>> Manav >>> >>> Time step: 0 : t = 0.000 >>> || R ||_2 = 433013 : || R_i ||_2 = ( 2.81069e-07 , 433013 ) >>> 0 KSP Residual norm 1.746840810717e-02 >>> 1 KSP Residual norm 5.983637077441e-12 >>> Linear solve converged due to CONVERGED_RTOL iterations 1 >>> || R ||_2 = 5.07622e-07 : || R_i ||_2 = ( 5.07622e-07 , 7.12896e-11 ) >>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>> Time step: 1 : t = 0.001 >>> || R ||_2 = 1.73186e+06 : || R_i ||_2 = ( 5.77273e-07 , 1.73186e+06 ) >>> 0 KSP Residual norm 1.745842035995e-02 >>> 1 KSP Residual norm 2.366595812944e-12 >>> Linear solve converged due to CONVERGED_RTOL iterations 1 >>> || R ||_2 = 1.29889e+06 : || R_i ||_2 = ( 9.7379e-07 , 1.29889e+06 ) >>> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >>> Time step: 2 : t = 0.002 : xdot-L2 = 2.72522e-06 >>> || R ||_2 = 433159 : || R_i ||_2 = ( 1.35694e-06 , 433159 ) >>> 0 KSP Residual norm 1.744848431182e-02 >>> 1 KSP Residual norm 7.650255893811e-12 >>> Linear solve converged due to CONVERGED_RTOL iterations 1 >>> || R ||_2 = 866074 : || R_i ||_2 = ( 8.42454e-07 , 866074 ) >>> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >>> Time step: 3 : t = 0.003 >>> || R ||_2 = 2.59764e+06 : || R_i ||_2 = ( 1.12168e-06 , 2.59764e+06 ) >>> 0 KSP Residual norm 1.743859969865e-02 >>> 1 KSP Residual norm 1.045225058356e-11 >>> Linear solve converged due to CONVERGED_RTOL iterations 1 >>> || R ||_2 = 2.16477e+06 : || R_i ||_2 = ( 9.73157e-07 , 2.16477e+06 ) >>> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >>> >>> >>> Following is the output of snesview: >>> SNES Object: 1 MPI processes >>> type: newtonls >>> maximum iterations=50, maximum function evaluations=10000 >>> tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 >>> total number of linear solver iterations=1 >>> total number of
function evaluations=2 >>> norm schedule ALWAYS >>> SNESLineSearch Object: 1 MPI processes >>> type: basic >>> maxstep=1.000000e+08, minlambda=1.000000e-12 >>> tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 >>> maximum iterations=1 >>> KSP Object: 1 MPI processes >>> type: gmres >>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >>> GMRES: happy breakdown tolerance 1e-30 >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using PRECONDITIONED norm type for convergence test >>> PC Object: 1 MPI processes >>> type: fieldsplit >>> FieldSplit with SYMMETRIC_MULTIPLICATIVE composition: total splits = 2 >>> Solver info for each split is in the following KSP objects: >>> Split number 0 Defined by IS >>> KSP Object: (fieldsplit_0_) 1 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_0_) 1 MPI processes >>> type: ilu >>> ILU: out-of-place factorization >>> 0 levels of fill >>> tolerance for zero pivot 2.22045e-14 >>> matrix ordering: natural >>> factor fill ratio given 1., needed 1. 
>>> Factored matrix follows: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=108, cols=108 >>> package used to perform factorization: petsc >>> total: nonzeros=2800, allocated nonzeros=2800 >>> total number of mallocs used during MatSetValues calls =0 >>> using I-node routines: found 27 nodes, limit used is 5 >>> linear system matrix = precond matrix: >>> Mat Object: (fieldsplit_0_) 1 MPI processes >>> type: seqaij >>> rows=108, cols=108 >>> total: nonzeros=2800, allocated nonzeros=2800 >>> total number of mallocs used during MatSetValues calls =0 >>> using I-node routines: found 27 nodes, limit used is 5 >>> Split number 1 Defined by IS >>> KSP Object: (fieldsplit_1_) 1 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (fieldsplit_1_) 1 MPI processes >>> type: lu >>> LU: out-of-place factorization >>> tolerance for zero pivot 2.22045e-14 >>> matrix ordering: nd >>> factor fill ratio given 5., needed 1.15385 >>> Factored matrix follows: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=30, cols=30 >>> package used to perform factorization: petsc >>> total: nonzeros=540, allocated nonzeros=540 >>> total number of mallocs used during MatSetValues calls =0 >>> using I-node routines: found 9 nodes, limit used is 5 >>> linear system matrix = precond matrix: >>> Mat Object: (fieldsplit_1_) 1 MPI processes >>> type: seqaij >>> rows=30, cols=30 >>> total: nonzeros=468, allocated nonzeros=468 >>> total number of mallocs used during MatSetValues calls =0 >>> using I-node routines: found 10 nodes, limit used is 5 >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: nest >>> rows=138, cols=138 >>> Matrix object: >>> type=nest, rows=2, cols=2 >>> MatNest structure: >>> (0,0) : prefix="fieldsplit_0_", type=seqaij, rows=108, cols=108 >>> (0,1) : 
type=shell, rows=108, cols=30 >>> (1,0) : type=shell, rows=30, cols=108 >>> (1,1) : prefix="fieldsplit_1_", type=seqaij, rows=30, cols=30 >>> >>> >> > From bhatiamanav at gmail.com Thu Apr 20 15:00:07 2017 From: bhatiamanav at gmail.com (Manav Bhatia) Date: Thu, 20 Apr 2017 15:00:07 -0500 Subject: [petsc-users] sees convergence with pc_fieldsplit In-Reply-To: <15FD6AE3-C46A-4CDE-B114-7C33D54F7E93@mcs.anl.gov> References: <302E9415-1E6A-4DF4-9B9F-FA6240DF4C18@gmail.com> <86483B5B-8BB5-49FA-9909-00047041FC7C@mcs.anl.gov> <0810F087-05DB-4C6C-B6F7-E2C9D578E11F@gmail.com> <15FD6AE3-C46A-4CDE-B114-7C33D54F7E93@mcs.anl.gov> Message-ID: <457C1A20-920C-4EB5-A07F-9BA406F4BB54@gmail.com> That was it! I had an error in the off-diagonal coupling term, which was giving a non-zero contribution in this test. SNES is converging in one iteration at each time-step now. Thanks! Manav > On Apr 20, 2017, at 2:47 PM, Barry Smith wrote: > > > 1 SNES Function norm 1.298892076140e+06 > Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 > > This is not good. snorm convergence means the full Newton step is small relative to the current solution. It should generally not happen when the function norm is huge like in your case. > > My guess is that your "Newton direction" is not a Newton (descent) direction at the second time step. So something is wrong with the generation of the Jacobian at the second time step. > > Barry > > > >> On Apr 20, 2017, at 2:40 PM, Manav Bhatia wrote: >> >> Hi Barry, >> >> Attached is the output.
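An off-diagonal term contributing when it should not is exactly the kind of bug a finite-difference Jacobian check exposes (PETSc provides built-in options for comparing a hand-coded Jacobian against a differenced one). A rough pure-Python sketch of the idea, with a made-up linear residual:

```python
def fd_jacobian(F, x, eps=1e-7):
    """Approximate the Jacobian of F at x by forward differences."""
    f0, n = F(x), len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xp[j] += eps                     # perturb one input at a time
        fp = F(xp)
        for i in range(n):
            J[i][j] = (fp[i] - f0[i]) / eps
    return J

def jacobian_mismatch(J_hand, J_fd):
    """Largest entrywise difference between the two Jacobians."""
    return max(abs(a - b) for ra, rb in zip(J_hand, J_fd)
               for a, b in zip(ra, rb))

# toy coupled residual: F0 depends on x0 only; F1 couples x0 and x1
F = lambda x: [2.0 * x[0] - 1.0, x[0] + 3.0 * x[1]]
J_correct = [[2.0, 0.0], [1.0, 3.0]]
J_buggy   = [[2.0, 5.0], [1.0, 3.0]]   # spurious off-diagonal coupling

J_fd = fd_jacobian(F, [0.3, -0.2])
# J_correct agrees with the FD Jacobian; J_buggy is off by ~5 in entry (0,1)
```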
>> >> -Manav >> >> || R ||_2 = 433013 : || R_i ||_2 = ( 2.81069e-07 , 433013 ) >> 0 SNES Function norm 4.330127018922e+05 >> 0 KSP Residual norm 1.746840810717e-02 >> 1 KSP Residual norm 5.983637077441e-12 >> Linear solve converged due to CONVERGED_RTOL iterations 1 >> || R ||_2 = 5.07622e-07 : || R_i ||_2 = ( 5.07622e-07 , 7.12896e-11 ) >> 1 SNES Function norm 5.076218984984e-07 >> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >> Time step: 1 : t = 0.001 : xdot-L2 = 2.01301e-06 >> || R ||_2 = 1.73186e+06 : || R_i ||_2 = ( 5.77273e-07 , 1.73186e+06 ) >> 0 SNES Function norm 1.731856101521e+06 >> 0 KSP Residual norm 1.745842035995e-02 >> 1 KSP Residual norm 2.366595812944e-12 >> Linear solve converged due to CONVERGED_RTOL iterations 1 >> || R ||_2 = 1.29889e+06 : || R_i ||_2 = ( 9.7379e-07 , 1.29889e+06 ) >> 1 SNES Function norm 1.298892076140e+06 >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >> Time step: 2 : t = 0.002 : xdot-L2 = 2.72522e-06 >> || R ||_2 = 433159 : || R_i ||_2 = ( 1.35694e-06 , 433159 ) >> 0 SNES Function norm 4.331589481273e+05 >> 0 KSP Residual norm 1.744848431182e-02 >> 1 KSP Residual norm 7.650255893811e-12 >> Linear solve converged due to CONVERGED_RTOL iterations 1 >> || R ||_2 = 866074 : || R_i ||_2 = ( 8.42454e-07 , 866074 ) >> 1 SNES Function norm 8.660737893156e+05 >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >> Time step: 3 : t = 0.003 : xdot-L2 = 3.59383e-06 >> || R ||_2 = 2.59764e+06 : || R_i ||_2 = ( 1.12168e-06 , 2.59764e+06 ) >> 0 SNES Function norm 2.597639281693e+06 >> 0 KSP Residual norm 1.743859969865e-02 >> 1 KSP Residual norm 1.045225058356e-11 >> Linear solve converged due to CONVERGED_RTOL iterations 1 >> || R ||_2 = 2.16477e+06 : || R_i ||_2 = ( 9.73157e-07 , 2.16477e+06 ) >> 1 SNES Function norm 2.164772029312e+06 >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >> Changing dt: old dt = 0.001 new dt = 0.001 >> Time 
step: 4 : t = 0.004 : xdot-L2 = 4.9281e-06 >> || R ||_2 = 1.29933e+06 : || R_i ||_2 = ( 1.2554e-06 , 1.29933e+06 ) >> 0 SNES Function norm 1.299329216581e+06 >> 0 KSP Residual norm 1.742876625743e-02 >> 1 KSP Residual norm 6.512084951772e-12 >> Linear solve converged due to CONVERGED_RTOL iterations 1 >> || R ||_2 = 1.73215e+06 : || R_i ||_2 = ( 1.00171e-06 , 1.73215e+06 ) >> 1 SNES Function norm 1.732147071753e+06 >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >> >> >>> On Apr 20, 2017, at 2:38 PM, Barry Smith wrote: >>> >>> >>> Run with -snes_monitor also and send the output >>> >>> >>>> On Apr 20, 2017, at 2:30 PM, Manav Bhatia wrote: >>>> >>>> Hi, >>>> >>>> I have a time-dependent multiphysics problem that I am trying to solve using pc_fieldsplit. I have defined a nested matrix for the Jacobian, with the diagonal block matrices created explicitly and the off-diagonal blocks defined as shell matrices, so that the matrix-vector product is defined. The nonlinear system of equations at each time-step is solved using an SNES construct. >>>> >>>> I have been facing some convergence issues, so I have reduced the problem scope to ensure that the code converges to a single-discipline solution when the off-diagonal couplings are ignored. >>>> >>>> Here, I have provided a constant forcing function to discipline two, which is a linear problem, so I expect convergence in a single iteration. >>>> >>>> The linear solver, defined using pc_fieldsplit, seems to be converging without problems. The nonlinear solver converges in a single iteration, due to FNORM_RELATIVE, in the first time-step. >>>> >>>> From the second time-step onwards, the nonlinear solver does not converge in a single step, and is terminating due to SNORM_RELATIVE. I am not sure why this is happening. >>>> >>>> What is intriguing is that the solution at the end of the n^th time-step is n times the solution after the first time-step.
In other words, SNES at each time-step is taking the same step as was used in the first time-step. >>>> >>>> Not sure why this is happening. I would appreciate any advice. >>>> >>>> Regards, >>>> Manav >>>> >>>> Time step: 0 : t = 0.000 >>>> || R ||_2 = 433013 : || R_i ||_2 = ( 2.81069e-07 , 433013 ) >>>> 0 KSP Residual norm 1.746840810717e-02 >>>> 1 KSP Residual norm 5.983637077441e-12 >>>> Linear solve converged due to CONVERGED_RTOL iterations 1 >>>> || R ||_2 = 5.07622e-07 : || R_i ||_2 = ( 5.07622e-07 , 7.12896e-11 ) >>>> Nonlinear solve converged due to CONVERGED_FNORM_RELATIVE iterations 1 >>>> Time step: 1 : t = 0.001 >>>> || R ||_2 = 1.73186e+06 : || R_i ||_2 = ( 5.77273e-07 , 1.73186e+06 ) >>>> 0 KSP Residual norm 1.745842035995e-02 >>>> 1 KSP Residual norm 2.366595812944e-12 >>>> Linear solve converged due to CONVERGED_RTOL iterations 1 >>>> || R ||_2 = 1.29889e+06 : || R_i ||_2 = ( 9.7379e-07 , 1.29889e+06 ) >>>> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >>>> Time step: 2 : t = 0.002 : xdot-L2 = 2.72522e-06 >>>> || R ||_2 = 433159 : || R_i ||_2 = ( 1.35694e-06 , 433159 ) >>>> 0 KSP Residual norm 1.744848431182e-02 >>>> 1 KSP Residual norm 7.650255893811e-12 >>>> Linear solve converged due to CONVERGED_RTOL iterations 1 >>>> || R ||_2 = 866074 : || R_i ||_2 = ( 8.42454e-07 , 866074 ) >>>> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >>>> Time step: 3 : t = 0.003 >>>> || R ||_2 = 2.59764e+06 : || R_i ||_2 = ( 1.12168e-06 , 2.59764e+06 ) >>>> 0 KSP Residual norm 1.743859969865e-02 >>>> 1 KSP Residual norm 1.045225058356e-11 >>>> Linear solve converged due to CONVERGED_RTOL iterations 1 >>>> || R ||_2 = 2.16477e+06 : || R_i ||_2 = ( 9.73157e-07 , 2.16477e+06 ) >>>> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1 >>>> >>>> >>>> Following is the output of snesview: >>>> SNES Object: 1 MPI processes >>>> type: newtonls >>>> maximum iterations=50, maximum function
evaluations=10000 >>>> tolerances: relative=1e-08, absolute=1e-50, solution=1e-08 >>>> total number of linear solver iterations=1 >>>> total number of function evaluations=2 >>>> norm schedule ALWAYS >>>> SNESLineSearch Object: 1 MPI processes >>>> type: basic >>>> maxstep=1.000000e+08, minlambda=1.000000e-12 >>>> tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08 >>>> maximum iterations=1 >>>> KSP Object: 1 MPI processes >>>> type: gmres >>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement >>>> GMRES: happy breakdown tolerance 1e-30 >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using PRECONDITIONED norm type for convergence test >>>> PC Object: 1 MPI processes >>>> type: fieldsplit >>>> FieldSplit with SYMMETRIC_MULTIPLICATIVE composition: total splits = 2 >>>> Solver info for each split is in the following KSP objects: >>>> Split number 0 Defined by IS >>>> KSP Object: (fieldsplit_0_) 1 MPI processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: (fieldsplit_0_) 1 MPI processes >>>> type: ilu >>>> ILU: out-of-place factorization >>>> 0 levels of fill >>>> tolerance for zero pivot 2.22045e-14 >>>> matrix ordering: natural >>>> factor fill ratio given 1., needed 1. 
>>>> Factored matrix follows: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=108, cols=108 >>>> package used to perform factorization: petsc >>>> total: nonzeros=2800, allocated nonzeros=2800 >>>> total number of mallocs used during MatSetValues calls =0 >>>> using I-node routines: found 27 nodes, limit used is 5 >>>> linear system matrix = precond matrix: >>>> Mat Object: (fieldsplit_0_) 1 MPI processes >>>> type: seqaij >>>> rows=108, cols=108 >>>> total: nonzeros=2800, allocated nonzeros=2800 >>>> total number of mallocs used during MatSetValues calls =0 >>>> using I-node routines: found 27 nodes, limit used is 5 >>>> Split number 1 Defined by IS >>>> KSP Object: (fieldsplit_1_) 1 MPI processes >>>> type: preonly >>>> maximum iterations=10000, initial guess is zero >>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>> left preconditioning >>>> using NONE norm type for convergence test >>>> PC Object: (fieldsplit_1_) 1 MPI processes >>>> type: lu >>>> LU: out-of-place factorization >>>> tolerance for zero pivot 2.22045e-14 >>>> matrix ordering: nd >>>> factor fill ratio given 5., needed 1.15385 >>>> Factored matrix follows: >>>> Mat Object: 1 MPI processes >>>> type: seqaij >>>> rows=30, cols=30 >>>> package used to perform factorization: petsc >>>> total: nonzeros=540, allocated nonzeros=540 >>>> total number of mallocs used during MatSetValues calls =0 >>>> using I-node routines: found 9 nodes, limit used is 5 >>>> linear system matrix = precond matrix: >>>> Mat Object: (fieldsplit_1_) 1 MPI processes >>>> type: seqaij >>>> rows=30, cols=30 >>>> total: nonzeros=468, allocated nonzeros=468 >>>> total number of mallocs used during MatSetValues calls =0 >>>> using I-node routines: found 10 nodes, limit used is 5 >>>> linear system matrix = precond matrix: >>>> Mat Object: 1 MPI processes >>>> type: nest >>>> rows=138, cols=138 >>>> Matrix object: >>>> type=nest, rows=2, cols=2 >>>> MatNest structure: >>>> (0,0) : 
prefix="fieldsplit_0_", type=seqaij, rows=108, cols=108 >>>> (0,1) : type=shell, rows=108, cols=30 >>>> (1,0) : type=shell, rows=30, cols=108 >>>> (1,1) : prefix="fieldsplit_1_", type=seqaij, rows=30, cols=30 >>>> >>>> >>> >> > From damian at man.poznan.pl Fri Apr 21 09:27:57 2017 From: damian at man.poznan.pl (Damian Kaliszan) Date: Fri, 21 Apr 2017 16:27:57 +0200 Subject: [petsc-users] petsc4py & GPU In-Reply-To: References: <33676872.20170420120316@man.poznan.pl> <691962187.20170420160944@man.poznan.pl> , <43415ff5-1e9b-40e1-a51e-7cc220cc997d@man.poznan.pl> Message-ID: <6158761.20170421162757@man.poznan.pl> Hi Francesco, Matthew, Lisandro, Thank you a lot, it works:) However the next problem I'm facing is: - I'm trying to run a problem (Ax=b) with the size of A (equal to 3375x3375) using PBS on 10 nodes, each node has 2 GPU cards with 2GB each. - 'qstat' shows these nodes are allocated, job is running - I use ksp/gmres - the error I get is as follows: =>> PBS: job killed: mem job total 5994876 kb exceeded limit 2048000 kb [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given.
[0]PETSC ERROR: [0] VecNorm_MPICUDA line 34 /home/users/damiank/petsc_bitbucket/src/vec/vec/impls/mpi/mpicuda/mpicuda.cu [0]PETSC ERROR: [0] VecNorm line 217 /home/users/damiank/petsc_bitbucket/src/vec/vec/interface/rvector.c [0]PETSC ERROR: [0] VecNormalize line 317 /home/users/damiank/petsc_bitbucket/src/vec/vec/interface/rvector.c [0]PETSC ERROR: [0] KSPInitialResidual line 42 /home/users/damiank/petsc_bitbucket/src/ksp/ksp/interface/itres.c [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 
/home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSPGMRESCycle line 122 /home/users/damiank/petsc_bitbucket/src/ksp/ksp/impls/gmres/gmres.c [0]PETSC ERROR: [0] KSPInitialResidual line 42 /home/users/damiank/petsc_bitbucket/src/ksp/ksp/interface/itres.c [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_bitbucket/include/petsc/private/kspimpl.h - The 'A' matrix is loaded in the following way (as is the 'b' vector): viewer = petsc4py.PETSc.Viewer().createBinary(A, 'r') A = PETSc.Mat().create(comm=MPI.COMM_WORLD) A.setType(PETSc.Mat.Type.MPIAIJCUSPARSE) A.load(viewer) - Do you know what is going wrong? Best, Damian On 20 April 2017 (19:15:33), the following was written: > Hi Damian, > You can use the "parse_known_args" method of the ArgumentParser > class; it will create a Namespace with the args you defined and > return the unknown ones as a list of strings you can pass to petsc4py.init. > See: > https://docs.python.org/3.6/library/argparse.html > section 16.4.5.7. > I used it successfully in the past. Hope this helps.
> Best, > -- > Francesco Caimmi > Laboratorio di Ingegneria dei Polimeri > http://www.chem.polimi.it/polyenglab/ > Politecnico di Milano - Dipartimento di Chimica, > Materiali e Ingegneria Chimica "Giulio Natta" > P.zza Leonardo da Vinci, 32 > I-20133 Milano > Tel. +39.02.2399.4711 > Fax +39.02.7063.8173 > francesco.caimmi at polimi.it > Skype: fmglcaimmi > ________________________________________ > From: petsc-users-bounces at mcs.anl.gov > on behalf of Damian Kaliszan > Sent: Thursday, April 20, 2017 6:26 PM > To: Lisandro Dalcin > Cc: PETSc > Subject: Re: [petsc-users] petsc4py & GPU > Hi, > There might be a problem because I'm using the ArgumentParser class > to catch complex command line arguments. In this case, is there any > chance to make the two cooperate, or is the only solution to pass > everything through the argument to petsc4py's init method? > Best, > Damian > On 20 Apr 2017, at 17:37, Lisandro Dalcin wrote: > On 20 April 2017 at 17:09, Damian Kaliszan > > wrote: > Thank you for the reply :) Sorry for a possibly stupid question about setting petsc(4py) options. > Should the following calls (somewhere before creating matrix & vectors): > PETSc.Options().setValue("ksp_view", "") > PETSc.Options().setValue("log_view", "") > Unfortunately, no. There are a few options (-log_view, ...) that you > should set before calling PetscInitialize() (which happens > automatically at import time), otherwise things do not work as > expected.
To pass things from the command line and set them before > PetscInitialize() the usual idiom is: > import sys, petsc4py > petsc4py.init(sys.argv) > from petsc4py import PETSc > -- > Lisandro Dalcin > ============ > Research Scientist > Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) > Extreme Computing Research Center (ECRC) > King Abdullah University of Science and Technology (KAUST) > http://ecrc.kaust.edu.sa/ > 4700 King Abdullah University of Science and Technology > al-Khawarizmi Bldg (Bldg 1), Office # 0109 > Thuwal 23955-6900, Kingdom of Saudi Arabia > http://www.kaust.edu.sa > Office Phone: +966 12 808-0459 ------------------------------------------------------- Damian Kaliszan Poznan Supercomputing and Networking Center HPC and Data Centres Technologies ul. Jana Pawła II 10 61-139 Poznan POLAND phone (+48 61) 858 5109 e-mail damian at man.poznan.pl www - http://www.man.poznan.pl/ ------------------------------------------------------- From knepley at gmail.com Fri Apr 21 10:08:04 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 21 Apr 2017 10:08:04 -0500 Subject: [petsc-users] petsc4py & GPU In-Reply-To: <6158761.20170421162757@man.poznan.pl> References: <33676872.20170420120316@man.poznan.pl> <691962187.20170420160944@man.poznan.pl> <43415ff5-1e9b-40e1-a51e-7cc220cc997d@man.poznan.pl> <6158761.20170421162757@man.poznan.pl> Message-ID: On Fri, Apr 21, 2017 at 9:27 AM, Damian Kaliszan wrote: > Hi Francesco, Matthew, Lisandro, > > Thank you a lot, it works:) > > However the next problem I'm facing is: > - I'm trying to run a problem (Ax=b) with the size of A (equal to > 3375x3375) using PBS on 10 nodes, each node has 2 > GPU cards with 2GB each. > - 'qstat' shows these nodes are allocated, job > is running > - I use ksp/gmres > - the error I get is as follows: > =>> PBS: job killed: mem job total 5994876 kb exceeded limit 2048000 kb > You exceeded a memory limit.
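For scale, the numbers in the PBS message can be put next to a back-of-envelope estimate. This is rough arithmetic only — real usage also includes CUDA contexts, MPI buffers, the preconditioner, and the Python runtime — but it shows the solver data itself is tiny next to both the limit and the reported usage:

```python
N = 3375                                  # matrix size from the report
used_kb, limit_kb = 5994876, 2048000      # figures from the PBS kill message

dense_mb = N * N * 8 / 2**20              # even dense double storage: ~87 MB
gmres_mb = (30 + 2) * N * 8 / 2**20       # GMRES(30) Krylov basis: under 1 MB
used_gb = used_kb / 2**20                 # ~5.7 GB used against a ~2 GB limit
```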
This should be mailed to your cluster admin who can tell you how to schedule it appropriately. Thanks, Matt > [0]PETSC ERROR: ------------------------------ > ------------------------------------------ > [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the > batch system) has told this process to end > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/ > documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS > X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. > [0]PETSC ERROR: [0] VecNorm_MPICUDA line 34 /home/users/damiank/petsc_ > bitbucket/src/vec/vec/impls/mpi/mpicuda/mpicuda.cu > [0]PETSC ERROR: [0] VecNorm line 217 /home/users/damiank/petsc_ > bitbucket/src/vec/vec/interface/rvector.c > [0]PETSC ERROR: [0] VecNormalize line 317 /home/users/damiank/petsc_ > bitbucket/src/vec/vec/interface/rvector.c > [0]PETSC ERROR: [0] KSPInitialResidual line 42 /home/users/damiank/petsc_ > bitbucket/src/ksp/ksp/interface/itres.c > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > 
bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSPGMRESCycle line 122 /home/users/damiank/petsc_ > bitbucket/src/ksp/ksp/impls/gmres/gmres.c > [0]PETSC ERROR: [0] KSPInitialResidual line 42 /home/users/damiank/petsc_ > bitbucket/src/ksp/ksp/interface/itres.c > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB 
line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > [0]PETSC ERROR: [0] KSP_PCApplyBAorAB line 277 /home/users/damiank/petsc_ > bitbucket/include/petsc/private/kspimpl.h > > - the 'A' matrix is loaded in the following way (as is the 'b' vector): > viewer = petsc4py.PETSc.Viewer().createBinary(A, 'r') > A = PETSc.Mat().create(comm=MPI.COMM_WORLD) > A.setType(PETSc.Mat.Type.MPIAIJCUSPARSE) > A.load(viewer) > - Do you know what is going wrong? > > Best, > Damian > > In a message dated 20 April 2017 (19:15:33), the following was written: > > > Hi Damian, > > > You can use the "parse_known_args" method of the ArgumentParser > > class; it will create a Namespace with the args you defined and > > return the unknown ones as a list of strings you can pass to > petsc4py.init. > > See: > > https://docs.python.org/3.6/library/argparse.html > > section 16.4.5.7. > > > I used it successfully in the past. Hope this helps. > > > Best, > > -- > > Francesco Caimmi > > > Laboratorio di Ingegneria dei Polimeri > > http://www.chem.polimi.it/polyenglab/ > > > Politecnico di Milano - Dipartimento di Chimica, > > Materiali e Ingegneria Chimica "Giulio Natta" > > > P.zza Leonardo da Vinci, 32 > > I-20133 Milano > > Tel. +39.02.2399.4711 > > Fax +39.02.7063.8173 > > > francesco.caimmi at polimi.it > > Skype: fmglcaimmi > > > ________________________________________ > > From: petsc-users-bounces at mcs.anl.gov > > on behalf of Damian Kaliszan < > damian at man.poznan.pl> > > Sent: Thursday, April 20, 2017 6:26 PM > > To: Lisandro Dalcin > > Cc: PETSc > > Subject: Re: [petsc-users] petsc4py & GPU > > > Hi, > > There might be a problem because I'm using the ArgumentParser class > > to catch complex command line arguments.
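Francesco's parse_known_args suggestion can be sketched as follows (the `--nx` option is a made-up application argument for illustration; the petsc4py.init call is shown commented because it must run before any other petsc4py import, and petsc4py may not be installed where this sketch runs):

```python
import argparse
import sys

# Let argparse consume the options it knows about and collect the
# leftovers, which can then be handed to petsc4py.init.
parser = argparse.ArgumentParser()
parser.add_argument("--nx", type=int, default=8)  # hypothetical app option
args, petsc_args = parser.parse_known_args(
    ["--nx", "16", "-ksp_view", "-log_view"])

print(args.nx)       # 16
print(petsc_args)    # ['-ksp_view', '-log_view']

# In a real script, before any other petsc4py import:
#   import petsc4py
#   petsc4py.init([sys.argv[0]] + petsc_args)
#   from petsc4py import PETSc
```

This way the application's own options and the PETSc ones coexist on a single command line.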
In this case, is there any > chance to make the two cooperate, or is the only solution to pass > everything through the argument to the init method of petsc4py? > Best, > Damian > On 20 Apr 2017, at 17:37, Lisandro Dalcin > > wrote: > > > On 20 April 2017 at 17:09, Damian Kaliszan > > > wrote: > > Thank you for the reply:) Sorry for a possibly stupid question about > setting petsc(4py) options. > > Should the following calls (somewhere before creating matrix & vectors): > > > PETSc.Options().setValue("ksp_view", "") > > PETSc.Options().setValue("log_view", "") > > > Unfortunately, no. There are a few options (-log_view among them) that you > > should set before calling PetscInitialize() (which happens > > automatically at import time), otherwise things do not work as > > expected. To pass things from the command line and set them before > > PetscInitialize() the usual idiom is: > > > import sys, petsc4py > > petsc4py.init(sys.argv) > > from petsc4py import PETSc > > > > > -- > > Lisandro Dalcin > > ============ > > Research Scientist > > Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) > > Extreme Computing Research Center (ECRC) > > King Abdullah University of Science and Technology (KAUST) > > http://ecrc.kaust.edu.sa/ > > > 4700 King Abdullah University of Science and Technology > > al-Khawarizmi Bldg (Bldg 1), Office # 0109 > > Thuwal 23955-6900, Kingdom of Saudi Arabia > > http://www.kaust.edu.sa > > > Office Phone: +966 12 808-0459 > > > > ------------------------------------------------------- > Damian Kaliszan > > Poznan Supercomputing and Networking Center > HPC and Data Centres Technologies > ul.
Jana Pawła II 10 > 61-139 Poznan > POLAND > > phone (+48 61) 858 5109 > e-mail damian at man.poznan.pl > www - http://www.man.poznan.pl/ > ------------------------------------------------------- > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From tinap89 at yahoo.com Sat Apr 22 20:37:47 2017 From: tinap89 at yahoo.com (Tina Patel) Date: Sun, 23 Apr 2017 01:37:47 +0000 (UTC) Subject: [petsc-users] Extract subdomain values from DMDA References: <1954929953.7016065.1492911467522.ref@mail.yahoo.com> Message-ID: <1954929953.7016065.1492911467522@mail.yahoo.com> Hello everyone, I want to manipulate a global vector that has values only on the subgrid of a larger grid obtained by DMDACreate3d(). For example, I'd like to extract a 6x6x6 chunk from a 7x7x7 grid. At some point, I need to put that 6x6x6 chunk from the logical xyz cartesian coordinates into a natural vector "b" to solve Ax=b at every iteration, then map it back to the xyz coordinate system to do further calculations with ghost values. The closest thing I can compare it to is MPI_Type_create_subarray. Do I just use this or is there a better way? Thanks for your time, Tina -------------- next part -------------- An HTML attachment was scrubbed...
URL: From bsmith at mcs.anl.gov Sat Apr 22 21:43:40 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 22 Apr 2017 21:43:40 -0500 Subject: [petsc-users] Extract subdomain values from DMDA In-Reply-To: <1954929953.7016065.1492911467522@mail.yahoo.com> References: <1954929953.7016065.1492911467522.ref@mail.yahoo.com> <1954929953.7016065.1492911467522@mail.yahoo.com> Message-ID: <7F7C1535-B53E-4210-9B4A-2D37E3EDC74A@mcs.anl.gov> Tina, Is the matrix also obtained by extracting it from a larger matrix on the larger grid or do you generate the matrix for the smaller problem directly on the smaller grid? In order to "pull out" the part you want, you need to create two index sets that indicate the values you are taking from the larger array and where you are putting them in the smaller array. This is easier to do in the natural ordering than the PETSc ordering on the DMDA so I would do the following create a DMDA for both the large grid and the smaller grid (I would write a test code in 2d first because it is much easier to debug in 2d), then you can create Vecs to hold the vectors in "natural ordering" for each grid with DMDACreateNaturalVector(). You can use DMDAGlobalToNaturalBegin()/DMDAGlobalToNaturalEnd() to take the larger vector into natural ordering, then use a VecScatterBegin/End() to select the parts you want for the smaller grid, then use DMDANaturalToGlobalBegin()/DMDANaturalToGlobalEnd() to move the results to the DMDA ordering on the smaller grid, solve the system and then do the reverse process to get the values back onto the larger grid. The only slightly tricky part is generating the two IS you need to create the VecScatter that you use to move from the natural ordering on the larger grid to the natural ordering on the smaller grid. 
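The numbering step behind those two index sets can be sketched in 2d (a minimal illustration assuming the natural, x-fastest ordering i + j*nx; the grid sizes are invented, and constructing the actual PETSc IS/VecScatter from these integers is omitted):

```python
def subgrid_natural_indices(nx, sub_nx, sub_ny, i0=0, j0=0):
    """Natural-ordering indices, on a large nx-wide grid, of a
    sub_nx x sub_ny block whose lower-left corner is at (i0, j0)."""
    # In natural ordering, point (i, j) of the large grid lives at i + j*nx;
    # evaluating that map over the block's points gives the source IS.
    return [(i0 + i) + (j0 + j) * nx
            for j in range(sub_ny) for i in range(sub_nx)]

# 2x2 block out of a 3x3 grid, corner at (0, 0):
print(subgrid_natural_indices(3, 2, 2))        # [0, 1, 3, 4]
# the same block shifted to corner (1, 1):
print(subgrid_natural_indices(3, 2, 2, 1, 1))  # [4, 5, 7, 8]
```

The destination indices on the small grid are simply 0..(sub_nx*sub_ny - 1) in the same loop order, which is what makes the natural ordering the convenient place to build the scatter.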
But since you can create the natural ordering numbering from the i,j (in 2d or i,j,k in 3d) location of the grid points on the larger grid to the smaller grid this is not too difficult; but I cannot tell you how to do this since it depends on which locations you are taking from the larger grid to the smaller grid. Feel free to ask more questions as you work on it. Barry > On Apr 22, 2017, at 8:37 PM, Tina Patel wrote: > > Hello everyone, > > I want to manipulate a global vector that has values only on the subgrid of a larger grid obtained by DMDACreate3d(). For example, I'd like to extract a 6x6x6 chunk from a 7x7x7 grid. > > At some point, I need to put that 6x6x6 chunk from the logical xyz cartesian coordinates into a natural vector "b" to solve Ax=b at every iteration, then map it back to xyz coordinate system to do further calculations with ghost values. > > The closest thing I can compare it to is MPI_Type_create_subarray. Do I just use this or is there a better way? > > > Thanks for your time, > Tina From jed at jedbrown.org Sat Apr 22 22:09:29 2017 From: jed at jedbrown.org (Jed Brown) Date: Sat, 22 Apr 2017 21:09:29 -0600 Subject: [petsc-users] Extract subdomain values from DMDA In-Reply-To: <1954929953.7016065.1492911467522@mail.yahoo.com> References: <1954929953.7016065.1492911467522.ref@mail.yahoo.com> <1954929953.7016065.1492911467522@mail.yahoo.com> Message-ID: <87bmrn95w6.fsf@jedbrown.org> Tina Patel writes: > Hello everyone, > I want to manipulate a global vector that has values only on the subgrid of a larger grid obtained by DMDACreate3d(). For example, I'd like to extract a 6x6x6 chunk from a 7x7x7 grid. > At some point, I need to put that 6x6x6 chunk from the logical xyz > cartesian coordinates into a natural vector "b" This doesn't sound like "natural ordering" as defined in the users manual. > to solve Ax=b at every iteration, then map it back to xyz coordinate > system to do further calculations with ghost values.
The closest > thing I can compare it to is MPI_Type_create_subarray. You could use DMDACreate3d to create a new child DMDA with smaller sizes, then create a VecScatter to extract the portion of the parent vector. The MPI calls should not be needed or relevant here. > Do I just use this or is there a better way? > > Thanks for your time, Tina -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From hgbk2008 at gmail.com Sun Apr 23 12:22:05 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Sun, 23 Apr 2017 19:22:05 +0200 Subject: [petsc-users] strange convergence Message-ID: Hello I encountered a strange convergence behavior that I have trouble understanding KSPSetFromOptions completed 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.933141742664e+29 true resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 2 KSP preconditioned resid norm 9.686409637174e+16 true resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 3 KSP preconditioned resid norm 4.219243615809e+15 true resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 .....
999 KSP preconditioned resid norm 3.043754298076e+12 true resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 Linear solve did not converge due to DIVERGED_ITS iterations 1000 KSP Object: 4 MPI processes type: gmres GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000, initial guess is zero tolerances: relative=1e-20, absolute=1e-09, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: fieldsplit FieldSplit with MULTIPLICATIVE composition: total splits = 2 Solver info for each split is in the following KSP objects: Split number 0 Defined by IS KSP Object: (fieldsplit_u_) 4 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_u_) 4 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.6 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE 
BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type PMIS HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: (fieldsplit_u_) 4 MPI processes type: mpiaij rows=938910, cols=938910, bs=3 total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 78749 nodes, limit used is 5 Split number 1 Defined by IS KSP Object: (fieldsplit_wp_) 4 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using NONE norm type for convergence test PC Object: (fieldsplit_wp_) 4 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 0, needed 0 Factored matrix follows: Mat Object: 4 MPI processes type: mpiaij rows=34141, cols=34141 package used to perform factorization: pastix Error : -nan Error : -nan total: nonzeros=0, allocated nonzeros=0 Error : -nan total number of mallocs used during MatSetValues calls =0 PaStiX run parameters: Matrix type : Symmetric Level of printing (0,1,2): 0 Number of refinements iterations : 0 Error : -nan linear system matrix = precond matrix: Mat Object: (fieldsplit_wp_) 4 MPI processes type: mpiaij rows=34141, cols=34141 total: nonzeros=485655, allocated nonzeros=485655 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=973051, cols=973051 total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 78749 nodes, limit used is 5 The pattern of convergence gives a hint that this system is somehow bad/singular. 
But I don't know why the preconditioned error goes up too high. Anyone has an idea? Best regards Giang Bui -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Sun Apr 23 12:32:03 2017 From: knepley at gmail.com (Matthew Knepley) Date: Sun, 23 Apr 2017 12:32:03 -0500 Subject: [petsc-users] strange convergence In-Reply-To: References: Message-ID: On Sun, Apr 23, 2017 at 12:22 PM, Hoang Giang Bui wrote: > Hello > > I encountered a strange convergence behavior that I have trouble to > understand > > KSPSetFromOptions completed > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm > 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid norm > 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid norm > 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid norm > 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > ..... 
> 999 KSP preconditioned resid norm 3.043754298076e+12 true resid norm > 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid norm > 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > Linear solve did not converge due to DIVERGED_ITS iterations 1000 > KSP Object: 4 MPI processes > type: gmres > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=1000, initial guess is zero > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 4 MPI processes > type: fieldsplit > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > Solver info for each split is in the following KSP objects: > Split number 0 Defined by IS > KSP Object: (fieldsplit_u_) 4 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_u_) 4 MPI processes > type: hypre > HYPRE BoomerAMG preconditioning > HYPRE BoomerAMG: Cycle type V > HYPRE BoomerAMG: Maximum number of levels 25 > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > HYPRE BoomerAMG: Interpolation truncation factor 0 > HYPRE BoomerAMG: Interpolation: max elements per row 0 > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > HYPRE BoomerAMG: Maximum row sums 0.9 > HYPRE BoomerAMG: Sweeps down 1 > HYPRE BoomerAMG: Sweeps up 1 > HYPRE BoomerAMG: Sweeps on coarse 1 > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > HYPRE 
BoomerAMG: Relax weight (all) 1 > HYPRE BoomerAMG: Outer relax weight (all) 1 > HYPRE BoomerAMG: Using CF-relaxation > HYPRE BoomerAMG: Measure type local > HYPRE BoomerAMG: Coarsen type PMIS > HYPRE BoomerAMG: Interpolation type classical > linear system matrix = precond matrix: > Mat Object: (fieldsplit_u_) 4 MPI processes > type: mpiaij > rows=938910, cols=938910, bs=3 > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 78749 nodes, limit > used is 5 > Split number 1 Defined by IS > KSP Object: (fieldsplit_wp_) 4 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_wp_) 4 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 0, needed 0 > Factored matrix follows: > Mat Object: 4 MPI processes > type: mpiaij > rows=34141, cols=34141 > package used to perform factorization: pastix > Error : -nan > Error : -nan > total: nonzeros=0, allocated nonzeros=0 > Error : -nan > total number of mallocs used during MatSetValues calls =0 > PaStiX run parameters: > Matrix type : Symmetric > Level of printing (0,1,2): 0 > Number of refinements iterations : 0 > Error : -nan > linear system matrix = precond matrix: > Mat Object: (fieldsplit_wp_) 4 MPI processes > type: mpiaij > rows=34141, cols=34141 > total: nonzeros=485655, allocated nonzeros=485655 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=973051, cols=973051 > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > total number of mallocs used during MatSetValues calls =0 > using 
I-node (on process 0) routines: found 78749 nodes, limit used > is 5 > > The pattern of convergence gives a hint that this system is somehow > bad/singular. But I don't know why the preconditioned error goes up too > high. Anyone has an idea? > Your original system is singular I think. Hypre badly misbehaves in this situation, generating a coarse system that has enormous condition number rather than signaling rank deficiency. We have seen this before. At least this is my guess from the information. Thanks, Matt > Best regards > Giang Bui > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Apr 23 12:39:21 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 23 Apr 2017 12:39:21 -0500 Subject: [petsc-users] strange convergence In-Reply-To: References: Message-ID: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> Huge preconditioned norms but normal unpreconditioned norms almost always come from a very small pivot in an LU or ILU factorization. The first thing to do is monitor the two sub solves. 
Run with the additional options -fieldsplit_u_ksp_type richardson -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor -fieldsplit_wp_ksp_max_it 1 > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui wrote: > > Hello > > I encountered a strange convergence behavior that I have trouble to understand > > KSPSetFromOptions completed > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > ..... > 999 KSP preconditioned resid norm 3.043754298076e+12 true resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > Linear solve did not converge due to DIVERGED_ITS iterations 1000 > KSP Object: 4 MPI processes > type: gmres > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=1000, initial guess is zero > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 4 MPI processes > type: fieldsplit > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > Solver info for each split is in the following KSP objects: > Split number 0 Defined by IS > KSP Object: (fieldsplit_u_) 4 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > PC 
Object: (fieldsplit_u_) 4 MPI processes > type: hypre > HYPRE BoomerAMG preconditioning > HYPRE BoomerAMG: Cycle type V > HYPRE BoomerAMG: Maximum number of levels 25 > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > HYPRE BoomerAMG: Interpolation truncation factor 0 > HYPRE BoomerAMG: Interpolation: max elements per row 0 > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > HYPRE BoomerAMG: Maximum row sums 0.9 > HYPRE BoomerAMG: Sweeps down 1 > HYPRE BoomerAMG: Sweeps up 1 > HYPRE BoomerAMG: Sweeps on coarse 1 > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > HYPRE BoomerAMG: Relax weight (all) 1 > HYPRE BoomerAMG: Outer relax weight (all) 1 > HYPRE BoomerAMG: Using CF-relaxation > HYPRE BoomerAMG: Measure type local > HYPRE BoomerAMG: Coarsen type PMIS > HYPRE BoomerAMG: Interpolation type classical > linear system matrix = precond matrix: > Mat Object: (fieldsplit_u_) 4 MPI processes > type: mpiaij > rows=938910, cols=938910, bs=3 > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > Split number 1 Defined by IS > KSP Object: (fieldsplit_wp_) 4 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using NONE norm type for convergence test > PC Object: (fieldsplit_wp_) 4 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 0, needed 0 > Factored matrix follows: > Mat Object: 4 MPI processes > type: mpiaij > 
rows=34141, cols=34141 > package used to perform factorization: pastix > Error : -nan > Error : -nan > total: nonzeros=0, allocated nonzeros=0 > Error : -nan > total number of mallocs used during MatSetValues calls =0 > PaStiX run parameters: > Matrix type : Symmetric > Level of printing (0,1,2): 0 > Number of refinements iterations : 0 > Error : -nan > linear system matrix = precond matrix: > Mat Object: (fieldsplit_wp_) 4 MPI processes > type: mpiaij > rows=34141, cols=34141 > total: nonzeros=485655, allocated nonzeros=485655 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=973051, cols=973051 > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > The pattern of convergence gives a hint that this system is somehow bad/singular. But I don't know why the preconditioned error goes up too high. Anyone has an idea? > > Best regards > Giang Bui > From hgbk2008 at gmail.com Sun Apr 23 14:42:09 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Sun, 23 Apr 2017 21:42:09 +0200 Subject: [petsc-users] strange convergence In-Reply-To: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> Message-ID: Dear Matt/Barry With your options, it results in 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 2.407308987203e+36 1 KSP Residual norm 5.797185652683e+72 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 0.000000000000e+00 ... 999 KSP preconditioned resid norm 2.920157329174e+12 true resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 Residual norms for fieldsplit_u_ solve. 
0 KSP Residual norm 1.533726746719e+36 1 KSP Residual norm 3.692757392261e+72 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 0.000000000000e+00 Do you suggest that the pastix solver for the "wp" block encounters a small pivot? In addition, it seems the "u" block is also singular. Giang On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith wrote: > > Huge preconditioned norms but normal unpreconditioned norms almost > always come from a very small pivot in an LU or ILU factorization. > > The first thing to do is monitor the two sub solves. Run with the > additional options -fieldsplit_u_ksp_type richardson > -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 > -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor > -fieldsplit_wp_ksp_max_it 1 > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui > wrote: > > > > Hello > > > > I encountered a strange convergence behavior that I have trouble to > understand > > > > KSPSetFromOptions completed > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm > 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid norm > 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid norm > 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid norm > 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > > .....
> > 999 KSP preconditioned resid norm 3.043754298076e+12 true resid norm > 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid norm > 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > > Linear solve did not converge due to DIVERGED_ITS iterations 1000 > > KSP Object: 4 MPI processes > > type: gmres > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > GMRES: happy breakdown tolerance 1e-30 > > maximum iterations=1000, initial guess is zero > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 4 MPI processes > > type: fieldsplit > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > Solver info for each split is in the following KSP objects: > > Split number 0 Defined by IS > > KSP Object: (fieldsplit_u_) 4 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (fieldsplit_u_) 4 MPI processes > > type: hypre > > HYPRE BoomerAMG preconditioning > > HYPRE BoomerAMG: Cycle type V > > HYPRE BoomerAMG: Maximum number of levels 25 > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > > HYPRE BoomerAMG: Maximum row sums 0.9 > > HYPRE BoomerAMG: Sweeps down 1 > > HYPRE BoomerAMG: Sweeps up 1 > > HYPRE BoomerAMG: Sweeps on coarse 1 > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > HYPRE BoomerAMG: Relax up 
symmetric-SOR/Jacobi > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > HYPRE BoomerAMG: Relax weight (all) 1 > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > HYPRE BoomerAMG: Using CF-relaxation > > HYPRE BoomerAMG: Measure type local > > HYPRE BoomerAMG: Coarsen type PMIS > > HYPRE BoomerAMG: Interpolation type classical > > linear system matrix = precond matrix: > > Mat Object: (fieldsplit_u_) 4 MPI processes > > type: mpiaij > > rows=938910, cols=938910, bs=3 > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > total number of mallocs used during MatSetValues calls =0 > > using I-node (on process 0) routines: found 78749 nodes, limit > used is 5 > > Split number 1 Defined by IS > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (fieldsplit_wp_) 4 MPI processes > > type: lu > > LU: out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 0, needed 0 > > Factored matrix follows: > > Mat Object: 4 MPI processes > > type: mpiaij > > rows=34141, cols=34141 > > package used to perform factorization: pastix > > Error : -nan > > Error : -nan > > total: nonzeros=0, allocated nonzeros=0 > > Error : -nan > > total number of mallocs used during MatSetValues calls =0 > > PaStiX run parameters: > > Matrix type : Symmetric > > Level of printing (0,1,2): 0 > > Number of refinements iterations : 0 > > Error : -nan > > linear system matrix = precond matrix: > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > type: mpiaij > > rows=34141, cols=34141 > > total: nonzeros=485655, allocated nonzeros=485655 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node (on process 0) routines > > linear system matrix = precond matrix: > > Mat Object: 4 
MPI processes > > type: mpiaij > > rows=973051, cols=973051 > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > total number of mallocs used during MatSetValues calls =0 > > using I-node (on process 0) routines: found 78749 nodes, limit > used is 5 > > > > The pattern of convergence gives a hint that this system is somehow > bad/singular. But I don't know why the preconditioned error goes up too > high. Anyone has an idea? > > > > Best regards > > Giang Bui > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Sun Apr 23 15:19:55 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sun, 23 Apr 2017 15:19:55 -0500 Subject: [petsc-users] strange convergence In-Reply-To: References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> Message-ID: <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui wrote: > > Dear Matt/Barry > > With your options, it results in > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 2.407308987203e+36 > 1 KSP Residual norm 5.797185652683e+72 It looks like Matt is right, hypre is seemingly producing useless garbage. First, how do things run on one process? If you have similar problems then debug on one process (debugging any kind of problem is always far easier on one process). First run with -fieldsplit_u_type lu (instead of using hypre) to see if that works or also produces something bad. What is the operator and the boundary conditions for u? It could be singular. > Residual norms for fieldsplit_wp_ solve. > 0 KSP Residual norm 0.000000000000e+00 > ... > 999 KSP preconditioned resid norm 2.920157329174e+12 true resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 > Residual norms for fieldsplit_u_ solve. 
> 0 KSP Residual norm 1.533726746719e+36 > 1 KSP Residual norm 3.692757392261e+72 > Residual norms for fieldsplit_wp_ solve. > 0 KSP Residual norm 0.000000000000e+00 > > Do you suggest that the pastix solver for the "wp" block encounters a small pivot? In addition, it seems like the "u" block is also singular. > > Giang > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith wrote: > > Huge preconditioned norms but normal unpreconditioned norms almost always come from a very small pivot in an LU or ILU factorization. > > The first thing to do is monitor the two sub solves. Run with the additional options -fieldsplit_u_ksp_type richardson -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor -fieldsplit_wp_ksp_max_it 1 > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui wrote: > > > Hello > > > I encountered a strange convergence behavior that I have trouble understanding > > > KSPSetFromOptions completed > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > > ..... 
> > 999 KSP preconditioned resid norm 3.043754298076e+12 true resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > > Linear solve did not converge due to DIVERGED_ITS iterations 1000 > > KSP Object: 4 MPI processes > > type: gmres > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > GMRES: happy breakdown tolerance 1e-30 > > maximum iterations=1000, initial guess is zero > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 4 MPI processes > > type: fieldsplit > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > Solver info for each split is in the following KSP objects: > > Split number 0 Defined by IS > > KSP Object: (fieldsplit_u_) 4 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (fieldsplit_u_) 4 MPI processes > > type: hypre > > HYPRE BoomerAMG preconditioning > > HYPRE BoomerAMG: Cycle type V > > HYPRE BoomerAMG: Maximum number of levels 25 > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > > HYPRE BoomerAMG: Maximum row sums 0.9 > > HYPRE BoomerAMG: Sweeps down 1 > > HYPRE BoomerAMG: Sweeps up 1 > > HYPRE BoomerAMG: Sweeps on coarse 1 > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > HYPRE BoomerAMG: Relax up 
symmetric-SOR/Jacobi > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > HYPRE BoomerAMG: Relax weight (all) 1 > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > HYPRE BoomerAMG: Using CF-relaxation > > HYPRE BoomerAMG: Measure type local > > HYPRE BoomerAMG: Coarsen type PMIS > > HYPRE BoomerAMG: Interpolation type classical > > linear system matrix = precond matrix: > > Mat Object: (fieldsplit_u_) 4 MPI processes > > type: mpiaij > > rows=938910, cols=938910, bs=3 > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > total number of mallocs used during MatSetValues calls =0 > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > Split number 1 Defined by IS > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (fieldsplit_wp_) 4 MPI processes > > type: lu > > LU: out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 0, needed 0 > > Factored matrix follows: > > Mat Object: 4 MPI processes > > type: mpiaij > > rows=34141, cols=34141 > > package used to perform factorization: pastix > > Error : -nan > > Error : -nan > > total: nonzeros=0, allocated nonzeros=0 > > Error : -nan > > total number of mallocs used during MatSetValues calls =0 > > PaStiX run parameters: > > Matrix type : Symmetric > > Level of printing (0,1,2): 0 > > Number of refinements iterations : 0 > > Error : -nan > > linear system matrix = precond matrix: > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > type: mpiaij > > rows=34141, cols=34141 > > total: nonzeros=485655, allocated nonzeros=485655 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node (on process 0) routines > > linear system matrix = precond matrix: > > Mat Object: 4 MPI 
processes > > type: mpiaij > > rows=973051, cols=973051 > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > total number of mallocs used during MatSetValues calls =0 > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > The pattern of convergence gives a hint that this system is somehow bad/singular. But I don't know why the preconditioned error goes up too high. Anyone has an idea? > > > > Best regards > > Giang Bui > > > > From hgbk2008 at gmail.com Mon Apr 24 03:16:33 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Mon, 24 Apr 2017 10:16:33 +0200 Subject: [petsc-users] strange convergence In-Reply-To: <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> Message-ID: Thanks Barry, trying with -fieldsplit_u_type lu gives better convergence. I still used 4 procs though; probably with 1 proc it should also be the same. The u block used a Nitsche-type operator to connect two non-matching domains. I don't think it leaves any rigid body motion that would lead to insufficient constraints. Maybe you have another idea? Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 3.129067184300e+05 1 KSP Residual norm 5.906261468196e-01 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 0.000000000000e+00 0 KSP preconditioned resid norm 3.129067184300e+05 true resid norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 9.999955993437e-01 1 KSP Residual norm 4.019774691831e-06 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 0.000000000000e+00 1 KSP preconditioned resid norm 5.003913641475e-01 true resid norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 1.000012180204e+00 1 KSP Residual norm 1.017367950422e-05 Residual norms for fieldsplit_wp_ solve. 
0 KSP Residual norm 0.000000000000e+00 2 KSP preconditioned resid norm 2.330910333756e-07 true resid norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 1.000004200085e+00 1 KSP Residual norm 6.231613102458e-06 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 0.000000000000e+00 3 KSP preconditioned resid norm 8.671259838389e-11 true resid norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 Linear solve converged due to CONVERGED_ATOL iterations 3 KSP Object: 4 MPI processes type: gmres GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000, initial guess is zero tolerances: relative=1e-20, absolute=1e-09, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: fieldsplit FieldSplit with MULTIPLICATIVE composition: total splits = 2 Solver info for each split is in the following KSP objects: Split number 0 Defined by IS KSP Object: (fieldsplit_u_) 4 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_u_) 4 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 0, needed 0 Factored matrix follows: Mat Object: 4 MPI processes type: mpiaij rows=938910, cols=938910 package used to perform factorization: pastix total: nonzeros=0, allocated nonzeros=0 Error : 3.36878e-14 total number of mallocs used during MatSetValues calls =0 PaStiX run parameters: Matrix type : Unsymmetric Level of printing (0,1,2): 0 Number of refinements iterations : 3 Error : 3.36878e-14 linear system matrix = precond matrix: Mat Object: (fieldsplit_u_) 4 MPI processes type: mpiaij 
rows=938910, cols=938910, bs=3 Error : 3.36878e-14 Error : 3.36878e-14 total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 78749 nodes, limit used is 5 Split number 1 Defined by IS KSP Object: (fieldsplit_wp_) 4 MPI processes type: richardson Richardson: damping factor=1 maximum iterations=1, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: (fieldsplit_wp_) 4 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 0, needed 0 Factored matrix follows: Mat Object: 4 MPI processes type: mpiaij rows=34141, cols=34141 package used to perform factorization: pastix Error : -nan Error : -nan Error : -nan total: nonzeros=0, allocated nonzeros=0 total number of mallocs used during MatSetValues calls =0 PaStiX run parameters: Matrix type : Symmetric Level of printing (0,1,2): 0 Number of refinements iterations : 0 Error : -nan linear system matrix = precond matrix: Mat Object: (fieldsplit_wp_) 4 MPI processes type: mpiaij rows=34141, cols=34141 total: nonzeros=485655, allocated nonzeros=485655 total number of mallocs used during MatSetValues calls =0 not using I-node (on process 0) routines linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=973051, cols=973051 total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 78749 nodes, limit used is 5 Giang On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith wrote: > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui wrote: > > > > Dear Matt/Barry > > > > With your options, it results in > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm > 9.015150491938e+06 ||r(i)||/||b|| 
1.000000000000e+00 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 2.407308987203e+36 > > 1 KSP Residual norm 5.797185652683e+72 > > It looks like Matt is right, hypre is seemly producing useless garbage. > > First how do things run on one process. If you have similar problems then > debug on one process (debugging any kind of problem is always far easy on > one process). > > First run with -fieldsplit_u_type lu (instead of using hypre) to see if > that works or also produces something bad. > > What is the operator and the boundary conditions for u? It could be > singular. > > > > > > > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > ... > > 999 KSP preconditioned resid norm 2.920157329174e+12 true resid norm > 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 1.533726746719e+36 > > 1 KSP Residual norm 3.692757392261e+72 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > > > Do you suggest that the pastix solver for the "wp" block encounters > small pivot? In addition, seem like the "u" block is also singular. > > > > Giang > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith wrote: > > > > Huge preconditioned norms but normal unpreconditioned norms almost > always come from a very small pivot in an LU or ILU factorization. > > > > The first thing to do is monitor the two sub solves. 
Run with the > additional options -fieldsplit_u_ksp_type richardson > -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 > -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor > -fieldsplit_wp_ksp_max_it 1 > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui > wrote: > > > > > > Hello > > > > > > I encountered a strange convergence behavior that I have trouble to > understand > > > > > > KSPSetFromOptions completed > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm > 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid norm > 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid norm > 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid norm > 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > > > ..... > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true resid norm > 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid norm > 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > > > Linear solve did not converge due to DIVERGED_ITS iterations 1000 > > > KSP Object: 4 MPI processes > > > type: gmres > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > > GMRES: happy breakdown tolerance 1e-30 > > > maximum iterations=1000, initial guess is zero > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: 4 MPI processes > > > type: fieldsplit > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > Solver info for each split is in the following KSP objects: > > > Split number 0 Defined by IS > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > type: preonly > > > maximum iterations=10000, initial 
guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > left preconditioning > > > using NONE norm type for convergence test > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > type: hypre > > > HYPRE BoomerAMG preconditioning > > > HYPRE BoomerAMG: Cycle type V > > > HYPRE BoomerAMG: Maximum number of levels 25 > > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > > > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > > > HYPRE BoomerAMG: Maximum row sums 0.9 > > > HYPRE BoomerAMG: Sweeps down 1 > > > HYPRE BoomerAMG: Sweeps up 1 > > > HYPRE BoomerAMG: Sweeps on coarse 1 > > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > > HYPRE BoomerAMG: Relax weight (all) 1 > > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > > HYPRE BoomerAMG: Using CF-relaxation > > > HYPRE BoomerAMG: Measure type local > > > HYPRE BoomerAMG: Coarsen type PMIS > > > HYPRE BoomerAMG: Interpolation type classical > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > type: mpiaij > > > rows=938910, cols=938910, bs=3 > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node (on process 0) routines: found 78749 nodes, > limit used is 5 > > > Split number 1 Defined by IS > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > type: preonly > > > maximum iterations=10000, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > left 
preconditioning > > > using NONE norm type for convergence test > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > type: lu > > > LU: out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 0, needed 0 > > > Factored matrix follows: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=34141, cols=34141 > > > package used to perform factorization: pastix > > > Error : -nan > > > Error : -nan > > > total: nonzeros=0, allocated nonzeros=0 > > > Error : -nan > > > total number of mallocs used during MatSetValues calls =0 > > > PaStiX run parameters: > > > Matrix type : Symmetric > > > Level of printing (0,1,2): 0 > > > Number of refinements iterations : 0 > > > Error : -nan > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > type: mpiaij > > > rows=34141, cols=34141 > > > total: nonzeros=485655, allocated nonzeros=485655 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > linear system matrix = precond matrix: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=973051, cols=973051 > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node (on process 0) routines: found 78749 nodes, limit > used is 5 > > > > > > The pattern of convergence gives a hint that this system is somehow > bad/singular. But I don't know why the preconditioned error goes up too > high. Anyone has an idea? > > > > > > Best regards > > > Giang Bui > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
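Barry's diagnostic suggestion above boils down to turning each split into a one-iteration Richardson solve with monitoring. Collecting the flags in one place makes them easy to append to any run; this is only a sketch, and the `mpiexec` launcher and executable name `./app` are assumptions (any program that calls `KSPSetFromOptions` will pick the flags up):

```shell
# Options Barry suggests for monitoring the two sub solves: run each split
# as a single Richardson iteration and print its residual norms.
OPTS="-fieldsplit_u_ksp_type richardson -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 \
-fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor -fieldsplit_wp_ksp_max_it 1"
printf '%s\n' "$OPTS"
# mpiexec -n 4 ./app $OPTS    # hypothetical run; ./app must call KSPSetFromOptions
```

With `-ksp_max_it 1` each inner solve does exactly one sweep, so the two monitored norms per split directly expose whether a sub-preconditioner is producing garbage.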
URL: From damian at man.poznan.pl Mon Apr 24 06:48:04 2017 From: damian at man.poznan.pl (Damian Kaliszan) Date: Mon, 24 Apr 2017 13:48:04 +0200 Subject: [petsc-users] petsc4py & GPU In-Reply-To: References: <33676872.20170420120316@man.poznan.pl> <691962187.20170420160944@man.poznan.pl> <43415ff5-1e9b-40e1-a51e-7cc220cc997d@man.poznan.pl> <6158761.20170421162757@man.poznan.pl> Message-ID: <1279816835.20170424134804@man.poznan.pl> An HTML attachment was scrubbed... URL: From driver.dan12 at yahoo.com Mon Apr 24 09:46:26 2017 From: driver.dan12 at yahoo.com (D D) Date: Mon, 24 Apr 2017 14:46:26 +0000 (UTC) Subject: [petsc-users] memory usage for dense vs sparse matrices References: <1362006833.8991817.1493045186293.ref@mail.yahoo.com> Message-ID: <1362006833.8991817.1493045186293@mail.yahoo.com> Hello, I see memory usage that confuses me: me at blah:src$ ./example1 -n_row 200 -n_col 2000 -sparsity 0.03 -mat_type mpidense Initialize Got options Create and assemble matrix Assembled Peak RSS 21 Mb me at blah:~/src$ ./example1 -n_row 200 -n_col 2000 -sparsity 0.03 -mat_type mpiaij Initialize Got options Create and assemble matrix Assembled Peak RSS 19 Mb I put my example code on Github so I can more effectively communicate my question. And here is my question: why does the program as written use so much memory for the sparse case - matrix type mpiaij? Note that I'm creating a random dense matrix with at most 3% non-zero entries since this is my use case. I have read the relevant portions of the user's manual and searched for answers. Have I missed a resource that can answer my question? dtsmith2001/hpc - High Performance Computing Explorations using PETSc and SLEPc Dale -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hongzhang at anl.gov Mon Apr 24 10:28:45 2017 From: hongzhang at anl.gov (Zhang, Hong) Date: Mon, 24 Apr 2017 15:28:45 +0000 Subject: [petsc-users] memory usage for dense vs sparse matrices In-Reply-To: <1362006833.8991817.1493045186293@mail.yahoo.com> References: <1362006833.8991817.1493045186293.ref@mail.yahoo.com> <1362006833.8991817.1493045186293@mail.yahoo.com> Message-ID: <0498A2C8-15D1-4E6F-BEAF-CD612F152667@anl.gov> The peak RSS does not tell you how much memory the matrix takes. It may include many things such as the binary, the libraries linked to it, and stack and heap memory. Hong (Mr.) On Apr 24, 2017, at 9:46 AM, D D wrote: Hello, I see memory usage that confuses me: me at blah:src$ ./example1 -n_row 200 -n_col 2000 -sparsity 0.03 -mat_type mpidense Initialize Got options Create and assemble matrix Assembled Peak RSS 21 Mb me at blah:~/src$ ./example1 -n_row 200 -n_col 2000 -sparsity 0.03 -mat_type mpiaij Initialize Got options Create and assemble matrix Assembled Peak RSS 19 Mb I put my example code on Github so I can more effectively communicate my question. And here is my question: why does the program as written use so much memory for the sparse case - matrix type mpiaij? Note that I'm creating a random dense matrix with at most 3% non-zero entries since this is my use case. I have read the relevant portions of the user's manual and searched for answers. Have I missed a resource that can answer my question? dtsmith2001/hpc - High Performance Computing Explorations using PETSc and SLEPc Dale -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From driver.dan12 at yahoo.com Mon Apr 24 11:10:31 2017 From: driver.dan12 at yahoo.com (D D) Date: Mon, 24 Apr 2017 16:10:31 +0000 (UTC) Subject: [petsc-users] memory usage for dense vs sparse matrices In-Reply-To: <0498A2C8-15D1-4E6F-BEAF-CD612F152667@anl.gov> References: <1362006833.8991817.1493045186293.ref@mail.yahoo.com> <1362006833.8991817.1493045186293@mail.yahoo.com> <0498A2C8-15D1-4E6F-BEAF-CD612F152667@anl.gov> Message-ID: <1877458518.9106129.1493050231082@mail.yahoo.com> You are correct, and that is why I'm using the peak RSS. The total memory should be lower to reflect the sparse versus dense structure. On Monday, April 24, 2017 11:28 AM, "Zhang, Hong" wrote: The peak RSS does not tell you how much memory the matrix takes. It may include many things such as the binary, the libraries linked to it, and stack and heap memory. Hong (Mr.) On Apr 24, 2017, at 9:46 AM, D D wrote: Hello, I see memory usage that confuses me: me at blah:src$ ./example1 -n_row 200 -n_col 2000 -sparsity 0.03 -mat_type mpidense Initialize Got options Create and assemble matrix Assembled Peak RSS 21 Mb me at blah:~/src$ ./example1 -n_row 200 -n_col 2000 -sparsity 0.03 -mat_type mpiaij Initialize Got options Create and assemble matrix Assembled Peak RSS 19 Mb I put my example code on Github so I can more effectively communicate my question. And here is my question: why does the program as written use so much memory for the sparse case - matrix type mpiaij? Note that I'm creating a random dense matrix with at most 3% non-zero entries since this is my use case. I have read the relevant portions of the user's manual and searched for answers. Have I missed a resource that can answer my question? dtsmith2001/hpc - High Performance Computing Explorations using PETSc and SLEPc Dale -------------- next part -------------- An HTML attachment was scrubbed... 
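A quick back-of-envelope check puts the peak-RSS numbers in this thread in perspective: at 200 x 2000 the matrix itself is a few MB at most, so the measurement is dominated by fixed process overhead. A rough sketch; the 8-byte scalar and 4-byte index sizes are assumptions for a typical double-precision, 32-bit-index PETSc build:

```shell
# Rough storage estimate for a 200 x 2000 matrix at 3% fill.
# Dense: one 8-byte double per entry.
# AIJ (CSR-like): an 8-byte double plus a 4-byte column index per nonzero,
# plus one 4-byte row offset per row (+1).
rows=200; cols=2000
dense_bytes=$(( rows * cols * 8 ))
nnz=$(( rows * cols * 3 / 100 ))
sparse_bytes=$(( nnz * (8 + 4) + (rows + 1) * 4 ))
echo "dense:  $dense_bytes bytes"     # 3200000 bytes, about 3 MB
echo "sparse: $sparse_bytes bytes"    # 144804 bytes, about 140 KB
```

Either way the matrix is small next to the roughly 19 MB process baseline, which is why the dense and sparse runs report nearly identical peak RSS at this size.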
URL: From driver.dan12 at yahoo.com Mon Apr 24 11:24:53 2017 From: driver.dan12 at yahoo.com (D D) Date: Mon, 24 Apr 2017 16:24:53 +0000 (UTC) Subject: [petsc-users] memory usage for dense vs sparse matrices In-Reply-To: <1877458518.9106129.1493050231082@mail.yahoo.com> References: <1362006833.8991817.1493045186293.ref@mail.yahoo.com> <1362006833.8991817.1493045186293@mail.yahoo.com> <0498A2C8-15D1-4E6F-BEAF-CD612F152667@anl.gov> <1877458518.9106129.1493050231082@mail.yahoo.com> Message-ID: <351025685.9064558.1493051093126@mail.yahoo.com> Unless, of course, my assumption is incorrect. But why should my assumption be incorrect? I think I'm constructing my sparse matrix properly by calling MatSetFromOptions. The loop from line 52 - 57 in example1.cpp may be incorrect. How do you think I should measure the effect of the size of the sparse vs dense matrix structure to make sure I'm effectively using the PETSc sparse matrix structure in my example code? On Monday, April 24, 2017 12:10 PM, D D wrote: You are correct, and that is why I'm using the peak RSS. The total memory should be lower to reflect the sparse versus dense structure. On Monday, April 24, 2017 11:28 AM, "Zhang, Hong" wrote: The peak RSS does not tell you how much memory the matrix takes. It may include many things such as the binary, the libraries linked to it, and stack and heap memory. Hong (Mr.) On Apr 24, 2017, at 9:46 AM, D D wrote: Hello, I see memory usage that confuses me: me at blah:src$ ./example1 -n_row 200 -n_col 2000 -sparsity 0.03 -mat_type mpidenseInitialize Got options Create and assemble matrix Assembled Peak RSS 21 Mb me at blah:~/src$ ./example1 -n_row 200 -n_col 2000 -sparsity 0.03 -mat_type mpiaij Initialize Got options Create and assemble matrix Assembled Peak RSS 19 Mb I put my example code on Github so I can more effectively communicate my question. And here is my question: why does the program as written use so much memory for the sparse case - matrix type mpiaij? 
Note that I'm creating a random dense matrix with at most 3% non-zero entries since this is my use case. I have read the relevant portions of the user's manual and searched for answers. Have I missed a resource that can answer my question? dtsmith2001/hpc - High Performance Computing Explorations using PETSc and SLEPc Dale -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.T.T.McRae at bath.ac.uk Mon Apr 24 11:44:33 2017 From: A.T.T.McRae at bath.ac.uk (Andrew McRae) Date: Mon, 24 Apr 2017 17:44:33 +0100 Subject: [petsc-users] memory usage for dense vs sparse matrices In-Reply-To: References: <1362006833.8991817.1493045186293.ref@mail.yahoo.com> <1362006833.8991817.1493045186293@mail.yahoo.com> <0498A2C8-15D1-4E6F-BEAF-CD612F152667@anl.gov> <1877458518.9106129.1493050231082@mail.yahoo.com> Message-ID: This matrix is 2000 x 200? That's tiny. Even a 2000-by-200 dense matrix takes only (2000*200 entries)*(8 bytes per entry) to store, or about 3 MB. The sparse version might take 150KB. The 'peak RSS' differs by 2MB, so this seems consistent. Try a 20,000 x 20,000 dense and sparse matrix, by which time the memory usage will be dominated by the matrix storage. On 24 April 2017 at 17:24, D D wrote: > Unless, of course, my assumption is incorrect. But why should my > assumption be incorrect? > > I think I'm constructing my sparse matrix properly by calling > MatSetFromOptions. The loop from line 52 - 57 in example1.cpp may be > incorrect. > > How do you think I should measure the effect of the size of the sparse vs > dense matrix structure to make sure I'm effectively using the PETSc sparse > matrix structure in my example code? > > > On Monday, April 24, 2017 12:10 PM, D D wrote: > > > You are correct, and that is why I'm using the peak RSS. The total memory > should be lower to reflect the sparse versus dense structure. 
> > > On Monday, April 24, 2017 11:28 AM, "Zhang, Hong" > wrote: > > > The peak RSS does not tell you how much memory the matrix takes. It may > include many things such as the binary, the libraries linked to it, and > stack and heap memory. > > Hong (Mr.) > > On Apr 24, 2017, at 9:46 AM, D D wrote: > > Hello, > > I see memory usage that confuses me: > > me at blah:src$ ./example1 -n_row 200 -n_col 2000 -sparsity 0.03 -mat_type > mpidenseInitialize > Got options > Create and assemble matrix > Assembled > Peak RSS 21 Mb > me at blah:~/src$ ./example1 -n_row 200 -n_col 2000 -sparsity 0.03 -mat_type > mpiaij > Initialize > Got options > Create and assemble matrix > Assembled > Peak RSS 19 Mb > > I put my example code on Github so I can more effectively communicate my > question. And here is my question: why does the program as written use so > much memory for the sparse case - matrix type mpiaij? Note that I'm > creating a random dense matrix with at most 3% non-zero entries since this > is my use case. > > I have read the relevant portions of the user's manual and searched for > answers. Have I missed a resource that can answer my question? > > dtsmith2001/hpc > > dtsmith2001/hpc > hpc - High Performance Computing Explorations using PETSc and SLEPc > > > > Dale > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From driver.dan12 at yahoo.com Mon Apr 24 11:53:11 2017 From: driver.dan12 at yahoo.com (D D) Date: Mon, 24 Apr 2017 16:53:11 +0000 (UTC) Subject: [petsc-users] memory usage for dense vs sparse matrices In-Reply-To: References: <1362006833.8991817.1493045186293.ref@mail.yahoo.com> <1362006833.8991817.1493045186293@mail.yahoo.com> <0498A2C8-15D1-4E6F-BEAF-CD612F152667@anl.gov> <1877458518.9106129.1493050231082@mail.yahoo.com> Message-ID: <586065122.2899873.1493052791730@mail.yahoo.com> Hmm, you are right. The case I did here is much more in line with our use case matrix sizes. Hmm. I need to do further experiments. 
me at blah:~/src$ ./ex8 -n_row 200 -n_col 40000 -sparsity 0.03 -mat_type mpiaij Initialize Got options Create and assemble matrix Assembled Peak RSS 19 Mb me at blah:~/src$ ./ex8 -n_row 200 -n_col 40000 -sparsity 0.03 -mat_type mpidense Initialize Got options Create and assemble matrix Assembled Peak RSS 80 Mb Thank you, Andrew McRae and Zhang Hong. On Monday, April 24, 2017 12:45 PM, Andrew McRae wrote: This matrix is 2000 x 200? That's tiny. Even a 2000-by-200 dense matrix takes only (2000*200 entries)*(8 bytes per entry) to store, or about 3 MB. The sparse version might take 150KB. The 'peak RSS' differs by 2MB, so this seems consistent. Try a 20,000 x 20,000 dense and sparse matrix, by which time the memory usage will be dominated by the matrix storage. On 24 April 2017 at 17:24, D D wrote: Unless, of course, my assumption is incorrect. But why should my assumption be incorrect? I think I'm constructing my sparse matrix properly by calling MatSetFromOptions. The loop from line 52 - 57 in example1.cpp may be incorrect. How do you think I should measure the effect of the size of the sparse vs dense matrix structure to make sure I'm effectively using the PETSc sparse matrix structure in my example code? On Monday, April 24, 2017 12:10 PM, D D wrote: You are correct, and that is why I'm using the peak RSS. The total memory should be lower to reflect the sparse versus dense structure. On Monday, April 24, 2017 11:28 AM, "Zhang, Hong" wrote: The peak RSS does not tell you how much memory the matrix takes. It may include many things such as the binary, the libraries linked to it, and stack and heap memory. Hong (Mr.) 
On Apr 24, 2017, at 9:46 AM, D D wrote:

Hello,

I see memory usage that confuses me:

me at blah:src$ ./example1 -n_row 200 -n_col 2000 -sparsity 0.03 -mat_type mpidense
Initialize
Got options
Create and assemble matrix
Assembled
Peak RSS 21 Mb
me at blah:~/src$ ./example1 -n_row 200 -n_col 2000 -sparsity 0.03 -mat_type mpiaij
Initialize
Got options
Create and assemble matrix
Assembled
Peak RSS 19 Mb

I put my example code on Github so I can more effectively communicate my question. And here is my question: why does the program as written use so much memory for the sparse case (matrix type mpiaij)? Note that I'm creating a random dense matrix with at most 3% non-zero entries, since this is my use case.

I have read the relevant portions of the user's manual and searched for answers. Have I missed a resource that can answer my question?

dtsmith2001/hpc - High Performance Computing Explorations using PETSc and SLEPc

Dale
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From bsmith at mcs.anl.gov Mon Apr 24 12:32:08 2017
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Mon, 24 Apr 2017 12:32:08 -0500
Subject: [petsc-users] strange convergence
In-Reply-To: 
References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov>
Message-ID: 

> On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui wrote:
>
> Thanks Barry, trying with -fieldsplit_u_type lu gives better convergence. I still used 4 procs though; probably with 1 proc it should also be the same.
>
> The u block uses a Nitsche-type operator to connect two non-matching domains. I don't think it leaves any rigid body motion that would make the constraints insufficient. Maybe you have another idea?
>
> Residual norms for fieldsplit_u_ solve.
> 0 KSP Residual norm 3.129067184300e+05
> 1 KSP Residual norm 5.906261468196e-01
> Residual norms for fieldsplit_wp_ solve.
> 0 KSP Residual norm 0.000000000000e+00

^^^^ Something is wrong here. The sub solve should not be starting with a 0 residual (this means the right hand side for this sub solve is zero, which it should not be).

> FieldSplit with MULTIPLICATIVE composition: total splits = 2

How are you providing the outer operator? As an explicit matrix or with some shell matrix?

> 0 KSP preconditioned resid norm 3.129067184300e+05 true resid norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00
> Residual norms for fieldsplit_u_ solve.
> 0 KSP Residual norm 9.999955993437e-01
> 1 KSP Residual norm 4.019774691831e-06
> Residual norms for fieldsplit_wp_ solve.
> 0 KSP Residual norm 0.000000000000e+00
> 1 KSP preconditioned resid norm 5.003913641475e-01 true resid norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06
> Residual norms for fieldsplit_u_ solve.
> 0 KSP Residual norm 1.000012180204e+00
> 1 KSP Residual norm 1.017367950422e-05
> Residual norms for fieldsplit_wp_ solve.
> 0 KSP Residual norm 0.000000000000e+00
> 2 KSP preconditioned resid norm 2.330910333756e-07 true resid norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06
> Residual norms for fieldsplit_u_ solve.
> 0 KSP Residual norm 1.000004200085e+00
> 1 KSP Residual norm 6.231613102458e-06
> Residual norms for fieldsplit_wp_ solve.
> 0 KSP Residual norm 0.000000000000e+00 > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 > Linear solve converged due to CONVERGED_ATOL iterations 3 > KSP Object: 4 MPI processes > type: gmres > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=1000, initial guess is zero > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 4 MPI processes > type: fieldsplit > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > Solver info for each split is in the following KSP objects: > Split number 0 Defined by IS > KSP Object: (fieldsplit_u_) 4 MPI processes > type: richardson > Richardson: damping factor=1 > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_u_) 4 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 0, needed 0 > Factored matrix follows: > Mat Object: 4 MPI processes > type: mpiaij > rows=938910, cols=938910 > package used to perform factorization: pastix > total: nonzeros=0, allocated nonzeros=0 > Error : 3.36878e-14 > total number of mallocs used during MatSetValues calls =0 > PaStiX run parameters: > Matrix type : Unsymmetric > Level of printing (0,1,2): 0 > Number of refinements iterations : 3 > Error : 3.36878e-14 > linear system matrix = precond matrix: > Mat Object: (fieldsplit_u_) 4 MPI processes > type: mpiaij > rows=938910, cols=938910, bs=3 > Error : 3.36878e-14 > Error : 3.36878e-14 > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) 
routines: found 78749 nodes, limit used is 5 > Split number 1 Defined by IS > KSP Object: (fieldsplit_wp_) 4 MPI processes > type: richardson > Richardson: damping factor=1 > maximum iterations=1, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: (fieldsplit_wp_) 4 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 0, needed 0 > Factored matrix follows: > Mat Object: 4 MPI processes > type: mpiaij > rows=34141, cols=34141 > package used to perform factorization: pastix > Error : -nan > Error : -nan > Error : -nan > total: nonzeros=0, allocated nonzeros=0 > total number of mallocs used during MatSetValues calls =0 > PaStiX run parameters: > Matrix type : Symmetric > Level of printing (0,1,2): 0 > Number of refinements iterations : 0 > Error : -nan > linear system matrix = precond matrix: > Mat Object: (fieldsplit_wp_) 4 MPI processes > type: mpiaij > rows=34141, cols=34141 > total: nonzeros=485655, allocated nonzeros=485655 > total number of mallocs used during MatSetValues calls =0 > not using I-node (on process 0) routines > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=973051, cols=973051 > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > Giang > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith wrote: > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui wrote: > > > > Dear Matt/Barry > > > > With your options, it results in > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > Residual norms for fieldsplit_u_ solve. 
> > 0 KSP Residual norm 2.407308987203e+36
> > 1 KSP Residual norm 5.797185652683e+72
>
> It looks like Matt is right, hypre is seemingly producing useless garbage.
>
> First, how do things run on one process? If you have similar problems there, then debug on one process (debugging any kind of problem is always far easier on one process).
>
> First run with -fieldsplit_u_type lu (instead of using hypre) to see if that works or also produces something bad.
>
> What is the operator and the boundary conditions for u? It could be singular.
>
> > Residual norms for fieldsplit_wp_ solve.
> > 0 KSP Residual norm 0.000000000000e+00
> > ...
> > 999 KSP preconditioned resid norm 2.920157329174e+12 true resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00
> > Residual norms for fieldsplit_u_ solve.
> > 0 KSP Residual norm 1.533726746719e+36
> > 1 KSP Residual norm 3.692757392261e+72
> > Residual norms for fieldsplit_wp_ solve.
> > 0 KSP Residual norm 0.000000000000e+00
> >
> > Do you suggest that the pastix solver for the "wp" block encounters a small pivot? In addition, it seems like the "u" block is also singular.
> >
> > Giang
> >
> > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith wrote:
> >
> > Huge preconditioned norms but normal unpreconditioned norms almost always come from a very small pivot in an LU or ILU factorization.
> >
> > The first thing to do is monitor the two sub solves.
Run with the additional options -fieldsplit_u_ksp_type richardson -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor -fieldsplit_wp_ksp_max_it 1 > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui wrote: > > > > > > Hello > > > > > > I encountered a strange convergence behavior that I have trouble to understand > > > > > > KSPSetFromOptions completed > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > > > ..... > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > > > Linear solve did not converge due to DIVERGED_ITS iterations 1000 > > > KSP Object: 4 MPI processes > > > type: gmres > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > > GMRES: happy breakdown tolerance 1e-30 > > > maximum iterations=1000, initial guess is zero > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: 4 MPI processes > > > type: fieldsplit > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > Solver info for each split is in the following KSP objects: > > > Split number 0 Defined by IS > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > type: preonly > > > maximum iterations=10000, initial guess is zero > > > 
tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > left preconditioning > > > using NONE norm type for convergence test > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > type: hypre > > > HYPRE BoomerAMG preconditioning > > > HYPRE BoomerAMG: Cycle type V > > > HYPRE BoomerAMG: Maximum number of levels 25 > > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > > > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > > > HYPRE BoomerAMG: Maximum row sums 0.9 > > > HYPRE BoomerAMG: Sweeps down 1 > > > HYPRE BoomerAMG: Sweeps up 1 > > > HYPRE BoomerAMG: Sweeps on coarse 1 > > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > > HYPRE BoomerAMG: Relax weight (all) 1 > > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > > HYPRE BoomerAMG: Using CF-relaxation > > > HYPRE BoomerAMG: Measure type local > > > HYPRE BoomerAMG: Coarsen type PMIS > > > HYPRE BoomerAMG: Interpolation type classical > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > type: mpiaij > > > rows=938910, cols=938910, bs=3 > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > Split number 1 Defined by IS > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > type: preonly > > > maximum iterations=10000, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > left preconditioning > > > using NONE 
norm type for convergence test > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > type: lu > > > LU: out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 0, needed 0 > > > Factored matrix follows: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=34141, cols=34141 > > > package used to perform factorization: pastix > > > Error : -nan > > > Error : -nan > > > total: nonzeros=0, allocated nonzeros=0 > > > Error : -nan > > > total number of mallocs used during MatSetValues calls =0 > > > PaStiX run parameters: > > > Matrix type : Symmetric > > > Level of printing (0,1,2): 0 > > > Number of refinements iterations : 0 > > > Error : -nan > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > type: mpiaij > > > rows=34141, cols=34141 > > > total: nonzeros=485655, allocated nonzeros=485655 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > linear system matrix = precond matrix: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=973051, cols=973051 > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > > The pattern of convergence gives a hint that this system is somehow bad/singular. But I don't know why the preconditioned error goes up too high. Anyone has an idea? > > > > > > Best regards > > > Giang Bui > > > > > > > > > From hgbk2008 at gmail.com Mon Apr 24 12:47:47 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Mon, 24 Apr 2017 19:47:47 +0200 Subject: [petsc-users] strange convergence In-Reply-To: References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> Message-ID: Good catch. 
I get this for the very first step, maybe at that time the rhs_w is zero. In the later step, it shows 2 step convergence Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 3.165886479830e+04 1 KSP Residual norm 2.905922877684e-01 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 2.397669419027e-01 1 KSP Residual norm 0.000000000000e+00 0 KSP preconditioned resid norm 3.165886479920e+04 true resid norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 9.999891813771e-01 1 KSP Residual norm 1.512000395579e-05 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 8.192702188243e-06 1 KSP Residual norm 0.000000000000e+00 1 KSP preconditioned resid norm 5.252183822848e-02 true resid norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 6.946213936597e-01 1 KSP Residual norm 1.195514007343e-05 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 1.025694497535e+00 1 KSP Residual norm 0.000000000000e+00 2 KSP preconditioned resid norm 8.785709535405e-03 true resid norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 7.255149996405e-01 1 KSP Residual norm 6.583512434218e-06 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 1.015229700337e+00 1 KSP Residual norm 0.000000000000e+00 3 KSP preconditioned resid norm 7.110407712709e-04 true resid norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 3.512243341400e-01 1 KSP Residual norm 2.032490351200e-06 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 1.282327290982e+00 1 KSP Residual norm 0.000000000000e+00 4 KSP preconditioned resid norm 3.482036620521e-05 true resid norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 Residual norms for fieldsplit_u_ solve. 
0 KSP Residual norm 3.423609338053e-01 1 KSP Residual norm 4.213703301972e-07 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 1.157384757538e+00 1 KSP Residual norm 0.000000000000e+00 5 KSP preconditioned resid norm 1.203470314534e-06 true resid norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 3.838596289995e-01 1 KSP Residual norm 9.927864176103e-08 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 1.066298905618e+00 1 KSP Residual norm 0.000000000000e+00 6 KSP preconditioned resid norm 3.331619244266e-08 true resid norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 4.624964188094e-01 1 KSP Residual norm 6.418229775372e-08 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 9.800784311614e-01 1 KSP Residual norm 0.000000000000e+00 7 KSP preconditioned resid norm 8.788046233297e-10 true resid norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 Linear solve converged due to CONVERGED_ATOL iterations 7 The outer operator is an explicit matrix. Giang On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith wrote: > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui wrote: > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better > convergence. I still used 4 procs though, probably with 1 proc it should > also be the same. > > > > The u block used a Nitsche-type operator to connect two non-matching > domains. I don't think it will leave some rigid body motion leads to not > sufficient constraints. Maybe you have other idea? > > > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 3.129067184300e+05 > > 1 KSP Residual norm 5.906261468196e-01 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > ^^^^ something is wrong here. 
The sub solve should not be starting with a 0 residual (this means the right hand side for this sub solve is zero, which it should not be).
>
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From bsmith at mcs.anl.gov Mon Apr 24 13:21:29 2017
From: bsmith at mcs.anl.gov (Barry Smith)
Date: Mon, 24 Apr 2017 13:21:29 -0500
Subject: [petsc-users] strange convergence
In-Reply-To: 
References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov>
Message-ID: <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov>

> On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui wrote:
>
> Good catch. I get this for the very first step, maybe at that time the rhs_w is zero.

With the multiplicative composition, the right hand side of the second solve is the initial right hand side for that block minus A_10*x, where x is the solution of the first sub solve and A_10 is the lower-left block of the outer matrix. So unless the initial right hand side has a zero second block AND A_10 is identically zero, the right hand side for the second sub solve should not be zero. Is A_10 == 0?

> In the later step, it shows 2 step convergence
>
> Residual norms for fieldsplit_u_ solve.
> 0 KSP Residual norm 3.165886479830e+04
> 1 KSP Residual norm 2.905922877684e-01
> Residual norms for fieldsplit_wp_ solve.
> 0 KSP Residual norm 2.397669419027e-01
> 1 KSP Residual norm 0.000000000000e+00
> 0 KSP preconditioned resid norm 3.165886479920e+04 true resid norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00
> Residual norms for fieldsplit_u_ solve.
> 0 KSP Residual norm 9.999891813771e-01
> 1 KSP Residual norm 1.512000395579e-05
> Residual norms for fieldsplit_wp_ solve.
> 0 KSP Residual norm 8.192702188243e-06
> 1 KSP Residual norm 0.000000000000e+00
> 1 KSP preconditioned resid norm 5.252183822848e-02 true resid norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02

The outer residual norms are still wonky: the preconditioned residual norm goes from 3.165886479920e+04 to 5.252183822848e-02, which is a huge drop, but the true residual drops much less, from 7.963616922323e+05 to 7.135927677844e+04. This is not normal.
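[Editor's note] Barry's description of the multiplicative composition can be checked with a small dense sketch. This is a hedged NumPy illustration with a made-up 2x2 block system and exact sub-solves, not PETSc's actual PCFIELDSPLIT code path: one multiplicative (block Gauss-Seidel) sweep solves the first block, then hands b1 - A10*x0 to the second block, so the second split's right hand side vanishes only when b1 equals A10*x0 (e.g. b1 == 0 and A10 == 0).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 block system A = [[A00, A01], [A10, A11]], b = [b0, b1].
# Diagonal shifts keep the diagonal blocks comfortably nonsingular.
A00 = rng.standard_normal((4, 4)) + 4 * np.eye(4)
A01 = rng.standard_normal((4, 3))
A10 = rng.standard_normal((3, 4))
A11 = rng.standard_normal((3, 3)) + 4 * np.eye(3)
b0 = rng.standard_normal(4)
b1 = rng.standard_normal(3)

# One multiplicative sweep with exact sub-solves:
x0 = np.linalg.solve(A00, b0)   # first split: A00 x0 = b0
r1 = b1 - A10 @ x0              # right hand side seen by the second split
x1 = np.linalg.solve(A11, r1)   # second split: A11 x1 = r1

# For generic data r1 is nonzero, which is Barry's point: a 0 initial
# residual in the wp solve means b1 - A10 x0 == 0, which is suspicious.
print(np.linalg.norm(r1))
```

In the report above, the wp sub solve starts from a 0 residual on the very first outer iteration, so either the wp part of the right hand side is zero at that step or the coupling block A_10 is zero, exactly the two cases this sweep distinguishes.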
What if you just use -pc_type lu for the entire system (no fieldsplit), does the true residual drop to almost zero in the first iteration (as it should?). Send the output. > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 6.946213936597e-01 > 1 KSP Residual norm 1.195514007343e-05 > Residual norms for fieldsplit_wp_ solve. > 0 KSP Residual norm 1.025694497535e+00 > 1 KSP Residual norm 0.000000000000e+00 > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 7.255149996405e-01 > 1 KSP Residual norm 6.583512434218e-06 > Residual norms for fieldsplit_wp_ solve. > 0 KSP Residual norm 1.015229700337e+00 > 1 KSP Residual norm 0.000000000000e+00 > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 3.512243341400e-01 > 1 KSP Residual norm 2.032490351200e-06 > Residual norms for fieldsplit_wp_ solve. > 0 KSP Residual norm 1.282327290982e+00 > 1 KSP Residual norm 0.000000000000e+00 > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 3.423609338053e-01 > 1 KSP Residual norm 4.213703301972e-07 > Residual norms for fieldsplit_wp_ solve. > 0 KSP Residual norm 1.157384757538e+00 > 1 KSP Residual norm 0.000000000000e+00 > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 3.838596289995e-01 > 1 KSP Residual norm 9.927864176103e-08 > Residual norms for fieldsplit_wp_ solve. 
> 0 KSP Residual norm 1.066298905618e+00 > 1 KSP Residual norm 0.000000000000e+00 > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 4.624964188094e-01 > 1 KSP Residual norm 6.418229775372e-08 > Residual norms for fieldsplit_wp_ solve. > 0 KSP Residual norm 9.800784311614e-01 > 1 KSP Residual norm 0.000000000000e+00 > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 > Linear solve converged due to CONVERGED_ATOL iterations 7 > > The outer operator is an explicit matrix. > > Giang > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith wrote: > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui wrote: > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better convergence. I still used 4 procs though, probably with 1 proc it should also be the same. > > > > The u block used a Nitsche-type operator to connect two non-matching domains. I don't think it will leave some rigid body motion leads to not sufficient constraints. Maybe you have other idea? > > > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 3.129067184300e+05 > > 1 KSP Residual norm 5.906261468196e-01 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > ^^^^ something is wrong here. The sub solve should not be starting with a 0 residual (this means the right hand side for this sub solve is zero which it should not be). > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > How are you providing the outer operator? As an explicit matrix or with some shell matrix? > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > Residual norms for fieldsplit_u_ solve. 
> > 0 KSP Residual norm 9.999955993437e-01 > > 1 KSP Residual norm 4.019774691831e-06 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 1.000012180204e+00 > > 1 KSP Residual norm 1.017367950422e-05 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > 2 KSP preconditioned resid norm 2.330910333756e-07 true resid norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 1.000004200085e+00 > > 1 KSP Residual norm 6.231613102458e-06 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > KSP Object: 4 MPI processes > > type: gmres > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > GMRES: happy breakdown tolerance 1e-30 > > maximum iterations=1000, initial guess is zero > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 4 MPI processes > > type: fieldsplit > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > Solver info for each split is in the following KSP objects: > > Split number 0 Defined by IS > > KSP Object: (fieldsplit_u_) 4 MPI processes > > type: richardson > > Richardson: damping factor=1 > > maximum iterations=1, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (fieldsplit_u_) 4 MPI processes > > type: lu > > LU: out-of-place 
factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 0, needed 0 > > Factored matrix follows: > > Mat Object: 4 MPI processes > > type: mpiaij > > rows=938910, cols=938910 > > package used to perform factorization: pastix > > total: nonzeros=0, allocated nonzeros=0 > > Error : 3.36878e-14 > > total number of mallocs used during MatSetValues calls =0 > > PaStiX run parameters: > > Matrix type : Unsymmetric > > Level of printing (0,1,2): 0 > > Number of refinements iterations : 3 > > Error : 3.36878e-14 > > linear system matrix = precond matrix: > > Mat Object: (fieldsplit_u_) 4 MPI processes > > type: mpiaij > > rows=938910, cols=938910, bs=3 > > Error : 3.36878e-14 > > Error : 3.36878e-14 > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > total number of mallocs used during MatSetValues calls =0 > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > Split number 1 Defined by IS > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > type: richardson > > Richardson: damping factor=1 > > maximum iterations=1, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: (fieldsplit_wp_) 4 MPI processes > > type: lu > > LU: out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 0, needed 0 > > Factored matrix follows: > > Mat Object: 4 MPI processes > > type: mpiaij > > rows=34141, cols=34141 > > package used to perform factorization: pastix > > Error : -nan > > Error : -nan > > Error : -nan > > total: nonzeros=0, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > PaStiX run parameters: > > Matrix type : Symmetric > > Level of printing (0,1,2): 0 > > Number of refinements iterations : 0 > > Error : -nan > > linear system matrix = precond matrix: > 
> Mat Object: (fieldsplit_wp_) 4 MPI processes > > type: mpiaij > > rows=34141, cols=34141 > > total: nonzeros=485655, allocated nonzeros=485655 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node (on process 0) routines > > linear system matrix = precond matrix: > > Mat Object: 4 MPI processes > > type: mpiaij > > rows=973051, cols=973051 > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > total number of mallocs used during MatSetValues calls =0 > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > > > > Giang > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith wrote: > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui wrote: > > > > > > Dear Matt/Barry > > > > > > With your options, it results in > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 2.407308987203e+36 > > > 1 KSP Residual norm 5.797185652683e+72 > > > > It looks like Matt is right, hypre is seemingly producing useless garbage. > > > > First, how do things run on one process? If you have similar problems, then debug on one process (debugging any kind of problem is always far easier on one process). > > > > First run with -fieldsplit_u_type lu (instead of using hypre) to see if that works or also produces something bad. > > > > What is the operator and the boundary conditions for u? It could be singular. > > > > > > > > > > > > > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 0.000000000000e+00 > > > ... > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 1.533726746719e+36 > > > 1 KSP Residual norm 3.692757392261e+72 > > > Residual norms for fieldsplit_wp_ solve. 
> > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > Do you suggest that the pastix solver for the "wp" block encounters a small pivot? In addition, it seems like the "u" block is also singular. > > > > > > Giang > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith wrote: > > > > > > Huge preconditioned norms but normal unpreconditioned norms almost always come from a very small pivot in an LU or ILU factorization. > > > > > > The first thing to do is monitor the two sub solves. Run with the additional options -fieldsplit_u_ksp_type richardson -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor -fieldsplit_wp_ksp_max_it 1 > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui wrote: > > > > > > > > Hello > > > > > > > > I encountered a strange convergence behavior that I have trouble understanding > > > > > > > > KSPSetFromOptions completed > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > > > > ..... 
> > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > > > > Linear solve did not converge due to DIVERGED_ITS iterations 1000 > > > > KSP Object: 4 MPI processes > > > > type: gmres > > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > > > GMRES: happy breakdown tolerance 1e-30 > > > > maximum iterations=1000, initial guess is zero > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > left preconditioning > > > > using PRECONDITIONED norm type for convergence test > > > > PC Object: 4 MPI processes > > > > type: fieldsplit > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > Solver info for each split is in the following KSP objects: > > > > Split number 0 Defined by IS > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > type: preonly > > > > maximum iterations=10000, initial guess is zero > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > left preconditioning > > > > using NONE norm type for convergence test > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > type: hypre > > > > HYPRE BoomerAMG preconditioning > > > > HYPRE BoomerAMG: Cycle type V > > > > HYPRE BoomerAMG: Maximum number of levels 25 > > > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > > > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > > > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > > > > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > > > > HYPRE BoomerAMG: Maximum row sums 0.9 > > > > HYPRE BoomerAMG: Sweeps down 1 > > > > HYPRE 
BoomerAMG: Sweeps up 1 > > > > HYPRE BoomerAMG: Sweeps on coarse 1 > > > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > > > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > > > HYPRE BoomerAMG: Relax weight (all) 1 > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > > > HYPRE BoomerAMG: Using CF-relaxation > > > > HYPRE BoomerAMG: Measure type local > > > > HYPRE BoomerAMG: Coarsen type PMIS > > > > HYPRE BoomerAMG: Interpolation type classical > > > > linear system matrix = precond matrix: > > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > > type: mpiaij > > > > rows=938910, cols=938910, bs=3 > > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > > total number of mallocs used during MatSetValues calls =0 > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > Split number 1 Defined by IS > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > type: preonly > > > > maximum iterations=10000, initial guess is zero > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > left preconditioning > > > > using NONE norm type for convergence test > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > type: lu > > > > LU: out-of-place factorization > > > > tolerance for zero pivot 2.22045e-14 > > > > matrix ordering: natural > > > > factor fill ratio given 0, needed 0 > > > > Factored matrix follows: > > > > Mat Object: 4 MPI processes > > > > type: mpiaij > > > > rows=34141, cols=34141 > > > > package used to perform factorization: pastix > > > > Error : -nan > > > > Error : -nan > > > > total: nonzeros=0, allocated nonzeros=0 > > > > Error : -nan > > > > total number of mallocs used during MatSetValues calls =0 > > > > PaStiX run parameters: > > > > Matrix type : Symmetric > > > > Level of printing (0,1,2): 0 > > > > Number of refinements iterations : 0 > > > > Error : -nan > > > > linear system matrix = 
precond matrix: > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > type: mpiaij > > > > rows=34141, cols=34141 > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > total number of mallocs used during MatSetValues calls =0 > > > > not using I-node (on process 0) routines > > > > linear system matrix = precond matrix: > > > > Mat Object: 4 MPI processes > > > > type: mpiaij > > > > rows=973051, cols=973051 > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > total number of mallocs used during MatSetValues calls =0 > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > > > > The pattern of convergence gives a hint that this system is somehow bad/singular. But I don't know why the preconditioned error goes up too high. Anyone has an idea? > > > > > > > > Best regards > > > > Giang Bui > > > > > > > > > > > > > > > > From hgbk2008 at gmail.com Mon Apr 24 17:10:49 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Tue, 25 Apr 2017 00:10:49 +0200 Subject: [petsc-users] strange convergence In-Reply-To: <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> Message-ID: It took a while, here I send you the output 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 Linear solve converged due to CONVERGED_ATOL iterations 3 KSP Object: 4 MPI processes type: gmres GMRES: restart=1000, using Modified Gram-Schmidt 
Orthogonalization GMRES: happy breakdown tolerance 1e-30 maximum iterations=1000, initial guess is zero tolerances: relative=1e-20, absolute=1e-09, divergence=10000 left preconditioning using PRECONDITIONED norm type for convergence test PC Object: 4 MPI processes type: lu LU: out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 0, needed 0 Factored matrix follows: Mat Object: 4 MPI processes type: mpiaij rows=973051, cols=973051 package used to perform factorization: pastix Error : 3.24786e-14 total: nonzeros=0, allocated nonzeros=0 total number of mallocs used during MatSetValues calls =0 PaStiX run parameters: Matrix type : Unsymmetric Level of printing (0,1,2): 0 Number of refinements iterations : 3 Error : 3.24786e-14 linear system matrix = precond matrix: Mat Object: 4 MPI processes type: mpiaij rows=973051, cols=973051 Error : 3.24786e-14 total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 78749 nodes, limit used is 5 Error : 3.24786e-14 It doesn't do as you said. Something is not right here. I will look in depth. Giang On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith wrote: > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui > wrote: > > > > Good catch. I get this for the very first step, maybe at that time the > rhs_w is zero. > > With the multiplicative composition the right hand side of the second > solve is the initial right hand side of the second solve minus A_10*x where > x is the solution to the first sub solve and A_10 is the lower left block > of the outer matrix. So unless both the initial right hand side has a zero > for the second block and A_10 is identically zero the right hand side for > the second sub solve should not be zero. Is A_10 == 0? > > > > In the later step, it shows 2 step convergence > > > > Residual norms for fieldsplit_u_ solve. 
> > 0 KSP Residual norm 3.165886479830e+04 > > 1 KSP Residual norm 2.905922877684e-01 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 2.397669419027e-01 > > 1 KSP Residual norm 0.000000000000e+00 > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid norm > 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 9.999891813771e-01 > > 1 KSP Residual norm 1.512000395579e-05 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 8.192702188243e-06 > > 1 KSP Residual norm 0.000000000000e+00 > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid norm > 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 > > The outer residual norms are still wonky, the preconditioned residual > norm goes from 3.165886479920e+04 to 5.252183822848e-02 which is a huge > drop but the 7.963616922323e+05 drops very much less 7.135927677844e+04. > This is not normal. > > What if you just use -pc_type lu for the entire system (no fieldsplit), > does the true residual drop to almost zero in the first iteration (as it > should?). Send the output. > > > > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 6.946213936597e-01 > > 1 KSP Residual norm 1.195514007343e-05 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 1.025694497535e+00 > > 1 KSP Residual norm 0.000000000000e+00 > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid norm > 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 7.255149996405e-01 > > 1 KSP Residual norm 6.583512434218e-06 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 1.015229700337e+00 > > 1 KSP Residual norm 0.000000000000e+00 > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid norm > 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 > > Residual norms for fieldsplit_u_ solve. 
> > 0 KSP Residual norm 3.512243341400e-01 > > 1 KSP Residual norm 2.032490351200e-06 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 1.282327290982e+00 > > 1 KSP Residual norm 0.000000000000e+00 > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid norm > 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 3.423609338053e-01 > > 1 KSP Residual norm 4.213703301972e-07 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 1.157384757538e+00 > > 1 KSP Residual norm 0.000000000000e+00 > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid norm > 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 3.838596289995e-01 > > 1 KSP Residual norm 9.927864176103e-08 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 1.066298905618e+00 > > 1 KSP Residual norm 0.000000000000e+00 > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid norm > 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 4.624964188094e-01 > > 1 KSP Residual norm 6.418229775372e-08 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 9.800784311614e-01 > > 1 KSP Residual norm 0.000000000000e+00 > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid norm > 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 > > Linear solve converged due to CONVERGED_ATOL iterations 7 > > > > The outer operator is an explicit matrix. > > > > Giang > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith wrote: > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui > wrote: > > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better > convergence. I still used 4 procs though, probably with 1 proc it should > also be the same. 
> > > > > > The u block used a Nitsche-type operator to connect two non-matching > domains. I don't think it will leave some rigid body motion leads to not > sufficient constraints. Maybe you have other idea? > > > > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 3.129067184300e+05 > > > 1 KSP Residual norm 5.906261468196e-01 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 0.000000000000e+00 > > > > ^^^^ something is wrong here. The sub solve should not be starting > with a 0 residual (this means the right hand side for this sub solve is > zero which it should not be). > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > > How are you providing the outer operator? As an explicit matrix or > with some shell matrix? > > > > > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid norm > 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 9.999955993437e-01 > > > 1 KSP Residual norm 4.019774691831e-06 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 0.000000000000e+00 > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid norm > 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 1.000012180204e+00 > > > 1 KSP Residual norm 1.017367950422e-05 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 0.000000000000e+00 > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true resid norm > 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 1.000004200085e+00 > > > 1 KSP Residual norm 6.231613102458e-06 > > > Residual norms for fieldsplit_wp_ solve. 
> > > 0 KSP Residual norm 0.000000000000e+00 > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid norm > 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 > > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > > KSP Object: 4 MPI processes > > > type: gmres > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > > GMRES: happy breakdown tolerance 1e-30 > > > maximum iterations=1000, initial guess is zero > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: 4 MPI processes > > > type: fieldsplit > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > Solver info for each split is in the following KSP objects: > > > Split number 0 Defined by IS > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > type: richardson > > > Richardson: damping factor=1 > > > maximum iterations=1, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > type: lu > > > LU: out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 0, needed 0 > > > Factored matrix follows: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=938910, cols=938910 > > > package used to perform factorization: pastix > > > total: nonzeros=0, allocated nonzeros=0 > > > Error : 3.36878e-14 > > > total number of mallocs used during MatSetValues calls =0 > > > PaStiX run parameters: > > > Matrix type : Unsymmetric > > > Level of printing (0,1,2): 0 > > > Number of refinements iterations : 3 > > > Error : 3.36878e-14 > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > type: mpiaij > > > rows=938910, cols=938910, bs=3 > > > Error : 
3.36878e-14 > > > Error : 3.36878e-14 > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node (on process 0) routines: found 78749 nodes, > limit used is 5 > > > Split number 1 Defined by IS > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > type: richardson > > > Richardson: damping factor=1 > > > maximum iterations=1, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > type: lu > > > LU: out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 0, needed 0 > > > Factored matrix follows: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=34141, cols=34141 > > > package used to perform factorization: pastix > > > Error : -nan > > > Error : -nan > > > Error : -nan > > > total: nonzeros=0, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > PaStiX run parameters: > > > Matrix type : Symmetric > > > Level of printing (0,1,2): 0 > > > Number of refinements iterations : 0 > > > Error : -nan > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > type: mpiaij > > > rows=34141, cols=34141 > > > total: nonzeros=485655, allocated nonzeros=485655 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > linear system matrix = precond matrix: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=973051, cols=973051 > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node (on process 0) routines: found 78749 nodes, limit > used is 5 > > > > > > > > > > > > Giang > > > > > > On 
Sun, Apr 23, 2017 at 10:19 PM, Barry Smith > wrote: > > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui > wrote: > > > > > > > > Dear Matt/Barry > > > > > > > > With your options, it results in > > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm > 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 2.407308987203e+36 > > > > 1 KSP Residual norm 5.797185652683e+72 > > > > > > It looks like Matt is right, hypre is seemly producing useless garbage. > > > > > > First how do things run on one process. If you have similar problems > then debug on one process (debugging any kind of problem is always far easy > on one process). > > > > > > First run with -fieldsplit_u_type lu (instead of using hypre) to see > if that works or also produces something bad. > > > > > > What is the operator and the boundary conditions for u? It could be > singular. > > > > > > > > > > > > > > > > > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > ... > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true resid norm > 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 1.533726746719e+36 > > > > 1 KSP Residual norm 3.692757392261e+72 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > > > Do you suggest that the pastix solver for the "wp" block encounters > small pivot? In addition, seem like the "u" block is also singular. > > > > > > > > Giang > > > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith > wrote: > > > > > > > > Huge preconditioned norms but normal unpreconditioned norms > almost always come from a very small pivot in an LU or ILU factorization. > > > > > > > > The first thing to do is monitor the two sub solves. 
Run with the > additional options -fieldsplit_u_ksp_type richardson > -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 > -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor > -fieldsplit_wp_ksp_max_it 1 > > > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui > wrote: > > > > > > > > > > Hello > > > > > > > > > > I encountered a strange convergence behavior that I have trouble > to understand > > > > > > > > > > KSPSetFromOptions completed > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid > norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid > norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid > norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid > norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > > > > > ..... 
> > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true resid > norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid > norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > > > > > Linear solve did not converge due to DIVERGED_ITS iterations 1000 > > > > > KSP Object: 4 MPI processes > > > > > type: gmres > > > > > GMRES: restart=1000, using Modified Gram-Schmidt > Orthogonalization > > > > > GMRES: happy breakdown tolerance 1e-30 > > > > > maximum iterations=1000, initial guess is zero > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > > left preconditioning > > > > > using PRECONDITIONED norm type for convergence test > > > > > PC Object: 4 MPI processes > > > > > type: fieldsplit > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > Solver info for each split is in the following KSP objects: > > > > > Split number 0 Defined by IS > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > > type: preonly > > > > > maximum iterations=10000, initial guess is zero > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > > left preconditioning > > > > > using NONE norm type for convergence test > > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > > type: hypre > > > > > HYPRE BoomerAMG preconditioning > > > > > HYPRE BoomerAMG: Cycle type V > > > > > HYPRE BoomerAMG: Maximum number of levels 25 > > > > > HYPRE BoomerAMG: Maximum number of iterations PER hypre > call 1 > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > > > > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > > > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > > > > HYPRE BoomerAMG: Number of levels of aggressive coarsening > 0 > > > > > HYPRE BoomerAMG: Number of paths for aggressive coarsening > 1 > > > > > HYPRE 
BoomerAMG: Maximum row sums 0.9 > > > > > HYPRE BoomerAMG: Sweeps down 1 > > > > > HYPRE BoomerAMG: Sweeps up 1 > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 > > > > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > > > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > > > > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > > > > HYPRE BoomerAMG: Relax weight (all) 1 > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > > > > HYPRE BoomerAMG: Using CF-relaxation > > > > > HYPRE BoomerAMG: Measure type local > > > > > HYPRE BoomerAMG: Coarsen type PMIS > > > > > HYPRE BoomerAMG: Interpolation type classical > > > > > linear system matrix = precond matrix: > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > > > type: mpiaij > > > > > rows=938910, cols=938910, bs=3 > > > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > using I-node (on process 0) routines: found 78749 nodes, > limit used is 5 > > > > > Split number 1 Defined by IS > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > > type: preonly > > > > > maximum iterations=10000, initial guess is zero > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > > left preconditioning > > > > > using NONE norm type for convergence test > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > > type: lu > > > > > LU: out-of-place factorization > > > > > tolerance for zero pivot 2.22045e-14 > > > > > matrix ordering: natural > > > > > factor fill ratio given 0, needed 0 > > > > > Factored matrix follows: > > > > > Mat Object: 4 MPI processes > > > > > type: mpiaij > > > > > rows=34141, cols=34141 > > > > > package used to perform factorization: pastix > > > > > Error : -nan > > > > > Error : -nan > > > > > total: nonzeros=0, allocated nonzeros=0 > > > > > Error : -nan > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > PaStiX run 
parameters: > > > > > Matrix type : Symmetric > > > > > Level of printing (0,1,2): 0 > > > > > Number of refinements iterations : 0 > > > > > Error : -nan > > > > > linear system matrix = precond matrix: > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > > type: mpiaij > > > > > rows=34141, cols=34141 > > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > not using I-node (on process 0) routines > > > > > linear system matrix = precond matrix: > > > > > Mat Object: 4 MPI processes > > > > > type: mpiaij > > > > > rows=973051, cols=973051 > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > using I-node (on process 0) routines: found 78749 nodes, > limit used is 5 > > > > > > > > > > The pattern of convergence gives a hint that this system is > somehow bad/singular. But I don't know why the preconditioned error goes up > too high. Anyone has an idea? > > > > > > > > > > Best regards > > > > > Giang Bui > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Mon Apr 24 17:17:57 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Mon, 24 Apr 2017 17:17:57 -0500 Subject: [petsc-users] strange convergence In-Reply-To: References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> Message-ID: This can happen if the matrix is singular or nearly singular, or if the factorization generates small pivots, which can occur even for nonsingular problems if the matrix is poorly scaled or just plain nasty.
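[Editor's note: a minimal sketch of the amplification Barry describes. All numbers below are made up for illustration, not taken from Giang's system; the point is only that one near-zero pivot in the triangular solves makes the left-preconditioned residual norm explode while the true residual stays ordinary.]

```python
import math

# Hypothetical, nearly singular 2x2 system A x = b; after elimination the
# second pivot is ~1e-15, mimicking the "small pivot" case described above.
a11, a12 = 1.0, 1.0
a21, a22 = 1.0, 1.0 + 1e-15
b1, b2 = 1.0, 2.0

# True residual of the zero initial guess is just ||b||.
r_true = math.hypot(b1, b2)

# Left preconditioning applies the LU factors to the residual: solve A z = r.
l21 = a21 / a11              # elimination multiplier
u22 = a22 - l21 * a12        # the tiny pivot, ~1e-15
z2 = (b2 - l21 * b1) / u22   # dividing by the tiny pivot blows z2 up
z1 = (b1 - a12 * z2) / a11
r_prec = math.hypot(z1, z2)

print(r_true)   # ~2.24: an ordinary true residual norm
print(r_prec)   # ~1e15: a huge preconditioned residual norm
```

This mismatch, huge preconditioned norms against unremarkable true norms, is exactly the signature in Giang's log, which is why a small pivot or near singularity is the first suspect.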
> On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui wrote: > > It took a while, here I send you the output > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 > Linear solve converged due to CONVERGED_ATOL iterations 3 > KSP Object: 4 MPI processes > type: gmres > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > GMRES: happy breakdown tolerance 1e-30 > maximum iterations=1000, initial guess is zero > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > left preconditioning > using PRECONDITIONED norm type for convergence test > PC Object: 4 MPI processes > type: lu > LU: out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 0, needed 0 > Factored matrix follows: > Mat Object: 4 MPI processes > type: mpiaij > rows=973051, cols=973051 > package used to perform factorization: pastix > Error : 3.24786e-14 > total: nonzeros=0, allocated nonzeros=0 > total number of mallocs used during MatSetValues calls =0 > PaStiX run parameters: > Matrix type : Unsymmetric > Level of printing (0,1,2): 0 > Number of refinements iterations : 3 > Error : 3.24786e-14 > linear system matrix = precond matrix: > Mat Object: 4 MPI processes > type: mpiaij > rows=973051, cols=973051 > Error : 3.24786e-14 > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > Error : 3.24786e-14 > > It doesn't do as you said. 
Something is not right here. I will look in depth. > > Giang > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith wrote: > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui wrote: > > > > Good catch. I get this for the very first step, maybe at that time the rhs_w is zero. > > With the multiplicative composition the right hand side of the second solve is the initial right hand side of the second solve minus A_10*x where x is the solution to the first sub solve and A_10 is the lower left block of the outer matrix. So unless both the initial right hand side has a zero for the second block and A_10 is identically zero the right hand side for the second sub solve should not be zero. Is A_10 == 0? > > > > In the later step, it shows 2 step convergence > > > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 3.165886479830e+04 > > 1 KSP Residual norm 2.905922877684e-01 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 2.397669419027e-01 > > 1 KSP Residual norm 0.000000000000e+00 > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 9.999891813771e-01 > > 1 KSP Residual norm 1.512000395579e-05 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 8.192702188243e-06 > > 1 KSP Residual norm 0.000000000000e+00 > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 > > The outer residual norms are still wonky, the preconditioned residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which is a huge drop but the 7.963616922323e+05 drops very much less 7.135927677844e+04. This is not normal. > > What if you just use -pc_type lu for the entire system (no fieldsplit), does the true residual drop to almost zero in the first iteration (as it should?). Send the output. > > > > > Residual norms for fieldsplit_u_ solve. 
> > 0 KSP Residual norm 6.946213936597e-01 > > 1 KSP Residual norm 1.195514007343e-05 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 1.025694497535e+00 > > 1 KSP Residual norm 0.000000000000e+00 > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 7.255149996405e-01 > > 1 KSP Residual norm 6.583512434218e-06 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 1.015229700337e+00 > > 1 KSP Residual norm 0.000000000000e+00 > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 3.512243341400e-01 > > 1 KSP Residual norm 2.032490351200e-06 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 1.282327290982e+00 > > 1 KSP Residual norm 0.000000000000e+00 > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 3.423609338053e-01 > > 1 KSP Residual norm 4.213703301972e-07 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 1.157384757538e+00 > > 1 KSP Residual norm 0.000000000000e+00 > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 3.838596289995e-01 > > 1 KSP Residual norm 9.927864176103e-08 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 1.066298905618e+00 > > 1 KSP Residual norm 0.000000000000e+00 > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 > > Residual norms for fieldsplit_u_ solve. 
> > 0 KSP Residual norm 4.624964188094e-01 > > 1 KSP Residual norm 6.418229775372e-08 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 9.800784311614e-01 > > 1 KSP Residual norm 0.000000000000e+00 > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 > > Linear solve converged due to CONVERGED_ATOL iterations 7 > > > > The outer operator is an explicit matrix. > > > > Giang > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith wrote: > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui wrote: > > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better convergence. I still used 4 procs though, probably with 1 proc it should also be the same. > > > > > > The u block used a Nitsche-type operator to connect two non-matching domains. I don't think it will leave some rigid body motion leads to not sufficient constraints. Maybe you have other idea? > > > > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 3.129067184300e+05 > > > 1 KSP Residual norm 5.906261468196e-01 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 0.000000000000e+00 > > > > ^^^^ something is wrong here. The sub solve should not be starting with a 0 residual (this means the right hand side for this sub solve is zero which it should not be). > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > > How are you providing the outer operator? As an explicit matrix or with some shell matrix? > > > > > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 9.999955993437e-01 > > > 1 KSP Residual norm 4.019774691831e-06 > > > Residual norms for fieldsplit_wp_ solve. 
> > > 0 KSP Residual norm 0.000000000000e+00 > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 1.000012180204e+00 > > > 1 KSP Residual norm 1.017367950422e-05 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 0.000000000000e+00 > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true resid norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 1.000004200085e+00 > > > 1 KSP Residual norm 6.231613102458e-06 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 0.000000000000e+00 > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 > > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > > KSP Object: 4 MPI processes > > > type: gmres > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > > GMRES: happy breakdown tolerance 1e-30 > > > maximum iterations=1000, initial guess is zero > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: 4 MPI processes > > > type: fieldsplit > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > Solver info for each split is in the following KSP objects: > > > Split number 0 Defined by IS > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > type: richardson > > > Richardson: damping factor=1 > > > maximum iterations=1, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > type: lu > > > LU: out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > 
> matrix ordering: natural > > > factor fill ratio given 0, needed 0 > > > Factored matrix follows: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=938910, cols=938910 > > > package used to perform factorization: pastix > > > total: nonzeros=0, allocated nonzeros=0 > > > Error : 3.36878e-14 > > > total number of mallocs used during MatSetValues calls =0 > > > PaStiX run parameters: > > > Matrix type : Unsymmetric > > > Level of printing (0,1,2): 0 > > > Number of refinements iterations : 3 > > > Error : 3.36878e-14 > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > type: mpiaij > > > rows=938910, cols=938910, bs=3 > > > Error : 3.36878e-14 > > > Error : 3.36878e-14 > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > Split number 1 Defined by IS > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > type: richardson > > > Richardson: damping factor=1 > > > maximum iterations=1, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > type: lu > > > LU: out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 0, needed 0 > > > Factored matrix follows: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=34141, cols=34141 > > > package used to perform factorization: pastix > > > Error : -nan > > > Error : -nan > > > Error : -nan > > > total: nonzeros=0, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > PaStiX run parameters: > > > Matrix type : Symmetric > > > Level of printing (0,1,2): 0 > > > Number of refinements iterations : 0 > > > Error : 
-nan > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > type: mpiaij > > > rows=34141, cols=34141 > > > total: nonzeros=485655, allocated nonzeros=485655 > > > total number of mallocs used during MatSetValues calls =0 > > > not using I-node (on process 0) routines > > > linear system matrix = precond matrix: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=973051, cols=973051 > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > > > > > > > > Giang > > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith wrote: > > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui wrote: > > > > > > > > Dear Matt/Barry > > > > > > > > With your options, it results in > > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 2.407308987203e+36 > > > > 1 KSP Residual norm 5.797185652683e+72 > > > > > > It looks like Matt is right, hypre is seemly producing useless garbage. > > > > > > First how do things run on one process. If you have similar problems then debug on one process (debugging any kind of problem is always far easy on one process). > > > > > > First run with -fieldsplit_u_type lu (instead of using hypre) to see if that works or also produces something bad. > > > > > > What is the operator and the boundary conditions for u? It could be singular. > > > > > > > > > > > > > > > > > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > ... > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 > > > > Residual norms for fieldsplit_u_ solve. 
> > > > 0 KSP Residual norm 1.533726746719e+36 > > > > 1 KSP Residual norm 3.692757392261e+72 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > > > Do you suggest that the pastix solver for the "wp" block encounters small pivot? In addition, seem like the "u" block is also singular. > > > > > > > > Giang > > > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith wrote: > > > > > > > > Huge preconditioned norms but normal unpreconditioned norms almost always come from a very small pivot in an LU or ILU factorization. > > > > > > > > The first thing to do is monitor the two sub solves. Run with the additional options -fieldsplit_u_ksp_type richardson -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor -fieldsplit_wp_ksp_max_it 1 > > > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui wrote: > > > > > > > > > > Hello > > > > > > > > > > I encountered a strange convergence behavior that I have trouble to understand > > > > > > > > > > KSPSetFromOptions completed > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > > > > > ..... 
> > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > > > > > Linear solve did not converge due to DIVERGED_ITS iterations 1000 > > > > > KSP Object: 4 MPI processes > > > > > type: gmres > > > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > > > > GMRES: happy breakdown tolerance 1e-30 > > > > > maximum iterations=1000, initial guess is zero > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > > left preconditioning > > > > > using PRECONDITIONED norm type for convergence test > > > > > PC Object: 4 MPI processes > > > > > type: fieldsplit > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > Solver info for each split is in the following KSP objects: > > > > > Split number 0 Defined by IS > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > > type: preonly > > > > > maximum iterations=10000, initial guess is zero > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > > left preconditioning > > > > > using NONE norm type for convergence test > > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > > type: hypre > > > > > HYPRE BoomerAMG preconditioning > > > > > HYPRE BoomerAMG: Cycle type V > > > > > HYPRE BoomerAMG: Maximum number of levels 25 > > > > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > > > > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > > > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > > > > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > > > > > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > > > > > HYPRE BoomerAMG: Maximum 
row sums 0.9 > > > > > HYPRE BoomerAMG: Sweeps down 1 > > > > > HYPRE BoomerAMG: Sweeps up 1 > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 > > > > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > > > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > > > > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > > > > HYPRE BoomerAMG: Relax weight (all) 1 > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > > > > HYPRE BoomerAMG: Using CF-relaxation > > > > > HYPRE BoomerAMG: Measure type local > > > > > HYPRE BoomerAMG: Coarsen type PMIS > > > > > HYPRE BoomerAMG: Interpolation type classical > > > > > linear system matrix = precond matrix: > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > > > type: mpiaij > > > > > rows=938910, cols=938910, bs=3 > > > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > Split number 1 Defined by IS > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > > type: preonly > > > > > maximum iterations=10000, initial guess is zero > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > > left preconditioning > > > > > using NONE norm type for convergence test > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > > type: lu > > > > > LU: out-of-place factorization > > > > > tolerance for zero pivot 2.22045e-14 > > > > > matrix ordering: natural > > > > > factor fill ratio given 0, needed 0 > > > > > Factored matrix follows: > > > > > Mat Object: 4 MPI processes > > > > > type: mpiaij > > > > > rows=34141, cols=34141 > > > > > package used to perform factorization: pastix > > > > > Error : -nan > > > > > Error : -nan > > > > > total: nonzeros=0, allocated nonzeros=0 > > > > > Error : -nan > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > PaStiX run parameters: > > > > > Matrix 
type : Symmetric > > > > > Level of printing (0,1,2): 0 > > > > > Number of refinements iterations : 0 > > > > > Error : -nan > > > > > linear system matrix = precond matrix: > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > > type: mpiaij > > > > > rows=34141, cols=34141 > > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > not using I-node (on process 0) routines > > > > > linear system matrix = precond matrix: > > > > > Mat Object: 4 MPI processes > > > > > type: mpiaij > > > > > rows=973051, cols=973051 > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > > > > > > The pattern of convergence gives a hint that this system is somehow bad/singular. But I don't know why the preconditioned error goes up too high. Anyone has an idea? > > > > > > > > > > Best regards > > > > > Giang Bui > > > > > > > > > > > > > > > > > > > > > > > > > From gnw20 at cam.ac.uk Tue Apr 25 11:48:58 2017 From: gnw20 at cam.ac.uk (Garth N. Wells) Date: Tue, 25 Apr 2017 17:48:58 +0100 Subject: [petsc-users] petsc4py bool type Message-ID: I'm seeing some behaviour with bool types in petsc4py that I didn't expect. In the Python interface, returned Booleans have type '', where I expected them to have type ' '. Below program illustrates issue. Seems to be related to bint in cython. Am I doing something wrong? 
Garth

from petsc4py import PETSc
A = PETSc.Mat()
A.createAIJ((2, 2))
A.setOption(PETSc.Mat.Option.SYMMETRIC, True)
symm = A.isSymmetricKnown()
print("Symmetry set:", symm[0] is True)
print("Symmetry set:", symm[0] == True)
print("Bool type:", type(symm[0]))

From hongzhang at anl.gov Tue Apr 25 13:36:54 2017 From: hongzhang at anl.gov (Zhang, Hong) Date: Tue, 25 Apr 2017 18:36:54 +0000 Subject: [petsc-users] petsc4py bool type In-Reply-To: References: Message-ID: PetscBool is indeed an int. So there is nothing wrong. PETSc does not use bool in order to support C89. Hong (Mr.) > On Apr 25, 2017, at 11:48 AM, Garth N. Wells wrote: > > I'm seeing some behaviour with bool types in petsc4py that I didn't > expect. In the Python interface, returned Booleans have type ' 'int'>', where I expected them to have type ' '. Below > program illustrates issue. Seems to be related to bint in cython. Am I > doing something wrong? > > Garth > >
> from petsc4py import PETSc
> A = PETSc.Mat()
> A.createAIJ((2, 2))
> A.setOption(PETSc.Mat.Option.SYMMETRIC, True)
> symm = A.isSymmetricKnown()
> print("Symmetry set:", symm[0] is True)
> print("Symmetry set:", symm[0] == True)
> print("Bool type:", type(symm[0]))

From bsmith at mcs.anl.gov Tue Apr 25 13:45:05 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 25 Apr 2017 13:45:05 -0500 Subject: [petsc-users] petsc4py bool type In-Reply-To: References: Message-ID: > On Apr 25, 2017, at 1:36 PM, Zhang, Hong wrote: > > PetscBool is indeed an int. So there is nothing wrong. PETSc does not use bool in order to support C89. Yes, but in Python using a bool is more natural. For example in Fortran PETSc uses the Fortran logical as the PetscBool Barry > > Hong (Mr.) > >> On Apr 25, 2017, at 11:48 AM, Garth N. Wells wrote: >> >> I'm seeing some behaviour with bool types in petsc4py that I didn't >> expect. In the Python interface, returned Booleans have type '> 'int'>', where I expected them to have type ' '. 
Below >> program illustrates issue. Seems to be related to bint in cython. Am I >> doing something wrong? >> >> Garth >> >>
>> from petsc4py import PETSc
>> A = PETSc.Mat()
>> A.createAIJ((2, 2))
>> A.setOption(PETSc.Mat.Option.SYMMETRIC, True)
>> symm = A.isSymmetricKnown()
>> print("Symmetry set:", symm[0] is True)
>> print("Symmetry set:", symm[0] == True)
>> print("Bool type:", type(symm[0]))

> From fande.kong at inl.gov Tue Apr 25 16:35:16 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Tue, 25 Apr 2017 15:35:16 -0600 Subject: [petsc-users] misleading "mpich" messages Message-ID: Hi, We configured PETSc with a version of MVAPICH, and compiled with another version of MVAPICH. Got the error messages: *error "PETSc was configured with one MPICH mpi.h version but now appears to be compiling using a different MPICH mpi.h version"* Why could we not say something about "MVAPICH" (not "MPICH")? Do we just simply consider all MPI implementations (MVAPICH, maybe Intel MPI, IBM MPI?) based on MPICH as "MPICH"? Fande, -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Tue Apr 25 16:42:41 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 25 Apr 2017 16:42:41 -0500 Subject: [petsc-users] misleading "mpich" messages In-Reply-To: References: Message-ID: The error message is generated based on the macro MPICH_NUMVERSION contained in the mpi.h file. Apparently MVAPICH also provides this macro, hence PETSc has no way to know that it is not MPICH. If you can locate in the MVAPICH mpi.h include files macros related explicitly to MVAPICH then we could possibly use that macro to provide a more specific error message. Barry > On Apr 25, 2017, at 4:35 PM, Kong, Fande wrote: > > Hi, > > We configured PETSc with a version of MVAPICH, and compiled with another version of MVAPICH. 
Got the error messages: > > error "PETSc was configured with one MPICH mpi.h version but now appears to be compiling using a different MPICH mpi.h version" > > > Why could we not say something about "MVAPICH" (not "MPICH")? > > Do we just simply consider all MPI implementations (MVAPICH, maybe Intel MPI, IBM MPI?) based on MPICH as "MPICH"? > > Fande, From fande.kong at inl.gov Tue Apr 25 16:53:57 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Tue, 25 Apr 2017 15:53:57 -0600 Subject: [petsc-users] misleading "mpich" messages In-Reply-To: References: Message-ID: On Tue, Apr 25, 2017 at 3:42 PM, Barry Smith wrote: > > The error message is generated based on the macro MPICH_NUMVERSION > contained in the mpi.h file. > > Apparently MVAPICH also provides this macro, hence PETSc has no way to > know that it is not MPICH. > > If you can locate in the MVAPICH mpi.h include files macros related > explicitly to MVAPICH then we could possibly use that macro to provide a > more specific error message. > There is also a macro: MVAPICH2_NUMVERSION in mpi.h. We might use it to produce the right message. Looks possible to me. Fande, > > Barry > > > > On Apr 25, 2017, at 4:35 PM, Kong, Fande wrote: > > > > Hi, > > > > We configured PETSc with a version of MVAPICH, and compiled with another > version of MVAPICH. Got the error messages: > > > > error "PETSc was configured with one MPICH mpi.h version but now appears > to be compiling using a different MPICH mpi.h version" > > > > > > Why could we not say something about "MVAPICH" (not "MPICH")? > > > > Do we just simply consider all MPI implementations (MVAPICH, maybe Intel > MPI, IBM MPI?) based on MPICH as "MPICH"? > > > > Fande, > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Tue Apr 25 16:57:50 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 25 Apr 2017 16:57:50 -0500 Subject: [petsc-users] misleading "mpich" messages In-Reply-To: References: Message-ID: You can try the attached [untested] patch. It replicates the MPICH_NUMVERSION code and replaces it with MVAPICH2_NUMVERSION Satish On Tue, 25 Apr 2017, Kong, Fande wrote: > On Tue, Apr 25, 2017 at 3:42 PM, Barry Smith wrote: > > > > > The error message is generated based on the macro MPICH_NUMVERSION > > contained in the mpi.h file. > > > > Apparently MVAPICH also provides this macro, hence PETSc has no way to > > know that it is not MPICH. > > > > If you can locate in the MVAPICH mpi.h include files macros related > > explicitly to MVAPICH then we could possibly use that macro to provide a > > more specific error message. > > > > > There is also a macro: MVAPICH2_NUMVERSION in mpi.h. We might use it to > have a right message. > > Looks possible for me. > > Fande, > > > > > > > Barry > > > > > > > On Apr 25, 2017, at 4:35 PM, Kong, Fande wrote: > > > > > > Hi, > > > > > > We configured PETSc with a version of MVAPICH, and complied with another > > version of MVAPICH. Got the error messages: > > > > > > error "PETSc was configured with one MPICH mpi.h version but now appears > > to be compiling using a different MPICH mpi.h version" > > > > > > > > > Why we could not say something about "MVAPICH" (not "MPICH")? > > > > > > Do we just simply consider all MPI implementations (MVAPICH, maybe Intel > > MPI, IBM mpi?) based on MPICH as "MPICH"? 
> > > > > > Fande, > > > > > -------------- next part -------------- diff --git a/config/BuildSystem/config/packages/MPI.py b/config/BuildSystem/config/packages/MPI.py index f016a019eb..daed0d3f69 100644 --- a/config/BuildSystem/config/packages/MPI.py +++ b/config/BuildSystem/config/packages/MPI.py @@ -425,14 +425,22 @@ class Configure(config.package.Package): return def checkMPICHorOpenMPI(self): - '''Determine if MPICH_NUMVERSION or OMPI_MAJOR_VERSION exist in mpi.h + '''Determine if MVAPICH2_NUMVERSION, MPICH_NUMVERSION or OMPI_MAJOR_VERSION exist in mpi.h Used for consistency checking of MPI installation at compile time''' import re + mvapich2_test = '#include \nint mpich_ver = MVAPICH2_NUMVERSION;\n' mpich_test = '#include \nint mpich_ver = MPICH_NUMVERSION;\n' openmpi_test = '#include \nint ompi_major = OMPI_MAJOR_VERSION;\nint ompi_minor = OMPI_MINOR_VERSION;\nint ompi_release = OMPI_RELEASE_VERSION;\n' oldFlags = self.compilers.CPPFLAGS self.compilers.CPPFLAGS += ' '+self.headers.toString(self.include) - if self.checkCompile(mpich_test): + if self.checkCompile(mvapich2_test): + buf = self.outputPreprocess(mvapich2_test) + try: + mvapich2_numversion = re.compile('\nint mvapich2_ver = *([0-9]*) *;').search(buf).group(1) + self.addDefine('HAVE_MVAPICH2_NUMVERSION',mvapich2_numversion) + except: + self.logPrint('Unable to parse MVAPICH2 version from header. 
Probably a buggy preprocessor') + elif self.checkCompile(mpich_test): buf = self.outputPreprocess(mpich_test) try: mpich_numversion = re.compile('\nint mpich_ver = *([0-9]*) *;').search(buf).group(1) diff --git a/include/petscsys.h b/include/petscsys.h index 1ece229001..4a9446fdd4 100644 --- a/include/petscsys.h +++ b/include/petscsys.h @@ -139,8 +139,14 @@ void assert_never_put_petsc_headers_inside_an_extern_c(int); void assert_never_p # if !defined(__MPIUNI_H) # error "PETSc was configured with --with-mpi=0 but now appears to be compiling using a different mpi.h" # endif +#elif defined(PETSC_HAVE_MVAPICH2_NUMVERSION) +# if !defined(MVAPICH2_NUMVERSION) +# error "PETSc was configured with MVAPICH2 but now appears to be compiling using a non-MVAPICH2 mpi.h" +# elif MVAPICH2_NUMVERSION != PETSC_HAVE_MVAPICH2_NUMVERSION +# error "PETSc was configured with one MVAPICH2 mpi.h version but now appears to be compiling using a different MVAPICH2 mpi.h version" +# endif #elif defined(PETSC_HAVE_MPICH_NUMVERSION) -# if !defined(MPICH_NUMVERSION) +# if !defined(MPICH_NUMVERSION) || defined(MVAPICH2_NUMVERSION) # error "PETSc was configured with MPICH but now appears to be compiling using a non-MPICH mpi.h" # elif MPICH_NUMVERSION != PETSC_HAVE_MPICH_NUMVERSION # error "PETSc was configured with one MPICH mpi.h version but now appears to be compiling using a different MPICH mpi.h version" From balay at mcs.anl.gov Tue Apr 25 17:03:25 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 25 Apr 2017 17:03:25 -0500 Subject: [petsc-users] misleading "mpich" messages In-Reply-To: References: Message-ID: Added this patch to balay/add-mvapich-version-check Satish On Tue, 25 Apr 2017, Satish Balay wrote: > You can try the attached [untested] patch. 
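The detection technique in the patch above (compile a one-line probe against mpi.h, then pull the expanded macro value out of the preprocessed output with a regular expression) can be sketched in plain Python. The `preprocessed` buffer below is a hypothetical stand-in for what `self.outputPreprocess(...)` would return after the preprocessor expands `MVAPICH2_NUMVERSION`; note that the variable name in the probe must match the name the regex searches for.

```python
import re

# Hypothetical preprocessor output for a probe such as
#   #include <mpi.h>
#   int mvapich2_ver = MVAPICH2_NUMVERSION;
# after macro expansion the macro has become a plain integer literal.
preprocessed = 'lots of header text\nint mvapich2_ver = 20200000 ;\nmore text'

def parse_numversion(buf, varname):
    """Return the expanded integer assigned to varname, or None if absent."""
    m = re.search(r'\nint %s = *([0-9]*) *;' % varname, buf)
    return m.group(1) if m else None

print(parse_numversion(preprocessed, 'mvapich2_ver'))  # prints 20200000
print(parse_numversion(preprocessed, 'mpich_ver'))     # prints None
```

On the PETSc side the extracted value is then baked into the configuration as PETSC_HAVE_MVAPICH2_NUMVERSION, which petscsys.h compares against the macro seen at compile time, as in the petscsys.h hunk of the patch.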
It replicates the > MPICH_NUMVERSION code and replaces it with MVAPICH2_NUMVERSION > > Satish > > On Tue, 25 Apr 2017, Kong, Fande wrote: > > > On Tue, Apr 25, 2017 at 3:42 PM, Barry Smith wrote: > > > > > > > > The error message is generated based on the macro MPICH_NUMVERSION > > > contained in the mpi.h file. > > > > > > Apparently MVAPICH also provides this macro, hence PETSc has no way to > > > know that it is not MPICH. > > > > > > If you can locate in the MVAPICH mpi.h include files macros related > > > explicitly to MVAPICH then we could possibly use that macro to provide a > > > more specific error message. > > > > > > > > > There is also a macro: MVAPICH2_NUMVERSION in mpi.h. We might use it to > > have a right message. > > > > Looks possible for me. > > > > Fande, > > > > > > > > > > > > Barry > > > > > > > > > > On Apr 25, 2017, at 4:35 PM, Kong, Fande wrote: > > > > > > > > Hi, > > > > > > > > We configured PETSc with a version of MVAPICH, and complied with another > > > version of MVAPICH. Got the error messages: > > > > > > > > error "PETSc was configured with one MPICH mpi.h version but now appears > > > to be compiling using a different MPICH mpi.h version" > > > > > > > > > > > > Why we could not say something about "MVAPICH" (not "MPICH")? > > > > > > > > Do we just simply consider all MPI implementations (MVAPICH, maybe Intel > > > MPI, IBM mpi?) based on MPICH as "MPICH"? > > > > > > > > Fande, > > > > > > > > > From fande.kong at inl.gov Tue Apr 25 17:08:48 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Tue, 25 Apr 2017 16:08:48 -0600 Subject: [petsc-users] misleading "mpich" messages In-Reply-To: References: Message-ID: Thanks, Satish, One more question: will petsc complain different versions of other implementations such as intel MPI and IBM MPI? For example, configure with a version of intel MPI, and compile with another version of intel MPI. Do we have error messages on this? 
Fande, On Tue, Apr 25, 2017 at 4:03 PM, Satish Balay wrote: > Added this patch to balay/add-mvapich-version-check > > Satish > > On Tue, 25 Apr 2017, Satish Balay wrote: > > > You can try the attached [untested] patch. It replicates the > > MPICH_NUMVERSION code and replaces it with MVAPICH2_NUMVERSION > > > > Satish > > > > On Tue, 25 Apr 2017, Kong, Fande wrote: > > > > > On Tue, Apr 25, 2017 at 3:42 PM, Barry Smith > wrote: > > > > > > > > > > > The error message is generated based on the macro MPICH_NUMVERSION > > > > contained in the mpi.h file. > > > > > > > > Apparently MVAPICH also provides this macro, hence PETSc has no way > to > > > > know that it is not MPICH. > > > > > > > > If you can locate in the MVAPICH mpi.h include files macros related > > > > explicitly to MVAPICH then we could possibly use that macro to > provide a > > > > more specific error message. > > > > > > > > > > > > > There is also a macro: MVAPICH2_NUMVERSION in mpi.h. We might use it to > > > have a right message. > > > > > > Looks possible for me. > > > > > > Fande, > > > > > > > > > > > > > > > > > Barry > > > > > > > > > > > > > On Apr 25, 2017, at 4:35 PM, Kong, Fande > wrote: > > > > > > > > > > Hi, > > > > > > > > > > We configured PETSc with a version of MVAPICH, and complied with > another > > > > version of MVAPICH. Got the error messages: > > > > > > > > > > error "PETSc was configured with one MPICH mpi.h version but now > appears > > > > to be compiling using a different MPICH mpi.h version" > > > > > > > > > > > > > > > Why we could not say something about "MVAPICH" (not "MPICH")? > > > > > > > > > > Do we just simply consider all MPI implementations (MVAPICH, maybe > Intel > > > > MPI, IBM mpi?) based on MPICH as "MPICH"? > > > > > > > > > > Fande, > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balay at mcs.anl.gov Tue Apr 25 17:19:41 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 25 Apr 2017 17:19:41 -0500 Subject: [petsc-users] misleading "mpich" messages In-Reply-To: References: Message-ID: Nope - this error check is not complete - i.e does not cover all MPI impls. Is that possible - or desirable? I don't know. Currently it does it for MPICH and openMPI. [and the MPICH check should cover all all derivatives that are likely to use MPICH_NUMVERSION - eventhough they will be tagged as MPICH in the error message - as you've noticed] So my patch tries to separate out mvapich check from mpich check. Perhaps there is a better way to check all MPICH derivatives [that have both MPICH_NUMVERSION and pkg_NUMVERSION] without duplicating code all over the place.. Satish On Tue, 25 Apr 2017, Kong, Fande wrote: > Thanks, Satish, > > One more question: will petsc complain different versions of other > implementations such as intel MPI and IBM MPI? For example, configure with > a version of intel MPI, and compile with another version of intel MPI. Do > we have error messages on this? > > Fande, > > On Tue, Apr 25, 2017 at 4:03 PM, Satish Balay wrote: > > > Added this patch to balay/add-mvapich-version-check > > > > Satish > > > > On Tue, 25 Apr 2017, Satish Balay wrote: > > > > > You can try the attached [untested] patch. It replicates the > > > MPICH_NUMVERSION code and replaces it with MVAPICH2_NUMVERSION > > > > > > Satish > > > > > > On Tue, 25 Apr 2017, Kong, Fande wrote: > > > > > > > On Tue, Apr 25, 2017 at 3:42 PM, Barry Smith > > wrote: > > > > > > > > > > > > > > The error message is generated based on the macro MPICH_NUMVERSION > > > > > contained in the mpi.h file. > > > > > > > > > > Apparently MVAPICH also provides this macro, hence PETSc has no way > > to > > > > > know that it is not MPICH. 
> > > > > > > > > > If you can locate in the MVAPICH mpi.h include files macros related > > > > > explicitly to MVAPICH then we could possibly use that macro to > > provide a > > > > > more specific error message. > > > > > > > > > > > > > > > > > There is also a macro: MVAPICH2_NUMVERSION in mpi.h. We might use it to > > > > have a right message. > > > > > > > > Looks possible for me. > > > > > > > > Fande, > > > > > > > > > > > > > > > > > > > > > > Barry > > > > > > > > > > > > > > > > On Apr 25, 2017, at 4:35 PM, Kong, Fande > > wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > We configured PETSc with a version of MVAPICH, and complied with > > another > > > > > version of MVAPICH. Got the error messages: > > > > > > > > > > > > error "PETSc was configured with one MPICH mpi.h version but now > > appears > > > > > to be compiling using a different MPICH mpi.h version" > > > > > > > > > > > > > > > > > > Why we could not say something about "MVAPICH" (not "MPICH")? > > > > > > > > > > > > Do we just simply consider all MPI implementations (MVAPICH, maybe > > Intel > > > > > MPI, IBM mpi?) based on MPICH as "MPICH"? > > > > > > > > > > > > Fande, > > > > > > > > > > > > > > > > > > > > > > From bsmith at mcs.anl.gov Tue Apr 25 17:33:16 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 25 Apr 2017 17:33:16 -0500 Subject: [petsc-users] misleading "mpich" messages In-Reply-To: References: Message-ID: > On Apr 25, 2017, at 5:08 PM, Kong, Fande wrote: > > Thanks, Satish, > > One more question: will petsc complain different versions of other implementations such as intel MPI and IBM MPI? For example, configure with a version of intel MPI, and compile with another version of intel MPI. Do we have error messages on this? The compile time checking is in include/petscsys.h so you can easily see what we do do. 
As Satish says we can try to add more cases one at a time if we know unique macros used in particular mpi.h but with many cases the code will become messy unless there is a pattern we can organize around. > > Fande, > > On Tue, Apr 25, 2017 at 4:03 PM, Satish Balay wrote: > Added this patch to balay/add-mvapich-version-check > > Satish > > On Tue, 25 Apr 2017, Satish Balay wrote: > > > You can try the attached [untested] patch. It replicates the > > MPICH_NUMVERSION code and replaces it with MVAPICH2_NUMVERSION > > > > Satish > > > > On Tue, 25 Apr 2017, Kong, Fande wrote: > > > > > On Tue, Apr 25, 2017 at 3:42 PM, Barry Smith wrote: > > > > > > > > > > > The error message is generated based on the macro MPICH_NUMVERSION > > > > contained in the mpi.h file. > > > > > > > > Apparently MVAPICH also provides this macro, hence PETSc has no way to > > > > know that it is not MPICH. > > > > > > > > If you can locate in the MVAPICH mpi.h include files macros related > > > > explicitly to MVAPICH then we could possibly use that macro to provide a > > > > more specific error message. > > > > > > > > > > > > > There is also a macro: MVAPICH2_NUMVERSION in mpi.h. We might use it to > > > have a right message. > > > > > > Looks possible for me. > > > > > > Fande, > > > > > > > > > > > > > > > > > Barry > > > > > > > > > > > > > On Apr 25, 2017, at 4:35 PM, Kong, Fande wrote: > > > > > > > > > > Hi, > > > > > > > > > > We configured PETSc with a version of MVAPICH, and complied with another > > > > version of MVAPICH. Got the error messages: > > > > > > > > > > error "PETSc was configured with one MPICH mpi.h version but now appears > > > > to be compiling using a different MPICH mpi.h version" > > > > > > > > > > > > > > > Why we could not say something about "MVAPICH" (not "MPICH")? > > > > > > > > > > Do we just simply consider all MPI implementations (MVAPICH, maybe Intel > > > > MPI, IBM mpi?) based on MPICH as "MPICH"? 
> > > > > > > > > > Fande, > > > > > > > > > > > > > > > From fande.kong at inl.gov Tue Apr 25 17:36:10 2017 From: fande.kong at inl.gov (Kong, Fande) Date: Tue, 25 Apr 2017 16:36:10 -0600 Subject: [petsc-users] misleading "mpich" messages In-Reply-To: References: Message-ID: Thanks, Barry and Satish, It makes sense. Fande, On Tue, Apr 25, 2017 at 4:33 PM, Barry Smith wrote: > > > On Apr 25, 2017, at 5:08 PM, Kong, Fande wrote: > > > > Thanks, Satish, > > > > One more question: will petsc complain different versions of other > implementations such as intel MPI and IBM MPI? For example, configure with > a version of intel MPI, and compile with another version of intel MPI. Do > we have error messages on this? > > The compile time checking is in include/petscsys.h so you can easily > see what we do do. As Satish says we can try to add more cases one at a > time if we know unique macros used in particular mpi.h but with many cases > the code will become messy unless there is a pattern we can organize around. > > > > > > > > Fande, > > > > On Tue, Apr 25, 2017 at 4:03 PM, Satish Balay wrote: > > Added this patch to balay/add-mvapich-version-check > > > > Satish > > > > On Tue, 25 Apr 2017, Satish Balay wrote: > > > > > You can try the attached [untested] patch. It replicates the > > > MPICH_NUMVERSION code and replaces it with MVAPICH2_NUMVERSION > > > > > > Satish > > > > > > On Tue, 25 Apr 2017, Kong, Fande wrote: > > > > > > > On Tue, Apr 25, 2017 at 3:42 PM, Barry Smith > wrote: > > > > > > > > > > > > > > The error message is generated based on the macro > MPICH_NUMVERSION > > > > > contained in the mpi.h file. > > > > > > > > > > Apparently MVAPICH also provides this macro, hence PETSc has no > way to > > > > > know that it is not MPICH. > > > > > > > > > > If you can locate in the MVAPICH mpi.h include files macros related > > > > > explicitly to MVAPICH then we could possibly use that macro to > provide a > > > > > more specific error message. 
> > > > > > > > > > > > > > > > > There is also a macro: MVAPICH2_NUMVERSION in mpi.h. We might use it > to > > > > have a right message. > > > > > > > > Looks possible for me. > > > > > > > > Fande, > > > > > > > > > > > > > > > > > > > > > > Barry > > > > > > > > > > > > > > > > On Apr 25, 2017, at 4:35 PM, Kong, Fande > wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > We configured PETSc with a version of MVAPICH, and complied with > another > > > > > version of MVAPICH. Got the error messages: > > > > > > > > > > > > error "PETSc was configured with one MPICH mpi.h version but now > appears > > > > > to be compiling using a different MPICH mpi.h version" > > > > > > > > > > > > > > > > > > Why we could not say something about "MVAPICH" (not "MPICH")? > > > > > > > > > > > > Do we just simply consider all MPI implementations (MVAPICH, > maybe Intel > > > > > MPI, IBM mpi?) based on MPICH as "MPICH"? > > > > > > > > > > > > Fande, > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Tue Apr 25 19:32:16 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Tue, 25 Apr 2017 19:32:16 -0500 Subject: [petsc-users] misleading "mpich" messages In-Reply-To: References: Message-ID: I've updated balay/add-mvapich-version-check to also check IMPI version. Satish On Tue, 25 Apr 2017, Kong, Fande wrote: > Thanks, Barry and Satish, > > It makes sense. > > Fande, > > On Tue, Apr 25, 2017 at 4:33 PM, Barry Smith wrote: > > > > > > On Apr 25, 2017, at 5:08 PM, Kong, Fande wrote: > > > > > > Thanks, Satish, > > > > > > One more question: will petsc complain different versions of other > > implementations such as intel MPI and IBM MPI? For example, configure with > > a version of intel MPI, and compile with another version of intel MPI. Do > > we have error messages on this? > > > > The compile time checking is in include/petscsys.h so you can easily > > see what we do do. 
As Satish says we can try to add more cases one at a > > time if we know unique macros used in particular mpi.h but with many cases > > the code will become messy unless there is a pattern we can organize around. > > > > > > > > > > > > > > Fande, > > > > > > On Tue, Apr 25, 2017 at 4:03 PM, Satish Balay wrote: > > > Added this patch to balay/add-mvapich-version-check > > > > > > Satish > > > > > > On Tue, 25 Apr 2017, Satish Balay wrote: > > > > > > > You can try the attached [untested] patch. It replicates the > > > > MPICH_NUMVERSION code and replaces it with MVAPICH2_NUMVERSION > > > > > > > > Satish > > > > > > > > On Tue, 25 Apr 2017, Kong, Fande wrote: > > > > > > > > > On Tue, Apr 25, 2017 at 3:42 PM, Barry Smith > > wrote: > > > > > > > > > > > > > > > > > The error message is generated based on the macro > > MPICH_NUMVERSION > > > > > > contained in the mpi.h file. > > > > > > > > > > > > Apparently MVAPICH also provides this macro, hence PETSc has no > > way to > > > > > > know that it is not MPICH. > > > > > > > > > > > > If you can locate in the MVAPICH mpi.h include files macros related > > > > > > explicitly to MVAPICH then we could possibly use that macro to > > provide a > > > > > > more specific error message. > > > > > > > > > > > > > > > > > > > > > There is also a macro: MVAPICH2_NUMVERSION in mpi.h. We might use it > > to > > > > > have a right message. > > > > > > > > > > Looks possible for me. > > > > > > > > > > Fande, > > > > > > > > > > > > > > > > > > > > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > On Apr 25, 2017, at 4:35 PM, Kong, Fande > > wrote: > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > We configured PETSc with a version of MVAPICH, and complied with > > another > > > > > > version of MVAPICH. 
Got the error messages: > > > > > > > > > > > > > > error "PETSc was configured with one MPICH mpi.h version but now > > appears > > > > > > to be compiling using a different MPICH mpi.h version" > > > > > > > > > > > > > > > > > > > > > Why we could not say something about "MVAPICH" (not "MPICH")? > > > > > > > > > > > > > > Do we just simply consider all MPI implementations (MVAPICH, > > maybe Intel > > > > > > MPI, IBM mpi?) based on MPICH as "MPICH"? > > > > > > > > > > > > > > Fande, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From jed at jedbrown.org Tue Apr 25 20:27:38 2017 From: jed at jedbrown.org (Jed Brown) Date: Tue, 25 Apr 2017 19:27:38 -0600 Subject: [petsc-users] petsc4py bool type In-Reply-To: References: Message-ID: <87pog02c1h.fsf@jedbrown.org> Barry Smith writes: >> On Apr 25, 2017, at 1:36 PM, Zhang, Hong wrote: >> >> PetscBool is indeed an int. So there is nothing wrong. PETSc does not use bool in order to support C89. > > Yes, but in Python using a bool is more natural. For example in Fortran PETSc uses the Fortran logical as the PetscBool Right, there is no requirement or even convenience in the Python native type having the same bit representation. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From bsmith at mcs.anl.gov Tue Apr 25 20:37:46 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Tue, 25 Apr 2017 20:37:46 -0500 Subject: [petsc-users] petsc4py bool type In-Reply-To: <87pog02c1h.fsf@jedbrown.org> References: <87pog02c1h.fsf@jedbrown.org> Message-ID: <72D37517-699B-4B38-966D-1F9E34A9C9EC@mcs.anl.gov> > On Apr 25, 2017, at 8:27 PM, Jed Brown wrote: > > Barry Smith writes: > >>> On Apr 25, 2017, at 1:36 PM, Zhang, Hong wrote: >>> >>> PetscBool is indeed an int. So there is nothing wrong. PETSc does not use bool in order to support C89. >> >> Yes, but in Python using a bool is more natural. 
For example in Fortran PETSc uses the Fortran logical as the PetscBool > > Right, there is no requirement or even convenience in the Python native > type having the same bit representation. Hopefully the petsc4py developers or users could add implementation "feature". From aroli.marcellinus at gmail.com Wed Apr 26 02:42:57 2017 From: aroli.marcellinus at gmail.com (Aroli Marcellinus) Date: Wed, 26 Apr 2017 16:42:57 +0900 Subject: [petsc-users] Using PETSc to read/manipulate mesh files Message-ID: Dear all, I have this kind of mesh file, and I want to use PETSc to include the information of this file. In short, this file contains of: <1> <2> <3> ... ... ... <----showing element connectivity from node 1 2 3 and 4. Can I read this kind of data into PETSc? I am new in PETSc and want to use it for this purpose. Actually, this file contains a mesh of cardiac heart, and I want to convert this data into a suitable mesh data structure, and I thought that PETSc have a kind of mesh data structure. However, after surfing into a lot of tutorial, I cannot find any references that suits my purpose. ? heart_het.dat ? Aroli Marcellinus *Kumoh Institute of Technology**Computational Medicine Laboratory* 61 Daehak-ro (Sinpyeong-dong), Gumi, Gyeongbuk +82 10 9724 3957 KTalk ID: vondarkness -------------- next part -------------- An HTML attachment was scrubbed... URL: From dalcinl at gmail.com Wed Apr 26 03:06:12 2017 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 26 Apr 2017 11:06:12 +0300 Subject: [petsc-users] petsc4py bool type In-Reply-To: References: Message-ID: On 25 April 2017 at 19:48, Garth N. Wells wrote: > I'm seeing some behaviour with bool types in petsc4py that I didn't > expect. In the Python interface, returned Booleans have type ' 'int'>', where I expected them to have type ' '. Below > program illustrates issue. Seems to be related to bint in cython. Am I > doing something wrong? Wow! Yes, you are right, this seems like a regression in Cython. 
In the past, a cast used to coerce values to the Python `bool` type. Damn, these casts are everywhere, it might take a while to fix all the code, unless I can find a hacky way to workaround the issue. -- Lisandro Dalcin ============ Research Scientist Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) Extreme Computing Research Center (ECRC) King Abdullah University of Science and Technology (KAUST) http://ecrc.kaust.edu.sa/ 4700 King Abdullah University of Science and Technology al-Khawarizmi Bldg (Bldg 1), Office # 0109 Thuwal 23955-6900, Kingdom of Saudi Arabia http://www.kaust.edu.sa Office Phone: +966 12 808-0459 From gnw20 at cam.ac.uk Wed Apr 26 04:05:11 2017 From: gnw20 at cam.ac.uk (Garth N. Wells) Date: Wed, 26 Apr 2017 10:05:11 +0100 Subject: [petsc-users] petsc4py bool type In-Reply-To: References: Message-ID: On 26 April 2017 at 09:06, Lisandro Dalcin wrote: > On 25 April 2017 at 19:48, Garth N. Wells wrote: >> I'm seeing some behaviour with bool types in petsc4py that I didn't >> expect. In the Python interface, returned Booleans have type '> 'int'>', where I expected them to have type ' '. Below >> program illustrates issue. Seems to be related to bint in cython. Am I >> doing something wrong? > > Wow! Yes, you are right, this seems like a regression in Cython. In > the past, a cast used to coerce values to the Python `bool` > type. Damn, these casts are everywhere, it might take a while to fix > all the code, unless I can find a hacky way to workaround the issue. > I've created an issue at https://bitbucket.org/petsc/petsc4py/issues/63/. >From some quick testing, Cython maps int correctly via the bint cast, but not PetscBool. 
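The symptom reported here, and the obvious workaround, can be illustrated without petsc4py or Cython at all. The getter below is a hypothetical stand-in for a petsc4py wrapper that returns a C-level PetscBool (a plain int, since PETSc targets C89) without coercion; wrapping the return value in `bool()` is exactly the coercion the Cython `bint` cast used to apply automatically.

```python
def get_flag_raw():
    # Hypothetical wrapper: mimics a binding that hands back the C int
    # representation of a PetscBool without any Python-level coercion.
    return 1

def get_flag_coerced():
    # Explicit coercion to the Python bool type, i.e. what the `bint`
    # cast did for free in earlier Cython versions.
    return bool(get_flag_raw())

print(type(get_flag_raw()))      # <class 'int'>
print(type(get_flag_coerced()))  # <class 'bool'>
```

Since Python's bool is a subclass of int, both values compare equal; only the reported type differs, which is why the regression is easy to miss.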
Garth > > -- > Lisandro Dalcin > ============ > Research Scientist > Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) > Extreme Computing Research Center (ECRC) > King Abdullah University of Science and Technology (KAUST) > http://ecrc.kaust.edu.sa/ > > 4700 King Abdullah University of Science and Technology > al-Khawarizmi Bldg (Bldg 1), Office # 0109 > Thuwal 23955-6900, Kingdom of Saudi Arabia > http://www.kaust.edu.sa > > Office Phone: +966 12 808-0459 From fmilicchio at me.com Wed Apr 26 07:40:19 2017 From: fmilicchio at me.com (Franco Milicchio) Date: Wed, 26 Apr 2017 14:40:19 +0200 Subject: [petsc-users] MPI and Matrix Sharing Message-ID: <14BFFA85-34ED-4917-AFE1-8F6FF82E4AB1@me.com> Dear all, I am currently implementing a FEM software that runs on single hosts, as the DOF number is not big enough to distribute it. However, I need to solve several right hand sides, and to speed the process up, I?d like to share the matrix between processes, so that I could reuse the factorization. My question is simple: is this a viable approach? Thanks, Franco /fm -- Franco Milicchio Department of Engineering University Roma Tre https://fmilicchio.bitbucket.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Apr 26 08:37:15 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 26 Apr 2017 08:37:15 -0500 Subject: [petsc-users] Using PETSc to read/manipulate mesh files In-Reply-To: References: Message-ID: On Wed, Apr 26, 2017 at 2:42 AM, Aroli Marcellinus < aroli.marcellinus at gmail.com> wrote: > Dear all, > > I have this kind of mesh file, and I want to use PETSc to include the > information of this file. > > > In short, this file contains of: > > <1> <2> <3> > > > > ... > ... > ... > > > <----showing element connectivity from node 1 2 3 and 4. > > Can I read this kind of data into PETSc? I am new in PETSc and want to use > it for this purpose. 
> > > Actually, this file contains a mesh of cardiac heart, and I want to > convert this data into a suitable mesh data structure, and I thought that > PETSc have a kind of mesh data structure. However, after surfing into a lot > of tutorial, I cannot find any references that suits my purpose. > You could write a PETSc program to read in the data, and call http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMPlexCreateFromCellList.html However, I would probably write a small Python program to convert this to Gmsh format or MED format, which PETSc can read natively, and so can many other programs. Thanks, Matt > > > ? > heart_het.dat > > ? > Aroli Marcellinus > > *Kumoh Institute of Technology**Computational Medicine Laboratory* > 61 Daehak-ro (Sinpyeong-dong), Gumi, Gyeongbuk > +82 10 9724 3957 <+82%2010-9724-3957> > KTalk ID: vondarkness > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Apr 26 09:02:00 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 26 Apr 2017 09:02:00 -0500 Subject: [petsc-users] MPI and Matrix Sharing In-Reply-To: <14BFFA85-34ED-4917-AFE1-8F6FF82E4AB1@me.com> References: <14BFFA85-34ED-4917-AFE1-8F6FF82E4AB1@me.com> Message-ID: On Wed, Apr 26, 2017 at 7:40 AM, Franco Milicchio wrote: > Dear all, > > I am currently implementing a FEM software that runs on single hosts, as > the DOF number is not big enough to distribute it. However, I need to solve > several right hand sides, and to speed the process up, I?d like to share > the matrix between processes, so that I could reuse the factorization. > > My question is simple: is this a viable approach? > Yes. This will happen automatically if you do not change the matrix in the KSP. 
Thanks, Matt > Thanks, > Franco > /fm > > -- > Franco Milicchio > > Department of Engineering > University Roma Tre > https://fmilicchio.bitbucket.io/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From fmilicchio at me.com Wed Apr 26 09:25:51 2017 From: fmilicchio at me.com (Franco Milicchio) Date: Wed, 26 Apr 2017 16:25:51 +0200 Subject: [petsc-users] MPI and Matrix Sharing In-Reply-To: References: <14BFFA85-34ED-4917-AFE1-8F6FF82E4AB1@me.com> Message-ID: > On Apr 26, 2017, at 4:02pm, Matthew Knepley wrote: > > On Wed, Apr 26, 2017 at 7:40 AM, Franco Milicchio > wrote: > Dear all, > > I am currently implementing a FEM software that runs on single hosts, as the DOF number is not big enough to distribute it. However, I need to solve several right hand sides, and to speed the process up, I?d like to share the matrix between processes, so that I could reuse the factorization. > > My question is simple: is this a viable approach? > > Yes. This will happen automatically if you do not change the matrix in the KSP. Thanks, Matthew, glad to hear that! Franco /fm -- Franco Milicchio Department of Engineering University Roma Tre https://fmilicchio.bitbucket.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hng.email at gmail.com Wed Apr 26 10:32:29 2017 From: hng.email at gmail.com (Hom Nath Gharti) Date: Wed, 26 Apr 2017 11:32:29 -0400 Subject: [petsc-users] Configure takes very long Message-ID: Dear all, With version > 3.7.4, I notice that the configure takes very long about 24 hours! 
Configure process hangs at the line: TESTING: configureMPIEXEC from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:143) Following is my configure command: ./configure -with-blas-lapack-dir=/opt/intel/compilers_and_libraries_2017.2.174/linux/mkl/lib/intel64 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-mpiexec=mpiexec --with-debugging=1 --download-scalapack --download-mumps --download-pastix --download-superlu --download-superlu_dist --download-metis --download-parmetis --download-ptscotch --download-hypre =============================================================================== Configuring PETSc to compile on your system =============================================================================== TESTING: configureMPIEXEC from config.packages.MPI(config/BuildSystem/config/packages/MPI.py:143) Am I doing something wrong? Thanks, Hom Nath From balay at mcs.anl.gov Wed Apr 26 10:38:06 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 26 Apr 2017 10:38:06 -0500 Subject: [petsc-users] Configure takes very long In-Reply-To: References: Message-ID: Perhaps mpiexec is hanging. What MPI are you using? Are you able to manually run jobs with mpiexec? Satish On Wed, 26 Apr 2017, Hom Nath Gharti wrote: > Dear all, > > With version > 3.7.4, I notice that the configure takes very long > about 24 hours! 
> > Configure process hangs at the line: > > TESTING: configureMPIEXEC from > config.packages.MPI(config/BuildSystem/config/packages/MPI.py:143) > > Following is my configure command: > > ./configure -with-blas-lapack-dir=/opt/intel/compilers_and_libraries_2017.2.174/linux/mkl/lib/intel64 > --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > --with-mpiexec=mpiexec --with-debugging=1 --download-scalapack > --download-mumps --download-pastix --download-superlu > --download-superlu_dist --download-metis --download-parmetis > --download-ptscotch --download-hypre > =============================================================================== > Configuring PETSc to compile on your system > =============================================================================== > TESTING: configureMPIEXEC from > config.packages.MPI(config/BuildSystem/config/packages/MPI.py:143) > > Am I doing something wrong? > > Thanks, > Hom Nath > From hng.email at gmail.com Wed Apr 26 11:17:15 2017 From: hng.email at gmail.com (Hom Nath Gharti) Date: Wed, 26 Apr 2017 12:17:15 -0400 Subject: [petsc-users] Configure takes very long In-Reply-To: References: Message-ID: Yes indeed! Thanks a lot, Satish! I am using intel MPI. Now I replace mipexec with srun, and it configures quickly. Hom Nath On Wed, Apr 26, 2017 at 11:38 AM, Satish Balay wrote: > Perhaps mpiexec is hanging. > > What MPI are you using? Are you able to manually run jobs with > mpiexec? > > Satish > > On Wed, 26 Apr 2017, Hom Nath Gharti wrote: > >> Dear all, >> >> With version > 3.7.4, I notice that the configure takes very long >> about 24 hours! 
>> >> Configure process hangs at the line: >> >> TESTING: configureMPIEXEC from >> config.packages.MPI(config/BuildSystem/config/packages/MPI.py:143) >> >> Following is my configure command: >> >> ./configure -with-blas-lapack-dir=/opt/intel/compilers_and_libraries_2017.2.174/linux/mkl/lib/intel64 >> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 >> --with-mpiexec=mpiexec --with-debugging=1 --download-scalapack >> --download-mumps --download-pastix --download-superlu >> --download-superlu_dist --download-metis --download-parmetis >> --download-ptscotch --download-hypre >> =============================================================================== >> Configuring PETSc to compile on your system >> =============================================================================== >> TESTING: configureMPIEXEC from >> config.packages.MPI(config/BuildSystem/config/packages/MPI.py:143) >> >> Am I doing something wrong? >> >> Thanks, >> Hom Nath >> > From balay at mcs.anl.gov Wed Apr 26 11:22:24 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Wed, 26 Apr 2017 11:22:24 -0500 Subject: [petsc-users] Configure takes very long In-Reply-To: References: Message-ID: Great! Glad it work. With Intel MPI - I normally use mpiexec.hydra. However you must be using a cluster - and presumably 'srun' is the way to scedule mpi jobs on it. Satish On Wed, 26 Apr 2017, Hom Nath Gharti wrote: > Yes indeed! Thanks a lot, Satish! I am using intel MPI. Now I replace > mipexec with srun, and it configures quickly. > > Hom Nath > > On Wed, Apr 26, 2017 at 11:38 AM, Satish Balay wrote: > > Perhaps mpiexec is hanging. > > > > What MPI are you using? Are you able to manually run jobs with > > mpiexec? > > > > Satish > > > > On Wed, 26 Apr 2017, Hom Nath Gharti wrote: > > > >> Dear all, > >> > >> With version > 3.7.4, I notice that the configure takes very long > >> about 24 hours! 
> >> > >> Configure process hangs at the line: > >> > >> TESTING: configureMPIEXEC from > >> config.packages.MPI(config/BuildSystem/config/packages/MPI.py:143) > >> > >> Following is my configure command: > >> > >> ./configure -with-blas-lapack-dir=/opt/intel/compilers_and_libraries_2017.2.174/linux/mkl/lib/intel64 > >> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 > >> --with-mpiexec=mpiexec --with-debugging=1 --download-scalapack > >> --download-mumps --download-pastix --download-superlu > >> --download-superlu_dist --download-metis --download-parmetis > >> --download-ptscotch --download-hypre > >> =============================================================================== > >> Configuring PETSc to compile on your system > >> =============================================================================== > >> TESTING: configureMPIEXEC from > >> config.packages.MPI(config/BuildSystem/config/packages/MPI.py:143) > >> > >> Am I doing something wrong? > >> > >> Thanks, > >> Hom Nath > >> > > > From hng.email at gmail.com Wed Apr 26 11:27:06 2017 From: hng.email at gmail.com (Hom Nath Gharti) Date: Wed, 26 Apr 2017 12:27:06 -0400 Subject: [petsc-users] Configure takes very long In-Reply-To: References: Message-ID: Yes, I am compiling on a cluster. Thanks for the advice! On Wed, Apr 26, 2017 at 12:22 PM, Satish Balay wrote: > Great! Glad it worked. > > With Intel MPI - I normally use mpiexec.hydra. However you must be > using a cluster - and presumably 'srun' is the way to schedule MPI jobs > on it. > > Satish > > On Wed, 26 Apr 2017, Hom Nath Gharti wrote: > >> Yes indeed! Thanks a lot, Satish! I am using Intel MPI. I replaced >> mpiexec with srun, and now it configures quickly. >> >> Hom Nath >> >> On Wed, Apr 26, 2017 at 11:38 AM, Satish Balay wrote: >> > Perhaps mpiexec is hanging. >> > >> > What MPI are you using? Are you able to manually run jobs with >> > mpiexec?
>> > >> > Satish >> > >> > On Wed, 26 Apr 2017, Hom Nath Gharti wrote: >> > >> >> Dear all, >> >> >> >> With version > 3.7.4, I notice that the configure takes very long >> >> about 24 hours! >> >> >> >> Configure process hangs at the line: >> >> >> >> TESTING: configureMPIEXEC from >> >> config.packages.MPI(config/BuildSystem/config/packages/MPI.py:143) >> >> >> >> Following is my configure command: >> >> >> >> ./configure -with-blas-lapack-dir=/opt/intel/compilers_and_libraries_2017.2.174/linux/mkl/lib/intel64 >> >> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 >> >> --with-mpiexec=mpiexec --with-debugging=1 --download-scalapack >> >> --download-mumps --download-pastix --download-superlu >> >> --download-superlu_dist --download-metis --download-parmetis >> >> --download-ptscotch --download-hypre >> >> =============================================================================== >> >> Configuring PETSc to compile on your system >> >> =============================================================================== >> >> TESTING: configureMPIEXEC from >> >> config.packages.MPI(config/BuildSystem/config/packages/MPI.py:143) >> >> >> >> Am I doing something wrong? >> >> >> >> Thanks, >> >> Hom Nath >> >> >> > >> > From driver.dan12 at yahoo.com Wed Apr 26 12:10:19 2017 From: driver.dan12 at yahoo.com (D D) Date: Wed, 26 Apr 2017 17:10:19 +0000 (UTC) Subject: [petsc-users] Mat_AllocAIJ_CSR in petsc4py References: <2000685901.11593608.1493226619608.ref@mail.yahoo.com> Message-ID: <2000685901.11593608.1493226619608@mail.yahoo.com> Why does MatAlloc_AIJ_CSR make two preallocation calls? CHKERR( MatSeqAIJSetPreallocationCSR(A, i, j, v) ) CHKERR( MatMPIAIJSetPreallocationCSR(A, i, j, v) ) I am assuming memory is preallocated twice, once for sequential and another for MPI. So each matrix created with createAIJ will have a sequential and MPI structure. Is this for convenience when switching from sequential to MPI?
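[Editor's note: the dispatch idea at play here can be sketched in plain Python. This is a hypothetical illustration, not petsc4py code; the class and function names are stand-ins. The point it demonstrates is that a preallocation routine keyed to one matrix type silently no-ops when handed a matrix of the other type, so calling both routines allocates only once.]

```python
# Hypothetical sketch of type-dispatched preallocation (NOT the real
# petsc4py API). Each setter acts only on a matrix of its own type and
# silently ignores anything else, so calling both is harmless.

class SeqAIJ:
    """Stand-in for a sequential AIJ matrix."""
    def __init__(self):
        self.prealloc = None

class MPIAIJ:
    """Stand-in for a parallel (MPI) AIJ matrix."""
    def __init__(self):
        self.prealloc = None

def seq_set_preallocation_csr(mat, i, j, v):
    if not isinstance(mat, SeqAIJ):
        return False                     # wrong type: silent no-op
    mat.prealloc = ("seq", i, j, v)      # only now is storage "reserved"
    return True

def mpi_set_preallocation_csr(mat, i, j, v):
    if not isinstance(mat, MPIAIJ):
        return False                     # wrong type: silent no-op
    mat.prealloc = ("mpi", i, j, v)
    return True

def set_preallocation_csr(mat, i, j, v):
    # Call both setters, as the wrapper does; exactly one takes effect.
    return (seq_set_preallocation_csr(mat, i, j, v),
            mpi_set_preallocation_csr(mat, i, j, v))

i, j, v = [0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0]  # tiny CSR triple
print(set_preallocation_csr(SeqAIJ(), i, j, v))  # (True, False)
print(set_preallocation_csr(MPIAIJ(), i, j, v))  # (False, True)
```

Exactly one element of each returned pair is True: memory is never reserved twice.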
Thanks, Dale -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Apr 26 12:28:36 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 26 Apr 2017 12:28:36 -0500 Subject: [petsc-users] Mat_AllocAIJ_CSR in petsc4py In-Reply-To: <2000685901.11593608.1493226619608@mail.yahoo.com> References: <2000685901.11593608.1493226619608.ref@mail.yahoo.com> <2000685901.11593608.1493226619608@mail.yahoo.com> Message-ID: On Wed, Apr 26, 2017 at 12:10 PM, D D wrote: > Why does MatAlloc_AIJ_CSR make two preallocation calls? > > CHKERR( MatSeqAIJSetPreallocationCSR(A, i, j, v) ) > CHKERR( MatMPIAIJSetPreallocationCSR(A, i, j, v) ) > > I am assuming memory is preallocated twice, once for sequential and > another for MPI. So each matrix created with createAIJ will have a > sequential and MPI structure. > > Is this for convenience when switching from sequential to MPI? > Only one function will actually take effect, depending on the type of matrix A. This is like Objective-C rather than C++. Thanks, Matt > Thanks, > Dale > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From driver.dan12 at yahoo.com Wed Apr 26 12:35:45 2017 From: driver.dan12 at yahoo.com (D D) Date: Wed, 26 Apr 2017 17:35:45 +0000 (UTC) Subject: [petsc-users] Mat_AllocAIJ_CSR in petsc4py In-Reply-To: References: <2000685901.11593608.1493226619608.ref@mail.yahoo.com> <2000685901.11593608.1493226619608@mail.yahoo.com> Message-ID: <512053677.4715071.1493228145318@mail.yahoo.com> Thanks. On Wednesday, April 26, 2017 1:28 PM, Matthew Knepley wrote: On Wed, Apr 26, 2017 at 12:10 PM, D D wrote: Why does MatAlloc_AIJ_CSR make two preallocation calls? CHKERR( MatSeqAIJSetPreallocationCSR( A, i, j, v) )
CHKERR( MatMPIAIJSetPreallocationCSR( A, i, j, v) ) I am assuming memory is preallocated twice, once for sequential and another for MPI. So each matrix created with createAIJ will have a sequential and MPI structure. Is this for convenience when switching from sequential to MPI? Only one function will actually take effect, depending on the type of matrix A. This is like Objective-C rather than C++. Thanks, Matt Thanks, Dale -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From elbueler at alaska.edu Wed Apr 26 15:31:25 2017 From: elbueler at alaska.edu (Ed Bueler) Date: Wed, 26 Apr 2017 12:31:25 -0800 Subject: [petsc-users] nondeterministic behavior with ./program -help | head Message-ID: Dear Petsc -- Copied at the bottom is the behavior I get for ex5.c in snes examples/tutorials. When I do $./ex5 -help |head I get a "Caught signal number 13 Broken Pipe" or just a version string. Randomly, one or the other. I know this has come up on petsc-users before: http://lists.mcs.anl.gov/pipermail/petsc-users/2014-May/021848.html The claim was that the SIGPIPE error is standard behavior. My real issue (feature request?) is that when writing a petsc application I want a rational help system. Suppose "program.c" has options with prefix "prg_". The user doesn't know the prefix yet (I could have used "prog_" as the prefix ...) The petsc-canonical help system is (I believe) ./program -help |grep prg_ for program.c-specific options. Works great if you know to look for "prg_". My plan for my applications is to put the prefix in the first line or two of my help string; the user can discover the prefix by ./program -help |head and then use "-help |grep -prg_" etc.
So my question is: Can petsc be set up to not generate an error message when the help system is used in a correct way? (I.e. catching return codes when writing, or handling SIGPIPE differently?) Or, on the other hand, can someone suggest a better help system that irritates less? Thanks! Ed ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head Bratu nonlinear PDE in 2d. We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular domain, using distributed arrays (DMDAs) to partition the parallel grid. The command line options include: -par <parameter>, where <parameter> indicates the problem's nonlinearity problem SFI: <parameter> = Bratu parameter (0 <= par <= 6.81) -m_par/n_par <parameter>, where <parameter> indicates an integer that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 [0]PETSC ERROR: ------------------------------------------------------------------------ [0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors [0]PETSC ERROR: likely location of problem given in stack below [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, [0]PETSC ERROR: INSTEAD the line number of the start of the function [0]PETSC ERROR: is given.
[0]PETSC ERROR: [0] PetscVFPrintfDefault line 240 /home/ed/petsc/src/sys/fileio/mprint.c [0]PETSC ERROR: [0] PetscHelpPrintfDefault line 622 /home/ed/petsc/src/sys/fileio/mprint.c [0]PETSC ERROR: [0] PetscOptionsBegin_Private line 29 /home/ed/petsc/src/sys/objects/aoptions.c [0]PETSC ERROR: [0] PetscObjectOptionsBegin_Private line 61 /home/ed/petsc/src/sys/objects/aoptions.c [0]PETSC ERROR: [0] PetscDSSetFromOptions line 218 /home/ed/petsc/src/dm/dt/interface/dtds.c [0]PETSC ERROR: [0] DMSetFromOptions line 747 /home/ed/petsc/src/dm/interface/dm.c [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: Signal received [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Wed Apr 26 11:22:07 2017 [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 [0]PETSC ERROR: #1 User provided function() line 0 in unknown file application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 [unset]: aborting job: application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head Bratu nonlinear PDE in 2d. We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular domain, using distributed arrays (DMDAs) to partition the parallel grid. 
The command line options include: -par , where indicates the problem's nonlinearity problem SFI: = Bratu parameter (0 <= par <= 6.81) -m_par/n_par , where indicates an integer that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 ~/petsc/src/snes/examples/tutorials[master*]$ -- Ed Bueler Dept of Math and Stat and Geophysical Institute University of Alaska Fairbanks Fairbanks, AK 99775-6660 301C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Apr 26 15:55:53 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 26 Apr 2017 15:55:53 -0500 Subject: [petsc-users] nondeterministic behavior with ./program -help | head In-Reply-To: References: Message-ID: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> Is ./ex5 -help -no_signal_handler | head good enough? Normally we won't want to turn off catching of PIPE errors since it might hide other errors. Barry > On Apr 26, 2017, at 3:31 PM, Ed Bueler wrote: > > Dear Petsc -- > > Copied at the bottom is the behavior I get for ex5.c in snes examples/tutorials. When I do > > $./ex5 -help |head > > I get a "Caught signal number 13 Broken Pipe" or just a version string. Randomly, one or the other. > > I know this has come up on petsc-users before: > http://lists.mcs.anl.gov/pipermail/petsc-users/2014-May/021848.html > The claim was that the SIGPIPE error is standard behavior. > > My real issue (feature request?) is that when writing a petsc application I want a rational help system. Suppose "program.c" has options with prefix "prg_". The user doesn't know the prefix yet (I could have used "prog_" as the prefix ...) The petsc-canonical help system is (I believe) > > ./program -help |grep prg_ > > for program.c-specific options. Works great if you know to look for "prg_". 
My plan for my applications is to put the prefix in the first line or two of my help string, the user can discover the prefix by > > ./program -help |head > > and then use "-help |grep -prg_" and etc. > > So my question is: Can petsc be set up to not generate an error message when the help system is used in a correct way? (I.e. catching return codes when writing, or handling SIGPIPE differently?) Or, on the other hand, can someone suggest a better help system that irritates less? > > Thanks! > > Ed > > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head > Bratu nonlinear PDE in 2d. > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular > domain, using distributed arrays (DMDAs) to partition the parallel grid. > The command line options include: > -par , where indicates the problem's nonlinearity > problem SFI: = Bratu parameter (0 <= par <= 6.81) > > -m_par/n_par , where indicates an integer > that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 > [0]PETSC ERROR: ------------------------------------------------------------------------ > [0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > [0]PETSC ERROR: likely location of problem given in stack below > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > [0]PETSC ERROR: INSTEAD the line number of the start of the function > [0]PETSC ERROR: is given. 
> [0]PETSC ERROR: [0] PetscVFPrintfDefault line 240 /home/ed/petsc/src/sys/fileio/mprint.c > [0]PETSC ERROR: [0] PetscHelpPrintfDefault line 622 /home/ed/petsc/src/sys/fileio/mprint.c > [0]PETSC ERROR: [0] PetscOptionsBegin_Private line 29 /home/ed/petsc/src/sys/objects/aoptions.c > [0]PETSC ERROR: [0] PetscObjectOptionsBegin_Private line 61 /home/ed/petsc/src/sys/objects/aoptions.c > [0]PETSC ERROR: [0] PetscDSSetFromOptions line 218 /home/ed/petsc/src/dm/dt/interface/dtds.c > [0]PETSC ERROR: [0] DMSetFromOptions line 747 /home/ed/petsc/src/dm/interface/dm.c > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: Signal received > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 > [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Wed Apr 26 11:22:07 2017 > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > [unset]: aborting job: > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head > Bratu nonlinear PDE in 2d. > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular > domain, using distributed arrays (DMDAs) to partition the parallel grid. 
> The command line options include: > -par <parameter>, where <parameter> indicates the problem's nonlinearity > problem SFI: <parameter> = Bratu parameter (0 <= par <= 6.81) > > -m_par/n_par <parameter>, where <parameter> indicates an integer > that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 > ~/petsc/src/snes/examples/tutorials[master*]$ > > > -- > Ed Bueler > Dept of Math and Stat and Geophysical Institute > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 301C Chapman From elbueler at alaska.edu Wed Apr 26 16:30:47 2017 From: elbueler at alaska.edu (Ed Bueler) Date: Wed, 26 Apr 2017 13:30:47 -0800 Subject: [petsc-users] nondeterministic behavior with ./program -help | head In-Reply-To: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> References: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> Message-ID: > Is ./ex5 -help -no_signal_handler | head good enough? Umm. It eliminates the nondeterminism. Should discovery of options for petsc user applications go via error messages? (Or asking users to remember "-no_signal_handler" to avoid?) I don't understand signal handling well enough to recommend a better way ... maybe there is no better way for |head. The best alternative I can think of is if my codes supply the program name for the "const char man[]" argument of PetscOptionsXXX(). Then I can suggest this to users: ./program -help |grep program Is this a misuse of the "man" argument? Ed PS Amused to find "This routine should not be used from within a signal handler." in the man page for MPI_Abort(), which I reached by clicking on the call to MPI_Abort() in the view of PetscSignalHandlerDefault(). ;-) On Wed, Apr 26, 2017 at 12:55 PM, Barry Smith wrote: > > Is > > ./ex5 -help -no_signal_handler | head > > good enough? > > Normally we won't want to turn off catching of PIPE errors since it might > hide other errors.
> > Barry > > > > > On Apr 26, 2017, at 3:31 PM, Ed Bueler wrote: > > > > Dear Petsc -- > > > > Copied at the bottom is the behavior I get for ex5.c in snes > examples/tutorials. When I do > > > > $./ex5 -help |head > > > > I get a "Caught signal number 13 Broken Pipe" or just a version string. > Randomly, one or the other. > > > > I know this has come up on petsc-users before: > > http://lists.mcs.anl.gov/pipermail/petsc-users/2014-May/021848.html > > The claim was that the SIGPIPE error is standard behavior. > > > > My real issue (feature request?) is that when writing a petsc > application I want a rational help system. Suppose "program.c" has options > with prefix "prg_". The user doesn't know the prefix yet (I could have > used "prog_" as the prefix ...) The petsc-canonical help system is (I > believe) > > > > ./program -help |grep prg_ > > > > for program.c-specific options. Works great if you know to look for > "prg_". My plan for my applications is to put the prefix in the first line > or two of my help string, the user can discover the prefix by > > > > ./program -help |head > > > > and then use "-help |grep -prg_" and etc. > > > > So my question is: Can petsc be set up to not generate an error message > when the help system is used in a correct way? (I.e. catching return codes > when writing, or handling SIGPIPE differently?) Or, on the other hand, can > someone suggest a better help system that irritates less? > > > > Thanks! > > > > Ed > > > > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head > > Bratu nonlinear PDE in 2d. > > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D > rectangular > > domain, using distributed arrays (DMDAs) to partition the parallel grid. 
> > The command line options include: > > -par , where indicates the problem's > nonlinearity > > problem SFI: = Bratu parameter (0 <= par <= 6.81) > > > > -m_par/n_par , where indicates an integer > > that MMS3 will be evaluated with 2^m_par, > 2^n_par----------------------------------------------------- > --------------------- > > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: > 2017-04-26 13:00:15 -0500 > > [0]PETSC ERROR: ------------------------------ > ------------------------------------------ > > [0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while > reading or writing to a socket > > [0]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/ > documentation/faq.html#valgrind > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > > [0]PETSC ERROR: likely location of problem given in stack below > > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > available, > > [0]PETSC ERROR: INSTEAD the line number of the start of the > function > > [0]PETSC ERROR: is given. 
> > [0]PETSC ERROR: [0] PetscVFPrintfDefault line 240 > /home/ed/petsc/src/sys/fileio/mprint.c > > [0]PETSC ERROR: [0] PetscHelpPrintfDefault line 622 > /home/ed/petsc/src/sys/fileio/mprint.c > > [0]PETSC ERROR: [0] PetscOptionsBegin_Private line 29 > /home/ed/petsc/src/sys/objects/aoptions.c > > [0]PETSC ERROR: [0] PetscObjectOptionsBegin_Private line 61 > /home/ed/petsc/src/sys/objects/aoptions.c > > [0]PETSC ERROR: [0] PetscDSSetFromOptions line 218 > /home/ed/petsc/src/dm/dt/interface/dtds.c > > [0]PETSC ERROR: [0] DMSetFromOptions line 747 /home/ed/petsc/src/dm/ > interface/dm.c > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > [0]PETSC ERROR: Signal received > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html > for trouble shooting. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3453-ge45481d470 > GIT Date: 2017-04-26 13:00:15 -0500 > > [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Wed > Apr 26 11:22:07 2017 > > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > [unset]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head > > Bratu nonlinear PDE in 2d. > > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D > rectangular > > domain, using distributed arrays (DMDAs) to partition the parallel grid. 
> > The command line options include: > > -par <parameter>, where <parameter> indicates the problem's > nonlinearity > > problem SFI: <parameter> = Bratu parameter (0 <= par <= 6.81) > > > > -m_par/n_par <parameter>, where <parameter> indicates an integer > > that MMS3 will be evaluated with 2^m_par, > 2^n_par----------------------------------------------------- > --------------------- > > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: > 2017-04-26 13:00:15 -0500 > > ~/petsc/src/snes/examples/tutorials[master*]$ > > > > > > -- > > Ed Bueler > > Dept of Math and Stat and Geophysical Institute > > University of Alaska Fairbanks > > Fairbanks, AK 99775-6660 > > 301C Chapman > > -- Ed Bueler Dept of Math and Stat and Geophysical Institute University of Alaska Fairbanks Fairbanks, AK 99775-6660 301C Chapman -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Wed Apr 26 16:40:21 2017 From: knepley at gmail.com (Matthew Knepley) Date: Wed, 26 Apr 2017 16:40:21 -0500 Subject: [petsc-users] nondeterministic behavior with ./program -help | head In-Reply-To: References: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> Message-ID: On Wed, Apr 26, 2017 at 4:30 PM, Ed Bueler wrote: > > Is ./ex5 -help -no_signal_handler | head good enough? > > Umm. It eliminates the nondeterminism. > I do not think this is non-determinism. When head ends it generates a SIGPIPE (by closing the pipe) to notify the application behind the pipe that it's done. This is deterministic. You want PETSc to ignore SIGPIPE, which is bad in general. Thus we want you to use an option to say so. You could always use something other than 'head' which eats the rest of the pipe input. Matt > Should discovery of options for petsc user applications go via error > messages? (Or asking users to remember "-no_signal_handler" to avoid?) I > don't understand signal handling well enough to recommend a better way ... > maybe there is no better way for |head.
> > The best alternative I can think of is if my codes supply the program name > for the "const char man[]" argument of PetscOptionsXXX(). Then I can > suggest this to users: > > ./program -help |grep program > > Is this a mis-use of the "man" argument? > > Ed > > PS Amused to find "This routine should not be used from within a signal > handler." in the man page for MPI_Abort(), which I reached by clicking on > the call to MPI_Abort() in the view of PetscSignalHandlerDefault(). ;-) > > > On Wed, Apr 26, 2017 at 12:55 PM, Barry Smith wrote: > >> >> Is >> >> ./ex5 -help -no_signal_handler | head >> >> good enough? >> >> Normally we won't want to turn off catching of PIPE errors since it >> might hide other errors. >> >> Barry >> >> >> >> > On Apr 26, 2017, at 3:31 PM, Ed Bueler wrote: >> > >> > Dear Petsc -- >> > >> > Copied at the bottom is the behavior I get for ex5.c in snes >> examples/tutorials. When I do >> > >> > $./ex5 -help |head >> > >> > I get a "Caught signal number 13 Broken Pipe" or just a version >> string. Randomly, one or the other. >> > >> > I know this has come up on petsc-users before: >> > http://lists.mcs.anl.gov/pipermail/petsc-users/2014-May/021848.html >> > The claim was that the SIGPIPE error is standard behavior. >> > >> > My real issue (feature request?) is that when writing a petsc >> application I want a rational help system. Suppose "program.c" has options >> with prefix "prg_". The user doesn't know the prefix yet (I could have >> used "prog_" as the prefix ...) The petsc-canonical help system is (I >> believe) >> > >> > ./program -help |grep prg_ >> > >> > for program.c-specific options. Works great if you know to look for >> "prg_". My plan for my applications is to put the prefix in the first line >> or two of my help string, the user can discover the prefix by >> > >> > ./program -help |head >> > >> > and then use "-help |grep -prg_" and etc. 
>> > >> > So my question is: Can petsc be set up to not generate an error >> message when the help system is used in a correct way? (I.e. catching >> return codes when writing, or handling SIGPIPE differently?) Or, on the >> other hand, can someone suggest a better help system that irritates less? >> > >> > Thanks! >> > >> > Ed >> > >> > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head >> > Bratu nonlinear PDE in 2d. >> > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D >> rectangular >> > domain, using distributed arrays (DMDAs) to partition the parallel grid. >> > The command line options include: >> > -par , where indicates the problem's >> nonlinearity >> > problem SFI: = Bratu parameter (0 <= par <= 6.81) >> > >> > -m_par/n_par , where indicates an integer >> > that MMS3 will be evaluated with 2^m_par, >> 2^n_par----------------------------------------------------- >> --------------------- >> > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: >> 2017-04-26 13:00:15 -0500 >> > [0]PETSC ERROR: ------------------------------ >> ------------------------------------------ >> > [0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while >> reading or writing to a socket >> > [0]PETSC ERROR: Try option -start_in_debugger or >> -on_error_attach_debugger >> > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/d >> ocumentation/faq.html#valgrind >> > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac >> OS X to find memory corruption errors >> > [0]PETSC ERROR: likely location of problem given in stack below >> > [0]PETSC ERROR: --------------------- Stack Frames >> ------------------------------------ >> > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not >> available, >> > [0]PETSC ERROR: INSTEAD the line number of the start of the >> function >> > [0]PETSC ERROR: is given. 
>> > [0]PETSC ERROR: [0] PetscVFPrintfDefault line 240 >> /home/ed/petsc/src/sys/fileio/mprint.c >> > [0]PETSC ERROR: [0] PetscHelpPrintfDefault line 622 >> /home/ed/petsc/src/sys/fileio/mprint.c >> > [0]PETSC ERROR: [0] PetscOptionsBegin_Private line 29 >> /home/ed/petsc/src/sys/objects/aoptions.c >> > [0]PETSC ERROR: [0] PetscObjectOptionsBegin_Private line 61 >> /home/ed/petsc/src/sys/objects/aoptions.c >> > [0]PETSC ERROR: [0] PetscDSSetFromOptions line 218 >> /home/ed/petsc/src/dm/dt/interface/dtds.c >> > [0]PETSC ERROR: [0] DMSetFromOptions line 747 >> /home/ed/petsc/src/dm/interface/dm.c >> > [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> > [0]PETSC ERROR: Signal received >> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >> for trouble shooting. >> > [0]PETSC ERROR: Petsc Development GIT revision: >> v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 >> > [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Wed >> Apr 26 11:22:07 2017 >> > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 >> > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file >> > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >> > [unset]: aborting job: >> > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >> > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head >> > Bratu nonlinear PDE in 2d. >> > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D >> rectangular >> > domain, using distributed arrays (DMDAs) to partition the parallel grid. 
>> > The command line options include: >> > -par , where indicates the problem's >> nonlinearity >> > problem SFI: = Bratu parameter (0 <= par <= 6.81) >> > >> > -m_par/n_par , where indicates an integer >> > that MMS3 will be evaluated with 2^m_par, >> 2^n_par----------------------------------------------------- >> --------------------- >> > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: >> 2017-04-26 13:00:15 -0500 >> > ~/petsc/src/snes/examples/tutorials[master*]$ >> > >> > >> > -- >> > Ed Bueler >> > Dept of Math and Stat and Geophysical Institute >> > University of Alaska Fairbanks >> > Fairbanks, AK 99775-6660 >> > 301C Chapman >> >> > > > -- > Ed Bueler > Dept of Math and Stat and Geophysical Institute > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 301C Chapman > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsmith at mcs.anl.gov Wed Apr 26 17:17:09 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 26 Apr 2017 17:17:09 -0500 Subject: [petsc-users] nondeterministic behavior with ./program -help | head In-Reply-To: References: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> Message-ID: Another "solution" :-) ./ex5 -help | (head ; cat > /dev/null) > On Apr 26, 2017, at 4:30 PM, Ed Bueler wrote: > > > Is ./ex5 -help -no_signal_handler | head good enough? > > Umm. It eliminates the nondeterminism. Should discovery of options for petsc user applications go via error messages? (Or asking users to remember "-no_signal_handler" to avoid?) I don't understand signal handling well-enough to recommend a better way ... maybe there is no better way for |head. > > The best alternative I can think of is if my codes supply the program name for the "const char man[]" argument of PetscOptionsXXX(). 
Then I can suggest this to users: > > ./program -help |grep program > > Is this a mis-use of the "man" argument? > > Ed > > PS Amused to find "This routine should not be used from within a signal handler." in the man page for MPI_Abort(), which I reached by clicking on the call to MPI_Abort() in the view of PetscSignalHandlerDefault(). ;-) > > > On Wed, Apr 26, 2017 at 12:55 PM, Barry Smith wrote: > > Is > > ./ex5 -help -no_signal_handler | head > > good enough? > > Normally we won't want to turn off catching of PIPE errors since it might hide other errors. > > Barry > > > > > On Apr 26, 2017, at 3:31 PM, Ed Bueler wrote: > > > > Dear Petsc -- > > > > Copied at the bottom is the behavior I get for ex5.c in snes examples/tutorials. When I do > > > > $./ex5 -help |head > > > > I get a "Caught signal number 13 Broken Pipe" or just a version string. Randomly, one or the other. > > > > I know this has come up on petsc-users before: > > http://lists.mcs.anl.gov/pipermail/petsc-users/2014-May/021848.html > > The claim was that the SIGPIPE error is standard behavior. > > > > My real issue (feature request?) is that when writing a petsc application I want a rational help system. Suppose "program.c" has options with prefix "prg_". The user doesn't know the prefix yet (I could have used "prog_" as the prefix ...) The petsc-canonical help system is (I believe) > > > > ./program -help |grep prg_ > > > > for program.c-specific options. Works great if you know to look for "prg_". My plan for my applications is to put the prefix in the first line or two of my help string, the user can discover the prefix by > > > > ./program -help |head > > > > and then use "-help |grep -prg_" and etc. > > > > So my question is: Can petsc be set up to not generate an error message when the help system is used in a correct way? (I.e. catching return codes when writing, or handling SIGPIPE differently?) Or, on the other hand, can someone suggest a better help system that irritates less? 
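The behavior Ed describes, and Barry's `(head ; cat > /dev/null)` workaround, can be reproduced without PETSc at all. A minimal bash sketch, with `seq` standing in for `./ex5 -help` (bash's `PIPESTATUS` array reports the writer's exit status):

```shell
# head exits after one line and closes its end of the pipe; the producer,
# still writing, receives SIGPIPE (signal 13, the "Broken Pipe" above) and
# dies with status 128 + 13 = 141.
seq 1 200000 | head -n 1 > /dev/null
echo "writer status: ${PIPESTATUS[0]}"   # 141

# The workaround keeps a reader on the pipe until the producer finishes, so
# the producer never writes to a closed pipe and exits normally.
seq 1 200000 | { head -n 1; cat > /dev/null; } > /dev/null
echo "writer status: ${PIPESTATUS[0]}"   # 0
```

The same mechanism explains the nondeterminism: whether the PETSc signal handler fires depends on whether `head` has already exited by the time the remaining help text is written.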
> > > > Thanks! > > > > Ed > > > > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head > > Bratu nonlinear PDE in 2d. > > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular > > domain, using distributed arrays (DMDAs) to partition the parallel grid. > > The command line options include: > > -par , where indicates the problem's nonlinearity > > problem SFI: = Bratu parameter (0 <= par <= 6.81) > > > > -m_par/n_par , where indicates an integer > > that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- > > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 > > [0]PETSC ERROR: ------------------------------------------------------------------------ > > [0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > > [0]PETSC ERROR: likely location of problem given in stack below > > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > > [0]PETSC ERROR: INSTEAD the line number of the start of the function > > [0]PETSC ERROR: is given. 
> > [0]PETSC ERROR: [0] PetscVFPrintfDefault line 240 /home/ed/petsc/src/sys/fileio/mprint.c > > [0]PETSC ERROR: [0] PetscHelpPrintfDefault line 622 /home/ed/petsc/src/sys/fileio/mprint.c > > [0]PETSC ERROR: [0] PetscOptionsBegin_Private line 29 /home/ed/petsc/src/sys/objects/aoptions.c > > [0]PETSC ERROR: [0] PetscObjectOptionsBegin_Private line 61 /home/ed/petsc/src/sys/objects/aoptions.c > > [0]PETSC ERROR: [0] PetscDSSetFromOptions line 218 /home/ed/petsc/src/dm/dt/interface/dtds.c > > [0]PETSC ERROR: [0] DMSetFromOptions line 747 /home/ed/petsc/src/dm/interface/dm.c > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [0]PETSC ERROR: Signal received > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 > > [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Wed Apr 26 11:22:07 2017 > > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > [unset]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head > > Bratu nonlinear PDE in 2d. > > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular > > domain, using distributed arrays (DMDAs) to partition the parallel grid. 
> > The command line options include: > > -par , where indicates the problem's nonlinearity > > problem SFI: = Bratu parameter (0 <= par <= 6.81) > > > > -m_par/n_par , where indicates an integer > > that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- > > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 > > ~/petsc/src/snes/examples/tutorials[master*]$ > > > > > > -- > > Ed Bueler > > Dept of Math and Stat and Geophysical Institute > > University of Alaska Fairbanks > > Fairbanks, AK 99775-6660 > > 301C Chapman > > > > > -- > Ed Bueler > Dept of Math and Stat and Geophysical Institute > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 301C Chapman From bsmith at mcs.anl.gov Wed Apr 26 17:20:13 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 26 Apr 2017 17:20:13 -0500 Subject: [petsc-users] nondeterministic behavior with ./program -help | head In-Reply-To: References: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> Message-ID: <86332CA2-5ECA-41C2-8122-CDFA35E5EFE6@mcs.anl.gov> In The Hitchhiker's Guide to the Galaxy, Douglas Adams mentions an extremely dull planet, inhabited by a bunch of depressed humans and a certain breed of animals with sharp teeth which communicate with the humans by biting them very hard in the thighs. This is strikingly similar to UNIX, in which the kernel communicates with processes by sending paralyzing or deadly signals to them. Processes may intercept some of the signals, and try to adapt to the situation, but most of them don't. > On Apr 26, 2017, at 5:17 PM, Barry Smith wrote: > > > Another "solution" :-) > > ./ex5 -help | (head ; cat > /dev/null) > > >> On Apr 26, 2017, at 4:30 PM, Ed Bueler wrote: >> >>> Is ./ex5 -help -no_signal_handler | head good enough? >> >> Umm. It eliminates the nondeterminism. Should discovery of options for petsc user applications go via error messages? 
(Or asking users to remember "-no_signal_handler" to avoid?) I don't understand signal handling well-enough to recommend a better way ... maybe there is no better way for |head. >> >> The best alternative I can think of is if my codes supply the program name for the "const char man[]" argument of PetscOptionsXXX(). Then I can suggest this to users: >> >> ./program -help |grep program >> >> Is this a mis-use of the "man" argument? >> >> Ed >> >> PS Amused to find "This routine should not be used from within a signal handler." in the man page for MPI_Abort(), which I reached by clicking on the call to MPI_Abort() in the view of PetscSignalHandlerDefault(). ;-) >> >> >> On Wed, Apr 26, 2017 at 12:55 PM, Barry Smith wrote: >> >> Is >> >> ./ex5 -help -no_signal_handler | head >> >> good enough? >> >> Normally we won't want to turn off catching of PIPE errors since it might hide other errors. >> >> Barry >> >> >> >>> On Apr 26, 2017, at 3:31 PM, Ed Bueler wrote: >>> >>> Dear Petsc -- >>> >>> Copied at the bottom is the behavior I get for ex5.c in snes examples/tutorials. When I do >>> >>> $./ex5 -help |head >>> >>> I get a "Caught signal number 13 Broken Pipe" or just a version string. Randomly, one or the other. >>> >>> I know this has come up on petsc-users before: >>> http://lists.mcs.anl.gov/pipermail/petsc-users/2014-May/021848.html >>> The claim was that the SIGPIPE error is standard behavior. >>> >>> My real issue (feature request?) is that when writing a petsc application I want a rational help system. Suppose "program.c" has options with prefix "prg_". The user doesn't know the prefix yet (I could have used "prog_" as the prefix ...) The petsc-canonical help system is (I believe) >>> >>> ./program -help |grep prg_ >>> >>> for program.c-specific options. Works great if you know to look for "prg_". 
My plan for my applications is to put the prefix in the first line or two of my help string, the user can discover the prefix by >>> >>> ./program -help |head >>> >>> and then use "-help |grep -prg_" and etc. >>> >>> So my question is: Can petsc be set up to not generate an error message when the help system is used in a correct way? (I.e. catching return codes when writing, or handling SIGPIPE differently?) Or, on the other hand, can someone suggest a better help system that irritates less? >>> >>> Thanks! >>> >>> Ed >>> >>> ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head >>> Bratu nonlinear PDE in 2d. >>> We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular >>> domain, using distributed arrays (DMDAs) to partition the parallel grid. >>> The command line options include: >>> -par , where indicates the problem's nonlinearity >>> problem SFI: = Bratu parameter (0 <= par <= 6.81) >>> >>> -m_par/n_par , where indicates an integer >>> that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- >>> Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 >>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>> [0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket >>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >>> [0]PETSC ERROR: likely location of problem given in stack below >>> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >>> [0]PETSC ERROR: INSTEAD the line number of the start of the function >>> 
[0]PETSC ERROR: is given. >>> [0]PETSC ERROR: [0] PetscVFPrintfDefault line 240 /home/ed/petsc/src/sys/fileio/mprint.c >>> [0]PETSC ERROR: [0] PetscHelpPrintfDefault line 622 /home/ed/petsc/src/sys/fileio/mprint.c >>> [0]PETSC ERROR: [0] PetscOptionsBegin_Private line 29 /home/ed/petsc/src/sys/objects/aoptions.c >>> [0]PETSC ERROR: [0] PetscObjectOptionsBegin_Private line 61 /home/ed/petsc/src/sys/objects/aoptions.c >>> [0]PETSC ERROR: [0] PetscDSSetFromOptions line 218 /home/ed/petsc/src/dm/dt/interface/dtds.c >>> [0]PETSC ERROR: [0] DMSetFromOptions line 747 /home/ed/petsc/src/dm/interface/dm.c >>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>> [0]PETSC ERROR: Signal received >>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 >>> [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Wed Apr 26 11:22:07 2017 >>> [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 >>> [0]PETSC ERROR: #1 User provided function() line 0 in unknown file >>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>> [unset]: aborting job: >>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>> ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head >>> Bratu nonlinear PDE in 2d. >>> We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular >>> domain, using distributed arrays (DMDAs) to partition the parallel grid. 
>>> The command line options include: >>> -par , where indicates the problem's nonlinearity >>> problem SFI: = Bratu parameter (0 <= par <= 6.81) >>> >>> -m_par/n_par , where indicates an integer >>> that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- >>> Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 >>> ~/petsc/src/snes/examples/tutorials[master*]$ >>> >>> >>> -- >>> Ed Bueler >>> Dept of Math and Stat and Geophysical Institute >>> University of Alaska Fairbanks >>> Fairbanks, AK 99775-6660 >>> 301C Chapman >> >> >> >> >> -- >> Ed Bueler >> Dept of Math and Stat and Geophysical Institute >> University of Alaska Fairbanks >> Fairbanks, AK 99775-6660 >> 301C Chapman > From gnw20 at cam.ac.uk Wed Apr 26 17:25:42 2017 From: gnw20 at cam.ac.uk (Garth N. Wells) Date: Wed, 26 Apr 2017 23:25:42 +0100 Subject: [petsc-users] Multigrid coarse grid solver Message-ID: I'm a bit confused by the selection of the coarse grid solver for multigrid. For the demo ksp/ex56, if I do: mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu I see Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 1 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 1 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: nd factor fill ratio given 5., needed 1. 
Factored matrix follows: Mat Object: 1 MPI processes type: seqaij rows=6, cols=6, bs=6 package used to perform factorization: petsc total: nonzeros=36, allocated nonzeros=36 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 2 nodes, limit used is 5 linear system matrix = precond matrix: Mat Object: 1 MPI processes type: seqaij rows=6, cols=6, bs=6 total: nonzeros=36, allocated nonzeros=36 total number of mallocs used during MatSetValues calls =0 using I-node routines: found 2 nodes, limit used is 5 which is what I expect. Increasing from 1 to 2 processes: mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu I see Coarse grid solver -- level ------------------------------- KSP Object: (mg_coarse_) 2 MPI processes type: preonly maximum iterations=10000, initial guess is zero tolerances: relative=1e-05, absolute=1e-50, divergence=10000. left preconditioning using NONE norm type for convergence test PC Object: (mg_coarse_) 2 MPI processes type: lu out-of-place factorization tolerance for zero pivot 2.22045e-14 matrix ordering: natural factor fill ratio given 0., needed 0. 
Factored matrix follows: Mat Object: 2 MPI processes type: superlu_dist rows=6, cols=6 package used to perform factorization: superlu_dist total: nonzeros=0, allocated nonzeros=0 total number of mallocs used during MatSetValues calls =0 SuperLU_DIST run parameters: Process grid nprow 2 x npcol 1 Equilibrate matrix TRUE Matrix input mode 1 Replace tiny pivots FALSE Use iterative refinement FALSE Processors in row 2 col partition 1 Row permutation LargeDiag Column permutation METIS_AT_PLUS_A Parallel symbolic factorization FALSE Repeated factorization SamePattern linear system matrix = precond matrix: Mat Object: 2 MPI processes type: mpiaij rows=6, cols=6, bs=6 total: nonzeros=36, allocated nonzeros=36 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 2 nodes, limit used is 5 Note that the coarse grid is now using superlu_dist. Is the coarse grid being solved in parallel? Garth From bsmith at mcs.anl.gov Wed Apr 26 17:28:05 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 26 Apr 2017 17:28:05 -0500 Subject: [petsc-users] nondeterministic behavior with ./program -help | head In-Reply-To: References: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> Message-ID: <3C68ED64-CFA3-4190-ACA2-276CAD4C529D@mcs.anl.gov> Ed I think what you want is something like ./program -intro that causes the help message passed to PetscInitialize() to be printed and then have the program end. We can add this, -intro is not a great name, any idea for a better name? Barry > On Apr 26, 2017, at 4:30 PM, Ed Bueler wrote: > > > Is ./ex5 -help -no_signal_handler | head good enough? > > Umm. It eliminates the nondeterminism. Should discovery of options for petsc user applications go via error messages?
> > The best alternative I can think of is if my codes supply the program name for the "const char man[]" argument of PetscOptionsXXX(). Then I can suggest this to users: > > ./program -help |grep program > > Is this a mis-use of the "man" argument? > > Ed > > PS Amused to find "This routine should not be used from within a signal handler." in the man page for MPI_Abort(), which I reached by clicking on the call to MPI_Abort() in the view of PetscSignalHandlerDefault(). ;-) > > > On Wed, Apr 26, 2017 at 12:55 PM, Barry Smith wrote: > > Is > > ./ex5 -help -no_signal_handler | head > > good enough? > > Normally we won't want to turn off catching of PIPE errors since it might hide other errors. > > Barry > > > > > On Apr 26, 2017, at 3:31 PM, Ed Bueler wrote: > > > > Dear Petsc -- > > > > Copied at the bottom is the behavior I get for ex5.c in snes examples/tutorials. When I do > > > > $./ex5 -help |head > > > > I get a "Caught signal number 13 Broken Pipe" or just a version string. Randomly, one or the other. > > > > I know this has come up on petsc-users before: > > http://lists.mcs.anl.gov/pipermail/petsc-users/2014-May/021848.html > > The claim was that the SIGPIPE error is standard behavior. > > > > My real issue (feature request?) is that when writing a petsc application I want a rational help system. Suppose "program.c" has options with prefix "prg_". The user doesn't know the prefix yet (I could have used "prog_" as the prefix ...) The petsc-canonical help system is (I believe) > > > > ./program -help |grep prg_ > > > > for program.c-specific options. Works great if you know to look for "prg_". My plan for my applications is to put the prefix in the first line or two of my help string, the user can discover the prefix by > > > > ./program -help |head > > > > and then use "-help |grep -prg_" and etc. > > > > So my question is: Can petsc be set up to not generate an error message when the help system is used in a correct way? (I.e. 
catching return codes when writing, or handling SIGPIPE differently?) Or, on the other hand, can someone suggest a better help system that irritates less? > > > > Thanks! > > > > Ed > > > > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head > > Bratu nonlinear PDE in 2d. > > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular > > domain, using distributed arrays (DMDAs) to partition the parallel grid. > > The command line options include: > > -par , where indicates the problem's nonlinearity > > problem SFI: = Bratu parameter (0 <= par <= 6.81) > > > > -m_par/n_par , where indicates an integer > > that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- > > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 > > [0]PETSC ERROR: ------------------------------------------------------------------------ > > [0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket > > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors > > [0]PETSC ERROR: likely location of problem given in stack below > > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, > > [0]PETSC ERROR: INSTEAD the line number of the start of the function > > [0]PETSC ERROR: is given. 
> > [0]PETSC ERROR: [0] PetscVFPrintfDefault line 240 /home/ed/petsc/src/sys/fileio/mprint.c > > [0]PETSC ERROR: [0] PetscHelpPrintfDefault line 622 /home/ed/petsc/src/sys/fileio/mprint.c > > [0]PETSC ERROR: [0] PetscOptionsBegin_Private line 29 /home/ed/petsc/src/sys/objects/aoptions.c > > [0]PETSC ERROR: [0] PetscObjectOptionsBegin_Private line 61 /home/ed/petsc/src/sys/objects/aoptions.c > > [0]PETSC ERROR: [0] PetscDSSetFromOptions line 218 /home/ed/petsc/src/dm/dt/interface/dtds.c > > [0]PETSC ERROR: [0] DMSetFromOptions line 747 /home/ed/petsc/src/dm/interface/dm.c > > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > > [0]PETSC ERROR: Signal received > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 > > [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Wed Apr 26 11:22:07 2017 > > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > [unset]: aborting job: > > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head > > Bratu nonlinear PDE in 2d. > > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular > > domain, using distributed arrays (DMDAs) to partition the parallel grid. 
> > The command line options include: > > -par , where indicates the problem's nonlinearity > > problem SFI: = Bratu parameter (0 <= par <= 6.81) > > > > -m_par/n_par , where indicates an integer > > that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- > > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 > > ~/petsc/src/snes/examples/tutorials[master*]$ > > > > > > -- > > Ed Bueler > > Dept of Math and Stat and Geophysical Institute > > University of Alaska Fairbanks > > Fairbanks, AK 99775-6660 > > 301C Chapman > > > > > -- > Ed Bueler > Dept of Math and Stat and Geophysical Institute > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 301C Chapman From elbueler at alaska.edu Wed Apr 26 17:34:04 2017 From: elbueler at alaska.edu (Ed Bueler) Date: Wed, 26 Apr 2017 14:34:04 -0800 Subject: [petsc-users] Fwd: nondeterministic behavior with ./program -help | head In-Reply-To: References: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> <3C68ED64-CFA3-4190-ACA2-276CAD4C529D@mcs.anl.gov> Message-ID: [Missed reply all.] ---------- Forwarded message ---------- From: Ed Bueler Date: Wed, Apr 26, 2017 at 2:33 PM Subject: Re: [petsc-users] nondeterministic behavior with ./program -help | head To: Barry Smith Yes, I thought of something vaguely like that. How about ./program -help_string or -help_only? Ed On Wed, Apr 26, 2017 at 2:28 PM, Barry Smith wrote: > > Ed > > I think want you want is something like > > ./program -intro > > that causes the help message to PetscInitialize() to be printed and then > have the program end. > > We can add this, -intro is not a great name, any idea for a better name? > > Barry > > > On Apr 26, 2017, at 4:30 PM, Ed Bueler wrote: > > > > > Is ./ex5 -help -no_signal_handler | head good enough? > > > > Umm. It eliminates the nondeterminism. Should discovery of options for > petsc user applications go via error messages? 
(Or asking users to > remember "-no_signal_handler" to avoid?) I don't understand signal > handling well-enough to recommend a better way ... maybe there is no better > way for |head. > > > > The best alternative I can think of is if my codes supply the program > name for the "const char man[]" argument of PetscOptionsXXX(). Then I can > suggest this to users: > > > > ./program -help |grep program > > > > Is this a mis-use of the "man" argument? > > > > Ed > > > > PS Amused to find "This routine should not be used from within a signal > handler." in the man page for MPI_Abort(), which I reached by clicking on > the call to MPI_Abort() in the view of PetscSignalHandlerDefault(). ;-) > > > > > > On Wed, Apr 26, 2017 at 12:55 PM, Barry Smith > wrote: > > > > Is > > > > ./ex5 -help -no_signal_handler | head > > > > good enough? > > > > Normally we won't want to turn off catching of PIPE errors since it > might hide other errors. > > > > Barry > > > > > > > > > On Apr 26, 2017, at 3:31 PM, Ed Bueler wrote: > > > > > > Dear Petsc -- > > > > > > Copied at the bottom is the behavior I get for ex5.c in snes > examples/tutorials. When I do > > > > > > $./ex5 -help |head > > > > > > I get a "Caught signal number 13 Broken Pipe" or just a version > string. Randomly, one or the other. > > > > > > I know this has come up on petsc-users before: > > > http://lists.mcs.anl.gov/pipermail/petsc-users/2014-May/ > 021848.html > > > The claim was that the SIGPIPE error is standard behavior. > > > > > > My real issue (feature request?) is that when writing a petsc > application I want a rational help system. Suppose "program.c" has options > with prefix "prg_". The user doesn't know the prefix yet (I could have > used "prog_" as the prefix ...) The petsc-canonical help system is (I > believe) > > > > > > ./program -help |grep prg_ > > > > > > for program.c-specific options. Works great if you know to look for > "prg_". 
My plan for my applications is to put the prefix in the first line > or two of my help string, the user can discover the prefix by > > > > > > ./program -help |head > > > > > > and then use "-help |grep -prg_" and etc. > > > > > > So my question is: Can petsc be set up to not generate an error > message when the help system is used in a correct way? (I.e. catching > return codes when writing, or handling SIGPIPE differently?) Or, on the > other hand, can someone suggest a better help system that irritates less? > > > > > > Thanks! > > > > > > Ed > > > > > > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head > > > Bratu nonlinear PDE in 2d. > > > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D > rectangular > > > domain, using distributed arrays (DMDAs) to partition the parallel > grid. > > > The command line options include: > > > -par , where indicates the problem's > nonlinearity > > > problem SFI: = Bratu parameter (0 <= par <= 6.81) > > > > > > -m_par/n_par , where indicates an integer > > > that MMS3 will be evaluated with 2^m_par, > 2^n_par----------------------------------------------------- > --------------------- > > > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: > 2017-04-26 13:00:15 -0500 > > > [0]PETSC ERROR: ------------------------------ > ------------------------------------------ > > > [0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while > reading or writing to a socket > > > [0]PETSC ERROR: Try option -start_in_debugger or > -on_error_attach_debugger > > > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/d > ocumentation/faq.html#valgrind > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > OS X to find memory corruption errors > > > [0]PETSC ERROR: likely location of problem given in stack below > > > [0]PETSC ERROR: --------------------- Stack Frames > ------------------------------------ > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are 
not > available, > > > [0]PETSC ERROR: INSTEAD the line number of the start of the > function > > > [0]PETSC ERROR: is given. > > > [0]PETSC ERROR: [0] PetscVFPrintfDefault line 240 > /home/ed/petsc/src/sys/fileio/mprint.c > > > [0]PETSC ERROR: [0] PetscHelpPrintfDefault line 622 > /home/ed/petsc/src/sys/fileio/mprint.c > > > [0]PETSC ERROR: [0] PetscOptionsBegin_Private line 29 > /home/ed/petsc/src/sys/objects/aoptions.c > > > [0]PETSC ERROR: [0] PetscObjectOptionsBegin_Private line 61 > /home/ed/petsc/src/sys/objects/aoptions.c > > > [0]PETSC ERROR: [0] PetscDSSetFromOptions line 218 > /home/ed/petsc/src/dm/dt/interface/dtds.c > > > [0]PETSC ERROR: [0] DMSetFromOptions line 747 > /home/ed/petsc/src/dm/interface/dm.c > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > > [0]PETSC ERROR: Signal received > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d > ocumentation/faq.html for trouble shooting. > > > [0]PETSC ERROR: Petsc Development GIT revision: > v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 > > > [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Wed > Apr 26 11:22:07 2017 > > > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > > > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > > > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > > [unset]: aborting job: > > > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head > > > Bratu nonlinear PDE in 2d. > > > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D > rectangular > > > domain, using distributed arrays (DMDAs) to partition the parallel > grid. 
> > > The command line options include: > > > -par , where indicates the problem's > nonlinearity > > > problem SFI: = Bratu parameter (0 <= par <= 6.81) > > > > > > -m_par/n_par , where indicates an integer > > > that MMS3 will be evaluated with 2^m_par, > 2^n_par----------------------------------------------------- > --------------------- > > > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: > 2017-04-26 13:00:15 -0500 > > > ~/petsc/src/snes/examples/tutorials[master*]$ > > > > > > > > > -- > > > Ed Bueler > > > Dept of Math and Stat and Geophysical Institute > > > University of Alaska Fairbanks > > > Fairbanks, AK 99775-6660 > > > 301C Chapman > > > > > > > > > > -- > > Ed Bueler > > Dept of Math and Stat and Geophysical Institute > > University of Alaska Fairbanks > > Fairbanks, AK 99775-6660 > > 301C Chapman > > -- Ed Bueler Dept of Math and Stat and Geophysical Institute University of Alaska Fairbanks Fairbanks, AK 99775-6660 301C Chapman -- Ed Bueler Dept of Math and Stat and Geophysical Institute University of Alaska Fairbanks Fairbanks, AK 99775-6660 301C Chapman From gnw20 at cam.ac.uk Wed Apr 26 17:42:43 2017 From: gnw20 at cam.ac.uk (Garth N. Wells) Date: Wed, 26 Apr 2017 23:42:43 +0100 Subject: [petsc-users] petsc4py bool type In-Reply-To: References: Message-ID: To close this thread off and for reference, Lisandro has pushed a fix [1] to petsc4py maint and master. Garth [1] https://bitbucket.org/petsc/petsc4py/commits/51b4b9fe3948007c2b1320df6130d171297f407c On 26 April 2017 at 10:05, Garth N. Wells wrote: > On 26 April 2017 at 09:06, Lisandro Dalcin wrote: >> On 25 April 2017 at 19:48, Garth N. Wells wrote: >>> I'm seeing some behaviour with bool types in petsc4py that I didn't >>> expect. In the Python interface, returned Booleans have type '<class 'int'>', where I expected them to have type '<class 'bool'>'. Below >>> program illustrates the issue. Seems to be related to bint in cython. Am I
Seems to be related to bint in cython. Am I >>> doing something wrong? >> >> Wow! Yes, you are right, this seems like a regression in Cython. In >> the past, a cast used to coerce values to the Python `bool` >> type. Damn, these casts are everywhere, it might take a while to fix >> all the code, unless I can find a hacky way to workaround the issue. >> > > I've created an issue at https://bitbucket.org/petsc/petsc4py/issues/63/. > > From some quick testing, Cython maps int correctly via the bint cast, > but not PetscBool. > > Garth > >> >> -- >> Lisandro Dalcin >> ============ >> Research Scientist >> Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) >> Extreme Computing Research Center (ECRC) >> King Abdullah University of Science and Technology (KAUST) >> http://ecrc.kaust.edu.sa/ >> >> 4700 King Abdullah University of Science and Technology >> al-Khawarizmi Bldg (Bldg 1), Office # 0109 >> Thuwal 23955-6900, Kingdom of Saudi Arabia >> http://www.kaust.edu.sa >> >> Office Phone: +966 12 808-0459 From jed at jedbrown.org Wed Apr 26 17:44:24 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 26 Apr 2017 16:44:24 -0600 Subject: [petsc-users] nondeterministic behavior with ./program -help | head In-Reply-To: <3C68ED64-CFA3-4190-ACA2-276CAD4C529D@mcs.anl.gov> References: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> <3C68ED64-CFA3-4190-ACA2-276CAD4C529D@mcs.anl.gov> Message-ID: <87bmri23hz.fsf@jedbrown.org> PETSc is the only package I know for which "./app -help" does everything that "./app" would do, with extra output. It is an unfortunate consequence of extensibility that we can't exit immediately. What if we had a function the user could call to exit early and cleanly after all options have been processed? Maybe something like: PetscInitialize ... if (PetscHelpActive()) { TSSetDuration(ts,1,PETSC_DEFAULT); PetscExitSilently(); } where PetscExitSilently doesn't print -malloc or -log_view or the like. 
We should probably also have "-help snes,ksp_fieldsplit" that would only print help for options with those prefixes. Barry Smith writes: > Ed > > I think want you want is something like > > ./program -intro > > that causes the help message to PetscInitialize() to be printed and then have the program end. > > We can add this, -intro is not a great name, any idea for a better name? > > Barry > >> On Apr 26, 2017, at 4:30 PM, Ed Bueler wrote: >> >> > Is ./ex5 -help -no_signal_handler | head good enough? >> >> Umm. It eliminates the nondeterminism. Should discovery of options for petsc user applications go via error messages? (Or asking users to remember "-no_signal_handler" to avoid?) I don't understand signal handling well-enough to recommend a better way ... maybe there is no better way for |head. >> >> The best alternative I can think of is if my codes supply the program name for the "const char man[]" argument of PetscOptionsXXX(). Then I can suggest this to users: >> >> ./program -help |grep program >> >> Is this a mis-use of the "man" argument? >> >> Ed >> >> PS Amused to find "This routine should not be used from within a signal handler." in the man page for MPI_Abort(), which I reached by clicking on the call to MPI_Abort() in the view of PetscSignalHandlerDefault(). ;-) >> >> >> On Wed, Apr 26, 2017 at 12:55 PM, Barry Smith wrote: >> >> Is >> >> ./ex5 -help -no_signal_handler | head >> >> good enough? >> >> Normally we won't want to turn off catching of PIPE errors since it might hide other errors. >> >> Barry >> >> >> >> > On Apr 26, 2017, at 3:31 PM, Ed Bueler wrote: >> > >> > Dear Petsc -- >> > >> > Copied at the bottom is the behavior I get for ex5.c in snes examples/tutorials. When I do >> > >> > $./ex5 -help |head >> > >> > I get a "Caught signal number 13 Broken Pipe" or just a version string. Randomly, one or the other. 
>> > >> > I know this has come up on petsc-users before: >> > http://lists.mcs.anl.gov/pipermail/petsc-users/2014-May/021848.html >> > The claim was that the SIGPIPE error is standard behavior. >> > >> > My real issue (feature request?) is that when writing a petsc application I want a rational help system. Suppose "program.c" has options with prefix "prg_". The user doesn't know the prefix yet (I could have used "prog_" as the prefix ...) The petsc-canonical help system is (I believe) >> > >> > ./program -help |grep prg_ >> > >> > for program.c-specific options. Works great if you know to look for "prg_". My plan for my applications is to put the prefix in the first line or two of my help string, the user can discover the prefix by >> > >> > ./program -help |head >> > >> > and then use "-help |grep -prg_" and etc. >> > >> > So my question is: Can petsc be set up to not generate an error message when the help system is used in a correct way? (I.e. catching return codes when writing, or handling SIGPIPE differently?) Or, on the other hand, can someone suggest a better help system that irritates less? >> > >> > Thanks! >> > >> > Ed >> > >> > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head >> > Bratu nonlinear PDE in 2d. >> > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular >> > domain, using distributed arrays (DMDAs) to partition the parallel grid. 
>> > The command line options include: >> > -par , where indicates the problem's nonlinearity >> > problem SFI: = Bratu parameter (0 <= par <= 6.81) >> > >> > -m_par/n_par , where indicates an integer >> > that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- >> > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 >> > [0]PETSC ERROR: ------------------------------------------------------------------------ >> > [0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket >> > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >> > [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >> > [0]PETSC ERROR: likely location of problem given in stack below >> > [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >> > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >> > [0]PETSC ERROR: INSTEAD the line number of the start of the function >> > [0]PETSC ERROR: is given. 
>> > [0]PETSC ERROR: [0] PetscVFPrintfDefault line 240 /home/ed/petsc/src/sys/fileio/mprint.c >> > [0]PETSC ERROR: [0] PetscHelpPrintfDefault line 622 /home/ed/petsc/src/sys/fileio/mprint.c >> > [0]PETSC ERROR: [0] PetscOptionsBegin_Private line 29 /home/ed/petsc/src/sys/objects/aoptions.c >> > [0]PETSC ERROR: [0] PetscObjectOptionsBegin_Private line 61 /home/ed/petsc/src/sys/objects/aoptions.c >> > [0]PETSC ERROR: [0] PetscDSSetFromOptions line 218 /home/ed/petsc/src/dm/dt/interface/dtds.c >> > [0]PETSC ERROR: [0] DMSetFromOptions line 747 /home/ed/petsc/src/dm/interface/dm.c >> > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >> > [0]PETSC ERROR: Signal received >> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 >> > [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Wed Apr 26 11:22:07 2017 >> > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 >> > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file >> > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >> > [unset]: aborting job: >> > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >> > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head >> > Bratu nonlinear PDE in 2d. >> > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular >> > domain, using distributed arrays (DMDAs) to partition the parallel grid. 
>> > The command line options include: >> > -par , where indicates the problem's nonlinearity >> > problem SFI: = Bratu parameter (0 <= par <= 6.81) >> > >> > -m_par/n_par , where indicates an integer >> > that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- >> > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 >> > ~/petsc/src/snes/examples/tutorials[master*]$ >> > >> > >> > -- >> > Ed Bueler >> > Dept of Math and Stat and Geophysical Institute >> > University of Alaska Fairbanks >> > Fairbanks, AK 99775-6660 >> > 301C Chapman >> >> >> >> >> -- >> Ed Bueler >> Dept of Math and Stat and Geophysical Institute >> University of Alaska Fairbanks >> Fairbanks, AK 99775-6660 >> 301C Chapman From bsmith at mcs.anl.gov Wed Apr 26 18:30:30 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 26 Apr 2017 18:30:30 -0500 Subject: [petsc-users] Multigrid coarse grid solver In-Reply-To: References: Message-ID: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> Yes, you asked for LU so it used LU! Of course for smaller coarse grids and large numbers of processes this is very inefficient. The default behavior for GAMG is probably what you want. In that case it is equivalent to -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu. But GAMG tries hard to put all the coarse grid degrees of freedom on the first process and none on the rest, so you do end up with the exact equivalent of a direct solver. Try -ksp_view in that case. There is also -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse matrix on EACH process and each process does its own factorization and solve.
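The redundant option just described can be sketched in plain Python (a conceptual illustration only, not PETSc code; the 2x2 matrix, the two simulated ranks, and the solve_dense helper are all invented for the example). Each "rank" holds a full copy of the small coarse system, factors it, and solves, so no rank needs data from any other:

```python
# Conceptual sketch of "-mg_coarse_pc_type redundant": every rank owns a
# full copy of the small coarse system and solves it independently.

def solve_dense(A, b):
    """Gaussian elimination with partial pivoting on a full local copy."""
    n = len(b)
    A = [row[:] for row in A]  # copy, as each rank would hold its own
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))  # pivot row
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# An invented 2x2 "coarse grid" system, replicated on two simulated ranks.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solutions = [solve_dense(A, b) for rank in range(2)]
# Every rank computed the identical full solution without communicating.
assert solutions[0] == solutions[1]
```

In a real run the local solve is the LU factorization each process performs on its own copy; the sketch only shows the data movement, namely none.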
This saves one phase of the communication for each V cycle since every process has the entire solution it just grabs from itself the values it needs without communication. > On Apr 26, 2017, at 5:25 PM, Garth N. Wells wrote: > > I'm a bit confused by the selection of the coarse grid solver for > multigrid. For the demo ksp/ex56, if I do: > > mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg > -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu > > I see > > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 1 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 1 MPI processes > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: nd > factor fill ratio given 5., needed 1. > Factored matrix follows: > Mat Object: 1 MPI processes > type: seqaij > rows=6, cols=6, bs=6 > package used to perform factorization: petsc > total: nonzeros=36, allocated nonzeros=36 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 2 nodes, limit used is 5 > linear system matrix = precond matrix: > Mat Object: 1 MPI processes > type: seqaij > rows=6, cols=6, bs=6 > total: nonzeros=36, allocated nonzeros=36 > total number of mallocs used during MatSetValues calls =0 > using I-node routines: found 2 nodes, limit used is 5 > > which is what I expect. Increasing from 1 to 2 processes: > > mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg > -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu > > I see > > Coarse grid solver -- level ------------------------------- > KSP Object: (mg_coarse_) 2 MPI processes > type: preonly > maximum iterations=10000, initial guess is zero > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> left preconditioning > using NONE norm type for convergence test > PC Object: (mg_coarse_) 2 MPI processes > type: lu > out-of-place factorization > tolerance for zero pivot 2.22045e-14 > matrix ordering: natural > factor fill ratio given 0., needed 0. > Factored matrix follows: > Mat Object: 2 MPI processes > type: superlu_dist > rows=6, cols=6 > package used to perform factorization: superlu_dist > total: nonzeros=0, allocated nonzeros=0 > total number of mallocs used during MatSetValues calls =0 > SuperLU_DIST run parameters: > Process grid nprow 2 x npcol 1 > Equilibrate matrix TRUE > Matrix input mode 1 > Replace tiny pivots FALSE > Use iterative refinement FALSE > Processors in row 2 col partition 1 > Row permutation LargeDiag > Column permutation METIS_AT_PLUS_A > Parallel symbolic factorization FALSE > Repeated factorization SamePattern > linear system matrix = precond matrix: > Mat Object: 2 MPI processes > type: mpiaij > rows=6, cols=6, bs=6 > total: nonzeros=36, allocated nonzeros=36 > total number of mallocs used during MatSetValues calls =0 > using I-node (on process 0) routines: found 2 nodes, limit used is 5 > > Note that the coarse grid is now using superlu_dist. Is the coarse > grid being solved in parallel? > > Garth From bsmith at mcs.anl.gov Wed Apr 26 18:32:34 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 26 Apr 2017 18:32:34 -0500 Subject: [petsc-users] nondeterministic behavior with ./program -help | head In-Reply-To: <87bmri23hz.fsf@jedbrown.org> References: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> <3C68ED64-CFA3-4190-ACA2-276CAD4C529D@mcs.anl.gov> <87bmri23hz.fsf@jedbrown.org> Message-ID: > On Apr 26, 2017, at 5:44 PM, Jed Brown wrote: > > PETSc is the only package I know for which "./app -help" does everything > that "./app" would do, with extra output. It is an unfortunate > consequence of extensibility that we can't exit immediately. 
What if we > had a function the user could call to exit early and cleanly after all > options have been processed? Maybe something like: > > PetscInitialize > ... > if (PetscHelpActive()) { > TSSetDuration(ts,1,PETSC_DEFAULT); > PetscExitSilently(); > } > > where PetscExitSilently doesn't print -malloc or -log_view or the like. Yuck! > > We should probably also have "-help snes,ksp_fieldsplit" that would only > print help for options with those prefixes. This would be useful. > > Barry Smith writes: > >> Ed >> >> I think want you want is something like >> >> ./program -intro >> >> that causes the help message to PetscInitialize() to be printed and then have the program end. >> >> We can add this, -intro is not a great name, any idea for a better name? >> >> Barry >> >>> On Apr 26, 2017, at 4:30 PM, Ed Bueler wrote: >>> >>>> Is ./ex5 -help -no_signal_handler | head good enough? >>> >>> Umm. It eliminates the nondeterminism. Should discovery of options for petsc user applications go via error messages? (Or asking users to remember "-no_signal_handler" to avoid?) I don't understand signal handling well-enough to recommend a better way ... maybe there is no better way for |head. >>> >>> The best alternative I can think of is if my codes supply the program name for the "const char man[]" argument of PetscOptionsXXX(). Then I can suggest this to users: >>> >>> ./program -help |grep program >>> >>> Is this a mis-use of the "man" argument? >>> >>> Ed >>> >>> PS Amused to find "This routine should not be used from within a signal handler." in the man page for MPI_Abort(), which I reached by clicking on the call to MPI_Abort() in the view of PetscSignalHandlerDefault(). ;-) >>> >>> >>> On Wed, Apr 26, 2017 at 12:55 PM, Barry Smith wrote: >>> >>> Is >>> >>> ./ex5 -help -no_signal_handler | head >>> >>> good enough? >>> >>> Normally we won't want to turn off catching of PIPE errors since it might hide other errors. 
>>> >>> Barry >>> >>> >>> >>>> On Apr 26, 2017, at 3:31 PM, Ed Bueler wrote: >>>> >>>> Dear Petsc -- >>>> >>>> Copied at the bottom is the behavior I get for ex5.c in snes examples/tutorials. When I do >>>> >>>> $./ex5 -help |head >>>> >>>> I get a "Caught signal number 13 Broken Pipe" or just a version string. Randomly, one or the other. >>>> >>>> I know this has come up on petsc-users before: >>>> http://lists.mcs.anl.gov/pipermail/petsc-users/2014-May/021848.html >>>> The claim was that the SIGPIPE error is standard behavior. >>>> >>>> My real issue (feature request?) is that when writing a petsc application I want a rational help system. Suppose "program.c" has options with prefix "prg_". The user doesn't know the prefix yet (I could have used "prog_" as the prefix ...) The petsc-canonical help system is (I believe) >>>> >>>> ./program -help |grep prg_ >>>> >>>> for program.c-specific options. Works great if you know to look for "prg_". My plan for my applications is to put the prefix in the first line or two of my help string, the user can discover the prefix by >>>> >>>> ./program -help |head >>>> >>>> and then use "-help |grep -prg_" and etc. >>>> >>>> So my question is: Can petsc be set up to not generate an error message when the help system is used in a correct way? (I.e. catching return codes when writing, or handling SIGPIPE differently?) Or, on the other hand, can someone suggest a better help system that irritates less? >>>> >>>> Thanks! >>>> >>>> Ed >>>> >>>> ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head >>>> Bratu nonlinear PDE in 2d. >>>> We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular >>>> domain, using distributed arrays (DMDAs) to partition the parallel grid. 
>>>> The command line options include: >>>> -par , where indicates the problem's nonlinearity >>>> problem SFI: = Bratu parameter (0 <= par <= 6.81) >>>> >>>> -m_par/n_par , where indicates an integer >>>> that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- >>>> Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 >>>> ~/petsc/src/snes/examples/tutorials[master*]$ >>>> >>>> >>>> -- >>>> Ed Bueler >>>> Dept of Math and Stat and Geophysical Institute >>>> University of Alaska Fairbanks >>>> Fairbanks, AK 99775-6660 >>>> 301C Chapman >>> >>> >>> >>> >>> -- >>> Ed Bueler >>> Dept of Math and Stat and Geophysical Institute >>> University of Alaska Fairbanks >>> Fairbanks, AK 99775-6660 >>> 301C Chapman From bsmith at mcs.anl.gov Wed Apr 26 18:33:52 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Wed, 26 Apr 2017 18:33:52 -0500 Subject: [petsc-users] Fwd: nondeterministic behavior with ./program -help | head In-Reply-To: References: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> <3C68ED64-CFA3-4190-ACA2-276CAD4C529D@mcs.anl.gov> Message-ID: <0C269211-4CB1-4073-8A2E-746E338EDD11@mcs.anl.gov> Does -help_only make sense? Not to me, I would think this would mean print all the help but don't run anything; but this is impossible since it has to run in order to know what help to print. > On Apr 26, 2017, at 5:34 PM, Ed Bueler wrote: > > [Missed reply all.] > > ---------- Forwarded message ---------- > From: Ed Bueler > Date: Wed, Apr 26, 2017 at 2:33 PM > Subject: Re: [petsc-users] nondeterministic behavior with ./program -help | head > To: Barry Smith > > > Yes, I thought of something vaguely like that. How about > > ./program -help_string > > or -help_only? 
> > Ed > > On Wed, Apr 26, 2017 at 2:28 PM, Barry Smith wrote: > > Ed > > I think want you want is something like > > ./program -intro > > that causes the help message to PetscInitialize() to be printed and then have the program end. > > We can add this, -intro is not a great name, any idea for a better name? > > Barry > > > On Apr 26, 2017, at 4:30 PM, Ed Bueler wrote: > > > > > Is ./ex5 -help -no_signal_handler | head good enough? > > > > Umm. It eliminates the nondeterminism. Should discovery of options for petsc user applications go via error messages? (Or asking users to remember "-no_signal_handler" to avoid?) I don't understand signal handling well-enough to recommend a better way ... maybe there is no better way for |head. > > > > The best alternative I can think of is if my codes supply the program name for the "const char man[]" argument of PetscOptionsXXX(). Then I can suggest this to users: > > > > ./program -help |grep program > > > > Is this a mis-use of the "man" argument? > > > > Ed > > > > PS Amused to find "This routine should not be used from within a signal handler." in the man page for MPI_Abort(), which I reached by clicking on the call to MPI_Abort() in the view of PetscSignalHandlerDefault(). ;-) > > > > > > On Wed, Apr 26, 2017 at 12:55 PM, Barry Smith wrote: > > > > Is > > > > ./ex5 -help -no_signal_handler | head > > > > good enough? > > > > Normally we won't want to turn off catching of PIPE errors since it might hide other errors. > > > > Barry > > > > > > > > > On Apr 26, 2017, at 3:31 PM, Ed Bueler wrote: > > > > > > Dear Petsc -- > > > > > > Copied at the bottom is the behavior I get for ex5.c in snes examples/tutorials. When I do > > > > > > $./ex5 -help |head > > > > > > I get a "Caught signal number 13 Broken Pipe" or just a version string. Randomly, one or the other. 
> > > The command line options include: > > > -par , where indicates the problem's nonlinearity > > > problem SFI: = Bratu parameter (0 <= par <= 6.81) > > > > > > -m_par/n_par , where indicates an integer > > > that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- > > > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 > > > ~/petsc/src/snes/examples/tutorials[master*]$ > > > > > > > > > -- > > > Ed Bueler > > > Dept of Math and Stat and Geophysical Institute > > > University of Alaska Fairbanks > > > Fairbanks, AK 99775-6660 > > > 301C Chapman > > > > > > > > > > -- > > Ed Bueler > > Dept of Math and Stat and Geophysical Institute > > University of Alaska Fairbanks > > Fairbanks, AK 99775-6660 > > 301C Chapman > > > > > -- > Ed Bueler > Dept of Math and Stat and Geophysical Institute > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 301C Chapman > > > > -- > Ed Bueler > Dept of Math and Stat and Geophysical Institute > University of Alaska Fairbanks > Fairbanks, AK 99775-6660 > 301C Chapman From elbueler at alaska.edu Wed Apr 26 18:38:55 2017 From: elbueler at alaska.edu (Ed Bueler) Date: Wed, 26 Apr 2017 15:38:55 -0800 Subject: [petsc-users] Fwd: nondeterministic behavior with ./program -help | head In-Reply-To: <0C269211-4CB1-4073-8A2E-746E338EDD11@mcs.anl.gov> References: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> <3C68ED64-CFA3-4190-ACA2-276CAD4C529D@mcs.anl.gov> <0C269211-4CB1-4073-8A2E-746E338EDD11@mcs.anl.gov> Message-ID: Right. -help_string? Ed On Wed, Apr 26, 2017 at 3:33 PM, Barry Smith wrote: > > Does -help_only make sense? Not to me, I would think this would mean > print all the help but don't run anything; but this is impossible since it > has to run in order to know what help to print. > > > > On Apr 26, 2017, at 5:34 PM, Ed Bueler wrote: > > > > [Missed reply all.] 
> > > > [0]PETSC ERROR: [0] PetscVFPrintfDefault line 240 > /home/ed/petsc/src/sys/fileio/mprint.c > > > > [0]PETSC ERROR: [0] PetscHelpPrintfDefault line 622 > /home/ed/petsc/src/sys/fileio/mprint.c > > > > [0]PETSC ERROR: [0] PetscOptionsBegin_Private line 29 > /home/ed/petsc/src/sys/objects/aoptions.c > > > > [0]PETSC ERROR: [0] PetscObjectOptionsBegin_Private line 61 > /home/ed/petsc/src/sys/objects/aoptions.c > > > > [0]PETSC ERROR: [0] PetscDSSetFromOptions line 218 > /home/ed/petsc/src/dm/dt/interface/dtds.c > > > > [0]PETSC ERROR: [0] DMSetFromOptions line 747 /home/ed/petsc/src/dm/ > interface/dm.c > > > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > > > > [0]PETSC ERROR: Signal received > > > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/ > documentation/faq.html for trouble shooting. > > > > [0]PETSC ERROR: Petsc Development GIT revision: > v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 > > > > [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed > Wed Apr 26 11:22:07 2017 > > > > [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 > > > > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file > > > > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > > > [unset]: aborting job: > > > > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 > > > > ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head > > > > Bratu nonlinear PDE in 2d. > > > > We solve the Bratu (SFI - solid fuel ignition) problem in a 2D > rectangular > > > > domain, using distributed arrays (DMDAs) to partition the parallel > grid. 
> > > > The command line options include: > > > > -par , where indicates the problem's > nonlinearity > > > > problem SFI: = Bratu parameter (0 <= par <= 6.81) > > > > > > > > -m_par/n_par , where indicates an integer > > > > that MMS3 will be evaluated with 2^m_par, > 2^n_par----------------------------------------------------- > --------------------- > > > > Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: > 2017-04-26 13:00:15 -0500 > > > > ~/petsc/src/snes/examples/tutorials[master*]$ > > > > > > > > > > > > -- > > > > Ed Bueler > > > > Dept of Math and Stat and Geophysical Institute > > > > University of Alaska Fairbanks > > > > Fairbanks, AK 99775-6660 > > > > 301C Chapman -- Ed Bueler Dept of Math and Stat and Geophysical Institute University of Alaska Fairbanks Fairbanks, AK 99775-6660 301C Chapman -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jed at jedbrown.org Wed Apr 26 18:39:45 2017 From: jed at jedbrown.org (Jed Brown) Date: Wed, 26 Apr 2017 17:39:45 -0600 Subject: [petsc-users] nondeterministic behavior with ./program -help | head In-Reply-To: References: <4A3EF0F4-8CBF-47D6-B89F-05A0D20E63E5@mcs.anl.gov> <3C68ED64-CFA3-4190-ACA2-276CAD4C529D@mcs.anl.gov> <87bmri23hz.fsf@jedbrown.org> Message-ID: <8760hq20xq.fsf@jedbrown.org> Barry Smith writes: >> On Apr 26, 2017, at 5:44 PM, Jed Brown wrote: >> >> PETSc is the only package I know for which "./app -help" does everything >> that "./app" would do, with extra output. It is an unfortunate >> consequence of extensibility that we can't exit immediately. What if we >> had a function the user could call to exit early and cleanly after all >> options have been processed? Maybe something like: >> >> PetscInitialize >> ... >> if (PetscHelpActive()) { >> TSSetDuration(ts,1,PETSC_DEFAULT); >> PetscExitSilently(); >> } >> >> where PetscExitSilently doesn't print -malloc or -log_view or the like. > > Yuck! This isn't something we would use in examples, but is something some applications might prefer. I agree it's not elegant, but don't know how else to get anything like normal behavior (perhaps that's okay). >> We should probably also have "-help snes,ksp_fieldsplit" that would only >> print help for options with those prefixes. > > This would be useful. >> >> Barry Smith writes: >> >>> Ed >>> >>> I think want you want is something like >>> >>> ./program -intro >>> >>> that causes the help message to PetscInitialize() to be printed and then have the program end. >>> >>> We can add this, -intro is not a great name, any idea for a better name? >>> >>> Barry >>> >>>> On Apr 26, 2017, at 4:30 PM, Ed Bueler wrote: >>>> >>>>> Is ./ex5 -help -no_signal_handler | head good enough? >>>> >>>> Umm. It eliminates the nondeterminism. Should discovery of options for petsc user applications go via error messages? 
(Or asking users to remember "-no_signal_handler" to avoid?) I don't understand signal handling well-enough to recommend a better way ... maybe there is no better way for |head. >>>> >>>> The best alternative I can think of is if my codes supply the program name for the "const char man[]" argument of PetscOptionsXXX(). Then I can suggest this to users: >>>> >>>> ./program -help |grep program >>>> >>>> Is this a mis-use of the "man" argument? >>>> >>>> Ed >>>> >>>> PS Amused to find "This routine should not be used from within a signal handler." in the man page for MPI_Abort(), which I reached by clicking on the call to MPI_Abort() in the view of PetscSignalHandlerDefault(). ;-) >>>> >>>> >>>> On Wed, Apr 26, 2017 at 12:55 PM, Barry Smith wrote: >>>> >>>> Is >>>> >>>> ./ex5 -help -no_signal_handler | head >>>> >>>> good enough? >>>> >>>> Normally we won't want to turn off catching of PIPE errors since it might hide other errors. >>>> >>>> Barry >>>> >>>> >>>> >>>>> On Apr 26, 2017, at 3:31 PM, Ed Bueler wrote: >>>>> >>>>> Dear Petsc -- >>>>> >>>>> Copied at the bottom is the behavior I get for ex5.c in snes examples/tutorials. When I do >>>>> >>>>> $./ex5 -help |head >>>>> >>>>> I get a "Caught signal number 13 Broken Pipe" or just a version string. Randomly, one or the other. >>>>> >>>>> I know this has come up on petsc-users before: >>>>> http://lists.mcs.anl.gov/pipermail/petsc-users/2014-May/021848.html >>>>> The claim was that the SIGPIPE error is standard behavior. >>>>> >>>>> My real issue (feature request?) is that when writing a petsc application I want a rational help system. Suppose "program.c" has options with prefix "prg_". The user doesn't know the prefix yet (I could have used "prog_" as the prefix ...) The petsc-canonical help system is (I believe) >>>>> >>>>> ./program -help |grep prg_ >>>>> >>>>> for program.c-specific options. Works great if you know to look for "prg_". 
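[Editor's note: the prefix-based discovery described here, and Jed's proposed "-help snes,ksp_fieldsplit", both amount to filtering option lines by prefix. A hypothetical sketch of that filter; the option names below are illustrative only and are not taken from PETSc's actual -help output:]

```python
def filter_help(lines, prefixes):
    """Keep option lines whose name (minus the leading '-') starts with
    one of the requested prefixes."""
    return [line for line in lines
            if any(line.lstrip("-").startswith(p) for p in prefixes)]

# Illustrative option lines only; real `-help` output is more verbose
# and these exact names are not PETSc's.
help_lines = [
    "-snes_type <newtonls>: nonlinear solver method",
    "-ksp_type <gmres>: Krylov method",
    "-ksp_fieldsplit_schur_fact_type <full>: Schur factorization",
    "-prg_par <6.8>: application-specific Bratu parameter",
]
for line in filter_help(help_lines, ["snes", "ksp_fieldsplit"]):
    print(line)
```

[Note that "-ksp_type" is not matched by the "ksp_fieldsplit" prefix, which is the selectivity Jed's proposal would add over a plain grep.]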
My plan for my applications is to put the prefix in the first line or two of my help string, the user can discover the prefix by >>>>> >>>>> ./program -help |head >>>>> >>>>> and then use "-help |grep -prg_" and etc. >>>>> >>>>> So my question is: Can petsc be set up to not generate an error message when the help system is used in a correct way? (I.e. catching return codes when writing, or handling SIGPIPE differently?) Or, on the other hand, can someone suggest a better help system that irritates less? >>>>> >>>>> Thanks! >>>>> >>>>> Ed >>>>> >>>>> ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head >>>>> Bratu nonlinear PDE in 2d. >>>>> We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular >>>>> domain, using distributed arrays (DMDAs) to partition the parallel grid. >>>>> The command line options include: >>>>> -par , where indicates the problem's nonlinearity >>>>> problem SFI: = Bratu parameter (0 <= par <= 6.81) >>>>> >>>>> -m_par/n_par , where indicates an integer >>>>> that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- >>>>> Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 >>>>> [0]PETSC ERROR: ------------------------------------------------------------------------ >>>>> [0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket >>>>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger >>>>> [0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>>>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors >>>>> [0]PETSC ERROR: likely location of problem given in stack below >>>>> [0]PETSC ERROR: --------------------- Stack Frames ------------------------------------ >>>>> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available, >>>>> [0]PETSC ERROR: 
INSTEAD the line number of the start of the function >>>>> [0]PETSC ERROR: is given. >>>>> [0]PETSC ERROR: [0] PetscVFPrintfDefault line 240 /home/ed/petsc/src/sys/fileio/mprint.c >>>>> [0]PETSC ERROR: [0] PetscHelpPrintfDefault line 622 /home/ed/petsc/src/sys/fileio/mprint.c >>>>> [0]PETSC ERROR: [0] PetscOptionsBegin_Private line 29 /home/ed/petsc/src/sys/objects/aoptions.c >>>>> [0]PETSC ERROR: [0] PetscObjectOptionsBegin_Private line 61 /home/ed/petsc/src/sys/objects/aoptions.c >>>>> [0]PETSC ERROR: [0] PetscDSSetFromOptions line 218 /home/ed/petsc/src/dm/dt/interface/dtds.c >>>>> [0]PETSC ERROR: [0] DMSetFromOptions line 747 /home/ed/petsc/src/dm/interface/dm.c >>>>> [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- >>>>> [0]PETSC ERROR: Signal received >>>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. >>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 >>>>> [0]PETSC ERROR: ./ex5 on a linux-c-dbg named bueler-leopard by ed Wed Apr 26 11:22:07 2017 >>>>> [0]PETSC ERROR: Configure options --download-mpich --with-debugging=1 >>>>> [0]PETSC ERROR: #1 User provided function() line 0 in unknown file >>>>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>> [unset]: aborting job: >>>>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0 >>>>> ~/petsc/src/snes/examples/tutorials[master*]$ ./ex5 -help |head >>>>> Bratu nonlinear PDE in 2d. >>>>> We solve the Bratu (SFI - solid fuel ignition) problem in a 2D rectangular >>>>> domain, using distributed arrays (DMDAs) to partition the parallel grid. 
>>>>> The command line options include: >>>>> -par , where indicates the problem's nonlinearity >>>>> problem SFI: = Bratu parameter (0 <= par <= 6.81) >>>>> >>>>> -m_par/n_par , where indicates an integer >>>>> that MMS3 will be evaluated with 2^m_par, 2^n_par-------------------------------------------------------------------------- >>>>> Petsc Development GIT revision: v3.7.6-3453-ge45481d470 GIT Date: 2017-04-26 13:00:15 -0500 >>>>> ~/petsc/src/snes/examples/tutorials[master*]$ >>>>> >>>>> >>>>> -- >>>>> Ed Bueler >>>>> Dept of Math and Stat and Geophysical Institute >>>>> University of Alaska Fairbanks >>>>> Fairbanks, AK 99775-6660 >>>>> 301C Chapman >>>> >>>> >>>> >>>> >>>> -- >>>> Ed Bueler >>>> Dept of Math and Stat and Geophysical Institute >>>> University of Alaska Fairbanks >>>> Fairbanks, AK 99775-6660 >>>> 301C Chapman -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From gnw20 at cam.ac.uk Thu Apr 27 00:59:02 2017 From: gnw20 at cam.ac.uk (Garth N. Wells) Date: Thu, 27 Apr 2017 06:59:02 +0100 Subject: [petsc-users] Multigrid coarse grid solver In-Reply-To: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> References: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> Message-ID: On 27 April 2017 at 00:30, Barry Smith wrote: > > Yes, you asked for LU so it used LU! > > Of course for smaller coarse grids and large numbers of processes this is very inefficient. > > The default behavior for GAMG is probably what you want. In that case it is equivalent to > -mg_coarse_pc_type bjacobi --mg_coarse_sub_pc_type lu. But GAMG tries hard to put all the coarse grid degrees > of freedom on the first process and none on the rest, so you do end up with the exact equivalent of a direct solver. > Try -ksp_view in that case. > Thanks, Barry. I'm struggling a little to understand the matrix data structure for the coarse grid. 
Is it just an mpiaij matrix, with all entries (usually) on one process? Is there an options key prefix for the matrix on different levels? E.g., to turn on a viewer? If I get GAMG to use more than one process for the coarse grid (a GAMG setting), can I get a parallel LU (exact) solver to solve it using only the processes that store parts of the coarse grid matrix? Related to all this, do the parallel LU solvers internally re-distribute a matrix over the whole MPI communicator as part of their re-ordering phase? Garth > There is also -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse matrix on EACH process and each process does its own factorization and solve. This saves one phase of the communication for each V cycle since every process has the entire solution; it just grabs from itself the values it needs without communication. > > > > >> On Apr 26, 2017, at 5:25 PM, Garth N. Wells wrote: >> >> I'm a bit confused by the selection of the coarse grid solver for >> multigrid. For the demo ksp/ex56, if I do: >> >> mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu >> >> I see >> >> Coarse grid solver -- level ------------------------------- >> KSP Object: (mg_coarse_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_) 1 MPI processes >> type: lu >> out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> matrix ordering: nd >> factor fill ratio given 5., needed 1.
>> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=6, cols=6, bs=6 >> package used to perform factorization: petsc >> total: nonzeros=36, allocated nonzeros=36 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 2 nodes, limit used is 5 >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=6, cols=6, bs=6 >> total: nonzeros=36, allocated nonzeros=36 >> total number of mallocs used during MatSetValues calls =0 >> using I-node routines: found 2 nodes, limit used is 5 >> >> which is what I expect. Increasing from 1 to 2 processes: >> >> mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu >> >> I see >> >> Coarse grid solver -- level ------------------------------- >> KSP Object: (mg_coarse_) 2 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (mg_coarse_) 2 MPI processes >> type: lu >> out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> matrix ordering: natural >> factor fill ratio given 0., needed 0. 
>> Factored matrix follows: >> Mat Object: 2 MPI processes >> type: superlu_dist >> rows=6, cols=6 >> package used to perform factorization: superlu_dist >> total: nonzeros=0, allocated nonzeros=0 >> total number of mallocs used during MatSetValues calls =0 >> SuperLU_DIST run parameters: >> Process grid nprow 2 x npcol 1 >> Equilibrate matrix TRUE >> Matrix input mode 1 >> Replace tiny pivots FALSE >> Use iterative refinement FALSE >> Processors in row 2 col partition 1 >> Row permutation LargeDiag >> Column permutation METIS_AT_PLUS_A >> Parallel symbolic factorization FALSE >> Repeated factorization SamePattern >> linear system matrix = precond matrix: >> Mat Object: 2 MPI processes >> type: mpiaij >> rows=6, cols=6, bs=6 >> total: nonzeros=36, allocated nonzeros=36 >> total number of mallocs used during MatSetValues calls =0 >> using I-node (on process 0) routines: found 2 nodes, limit used is 5 >> >> Note that the coarse grid is now using superlu_dist. Is the coarse >> grid being solved in parallel? >> >> Garth > From hgbk2008 at gmail.com Thu Apr 27 02:14:14 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Thu, 27 Apr 2017 09:14:14 +0200 Subject: [petsc-users] strange convergence In-Reply-To: References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> Message-ID: I have changed the way to tie the nonconforming mesh. 
It seems the matrix now is better with -pc_type lu the output is 0 KSP preconditioned resid norm 3.308678584240e-01 true resid norm 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.004313395301e-12 true resid norm 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 Linear solve converged due to CONVERGED_ATOL iterations 1 with -pc_type fieldsplit -fieldsplit_u_pc_type hypre -fieldsplit_wp_pc_type lu the convergence is slow 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 ... 824 KSP preconditioned resid norm 1.018542387738e-09 true resid norm 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 825 KSP preconditioned resid norm 9.743727947637e-10 true resid norm 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 Linear solve converged due to CONVERGED_ATOL iterations 825 checking with additional -fieldsplit_u_ksp_type richardson -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor -fieldsplit_wp_ksp_max_it 1 gives 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 5.803507549280e-01 1 KSP Residual norm 2.069538175950e-01 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 0.000000000000e+00 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 7.831796195225e-01 1 KSP Residual norm 1.734608520110e-01 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 0.000000000000e+00 .... 
823 KSP preconditioned resid norm 1.065070135605e-09 true resid norm 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 6.113806394327e-01 1 KSP Residual norm 1.535465290944e-01 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 0.000000000000e+00 824 KSP preconditioned resid norm 1.018542387746e-09 true resid norm 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 6.123437055586e-01 1 KSP Residual norm 1.524661826133e-01 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 0.000000000000e+00 825 KSP preconditioned resid norm 9.743727947718e-10 true resid norm 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 Linear solve converged due to CONVERGED_ATOL iterations 825 The residual for the wp block is zero since in this first step the rhs is zero. As can be seen in the output, the multigrid does not perform well in reducing the residual in the sub-solve. Is my observation right? What can be done to improve this? Giang On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith wrote: > > This can happen if the matrix is singular or nearly singular or if the > factorization generates small pivots, which can occur for even nonsingular > problems if the matrix is poorly scaled or just plain nasty.
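[Editor's note: Barry's point about small pivots can be seen even at 2x2 scale. A minimal, pure-Python sketch, not PETSc code: Gaussian elimination on a nonsingular but badly scaled system, with and without partial pivoting; the exact solution is approximately (1, 1).]

```python
def solve2(a, b, pivot):
    """Solve the 2x2 system a*x = b by Gaussian elimination.

    `a` is a list of two row-lists and `b` a list of two numbers; both
    are modified in place. `pivot` enables partial (row) pivoting.
    """
    if pivot and abs(a[1][0]) > abs(a[0][0]):
        a[0], a[1] = a[1], a[0]
        b[0], b[1] = b[1], b[0]
    m = a[1][0] / a[0][0]          # elimination multiplier; huge if the pivot is tiny
    a[1][1] -= m * a[0][1]
    b[1] -= m * b[0]
    x1 = b[1] / a[1][1]
    x0 = (b[0] - a[0][1] * x1) / a[0][0]
    return [x0, x1]

A = [[1e-18, 1.0], [1.0, 1.0]]     # nonsingular, but a[0][0] is a tiny pivot
rhs = [1.0, 2.0]
print(solve2([r[:] for r in A], rhs[:], pivot=False))  # x0 comes out badly wrong
print(solve2([r[:] for r in A], rhs[:], pivot=True))   # close to [1.0, 1.0]
```

[Without pivoting the multiplier 1/1e-18 swamps the second row and the computed x0 is wrong by O(1), exactly the kind of silent inaccuracy a near-singular or badly scaled factorization can produce.]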
> > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui wrote: > > > > It took a while, here I send you the output > > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm > 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm > 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm > 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm > 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > KSP Object: 4 MPI processes > > type: gmres > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > GMRES: happy breakdown tolerance 1e-30 > > maximum iterations=1000, initial guess is zero > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 4 MPI processes > > type: lu > > LU: out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 0, needed 0 > > Factored matrix follows: > > Mat Object: 4 MPI processes > > type: mpiaij > > rows=973051, cols=973051 > > package used to perform factorization: pastix > > Error : 3.24786e-14 > > total: nonzeros=0, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > PaStiX run parameters: > > Matrix type : Unsymmetric > > Level of printing (0,1,2): 0 > > Number of refinements iterations : 3 > > Error : 3.24786e-14 > > linear system matrix = precond matrix: > > Mat Object: 4 MPI processes > > type: mpiaij > > rows=973051, cols=973051 > > Error : 3.24786e-14 > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > total number of mallocs used during MatSetValues calls =0 > > using I-node (on process 0) routines: found 78749 
nodes, limit > used is 5 > > Error : 3.24786e-14 > > > > It doesn't do as you said. Something is not right here. I will look in > depth. > > > > Giang > > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith wrote: > > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui > wrote: > > > > > > Good catch. I get this for the very first step, maybe at that time the > rhs_w is zero. > > > > With the multiplicative composition the right hand side of the > second solve is the initial right hand side of the second solve minus > A_10*x where x is the solution to the first sub solve and A_10 is the lower > left block of the outer matrix. So unless both the initial right hand side > has a zero for the second block and A_10 is identically zero the right hand > side for the second sub solve should not be zero. Is A_10 == 0? > > > > > > > In the later step, it shows 2 step convergence > > > > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 3.165886479830e+04 > > > 1 KSP Residual norm 2.905922877684e-01 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 2.397669419027e-01 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid norm > 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 9.999891813771e-01 > > > 1 KSP Residual norm 1.512000395579e-05 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 8.192702188243e-06 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid norm > 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 > > > > The outer residual norms are still wonky, the preconditioned > residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which is a > huge drop but the 7.963616922323e+05 drops very much less > 7.135927677844e+04. This is not normal. 
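[Editor's note on why Barry flags this as abnormal: the monitor's "preconditioned resid norm" is ||B r|| while the "true resid norm" is ||r||, and a badly scaled preconditioner B can make the two differ by many orders of magnitude. A minimal sketch with a diagonal B; the numbers are illustrative, not taken from this solve:]

```python
import math

def norm2(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

r = [3.0e4, 4.0e4]        # a large true residual b - A*x
B = [1.0e-9, 1.0e-9]      # ill-scaled diagonal preconditioner applied to r
true_norm = norm2(r)
prec_norm = norm2([bi * ri for bi, ri in zip(B, r)])
print(f"true resid norm {true_norm:.3e}, preconditioned resid norm {prec_norm:.3e}")
```

[Here the preconditioned norm is nine orders of magnitude smaller than the true norm, so a monitor watching only ||B r|| would report convergence long before the true residual has dropped.]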
> > > > What if you just use -pc_type lu for the entire system (no > fieldsplit), does the true residual drop to almost zero in the first > iteration (as it should?). Send the output. > > > > > > > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 6.946213936597e-01 > > > 1 KSP Residual norm 1.195514007343e-05 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 1.025694497535e+00 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid norm > 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 7.255149996405e-01 > > > 1 KSP Residual norm 6.583512434218e-06 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 1.015229700337e+00 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid norm > 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 3.512243341400e-01 > > > 1 KSP Residual norm 2.032490351200e-06 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 1.282327290982e+00 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid norm > 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 3.423609338053e-01 > > > 1 KSP Residual norm 4.213703301972e-07 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 1.157384757538e+00 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid norm > 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 3.838596289995e-01 > > > 1 KSP Residual norm 9.927864176103e-08 > > > Residual norms for fieldsplit_wp_ solve. 
> > > 0 KSP Residual norm 1.066298905618e+00 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid norm > 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 4.624964188094e-01 > > > 1 KSP Residual norm 6.418229775372e-08 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 9.800784311614e-01 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid norm > 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 > > > Linear solve converged due to CONVERGED_ATOL iterations 7 > > > > > > The outer operator is an explicit matrix. > > > > > > Giang > > > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith > wrote: > > > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui > wrote: > > > > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better > convergence. I still used 4 procs though, probably with 1 proc it should > also be the same. > > > > > > > > The u block used a Nitsche-type operator to connect two non-matching > domains. I don't think it leaves any rigid body motion due to insufficient > constraints. Maybe you have another idea? > > > > > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 3.129067184300e+05 > > > > 1 KSP Residual norm 5.906261468196e-01 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > ^^^^ something is wrong here. The sub solve should not be starting > with a 0 residual (this means the right hand side for this sub solve is > zero, which it should not be). > > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > > > > > How are you providing the outer operator? As an explicit matrix or > with some shell matrix?
> > > > > > > > > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid norm > 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 9.999955993437e-01 > > > > 1 KSP Residual norm 4.019774691831e-06 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid norm > 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 1.000012180204e+00 > > > > 1 KSP Residual norm 1.017367950422e-05 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true resid norm > 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 1.000004200085e+00 > > > > 1 KSP Residual norm 6.231613102458e-06 > > > > Residual norms for fieldsplit_wp_ solve. 
> > > > 0 KSP Residual norm 0.000000000000e+00 > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid norm > 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > > > KSP Object: 4 MPI processes > > > > type: gmres > > > > GMRES: restart=1000, using Modified Gram-Schmidt > Orthogonalization > > > > GMRES: happy breakdown tolerance 1e-30 > > > > maximum iterations=1000, initial guess is zero > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > left preconditioning > > > > using PRECONDITIONED norm type for convergence test > > > > PC Object: 4 MPI processes > > > > type: fieldsplit > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > Solver info for each split is in the following KSP objects: > > > > Split number 0 Defined by IS > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > type: richardson > > > > Richardson: damping factor=1 > > > > maximum iterations=1, initial guess is zero > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > left preconditioning > > > > using PRECONDITIONED norm type for convergence test > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > type: lu > > > > LU: out-of-place factorization > > > > tolerance for zero pivot 2.22045e-14 > > > > matrix ordering: natural > > > > factor fill ratio given 0, needed 0 > > > > Factored matrix follows: > > > > Mat Object: 4 MPI processes > > > > type: mpiaij > > > > rows=938910, cols=938910 > > > > package used to perform factorization: pastix > > > > total: nonzeros=0, allocated nonzeros=0 > > > > Error : 3.36878e-14 > > > > total number of mallocs used during MatSetValues calls =0 > > > > PaStiX run parameters: > > > > Matrix type : Unsymmetric > > > > Level of printing (0,1,2): 0 > > > > Number of refinements iterations : 3 > > > > Error : 3.36878e-14 > > > > linear system matrix = precond matrix: > > > > Mat Object: (fieldsplit_u_) 
4 MPI processes > > > > type: mpiaij > > > > rows=938910, cols=938910, bs=3 > > > > Error : 3.36878e-14 > > > > Error : 3.36878e-14 > > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > > total number of mallocs used during MatSetValues calls =0 > > > > using I-node (on process 0) routines: found 78749 nodes, > limit used is 5 > > > > Split number 1 Defined by IS > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > type: richardson > > > > Richardson: damping factor=1 > > > > maximum iterations=1, initial guess is zero > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > left preconditioning > > > > using PRECONDITIONED norm type for convergence test > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > type: lu > > > > LU: out-of-place factorization > > > > tolerance for zero pivot 2.22045e-14 > > > > matrix ordering: natural > > > > factor fill ratio given 0, needed 0 > > > > Factored matrix follows: > > > > Mat Object: 4 MPI processes > > > > type: mpiaij > > > > rows=34141, cols=34141 > > > > package used to perform factorization: pastix > > > > Error : -nan > > > > Error : -nan > > > > Error : -nan > > > > total: nonzeros=0, allocated nonzeros=0 > > > > total number of mallocs used during MatSetValues calls > =0 > > > > PaStiX run parameters: > > > > Matrix type : Symmetric > > > > Level of printing (0,1,2): 0 > > > > Number of refinements iterations : 0 > > > > Error : -nan > > > > linear system matrix = precond matrix: > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > type: mpiaij > > > > rows=34141, cols=34141 > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > total number of mallocs used during MatSetValues calls =0 > > > > not using I-node (on process 0) routines > > > > linear system matrix = precond matrix: > > > > Mat Object: 4 MPI processes > > > > type: mpiaij > > > > rows=973051, cols=973051 > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > 
total number of mallocs used during MatSetValues calls =0 > > > > using I-node (on process 0) routines: found 78749 nodes, limit > used is 5 > > > > > > > > > > > > > > > > Giang > > > > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith > wrote: > > > > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui > wrote: > > > > > > > > > > Dear Matt/Barry > > > > > > > > > > With your options, it results in > > > > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid > norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 2.407308987203e+36 > > > > > 1 KSP Residual norm 5.797185652683e+72 > > > > It looks like Matt is right, hypre is seemingly producing useless > garbage. > > > > First, how do things run on one process? If you have similar problems > then debug on one process (debugging any kind of problem is always far easier > on one process). > > > > First run with -fieldsplit_u_type lu (instead of using hypre) to see > if that works or also produces something bad. > > > > What is the operator and the boundary conditions for u? It could be > singular. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > ... > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true resid > norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 1.533726746719e+36 > > > > > 1 KSP Residual norm 3.692757392261e+72 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > > > > > Do you suggest that the pastix solver for the "wp" block > encounters a small pivot? In addition, it seems like the "u" block is also > singular.
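The small-pivot scenario asked about above is easy to reproduce outside PETSc. Below is a minimal NumPy sketch (a made-up 2x2 system with a 1e-15 pivot, not the actual PaStiX factorization): a tiny pivot in the preconditioner's factorization inflates the left-preconditioned residual by roughly 1/pivot, while the true residual stays at O(||b||), which is exactly the pattern in the logs above.

```python
import numpy as np

# Hypothetical 2x2 operator whose factorization contains one tiny pivot.
A = np.diag([1.0, 1e-15])
b = np.ones(2)

x0 = np.zeros(2)                     # zero initial guess, as in the logs
r_true = b - A @ x0                  # true residual, ||r|| ~ ||b||
r_prec = np.linalg.solve(A, r_true)  # left-preconditioned residual M^{-1} r

print(np.linalg.norm(r_true))  # ~1.414e+00, normal
print(np.linalg.norm(r_prec))  # ~1.0e+15, blown up by the 1/pivot factor
```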
> > > > > > > > > > Giang > > > > > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith > wrote: > > > > > > > > > > Huge preconditioned norms but normal unpreconditioned norms > almost always come from a very small pivot in an LU or ILU factorization. > > > > > > > > > > The first thing to do is monitor the two sub solves. Run with > the additional options -fieldsplit_u_ksp_type richardson > -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 > -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor > -fieldsplit_wp_ksp_max_it 1 > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < > hgbk2008 at gmail.com> wrote: > > > > > > > > > > > > Hello > > > > > > > > > > > > I encountered a strange convergence behavior that I have trouble > understanding > > > > > > > > > > > > KSPSetFromOptions completed > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid > norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid > norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid > norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid > norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > > > > > > .....
> > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true resid > norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid > norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > > > > > > Linear solve did not converge due to DIVERGED_ITS iterations 1000 > > > > > > KSP Object: 4 MPI processes > > > > > > type: gmres > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt > Orthogonalization > > > > > > GMRES: happy breakdown tolerance 1e-30 > > > > > > maximum iterations=1000, initial guess is zero > > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > > > left preconditioning > > > > > > using PRECONDITIONED norm type for convergence test > > > > > > PC Object: 4 MPI processes > > > > > > type: fieldsplit > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > > Solver info for each split is in the following KSP objects: > > > > > > Split number 0 Defined by IS > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > > > type: preonly > > > > > > maximum iterations=10000, initial guess is zero > > > > > > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000 > > > > > > left preconditioning > > > > > > using NONE norm type for convergence test > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > > > type: hypre > > > > > > HYPRE BoomerAMG preconditioning > > > > > > HYPRE BoomerAMG: Cycle type V > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 > > > > > > HYPRE BoomerAMG: Maximum number of iterations PER hypre > call 1 > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > > > > > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > > > > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > > > > > HYPRE BoomerAMG: Number of levels of aggressive > coarsening 0 > > > > > > HYPRE BoomerAMG: 
Number of paths for aggressive > coarsening 1 > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 > > > > > > HYPRE BoomerAMG: Sweeps down 1 > > > > > > HYPRE BoomerAMG: Sweeps up 1 > > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 > > > > > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > > > > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > > > > > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > > > > > HYPRE BoomerAMG: Using CF-relaxation > > > > > > HYPRE BoomerAMG: Measure type local > > > > > > HYPRE BoomerAMG: Coarsen type PMIS > > > > > > HYPRE BoomerAMG: Interpolation type classical > > > > > > linear system matrix = precond matrix: > > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=938910, cols=938910, bs=3 > > > > > > total: nonzeros=8.60906e+07, allocated > nonzeros=8.60906e+07 > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > using I-node (on process 0) routines: found 78749 > nodes, limit used is 5 > > > > > > Split number 1 Defined by IS > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > > > type: preonly > > > > > > maximum iterations=10000, initial guess is zero > > > > > > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000 > > > > > > left preconditioning > > > > > > using NONE norm type for convergence test > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > > > type: lu > > > > > > LU: out-of-place factorization > > > > > > tolerance for zero pivot 2.22045e-14 > > > > > > matrix ordering: natural > > > > > > factor fill ratio given 0, needed 0 > > > > > > Factored matrix follows: > > > > > > Mat Object: 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=34141, cols=34141 > > > > > > package used to perform factorization: pastix > > > > > > Error : -nan > > > > > > Error : -nan > > > > > > total: 
nonzeros=0, allocated nonzeros=0 > > > > > > Error : -nan > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > PaStiX run parameters: > > > > > > Matrix type : Symmetric > > > > > > Level of printing (0,1,2): 0 > > > > > > Number of refinements iterations : 0 > > > > > > Error : -nan > > > > > > linear system matrix = precond matrix: > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=34141, cols=34141 > > > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > not using I-node (on process 0) routines > > > > > > linear system matrix = precond matrix: > > > > > > Mat Object: 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=973051, cols=973051 > > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > using I-node (on process 0) routines: found 78749 nodes, > limit used is 5 > > > > > > > > > > > > The pattern of convergence gives a hint that this system is > somehow bad/singular. But I don't know why the preconditioned error goes up > too high. Anyone has an idea? > > > > > > > > > > > > Best regards > > > > > > Giang Bui > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From neok.m4700 at gmail.com Thu Apr 27 03:46:59 2017 From: neok.m4700 at gmail.com (neok m4700) Date: Thu, 27 Apr 2017 10:46:59 +0200 Subject: [petsc-users] explanations on DM_BOUNDARY_PERIODIC Message-ID: Hi, I am trying to change my problem to using periodic boundary conditions. However, when I use DMDASetUniformCoordinates on the DA, the spacing changes. This is due to an additional point e.g. 
in dm/impls/da/gr1.c

    else if (dim == 2) {
      if (bx == DM_BOUNDARY_PERIODIC) hx = (xmax-xmin)/(M);
      else                            hx = (xmax-xmin)/(M-1);
      if (by == DM_BOUNDARY_PERIODIC) hy = (ymax-ymin)/(N);
      else                            hy = (ymax-ymin)/(N-1);

I don't understand the logic here: since xmin and xmax refer to the physical domain, how does changing to a periodic BC change the discretization? Could someone clarify or point to a reference? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Apr 27 07:35:19 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 27 Apr 2017 08:35:19 -0400 Subject: [petsc-users] Multigrid coarse grid solver In-Reply-To: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> References: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> Message-ID: On Wed, Apr 26, 2017 at 7:30 PM, Barry Smith wrote: > > Yes, you asked for LU so it used LU! > > Of course for smaller coarse grids and large numbers of processes this > is very inefficient. > > The default behavior for GAMG is probably what you want. In that case > it is equivalent to > -mg_coarse_pc_type bjacobi --mg_coarse_sub_pc_type lu. But GAMG tries > hard No, it just slams those puppies onto proc 0 :) > to put all the coarse grid degrees > of freedom on the first process and none on the rest, so you do end up > with the exact equivalent of a direct solver. > Try -ksp_view in that case. > > There is also -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type > lu. In that case it makes a copy of the coarse matrix on EACH process and > each process does its own factorization and solve. This saves one phase of > the communication for each V cycle since every process has the entire > solution it just grabs from itself the values it needs without > communication. > > > > > > On Apr 26, 2017, at 5:25 PM, Garth N. Wells wrote: > > > > I'm a bit confused by the selection of the coarse grid solver for > > multigrid.
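Coming back to the DM_BOUNDARY_PERIODIC question quoted above: my reading of that gr1.c logic is that a periodic direction stores M points but spans M cells, because the point at xmax is identified with the point at xmin and is therefore not stored, while a non-periodic direction's M points include both endpoints and so span only M-1 cells. A small Python sketch of this (made-up numbers; DMDA itself is not involved):

```python
def uniform_coords(xmin, xmax, M, periodic):
    """Mirror the gr1.c spacing rule: M stored points either way, but a
    periodic direction has M cells (xmax wraps to xmin), while a
    non-periodic direction has M-1 cells (both endpoints stored)."""
    hx = (xmax - xmin) / M if periodic else (xmax - xmin) / (M - 1)
    return hx, [xmin + i * hx for i in range(M)]

hx_p, pts_p = uniform_coords(0.0, 1.0, 4, periodic=True)
hx_n, pts_n = uniform_coords(0.0, 1.0, 4, periodic=False)
print(hx_p, pts_p)  # 0.25 [0.0, 0.25, 0.5, 0.75]; x=1.0 is the same point as x=0.0
print(hx_n, pts_n)  # hx = 1/3; points 0, 1/3, 2/3 and the endpoint 1.0
```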
For the demo ksp/ex56, if I do: > > > > mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg > > -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu > > > > I see > > > > Coarse grid solver -- level ------------------------------- > > KSP Object: (mg_coarse_) 1 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (mg_coarse_) 1 MPI processes > > type: lu > > out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: nd > > factor fill ratio given 5., needed 1. > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=6, cols=6, bs=6 > > package used to perform factorization: petsc > > total: nonzeros=36, allocated nonzeros=36 > > total number of mallocs used during MatSetValues calls =0 > > using I-node routines: found 2 nodes, limit used is 5 > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=6, cols=6, bs=6 > > total: nonzeros=36, allocated nonzeros=36 > > total number of mallocs used during MatSetValues calls =0 > > using I-node routines: found 2 nodes, limit used is 5 > > > > which is what I expect. Increasing from 1 to 2 processes: > > > > mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg > > -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu > > > > I see > > > > Coarse grid solver -- level ------------------------------- > > KSP Object: (mg_coarse_) 2 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (mg_coarse_) 2 MPI processes > > type: lu > > out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 0., needed 0. 
> > Factored matrix follows: > > Mat Object: 2 MPI processes > > type: superlu_dist > > rows=6, cols=6 > > package used to perform factorization: superlu_dist > > total: nonzeros=0, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > SuperLU_DIST run parameters: > > Process grid nprow 2 x npcol 1 > > Equilibrate matrix TRUE > > Matrix input mode 1 > > Replace tiny pivots FALSE > > Use iterative refinement FALSE > > Processors in row 2 col partition 1 > > Row permutation LargeDiag > > Column permutation METIS_AT_PLUS_A > > Parallel symbolic factorization FALSE > > Repeated factorization SamePattern > > linear system matrix = precond matrix: > > Mat Object: 2 MPI processes > > type: mpiaij > > rows=6, cols=6, bs=6 > > total: nonzeros=36, allocated nonzeros=36 > > total number of mallocs used during MatSetValues calls =0 > > using I-node (on process 0) routines: found 2 nodes, limit used > is 5 > > > > Note that the coarse grid is now using superlu_dist. Is the coarse > > grid being solved in parallel? > > > > Garth > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Apr 27 07:45:12 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 27 Apr 2017 08:45:12 -0400 Subject: [petsc-users] Multigrid coarse grid solver In-Reply-To: References: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> Message-ID: Barry, we seem to get an error when you explicitly set this. Garth, Maybe to set the default explicitly you need to use pc_type asm -sub_pc_type lu. That is the true default. 
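For reference, the coarse-solver spellings discussed in this thread, side by side (a sketch only; the ex56 arguments are the ones used earlier in the thread, and the "default" line is Barry's description of what GAMG effectively does, not something I have re-verified):

```sh
# GAMG's effective default: block Jacobi with LU on each block, with
# (nearly) all coarse degrees of freedom aggregated onto process 0
mpirun -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
    -mg_coarse_pc_type bjacobi -mg_coarse_sub_pc_type lu

# Redundant variant: every process factors its own full copy of the
# coarse matrix, saving one communication phase per V cycle
mpirun -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
    -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu
```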
More below but this is the error message: 17:46 knepley/feature-plasma-example *= ~/Codes/petsc/src/ksp/ksp/examples/tutorials$ /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package petsc [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for possible LU and Cholesky solvers [0]PETSC ERROR: MatSolverPackage petsc does not support matrix type mpiaij [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3658-g99fa2798da GIT Date: 2017-04-25 12:56:20 -0500 [0]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-5.local by markadams Wed Apr 26 17:46:28 2017 [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ COPTFLAGS="-g -O0 -mavx2" CXXOPTFLAGS="-g -O0 -mavx2" F On Thu, Apr 27, 2017 at 1:59 AM, Garth N. Wells wrote: > On 27 April 2017 at 00:30, Barry Smith wrote: > > > > Yes, you asked for LU so it used LU! > > > > Of course for smaller coarse grids and large numbers of processes > this is very inefficient. > > > > The default behavior for GAMG is probably what you want. In that case > it is equivalent to > > -mg_coarse_pc_type bjacobi --mg_coarse_sub_pc_type lu. But GAMG tries > hard to put all the coarse grid degrees > > of freedom on the first process and none on the rest, so you do end up > with the exact equivalent of a direct solver. > > Try -ksp_view in that case. > > > > Thanks, Barry. > > I'm struggling a little to understand the matrix data structure for > the coarse grid. Is it just an mpiaij matrix, with all entries > (usually) on one process? > Yes. > > Is there an options key prefix for the matrix on different levels? > E.g., to turn on a viewer?
> something like -mg_level_1_ksp_view should work (run with -help to get the correct syntax). > > If I get GAMG to use more than one process for the coarse grid (a GAMG > setting), can I get a parallel LU (exact) solver to solve it using > only the processes that store parts of the coarse grid matrix? > No, we should make a sub communicator for the active processes only, but I am not too motivated to do this because the only reason that this matters is if 1) a solver (ie, the parallel direct solver) is lazy and puts reductions everywhere for no good reason, or 2) you use a Krylov solver (very uncommon). All of the communication in a non-Krylov solver is point to point and there is no win that I know of with a sub communicator. Note, the redundant coarse grid solver does use a subcommunicator, obviously, but I think it is hardwired to PETSC_COMM_SELF, but maybe not? > > Related to all this, do the parallel LU solvers internally > re-distribute a matrix over the whole MPI communicator as part of > their re-ordering phase? > They better not! I doubt any solver would be that eager by default. > > Garth > > > There is also -mg_coarse_pc_type redundant > -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse > matrix on EACH process and each process does its own factorization and > solve. This saves one phase of the communication for each V cycle since > every process has the entire solution it just grabs from itself the values > it needs without communication. > > > > > > > > > >> On Apr 26, 2017, at 5:25 PM, Garth N. Wells wrote: > >> > >> I'm a bit confused by the selection of the coarse grid solver for > >> multigrid.
For the demo ksp/ex56, if I do: > >> > >> mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg > >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu > >> > >> I see > >> > >> Coarse grid solver -- level ------------------------------- > >> KSP Object: (mg_coarse_) 1 MPI processes > >> type: preonly > >> maximum iterations=10000, initial guess is zero > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> left preconditioning > >> using NONE norm type for convergence test > >> PC Object: (mg_coarse_) 1 MPI processes > >> type: lu > >> out-of-place factorization > >> tolerance for zero pivot 2.22045e-14 > >> matrix ordering: nd > >> factor fill ratio given 5., needed 1. > >> Factored matrix follows: > >> Mat Object: 1 MPI processes > >> type: seqaij > >> rows=6, cols=6, bs=6 > >> package used to perform factorization: petsc > >> total: nonzeros=36, allocated nonzeros=36 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node routines: found 2 nodes, limit used is 5 > >> linear system matrix = precond matrix: > >> Mat Object: 1 MPI processes > >> type: seqaij > >> rows=6, cols=6, bs=6 > >> total: nonzeros=36, allocated nonzeros=36 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node routines: found 2 nodes, limit used is 5 > >> > >> which is what I expect. Increasing from 1 to 2 processes: > >> > >> mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg > >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu > >> > >> I see > >> > >> Coarse grid solver -- level ------------------------------- > >> KSP Object: (mg_coarse_) 2 MPI processes > >> type: preonly > >> maximum iterations=10000, initial guess is zero > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> >> left preconditioning > >> using NONE norm type for convergence test > >> PC Object: (mg_coarse_) 2 MPI processes > >> type: lu > >> out-of-place factorization > >> tolerance for zero pivot 2.22045e-14 > >> matrix ordering: natural > >> factor fill ratio given 0., needed 0. > >> Factored matrix follows: > >> Mat Object: 2 MPI processes > >> type: superlu_dist > >> rows=6, cols=6 > >> package used to perform factorization: superlu_dist > >> total: nonzeros=0, allocated nonzeros=0 > >> total number of mallocs used during MatSetValues calls =0 > >> SuperLU_DIST run parameters: > >> Process grid nprow 2 x npcol 1 > >> Equilibrate matrix TRUE > >> Matrix input mode 1 > >> Replace tiny pivots FALSE > >> Use iterative refinement FALSE > >> Processors in row 2 col partition 1 > >> Row permutation LargeDiag > >> Column permutation METIS_AT_PLUS_A > >> Parallel symbolic factorization FALSE > >> Repeated factorization SamePattern > >> linear system matrix = precond matrix: > >> Mat Object: 2 MPI processes > >> type: mpiaij > >> rows=6, cols=6, bs=6 > >> total: nonzeros=36, allocated nonzeros=36 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node (on process 0) routines: found 2 nodes, limit > used is 5 > >> > >> Note that the coarse grid is now using superlu_dist. Is the coarse > >> grid being solved in parallel? > >> > >> Garth > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gnw20 at cam.ac.uk Thu Apr 27 08:27:47 2017 From: gnw20 at cam.ac.uk (Garth N. Wells) Date: Thu, 27 Apr 2017 14:27:47 +0100 Subject: [petsc-users] Multigrid coarse grid solver In-Reply-To: References: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> Message-ID: On 27 April 2017 at 13:45, Mark Adams wrote: > Barry, we seem to get an error when you explicitly set this. > > Garth, Maybe to set the default explicitly you need to use pc_type asm > -sub_pc_type lu. That is the true default. 
> > More below but this is the error message: > > 17:46 knepley/feature-plasma-example *= > ~/Codes/petsc/src/ksp/ksp/examples/tutorials$ > /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -np 2 ./ex56 -ne > 16 -pc_type gamg -ksp_view -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu > -mg_coarse_pc_factor_mat_solver_package petsc > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: See > http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for > possible LU and Cholesky solvers > [0]PETSC ERROR: MatSolverPackage petsc does not support matrix type mpiaij > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3658-g99fa2798da GIT > Date: 2017-04-25 12:56:20 -0500 > [0]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-5.local by > markadams Wed Apr 26 17:46:28 2017 > [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ > COPTFLAGS="-g -O0 -mavx2" CXXOPTFLAGS="-g -O0 -mavx2" F > > > On Thu, Apr 27, 2017 at 1:59 AM, Garth N. Wells wrote: >> >> On 27 April 2017 at 00:30, Barry Smith wrote: >> > >> > Yes, you asked for LU so it used LU! >> > >> > Of course for smaller coarse grids and large numbers of processes >> > this is very inefficient. >> > >> > The default behavior for GAMG is probably what you want. In that case >> > it is equivalent to >> > -mg_coarse_pc_type bjacobi --mg_coarse_sub_pc_type lu. But GAMG tries >> > hard to put all the coarse grid degrees >> > of freedom on the first process and none on the rest, so you do end up >> > with the exact equivalent of a direct solver. >> > Try -ksp_view in that case. >> > >> >> Thanks, Barry. >> >> I'm struggling a little to understand the matrix data structure for >> the coarse grid. Is it just a mpiaji matrix, with all entries >> (usually) on one process? > > > Yes. 
> >> >> >> Is there an options key prefix for the matrix on different levels? >> E.g., to turn on a viewer? > > > something like -mg_level_1_ksp_view should work (run with -help to get the > correct syntax). > Does the matrix operator(s) associated with the ksp have an options prefix? >> >> >> If I get GAMG to use more than one process for the coarse grid (a GAMG >> setting), can I get a parallel LU (exact) solver to solve it using >> only the processes that store parts of the coarse grid matrix? > > > No, we should make a sub communicator for the active processes only, but I > am not too motivated to do this because the only reason that this matters is > if 1) a solver (ie, the parallel direct solver) is lazy and puts reductions > everywhere for not good reason, or 2) you use a Krylov solver (very > uncommon). All of the communication in a non-krylov solver in point to point > and there is no win that I know of with a sub communicator. > > Note, the redundant coarse grid solver does use a subcommuncator, obviously, > but I think it is hardwired to PETSC_COMM_SELF, but maybe not? > >> >> >> Related to all this, do the parallel LU solvers internally >> re-distribute a matrix over the whole MPI communicator as part of >> their re-ordering phase? > > > They better not! > I did a test with MUMPS, and from the MUMPS diagnostics (memory use per process) it appears that it does split the matrix across all processes. Garth > I doubt any solver would be that eager by default. > >> >> >> Garth >> >> > There is also -mg_coarse_pc_type redundant >> > -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse >> > matrix on EACH process and each process does its own factorization and >> > solve. This saves one phase of the communication for each V cycle since >> > every process has the entire solution it just grabs from itself the values >> > it needs without communication. >> > >> > >> > >> > >> >> On Apr 26, 2017, at 5:25 PM, Garth N. 
Wells wrote: >> >> >> >> I'm a bit confused by the selection of the coarse grid solver for >> >> multigrid. For the demo ksp/ex56, if I do: >> >> >> >> mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg >> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu >> >> >> >> I see >> >> >> >> Coarse grid solver -- level ------------------------------- >> >> KSP Object: (mg_coarse_) 1 MPI processes >> >> type: preonly >> >> maximum iterations=10000, initial guess is zero >> >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> >> left preconditioning >> >> using NONE norm type for convergence test >> >> PC Object: (mg_coarse_) 1 MPI processes >> >> type: lu >> >> out-of-place factorization >> >> tolerance for zero pivot 2.22045e-14 >> >> matrix ordering: nd >> >> factor fill ratio given 5., needed 1. >> >> Factored matrix follows: >> >> Mat Object: 1 MPI processes >> >> type: seqaij >> >> rows=6, cols=6, bs=6 >> >> package used to perform factorization: petsc >> >> total: nonzeros=36, allocated nonzeros=36 >> >> total number of mallocs used during MatSetValues calls =0 >> >> using I-node routines: found 2 nodes, limit used is 5 >> >> linear system matrix = precond matrix: >> >> Mat Object: 1 MPI processes >> >> type: seqaij >> >> rows=6, cols=6, bs=6 >> >> total: nonzeros=36, allocated nonzeros=36 >> >> total number of mallocs used during MatSetValues calls =0 >> >> using I-node routines: found 2 nodes, limit used is 5 >> >> >> >> which is what I expect. Increasing from 1 to 2 processes: >> >> >> >> mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg >> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu >> >> >> >> I see >> >> >> >> Coarse grid solver -- level ------------------------------- >> >> KSP Object: (mg_coarse_) 2 MPI processes >> >> type: preonly >> >> maximum iterations=10000, initial guess is zero >> >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>> >> left preconditioning >> >> using NONE norm type for convergence test >> >> PC Object: (mg_coarse_) 2 MPI processes >> >> type: lu >> >> out-of-place factorization >> >> tolerance for zero pivot 2.22045e-14 >> >> matrix ordering: natural >> >> factor fill ratio given 0., needed 0. >> >> Factored matrix follows: >> >> Mat Object: 2 MPI processes >> >> type: superlu_dist >> >> rows=6, cols=6 >> >> package used to perform factorization: superlu_dist >> >> total: nonzeros=0, allocated nonzeros=0 >> >> total number of mallocs used during MatSetValues calls =0 >> >> SuperLU_DIST run parameters: >> >> Process grid nprow 2 x npcol 1 >> >> Equilibrate matrix TRUE >> >> Matrix input mode 1 >> >> Replace tiny pivots FALSE >> >> Use iterative refinement FALSE >> >> Processors in row 2 col partition 1 >> >> Row permutation LargeDiag >> >> Column permutation METIS_AT_PLUS_A >> >> Parallel symbolic factorization FALSE >> >> Repeated factorization SamePattern >> >> linear system matrix = precond matrix: >> >> Mat Object: 2 MPI processes >> >> type: mpiaij >> >> rows=6, cols=6, bs=6 >> >> total: nonzeros=36, allocated nonzeros=36 >> >> total number of mallocs used during MatSetValues calls =0 >> >> using I-node (on process 0) routines: found 2 nodes, limit >> >> used is 5 >> >> >> >> Note that the coarse grid is now using superlu_dist. Is the coarse >> >> grid being solved in parallel? >> >> >> >> Garth >> > > > From mfadams at lbl.gov Thu Apr 27 09:07:02 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 27 Apr 2017 10:07:02 -0400 Subject: [petsc-users] Multigrid coarse grid solver In-Reply-To: References: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> Message-ID: > > Does the matrix operator(s) associated with the ksp have an options prefix? > > I don't think so. run with -help to check. 
> >> > >> > >> If I get GAMG to use more than one process for the coarse grid (a GAMG > >> setting), can I get a parallel LU (exact) solver to solve it using > >> only the processes that store parts of the coarse grid matrix? > > > > > > No, we should make a sub communicator for the active processes only, but > I > > am not too motivated to do this because the only reason that this > matters is > > if 1) a solver (i.e., the parallel direct solver) is lazy and puts > reductions > > everywhere for no good reason, or 2) you use a Krylov solver (very > > uncommon). All of the communication in a non-Krylov solver is point to > point > > and there is no win that I know of with a sub communicator. > > > > Note, the redundant coarse grid solver does use a subcommunicator, > obviously, > > but I think it is hardwired to PETSC_COMM_SELF, but maybe not? > > > >> > >> > >> Related to all this, do the parallel LU solvers internally > >> re-distribute a matrix over the whole MPI communicator as part of > >> their re-ordering phase? > > > > > > They better not! > > > > I did a test with MUMPS, and from the MUMPS diagnostics (memory use > per process) it appears that it does split the matrix across all > processes. > Yikes! That is your problem with strong speedup. Use SuperLU. I think making a subcommunicator for the coarse grid in GAMG would wreak havoc. Could we turn that option off in MUMPS from GAMG? Or just turn it off by default? PETSc does not usually get that eager about partitioning. > > Garth > > > I doubt any solver would be that eager by default. > > > >> > >> > >> Garth > >> > >> > There is also -mg_coarse_pc_type redundant > >> > -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the > coarse > >> > matrix on EACH process and each process does its own factorization and > >> > solve.
This saves one phase of the communication for each V cycle > since > >> > every process has the entire solution it just grabs from itself the > values > >> > it needs without communication. > >> > > >> > > >> > > >> > > >> >> On Apr 26, 2017, at 5:25 PM, Garth N. Wells wrote: > >> >> > >> >> I'm a bit confused by the selection of the coarse grid solver for > >> >> multigrid. For the demo ksp/ex56, if I do: > >> >> > >> >> mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg > >> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu > >> >> > >> >> I see > >> >> > >> >> Coarse grid solver -- level ------------------------------- > >> >> KSP Object: (mg_coarse_) 1 MPI processes > >> >> type: preonly > >> >> maximum iterations=10000, initial guess is zero > >> >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> >> left preconditioning > >> >> using NONE norm type for convergence test > >> >> PC Object: (mg_coarse_) 1 MPI processes > >> >> type: lu > >> >> out-of-place factorization > >> >> tolerance for zero pivot 2.22045e-14 > >> >> matrix ordering: nd > >> >> factor fill ratio given 5., needed 1. > >> >> Factored matrix follows: > >> >> Mat Object: 1 MPI processes > >> >> type: seqaij > >> >> rows=6, cols=6, bs=6 > >> >> package used to perform factorization: petsc > >> >> total: nonzeros=36, allocated nonzeros=36 > >> >> total number of mallocs used during MatSetValues calls > =0 > >> >> using I-node routines: found 2 nodes, limit used is 5 > >> >> linear system matrix = precond matrix: > >> >> Mat Object: 1 MPI processes > >> >> type: seqaij > >> >> rows=6, cols=6, bs=6 > >> >> total: nonzeros=36, allocated nonzeros=36 > >> >> total number of mallocs used during MatSetValues calls =0 > >> >> using I-node routines: found 2 nodes, limit used is 5 > >> >> > >> >> which is what I expect. 
Increasing from 1 to 2 processes: > >> >> > >> >> mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg > >> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu > >> >> > >> >> I see > >> >> > >> >> Coarse grid solver -- level ------------------------------- > >> >> KSP Object: (mg_coarse_) 2 MPI processes > >> >> type: preonly > >> >> maximum iterations=10000, initial guess is zero > >> >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> >> left preconditioning > >> >> using NONE norm type for convergence test > >> >> PC Object: (mg_coarse_) 2 MPI processes > >> >> type: lu > >> >> out-of-place factorization > >> >> tolerance for zero pivot 2.22045e-14 > >> >> matrix ordering: natural > >> >> factor fill ratio given 0., needed 0. > >> >> Factored matrix follows: > >> >> Mat Object: 2 MPI processes > >> >> type: superlu_dist > >> >> rows=6, cols=6 > >> >> package used to perform factorization: superlu_dist > >> >> total: nonzeros=0, allocated nonzeros=0 > >> >> total number of mallocs used during MatSetValues calls > =0 > >> >> SuperLU_DIST run parameters: > >> >> Process grid nprow 2 x npcol 1 > >> >> Equilibrate matrix TRUE > >> >> Matrix input mode 1 > >> >> Replace tiny pivots FALSE > >> >> Use iterative refinement FALSE > >> >> Processors in row 2 col partition 1 > >> >> Row permutation LargeDiag > >> >> Column permutation METIS_AT_PLUS_A > >> >> Parallel symbolic factorization FALSE > >> >> Repeated factorization SamePattern > >> >> linear system matrix = precond matrix: > >> >> Mat Object: 2 MPI processes > >> >> type: mpiaij > >> >> rows=6, cols=6, bs=6 > >> >> total: nonzeros=36, allocated nonzeros=36 > >> >> total number of mallocs used during MatSetValues calls =0 > >> >> using I-node (on process 0) routines: found 2 nodes, limit > >> >> used is 5 > >> >> > >> >> Note that the coarse grid is now using superlu_dist. Is the coarse > >> >> grid being solved in parallel? 
> >> >> > >> >> Garth > >> > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Apr 27 09:13:49 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 27 Apr 2017 09:13:49 -0500 Subject: [petsc-users] Multigrid coarse grid solver In-Reply-To: References: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> Message-ID: On Thu, Apr 27, 2017 at 9:07 AM, Mark Adams wrote: > > >> Does the matrix operator(s) associated with the ksp have an options >> prefix? >> >> > I don't think so. run with -help to check. > > >> >> >> >> >> >> If I get GAMG to use more than one process for the coarse grid (a GAMG >> >> setting), can I get a parallel LU (exact) solver to solve it using >> >> only the processes that store parts of the coarse grid matrix? >> > >> > >> > No, we should make a sub communicator for the active processes only, >> but I >> > am not too motivated to do this because the only reason that this >> matters is >> > if 1) a solver (ie, the parallel direct solver) is lazy and puts >> reductions >> > everywhere for not good reason, or 2) you use a Krylov solver (very >> > uncommon). All of the communication in a non-krylov solver in point to >> point >> > and there is no win that I know of with a sub communicator. >> > >> > Note, the redundant coarse grid solver does use a subcommuncator, >> obviously, >> > but I think it is hardwired to PETSC_COMM_SELF, but maybe not? >> > >> >> >> >> >> >> Related to all this, do the parallel LU solvers internally >> >> re-distribute a matrix over the whole MPI communicator as part of >> >> their re-ordering phase? >> > >> > >> > They better not! >> > >> >> I did a test with MUMPS, and from the MUMPS diagnostics (memory use >> per process) it appears that it does split the matrix across all >> processes. > > 1) Can we motivate why you would ever want a parallel coarse grid? I cannot think of a reason. > Yikes! That is your problem with strong speedup. Use SuperLU. 
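The redundant coarse solve described earlier in the thread (-mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu) sidesteps this kind of redistribution entirely: every process factors its own full copy of the small coarse matrix and solves locally, with no communication in the solve phase. A minimal NumPy sketch of that idea (a standalone, hypothetical illustration -- not petsc4py and not the PETSc implementation):

```python
import numpy as np

def redundant_coarse_solve(A_coarse, b_coarse, nranks):
    """Each simulated 'rank' factors its own full copy of the coarse
    matrix and solves locally -- no communication during the solve."""
    solutions = []
    for rank in range(nranks):
        A_local = A_coarse.copy()           # every process holds the whole matrix
        x_local = np.linalg.solve(A_local, b_coarse)
        solutions.append(x_local)
    return solutions

# A tiny 6x6 SPD coarse system, like the rows=6, cols=6 one in the ksp_view output.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6.0 * np.eye(6)
b = rng.standard_normal(6)

xs = redundant_coarse_solve(A, b, nranks=4)
# Every rank computed the identical solution independently.
assert all(np.allclose(x, xs[0]) for x in xs)
assert np.allclose(A @ xs[0], b)
```

Each rank then reads the coarse values it needs from its own copy, which is exactly the saved communication phase mentioned in the quoted text.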
> > I think making a subcommunicator for the coarse grid in GAMG would wreck > havoc. > 2) I do not see why a subcommunicator is a problem. In fact, this is exactly what PCTELESCOPE is designed to do. GAMG does a good job of reducing, but if you want completely custom reductions, TELESCOPE is for that. Matt > Could we turn that option off in MUMPS from GAMG? Or just turn it off by > default? PETSc does not usually get that eager about partitioning. > > >> >> Garth >> >> > I doubt any solver would be that eager by default. >> > >> >> >> >> >> >> Garth >> >> >> >> > There is also -mg_coarse_pc_type redundant >> >> > -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the >> coarse >> >> > matrix on EACH process and each process does its own factorization >> and >> >> > solve. This saves one phase of the communication for each V cycle >> since >> >> > every process has the entire solution it just grabs from itself the >> values >> >> > it needs without communication. >> >> > >> >> > >> >> > >> >> > >> >> >> On Apr 26, 2017, at 5:25 PM, Garth N. Wells >> wrote: >> >> >> >> >> >> I'm a bit confused by the selection of the coarse grid solver for >> >> >> multigrid. For the demo ksp/ex56, if I do: >> >> >> >> >> >> mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg >> >> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu >> >> >> >> >> >> I see >> >> >> >> >> >> Coarse grid solver -- level ------------------------------- >> >> >> KSP Object: (mg_coarse_) 1 MPI processes >> >> >> type: preonly >> >> >> maximum iterations=10000, initial guess is zero >> >> >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> >> >> left preconditioning >> >> >> using NONE norm type for convergence test >> >> >> PC Object: (mg_coarse_) 1 MPI processes >> >> >> type: lu >> >> >> out-of-place factorization >> >> >> tolerance for zero pivot 2.22045e-14 >> >> >> matrix ordering: nd >> >> >> factor fill ratio given 5., needed 1. 
>> >> >> Factored matrix follows: >> >> >> Mat Object: 1 MPI processes >> >> >> type: seqaij >> >> >> rows=6, cols=6, bs=6 >> >> >> package used to perform factorization: petsc >> >> >> total: nonzeros=36, allocated nonzeros=36 >> >> >> total number of mallocs used during MatSetValues calls >> =0 >> >> >> using I-node routines: found 2 nodes, limit used is 5 >> >> >> linear system matrix = precond matrix: >> >> >> Mat Object: 1 MPI processes >> >> >> type: seqaij >> >> >> rows=6, cols=6, bs=6 >> >> >> total: nonzeros=36, allocated nonzeros=36 >> >> >> total number of mallocs used during MatSetValues calls =0 >> >> >> using I-node routines: found 2 nodes, limit used is 5 >> >> >> >> >> >> which is what I expect. Increasing from 1 to 2 processes: >> >> >> >> >> >> mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg >> >> >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu >> >> >> >> >> >> I see >> >> >> >> >> >> Coarse grid solver -- level ------------------------------- >> >> >> KSP Object: (mg_coarse_) 2 MPI processes >> >> >> type: preonly >> >> >> maximum iterations=10000, initial guess is zero >> >> >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> >> >> left preconditioning >> >> >> using NONE norm type for convergence test >> >> >> PC Object: (mg_coarse_) 2 MPI processes >> >> >> type: lu >> >> >> out-of-place factorization >> >> >> tolerance for zero pivot 2.22045e-14 >> >> >> matrix ordering: natural >> >> >> factor fill ratio given 0., needed 0. 
>> >> >> Factored matrix follows: >> >> >> Mat Object: 2 MPI processes >> >> >> type: superlu_dist >> >> >> rows=6, cols=6 >> >> >> package used to perform factorization: superlu_dist >> >> >> total: nonzeros=0, allocated nonzeros=0 >> >> >> total number of mallocs used during MatSetValues calls >> =0 >> >> >> SuperLU_DIST run parameters: >> >> >> Process grid nprow 2 x npcol 1 >> >> >> Equilibrate matrix TRUE >> >> >> Matrix input mode 1 >> >> >> Replace tiny pivots FALSE >> >> >> Use iterative refinement FALSE >> >> >> Processors in row 2 col partition 1 >> >> >> Row permutation LargeDiag >> >> >> Column permutation METIS_AT_PLUS_A >> >> >> Parallel symbolic factorization FALSE >> >> >> Repeated factorization SamePattern >> >> >> linear system matrix = precond matrix: >> >> >> Mat Object: 2 MPI processes >> >> >> type: mpiaij >> >> >> rows=6, cols=6, bs=6 >> >> >> total: nonzeros=36, allocated nonzeros=36 >> >> >> total number of mallocs used during MatSetValues calls =0 >> >> >> using I-node (on process 0) routines: found 2 nodes, limit >> >> >> used is 5 >> >> >> >> >> >> Note that the coarse grid is now using superlu_dist. Is the coarse >> >> >> grid being solved in parallel? >> >> >> >> >> >> Garth >> >> > >> > >> > >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Thu Apr 27 09:15:56 2017 From: knepley at gmail.com (Matthew Knepley) Date: Thu, 27 Apr 2017 09:15:56 -0500 Subject: [petsc-users] explanations on DM_BOUNDARY_PERIODIC In-Reply-To: References: Message-ID: On Thu, Apr 27, 2017 at 3:46 AM, neok m4700 wrote: > Hi, > > I am trying to change my problem to using periodic boundary conditions. > > However, when I use DMDASetUniformCoordinates on the DA, the spacing > changes. > > This is due to an additional point e.g. 
in dm/impls/da/gr1.c > > else if (dim == 2) { > if (bx == DM_BOUNDARY_PERIODIC) hx = (xmax-xmin)/(M); > else hx = (xmax-xmin)/(M-1); > if (by == DM_BOUNDARY_PERIODIC) hy = (ymax-ymin)/(N); > else hy = (ymax-ymin)/(N-1); > > I don't understand the logic here, since xmin and xmax refer to the > physical domain, how does changing to a periodic BC change the > discretization? > > Could someone clarify or point to a reference? > Just do a 1D example with 3 vertices. With a normal domain, you have 2 cells 1-----2-----3 so each cell is 1/2 of the domain. In a periodic domain, the last vertex is connected to the first, so we have 3 cells 1-----2-----3-----1 and each is 1/3 of the domain. Matt > Thanks > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From mfadams at lbl.gov Thu Apr 27 10:05:59 2017 From: mfadams at lbl.gov (Mark Adams) Date: Thu, 27 Apr 2017 11:05:59 -0400 Subject: [petsc-users] Multigrid coarse grid solver In-Reply-To: References: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> Message-ID: > > > 1) Can we motivate why you would ever want a parallel coarse grid? I > cannot think of a reason. > > AMG coarsening is not 100% reliable, in many respects, but on complex domains with a lot of levels you can fail eventually and stopping coarsening prematurely can be a stopgap measure. True, in theory you never need to stop. -------------- next part -------------- An HTML attachment was scrubbed...
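The spacing logic quoted from gr1.c, together with the 1D three-vertex example above, can be checked in a few lines of Python (a sketch of the formula only, not the PETSc source):

```python
def uniform_spacing(xmin, xmax, m, periodic):
    # Mirrors the logic quoted from dm/impls/da/gr1.c: a periodic
    # boundary adds one more cell, because the last vertex wraps
    # around and connects back to the first.
    return (xmax - xmin) / m if periodic else (xmax - xmin) / (m - 1)

# The 1D example above: 3 vertices on [0, 1].
assert uniform_spacing(0.0, 1.0, 3, periodic=False) == 0.5              # 2 cells
assert abs(uniform_spacing(0.0, 1.0, 3, periodic=True) - 1.0/3.0) < 1e-15  # 3 cells
```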
URL: From bsmith at mcs.anl.gov Thu Apr 27 10:25:35 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 27 Apr 2017 10:25:35 -0500 Subject: [petsc-users] strange convergence In-Reply-To: References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> Message-ID: <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> Run again using LU on both blocks to see what happens. > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui wrote: > > I have changed the way to tie the nonconforming mesh. It seems the matrix now is better > > with -pc_type lu the output is > 0 KSP preconditioned resid norm 3.308678584240e-01 true resid norm 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 2.004313395301e-12 true resid norm 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 > Linear solve converged due to CONVERGED_ATOL iterations 1 > > > with -pc_type fieldsplit -fieldsplit_u_pc_type hypre -fieldsplit_wp_pc_type lu the convergence is slow > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 > ... 
> 824 KSP preconditioned resid norm 1.018542387738e-09 true resid norm 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 > 825 KSP preconditioned resid norm 9.743727947637e-10 true resid norm 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 > Linear solve converged due to CONVERGED_ATOL iterations 825 > > checking with additional -fieldsplit_u_ksp_type richardson -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor -fieldsplit_wp_ksp_max_it 1 gives > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 5.803507549280e-01 > 1 KSP Residual norm 2.069538175950e-01 > Residual norms for fieldsplit_wp_ solve. > 0 KSP Residual norm 0.000000000000e+00 > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 7.831796195225e-01 > 1 KSP Residual norm 1.734608520110e-01 > Residual norms for fieldsplit_wp_ solve. > 0 KSP Residual norm 0.000000000000e+00 > .... > 823 KSP preconditioned resid norm 1.065070135605e-09 true resid norm 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 6.113806394327e-01 > 1 KSP Residual norm 1.535465290944e-01 > Residual norms for fieldsplit_wp_ solve. > 0 KSP Residual norm 0.000000000000e+00 > 824 KSP preconditioned resid norm 1.018542387746e-09 true resid norm 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 6.123437055586e-01 > 1 KSP Residual norm 1.524661826133e-01 > Residual norms for fieldsplit_wp_ solve. 
> 0 KSP Residual norm 0.000000000000e+00 > 825 KSP preconditioned resid norm 9.743727947718e-10 true resid norm 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 > Linear solve converged due to CONVERGED_ATOL iterations 825 > > > The residual for the wp block is zero since in this first step the rhs is zero. As can be seen in the output, the multigrid does not do a good job of reducing the residual in the sub-solve. Is my observation right? What can be done to improve this? > > > Giang > > On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith wrote: > > This can happen if the matrix is singular or nearly singular or if the factorization generates small pivots, which can occur for even nonsingular problems if the matrix is poorly scaled or just plain nasty. > > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui wrote: > > > > It took a while, here I send you the output > > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > KSP Object: 4 MPI processes > > type: gmres > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > GMRES: happy breakdown tolerance 1e-30 > > maximum iterations=1000, initial guess is zero > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > left preconditioning > > using PRECONDITIONED norm type for convergence test > > PC Object: 4 MPI processes > > type: lu > > LU: out-of-place factorization > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 0, needed 0 > > Factored matrix
follows: > > Mat Object: 4 MPI processes > > type: mpiaij > > rows=973051, cols=973051 > > package used to perform factorization: pastix > > Error : 3.24786e-14 > > total: nonzeros=0, allocated nonzeros=0 > > total number of mallocs used during MatSetValues calls =0 > > PaStiX run parameters: > > Matrix type : Unsymmetric > > Level of printing (0,1,2): 0 > > Number of refinements iterations : 3 > > Error : 3.24786e-14 > > linear system matrix = precond matrix: > > Mat Object: 4 MPI processes > > type: mpiaij > > rows=973051, cols=973051 > > Error : 3.24786e-14 > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > total number of mallocs used during MatSetValues calls =0 > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > Error : 3.24786e-14 > > > > It doesn't do as you said. Something is not right here. I will look in depth. > > > > Giang > > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith wrote: > > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui wrote: > > > > > > Good catch. I get this for the very first step, maybe at that time the rhs_w is zero. > > > > With the multiplicative composition the right hand side of the second solve is the initial right hand side of the second solve minus A_10*x where x is the solution to the first sub solve and A_10 is the lower left block of the outer matrix. So unless both the initial right hand side has a zero for the second block and A_10 is identically zero the right hand side for the second sub solve should not be zero. Is A_10 == 0? > > > > > > > In the later step, it shows 2 step convergence > > > > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 3.165886479830e+04 > > > 1 KSP Residual norm 2.905922877684e-01 > > > Residual norms for fieldsplit_wp_ solve. 
> > > 0 KSP Residual norm 2.397669419027e-01 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 9.999891813771e-01 > > > 1 KSP Residual norm 1.512000395579e-05 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 8.192702188243e-06 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 > > > > The outer residual norms are still wonky, the preconditioned residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which is a huge drop but the 7.963616922323e+05 drops very much less 7.135927677844e+04. This is not normal. > > > > What if you just use -pc_type lu for the entire system (no fieldsplit), does the true residual drop to almost zero in the first iteration (as it should?). Send the output. > > > > > > > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 6.946213936597e-01 > > > 1 KSP Residual norm 1.195514007343e-05 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 1.025694497535e+00 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 7.255149996405e-01 > > > 1 KSP Residual norm 6.583512434218e-06 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 1.015229700337e+00 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 > > > Residual norms for fieldsplit_u_ solve. 
> > > 0 KSP Residual norm 3.512243341400e-01 > > > 1 KSP Residual norm 2.032490351200e-06 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 1.282327290982e+00 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 3.423609338053e-01 > > > 1 KSP Residual norm 4.213703301972e-07 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 1.157384757538e+00 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 3.838596289995e-01 > > > 1 KSP Residual norm 9.927864176103e-08 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 1.066298905618e+00 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 4.624964188094e-01 > > > 1 KSP Residual norm 6.418229775372e-08 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 9.800784311614e-01 > > > 1 KSP Residual norm 0.000000000000e+00 > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 > > > Linear solve converged due to CONVERGED_ATOL iterations 7 > > > > > > The outer operator is an explicit matrix. > > > > > > Giang > > > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith wrote: > > > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui wrote: > > > > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better convergence. I still used 4 procs though, probably with 1 proc it should also be the same. 
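Barry's earlier point about the multiplicative composition -- the second split's right-hand side is the original b_wp minus A_10 times the first split's solution x_u -- can be sketched on a tiny 2x2 block system (NumPy, exact sub-solves, purely illustrative; the block sizes and data are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n_u, n_wp = 4, 3
A00 = rng.standard_normal((n_u, n_u)) + n_u * np.eye(n_u)      # u block
A10 = rng.standard_normal((n_wp, n_u))                         # lower-left coupling
A11 = rng.standard_normal((n_wp, n_wp)) + n_wp * np.eye(n_wp)  # wp block
b_u = rng.standard_normal(n_u)
b_wp = rng.standard_normal(n_wp)

# One multiplicative fieldsplit sweep with exact (LU-like) sub-solves:
x_u = np.linalg.solve(A00, b_u)
rhs_wp = b_wp - A10 @ x_u           # the updated RHS Barry describes
x_wp = np.linalg.solve(A11, rhs_wp)

# rhs_wp vanishes only if b_wp happens to equal A10 @ x_u; generically it does not,
# which is why a 0 residual in the wp sub-solve is suspicious.
assert np.linalg.norm(rhs_wp) > 0

# If A01 == 0 (block lower triangular), this single sweep is the exact solve:
A = np.block([[A00, np.zeros((n_u, n_wp))], [A10, A11]])
assert np.allclose(A @ np.concatenate([x_u, x_wp]), np.concatenate([b_u, b_wp]))
```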
> > > > > > > > The u block used a Nitsche-type operator to connect two non-matching domains. I don't think it leaves any rigid body motion that would lead to insufficient constraints. Maybe you have another idea? > > > > > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 3.129067184300e+05 > > > > 1 KSP Residual norm 5.906261468196e-01 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 0.000000000000e+00 > > > ^^^^ something is wrong here. The sub solve should not be starting with a 0 residual (this means the right hand side for this sub solve is zero which it should not be). > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > How are you providing the outer operator? As an explicit matrix or with some shell matrix? > > > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 9.999955993437e-01 > > > > 1 KSP Residual norm 4.019774691831e-06 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 1.000012180204e+00 > > > > 1 KSP Residual norm 1.017367950422e-05 > > > > Residual norms for fieldsplit_wp_ solve.
> > > > 0 KSP Residual norm 0.000000000000e+00 > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > > > KSP Object: 4 MPI processes > > > > type: gmres > > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > > > GMRES: happy breakdown tolerance 1e-30 > > > > maximum iterations=1000, initial guess is zero > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > left preconditioning > > > > using PRECONDITIONED norm type for convergence test > > > > PC Object: 4 MPI processes > > > > type: fieldsplit > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > Solver info for each split is in the following KSP objects: > > > > Split number 0 Defined by IS > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > type: richardson > > > > Richardson: damping factor=1 > > > > maximum iterations=1, initial guess is zero > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > left preconditioning > > > > using PRECONDITIONED norm type for convergence test > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > type: lu > > > > LU: out-of-place factorization > > > > tolerance for zero pivot 2.22045e-14 > > > > matrix ordering: natural > > > > factor fill ratio given 0, needed 0 > > > > Factored matrix follows: > > > > Mat Object: 4 MPI processes > > > > type: mpiaij > > > > rows=938910, cols=938910 > > > > package used to perform factorization: pastix > > > > total: nonzeros=0, allocated nonzeros=0 > > > > Error : 3.36878e-14 > > > > total number of mallocs used during MatSetValues calls =0 > > > > PaStiX run parameters: > > > > Matrix type : Unsymmetric > > > > Level of printing (0,1,2): 0 > > > > Number of refinements iterations : 3 > > > > Error : 3.36878e-14 > > > > linear system matrix = precond matrix: > > > > Mat Object: (fieldsplit_u_) 4 
MPI processes > > > > type: mpiaij > > > > rows=938910, cols=938910, bs=3 > > > > Error : 3.36878e-14 > > > > Error : 3.36878e-14 > > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > > total number of mallocs used during MatSetValues calls =0 > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > Split number 1 Defined by IS > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > type: richardson > > > > Richardson: damping factor=1 > > > > maximum iterations=1, initial guess is zero > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > left preconditioning > > > > using PRECONDITIONED norm type for convergence test > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > type: lu > > > > LU: out-of-place factorization > > > > tolerance for zero pivot 2.22045e-14 > > > > matrix ordering: natural > > > > factor fill ratio given 0, needed 0 > > > > Factored matrix follows: > > > > Mat Object: 4 MPI processes > > > > type: mpiaij > > > > rows=34141, cols=34141 > > > > package used to perform factorization: pastix > > > > Error : -nan > > > > Error : -nan > > > > Error : -nan > > > > total: nonzeros=0, allocated nonzeros=0 > > > > total number of mallocs used during MatSetValues calls =0 > > > > PaStiX run parameters: > > > > Matrix type : Symmetric > > > > Level of printing (0,1,2): 0 > > > > Number of refinements iterations : 0 > > > > Error : -nan > > > > linear system matrix = precond matrix: > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > type: mpiaij > > > > rows=34141, cols=34141 > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > total number of mallocs used during MatSetValues calls =0 > > > > not using I-node (on process 0) routines > > > > linear system matrix = precond matrix: > > > > Mat Object: 4 MPI processes > > > > type: mpiaij > > > > rows=973051, cols=973051 > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > total 
number of mallocs used during MatSetValues calls =0 > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > > > > > > > > Giang > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith wrote: > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui wrote: > > > > > Dear Matt/Barry > > > > > With your options, it results in > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 2.407308987203e+36 > > > > > 1 KSP Residual norm 5.797185652683e+72 > > > > It looks like Matt is right, hypre is seemingly producing useless garbage. > > > > First, how do things run on one process? If you have similar problems then debug on one process (debugging any kind of problem is always far easier on one process). > > > > First run with -fieldsplit_u_type lu (instead of using hypre) to see if that works or also produces something bad. > > > > What is the operator and the boundary conditions for u? It could be singular. > > > > > > > > > > > > > > > > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > ... > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 1.533726746719e+36 > > > > > 1 KSP Residual norm 3.692757392261e+72 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > Do you suggest that the pastix solver for the "wp" block encounters a small pivot? In addition, it seems the "u" block is also singular.
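The suspicion above -- that an insufficiently constrained u block is singular because of a leftover rigid body mode -- can be seen on a toy problem: a 1D free-free Laplacian (no Dirichlet condition anywhere) has the constant vector, i.e. a rigid translation, in its null space. A NumPy sketch (plain finite-difference stencil, not related to the actual Nitsche operator discussed here):

```python
import numpy as np

n = 8
# Free-free 1D Laplacian: nothing pins the body down.
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
K[0, 0] = K[-1, -1] = 1.0            # Neumann (free) ends

ones = np.ones(n)                    # rigid translation mode
assert np.allclose(K @ ones, 0.0)    # constant vector lies in the null space
assert np.linalg.matrix_rank(K) == n - 1   # exactly one zero eigenvalue

# Pinning a single dof (one Dirichlet constraint) removes the singularity:
K_pinned = K.copy()
K_pinned[0, :] = 0.0
K_pinned[:, 0] = 0.0
K_pinned[0, 0] = 1.0
assert np.linalg.matrix_rank(K_pinned) == n
```

A direct solver applied to the unpinned K would hit a (near-)zero pivot, which is consistent with the huge preconditioned residual norms in the output above.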
> > > > > > > > > > Giang > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith wrote: > > > > > > > > > > Huge preconditioned norms but normal unpreconditioned norms almost always come from a very small pivot in an LU or ILU factorization. > > > > > > > > > > The first thing to do is monitor the two sub solves. Run with the additional options -fieldsplit_u_ksp_type richardson -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor -fieldsplit_wp_ksp_max_it 1 > > > > > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui wrote: > > > > > > > > > > > > Hello > > > > > > > > > > > > I encountered a strange convergence behavior that I have trouble understanding > > > > > > > > > > > > KSPSetFromOptions completed > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > > > > > > .....
> > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > > > > > > Linear solve did not converge due to DIVERGED_ITS iterations 1000 > > > > > > KSP Object: 4 MPI processes > > > > > > type: gmres > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > > > > > GMRES: happy breakdown tolerance 1e-30 > > > > > > maximum iterations=1000, initial guess is zero > > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > > > left preconditioning > > > > > > using PRECONDITIONED norm type for convergence test > > > > > > PC Object: 4 MPI processes > > > > > > type: fieldsplit > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > > Solver info for each split is in the following KSP objects: > > > > > > Split number 0 Defined by IS > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > > > type: preonly > > > > > > maximum iterations=10000, initial guess is zero > > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > > > left preconditioning > > > > > > using NONE norm type for convergence test > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > > > type: hypre > > > > > > HYPRE BoomerAMG preconditioning > > > > > > HYPRE BoomerAMG: Cycle type V > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 > > > > > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > > > > > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > > > > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > > > > > HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 > > > > > > HYPRE BoomerAMG: Number of 
paths for aggressive coarsening 1 > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 > > > > > > HYPRE BoomerAMG: Sweeps down 1 > > > > > > HYPRE BoomerAMG: Sweeps up 1 > > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 > > > > > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > > > > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > > > > > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > > > > > HYPRE BoomerAMG: Using CF-relaxation > > > > > > HYPRE BoomerAMG: Measure type local > > > > > > HYPRE BoomerAMG: Coarsen type PMIS > > > > > > HYPRE BoomerAMG: Interpolation type classical > > > > > > linear system matrix = precond matrix: > > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=938910, cols=938910, bs=3 > > > > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > > Split number 1 Defined by IS > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > > > type: preonly > > > > > > maximum iterations=10000, initial guess is zero > > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > > > left preconditioning > > > > > > using NONE norm type for convergence test > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > > > type: lu > > > > > > LU: out-of-place factorization > > > > > > tolerance for zero pivot 2.22045e-14 > > > > > > matrix ordering: natural > > > > > > factor fill ratio given 0, needed 0 > > > > > > Factored matrix follows: > > > > > > Mat Object: 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=34141, cols=34141 > > > > > > package used to perform factorization: pastix > > > > > > Error : -nan > > > > > > Error : -nan > > > > > > total: nonzeros=0, allocated 
nonzeros=0 > > > > > > Error : -nan > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > PaStiX run parameters: > > > > > > Matrix type : Symmetric > > > > > > Level of printing (0,1,2): 0 > > > > > > Number of refinements iterations : 0 > > > > > > Error : -nan > > > > > > linear system matrix = precond matrix: > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=34141, cols=34141 > > > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > not using I-node (on process 0) routines > > > > > > linear system matrix = precond matrix: > > > > > > Mat Object: 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=973051, cols=973051 > > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > > > > > > > > The pattern of convergence gives a hint that this system is somehow bad/singular. But I don't know why the preconditioned error goes up too high. Does anyone have an idea? > > > > > > Best regards > > > > > > Giang Bui > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From bsmith at mcs.anl.gov Thu Apr 27 10:37:53 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 27 Apr 2017 10:37:53 -0500 Subject: [petsc-users] Multigrid coarse grid solver In-Reply-To: References: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> Message-ID: > On Apr 27, 2017, at 7:45 AM, Mark Adams wrote: > > Barry, we seem to get an error when you explicitly set this. Of course you get an error, you are asking PETSc to do a parallel LU; PETSc does NOT have a parallel LU as you well know. How could you possibly think this would work?
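For reference, the two option sets at issue, spelled out side by side (same ex56 command as in Mark's log below; the working variant assumes a PETSc build configured with superlu_dist or another parallel LU package):

```shell
# Fails on np > 1: PETSc's built-in LU is sequential and cannot factor an mpiaij matrix
mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
  -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu \
  -mg_coarse_pc_factor_mat_solver_package petsc

# Works if a parallel LU package is available (here superlu_dist; mumps also works)
mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view \
  -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu \
  -mg_coarse_pc_factor_mat_solver_package superlu_dist
```

Omitting the -mg_coarse_pc_factor_mat_solver_package option entirely lets PETSc pick the first installed parallel factorization package, as described below.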
Note if you have for example superlu_dist or mumps installed and do NOT list -mg_coarse_pc_factor_mat_solver_package petsc then it will default to use one of the parallel solvers automatically (it just looks through the list of installed solvers for that case and picks the first one). > > Garth, Maybe to set the default explicitly you need to use pc_type asm -sub_pc_type lu. That is the true default. > > More below but this is the error message: > > 17:46 knepley/feature-plasma-example *= ~/Codes/petsc/src/ksp/ksp/examples/tutorials$ /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -np 2 ./ex56 -ne 16 -pc_type gamg -ksp_view -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package petsc > [0]PETSC ERROR: --------------------- Error Message -------------------------------------------------------------- > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for possible LU and Cholesky solvers > [0]PETSC ERROR: MatSolverPackage petsc does not support matrix type mpiaij > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3658-g99fa2798da GIT Date: 2017-04-25 12:56:20 -0500 > [0]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-5.local by markadams Wed Apr 26 17:46:28 2017 > [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ COPTFLAGS="-g -O0 -mavx2" CXXOPTFLAGS="-g -O0 -mavx2" F > > > On Thu, Apr 27, 2017 at 1:59 AM, Garth N. Wells wrote: > On 27 April 2017 at 00:30, Barry Smith wrote: > > > > Yes, you asked for LU so it used LU! > > > > Of course for smaller coarse grids and large numbers of processes this is very inefficient. > > > > The default behavior for GAMG is probably what you want. In that case it is equivalent to > > -mg_coarse_pc_type bjacobi --mg_coarse_sub_pc_type lu. 
But GAMG tries hard to put all the coarse grid degrees > > of freedom on the first process and none on the rest, so you do end up with the exact equivalent of a direct solver. > > Try -ksp_view in that case. > > > > Thanks, Barry. > > I'm struggling a little to understand the matrix data structure for > the coarse grid. Is it just an mpiaij matrix, with all entries > (usually) on one process? > > Yes. > > > Is there an options key prefix for the matrix on different levels? > E.g., to turn on a viewer? > > something like -mg_level_1_ksp_view should work (run with -help to get the correct syntax). > > > If I get GAMG to use more than one process for the coarse grid (a GAMG > setting), can I get a parallel LU (exact) solver to solve it using > only the processes that store parts of the coarse grid matrix? > > No, we should make a sub communicator for the active processes only, but I am not too motivated to do this because the only reason that this matters is if 1) a solver (i.e., the parallel direct solver) is lazy and puts reductions everywhere for no good reason, or 2) you use a Krylov solver (very uncommon). All of the communication in a non-Krylov solver is point to point and there is no win that I know of with a sub communicator. > > Note, the redundant coarse grid solver does use a subcommunicator, obviously, but I think it is hardwired to PETSC_COMM_SELF, but maybe not? > > > Related to all this, do the parallel LU solvers internally > re-distribute a matrix over the whole MPI communicator as part of > their re-ordering phase? > > They better not! > > I doubt any solver would be that eager by default. > > > Garth > > > There is also -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse matrix on EACH process and each process does its own factorization and solve.
This saves one phase of the communication for each V cycle since every process has the entire solution it just grabs from itself the values it needs without communication. > > > > > > > > > >> On Apr 26, 2017, at 5:25 PM, Garth N. Wells wrote: > >> > >> I'm a bit confused by the selection of the coarse grid solver for > >> multigrid. For the demo ksp/ex56, if I do: > >> > >> mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg > >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu > >> > >> I see > >> > >> Coarse grid solver -- level ------------------------------- > >> KSP Object: (mg_coarse_) 1 MPI processes > >> type: preonly > >> maximum iterations=10000, initial guess is zero > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> left preconditioning > >> using NONE norm type for convergence test > >> PC Object: (mg_coarse_) 1 MPI processes > >> type: lu > >> out-of-place factorization > >> tolerance for zero pivot 2.22045e-14 > >> matrix ordering: nd > >> factor fill ratio given 5., needed 1. > >> Factored matrix follows: > >> Mat Object: 1 MPI processes > >> type: seqaij > >> rows=6, cols=6, bs=6 > >> package used to perform factorization: petsc > >> total: nonzeros=36, allocated nonzeros=36 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node routines: found 2 nodes, limit used is 5 > >> linear system matrix = precond matrix: > >> Mat Object: 1 MPI processes > >> type: seqaij > >> rows=6, cols=6, bs=6 > >> total: nonzeros=36, allocated nonzeros=36 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node routines: found 2 nodes, limit used is 5 > >> > >> which is what I expect. 
Increasing from 1 to 2 processes: > >> > >> mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg > >> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu > >> > >> I see > >> > >> Coarse grid solver -- level ------------------------------- > >> KSP Object: (mg_coarse_) 2 MPI processes > >> type: preonly > >> maximum iterations=10000, initial guess is zero > >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > >> left preconditioning > >> using NONE norm type for convergence test > >> PC Object: (mg_coarse_) 2 MPI processes > >> type: lu > >> out-of-place factorization > >> tolerance for zero pivot 2.22045e-14 > >> matrix ordering: natural > >> factor fill ratio given 0., needed 0. > >> Factored matrix follows: > >> Mat Object: 2 MPI processes > >> type: superlu_dist > >> rows=6, cols=6 > >> package used to perform factorization: superlu_dist > >> total: nonzeros=0, allocated nonzeros=0 > >> total number of mallocs used during MatSetValues calls =0 > >> SuperLU_DIST run parameters: > >> Process grid nprow 2 x npcol 1 > >> Equilibrate matrix TRUE > >> Matrix input mode 1 > >> Replace tiny pivots FALSE > >> Use iterative refinement FALSE > >> Processors in row 2 col partition 1 > >> Row permutation LargeDiag > >> Column permutation METIS_AT_PLUS_A > >> Parallel symbolic factorization FALSE > >> Repeated factorization SamePattern > >> linear system matrix = precond matrix: > >> Mat Object: 2 MPI processes > >> type: mpiaij > >> rows=6, cols=6, bs=6 > >> total: nonzeros=36, allocated nonzeros=36 > >> total number of mallocs used during MatSetValues calls =0 > >> using I-node (on process 0) routines: found 2 nodes, limit used is 5 > >> > >> Note that the coarse grid is now using superlu_dist. Is the coarse > >> grid being solved in parallel? 
> >> > >> Garth > > > From bsmith at mcs.anl.gov Thu Apr 27 10:41:22 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 27 Apr 2017 10:41:22 -0500 Subject: [petsc-users] Multigrid coarse grid solver In-Reply-To: References: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> Message-ID: <4522A0BF-FB33-4529-9973-B9AB12B6A599@mcs.anl.gov> > On Apr 27, 2017, at 8:27 AM, Garth N. Wells wrote: > > On 27 April 2017 at 13:45, Mark Adams wrote: >> Barry, we seem to get an error when you explicitly set this. >> >> Garth, Maybe to set the default explicitly you need to use pc_type asm >> -sub_pc_type lu. That is the true default. >> >> More below but this is the error message: >> >> 17:46 knepley/feature-plasma-example *= >> ~/Codes/petsc/src/ksp/ksp/examples/tutorials$ >> /Users/markadams/Codes/petsc/arch-macosx-gnu-g/bin/mpiexec -np 2 ./ex56 -ne >> 16 -pc_type gamg -ksp_view -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu >> -mg_coarse_pc_factor_mat_solver_package petsc >> [0]PETSC ERROR: --------------------- Error Message >> -------------------------------------------------------------- >> [0]PETSC ERROR: See >> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for >> possible LU and Cholesky solvers >> [0]PETSC ERROR: MatSolverPackage petsc does not support matrix type mpiaij >> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for >> trouble shooting. >> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.6-3658-g99fa2798da GIT >> Date: 2017-04-25 12:56:20 -0500 >> [0]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-5.local by >> markadams Wed Apr 26 17:46:28 2017 >> [0]PETSC ERROR: Configure options --with-cc=clang --with-cc++=clang++ >> COPTFLAGS="-g -O0 -mavx2" CXXOPTFLAGS="-g -O0 -mavx2" F >> >> >> On Thu, Apr 27, 2017 at 1:59 AM, Garth N. Wells wrote: >>> >>> On 27 April 2017 at 00:30, Barry Smith wrote: >>>> >>>> Yes, you asked for LU so it used LU! 
>>>> >>>> Of course for smaller coarse grids and large numbers of processes >>>> this is very inefficient. >>>> >>>> The default behavior for GAMG is probably what you want. In that case >>>> it is equivalent to >>>> -mg_coarse_pc_type bjacobi --mg_coarse_sub_pc_type lu. But GAMG tries >>>> hard to put all the coarse grid degrees >>>> of freedom on the first process and none on the rest, so you do end up >>>> with the exact equivalent of a direct solver. >>>> Try -ksp_view in that case. >>>> >>> >>> Thanks, Barry. >>> >>> I'm struggling a little to understand the matrix data structure for >>> the coarse grid. Is it just a mpiaji matrix, with all entries >>> (usually) on one process? >> >> >> Yes. >> >>> >>> >>> Is there an options key prefix for the matrix on different levels? >>> E.g., to turn on a viewer? >> >> >> something like -mg_level_1_ksp_view should work (run with -help to get the >> correct syntax). >> > > Does the matrix operator(s) associated with the ksp have an options prefix? No, because the matrices are created independent of the KSP/PC infrastructure. You can use -mg_coarse_ksp_view_pmat to print the matrix for just the coarse level; and do things like -mg_coarse_ksp_view_pmat ::ascii_info to display information about the matrix; > >>> >>> >>> If I get GAMG to use more than one process for the coarse grid (a GAMG >>> setting), can I get a parallel LU (exact) solver to solve it using >>> only the processes that store parts of the coarse grid matrix? >> >> >> No, we should make a sub communicator for the active processes only, but I >> am not too motivated to do this because the only reason that this matters is >> if 1) a solver (ie, the parallel direct solver) is lazy and puts reductions >> everywhere for not good reason, or 2) you use a Krylov solver (very >> uncommon). All of the communication in a non-krylov solver in point to point >> and there is no win that I know of with a sub communicator. 
>> >> Note, the redundant coarse grid solver does use a subcommuncator, obviously, >> but I think it is hardwired to PETSC_COMM_SELF, but maybe not? >> >>> >>> >>> Related to all this, do the parallel LU solvers internally >>> re-distribute a matrix over the whole MPI communicator as part of >>> their re-ordering phase? >> >> >> They better not! >> > > I did a test with MUMPS, and from the MUMPS diagnostics (memory use > per process) it appears that it does split the matrix across all > processes. > > Garth > >> I doubt any solver would be that eager by default. >> >>> >>> >>> Garth >>> >>>> There is also -mg_coarse_pc_type redundant >>>> -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse >>>> matrix on EACH process and each process does its own factorization and >>>> solve. This saves one phase of the communication for each V cycle since >>>> every process has the entire solution it just grabs from itself the values >>>> it needs without communication. >>>> >>>> >>>> >>>> >>>>> On Apr 26, 2017, at 5:25 PM, Garth N. Wells wrote: >>>>> >>>>> I'm a bit confused by the selection of the coarse grid solver for >>>>> multigrid. For the demo ksp/ex56, if I do: >>>>> >>>>> mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg >>>>> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu >>>>> >>>>> I see >>>>> >>>>> Coarse grid solver -- level ------------------------------- >>>>> KSP Object: (mg_coarse_) 1 MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: (mg_coarse_) 1 MPI processes >>>>> type: lu >>>>> out-of-place factorization >>>>> tolerance for zero pivot 2.22045e-14 >>>>> matrix ordering: nd >>>>> factor fill ratio given 5., needed 1. 
>>>>> Factored matrix follows: >>>>> Mat Object: 1 MPI processes >>>>> type: seqaij >>>>> rows=6, cols=6, bs=6 >>>>> package used to perform factorization: petsc >>>>> total: nonzeros=36, allocated nonzeros=36 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> using I-node routines: found 2 nodes, limit used is 5 >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 1 MPI processes >>>>> type: seqaij >>>>> rows=6, cols=6, bs=6 >>>>> total: nonzeros=36, allocated nonzeros=36 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> using I-node routines: found 2 nodes, limit used is 5 >>>>> >>>>> which is what I expect. Increasing from 1 to 2 processes: >>>>> >>>>> mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg >>>>> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu >>>>> >>>>> I see >>>>> >>>>> Coarse grid solver -- level ------------------------------- >>>>> KSP Object: (mg_coarse_) 2 MPI processes >>>>> type: preonly >>>>> maximum iterations=10000, initial guess is zero >>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>>>> left preconditioning >>>>> using NONE norm type for convergence test >>>>> PC Object: (mg_coarse_) 2 MPI processes >>>>> type: lu >>>>> out-of-place factorization >>>>> tolerance for zero pivot 2.22045e-14 >>>>> matrix ordering: natural >>>>> factor fill ratio given 0., needed 0. 
>>>>> Factored matrix follows: >>>>> Mat Object: 2 MPI processes >>>>> type: superlu_dist >>>>> rows=6, cols=6 >>>>> package used to perform factorization: superlu_dist >>>>> total: nonzeros=0, allocated nonzeros=0 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> SuperLU_DIST run parameters: >>>>> Process grid nprow 2 x npcol 1 >>>>> Equilibrate matrix TRUE >>>>> Matrix input mode 1 >>>>> Replace tiny pivots FALSE >>>>> Use iterative refinement FALSE >>>>> Processors in row 2 col partition 1 >>>>> Row permutation LargeDiag >>>>> Column permutation METIS_AT_PLUS_A >>>>> Parallel symbolic factorization FALSE >>>>> Repeated factorization SamePattern >>>>> linear system matrix = precond matrix: >>>>> Mat Object: 2 MPI processes >>>>> type: mpiaij >>>>> rows=6, cols=6, bs=6 >>>>> total: nonzeros=36, allocated nonzeros=36 >>>>> total number of mallocs used during MatSetValues calls =0 >>>>> using I-node (on process 0) routines: found 2 nodes, limit >>>>> used is 5 >>>>> >>>>> Note that the coarse grid is now using superlu_dist. Is the coarse >>>>> grid being solved in parallel? >>>>> >>>>> Garth >>>> >> >> From bsmith at mcs.anl.gov Thu Apr 27 10:46:30 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 27 Apr 2017 10:46:30 -0500 Subject: [petsc-users] Multigrid coarse grid solver In-Reply-To: References: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> Message-ID: <2A4ED388-DBE6-4E2E-8B02-CDFCC83A8E6A@mcs.anl.gov> > On Apr 27, 2017, at 12:59 AM, Garth N. Wells wrote: > > On 27 April 2017 at 00:30, Barry Smith wrote: >> >> Yes, you asked for LU so it used LU! >> >> Of course for smaller coarse grids and large numbers of processes this is very inefficient. >> >> The default behavior for GAMG is probably what you want. In that case it is equivalent to >> -mg_coarse_pc_type bjacobi --mg_coarse_sub_pc_type lu. 
But GAMG tries hard to put all the coarse grid degrees >> of freedom on the first process and none on the rest, so you do end up with the exact equivalent of a direct solver. >> Try -ksp_view in that case. >> > > Thanks, Barry. > > I'm struggling a little to understand the matrix data structure for > the coarse grid. Is it just an mpiaij matrix, with all entries > (usually) on one process? Yes, when using GAMG > > Is there an options key prefix for the matrix on different levels? > E.g., to turn on a viewer? > > If I get GAMG to use more than one process for the coarse grid (a GAMG > setting), can I get a parallel LU (exact) solver to solve it using > only the processes that store parts of the coarse grid matrix? See below > > Related to all this, do the parallel LU solvers internally > re-distribute a matrix over the whole MPI communicator as part of > their re-ordering phase? This is up to each package (superlu_dist, mumps, etc.). Certainly they could have lots of logic based on matrix size, number of processes, available memory, and switch bandwidth to pick an optimal number of processes to use for each factorization, but that is complicated; I suspect they just use all the resources they can, and so perform very poorly for multiple processes and small problems. What is wrong with just leaving the default, which uses a single process for the coarse grid solve? The only time you don't want that is when the coarse problem is very large; then you can use PCREDUNDANT (which allows solves on subsets) or PCTELESCOPE, which is more general. Barry > > Garth > >> There is also -mg_coarse_pc_type redundant -mg_coarse_redundant_pc_type lu. In that case it makes a copy of the coarse matrix on EACH process and each process does its own factorization and solve. This saves one phase of the communication for each V cycle since every process has the entire solution; it just grabs from itself the values it needs without communication. >> >> >> >> >>> On Apr 26, 2017, at 5:25 PM, Garth N. 
Wells wrote: >>> >>> I'm a bit confused by the selection of the coarse grid solver for >>> multigrid. For the demo ksp/ex56, if I do: >>> >>> mpirun -np 1 ./ex56 -ne 16 -ksp_view -pc_type gamg >>> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu >>> >>> I see >>> >>> Coarse grid solver -- level ------------------------------- >>> KSP Object: (mg_coarse_) 1 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (mg_coarse_) 1 MPI processes >>> type: lu >>> out-of-place factorization >>> tolerance for zero pivot 2.22045e-14 >>> matrix ordering: nd >>> factor fill ratio given 5., needed 1. >>> Factored matrix follows: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=6, cols=6, bs=6 >>> package used to perform factorization: petsc >>> total: nonzeros=36, allocated nonzeros=36 >>> total number of mallocs used during MatSetValues calls =0 >>> using I-node routines: found 2 nodes, limit used is 5 >>> linear system matrix = precond matrix: >>> Mat Object: 1 MPI processes >>> type: seqaij >>> rows=6, cols=6, bs=6 >>> total: nonzeros=36, allocated nonzeros=36 >>> total number of mallocs used during MatSetValues calls =0 >>> using I-node routines: found 2 nodes, limit used is 5 >>> >>> which is what I expect. Increasing from 1 to 2 processes: >>> >>> mpirun -np 2 ./ex56 -ne 16 -ksp_view -pc_type gamg >>> -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu >>> >>> I see >>> >>> Coarse grid solver -- level ------------------------------- >>> KSP Object: (mg_coarse_) 2 MPI processes >>> type: preonly >>> maximum iterations=10000, initial guess is zero >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
>>> left preconditioning >>> using NONE norm type for convergence test >>> PC Object: (mg_coarse_) 2 MPI processes >>> type: lu >>> out-of-place factorization >>> tolerance for zero pivot 2.22045e-14 >>> matrix ordering: natural >>> factor fill ratio given 0., needed 0. >>> Factored matrix follows: >>> Mat Object: 2 MPI processes >>> type: superlu_dist >>> rows=6, cols=6 >>> package used to perform factorization: superlu_dist >>> total: nonzeros=0, allocated nonzeros=0 >>> total number of mallocs used during MatSetValues calls =0 >>> SuperLU_DIST run parameters: >>> Process grid nprow 2 x npcol 1 >>> Equilibrate matrix TRUE >>> Matrix input mode 1 >>> Replace tiny pivots FALSE >>> Use iterative refinement FALSE >>> Processors in row 2 col partition 1 >>> Row permutation LargeDiag >>> Column permutation METIS_AT_PLUS_A >>> Parallel symbolic factorization FALSE >>> Repeated factorization SamePattern >>> linear system matrix = precond matrix: >>> Mat Object: 2 MPI processes >>> type: mpiaij >>> rows=6, cols=6, bs=6 >>> total: nonzeros=36, allocated nonzeros=36 >>> total number of mallocs used during MatSetValues calls =0 >>> using I-node (on process 0) routines: found 2 nodes, limit used is 5 >>> >>> Note that the coarse grid is now using superlu_dist. Is the coarse >>> grid being solved in parallel? >>> >>> Garth >> From jed at jedbrown.org Thu Apr 27 10:22:06 2017 From: jed at jedbrown.org (Jed Brown) Date: Thu, 27 Apr 2017 09:22:06 -0600 Subject: [petsc-users] Multigrid coarse grid solver In-Reply-To: References: <2F68C9DC-49EA-4F79-889B-9E24D068A6C5@mcs.anl.gov> Message-ID: <87pofxzxi9.fsf@jedbrown.org> Mark Adams writes: > On Wed, Apr 26, 2017 at 7:30 PM, Barry Smith wrote: > >> >> Yes, you asked for LU so it used LU! >> >> Of course for smaller coarse grids and large numbers of processes this >> is very inefficient. >> >> The default behavior for GAMG is probably what you want. 
In that case >> it is equivalent to >> -mg_coarse_pc_type bjacobi --mg_coarse_sub_pc_type lu. But GAMG tries >> hard > > > No, it just slams those puppies onto proc 0 :) Mark is a puppy slammer. From neok.m4700 at gmail.com Thu Apr 27 12:43:39 2017 From: neok.m4700 at gmail.com (neok m4700) Date: Thu, 27 Apr 2017 19:43:39 +0200 Subject: [petsc-users] explanations on DM_BOUNDARY_PERIODIC In-Reply-To: References: Message-ID: Hi Matthew, Thank you for the clarification, however, it is unclear why there is an additional unknown in the case of periodic BCs. Please see attached to this email what I'd like to achieve, the number of unknowns does not change when switching to the periodic case for e.g. a Laplace operator. And in the case of Dirichlet or Neumann BCs, the extremum cells add information to the RHS; they do not appear in the matrix formulation. Hope I was clear enough, thanks 2017-04-27 16:15 GMT+02:00 Matthew Knepley : > On Thu, Apr 27, 2017 at 3:46 AM, neok m4700 wrote: > >> Hi, >> >> I am trying to change my problem to using periodic boundary conditions. >> >> However, when I use DMDASetUniformCoordinates on the DA, the spacing >> changes. >> >> This is due to an additional point e.g. in dm/impls/da/gr1.c >> >> else if (dim == 2) { >> if (bx == DM_BOUNDARY_PERIODIC) hx = (xmax-xmin)/(M); >> else hx = (xmax-xmin)/(M-1); >> if (by == DM_BOUNDARY_PERIODIC) hy = (ymax-ymin)/(N); >> else hy = (ymax-ymin)/(N-1); >> >> I don't understand the logic here, since xmin and xmax refer to the >> physical domain, how does changing to a periodic BC change the >> discretization ? >> >> Could someone clarify or point to a reference ? >> > > Just do a 1D example with 3 vertices. With a normal domain, you have 2 > cells > > 1-----2-----3 > > so each cell is 1/2 of the domain.
In a periodic domain, the last vertex > is connected to the first, so we have 3 cells > > 1-----2-----3-----1 > > and each is 1/3 of the domain. > > Matt > > >> Thanks >> > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- A non-text attachment was scrubbed... Name: 1D.pdf Type: application/pdf Size: 69178 bytes Desc: not available URL: From bsmith at mcs.anl.gov Thu Apr 27 13:03:43 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Thu, 27 Apr 2017 13:03:43 -0500 Subject: [petsc-users] explanations on DM_BOUNDARY_PERIODIC In-Reply-To: References: Message-ID: <3DF0AB40-32C2-4A55-8078-C3D649B4B7E8@mcs.anl.gov> > On Apr 27, 2017, at 12:43 PM, neok m4700 wrote: > > Hi Matthew, > > Thank you for the clarification, however, it is unclear why there is an additional unknown in the case of periodic BCs. > > Please see attached to this email what I'd like to achieve, the number of unknowns does not change when switching to the periodic case for e.g. a Laplace operator. So here you are thinking in terms of cell-centered discretizations. You are correct in that case that the number of "unknowns" is the same for both Dirichlet and periodic boundary conditions. > > DMDA was originally written in support of vertex centered coordinates, then this was extended somewhat with DMDASetInterpolationType() where DMDA_Q1 represents piecewise linear vertex centered while DMDA_Q0 represents piecewise constant cell-centered.
If you look at the source code for DMDASetUniformCoordinates() it is written in the context of vertex centered where the coordinates are stored for each vertex if (bx == DM_BOUNDARY_PERIODIC) hx = (xmax-xmin)/M; else hx = (xmax-xmin)/(M-1); ierr = VecGetArray(xcoor,&coors);CHKERRQ(ierr); for (i=0; i<mx; i++) { coors[i] = xmin + hx*(i+istart); } Note that in the periodic case, say domain [0,1) vertex centered with 3 grid points (in the global problem), the coordinates for the vertices are 0, 1/3, 2/3. If you are using cell-centered and have 3 cells, the coordinates of the vertices are again 0, 1/3, 2/3. Note that in the cell centered case we are storing in each location of the vector the coordinates of a vertex, not the coordinates of the cell center, so it is likely a bit "wonky". There is no contradiction between what you are saying and what we are saying. Barry > > And in the case of dirichlet or neumann bcs, the extremum cell add information to the RHS, they do not appear in the matrix formulation. > > Hope I was clear enough, > thanks > > > 2017-04-27 16:15 GMT+02:00 Matthew Knepley : > On Thu, Apr 27, 2017 at 3:46 AM, neok m4700 > wrote: > Hi, > > I am trying to change my problem to using periodic boundary conditions. > > However, when I use DMDASetUniformCoordinates on the DA, the spacing changes. > > This is due to an additional point e.g. in dm/impls/da/gr1.c > > else if (dim == 2) { > if (bx == DM_BOUNDARY_PERIODIC) hx = (xmax-xmin)/(M); > else hx = (xmax-xmin)/(M-1); > if (by == DM_BOUNDARY_PERIODIC) hy = (ymax-ymin)/(N); > else hy = (ymax-ymin)/(N-1); > > I don't understand the logic here, since xmin an xmax refer to the physical domain, how does changing to a periodic BC change the discretization ? > > Could someone clarify or point to a reference ? > > Just do a 1D example with 3 vertices.
> -- Norbert Wiener > > <1D.pdf> From neok.m4700 at gmail.com Fri Apr 28 02:36:21 2017 From: neok.m4700 at gmail.com (neok m4700) Date: Fri, 28 Apr 2017 09:36:21 +0200 Subject: [petsc-users] explanations on DM_BOUNDARY_PERIODIC In-Reply-To: <3DF0AB40-32C2-4A55-8078-C3D649B4B7E8@mcs.anl.gov> References: <3DF0AB40-32C2-4A55-8078-C3D649B4B7E8@mcs.anl.gov> Message-ID: Hello Barry, Thank you for answering. I quote the DMDA webpage: "The vectors can be thought of as either cell centered or vertex centered on the mesh. But some variables cannot be cell centered and others vertex centered." So if I use this, then when creating the DMDA the overall size will be the number of nodes, with nodal coordinates, and by setting DMDA_Q0 interp together with DM_BOUNDARY_PERIODIC I should be able to recover the solution at cell centers? Is that possible in PETSc, or should I stick to the nodal representation of my problem? thanks. 2017-04-27 20:03 GMT+02:00 Barry Smith : > > > On Apr 27, 2017, at 12:43 PM, neok m4700 wrote: > > > > Hi Matthew, > > > > Thank you for the clarification, however, it is unclear why there is an > additional unknown in the case of periodic bcs. > > > > Please see attached to this email what I'd like to achieve, the number > of unknowns does not change when switching to the periodic case for e.g. a > laplace operator. > > So here you are thinking in terms of cell-centered discretizations. You > are correct in that case that the number of "unknowns" is the same for both > Dirichlet or periodic boundary conditions. > > DMDA was originally written in support of vertex centered coordinates, > then this was extended somewhat with DMDASetInterpolationType() where > DMDA_Q1 represents piecewise linear vertex centered while DMDA_Q0 > represents piecewise constatant cell-centered.
> > If you look at the source code for DMDASetUniformCoordinates() it is > written in the context of vertex centered where the coordinates are stored > for each vertex > > if (bx == DM_BOUNDARY_PERIODIC) hx = (xmax-xmin)/M; > else hx = (xmax-xmin)/(M-1); > ierr = VecGetArray(xcoor,&coors);CHKERRQ(ierr); > for (i=0; i<mx; i++) { > coors[i] = xmin + hx*(i+istart); > } > > Note that in the periodic case say domain [0,1) vertex centered with 3 > grid points (in the global problem) the coordinates for the vertices are 0, > 1/3, 2/3 If you are using cell-centered and have 3 cells, the coordinates > of the vertices are again 0, 1/3, 2/3 > > Note that in the cell centered case we are storing in each location of the > vector the coordinates of a vertex, not the coordinates of the cell center > so it is a likely "wonky". > > There is no contradiction between what you are saying and what we are > saying. > > Barry > > > > > And in the case of dirichlet or neumann bcs, the extremum cell add > information to the RHS, they do not appear in the matrix formulation. > > > > Hope I was clear enough, > > thanks > > > > > > 2017-04-27 16:15 GMT+02:00 Matthew Knepley : > > On Thu, Apr 27, 2017 at 3:46 AM, neok m4700 > wrote: > > Hi, > > > > I am trying to change my problem to using periodic boundary conditions. > > > > However, when I use DMDASetUniformCoordinates on the DA, the spacing > changes. > > > > This is due to an additional point e.g. in dm/impls/da/gr1.c > > > > else if (dim == 2) { > > if (bx == DM_BOUNDARY_PERIODIC) hx = (xmax-xmin)/(M); > > else hx = (xmax-xmin)/(M-1); > > if (by == DM_BOUNDARY_PERIODIC) hy = (ymax-ymin)/(N); > > else hy = (ymax-ymin)/(N-1); > > > > I don't understand the logic here, since xmin an xmax refer to the > physical domain, how does changing to a periodic BC change the > discretization ? > > > > Could someone clarify or point to a reference ? > > > > Just do a 1D example with 3 vertices.
With a normal domain, you have 2 > cells > > > > 1-----2-----3 > > > > so each cell is 1/2 of the domain. In a periodic domain, the last vertex > is connected to the first, so we have 3 cells > > > > 1-----2-----3-----1 > > > > and each is 1/3 of the domain. > > > > Matt > > > > Thanks > > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > > -- Norbert Wiener > > > > <1D.pdf> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Fri Apr 28 03:56:10 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Fri, 28 Apr 2017 10:56:10 +0200 Subject: [petsc-users] strange convergence In-Reply-To: <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> Message-ID: It's in fact quite good Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 4.014715925568e+00 1 KSP Residual norm 2.160497019264e-10 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 0.000000000000e+00 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 Residual norms for fieldsplit_u_ solve. 0 KSP Residual norm 9.999999999416e-01 1 KSP Residual norm 7.118380416383e-11 Residual norms for fieldsplit_wp_ solve. 0 KSP Residual norm 0.000000000000e+00 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 Linear solve converged due to CONVERGED_ATOL iterations 1 Giang On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith wrote: > > Run again using LU on both blocks to see what happens. 
> > > > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui wrote: > > > > I have changed the way to tie the nonconforming mesh. It seems the > matrix now is better > > > > with -pc_type lu the output is > > 0 KSP preconditioned resid norm 3.308678584240e-01 true resid norm > 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 2.004313395301e-12 true resid norm > 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 > > Linear solve converged due to CONVERGED_ATOL iterations 1 > > > > > > with -pc_type fieldsplit -fieldsplit_u_pc_type hypre > -fieldsplit_wp_pc_type lu the convergence is slow > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm > 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm > 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 > > ... > > 824 KSP preconditioned resid norm 1.018542387738e-09 true resid norm > 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 > > 825 KSP preconditioned resid norm 9.743727947637e-10 true resid norm > 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 > > Linear solve converged due to CONVERGED_ATOL iterations 825 > > > > checking with additional -fieldsplit_u_ksp_type richardson > -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 > -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor > -fieldsplit_wp_ksp_max_it 1 gives > > > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm > 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 5.803507549280e-01 > > 1 KSP Residual norm 2.069538175950e-01 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm > 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 > > Residual norms for fieldsplit_u_ solve. 
> > 0 KSP Residual norm 7.831796195225e-01 > > 1 KSP Residual norm 1.734608520110e-01 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > .... > > 823 KSP preconditioned resid norm 1.065070135605e-09 true resid norm > 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 6.113806394327e-01 > > 1 KSP Residual norm 1.535465290944e-01 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > 824 KSP preconditioned resid norm 1.018542387746e-09 true resid norm > 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 6.123437055586e-01 > > 1 KSP Residual norm 1.524661826133e-01 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > 825 KSP preconditioned resid norm 9.743727947718e-10 true resid norm > 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 > > Linear solve converged due to CONVERGED_ATOL iterations 825 > > > > > > The residual for wp block is zero since in this first step the rhs is > zero. As can see in the output, the multigrid does not perform well to > reduce the residual in the sub-solve. Is my observation right? what can be > done to improve this? > > > > > > Giang > > > > On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith > wrote: > > > > This can happen in the matrix is singular or nearly singular or if > the factorization generates small pivots, which can occur for even > nonsingular problems if the matrix is poorly scaled or just plain nasty. 
> > > > > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui > wrote: > > > > > > It took a while, here I send you the output > > > > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm > 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm > 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 > > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm > 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 > > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm > 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 > > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > > KSP Object: 4 MPI processes > > > type: gmres > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > > GMRES: happy breakdown tolerance 1e-30 > > > maximum iterations=1000, initial guess is zero > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: 4 MPI processes > > > type: lu > > > LU: out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 0, needed 0 > > > Factored matrix follows: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=973051, cols=973051 > > > package used to perform factorization: pastix > > > Error : 3.24786e-14 > > > total: nonzeros=0, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > PaStiX run parameters: > > > Matrix type : Unsymmetric > > > Level of printing (0,1,2): 0 > > > Number of refinements iterations : 3 > > > Error : 3.24786e-14 > > > linear system matrix = precond matrix: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=973051, cols=973051 > > > Error : 3.24786e-14 > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > total number of 
mallocs used during MatSetValues calls =0 > > > using I-node (on process 0) routines: found 78749 nodes, limit > used is 5 > > > Error : 3.24786e-14 > > > > > > It doesn't do as you said. Something is not right here. I will look in > depth. > > > > > > Giang > > > > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith > wrote: > > > > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui > wrote: > > > > > > > > Good catch. I get this for the very first step, maybe at that time > the rhs_w is zero. > > > > > > With the multiplicative composition the right hand side of the > second solve is the initial right hand side of the second solve minus > A_10*x where x is the solution to the first sub solve and A_10 is the lower > left block of the outer matrix. So unless both the initial right hand side > has a zero for the second block and A_10 is identically zero the right hand > side for the second sub solve should not be zero. Is A_10 == 0? > > > > > > > > > > In the later step, it shows 2 step convergence > > > > > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 3.165886479830e+04 > > > > 1 KSP Residual norm 2.905922877684e-01 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 2.397669419027e-01 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid norm > 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 9.999891813771e-01 > > > > 1 KSP Residual norm 1.512000395579e-05 > > > > Residual norms for fieldsplit_wp_ solve. 
> > > > 0 KSP Residual norm 8.192702188243e-06 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid norm > 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 > > > > > > The outer residual norms are still wonky, the preconditioned > residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which is a > huge drop but the 7.963616922323e+05 drops very much less > 7.135927677844e+04. This is not normal. > > > > > > What if you just use -pc_type lu for the entire system (no > fieldsplit), does the true residual drop to almost zero in the first > iteration (as it should?). Send the output. > > > > > > > > > > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 6.946213936597e-01 > > > > 1 KSP Residual norm 1.195514007343e-05 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 1.025694497535e+00 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid norm > 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 7.255149996405e-01 > > > > 1 KSP Residual norm 6.583512434218e-06 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 1.015229700337e+00 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid norm > 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 3.512243341400e-01 > > > > 1 KSP Residual norm 2.032490351200e-06 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 1.282327290982e+00 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid norm > 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 > > > > Residual norms for fieldsplit_u_ solve. 
> > > > 0 KSP Residual norm 3.423609338053e-01 > > > > 1 KSP Residual norm 4.213703301972e-07 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 1.157384757538e+00 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid norm > 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 3.838596289995e-01 > > > > 1 KSP Residual norm 9.927864176103e-08 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 1.066298905618e+00 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid norm > 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 4.624964188094e-01 > > > > 1 KSP Residual norm 6.418229775372e-08 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 9.800784311614e-01 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid norm > 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 > > > > Linear solve converged due to CONVERGED_ATOL iterations 7 > > > > > > > > The outer operator is an explicit matrix. > > > > > > > > Giang > > > > > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith > wrote: > > > > > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui > wrote: > > > > > > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better > convergence. I still used 4 procs though, probably with 1 proc it should > also be the same. > > > > > > > > > > The u block used a Nitsche-type operator to connect two > non-matching domains. I don't think it will leave some rigid body motion > leads to not sufficient constraints. Maybe you have other idea? > > > > > > > > > > Residual norms for fieldsplit_u_ solve. 
> > > > > 0 KSP Residual norm 3.129067184300e+05 > > > > > 1 KSP Residual norm 5.906261468196e-01 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > > > ^^^^ something is wrong here. The sub solve should not be > starting with a 0 residual (this means the right hand side for this sub > solve is zero which it should not be). > > > > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > > > > > > > > How are you providing the outer operator? As an explicit matrix > or with some shell matrix? > > > > > > > > > > > > > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid > norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 9.999955993437e-01 > > > > > 1 KSP Residual norm 4.019774691831e-06 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid > norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 1.000012180204e+00 > > > > > 1 KSP Residual norm 1.017367950422e-05 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true resid > norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 1.000004200085e+00 > > > > > 1 KSP Residual norm 6.231613102458e-06 > > > > > Residual norms for fieldsplit_wp_ solve. 
> > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid > norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 > > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > > > > KSP Object: 4 MPI processes > > > > > type: gmres > > > > > GMRES: restart=1000, using Modified Gram-Schmidt > Orthogonalization > > > > > GMRES: happy breakdown tolerance 1e-30 > > > > > maximum iterations=1000, initial guess is zero > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > > left preconditioning > > > > > using PRECONDITIONED norm type for convergence test > > > > > PC Object: 4 MPI processes > > > > > type: fieldsplit > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > Solver info for each split is in the following KSP objects: > > > > > Split number 0 Defined by IS > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > > type: richardson > > > > > Richardson: damping factor=1 > > > > > maximum iterations=1, initial guess is zero > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > > left preconditioning > > > > > using PRECONDITIONED norm type for convergence test > > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > > type: lu > > > > > LU: out-of-place factorization > > > > > tolerance for zero pivot 2.22045e-14 > > > > > matrix ordering: natural > > > > > factor fill ratio given 0, needed 0 > > > > > Factored matrix follows: > > > > > Mat Object: 4 MPI processes > > > > > type: mpiaij > > > > > rows=938910, cols=938910 > > > > > package used to perform factorization: pastix > > > > > total: nonzeros=0, allocated nonzeros=0 > > > > > Error : 3.36878e-14 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > PaStiX run parameters: > > > > > Matrix type : Unsymmetric > > > > > Level of printing (0,1,2): 0 > > > > > Number of refinements iterations : 3 > > > > > Error : 
3.36878e-14 > > > > > linear system matrix = precond matrix: > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > > > type: mpiaij > > > > > rows=938910, cols=938910, bs=3 > > > > > Error : 3.36878e-14 > > > > > Error : 3.36878e-14 > > > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > using I-node (on process 0) routines: found 78749 nodes, > limit used is 5 > > > > > Split number 1 Defined by IS > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > > type: richardson > > > > > Richardson: damping factor=1 > > > > > maximum iterations=1, initial guess is zero > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > > left preconditioning > > > > > using PRECONDITIONED norm type for convergence test > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > > type: lu > > > > > LU: out-of-place factorization > > > > > tolerance for zero pivot 2.22045e-14 > > > > > matrix ordering: natural > > > > > factor fill ratio given 0, needed 0 > > > > > Factored matrix follows: > > > > > Mat Object: 4 MPI processes > > > > > type: mpiaij > > > > > rows=34141, cols=34141 > > > > > package used to perform factorization: pastix > > > > > Error : -nan > > > > > Error : -nan > > > > > Error : -nan > > > > > total: nonzeros=0, allocated nonzeros=0 > > > > > total number of mallocs used during MatSetValues > calls =0 > > > > > PaStiX run parameters: > > > > > Matrix type : Symmetric > > > > > Level of printing (0,1,2): 0 > > > > > Number of refinements iterations : 0 > > > > > Error : -nan > > > > > linear system matrix = precond matrix: > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > > type: mpiaij > > > > > rows=34141, cols=34141 > > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > not using I-node (on process 0) routines > > > > > linear system 
matrix = precond matrix: > > > > > Mat Object: 4 MPI processes > > > > > type: mpiaij > > > > > rows=973051, cols=973051 > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > using I-node (on process 0) routines: found 78749 nodes, > limit used is 5 > > > > > > > > > > > > > > > > > > > > Giang > > > > > > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith > wrote: > > > > > > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui > wrote: > > > > > > > > > > > > Dear Matt/Barry > > > > > > > > > > > > With your options, it results in > > > > > > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid > norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > > Residual norms for fieldsplit_u_ solve. > > > > > > 0 KSP Residual norm 2.407308987203e+36 > > > > > > 1 KSP Residual norm 5.797185652683e+72 > > > > > > > > > > It looks like Matt is right, hypre is seemly producing useless > garbage. > > > > > > > > > > First how do things run on one process. If you have similar > problems then debug on one process (debugging any kind of problem is always > far easy on one process). > > > > > > > > > > First run with -fieldsplit_u_type lu (instead of using hypre) to > see if that works or also produces something bad. > > > > > > > > > > What is the operator and the boundary conditions for u? It could > be singular. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > ... > > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true resid > norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 > > > > > > Residual norms for fieldsplit_u_ solve. > > > > > > 0 KSP Residual norm 1.533726746719e+36 > > > > > > 1 KSP Residual norm 3.692757392261e+72 > > > > > > Residual norms for fieldsplit_wp_ solve. 
> > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > > > > > > > Do you suggest that the pastix solver for the "wp" block > encounters small pivot? In addition, seem like the "u" block is also > singular. > > > > > > > > > > > > Giang > > > > > > > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith > wrote: > > > > > > > > > > > > Huge preconditioned norms but normal unpreconditioned norms > almost always come from a very small pivot in an LU or ILU factorization. > > > > > > > > > > > > The first thing to do is monitor the two sub solves. Run with > the additional options -fieldsplit_u_ksp_type richardson > -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 > -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor > -fieldsplit_wp_ksp_max_it 1 > > > > > > > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < > hgbk2008 at gmail.com> wrote: > > > > > > > > > > > > > > Hello > > > > > > > > > > > > > > I encountered a strange convergence behavior that I have > trouble to understand > > > > > > > > > > > > > > KSPSetFromOptions completed > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true > resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true > resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true > resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true > resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > > > > > > > ..... 
> > > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true > resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true > resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > > > > > > > Linear solve did not converge due to DIVERGED_ITS iterations > 1000 > > > > > > > KSP Object: 4 MPI processes > > > > > > > type: gmres > > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt > Orthogonalization > > > > > > > GMRES: happy breakdown tolerance 1e-30 > > > > > > > maximum iterations=1000, initial guess is zero > > > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > > > > left preconditioning > > > > > > > using PRECONDITIONED norm type for convergence test > > > > > > > PC Object: 4 MPI processes > > > > > > > type: fieldsplit > > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = > 2 > > > > > > > Solver info for each split is in the following KSP objects: > > > > > > > Split number 0 Defined by IS > > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > > > > type: preonly > > > > > > > maximum iterations=10000, initial guess is zero > > > > > > > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000 > > > > > > > left preconditioning > > > > > > > using NONE norm type for convergence test > > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > > > > type: hypre > > > > > > > HYPRE BoomerAMG preconditioning > > > > > > > HYPRE BoomerAMG: Cycle type V > > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 > > > > > > > HYPRE BoomerAMG: Maximum number of iterations PER > hypre call 1 > > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > > > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > > > > > > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > > > > > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > > > > > > HYPRE BoomerAMG: Number 
of levels of aggressive > coarsening 0 > > > > > > > HYPRE BoomerAMG: Number of paths for aggressive > coarsening 1 > > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 > > > > > > > HYPRE BoomerAMG: Sweeps down 1 > > > > > > > HYPRE BoomerAMG: Sweeps up 1 > > > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 > > > > > > > HYPRE BoomerAMG: Relax down > symmetric-SOR/Jacobi > > > > > > > HYPRE BoomerAMG: Relax up > symmetric-SOR/Jacobi > > > > > > > HYPRE BoomerAMG: Relax on coarse > Gaussian-elimination > > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 > > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > > > > > > HYPRE BoomerAMG: Using CF-relaxation > > > > > > > HYPRE BoomerAMG: Measure type local > > > > > > > HYPRE BoomerAMG: Coarsen type PMIS > > > > > > > HYPRE BoomerAMG: Interpolation type classical > > > > > > > linear system matrix = precond matrix: > > > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > > > > > type: mpiaij > > > > > > > rows=938910, cols=938910, bs=3 > > > > > > > total: nonzeros=8.60906e+07, allocated > nonzeros=8.60906e+07 > > > > > > > total number of mallocs used during MatSetValues calls > =0 > > > > > > > using I-node (on process 0) routines: found 78749 > nodes, limit used is 5 > > > > > > > Split number 1 Defined by IS > > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > > > > type: preonly > > > > > > > maximum iterations=10000, initial guess is zero > > > > > > > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000 > > > > > > > left preconditioning > > > > > > > using NONE norm type for convergence test > > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > > > > type: lu > > > > > > > LU: out-of-place factorization > > > > > > > tolerance for zero pivot 2.22045e-14 > > > > > > > matrix ordering: natural > > > > > > > factor fill ratio given 0, needed 0 > > > > > > > Factored matrix follows: > > > > > > > Mat Object: 4 MPI processes > > > > > > > type: mpiaij > > > > > > > 
rows=34141, cols=34141 > > > > > > > package used to perform factorization: pastix > > > > > > > Error : -nan > > > > > > > Error : -nan > > > > > > > total: nonzeros=0, allocated nonzeros=0 > > > > > > > Error : -nan > > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > > PaStiX run parameters: > > > > > > > Matrix type : Symmetric > > > > > > > Level of printing (0,1,2): 0 > > > > > > > Number of refinements iterations : 0 > > > > > > > Error : -nan > > > > > > > linear system matrix = precond matrix: > > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > > > > type: mpiaij > > > > > > > rows=34141, cols=34141 > > > > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > > > > total number of mallocs used during MatSetValues calls > =0 > > > > > > > not using I-node (on process 0) routines > > > > > > > linear system matrix = precond matrix: > > > > > > > Mat Object: 4 MPI processes > > > > > > > type: mpiaij > > > > > > > rows=973051, cols=973051 > > > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > > using I-node (on process 0) routines: found 78749 nodes, > limit used is 5 > > > > > > > > > > > > > > The pattern of convergence gives a hint that this system is > somehow bad/singular. But I don't know why the preconditioned error goes up > too high. Anyone has an idea? > > > > > > > > > > > > > > Best regards > > > > > > > Giang Bui > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Apr 28 06:35:42 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Apr 2017 06:35:42 -0500 Subject: [petsc-users] explanations on DM_BOUNDARY_PERIODIC In-Reply-To: References: <3DF0AB40-32C2-4A55-8078-C3D649B4B7E8@mcs.anl.gov> Message-ID: On Fri, Apr 28, 2017 at 2:36 AM, neok m4700 wrote: > Hello Barry, > > Thank you for answering. > > I quote the DMDA webpage: > "The vectors can be thought of as either cell centered or vertex centered > on the mesh. But some variables cannot be cell centered and others vertex > centered." > > So if I use this, then when creating the DMDA the overall size will be the > number of nodes, with nodal coordinates, and by setting DMDA_Q0 interp > together with DM_BOUNDARY_PERIODIC I should be able to recover the solution > at cell centers ? > > Is that possible in PETSc or should I stick to the nodal representation of > my problem ? > With periodic problems, you must think of the input sizes as being vertices. Matt > thanks. > > 2017-04-27 20:03 GMT+02:00 Barry Smith : > >> >> > On Apr 27, 2017, at 12:43 PM, neok m4700 wrote: >> > >> > Hi Matthew, >> > >> > Thank you for the clarification; however, it is unclear why there is an >> additional unknown in the case of periodic BCs. >> > >> > Please see attached to this email what I'd like to achieve; the number >> of unknowns does not change when switching to the periodic case for e.g. a >> Laplace operator. >> >> So here you are thinking in terms of cell-centered discretizations. >> You are correct in that case that the number of "unknowns" is the same for >> both Dirichlet and periodic boundary conditions. >> >> DMDA was originally written in support of vertex-centered coordinates; >> this was later extended somewhat with DMDASetInterpolationType(), where >> DMDA_Q1 represents piecewise linear vertex centered while DMDA_Q0 >> represents piecewise constant cell-centered.
>> >> If you look at the source code for DMDASetUniformCoordinates() it is >> written in the context of vertex centered, where the coordinates are stored >> for each vertex >> >> if (bx == DM_BOUNDARY_PERIODIC) hx = (xmax-xmin)/M; >> else hx = (xmax-xmin)/(M-1); >> ierr = VecGetArray(xcoor,&coors);CHKERRQ(ierr); >> for (i=0; i<isize; i++) { >> coors[i] = xmin + hx*(i+istart); >> } >> >> Note that in the periodic case, say domain [0,1) vertex centered with 3 >> grid points (in the global problem), the coordinates for the vertices are 0, >> 1/3, 2/3. If you are using cell-centered and have 3 cells, the coordinates >> of the vertices are again 0, 1/3, 2/3. >> >> Note that in the cell-centered case we are storing in each location of >> the vector the coordinates of a vertex, not the coordinates of the cell >> center, so it is likely a bit "wonky". >> >> There is no contradiction between what you are saying and what we are >> saying. >> >> Barry >> >> > >> > And in the case of Dirichlet or Neumann BCs, the end cells add >> information to the RHS; they do not appear in the matrix formulation. >> > >> > Hope I was clear enough, >> > thanks >> > >> > >> > 2017-04-27 16:15 GMT+02:00 Matthew Knepley : >> > On Thu, Apr 27, 2017 at 3:46 AM, neok m4700 >> wrote: >> > Hi, >> > >> > I am trying to change my problem to using periodic boundary conditions. >> > >> > However, when I use DMDASetUniformCoordinates on the DA, the spacing >> changes. >> > >> > This is due to an additional point, e.g. in dm/impls/da/gr1.c >> > >> > else if (dim == 2) { >> > if (bx == DM_BOUNDARY_PERIODIC) hx = (xmax-xmin)/(M); >> > else hx = (xmax-xmin)/(M-1); >> > if (by == DM_BOUNDARY_PERIODIC) hy = (ymax-ymin)/(N); >> > else hy = (ymax-ymin)/(N-1); >> > >> > I don't understand the logic here; since xmin and xmax refer to the >> physical domain, how does changing to a periodic BC change the >> discretization ? >> > >> > Could someone clarify or point to a reference ? >> > >> > Just do a 1D example with 3 vertices.
With a normal domain, you have 2 >> cells >> > >> > 1-----2-----3 >> > >> > so each cell is 1/2 of the domain. In a periodic domain, the last >> vertex is connected to the first, so we have 3 cells >> > >> > 1-----2-----3-----1 >> > >> > and each is 1/3 of the domain. >> > >> > Matt >> > >> > Thanks >> > >> > >> > >> > -- >> > What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> > -- Norbert Wiener >> > >> > <1D.pdf> >> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From fmilicchio at me.com Fri Apr 28 09:13:25 2017 From: fmilicchio at me.com (Franco Milicchio) Date: Fri, 28 Apr 2017 16:13:25 +0200 Subject: [petsc-users] Using ViennaCL without recompiling Message-ID: <1DBD3F6F-E787-4E13-B997-88B85090BA17@me.com> Dear all, I need to integrate ViennaCL into an existing project that uses PETSc, but for backwards compatibility, we cannot recompile it. Is there any simple interface to copy a Mat and Vec objects into a ViennaCL matrix and vector ones? I am sorry if this is a trivial question, but as far as I see, I cannot find any tutorials on these matters. Thanks for any help! Franco /fm -- Franco Milicchio Department of Engineering University Roma Tre https://fmilicchio.bitbucket.io/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knepley at gmail.com Fri Apr 28 09:30:59 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Apr 2017 09:30:59 -0500 Subject: [petsc-users] Using ViennaCL without recompiling In-Reply-To: <1DBD3F6F-E787-4E13-B997-88B85090BA17@me.com> References: <1DBD3F6F-E787-4E13-B997-88B85090BA17@me.com> Message-ID: On Fri, Apr 28, 2017 at 9:13 AM, Franco Milicchio wrote: > Dear all, > > I need to integrate ViennaCL into an existing project that uses PETSc, but > for backwards compatibility, we cannot recompile it. > Not recompiling your own project is fine. PETSc has an ABI. You just reconfigure/recompile PETSc with ViennaCL support. Then you can use -mat_type viennacl etc. Matt > Is there any simple interface to copy a Mat and Vec objects into a > ViennaCL matrix and vector ones? I am sorry if this is a trivial question, > but as far as I see, I cannot find any tutorials on these matters. > > Thanks for any help! > Franco > /fm > > -- > Franco Milicchio > > Department of Engineering > University Roma Tre > https://fmilicchio.bitbucket.io/ > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From fmilicchio at me.com Fri Apr 28 09:43:05 2017 From: fmilicchio at me.com (Franco Milicchio) Date: Fri, 28 Apr 2017 16:43:05 +0200 Subject: [petsc-users] Using ViennaCL without recompiling In-Reply-To: References: <1DBD3F6F-E787-4E13-B997-88B85090BA17@me.com> Message-ID: > Not recompiling your own project is fine. PETSc has an ABI. You just reconfigure/recompile PETSc with > ViennaCL support. Then you can use -mat_type viennacl etc. Thanks for your answer, Matt, but I expressed myself in an ambiguous way. I cannot recompile PETSc, I can do whatever I want with my code. Thank you! 
Franco /fm -- Franco Milicchio Department of Engineering University Roma Tre https://fmilicchio.bitbucket.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From balay at mcs.anl.gov Fri Apr 28 09:46:49 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Fri, 28 Apr 2017 09:46:49 -0500 Subject: [petsc-users] Using ViennaCL without recompiling In-Reply-To: References: <1DBD3F6F-E787-4E13-B997-88B85090BA17@me.com> Message-ID: On Fri, 28 Apr 2017, Franco Milicchio wrote: > > > Not recompiling your own project is fine. PETSc has an ABI. You just reconfigure/recompile PETSc with > > ViennaCL support. Then you can use -mat_type viennacl etc. > > Thanks for your answer, Matt, but I expressed myself in an ambiguous way. > > I cannot recompile PETSc, I can do whatever I want with my code. You can always install PETSc. If you don't have write permission to the install you are currently using, you can start with a fresh tarball [of the same version], use reconfigure*.py from the current install to configure, and install your own copy [obviously at a different location]. Satish > > Thank you! > Franco > /fm > > From bsmith at mcs.anl.gov Fri Apr 28 11:21:41 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 28 Apr 2017 11:21:41 -0500 Subject: [petsc-users] strange convergence In-Reply-To: References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> Message-ID: Ok, so boomerAMG algebraic multigrid is not good for the first block. You mentioned the first block has two things glued together? AMG is fantastic for certain problems but doesn't work for everything. Tell us more about the first block, what PDE it comes from, what discretization, and what the "gluing business" is and maybe we'll have suggestions for how to precondition it.
Barry > On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui wrote: > > It's in fact quite good > > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 4.014715925568e+00 > 1 KSP Residual norm 2.160497019264e-10 > Residual norms for fieldsplit_wp_ solve. > 0 KSP Residual norm 0.000000000000e+00 > 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 > Residual norms for fieldsplit_u_ solve. > 0 KSP Residual norm 9.999999999416e-01 > 1 KSP Residual norm 7.118380416383e-11 > Residual norms for fieldsplit_wp_ solve. > 0 KSP Residual norm 0.000000000000e+00 > 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 > Linear solve converged due to CONVERGED_ATOL iterations 1 > > Giang > > On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith wrote: > > Run again using LU on both blocks to see what happens. > > > > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui wrote: > > > > I have changed the way to tie the nonconforming mesh. It seems the matrix now is better > > > > with -pc_type lu the output is > > 0 KSP preconditioned resid norm 3.308678584240e-01 true resid norm 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 2.004313395301e-12 true resid norm 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 > > Linear solve converged due to CONVERGED_ATOL iterations 1 > > > > > > with -pc_type fieldsplit -fieldsplit_u_pc_type hypre -fieldsplit_wp_pc_type lu the convergence is slow > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 > > ... 
> > 824 KSP preconditioned resid norm 1.018542387738e-09 true resid norm 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 > > 825 KSP preconditioned resid norm 9.743727947637e-10 true resid norm 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 > > Linear solve converged due to CONVERGED_ATOL iterations 825 > > > > checking with additional -fieldsplit_u_ksp_type richardson -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor -fieldsplit_wp_ksp_max_it 1 gives > > > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 5.803507549280e-01 > > 1 KSP Residual norm 2.069538175950e-01 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 7.831796195225e-01 > > 1 KSP Residual norm 1.734608520110e-01 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > .... > > 823 KSP preconditioned resid norm 1.065070135605e-09 true resid norm 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 6.113806394327e-01 > > 1 KSP Residual norm 1.535465290944e-01 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > 824 KSP preconditioned resid norm 1.018542387746e-09 true resid norm 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 6.123437055586e-01 > > 1 KSP Residual norm 1.524661826133e-01 > > Residual norms for fieldsplit_wp_ solve. 
> > 0 KSP Residual norm 0.000000000000e+00 > > 825 KSP preconditioned resid norm 9.743727947718e-10 true resid norm 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 > > Linear solve converged due to CONVERGED_ATOL iterations 825 > > > > > > The residual for the wp block is zero since in this first step the rhs is zero. As can be seen in the output, the multigrid does not perform well in reducing the residual in the sub-solve. Is my observation right? What can be done to improve this? > > > > > > Giang > > > > On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith wrote: > > > > This can happen if the matrix is singular or nearly singular, or if the factorization generates small pivots, which can occur even for nonsingular problems if the matrix is poorly scaled or just plain nasty. > > > > > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui wrote: > > > > > > It took a while; here is the output > > > > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 > > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 > > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 > > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > > KSP Object: 4 MPI processes > > > type: gmres > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > > GMRES: happy breakdown tolerance 1e-30 > > > maximum iterations=1000, initial guess is zero > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: 4 MPI processes > > > type: lu > > > LU: out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering:
natural > > > factor fill ratio given 0, needed 0 > > > Factored matrix follows: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=973051, cols=973051 > > > package used to perform factorization: pastix > > > Error : 3.24786e-14 > > > total: nonzeros=0, allocated nonzeros=0 > > > total number of mallocs used during MatSetValues calls =0 > > > PaStiX run parameters: > > > Matrix type : Unsymmetric > > > Level of printing (0,1,2): 0 > > > Number of refinements iterations : 3 > > > Error : 3.24786e-14 > > > linear system matrix = precond matrix: > > > Mat Object: 4 MPI processes > > > type: mpiaij > > > rows=973051, cols=973051 > > > Error : 3.24786e-14 > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > Error : 3.24786e-14 > > > > > > It doesn't do as you said. Something is not right here. I will look in depth. > > > > > > Giang > > > > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith wrote: > > > > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui wrote: > > > > > > > > Good catch. I get this for the very first step, maybe at that time the rhs_w is zero. > > > > > > With the multiplicative composition the right hand side of the second solve is the initial right hand side of the second solve minus A_10*x where x is the solution to the first sub solve and A_10 is the lower left block of the outer matrix. So unless both the initial right hand side has a zero for the second block and A_10 is identically zero the right hand side for the second sub solve should not be zero. Is A_10 == 0? > > > > > > > > > > In the later step, it shows 2 step convergence > > > > > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 3.165886479830e+04 > > > > 1 KSP Residual norm 2.905922877684e-01 > > > > Residual norms for fieldsplit_wp_ solve. 
> > > > 0 KSP Residual norm 2.397669419027e-01 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 9.999891813771e-01 > > > > 1 KSP Residual norm 1.512000395579e-05 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 8.192702188243e-06 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 > > > > > > The outer residual norms are still wonky, the preconditioned residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which is a huge drop but the 7.963616922323e+05 drops very much less 7.135927677844e+04. This is not normal. > > > > > > What if you just use -pc_type lu for the entire system (no fieldsplit), does the true residual drop to almost zero in the first iteration (as it should?). Send the output. > > > > > > > > > > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 6.946213936597e-01 > > > > 1 KSP Residual norm 1.195514007343e-05 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 1.025694497535e+00 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 7.255149996405e-01 > > > > 1 KSP Residual norm 6.583512434218e-06 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 1.015229700337e+00 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 > > > > Residual norms for fieldsplit_u_ solve. 
> > > > 0 KSP Residual norm 3.512243341400e-01 > > > > 1 KSP Residual norm 2.032490351200e-06 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 1.282327290982e+00 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 3.423609338053e-01 > > > > 1 KSP Residual norm 4.213703301972e-07 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 1.157384757538e+00 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 3.838596289995e-01 > > > > 1 KSP Residual norm 9.927864176103e-08 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 1.066298905618e+00 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 > > > > Residual norms for fieldsplit_u_ solve. > > > > 0 KSP Residual norm 4.624964188094e-01 > > > > 1 KSP Residual norm 6.418229775372e-08 > > > > Residual norms for fieldsplit_wp_ solve. > > > > 0 KSP Residual norm 9.800784311614e-01 > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 > > > > Linear solve converged due to CONVERGED_ATOL iterations 7 > > > > > > > > The outer operator is an explicit matrix. > > > > > > > > Giang > > > > > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith wrote: > > > > > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui wrote: > > > > > > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better convergence. 
I still used 4 procs though, probably with 1 proc it should also be the same. > > > > > > > > > > The u block used a Nitsche-type operator to connect two non-matching domains. I don't think it will leave some rigid body motion leads to not sufficient constraints. Maybe you have other idea? > > > > > > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 3.129067184300e+05 > > > > > 1 KSP Residual norm 5.906261468196e-01 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > > > ^^^^ something is wrong here. The sub solve should not be starting with a 0 residual (this means the right hand side for this sub solve is zero which it should not be). > > > > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > > > > > > > > How are you providing the outer operator? As an explicit matrix or with some shell matrix? > > > > > > > > > > > > > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 9.999955993437e-01 > > > > > 1 KSP Residual norm 4.019774691831e-06 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 1.000012180204e+00 > > > > > 1 KSP Residual norm 1.017367950422e-05 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true resid norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 > > > > > Residual norms for fieldsplit_u_ solve. 
> > > > > 0 KSP Residual norm 1.000004200085e+00 > > > > > 1 KSP Residual norm 6.231613102458e-06 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 > > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > > > > KSP Object: 4 MPI processes > > > > > type: gmres > > > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > > > > GMRES: happy breakdown tolerance 1e-30 > > > > > maximum iterations=1000, initial guess is zero > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > > left preconditioning > > > > > using PRECONDITIONED norm type for convergence test > > > > > PC Object: 4 MPI processes > > > > > type: fieldsplit > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > Solver info for each split is in the following KSP objects: > > > > > Split number 0 Defined by IS > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > > type: richardson > > > > > Richardson: damping factor=1 > > > > > maximum iterations=1, initial guess is zero > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > > left preconditioning > > > > > using PRECONDITIONED norm type for convergence test > > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > > type: lu > > > > > LU: out-of-place factorization > > > > > tolerance for zero pivot 2.22045e-14 > > > > > matrix ordering: natural > > > > > factor fill ratio given 0, needed 0 > > > > > Factored matrix follows: > > > > > Mat Object: 4 MPI processes > > > > > type: mpiaij > > > > > rows=938910, cols=938910 > > > > > package used to perform factorization: pastix > > > > > total: nonzeros=0, allocated nonzeros=0 > > > > > Error : 3.36878e-14 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > PaStiX run parameters: > > > 
> > Matrix type : Unsymmetric > > > > > Level of printing (0,1,2): 0 > > > > > Number of refinements iterations : 3 > > > > > Error : 3.36878e-14 > > > > > linear system matrix = precond matrix: > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > > > type: mpiaij > > > > > rows=938910, cols=938910, bs=3 > > > > > Error : 3.36878e-14 > > > > > Error : 3.36878e-14 > > > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > Split number 1 Defined by IS > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > > type: richardson > > > > > Richardson: damping factor=1 > > > > > maximum iterations=1, initial guess is zero > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > > left preconditioning > > > > > using PRECONDITIONED norm type for convergence test > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > > type: lu > > > > > LU: out-of-place factorization > > > > > tolerance for zero pivot 2.22045e-14 > > > > > matrix ordering: natural > > > > > factor fill ratio given 0, needed 0 > > > > > Factored matrix follows: > > > > > Mat Object: 4 MPI processes > > > > > type: mpiaij > > > > > rows=34141, cols=34141 > > > > > package used to perform factorization: pastix > > > > > Error : -nan > > > > > Error : -nan > > > > > Error : -nan > > > > > total: nonzeros=0, allocated nonzeros=0 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > PaStiX run parameters: > > > > > Matrix type : Symmetric > > > > > Level of printing (0,1,2): 0 > > > > > Number of refinements iterations : 0 > > > > > Error : -nan > > > > > linear system matrix = precond matrix: > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > > type: mpiaij > > > > > rows=34141, cols=34141 > > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > > total 
number of mallocs used during MatSetValues calls =0 > > > > > not using I-node (on process 0) routines > > > > > linear system matrix = precond matrix: > > > > > Mat Object: 4 MPI processes > > > > > type: mpiaij > > > > > rows=973051, cols=973051 > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > > > > > > > > > > > > Giang > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith wrote: > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui wrote: > > > > > > > > > > > > Dear Matt/Barry > > > > > > > > > > > > With your options, it results in > > > > > > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > > Residual norms for fieldsplit_u_ solve. > > > > > > 0 KSP Residual norm 2.407308987203e+36 > > > > > > 1 KSP Residual norm 5.797185652683e+72 > > > > > > > > > > It looks like Matt is right, hypre is seemingly producing useless garbage. > > > > > > > > > > First, how do things run on one process? If you have similar problems then debug on one process (debugging any kind of problem is always far easier on one process). > > > > > > > > > > First run with -fieldsplit_u_type lu (instead of using hypre) to see if that works or also produces something bad. > > > > > > > > > > What is the operator and the boundary conditions for u? It could be singular. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > ... > > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 > > > > > > Residual norms for fieldsplit_u_ solve.
> > > > > > 0 KSP Residual norm 1.533726746719e+36 > > > > > > 1 KSP Residual norm 3.692757392261e+72 > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > > > > > > > Do you suggest that the pastix solver for the "wp" block encounters small pivot? In addition, seem like the "u" block is also singular. > > > > > > > > > > > > Giang > > > > > > > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith wrote: > > > > > > > > > > > > Huge preconditioned norms but normal unpreconditioned norms almost always come from a very small pivot in an LU or ILU factorization. > > > > > > > > > > > > The first thing to do is monitor the two sub solves. Run with the additional options -fieldsplit_u_ksp_type richardson -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor -fieldsplit_wp_ksp_max_it 1 > > > > > > > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui wrote: > > > > > > > > > > > > > > Hello > > > > > > > > > > > > > > I encountered a strange convergence behavior that I have trouble to understand > > > > > > > > > > > > > > KSPSetFromOptions completed > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > > > > > > > ..... 
> > > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > > > > > > > Linear solve did not converge due to DIVERGED_ITS iterations 1000 > > > > > > > KSP Object: 4 MPI processes > > > > > > > type: gmres > > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt Orthogonalization > > > > > > > GMRES: happy breakdown tolerance 1e-30 > > > > > > > maximum iterations=1000, initial guess is zero > > > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > > > > left preconditioning > > > > > > > using PRECONDITIONED norm type for convergence test > > > > > > > PC Object: 4 MPI processes > > > > > > > type: fieldsplit > > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > > > Solver info for each split is in the following KSP objects: > > > > > > > Split number 0 Defined by IS > > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > > > > type: preonly > > > > > > > maximum iterations=10000, initial guess is zero > > > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > > > > left preconditioning > > > > > > > using NONE norm type for convergence test > > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > > > > type: hypre > > > > > > > HYPRE BoomerAMG preconditioning > > > > > > > HYPRE BoomerAMG: Cycle type V > > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 > > > > > > > HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 > > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 > > > > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > > > > > > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > > > > > > HYPRE BoomerAMG: Interpolation: max elements per row 0 > > > > > > > HYPRE BoomerAMG: Number of levels of 
aggressive coarsening 0 > > > > > > > HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 > > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 > > > > > > > HYPRE BoomerAMG: Sweeps down 1 > > > > > > > HYPRE BoomerAMG: Sweeps up 1 > > > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 > > > > > > > HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi > > > > > > > HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi > > > > > > > HYPRE BoomerAMG: Relax on coarse Gaussian-elimination > > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 > > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > > > > > > HYPRE BoomerAMG: Using CF-relaxation > > > > > > > HYPRE BoomerAMG: Measure type local > > > > > > > HYPRE BoomerAMG: Coarsen type PMIS > > > > > > > HYPRE BoomerAMG: Interpolation type classical > > > > > > > linear system matrix = precond matrix: > > > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > > > > > type: mpiaij > > > > > > > rows=938910, cols=938910, bs=3 > > > > > > > total: nonzeros=8.60906e+07, allocated nonzeros=8.60906e+07 > > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > > > Split number 1 Defined by IS > > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > > > > type: preonly > > > > > > > maximum iterations=10000, initial guess is zero > > > > > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000 > > > > > > > left preconditioning > > > > > > > using NONE norm type for convergence test > > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > > > > type: lu > > > > > > > LU: out-of-place factorization > > > > > > > tolerance for zero pivot 2.22045e-14 > > > > > > > matrix ordering: natural > > > > > > > factor fill ratio given 0, needed 0 > > > > > > > Factored matrix follows: > > > > > > > Mat Object: 4 MPI processes > > > > > > > type: mpiaij > > > > > > > rows=34141, cols=34141 > > > > > > > 
package used to perform factorization: pastix > > > > > > > Error : -nan > > > > > > > Error : -nan > > > > > > > total: nonzeros=0, allocated nonzeros=0 > > > > > > > Error : -nan > > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > > PaStiX run parameters: > > > > > > > Matrix type : Symmetric > > > > > > > Level of printing (0,1,2): 0 > > > > > > > Number of refinements iterations : 0 > > > > > > > Error : -nan > > > > > > > linear system matrix = precond matrix: > > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > > > > type: mpiaij > > > > > > > rows=34141, cols=34141 > > > > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > > not using I-node (on process 0) routines > > > > > > > linear system matrix = precond matrix: > > > > > > > Mat Object: 4 MPI processes > > > > > > > type: mpiaij > > > > > > > rows=973051, cols=973051 > > > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > > using I-node (on process 0) routines: found 78749 nodes, limit used is 5 > > > > > > > > > > > > > > The pattern of convergence gives a hint that this system is somehow bad/singular. But I don't know why the preconditioned error goes up too high. Anyone has an idea? > > > > > > > > > > > > > > Best regards > > > > > > > Giang Bui > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From bsmith at mcs.anl.gov Fri Apr 28 11:26:13 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Fri, 28 Apr 2017 11:26:13 -0500 Subject: [petsc-users] explanations on DM_BOUNDARY_PERIODIC In-Reply-To: References: <3DF0AB40-32C2-4A55-8078-C3D649B4B7E8@mcs.anl.gov> Message-ID: <9A9763FE-A3E6-465B-B97E-4A077032E3BF@mcs.anl.gov> > On Apr 28, 2017, at 2:36 AM, neok m4700 wrote: > > Hello Barry, > > Thank you for answering. 
> > I quote the DMDA webpage: > "The vectors can be thought of as either cell centered or vertex centered on the mesh. But some variables cannot be cell centered and others vertex centered." > > So if I use this, then when creating the DMDA the overall size will be the number of nodes, with nodal coordinates, and by setting DMDA_Q0 interp together with DM_BOUNDARY_PERIODIC I should be able to recover the solution at cell centers ? > > Is that possible in PETSc or should I stick to the nodal representation of my problem ? You can use them as cell centered values for your problem. Just keep in mind that the coordinate values in the coordinate array correspond to the vertex locations in the mesh. Barry > > thanks. > > 2017-04-27 20:03 GMT+02:00 Barry Smith : > > > On Apr 27, 2017, at 12:43 PM, neok m4700 wrote: > > > > Hi Matthew, > > > > Thank you for the clarification, however, it is unclear why there is an additional unknown in the case of periodic bcs. > > > > Please see attached to this email what I'd like to achieve, the number of unknowns does not change when switching to the periodic case for e.g. a Laplace operator. > > So here you are thinking in terms of cell-centered discretizations. You are correct in that case that the number of "unknowns" is the same for both Dirichlet and periodic boundary conditions. > > DMDA was originally written in support of vertex centered coordinates, then this was extended somewhat with DMDASetInterpolationType() where DMDA_Q1 represents piecewise linear vertex centered while DMDA_Q0 represents piecewise constant cell-centered.
> > If you look at the source code for DMDASetUniformCoordinates() it is written in the context of vertex centered coordinates, where the coordinates are stored for each vertex > > if (bx == DM_BOUNDARY_PERIODIC) hx = (xmax-xmin)/M; > else hx = (xmax-xmin)/(M-1); > ierr = VecGetArray(xcoor,&coors);CHKERRQ(ierr); > for (i=0; i<isize; i++) { > coors[i] = xmin + hx*(i+istart); > } > > Note that in the periodic case, say the domain [0,1) vertex centered with 3 grid points (in the global problem), the coordinates for the vertices are 0, 1/3, 2/3. If you are using cell-centered and have 3 cells, the coordinates of the vertices are again 0, 1/3, 2/3. > > Note that in the cell centered case we are storing in each location of the vector the coordinates of a vertex, not the coordinates of the cell center, so it is likely a bit "wonky". > > There is no contradiction between what you are saying and what we are saying. > > Barry > > > > > And in the case of Dirichlet or Neumann BCs, the extremum cells add information to the RHS; they do not appear in the matrix formulation. > > > > Hope I was clear enough, > > thanks > > > > > > 2017-04-27 16:15 GMT+02:00 Matthew Knepley : > > On Thu, Apr 27, 2017 at 3:46 AM, neok m4700 wrote: > > Hi, > > > > I am trying to change my problem to using periodic boundary conditions. > > > > However, when I use DMDASetUniformCoordinates on the DA, the spacing changes. > > > > This is due to an additional point e.g. in dm/impls/da/gr1.c > > > > else if (dim == 2) { > > if (bx == DM_BOUNDARY_PERIODIC) hx = (xmax-xmin)/(M); > > else hx = (xmax-xmin)/(M-1); > > if (by == DM_BOUNDARY_PERIODIC) hy = (ymax-ymin)/(N); > > else hy = (ymax-ymin)/(N-1); > > > > I don't understand the logic here, since xmin and xmax refer to the physical domain, how does changing to a periodic BC change the discretization ? > > > > Could someone clarify or point to a reference ?  > > > > Just do a 1D example with 3 vertices.
With a normal domain, you have 2 cells > > > > 1-----2-----3 > > > > so each cell is 1/2 of the domain. In a periodic domain, the last vertex is connected to the first, so we have 3 cells > > > > 1-----2-----3-----1 > > > > and each is 1/3 of the domain. > > > > Matt > > > > Thanks > > > > > > > > -- > > What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. > > -- Norbert Wiener > > > > <1D.pdf> > > From natacha.bereux at gmail.com Fri Apr 28 11:48:03 2017 From: natacha.bereux at gmail.com (Natacha BEREUX) Date: Fri, 28 Apr 2017 18:48:03 +0200 Subject: [petsc-users] Configure nested PCFIELDSPLIT with general index sets In-Reply-To: References: <6496846F-19F8-4494-87E1-DDC390513370@imperial.ac.uk> Message-ID: Dear Matt, Sorry for my (very) late reply. I was not able to find the Fortran interface of DMShellSetCreateFieldDecomposition in the latest petsc-3.7.6 (and my code still fails to link). I have the feeling that it is missing in the master branch. And I was not able to get it on bitbucket either. Is there a branch from which I can pull your commit ? Thanks a lot for your help, Natacha On Thu, Mar 30, 2017 at 9:25 PM, Matthew Knepley wrote: > On Wed, Mar 22, 2017 at 1:45 PM, Natacha BEREUX > wrote: > >> Hello Matt, >> Thanks a lot for your answers. >> Since I am working on a large FEM Fortran code, I have to stick to >> Fortran. >> Do you know if someone plans to add this Fortran interface? Or maybe I >> could do it myself ? Is this particular interface very hard to add ? >> Perhaps I could mimic some other interface ? >> What would you advise ? >> > > I have added the interface in branch knepley/feature-fortran-compose. I > also put this in the 'next' branch. It > should make it to master soon.
There is a test in sys/examples/tests/ex13f > > Thanks, > > Matt > > >> Best regards, >> Natacha >> >> On Wed, Mar 22, 2017 at 12:33 PM, Matthew Knepley >> wrote: >> >>> On Wed, Mar 22, 2017 at 10:03 AM, Natacha BEREUX < >>> natacha.bereux at gmail.com> wrote: >>> >>>> Hello, >>>> if my understanding is correct, the approach proposed by Matt and >>>> Lawrence is the following : >>>> - create a DMShell (DMShellCreate) >>>> - define my own CreateFieldDecomposition to return the index sets I >>>> need (for displacement, pressure and temperature degrees of freedom) : >>>> myCreateFieldDecomposition(... ) >>>> - set it in the DMShell ( DMShellSetCreateFieldDecomposition) >>>> - then set the DM in the KSP context (KSPSetDM) >>>> >>>> I have some more questions >>>> - I did not succeed in setting my own CreateFieldDecomposition in the >>>> DMShell : link fails with "unknown reference to >>>> dmshellsetcreatefielddecomposition_". Could it be a Fortran problem >>>> (I am using Fortran)? Is this routine available in the PETSc Fortran >>>> interface ? >>> >>> Yes, exactly. The Fortran interface for passing function pointers is >>> complex, and no one has added this function yet. >>> >>> >>>> - CreateFieldDecomposition is supposed to return an array of dms (to >>>> define the fields). I am not able to return such data. Do I return a >>>> PETSC_NULL_OBJECT instead ? >>>> >>> >>> Yes. >>> >>> >>>> - do I have to provide something else to define the DMShell ? >>>> >>> >>> I think you will have to return local and global vectors, but this just >>> means creating a vector of the correct size and distribution. >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Thanks a lot for your help >>>> Natacha >>>> >>>> On Tue, Mar 21, 2017 at 2:44 PM, Natacha BEREUX < >>>> natacha.bereux at gmail.com> wrote: >>>> >>>>> Thanks for your quick answers. To be honest, I am not familiar at all >>>>> with DMShells and DMPlexes. But since it is what I need, I am going to try >>>>> it.
>>>>> Thanks again for your advices, >>>>> Natacha >>>>> >>>>> On Tue, Mar 21, 2017 at 2:27 PM, Lawrence Mitchell < >>>>> lawrence.mitchell at imperial.ac.uk> wrote: >>>>> >>>>>> >>>>>> > On 21 Mar 2017, at 13:24, Matthew Knepley >>>>>> wrote: >>>>>> > >>>>>> > I think the remedy is as easy as specifying a DMShell that has a >>>>>> PetscSection (DMSetDefaultSection) with your ordering, and >>>>>> > I think this is how Firedrake (http://www.firedrakeproject.org/) >>>>>> does it. >>>>>> >>>>>> We actually don't use a section, but we do provide >>>>>> DMCreateFieldDecomposition_Shell. >>>>>> >>>>>> If you have a section that describes all the fields, then I think if >>>>>> the DMShell knows about it, you effectively get the same behaviour as >>>>>> DMPlex (which does the decomposition in the same manner?). >>>>>> >>>>>> > However, I usually use a DMPlex which knows about my >>>>>> > mesh, so I am not sure if this strategy has any holes. >>>>>> >>>>>> I haven't noticed anything yet. >>>>>> >>>>>> Lawrence >>>>> >>>>> >>>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Apr 28 13:09:51 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Apr 2017 13:09:51 -0500 Subject: [petsc-users] Configure nested PCFIELDSPLIT with general index sets In-Reply-To: References: <6496846F-19F8-4494-87E1-DDC390513370@imperial.ac.uk> Message-ID: On Fri, Apr 28, 2017 at 11:48 AM, Natacha BEREUX wrote: > Dear Matt, > Sorry for my (very) late reply. 
> I was not able to find the Fortran interface of > DMShellSetCreateFieldDecomposition in the latest petsc-3.7.6 (and my > code still fails to link). > I have the feeling that it is missing in the master branch. > And I was not able to get it on bitbucket either. > Is there a branch from which I can pull your commit ? > I would either: a) Use the 'next' branch or b) wait until Monday for me to merge to 'master' This merge has been held up, but can now go forward. Thanks, Matt > Thanks a lot for your help, > Natacha > > On Thu, Mar 30, 2017 at 9:25 PM, Matthew Knepley > wrote: > >> On Wed, Mar 22, 2017 at 1:45 PM, Natacha BEREUX > > wrote: >> >>> Hello Matt, >>> Thanks a lot for your answers. >>> Since I am working on a large FEM Fortran code, I have to stick to >>> Fortran. >>> Do you know if someone plans to add this Fortran interface? Or maybe I >>> could do it myself ? Is this particular interface very hard to add ? >>> Perhaps I could mimic some other interface ? >>> What would you advise ? >>> >> >> I have added the interface in branch knepley/feature-fortran-compose. I >> also put this in the 'next' branch. It >> should make it to master soon. There is a test in sys/examples/tests/ex13f >> >> Thanks, >> >> Matt >> >> >>> Best regards, >>> Natacha >>> >>> On Wed, Mar 22, 2017 at 12:33 PM, Matthew Knepley >>> wrote: >>> >>>> On Wed, Mar 22, 2017 at 10:03 AM, Natacha BEREUX < >>>> natacha.bereux at gmail.com> wrote: >>>> >>>>> Hello, >>>>> if my understanding is correct, the approach proposed by Matt and >>>>> Lawrence is the following : >>>>> - create a DMShell (DMShellCreate) >>>>> - define my own CreateFieldDecomposition to return the index sets I >>>>> need (for displacement, pressure and temperature degrees of freedom) : >>>>> myCreateFieldDecomposition(...
) >>>>> - set it in the DMShell ( DMShellSetCreateFieldDecomposition) >>>>> - then sets the DM in KSP context (KSPSetDM) >>>>> >>>>> I have some more questions >>>>> - I did not succeed in setting my own CreateFieldDecomposition in the >>>>> DMShell : link fails with " unknown reference to ? >>>>> dmshellsetcreatefielddecomposition_ ?. Could it be a Fortran problem >>>>> (I am using Fortran)? Is this routine available in PETSc Fortran >>>>> interface ? \ >>>>> >>>> >>>> Yes, exactly. The Fortran interface for passing function pointers is >>>> complex, and no one has added this function yet. >>>> >>>> >>>>> - CreateFieldDecomposition is supposed to return an array of dms (to >>>>> define the fields). I am not able to return such datas. Do I return a >>>>> PETSC_NULL_OBJECT instead ? >>>>> >>>> >>>> Yes. >>>> >>>> >>>>> - do I have to provide something else to define the DMShell ? >>>>> >>>> >>>> I think you will have to return local and global vectors, but this just >>>> means creating a vector of the correct size and distribution. >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> Thanks a lot for your help >>>>> Natacha >>>>> >>>>> On Tue, Mar 21, 2017 at 2:44 PM, Natacha BEREUX < >>>>> natacha.bereux at gmail.com> wrote: >>>>> >>>>>> Thanks for your quick answers. To be honest, I am not familiar at all >>>>>> with DMShells and DMPlexes. But since it is what I need, I am going to try >>>>>> it. >>>>>> Thanks again for your advices, >>>>>> Natacha >>>>>> >>>>>> On Tue, Mar 21, 2017 at 2:27 PM, Lawrence Mitchell < >>>>>> lawrence.mitchell at imperial.ac.uk> wrote: >>>>>> >>>>>>> >>>>>>> > On 21 Mar 2017, at 13:24, Matthew Knepley >>>>>>> wrote: >>>>>>> > >>>>>>> > I think the remedy is as easy as specifying a DMShell that has a >>>>>>> PetscSection (DMSetDefaultSection) with your ordering, and >>>>>>> > I think this is how Firedrake (http://www.firedrakeproject.org/) >>>>>>> does it. 
>>>>>>> >>>>>>> We actually don't use a section, but we do provide >>>>>>> DMCreateFieldDecomposition_Shell. >>>>>>> >>>>>>> If you have a section that describes all the fields, then I think if >>>>>>> the DMShell knows about it, you effectively get the same behaviour as >>>>>>> DMPlex (which does the decomposition in the same manner?). >>>>>>> >>>>>>> > However, I usually use a DMPlex which knows about my >>>>>>> > mesh, so I am not sure if this strategy has any holes. >>>>>>> >>>>>>> I haven't noticed anything yet. >>>>>>> >>>>>>> Lawrence >>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>>> >>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From knepley at gmail.com Fri Apr 28 13:11:47 2017 From: knepley at gmail.com (Matthew Knepley) Date: Fri, 28 Apr 2017 13:11:47 -0500 Subject: [petsc-users] Configure nested PCFIELDSPLIT with general index sets In-Reply-To: References: <6496846F-19F8-4494-87E1-DDC390513370@imperial.ac.uk> Message-ID: On Fri, Apr 28, 2017 at 1:09 PM, Matthew Knepley wrote: > On Fri, Apr 28, 2017 at 11:48 AM, Natacha BEREUX > wrote: > >> Dear Matt, >> Sorry for my (very) late reply. >> I was not able to find the Fortran interface of >> DMSellSetCreateFieldDecomposition in the late petsc-3.7.6 fortran (and >> my code still fails to link). >> I have the feeling that it is missing in the master branch. 
>> And I was not able to get it on bitbucket either. >> Is there a branch from which I can pull your commit ? >> > > I would either: > > a) Use the 'next' branch > > or > > b) wait until Monday for me to merge to 'master' > > This merge has been held up, but can now go forward. > I just checked master. It was already merged. Please recheck your master. Thanks, Matt > Thanks, > > Matt > > >> Thans a lot for your help, >> Natacha >> >> On Thu, Mar 30, 2017 at 9:25 PM, Matthew Knepley >> wrote: >> >>> On Wed, Mar 22, 2017 at 1:45 PM, Natacha BEREUX < >>> natacha.bereux at gmail.com> wrote: >>> >>>> Hello Matt, >>>> Thanks a lot for your answers. >>>> Since I am working on a large FEM Fortran code, I have to stick to >>>> Fortran. >>>> Do you know if someone plans to add this Fortran interface? Or may be >>>> I could do it myself ? Is this particular interface very hard to add ? >>>> Perhaps could I mimic some other interface ? >>>> What would you advise ? >>>> >>> >>> I have added the interface in branch knepley/feature-fortran-compose. I >>> also put this in the 'next' branch. It >>> should make it to master soon. There is a test in >>> sys/examples/tests/ex13f >>> >>> Thanks, >>> >>> Matt >>> >>> >>>> Best regards, >>>> Natacha >>>> >>>> On Wed, Mar 22, 2017 at 12:33 PM, Matthew Knepley >>>> wrote: >>>> >>>>> On Wed, Mar 22, 2017 at 10:03 AM, Natacha BEREUX < >>>>> natacha.bereux at gmail.com> wrote: >>>>> >>>>>> Hello, >>>>>> if my understanding is correct, the approach proposed by Matt and >>>>>> Lawrence is the following : >>>>>> - create a DMShell (DMShellCreate) >>>>>> - define my own CreateFieldDecomposition to return the index sets I >>>>>> need (for displacement, pressure and temperature degrees of freedom) : >>>>>> myCreateFieldDecomposition(... 
) >>>>>> - set it in the DMShell ( DMShellSetCreateFieldDecomposition) >>>>>> - then sets the DM in KSP context (KSPSetDM) >>>>>> >>>>>> I have some more questions >>>>>> - I did not succeed in setting my own CreateFieldDecomposition in the >>>>>> DMShell : link fails with " unknown reference to ? >>>>>> dmshellsetcreatefielddecomposition_ ?. Could it be a Fortran problem >>>>>> (I am using Fortran)? Is this routine available in PETSc Fortran >>>>>> interface ? \ >>>>>> >>>>> >>>>> Yes, exactly. The Fortran interface for passing function pointers is >>>>> complex, and no one has added this function yet. >>>>> >>>>> >>>>>> - CreateFieldDecomposition is supposed to return an array of dms (to >>>>>> define the fields). I am not able to return such datas. Do I return a >>>>>> PETSC_NULL_OBJECT instead ? >>>>>> >>>>> >>>>> Yes. >>>>> >>>>> >>>>>> - do I have to provide something else to define the DMShell ? >>>>>> >>>>> >>>>> I think you will have to return local and global vectors, but this >>>>> just means creating a vector of the correct size and distribution. >>>>> >>>>> Thanks, >>>>> >>>>> Matt >>>>> >>>>> >>>>>> Thanks a lot for your help >>>>>> Natacha >>>>>> >>>>>> On Tue, Mar 21, 2017 at 2:44 PM, Natacha BEREUX < >>>>>> natacha.bereux at gmail.com> wrote: >>>>>> >>>>>>> Thanks for your quick answers. To be honest, I am not familiar at >>>>>>> all with DMShells and DMPlexes. But since it is what I need, I am going to >>>>>>> try it. >>>>>>> Thanks again for your advices, >>>>>>> Natacha >>>>>>> >>>>>>> On Tue, Mar 21, 2017 at 2:27 PM, Lawrence Mitchell < >>>>>>> lawrence.mitchell at imperial.ac.uk> wrote: >>>>>>> >>>>>>>> >>>>>>>> > On 21 Mar 2017, at 13:24, Matthew Knepley >>>>>>>> wrote: >>>>>>>> > >>>>>>>> > I think the remedy is as easy as specifying a DMShell that has a >>>>>>>> PetscSection (DMSetDefaultSection) with your ordering, and >>>>>>>> > I think this is how Firedrake (http://www.firedrakeproject.org/) >>>>>>>> does it. 
>>>>>>>> >>>>>>>> We actually don't use a section, but we do provide >>>>>>>> DMCreateFieldDecomposition_Shell. >>>>>>>> >>>>>>>> If you have a section that describes all the fields, then I think >>>>>>>> if the DMShell knows about it, you effectively get the same behaviour as >>>>>>>> DMPlex (which does the decomposition in the same manner?). >>>>>>>> >>>>>>>> > However, I usually use a DMPlex which knows about my >>>>>>>> > mesh, so I am not sure if this strategy has any holes. >>>>>>>> >>>>>>>> I haven't noticed anything yet. >>>>>>>> >>>>>>>> Lawrence >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>> >>>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -------------- next part -------------- An HTML attachment was scrubbed... URL: From fmilicchio at me.com Sat Apr 29 01:14:42 2017 From: fmilicchio at me.com (Franco Milicchio) Date: Sat, 29 Apr 2017 08:14:42 +0200 Subject: [petsc-users] Using ViennaCL without recompiling In-Reply-To: References: <1DBD3F6F-E787-4E13-B997-88B85090BA17@me.com> Message-ID: > On Apr 28, 2017, at 4:46pm, Satish Balay wrote: > > On Fri, 28 Apr 2017, Franco Milicchio wrote: > >> >>> Not recompiling your own project is fine. PETSc has an ABI. 
You just reconfigure/recompile PETSc with >>> ViennaCL support. Then you can use -mat_type viennacl etc. >> >> Thanks for your answer, Matt, but I expressed myself in an ambiguous way. >> >> I cannot recompile PETSc, I can do whatever I want with my code. > > You can always install PETSc. > > If you don't have write permission to the install you are currently > using - you can start with a fresh tarball [of the same version], use > reconfigure*.py from the current install to configure - and install > your own copy [obviously at a different location]. Thanks, Satish. As I understand it, you are suggesting to just substitute PETSc at linking level with my ViennaCL-enabled library, and it should work "flawlessly"? (the mileage may vary, obviously) This would be a huge gain to the project. Thanks, Franco /fm -- Franco Milicchio Department of Engineering University Roma Tre https://fmilicchio.bitbucket.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgbk2008 at gmail.com Sat Apr 29 02:22:21 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Sat, 29 Apr 2017 09:22:21 +0200 Subject: [petsc-users] strange convergence In-Reply-To: References: <7891536D-F833-400A-85AF-D990FF4FC024@mcs.anl.gov> Message-ID: Hi Barry The first block is from a standard solid mechanics discretization based on the balance of momentum equation. There is some material involved, but in principle it's a well-posed elasticity equation with a positive definite tangent operator. The "gluing business" uses the mortar method to keep the continuity of displacement. Instead of using a Lagrange multiplier to treat the constraint I used a penalty method to penalize the energy. The discretization form of mortar is quite simple \int_{\Gamma_1} { rho * (\delta u_1 - \delta u_2) * (u_1 - u_2) dA } rho is the penalty parameter.
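[Typeset as LaTeX, purely as a reading aid for the plain-text formula above (rho is the penalty parameter and Gamma_1 the mortar interface named in the message), the penalty term is:]

```latex
\int_{\Gamma_1} \rho \,\bigl(\delta u_1 - \delta u_2\bigr)\,\bigl(u_1 - u_2\bigr)\,\mathrm{d}A
```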
In the simulation I initially set it low (~E) to preserve the conditioning of the system. In the figure below, the colorful blocks are u_1 and the base is u_2. Both u_1 and u_2 use isoparametric quadratic approximation. [attachment: Snapshot.png] Giang On Fri, Apr 28, 2017 at 6:21 PM, Barry Smith wrote: > > Ok, so boomerAMG algebraic multigrid is not good for the first block. > You mentioned the first block has two things glued together? AMG is > fantastic for certain problems but doesn't work for everything. > > Tell us more about the first block, what PDE it comes from, what > discretization, and what the "gluing business" is and maybe we'll have > suggestions for how to precondition it. > > Barry > > > On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui wrote: > > > > It's in fact quite good > > > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 4.014715925568e+00 > > 1 KSP Residual norm 2.160497019264e-10 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm > 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 > > Residual norms for fieldsplit_u_ solve. > > 0 KSP Residual norm 9.999999999416e-01 > > 1 KSP Residual norm 7.118380416383e-11 > > Residual norms for fieldsplit_wp_ solve. > > 0 KSP Residual norm 0.000000000000e+00 > > 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm > 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 > > Linear solve converged due to CONVERGED_ATOL iterations 1 > > > > Giang > > > > On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith wrote: > > > > Run again using LU on both blocks to see what happens. > > > > > > > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui > wrote: > > > > > > I have changed the way to tie the nonconforming mesh.
It seems the > matrix now is better > > > > > > with -pc_type lu the output is > > > 0 KSP preconditioned resid norm 3.308678584240e-01 true resid norm > 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > 1 KSP preconditioned resid norm 2.004313395301e-12 true resid norm > 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 > > > Linear solve converged due to CONVERGED_ATOL iterations 1 > > > > > > > > > with -pc_type fieldsplit -fieldsplit_u_pc_type hypre > -fieldsplit_wp_pc_type lu the convergence is slow > > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm > 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm > 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 > > > ... > > > 824 KSP preconditioned resid norm 1.018542387738e-09 true resid norm > 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 > > > 825 KSP preconditioned resid norm 9.743727947637e-10 true resid norm > 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 > > > Linear solve converged due to CONVERGED_ATOL iterations 825 > > > > > > checking with additional -fieldsplit_u_ksp_type richardson > -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 > -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor > -fieldsplit_wp_ksp_max_it 1 gives > > > > > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm > 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 5.803507549280e-01 > > > 1 KSP Residual norm 2.069538175950e-01 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 0.000000000000e+00 > > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm > 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 > > > Residual norms for fieldsplit_u_ solve. 
> > > 0 KSP Residual norm 7.831796195225e-01 > > > 1 KSP Residual norm 1.734608520110e-01 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 0.000000000000e+00 > > > .... > > > 823 KSP preconditioned resid norm 1.065070135605e-09 true resid norm > 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 6.113806394327e-01 > > > 1 KSP Residual norm 1.535465290944e-01 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 0.000000000000e+00 > > > 824 KSP preconditioned resid norm 1.018542387746e-09 true resid norm > 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 > > > Residual norms for fieldsplit_u_ solve. > > > 0 KSP Residual norm 6.123437055586e-01 > > > 1 KSP Residual norm 1.524661826133e-01 > > > Residual norms for fieldsplit_wp_ solve. > > > 0 KSP Residual norm 0.000000000000e+00 > > > 825 KSP preconditioned resid norm 9.743727947718e-10 true resid norm > 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 > > > Linear solve converged due to CONVERGED_ATOL iterations 825 > > > > > > > > > The residual for the wp block is zero since in this first step the rhs is > zero. As can be seen in the output, the multigrid does not perform well to > reduce the residual in the sub-solve. Is my observation right? What can be > done to improve this? > > > > > > > > > Giang > > > > > > On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith > wrote: > > > > > > This can happen if the matrix is singular or nearly singular or if > the factorization generates small pivots, which can occur for even > nonsingular problems if the matrix is poorly scaled or just plain nasty.
> > > > > > > > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui > wrote: > > > > > > > > It took a while, here I send you the output > > > > > > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm > 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm > 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 > > > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm > 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 > > > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm > 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > > > KSP Object: 4 MPI processes > > > > type: gmres > > > > GMRES: restart=1000, using Modified Gram-Schmidt > Orthogonalization > > > > GMRES: happy breakdown tolerance 1e-30 > > > > maximum iterations=1000, initial guess is zero > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > left preconditioning > > > > using PRECONDITIONED norm type for convergence test > > > > PC Object: 4 MPI processes > > > > type: lu > > > > LU: out-of-place factorization > > > > tolerance for zero pivot 2.22045e-14 > > > > matrix ordering: natural > > > > factor fill ratio given 0, needed 0 > > > > Factored matrix follows: > > > > Mat Object: 4 MPI processes > > > > type: mpiaij > > > > rows=973051, cols=973051 > > > > package used to perform factorization: pastix > > > > Error : 3.24786e-14 > > > > total: nonzeros=0, allocated nonzeros=0 > > > > total number of mallocs used during MatSetValues calls =0 > > > > PaStiX run parameters: > > > > Matrix type : Unsymmetric > > > > Level of printing (0,1,2): 0 > > > > Number of refinements iterations : 3 > > > > Error : 3.24786e-14 > > > > linear system matrix = precond matrix: > > > > Mat Object: 4 MPI processes > > > > type: mpiaij > > > > rows=973051, cols=973051 > > > > Error : 3.24786e-14 > > > 
> total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > total number of mallocs used during MatSetValues calls =0 > > > > using I-node (on process 0) routines: found 78749 nodes, limit > used is 5 > > > > Error : 3.24786e-14 > > > > > > > > It doesn't do as you said. Something is not right here. I will look > in depth. > > > > > > > > Giang > > > > > > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith > wrote: > > > > > > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui > wrote: > > > > > > > > > > Good catch. I get this for the very first step, maybe at that time > the rhs_w is zero. > > > > > > > > With the multiplicative composition the right hand side of the > second solve is the initial right hand side of the second solve minus > A_10*x where x is the solution to the first sub solve and A_10 is the lower > left block of the outer matrix. So unless both the initial right hand side > has a zero for the second block and A_10 is identically zero the right hand > side for the second sub solve should not be zero. Is A_10 == 0? > > > > > > > > > > > > > In the later step, it shows 2 step convergence > > > > > > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 3.165886479830e+04 > > > > > 1 KSP Residual norm 2.905922877684e-01 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 2.397669419027e-01 > > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid > norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 9.999891813771e-01 > > > > > 1 KSP Residual norm 1.512000395579e-05 > > > > > Residual norms for fieldsplit_wp_ solve. 
> > > > > 0 KSP Residual norm 8.192702188243e-06 > > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid > norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 > > > > > > > > The outer residual norms are still wonky, the preconditioned > residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which is a > huge drop but the 7.963616922323e+05 drops very much less > 7.135927677844e+04. This is not normal. > > > > > > > > What if you just use -pc_type lu for the entire system (no > fieldsplit), does the true residual drop to almost zero in the first > iteration (as it should?). Send the output. > > > > > > > > > > > > > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 6.946213936597e-01 > > > > > 1 KSP Residual norm 1.195514007343e-05 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 1.025694497535e+00 > > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid > norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 7.255149996405e-01 > > > > > 1 KSP Residual norm 6.583512434218e-06 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 1.015229700337e+00 > > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid > norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 3.512243341400e-01 > > > > > 1 KSP Residual norm 2.032490351200e-06 > > > > > Residual norms for fieldsplit_wp_ solve. 
> > > > > 0 KSP Residual norm 1.282327290982e+00 > > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid > norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 3.423609338053e-01 > > > > > 1 KSP Residual norm 4.213703301972e-07 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 1.157384757538e+00 > > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid > norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 3.838596289995e-01 > > > > > 1 KSP Residual norm 9.927864176103e-08 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 1.066298905618e+00 > > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid > norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 > > > > > Residual norms for fieldsplit_u_ solve. > > > > > 0 KSP Residual norm 4.624964188094e-01 > > > > > 1 KSP Residual norm 6.418229775372e-08 > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > 0 KSP Residual norm 9.800784311614e-01 > > > > > 1 KSP Residual norm 0.000000000000e+00 > > > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid > norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 > > > > > Linear solve converged due to CONVERGED_ATOL iterations 7 > > > > > > > > > > The outer operator is an explicit matrix. > > > > > > > > > > Giang > > > > > > > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith > wrote: > > > > > > > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui > wrote: > > > > > > > > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better > convergence. 
I still used 4 procs though, probably with 1 proc it should > also be the same. > > > > > > > > > > > > The u block used a Nitsche-type operator to connect two > non-matching domains. I don't think it will leave some rigid body motion > leads to not sufficient constraints. Maybe you have other idea? > > > > > > > > > > > > Residual norms for fieldsplit_u_ solve. > > > > > > 0 KSP Residual norm 3.129067184300e+05 > > > > > > 1 KSP Residual norm 5.906261468196e-01 > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > > > > > ^^^^ something is wrong here. The sub solve should not be > starting with a 0 residual (this means the right hand side for this sub > solve is zero which it should not be). > > > > > > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > > > > > > > > > > > How are you providing the outer operator? As an explicit matrix > or with some shell matrix? > > > > > > > > > > > > > > > > > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid > norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > > Residual norms for fieldsplit_u_ solve. > > > > > > 0 KSP Residual norm 9.999955993437e-01 > > > > > > 1 KSP Residual norm 4.019774691831e-06 > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid > norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 > > > > > > Residual norms for fieldsplit_u_ solve. > > > > > > 0 KSP Residual norm 1.000012180204e+00 > > > > > > 1 KSP Residual norm 1.017367950422e-05 > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true resid > norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 > > > > > > Residual norms for fieldsplit_u_ solve. 
> > > > > > 0 KSP Residual norm 1.000004200085e+00 > > > > > > 1 KSP Residual norm 6.231613102458e-06 > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid > norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 > > > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 > > > > > > KSP Object: 4 MPI processes > > > > > > type: gmres > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt > Orthogonalization > > > > > > GMRES: happy breakdown tolerance 1e-30 > > > > > > maximum iterations=1000, initial guess is zero > > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > > > > > > left preconditioning > > > > > > using PRECONDITIONED norm type for convergence test > > > > > > PC Object: 4 MPI processes > > > > > > type: fieldsplit > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 > > > > > > Solver info for each split is in the following KSP objects: > > > > > > Split number 0 Defined by IS > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > > > type: richardson > > > > > > Richardson: damping factor=1 > > > > > > maximum iterations=1, initial guess is zero > > > > > > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000 > > > > > > left preconditioning > > > > > > using PRECONDITIONED norm type for convergence test > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > > > type: lu > > > > > > LU: out-of-place factorization > > > > > > tolerance for zero pivot 2.22045e-14 > > > > > > matrix ordering: natural > > > > > > factor fill ratio given 0, needed 0 > > > > > > Factored matrix follows: > > > > > > Mat Object: 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=938910, cols=938910 > > > > > > package used to perform factorization: pastix > > > > > > total: nonzeros=0, allocated nonzeros=0 > > > > > > Error : 3.36878e-14 > > > > > > total 
number of mallocs used during MatSetValues calls > =0 > > > > > > PaStiX run parameters: > > > > > > Matrix type : Unsymmetric > > > > > > Level of printing (0,1,2): 0 > > > > > > Number of refinements iterations : 3 > > > > > > Error : 3.36878e-14 > > > > > > linear system matrix = precond matrix: > > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=938910, cols=938910, bs=3 > > > > > > Error : 3.36878e-14 > > > > > > Error : 3.36878e-14 > > > > > > total: nonzeros=8.60906e+07, allocated > nonzeros=8.60906e+07 > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > using I-node (on process 0) routines: found 78749 > nodes, limit used is 5 > > > > > > Split number 1 Defined by IS > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > > > type: richardson > > > > > > Richardson: damping factor=1 > > > > > > maximum iterations=1, initial guess is zero > > > > > > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000 > > > > > > left preconditioning > > > > > > using PRECONDITIONED norm type for convergence test > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > > > type: lu > > > > > > LU: out-of-place factorization > > > > > > tolerance for zero pivot 2.22045e-14 > > > > > > matrix ordering: natural > > > > > > factor fill ratio given 0, needed 0 > > > > > > Factored matrix follows: > > > > > > Mat Object: 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=34141, cols=34141 > > > > > > package used to perform factorization: pastix > > > > > > Error : -nan > > > > > > Error : -nan > > > > > > Error : -nan > > > > > > total: nonzeros=0, allocated nonzeros=0 > > > > > > total number of mallocs used during MatSetValues > calls =0 > > > > > > PaStiX run parameters: > > > > > > Matrix type : Symmetric > > > > > > Level of printing (0,1,2): 0 > > > > > > Number of refinements iterations : 0 > > > > > > Error : -nan > > > > > > linear system matrix = precond matrix: 
> > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=34141, cols=34141 > > > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > not using I-node (on process 0) routines > > > > > > linear system matrix = precond matrix: > > > > > > Mat Object: 4 MPI processes > > > > > > type: mpiaij > > > > > > rows=973051, cols=973051 > > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > using I-node (on process 0) routines: found 78749 nodes, > limit used is 5 > > > > > > > > > > > > > > > > > > > > > > > > Giang > > > > > > > > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith < > bsmith at mcs.anl.gov> wrote: > > > > > > > > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui < > hgbk2008 at gmail.com> wrote: > > > > > > > > > > > > > > Dear Matt/Barry > > > > > > > > > > > > > > With your options, it results in > > > > > > > > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true > resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > > > Residual norms for fieldsplit_u_ solve. > > > > > > > 0 KSP Residual norm 2.407308987203e+36 > > > > > > > 1 KSP Residual norm 5.797185652683e+72 > > > > > > > > > > > > It looks like Matt is right, hypre is seemly producing useless > garbage. > > > > > > > > > > > > First how do things run on one process. If you have similar > problems then debug on one process (debugging any kind of problem is always > far easy on one process). > > > > > > > > > > > > First run with -fieldsplit_u_type lu (instead of using hypre) to > see if that works or also produces something bad. > > > > > > > > > > > > What is the operator and the boundary conditions for u? It could > be singular. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > > ... > > > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true > resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 > > > > > > > Residual norms for fieldsplit_u_ solve. > > > > > > > 0 KSP Residual norm 1.533726746719e+36 > > > > > > > 1 KSP Residual norm 3.692757392261e+72 > > > > > > > Residual norms for fieldsplit_wp_ solve. > > > > > > > 0 KSP Residual norm 0.000000000000e+00 > > > > > > > > > > > > > > Do you suggest that the pastix solver for the "wp" block > encounters small pivot? In addition, seem like the "u" block is also > singular. > > > > > > > > > > > > > > Giang > > > > > > > > > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith < > bsmith at mcs.anl.gov> wrote: > > > > > > > > > > > > > > Huge preconditioned norms but normal unpreconditioned norms > almost always come from a very small pivot in an LU or ILU factorization. > > > > > > > > > > > > > > The first thing to do is monitor the two sub solves. 
Run > with the additional options -fieldsplit_u_ksp_type richardson > -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 > -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor > -fieldsplit_wp_ksp_max_it 1 > > > > > > > > > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < > hgbk2008 at gmail.com> wrote: > > > > > > > > > > > > > > > > Hello > > > > > > > > > > > > > > > > I encountered a strange convergence behavior that I have > trouble to understand > > > > > > > > > > > > > > > > KSPSetFromOptions completed > > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true > resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > > > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true > resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > > > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true > resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > > > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true > resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > > > > > > > > ..... 
> > > > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true > resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > > > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true > resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > > > > > > > > Linear solve did not converge due to DIVERGED_ITS iterations > 1000 > > > > > > > > KSP Object: 4 MPI processes > > > > > > > > type: gmres > > > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt > Orthogonalization > > > > > > > > GMRES: happy breakdown tolerance 1e-30 > > > > > > > > maximum iterations=1000, initial guess is zero > > > > > > > > tolerances: relative=1e-20, absolute=1e-09, > divergence=10000 > > > > > > > > left preconditioning > > > > > > > > using PRECONDITIONED norm type for convergence test > > > > > > > > PC Object: 4 MPI processes > > > > > > > > type: fieldsplit > > > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits > = 2 > > > > > > > > Solver info for each split is in the following KSP > objects: > > > > > > > > Split number 0 Defined by IS > > > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes > > > > > > > > type: preonly > > > > > > > > maximum iterations=10000, initial guess is zero > > > > > > > > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000 > > > > > > > > left preconditioning > > > > > > > > using NONE norm type for convergence test > > > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes > > > > > > > > type: hypre > > > > > > > > HYPRE BoomerAMG preconditioning > > > > > > > > HYPRE BoomerAMG: Cycle type V > > > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 > > > > > > > > HYPRE BoomerAMG: Maximum number of iterations PER > hypre call 1 > > > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre > call 0 > > > > > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 > > > > > > > > HYPRE BoomerAMG: Interpolation truncation factor 0 > > > > > > > > HYPRE BoomerAMG: 
Interpolation: max elements per row > 0 > > > > > > > > HYPRE BoomerAMG: Number of levels of aggressive > coarsening 0 > > > > > > > > HYPRE BoomerAMG: Number of paths for aggressive > coarsening 1 > > > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 > > > > > > > > HYPRE BoomerAMG: Sweeps down 1 > > > > > > > > HYPRE BoomerAMG: Sweeps up 1 > > > > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 > > > > > > > > HYPRE BoomerAMG: Relax down > symmetric-SOR/Jacobi > > > > > > > > HYPRE BoomerAMG: Relax up > symmetric-SOR/Jacobi > > > > > > > > HYPRE BoomerAMG: Relax on coarse > Gaussian-elimination > > > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 > > > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 > > > > > > > > HYPRE BoomerAMG: Using CF-relaxation > > > > > > > > HYPRE BoomerAMG: Measure type local > > > > > > > > HYPRE BoomerAMG: Coarsen type PMIS > > > > > > > > HYPRE BoomerAMG: Interpolation type classical > > > > > > > > linear system matrix = precond matrix: > > > > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes > > > > > > > > type: mpiaij > > > > > > > > rows=938910, cols=938910, bs=3 > > > > > > > > total: nonzeros=8.60906e+07, allocated > nonzeros=8.60906e+07 > > > > > > > > total number of mallocs used during MatSetValues > calls =0 > > > > > > > > using I-node (on process 0) routines: found 78749 > nodes, limit used is 5 > > > > > > > > Split number 1 Defined by IS > > > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes > > > > > > > > type: preonly > > > > > > > > maximum iterations=10000, initial guess is zero > > > > > > > > tolerances: relative=1e-05, absolute=1e-50, > divergence=10000 > > > > > > > > left preconditioning > > > > > > > > using NONE norm type for convergence test > > > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes > > > > > > > > type: lu > > > > > > > > LU: out-of-place factorization > > > > > > > > tolerance for zero pivot 2.22045e-14 > > > > > > > > matrix ordering: natural > > > > > > > > factor fill 
ratio given 0, needed 0 > > > > > > > > Factored matrix follows: > > > > > > > > Mat Object: 4 MPI processes > > > > > > > > type: mpiaij > > > > > > > > rows=34141, cols=34141 > > > > > > > > package used to perform factorization: pastix > > > > > > > > Error : -nan > > > > > > > > Error : -nan > > > > > > > > total: nonzeros=0, allocated nonzeros=0 > > > > > > > > Error : -nan > > > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > > > PaStiX run parameters: > > > > > > > > Matrix type : > Symmetric > > > > > > > > Level of printing (0,1,2): 0 > > > > > > > > Number of refinements iterations : 0 > > > > > > > > Error : -nan > > > > > > > > linear system matrix = precond matrix: > > > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes > > > > > > > > type: mpiaij > > > > > > > > rows=34141, cols=34141 > > > > > > > > total: nonzeros=485655, allocated nonzeros=485655 > > > > > > > > total number of mallocs used during MatSetValues > calls =0 > > > > > > > > not using I-node (on process 0) routines > > > > > > > > linear system matrix = precond matrix: > > > > > > > > Mat Object: 4 MPI processes > > > > > > > > type: mpiaij > > > > > > > > rows=973051, cols=973051 > > > > > > > > total: nonzeros=9.90037e+07, allocated > nonzeros=9.90037e+07 > > > > > > > > total number of mallocs used during MatSetValues calls =0 > > > > > > > > using I-node (on process 0) routines: found 78749 > nodes, limit used is 5 > > > > > > > > > > > > > > > > The pattern of convergence gives a hint that this system is > somehow bad/singular. But I don't know why the preconditioned error goes up > too high. Anyone has an idea? > > > > > > > > > > > > > > > > Best regards > > > > > > > > Giang Bui > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
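Barry's description of the multiplicative composition above — the right-hand side of the second sub-solve is the initial right-hand side of the second block minus A_10 times the first sub-solve's solution — can be sketched with dense blocks. The following NumPy example is illustrative only (block names, sizes, and values are invented here, not PETSc API); it shows why a zero residual in the second split is suspicious unless both b_1 and A_10 vanish:

```python
import numpy as np

# Illustrative 2x2 block system for PCFIELDSPLIT with MULTIPLICATIVE
# composition (block Gauss-Seidel). All matrices are made up for the sketch.
rng = np.random.default_rng(0)
n0, n1 = 4, 3
A00 = 5.0 * np.eye(n0) + rng.random((n0, n0))  # (0,0) "u" block
A10 = rng.random((n1, n0))                     # lower-left coupling block
A11 = 5.0 * np.eye(n1) + rng.random((n1, n1))  # (1,1) "wp" block
b0, b1 = rng.random(n0), rng.random(n1)

# First sub-solve: x0 from the (0,0) block.
x0 = np.linalg.solve(A00, b0)

# Second sub-solve sees the UPDATED right-hand side b1 - A10 @ x0,
# which is generically nonzero even when b1 == 0 (unless A10 == 0).
rhs1 = b1 - A10 @ x0
x1 = np.linalg.solve(A11, rhs1)

print("||rhs of second sub-solve|| =", np.linalg.norm(rhs1))
```

In PETSc itself this update happens inside the PCFIELDSPLIT apply when -pc_fieldsplit_type multiplicative is used; the sketch only mirrors the algebra Barry states in the quoted message.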
URL: From jed at jedbrown.org Sat Apr 29 08:34:32 2017 From: jed at jedbrown.org (Jed Brown) Date: Sat, 29 Apr 2017 07:34:32 -0600 Subject: [petsc-users] strange convergence In-Reply-To: References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> Message-ID: <87wpa3wd5j.fsf@jedbrown.org> Hoang Giang Bui writes: > Hi Barry > > The first block is from a standard solid mechanics discretization based on > the balance of momentum equation. There is some material involved, but in > principle it's a well-posed elasticity equation with a positive definite > tangent operator. The "gluing business" uses the mortar method to keep the > continuity of displacement. Instead of using a Lagrange multiplier to treat > the constraint, I used a penalty method to penalize the energy. The > discretization form of the mortar term is quite simple > > \int_{\Gamma_1} { rho * (\delta u_1 - \delta u_2) * (u_1 - u_2) dA } > > rho is the penalty parameter. In the simulation I initially set it low (~E) to > preserve the conditioning of the system. There are two things that can go wrong here with AMG: * The penalty term can mess up the strength-of-connection heuristics, so that you get a poor choice of C-points (classical AMG like BoomerAMG) or a poor choice of aggregates (smoothed aggregation). * The penalty term can prevent Jacobi smoothing from being effective; in this case, it can lead to poor coarse basis functions (higher energy than they should be) and poor smoothing in an MG cycle. You can fix the poor smoothing in the MG cycle by using a stronger smoother, like ASM with some overlap. I'm generally not a fan of penalty methods due to the irritating tradeoffs and often poor solver performance. > In the figure below, the colorful blocks are u_1 and the base is u_2. Both > u_1 and u_2 use isoparametric quadratic approximation. > > [attachment: Snapshot.png] 
> > Giang > > On Fri, Apr 28, 2017 at 6:21 PM, Barry Smith wrote: > >> >> Ok, so boomerAMG algebraic multigrid is not good for the first block. >> You mentioned the first block has two things glued together? AMG is >> fantastic for certain problems but doesn't work for everything. >> >> Tell us more about the first block, what PDE it comes from, what >> discretization, and what the "gluing business" is and maybe we'll have >> suggestions for how to precondition it. >> >> Barry >> >> > On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui wrote: >> > >> > It's in fact quite good >> > >> > Residual norms for fieldsplit_u_ solve. >> > 0 KSP Residual norm 4.014715925568e+00 >> > 1 KSP Residual norm 2.160497019264e-10 >> > Residual norms for fieldsplit_wp_ solve. >> > 0 KSP Residual norm 0.000000000000e+00 >> > 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > Residual norms for fieldsplit_u_ solve. >> > 0 KSP Residual norm 9.999999999416e-01 >> > 1 KSP Residual norm 7.118380416383e-11 >> > Residual norms for fieldsplit_wp_ solve. >> > 0 KSP Residual norm 0.000000000000e+00 >> > 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm >> 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 >> > Linear solve converged due to CONVERGED_ATOL iterations 1 >> > >> > Giang >> > >> > On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith wrote: >> > >> > Run again using LU on both blocks to see what happens. >> > >> > >> > > On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui >> wrote: >> > > >> > > I have changed the way to tie the nonconforming mesh. 
It seems the >> matrix now is better >> > > >> > > with -pc_type lu the output is >> > > 0 KSP preconditioned resid norm 3.308678584240e-01 true resid norm >> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > 1 KSP preconditioned resid norm 2.004313395301e-12 true resid norm >> 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 >> > > Linear solve converged due to CONVERGED_ATOL iterations 1 >> > > >> > > >> > > with -pc_type fieldsplit -fieldsplit_u_pc_type hypre >> -fieldsplit_wp_pc_type lu the convergence is slow >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >> > > ... >> > > 824 KSP preconditioned resid norm 1.018542387738e-09 true resid norm >> 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 >> > > 825 KSP preconditioned resid norm 9.743727947637e-10 true resid norm >> 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >> > > >> > > checking with additional -fieldsplit_u_ksp_type richardson >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >> -fieldsplit_wp_ksp_max_it 1 gives >> > > >> > > 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm >> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 5.803507549280e-01 >> > > 1 KSP Residual norm 2.069538175950e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm >> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >> > > Residual norms for fieldsplit_u_ solve. 
>> > > 0 KSP Residual norm 7.831796195225e-01 >> > > 1 KSP Residual norm 1.734608520110e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > .... >> > > 823 KSP preconditioned resid norm 1.065070135605e-09 true resid norm >> 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 6.113806394327e-01 >> > > 1 KSP Residual norm 1.535465290944e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > 824 KSP preconditioned resid norm 1.018542387746e-09 true resid norm >> 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 >> > > Residual norms for fieldsplit_u_ solve. >> > > 0 KSP Residual norm 6.123437055586e-01 >> > > 1 KSP Residual norm 1.524661826133e-01 >> > > Residual norms for fieldsplit_wp_ solve. >> > > 0 KSP Residual norm 0.000000000000e+00 >> > > 825 KSP preconditioned resid norm 9.743727947718e-10 true resid norm >> 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 >> > > Linear solve converged due to CONVERGED_ATOL iterations 825 >> > > >> > > >> > > The residual for wp block is zero since in this first step the rhs is >> zero. As can see in the output, the multigrid does not perform well to >> reduce the residual in the sub-solve. Is my observation right? what can be >> done to improve this? >> > > >> > > >> > > Giang >> > > >> > > On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith >> wrote: >> > > >> > > This can happen in the matrix is singular or nearly singular or if >> the factorization generates small pivots, which can occur for even >> nonsingular problems if the matrix is poorly scaled or just plain nasty. 
>> > > >> > > >> > > > On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui >> wrote: >> > > > >> > > > It took a while, here I send you the output >> > > > >> > > > 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm >> 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm >> 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 >> > > > 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm >> 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 >> > > > 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm >> 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 >> > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >> > > > KSP Object: 4 MPI processes >> > > > type: gmres >> > > > GMRES: restart=1000, using Modified Gram-Schmidt >> Orthogonalization >> > > > GMRES: happy breakdown tolerance 1e-30 >> > > > maximum iterations=1000, initial guess is zero >> > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >> > > > left preconditioning >> > > > using PRECONDITIONED norm type for convergence test >> > > > PC Object: 4 MPI processes >> > > > type: lu >> > > > LU: out-of-place factorization >> > > > tolerance for zero pivot 2.22045e-14 >> > > > matrix ordering: natural >> > > > factor fill ratio given 0, needed 0 >> > > > Factored matrix follows: >> > > > Mat Object: 4 MPI processes >> > > > type: mpiaij >> > > > rows=973051, cols=973051 >> > > > package used to perform factorization: pastix >> > > > Error : 3.24786e-14 >> > > > total: nonzeros=0, allocated nonzeros=0 >> > > > total number of mallocs used during MatSetValues calls =0 >> > > > PaStiX run parameters: >> > > > Matrix type : Unsymmetric >> > > > Level of printing (0,1,2): 0 >> > > > Number of refinements iterations : 3 >> > > > Error : 3.24786e-14 >> > > > linear system matrix = precond matrix: >> > > > Mat Object: 4 MPI processes >> > > > type: mpiaij >> > > > 
rows=973051, cols=973051 >> > > > Error : 3.24786e-14 >> > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >> > > > total number of mallocs used during MatSetValues calls =0 >> > > > using I-node (on process 0) routines: found 78749 nodes, limit >> used is 5 >> > > > Error : 3.24786e-14 >> > > > >> > > > It doesn't do as you said. Something is not right here. I will look >> in depth. >> > > > >> > > > Giang >> > > > >> > > > On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith >> wrote: >> > > > >> > > > > On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui >> wrote: >> > > > > >> > > > > Good catch. I get this for the very first step, maybe at that time >> the rhs_w is zero. >> > > > >> > > > With the multiplicative composition the right hand side of the >> second solve is the initial right hand side of the second solve minus >> A_10*x where x is the solution to the first sub solve and A_10 is the lower >> left block of the outer matrix. So unless both the initial right hand side >> has a zero for the second block and A_10 is identically zero the right hand >> side for the second sub solve should not be zero. Is A_10 == 0? >> > > > >> > > > >> > > > > In the later step, it shows 2 step convergence >> > > > > >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.165886479830e+04 >> > > > > 1 KSP Residual norm 2.905922877684e-01 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 2.397669419027e-01 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 0 KSP preconditioned resid norm 3.165886479920e+04 true resid >> norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 9.999891813771e-01 >> > > > > 1 KSP Residual norm 1.512000395579e-05 >> > > > > Residual norms for fieldsplit_wp_ solve. 
>> > > > > 0 KSP Residual norm 8.192702188243e-06 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 1 KSP preconditioned resid norm 5.252183822848e-02 true resid >> norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 >> > > > >> > > > The outer residual norms are still wonky, the preconditioned >> residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which is a >> huge drop but the 7.963616922323e+05 drops very much less >> 7.135927677844e+04. This is not normal. >> > > > >> > > > What if you just use -pc_type lu for the entire system (no >> fieldsplit), does the true residual drop to almost zero in the first >> iteration (as it should?). Send the output. >> > > > >> > > > >> > > > >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 6.946213936597e-01 >> > > > > 1 KSP Residual norm 1.195514007343e-05 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.025694497535e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 2 KSP preconditioned resid norm 8.785709535405e-03 true resid >> norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 7.255149996405e-01 >> > > > > 1 KSP Residual norm 6.583512434218e-06 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.015229700337e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 3 KSP preconditioned resid norm 7.110407712709e-04 true resid >> norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.512243341400e-01 >> > > > > 1 KSP Residual norm 2.032490351200e-06 >> > > > > Residual norms for fieldsplit_wp_ solve. 
>> > > > > 0 KSP Residual norm 1.282327290982e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 4 KSP preconditioned resid norm 3.482036620521e-05 true resid >> norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.423609338053e-01 >> > > > > 1 KSP Residual norm 4.213703301972e-07 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.157384757538e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 5 KSP preconditioned resid norm 1.203470314534e-06 true resid >> norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 3.838596289995e-01 >> > > > > 1 KSP Residual norm 9.927864176103e-08 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 1.066298905618e+00 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 6 KSP preconditioned resid norm 3.331619244266e-08 true resid >> norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 >> > > > > Residual norms for fieldsplit_u_ solve. >> > > > > 0 KSP Residual norm 4.624964188094e-01 >> > > > > 1 KSP Residual norm 6.418229775372e-08 >> > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > 0 KSP Residual norm 9.800784311614e-01 >> > > > > 1 KSP Residual norm 0.000000000000e+00 >> > > > > 7 KSP preconditioned resid norm 8.788046233297e-10 true resid >> norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 >> > > > > Linear solve converged due to CONVERGED_ATOL iterations 7 >> > > > > >> > > > > The outer operator is an explicit matrix. >> > > > > >> > > > > Giang >> > > > > >> > > > > On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith >> wrote: >> > > > > >> > > > > > On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui >> wrote: >> > > > > > >> > > > > > Thanks Barry, trying with -fieldsplit_u_type lu gives better >> convergence. 
I still used 4 procs though, probably with 1 proc it should >> also be the same. >> > > > > > >> > > > > > The u block used a Nitsche-type operator to connect two >> non-matching domains. I don't think it leaves any rigid body motion >> due to insufficient constraints. Maybe you have another idea? >> > > > > > >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 3.129067184300e+05 >> > > > > > 1 KSP Residual norm 5.906261468196e-01 >> > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > >> > > > > ^^^^ something is wrong here. The sub solve should not be >> starting with a 0 residual (this means the right hand side for this sub >> solve is zero, which it should not be). >> > > > > >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 >> > > > > >> > > > > >> > > > > How are you providing the outer operator? As an explicit matrix >> or with some shell matrix? >> > > > > >> > > > > >> > > > > >> > > > > > 0 KSP preconditioned resid norm 3.129067184300e+05 true resid >> norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 9.999955993437e-01 >> > > > > > 1 KSP Residual norm 4.019774691831e-06 >> > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > 1 KSP preconditioned resid norm 5.003913641475e-01 true resid >> norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 1.000012180204e+00 >> > > > > > 1 KSP Residual norm 1.017367950422e-05 >> > > > > > Residual norms for fieldsplit_wp_ solve. 
>> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > 2 KSP preconditioned resid norm 2.330910333756e-07 true resid >> norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 >> > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > 0 KSP Residual norm 1.000004200085e+00 >> > > > > > 1 KSP Residual norm 6.231613102458e-06 >> > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > 3 KSP preconditioned resid norm 8.671259838389e-11 true resid >> norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 >> > > > > > Linear solve converged due to CONVERGED_ATOL iterations 3 >> > > > > > KSP Object: 4 MPI processes >> > > > > > type: gmres >> > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >> Orthogonalization >> > > > > > GMRES: happy breakdown tolerance 1e-30 >> > > > > > maximum iterations=1000, initial guess is zero >> > > > > > tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >> > > > > > left preconditioning >> > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > PC Object: 4 MPI processes >> > > > > > type: fieldsplit >> > > > > > FieldSplit with MULTIPLICATIVE composition: total splits = 2 >> > > > > > Solver info for each split is in the following KSP objects: >> > > > > > Split number 0 Defined by IS >> > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >> > > > > > type: richardson >> > > > > > Richardson: damping factor=1 >> > > > > > maximum iterations=1, initial guess is zero >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > left preconditioning >> > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >> > > > > > type: lu >> > > > > > LU: out-of-place factorization >> > > > > > tolerance for zero pivot 2.22045e-14 >> > > > > > matrix ordering: natural >> > > > > > factor fill ratio given 0, needed 0 >> > > > > > Factored 
matrix follows: >> > > > > > Mat Object: 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=938910, cols=938910 >> > > > > > package used to perform factorization: pastix >> > > > > > total: nonzeros=0, allocated nonzeros=0 >> > > > > > Error : 3.36878e-14 >> > > > > > total number of mallocs used during MatSetValues calls >> =0 >> > > > > > PaStiX run parameters: >> > > > > > Matrix type : Unsymmetric >> > > > > > Level of printing (0,1,2): 0 >> > > > > > Number of refinements iterations : 3 >> > > > > > Error : 3.36878e-14 >> > > > > > linear system matrix = precond matrix: >> > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=938910, cols=938910, bs=3 >> > > > > > Error : 3.36878e-14 >> > > > > > Error : 3.36878e-14 >> > > > > > total: nonzeros=8.60906e+07, allocated >> nonzeros=8.60906e+07 >> > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > using I-node (on process 0) routines: found 78749 >> nodes, limit used is 5 >> > > > > > Split number 1 Defined by IS >> > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > type: richardson >> > > > > > Richardson: damping factor=1 >> > > > > > maximum iterations=1, initial guess is zero >> > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > left preconditioning >> > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > type: lu >> > > > > > LU: out-of-place factorization >> > > > > > tolerance for zero pivot 2.22045e-14 >> > > > > > matrix ordering: natural >> > > > > > factor fill ratio given 0, needed 0 >> > > > > > Factored matrix follows: >> > > > > > Mat Object: 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=34141, cols=34141 >> > > > > > package used to perform factorization: pastix >> > > > > > Error : -nan >> > > > > > Error : -nan >> > > > > > Error : -nan >> > > > > > total: nonzeros=0, 
allocated nonzeros=0 >> > > > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > > > PaStiX run parameters: >> > > > > > Matrix type : Symmetric >> > > > > > Level of printing (0,1,2): 0 >> > > > > > Number of refinements iterations : 0 >> > > > > > Error : -nan >> > > > > > linear system matrix = precond matrix: >> > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=34141, cols=34141 >> > > > > > total: nonzeros=485655, allocated nonzeros=485655 >> > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > not using I-node (on process 0) routines >> > > > > > linear system matrix = precond matrix: >> > > > > > Mat Object: 4 MPI processes >> > > > > > type: mpiaij >> > > > > > rows=973051, cols=973051 >> > > > > > total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >> > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > using I-node (on process 0) routines: found 78749 nodes, >> limit used is 5 >> > > > > > >> > > > > > >> > > > > > >> > > > > > Giang >> > > > > > >> > > > > > On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith < >> bsmith at mcs.anl.gov> wrote: >> > > > > > >> > > > > > > On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui < >> hgbk2008 at gmail.com> wrote: >> > > > > > > >> > > > > > > Dear Matt/Barry >> > > > > > > >> > > > > > > With your options, it results in >> > > > > > > >> > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > > 0 KSP Residual norm 2.407308987203e+36 >> > > > > > > 1 KSP Residual norm 5.797185652683e+72 >> > > > > > >> > > > > > It looks like Matt is right, hypre is seemingly producing useless >> garbage. >> > > > > > >> > > > > > First, how do things run on one process? 
If you have similar >> problems then debug on one process (debugging any kind of problem is always >> far easier on one process). >> > > > > > >> > > > > > First run with -fieldsplit_u_type lu (instead of using hypre) to >> see if that works or also produces something bad. >> > > > > > >> > > > > > What is the operator and the boundary conditions for u? It could >> be singular. >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > > ... >> > > > > > > 999 KSP preconditioned resid norm 2.920157329174e+12 true >> resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 >> > > > > > > Residual norms for fieldsplit_u_ solve. >> > > > > > > 0 KSP Residual norm 1.533726746719e+36 >> > > > > > > 1 KSP Residual norm 3.692757392261e+72 >> > > > > > > Residual norms for fieldsplit_wp_ solve. >> > > > > > > 0 KSP Residual norm 0.000000000000e+00 >> > > > > > > >> > > > > > > Do you suggest that the pastix solver for the "wp" block >> encounters a small pivot? In addition, it seems like the "u" block is also >> singular. >> > > > > > > >> > > > > > > Giang >> > > > > > > >> > > > > > > On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith < >> bsmith at mcs.anl.gov> wrote: >> > > > > > > >> > > > > > > Huge preconditioned norms but normal unpreconditioned norms >> almost always come from a very small pivot in an LU or ILU factorization. >> > > > > > > >> > > > > > > The first thing to do is monitor the two sub solves. 
Run >> with the additional options -fieldsplit_u_ksp_type richardson >> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >> -fieldsplit_wp_ksp_max_it 1 >> > > > > > > >> > > > > > > > On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < >> hgbk2008 at gmail.com> wrote: >> > > > > > > > >> > > > > > > > Hello >> > > > > > > > >> > > > > > > > I encountered a strange convergence behavior that I have >> trouble understanding >> > > > > > > > >> > > > > > > > KSPSetFromOptions completed >> > > > > > > > 0 KSP preconditioned resid norm 1.106709687386e+31 true >> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >> > > > > > > > 1 KSP preconditioned resid norm 2.933141742664e+29 true >> resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 >> > > > > > > > 2 KSP preconditioned resid norm 9.686409637174e+16 true >> resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 >> > > > > > > > 3 KSP preconditioned resid norm 4.219243615809e+15 true >> resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 >> > > > > > > > ..... 
>> > > > > > > > 999 KSP preconditioned resid norm 3.043754298076e+12 true >> resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 >> > > > > > > > 1000 KSP preconditioned resid norm 3.043000287819e+12 true >> resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 >> > > > > > > > Linear solve did not converge due to DIVERGED_ITS iterations >> 1000 >> > > > > > > > KSP Object: 4 MPI processes >> > > > > > > > type: gmres >> > > > > > > > GMRES: restart=1000, using Modified Gram-Schmidt >> Orthogonalization >> > > > > > > > GMRES: happy breakdown tolerance 1e-30 >> > > > > > > > maximum iterations=1000, initial guess is zero >> > > > > > > > tolerances: relative=1e-20, absolute=1e-09, >> divergence=10000 >> > > > > > > > left preconditioning >> > > > > > > > using PRECONDITIONED norm type for convergence test >> > > > > > > > PC Object: 4 MPI processes >> > > > > > > > type: fieldsplit >> > > > > > > > FieldSplit with MULTIPLICATIVE composition: total splits >> = 2 >> > > > > > > > Solver info for each split is in the following KSP >> objects: >> > > > > > > > Split number 0 Defined by IS >> > > > > > > > KSP Object: (fieldsplit_u_) 4 MPI processes >> > > > > > > > type: preonly >> > > > > > > > maximum iterations=10000, initial guess is zero >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > > > left preconditioning >> > > > > > > > using NONE norm type for convergence test >> > > > > > > > PC Object: (fieldsplit_u_) 4 MPI processes >> > > > > > > > type: hypre >> > > > > > > > HYPRE BoomerAMG preconditioning >> > > > > > > > HYPRE BoomerAMG: Cycle type V >> > > > > > > > HYPRE BoomerAMG: Maximum number of levels 25 >> > > > > > > > HYPRE BoomerAMG: Maximum number of iterations PER >> hypre call 1 >> > > > > > > > HYPRE BoomerAMG: Convergence tolerance PER hypre >> call 0 >> > > > > > > > HYPRE BoomerAMG: Threshold for strong coupling 0.6 >> > > > > > > > HYPRE BoomerAMG: Interpolation truncation factor 
0 >> > > > > > > > HYPRE BoomerAMG: Interpolation: max elements per row >> 0 >> > > > > > > > HYPRE BoomerAMG: Number of levels of aggressive >> coarsening 0 >> > > > > > > > HYPRE BoomerAMG: Number of paths for aggressive >> coarsening 1 >> > > > > > > > HYPRE BoomerAMG: Maximum row sums 0.9 >> > > > > > > > HYPRE BoomerAMG: Sweeps down 1 >> > > > > > > > HYPRE BoomerAMG: Sweeps up 1 >> > > > > > > > HYPRE BoomerAMG: Sweeps on coarse 1 >> > > > > > > > HYPRE BoomerAMG: Relax down >> symmetric-SOR/Jacobi >> > > > > > > > HYPRE BoomerAMG: Relax up >> symmetric-SOR/Jacobi >> > > > > > > > HYPRE BoomerAMG: Relax on coarse >> Gaussian-elimination >> > > > > > > > HYPRE BoomerAMG: Relax weight (all) 1 >> > > > > > > > HYPRE BoomerAMG: Outer relax weight (all) 1 >> > > > > > > > HYPRE BoomerAMG: Using CF-relaxation >> > > > > > > > HYPRE BoomerAMG: Measure type local >> > > > > > > > HYPRE BoomerAMG: Coarsen type PMIS >> > > > > > > > HYPRE BoomerAMG: Interpolation type classical >> > > > > > > > linear system matrix = precond matrix: >> > > > > > > > Mat Object: (fieldsplit_u_) 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=938910, cols=938910, bs=3 >> > > > > > > > total: nonzeros=8.60906e+07, allocated >> nonzeros=8.60906e+07 >> > > > > > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > > > > > using I-node (on process 0) routines: found 78749 >> nodes, limit used is 5 >> > > > > > > > Split number 1 Defined by IS >> > > > > > > > KSP Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > > > type: preonly >> > > > > > > > maximum iterations=10000, initial guess is zero >> > > > > > > > tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000 >> > > > > > > > left preconditioning >> > > > > > > > using NONE norm type for convergence test >> > > > > > > > PC Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > > > type: lu >> > > > > > > > LU: out-of-place factorization >> > > > > > > > tolerance for zero pivot 
2.22045e-14 >> > > > > > > > matrix ordering: natural >> > > > > > > > factor fill ratio given 0, needed 0 >> > > > > > > > Factored matrix follows: >> > > > > > > > Mat Object: 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=34141, cols=34141 >> > > > > > > > package used to perform factorization: pastix >> > > > > > > > Error : -nan >> > > > > > > > Error : -nan >> > > > > > > > total: nonzeros=0, allocated nonzeros=0 >> > > > > > > > Error : -nan >> > > > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > > > PaStiX run parameters: >> > > > > > > > Matrix type : >> Symmetric >> > > > > > > > Level of printing (0,1,2): 0 >> > > > > > > > Number of refinements iterations : 0 >> > > > > > > > Error : -nan >> > > > > > > > linear system matrix = precond matrix: >> > > > > > > > Mat Object: (fieldsplit_wp_) 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=34141, cols=34141 >> > > > > > > > total: nonzeros=485655, allocated nonzeros=485655 >> > > > > > > > total number of mallocs used during MatSetValues >> calls =0 >> > > > > > > > not using I-node (on process 0) routines >> > > > > > > > linear system matrix = precond matrix: >> > > > > > > > Mat Object: 4 MPI processes >> > > > > > > > type: mpiaij >> > > > > > > > rows=973051, cols=973051 >> > > > > > > > total: nonzeros=9.90037e+07, allocated >> nonzeros=9.90037e+07 >> > > > > > > > total number of mallocs used during MatSetValues calls =0 >> > > > > > > > using I-node (on process 0) routines: found 78749 >> nodes, limit used is 5 >> > > > > > > > >> > > > > > > > The pattern of convergence gives a hint that this system is >> somehow bad/singular. But I don't know why the preconditioned error goes up >> so high. Does anyone have an idea? 
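Barry's remark above, that huge preconditioned norms alongside ordinary true norms almost always come from a very small pivot in an LU or ILU factorization, can be reproduced with a toy numpy sketch (a hypothetical nearly singular 2x2 matrix, not the actual operator):

```python
# Demonstrate how a tiny pivot inflates the left-preconditioned residual
# M^{-1} r while leaving the true residual r ordinary. Matrix is made up.
import numpy as np

# Nearly singular: the second row is almost 0.5 times the first.
A = np.array([[1.0, 2.0],
              [0.5, 1.0 + 1e-14]])

# LU factors of A without pivoting; U picks up a ~1e-14 pivot.
L = np.array([[1.0, 0.0], [0.5, 1.0]])
U = np.array([[1.0, 2.0], [0.0, 1e-14]])

r = np.array([1.0, 1.0])                       # an ordinary residual, norm O(1)
z = np.linalg.solve(U, np.linalg.solve(L, r))  # preconditioned residual M^{-1} r

print(np.linalg.norm(r))   # O(1)
print(np.linalg.norm(z))   # enormous, on the order of 1e14
```

The same mechanism, at scale, plausibly accounts for preconditioned norms like 1.1e+31 sitting next to true norms of 9.0e+06 in the logs above.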
>> > > > > > > > >> > > > > > > > Best regards >> > > > > > > > Giang Bui >> > > > > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > > >> > > > >> > > > >> > > >> > > >> > >> > >> >> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From bsmith at mcs.anl.gov Sat Apr 29 13:06:38 2017 From: bsmith at mcs.anl.gov (Barry Smith) Date: Sat, 29 Apr 2017 13:06:38 -0500 Subject: [petsc-users] strange convergence In-Reply-To: <87wpa3wd5j.fsf@jedbrown.org> References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> <87wpa3wd5j.fsf@jedbrown.org> Message-ID: <3BD8A742-171F-4982-BA98-4951B285972D@mcs.anl.gov> > On Apr 29, 2017, at 8:34 AM, Jed Brown wrote: > > Hoang Giang Bui writes: > >> Hi Barry >> >> The first block is from a standard solid mechanics discretization based on >> balance of momentum equation. There is some material involved but in >> principal it's well-posed elasticity equation with positive definite >> tangent operator. The "gluing business" uses the mortar method to keep the >> continuity of displacement. Instead of using Lagrange multiplier to treat >> the constraint I used penalty method to penalize the energy. The >> discretization form of mortar is quite simple >> >> \int_{\Gamma_1} { rho * (\delta u_1 - \delta u_2) * (u_1 - u_2) dA } >> >> rho is penalty parameter. In the simulation I initially set it low (~E) to >> preserve the conditioning of the system. > > There are two things that can go wrong here with AMG: > > * The penalty term can mess up the strength of connection heuristics > such that you get poor choice of C-points (classical AMG like > BoomerAMG) or poor choice of aggregates (smoothed aggregation). 
> > * The penalty term can prevent Jacobi smoothing from being effective; in > this case, it can lead to poor coarse basis functions (higher energy > than they should be) and poor smoothing in an MG cycle. You can fix > the poor smoothing in the MG cycle by using a stronger smoother, like > ASM with some overlap. > > I'm generally not a fan of penalty methods due to the irritating > tradeoffs and often poor solver performance. So, let's first see what hypre BoomerAMG is doing with the system. Run for just one BoomerAMG solve with the additional options -fieldsplit_u_ksp_view -fieldsplit_u_pc_hypre_boomeramg_print_statistics this should print a good amount of information of what BoomerAMG has decided to do based on the input matrix. I'm bringing the hypre team into the conversation since they obviously know far more about BoomerAMG tuning options that may help your case. Barry > >> In the figure below, the colorful blocks are u_1 and the base is u_2. Both >> u_1 and u_2 use isoparametric quadratic approximation. >> >> ? >> Snapshot.png >> >> ??? >> >> Giang >> >> On Fri, Apr 28, 2017 at 6:21 PM, Barry Smith wrote: >> >>> >>> Ok, so boomerAMG algebraic multigrid is not good for the first block. >>> You mentioned the first block has two things glued together? AMG is >>> fantastic for certain problems but doesn't work for everything. >>> >>> Tell us more about the first block, what PDE it comes from, what >>> discretization, and what the "gluing business" is and maybe we'll have >>> suggestions for how to precondition it. >>> >>> Barry >>> >>>> On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui wrote: >>>> >>>> It's in fact quite good >>>> >>>> Residual norms for fieldsplit_u_ solve. >>>> 0 KSP Residual norm 4.014715925568e+00 >>>> 1 KSP Residual norm 2.160497019264e-10 >>>> Residual norms for fieldsplit_wp_ solve. 
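Jed's first point can be sketched numerically. Classical AMG marks j as a strong connection of i when |a_ij| >= theta * max_k |a_ik|; with a penalty entry of size rho >> E in a row, the physical couplings fall below the cutoff (toy numbers and a hypothetical single matrix row, not the actual discretization):

```python
# Strength-of-connection heuristic on one made-up matrix row with a
# mortar penalty entry that dwarfs the physical stiffness couplings.
E = 1.0          # physical stiffness scale
rho = 1.0e6      # penalty parameter much larger than E
theta = 0.25     # a typical BoomerAMG strong threshold

# off-diagonal entries: two physical neighbours plus one penalty coupling
offdiag = {"left": -E, "right": -E, "mortar": -rho}

cutoff = theta * max(abs(v) for v in offdiag.values())
strong = [j for j, v in offdiag.items() if abs(v) >= cutoff]
print(strong)  # only the penalty connection survives the cutoff
```

When only the penalty connections register as strong, coarsening follows the interface rather than the physics, which is consistent with the poor sub-solve behavior shown below; a stronger smoother such as ASM with overlap, per Jed's second point, is one mitigation.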
>>>> 0 KSP Residual norm 0.000000000000e+00 >>>> 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm >>> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >>>> Residual norms for fieldsplit_u_ solve. >>>> 0 KSP Residual norm 9.999999999416e-01 >>>> 1 KSP Residual norm 7.118380416383e-11 >>>> Residual norms for fieldsplit_wp_ solve. >>>> 0 KSP Residual norm 0.000000000000e+00 >>>> 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm >>> 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 >>>> Linear solve converged due to CONVERGED_ATOL iterations 1 >>>> >>>> Giang >>>> >>>> On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith wrote: >>>> >>>> Run again using LU on both blocks to see what happens. >>>> >>>> >>>>> On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui >>> wrote: >>>>> >>>>> I have changed the way to tie the nonconforming mesh. It seems the >>> matrix now is better >>>>> >>>>> with -pc_type lu the output is >>>>> 0 KSP preconditioned resid norm 3.308678584240e-01 true resid norm >>> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 >>>>> 1 KSP preconditioned resid norm 2.004313395301e-12 true resid norm >>> 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 >>>>> Linear solve converged due to CONVERGED_ATOL iterations 1 >>>>> >>>>> >>>>> with -pc_type fieldsplit -fieldsplit_u_pc_type hypre >>> -fieldsplit_wp_pc_type lu the convergence is slow >>>>> 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm >>> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >>>>> 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm >>> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >>>>> ... 
>>>>> 824 KSP preconditioned resid norm 1.018542387738e-09 true resid norm >>> 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 >>>>> 825 KSP preconditioned resid norm 9.743727947637e-10 true resid norm >>> 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 >>>>> Linear solve converged due to CONVERGED_ATOL iterations 825 >>>>> >>>>> checking with additional -fieldsplit_u_ksp_type richardson >>> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >>> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >>> -fieldsplit_wp_ksp_max_it 1 gives >>>>> >>>>> 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm >>> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 >>>>> Residual norms for fieldsplit_u_ solve. >>>>> 0 KSP Residual norm 5.803507549280e-01 >>>>> 1 KSP Residual norm 2.069538175950e-01 >>>>> Residual norms for fieldsplit_wp_ solve. >>>>> 0 KSP Residual norm 0.000000000000e+00 >>>>> 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm >>> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 >>>>> Residual norms for fieldsplit_u_ solve. >>>>> 0 KSP Residual norm 7.831796195225e-01 >>>>> 1 KSP Residual norm 1.734608520110e-01 >>>>> Residual norms for fieldsplit_wp_ solve. >>>>> 0 KSP Residual norm 0.000000000000e+00 >>>>> .... >>>>> 823 KSP preconditioned resid norm 1.065070135605e-09 true resid norm >>> 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 >>>>> Residual norms for fieldsplit_u_ solve. >>>>> 0 KSP Residual norm 6.113806394327e-01 >>>>> 1 KSP Residual norm 1.535465290944e-01 >>>>> Residual norms for fieldsplit_wp_ solve. >>>>> 0 KSP Residual norm 0.000000000000e+00 >>>>> 824 KSP preconditioned resid norm 1.018542387746e-09 true resid norm >>> 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 >>>>> Residual norms for fieldsplit_u_ solve. >>>>> 0 KSP Residual norm 6.123437055586e-01 >>>>> 1 KSP Residual norm 1.524661826133e-01 >>>>> Residual norms for fieldsplit_wp_ solve. 
>>>>> 0 KSP Residual norm 0.000000000000e+00 >>>>> 825 KSP preconditioned resid norm 9.743727947718e-10 true resid norm >>> 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 >>>>> Linear solve converged due to CONVERGED_ATOL iterations 825 >>>>> >>>>> >>>>> The residual for the wp block is zero since in this first step the rhs is >>> zero. As can be seen in the output, the multigrid does not perform well to >>> reduce the residual in the sub-solve. Is my observation right? What can be >>> done to improve this? >>>>> >>>>> >>>>> Giang >>>>> >>>>> On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith >>> wrote: >>>>> >>>>> This can happen if the matrix is singular or nearly singular or if >>> the factorization generates small pivots, which can occur for even >>> nonsingular problems if the matrix is poorly scaled or just plain nasty. >>>>> >>>>> >>>>>> On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui >>> wrote: >>>>>> >>>>>> It took a while, here I send you the output >>>>>> >>>>>> 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm >>> 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >>>>>> 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm >>> 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 >>>>>> 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm >>> 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 >>>>>> 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm >>> 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 >>>>>> Linear solve converged due to CONVERGED_ATOL iterations 3 >>>>>> KSP Object: 4 MPI processes >>>>>> type: gmres >>>>>> GMRES: restart=1000, using Modified Gram-Schmidt >>> Orthogonalization >>>>>> GMRES: happy breakdown tolerance 1e-30 >>>>>> maximum iterations=1000, initial guess is zero >>>>>> tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >>>>>> left preconditioning >>>>>> using PRECONDITIONED norm type for convergence test >>>>>> PC Object: 4 MPI processes >>>>>> type: lu 
>>>>>> LU: out-of-place factorization >>>>>> tolerance for zero pivot 2.22045e-14 >>>>>> matrix ordering: natural >>>>>> factor fill ratio given 0, needed 0 >>>>>> Factored matrix follows: >>>>>> Mat Object: 4 MPI processes >>>>>> type: mpiaij >>>>>> rows=973051, cols=973051 >>>>>> package used to perform factorization: pastix >>>>>> Error : 3.24786e-14 >>>>>> total: nonzeros=0, allocated nonzeros=0 >>>>>> total number of mallocs used during MatSetValues calls =0 >>>>>> PaStiX run parameters: >>>>>> Matrix type : Unsymmetric >>>>>> Level of printing (0,1,2): 0 >>>>>> Number of refinements iterations : 3 >>>>>> Error : 3.24786e-14 >>>>>> linear system matrix = precond matrix: >>>>>> Mat Object: 4 MPI processes >>>>>> type: mpiaij >>>>>> rows=973051, cols=973051 >>>>>> Error : 3.24786e-14 >>>>>> total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >>>>>> total number of mallocs used during MatSetValues calls =0 >>>>>> using I-node (on process 0) routines: found 78749 nodes, limit >>> used is 5 >>>>>> Error : 3.24786e-14 >>>>>> >>>>>> It doesn't do as you said. Something is not right here. I will look >>> in depth. >>>>>> >>>>>> Giang >>>>>> >>>>>> On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith >>> wrote: >>>>>> >>>>>>> On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui >>> wrote: >>>>>>> >>>>>>> Good catch. I get this for the very first step, maybe at that time >>> the rhs_w is zero. >>>>>> >>>>>> With the multiplicative composition the right hand side of the >>> second solve is the initial right hand side of the second solve minus >>> A_10*x where x is the solution to the first sub solve and A_10 is the lower >>> left block of the outer matrix. So unless both the initial right hand side >>> has a zero for the second block and A_10 is identically zero the right hand >>> side for the second sub solve should not be zero. Is A_10 == 0? >>>>>> >>>>>> >>>>>>> In the later step, it shows 2 step convergence >>>>>>> >>>>>>> Residual norms for fieldsplit_u_ solve. 
>>>>>>> 0 KSP Residual norm 3.165886479830e+04 >>>>>>> 1 KSP Residual norm 2.905922877684e-01 >>>>>>> Residual norms for fieldsplit_wp_ solve. >>>>>>> 0 KSP Residual norm 2.397669419027e-01 >>>>>>> 1 KSP Residual norm 0.000000000000e+00 >>>>>>> 0 KSP preconditioned resid norm 3.165886479920e+04 true resid >>> norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 >>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>> 0 KSP Residual norm 9.999891813771e-01 >>>>>>> 1 KSP Residual norm 1.512000395579e-05 >>>>>>> Residual norms for fieldsplit_wp_ solve. >>>>>>> 0 KSP Residual norm 8.192702188243e-06 >>>>>>> 1 KSP Residual norm 0.000000000000e+00 >>>>>>> 1 KSP preconditioned resid norm 5.252183822848e-02 true resid >>> norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 >>>>>> >>>>>> The outer residual norms are still wonky, the preconditioned >>> residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which is a >>> huge drop but the 7.963616922323e+05 drops very much less >>> 7.135927677844e+04. This is not normal. >>>>>> >>>>>> What if you just use -pc_type lu for the entire system (no >>> fieldsplit), does the true residual drop to almost zero in the first >>> iteration (as it should?). Send the output. >>>>>> >>>>>> >>>>>> >>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>> 0 KSP Residual norm 6.946213936597e-01 >>>>>>> 1 KSP Residual norm 1.195514007343e-05 >>>>>>> Residual norms for fieldsplit_wp_ solve. >>>>>>> 0 KSP Residual norm 1.025694497535e+00 >>>>>>> 1 KSP Residual norm 0.000000000000e+00 >>>>>>> 2 KSP preconditioned resid norm 8.785709535405e-03 true resid >>> norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 >>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>> 0 KSP Residual norm 7.255149996405e-01 >>>>>>> 1 KSP Residual norm 6.583512434218e-06 >>>>>>> Residual norms for fieldsplit_wp_ solve. 
>>>>>>> 0 KSP Residual norm 1.015229700337e+00 >>>>>>> 1 KSP Residual norm 0.000000000000e+00 >>>>>>> 3 KSP preconditioned resid norm 7.110407712709e-04 true resid >>> norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 >>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>> 0 KSP Residual norm 3.512243341400e-01 >>>>>>> 1 KSP Residual norm 2.032490351200e-06 >>>>>>> Residual norms for fieldsplit_wp_ solve. >>>>>>> 0 KSP Residual norm 1.282327290982e+00 >>>>>>> 1 KSP Residual norm 0.000000000000e+00 >>>>>>> 4 KSP preconditioned resid norm 3.482036620521e-05 true resid >>> norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 >>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>> 0 KSP Residual norm 3.423609338053e-01 >>>>>>> 1 KSP Residual norm 4.213703301972e-07 >>>>>>> Residual norms for fieldsplit_wp_ solve. >>>>>>> 0 KSP Residual norm 1.157384757538e+00 >>>>>>> 1 KSP Residual norm 0.000000000000e+00 >>>>>>> 5 KSP preconditioned resid norm 1.203470314534e-06 true resid >>> norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 >>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>> 0 KSP Residual norm 3.838596289995e-01 >>>>>>> 1 KSP Residual norm 9.927864176103e-08 >>>>>>> Residual norms for fieldsplit_wp_ solve. >>>>>>> 0 KSP Residual norm 1.066298905618e+00 >>>>>>> 1 KSP Residual norm 0.000000000000e+00 >>>>>>> 6 KSP preconditioned resid norm 3.331619244266e-08 true resid >>> norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 >>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>> 0 KSP Residual norm 4.624964188094e-01 >>>>>>> 1 KSP Residual norm 6.418229775372e-08 >>>>>>> Residual norms for fieldsplit_wp_ solve. 
>>>>>>> 0 KSP Residual norm 9.800784311614e-01 >>>>>>> 1 KSP Residual norm 0.000000000000e+00 >>>>>>> 7 KSP preconditioned resid norm 8.788046233297e-10 true resid >>> norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 >>>>>>> Linear solve converged due to CONVERGED_ATOL iterations 7 >>>>>>> >>>>>>> The outer operator is an explicit matrix. >>>>>>> >>>>>>> Giang >>>>>>> >>>>>>> On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith >>> wrote: >>>>>>> >>>>>>>> On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui >>> wrote: >>>>>>>> >>>>>>>> Thanks Barry, trying with -fieldsplit_u_type lu gives better >>> convergence. I still used 4 procs though, probably with 1 proc it should >>> also be the same. >>>>>>>> >>>>>>>> The u block used a Nitsche-type operator to connect two >>> non-matching domains. I don't think it leaves any rigid body motion that >>> would lead to insufficient constraints. Maybe you have another idea? >>>>>>>> >>>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>>> 0 KSP Residual norm 3.129067184300e+05 >>>>>>>> 1 KSP Residual norm 5.906261468196e-01 >>>>>>>> Residual norms for fieldsplit_wp_ solve. >>>>>>>> 0 KSP Residual norm 0.000000000000e+00 >>>>>>> >>>>>>> ^^^^ something is wrong here. The sub solve should not be >>> starting with a 0 residual (this means the right hand side for this sub >>> solve is zero which it should not be). >>>>>>> >>>>>>>> FieldSplit with MULTIPLICATIVE composition: total splits = 2 >>>>>>> >>>>>>> >>>>>>> How are you providing the outer operator? As an explicit matrix >>> or with some shell matrix? >>>>>>> >>>>>>> >>>>>>> >>>>>>>> 0 KSP preconditioned resid norm 3.129067184300e+05 true resid >>> norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 >>>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>>> 0 KSP Residual norm 9.999955993437e-01 >>>>>>>> 1 KSP Residual norm 4.019774691831e-06 >>>>>>>> Residual norms for fieldsplit_wp_ solve.
>>>>>>>> 0 KSP Residual norm 0.000000000000e+00 >>>>>>>> 1 KSP preconditioned resid norm 5.003913641475e-01 true resid >>> norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 >>>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>>> 0 KSP Residual norm 1.000012180204e+00 >>>>>>>> 1 KSP Residual norm 1.017367950422e-05 >>>>>>>> Residual norms for fieldsplit_wp_ solve. >>>>>>>> 0 KSP Residual norm 0.000000000000e+00 >>>>>>>> 2 KSP preconditioned resid norm 2.330910333756e-07 true resid >>> norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 >>>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>>> 0 KSP Residual norm 1.000004200085e+00 >>>>>>>> 1 KSP Residual norm 6.231613102458e-06 >>>>>>>> Residual norms for fieldsplit_wp_ solve. >>>>>>>> 0 KSP Residual norm 0.000000000000e+00 >>>>>>>> 3 KSP preconditioned resid norm 8.671259838389e-11 true resid >>> norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 >>>>>>>> Linear solve converged due to CONVERGED_ATOL iterations 3 >>>>>>>> KSP Object: 4 MPI processes >>>>>>>> type: gmres >>>>>>>> GMRES: restart=1000, using Modified Gram-Schmidt >>> Orthogonalization >>>>>>>> GMRES: happy breakdown tolerance 1e-30 >>>>>>>> maximum iterations=1000, initial guess is zero >>>>>>>> tolerances: relative=1e-20, absolute=1e-09, divergence=10000 >>>>>>>> left preconditioning >>>>>>>> using PRECONDITIONED norm type for convergence test >>>>>>>> PC Object: 4 MPI processes >>>>>>>> type: fieldsplit >>>>>>>> FieldSplit with MULTIPLICATIVE composition: total splits = 2 >>>>>>>> Solver info for each split is in the following KSP objects: >>>>>>>> Split number 0 Defined by IS >>>>>>>> KSP Object: (fieldsplit_u_) 4 MPI processes >>>>>>>> type: richardson >>>>>>>> Richardson: damping factor=1 >>>>>>>> maximum iterations=1, initial guess is zero >>>>>>>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000 >>>>>>>> left preconditioning >>>>>>>> using PRECONDITIONED norm type for convergence test >>>>>>>> PC Object: 
(fieldsplit_u_) 4 MPI processes >>>>>>>> type: lu >>>>>>>> LU: out-of-place factorization >>>>>>>> tolerance for zero pivot 2.22045e-14 >>>>>>>> matrix ordering: natural >>>>>>>> factor fill ratio given 0, needed 0 >>>>>>>> Factored matrix follows: >>>>>>>> Mat Object: 4 MPI processes >>>>>>>> type: mpiaij >>>>>>>> rows=938910, cols=938910 >>>>>>>> package used to perform factorization: pastix >>>>>>>> total: nonzeros=0, allocated nonzeros=0 >>>>>>>> Error : 3.36878e-14 >>>>>>>> total number of mallocs used during MatSetValues calls >>> =0 >>>>>>>> PaStiX run parameters: >>>>>>>> Matrix type : Unsymmetric >>>>>>>> Level of printing (0,1,2): 0 >>>>>>>> Number of refinements iterations : 3 >>>>>>>> Error : 3.36878e-14 >>>>>>>> linear system matrix = precond matrix: >>>>>>>> Mat Object: (fieldsplit_u_) 4 MPI processes >>>>>>>> type: mpiaij >>>>>>>> rows=938910, cols=938910, bs=3 >>>>>>>> Error : 3.36878e-14 >>>>>>>> Error : 3.36878e-14 >>>>>>>> total: nonzeros=8.60906e+07, allocated >>> nonzeros=8.60906e+07 >>>>>>>> total number of mallocs used during MatSetValues calls =0 >>>>>>>> using I-node (on process 0) routines: found 78749 >>> nodes, limit used is 5 >>>>>>>> Split number 1 Defined by IS >>>>>>>> KSP Object: (fieldsplit_wp_) 4 MPI processes >>>>>>>> type: richardson >>>>>>>> Richardson: damping factor=1 >>>>>>>> maximum iterations=1, initial guess is zero >>>>>>>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000 >>>>>>>> left preconditioning >>>>>>>> using PRECONDITIONED norm type for convergence test >>>>>>>> PC Object: (fieldsplit_wp_) 4 MPI processes >>>>>>>> type: lu >>>>>>>> LU: out-of-place factorization >>>>>>>> tolerance for zero pivot 2.22045e-14 >>>>>>>> matrix ordering: natural >>>>>>>> factor fill ratio given 0, needed 0 >>>>>>>> Factored matrix follows: >>>>>>>> Mat Object: 4 MPI processes >>>>>>>> type: mpiaij >>>>>>>> rows=34141, cols=34141 >>>>>>>> package used to perform factorization: pastix >>>>>>>> Error : -nan >>>>>>>> Error 
: -nan >>>>>>>> Error : -nan >>>>>>>> total: nonzeros=0, allocated nonzeros=0 >>>>>>>> total number of mallocs used during MatSetValues >>> calls =0 >>>>>>>> PaStiX run parameters: >>>>>>>> Matrix type : Symmetric >>>>>>>> Level of printing (0,1,2): 0 >>>>>>>> Number of refinements iterations : 0 >>>>>>>> Error : -nan >>>>>>>> linear system matrix = precond matrix: >>>>>>>> Mat Object: (fieldsplit_wp_) 4 MPI processes >>>>>>>> type: mpiaij >>>>>>>> rows=34141, cols=34141 >>>>>>>> total: nonzeros=485655, allocated nonzeros=485655 >>>>>>>> total number of mallocs used during MatSetValues calls =0 >>>>>>>> not using I-node (on process 0) routines >>>>>>>> linear system matrix = precond matrix: >>>>>>>> Mat Object: 4 MPI processes >>>>>>>> type: mpiaij >>>>>>>> rows=973051, cols=973051 >>>>>>>> total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 >>>>>>>> total number of mallocs used during MatSetValues calls =0 >>>>>>>> using I-node (on process 0) routines: found 78749 nodes, >>> limit used is 5 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Giang >>>>>>>> >>>>>>>> On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith < >>> bsmith at mcs.anl.gov> wrote: >>>>>>>> >>>>>>>>> On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui < >>> hgbk2008 at gmail.com> wrote: >>>>>>>>> >>>>>>>>> Dear Matt/Barry >>>>>>>>> >>>>>>>>> With your options, it results in >>>>>>>>> >>>>>>>>> 0 KSP preconditioned resid norm 1.106709687386e+31 true >>> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >>>>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>>>> 0 KSP Residual norm 2.407308987203e+36 >>>>>>>>> 1 KSP Residual norm 5.797185652683e+72 >>>>>>>> >>>>>>>> It looks like Matt is right, hypre is seemingly producing useless >>> garbage. >>>>>>>> >>>>>>>> First, how do things run on one process? If you have similar >>> problems then debug on one process (debugging any kind of problem is always >>> far easier on one process).
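[Editorial note] Barry's earlier explanation of the MULTIPLICATIVE composition — the right-hand side of the second sub-solve is the initial second-block RHS minus A_10*x from the first sub-solve — can be seen in a toy block system. All sizes and matrix entries below are invented for illustration; this is a NumPy sketch, not PETSc code:

```python
import numpy as np

# Toy 2x2-block system illustrating one multiplicative fieldsplit sweep.
rng = np.random.default_rng(0)
n0, n1 = 4, 3
A00 = 4.0 * np.eye(n0) + rng.random((n0, n0))   # "u" block (well conditioned)
A11 = 4.0 * np.eye(n1) + rng.random((n1, n1))   # "wp" block
A10 = rng.random((n1, n0))                      # lower-left coupling block
b0 = rng.random(n0)
b1 = np.zeros(n1)        # second-field RHS is zero, as in the first load step

# First sub-solve on the u block.
x0 = np.linalg.solve(A00, b0)

# Multiplicative composition: the second sub-solve sees b1 - A10 @ x0.
rhs1 = b1 - A10 @ x0
print("norm of second-block RHS:", np.linalg.norm(rhs1))
```

Even with b1 == 0, rhs1 is nonzero unless A10 vanishes — which is why a monitored residual of exactly zero in the second split suggests A10 == 0 or a mis-defined split.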
>>>>>>>> >>>>>>>> First run with -fieldsplit_u_type lu (instead of using hypre) to >>> see if that works or also produces something bad. >>>>>>>> >>>>>>>> What is the operator and the boundary conditions for u? It could >>> be singular. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Residual norms for fieldsplit_wp_ solve. >>>>>>>>> 0 KSP Residual norm 0.000000000000e+00 >>>>>>>>> ... >>>>>>>>> 999 KSP preconditioned resid norm 2.920157329174e+12 true >>> resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 >>>>>>>>> Residual norms for fieldsplit_u_ solve. >>>>>>>>> 0 KSP Residual norm 1.533726746719e+36 >>>>>>>>> 1 KSP Residual norm 3.692757392261e+72 >>>>>>>>> Residual norms for fieldsplit_wp_ solve. >>>>>>>>> 0 KSP Residual norm 0.000000000000e+00 >>>>>>>>> >>>>>>>>> Do you suggest that the pastix solver for the "wp" block >>> encounters a small pivot? In addition, it seems like the "u" block is also >>> singular. >>>>>>>>> >>>>>>>>> Giang >>>>>>>>> >>>>>>>>> On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith < >>> bsmith at mcs.anl.gov> wrote: >>>>>>>>> >>>>>>>>> Huge preconditioned norms but normal unpreconditioned norms >>> almost always come from a very small pivot in an LU or ILU factorization. >>>>>>>>> >>>>>>>>> The first thing to do is monitor the two sub solves.
Run >>> with the additional options -fieldsplit_u_ksp_type richardson >>> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 >>> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor >>> -fieldsplit_wp_ksp_max_it 1 >>>>>>>>> >>>>>>>>>> On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < >>> hgbk2008 at gmail.com> wrote: >>>>>>>>>> >>>>>>>>>> Hello >>>>>>>>>> >>>>>>>>>> I encountered a strange convergence behavior that I have >>> trouble understanding >>>>>>>>>> >>>>>>>>>> KSPSetFromOptions completed >>>>>>>>>> 0 KSP preconditioned resid norm 1.106709687386e+31 true >>> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 >>>>>>>>>> 1 KSP preconditioned resid norm 2.933141742664e+29 true >>> resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 >>>>>>>>>> 2 KSP preconditioned resid norm 9.686409637174e+16 true >>> resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 >>>>>>>>>> 3 KSP preconditioned resid norm 4.219243615809e+15 true >>> resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 >>>>>>>>>> .....
>>>>>>>>>> 999 KSP preconditioned resid norm 3.043754298076e+12 true >>> resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 >>>>>>>>>> 1000 KSP preconditioned resid norm 3.043000287819e+12 true >>> resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 >>>>>>>>>> Linear solve did not converge due to DIVERGED_ITS iterations >>> 1000 >>>>>>>>>> KSP Object: 4 MPI processes >>>>>>>>>> type: gmres >>>>>>>>>> GMRES: restart=1000, using Modified Gram-Schmidt >>> Orthogonalization >>>>>>>>>> GMRES: happy breakdown tolerance 1e-30 >>>>>>>>>> maximum iterations=1000, initial guess is zero >>>>>>>>>> tolerances: relative=1e-20, absolute=1e-09, >>> divergence=10000 >>>>>>>>>> left preconditioning >>>>>>>>>> using PRECONDITIONED norm type for convergence test >>>>>>>>>> PC Object: 4 MPI processes >>>>>>>>>> type: fieldsplit >>>>>>>>>> FieldSplit with MULTIPLICATIVE composition: total splits >>> = 2 >>>>>>>>>> Solver info for each split is in the following KSP >>> objects: >>>>>>>>>> Split number 0 Defined by IS >>>>>>>>>> KSP Object: (fieldsplit_u_) 4 MPI processes >>>>>>>>>> type: preonly >>>>>>>>>> maximum iterations=10000, initial guess is zero >>>>>>>>>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000 >>>>>>>>>> left preconditioning >>>>>>>>>> using NONE norm type for convergence test >>>>>>>>>> PC Object: (fieldsplit_u_) 4 MPI processes >>>>>>>>>> type: hypre >>>>>>>>>> HYPRE BoomerAMG preconditioning >>>>>>>>>> HYPRE BoomerAMG: Cycle type V >>>>>>>>>> HYPRE BoomerAMG: Maximum number of levels 25 >>>>>>>>>> HYPRE BoomerAMG: Maximum number of iterations PER >>> hypre call 1 >>>>>>>>>> HYPRE BoomerAMG: Convergence tolerance PER hypre >>> call 0 >>>>>>>>>> HYPRE BoomerAMG: Threshold for strong coupling 0.6 >>>>>>>>>> HYPRE BoomerAMG: Interpolation truncation factor 0 >>>>>>>>>> HYPRE BoomerAMG: Interpolation: max elements per row >>> 0 >>>>>>>>>> HYPRE BoomerAMG: Number of levels of aggressive >>> coarsening 0 >>>>>>>>>> HYPRE BoomerAMG: 
Number of paths for aggressive >>> coarsening 1 >>>>>>>>>> HYPRE BoomerAMG: Maximum row sums 0.9 >>>>>>>>>> HYPRE BoomerAMG: Sweeps down 1 >>>>>>>>>> HYPRE BoomerAMG: Sweeps up 1 >>>>>>>>>> HYPRE BoomerAMG: Sweeps on coarse 1 >>>>>>>>>> HYPRE BoomerAMG: Relax down >>> symmetric-SOR/Jacobi >>>>>>>>>> HYPRE BoomerAMG: Relax up >>> symmetric-SOR/Jacobi >>>>>>>>>> HYPRE BoomerAMG: Relax on coarse >>> Gaussian-elimination >>>>>>>>>> HYPRE BoomerAMG: Relax weight (all) 1 >>>>>>>>>> HYPRE BoomerAMG: Outer relax weight (all) 1 >>>>>>>>>> HYPRE BoomerAMG: Using CF-relaxation >>>>>>>>>> HYPRE BoomerAMG: Measure type local >>>>>>>>>> HYPRE BoomerAMG: Coarsen type PMIS >>>>>>>>>> HYPRE BoomerAMG: Interpolation type classical >>>>>>>>>> linear system matrix = precond matrix: >>>>>>>>>> Mat Object: (fieldsplit_u_) 4 MPI processes >>>>>>>>>> type: mpiaij >>>>>>>>>> rows=938910, cols=938910, bs=3 >>>>>>>>>> total: nonzeros=8.60906e+07, allocated >>> nonzeros=8.60906e+07 >>>>>>>>>> total number of mallocs used during MatSetValues >>> calls =0 >>>>>>>>>> using I-node (on process 0) routines: found 78749 >>> nodes, limit used is 5 >>>>>>>>>> Split number 1 Defined by IS >>>>>>>>>> KSP Object: (fieldsplit_wp_) 4 MPI processes >>>>>>>>>> type: preonly >>>>>>>>>> maximum iterations=10000, initial guess is zero >>>>>>>>>> tolerances: relative=1e-05, absolute=1e-50, >>> divergence=10000 >>>>>>>>>> left preconditioning >>>>>>>>>> using NONE norm type for convergence test >>>>>>>>>> PC Object: (fieldsplit_wp_) 4 MPI processes >>>>>>>>>> type: lu >>>>>>>>>> LU: out-of-place factorization >>>>>>>>>> tolerance for zero pivot 2.22045e-14 >>>>>>>>>> matrix ordering: natural >>>>>>>>>> factor fill ratio given 0, needed 0 >>>>>>>>>> Factored matrix follows: >>>>>>>>>> Mat Object: 4 MPI processes >>>>>>>>>> type: mpiaij >>>>>>>>>> rows=34141, cols=34141 >>>>>>>>>> package used to perform factorization: pastix >>>>>>>>>> Error : -nan >>>>>>>>>> Error : -nan >>>>>>>>>> total: nonzeros=0, allocated 
nonzeros=0 >>>>>>>>>> Error : -nan >>>>>>>>>> total number of mallocs used during MatSetValues calls =0 >>>>>>>>>> PaStiX run parameters: >>>>>>>>>> Matrix type : >>> Symmetric >>>>>>>>>> Level of printing (0,1,2): 0 >>>>>>>>>> Number of refinements iterations : 0 >>>>>>>>>> Error : -nan >>>>>>>>>> linear system matrix = precond matrix: >>>>>>>>>> Mat Object: (fieldsplit_wp_) 4 MPI processes >>>>>>>>>> type: mpiaij >>>>>>>>>> rows=34141, cols=34141 >>>>>>>>>> total: nonzeros=485655, allocated nonzeros=485655 >>>>>>>>>> total number of mallocs used during MatSetValues >>> calls =0 >>>>>>>>>> not using I-node (on process 0) routines >>>>>>>>>> linear system matrix = precond matrix: >>>>>>>>>> Mat Object: 4 MPI processes >>>>>>>>>> type: mpiaij >>>>>>>>>> rows=973051, cols=973051 >>>>>>>>>> total: nonzeros=9.90037e+07, allocated >>> nonzeros=9.90037e+07 >>>>>>>>>> total number of mallocs used during MatSetValues calls =0 >>>>>>>>>> using I-node (on process 0) routines: found 78749 >>> nodes, limit used is 5 >>>>>>>>>> >>>>>>>>>> The pattern of convergence gives a hint that this system is >>> somehow bad/singular. But I don't know why the preconditioned error goes up >>> too high. Does anyone have an idea? >>>>>>>>>> >>>>>>>>>> Best regards >>>>>>>>>> Giang Bui >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> From rupp at iue.tuwien.ac.at Sat Apr 29 14:44:27 2017 From: rupp at iue.tuwien.ac.at (Karl Rupp) Date: Sat, 29 Apr 2017 21:44:27 +0200 Subject: [petsc-users] Using ViennaCL without recompiling In-Reply-To: References: <1DBD3F6F-E787-4E13-B997-88B85090BA17@me.com> Message-ID: <84a74d00-cee6-2eb3-a28d-9e910032e323@iue.tuwien.ac.at> Hi Franco, yes, in principle you can substitute your current PETSc build with your custom PETSc build (with ViennaCL enabled) at link time, provided that the version numbers match.
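[Editorial note] A minimal sketch of the rebuild Karl describes. The directory names below are placeholders, and the exact configure flags should be checked against `./configure --help` for your PETSc version; the `-mat_type viennacl` option is the one Matt mentions earlier in the thread:

```shell
# Rebuild PETSc with ViennaCL support; there is no way around recompiling.
cd "$PETSC_DIR"
./configure PETSC_ARCH=arch-viennacl --download-viennacl
make PETSC_ARCH=arch-viennacl all

# At run time, select the ViennaCL types from the options database, e.g.:
#   ./myapp -mat_type viennacl
```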
Just to be clear: There is no way around recompiling PETSc, though; there's no mechanism for dynamically loading some other shared library to enable ViennaCL bindings in PETSc. Best regards, Karli On 04/29/2017 08:14 AM, Franco Milicchio wrote: > >> On Apr 28, 2017, at 4:46pm, Satish Balay > > wrote: >> >> On Fri, 28 Apr 2017, Franco Milicchio wrote: >> >>> >>>> Not recompiling your own project is fine. PETSc has an ABI. You just >>>> reconfigure/recompile PETSc with >>>> ViennaCL support. Then you can use -mat_type viennacl etc. >>> >>> Thanks for your answer, Matt, but I expressed myself in an ambiguous way. >>> >>> I cannot recompile PETSc, I can do whatever I want with my code. >> >> You can always install PETSc. >> >> If you don't have write permission to the install you are currently >> using - you can start with a fresh tarball [of the same version], use >> reconfigure*.py from the current install to configure - and install >> your own copy [obviously at a different location]. > > Thanks, Satish. > > As I understand, you are suggesting to just substitute PETSc at linking > level with my ViennaCL-enabled library, and it should work "flawlessly"? > (the mileage may vary, obviously) > > This would be a huge gain to the project. > > Thanks, > Franco > /fm > > -- > Franco Milicchio > > > Department of Engineering > University Roma Tre > https://fmilicchio.bitbucket.io/ > From balay at mcs.anl.gov Sat Apr 29 16:59:27 2017 From: balay at mcs.anl.gov (Satish Balay) Date: Sat, 29 Apr 2017 16:59:27 -0500 Subject: [petsc-users] Using ViennaCL without recompiling In-Reply-To: <84a74d00-cee6-2eb3-a28d-9e910032e323@iue.tuwien.ac.at> References: <1DBD3F6F-E787-4E13-B997-88B85090BA17@me.com> <84a74d00-cee6-2eb3-a28d-9e910032e323@iue.tuwien.ac.at> Message-ID: I was actually suggesting that you recompile and link with the new install of PETSc. If using PETSc makefiles - you would be changing PETSC_DIR [and perhaps PETSC_ARCH] values and rebuilding all PETSc-related code.
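[Editorial note] Concretely, the switch Satish describes might look like this when the application uses PETSc makefiles (paths and arch names are placeholders for your setup):

```shell
# Point the application build at the new PETSc install and rebuild.
export PETSC_DIR=$HOME/software/petsc-viennacl   # new install location
export PETSC_ARCH=arch-viennacl                  # omit if configured with --prefix
make clean
make all    # PETSc makefiles pick up the new PETSC_DIR/PETSC_ARCH values
```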
Satish > Hi Franco, > > yes, in principle you can substitute your current PETSc build with your custom > PETSc build (with ViennaCL enabled) at link time, provided that the version > numbers match. > > Just to be clear: There is no way around recompiling PETSc, though; there's no > mechanism for dynamically loading some other shared library to enable ViennaCL > bindings in PETSc. > > Best regards, > Karli > > > On 04/29/2017 08:14 AM, Franco Milicchio wrote: > > > > > On Apr 28, 2017, at 4:46pm, Satish Balay > > > wrote: > > > > > > On Fri, 28 Apr 2017, Franco Milicchio wrote: > > > > > > > > > > > > Not recompiling your own project is fine. PETSc has an ABI. You just > > > > > reconfigure/recompile PETSc with > > > > > ViennaCL support. Then you can use -mat_type viennacl etc. > > > > > > > > Thanks for your answer, Matt, but I expressed myself in an ambiguous > > > > way. > > > > > > > > I cannot recompile PETSc, I can do whatever I want with my code. > > > > > > You can always install PETSc. > > > > > > If you don't have write permission to the install you are currently > > > using - you can start with a fresh tarball [of the same version], use > > > reconfigure*.py from the current install to configure - and install > > > your own copy [obviously at a different location]. > > > > Thanks, Satish. > > > > As I understand, you are suggesting to just substitute PETSc at linking > > level with my ViennaCL-enabled library, and it should work "flawlessly"? > > (the mileage may vary, obviously) > > > > This would be a huge gain to the project.
> > > > Thanks, > > Franco > > /fm > > > > -- > > Franco Milicchio > > > > > Department of Engineering > > University Roma Tre > > https://fmilicchio.bitbucket.io/ > > > > From hgbk2008 at gmail.com Sat Apr 29 16:59:36 2017 From: hgbk2008 at gmail.com (Hoang Giang Bui) Date: Sat, 29 Apr 2017 23:59:36 +0200 Subject: [petsc-users] strange convergence In-Reply-To: <3BD8A742-171F-4982-BA98-4951B285972D@mcs.anl.gov> References: <7891536D-91FE-4BFF-8DAD-CE7AB85A4E57@mcs.anl.gov> <425BBB58-9721-49F3-8C86-940F08E925F7@mcs.anl.gov> <42EB791A-40C2-439F-A5F7-5F8C15CECA6F@mcs.anl.gov> <82193784-B4C4-47D7-80EA-25F549C9091B@mcs.anl.gov> <87wpa3wd5j.fsf@jedbrown.org> <3BD8A742-171F-4982-BA98-4951B285972D@mcs.anl.gov> Message-ID: Thanks Barry Running with that option gives the output for the first solve: BoomerAMG SETUP PARAMETERS: Max levels = 25 Num levels = 7 Strength Threshold = 0.100000 Interpolation Truncation Factor = 0.000000 Maximum Row Sum Threshold for Dependency Weakening = 0.900000 Coarsening Type = PMIS measures are determined locally No global partition option chosen. 
Interpolation = modified classical interpolation

Operator Matrix Information:

                         nonzero         entries per row         row sums
lev    rows    entries  sparse   min  max    avg        min          max
===================================================================
 0  1056957  109424691   0.000    30 1617  103.5   -2.075e+11    3.561e+11
 1   185483   33504881   0.001    17  713  180.6   -3.493e+11    1.323e+13
 2    26295    4691629   0.007    17  513  178.4   -3.367e+10    6.960e+12
 3     3438     432138   0.037    24  295  125.7   -2.194e+10    2.154e+11
 4      476      34182   0.151     8  192   71.8   -6.435e+09    2.306e+11
 5       84       2410   0.342     8   70   28.7   -1.052e+07    6.640e+10
 6       18        252   0.778    10   18   14.0    9.038e+06    8.828e+10

Interpolation Matrix Information:
                      entries/row     min        max          row sums
lev   rows x cols      min  max     weight     weight      min         max
=================================================================
 0  1056957 x 185483    0   18   -1.143e+02  7.741e+01  -1.143e+02  7.741e+01
 1   185483 x 26295     0   15   -1.053e+01  2.918e+00  -1.053e+01  2.918e+00
 2    26295 x 3438      0    9    1.308e-02  1.036e+00   0.000e+00  1.058e+00
 3     3438 x 476       0    7    1.782e-02  1.015e+00   0.000e+00  1.015e+00
 4      476 x 84        0    5    1.378e-02  1.000e+00   0.000e+00  1.000e+00
 5       84 x 18        0    3    1.330e-02  1.000e+00   0.000e+00  1.000e+00

Complexity:    grid = 1.204165    operator = 1.353353    memory = 1.381360

BoomerAMG SOLVER PARAMETERS:
  Maximum number of cycles:         1
  Stopping Tolerance:               0.000000e+00
  Cycle type (1 = V, 2 = W, etc.):  1
  Relaxation Parameters:
   Visiting Grid:                     down   up  coarse
            Number of sweeps:            1    1     1
   Type 0=Jac, 3=hGS, 6=hSGS, 9=GE:      6    6     6
   Point types, partial sweeps (1=C, -1=F):
                  Pre-CG relaxation (down):   1  -1
                   Post-CG relaxation (up):  -1   1
                             Coarsest grid:   0
  Output flag (print_level): 3

                            relative
     residual      factor   residual
     --------      ------   --------
  Initial  9.006493e+06            1.000000e+00
  Cycle 1  7.994266e+06  0.887611  8.876114e-01

  Average Convergence Factor = 0.887611

  Complexity: grid = 1.204165 operator = 1.353353 cycle = 2.706703

KSP Object:(fieldsplit_u_) 8 MPI processes
  type: preonly
  maximum iterations=10000, initial guess is zero
  tolerances: relative=1e-05, absolute=1e-50,
divergence=10000 left preconditioning using NONE norm type for convergence test PC Object:(fieldsplit_u_) 8 MPI processes type: hypre HYPRE BoomerAMG preconditioning HYPRE BoomerAMG: Cycle type V HYPRE BoomerAMG: Maximum number of levels 25 HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1 HYPRE BoomerAMG: Convergence tolerance PER hypre call 0 HYPRE BoomerAMG: Threshold for strong coupling 0.1 HYPRE BoomerAMG: Interpolation truncation factor 0 HYPRE BoomerAMG: Interpolation: max elements per row 0 HYPRE BoomerAMG: Number of levels of aggressive coarsening 0 HYPRE BoomerAMG: Number of paths for aggressive coarsening 1 HYPRE BoomerAMG: Maximum row sums 0.9 HYPRE BoomerAMG: Sweeps down 1 HYPRE BoomerAMG: Sweeps up 1 HYPRE BoomerAMG: Sweeps on coarse 1 HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi HYPRE BoomerAMG: Relax on coarse Gaussian-elimination HYPRE BoomerAMG: Relax weight (all) 1 HYPRE BoomerAMG: Outer relax weight (all) 1 HYPRE BoomerAMG: Using CF-relaxation HYPRE BoomerAMG: Measure type local HYPRE BoomerAMG: Coarsen type PMIS HYPRE BoomerAMG: Interpolation type classical linear system matrix = precond matrix: Mat Object: (fieldsplit_u_) 8 MPI processes type: mpiaij rows=1056957, cols=1056957, bs=3 total: nonzeros=1.09425e+08, allocated nonzeros=1.09425e+08 total number of mallocs used during MatSetValues calls =0 using I-node (on process 0) routines: found 43537 nodes, limit used is 5 0 KSP preconditioned resid norm 4.076033642262e+00 true resid norm 9.006493083033e+06 ||r(i)||/||b|| 1.000000000000e+00 Giang On Sat, Apr 29, 2017 at 8:06 PM, Barry Smith wrote: > > > On Apr 29, 2017, at 8:34 AM, Jed Brown wrote: > > > > Hoang Giang Bui writes: > > > >> Hi Barry > >> > >> The first block is from a standard solid mechanics discretization based > on > >> balance of momentum equation. 
There is some material involved but in > >> principle it's a well-posed elasticity equation with positive definite > >> tangent operator. The "gluing business" uses the mortar method to keep > the > >> continuity of displacement. Instead of using a Lagrange multiplier to > treat > >> the constraint I used a penalty method to penalize the energy. The > >> discretization form of mortar is quite simple > >> > >> \int_{\Gamma_1} { rho * (\delta u_1 - \delta u_2) * (u_1 - u_2) dA } > >> > >> rho is the penalty parameter. In the simulation I initially set it low (~E) > to > >> preserve the conditioning of the system. > > > > There are two things that can go wrong here with AMG: > > > > * The penalty term can mess up the strength of connection heuristics > > such that you get poor choice of C-points (classical AMG like > > BoomerAMG) or poor choice of aggregates (smoothed aggregation). > > > > * The penalty term can prevent Jacobi smoothing from being effective; in > > this case, it can lead to poor coarse basis functions (higher energy > > than they should be) and poor smoothing in an MG cycle. You can fix > > the poor smoothing in the MG cycle by using a stronger smoother, like > > ASM with some overlap. > > > > I'm generally not a fan of penalty methods due to the irritating > > tradeoffs and often poor solver performance. > > So, let's first see what hypre BoomerAMG is doing with the system. Run > for just one BoomerAMG solve with the additional options > > -fieldsplit_u_ksp_view -fieldsplit_u_pc_hypre_boomeramg_print_statistics > > this should print a good amount of information about what BoomerAMG has > decided to do based on the input matrix. > > I'm bringing the hypre team into the conversation since they obviously > know far more about BoomerAMG tuning options that may help your case. > > Barry > > > > > >> In the figure below, the colorful blocks are u_1 and the base is u_2. > Both > >> u_1 and u_2 use isoparametric quadratic approximation. > >>
> >> [attachment: Snapshot.png] > >> > >> Giang > >> > >> On Fri, Apr 28, 2017 at 6:21 PM, Barry Smith > wrote: > >> > >>> > >>> Ok, so boomerAMG algebraic multigrid is not good for the first block. > >>> You mentioned the first block has two things glued together? AMG is > >>> fantastic for certain problems but doesn't work for everything. > >>> > >>> Tell us more about the first block, what PDE it comes from, what > >>> discretization, and what the "gluing business" is and maybe we'll have > >>> suggestions for how to precondition it. > >>> > >>> Barry > >>> > >>>> On Apr 28, 2017, at 3:56 AM, Hoang Giang Bui > wrote: > >>>> > >>>> It's in fact quite good > >>>> > >>>> Residual norms for fieldsplit_u_ solve. > >>>> 0 KSP Residual norm 4.014715925568e+00 > >>>> 1 KSP Residual norm 2.160497019264e-10 > >>>> Residual norms for fieldsplit_wp_ solve. > >>>> 0 KSP Residual norm 0.000000000000e+00 > >>>> 0 KSP preconditioned resid norm 4.014715925568e+00 true resid norm > >>> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 > >>>> Residual norms for fieldsplit_u_ solve. > >>>> 0 KSP Residual norm 9.999999999416e-01 > >>>> 1 KSP Residual norm 7.118380416383e-11 > >>>> Residual norms for fieldsplit_wp_ solve. > >>>> 0 KSP Residual norm 0.000000000000e+00 > >>>> 1 KSP preconditioned resid norm 1.701150951035e-10 true resid norm > >>> 5.494262251846e-04 ||r(i)||/||b|| 6.100334726599e-11 > >>>> Linear solve converged due to CONVERGED_ATOL iterations 1 > >>>> > >>>> Giang > >>>> > >>>> On Thu, Apr 27, 2017 at 5:25 PM, Barry Smith > wrote: > >>>> > >>>> Run again using LU on both blocks to see what happens. > >>>> > >>>> > >>>>> On Apr 27, 2017, at 2:14 AM, Hoang Giang Bui > >>> wrote: > >>>>> > >>>>> I have changed the way to tie the nonconforming mesh.
It seems the > >>> matrix now is better > >>>>> > >>>>> with -pc_type lu the output is > >>>>> 0 KSP preconditioned resid norm 3.308678584240e-01 true resid norm > >>> 9.006493082896e+06 ||r(i)||/||b|| 1.000000000000e+00 > >>>>> 1 KSP preconditioned resid norm 2.004313395301e-12 true resid norm > >>> 2.549872332830e-05 ||r(i)||/||b|| 2.831148938173e-12 > >>>>> Linear solve converged due to CONVERGED_ATOL iterations 1 > >>>>> > >>>>> > >>>>> with -pc_type fieldsplit -fieldsplit_u_pc_type hypre > >>> -fieldsplit_wp_pc_type lu the convergence is slow > >>>>> 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm > >>> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 > >>>>> 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm > >>> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 > >>>>> ... > >>>>> 824 KSP preconditioned resid norm 1.018542387738e-09 true resid norm > >>> 2.906608839310e+02 ||r(i)||/||b|| 3.227237074804e-05 > >>>>> 825 KSP preconditioned resid norm 9.743727947637e-10 true resid norm > >>> 2.820369993061e+02 ||r(i)||/||b|| 3.131485215062e-05 > >>>>> Linear solve converged due to CONVERGED_ATOL iterations 825 > >>>>> > >>>>> checking with additional -fieldsplit_u_ksp_type richardson > >>> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 > >>> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor > >>> -fieldsplit_wp_ksp_max_it 1 gives > >>>>> > >>>>> 0 KSP preconditioned resid norm 1.116302362553e-01 true resid norm > >>> 9.006493083520e+06 ||r(i)||/||b|| 1.000000000000e+00 > >>>>> Residual norms for fieldsplit_u_ solve. > >>>>> 0 KSP Residual norm 5.803507549280e-01 > >>>>> 1 KSP Residual norm 2.069538175950e-01 > >>>>> Residual norms for fieldsplit_wp_ solve. > >>>>> 0 KSP Residual norm 0.000000000000e+00 > >>>>> 1 KSP preconditioned resid norm 2.582134825666e-02 true resid norm > >>> 9.268347719866e+06 ||r(i)||/||b|| 1.029073984060e+00 > >>>>> Residual norms for fieldsplit_u_ solve. 
> >>>>> 0 KSP Residual norm 7.831796195225e-01 > >>>>> 1 KSP Residual norm 1.734608520110e-01 > >>>>> Residual norms for fieldsplit_wp_ solve. > >>>>> 0 KSP Residual norm 0.000000000000e+00 > >>>>> .... > >>>>> 823 KSP preconditioned resid norm 1.065070135605e-09 true resid norm > >>> 3.081881356833e+02 ||r(i)||/||b|| 3.421843916665e-05 > >>>>> Residual norms for fieldsplit_u_ solve. > >>>>> 0 KSP Residual norm 6.113806394327e-01 > >>>>> 1 KSP Residual norm 1.535465290944e-01 > >>>>> Residual norms for fieldsplit_wp_ solve. > >>>>> 0 KSP Residual norm 0.000000000000e+00 > >>>>> 824 KSP preconditioned resid norm 1.018542387746e-09 true resid norm > >>> 2.906608839353e+02 ||r(i)||/||b|| 3.227237074851e-05 > >>>>> Residual norms for fieldsplit_u_ solve. > >>>>> 0 KSP Residual norm 6.123437055586e-01 > >>>>> 1 KSP Residual norm 1.524661826133e-01 > >>>>> Residual norms for fieldsplit_wp_ solve. > >>>>> 0 KSP Residual norm 0.000000000000e+00 > >>>>> 825 KSP preconditioned resid norm 9.743727947718e-10 true resid norm > >>> 2.820369990571e+02 ||r(i)||/||b|| 3.131485212298e-05 > >>>>> Linear solve converged due to CONVERGED_ATOL iterations 825 > >>>>> > >>>>> > >>>>> The residual for the wp block is zero since in this first step the rhs is > >>> zero. As can be seen in the output, the multigrid does not perform well at > >>> reducing the residual in the sub-solve. Is my observation right? What > >>> can be done to improve this? > >>>>> > >>>>> > >>>>> Giang > >>>>> > >>>>> On Tue, Apr 25, 2017 at 12:17 AM, Barry Smith > >>> wrote: > >>>>> > >>>>> This can happen if the matrix is singular or nearly singular, or if > >>> the factorization generates small pivots, which can occur even for > >>> nonsingular problems if the matrix is poorly scaled or just plain nasty.
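[The small-pivot mechanism Barry describes can be illustrated with a toy example; the following NumPy sketch is an editor's illustration under stated assumptions (a hypothetical 2x2 matrix, not PETSc or PaStiX itself): a nearly singular matrix yields a near-zero second pivot in LU, and applying the factored inverse to a moderate residual amplifies it enormously, matching the pattern of huge preconditioned norms alongside ordinary true residual norms.]

```python
import numpy as np

# Hypothetical nearly singular 2x2 matrix: det(A) = eps.
eps = 1e-14
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + eps]])

# Doolittle LU by hand: the second diagonal entry of U is the second pivot.
l10 = A[1, 0] / A[0, 0]
u11 = A[1, 1] - l10 * A[0, 1]   # near-zero pivot, on the order of eps

# Applying A^{-1} (the LU "preconditioner") to a moderate residual r
# amplifies it by roughly 1/eps, giving a huge preconditioned norm.
r = np.array([1.0, 0.0])
z = np.linalg.solve(A, r)
print(u11, np.linalg.norm(z))
```

A direct solver reports exactly this situation as a tiny (or zero) pivot; poorly scaled but nonsingular matrices can trigger it just as easily as truly singular ones.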
> >>>>> > >>>>> > >>>>>> On Apr 24, 2017, at 5:10 PM, Hoang Giang Bui > >>> wrote: > >>>>>> > >>>>>> It took a while, here I send you the output > >>>>>> > >>>>>> 0 KSP preconditioned resid norm 3.129073545457e+05 true resid norm > >>> 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > >>>>>> 1 KSP preconditioned resid norm 7.442444222843e-01 true resid norm > >>> 1.003356247696e+02 ||r(i)||/||b|| 1.112966720375e-05 > >>>>>> 2 KSP preconditioned resid norm 3.267453132529e-07 true resid norm > >>> 3.216722968300e+01 ||r(i)||/||b|| 3.568130084011e-06 > >>>>>> 3 KSP preconditioned resid norm 1.155046883816e-11 true resid norm > >>> 3.234460376820e+01 ||r(i)||/||b|| 3.587805194854e-06 > >>>>>> Linear solve converged due to CONVERGED_ATOL iterations 3 > >>>>>> KSP Object: 4 MPI processes > >>>>>> type: gmres > >>>>>> GMRES: restart=1000, using Modified Gram-Schmidt > >>> Orthogonalization > >>>>>> GMRES: happy breakdown tolerance 1e-30 > >>>>>> maximum iterations=1000, initial guess is zero > >>>>>> tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > >>>>>> left preconditioning > >>>>>> using PRECONDITIONED norm type for convergence test > >>>>>> PC Object: 4 MPI processes > >>>>>> type: lu > >>>>>> LU: out-of-place factorization > >>>>>> tolerance for zero pivot 2.22045e-14 > >>>>>> matrix ordering: natural > >>>>>> factor fill ratio given 0, needed 0 > >>>>>> Factored matrix follows: > >>>>>> Mat Object: 4 MPI processes > >>>>>> type: mpiaij > >>>>>> rows=973051, cols=973051 > >>>>>> package used to perform factorization: pastix > >>>>>> Error : 3.24786e-14 > >>>>>> total: nonzeros=0, allocated nonzeros=0 > >>>>>> total number of mallocs used during MatSetValues calls =0 > >>>>>> PaStiX run parameters: > >>>>>> Matrix type : Unsymmetric > >>>>>> Level of printing (0,1,2): 0 > >>>>>> Number of refinements iterations : 3 > >>>>>> Error : 3.24786e-14 > >>>>>> linear system matrix = precond matrix: > >>>>>> Mat Object: 4 MPI processes > >>>>>> type: mpiaij 
> >>>>>> rows=973051, cols=973051 > >>>>>> Error : 3.24786e-14 > >>>>>> total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > >>>>>> total number of mallocs used during MatSetValues calls =0 > >>>>>> using I-node (on process 0) routines: found 78749 nodes, limit > >>> used is 5 > >>>>>> Error : 3.24786e-14 > >>>>>> > >>>>>> It doesn't do as you said. Something is not right here. I will look > >>> in depth. > >>>>>> > >>>>>> Giang > >>>>>> > >>>>>> On Mon, Apr 24, 2017 at 8:21 PM, Barry Smith > >>> wrote: > >>>>>> > >>>>>>> On Apr 24, 2017, at 12:47 PM, Hoang Giang Bui > >>> wrote: > >>>>>>> > >>>>>>> Good catch. I get this for the very first step, maybe at that time > >>> the rhs_w is zero. > >>>>>> > >>>>>> With the multiplicative composition the right hand side of the > >>> second solve is the initial right hand side of the second solve minus > >>> A_10*x where x is the solution to the first sub solve and A_10 is the > lower > >>> left block of the outer matrix. So unless both the initial right hand > side > >>> has a zero for the second block and A_10 is identically zero the right > hand > >>> side for the second sub solve should not be zero. Is A_10 == 0? > >>>>>> > >>>>>> > >>>>>>> In the later step, it shows 2 step convergence > >>>>>>> > >>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>> 0 KSP Residual norm 3.165886479830e+04 > >>>>>>> 1 KSP Residual norm 2.905922877684e-01 > >>>>>>> Residual norms for fieldsplit_wp_ solve. > >>>>>>> 0 KSP Residual norm 2.397669419027e-01 > >>>>>>> 1 KSP Residual norm 0.000000000000e+00 > >>>>>>> 0 KSP preconditioned resid norm 3.165886479920e+04 true resid > >>> norm 7.963616922323e+05 ||r(i)||/||b|| 1.000000000000e+00 > >>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>> 0 KSP Residual norm 9.999891813771e-01 > >>>>>>> 1 KSP Residual norm 1.512000395579e-05 > >>>>>>> Residual norms for fieldsplit_wp_ solve. 
> >>>>>>> 0 KSP Residual norm 8.192702188243e-06 > >>>>>>> 1 KSP Residual norm 0.000000000000e+00 > >>>>>>> 1 KSP preconditioned resid norm 5.252183822848e-02 true resid > >>> norm 7.135927677844e+04 ||r(i)||/||b|| 8.960661653427e-02 > >>>>>> > >>>>>> The outer residual norms are still wonky, the preconditioned > >>> residual norm goes from 3.165886479920e+04 to 5.252183822848e-02 which > is a > >>> huge drop but the 7.963616922323e+05 drops very much less > >>> 7.135927677844e+04. This is not normal. > >>>>>> > >>>>>> What if you just use -pc_type lu for the entire system (no > >>> fieldsplit), does the true residual drop to almost zero in the first > >>> iteration (as it should?). Send the output. > >>>>>> > >>>>>> > >>>>>> > >>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>> 0 KSP Residual norm 6.946213936597e-01 > >>>>>>> 1 KSP Residual norm 1.195514007343e-05 > >>>>>>> Residual norms for fieldsplit_wp_ solve. > >>>>>>> 0 KSP Residual norm 1.025694497535e+00 > >>>>>>> 1 KSP Residual norm 0.000000000000e+00 > >>>>>>> 2 KSP preconditioned resid norm 8.785709535405e-03 true resid > >>> norm 1.419341799277e+04 ||r(i)||/||b|| 1.782282866091e-02 > >>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>> 0 KSP Residual norm 7.255149996405e-01 > >>>>>>> 1 KSP Residual norm 6.583512434218e-06 > >>>>>>> Residual norms for fieldsplit_wp_ solve. > >>>>>>> 0 KSP Residual norm 1.015229700337e+00 > >>>>>>> 1 KSP Residual norm 0.000000000000e+00 > >>>>>>> 3 KSP preconditioned resid norm 7.110407712709e-04 true resid > >>> norm 5.284940654154e+02 ||r(i)||/||b|| 6.636357205153e-04 > >>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>> 0 KSP Residual norm 3.512243341400e-01 > >>>>>>> 1 KSP Residual norm 2.032490351200e-06 > >>>>>>> Residual norms for fieldsplit_wp_ solve. 
> >>>>>>> 0 KSP Residual norm 1.282327290982e+00 > >>>>>>> 1 KSP Residual norm 0.000000000000e+00 > >>>>>>> 4 KSP preconditioned resid norm 3.482036620521e-05 true resid > >>> norm 4.291231924307e+01 ||r(i)||/||b|| 5.388546393133e-05 > >>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>> 0 KSP Residual norm 3.423609338053e-01 > >>>>>>> 1 KSP Residual norm 4.213703301972e-07 > >>>>>>> Residual norms for fieldsplit_wp_ solve. > >>>>>>> 0 KSP Residual norm 1.157384757538e+00 > >>>>>>> 1 KSP Residual norm 0.000000000000e+00 > >>>>>>> 5 KSP preconditioned resid norm 1.203470314534e-06 true resid > >>> norm 4.544956156267e+00 ||r(i)||/||b|| 5.707150658550e-06 > >>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>> 0 KSP Residual norm 3.838596289995e-01 > >>>>>>> 1 KSP Residual norm 9.927864176103e-08 > >>>>>>> Residual norms for fieldsplit_wp_ solve. > >>>>>>> 0 KSP Residual norm 1.066298905618e+00 > >>>>>>> 1 KSP Residual norm 0.000000000000e+00 > >>>>>>> 6 KSP preconditioned resid norm 3.331619244266e-08 true resid > >>> norm 2.821511729024e+00 ||r(i)||/||b|| 3.543002829675e-06 > >>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>> 0 KSP Residual norm 4.624964188094e-01 > >>>>>>> 1 KSP Residual norm 6.418229775372e-08 > >>>>>>> Residual norms for fieldsplit_wp_ solve. > >>>>>>> 0 KSP Residual norm 9.800784311614e-01 > >>>>>>> 1 KSP Residual norm 0.000000000000e+00 > >>>>>>> 7 KSP preconditioned resid norm 8.788046233297e-10 true resid > >>> norm 2.849209671705e+00 ||r(i)||/||b|| 3.577783436215e-06 > >>>>>>> Linear solve converged due to CONVERGED_ATOL iterations 7 > >>>>>>> > >>>>>>> The outer operator is an explicit matrix. > >>>>>>> > >>>>>>> Giang > >>>>>>> > >>>>>>> On Mon, Apr 24, 2017 at 7:32 PM, Barry Smith > >>> wrote: > >>>>>>> > >>>>>>>> On Apr 24, 2017, at 3:16 AM, Hoang Giang Bui > >>> wrote: > >>>>>>>> > >>>>>>>> Thanks Barry, trying with -fieldsplit_u_type lu gives better > >>> convergence. 
I still used 4 procs though; probably with 1 proc it should > >>> also be the same. > >>>>>>>> > >>>>>>>> The u block used a Nitsche-type operator to connect two > >>> non-matching domains. I don't think it leaves any rigid body modes > >>> due to insufficient constraints. Maybe you have another idea? > >>>>>>>> > >>>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>>> 0 KSP Residual norm 3.129067184300e+05 > >>>>>>>> 1 KSP Residual norm 5.906261468196e-01 > >>>>>>>> Residual norms for fieldsplit_wp_ solve. > >>>>>>>> 0 KSP Residual norm 0.000000000000e+00 > >>>>>>> > >>>>>>> ^^^^ something is wrong here. The sub solve should not be > >>> starting with a 0 residual (this means the right hand side for this sub > >>> solve is zero, which it should not be). > >>>>>>>> FieldSplit with MULTIPLICATIVE composition: total splits = 2 > >>>>>>> > >>>>>>> > >>>>>>> How are you providing the outer operator? As an explicit matrix > >>> or with some shell matrix? > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> 0 KSP preconditioned resid norm 3.129067184300e+05 true resid > >>> norm 9.015150492169e+06 ||r(i)||/||b|| 1.000000000000e+00 > >>>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>>> 0 KSP Residual norm 9.999955993437e-01 > >>>>>>>> 1 KSP Residual norm 4.019774691831e-06 > >>>>>>>> Residual norms for fieldsplit_wp_ solve. > >>>>>>>> 0 KSP Residual norm 0.000000000000e+00 > >>>>>>>> 1 KSP preconditioned resid norm 5.003913641475e-01 true resid > >>> norm 4.692996324114e+01 ||r(i)||/||b|| 5.205677185522e-06 > >>>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>>> 0 KSP Residual norm 1.000012180204e+00 > >>>>>>>> 1 KSP Residual norm 1.017367950422e-05 > >>>>>>>> Residual norms for fieldsplit_wp_ solve. > >>>>>>>> 0 KSP Residual norm 0.000000000000e+00 > >>>>>>>> 2 KSP preconditioned resid norm 2.330910333756e-07 true resid > >>> norm 3.474855463983e+01 ||r(i)||/||b|| 3.854461960453e-06 > >>>>>>>> Residual norms for fieldsplit_u_ solve.
> >>>>>>>> 0 KSP Residual norm 1.000004200085e+00 > >>>>>>>> 1 KSP Residual norm 6.231613102458e-06 > >>>>>>>> Residual norms for fieldsplit_wp_ solve. > >>>>>>>> 0 KSP Residual norm 0.000000000000e+00 > >>>>>>>> 3 KSP preconditioned resid norm 8.671259838389e-11 true resid > >>> norm 3.545103468011e+01 ||r(i)||/||b|| 3.932384125024e-06 > >>>>>>>> Linear solve converged due to CONVERGED_ATOL iterations 3 > >>>>>>>> KSP Object: 4 MPI processes > >>>>>>>> type: gmres > >>>>>>>> GMRES: restart=1000, using Modified Gram-Schmidt > >>> Orthogonalization > >>>>>>>> GMRES: happy breakdown tolerance 1e-30 > >>>>>>>> maximum iterations=1000, initial guess is zero > >>>>>>>> tolerances: relative=1e-20, absolute=1e-09, divergence=10000 > >>>>>>>> left preconditioning > >>>>>>>> using PRECONDITIONED norm type for convergence test > >>>>>>>> PC Object: 4 MPI processes > >>>>>>>> type: fieldsplit > >>>>>>>> FieldSplit with MULTIPLICATIVE composition: total splits = 2 > >>>>>>>> Solver info for each split is in the following KSP objects: > >>>>>>>> Split number 0 Defined by IS > >>>>>>>> KSP Object: (fieldsplit_u_) 4 MPI processes > >>>>>>>> type: richardson > >>>>>>>> Richardson: damping factor=1 > >>>>>>>> maximum iterations=1, initial guess is zero > >>>>>>>> tolerances: relative=1e-05, absolute=1e-50, > >>> divergence=10000 > >>>>>>>> left preconditioning > >>>>>>>> using PRECONDITIONED norm type for convergence test > >>>>>>>> PC Object: (fieldsplit_u_) 4 MPI processes > >>>>>>>> type: lu > >>>>>>>> LU: out-of-place factorization > >>>>>>>> tolerance for zero pivot 2.22045e-14 > >>>>>>>> matrix ordering: natural > >>>>>>>> factor fill ratio given 0, needed 0 > >>>>>>>> Factored matrix follows: > >>>>>>>> Mat Object: 4 MPI processes > >>>>>>>> type: mpiaij > >>>>>>>> rows=938910, cols=938910 > >>>>>>>> package used to perform factorization: pastix > >>>>>>>> total: nonzeros=0, allocated nonzeros=0 > >>>>>>>> Error : 3.36878e-14 > >>>>>>>> total number of mallocs used during 
MatSetValues calls > >>> =0 > >>>>>>>> PaStiX run parameters: > >>>>>>>> Matrix type : Unsymmetric > >>>>>>>> Level of printing (0,1,2): 0 > >>>>>>>> Number of refinements iterations : 3 > >>>>>>>> Error : 3.36878e-14 > >>>>>>>> linear system matrix = precond matrix: > >>>>>>>> Mat Object: (fieldsplit_u_) 4 MPI processes > >>>>>>>> type: mpiaij > >>>>>>>> rows=938910, cols=938910, bs=3 > >>>>>>>> Error : 3.36878e-14 > >>>>>>>> Error : 3.36878e-14 > >>>>>>>> total: nonzeros=8.60906e+07, allocated > >>> nonzeros=8.60906e+07 > >>>>>>>> total number of mallocs used during MatSetValues calls =0 > >>>>>>>> using I-node (on process 0) routines: found 78749 > >>> nodes, limit used is 5 > >>>>>>>> Split number 1 Defined by IS > >>>>>>>> KSP Object: (fieldsplit_wp_) 4 MPI processes > >>>>>>>> type: richardson > >>>>>>>> Richardson: damping factor=1 > >>>>>>>> maximum iterations=1, initial guess is zero > >>>>>>>> tolerances: relative=1e-05, absolute=1e-50, > >>> divergence=10000 > >>>>>>>> left preconditioning > >>>>>>>> using PRECONDITIONED norm type for convergence test > >>>>>>>> PC Object: (fieldsplit_wp_) 4 MPI processes > >>>>>>>> type: lu > >>>>>>>> LU: out-of-place factorization > >>>>>>>> tolerance for zero pivot 2.22045e-14 > >>>>>>>> matrix ordering: natural > >>>>>>>> factor fill ratio given 0, needed 0 > >>>>>>>> Factored matrix follows: > >>>>>>>> Mat Object: 4 MPI processes > >>>>>>>> type: mpiaij > >>>>>>>> rows=34141, cols=34141 > >>>>>>>> package used to perform factorization: pastix > >>>>>>>> Error : -nan > >>>>>>>> Error : -nan > >>>>>>>> Error : -nan > >>>>>>>> total: nonzeros=0, allocated nonzeros=0 > >>>>>>>> total number of mallocs used during MatSetValues > >>> calls =0 > >>>>>>>> PaStiX run parameters: > >>>>>>>> Matrix type : Symmetric > >>>>>>>> Level of printing (0,1,2): 0 > >>>>>>>> Number of refinements iterations : 0 > >>>>>>>> Error : -nan > >>>>>>>> linear system matrix = precond matrix: > >>>>>>>> Mat Object: (fieldsplit_wp_) 4 MPI 
processes > >>>>>>>> type: mpiaij > >>>>>>>> rows=34141, cols=34141 > >>>>>>>> total: nonzeros=485655, allocated nonzeros=485655 > >>>>>>>> total number of mallocs used during MatSetValues calls =0 > >>>>>>>> not using I-node (on process 0) routines > >>>>>>>> linear system matrix = precond matrix: > >>>>>>>> Mat Object: 4 MPI processes > >>>>>>>> type: mpiaij > >>>>>>>> rows=973051, cols=973051 > >>>>>>>> total: nonzeros=9.90037e+07, allocated nonzeros=9.90037e+07 > >>>>>>>> total number of mallocs used during MatSetValues calls =0 > >>>>>>>> using I-node (on process 0) routines: found 78749 nodes, > >>> limit used is 5 > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Giang > >>>>>>>> > >>>>>>>> On Sun, Apr 23, 2017 at 10:19 PM, Barry Smith < > >>> bsmith at mcs.anl.gov> wrote: > >>>>>>>> > >>>>>>>>> On Apr 23, 2017, at 2:42 PM, Hoang Giang Bui < > >>> hgbk2008 at gmail.com> wrote: > >>>>>>>>> > >>>>>>>>> Dear Matt/Barry > >>>>>>>>> > >>>>>>>>> With your options, it results in > >>>>>>>>> > >>>>>>>>> 0 KSP preconditioned resid norm 1.106709687386e+31 true > >>> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > >>>>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>>>> 0 KSP Residual norm 2.407308987203e+36 > >>>>>>>>> 1 KSP Residual norm 5.797185652683e+72 > >>>>>>>> > >>>>>>>> It looks like Matt is right, hypre is seemingly producing useless > >>> garbage. > >>>>>>>> > >>>>>>>> First, how do things run on one process? If you have similar > >>> problems then debug on one process (debugging any kind of problem is always > >>> far easier on one process). > >>>>>>>> > >>>>>>>> First run with -fieldsplit_u_pc_type lu (instead of using hypre) to > >>> see if that works or also produces something bad. > >>>>>>>> > >>>>>>>> What is the operator and the boundary conditions for u? It could > >>> be singular. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> Residual norms for fieldsplit_wp_ solve.
> >>>>>>>>> 0 KSP Residual norm 0.000000000000e+00 > >>>>>>>>> ... > >>>>>>>>> 999 KSP preconditioned resid norm 2.920157329174e+12 true > >>> resid norm 9.015683504616e+06 ||r(i)||/||b|| 1.000059124102e+00 > >>>>>>>>> Residual norms for fieldsplit_u_ solve. > >>>>>>>>> 0 KSP Residual norm 1.533726746719e+36 > >>>>>>>>> 1 KSP Residual norm 3.692757392261e+72 > >>>>>>>>> Residual norms for fieldsplit_wp_ solve. > >>>>>>>>> 0 KSP Residual norm 0.000000000000e+00 > >>>>>>>>> > >>>>>>>>> Do you suggest that the pastix solver for the "wp" block > >>> encounters a small pivot? In addition, it seems like the "u" block is also > >>> singular. > >>>>>>>>> > >>>>>>>>> Giang > >>>>>>>>> > >>>>>>>>> On Sun, Apr 23, 2017 at 7:39 PM, Barry Smith < > >>> bsmith at mcs.anl.gov> wrote: > >>>>>>>>> > >>>>>>>>> Huge preconditioned norms but normal unpreconditioned norms > >>> almost always come from a very small pivot in an LU or ILU factorization. > >>>>>>>>> > >>>>>>>>> The first thing to do is monitor the two sub solves.
Run > >>> with the additional options -fieldsplit_u_ksp_type richardson > >>> -fieldsplit_u_ksp_monitor -fieldsplit_u_ksp_max_it 1 > >>> -fieldsplit_wp_ksp_type richardson -fieldsplit_wp_ksp_monitor > >>> -fieldsplit_wp_ksp_max_it 1 > >>>>>>>>> > >>>>>>>>>> On Apr 23, 2017, at 12:22 PM, Hoang Giang Bui < > >>> hgbk2008 at gmail.com> wrote: > >>>>>>>>>> > >>>>>>>>>> Hello > >>>>>>>>>> > >>>>>>>>>> I encountered a strange convergence behavior that I have > >>> trouble understanding > >>>>>>>>>> > >>>>>>>>>> KSPSetFromOptions completed > >>>>>>>>>> 0 KSP preconditioned resid norm 1.106709687386e+31 true > >>> resid norm 9.015150491938e+06 ||r(i)||/||b|| 1.000000000000e+00 > >>>>>>>>>> 1 KSP preconditioned resid norm 2.933141742664e+29 true > >>> resid norm 9.015152282123e+06 ||r(i)||/||b|| 1.000000198575e+00 > >>>>>>>>>> 2 KSP preconditioned resid norm 9.686409637174e+16 true > >>> resid norm 9.015354521944e+06 ||r(i)||/||b|| 1.000022631902e+00 > >>>>>>>>>> 3 KSP preconditioned resid norm 4.219243615809e+15 true > >>> resid norm 9.017157702420e+06 ||r(i)||/||b|| 1.000222648583e+00 > >>>>>>>>>> .....
> >>>>>>>>>> 999 KSP preconditioned resid norm 3.043754298076e+12 true > >>> resid norm 9.015425041089e+06 ||r(i)||/||b|| 1.000030454195e+00 > >>>>>>>>>> 1000 KSP preconditioned resid norm 3.043000287819e+12 true > >>> resid norm 9.015424313455e+06 ||r(i)||/||b|| 1.000030373483e+00 > >>>>>>>>>> Linear solve did not converge due to DIVERGED_ITS iterations > >>> 1000 > >>>>>>>>>> KSP Object: 4 MPI processes > >>>>>>>>>> type: gmres > >>>>>>>>>> GMRES: restart=1000, using Modified Gram-Schmidt > >>> Orthogonalization > >>>>>>>>>> GMRES: happy breakdown tolerance 1e-30 > >>>>>>>>>> maximum iterations=1000, initial guess is zero > >>>>>>>>>> tolerances: relative=1e-20, absolute=1e-09, > >>> divergence=10000 > >>>>>>>>>> left preconditioning > >>>>>>>>>> using PRECONDITIONED norm type for convergence test > >>>>>>>>>> PC Object: 4 MPI processes > >>>>>>>>>> type: fieldsplit > >>>>>>>>>> FieldSplit with MULTIPLICATIVE composition: total splits > >>> = 2 > >>>>>>>>>> Solver info for each split is in the following KSP > >>> objects: > >>>>>>>>>> Split number 0 Defined by IS > >>>>>>>>>> KSP Object: (fieldsplit_u_) 4 MPI processes > >>>>>>>>>> type: preonly > >>>>>>>>>> maximum iterations=10000, initial guess is zero > >>>>>>>>>> tolerances: relative=1e-05, absolute=1e-50, > >>> divergence=10000 > >>>>>>>>>> left preconditioning > >>>>>>>>>> using NONE norm type for convergence test > >>>>>>>>>> PC Object: (fieldsplit_u_) 4 MPI processes > >>>>>>>>>> type: hypre > >>>>>>>>>> HYPRE BoomerAMG preconditioning > >>>>>>>>>> HYPRE BoomerAMG: Cycle type V > >>>>>>>>>> HYPRE BoomerAMG: Maximum number of levels 25 > >>>>>>>>>> HYPRE BoomerAMG: Maximum number of iterations PER > >>> hypre call 1 > >>>>>>>>>> HYPRE BoomerAMG: Convergence tolerance PER hypre > >>> call 0 > >>>>>>>>>> HYPRE BoomerAMG: Threshold for strong coupling 0.6 > >>>>>>>>>> HYPRE BoomerAMG: Interpolation truncation factor 0 > >>>>>>>>>> HYPRE BoomerAMG: Interpolation: max elements per row > >>> 0 > >>>>>>>>>> 
HYPRE BoomerAMG: Number of levels of aggressive > >>> coarsening 0 > >>>>>>>>>> HYPRE BoomerAMG: Number of paths for aggressive > >>> coarsening 1 > >>>>>>>>>> HYPRE BoomerAMG: Maximum row sums 0.9 > >>>>>>>>>> HYPRE BoomerAMG: Sweeps down 1 > >>>>>>>>>> HYPRE BoomerAMG: Sweeps up 1 > >>>>>>>>>> HYPRE BoomerAMG: Sweeps on coarse 1 > >>>>>>>>>> HYPRE BoomerAMG: Relax down > >>> symmetric-SOR/Jacobi > >>>>>>>>>> HYPRE BoomerAMG: Relax up > >>> symmetric-SOR/Jacobi > >>>>>>>>>> HYPRE BoomerAMG: Relax on coarse > >>> Gaussian-elimination > >>>>>>>>>> HYPRE BoomerAMG: Relax weight (all) 1 > >>>>>>>>>> HYPRE BoomerAMG: Outer relax weight (all) 1 > >>>>>>>>>> HYPRE BoomerAMG: Using CF-relaxation > >>>>>>>>>> HYPRE BoomerAMG: Measure type local > >>>>>>>>>> HYPRE BoomerAMG: Coarsen type PMIS > >>>>>>>>>> HYPRE BoomerAMG: Interpolation type classical > >>>>>>>>>> linear system matrix = precond matrix: > >>>>>>>>>> Mat Object: (fieldsplit_u_) 4 MPI processes > >>>>>>>>>> type: mpiaij > >>>>>>>>>> rows=938910, cols=938910, bs=3 > >>>>>>>>>> total: nonzeros=8.60906e+07, allocated > >>> nonzeros=8.60906e+07 > >>>>>>>>>> total number of mallocs used during MatSetValues > >>> calls =0 > >>>>>>>>>> using I-node (on process 0) routines: found 78749 > >>> nodes, limit used is 5 > >>>>>>>>>> Split number 1 Defined by IS > >>>>>>>>>> KSP Object: (fieldsplit_wp_) 4 MPI processes > >>>>>>>>>> type: preonly > >>>>>>>>>> maximum iterations=10000, initial guess is zero > >>>>>>>>>> tolerances: relative=1e-05, absolute=1e-50, > >>> divergence=10000 > >>>>>>>>>> left preconditioning > >>>>>>>>>> using NONE norm type for convergence test > >>>>>>>>>> PC Object: (fieldsplit_wp_) 4 MPI processes > >>>>>>>>>> type: lu > >>>>>>>>>> LU: out-of-place factorization > >>>>>>>>>> tolerance for zero pivot 2.22045e-14 > >>>>>>>>>> matrix ordering: natural > >>>>>>>>>> factor fill ratio given 0, needed 0 > >>>>>>>>>> Factored matrix follows: > >>>>>>>>>> Mat Object: 4 MPI processes > >>>>>>>>>> type: 
mpiaij > >>>>>>>>>> rows=34141, cols=34141 > >>>>>>>>>> package used to perform factorization: pastix > >>>>>>>>>> Error : -nan > >>>>>>>>>> Error : -nan > >>>>>>>>>> total: nonzeros=0, allocated nonzeros=0 > >>>>>>>>>> Error : -nan > >>>>>>>>>> total number of mallocs used during MatSetValues calls =0 > >>>>>>>>>> PaStiX run parameters: > >>>>>>>>>> Matrix type : Symmetric > >>>>>>>>>> Level of printing (0,1,2): 0 > >>>>>>>>>> Number of refinements iterations : 0 > >>>>>>>>>> Error : -nan > >>>>>>>>>> linear system matrix = precond matrix: > >>>>>>>>>> Mat Object: (fieldsplit_wp_) 4 MPI processes > >>>>>>>>>> type: mpiaij > >>>>>>>>>> rows=34141, cols=34141 > >>>>>>>>>> total: nonzeros=485655, allocated nonzeros=485655 > >>>>>>>>>> total number of mallocs used during MatSetValues > >>> calls =0 > >>>>>>>>>> not using I-node (on process 0) routines > >>>>>>>>>> linear system matrix = precond matrix: > >>>>>>>>>> Mat Object: 4 MPI processes > >>>>>>>>>> type: mpiaij > >>>>>>>>>> rows=973051, cols=973051 > >>>>>>>>>> total: nonzeros=9.90037e+07, allocated > >>> nonzeros=9.90037e+07 > >>>>>>>>>> total number of mallocs used during MatSetValues calls =0 > >>>>>>>>>> using I-node (on process 0) routines: found 78749 > >>> nodes, limit used is 5 > >>>>>>>>>> > >>>>>>>>>> The pattern of convergence gives a hint that this system is > >>> somehow bad/singular. But I don't know why the preconditioned error goes up > >>> so high. Does anyone have an idea? > >>>>>>>>>> > >>>>>>>>>> Best regards > >>>>>>>>>> Giang Bui > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> > >>>>> > >>>>> > >>>> > >>>> > >>> > >>> > >
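[The multiplicative fieldsplit composition discussed above can be sketched on a toy 2x2 block system; this is an editor's hedged NumPy illustration with made-up block sizes, not the PETSc implementation: the second sub-solve sees the right-hand side b_1 - A_10 x_0, so it is identically zero only when both b_1 = 0 and A_10 x_0 = 0, which is why Barry asks whether A_10 == 0.]

```python
import numpy as np

# Toy 2x2 block system (hypothetical sizes 3 and 2).
rng = np.random.default_rng(0)
A00 = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)   # "u" block
A01 = rng.standard_normal((3, 2))                     # upper-right coupling (unused in one forward sweep)
A10 = rng.standard_normal((2, 3))                     # lower-left coupling
A11 = rng.standard_normal((2, 2)) + 3.0 * np.eye(2)   # "wp" block
b0 = rng.standard_normal(3)
b1 = np.zeros(2)   # zero initial rhs for the second block, as in the first step of the thread

# One application of the MULTIPLICATIVE block preconditioner (forward sweep):
x0 = np.linalg.solve(A00, b0)   # first sub-solve
r1 = b1 - A10 @ x0              # rhs seen by the second sub-solve
x1 = np.linalg.solve(A11, r1)   # second sub-solve

# r1 is nonzero even though b1 == 0, unless A10 @ x0 vanishes; a reported
# zero residual for the wp sub-solve therefore points at A10 (or x0) being zero.
print(np.linalg.norm(r1))
```

In PETSc terms, a "0 KSP Residual norm 0.000000000000e+00" line for the second split in this configuration suggests the updated right-hand side really is zero, hence the question about the coupling block.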